Re: [R] get latest dates for different people in a dataset
Thank you!

-----Original Message-----
From: Chel Hee Lee [mailto:chl...@mail.usask.ca]
Sent: Friday, January 23, 2015 8:09 PM
To: Tan, Richard; 'r-help@R-project.org'
Subject: Re: [R] get latest dates for different people in a dataset

    do.call(rbind, lapply(split(data, data$Name), function(x) x[order(x$CheckInDate),][nrow(x),]))
         Name CheckInDate Temp
    John John  2014-04-01 99.0
    Mary Mary  2014-03-01 98.1
    Sam   Sam  2014-04-01 97.5

Is this what you are looking for? I hope this helps.

Chel Hee Lee

On 01/23/2015 05:43 PM, Tan, Richard wrote:
> Hi,
>
> Can someone help for a R question? I have a data set like:
>
>     Name CheckInDate Temp
>     John 1/3/2014    97
>     Mary 1/3/2014    98.1
>     Sam  1/4/2014    97.5
>     John 1/4/2014    99
>
> I'd like to return a dataset that for each Name, get the row that is the latest CheckInDate for that person. For the example above it would be
>
>     Name CheckInDate Temp
>     John 1/4/2014    99
>     Mary 1/3/2014    98.1
>     Sam  1/4/2014    97.5
>
> Thank you for your help!
> Richard

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
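For comparison, here is a minimal base-R sketch of the same idea that avoids the split/lapply/rbind round trip: sort so that each person's latest visit comes first, then keep the first row per Name with duplicated(). It assumes CheckInDate is converted to Date as day/month/year, as in the replies in this thread; the objects visits, sorted and latest are hypothetical names. If a person has two rows tied on the latest date, only one of them is kept.

```r
# Hypothetical example data matching the question, dates read as day/month/year
visits <- data.frame(
  Name = c("John", "Mary", "Sam", "John"),
  CheckInDate = as.Date(c("1/3/2014", "1/3/2014", "1/4/2014", "1/4/2014"),
                        format = "%d/%m/%Y"),
  Temp = c(97, 98.1, 97.5, 99),
  stringsAsFactors = FALSE
)

# Sort by Name, then by CheckInDate descending, so each person's latest row comes first
sorted <- visits[order(visits$Name, -as.numeric(visits$CheckInDate)), ]

# Keep only the first row seen for each Name
latest <- sorted[!duplicated(sorted$Name), ]
latest
```

Because duplicated() looks at one column only, the same pattern works for any "latest/largest row per group" question once the data is sorted appropriately.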
Re: [R] get latest dates for different people in a dataset
Thank you!

-----Original Message-----
From: David Barron [mailto:dnbar...@gmail.com]
Sent: Saturday, January 24, 2015 7:56 AM
To: Tan, Richard; r-help@R-project.org
Subject: Re: [R] get latest dates for different people in a dataset

Hi Richard,

You could also do it using the package dplyr:

    dta <- data.frame(Name = c('John', 'Mary', 'Sam', 'John'),
                      CheckInDate = as.Date(c('1/3/2014', '1/3/2014', '1/4/2014', '1/4/2014'),
                                            format = '%d/%m/%Y'),
                      Temp = c(97, 98.1, 97.5, 99))
    library(dplyr)
    dta %>% group_by(Name) %>% filter(CheckInDate == max(CheckInDate))

    Source: local data frame [3 x 3]
    Groups: Name

      Name CheckInDate Temp
    1 Mary  2014-03-01 98.1
    2  Sam  2014-04-01 97.5
    3 John  2014-04-01 99.0
Re: [R] get latest dates for different people in a dataset
Thank you!

From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Friday, January 23, 2015 7:14 PM
To: Tan, Richard
Cc: r-help@R-project.org
Subject: Re: [R] get latest dates for different people in a dataset

Here is one way. Sort the data.frame, first by Name, then break ties with CheckInDate. Then choose the rows that are the last in a run of identical Name values.

    txt <- "Name CheckInDate Temp
    John 1/3/2014 97
    Mary 1/3/2014 98.1
    Sam 1/4/2014 97.5
    John 1/4/2014 99"
    d <- read.table(header = TRUE, colClasses = c("character", "character", "numeric"), text = txt)
    d$CheckInDate <- as.Date(d$CheckInDate, format = "%d/%m/%Y")
    isEndOfRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
    dSorted <- d[order(d$Name, d$CheckInDate), ]
    dLatestVisit <- dSorted[isEndOfRun(dSorted$Name), ]
    dLatestVisit
      Name CheckInDate Temp
    4 John  2014-04-01 99.0
    2 Mary  2014-03-01 98.1
    3  Sam  2014-04-01 97.5

Bill Dunlap
TIBCO Software
wdunlap tibco.com
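Bill's isEndOfRun() marks the last element of each run of equal values. A minimal sketch of an equivalent built on rle(), which records run lengths, so the cumulative sum of those lengths is the index of each run's last element. It assumes the grouping column is a character vector (not a factor, which rle() rejects), as it is here because of colClasses; endOfRunIndex is a hypothetical helper name.

```r
# Index of the last element in each run of identical values
endOfRunIndex <- function(x) cumsum(rle(x)$lengths)

txt <- "Name CheckInDate Temp
John 1/3/2014 97
Mary 1/3/2014 98.1
Sam 1/4/2014 97.5
John 1/4/2014 99"
d <- read.table(header = TRUE, text = txt,
                colClasses = c("character", "character", "numeric"))
d$CheckInDate <- as.Date(d$CheckInDate, format = "%d/%m/%Y")

# Same sort as in Bill's version: by Name, ties broken by CheckInDate
dSorted <- d[order(d$Name, d$CheckInDate), ]

# Integer indices instead of a logical mask select the same rows
dLatestVisit <- dSorted[endOfRunIndex(dSorted$Name), ]
dLatestVisit
```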
[R] get latest dates for different people in a dataset
Hi,

Can someone help with an R question? I have a data set like:

    Name CheckInDate Temp
    John 1/3/2014    97
    Mary 1/3/2014    98.1
    Sam  1/4/2014    97.5
    John 1/4/2014    99

For each Name, I'd like to return the row with the latest CheckInDate for that person. For the example above, the result would be:

    Name CheckInDate Temp
    John 1/4/2014    99
    Mary 1/3/2014    98.1
    Sam  1/4/2014    97.5

Thank you for your help!
Richard
[R] categorize a character column
Hi,

I know I can do this with a for loop using strsplit and grep, but is there a more efficient way? Given a data frame (input) and a category lookup table (lst):

    input
                                         item      loc
    1 item 1.1: earnings sep item 1.2: w2 sep  shelf 1
    2                    item 1.3: deductions drawer 1
    3                      item 1.1: earnings  shelf 2

    lst
          item cat
    1 item 1.1   A
    2 item 1.2   B
    3 item 1.3   C

how do I get a result frame like this?

    result
                                         item      loc cat
    1 item 1.1: earnings sep item 1.2: w2 sep  shelf 1  AB
    2                    item 1.3: deductions drawer 1   C
    3                      item 1.1: earnings  shelf 2   A

Thanks,
Richard
[R] categorize a character column
Sorry, I should have included the R code for the data frames to make testing easier:

    input <- rbind(data.frame(item = "item 1.1: earnings sep item 1.2: w2 sep", loc = "shelf 1"),
                   data.frame(item = "item 1.3: deductions sep", loc = "drawer 1"),
                   data.frame(item = "item 1.1: earnings sep", loc = "shelf 2"))
    lst <- rbind(data.frame(item = "item 1.1", cat = "A"),
                 data.frame(item = "item 1.2", cat = "B"),
                 data.frame(item = "item 1.3", cat = "C"))

I want to get a result like:

    result
                                         item      loc cat
    1 item 1.1: earnings sep item 1.2: w2 sep  shelf 1  AB
    2                    item 1.3: deductions drawer 1   C
    3                      item 1.1: earnings  shelf 2   A

Thanks,
Richard

From: Tan, Richard
Sent: Wednesday, January 05, 2011 5:55 PM
To: 'r-help@r-project.org'
Subject: categorize a character column
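No answer appears in this thread, so here is a minimal base-R sketch, not a definitive solution: for each input row, test every code in lst$item with grepl(fixed = TRUE) and paste together the categories whose code occurs in the text. It assumes the character columns are plain strings (stringsAsFactors = FALSE) and that no code is a prefix of another; with fixed matching, "item 1.1" would also match inside a hypothetical "item 1.10".

```r
input <- data.frame(
  item = c("item 1.1: earnings sep item 1.2: w2 sep",
           "item 1.3: deductions sep",
           "item 1.1: earnings sep"),
  loc = c("shelf 1", "drawer 1", "shelf 2"),
  stringsAsFactors = FALSE
)
lst <- data.frame(item = c("item 1.1", "item 1.2", "item 1.3"),
                  cat = c("A", "B", "C"),
                  stringsAsFactors = FALSE)

# For each input row, keep the categories whose item code occurs in the text
input$cat <- vapply(input$item, function(s) {
  hit <- vapply(lst$item, function(code) grepl(code, s, fixed = TRUE), logical(1))
  paste(lst$cat[hit], collapse = "")
}, character(1), USE.NAMES = FALSE)

input
```

The loop is over the (usually short) lookup table rather than over split pieces of each string, which avoids strsplit entirely.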
[R] aggregate a Date column does not work?
Hi,

I am trying to aggregate a Date column with max, but I get a weird result. How do I fix this?

    a <- rbind(
      data.frame(name = 'Tom',  payday = as.Date('1999-01-01')),
      data.frame(name = 'Tom',  payday = as.Date('2000-01-01')),
      data.frame(name = 'Pete', payday = as.Date('1998-01-01')),
      data.frame(name = 'Pete', payday = as.Date('1999-01-01'))
    )
    a
      name     payday
    1  Tom 1999-01-01
    2  Tom 2000-01-01
    3 Pete 1998-01-01
    4 Pete 1999-01-01
    aggregate(a$payday, list(a$name), max)
      Group.1     x
    1     Tom 10957
    2    Pete 10592

Thanks,
Richard
Re: [R] aggregate a Date column does not work?
Thanks, adding as.Date('1970-01-01') to the result column works.

Richard

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Monday, November 22, 2010 3:51 PM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] aggregate a Date column does not work?

On Nov 22, 2010, at 3:39 PM, Tan, Richard wrote:

> Hi, I am trying to aggregate max a Date type column but have weird result, how do I fix this?

In the process of getting max() you coerced the Dates to numeric, and now you need to re-coerce them back to Dates:

    ?as.Date
    as.Date(your result)

(possibly with an origin, if the default 1970-01-01 doesn't get used).

--
David.

David Winsemius, MD
West Hartford, CT
Re: [R] aggregate a Date column does not work?
Yes, I meant something like one of these:

    b <- aggregate(a$payday, list(a$name), max)
    b$x <- as.Date('1970-01-01') + b$x

or

    b$x <- as.Date(b$x, origin = '1970-01-01')

Thanks.

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Monday, November 22, 2010 3:58 PM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] aggregate a Date column does not work?

On Nov 22, 2010, at 3:54 PM, Tan, Richard wrote:

> Thanks, add as.Date('1970-01-01') to the result column works.

But that should make them all the same date in 1970. Since aggregate renames the date column to x, this should work:

    as.Date(aggregate(a$payday, list(a$name), max)$x, origin = '1970-01-01')
    [1] "2000-01-01" "1999-01-01"

David Winsemius, MD
West Hartford, CT
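Depending on the R version, aggregate() may already keep the Date class; the sketch below sidesteps the question by taking the per-group maximum on the underlying day counts with tapply() and converting back explicitly with an origin, which behaves the same either way. The objects latest and res are hypothetical names.

```r
a <- data.frame(name = c("Tom", "Tom", "Pete", "Pete"),
                payday = as.Date(c("1999-01-01", "2000-01-01",
                                   "1998-01-01", "1999-01-01")),
                stringsAsFactors = FALSE)

# Max of the numeric day counts per name (a named 1-d array, names sorted)
latest <- tapply(as.numeric(a$payday), a$name, max)

# Convert the day counts back to Date explicitly; as.vector() drops the array shape
res <- data.frame(name = names(latest),
                  payday = as.Date(as.vector(latest), origin = "1970-01-01"),
                  stringsAsFactors = FALSE)
res
```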
[R] aggregate text column by a few rows
Hi,

The R function aggregate seems to take only summary-statistic functions. Can I aggregate text columns? For example, for the data frame below,

    a <- rbind(data.frame(id = 1, name = 'Tom',  hobby = 'fishing'),
               data.frame(id = 1, name = 'Tom',  hobby = 'reading'),
               data.frame(id = 2, name = 'Mary', hobby = 'reading'),
               data.frame(id = 3, name = 'John', hobby = 'boating'),
               data.frame(id = 2, name = 'Mary', hobby = 'running'))
    a
      id name   hobby
    1  1  Tom fishing
    2  1  Tom reading
    3  2 Mary reading
    4  3 John boating
    5  2 Mary running

I want output as:

    b
      id name         hobbies
    1  1  Tom fishing reading
    2  2 Mary reading running
    3  3 John         boating

Thanks,
Richard
Re: [R] aggregate text column by a few rows
Thank you!

Richard

-----Original Message-----
From: jim holtman [mailto:jholt...@gmail.com]
Sent: Thursday, October 07, 2010 12:08 PM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] aggregate text column by a few rows

Try this using sqldf:

    a
      id name   hobby
    1  1  Tom fishing
    2  1  Tom reading
    3  2 Mary reading
    4  3 John boating
    5  2 Mary running
    require(sqldf)
    sqldf('select name, group_concat(hobby) hobby from a group by id', method = 'raw')
      name           hobby
    1  Tom fishing,reading
    2 Mary reading,running
    3 John         boating

On Thu, Oct 7, 2010 at 11:52 AM, Tan, Richard <r...@panagora.com> wrote:
> Hi, R function aggregate can only take summary stats functions, can I aggregate text columns?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
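aggregate() is not limited to summary statistics: any function that returns one value per group works, including paste() with collapse. A minimal base-R sketch without sqldf, assuming the columns are plain character (stringsAsFactors = FALSE); within each group the hobbies appear in their original row order.

```r
a <- data.frame(id = c(1, 1, 2, 3, 2),
                name = c("Tom", "Tom", "Mary", "John", "Mary"),
                hobby = c("fishing", "reading", "reading", "boating", "running"),
                stringsAsFactors = FALSE)

# Extra arguments after FUN (here collapse = " ") are passed on to paste()
b <- aggregate(hobby ~ id + name, data = a, FUN = paste, collapse = " ")

# aggregate() chooses its own group order; sort by id for display
b <- b[order(b$id), ]
b
```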
[R] get top n rows group by a column from a dataframe
Hi,

Is there an R function like SQL's TOP keyword? I have a data frame with 3 columns: company, person, salary. How do I get the top 5 highest-paid people for each company, and, if a company has fewer than 5 people, just return all of them?

Thanks,
Richard
Re: [R] get top n rows group by a column from a dataframe
Hi Richard,

Thanks for the suggestion, but I want the top 5 salaries for each company, not for the whole list. I don't see how your approach can do that.

Thanks,
Richard

From: RICHARD M. HEIBERGER [mailto:r...@temple.edu]
Sent: Thursday, September 16, 2010 11:53 AM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] get top n rows group by a column from a dataframe

    tmp <- data.frame(matrix(rnorm(30), 10, 3,
                             dimnames = list(letters[1:10], c("company", "person", "salary"))))
    tmp
          company     person      salary
    a -1.04590176 -0.7841855  1.07150503
    b -1.06643101  0.6545647  0.43920454
    c  0.72894531 -1.3812867  0.41313659
    d -0.39265263 -0.3871271  0.69404325
    e  0.54028124  0.7124772  0.66630904
    f -1.46931714 -0.3823353  0.03069797
    g -0.33283666 -0.6351862  0.37920017
    h -0.79977129  0.2605315  0.92373900
    i  0.80614119  0.3727227 -1.16560563
    j  0.03165012  0.4690400 -0.81966285
    order(tmp$person, decreasing = TRUE)[1:min(5, length(tmp$person))]
    [1]  5  2 10  9  8
    tmp[order(tmp$person, decreasing = TRUE)[1:min(5, length(tmp$person))], ]
          company    person     salary
    e  0.54028124 0.7124772  0.6663090
    b -1.06643101 0.6545647  0.4392045
    j  0.03165012 0.4690400 -0.8196628
    i  0.80614119 0.3727227 -1.1656056
    h -0.79977129 0.2605315  0.9237390

You can easily write a function for that.

    top <- function(DF, varname, howmany) {}

On Thu, Sep 16, 2010 at 11:39 AM, Tan, Richard <r...@panagora.com> wrote:
> Hi, is there an R function like sql's TOP key word?
Re: [R] get top n rows group by a column from a dataframe
Thanks, it works.

Richard

From: Henrique Dallazuanna [mailto:www...@gmail.com]
Sent: Thursday, September 16, 2010 1:56 PM
To: Tan, Richard
Cc: RICHARD M. HEIBERGER; r-help@r-project.org
Subject: Re: [R] get top n rows group by a column from a dataframe

You can try this (using Richard's example):

    aggregate(sdata['salary'], sdata[c('company')], function(x) tail(sort(x), 5))

On Thu, Sep 16, 2010 at 2:26 PM, Tan, Richard <r...@panagora.com> wrote:
> Hi Richard Thanks for the suggestion, but I want top 5 salary for each company, not the whole list.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
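Henrique's aggregate() call returns the top salaries but drops the person column. A minimal sketch that returns whole rows instead: split by company, sort each piece by salary, and take head(n), which simply returns every row when a company has fewer than n people. The salaries data and the topN() helper are hypothetical, made up for illustration.

```r
# Hypothetical data: company A has 6 people, company B only 2
salaries <- data.frame(
  company = c(rep("A", 6), rep("B", 2)),
  person  = paste0("p", 1:8),
  salary  = c(50, 80, 65, 90, 70, 55, 40, 60),
  stringsAsFactors = FALSE
)

# Up to n highest-paid rows per company, highest first
topN <- function(df, n = 5) {
  do.call(rbind, lapply(split(df, df$company), function(x) {
    head(x[order(x$salary, decreasing = TRUE), ], n)
  }))
}

top5 <- topN(salaries, 5)
top5
```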
[R] data frame select max group by like function
Hi,

I have a data frame with 3 columns: ID, year and score. How can I select, for each unique ID, the year that has the max score? For example, for the data frame

    ID, year, score
    tom, 1995, 88
    rick, 1994, 90
    mary, 2000, 97
    tom, 1998, 60
    mary, 1998, 100

I should get

    ID, year, score
    tom, 1995, 88
    rick, 1994, 90
    mary, 1998, 100

Thanks,
Richard
Re: [R] data frame select max group by like function
Thanks all for the help!

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Tuesday, March 09, 2010 5:58 PM
To: Phil Spector; Tan, Richard
Cc: r-help@r-project.org
Subject: RE: [R] data frame select max group by like function

And yet another way is

    isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
    sortedDat <- dat[order(dat$ID, dat$score), ]
    sortedDat[isLastInRun(sortedDat$ID), ]
        ID year score
    5 mary 1998   100
    2 rick 1994    90
    1  tom 1995    88

The row names (5, 2, 1) show where in the original dataset the output rows come from.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Phil Spector
Sent: Tuesday, March 09, 2010 11:55 AM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] data frame select max group by like function

Yet another way to do this with base R:

    dat <- read.csv(textConnection('ID, year, score
    tom, 1995, 88
    rick, 1994, 90
    mary, 2000, 97
    tom, 1998, 60
    mary, 1998,100'), strip.white = TRUE)
    do.call(rbind, lapply(split(dat, dat$ID), function(x) x[which.max(x$score), ]))
           ID year score
    mary mary 1998   100
    rick rick 1994    90
    tom   tom 1995    88

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spec...@stat.berkeley.edu

On Tue, 9 Mar 2010, Tan, Richard wrote:
> Hi, I have a data frame with 3 columns: ID, year and score. How can I select for each unique ID, the year that has the max score?
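One more base-R variant, sketched here as an alternative rather than as anyone's posted answer: compute the maximum score per ID with aggregate(), then merge() it back onto the data so the matching year comes along. merge() joins on the shared columns (ID and score) by default. Unlike which.max(), this keeps every row tied at an ID's maximum score.

```r
dat <- data.frame(ID = c("tom", "rick", "mary", "tom", "mary"),
                  year = c(1995, 1994, 2000, 1998, 1998),
                  score = c(88, 90, 97, 60, 100),
                  stringsAsFactors = FALSE)

# Per-ID maximum score
best <- aggregate(score ~ ID, data = dat, FUN = max)

# Join back on the shared columns (ID, score) to recover the year
res <- merge(dat, best)
res
```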
[R] gsub does not support \b?
Hello, can someone help? How come

    gsub("\bINDS\b", "INDUSTRIES", "ADVANCED ENERGY INDS")
    [1] "ADVANCED ENERGY INDS"

returns "ADVANCED ENERGY INDS" and not "ADVANCED ENERGY INDUSTRIES"?

Thanks.
Richard
Re: [R] gsub does not support \b?
OK, I figured it out. My stupid mistake: it should be \\b instead of \b.

From: Tan, Richard
Sent: Tuesday, November 10, 2009 3:36 PM
To: 'r-help@r-project.org'
Subject: gsub does not support \b?
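Why the double backslash is needed: inside an R string literal, "\b" is already an escape for the single backspace character, so the regex engine never sees a word-boundary token. Writing "\\b" puts the two characters \ and b into the pattern. A short sketch (R 4.0 and later also accept raw strings such as r"(\bINDS\b)"):

```r
# "\b" in an R string is one character (backspace), not a regex token
nchar("\b")  # 1

# "\\b" passes \b to the regex engine, where it means "word boundary"
fixed <- gsub("\\bINDS\\b", "INDUSTRIES", "ADVANCED ENERGY INDS")
fixed

# The boundary also protects words that merely contain INDS
gsub("\\bINDS\\b", "INDUSTRIES", "WINDSOR INDS")
```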
Re: [R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number
Sorry, I did not give examples in my previous posting to make my question clear. It's not exactly 1 digit, but at least one digit. Here are some examples:

    input <- c(none = '0foo f0oo foo0 foofoofoo0 0foofoofoo TOOL9NGG NONUMBER',
               all = 'foob0 fo0o0b 0foob 0foobardo foob4rdoo foobardo0')
    gsub(x = input, replacement = 'x', perl = TRUE, pattern = something)

should give

    none: 0foo f0oo foo0 foo00 f0o0o foofoofoo0 0foofoofoo TOOL9NGG NONUMBER
    all:  x x x x x x

-----Original Message-----
From: Wacek Kusnierczyk [mailto:waclaw.marcin.kusnierc...@idi.ntnu.no]
Sent: Tuesday, June 09, 2009 1:06 PM
To: Greg Snow
Cc: Marc Schwartz; Barry Rowlingson; r-help@r-project.org; Tan, Richard
Subject: Re: [R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number

Greg Snow wrote:
> Here is one way using a single pattern (so it can be used in a substitution); it uses Perl's positive look-ahead patterns:
>
>     test <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS", "NONUMBER", "TOOLNGG", "ooops.3")
>     sub('(?=[a-zA-Z]{0,8}[0-9])[a-zA-Z0-9]{5,9}', 'xxx', test, perl = TRUE)

yes, but:

    sub('(?=[a-zA-Z]{0,8}[0-9])[a-zA-Z0-9]{5,9}', 'x', '12345', perl = TRUE)
    # x

which is not what was expected; as far as i understand, the point was to match 5-9 character strings with exactly 1 digit.

vQ
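A minimal sketch for the "at least one digit" version, assuming that "string" means a whole space-separated word: it adds word boundaries around Greg's look-ahead idea, so that the 10-character words in the "none" example are not partially replaced (Greg's unanchored pattern would turn "foofoofoo0" into "fx"). Note that TOOL9NGG is 8 characters long and contains a digit, so under the rule as stated it would be replaced; it is left out of the no-match string below. '12345' is matched, consistent with "at least one digit".

```r
# Whole words of 5-9 letters/digits containing at least one digit:
#   \\b                      start at a word boundary
#   (?=[A-Za-z0-9]*[0-9])    look-ahead: a digit occurs later in this word
#   [A-Za-z0-9]{5,9}\\b      consume 5-9 alphanumerics, then require a boundary
pattern <- "\\b(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]{5,9}\\b"

keep <- "0foo f0oo foo0 foofoofoo0 0foofoofoo NONUMBER"
hit  <- "foob0 fo0o0b 0foob 0foobardo foob4rdoo foobardo0"

out <- gsub(pattern, "x", c(keep = keep, all = hit), perl = TRUE)
out
```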
[R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number
Hi,

This is not exactly an R question, but I am trying to use gsub to replace a string that contains 5-9 alphanumeric characters, at least one of which is a number. Is there a good way to write it as a one-line regex?

Thanks,
Richard
[R] toupper does not work in sub + regex
Hi,

I don't know what I am doing wrong, but toupper does not seem to work in sub with a regex. The following returns 's', not the uppercase 'S' I expect:

    sub("q_([a-z])[a-zA-Z]*", toupper('\\1'), "q_sviRaw")

Can someone tell me what I did wrong?

Thanks,
Richard
Re: [R] toupper does not work in sub + regex
Thanks, Martin. I did not realize that. I have never used Perl-compatible regexes before, but it seems I should now!

Richard

-----Original Message-----
From: Martin Morgan [mailto:mtmor...@fhcrc.org]
Sent: Monday, April 13, 2009 12:08 PM
To: Tan, Richard
Subject: Re: [R] toupper does not work in sub + regex

"Tan, Richard" <r...@panagora.com> writes:

> Hi, I don't know what I am doing wrong to the toupper does not seem working in sub + regex.
> The following returns 's' not the upper class 'S' as I expect:
>
> sub("q_([a-z])[a-zA-Z]*", toupper('\\1'), "q_sviRaw")

You're expecting toupper to be evaluated after substitution, but it is evaluated before: toupper('\\1') == '\\1'. Try

    sub("q_([a-z])[a-zA-Z]*", '\\U\\1', "q_sviRaw", perl = TRUE)

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
Re: [R] toupper does not work in sub + regex
Thanks, Bill! One more question: how do I get SviRaw, i.e., uppercase just the first character and keep everything else the same?

sub("q_([a-z])([a-zA-Z]*)", "\\U\\1 \\2", "q_sviRaw", perl=TRUE)

did not work. Thank you! Richard

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Monday, April 13, 2009 1:17 PM
To: Tan, Richard; r-help@r-project.org
Subject: Re: [R] toupper does not work in sub + regex

You could also use \\U and \\L in the replacement with perl=TRUE. \\U converts the rest of the replacement to upper case and \\L converts it to lower case. (Here "replacement" means the parts of the replacement that arise from parenthesized subpatterns in the pattern argument, not the literal text of the replacement argument itself.) E.g.,

sub("q_([a-z])[a-zA-Z]*", "\\U\\1\\L", "q_sviRaw", perl=TRUE)
[1] "S"
sub("q_([a-z])([a-zA-Z]*)", "\\U\\1 then \\L\\2", "q_sviRaw", perl=TRUE)
[1] "S then viraw"
sub("q_([a-z])([a-zA-Z]*)", "\\U\\1 then \\2", "q_sviRaw", perl=TRUE)
[1] "S then VIRAW"

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

--
[R] toupper does not work in sub + regex
Gabor Grothendieck ggrothendieck at gmail.com
Mon Apr 13 18:26:12 CEST 2009

sub only handles replacement strings, not replacement functions. Your code is the same as:

sub("q_([a-z])[a-zA-Z]*", '\\1', "q_sviRaw")

since toupper('\\1') has no alphabetics, so it's just literally '\\1', and the latter is what sub uses. The gsubfn function in the gsubfn package can deal with replacement functions:

library(gsubfn)
gsubfn("q_([a-z])[a-zA-Z]*", toupper, "q_sviRaw")
[1] "S"

See the home page: http://gsubfn.googlecode.com, the vignette and the help page.

On Mon, Apr 13, 2009 at 11:54 AM, Tan, Richard <RTan at panagora.com> wrote:
Hi, I don't know what I am doing wrong; toupper does not seem to work in sub + regex. The following returns 's', not the uppercase 'S' I expect:
sub("q_([a-z])[a-zA-Z]*", toupper('\\1'), "q_sviRaw")
Can someone tell me what I did wrong? Thanks, Richard
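For Richard's follow-up (getting SviRaw), a minimal sketch that relies on the `\\E` escape, which R's `sub()` documentation lists alongside `\\U` and `\\L` for `perl = TRUE` and which ends the case conversion. A base-R alternative without case escapes follows; it assumes the input always carries the `q_` prefix, and the names `first_up`, `stem` and `alt` are illustrative:

```r
x <- "q_sviRaw"

# \\U upper-cases the first captured letter; \\E ends the case conversion,
# so the second back-reference is copied through unchanged.
first_up <- sub("q_([a-z])([a-zA-Z]*)", "\\U\\1\\E\\2", x, perl = TRUE)
first_up  # "SviRaw"

# Base-R alternative: drop the prefix, upper-case the first character,
# and paste the rest back on.
stem <- sub("^q_", "", x)
alt <- paste0(toupper(substr(stem, 1, 1)), substring(stem, 2))
alt       # "SviRaw"
```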
[R] search for string insider a string
Hi, sorry if this is too simple a question, but how do I do a string search in R? I have a data frame A with a column A$test like:

test1
bcdtestblabla2.1bla
cdtestblablabla3.88blabla

and I want to search for a substring that starts with 'dtest' and ends with a number, and return the location of that substring and the number, so the end result would be:

NA NA
3  2.1
2  3.88

I think grep can probably do this, but I am new to the function, so I would like a good example. Thanks, Richard
Re: [R] search for string insider a string
That works. I want the position just for my later manual check. Thanks a lot, Gabor.

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Friday, March 13, 2009 2:18 PM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] search for string insider a string

Try this. We use regexpr to get the positions, and strapply puts the values in the list s. The unlist statement converts NULL to NA and simplifies the list s to a numeric vector. For more info on strapply see http://gsubfn.googlecode.com

library(gsubfn) # strapply
x <- c("test1", "bcdtestblabla2.1bla", "cdtestblablabla3.88blabla")
dtest.info <- cbind(posn = regexpr("dtest", x), value = {
    s <- strapply(x, "dtest[^0-9]*([0-9][0-9.]*)", as.numeric)
    unlist(ifelse(sapply(s, length), s, NA))
})
# the above may be sufficient but
# if it's important to NA out rows with no match add
dtest.info[dtest.info[,1] < 0,] <- NA
dtest.info
     posn value
[1,]   NA    NA
[2,]    3  2.10
[3,]    2  3.88

Why do you want the position? Is there a further transformation needed? What is it? There may be even easier approaches to the entire problem.

On Fri, Mar 13, 2009 at 12:25 PM, Tan, Richard <r...@panagora.com> wrote:
Hi, sorry if this is too simple a question, but how do I do a string search in R? I have a data frame A with a column A$test like:
test1
bcdtestblabla2.1bla
cdtestblablabla3.88blabla
and I want to search for a substring that starts with 'dtest' and ends with a number, and return the location of that substring and the number, so the end result would be:
NA NA
3  2.1
2  3.88
I think grep can probably do this, but I am new to the function, so I would like a good example. Thanks, Richard
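For comparison, a base-R sketch of the same extraction without gsubfn, assuming a version of R recent enough to provide `regmatches()`. The number pattern is Gabor's; the `posn`/`value` column names mirror his output, and `res` is an illustrative name:

```r
x <- c("test1", "bcdtestblabla2.1bla", "cdtestblablabla3.88blabla")

# Start position of "dtest" in each string; regexpr() gives -1 for no match.
posn <- as.integer(regexpr("dtest", x, fixed = TRUE))
posn[posn < 0] <- NA

# Match "dtest", any non-digits, then a number such as 2.1 or 3.88.
m <- regexpr("dtest[^0-9]*[0-9][0-9.]*", x)

# regmatches() returns only the matched substrings, so fill an NA vector at
# the matching positions after stripping everything before the number.
value <- rep(NA_real_, length(x))
value[m > 0] <- as.numeric(sub("^dtest[^0-9]*", "", regmatches(x, m)))

res <- data.frame(posn = posn, value = value)
res
```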
[R] Get top cluster for each item in a correlation matrix
Hi, I posted a question a few days ago and got an extremely helpful response: https://stat.ethz.ch/pipermail/r-help/2009-February/188225.html. Now I have a somewhat related question. I have a correlation matrix of about 3000 items, with 1 on the diagonal (for example, cor.mat <- cor(matrix(rnorm(3000*1000), 1000, 3000))). For each item in the matrix, I want to find the cluster that the item itself (the entry equal to 1) belongs to, i.e., the cluster with the highest correlation coefficients, and generate a data frame with 3 columns (ID, ID2, cor), where in each row ID is one of those 3000 items, ID2 is the ID of an item within that top cluster, and cor is the correlation of ID and ID2. The clustering method is fanny, with the number of clusters set to 60. It is very time consuming to do a for loop like this:

library(cluster) # fanny
for (i in 1:ncol(cor.mat)) {
    f <- fanny(cor.mat[,i], 60)
    # ID2 is the item index; keep the items in the same cluster as item i
    temp <- cbind(ID = i, ID2 = seq_len(nrow(cor.mat)), cor = cor.mat[,i])
    temp <- temp[f$clustering == f$clustering[i], , drop = FALSE]
    if (i == 1) {
        out <- temp
    } else {
        out <- rbind(out, temp)
    }
}
out

Is there a better way to do it? Thanks, Richard
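One way to take the repeated rbind() out of the loop, sketched under the assumption that the per-item fanny() step stays as described: build one small data frame per item and bind them once at the end, the same do.call(rbind, lapply(...)) idiom used elsewhere on the list. The helpers top_cluster() and top_clusters() are hypothetical. fanny() on every column is probably still the dominant cost, so this only removes the overhead of growing out inside the loop. The demo uses 40 items and k = 3 so it runs quickly (fanny() requires k < n/2):

```r
library(cluster)  # provides fanny()

# Hypothetical helper: cluster column i of the correlation matrix with fanny()
# and return every item that falls in the same cluster as item i itself.
top_cluster <- function(cor.mat, i, k) {
  cl <- fanny(cor.mat[, i], k)$clustering
  same <- which(cl == cl[i])
  data.frame(ID = i, ID2 = same, cor = cor.mat[same, i])
}

# Collect one data frame per item, then bind them in a single call.
top_clusters <- function(cor.mat, k) {
  pieces <- lapply(seq_len(ncol(cor.mat)), function(i) top_cluster(cor.mat, i, k))
  do.call(rbind, pieces)
}

set.seed(1)
cor.mat <- cor(matrix(rnorm(200 * 40), 200, 40))
out <- top_clusters(cor.mat, k = 3)
head(out)
```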
Re: [R] transform key value pair to column
Thank you, works!

-----Original Message-----
From: Rowe, Brian Lee Yung (Portfolio Analytics) [mailto:b_r...@ml.com]
Sent: Thursday, February 19, 2009 5:52 PM
To: Wacek Kusnierczyk; Tan, Richard
Cc: r-help@r-project.org
Subject: RE: [R] transform key value pair to column

Try this:

dummy
  id code value
1  1   hi  10.3
2  1   lo   5.2
3  2   hi  19.4
4  3   hi  20.0
5  3   lo  12.0
6  4   lo   5.8
reshape(dummy, idvar='id', timevar='code', direction='wide')
  id value.hi value.lo
1  1     10.3      5.2
3  2     19.4       NA
4  3     20.0     12.0
6  4       NA      5.8

Brian

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Wacek Kusnierczyk
Sent: Thursday, February 19, 2009 5:39 PM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] transform key value pair to column

see ?stack, for example.

vQ

Tan, Richard wrote:
Hi, is there a good way (instead of a time-consuming for loop) to transform a key/value pair data frame into a data frame with the keys as columns and the values in rows? For example, I have a data frame with three columns: id, code, value:

id,code,value
1,hi,10.3
1,lo,5.2
2,hi,19.4
3,hi,20
3,lo,12
4,lo,5.8

I want to get a data frame like this:

id,hi,lo
1,10.3,5.2
2,19.4,NA
3,20,12
4,NA,5.8

Thank you, Richard
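A self-contained version of Brian's reshape() call, rebuilding the example data from Richard's message. The final renaming step (value.hi to hi, value.lo to lo) and the row-name reset are additions so the result matches the requested id,hi,lo layout:

```r
dummy <- data.frame(id    = c(1, 1, 2, 3, 3, 4),
                    code  = c("hi", "lo", "hi", "hi", "lo", "lo"),
                    value = c(10.3, 5.2, 19.4, 20, 12, 5.8))

# One row per id, one "value.<code>" column per distinct code; missing
# id/code combinations become NA.
wide <- reshape(dummy, idvar = "id", timevar = "code", direction = "wide")

# Strip the "value." prefix so the columns read id, hi, lo.
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```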
[R] get top 50 correlated item from a correlation matrix for each item
Hi, I have a correlation matrix of about 3000 items, i.e., a 3000*3000 matrix. For each of the 3000 items, I want to get the top 50 items that have the highest correlation with it (excluding itself) and generate a data frame with 3 columns (ID, ID2, cor), where ID is each of the 3000 items repeated 50 times, ID2 is one of the top 50 items correlated with ID, and cor is the correlation of ID and ID2. I know I can use two for loops to do it, but that is very time consuming, considering that the correlation matrix is generated for each month of the past 20 years. Is there a better way to do it? Regards, Richard
Re: [R] get top 50 correlated item from a correlation matrix for each item
Works like a charm, thank you!

-----Original Message-----
From: Dimitris Rizopoulos [mailto:d.rizopou...@erasmusmc.nl]
Sent: Thursday, February 12, 2009 12:11 PM
To: Tan, Richard
Cc: r-help@r-project.org
Subject: Re: [R] get top 50 correlated item from a correlation matrix for each item

A possible vectorized solution is the following:

cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))
p <- 30 # how many top items
n <- ncol(cor.mat)
cmat <- col(cor.mat)
# sort by column, then by decreasing correlation within each column;
# subtracting the column offset turns linear indices into row indices
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
# row 1 is the item itself (correlation 1), so keep rows 2 to p + 1
ind <- ind[seq(2, p + 1), ]
out <- cbind(ID = c(col(ind)), ID2 = c(ind))
as.data.frame(cbind(out, cor = cor.mat[out]))

I hope it helps.

Best,
Dimitris

Tan, Richard wrote:
Hi, I have a correlation matrix of about 3000 items, i.e., a 3000*3000 matrix. For each of the 3000 items, I want to get the top 50 items that have the highest correlation with it (excluding itself) and generate a data frame with 3 columns (ID, ID2, cor), where ID is each of the 3000 items repeated 50 times, ID2 is one of the top 50 items correlated with ID, and cor is the correlation of ID and ID2. I know I can use two for loops to do it, but that is very time consuming, considering that the correlation matrix is generated for each month of the past 20 years. Is there a better way to do it? Regards, Richard

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
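A more explicit sketch of the same task, trading Dimitris's single order() call for one order() per column via apply(). The function name top_correlated() is hypothetical. Setting the diagonal to -Inf in a working copy excludes each item from its own list without relying on the self-correlation sorting first; the single-order() version above is likely faster on a 3000 x 3000 matrix:

```r
# Hypothetical helper: for each item (column), the p most correlated other items.
top_correlated <- function(cor.mat, p) {
  stopifnot(p < ncol(cor.mat))
  scores <- cor.mat
  diag(scores) <- -Inf  # never pick an item as its own neighbour
  # Column j of ind holds the row indices of the p largest correlations
  # with item j, in decreasing order.
  ind <- apply(scores, 2, order, decreasing = TRUE)[seq_len(p), , drop = FALSE]
  out <- data.frame(ID = c(col(ind)), ID2 = c(ind))
  out$cor <- cor.mat[cbind(out$ID2, out$ID)]
  out
}

set.seed(1)
cor.mat <- cor(matrix(rnorm(1000 * 100), 1000, 100))
res <- top_correlated(cor.mat, p = 30)
head(res)
```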