[R] data frames; matching/merging
Hi all,

I feel a little guilty asking this question, since I've written a solution using a rather clunky for loop that gets the job done. But I'm convinced there must be a faster (and probably more elegant) way to accomplish what I'm looking to do (perhaps using the merge function?). I figured somebody out there might've already figured this out:

I have a dataframe with two columns (let's call them V1 and V2). All rows are unique, although column V1 has several redundant entries.

Ex:

  V1 V2
1  a  3
2  a  2
3  b  9
4  c  4
5  a  7
6  b 11

What I'd like is to return a dataframe cut down to have only unique entries in V1. For each V1, V2 should contain the minimum of all the V2 values in the rows that share that V1.

Example output:

  V1 V2
1  a  2
2  b  9
3  c  4

If somebody could (relatively easily) figure out how to get closer to a solution, I'd appreciate hearing how. Also, I'd be interested to hear how you came upon the answer (so I can get better at searching the R resources myself).

Regards,
Jonathan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] data frames; matching/merging
Hi!

I'm definitely not an expert in R (and this is my first reply!), but if I understand correctly, I think the aggregate function might do what you're looking for. Try ?aggregate to get more info. You might find what you need!

HTH
Ivan

On 2/8/2010 17:39, Jonathan wrote:
> What I'd like is to return a dataframe cut down to have only unique entries in V1. V2 should contain a vector, for each V1, that is the minimum of all the possible choices from the set of redundant V1's.
Re: [R] data frames; matching/merging
    x <- read.table(textConnection("  V1 V2
    1 a 3
    2 a 2
    3 b 9
    4 c 4
    5 a 7
    6 b 11"), header=TRUE)
    closeAllConnections()

    # close; this is a matrix with rownames - easy enough to change into a dataframe if you want
    cbind(tapply(x$V2, x$V1, min))

      [,1]
    a    2
    b    9
    c    4

On Mon, Feb 8, 2010 at 11:39 AM, Jonathan <jonsle...@gmail.com> wrote:
> What I'd like is to return a dataframe cut down to have only unique entries in V1. V2 should contain a vector, for each V1, that is the minimum of all the possible choices from the set of redundant V1's.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
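The conversion from the tapply result to a data frame, which the comment above calls easy, could be sketched as follows. This is a minimal sketch, not code from the post: the data are rebuilt with data.frame() rather than read from text, and the names mins and res are made up for illustration.

```r
# Rebuild the example data directly (assumed equivalent to the posted table)
x <- data.frame(V1 = c("a", "a", "b", "c", "a", "b"),
                V2 = c(3, 2, 9, 4, 7, 11))

# Per-group minima; the names of the result are the distinct V1 values
mins <- tapply(x$V2, x$V1, min)

# Move the names into a column of their own to get a two-column data frame
res <- data.frame(V1 = names(mins), V2 = as.vector(mins))
res
```

as.vector() drops the names so that V2 ends up as a plain numeric column.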
Re: [R] data frames; matching/merging
On Feb 8, 2010, at 11:39 AM, Jonathan wrote:
> What I'd like is to return a dataframe cut down to have only unique entries in V1. V2 should contain a vector, for each V1, that is the minimum of all the possible choices from the set of redundant V1's.

    # helper: read a data frame from a text string
    rd.txt <- function(txt, header=TRUE, ...) {
        rd <- read.table(textConnection(txt), header=header, ...)
        closeAllConnections()
        rd
    }

    DF <- rd.txt("  V1 V2
    1 a 3
    2 a 2
    3 b 9
    4 c 4
    5 a 7
    6 b 11
    ")

    tapply(DF$V2, DF$V1, min)

    a b c
    2 9 4

    as.data.frame.table(tapply(DF$V2, DF$V1, min))

      Var1 Freq
    1    a    2
    2    b    9
    3    c    4

    DF2 <- as.data.frame.table(tapply(DF$V2, DF$V1, min))
    names(DF2) <- names(DF)
    DF2

      V1 V2
    1  a  2
    2  b  9
    3  c  4

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] data frames; matching/merging
You could try aggregate. If we call your data frame df:

    aggregate(df[2], by=df[1], FUN=min)

will get you what you asked for (if not necessarily what you need ;-) )

Switching the columns around is easy enough if you need to; proceeding stepwise:

    df.new <- aggregate(df[2], by=df[1], FUN=min)
    df.new[,c(2,1)]

As to how I found aggregate: watching R-help daily for years occasionally pops up fundamental gems like aggregate...

Steve Ellison
LGC

Jonathan <jonsle...@gmail.com> wrote on 08/02/2010 16:39:11:
> What I'd like is to return a dataframe cut down to have only unique entries in V1. V2 should contain a vector, for each V1, that is the minimum of all the possible choices from the set of redundant V1's.
>
> Example output:
>
>   V1 V2
> 1  a  2
> 2  b  9
> 3  c  4
Re: [R] data frames; matching/merging
Here are 4 solutions assuming DF contains the data frame:

    # 1. aggregate
    aggregate(DF[2], DF[1], min)

      V1 V2
    1  a  2
    2  b  9
    3  c  4

    # 2. aggregate.formula - requires R 2.11.x
    aggregate(V2 ~ V1, DF, min)

      V1 V2
    1  a  2
    2  b  9
    3  c  4

    # 3. SQL using sqldf
    library(sqldf)
    sqldf("select V1, min(V2) V2 from DF group by V1")

      V1 V2
    1  a  2
    2  b  9
    3  c  4

    # 4. summaryBy in the doBy package
    library(doBy)
    summaryBy(V2 ~ ., DF, FUN = min, keep.names = TRUE)

      V1 V2
    1  a  2
    2  b  9
    3  c  4

On Mon, Feb 8, 2010 at 11:39 AM, Jonathan <jonsle...@gmail.com> wrote:
> What I'd like is to return a dataframe cut down to have only unique entries in V1. V2 should contain a vector, for each V1, that is the minimum of all the possible choices from the set of redundant V1's.
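For anyone who wants to try the base R variants without the add-on packages, here is a minimal, self-contained sketch that checks they agree on the example data. DF is rebuilt with data.frame(), which is assumed to be equivalent to the posted table, and the names by_list, by_formula and by_tapply are made up for illustration.

```r
# Example data from the original question, built directly
DF <- data.frame(V1 = c("a", "a", "b", "c", "a", "b"),
                 V2 = c(3, 2, 9, 4, 7, 11))

# Solution 1: list-style aggregate (grouping column passed as a one-column data frame)
by_list <- aggregate(DF[2], DF[1], min)

# Solution 2: formula interface (needs R 2.11.0 or later)
by_formula <- aggregate(V2 ~ V1, data = DF, FUN = min)

# tapply, as suggested earlier in the thread, returns the minima named by V1
by_tapply <- tapply(DF$V2, DF$V1, min)

# All three should give the same minima in the same group order
stopifnot(all(by_list$V2 == by_formula$V2),
          all(by_list$V2 == as.vector(by_tapply)))

by_list
```

The two aggregate calls return a data frame with columns V1 and V2 directly, so no renaming step is needed, unlike the tapply route.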