Re: [R] Merge data by coordinates
Dear Milu,

If your objective is to match each place in one table to the nearest place in the second table, you can use a k-nearest-neighbour search with k = 1. But please check what David suggests first.

Best regards,
Michal

2016-10-16 19:24 GMT+02:00 David Winsemius:

> I think you should first do this:
>
> plot(d1$lat,d1$lon)
> points(d2$lat,d2$lon, col="red")
>
> And then respond to my suggestion that this is not a well-posed computing
> problem. Explain why the red dots should have a 1-1 relationship with the
> black dots.
>
> --
> David.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
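The 1-nearest-neighbour idea mentioned in this thread can be sketched in base R. This is only an illustration under assumptions that are not in the thread: the two posted structures are assigned to d1 and d2 (the names used in David's plot call), distance is plain Euclidean distance in degrees (ignoring the Earth's curvature), and dist2, nearest and merged are hypothetical names.

```r
# The two data frames from the original post, assigned to d1 and d2
# (row names omitted for brevity).
d1 <- data.frame(
  lat = c(54, 55, 51, 54, 53, 50, 47, 51, 49, 54),
  lon = c(14, 8, 15, 7, 6, 5, 13, 5, 13, 11),
  PPP2000_40 = c(4606, 6575, 6593, 7431, 9393, 10773, 11716,
                 12226, 13544, 14526))
d2 <- data.frame(
  lat = c(47, 47, 47, 47, 47, 47, 48, 48, 48, 48),
  lon = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10),
  GDP = c(19.09982, 13.31977, 14.95925, 6.8575635, 23.334565, 6.485748,
          24.01197, 14.30393075, 21.33759675, 9.71803675))

# Squared distance (in degrees) between every d1 row and every d2 row.
dist2 <- outer(d1$lat, d2$lat, "-")^2 + outer(d1$lon, d2$lon, "-")^2

# For each d1 row, the index of the closest d2 row (first one on ties).
nearest <- apply(dist2, 1, which.min)

# Attach the matched d2 values and the distance, so poor matches stay visible.
merged <- data.frame(
  d1,
  lat2 = d2$lat[nearest],
  lon2 = d2$lon[nearest],
  GDP = d2$GDP[nearest],
  dist_deg = sqrt(dist2[cbind(seq_along(nearest), nearest)]))
print(merged)
```

With this data several rows of d1 share the same d2 row, and some matches are six or more degrees apart, which is the concern behind David's question about a 1-1 relationship.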
Re: [R] Merge data by coordinates
> On Oct 16, 2016, at 6:32 AM, Miluji Sb wrote:
>
> Dear all,
>
> I have two data frames keyed by latitude and longitude, but the
> coordinates do not always match. Is it possible to merge them (e.g. by
> nearest distance)?
>
> # Dataframe 1
> structure(list(lat = c(54L, 55L, 51L, 54L, 53L, 50L, 47L, 51L,
> 49L, 54L), lon = c(14L, 8L, 15L, 7L, 6L, 5L, 13L, 5L, 13L, 11L
> ), PPP2000_40 = c(4606, 6575, 6593, 7431, 9393, 10773, 11716,
> 12226, 13544, 14526)), .Names = c("lat", "lon", "PPP2000_40"),
> row.names = c(6764L, 8796L, 8901L, 9611L, 11649L, 12819L, 13763L,
> 14389L, 15641L, 16571L), class = "data.frame")
>
> # Dataframe 2
> structure(list(lat = c(47, 47, 47, 47, 47, 47, 48, 48, 48, 48
> ), lon = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10), GDP = c(19.09982,
> 13.31977, 14.95925, 6.8575635, 23.334565, 6.485748, 24.01197,
> 14.30393075, 21.33759675, 9.71803675)), .Names = c("lat", "lon",
> "GDP"), row.names = c(NA, 10L), class = "data.frame")

I think you should first do this:

plot(d1$lat,d1$lon)
points(d2$lat,d2$lon, col="red")

And then respond to my suggestion that this is not a well-posed computing problem. Explain why the red dots should have a 1-1 relationship with the black dots.

--
David.

> Thank you so much!
>
> Sincerely,
>
> Milu

David Winsemius
Alameda, CA, USA
[R] Merge data by coordinates
Dear all,

I have two data frames keyed by latitude and longitude, but the coordinates do not always match. Is it possible to merge them (e.g. by nearest distance)?

# Dataframe 1
structure(list(lat = c(54L, 55L, 51L, 54L, 53L, 50L, 47L, 51L,
49L, 54L), lon = c(14L, 8L, 15L, 7L, 6L, 5L, 13L, 5L, 13L, 11L
), PPP2000_40 = c(4606, 6575, 6593, 7431, 9393, 10773, 11716,
12226, 13544, 14526)), .Names = c("lat", "lon", "PPP2000_40"),
row.names = c(6764L, 8796L, 8901L, 9611L, 11649L, 12819L, 13763L,
14389L, 15641L, 16571L), class = "data.frame")

# Dataframe 2
structure(list(lat = c(47, 47, 47, 47, 47, 47, 48, 48, 48, 48
), lon = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10), GDP = c(19.09982,
13.31977, 14.95925, 6.8575635, 23.334565, 6.485748, 24.01197,
14.30393075, 21.33759675, 9.71803675)), .Names = c("lat", "lon",
"GDP"), row.names = c(NA, 10L), class = "data.frame")

Thank you so much!

Sincerely,

Milu
Re: [R] Merge data frame with mispelling characters
It's amazing to get such useful answers so fast. I did not know the RecordLinkage package; it looks very sophisticated and useful for this kind of task. I just ran some tests and I think it could be very useful. I'm working with Portuguese names, so I will also test agrep and see which function gives better results, with less data loss.

Thanks a lot, Jim Holtman and David Winsemius.

-
Victor Delgado
cedeplar.ufmg.br P.H.D. student
UFOP assistant professor
--
View this message in context: http://r.789695.n4.nabble.com/Merge-data-frame-with-mispelling-characters-tp4648255p4648266.html
Re: [R] Merge data frame with mispelling characters
On Nov 2, 2012, at 11:20 AM, VictorDelgado wrote:

> Hello dear R-helpers,
>
> I'm working with R-2.15.2 on Windows 7. I'm stuck on a merge of two
> data frames by character columns. Each data frame has a different list
> of names, which is my main key for the merge.
>
> To show what I mean, I built a modified "?merge" example, with errors
> on purpose:
>
> # Data for authors:
>
> authors <- data.frame(
>     surname = I(c("Tukey", "Venable", "Terney", "Ripley", "McNeil")),
>     nationality = c("US", "Australia", "US", "UK", "Australia"),
>     deceased = c("yes", rep("no", 4)))
>
> "Venables" is missing its final "s", and "Tierney" its "i".
>
> # Data for books:
>
> books <- data.frame(
>     surname = I(c("Tukey", "Venables", "Tierney",
>                   "Ripley", "Rippley", "McNeil", "R Core")),
>     title = c("Exploratory Data Analysis",
>               "Modern Applied Statistics ...",
>               "LISP-STAT",
>               "Spatial Statistics", "Stochastic Simulation",
>               "Interactive Data Analysis",
>               "An Introduction to R"),
>     other.author = c(NA, "Ripley", NA, NA, NA, NA,
>                      "Venables & Smith"))

In your example the authors list has the better spelling. By default, agrep() returns approximate matches within max.distance = 0.1, that is, a generalized Levenshtein distance of at most 10% of the pattern length (roughly a 90% match):

books$altname <- NA
altidx <- unlist( sapply(books$surname, agrep, authors$surname) )
books$altname[seq(altidx)] <- authors$surname[altidx]
books
#---
   surname                         title     other.author altname
1    Tukey     Exploratory Data Analysis             <NA>   Tukey
2 Venables Modern Applied Statistics ...           Ripley Venable
3  Tierney                     LISP-STAT             <NA>  Terney
4   Ripley            Spatial Statistics             <NA>  Ripley
5  Rippley         Stochastic Simulation             <NA>  Ripley
6   McNeil     Interactive Data Analysis             <NA>  McNeil
7   R Core          An Introduction to R Venables & Smith    <NA>

If you then match 'books' to 'authors' with a merge on authors$surname and books$altname, you should get closer to your goals.

--
David.

> The key column is "surname" instead of "name" (this differs from the
> original example, to make the merge easier), and the second "Ripley"
> has a double "p".
>
> So, if I ask for:
>
> merge(authors, books, all=TRUE)
>
> I got:
>
>
> But we know that "Rippley" corresponds to "Ripley", "Terney" to
> "Tierney" and "Venable" to "Venables". I was wondering if there is any
> way to work around this problem. My original data have around 27,000
> name entries, and if I take "all=FALSE", the merged data drop to around
> 17,000, mostly because of misspellings (or truncated expressions). If I
> take "all=TRUE", I get many cases like the example above.
>
> Has anyone experienced this? Any idea how I can get around it? I'm
> thinking of taking the longest possible match for each entry. For
> example, in "Venable"/"Venables" there is an 87.5% match. As I have name
> and surname, and also auxiliary keys for this match, I think this could
> work.
>
> Thank you in advance.
>
> -
> Victor Delgado
> cedeplar.ufmg.br P.H.D. student
> www.fjp.mg.gov.br researcher

David Winsemius, MD
Alameda, CA, USA
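The agrep() snippet in this reply appears to rely on each name having exactly one match and on the unmatched name ("R Core") sitting in the last row, since the matches are written into the first rows in order. A variant that keeps the alignment row by row can be sketched with adist() from the utils package. This is a minimal sketch and not David's method: the cut-off of one edit (max_edits) and the helper names d, best and best_dist are assumptions chosen for this toy example.

```r
authors <- data.frame(
  surname = c("Tukey", "Venable", "Terney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  stringsAsFactors = FALSE)

books <- data.frame(
  surname = c("Tukey", "Venables", "Tierney", "Ripley", "Rippley",
              "McNeil", "R Core"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  stringsAsFactors = FALSE)

max_edits <- 1  # assumed tolerance: at most one inserted/deleted/changed letter

# Edit distance between every book surname (rows) and author surname (columns).
d <- utils::adist(books$surname, authors$surname)

# Closest author for each book, and how far away that author is.
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(best), best)]

# Keep the closest author only when it is within the tolerance.
books$altname <- ifelse(best_dist <= max_edits, authors$surname[best], NA)

# Merge on the corrected key; books without a close match are kept with NAs.
merged <- merge(books, authors, by.x = "altname", by.y = "surname",
                all.x = TRUE)
print(merged)
```

On real data with about 27,000 names the full distance matrix may be large, so the tolerance and the matching strategy would need checking against a sample of known pairs.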
Re: [R] Merge data frame with mispelling characters
You might try the 'soundex' function in the RecordLinkage package:

> soundex('ripley')
[1] "R140"
> soundex('rippley')
[1] "R140"
> soundex('venable')
[1] "V514"
> soundex('venables')
[1] "V514"
> soundex('terney')
[1] "T650"
> soundex('tierney')
[1] "T650"

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
[R] Merge data frame with mispelling characters
Hello dear R-helpers,

I'm working with R-2.15.2 on Windows 7. I'm stuck on a merge of two data frames by character columns. Each data frame has a different list of names, which is my main key for the merge.

To show what I mean, I built a modified "?merge" example, with errors on purpose:

# Data for authors:

authors <- data.frame(
    surname = I(c("Tukey", "Venable", "Terney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))

"Venables" is missing its final "s", and "Tierney" its "i".

# Data for books:

books <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney",
                  "Ripley", "Rippley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

The key column is "surname" instead of "name" (this differs from the original example, to make the merge easier), and the second "Ripley" has a double "p".

So, if I ask for:

merge(authors, books, all=TRUE)

I got:

But we know that "Rippley" corresponds to "Ripley", "Terney" to "Tierney" and "Venable" to "Venables". I was wondering if there is any way to work around this problem. My original data have around 27,000 name entries, and if I take "all=FALSE", the merged data drop to around 17,000, mostly because of misspellings (or truncated expressions). If I take "all=TRUE", I get many cases like the example above.

Has anyone experienced this? Any idea how I can get around it? I'm thinking of taking the longest possible match for each entry. For example, in "Venable"/"Venables" there is an 87.5% match. As I have name and surname, and also auxiliary keys for this match, I think this could work.

Thank you in advance.

-
Victor Delgado
cedeplar.ufmg.br P.H.D. student
www.fjp.mg.gov.br researcher
--
View this message in context: http://r.789695.n4.nabble.com/Merge-data-frame-with-mispelling-characters-tp4648255.html
Re: [R] Merge Data by time stamps
On Oct 10, 2011, at 1:28 PM, Alaios wrote:

> Dear all,
>
> I have some device measurements, and the time stamps I get from the
> device have the format below:
>
> MyStruct$TimeStamps[1,]
> [1] 2011.000 10.000 6.000 16.000 23.000 30.539
>
> I can convert them easily with ISOdate() to a number and do the
> calculations I need.
>
> One of my problems is that I want to gather my measurements into piles
> of a fixed duration (let's say 5 minutes). Afterwards I will apply a
> function to these piles.
>
> As the device is not super-precise, please find below the time needed
> for one operation to complete (in seconds):
>
> 1.10 1.90 1.34 1.23 1.56 1.22 1.34

Assuming I understand your presentation, and lacking R-coded examples and desired output on which to test:

?cumsum
?cut

> As you can see, I cannot say that 5 minutes of measurements correspond
> to a fixed number X of consecutive measurements; the count varies. How
> can I ask R to do the summation and, whenever a 5-minute data set is
> complete, split it off so that I can apply a function to it?
>
> I would like to thank you in advance for your help

--
David Winsemius, MD
West Hartford, CT
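The two help pages David points to, ?cumsum and ?cut, can be combined as in the sketch below. It is an illustration only: the seven durations are the ones quoted in the post, while the window length, the variable names and the use of sum() as the per-pile function are assumptions. A 5-second window is used so that the tiny sample splits into more than one pile; for the real data window would be 300.

```r
# Seconds needed for each operation, as listed in the post.
durations <- c(1.10, 1.90, 1.34, 1.23, 1.56, 1.22, 1.34)

window <- 5  # pile length in seconds (assumed; use 300 for 5 minutes)

# Running clock: elapsed time at the end of each measurement.
elapsed <- cumsum(durations)

# Window boundaries covering the whole run: 0, window, 2 * window, ...
breaks <- seq(0, ceiling(max(elapsed) / window) * window, by = window)

# Pile number of each measurement, based on when it finished.
pile <- cut(elapsed, breaks = breaks, labels = FALSE)

# Split the measurements by pile and apply a function to each pile.
piles <- split(durations, pile)
pile_totals <- sapply(piles, sum)
print(pile)
print(pile_totals)
```

Because the piles are defined by elapsed time rather than by position, each pile can hold a different number of measurements.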
[R] Merge Data by time stamps
Dear all,

I have some device measurements, and the time stamps I get from the device have the format below:

MyStruct$TimeStamps[1,]
[1] 2011.000 10.000 6.000 16.000 23.000 30.539

I can convert them easily with ISOdate() to a number and do the calculations I need.

One of my problems is that I want to gather my measurements into piles of a fixed duration (let's say 5 minutes). Afterwards I will apply a function to these piles.

As the device is not super-precise, please find below the time needed for one operation to complete (in seconds):

1.10 1.90 1.34 1.23 1.56 1.22 1.34

As you can see, I cannot say that 5 minutes of measurements correspond to a fixed number X of consecutive measurements; the count varies. How can I ask R to do the summation and, whenever a 5-minute data set is complete, split it off so that I can apply a function to it?

I would like to thank you in advance for your help

B.R
Alex
Re: [R] Merge data under conditions
Thanks to both of you for your help!

Jim, my problem is to match some observations of a time series (vector 'a' in my example) with theoretical predictions of this process (vector 'b' in my example), with a small time lag between them.

--
View this message in context: http://r.789695.n4.nabble.com/Merge-data-under-conditions-tp3350864p3351472.html
Re: [R] Merge data under conditions
Use the sqldf package:

> require(sqldf)
> a
  time x
1  1.0 4
2  2.2 5
3  5.2 6
> b
  time y
1    0 1
2    1 3
3    2 5
4    4 7
5    5 9
> sqldf("
+ select a.time, a.x, b.y
+ from a, b
+ where abs(a.time - b.time) < 0.5
+ ")
  time x y
1  1.0 4 3
2  2.2 5 5
3  5.2 6 9

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Re: [R] Merge data under conditions
On Mar 12, 2011, at 4:14 PM, flymer wrote:

> Dear All,
>
> I am new to R and I'm facing a problem. I have 2 data sets, say 'a' and
> 'b', and I'd like to merge them according to the proximity of their
> variable 'time'. How can I keep the elements which satisfy (for
> example) 'a$time-b$time<0.5'?
>
> For example:
>
> a
>   time x
> 1  1.0 4
> 2  2.2 5
> 3  5.2 6
>
> b
>   time y
> 1    0 1
> 2    1 3
> 3    2 5
> 4    4 7
> 5    5 9
>
> I'd like to get:
>
>   time x y
> 1  1.0 4 3
> 2  2.2 5 5
> 3  5.2 6 9
>
> I thought of using the function 'merge'...

There are often SQL magical incantations to achieve such, and there is an `sqldf` package that might help, but I am not competent with it. Here is a base R solution using three functions (six, if you count "$", "<", and "-"):

?expand.grid
?rep
?"["

dfrm <- expand.grid(a$time, b$time)
dfrm$x <- a$x  # by virtue of recycling
dfrm$y <- rep(b$y, each=3)

> dfrm[abs(dfrm$Var1-dfrm$Var2) < 0.5, ]
   Var1 Var2 x y
4   1.0    1 4 3
8   2.2    2 5 5
15  5.2    5 6 9

--
David Winsemius, MD
West Hartford, CT
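The same pairing can also be sketched with outer() and which(), which avoids hard-coding the number of rows of 'a' in rep(). The data are the ones from the post; the tolerance name tol and the helpers gap, hits and merged are assumptions made for the illustration.

```r
a <- data.frame(time = c(1.0, 2.2, 5.2), x = c(4, 5, 6))
b <- data.frame(time = c(0, 1, 2, 4, 5), y = c(1, 3, 5, 7, 9))

tol <- 0.5  # maximum allowed time lag between a and b

# Absolute time difference for every (row of a, row of b) pair.
gap <- abs(outer(a$time, b$time, "-"))

# Row/column positions of the pairs that are close enough.
hits <- which(gap < tol, arr.ind = TRUE)
hits <- hits[order(hits[, "row"]), , drop = FALSE]

# Build the merged table from the matching pairs.
merged <- data.frame(
  time = a$time[hits[, "row"]],
  x = a$x[hits[, "row"]],
  y = b$y[hits[, "col"]])
print(merged)
```

If one row of 'a' lies within the tolerance of several rows of 'b', it appears once per match, just as in the expand.grid and sqldf versions.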
[R] Merge data under conditions
Dear All,

I am new to R and I'm facing a problem. I have 2 data sets, say 'a' and 'b', and I'd like to merge them according to the proximity of their variable 'time'. How can I keep the elements which satisfy (for example) 'a$time-b$time<0.5'?

For example:

> a
  time x
1  1.0 4
2  2.2 5
3  5.2 6

> b
  time y
1    0 1
2    1 3
3    2 5
4    4 7
5    5 9

I'd like to get:

  time x y
1  1.0 4 3
2  2.2 5 5
3  5.2 6 9

I thought of using the function 'merge'... I hope you can help me! Thanks in advance!

Jerome.

--
View this message in context: http://r.789695.n4.nabble.com/Merge-data-under-conditions-tp3350864p3350864.html
Re: [R] Merge Data
Hello Nasrin,

Please attach what each of your files looks like (the first few rows), so we can understand why the rbind doesn't work. In general, be sure to keep the r-help e-mail address copied so others can help if I don't know the answer (or if they answer before me).

Best,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Wed, Nov 10, 2010 at 2:37 AM, Nasrin Pak wrote:

> Hello;
>
> I didn't manage to do it with the rbind command! I'm wondering whether
> this kind of combining helps with my work (actually I'm a new user of
> R). My problem is that I have a data set for every day of measurement in
> a separate file and I want to plot one parameter of the data for all the
> days in one graph. I tried to use a for loop, but only the last data set
> remains in memory; I don't know how to plot each day's data continuously
> after the others (or how to extend the x axis). Would you please help me
> with it?
>
> Thanks for your help!
>
> --
> Sincerely
>
> Nasrin Pak
Re: [R] Merge Data
Hello Nasrin,

I think you might want to use rbind instead of merge.

Tal
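Tal's rbind suggestion can be sketched as follows. The thread never shows the real daily files, so this example writes three small made-up CSV files to a temporary directory; the directory, the file names and the columns day and value are all assumptions. The sketch also assumes that every file has the same columns, which rbind requires.

```r
# Create three tiny stand-in files (hypothetical data, one file per day).
tmp_dir <- tempfile("daily_")
dir.create(tmp_dir)
for (d in 1:3) {
  write.csv(data.frame(day = d, value = c(10, 20) * d),
            file.path(tmp_dir, sprintf("day%02d.csv", d)),
            row.names = FALSE)
}

# Read every file into a list of data frames, as in the original loop.
file_names <- list.files(path = tmp_dir, full.names = TRUE)
daily <- lapply(file_names, read.csv, header = TRUE, na.strings = "NULL")

# Stack the days on top of each other in one call.
all_days <- do.call(rbind, daily)
print(all_days)

# One parameter for all days on a single continuous x axis.
plot(seq_len(nrow(all_days)), all_days$value, type = "l",
     xlab = "observation", ylab = "value")
```

Reading into a list first and binding once also avoids growing a data frame inside the loop.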
[R] Merge Data
Hello;

I have a problem merging data sets. I use these commands:

> FileNames <- list.files(path="C:/updated_CFL_Rad_files/2007/11",
+ full.names=TRUE)
> dataMerge <- data.frame()
> for(f in FileNames){
+ ReadInMerge <- read.csv(file=f, header=T, na.strings="NULL")
+ dataMerge <- merge(dataMerge, ReadInMerge,all=T)
+
+ }

and an error occurs. The size of the data is about 7.5 Mb, so I don't know what the 221 Mb refers to!

Error: cannot allocate vector of size 221.6 Mb
In addition: Warning messages:
1: Reached total allocation of 502Mb: see help(memory.size)
2: Reached total allocation of 502Mb: see help(memory.size)
3: Reached total allocation of 502Mb: see help(memory.size)
4: Reached total allocation of 502Mb: see help(memory.size)

--
Sincerely

Nasrin Pak
Re: [R] merge data
David -- thank you for your response. merge does work, but it creates another data frame. df1 is very large and I did not want another copy created. What I ended up doing is:

df1 <- merge(df1, df2, by="week")

In terms of memory allocation, will memory for two data frames be allocated, or will the additional column be added to df1? Thanks.
Re: [R] merge data
On Nov 10, 2009, at 12:36 PM, Chuck White wrote:

> df1 -- dataframe with column date and several other columns. #rows >40k Several of the dates are repeated.
> df2 -- dataframe with two columns date and index. #rows ~130 This is really a map from date to index.
>
> I would like to create a column called index in df1 which has the corresponding index from df2.
>
> The following works:
> index <- NULL
> for(wk in df1$week){
>   index <- c(index,df2$index[df2$week==wk])
> }
> and then add index to df1.
>
> Can you please suggest a better way of doing this? I didn't think merge was suitable for this...is it? THANKS.

I think merge should work, but if you really have looked at the various arguments, tested reasonable examples and are still convinced it wouldn't, then see what you get with:

> df1 <- data.frame(dt = Sys.Date() - sample(100:120, 30, replace=TRUE), 1:30)
> df2 <- data.frame(dt2 = Sys.Date() -100:120, index=LETTERS[1:21])
> df1$index <- df2[ match(df1$dt,df2$dt2), "index"]
> df1
           dt X1.30 index
1  2009-07-30     1     D
2  2009-07-16     2     R
3  2009-07-23     3     K
4  2009-07-29     4     E
5  2009-07-15     5     S
6  2009-08-02     6     A
7  2009-07-18     7     P
8  2009-07-21     8     M
9  2009-07-27     9     G
10 2009-07-26    10     H
11 2009-07-31    11     C
12 2009-07-26    12     H
13 2009-07-18    13     P
14 2009-07-23    14     K
15 2009-07-21    15     M
16 2009-07-19    16     O
17 2009-07-14    17     T
18 2009-07-16    18     R
19 2009-07-15    19     S
20 2009-07-13    20     U
21 2009-07-28    21     F
22 2009-07-20    22     N
23 2009-07-24    23     J
24 2009-07-20    24     N
25 2009-07-16    25     R
26 2009-07-30    26     D
27 2009-07-14    27     T
28 2009-08-02    28     A
29 2009-07-19    29     O
30 2009-07-26    30     H

I tried merge(df1, df2, by.x=1, by.y=1) and got the same result modulo the order of the output.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
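For comparison with the loop in the original question, here is a minimal sketch of the match() lookup on a small made-up data set. The column names week and index follow the question; the values and the value column are invented for illustration only.

```r
# Hypothetical stand-ins for the poster's df1 (large table) and df2 (week -> index map).
df1 <- data.frame(week = c(3L, 1L, 2L, 3L, 1L),
                  value = c(10, 20, 30, 40, 50))
df2 <- data.frame(week = 1:3,
                  index = c("A", "B", "C"),
                  stringsAsFactors = FALSE)

# match() gives, for every df1$week, the row position of that week in df2
# (NA when the week is absent), so one vectorised subscript replaces the loop.
df1$index <- df2$index[match(df1$week, df2$week)]

# merge() produces the same week/index pairs, but as a new data frame
# whose rows are sorted by the key rather than kept in df1's order.
merged <- merge(df1[c("week", "value")], df2, by = "week")
```

The match() form keeps the row order of df1 and only adds one column, while merge() returns a separate, key-sorted data frame.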
[R] merge data
df1 -- dataframe with column date and several other columns. #rows >40k. Several of the dates are repeated.
df2 -- dataframe with two columns date and index. #rows ~130. This is really a map from date to index.

I would like to create a column called index in df1 which has the corresponding index from df2.

The following works:

index <- NULL
for(wk in df1$week){
  index <- c(index,df2$index[df2$week==wk])
}

and then add index to df1.

Can you please suggest a better way of doing this? I didn't think merge was suitable for this...is it? THANKS.
Re: [R] Merge data frames but prefer values in one
No you cannot. You may want to write a merge function with the special capability but there is no better way than the one suggested by Henrique. On Sep 14, 12:18 pm, JiHO wrote: > On 2009-September-11 , at 13:55 , wrote: > > > Maybe: > > > do.call(rbind, lapply(with(xy <- rbind(x, y), split(xy, list(a, b), > > drop = TRUE)), tail, 1)) > > > On Fri, Sep 11, 2009 at 3:45 AM, jo wrote: > > Thanks for the post-processing ideas. But is there any way to do that > > in one step? > > Thanks but by "in one step" I meant within the merge, not in one post- > processing step ;) > > JiHO > ---http://maururu.net > > __ > r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frames but prefer values in one
On 2009-September-11 , at 13:55 , wrote: Maybe: do.call(rbind, lapply(with(xy <- rbind(x, y), split(xy, list(a, b), drop = TRUE)), tail, 1)) On Fri, Sep 11, 2009 at 3:45 AM, jo wrote: Thanks for the post-processing ideas. But is there any way to do that in one step? Thanks but by "in one step" I meant within the merge, not in one post- processing step ;) JiHO --- http://maururu.net __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frames but prefer values in one
Maybe: do.call(rbind, lapply(with(xy <- rbind(x, y), split(xy, list(a, b), drop = TRUE)), tail, 1)) On Fri, Sep 11, 2009 at 3:45 AM, jo wrote: > Thanks for the post-processing ideas. But is there any way to do that > in one step? > > On Thu, Sep 10, 2009 at 7:20 PM, Henrique Dallazuanna > wrote: > > > > Try this: > > > > xy <- merge(x, y, by = c("a","b"),all = TRUE) > > xy$c <- ifelse(rowSums(!is.na(.x <- xy[, c('c.x', 'c.y')])) > 1, .x[,1], > rowSums(.x, na.rm = TRUE)) > > xy > > > > On Thu, Sep 10, 2009 at 12:21 PM, JiHO wrote: > > JiHO > --- > http://maururu.net > -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
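A possible variant of the same idea (stack the two data frames, then keep the last row for each a:b pair) that uses duplicated() instead of split() and tail(). This is only a sketch: the numbers are rounded stand-ins for the rnorm() values in the original post, and the final ordering step is there just to make the result comparable with the merge() output.

```r
# Rounded stand-ins for the x and y data frames from the original question.
x <- data.frame(a = 1:4, b = 1:4, c = c(-0.88, -0.71, -0.59, -1.86))
y <- data.frame(a = c(1, 3), b = 3, c = c(-0.27, 0.01))

# Stack x on top of y, so that rows from y come last.
xy <- rbind(x, y)

# Scanning from the bottom, flag earlier rows that repeat an a:b pair;
# dropping them keeps y's row whenever a pair occurs in both x and y.
keep <- !duplicated(xy[c("a", "b")], fromLast = TRUE)
res <- xy[keep, ]

# Sort by the keys so the rows line up with what merge() would return.
res <- res[order(res$a, res$b), ]
```

Like the split()/tail() version, this relies on y being bound after x; reversing the rbind() order would prefer x instead.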
Re: [R] Merge data frames but prefer values in one
Thanks for the post-processing ideas. But is there any way to do that in one step? On Thu, Sep 10, 2009 at 7:20 PM, Henrique Dallazuanna wrote: > > Try this: > > xy <- merge(x, y, by = c("a","b"),all = TRUE) > xy$c <- ifelse(rowSums(!is.na(.x <- xy[, c('c.x', 'c.y')])) > 1, .x[,1], > rowSums(.x, na.rm = TRUE)) > xy > > On Thu, Sep 10, 2009 at 12:21 PM, JiHO wrote: JiHO --- http://maururu.net __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
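Note that the ifelse() in the quoted suggestion returns .x[,1] (the c.x column) when both values are present, whereas the original question asked to prefer the value from y. A hedged sketch of the same merge-then-post-process approach that takes c.y whenever it exists, using rounded stand-in numbers rather than the posted rnorm() draws:

```r
# Rounded stand-ins for the x and y data frames from the original question.
x <- data.frame(a = 1:4, b = 1:4, c = c(-0.88, -0.71, -0.59, -1.86))
y <- data.frame(a = c(1, 3), b = 3, c = c(-0.27, 0.01))

# Full outer join on a and b; the clashing c columns become c.x and c.y.
xy <- merge(x, y, by = c("a", "b"), all = TRUE)

# Use y's value where it exists, otherwise fall back to x's value.
xy$c <- ifelse(is.na(xy$c.y), xy$c.x, xy$c.y)

# Keep only the key columns and the combined column.
xy <- xy[, c("a", "b", "c")]
```

This still needs a step after merge(); it only changes which side wins when both are present.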
Re: [R] Merge data frames but prefer values in one
Try this: xy <- merge(x, y, by = c("a","b"),all = TRUE) xy$c <- ifelse(rowSums(!is.na(.x <- xy[, c('c.x', 'c.y')])) > 1, .x[,1], rowSums(.x, na.rm = TRUE)) xy On Thu, Sep 10, 2009 at 12:21 PM, JiHO wrote: > Hello everyone, > > My problem is better explained with an example: > > > x=data.frame(a=1:4,b=1:4,c=rnorm(4)) > > x > a b c > 1 1 1 -0.8821089 > 2 2 2 -0.7082583 > 3 3 3 -0.5948835 > 4 4 4 -1.8571443 > > y=data.frame(a=c(1,3),b=3,c=rnorm(2)) > > y > a bc > 1 1 3 -0.273155973 > 2 3 3 0.009517862 > > Now I want to merge x and y by columns a and b, hence creating a data.frame > with all a:b combinations observed in x and y. That's easily done with > merge: > > > merge(x,y,by=c("a","b"),all=T) > a bc.x c.y > 1 1 1 -0.8821089 NA > 2 1 3 NA -0.273155973 > 3 2 2 -0.7082583 NA > 4 3 3 -0.5948835 0.009517862 > 5 4 4 -1.8571443 NA > > But rather than two c columns I would want the merge to: > - keep the value in x if there is no corresponding value in y > - keep the value in y if there is no corresponding value in x > - prefer the value in y when the a:b combination exists in both x and y > > So basically I want my result to look like: > a b c > 1 1 1 -0.8821089 > 2 1 3 -0.2731559 > 3 2 2 -0.7082583 > 4 3 3 0.0095178 > 5 4 4 -1.8571443 > > I can't find a combinations of options for merge that does this. Is there > another fonction that would do that or do I have to resort to some > post-processing after merge? It seems that it might be something like a > "right merge" for data bases but I don't know this world at all. I would be > happy to look into sqldf if that allows to do things like that. > > Thanks in advance. Sincerely, > > JiHO > --- > http://maururu.net > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Merge data frames but prefer values in one
Hello everyone,

My problem is better explained with an example:

> x=data.frame(a=1:4,b=1:4,c=rnorm(4))
> x
  a b          c
1 1 1 -0.8821089
2 2 2 -0.7082583
3 3 3 -0.5948835
4 4 4 -1.8571443
> y=data.frame(a=c(1,3),b=3,c=rnorm(2))
> y
  a b            c
1 1 3 -0.273155973
2 3 3  0.009517862

Now I want to merge x and y by columns a and b, hence creating a data.frame with all a:b combinations observed in x and y. That's easily done with merge:

> merge(x,y,by=c("a","b"),all=T)
  a b        c.x          c.y
1 1 1 -0.8821089           NA
2 1 3         NA -0.273155973
3 2 2 -0.7082583           NA
4 3 3 -0.5948835  0.009517862
5 4 4 -1.8571443           NA

But rather than two c columns I would want the merge to:
- keep the value in x if there is no corresponding value in y
- keep the value in y if there is no corresponding value in x
- prefer the value in y when the a:b combination exists in both x and y

So basically I want my result to look like:

  a b          c
1 1 1 -0.8821089
2 1 3 -0.2731559
3 2 2 -0.7082583
4 3 3  0.0095178
5 4 4 -1.8571443

I can't find a combination of options for merge that does this. Is there another function that would do that, or do I have to resort to some post-processing after merge? It seems that it might be something like a "right merge" for databases but I don't know this world at all. I would be happy to look into sqldf if it allows things like that.

Thanks in advance. Sincerely,

JiHO
---
http://maururu.net
Re: [R] Merge data frames but with a twist.
The inconsistency arose in order to satisfy backward compatibility while giving chron a direct way to use % codes. chron used its own format specification so it would have been difficult to add % codes there; however, as.chron, at the time, did not support a format specification at all so it was still possible to add a format specifier using % codes without disrupting existing code. On Thu, Aug 27, 2009 at 2:21 PM, Stephen Tucker wrote: > Ah, thanks always - > I originally thought as.chron() was required to have all fields (m/d/y > hh:mm:ss) as for chron() but I see that the former passes its 'format' > argument to as.POSIXct() > Good deal! > Stephen > > > > - Original Message > From: Gabor Grothendieck > To: Stephen Tucker > Cc: Tony Breyal ; r-help@r-project.org > Sent: Thursday, August 27, 2009 7:27:26 AM > Subject: Re: [R] Merge data frames but with a twist. > > On Thu, Aug 27, 2009 at 9:55 AM, Stephen Tucker wrote: >> You may want to use the reshape package for this task: >> >>> library(reshape) >>> recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure") >> Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM >> 1 Firefly 3 1 >> 2 Red Dwarf 4 2 >> >> If you want to plot time series, you can do something like the following >> >>> mydf <- .Last.value ## save the output from above to mydf >>> library(zoo) >>> zobj <- zoo(`mode<-`(t(mydf),"numeric"), >>> as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p"))) >>> plot(zobj) >> >> (zobj is a time series object of the zoo class) > > Note that as.chron can take % codes directly so the as.chron portion > can be shortened to: > > as.chron(names(mydf)[-1],"%m/%d/%Y %I:%M %p") > > > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frames but with a twist.
Ah, thanks always - I originally thought as.chron() was required to have all fields (m/d/y hh:mm:ss) as for chron() but I see that the former passes its 'format' argument to as.POSIXct() Good deal! Stephen - Original Message From: Gabor Grothendieck To: Stephen Tucker Cc: Tony Breyal ; r-help@r-project.org Sent: Thursday, August 27, 2009 7:27:26 AM Subject: Re: [R] Merge data frames but with a twist. On Thu, Aug 27, 2009 at 9:55 AM, Stephen Tucker wrote: > You may want to use the reshape package for this task: > >> library(reshape) >> recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure") > Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM > 1 Firefly 3 1 > 2 Red Dwarf 4 2 > > If you want to plot time series, you can do something like the following > >> mydf <- .Last.value ## save the output from above to mydf >> library(zoo) >> zobj <- zoo(`mode<-`(t(mydf),"numeric"), >> as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p"))) >> plot(zobj) > > (zobj is a time series object of the zoo class) Note that as.chron can take % codes directly so the as.chron portion can be shortened to: as.chron(names(mydf)[-1],"%m/%d/%Y %I:%M %p") __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frames but with a twist.
On Thu, Aug 27, 2009 at 9:55 AM, Stephen Tucker wrote:
> You may want to use the reshape package for this task:
>
>> library(reshape)
>> recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure")
>        Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM
> 1   Firefly                   3                  1
> 2 Red Dwarf                   4                  2
>
> If you want to plot time series, you can do something like the following
>
>> mydf <- .Last.value ## save the output from above to mydf
>> library(zoo)
>> zobj <- zoo(`mode<-`(t(mydf),"numeric"),
>> as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p")))
>> plot(zobj)
>
> (zobj is a time series object of the zoo class)

Note that as.chron can take % codes directly so the as.chron portion can be shortened to:

as.chron(names(mydf)[-1],"%m/%d/%Y %I:%M %p")
Re: [R] Merge data frames but with a twist.
You may want to use the reshape package for this task: > library(reshape) > recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure") Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM 1 Firefly 3 1 2 Red Dwarf 4 2 If you want to plot time series, you can do something like the following > mydf <- .Last.value ## save the output from above to mydf > library(zoo) > zobj <- zoo(`mode<-`(t(mydf),"numeric"), > as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p"))) > plot(zobj) (zobj is a time series object of the zoo class) - Original Message From: Tony Breyal To: r-help@r-project.org Sent: Thursday, August 27, 2009 4:04:30 AM Subject: [R] Merge data frames but with a twist. Dear all, Question: How to merge two data frames such that new column are added in a particular way? I'm not actually sure how to best articulate my question to be honest, so i hope showing you what I want to achieve will communicate my question better. Lets say I have two data frames: > DF1 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=1:2, > Datetime=c('08/26/2009 9:30 AM', '08/26/2009 9:30 AM'))) > DF2 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=3:4, > Datetime=c('08/26/2009 11:30 AM', '08/26/2009 11:30 AM'))) And then let us merge these: > DF3 <- merge(DF1, DF2, all=TRUE) Show MeasureDatetime 1 Firefly 1 08/26/2009 9:30 AM 2 Firefly 3 08/26/2009 11:30 AM 3 Red Dwarf 2 08/26/2009 9:30 AM 4 Red Dwarf 4 08/26/2009 11:30 AM What i would like to do is merge the data frames such that i end up with the following: Show 08/26/2009 9:30 AM08/26/2009 11:30 AM Firefly 13 Red Dwarf24 my reason for doing this is so that i can plot a time series somehow. I hope the formating stays when i post this message and that what i'm trying to do is easy to understand. Thank you kindly for any help in advance. 
Tony __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frames but with a twist.
Try this: xtabs(as.numeric(Measure) ~ Show + Datetime, data = DF3) On Thu, Aug 27, 2009 at 8:04 AM, Tony Breyal wrote: > Dear all, > > Question: How to merge two data frames such that new column are added > in a particular way? > > I'm not actually sure how to best articulate my question to be honest, > so i hope showing you what I want to achieve will communicate my > question better. > > Lets say I have two data frames: > > > DF1 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=1:2, > Datetime=c('08/26/2009 9:30 AM', '08/26/2009 9:30 AM'))) > > > DF2 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=3:4, > Datetime=c('08/26/2009 11:30 AM', '08/26/2009 11:30 AM'))) > > And then let us merge these: > > > DF3 <- merge(DF1, DF2, all=TRUE) > > Show MeasureDatetime > 1 Firefly 1 08/26/2009 9:30 AM > 2 Firefly 3 08/26/2009 11:30 AM > 3 Red Dwarf 2 08/26/2009 9:30 AM > 4 Red Dwarf 4 08/26/2009 11:30 AM > > > What i would like to do is merge the data frames such that i end up > with the following: > > Show 08/26/2009 9:30 AM08/26/2009 11:30 AM > Firefly 13 > Red Dwarf24 > > my reason for doing this is so that i can plot a time series somehow. > > I hope the formating stays when i post this message and that what i'm > trying to do is easy to understand. Thank you kindly for any help in > advance. > > Tony > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
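One caveat worth checking before relying on as.numeric(Measure): in the thread DF1 and DF2 are built through cbind(), which turns every column into character (or factor, depending on the R version), and as.numeric() on a factor returns the level codes rather than the printed numbers. The sketch below sidesteps that by typing DF3 in directly with a numeric Measure column, which is an assumption made for illustration.

```r
# DF3 entered by hand with Measure as a true numeric column (an assumption;
# in the thread it came from merge(DF1, DF2, all = TRUE)).
DF3 <- data.frame(
  Show     = c("Firefly", "Firefly", "Red Dwarf", "Red Dwarf"),
  Measure  = c(1, 3, 2, 4),
  Datetime = c("08/26/2009 9:30 AM", "08/26/2009 11:30 AM",
               "08/26/2009 9:30 AM", "08/26/2009 11:30 AM"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per Show, one column per Datetime,
# with each cell holding the sum of Measure for that combination.
wide <- xtabs(Measure ~ Show + Datetime, data = DF3)
```

The columns come out in alphabetical order of the Datetime strings, so "11:30 AM" sorts before "9:30 AM"; converting Datetime to a real date-time class first would be needed for chronological order.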
[R] Merge data frames but with a twist.
Dear all,

Question: How to merge two data frames such that new columns are added in a particular way?

I'm not actually sure how to best articulate my question to be honest, so I hope showing you what I want to achieve will communicate my question better.

Let's say I have two data frames:

> DF1 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=1:2,
+                   Datetime=c('08/26/2009 9:30 AM', '08/26/2009 9:30 AM')))
> DF2 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=3:4,
+                   Datetime=c('08/26/2009 11:30 AM', '08/26/2009 11:30 AM')))

And then let us merge these:

> DF3 <- merge(DF1, DF2, all=TRUE)
       Show Measure            Datetime
1   Firefly       1  08/26/2009 9:30 AM
2   Firefly       3 08/26/2009 11:30 AM
3 Red Dwarf       2  08/26/2009 9:30 AM
4 Red Dwarf       4 08/26/2009 11:30 AM

What I would like to do is merge the data frames such that I end up with the following:

Show       08/26/2009 9:30 AM  08/26/2009 11:30 AM
Firefly                     1                    3
Red Dwarf                   2                    4

My reason for doing this is so that I can plot a time series somehow.

I hope the formatting stays when I post this message and that what I'm trying to do is easy to understand. Thank you kindly for any help in advance.

Tony
Re: [R] Merge data frame and keep unmatched
Or if you need it to be fast, try data.table. X[Y] is a join when X and Y are both data.tables. X[Y] is a left join, Y[X] is a right join. 'nomatch' controls the inner/outer join i.e. what happens for unmatched rows. This is much faster than merge(). "Gabor Grothendieck" wrote in message news:971536df0906100704q433f5f99ld3f9c23e69d95...@mail.gmail.com... Try: merge(completedf, partdf, all.x = TRUE) or library(sqldf) # see http://sqldf.googlecode.com sqldf("select * from completedf left join partdf using(beta, alpha)") On Wed, Jun 10, 2009 at 9:56 AM, Etienne B. Racine wrote: > > Hi, > > With two data sets, one complete and another one partial, I would like to > merge them and keep the unmatched lines. The problem is that merge() > dosen't > keep the unmatched lines. Is there another function that I could use to > merge the data frames. > > Example: > > completedf <- expand.grid(alpha=letters[1:3],beta=1:3) > partdf <- data.frame( > alpha= c('a','a','c'), > beta = c(1,3,2), > val = c(2,6,4)) > > mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta')) > # it only kept the common rows > nrow(mergedf) > > Thanks, > Etienne > -- > View this message in context: > http://www.nabble.com/Merge-data-frame-and-keep-unmatched-tp23962874p23962874.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frame and keep unmatched
Try: merge(completedf, partdf, all.x = TRUE) or library(sqldf) # see http://sqldf.googlecode.com sqldf("select * from completedf left join partdf using(beta, alpha)") On Wed, Jun 10, 2009 at 9:56 AM, Etienne B. Racine wrote: > > Hi, > > With two data sets, one complete and another one partial, I would like to > merge them and keep the unmatched lines. The problem is that merge() dosen't > keep the unmatched lines. Is there another function that I could use to > merge the data frames. > > Example: > > completedf <- expand.grid(alpha=letters[1:3],beta=1:3) > partdf <- data.frame( > alpha= c('a','a','c'), > beta = c(1,3,2), > val = c(2,6,4)) > > mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta')) > # it only kept the common rows > nrow(mergedf) > > Thanks, > Etienne > -- > View this message in context: > http://www.nabble.com/Merge-data-frame-and-keep-unmatched-tp23962874p23962874.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frame and keep unmatched
On Jun 10, 2009, at 8:56 AM, Etienne B. Racine wrote:

> Hi,
>
> With two data sets, one complete and another one partial, I would like to merge them and keep the unmatched lines. The problem is that merge() doesn't keep the unmatched lines. Is there another function that I could use to merge the data frames.
>
> Example:
>
> completedf <- expand.grid(alpha=letters[1:3],beta=1:3)
> partdf <- data.frame(
>   alpha= c('a','a','c'),
>   beta = c(1,3,2),
>   val = c(2,6,4))
>
> mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta'))
> # it only kept the common rows
> nrow(mergedf)
>
> Thanks,
> Etienne

Is this what you want?

> merge(x=completedf, y=partdf, by=c('alpha','beta'), all = TRUE)
  alpha beta val
1     a    1   2
2     a    2  NA
3     a    3   6
4     b    1  NA
5     b    2  NA
6     b    3  NA
7     c    1  NA
8     c    2   4
9     c    3  NA

Note the 'all', 'all.x' and 'all.y' arguments...

HTH,

Marc Schwartz
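To make the difference between the default and the all.x behaviour concrete, here is a small sketch on the data from the question. Only the object names inner and left are introduced here for illustration.

```r
# Data from the original question.
completedf <- expand.grid(alpha = letters[1:3], beta = 1:3)
partdf <- data.frame(alpha = c("a", "a", "c"),
                     beta  = c(1, 3, 2),
                     val   = c(2, 6, 4))

# Default merge: only alpha/beta combinations present in both inputs survive.
inner <- merge(completedf, partdf, by = c("alpha", "beta"))

# all.x = TRUE: every row of completedf is kept, and val is NA
# wherever partdf has no matching alpha/beta combination.
left <- merge(completedf, partdf, by = c("alpha", "beta"), all.x = TRUE)
```

Because every row of partdf has a match in completedf here, all = TRUE and all.x = TRUE give the same nine rows; they would differ if partdf contained combinations missing from completedf.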
[R] Merge data frame and keep unmatched
Hi,

With two data sets, one complete and another one partial, I would like to merge them and keep the unmatched lines. The problem is that merge() doesn't keep the unmatched lines. Is there another function that I could use to merge the data frames?

Example:

completedf <- expand.grid(alpha=letters[1:3],beta=1:3)
partdf <- data.frame(
  alpha= c('a','a','c'),
  beta = c(1,3,2),
  val = c(2,6,4))

mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta'))
# it only kept the common rows
nrow(mergedf)

Thanks,
Etienne

--
View this message in context: http://www.nabble.com/Merge-data-frame-and-keep-unmatched-tp23962874p23962874.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] merge data frames with same column names of different lengths and missing values
Steven Lubitz writes:

> Thank you - this is very helpful. However I realized that with my real data sets (not the example I have here),
> I also have different numbers of columns in each data frame. rbind doesn't seem to like this. Here's a
> modified example:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), item3=c(NA,2,NA,4,NA), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
> rbind(x,y)

You should add dummy variables to each partial data frame such that they look the same, and do the rbind later.

Dieter
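One way the "add dummy variables, then rbind" advice could be written out, as a sketch only. The helper name pad and the variable all_cols are made up for this example; x and y are the data frames from the quoted message.

```r
# Data frames from the quoted message: x has an extra column, item3.
x <- data.frame(item1 = c(NA, NA, 3, 4, 5), item2 = c(1, NA, NA, 4, 5),
                item3 = c(NA, 2, NA, 4, NA), id = 1:5)
y <- data.frame(item1 = c(NA, 2, NA, 4, 5, 6), item2 = c(NA, NA, 3, 4, 5, NA),
                id = 1:6)

# Every column name that appears in either data frame.
all_cols <- union(names(x), names(y))

# Hypothetical helper: add any missing column as NA, then put the
# columns in a common order so rbind() sees matching structures.
pad <- function(df, cols) {
  for (nm in setdiff(cols, names(df))) {
    df[[nm]] <- NA
  }
  df[cols]
}

xy <- rbind(pad(x, all_cols), pad(y, all_cols))
```

After this step xy has one row per original row of x and y, with item3 missing for everything that came from y.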
Re: [R] merge data frames with same column names of different lengths and missing values
Subject: Re: [R] merge data frames with same column names of different lengths and missing values To: "Phil Spector" Date: Saturday, March 7, 2009, 5:01 PM Phil, Thank you - this is very helpful. However I realized that with my real data sets (not the example I have here), I also have different numbers of columns in each data frame. rbind doesn't seem to like this. Here's a modified example: x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), item3=c(NA,2,NA,4,NA), id=1:5) y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6) rbind(x,y) Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match Any ideas? Thanks, Steve --- On Sat, 3/7/09, Phil Spector wrote: From: Phil Spector Subject: Re: [R] merge data frames with same column names of different lengths and missing values To: "Steven Lubitz" Date: Saturday, March 7, 2009, 1:56 AM Steven - I believe this gives the output that you desire: > xy = rbind(x,y) > aggregate(subset(xy,select=-id),xy['id'],function(x)rev(x[!is.na(x)])[1]) id item1 item2 1 1NA 1 2 2 2NA 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6NA But I think what merge x y; by id; would give you is > aggregate(subset(xy,select=-id),xy['id'],function(x)x[length(x)]) id item1 item2 1 1NANA 2 2 2NA 3 3NA 3 4 4 4 4 5 5 5 5 6 6 6NA - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Fri, 6 Mar 2009, Steven Lubitz wrote: > > Hello, I'm switching over from SAS to R and am having trouble merging data frames. The data frames have several columns with the same name, and each has a different number of rows. Some of the values are missing from cells with the same column names in each data frame. I had hoped that when I merged the dataframes, every column with the same name would be merged, with the value in a complete cell overwriting the value in an empty cell from the other data frame. 
I cannot seem to achieve this result, though I've tried several merge adaptations: > > x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5) > y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6) > > > merge(x,y,by="id") #I lose observations here (n=1 in this example), and my items are duplicated - I do not want this result > id item1.x item2.x item1.y item2.y > 1 1 NA 1 NA NA > 2 2 NA NA 2 NA > 3 3 3 NA NA 3 > 4 4 4 4 4 4 > 5 5 5 5 5 5 > > > merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and do not want this result > id item1 item2 > 1 4 4 4 > 2 5 5 5 > > > merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID > id item1 item2 > 1 1NA 1 > 2 1NANA > 3 2 2NA > 4 2NANA > 5 3 3NA > 6 3NA 3 > 7 4 4 4 > 8 5 5 5 > 9 6 6NA > > In reality I have multiple data frames with numerous columns, all with this problem. I can do the merge seamlessly in SAS, but am trying to learn and stick with R for my analyses. Any help would be greatly appreciated. > > Steve Lubitz > Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts General Hospital > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
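The rbind() plus aggregate() idea quoted above, written out as a self-contained sketch on the two-column example from the original question. The helper name last_non_na is made up here; it returns the last non-missing value within each id, so a value from y wins over one from x when both are present.

```r
# Data frames from the original question.
x <- data.frame(item1 = c(NA, NA, 3, 4, 5), item2 = c(1, NA, NA, 4, 5), id = 1:5)
y <- data.frame(item1 = c(NA, 2, NA, 4, 5, 6), item2 = c(NA, NA, 3, 4, 5, NA), id = 1:6)

# Stack the rows; rows from y come after rows from x.
xy <- rbind(x, y)

# Hypothetical helper: last non-missing value of a vector, or NA if none.
last_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[length(v)] else NA_real_
}

# Collapse to one row per id, applying the helper to each item column.
res <- aggregate(xy[c("item1", "item2")], by = xy["id"], FUN = last_non_na)
```

On this example the result has six rows, one per id, and an item stays NA only where neither x nor y supplies a value.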
Re: [R] merge data frames with same column names of different lengths and missing values
Steve,

I don't know if R has such a function to perform the task you were asking. I wrote one myself. Try the following to see if it works for you. The new function "merge.new" has one additional argument col.ID, which is the column number of ID column. To use your x, y as examples, type:

merge.new(x,y,all=TRUE,col.ID=3)

#
merge.new<-function(...,col.ID){
  inter<-merge(...)
  inter<-inter[order(inter[col.ID]),]  #merged data sorted by ID

  #total columns and rows for the target dataframe
  total.row<-length(unique(inter[[col.ID]]))
  total.col<-dim(inter)[2]
  row.ID<-unique(inter[[col.ID]])
  target<-matrix(NA,total.row,total.col)
  target<-as.data.frame(target)
  names(target)<-names(inter)

  for (i in 1:total.row){
    inter.part<-inter[inter[col.ID]==row.ID[i],]  #select all rows with the same ID
    for (j in 1:total.col){
      if (is.na(inter.part[1,j])){
        if(is.na(inter.part[2,j])) {target[i,j]=NA}
        else {target[i,j]=inter.part[2,j]}
      }
      else {target[i,j]=inter.part[1,j]}
    }
  }

  print(paste("total rows=",total.row))
  print(paste("total columns=",total.col))
  return(target)
}
#

--
Jun Shen PhD
PK/PD Scientist
BioPharma Services
Millipore Corporation
15 Research Park Dr.
St Charles, MO 63304
Direct: 636-720-1589

On Fri, Mar 6, 2009 at 11:02 PM, Steven Lubitz wrote:
> > Hello, I'm switching over from SAS to R and am having trouble merging data > frames. The data frames have several columns with the same name, and each > has a different number of rows. Some of the values are missing from cells > with the same column names in each data frame. I had hoped that when I > merged the dataframes, every column with the same name would be merged, with > the value in a complete cell overwriting the value in an empty cell from the > other data frame.
I cannot seem to achieve this result, though I've tried > several merge adaptations: > > x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5) > y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6) > > > merge(x,y,by="id") #I lose observations here (n=1 in this example), and my > items are duplicated - I do not want this result > id item1.x item2.x item1.y item2.y > 1 1 NA 1 NA NA > 2 2 NA NA 2 NA > 3 3 3 NA NA 3 > 4 4 4 4 4 4 > 5 5 5 5 5 5 > > > merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) > and do not want this result > id item1 item2 > 1 4 4 4 > 2 5 5 5 > > > merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are > duplicated and the NA values are retained - I instead want one row per ID > id item1 item2 > 1 1NA 1 > 2 1NANA > 3 2 2NA > 4 2NANA > 5 3 3NA > 6 3NA 3 > 7 4 4 4 > 8 5 5 5 > 9 6 6NA > > In reality I have multiple data frames with numerous columns, all with this > problem. I can do the merge seamlessly in SAS, but am trying to learn and > stick with R for my analyses. Any help would be greatly appreciated. > > Steve Lubitz > Cardiovascular Research Fellow, Brigham and Women's Hospital and > Massachusetts General Hospital > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
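For comparison with the loop-based function above, here is a minimal base-R sketch of the same idea (one row per ID, keeping the first non-missing value in each column). It is not part of the original reply; the helper name first_non_na is hypothetical, and the sketch assumes that when both data frames hold a value for the same cell, the value from x should win because x is stacked first.

```r
x <- data.frame(item1 = c(NA, NA, 3, 4, 5), item2 = c(1, NA, NA, 4, 5), id = 1:5)
y <- data.frame(item1 = c(NA, 2, NA, 4, 5, 6), item2 = c(NA, NA, 3, 4, 5, NA), id = 1:6)

# Hypothetical helper: first non-missing value of a vector.
# Indexing an empty vector with [1] yields NA of the same type.
first_non_na <- function(v) v[!is.na(v)][1]

# Stack the rows (columns are matched by name), then collapse each id group.
stacked <- rbind(x, y)
combined <- aggregate(stacked[c("item1", "item2")],
                      by = stacked["id"],
                      FUN = first_non_na)

print(combined)
```

Unlike the two-row lookup in merge.new, this collapses any number of rows per ID, so it should extend to more than two data frames by adding them to the rbind() call.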
Re: [R] merge data frames with same column names of different lengths and missing values
Steven Lubitz wrote:

> Hello, I'm switching over from SAS to R and am having trouble merging data frames. The data frames have several columns with the same name, and each has a different number of rows. Some of the values are missing from cells with the same column names in each data frame. I had hoped that when I merged the dataframes, every column with the same name would be merged, with the value in a complete cell overwriting the value in an empty cell from the other data frame. I cannot seem to achieve this result, though I've tried several merge adaptations:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
> merge(x,y,by="id") #I lose observations here (n=1 in this example), and my items are duplicated - I do not want this result
>   id item1.x item2.x item1.y item2.y
> 1  1      NA       1      NA      NA
> 2  2      NA      NA       2      NA
> 3  3       3      NA      NA       3
> 4  4       4       4       4       4
> 5  5       5       5       5       5
>
> merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and do not want this result
>   id item1 item2
> 1  4     4     4
> 2  5     5     5
>
> merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID
>   id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA

You should obtain the desired solution using:

merge(y, x, by=c("id","item1","item2"), all=TRUE)

In database terminology, all=TRUE corresponds to the full outer join, all.x to the left outer join and all.y to the right outer join.

Ciao,
domenico

> In reality I have multiple data frames with numerous columns, all with this problem. I can do the merge seamlessly in SAS, but am trying to learn and stick with R for my analyses. Any help would be greatly appreciated.
>
> Steve Lubitz
> Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts General Hospital
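To make the join terminology above concrete, here is a minimal sketch of the four join types on the posted x and y, joining on id alone. The final coalescing step is an assumption about what the original poster wants (one row per ID, with a non-missing value overwriting a missing one, and x taking precedence); it is not part of the reply above. The .x and .y suffixes are merge()'s defaults.

```r
x <- data.frame(item1 = c(NA, NA, 3, 4, 5), item2 = c(1, NA, NA, 4, 5), id = 1:5)
y <- data.frame(item1 = c(NA, 2, NA, 4, 5, 6), item2 = c(NA, NA, 3, 4, 5, NA), id = 1:6)

inner <- merge(x, y, by = "id")                # inner join: ids present in both
left  <- merge(x, y, by = "id", all.x = TRUE)  # left outer join: every id in x
right <- merge(x, y, by = "id", all.y = TRUE)  # right outer join: every id in y
full  <- merge(x, y, by = "id", all = TRUE)    # full outer join: every id in either

# Assumed follow-up: fill missing values from x with the values from y.
result <- data.frame(
  id    = full$id,
  item1 = ifelse(is.na(full$item1.x), full$item1.y, full$item1.x),
  item2 = ifelse(is.na(full$item2.x), full$item2.y, full$item2.x)
)

print(result)
```

Joining on all three columns instead treats rows whose item values differ (including NA versus a number) as distinct keys, which is why the same ID can appear on more than one row in that case.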
Re: [R] merge data frames with same column names of different lengths and missing values
Steven Lubitz writes:

> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
> merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID
>   id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA

I think you simply picked the wrong (too complex) function. Try

rbind(x,y)

Dieter
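As a minimal sketch of what stacking with rbind() produces, the example below binds the posted x and y plus a third data frame z. The frame z is hypothetical, invented here only to show that rbind() on data frames matches columns by name rather than by position and that several frames can be stacked in one call.

```r
x <- data.frame(item1 = c(NA, NA, 3, 4, 5), item2 = c(1, NA, NA, 4, 5), id = 1:5)
y <- data.frame(item1 = c(NA, 2, NA, 4, 5, 6), item2 = c(NA, NA, 3, 4, 5, NA), id = 1:6)

# Hypothetical third data frame with the columns in a different order.
z <- data.frame(id = 1:2, item1 = c(1, NA), item2 = c(NA, 2))

# Stack any number of data frames; columns are matched by name.
stacked <- do.call(rbind, list(x, y, z))

# Count how many rows each id contributes after stacking.
rows_per_id <- table(stacked$id)

print(nrow(stacked))
print(rows_per_id)
```

The stacked result keeps every input row, so an ID that occurs in several data frames occurs on several rows; getting one row per ID still needs a separate collapsing step afterwards.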
[R] merge data frames with same column names of different lengths and missing values
Hello,

I'm switching over from SAS to R and am having trouble merging data frames. The data frames have several columns with the same name, and each has a different number of rows. Some of the values are missing from cells with the same column names in each data frame. I had hoped that when I merged the dataframes, every column with the same name would be merged, with the value in a complete cell overwriting the value in an empty cell from the other data frame. I cannot seem to achieve this result, though I've tried several merge adaptations:

x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

merge(x,y,by="id") #I lose observations here (n=1 in this example), and my items are duplicated - I do not want this result
  id item1.x item2.x item1.y item2.y
1  1      NA       1      NA      NA
2  2      NA      NA       2      NA
3  3       3      NA      NA       3
4  4       4       4       4       4
5  5       5       5       5       5

merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and do not want this result
  id item1 item2
1  4     4     4
2  5     5     5

merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID
  id item1 item2
1  1    NA     1
2  1    NA    NA
3  2     2    NA
4  2    NA    NA
5  3     3    NA
6  3    NA     3
7  4     4     4
8  5     5     5
9  6     6    NA

In reality I have multiple data frames with numerous columns, all with this problem. I can do the merge seamlessly in SAS, but am trying to learn and stick with R for my analyses. Any help would be greatly appreciated.

Steve Lubitz
Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts General Hospital