[R] How to select a row from one dataframe that is close to a row in another dataframe
I have two data frames of flight data, but they have very different numbers of rows. They come from different sources, so the data are not identical. names(oooi) [1] FltOrigDt MkdCrrCd [3] MkdFltNbr DprtTrpnStnCd [5] ArrTrpnStnCdActualOutLocalTimestamp [7] ActualOffLocal ActualOnLocal [9] ActualInLocal ArrivalGate [11] DepartureGate Flight [13] OnDate MinutesIntoDay [15] OnHour pt names(runway) [1] OnDateTime IATA ICAO Flight [5] AircraftType Tail ArrivedSTA [9] Runway From.ToDelay OnDate [13] MinutesIntoDay pt These sets have several hundred thousand rows. In both sets, pt is a POSIXct for the arrival time (from different sources). They are not identical, but surely should be within an hour of each other (hopefully a lot less), and the Flight fields must be the same. So (abs(runway$pt - oooi$pt) 3600) (runway$Flight == oooi$Flight) should pick out the corresponding rows in the two data sets (if there is a match). What I need to do is to take the Runway from runway and insert it into the oooi df for the correct flight. What is the best way to do this in R? Thanks, Jim Rome __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to select a row from one dataframe that is close to a row in another dataframe
If the flight identifiers runway$Flight and oooi$Flight are unique (i.e. only one observation has the same identifier in each dataset), you could use merge() to bind together the dataset based on matching the two. See, ?merge Also, I see an OnDate variable in both dataset. So if Flight does not provide unique identification, maybe Flight and OnDate together do, which can also be handled in merge. Let us know if that solves the problem. Best, Daniel - cuncta stricte discussurus - -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of James Rome Sent: Saturday, March 20, 2010 10:20 AM To: r-help@r-project.org Subject: [R] How to select a row from one dataframe that is close to a row in another dataframe I have two data frames of flight data, but they have very different numbers of rows. They come from different sources, so the data are not identical. names(oooi) [1] FltOrigDt MkdCrrCd [3] MkdFltNbr DprtTrpnStnCd [5] ArrTrpnStnCdActualOutLocalTimestamp [7] ActualOffLocal ActualOnLocal [9] ActualInLocal ArrivalGate [11] DepartureGate Flight [13] OnDate MinutesIntoDay [15] OnHour pt names(runway) [1] OnDateTime IATA ICAO Flight [5] AircraftType Tail ArrivedSTA [9] Runway From.ToDelay OnDate [13] MinutesIntoDay pt These sets have several hundred thousand rows. In both sets, pt is a POSIXct for the arrival time (from different sources). They are not identical, but surely should be within an hour of each other (hopefully a lot less), and the Flight fields must be the same. So (abs(runway$pt - oooi$pt) 3600) (runway$Flight == oooi$Flight) should pick out the corresponding rows in the two data sets (if there is a match). What I need to do is to take the Runway from runway and insert it into the oooi df for the correct flight. What is the best way to do this in R? Thanks, Jim Rome __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to select a row from one dataframe that is close to a row in another dataframe
sqldf package: n - 200 x - data.frame(id=sample(1:10,n, TRUE), timex=runif(n)) y - data.frame(id=sample(1:15, n, TRUE), timey=runif(n), runway=seq(n)) require(sqldf) sqldf(select x.id, timex, y.id, timey, y.runway from x join y + where x.id = y.id and abs(timex - timey) 0.005) id timex id timey runway 1 7 0.39110674 7 0.38859025 61 2 7 0.58829365 7 0.58581915133 3 9 0.34610180 9 0.34866404 65 4 2 0.72069416 2 0.72080087 7 5 3 0.75491843 3 0.75640271 37 6 5 0.86930840 5 0.87373981 15 7 3 0.07181249 3 0.06935293108 8 6 0.55273483 6 0.55286014101 9 10 0.50377737 10 0.50563669139 10 1 0.17325424 1 0.17129662 32 11 3 0.09013624 3 0.09281639112 12 8 0.04805349 8 0.04618661196 13 2 0.19428938 2 0.19260035 50 14 10 0.16194595 10 0.16565594 44 15 7 0.51193601 7 0.51352435 14 16 3 0.02331951 3 0.02119733 64 17 9 0.69456540 9 0.69376281 39 18 2 0.20070366 2 0.20432466134 19 10 0.50438411 10 0.50563669139 20 4 0.25271897 4 0.25211036 72 On Sat, Mar 20, 2010 at 10:20 AM, James Rome jamesr...@gmail.com wrote: I have two data frames of flight data, but they have very different numbers of rows. They come from different sources, so the data are not identical. names(oooi) [1] FltOrigDt MkdCrrCd [3] MkdFltNbr DprtTrpnStnCd [5] ArrTrpnStnCdActualOutLocalTimestamp [7] ActualOffLocal ActualOnLocal [9] ActualInLocal ArrivalGate [11] DepartureGate Flight [13] OnDate MinutesIntoDay [15] OnHour pt names(runway) [1] OnDateTime IATA ICAO Flight [5] AircraftType Tail ArrivedSTA [9] Runway From.ToDelay OnDate [13] MinutesIntoDay pt These sets have several hundred thousand rows. In both sets, pt is a POSIXct for the arrival time (from different sources). They are not identical, but surely should be within an hour of each other (hopefully a lot less), and the Flight fields must be the same. So (abs(runway$pt - oooi$pt) 3600) (runway$Flight == oooi$Flight) should pick out the corresponding rows in the two data sets (if there is a match). What I need to do is to take the Runway from runway and insert it into the oooi df for the correct flight. What is the best way to do this in R? Thanks, Jim Rome __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to select a row from one dataframe that is close to a row in another dataframe
Using the sqldf package you could do an SQL join with the indicated condition in your where clause. See the examples section of this page: http://sqldf.googlecode.com On Sat, Mar 20, 2010 at 10:20 AM, James Rome jamesr...@gmail.com wrote: I have two data frames of flight data, but they have very different numbers of rows. They come from different sources, so the data are not identical. names(oooi) [1] FltOrigDt MkdCrrCd [3] MkdFltNbr DprtTrpnStnCd [5] ArrTrpnStnCd ActualOutLocalTimestamp [7] ActualOffLocal ActualOnLocal [9] ActualInLocal ArrivalGate [11] DepartureGate Flight [13] OnDate MinutesIntoDay [15] OnHour pt names(runway) [1] OnDateTime IATA ICAO Flight [5] AircraftType Tail Arrived STA [9] Runway From.To Delay OnDate [13] MinutesIntoDay pt These sets have several hundred thousand rows. In both sets, pt is a POSIXct for the arrival time (from different sources). They are not identical, but surely should be within an hour of each other (hopefully a lot less), and the Flight fields must be the same. So (abs(runway$pt - oooi$pt) 3600) (runway$Flight == oooi$Flight) should pick out the corresponding rows in the two data sets (if there is a match). What I need to do is to take the Runway from runway and insert it into the oooi df for the correct flight. What is the best way to do this in R? Thanks, Jim Rome __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to select a row from one dataframe that is close to a row in another dataframe
On 3/20/2010 11:52 AM, Daniel Malter wrote: If the flight identifiers runway$Flight and oooi$Flight are unique (i.e. only one observation has the same identifier in each dataset), you could use merge() to bind together the dataset based on matching the two. See, ?merge Also, I see an OnDate variable in both dataset. So if Flight does not provide unique identification, maybe Flight and OnDate together do, which can also be handled in merge. Let us know if that solves the problem. Best, Daniel --- Alas, the flight names are not unique (they fly each day). You would think that the OnDate would be the same, but flights arriving at midnight could appear on different days, which is why I am using seconds past 1/1/1970. Will merge work with different length dataframes? Perhaps I could do it in multiple steps, assuming that the dates were the same, and then fixing the errors? And I found out that abs() will not take difftime as an argument. I hope I can multiply a difftime by itself and check that way. And to use sqldf, it looks as if I have to read the source data files directly into sqldf to use it. It has to make a database. In that case, wouldn't I be better doing the whole thing in a database? Jim names(oooi) [1] FltOrigDt MkdCrrCd [3] MkdFltNbr DprtTrpnStnCd [5] ArrTrpnStnCdActualOutLocalTimestamp [7] ActualOffLocal ActualOnLocal [9] ActualInLocal ArrivalGate [11] DepartureGate Flight [13] OnDate MinutesIntoDay [15] OnHour pt names(runway) [1] OnDateTime IATA ICAO Flight [5] AircraftType Tail ArrivedSTA [9] Runway From.ToDelay OnDate [13] MinutesIntoDay pt These sets have several hundred thousand rows. In both sets, pt is a POSIXct for the arrival time (from different sources). They are not identical, but surely should be within an hour of each other (hopefully a lot less), and the Flight fields must be the same. So (abs(runway$pt - oooi$pt) 3600) (runway$Flight == oooi$Flight) should pick out the corresponding rows in the two data sets (if there is a match). What I need to do is to take the Runway from runway and insert it into the oooi df for the correct flight. What is the best way to do this in R? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to select a row from one dataframe that is close to a row in another dataframe
On Sat, Mar 20, 2010 at 1:39 PM, James Rome jamesr...@gmail.com wrote: And to use sqldf, it looks as if I have to read the source data files directly into sqldf to use it. No. The whole idea of sqldf is that it operates directly on data frames. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.