[R] How to select a row from one dataframe that is close to a row in another dataframe

2010-03-20 Thread James Rome
I have two data frames of flight data,  but they have very different
numbers of rows. They come from different sources, so the data are not
identical.

> names(oooi)
 [1] FltOrigDt               MkdCrrCd
 [3] MkdFltNbr               DprtTrpnStnCd
 [5] ArrTrpnStnCd            ActualOutLocalTimestamp
 [7] ActualOffLocal          ActualOnLocal
 [9] ActualInLocal           ArrivalGate
[11] DepartureGate           Flight
[13] OnDate                  MinutesIntoDay
[15] OnHour                  pt

> names(runway)
 [1] OnDateTime     IATA           ICAO           Flight
 [5] AircraftType   Tail           Arrived        STA
 [9] Runway         From.To        Delay          OnDate
[13] MinutesIntoDay pt

These sets have several hundred thousand rows.

In both sets, pt is a POSIXct for the arrival time (from different
sources). They are not identical, but surely should be within an hour of
each other (hopefully a lot less), and the Flight fields must be the
same. So
(abs(runway$pt - oooi$pt) < 3600) & (runway$Flight == oooi$Flight)
should pick out the corresponding rows in the two data sets (if there is
a match).

What I need to do is to take the Runway from runway and insert it into
the oooi df for the correct flight.

What is the best way to do this in R?

Thanks,
Jim Rome

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to select a row from one dataframe that is close to a row in another dataframe

2010-03-20 Thread Daniel Malter
If the flight identifiers runway$Flight and oooi$Flight are unique (i.e.
only one observation has the same identifier in each dataset), you could use
merge() to bind together the two datasets based on matching the two. See

?merge

Also, I see an OnDate variable in both datasets. So if Flight does not
provide unique identification, maybe Flight and OnDate together do, which
can also be handled in merge.
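
A minimal sketch of how the merge() idea might be combined with the one-hour window from the original question. The toy data, the helper column `row`, and the `gap` column are illustrative assumptions, not part of the real data sets; only Flight, pt and Runway come from the thread. The sketch merges on Flight alone, then keeps the pair with the smallest time gap for each oooi row.

```r
# Toy stand-ins for the two data frames (made-up values, only the columns used).
oooi <- data.frame(
  Flight = c("AA100", "AA100", "DL200", "UA300"),
  pt = as.POSIXct(c("2010-03-01 10:05:00", "2010-03-02 10:10:00",
                    "2010-03-01 23:58:00", "2010-03-01 15:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
runway <- data.frame(
  Flight = c("AA100", "AA100", "DL200"),
  Runway = c("27L", "09R", "26R"),
  pt = as.POSIXct(c("2010-03-01 10:01:00", "2010-03-02 10:12:00",
                    "2010-03-02 00:03:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

oooi$row <- seq_len(nrow(oooi))   # hypothetical helper: remembers each oooi row

# All pairs of rows that share a Flight (a many-to-many merge).
pairs <- merge(oooi[, c("row", "Flight", "pt")],
               runway[, c("Flight", "Runway", "pt")],
               by = "Flight", suffixes = c(".oooi", ".runway"))

# Time gap in seconds; as.numeric() on POSIXct gives seconds since 1970-01-01.
pairs$gap <- abs(as.numeric(pairs$pt.oooi) - as.numeric(pairs$pt.runway))

# Keep pairs within one hour, then only the closest pair for each oooi row.
pairs <- pairs[pairs$gap < 3600, ]
pairs <- pairs[order(pairs$row, pairs$gap), ]
pairs <- pairs[!duplicated(pairs$row), ]

# Copy Runway into oooi; rows without a match stay NA.
oooi$Runway <- pairs$Runway[match(oooi$row, pairs$row)]
print(oooi)
```

Merging on Flight alone pairs every arrival of a flight with every other arrival of the same flight, so on several hundred thousand rows the intermediate table could get large before the filter is applied.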

Let us know if that solves the problem.

Best,
Daniel 

-
cuncta stricte discussurus
-


Re: [R] How to select a row from one dataframe that is close to a row in another dataframe

2010-03-20 Thread jim holtman
sqldf package:

> n <- 200
> x <- data.frame(id=sample(1:10, n, TRUE), timex=runif(n))
> y <- data.frame(id=sample(1:15, n, TRUE), timey=runif(n), runway=seq(n))
> require(sqldf)
> sqldf("select x.id, timex, y.id, timey, y.runway from x join y
+ where x.id = y.id and abs(timex - timey) < 0.005")
   id      timex id      timey runway
1   7 0.39110674  7 0.38859025     61
2   7 0.58829365  7 0.58581915    133
3   9 0.34610180  9 0.34866404     65
4   2 0.72069416  2 0.72080087      7
5   3 0.75491843  3 0.75640271     37
6   5 0.86930840  5 0.87373981     15
7   3 0.07181249  3 0.06935293    108
8   6 0.55273483  6 0.55286014    101
9  10 0.50377737 10 0.50563669    139
10  1 0.17325424  1 0.17129662     32
11  3 0.09013624  3 0.09281639    112
12  8 0.04805349  8 0.04618661    196
13  2 0.19428938  2 0.19260035     50
14 10 0.16194595 10 0.16565594     44
15  7 0.51193601  7 0.51352435     14
16  3 0.02331951  3 0.02119733     64
17  9 0.69456540  9 0.69376281     39
18  2 0.20070366  2 0.20432466    134
19 10 0.50438411 10 0.50563669    139
20  4 0.25271897  4 0.25211036     72
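
Note that rows 9 and 19 of the output both pair with runway 139, so a join with a tolerance can return more than one pairing for the same record. Where exactly one Runway per oooi row is wanted, a nearest-match lookup may be easier to reason about. The following is only a sketch in base R under stated assumptions: the function name `nearest_runway`, its `max_gap` argument and the toy data are hypothetical, and pt is assumed to have no missing values.

```r
# Hypothetical helper: for each oooi row, return the Runway of the runway row
# with the same Flight and the closest pt, or NA if none is within max_gap seconds.
nearest_runway <- function(oooi, runway, max_gap = 3600) {
  # Group the runway rows by flight once, so each lookup only sees
  # candidates with the same Flight.
  rw_by_flight <- split(runway, as.character(runway$Flight))
  vapply(seq_len(nrow(oooi)), function(i) {
    cand <- rw_by_flight[[as.character(oooi$Flight[i])]]
    if (is.null(cand) || nrow(cand) == 0) return(NA_character_)
    gap <- abs(as.numeric(cand$pt) - as.numeric(oooi$pt[i]))  # seconds
    j <- which.min(gap)
    if (gap[j] < max_gap) as.character(cand$Runway[j]) else NA_character_
  }, character(1))
}

# Made-up example data.
oooi <- data.frame(
  Flight = c("AA100", "AA100", "UA300"),
  pt = as.POSIXct(c("2010-03-01 10:05:00", "2010-03-02 10:10:00",
                    "2010-03-01 15:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
runway <- data.frame(
  Flight = c("AA100", "AA100", "AA100"),
  Runway = c("27L", "27R", "09R"),
  pt = as.POSIXct(c("2010-03-01 10:01:00", "2010-03-01 10:30:00",
                    "2010-03-02 10:12:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

oooi$Runway <- nearest_runway(oooi, runway)
print(oooi)
```

The first oooi row has two candidates inside the hour (27L and 27R) and gets the closer one; the UA300 row has no candidate and stays NA.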




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] How to select a row from one dataframe that is close to a row in another dataframe

2010-03-20 Thread Gabor Grothendieck
Using the sqldf package you could do an SQL join with the indicated
condition in your where clause.  See the examples section of this
page: http://sqldf.googlecode.com



Re: [R] How to select a row from one dataframe that is close to a row in another dataframe

2010-03-20 Thread James Rome
On 3/20/2010 11:52 AM, Daniel Malter wrote:

> If the flight identifiers runway$Flight and oooi$Flight are unique (i.e.
> only one observation has the same identifier in each dataset), you could use
> merge() to bind together the two datasets based on matching the two. See
>
> ?merge
>
> Also, I see an OnDate variable in both datasets. So if Flight does not
> provide unique identification, maybe Flight and OnDate together do, which
> can also be handled in merge.
>
> Let us know if that solves the problem.
>
> Best,
> Daniel

Alas, the flight names are not unique (they fly each day). You would think that 
the OnDate would be the same, but flights arriving at midnight could appear on 
different days, which is why I am using seconds past 1/1/1970.

Will merge work with different length dataframes? Perhaps I could do it in 
multiple steps, assuming that the dates were the same, and then fixing the 
errors?
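
For what it is worth, merge() does not need the two data frames to have the same number of rows. A small sketch with made-up values (OnDate is assumed here to be a Date column, which the thread does not confirm) shows both that and the midnight problem described above:

```r
# Made-up data: oooi has three rows, runway has two.
oooi <- data.frame(
  Flight = c("AA100", "DL200", "UA300"),
  OnDate = as.Date(c("2010-03-01", "2010-03-01", "2010-03-01")),
  stringsAsFactors = FALSE
)
runway <- data.frame(
  Flight = c("AA100", "DL200"),
  # DL200 is recorded just after midnight in this source.
  OnDate = as.Date(c("2010-03-01", "2010-03-02")),
  Runway = c("27L", "26R"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every oooi row and fills Runway with NA when no
# runway row has the same Flight and OnDate.
merged <- merge(oooi, runway, by = c("Flight", "OnDate"), all.x = TRUE)
print(merged)
```

AA100 picks up its runway, UA300 has no runway record at all, and DL200 is missed only because the two sources disagree on the date. Those NA rows are the ones a second, time-based pass would have to fix.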

And I found out that abs() will not take difftime as an argument. I hope I can 
multiply a difftime by itself and check that way.
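
One way around the difftime issue, sketched with made-up timestamps, is to convert the difference to plain seconds before comparing. Asking difftime() for units = "secs" explicitly also matters because subtracting two POSIXct values picks the units automatically (minutes in this example), so a bare comparison against 3600 would not mean "one hour".

```r
a <- as.POSIXct("2010-03-01 23:58:00", tz = "UTC")
b <- as.POSIXct("2010-03-02 00:03:00", tz = "UTC")

# a - b is a difftime whose units are chosen automatically.
auto_units <- units(a - b)

# Explicit units, then as.numeric(), gives an ordinary number of seconds
# that abs() and < can be applied to.
gap_secs <- abs(as.numeric(difftime(a, b, units = "secs")))
within_hour <- gap_secs < 3600

print(auto_units)
print(gap_secs)
print(within_hour)
```

The same expression works on whole columns, for example on two POSIXct vectors of equal length.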

And to use sqldf, it looks as if I have to read the source data files directly 
into sqldf to use it. It has to make a database. In that case, wouldn't I be 
better doing the whole thing in a database?

Jim



Re: [R] How to select a row from one dataframe that is close to a row in another dataframe

2010-03-20 Thread Gabor Grothendieck
On Sat, Mar 20, 2010 at 1:39 PM, James Rome jamesr...@gmail.com wrote:
> And to use sqldf, it looks as if I have to read the source data files
> directly into sqldf to use it.

No. The whole idea of sqldf is that it operates directly on data frames.
