Re: [R] Comparing dates in dataframes

2010-01-18 Thread James Rome
I finally figured this out with the help of Dave and Dennis.

The steps I had to do were:
1. Convert the dates with as.Date() (they had been read in as factors)
2. Create a column in each frame that was paste(date, quarter)
3. Then %in% worked instantly.

# Parse the m/d/y strings (read in as factors) into Date objects
arr$Date <- as.Date(as.character(arr$Date), format = "%m/%d/%y")
weather$Date <- as.Date(as.character(weather$Date), format = "%m/%d/%y")
# Build one combined date + quarter key in each frame
weather$dq <- paste(weather$Date, weather$quarter)
arr$dq <- paste(arr$Date, arr$quarter)
# gw = 1 if the arrival's date + quarter appears in weather, 0 otherwise
arr$gw <- as.numeric(arr$dq %in% weather$dq)
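
For anyone following along, the steps above can be tried on a small made-up example. This is only a sketch: the toy rows are invented for illustration (they are not my real data), and stringsAsFactors = TRUE is set just to mimic dates that were read in as factors.

```r
# Toy data, invented for illustration; dates arrive as m/d/y factors
weather <- data.frame(Date = c("1/1/09", "1/1/09", "1/2/09"),
                      quarter = c(60L, 61L, 60L),
                      stringsAsFactors = TRUE)
arr <- data.frame(Date = c("1/1/09", "1/1/09", "1/1/09", "1/2/09", "1/2/09"),
                  quarter = c(59L, 60L, 61L, 60L, 61L),
                  stringsAsFactors = TRUE)

# Step 1: factor -> character -> Date
arr$Date <- as.Date(as.character(arr$Date), format = "%m/%d/%y")
weather$Date <- as.Date(as.character(weather$Date), format = "%m/%d/%y")

# Step 2: one combined key per row, e.g. "2009-01-01 60"
weather$dq <- paste(weather$Date, weather$quarter)
arr$dq <- paste(arr$Date, arr$quarter)

# Step 3: an arrival is in good weather if its key occurs in weather
arr$gw <- as.numeric(arr$dq %in% weather$dq)
print(arr[, c("Date", "quarter", "gw")])
```

Only the arrivals whose date and quarter both occur together in weather get gw = 1; quarter 61 on the second day stays 0 even though 61 is a good quarter on the first day.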

Thanks for all the help,
Jim

On 1/17/10 10:16 PM, David Winsemius wrote:
>
> On Jan 17, 2010, at 9:22 PM, Dennis Murphy wrote:
>
>> I thought it was a solution :)
>>
>> The gw column is an indicator that is meant to match quarters with
>> good weather (in the weather data frame) to the quarters in the arr
>> data frame: 1 = good weather, 0 = not. I split up both data frames by
>> Date since the quarter values are in the same range from day to day,
>> and then ran a loop to generate the indicator gw by finding whether
>> the quarter in the i-th component of the arrl list matched the
>> corresponding reference table of quarters (of good weather) in the
>> wc list. do.call then slurps it all together into a data frame.
>> My understanding was that James just wanted to be able to distinguish
>> arrivals during good weather quarters from those in bad weather
>> quarters.
>
> This was my understanding, too. I was just trying to get to some point
> where that could be expressed in an R-encodable format and implemented.
>
>
>>
>> I did the tables as a sanity check, relative to the hand calculations
>> I did earlier as a ballpark
>> estimate.
>
> The rather small dataset may have been enough. We will need to see
> what the OP says  ...  is that a sufficient solution?
>
>
>>
>> Does this make sense?
>>
>> Dennis
>>
>> On Sun, Jan 17, 2010 at 6:04 PM, David Winsemius
>> <dwinsem...@comcast.net> wrote:
>>
>> I'm not clear about your last message. Do you have a solution?
>>
>> -- 
>> David.
>>
>> On Jan 17, 2010, at 8:27 PM, Dennis Murphy wrote:
>>
>>> Hi James and David:
>>>
>>> I tried the following: split the quarters from weather into a
>>> list by Date, and ditto with arr.
>>> Then assign value of gw by running a (sleazy) loop over the
>>> components of the arr list...
>>>
>>> wc <- split(weather$quarter, weather$Date)
>>> arrl <- split(arr, arr$Date)
>>>
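>>> # Note that there are four dates in wc and three in arrl, so look up
>>> # each day's good quarters by date name rather than by list position
>>> for(i in seq_along(arrl)) {
>>>   day <- names(arrl)[i]
>>>   arrl[[i]]$gw <- as.numeric(arrl[[i]]$quarter %in% wc[[day]])  }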
>>> arr2 <- do.call(rbind, arrl)
>>> dim(arr2)
>>> [1] 1126    9
>>> table(arr2$gw)
>>>
>>>   0   1
>>> 661 465
>>> with(arr2, table(Date, gw))
>>>             gw
>>> Date           0   1
>>>   2009-01-01 368  99
>>>   2009-01-02 266 348
>>>   2009-01-03  27  18
>>>
>>> OK, I was a bit off, but at least we know this is in the
>>> ballpark of my estimates  :)
>>> I'm sure David will come up with something more elegant, but
>>> this seems to work.
>>>
>>> HTH,
>>> Dennis
>>>
>>>
>>> On Sun, Jan 17, 2010 at 5:17 PM, James Rome wrote:
>>>
>>> Any entry in the weather data is a good day. That is the point. And
>>> please ignore my mistake about the quarters getting too large in
>>> weather. I am being swamped with versions, and it does not matter
>>> for this purpose. So the bad weather days are not in the weather
>>> data set.
>>>
>>> I am trying to get gw=1 in arr if the date and quarter are in
>>> weather.
>>>
>>> Thanks,
>>> Jim
>>>
>>> On 1/17/10 7:46 PM, David Winsemius wrote:
>>> > But, but, but there is no weather goodness variable in weather?!?!?!
>>> >
>>> > > str(weather)
>>> > 'data.frame':   155 obs. of  4 variables:
>>> >  $ Date   :Class 'Date'  num [1:155] 14245 14245 14245 14245 14245 ...
>>> >  $ minute : int  5 15 30 45 0 15 30 45 0 15 ...
>>> >  $ hour   : int  15 15 15 15 17 17 17 17 18 18 ...
>>> >  $ quarter: int  65 75 90 105 68 83 98 113 72 87 ...
>>> >
>>> > I thought you said the "weather" dataframe would have some information
>>> > about "goodness" that we were supposed to map to arrivals.? What is
>>> > the meaning of those variables? How do we define a "good" quarter
>>> > hour? And why are the values of quarter not 1, 2, 3, 4? They ought to
>>> > be a factor or integer that could be matched to those that are in
>>> > "arr", which are also apparently not so de

Re: [R] Comparing dates in dataframes

2010-01-17 Thread James Rome
Any entry in the weather data is a good day. That is the point. And
please ignore my mistake about the quarters getting too large in
weather. I am being swamped with versions, and it does not matter for
this purpose. So the bad weather days are not in the weather data set.

I am trying to get gw=1 in arr if the date and quarter are in weather.

Thanks,
Jim

On 1/17/10 7:46 PM, David Winsemius wrote:
> But, but, but  there is no weather goodness variable in weather?!?!?!
>
> > str(weather)
> 'data.frame':155 obs. of  4 variables:
>  $ Date   :Class 'Date'  num [1:155] 14245 14245 14245 14245 14245 ...
>  $ minute : int  5 15 30 45 0 15 30 45 0 15 ...
>  $ hour   : int  15 15 15 15 17 17 17 17 18 18 ...
>  $ quarter: int  65 75 90 105 68 83 98 113 72 87 ..
>
> I thought you said the "weather" dataframe would have some information
> about "goodness" that we were supposed to map to arrivals.? What is
> the meaning of those variables? How do we define a "good" quarter
> hour? And why are the values of quarter not 1, 2, 3, 4? They ought to
> be a factor or integer that could be matched to those that are in
> "arr", which are also apparently not so defined. Let's see a better
> codebook or description of these variables.
>
> On Jan 17, 2010, at 6:47 PM, James Rome wrote:
>
>> Here are some sample data sets.
>>
>> I also tried making a combined field in each set such as
>> adq=paste(as.character(arr$Date), as.character(arr$quarter))
>> and similarly for the weather set, so I have unique single things to
>> compare, but that did not seem to help much.
>>
>> Thanks,
>> Jim
>>
>> On 1/17/10 5:50 PM, David Winsemius wrote:
>>> My guess (since we still have no data on which to test these ideas)
>>> is that you need either to merge() or to use a matrix created from the
>>> dates and qtr-hours entries in "gw", since matching on dates and hours
>>> separately will not uniquely classify the good qtr-hours within their
>>> proper corresponding dates. You want a structure (or a matching
>>> process) that takes:
>>>          qhr1   qhr2   qhr3   qhr4 ...
>>> date1    good   bad    good   bad
>>> date2    bad    good   good   good
>>> date3    bad    bad    bad    good
>>> .
>>> .
>>> .
>>> and lets you use the values in "arr" to get values in "gw". Notice
>>> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>>> simply will not accomplish anything correct.
>>>
>>> Merging by multiple criteria (with the merge function) would do that
>>> or you could construct a matrix whose entries were the categories good
>>> /bad. The table function could create the matrix for the purpose of
>>> using an indexed solution if you are dead-set against the merge
>>> concept.
>>>
>>>
>>>
>>>
>>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>>
>>>> Thank you Dennis.
>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>>> weather$quarter)
>>>> seems to be what I want to do, but in fact, with the full data set, it
>>>> misidentifies the rows, so I think the error message must mean
>>>> something.
>>>>
>>>>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>>>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>>>> gw = c(length(arrr))
>>>>> gw[1:length(arrr[,1])]=FALSE
>>>>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>>> Warning in `==.default`(arr$Date, weather$Date) :
>>>>  longer object length is not a multiple of shorter object length
>>>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>>>  longer object length is not a multiple of shorter object length
>>>>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>>  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>>  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [260] 0 0 0 0 0 0 0 0
>>>>
>>>> There are many many more matches in the 99k line arrival data set.
>>>>
>>>> Thanks a bunch,
>>>> Jim
>>>>
>>>>
>>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>>>> Hi:
>>>>>
>>>>> To read a data set from a R-help message into R, one uses
>>>>> read.table(textConnection(""), ...)
>>>>>
>>>>> Your weather data set had
>>>>> (a) a variable name with a space in it, that R misread and had to be
>>>>> altered manually;
>>>>> (b) a missing value with no NA that R interpreted as an incomplete
>>>>> line; again, it had
>>>>> to be altered manually.
>>>>>
>>>>> This is why David suggested the use of dput(), so that these vagaries
>>>>> don't have to be
>>>>> dealt with by those who are trying t

Re: [R] Comparing dates in dataframes

2010-01-17 Thread David Winsemius
But, but, but there is no weather goodness variable in weather?!?!?!


> str(weather)
'data.frame':   155 obs. of  4 variables:
 $ Date   :Class 'Date'  num [1:155] 14245 14245 14245 14245 14245 ...
 $ minute : int  5 15 30 45 0 15 30 45 0 15 ...
 $ hour   : int  15 15 15 15 17 17 17 17 18 18 ...
 $ quarter: int  65 75 90 105 68 83 98 113 72 87 ..

I thought you said the "weather" dataframe would have some information  
about "goodness" that we were supposed to map to arrivals.? What is  
the meaning of those variables? How do we define a "good" quarter  
hour? And why are the values of quarter not 1, 2, 3, 4? They ought to  
be a factor or integer that could be matched to those that are in  
"arr", which are also apparently not so defined. Let's see a better  
codebook or description of these variables.
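
One possible reading of the quarter values, offered as an assumption and not as a confirmed codebook: the four-row dput(weather) sample quoted in this thread has hour 15, minutes 5, 15, 30, 45 and quarter = 60:63, which is consistent with a quarter-hour-of-day index, hour*4 + minute %/% 15. The values in the str() output above (65 75 90 105 ...) match hour*4 + minute instead, which would make them too large. The helper name quarter_of_day is hypothetical.

```r
# Assumed definition: index of the quarter hour within the day (0..95)
quarter_of_day <- function(hour, minute) {
  hour * 4L + minute %/% 15L   # integer division drops the minutes within a quarter
}

hour   <- c(15L, 15L, 15L, 15L)
minute <- c(5L, 15L, 30L, 45L)

good_index <- quarter_of_day(hour, minute)  # matches the dput sample: 60 61 62 63
too_large  <- hour * 4L + minute            # matches the str() output: 65 75 90 105

print(good_index)
print(too_large)
```

If both data frames used the same definition, the quarter columns could be compared directly once the dates also agree.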


On Jan 17, 2010, at 6:47 PM, James Rome wrote:


> Here are some sample data sets.
>
> I also tried making a combined field in each set such as
> adq=paste(as.character(arr$Date), as.character(arr$quarter))
> and similarly for the weather set, so I have unique single things to
> compare, but that did not seem to help much.
>
> Thanks,
> Jim
>
> On 1/17/10 5:50 PM, David Winsemius wrote:
>> My guess (since we still have no data on which to test these ideas)
>> is that you need either to merge() or to use a matrix created from the
>> dates and qtr-hours entries in "gw", since matching on dates and hours
>> separately will not uniquely classify the good qtr-hours within their
>> proper corresponding dates. You want a structure (or a matching
>> process) that takes:
>>
>>          qhr1   qhr2   qhr3   qhr4 ...
>> date1    good   bad    good   bad
>> date2    bad    good   good   good
>> date3    bad    bad    bad    good
>> .
>> .
>> .
>> and lets you use the values in "arr" to get values in "gw". Notice
>> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>> simply will not accomplish anything correct.
>>
>> Merging by multiple criteria (with the merge function) would do that
>> or you could construct a matrix whose entries were the categories
>> good/bad. The table function could create the matrix for the purpose of
>> using an indexed solution if you are dead-set against the merge concept.





>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>
>>> Thank you Dennis.
>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>> weather$quarter)
>>> seems to be what I want to do, but in fact, with the full data set, it
>>> misidentifies the rows, so I think the error message must mean
>>> something.
>>>
>>>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>>> gw = c(length(arrr))
>>>> gw[1:length(arrr[,1])]=FALSE
>>>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>> Warning in `==.default`(arr$Date, weather$Date) :
>>>  longer object length is not a multiple of shorter object length
>>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>>  longer object length is not a multiple of shorter object length
>>>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> [260] 0 0 0 0 0 0 0 0
>>>
>>> There are many many more matches in the 99k line arrival data set.
>>>
>>> Thanks a bunch,
>>> Jim
>>>
>>>
>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:

>>>> Hi:
>>>>
>>>> To read a data set from a R-help message into R, one uses
>>>> read.table(textConnection(""), ...)
>>>>
>>>> Your weather data set had
>>>> (a) a variable name with a space in it, that R misread and had to be
>>>> altered manually;
>>>> (b) a missing value with no NA that R interpreted as an incomplete
>>>> line; again, it had to be altered manually.
>>>>
>>>> This is why David suggested the use of dput(), so that these vagaries
>>>> don't have to be dealt with by those who are trying to help.
>>>>
>>>> That being said, for the example that you gave and the desired value
>>>> that you wanted, try
>>>>
>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>>> weather$quarter)
>>>>
>>>> (I changed DateTime to Date in the arr data frame...)
>>>>
>>>> You'll get warnings like
>>>>
>>>> Warning messages:
>>>> 1: In is.na(e1) | is.na(e2) :
>>>>  longer object length is not a multiple of shorter object length
>>>>
>>>> but it seems to do the right thing. The first equality is there to
>>>> constrain matches for quarter to be within the same day.
>>>>
>>>> For future reference,
>>>>
>>>>> dput(weather)
>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>> class = "factor"),
>>>>    minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>>>    ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>>> c("Date",
>>>> "minute", 
Re: [R] Comparing dates in dataframes

2010-01-17 Thread James Rome
Here are some sample data sets.

I also tried making a combined field in each set such as
adq=paste(as.character(arr$Date), as.character(arr$quarter))
and similarly for the weather set, so I have unique single things to
compare, but that did not seem to help much.

Thanks,
Jim

On 1/17/10 5:50 PM, David Winsemius wrote:
> My guess (since we still have no data on which to test these ideas) 
> is that you need either to merge() or to use a matrix created from the
> dates and qtr-hours entries in "gw", since matching on dates and hours
> separately will not uniquely classify the good qtr-hours within their
> proper corresponding dates. You want a structure (or a matching
> process) that takes:
>          qhr1   qhr2   qhr3   qhr4 ...
> date1    good   bad    good   bad
> date2    bad    good   good   good
> date3    bad    bad    bad    good
> .
> .
> .
> and lets you use the values in "arr" to get values in "gw". Notice
> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
> simply will not accomplish anything correct.
>
> Merging by multiple criteria (with the merge function) would do that
> or you could construct a matrix whose entries were the categories good
> /bad. The table function could create the matrix for the purpose of
> using an indexed solution if you are dead-set against the merge concept.
>
>
>
>
> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>
>> Thank you Dennis.
>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>> weather$quarter)
>> seems to be what I want to do, but in fact, with the full data set, it
>> misidentifies the rows, so I think the error message must mean
>> something.
>>
>>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>> gw = c(length(arrr))
>>> gw[1:length(arrr[,1])]=FALSE
>>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>> Warning in `==.default`(arr$Date, weather$Date) :
>>  longer object length is not a multiple of shorter object length
>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>  longer object length is not a multiple of shorter object length
>>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [260] 0 0 0 0 0 0 0 0
>>
>> There are many many more matches in the 99k line arrival data set.
>>
>> Thanks a bunch,
>> Jim
>>
>>
>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>> Hi:
>>>
>>> To read a data set from a R-help message into R, one uses
>>> read.table(textConnection(""), ...)
>>>
>>> Your weather data set had
>>> (a) a variable name with a space in it, that R misread and had to be
>>> altered manually;
>>> (b) a missing value with no NA that R interpreted as an incomplete
>>> line; again, it had
>>> to be altered manually.
>>>
>>> This is why David suggested the use of dput(), so that these vagaries
>>> don't have to be
>>> dealt with by those who are trying to help.
>>>
>>> That being said, for the example that you gave and the desired value
>>> that you wanted, try
>>>
>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>> weather$quarter)
>>>
>>> (I changed DateTime to Date in the arr data frame...)
>>>
>>> You'll get warnings like
>>>
>>> Warning messages:
>>> 1: In is.na (e1) | is.na (e2) :
>>>  longer object length is not a multiple of shorter object length
>>>
>>> but it seems to do the right thing. The first equality is there to
>>> constrain matches for
>>> quarter to be within the same day.
>>>
>>> For future reference,
>>>
>>>> dput(weather)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>>    minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>>    ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>> c("Date",
>>> "minute", "hour", "quarter", "efficiency"), class = "data.frame",
>>> row.names = c(NA,
>>> -4L))
>>>> dput(arr)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>>weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>>5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>>1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>>quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>>60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
>>> str

Re: [R] Comparing dates in dataframes

2010-01-17 Thread David Winsemius
My guess (since we still have no data on which to test these ideas)   
is that you need either to merge() or to use a matrix created from the  
dates and qtr-hours entries in "gw", since matching on dates and hours  
separately will not uniquely classify the good qtr-hours within their  
proper corresponding dates. You want a structure (or a matching  
process) that takes:

         qhr1   qhr2   qhr3   qhr4 ...
date1    good   bad    good   bad
date2    bad    good   good   good
date3    bad    bad    bad    good
.
.
.
and lets you use the values in "arr" to get values in "gw". Notice
that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
simply will not accomplish anything correct.


Merging by multiple criteria (with the merge function) would do that  
or you could construct a matrix whose entries were the categories  
good /bad. The table function could create the matrix for the purpose  
of using an indexed solution if you are dead-set against the merge  
concept.
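
A minimal sketch of the merge() route on invented toy data (the rows and the Flight labels are made up for illustration, not taken from the real files). It assumes every row of weather marks a good (Date, quarter) pair, so arrivals with no match are treated as bad weather.

```r
# Toy data, invented for illustration
weather <- data.frame(Date = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
                      quarter = c(60L, 61L, 60L))
arr <- data.frame(Date = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01",
                                   "2009-01-02", "2009-01-02")),
                  quarter = c(59L, 60L, 60L, 60L, 61L),
                  Flight = c("F1", "F2", "F3", "F4", "F5"))

# One row per good (Date, quarter) pair, flagged with gw = 1
good <- unique(weather[, c("Date", "quarter")])
good$gw <- 1

# Left join: keep every arrival, matching on BOTH columns at once
arr2 <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)

# Arrivals with no matching pair come back as NA; treat them as bad weather
arr2$gw[is.na(arr2$gw)] <- 0

# merge() may reorder rows, so sort back into a known order
arr2 <- arr2[order(arr2$Flight), ]
print(arr2)
```

Because good has at most one row per (Date, quarter) pair, the many arrivals that share a quarter hour each pick up the same flag and the row count of arr is preserved.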





On Jan 17, 2010, at 4:47 PM, James Rome wrote:


> Thank you Dennis.
> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
> weather$quarter)
> seems to be what I want to do, but in fact, with the full data set, it
> misidentifies the rows, so I think the error message must mean
> something.
>
>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>> gw = c(length(arrr))
>> gw[1:length(arrr[,1])]=FALSE
>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
> Warning in `==.default`(arr$Date, weather$Date) :
>  longer object length is not a multiple of shorter object length
> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>  longer object length is not a multiple of shorter object length
>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [260] 0 0 0 0 0 0 0 0
>
> There are many many more matches in the 99k line arrival data set.
>
> Thanks a bunch,
> Jim

> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>
>> Hi:
>>
>> To read a data set from a R-help message into R, one uses
>> read.table(textConnection(""), ...)
>>
>> Your weather data set had
>> (a) a variable name with a space in it, that R misread and had to be
>> altered manually;
>> (b) a missing value with no NA that R interpreted as an incomplete
>> line; again, it had to be altered manually.
>>
>> This is why David suggested the use of dput(), so that these vagaries
>> don't have to be dealt with by those who are trying to help.
>>
>> That being said, for the example that you gave and the desired value
>> that you wanted, try
>>
>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>> weather$quarter)
>>
>> (I changed DateTime to Date in the arr data frame...)
>>
>> You'll get warnings like
>>
>> Warning messages:
>> 1: In is.na(e1) | is.na(e2) :
>>  longer object length is not a multiple of shorter object length
>>
>> but it seems to do the right thing. The first equality is there to
>> constrain matches for quarter to be within the same day.
>>
>> For future reference,


dput(weather)

structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
class = "factor"),
   minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
   ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
c("Date",
"minute", "hour", "quarter", "efficiency"), class = "data.frame",
row.names = c(NA,
-4L))

dput(arr)

structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
class = "factor"),
   weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
   5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
   quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
   60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
structure(c(6L,
   8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
   2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
   "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
   19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
   14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
   "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
   "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
   "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
   gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
   TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE

Re: [R] Comparing dates in dataframes

2010-01-17 Thread James Rome
Thank you Dennis.
arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
weather$quarter)
seems to be what I want to do, but in fact, with the full data set, it
misidentifies the rows, so I think the error message must mean something.

> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
> gw = c(length(arrr))
> gw[1:length(arrr[,1])]=FALSE
> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
Warning in `==.default`(arr$Date, weather$Date) :
  longer object length is not a multiple of shorter object length
Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
  longer object length is not a multiple of shorter object length
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
 [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
 [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
[186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
[223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
[260] 0 0 0 0 0 0 0 0

There are many many more matches in the 99k line arrival data set.
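
The warning is probably the key: == compares two vectors position by position and recycles the shorter one, so it does not ask whether a date occurs anywhere in the other frame. A small sketch with invented dates (not the real data) shows the difference from %in%:

```r
# Invented dates: five arrivals, two good-weather days
arr_date <- as.Date(c("2009-01-01", "2009-01-01", "2009-01-02",
                      "2009-01-02", "2009-01-03"))
wx_date  <- as.Date(c("2009-01-01", "2009-01-02"))

# Position-by-position: wx_date is recycled to length 5
# (01-01, 01-02, 01-01, 01-02, 01-01), which normally triggers the
# "longer object length is not a multiple..." warning
by_position <- suppressWarnings(arr_date == wx_date)

# Membership: is each arrival date found anywhere in wx_date?
by_membership <- arr_date %in% wx_date

print(by_position)
print(by_membership)
```

With == the second and third arrivals are marked FALSE only because of where they happen to sit in the vector, while %in% marks the first four TRUE. That seems consistent with the rows being misidentified on the full data set.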

Thanks a bunch,
Jim


On 1/17/10 3:21 PM, Dennis Murphy wrote:
> Hi:
>
> To read a data set from a R-help message into R, one uses
> read.table(textConnection(""), ...)
>
> Your weather data set had
> (a) a variable name with a space in it, that R misread and had to be
> altered manually;
> (b) a missing value with no NA that R interpreted as an incomplete
> line; again, it had
>  to be altered manually.
>
> This is why David suggested the use of dput(), so that these vagaries
> don't have to be
> dealt with by those who are trying to help.
>
> That being said, for the example that you gave and the desired value
> that you wanted, try
>
> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
> weather$quarter)
>
> (I changed DateTime to Date in the arr data frame...)
>
> You'll get warnings like
>
> Warning messages:
> 1: In is.na (e1) | is.na (e2) :
>   longer object length is not a multiple of shorter object length
>
> but it seems to do the right thing. The first equality is there to
> constrain matches for
> quarter to be within the same day.
>
> For future reference,
>
> > dput(weather)
> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
> class = "factor"),
> minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
> ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
> c("Date",
> "minute", "hour", "quarter", "efficiency"), class = "data.frame",
> row.names = c(NA,
> -4L))
> > dput(arr)
> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
> class = "factor"),
> weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
> 5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
> quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
> 60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
> structure(c(6L,
> 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
> 2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
> "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
> 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
> 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
> "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
> "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
> "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
> gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
> FALSE)), .Names = c("Date", "weekday", "month", "quarter",
> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class = "data.frame")
>
> These can be copied and pasted directly into an R session without
> modification.
>
> HTH,
> Dennis
>
> On Sun, Jan 17, 2010 at 10:51 AM, James Rome wrote:
>
>
>
>
> On 1/17/10 1:06 PM, David Winsemius wrote:
> >
> > On Jan 17, 2010, at 12:37 PM, James Rome wrote:
> >
> >> I don't think it is that simple because it is not a one-to-one
> >> match. In the arr data frame, there are many arrivals in a quarter
> >> hour with good weather on a given day. So I need to match the date
> >> and the quarter hour.
> >>
> >> And all of the rows in the weather data frame are times with good
> >> weather--unique date + quarter hour. That is why I needed the
> >> loop. For
> >> each da

Re: [R] Comparing dates in dataframes

2010-01-17 Thread jim holtman
Since you provided no data, it is hard to determine how to compare.  If you
also want the date, then the following will work:

 arr$GoodWeather <- arr$quarter %in% gw$quarter & arr$date %in% gw$date

If your "quarter" is just the minutes of arrival, you may have to convert
that to the appropriate quarter hour, but similar approaches have worked
fine for me in these types of situations.
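
One caveat, shown on invented toy data: testing the quarter and the date separately can mark an arrival as good when that quarter was good on a different day. Comparing a combined key keeps each quarter tied to its own date. The data frames below are made up for illustration.

```r
# Invented data: quarter 60 is good on Jan 1, quarter 61 is good on Jan 2
gw <- data.frame(date = as.Date(c("2009-01-01", "2009-01-02")),
                 quarter = c(60L, 61L))
arr <- data.frame(date = as.Date(c("2009-01-01", "2009-01-01",
                                   "2009-01-02", "2009-01-02")),
                  quarter = c(60L, 61L, 60L, 61L))

# Separate tests: any good quarter on any good date passes
separate <- arr$quarter %in% gw$quarter & arr$date %in% gw$date

# Paired test: the (date, quarter) combination itself must be present
paired <- paste(arr$date, arr$quarter) %in% paste(gw$date, gw$quarter)

print(separate)
print(paired)
```

Here the separate tests flag all four arrivals, while the paired key flags only the first and last.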



On Sat, Jan 16, 2010 at 6:11 PM, jim holtman  wrote:

> If you have a vector of the quarter hours of good weather (gw), then to
> create the column in the arr dataframe you would do
>
> arr$GoodWeather <- arr$quarter %in% gw
>
> This says that if the quarter hour of the arrival is in the 'gw' vector,
> set the value TRUE; otherwise FALSE.
>
> On Sat, Jan 16, 2010 at 5:22 PM, James Rome  wrote:
>
>>   I don't want to merge the data frames because there are many entries
>> in the arrival frame for each one in the weather frame. And it is the
>> missing dates and quarters in the weather frame that constitute the data
>> I want, namely those arrivals that occurred in bad (or good) weather.
>>   But I will try converting the dates as suggested tomorrow.
>>   Is there a way to do what I want without that for loop? There are
>> almost 100,000 rows in the arrivals frame, and R is grinding to a halt.
>>   And is there a way to get R to abort its current calculation? Ctrl-C
>> and Esc do not seem to work.
>>
>> Thanks,
>> Jim
>>
>> On 1/16/10 4:26 PM, Stephan Kolassa wrote:
>> > Hi,
>> >
>> > it looks like when you read in your data.frames, you didn't tell R to
>> > expect dates, so it treats the Date columns as factors. Judicious use
>> > of  something along these lines before doing your comparisons may help:
>> >
>> > arr$Date <- as.Date(as.character(arr$Date),format=something)
>> >
>> > Then again, it may be possible to do the actual merging using merge().
>> >
>> > HTH
>> > Stephan
>> >
>> >
>> > James Rome schrieb:
>> >> I have two data frames. One (arr) has all arrivals to an airport for a
>> >> year, and the other (gw) has the dates and quarter hour of the day when
>> >> the weather is good. arr has a Date and quarter hour column.
>> >>
>> >>> names(arr)
>> >>  [1] "Date"         "weekday"      "hour"         "month"        "minute"
>> >>  [6] "quarter"      "ICAO"         "Flight"       "AircraftType" "Tail"
>> >> [11] "Arrived"      "STA"          "Runway"       "FromTo"       "Delay"
>> >> [16] "Operator"     "gw"
>> >> I added the gw column to arr and initialized it to all FALSE
>> >>
>> >>> names(gw)
>> >>  [1] "Date"           "minute"         "hour"           "quarter"
>> >>  [5] "Efficiency.Val" "Weekly.Avg"     "Arrival.Val"    "Weekly.Avg.1"
>> >>  [9] "Departure.Val"  "Weekly.Avg.2"   "Num.of.Hold"    "Runway"
>> >> [13] "Weather"
>> >> First point of confusion:
>> >>> gw[1,1]
>> >> [1] 1/1/09
>> >> 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ...
>> >> 9/9/09
>> >> Why do I get 353 levels?
>> >>
>> >> I am trying to identify the quarter hours with good weather in the arr
>> >> data frame. What I want to do is to go through the rows in gw, and to
>> >> set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw
>> >> row.
>> >>
>> >> So I tried
>> >> gooddates = function(all, good) {
>> >>la = length(all)   # All the flights
>> >>   lw = length(good)  # The good 15-minute periods
>> >>   for(j in 1:lw) {
>> >> d=good$Date[j]
>> >> q=good$quarter[j]
>> >> all[all$DateTime==d && all$quarter==q,17]=TRUE
>> >>   }
>> >> }
>> >>
>> >> but when I run this, I get
>> >> "Error in Ops.factor(all$DateTime, d) :
>> >>   level sets of factors are different"
>> >>
>> >> I know the level sets are different, that is what I am trying to find.
>> >> But I think I am comparing single elements from the data frames.
>> >>
>> >> So what am I doing wrong? And there ought to be a better way to do
>> this.
>> >>
>> >> Thanks in advance,
>> >> Jim Rome
>> >>
>> >> __
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Comparing dates in dataframes

2010-01-17 Thread James Rome



On 1/17/10 1:06 PM, David Winsemius wrote:
>
> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>
>> I don't think it is that simple because it is not a one-to-one match. In
>> the arr data frame, there are many arrivals in a quarter hour with good
>> weather on a given day. So I need to match the date and the quarter
>> hour.
>>
>> And all of the rows in the weather data frame are times with good
>> weather--unique date + quarter hour. That is why I needed the loop. For
>> each date and quarter hour in weather, I want to mark all the entries
>> with the corresponding date and weather as TRUE in the arr$gw column.
>>
>> I did convert the dates to POSIXlt dates and rewrote my function as
>> gooddates = function(all, good) {
>>   la = length(all)   # All the arrivals
>>  lw = length(good)  # The good 15-minute periods
>>  for(j in 1:lw) {
>>d=good$Date[j]
>>q=good$quarter[j]
>>all$gw[all$Date==d && all$quarter==q]=TRUE
>
>
> You are attempting a vectorized test and assignment with "&&" which
> seems unlikely to succeed, but even then I am not sure your problems
> would be over. (I'm also guessing that you might not have reported a
> warning.)

Why shouldn't the && succeed? You are correct, though: I do get hits if
I test either part of the condition on its own, but when I join them with
&&, I get no hits. And I got no warnings.
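
A minimal sketch of the difference, using made-up rows rather than the real arrivals data. `&` compares element by element and returns one logical value per row, which is what a row selection needs. `&&` is intended for single conditions: older versions of R silently used only the first element of each side, and recent versions raise an error for longer operands, so it is left out of the runnable part here.

```r
# Hypothetical miniature of the arrivals frame (values are illustrative only).
arr <- data.frame(Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
                  quarter = c(59, 60, 60),
                  gw      = 0)
d <- as.Date("2009-01-01")
q <- 60

# Element-wise AND: one TRUE/FALSE per row of arr.
hit <- arr$Date == d & arr$quarter == q

# Use the logical vector to mark only the matching rows.
arr$gw[hit] <- 1
arr$gw
```

Only the second row matches both the date and the quarter, so only that row is marked.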
>
> Why not merge arr to gw by date and quarter?
The sets contain different data, and the only thing I want from the
weather set is the fact that it has an entry for a given date and time
>
> Answering these questions would be greatly speeded up with a small
> sample dataset. Are you aware of the virtues of the dput function?
>

What I want is for a 1 to be in the gw column in the quarter
60,61,62,63,...

For example, here is some data from the good weather set:
Date    minute  hour  quarter  Efficiency Val
1/1/09  5       15    60
1/1/09  15      15    61       72
1/1/09  30      15    62       63.3
1/1/09  45      15    63       85.4



And this is from the arrivals set:
DateTime  weekday  month  quarter  ICAO  Flight   gw
1/1/09    5        1      59       COA   COA349   0
1/1/09    5        1      59       NWA   NWA9934  0
1/1/09    5        1      60       JBU   JBU554   0
1/1/09    5        1      60       BTA   BTA2347  0
1/1/09    5        1      60       COA   COA886   0
1/1/09    5        1      60       BTA   BTA2916  0
1/1/09    5        1      60       CJC   CJC3225  0
1/1/09    5        1      60       BTA   BTA2085  0
1/1/09    5        1      60       BTA   BTA2064  0
1/1/09    5        1      60       AAL   AAL842   0
1/1/09    5        1      60       BTA   BTA1234  0
1/1/09    5        1      60       CJC   CJC3359  0
1/1/09    5        1      60       BTA   BTA3072  0
1/1/09    5        1      61       BTA   BTA3086  0
1/1/09    5        1      61       COA   COA1166  0
1/1/09    5        1      61       COA   COA855   0
1/1/09    5        1      61       AWE   AWE307   0
1/1/09    5        1      66       CHQ   CHQ5312  0
1/1/09    5        1      67       BTA   BTA2405  0
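
One loop-free way to get the desired gw column from data shaped like the samples above is to build a combined date-plus-quarter key in each frame and test membership with %in%. The sketch below retypes a few of the sample rows by hand; the frame names, the helper column `dq`, and the "%m/%d/%y" date format are assumptions for illustration.

```r
# A few rows retyped from the samples above (not the full data).
weather <- data.frame(Date    = rep("1/1/09", 4),
                      quarter = c(60, 61, 62, 63),
                      stringsAsFactors = FALSE)
arr <- data.frame(DateTime = rep("1/1/09", 6),
                  quarter  = c(59, 60, 60, 61, 66, 67),
                  Flight   = c("COA349", "JBU554", "BTA2347",
                               "BTA3086", "CHQ5312", "BTA2405"),
                  stringsAsFactors = FALSE)

# Parse the text dates so both frames hold comparable Date values.
weather$Date <- as.Date(weather$Date, format = "%m/%d/%y")
arr$Date     <- as.Date(arr$DateTime, format = "%m/%d/%y")

# One key per row: "date quarter".
weather$dq <- paste(weather$Date, weather$quarter)
arr$dq     <- paste(arr$Date, arr$quarter)

# 1 where the arrival's key occurs among the good-weather keys, else 0.
arr$gw <- as.numeric(arr$dq %in% weather$dq)
arr[, c("Flight", "quarter", "gw")]
```

Because %in% is vectorized, the whole column is filled in one step, which should scale to the roughly 100,000 arrivals far better than a row-by-row loop.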



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing dates in dataframes

2010-01-17 Thread David Winsemius


On Jan 17, 2010, at 12:37 PM, James Rome wrote:

> I don't think it is that simple because it is not a one-to-one match. In
> the arr data frame, there are many arrivals in a quarter hour with good
> weather on a given day. So I need to match the date and the quarter hour.
>
> And all of the rows in the weather data frame are times with good
> weather--unique date + quarter hour. That is why I needed the loop. For
> each date and quarter hour in weather, I want to mark all the entries
> with the corresponding date and weather as TRUE in the arr$gw column.
>
> I did convert the dates to POSIXlt dates and rewrote my function as
> gooddates = function(all, good) {
>   la = length(all)   # All the arrivals
>   lw = length(good)  # The good 15-minute periods
>   for(j in 1:lw) {
>     d=good$Date[j]
>     q=good$quarter[j]
>     all$gw[all$Date==d && all$quarter==q]=TRUE



You are attempting a vectorized test and assignment with "&&" which  
seems unlikely to succeed, but even then I am not sure your problems  
would be over. (I'm also guessing that you might not have reported a  
warning.)


Why not merge arr to gw by date and quarter?

Answering these questions would be greatly speeded up with a small  
sample dataset. Are you aware of the virtues of the dput function?
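
For readers unfamiliar with dput: it prints an R expression that recreates an object, so a small sample can be pasted into an email and rebuilt exactly by the recipient. A minimal sketch, with a hypothetical two-row sample standing in for the real arrivals data:

```r
# Hypothetical two-row sample; the real frame would be subset with head() or similar.
arr_sample <- data.frame(DateTime = c("1/1/09", "1/1/09"),
                         quarter  = c(59, 60),
                         Flight   = c("COA349", "JBU554"),
                         stringsAsFactors = FALSE)

# The text that dput() prints is what one would paste into a message.
txt <- capture.output(dput(arr_sample))

# A reader can turn that text back into the same object.
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, arr_sample)
```

Unlike a pasted table, the dput text preserves column types, so it shows at a glance whether a date column is a factor, a character vector, or a Date.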


--
David


>   }
> }
>
> Now it runs with no errors, but none of the 0s (FALSE) in arr$gw get
> replaced with 1s. So I am still doing something wrong.
>
> Thanks,
> Jim



Re: [R] Comparing dates in dataframes

2010-01-17 Thread James Rome
I don't think it is that simple because it is not a one-to-one match. In
the arr data frame, there are many arrivals in a quarter hour with good
weather on a given day. So I need to match the date and the quarter hour.

And all of the rows in the weather data frame are times with good
weather--unique date + quarter hour. That is why I needed the loop. For
each date and quarter hour in weather, I want to mark all the entries
with the corresponding date and weather as TRUE in the arr$gw column.

I did convert the dates to POSIXlt dates and rewrote my function as
gooddates = function(all, good) {
  la = length(all)   # All the arrivals
  lw = length(good)  # The good 15-minute periods
  for(j in 1:lw) {
    d=good$Date[j]
    q=good$quarter[j]
    all$gw[all$Date==d && all$quarter==q]=TRUE
  }
}

Now it runs with no errors, but none of the 0s (FALSE) in arr$gw get
replaced with 1s. So I am still doing something wrong.
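
Apart from the choice of logical operator, two other details of a function like this may explain why arr never changes. R passes arguments by value, so assignments inside the function alter a local copy that is lost unless it is returned and assigned back. Also, length() of a data frame counts its columns, while nrow() counts rows. A hedged sketch on made-up data (the name `mark_good` and the sample values are illustrative only):

```r
# Illustrative rewrite of the loop; the function name and data are made up.
mark_good <- function(all, good) {
  for (j in seq_len(nrow(good))) {            # nrow(): rows; length(): columns
    hit <- all$Date == good$Date[j] & all$quarter == good$quarter[j]
    all$gw[hit] <- 1                          # changes the local copy only
  }
  all                                         # return the modified copy
}

arr <- data.frame(Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
                  quarter = c(59, 60, 60),
                  gw      = 0)
weather <- data.frame(Date = as.Date("2009-01-01"), quarter = 60)

arr <- mark_good(arr, weather)                # assign the result back
arr$gw
```

The loop is kept here only to mirror the original; a vectorized membership test avoids it entirely.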

Thanks,
Jim

On 1/16/10 6:11 PM, jim holtman wrote:
> If you have a vector of the quarter hours of good weather (gw), then
> to create the column in the arr dataframe you would do
>  
> arr$GoodWeather <- arr$quarter %in% gw
>  
> This says that if the quarter hour of the arrival is in the 'gw'
> vector, set the value TRUE; otherwise FALSE.
>
>
> On 1/16/10 4:26 PM, Stephan Kolassa wrote:
> > Hi,
> >
> > it looks like when you read in your data.frames, you didn't tell
> R to
> > expect dates, so it treats the Date columns as factors.
> Judicious use
> > of  something along these lines before doing your comparisons
> may help:
> >
> > arr$Date <- as.Date(as.character(arr$Date),format=something)
> >
> > Then again, it may be possible to do the actual merging using
> merge().
> >
> > HTH
> > Stephan
> >
> >
> > James Rome schrieb:
> >> I have two data frames. One (arr) has all arrivals to an
> airport for a
> >> year, and the other (gw) has the dates and quarter hour of the
> day when
> >> the weather is good. arr has a Date and quarter hour column.
> >>
> >>> names(arr)
> >>  [1] "Date" "weekday"  "hour" "month"
> >> "minute"  [6] "quarter"  "ICAO" "Flight"
> >> "AircraftType"
> >> "Tail"   [11] "Arrived"  "STA"  "Runway"
> >> "FromTo"  "Delay"  [16] "Operator" "gw"
> >> I added the gw column to arr and initialized it to all FALSE
> >>
> >>> names(gw)
> >>  [1] "Date"   "minute" "hour"
> >> "quarter"   [5] "Efficiency.Val" "Weekly.Avg"
> >> "Arrival.Val""Weekly.Avg.1"  [9] "Departure.Val"
> >> "Weekly.Avg.2"   "Num.of.Hold""Runway"   [13] "Weather"
> >> First point of confusion:
> >>> gw[1,1]
> >> [1] 1/1/09
> >> 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10
> 1/12/09 ...
> >> 9/9/09
> >> Why do I get 353 levels?
> >>
> >> I am trying to identify the quarter hours with good weather in
> the arr
> >> data frame. What I want to do is to go through the rows in gw,
> and to
> >> set arr$gw to TRUE if arr$Date and arr$quarter match those in
> the gw
> >> row.
> >>
> >> So I tried
> >> gooddates = function(all, good) {
> >>la = length(all)   # All the flights
> >>   lw = length(good)  # The good 15-minute periods
> >>   for(j in 1:lw) {
> >> d=good$Date[j]
> >> q=good$quarter[j]
> >> all[all$DateTime==d && all$quarter==q,17]=TRUE
> >>   }
> >> }
> >>
> >> but when I run this, I get
> >> "Error in Ops.factor(all$DateTime, d) :
> >>   level sets of factors are different"
> >>
> >> I know the level sets are different, that is what I am trying
> to find.
> >> But I think I am comparing single elements from the data frames.
> >>
> >> So what am I doing wrong? And there ought to be a better way to
> do this.
> >>
> >> Thanks in advance,
> >> Jim Rome
> >>
> >> __
> >> R-help@r-project.org  mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> 
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
> __
> R-help@r-project.org  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> 
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?

 

Re: [R] Comparing dates in dataframes

2010-01-16 Thread jim holtman
If you have a vector of the quarter hours of good weather (gw), then to
create the column in the arr dataframe you would do

arr$GoodWeather <- arr$quarter %in% gw

This says that if the quarter hour of the arrival is in the 'gw' vector, set
the value TRUE; otherwise FALSE.
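
A small illustration of that line with made-up quarter numbers (the vector name `good_quarters` is hypothetical):

```r
# Hypothetical vector of good-weather quarter hours.
good_quarters <- c(60, 61, 62, 63)

arr <- data.frame(quarter = c(59, 60, 61, 66))

# TRUE where the arrival's quarter appears in good_quarters.
arr$GoodWeather <- arr$quarter %in% good_quarters
arr$GoodWeather
```

Note that this tests the quarter alone. If the good quarters differ from day to day, the date has to be part of what is matched as well.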

On Sat, Jan 16, 2010 at 5:22 PM, James Rome  wrote:

>   I don't want to merge the data frames because there are many entries
> in the arrival frame for each one in the weather frame. And it is the
> missing dates and quarters in the weather frame that constitute the date
> I want, namely those arrivals that occurred in bad (or good) weather.
>   But I will try converting the dates as suggested tomorrow.
>   Is there a way to do what I want without that for loop? There are
> almost 100,000 rows in the arrivals frame, and R is grinding to a halt.
>   And is there a way to get R to abort its current calculation? Ctrl-C
> and Esc do not seem to work.
>
> Thanks,
> Jim
>
> On 1/16/10 4:26 PM, Stephan Kolassa wrote:
> > Hi,
> >
> > it looks like when you read in your data.frames, you didn't tell R to
> > expect dates, so it treats the Date columns as factors. Judicious use
> > of  something along these lines before doing your comparisons may help:
> >
> > arr$Date <- as.Date(as.character(arr$Date),format=something)
> >
> > Then again, it may be possible to do the actual merging using merge().
> >
> > HTH
> > Stephan
> >
> >
> > James Rome schrieb:
> >> I have two data frames. One (arr) has all arrivals to an airport for a
> >> year, and the other (gw) has the dates and quarter hour of the day when
> >> the weather is good. arr has a Date and quarter hour column.
> >>
> >>> names(arr)
> >>  [1] "Date" "weekday"  "hour" "month"
> >> "minute"  [6] "quarter"  "ICAO" "Flight"
> >> "AircraftType"
> >> "Tail"   [11] "Arrived"  "STA"  "Runway"
> >> "FromTo"  "Delay"  [16] "Operator" "gw"
> >> I added the gw column to arr and initialized it to all FALSE
> >>
> >>> names(gw)
> >>  [1] "Date"   "minute" "hour"
> >> "quarter"   [5] "Efficiency.Val" "Weekly.Avg"
> >> "Arrival.Val""Weekly.Avg.1"  [9] "Departure.Val"
> >> "Weekly.Avg.2"   "Num.of.Hold""Runway"   [13] "Weather"
> >> First point of confusion:
> >>> gw[1,1]
> >> [1] 1/1/09
> >> 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ...
> >> 9/9/09
> >> Why do I get 353 levels?
> >>
> >> I am trying to identify the quarter hours with good weather in the arr
> >> data frame. What I want to do is to go through the rows in gw, and to
> >> set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw
> >> row.
> >>
> >> So I tried
> >> gooddates = function(all, good) {
> >>la = length(all)   # All the flights
> >>   lw = length(good)  # The good 15-minute periods
> >>   for(j in 1:lw) {
> >> d=good$Date[j]
> >> q=good$quarter[j]
> >> all[all$DateTime==d && all$quarter==q,17]=TRUE
> >>   }
> >> }
> >>
> >> but when I run this, I get
> >> "Error in Ops.factor(all$DateTime, d) :
> >>   level sets of factors are different"
> >>
> >> I know the level sets are different, that is what I am trying to find.
> >> But I think I am comparing single elements from the data frames.
> >>
> >> So what am I doing wrong? And there ought to be a better way to do this.
> >>
> >> Thanks in advance,
> >> Jim Rome
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Comparing dates in dataframes

2010-01-16 Thread James Rome
   I don't want to merge the data frames because there are many entries
in the arrival frame for each one in the weather frame. And it is the
missing dates and quarters in the weather frame that constitute the data
I want, namely those arrivals that occurred in bad (or good) weather.
   But I will try converting the dates as suggested tomorrow.
   Is there a way to do what I want without that for loop? There are
almost 100,000 rows in the arrivals frame, and R is grinding to a halt.
   And is there a way to get R to abort its current calculation? Ctrl-C
and Esc do not seem to work.

Thanks,
Jim

On 1/16/10 4:26 PM, Stephan Kolassa wrote:
> Hi,
>
> it looks like when you read in your data.frames, you didn't tell R to
> expect dates, so it treats the Date columns as factors. Judicious use
> of  something along these lines before doing your comparisons may help:
>
> arr$Date <- as.Date(as.character(arr$Date),format=something)
>
> Then again, it may be possible to do the actual merging using merge().
>
> HTH
> Stephan
>
>
> James Rome schrieb:
>> I have two data frames. One (arr) has all arrivals to an airport for a
>> year, and the other (gw) has the dates and quarter hour of the day when
>> the weather is good. arr has a Date and quarter hour column.
>>
>>> names(arr)
>>  [1] "Date" "weekday"  "hour" "month"   
>> "minute"  [6] "quarter"  "ICAO" "Flight"  
>> "AircraftType"
>> "Tail"   [11] "Arrived"  "STA"  "Runway"  
>> "FromTo"  "Delay"  [16] "Operator" "gw"
>> I added the gw column to arr and initialized it to all FALSE
>>
>>> names(gw)
>>  [1] "Date"   "minute" "hour"  
>> "quarter"   [5] "Efficiency.Val" "Weekly.Avg"
>> "Arrival.Val""Weekly.Avg.1"  [9] "Departure.Val" 
>> "Weekly.Avg.2"   "Num.of.Hold""Runway"   [13] "Weather"
>> First point of confusion:
>>> gw[1,1]
>> [1] 1/1/09
>> 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ...
>> 9/9/09
>> Why do I get 353 levels?
>>
>> I am trying to identify the quarter hours with good weather in the arr
>> data frame. What I want to do is to go through the rows in gw, and to
>> set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw
>> row.
>>
>> So I tried
>> gooddates = function(all, good) {
>>la = length(all)   # All the flights
>>   lw = length(good)  # The good 15-minute periods
>>   for(j in 1:lw) {
>> d=good$Date[j]
>> q=good$quarter[j]
>> all[all$DateTime==d && all$quarter==q,17]=TRUE
>>   }
>> }
>>
>> but when I run this, I get
>> "Error in Ops.factor(all$DateTime, d) :
>>   level sets of factors are different"
>>
>> I know the level sets are different, that is what I am trying to find.
>> But I think I am comparing single elements from the data frames.
>>
>> So what am I doing wrong? And there ought to be a better way to do this.
>>
>> Thanks in advance,
>> Jim Rome
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



Re: [R] Comparing dates in dataframes

2010-01-16 Thread Stephan Kolassa

Hi,

it looks like when you read in your data.frames, you didn't tell R to 
expect dates, so it treats the Date columns as factors. Judicious use of 
 something along these lines before doing your comparisons may help:


arr$Date <- as.Date(as.character(arr$Date),format=something)

Then again, it may be possible to do the actual merging using merge().
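
A sketch of both suggestions on made-up data. The format string "%m/%d/%y" is an assumption based on dates that look like 1/1/09, and the helper column `good` is hypothetical. With all.x = TRUE, merge() keeps every arrival, including those with no matching weather row.

```r
# Made-up frames whose Date columns were read in as factors.
arr <- data.frame(Date    = factor(c("1/1/09", "1/1/09", "1/2/09")),
                  quarter = c(59, 60, 60))
gw  <- data.frame(Date    = factor("1/1/09"),
                  quarter = 60)

# Factor -> character -> Date, so values from both frames can be compared.
arr$Date <- as.Date(as.character(arr$Date), format = "%m/%d/%y")
gw$Date  <- as.Date(as.character(gw$Date),  format = "%m/%d/%y")

# Tag the good-weather rows, then left-join them onto the arrivals.
gw$good <- TRUE
flagged <- merge(arr, gw[, c("Date", "quarter", "good")],
                 by = c("Date", "quarter"), all.x = TRUE)

# Arrivals without a weather match come back as NA; treat them as not good.
flagged$good[is.na(flagged$good)] <- FALSE
flagged
```

Several arrivals can share one weather row in such a join, so a many-to-one relationship between the frames is not an obstacle.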

HTH
Stephan


James Rome schrieb:

I have two data frames. One (arr) has all arrivals to an airport for a
year, and the other (gw) has the dates and quarter hour of the day when
the weather is good. arr has a Date and quarter hour column.


names(arr)
 [1] "Date" "weekday"  "hour" "month""minute" 
 [6] "quarter"  "ICAO" "Flight"   "AircraftType"
"Tail"   
[11] "Arrived"  "STA"  "Runway"   "FromTo"  
"Delay"  
[16] "Operator" "gw" 


I added the gw column to arr and initialized it to all FALSE


names(gw)
 [1] "Date"   "minute" "hour"   "quarter"  
 [5] "Efficiency.Val" "Weekly.Avg" "Arrival.Val""Weekly.Avg.1" 
 [9] "Departure.Val"  "Weekly.Avg.2"   "Num.of.Hold""Runway"   
[13] "Weather" 


First point of confusion:

gw[1,1]

[1] 1/1/09
353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09
Why do I get 353 levels?

I am trying to identify the quarter hours with good weather in the arr
data frame. What I want to do is to go through the rows in gw, and to
set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw row.

So I tried
gooddates = function(all, good) {
   la = length(all)   # All the flights
  lw = length(good)  # The good 15-minute periods
  for(j in 1:lw) {
d=good$Date[j]
q=good$quarter[j]
all[all$DateTime==d && all$quarter==q,17]=TRUE
  }
}

but when I run this, I get
"Error in Ops.factor(all$DateTime, d) :
  level sets of factors are different"

I know the level sets are different, that is what I am trying to find.
But I think I am comparing single elements from the data frames.

So what am I doing wrong? And there ought to be a better way to do this.

Thanks in advance,
Jim Rome



[R] Comparing dates in dataframes

2010-01-16 Thread James Rome
I have two data frames. One (arr) has all arrivals to an airport for a
year, and the other (gw) has the dates and quarter hour of the day when
the weather is good. arr has a Date and quarter hour column.

> names(arr)
 [1] "Date"         "weekday"      "hour"         "month"        "minute"
 [6] "quarter"      "ICAO"         "Flight"       "AircraftType" "Tail"
[11] "Arrived"      "STA"          "Runway"       "FromTo"       "Delay"
[16] "Operator"     "gw"

I added the gw column to arr and initialized it to all FALSE

> names(gw)
 [1] "Date"           "minute"         "hour"           "quarter"
 [5] "Efficiency.Val" "Weekly.Avg"     "Arrival.Val"    "Weekly.Avg.1"
 [9] "Departure.Val"  "Weekly.Avg.2"   "Num.of.Hold"    "Runway"
[13] "Weather"

First point of confusion:
>gw[1,1]
[1] 1/1/09
353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09
Why do I get 353 levels?

I am trying to identify the quarter hours with good weather in the arr
data frame. What I want to do is to go through the rows in gw, and to
set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw row.

So I tried
gooddates = function(all, good) {
  la = length(all)   # All the flights
  lw = length(good)  # The good 15-minute periods
  for(j in 1:lw) {
    d=good$Date[j]
    q=good$quarter[j]
    all[all$DateTime==d && all$quarter==q,17]=TRUE
  }
}

but when I run this, I get
"Error in Ops.factor(all$DateTime, d) :
  level sets of factors are different"

I know the level sets are different, that is what I am trying to find.
But I think I am comparing single elements from the data frames.

So what am I doing wrong? And there ought to be a better way to do this.
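
Both points of confusion can be reproduced on a tiny made-up example. A single element taken from a factor still carries every level of the whole column, which is presumably why gw[1,1] reports 353 levels (one per distinct date string). And == between two factors whose level sets differ is an error, even for single elements. The vectors below are illustrative only.

```r
# Two made-up factors with different sets of date strings.
a <- factor(c("1/1/09", "1/2/09", "1/3/09"))
b <- factor(c("1/1/09", "1/3/09"))

# One element, but all three levels of 'a' are still attached.
one <- a[1]
nlevels(one)

# Comparing factors with different level sets fails; capture the message.
res <- tryCatch(a == b[1], error = function(e) conditionMessage(e))
res

# Comparing the underlying strings works element by element.
as.character(a) == as.character(b[1])
```

Converting the columns to character or to Date before comparing sidesteps the level-set check.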

Thanks in advance,
Jim Rome
