I finally figured this out with the help of Dave and Dennis.

The steps were:
1. Convert the dates to Date objects with as.Date()
2. Create a column in each frame that was paste(date, quarter)
3. Then %in% worked instantly.

# Convert the m/d/yy dates to Date objects in both frames
arr$Date <- as.Date(as.character(arr$Date), format = "%m/%d/%y")
weather$Date <- as.Date(as.character(weather$Date), format = "%m/%d/%y")

# Build a combined "date quarter" key in each frame, e.g. "2009-01-01 60"
weather$dq <- paste(as.character(weather$Date), as.character(weather$quarter))
arr$dq <- paste(as.character(arr$Date), as.character(arr$quarter))

# gw = 1 if the arrival's date+quarter appears in weather (good weather), else 0
arr$gw <- as.numeric(arr$dq %in% weather$dq)
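A self-contained sketch of the date+quarter key approach above, run on a cut-down copy of the sample posted later in this thread (good-weather quarters 60 to 63 on 1/1/09 and the quarters of the 19 sample arrivals). The last arrival, on 1/2/09, is a hypothetical row added only to show that a matching quarter on a different date is not flagged; only the two columns the match needs are kept.

```r
# Cut-down sample data: only Date and quarter are needed for the match.
weather <- data.frame(Date = rep("1/1/09", 4), quarter = 60:63)
arr <- data.frame(
  Date = c(rep("1/1/09", 19), "1/2/09"),   # last row is hypothetical
  quarter = c(59, 59, rep(60, 11), rep(61, 4), 66, 67, 60)
)

# Parse the m/d/yy strings into Date objects
arr$Date <- as.Date(as.character(arr$Date), format = "%m/%d/%y")
weather$Date <- as.Date(as.character(weather$Date), format = "%m/%d/%y")

# One string key per (date, quarter) pair, e.g. "2009-01-01 60"
weather$dq <- paste(as.character(weather$Date), as.character(weather$quarter))
arr$dq <- paste(as.character(arr$Date), as.character(arr$quarter))

# 1 if that exact date+quarter appears in weather, else 0
arr$gw <- as.numeric(arr$dq %in% weather$dq)
table(arr$gw)   # five 0s and fifteen 1s
```

Because each key encodes both the date and the quarter, a quarter that is good on one day cannot mark arrivals on another day.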

Thanks for all the help,
Jim

On 1/17/10 10:16 PM, David Winsemius wrote:
>
> On Jan 17, 2010, at 9:22 PM, Dennis Murphy wrote:
>
>> I thought it was a solution :)
>>
>> The gw column is an indicator that is meant to match quarters with good
>> weather (in the weather data frame) to the quarters in the arr data
>> frame: 1 = good weather, 0 = not. I split up both data frames by Date,
>> since the quarter values are in the same range from day to day, and then
>> ran a loop to generate the indicator gw by finding whether the quarter in
>> the i-th component of the arrl list matched the corresponding reference
>> table of quarters (of good weather) in the wc list. do.call then slurps it
>> all together into a data frame. My understanding was that James just
>> wanted to be able to distinguish arrivals during good-weather quarters
>> from those in bad-weather quarters.
>
> This was my understanding, too. I was just trying to get to some point
> where that could be expressed in an R-encodable format and implemented.
>
>
>>
>> I did the tables as a sanity check against the hand calculations I did
>> earlier as a ballpark estimate.
>
> The rather small dataset may have been enough. We will need to see what
> the OP says: is that a sufficient solution?
>
>
>>
>> Does this make sense?
>>
>> Dennis
>>
>> On Sun, Jan 17, 2010 at 6:04 PM, David Winsemius
>> <dwinsem...@comcast.net> wrote:
>>
>>     I'm not clear about your last message. Do you have a solution?
>>
>>     -- 
>>     David.
>>
>>     On Jan 17, 2010, at 8:27 PM, Dennis Murphy wrote:
>>
>>>     Hi James and David:
>>>
>>>     I tried the following: split the quarters from weather into a
>>>     list by Date, and ditto with arr. Then assign the value of gw by
>>>     running a (sleazy) loop over the components of the arr list...
>>>
>>>     wc <- split(weather$quarter, weather$Date)
>>>     arrl <- split(arr, arr$Date)
>>>
>>>     # Note that there are four dates in wc and three in arrl...
>>>     for(i in seq_along(arrl)) {
>>>       arrl[[i]]$gw <- as.numeric(arrl[[i]]$quarter %in% wc[[i]])
>>>     }
>>>     arr2 <- do.call(rbind, arrl)
>>>     dim(arr2)
>>>     [1] 1126    9
>>>     table(arr2$gw)
>>>
>>>       0   1
>>>     661 465
>>>     with(arr2, table(Date, gw))
>>>                 gw
>>>     Date           0   1
>>>       2009-01-01 368  99
>>>       2009-01-02 266 348
>>>       2009-01-03  27  18
>>>
>>>     OK, I was a bit off, but at least we know this is in the
>>>     ballpark of my estimates  :)
>>>     I'm sure David will come up with something more elegant, but
>>>     this seems to work.
>>>
>>>     HTH,
>>>     Dennis
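The loop above pairs arrl[[i]] with wc[[i]] by position, and the code comment notes that wc has four dates while arrl has three, so the pairing is only right if the dates happen to line up. A sketch that looks up the reference quarters by date name instead (split() names each list element after its date); the small data frames are hypothetical, chosen so that position and name disagree.

```r
# Hypothetical data: 2009-01-02 has good weather but no arrivals.
weather <- data.frame(
  Date = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-03")),
  quarter = c(60, 61, 70, 80)
)
arr <- data.frame(
  Date = as.Date(c("2009-01-01", "2009-01-01", "2009-01-03", "2009-01-03")),
  quarter = c(60, 62, 80, 70)
)

wc <- split(weather$quarter, weather$Date)   # good quarters, named by date
arrl <- split(arr, arr$Date)                 # arrivals, named by date

for (d in names(arrl)) {
  # wc[[d]] is NULL when that date has no good-weather rows, and
  # x %in% NULL is FALSE, so such arrivals get gw = 0
  arrl[[d]]$gw <- as.numeric(arrl[[d]]$quarter %in% wc[[d]])
}
arr2 <- do.call(rbind, arrl)
arr2$gw
```

With positional indexing, the 2009-01-03 arrivals would be checked against the 2009-01-02 list and come out reversed.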
>>>
>>>
>>>     On Sun, Jan 17, 2010 at 5:17 PM, James Rome <jamesr...@gmail.com>
>>>     wrote:
>>>
>>>         Any entry in the weather data is a good day. That is the point.
>>>         And please ignore my mistake about the quarters getting too
>>>         large in weather. I am being swamped with versions, and it does
>>>         not matter for this purpose. So the bad-weather days are not in
>>>         the weather data set.
>>>
>>>         I am trying to get gw=1 in arr if the date and quarter are in
>>>         weather.
>>>
>>>         Thanks,
>>>         Jim
>>>
>>>         On 1/17/10 7:46 PM, David Winsemius wrote:
>>>         > But, but, but .... there is no weather goodness variable in
>>>         > weather?!?!?!
>>>         >
>>>         > > str(weather)
>>>         > 'data.frame':    155 obs. of  4 variables:
>>>         >  $ Date   :Class 'Date'  num [1:155] 14245 14245 14245 14245 14245 ...
>>>         >  $ minute : int  5 15 30 45 0 15 30 45 0 15 ...
>>>         >  $ hour   : int  15 15 15 15 17 17 17 17 18 18 ...
>>>         >  $ quarter: int  65 75 90 105 68 83 98 113 72 87 ...
>>>         >
>>>         > I thought you said the "weather" dataframe would have some
>>>         > information about "goodness" that we were supposed to map to
>>>         > arrivals. What is the meaning of those variables? How do we
>>>         > define a "good" quarter hour? And why are the values of quarter
>>>         > not 1, 2, 3, 4? They ought to be a factor or integer that could be
>>>         > matched to those that are in "arr", which are also apparently not
>>>         > so defined. Let's see a better codebook or description of these
>>>         > variables.
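For what it's worth, the four-row sample posted later in the thread (hour 15 with minutes 5, 15, 30 and 45 giving quarters 60 to 63) is consistent with quarter being the quarter hour of the day, 4 * hour + minute %/% 15, numbered 0 to 95. That is an inference from the sample, not a documented codebook, and the str() values above (hour 15, minute 5, quarter 65) do not follow it, which James later attributes to a mistake in that version of the data. The helper name quarter_of_day is hypothetical.

```r
# Assumed coding (inferred from the sample, not documented):
# quarter = index of the 15-minute slot within the day, 0..95
quarter_of_day <- function(hour, minute) {
  4L * as.integer(hour) + as.integer(minute) %/% 15L
}

quarter_of_day(15L, c(5L, 15L, 30L, 45L))   # 60 61 62 63
```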
>>>         >
>>>         > On Jan 17, 2010, at 6:47 PM, James Rome wrote:
>>>         >
>>>         >> Here are some sample data sets.
>>>         >>
>>>         >> I also tried making a combined field in each set such as
>>>         >> adq=paste(as.character(arr$Date), as.character(arr$quarter))
>>>         >> and similarly for the weather set, so I have unique single
>>>         >> things to compare, but that did not seem to help much.
>>>         >>
>>>         >> Thanks,
>>>         >> Jim
>>>         >>
>>>         >> On 1/17/10 5:50 PM, David Winsemius wrote:
>>>         >>> My guess (since we still have no data on which to test these
>>>         >>> ideas) is that you need either to merge() or to use a matrix
>>>         >>> created from the dates and qtr-hours entries in "gw", since
>>>         >>> matching on dates and hours separately will not uniquely classify
>>>         >>> the good qtr-hours within their proper corresponding dates. You
>>>         >>> want a structure (or a matching process) that takes:
>>>         >>>          qhr1    qhr2    qhr3    qhr4 .......
>>>         >>> date1    good    bad     good    bad
>>>         >>> date2    bad     good    good    good
>>>         >>> date3    bad     bad     bad     good
>>>         >>> .
>>>         >>> .
>>>         >>> .
>>>         >>> and lets you use the values in "arr" to get values in "gw".
>>>         >>> Notice that the notion of
>>>         >>> arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>>>         >>> simply will not accomplish anything correct.
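A small illustration of that point, with hypothetical data: testing the date and the quarter separately flags an arrival whose date and quarter each appear somewhere in the good-weather set, even though that combination never does. Testing a combined key does not have this problem.

```r
# Hypothetical good-weather (date, quarter) pairs
good <- data.frame(Date = as.Date(c("2009-01-01", "2009-01-02")),
                   quarter = c(60, 61))
# One arrival on 2009-01-01 in quarter 61: not a good-weather pair
arr <- data.frame(Date = as.Date("2009-01-01"), quarter = 61)

# Separate tests: both parts are TRUE, so the arrival is wrongly flagged
separate <- arr$Date %in% good$Date & arr$quarter %in% good$quarter

# Joint test on a combined key: correctly FALSE
joint <- paste(as.character(arr$Date), arr$quarter) %in%
  paste(as.character(good$Date), good$quarter)

separate
joint
```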
>>>         >>>
>>>         >>> Merging by multiple criteria (with the merge function) would do
>>>         >>> that, or you could construct a matrix whose entries were the
>>>         >>> categories good/bad. The table function could create the matrix
>>>         >>> for the purpose of using an indexed solution if you are dead-set
>>>         >>> against the merge concept.
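A minimal sketch of the merge() route, assuming (as James states elsewhere in the thread) that every row of weather is a good-weather date and quarter hour. Each good (Date, quarter) pair is tagged gw = 1, merged onto the arrivals by both columns, and unmatched arrivals are set to 0. The toy data and the Flight values are hypothetical.

```r
# Hypothetical toy data
weather <- data.frame(Date = as.Date("2009-01-01"), quarter = 60:63)
arr <- data.frame(Date = as.Date("2009-01-01"),
                  quarter = c(59, 60, 60, 61, 66),
                  Flight = c("A1", "A2", "A3", "A4", "A5"))

# One row per good (Date, quarter) pair, tagged as good weather
good <- unique(weather[c("Date", "quarter")])
good$gw <- 1

# Left join on both columns: keep every arrival; pairs absent from weather get NA
arr2 <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
arr2$gw[is.na(arr2$gw)] <- 0

arr2[order(arr2$Flight), ]
```

Because merge() matches on both columns at once, several arrivals can share one good quarter and all of them are flagged. merge() returns rows sorted by the key columns rather than in the original order, hence the order() call for display.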
>>>         >>>
>>>         >>>
>>>         >>>
>>>         >>>
>>>         >>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>>         >>>
>>>         >>>> Thank you Dennis.
>>>         >>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
>>>         >>>> seems to be what I want to do, but in fact, with the full data
>>>         >>>> set, it misidentifies the rows, so I think the warning message
>>>         >>>> must mean something.
>>>         >>>>
>>>         >>>> > arr$Date <- as.Date(as.character(arr$Date), format="%m/%d/%y")
>>>         >>>> > weather$Date <- as.Date(as.character(weather$Date), format="%m/%d/%y")
>>>         >>>> > gw = c(length(arr))
>>>         >>>> > gw[1:length(arr[,1])]=FALSE
>>>         >>>> > gw[arr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>>         >>>> Warning in `==.default`(arr$Date, weather$Date) :
>>>         >>>>   longer object length is not a multiple of shorter object length
>>>         >>>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>>         >>>>   longer object length is not a multiple of shorter object length
>>>         >>>>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>         >>>>  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>         >>>>  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>         >>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>         >>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>         >>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>         >>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>         >>>> [260] 0 0 0 0 0 0 0 0
>>>         >>>>
>>>         >>>> There are many, many more matches in the 99k-line arrival data set.
>>>         >>>>
>>>         >>>> Thanks a bunch,
>>>         >>>> Jim
>>>         >>>>
>>>         >>>>
>>>         >>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>>         >>>>> Hi:
>>>         >>>>>
>>>         >>>>> To read a data set from an R-help message into R, one uses
>>>         >>>>> read.table(textConnection("<verbatim text>"), ...)
>>>         >>>>>
>>>         >>>>> Your weather data set had
>>>         >>>>> (a) a variable name with a space in it, which R misread and
>>>         >>>>>     which had to be altered manually;
>>>         >>>>> (b) a missing value with no NA, which R interpreted as an
>>>         >>>>>     incomplete line; again, it had to be altered manually.
>>>         >>>>>
>>>         >>>>> This is why David suggested the use of dput(), so that these
>>>         >>>>> vagaries don't have to be dealt with by those who are trying
>>>         >>>>> to help.
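A sketch of the read.table(textConnection(...)) route for the weather sample James posted, after the two manual fixes just described: the two-word "Efficiency Val" header is shortened to efficiency (the name used in the dput() output) and the blank value is written as NA.

```r
txt <- "Date   minute hour quarter efficiency
1/1/09 5      15   60      NA
1/1/09 15     15   61      72
1/1/09 30     15   62      63.3
1/1/09 45     15   63      85.4"

con <- textConnection(txt)
weather <- read.table(con, header = TRUE)   # whitespace-separated columns
close(con)

str(weather)
```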
>>>         >>>>>
>>>         >>>>> That being said, for the example that you gave and the desired
>>>         >>>>> value that you wanted, try
>>>         >>>>>
>>>         >>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
>>>         >>>>>
>>>         >>>>> (I changed DateTime to Date in the arr data frame...)
>>>         >>>>>
>>>         >>>>> You'll get warnings like
>>>         >>>>>
>>>         >>>>> Warning messages:
>>>         >>>>> 1: In is.na(e1) | is.na(e2) :
>>>         >>>>>   longer object length is not a multiple of shorter object length
>>>         >>>>>
>>>         >>>>> but it seems to do the right thing. The first equality is there
>>>         >>>>> to constrain matches for quarter to be within the same day.
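It may be worth spelling out why those warnings matter: == compares its operands element by element, recycling the shorter one, so weather$Date == arr$Date asks whether the i-th arrival falls on the same date as the i-th weather row, not whether its date appears anywhere in weather. In the single-day sample every date is the same, so the result looks right; with several days it need not. A sketch with hypothetical dates:

```r
weather_dates <- as.Date(c("2009-01-01", "2009-01-02"))
arr_dates <- as.Date(c("2009-01-02", "2009-01-01", "2009-01-02"))

# Positional comparison with recycling. Without suppressWarnings(), R warns:
# "longer object length is not a multiple of shorter object length"
positional <- suppressWarnings(arr_dates == weather_dates)

# Set membership: is each arrival's date anywhere in weather_dates?
membership <- arr_dates %in% weather_dates

positional   # FALSE FALSE FALSE
membership   # TRUE TRUE TRUE
```

The date-plus-quarter key used in the final solution at the top of the thread avoids the issue.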
>>>         >>>>>
>>>         >>>>> For future reference,
>>>         >>>>>
>>>         >>>>>> dput(weather)
>>>         >>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>         >>>>>     class = "factor"), minute = c(5L, 15L, 30L, 45L),
>>>         >>>>>     hour = c(15L, 15L, 15L, 15L), quarter = 60:63,
>>>         >>>>>     efficiency = c(NA, 72, 63.3, 85.4)),
>>>         >>>>>     .Names = c("Date", "minute", "hour", "quarter", "efficiency"),
>>>         >>>>>     class = "data.frame", row.names = c(NA, -4L))
>>>         >>>>>> dput(arr)
>>>         >>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>         >>>>>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>>         >>>>>     .Label = "1/1/09", class = "factor"),
>>>         >>>>>     weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>>         >>>>>     5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>>         >>>>>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>>         >>>>>     quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>>         >>>>>     60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L),
>>>         >>>>>     ICAO = structure(c(6L, 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L,
>>>         >>>>>     3L, 5L, 3L, 3L, 6L, 6L, 2L, 4L, 3L), .Label = c("AAL", "AWE",
>>>         >>>>>     "BTA", "CHQ", "CJC", "COA", "JBU", "NWA"), class = "factor"),
>>>         >>>>>     Flight = structure(c(15L, 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L,
>>>         >>>>>     1L, 3L, 13L, 9L, 10L, 14L, 16L, 2L, 11L, 7L),
>>>         >>>>>     .Label = c("AAL842", "AWE307", "BTA1234", "BTA2064",
>>>         >>>>>     "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>>         >>>>>     "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166",
>>>         >>>>>     "COA349", "COA855", "COA886", "JBU554", "NWA9934"),
>>>         >>>>>     class = "factor"),
>>>         >>>>>     gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>>         >>>>>     TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>>         >>>>>     FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>>         >>>>>     "ICAO", "Flight", "gw"), row.names = c(NA, -19L),
>>>         >>>>>     class = "data.frame")
>>>         >>>>>
>>>         >>>>> These can be copied and pasted directly into an R session
>>>         >>>>> without modification.
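One way to check that dput() output really round-trips is to capture the text and evaluate it again. A minimal sketch with a hypothetical cut-down data frame modelled on the weather sample:

```r
df <- data.frame(Date = "1/1/09", minute = c(5L, 15L, 30L, 45L),
                 quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4))

# dput() writes an R expression that rebuilds the object
txt <- capture.output(dput(df))

# Evaluating that text recreates the data frame
df2 <- eval(parse(text = txt))
all.equal(df, df2)   # TRUE
```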
>>>         >>>>>
>>>         >>>>> HTH,
>>>         >>>>> Dennis
>>>         >>>>>
>>>         >>>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome
>>>         >>>>> <jamesr...@gmail.com> wrote:
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>   On 1/17/10 1:06 PM, David Winsemius wrote:
>>>         >>>>>>
>>>         >>>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>         >>>>>>
>>>         >>>>>>> I don't think it is that simple because it is not a one-to-one
>>>         >>>>>>> match. In the arr data frame, there are many arrivals in a
>>>         >>>>>>> quarter hour with good weather on a given day. So I need to
>>>         >>>>>>> match the date and the quarter hour.
>>>         >>>>>>>
>>>         >>>>>>> And all of the rows in the weather data frame are times with
>>>         >>>>>>> good weather, each a unique date + quarter hour. That is why I
>>>         >>>>>>> needed the loop. For each date and quarter hour in weather, I
>>>         >>>>>>> want to mark all the entries with the corresponding date and
>>>         >>>>>>> quarter hour as TRUE in the arr$gw column.
>>>         >>>>>>>
>>>         >>>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>         >>>>>>> gooddates = function(all, good) {
>>>         >>>>>>> la = length(all)   # All the arrivals
>>>         >>>>>>> lw = length(good)  # The good 15-minute periods
>>>         >>>>>>> for(j in 1:lw) {
>>>         >>>>>>>  d=good$Date[j]
>>>         >>>>>>>  q=good$quarter[j]
>>>         >>>>>>>  all$gw[all$Date==d && all$quarter==q]=TRUE
>>>         >>>>>>
>>>         >>>>>>
>>>         >>>>>> You are attempting a vectorized test and assignment with "&&",
>>>         >>>>>> which seems unlikely to succeed, but even then I am not sure your
>>>         >>>>>> problems would be over. (I'm also guessing that you might not
>>>         >>>>>> have reported a warning.)
>>>         >>>>>
>>>         >>>>>   Why shouldn't the && succeed? You are correct there: I do get
>>>         >>>>>   items if I use either part of this test alone, but when I
>>>         >>>>>   insert the &&, I get no hits. And I got no warnings.
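On the && question: && is the scalar "and" meant for if() conditions and looks at a single value on each side, whereas & works element by element, which is what a vectorised row selection needs. R of that era silently used only the first element of each operand; recent versions of R signal an error instead. A sketch of the loop quoted above, rewritten with & and looping over rows with nrow() rather than length() (for a data frame, length() counts columns). It runs on a cut-down copy of the sample data from the thread.

```r
gooddates <- function(all, good) {
  all$gw <- FALSE
  # nrow(), not length(): length() of a data frame is its number of columns
  for (j in seq_len(nrow(good))) {
    d <- good$Date[j]
    q <- good$quarter[j]
    # & is elementwise, giving one TRUE/FALSE per arrival row
    all$gw[all$Date == d & all$quarter == q] <- TRUE
  }
  all
}

weather <- data.frame(Date = as.Date("2009-01-01"), quarter = 60:63)
arr <- data.frame(Date = as.Date("2009-01-01"),
                  quarter = c(59, 59, rep(60, 11), rep(61, 4), 66, 67))
arr <- gooddates(arr, weather)
table(arr$gw)
```

This is fine for small inputs, but it rescans every arrival once per good quarter; a single %in% on a combined date+quarter key does the same job in one pass.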
>>>         >>>>>>
>>>         >>>>>> Why not merge arr to gw by date and quarter?
>>>         >>>>>   The sets contain different data, and the only thing I want from
>>>         >>>>>   the weather set is the fact that it has an entry for a given
>>>         >>>>>   date and time.
>>>         >>>>>>
>>>         >>>>>> Answering these questions would be greatly speeded up with a
>>>         >>>>>> small sample dataset. Are you aware of the virtues of the dput
>>>         >>>>>> function?
>>>         >>>>>>
>>>         >>>>>
>>>         >>>>>   What I want is for a 1 to be in the gw column for quarters
>>>         >>>>>   60, 61, 62, 63, ...
>>>         >>>>>
>>>         >>>>>   For example, here is some data from the good weather set:
>>>         >>>>>   Date    minute  hour    quarter         Efficiency Val
>>>         >>>>>   1/1/09  5       15      60
>>>         >>>>>   1/1/09  15      15      61      72
>>>         >>>>>   1/1/09  30      15      62      63.3
>>>         >>>>>   1/1/09  45      15      63      85.4
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>   And this is from the arrivals set:
>>>         >>>>>   DateTime  weekday  month  quarter  ICAO  Flight   gw
>>>         >>>>>
>>>         >>>>>   1/1/09    5        1      59       COA   COA349   0
>>>         >>>>>   1/1/09    5        1      59       NWA   NWA9934  0
>>>         >>>>>   1/1/09    5        1      60       JBU   JBU554   0
>>>         >>>>>   1/1/09    5        1      60       BTA   BTA2347  0
>>>         >>>>>   1/1/09    5        1      60       COA   COA886   0
>>>         >>>>>   1/1/09    5        1      60       BTA   BTA2916  0
>>>         >>>>>   1/1/09    5        1      60       CJC   CJC3225  0
>>>         >>>>>   1/1/09    5        1      60       BTA   BTA2085  0
>>>         >>>>>   1/1/09    5        1      60       BTA   BTA2064  0
>>>         >>>>>   1/1/09    5        1      60       AAL   AAL842   0
>>>         >>>>>   1/1/09    5        1      60       BTA   BTA1234  0
>>>         >>>>>   1/1/09    5        1      60       CJC   CJC3359  0
>>>         >>>>>   1/1/09    5        1      60       BTA   BTA3072  0
>>>         >>>>>   1/1/09    5        1      61       BTA   BTA3086  0
>>>         >>>>>   1/1/09    5        1      61       COA   COA1166  0
>>>         >>>>>   1/1/09    5        1      61       COA   COA855   0
>>>         >>>>>   1/1/09    5        1      61       AWE   AWE307   0
>>>         >>>>>   1/1/09    5        1      66       CHQ   CHQ5312  0
>>>         >>>>>   1/1/09    5        1      67       BTA   BTA2405  0
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>
>>>         >>>>>   ______________________________________________
>>>         >>>>>   R-help@r-project.org mailing list
>>>         >>>>>   https://stat.ethz.ch/mailman/listinfo/r-help
>>>         >>>>>   PLEASE do read the posting guide
>>>         >>>>>   http://www.R-project.org/posting-guide.html
>>>         >>>>>   and provide commented, minimal, self-contained, reproducible code.
>>>         >>>>>
>>>         >>>>>
>>>         >>>>
>>>         >>>
>>>         >>> David Winsemius, MD
>>>         >>> Heritage Laboratories
>>>         >>> West Hartford, CT
>>>         >>>
>>>         >> <arr.rda><weather.rda>
>>>         >
>>>         > David Winsemius, MD
>>>         > Heritage Laboratories
>>>         > West Hartford, CT
>>>         >
>>>
>>>
>>
>>     David Winsemius, MD
>>     Heritage Laboratories
>>     West Hartford, CT
>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
