I finally figured this out with the help of Dave and Dennis. The steps I had to take were:

1. Convert the dates in both data frames to Date objects with as.Date().
2. Create a column in each frame that was paste(date, quarter).
3. Match on that column with %in%, which then worked instantly.
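A minimal sketch of these three steps on two tiny made-up data frames may help anyone following along without the original data. The values in arr and weather below are invented for illustration only, and loose is a hypothetical name used just to contrast the combined key with testing the date and the quarter separately.

```r
# Toy stand-ins for the real data; every value here is made up.
arr <- data.frame(
  Date    = c("1/1/09", "1/1/09", "1/2/09", "1/2/09"),
  quarter = c(59L, 60L, 60L, 61L),
  stringsAsFactors = FALSE
)
# Assumption carried over from the thread: every row of weather
# is a good-weather quarter hour.
weather <- data.frame(
  Date    = c("1/1/09", "1/2/09"),
  quarter = c(60L, 61L),
  stringsAsFactors = FALSE
)

# Step 1: parse the dates so both frames format them identically.
arr$Date     <- as.Date(arr$Date, format = "%m/%d/%y")
weather$Date <- as.Date(weather$Date, format = "%m/%d/%y")

# Step 2: build one key per (date, quarter) pair.
arr$dq     <- paste(arr$Date, arr$quarter)
weather$dq <- paste(weather$Date, weather$quarter)

# Step 3: a row is good weather only if its exact pair occurs in weather.
arr$gw <- as.numeric(arr$dq %in% weather$dq)

# For contrast: testing the two columns separately over-matches,
# because a date and a quarter can each appear in weather on different rows.
loose <- as.numeric(arr$Date %in% weather$Date &
                    arr$quarter %in% weather$quarter)
```

With these toy values, the row for 1/2/09, quarter 60 is flagged by the separate tests even though that pair never occurs in weather; the pasted key does not flag it.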
Here is the code:

gw = c(length(arr))
gw[1:length(arr[,1])]=FALSE
arr["gw"]=gw # put a column of 0s into arr
arr$Date <- as.Date(as.character(arr$Date),format="%m/%d/%y")
weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
weather$dq = paste(as.character(weather$Date), as.character(weather$quarter))
arr$dq = paste(as.character(arr$Date), as.character(arr$quarter))
arr$gw <- as.numeric(arr$dq %in% weather$dq)

Thanks for all the help,
Jim

On 1/17/10 10:16 PM, David Winsemius wrote:
>
> On Jan 17, 2010, at 9:22 PM, Dennis Murphy wrote:
>
>> I thought it was a solution :)
>>
>> The gw column is an indicator that is meant to match quarters with
>> good weather (in the weather data frame) to the quarters in the arr
>> data frame: 1 = good weather, 0 = not. I split up both data frames by
>> Date since the quarter values are in the same range from day to day,
>> and then ran a loop to generate the indicator gw by finding whether
>> the quarter in the i-th component of the arrl list matched the
>> corresponding reference table of quarters (of good weather) in the
>> wc list. do.call then slurps it all together into a data frame.
>> My understanding was that James just wanted to be able to distinguish
>> arrivals during good weather quarters from those in bad weather
>> quarters.
>
> This was my understanding, too. I was just trying to get to some point
> where that could be expressed in an R-encodable format and implemented.
>
>>
>> I did the tables as a sanity check, relative to the hand calculations
>> I did earlier as a ballpark estimate.
>
> The rather small dataset may have been enough. We will need to see
> what the OP says ... is that a sufficient solution?
>
>>
>> Does this make sense?
>>
>> Dennis
>>
>> On Sun, Jan 17, 2010 at 6:04 PM, David Winsemius
>> <dwinsem...@comcast.net> wrote:
>>
>> I'm not clear about your last message. Do you have a solution?
>>
>> --
>> David.
>>
>> On Jan 17, 2010, at 8:27 PM, Dennis Murphy wrote:
>>
>>> Hi James and David:
>>>
>>> I tried the following: split the quarters from weather into a
>>> list by Date, and ditto with arr.
>>> Then assign value of gw by running a (sleazy) loop over the
>>> components of the arr list...
>>>
>>> wc <- split(weather$quarter, weather$Date)
>>> arrl <- split(arr, arr$Date)
>>>
>>> # Note that there are four dates in wc and three in arrl...
>>> for(i in seq_along(arrl)) {
>>>    arrl[[i]]$gw <- as.numeric(arrl[[i]]$quarter %in% wc[[i]]) }
>>> arr2 <- do.call(rbind, arrl)
>>> dim(arr2)
>>> [1] 1126 9
>>> table(arr2$gw)
>>>
>>>   0   1
>>> 661 465
>>> with(arr2, table(Date, gw))
>>>             gw
>>> Date           0   1
>>>   2009-01-01 368  99
>>>   2009-01-02 266 348
>>>   2009-01-03  27  18
>>>
>>> OK, I was a bit off, but at least we know this is in the
>>> ballpark of my estimates :)
>>> I'm sure David will come up with something more elegant, but
>>> this seems to work.
>>>
>>> HTH,
>>> Dennis
>>>
>>>
>>> On Sun, Jan 17, 2010 at 5:17 PM, James Rome <jamesr...@gmail.com> wrote:
>>>
>>> Any entry in the weather data is a good day. That is the point. And
>>> please ignore my mistake about the quarters getting too large in
>>> weather. I am being swamped with versions, and it does not matter for
>>> this purpose.. so, the bad weather days are not in the weather data set.
>>>
>>> I am trying to get gw=1 in arr if the date and quarter are in weather.
>>>
>>> Thanks,
>>> Jim
>>>
>>> On 1/17/10 7:46 PM, David Winsemius wrote:
>>> > But, but, but .... there is no weather goodness variable in weather?!?!?!
>>> >
>>> > > str(weather)
>>> > 'data.frame': 155 obs. of 4 variables:
>>> > $ Date :Class 'Date' num [1:155] 14245 14245 14245 14245 14245 ...
>>> > $ minute : int 5 15 30 45 0 15 30 45 0 15 ...
>>> > $ hour : int 15 15 15 15 17 17 17 17 18 18 ...
>>> > $ quarter: int 65 75 90 105 68 83 98 113 72 87 ..
>>> >
>>> > I thought you said the "weather" dataframe would have some information
>>> > about "goodness" that we were supposed to map to arrivals? What is
>>> > the meaning of those variables? How do we define a "good" quarter
>>> > hour? And why are the values of quarter not 1, 2, 3, 4? They ought to
>>> > be a factor or integer that could be matched to those that are in
>>> > "arr", which are also apparently not so defined. Let's see a better
>>> > codebook or description of these variables.
>>> >
>>> > On Jan 17, 2010, at 6:47 PM, James Rome wrote:
>>> >
>>> >> Here are some sample data sets.
>>> >>
>>> >> I also tried making a combined field in each set such as
>>> >> adq=paste(as.character(arr$Date), as.character(arr$quarter))
>>> >> and similarly for the weather set, so I have unique single things to
>>> >> compare, but that did not seem to help much.
>>> >>
>>> >> Thanks,
>>> >> Jim
>>> >>
>>> >> On 1/17/10 5:50 PM, David Winsemius wrote:
>>> >>> My guess (since we still have no data on which to test these ideas)
>>> >>> is that you need either to merge() or to use a matrix created from the
>>> >>> dates and qtr-hours entries in "gw", since matching on dates and hours
>>> >>> separately will not uniquely classify the good qtr-hours within their
>>> >>> proper corresponding dates. You want a structure (or a matching
>>> >>> process) that takes:
>>> >>>        qhr1  qhr2  qhr3  qhr4 .......
>>> >>> date1  good  bad   good  bad
>>> >>> date2  bad   good  good  good
>>> >>> date3  bad   bad   bad   good
>>> >>> .
>>> >>> .
>>> >>> .
>>> >>> and lets you use the values in "arr" to get values in "gw". Notice
>>> >>> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>>> >>> simply will not accomplish anything correct.
>>> >>>
>>> >>> Merging by multiple criteria (with the merge function) would do that
>>> >>> or you could construct a matrix whose entries were the categories
>>> >>> good/bad.
>>> >>> The table function could create the matrix for the purpose of
>>> >>> using an indexed solution if you are dead-set against the merge
>>> >>> concept.
>>> >>>
>>> >>>
>>> >>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>> >>>
>>> >>>> Thank you Dennis.
>>> >>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
>>> >>>> seems to be what I want to do, but in fact, with the full data set, it
>>> >>>> misidentifies the rows, so I think the error message must mean
>>> >>>> something.
>>> >>>>
>>> >>>> > arrr$Date <- as.Date(as.character(arr$Date),format="%m/%d/%y")
>>> >>>> > weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>> >>>> > gw = c(length(arrr))
>>> >>>> > gw[1:length(arrr[,1])]=FALSE
>>> >>>> > gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>> >>>> Warning in `==.default`(arr$Date, weather$Date) :
>>> >>>> longer object length is not a multiple of shorter object length
>>> >>>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>> >>>> longer object length is not a multiple of shorter object length
>>> >>>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> >>>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> >>>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> >>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> >>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> >>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> >>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> >>>> [260] 0 0 0 0 0 0 0 0
>>> >>>>
>>> >>>> There are many
>>> >>>> many more matches in the 99k line arrival data set.
>>> >>>>
>>> >>>> Thanks a bunch,
>>> >>>> Jim
>>> >>>>
>>> >>>>
>>> >>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>> >>>>> Hi:
>>> >>>>>
>>> >>>>> To read a data set from a R-help message into R, one uses
>>> >>>>> read.table(textConnection("<verbatim text>"), ...)
>>> >>>>>
>>> >>>>> Your weather data set had
>>> >>>>> (a) a variable name with a space in it, that R misread and had to be
>>> >>>>> altered manually;
>>> >>>>> (b) a missing value with no NA that R interpreted as an incomplete
>>> >>>>> line; again, it had to be altered manually.
>>> >>>>>
>>> >>>>> This is why David suggested the use of dput(), so that these vagaries
>>> >>>>> don't have to be dealt with by those who are trying to help.
>>> >>>>>
>>> >>>>> That being said, for the example that you gave and the desired value
>>> >>>>> that you wanted, try
>>> >>>>>
>>> >>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
>>> >>>>>
>>> >>>>> (I changed DateTime to Date in the arr data frame...)
>>> >>>>>
>>> >>>>> You'll get warnings like
>>> >>>>>
>>> >>>>> Warning messages:
>>> >>>>> 1: In is.na(e1) | is.na(e2) :
>>> >>>>> longer object length is not a multiple of shorter object length
>>> >>>>>
>>> >>>>> but it seems to do the right thing. The first equality is there to
>>> >>>>> constrain matches for quarter to be within the same day.
>>> >>>>>
>>> >>>>> For future reference,
>>> >>>>>
>>> >>>>> > dput(weather)
>>> >>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> >>>>> class = "factor"),
>>> >>>>> minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>> >>>>> ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>> >>>>> c("Date", "minute", "hour", "quarter", "efficiency"),
>>> >>>>> class = "data.frame", row.names = c(NA, -4L))
>>> >>>>> > dput(arr)
>>> >>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> >>>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> >>>>> class = "factor"),
>>> >>>>> weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>> >>>>> 5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>> >>>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>> >>>>> quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>> >>>>> 60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
>>> >>>>> structure(c(6L,
>>> >>>>> 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
>>> >>>>> 2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
>>> >>>>> "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
>>> >>>>> 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
>>> >>>>> 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
>>> >>>>> "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>> >>>>> "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
>>> >>>>> "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
>>> >>>>> gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>> >>>>> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>> >>>>> FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>> >>>>> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class =
>>> >>>>> "data.frame")
>>> >>>>>
>>> >>>>> These can be copied and pasted directly into an R session without
>>> >>>>> modification.
>>> >>>>>
>>> >>>>> HTH,
>>> >>>>> Dennis
>>> >>>>>
>>> >>>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesr...@gmail.com> wrote:
>>> >>>>>
>>> >>>>>
>>> >>>>> On 1/17/10 1:06 PM, David Winsemius wrote:
>>> >>>>>>
>>> >>>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>> >>>>>>
>>> >>>>>>> I don't think it is that simple because it is not a one-to-one
>>> >>>>>>> match. In the arr data frame, there are many arrivals in a
>>> >>>>>>> quarter hour with good weather on a given day. So I need to
>>> >>>>>>> match the date and the quarter hour.
>>> >>>>>>>
>>> >>>>>>> And all of the rows in the weather data frame are times with good
>>> >>>>>>> weather--unique date + quarter hour. That is why I needed the
>>> >>>>>>> loop. For each date and quarter hour in weather, I want to mark
>>> >>>>>>> all the entries with the corresponding date and weather as TRUE
>>> >>>>>>> in the arr$gw column.
>>> >>>>>>>
>>> >>>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>> >>>>>>> gooddates = function(all, good) {
>>> >>>>>>>   la = length(all) # All the arrivals
>>> >>>>>>>   lw = length(good) # The good 15-minute periods
>>> >>>>>>>   for(j in 1:lw) {
>>> >>>>>>>     d=good$Date[j]
>>> >>>>>>>     q=good$quarter[j]
>>> >>>>>>>     all$gw[all$Date==d && all$quarter==q]=TRUE
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> You are attempting a vectorized test and assignment with "&&" which
>>> >>>>>> seems unlikely to succeed, but even then I am not sure your problems
>>> >>>>>> would be over. (I'm also guessing that you might not have reported a
>>> >>>>>> warning.)
>>> >>>>>
>>> >>>>> Why shouldn't the && succeed?
>>> >>>>> You are correct there, because I do get items if I use either
>>> >>>>> part of this and test, when I insert the &&, I get no hits.
>>> >>>>> And I got no warnings.
>>> >>>>>>
>>> >>>>>> Why not merge arr to gw by date and quarter?
>>> >>>>> The sets contain different data, and the only thing I want from the
>>> >>>>> weather set is the fact that it has an entry for a given date and
>>> >>>>> time
>>> >>>>>>
>>> >>>>>> Answering these questions would be greatly speeded up with a small
>>> >>>>>> sample dataset. Are you aware of the virtues of the dput function?
>>> >>>>>>
>>> >>>>>
>>> >>>>> What I want is for a 1 to be in the gw column in the quarter
>>> >>>>> 60,61,62,63,...
>>> >>>>>
>>> >>>>> For example, here is some data from the good weather set:
>>> >>>>> Date    minute  hour  quarter  Efficiency Val
>>> >>>>> 1/1/09   5      15    60
>>> >>>>> 1/1/09  15      15    61       72
>>> >>>>> 1/1/09  30      15    62       63.3
>>> >>>>> 1/1/09  45      15    63       85.4
>>> >>>>>
>>> >>>>>
>>> >>>>> And this is from the arrivals set:
>>> >>>>> DateTime  weekday  month  quarter  ICAO  Flight   gw
>>> >>>>>
>>> >>>>> 1/1/09    5        1      59       COA   COA349   0
>>> >>>>> 1/1/09    5        1      59       NWA   NWA9934  0
>>> >>>>> 1/1/09    5        1      60       JBU   JBU554   0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2347  0
>>> >>>>> 1/1/09    5        1      60       COA   COA886   0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2916  0
>>> >>>>> 1/1/09    5        1      60       CJC   CJC3225  0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2085  0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2064  0
>>> >>>>> 1/1/09    5        1      60       AAL   AAL842   0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA1234  0
>>> >>>>> 1/1/09    5        1      60       CJC   CJC3359  0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA3072  0
>>> >>>>> 1/1/09    5        1      61       BTA   BTA3086  0
>>> >>>>> 1/1/09    5        1      61       COA   COA1166  0
>>> >>>>> 1/1/09    5        1      61       COA   COA855   0
>>> >>>>> 1/1/09    5        1      61       AWE   AWE307   0
>>> >>>>> 1/1/09    5        1      66       CHQ   CHQ5312  0
>>> >>>>> 1/1/09    5        1      67       BTA   BTA2405  0
>>> >>>>>
>>> >>>>>
>>> >>>>> [[alternative HTML version
deleted]]
>>> >>>>>
>>> >>>>> ______________________________________________
>>> >>>>> R-help@r-project.org mailing list
>>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>>>> PLEASE do read the posting guide
>>> >>>>> http://www.R-project.org/posting-guide.html
>>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>>> >>>
>>> >>> David Winsemius, MD
>>> >>> Heritage Laboratories
>>> >>> West Hartford, CT
>>> >>>
>>> >> <arr.rda><weather.rda>