Any entry in the weather data is a good day. That is the point. And please ignore my mistake about the quarters getting too large in weather; I am being swamped with versions, and it does not matter for this purpose. So the bad weather days are simply not in the weather data set.
I am trying to get gw=1 in arr if the date and quarter are in weather.

Thanks,
Jim

On 1/17/10 7:46 PM, David Winsemius wrote:
> But, but, but .... there is no weather goodness variable in weather?!?!?!
>
> > str(weather)
> 'data.frame': 155 obs. of 4 variables:
> $ Date   :Class 'Date' num [1:155] 14245 14245 14245 14245 14245 ...
> $ minute : int 5 15 30 45 0 15 30 45 0 15 ...
> $ hour   : int 15 15 15 15 17 17 17 17 18 18 ...
> $ quarter: int 65 75 90 105 68 83 98 113 72 87 ...
>
> I thought you said the "weather" dataframe would have some information
> about "goodness" that we were supposed to map to arrivals.? What is
> the meaning of those variables? How do we define a "good" quarter
> hour? And why are the values of quarter not 1, 2, 3, 4? They ought to
> be a factor or integer that could be matched to those that are in
> "arr", which are also apparently not so defined. Let's see a better
> codebook or description of these variables.
>
> On Jan 17, 2010, at 6:47 PM, James Rome wrote:
>
>> Here are some sample data sets.
>>
>> I also tried making a combined field in each set such as
>> adq=paste(as.character(arr$Date), as.character(arr$quarter))
>> and similarly for the weather set, so I have unique single things to
>> compare, but that did not seem to help much.
>>
>> Thanks,
>> Jim
>>
>> On 1/17/10 5:50 PM, David Winsemius wrote:
>>> My guess (since we still have no data on which to test these ideas)
>>> is that you need either to merge() or to use a matrix created from the
>>> dates and qtr-hours entries in "gw", since matching on dates and hours
>>> separately will not uniquely classify the good qtr-hours within their
>>> proper corresponding dates. You want a structure (or a matching
>>> process) that takes:
>>>        qhr1  qhr2  qhr3  qhr4 .......
>>> date1  good  bad   good  bad
>>> date2  bad   good  good  good
>>> date3  bad   bad   bad   good
>>> .
>>> .
>>> .
>>> and lets you use the values in "arr" to get values in "gw".
>>> Notice that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>>> simply will not accomplish anything correct.
>>>
>>> Merging by multiple criteria (with the merge function) would do that
>>> or you could construct a matrix whose entries were the categories
>>> good/bad. The table function could create the matrix for the purpose of
>>> using an indexed solution if you are dead-set against the merge
>>> concept.
>>>
>>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>>
>>>> Thank you Dennis.
>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>>> weather$quarter)
>>>> seems to be what I want to do, but in fact, with the full data set, it
>>>> misidentifies the rows, so I think the error message must mean
>>>> something.
>>>>
>>>> > arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>>> > weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>>> > gw = c(length(arrr))
>>>> > gw[1:length(arrr[,1])]=FALSE
>>>> > gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>>> Warning in `==.default`(arr$Date, weather$Date) :
>>>> longer object length is not a multiple of shorter object length
>>>> Warning in arr$Date == weather$Date & weather$quarter %in%
>>>> arr$quarter :
>>>> longer object length is not a multiple of shorter object length
>>>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> [260] 0 0 0 0
>>>> 0 0 0 0
>>>>
>>>> There are many many more matches in the 99k line arrival data set.
>>>>
>>>> Thanks a bunch,
>>>> Jim
>>>>
>>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>>>> Hi:
>>>>>
>>>>> To read a data set from a R-help message into R, one uses
>>>>> read.table(textConnection("<verbatim text>"), ...)
>>>>>
>>>>> Your weather data set had
>>>>> (a) a variable name with a space in it, that R misread and had to be
>>>>> altered manually;
>>>>> (b) a missing value with no NA that R interpreted as an incomplete
>>>>> line; again, it had to be altered manually.
>>>>>
>>>>> This is why David suggested the use of dput(), so that these vagaries
>>>>> don't have to be dealt with by those who are trying to help.
>>>>>
>>>>> That being said, for the example that you gave and the desired value
>>>>> that you wanted, try
>>>>>
>>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>>>> weather$quarter)
>>>>>
>>>>> (I changed DateTime to Date in the arr data frame...)
>>>>>
>>>>> You'll get warnings like
>>>>>
>>>>> Warning messages:
>>>>> 1: In is.na(e1) | is.na(e2) :
>>>>> longer object length is not a multiple of shorter object length
>>>>>
>>>>> but it seems to do the right thing. The first equality is there to
>>>>> constrain matches for quarter to be within the same day.
>>>>>
>>>>> For future reference,
>>>>>
>>>>> > dput(weather)
>>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>>> class = "factor"),
>>>>> minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>>>> ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>>>> c("Date",
>>>>> "minute", "hour", "quarter", "efficiency"), class = "data.frame",
>>>>> row.names = c(NA,
>>>>> -4L))
>>>>> > dput(arr)
>>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>>> class = "factor"),
>>>>> weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>>>> 5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>>>> quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>>>> 60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
>>>>> structure(c(6L,
>>>>> 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
>>>>> 2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
>>>>> "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
>>>>> 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
>>>>> 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
>>>>> "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>>>> "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
>>>>> "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
>>>>> gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>>>> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>>>> FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>>>> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class =
>>>>> "data.frame")
>>>>>
>>>>> These can be copied and pasted directly into an R session without
>>>>> modification.
>>>>>
>>>>> HTH,
>>>>> Dennis
>>>>>
>>>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesr...@gmail.com> wrote:
>>>>>
>>>>> On 1/17/10 1:06 PM, David Winsemius wrote:
>>>>>>
>>>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>>>>
>>>>>>> I don't think it is that simple because it is not a one-to-one match. In
>>>>>>> the arr data frame, there are many arrivals in a quarter hour with good
>>>>>>> weather on a given day. So I need to match the date and the quarter
>>>>>>> hour.
>>>>>>>
>>>>>>> And all of the rows in the weather data frame are times with good
>>>>>>> weather--unique date + quarter hour. That is why I needed the loop. For
>>>>>>> each date and quarter hour in weather, I want to mark all the entries
>>>>>>> with the corresponding date and weather as TRUE in the arr$gw column.
>>>>>>>
>>>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>>>>> gooddates = function(all, good) {
>>>>>>>   la = length(all) # All the arrivals
>>>>>>>   lw = length(good) # The good 15-minute periods
>>>>>>>   for(j in 1:lw) {
>>>>>>>     d=good$Date[j]
>>>>>>>     q=good$quarter[j]
>>>>>>>     all$gw[all$Date==d && all$quarter==q]=TRUE
>>>>>>
>>>>>> You are attempting a vectorized test and assignment with "&&" which
>>>>>> seems unlikely to succeed, but even then I am not sure your problems
>>>>>> would be over. (I'm also guessing that you might not have reported a
>>>>>> warning.)
>>>>>
>>>>> Why shouldn't the && succeed? You are correct there, because I do get
>>>>> items if I use either part of this and test, when I insert the &&, I get
>>>>> no hits. And I got no warnings.
>>>>>>
>>>>>> Why not merge arr to gw by date and quarter?
>>>>> The sets contain different data, and the only thing I want from the
>>>>> weather set is the fact that it has an entry for a given date and
>>>>> time
>>>>>>
>>>>>> Answering these questions would be greatly speeded up with a small
>>>>>> sample dataset. Are you aware of the virtues of the dput function?
>>>>>>
>>>>>
>>>>> What I want is for a 1 to be in the gw column in the quarter
>>>>> 60,61,62,63,...
>>>>>
>>>>> For example, here is some data from the good weather set:
>>>>> Date    minute  hour  quarter  Efficiency Val
>>>>> 1/1/09       5    15       60
>>>>> 1/1/09      15    15       61  72
>>>>> 1/1/09      30    15       62  63.3
>>>>> 1/1/09      45    15       63  85.4
>>>>>
>>>>> And this is from the arrivals set:
>>>>> DateTime  weekday  month  quarter  ICAO  Flight   gw
>>>>>
>>>>> 1/1/09          5      1       59  COA   COA349    0
>>>>> 1/1/09          5      1       59  NWA   NWA9934   0
>>>>> 1/1/09          5      1       60  JBU   JBU554    0
>>>>> 1/1/09          5      1       60  BTA   BTA2347   0
>>>>> 1/1/09          5      1       60  COA   COA886    0
>>>>> 1/1/09          5      1       60  BTA   BTA2916   0
>>>>> 1/1/09          5      1       60  CJC   CJC3225   0
>>>>> 1/1/09          5      1       60  BTA   BTA2085   0
>>>>> 1/1/09          5      1       60  BTA   BTA2064   0
>>>>> 1/1/09          5      1       60  AAL   AAL842    0
>>>>> 1/1/09          5      1       60  BTA   BTA1234   0
>>>>> 1/1/09          5      1       60  CJC   CJC3359   0
>>>>> 1/1/09          5      1       60  BTA   BTA3072   0
>>>>> 1/1/09          5      1       61  BTA   BTA3086   0
>>>>> 1/1/09          5      1       61  COA   COA1166   0
>>>>> 1/1/09          5      1       61  COA   COA855    0
>>>>> 1/1/09          5      1       61  AWE   AWE307    0
>>>>> 1/1/09          5      1       66  CHQ   CHQ5312   0
>>>>> 1/1/09          5      1       67  BTA   BTA2405   0
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>> <arr.rda><weather.rda>
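
For anyone reading this thread later: the goal stated at the top (gw=1 in arr when the date and quarter occur together in weather) can be sketched as below. This is only a minimal sketch, not code that anyone posted in the thread. The toy data frames, the helper names arr_key, weather_key, naive, good and merged, and the extra column gw2 are all made up for illustration. It assumes both Date columns are already of class Date and that quarter is coded the same way in both sets.

```r
# Toy data (hypothetical), shaped like the samples in the thread.
weather <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(60L, 61L, 59L)
)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01",
                      "2009-01-02", "2009-01-02")),
  quarter = c(59L, 60L, 60L, 59L, 60L),
  Flight  = c("COA349", "JBU554", "BTA2347", "AAL842", "CHQ5312"),
  stringsAsFactors = FALSE
)

# Testing date and quarter separately marks every row here, because each
# date and each quarter appears somewhere in weather, just not together.
naive <- arr$Date %in% weather$Date & arr$quarter %in% weather$quarter

# Option 1: one combined key per (date, quarter) pair, tested jointly.
arr_key     <- paste(arr$Date, arr$quarter)
weather_key <- paste(weather$Date, weather$quarter)
arr$gw      <- as.integer(arr_key %in% weather_key)

# Option 2: merge on both columns; arrivals without a match get NA, set to 0.
# Note that merge() may reorder the rows of its result.
good        <- unique(weather[, c("Date", "quarter")])
good$gw2    <- 1L
merged      <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
merged$gw2[is.na(merged$gw2)] <- 0L

print(arr)
print(merged)
```

Both options test the pair as a unit, which is the property David describes for merge(). The == comparison between arr$Date and weather$Date recycles the shorter vector position by position, which is where the "longer object length" warnings come from.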