[R] Bestglm subset analysis
Hello All,

I am working on a linear regression model and trying to find the best subset of variables for my dataset. I have 21 predictors, 1 response variable, and 79 observations. I need to find the best 5 or 6 predictors for my model. I've used leaps for lm() and I'm now trying bestglm for glm(). I'm following this webpage, which gives the code below.

https://rstudio-pubs-static.s3.amazonaws.com/2897_9220b21cfc0c43a396ff9abf122bb351.html

My code:

    library(bestglm)
    library(base)
    lbw.for.bestglm <- within(df_Chl, { y <- df_Chl$Chloro })
    res.bestglm <- bestglm(Xy = lbw.for.bestglm, family = gaussian, IC = "AIC", method = "exhaustive")
    # get coefficients
    res.bestglm$BestModels

Here is a sample of my results (I removed the 5th through 21st predictors for brevity).

    > res.bestglm$BestModels
        R21   R31   R32   R41 Criterion
    1 FALSE FALSE FALSE FALSE  326.7327
    2 FALSE  TRUE FALSE FALSE  326.9525
    3 FALSE FALSE FALSE FALSE  327.0659
    4 FALSE  TRUE FALSE FALSE  327.0912
    5 FALSE  TRUE FALSE FALSE  327.8208

Is it correct to assume I should keep the variables that are TRUE in rows 1 through 5? What do those five rows represent? I know the AIC criterion should be as low as possible. Is it possible to discern a good result for any of the information criteria, such as AIC, LOOCV, BICg, etc.? If BIC returns lower Criterion values, does that mean I need to use the BIC subset instead of the subset from AIC?

Thank You,
Doug

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
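As far as the bestglm documentation describes it, BestModels is a table of the top few candidate models (five by default), one model per row, sorted by the chosen criterion: TRUE marks a predictor that is in that model, and row 1 is the model with the lowest criterion value. The sketch below reproduces that layout in base R so the rows can be read directly. It is an illustration under stated assumptions, not the package's algorithm: mtcars stands in for the poster's data, and four arbitrarily chosen predictors stand in for the 21 so that every subset can be enumerated.

```r
# Illustrative only: mtcars replaces the poster's df_Chl, and the four
# predictors below are arbitrary stand-ins for the 21 in the question.
predictors <- c("wt", "hp", "qsec", "drat")
n_pred <- length(predictors)

# Every non-empty subset as one row of TRUE/FALSE flags, like BestModels.
flags <- expand.grid(rep(list(c(FALSE, TRUE)), n_pred))
names(flags) <- predictors
flags <- flags[rowSums(flags) > 0, , drop = FALSE]

# Fit one Gaussian glm per subset.
fits <- lapply(seq_len(nrow(flags)), function(i) {
  vars <- predictors[unlist(flags[i, ])]
  glm(reformulate(vars, response = "mpg"), data = mtcars, family = gaussian)
})

# Record both criteria for every candidate model.
flags$AIC <- sapply(fits, AIC)
flags$BIC <- sapply(fits, BIC)

# Five best subsets by AIC: row 1 is the winner, rows 2-5 are runners-up.
top_aic <- head(flags[order(flags$AIC), ], 5)
print(top_aic)

# Each criterion picks its own winner; compare models within one column only.
best_by_aic <- flags[which.min(flags$AIC), predictors]
best_by_bic <- flags[which.min(flags$BIC), predictors]
```

Read this way, the TRUE entries are not pooled across rows 1 through 5; each row is a separate model, and close criterion values (as in the output above) suggest the data do not strongly separate the candidates. AIC and BIC apply different penalties, so their numbers are not on a common scale: a smaller BIC value than AIC value does not by itself favour the BIC subset.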
Re: [R] Reading a datetime vector
In addition to my previous message, DF_extract_clean.R is the program in the dropbox folder that I am currently working on.

Doug

On Tuesday, February 23, 2016 4:02 AM, Jim Lemon wrote:

Hi Doug,
It is difficult for us to work out what is happening as we don't have access to a toy data set that we can play with. Excel spreadsheets are one of those things that you can't just attach to your email to the help list. If there is somewhere you can leave a _small_ Excel sample file (take the first 10 rows, say) that we can download (Google Drive, Dropbox?) and include the URL in your email, maybe someone can offer more than guesses.

Jim
Re: [R] Reading a datetime vector
Hello Everyone,

The column begins populated with integers as so:

    1/1/2013 0:00 in the spreadsheet equals 41257 in R's dataframe
    1/1/2013 0:15 in the spreadsheet equals 41257.01041664 in R's dataframe
    ...

41257 must be in minutes since 1440min/day * .01041664 day = 15 minutes. 41257 minutes is about 29 days: 41257 min / 1440 min/day = 28.65 days. So I don't know why the dataframe is showing 41257 for 1/1/2013 0:00.

Oddly, R sees the vector as NULL despite the fact it has integers in each record in the column:

    data_type = str(df2_TZ$DateTimeStamp)

produces a NULL (empty) variable.

I tried:

    df2_TZ = read.xlsx2("DF_exp.xlsx", sheetName = "Sheet1")
    Sys.setenv(TZ = "GMT")
    testdtm <- as.POSIXct(df2_TZ$DateTimeStamp, format = "%m/%d/%Y %H:%M")
    # Inspect the result
    testdtm
    str(testdtm)

testdtm is a vector filled with NA values, which figures since DateTimeStamp is NULL.

I noticed in the table on page 32 of the R Help Desk pdf you linked to that dp-as.POSIXct(format(dp, tz="GMT")) is the only option listed for time zone difference. So I tried:

    df2_TZ = read.xlsx2("DF_exp.xlsx", sheetName = "Sheet1")
    df2_TZ_seq <- as.POSIXct(format(dt2_TZ, tz="GMT"))

and got:

    Error in format(dt2_TZ, tz = "GMT") : object 'dt2_TZ' not found

Is the vector neither character nor factor, since it's NULL? Where do I go from here?

Thank You,
Doug

Hi Doug,
What you have done is to ask whether the character string "DF_exp.xlsx" is a character string. I think Yogi Berra, were he still around, could have told you that. What will give you some useful information is:

    str(DF_exp.xlsx)

which asks for information about the object, not its name.

Jim

On Friday, February 19, 2016 12:41 PM, Jeff Newmiller wrote:

This is a mailing list. I don't know how you are interacting with it... using a website rather than an email program can lead to some confusion since there can be many ways to accomplish the task of interacting with the mailing list. My email program has a "reply-all" button when I am looking at an email. It also has an option to write the email in plain text, which often prevents the message from getting corrupted (recipient not seeing what you sent to the list).

Using the str function on a literal string (the name of a file) will indeed tell you that you gave it a character string. Specifying a column in your data might tell you something more interesting... e.g.

    str( df2_TZ$DateTimeStamp )

If that says you have character data then Jim Lemon's suggestion would be a good next thing to look at. If it is factor data then you should use the as.character function on the data column and then follow Jim's suggestion. If it is numeric then you probably need to convert it using an appropriate origin (e.g. as described at [1] or [2]).

I have had best luck setting the default timezone string when converting to POSIXt types... e.g.

    # specify timezone assumed by input data
    Sys.setenv( TZ="GMT" )
    testdtm <- as.POSIXct( "1/1/2016 00:00", format = "%m/%d/%Y %H:%M" )
    # inspect the result
    testdtm
    str( testdtm )
    # view data from a different timezone
    Sys.setenv( TZ="Etc/GMT+8" )
    # no change to the underlying data, but it prints out differently now
    # because the tz attribute is "" which implies using the default TZ
    testdtm

[1] http://blog.mollietaylor.com/2013/08/date-formats-in-r.html
[2] https://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf

--
Sent from my phone. Please excuse my brevity.

On February 19, 2016 7:48:31 AM PST, D Wolf wrote:

Hello Jeff,

I ran str() on the vector and it returned character.

    > str("DF_exp.xlsx")
     chr "DF_exp.xlsx"

This is my first thread on this forum, and I'm not sure how to reply to the thread instead of just sending the reply to your email account; I don't see a 'reply' link in the thread. I've read this page and I don't think it advises on how to reply in the thread:

R: Posting Guide: How to ask good questions that prompt useful answers (www.r-project.org)

Thank You,
Doug Wolfinger

On Friday, February 19, 2016 12:51 AM, Jeff Newmiller wrote:

You are being rather scattershot in your explanation, so I suspect you are not being systematic in your troubleshooting. Use the str function to examine the data column after you pull it in from excel. It may be numeric, factor, or character, and the approach depends on which that function returns.

On F
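Two points from the exchange above can be sketched in a few lines. First, str() prints its report and returns NULL invisibly, so assigning its result always gives NULL regardless of what the column holds; class() is the call that returns something storable. Second, a number whose fractional part .01041664 equals 15 minutes as a fraction of a day looks like an Excel serial date, which counts days (not minutes) from 1899-12-30 on Windows workbooks. The values below are made up for illustration, and the assumption that read.xlsx2() delivered the column as text is mine, not something the thread confirms.

```r
# Assumption: the column arrived as text holding Excel serial day numbers.
# These two values are illustrative, not taken from the poster's file.
raw <- c("41275", "41275.01041664")

# str() prints the structure but returns NULL, so data_type is NULL here
# even though raw is a perfectly good character vector.
data_type <- str(raw)
col_class <- class(raw)

# Serial days -> whole seconds -> POSIXct, using the Windows Excel origin.
serial_days <- as.numeric(raw)
secs <- round(serial_days * 86400)   # rounding avoids 00:14:59 artefacts
stamps <- as.POSIXct(secs, origin = "1899-12-30", tz = "GMT")

format(stamps, "%m/%d/%Y %H:%M", tz = "GMT")
# "01/01/2013 00:00" "01/01/2013 00:15"
```

Under this origin it is serial 41275 that falls on 1/1/2013, while 41257 falls on 12/14/2012, so the figure quoted in the message may be a transposition or the spreadsheet may differ from what was described. Workbooks saved with the 1904 date system would need origin "1904-01-01" instead.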
Re: [R] Reading a datetime vector
Hello Jim,

I ran str() on the vector and it returned character:

    str("DF_exp.xlsx")
     chr "DF_exp.xlsx"

I tried

    df2_TZ$DateTimeStamp <- strptime(as.Date(as.character(df2_TZ$DateTimeStamp, format = "%m/%d/%Y %H:%M", tz = "GMT")))

which produced an error:

    Error in charToDate(x) : character string is not in a standard unambiguous format

In Excel, the column is formatted to m/d/ h:mm

Removing %S from these lines

    df2_TZ$DateTimeStamp = as.POSIXct(df2_TZ$DateTimeStamp, format="%m/%d/%Y %H:%M", tz="GMT")
    df2_TZ$DateTimeStamp = as.POSIXct(as.character(df2_TZ$DateTimeStamp), format = "%m/%d/%Y %H:%M")

made the column NA.

Thank You,
Doug Wolfinger

On Friday, February 19, 2016 1:35 AM, Jim Lemon wrote:

Hi Doug,
For one thing, you may be using the wrong format. Your example format has no seconds field. The other thing to watch is whether the data are in %m/%d/%Y or %d/%m/%Y date format. If the latter, you would probably get that error on dates like 19/02/2016.

Jim

On Fri, Feb 19, 2016 at 8:12 AM, D Wolf via R-help wrote:

Hello,
I am trying to read a data frame column named DateTimeStamp. The time is in GMT in this format: 1/4/2013 23:30

    require(xlsx)
    df2_TZ = read.xlsx2("DF_exp.xlsx", sheetName = "Sheet1")

It works up to that line, which makes the data frame. But each of these three lines converts the column's values to NA:

    df2_TZ$DateTimeStamp = as.POSIXct(df2_TZ$DateTimeStamp, format="%m/%d/%Y %H:%M:%S", tz="GMT")

and...

    df2_TZ$DateTimeStamp = as.POSIXct(as.character(df2_TZ$DateTimeStamp), format = "%m/%d/%Y %H:%M:%S")

and...

    df2_TZ$DateTimeStamp = as.Date(df2_TZ$DateTimeStamp, format = "%m/%d/%Y %H:%M:%S")

This line returns an error...

    df2_TZ$DateTimeStamp = as.POSIXct(as.Date(df2_TZ$DateTimeStamp), format = "%m/%d/%Y %H:%M:%S")

    "Error in charToDate(x) : character string is not in a standard unambiguous format"

Additionally, I need to convert from GMT to North American time zones, and I think the advice on this page would be good for that: http://blog.revolutionanalytics.com/2009/06/converting-time-zones.html

My ultimate goal is to write an R program that finds data in another variable in df2_TZ that corresponds to a date and time that match up with the date and time in another data frame. For now, any help reading the column would be much appreciated.

Thank You,
Doug
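Jim's two cautions, and the charToDate error itself, can be checked on literal strings before touching the spreadsheet. The sketch below is a small experiment with made-up values, not a diagnosis of the poster's file: it shows that a format asking for seconds fails on text without seconds, that a day-first date fails a month-first format, and that as.Date() without a format raises an error on text such as a bare serial number. That last case is an assumption about what the column may have contained.

```r
# Made-up inputs: one month-first stamp and one day-first stamp.
x <- c("1/4/2013 23:30", "19/02/2016 08:15")

# A format with %S cannot match text that has no seconds: all NA.
with_seconds <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "GMT")

# Without %S the month-first stamp parses; the day-first one is still NA
# because 19 is not a valid month.
month_first <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "GMT")

# as.Date() with no format tries year-first layouts only; text such as a
# bare serial number matches neither, which raises an error rather than NA.
err <- tryCatch(as.Date("41275"), error = function(e) e)

print(with_seconds)
print(month_first)
print(conditionMessage(err))
```

If dropping %S still gives NA for every row, as reported above, the text being parsed probably does not look like "1/4/2013 23:30" at all, which is a reason to inspect the column with str(df2_TZ$DateTimeStamp) or head() before choosing a format.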
[R] Reading a datetime vector
Hello,
I am trying to read a data frame column named DateTimeStamp. The time is in GMT in this format: 1/4/2013 23:30

    require(xlsx)
    df2_TZ = read.xlsx2("DF_exp.xlsx", sheetName = "Sheet1")

It works up to that line, which makes the data frame. But each of these three lines converts the column's values to NA:

    df2_TZ$DateTimeStamp = as.POSIXct(df2_TZ$DateTimeStamp, format="%m/%d/%Y %H:%M:%S", tz="GMT")

and...

    df2_TZ$DateTimeStamp = as.POSIXct(as.character(df2_TZ$DateTimeStamp), format = "%m/%d/%Y %H:%M:%S")

and...

    df2_TZ$DateTimeStamp = as.Date(df2_TZ$DateTimeStamp, format = "%m/%d/%Y %H:%M:%S")

This line returns an error...

    df2_TZ$DateTimeStamp = as.POSIXct(as.Date(df2_TZ$DateTimeStamp), format = "%m/%d/%Y %H:%M:%S")

    "Error in charToDate(x) : character string is not in a standard unambiguous format"

Additionally, I need to convert from GMT to North American time zones, and I think the advice on this page would be good for that: http://blog.revolutionanalytics.com/2009/06/converting-time-zones.html

My ultimate goal is to write an R program that finds data in another variable in df2_TZ that corresponds to a date and time that match up with the date and time in another data frame. For now, any help reading the column would be much appreciated.

Thank You,
Doug
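The two follow-on goals in this post, showing GMT stamps on a North American clock and matching rows between two data frames by date and time, might look like the sketch below once the column parses. Everything in it is hypothetical: the data frames readings and events, their columns, the helper parse_gmt(), and the choice of "America/New_York" are stand-ins, since the thread never shows the second data frame or names a target zone.

```r
# Hypothetical stand-ins for the poster's two data frames.
readings <- data.frame(
  DateTimeStamp = c("1/4/2013 23:15", "1/4/2013 23:30", "1/4/2013 23:45"),
  value = c(10.1, 10.4, 10.9),
  stringsAsFactors = FALSE
)
events <- data.frame(
  DateTimeStamp = "1/4/2013 23:30",
  label = "sample A",
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse month-first text, declaring the input as GMT.
parse_gmt <- function(x) {
  as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "GMT")
}
readings$DateTimeStamp <- parse_gmt(readings$DateTimeStamp)
events$DateTimeStamp <- parse_gmt(events$DateTimeStamp)

# Match on the instant itself; the time zone only affects how it prints.
matched <- merge(events, readings, by = "DateTimeStamp")

# The same instant displayed on an example North American clock.
matched$local <- format(matched$DateTimeStamp, tz = "America/New_York",
                        usetz = TRUE)
print(matched)
```

Keeping the stored values in GMT and converting only for display avoids mismatches during the merge, because both data frames are compared as instants rather than as locally formatted text.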