[R] Format utils::bibentry with a 'corporate name'
Hi all,

Does anyone know of a way to force utils::bibentry() to mimic the BibTeX behaviour of using doubled braces to make a "corporate name" in the author field print correctly? For example, take this bibentry:

entry <- utils::bibentry(
  bibtype = "Manual",
  title = "The Thing",
  author = "The Data People",
  organization = "The Data Org",
  year = format(Sys.Date(), "%Y")
)

entry
#> People TD (2021). _The Thing_. The Data Org.

print(entry, style = "citation")
#>
#> People TD (2021). _The Thing_. The Data Org.
#>
#> A BibTeX entry for LaTeX users is
#>
#>   @Manual{,
#>     title = {The Thing},
#>     author = {The Data People},
#>     organization = {The Data Org},
#>     year = {2021},
#>   }

I can simply add "{" and "}" right in the author string, which then passes through to the BibTeX entry, but the author field is still treated as a person's name and I also get some warnings:

entry <- utils::bibentry(
  bibtype = "Manual",
  title = "The Thing",
  author = "{The Data People}",
  organization = "The Data Org",
  year = format(Sys.Date(), "%Y")
)

print(entry, style = "citation")
#> Warning in parseLatex(x): x:1: unexpected '}'
#> Warning in parseLatex(x): x:1: unexpected END_OF_INPUT 'The'
#> Warning in parseLatex(x): x:1: unexpected '}'
#> Warning in parseLatex(x): x:1: unexpected END_OF_INPUT 'The'
#> Warning in withCallingHandlers(.External2(C_parseRd, tcon, srcfile, "UTF-8", :
#>   :1: unexpected '}'
#> Warning in withCallingHandlers(.External2(C_parseRd, tcon, srcfile, "UTF-8", :
#>   :4: unexpected END_OF_INPUT 'The Data Org.
#> '
#>
#> People D (2021). _The Thing_. The Data Org.
#>
#> A BibTeX entry for LaTeX users is
#>
#>   @Manual{,
#>     title = {The Thing},
#>     author = {{The Data People}},
#>     organization = {The Data Org},
#>     year = {2021},
#>   }

Any thoughts?
Thanks in advance, Sam __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] file.access returning -1 for a file on remote Windows drive.
Thanks Jeff. And for future readers head here: https://github.com/eddelbuettel/digest/issues/49 and here: https://github.com/eddelbuettel/digest/issues/13

Sam

On Fri, Feb 28, 2020 at 3:40 PM Jeff Newmiller wrote:
>
> Read the closed issues in his digest Github repo first... this discussion has
> already occurred there.
Re: [R] file.access returning -1 for a file on remote Windows drive.
Great question Will. If it were my code I would definitely do this. However, the problem is manifesting itself in my work with Dirk's great digest package here:

https://github.com/eddelbuettel/digest/blob/947b77e82b97024a874a808a4644be21fc329275/R/digest.R#L170-L173

So because file.access() is saying the permissions aren't right, I get an error message from digest and can't create a hash. Knowing full well that this is some weird Windows thing, but also knowing I am stuck in that environment, I wanted to figure out where I was seeing a difference between those two functions before I asked Dirk if he'd be interested in a change to that particular bit of code.

On Fri, Feb 28, 2020 at 3:28 PM William Dunlap wrote:
>
> If file.access() says the file is unreadable but file() says it can be
> opened, why don't you just open the file and read it? You can use
> tryCatch to deal with problems opening or reading the file.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
Re: [R] file.access returning -1 for a file on remote Windows drive.
Thanks Jeff. I am probably not explaining myself very well, but my question is: under what circumstances would

summary(file(remote_file, "rb"))$`can read`

be different from:

file.access(remote_file, 4)

If my permissions were different across remote and local, should that not be reflected in both of these functions?

On Fri, Feb 28, 2020 at 2:37 PM Jeff Newmiller wrote:
>
> Dunno. They agree for me. Maybe look closer at all permissions via Windows
> File Manager?
Re: [R] file.access returning -1 for a file on remote Windows drive.
Some additional follow-up:

> summary(file(remote_file, "rb"))$`can read`
[1] "yes"

> summary(file(local_file, "rb"))$`can read`
[1] "yes"

compared to:

> file.access(local_file, 4)
local.R
      0

> file.access(remote_file, 4)
remote.R
     -1

Can anyone think why file.access and file would be contradicting each other?

Sam
[R] file.access returning -1 for a file on remote Windows drive.
Hi there,

Looking for some help in diagnosing or developing a workaround to a problem I am having on a Windows machine. I am running R 3.6.2.

I have two identical files, one stored locally and the other stored on a network drive.

For access:

> file.access(local_file, 4)
local.R
      0

> file.access(remote_file, 4)
remote.R
     -1

Also for file.info:

> file.info(local_file)$mode
[1] "666"

> file.info(remote_file)$mode
[1] "666"

OK, so I have access issues. Maybe they are ephemeral and I can change the permissions:

> Sys.chmod('remote.R', mode = '666')
> file.access(remote_file, 4)
remote.R
     -1

Nope. I am thoroughly stumped and maybe can't make it any further because of Windows.

Downstream I am trying to use digest::digest() to create a hash, but digest thinks we don't have permission because file.access() is failing. Any thoughts on how I can get file.access() to return 0 for the remote.R file? Any ideas?

Thanks in advance,

Sam
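The tryCatch() approach suggested elsewhere in this thread (test readability by actually opening the file instead of trusting file.access()) might look roughly like the sketch below; the helper name can_read() is made up for illustration:

```r
## Illustrative sketch, not code from the thread: treat a file as readable
## if file() can actually open it, since file.access() can misreport
## permissions on some network filesystems.
can_read <- function(path) {
  con <- tryCatch(file(path, open = "rb"),
                  error   = function(e) NULL,
                  warning = function(w) NULL)
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

## Usage: fall back on this instead of checking file.access(path, 4) == 0
# can_read(remote_file)
```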
[R] (no subject)
Hello all,

I am experiencing some issues with building a package that we are hosting on GitHub. The package itself is quite large: it is a data package with a bunch of spatial files stored as .rds files. The repo is located here: https://github.com/bcgov/bcmaps.rdata

If we clone that package to a local machine via:

git clone https://github.com/bcgov/bcmaps.rdata

the first oddity is that the package installs successfully using this:

$ R CMD INSTALL "./bcmaps.rdata"

but fails when I try to build the package:

$ R CMD build "./bcmaps.rdata"
* checking for file './bcmaps.rdata/DESCRIPTION' ... OK
* preparing 'bcmaps.rdata':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* looking to see if a 'data/datalist' file should be added
Warning in gzfile(file, "rb") :
  cannot open compressed file 'bcmaps.rdata', probable reason 'Permission denied'
Error in gzfile(file, "rb") : cannot open the connection
Execution halted

The second oddity is that if I remove the '.' from the Package name in the DESCRIPTION file, the build proceeds smoothly:

$ R CMD build "./bcmaps.rdata"
* checking for file './bcmaps.rdata/DESCRIPTION' ... OK
* preparing 'bcmapsrdata':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* looking to see if a 'data/datalist' file should be added
* building 'bcmapsrdata_0.2.0.tar.gz'

I am assuming that R CMD INSTALL builds the package internally, so I find it confusing that I am not able to build it myself. Similarly confusing: is the '.' in the package name indicative of anything? Does anyone have any idea what's going on here? Am I missing something obvious?
Thanks in advance,

Sam
[R] Unable to return gmtoff from as.POSIXlt without converting date string to as.POSIXct first
Is it possible for someone to explain what is going on here? I would expect that `as.POSIXlt` would be able to accept `datestring` and return all the elements without having to convert it using `as.POSIXct` first. Do `as.POSIXlt` and `as.POSIXct` do different things with the `tz` arg?

datestring <- "2017-01-01 12:00:00"

foo <- as.POSIXlt(datestring, tz = "America/Moncton")
foo
[1] "2017-01-01 12:00:00 AST"
foo$gmtoff
[1] NA

bar <- as.POSIXlt(as.POSIXct(datestring, tz = "America/Moncton"))
bar
[1] "2017-01-01 12:00:00 AST"
bar$gmtoff
[1] -14400

Thanks in advance,

Sam
[R] getOption() versus Sys.getenv
Hi there,

I am trying to distinguish between getOption() and Sys.getenv(). My understanding is that both are used to set values for variables.

An option is set with something like options(var = "A"), which can be placed in an .Rprofile or at the top of a script, and retrieved with getOption("var"). Environment variables are set in the .Renviron file with a line like var=A and retrieved with Sys.getenv("var"). I've seen mention in the httr package documentation that credentials for APIs should be stored this way.

So my question is: how does one decide which path is most appropriate? For example, I am working on a package that has to query a database in almost every function call. I want to give users the ability to skip specifying that path in every function call. So in this case, should I recommend users store the path as an option or as an environment variable? If I am storing credentials in an .Renviron file, then maybe I should store the path there as well?

More generally, can anyone recommend some good discussion/documentation on this topic?

Thanks in advance,

Sam
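For concreteness, a minimal sketch of the two mechanisms side by side (the variable names here are illustrative, not from any package):

```r
## Option: set in .Rprofile or at the top of a script, read with getOption().
options(mypkg.db_path = "/path/to/db")
getOption("mypkg.db_path")                       # "/path/to/db"
getOption("mypkg.other", default = "fallback")   # default when unset

## Environment variable: set in .Renviron as a line like
##   MYPKG_DB_PATH=/path/to/db
## (or via Sys.setenv() for the current session), read with Sys.getenv().
Sys.setenv(MYPKG_DB_PATH = "/path/to/db")
Sys.getenv("MYPKG_DB_PATH")                      # "/path/to/db"
Sys.getenv("MYPKG_MISSING", unset = "fallback")  # default when unset
```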
[R] [R-pkgs] Introducing the rsoi package
Hi folks,

I am pleased to announce that rsoi is now up on CRAN (v0.2.1, https://CRAN.R-project.org/package=rsoi).

rsoi is a minimal but hopefully useful package for folks who are looking for easy access in R to Southern Oscillation Index and Oceanic Niño Index data. rsoi uses data collected by the National Oceanic and Atmospheric Administration. Their data are usually updated monthly. Data are downloaded and formatted for use in R by the `download_enso()` function. El Niño, La Niña and neutral periods of ENSO are categorized by temperature anomalies from a 30-year base period in the Central South Pacific Ocean.

Suggestions and contributions are very much welcome at the rsoi GitHub page: https://github.com/boshek/rsoi

-Sam
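A quick usage sketch, based only on the function named in this announcement:

```r
# install.packages("rsoi")
library(rsoi)

## Download the monthly SOI/ONI data from NOAA, formatted for use in R.
## The exact columns returned are not described in the announcement,
## so none are assumed here.
enso <- download_enso()
head(enso)
```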
[R] R implementation of the Split and Merge Algorithm
Hello there,

I am wondering if anyone on this list has ever encountered an implementation of the Split and Merge algorithm for R. This algorithm is reasonably well known and was first developed in this paper: https://www.computer.org/csdl/trans/tc/1974/08/01672634-abs.html

From that paper, here is the essence of what the Split and Merge algorithm accomplishes: "Given a set of points S = {xi,yi | i = 1,2... N} determine the minimum number n such that S is divided in n subsets S1, S2...Sn, where on each of them the data points are approximated by a polynomial of order at most m - 1 with an error norm less than a prespecified quantity e."

There has been work on this in MATLAB, but so far I've not been able to find the approach in R. The `segmented` package comes close but, as far as I understand, is not what I am looking for. I am happy to do this myself but wanted to check first to see if someone has already accomplished this. I know this is a stretch, but I thought it wouldn't hurt to ask.

Thanks in advance.

Sam
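For concreteness, here is a rough sketch of just the "split" half of such an algorithm (no merge step): fit a degree-1 polynomial per segment and split at the worst residual until the maximum absolute error is below e. This is my own illustration of the idea, not an existing R implementation:

```r
## Illustrative "split" phase only (not a full Split-and-Merge): recursively
## fit a straight line to each segment and split at the point of largest
## absolute residual until max |residual| <= eps.
split_segments <- function(x, y, eps) {
  if (length(x) <= 3) return(list(data.frame(x = x, y = y)))
  fit <- lm(y ~ x)
  res <- abs(resid(fit))
  if (max(res) <= eps) return(list(data.frame(x = x, y = y)))
  k <- min(max(which.max(res), 2L), length(x) - 1L)  # keep split away from ends
  c(split_segments(x[1:k], y[1:k], eps),
    split_segments(x[(k + 1):length(x)], y[(k + 1):length(y)], eps))
}

## Example: a noisy piecewise-linear signal
set.seed(1)
x <- seq(0, 10, by = 0.1)
y <- ifelse(x < 5, x, 10 - x) + rnorm(length(x), sd = 0.05)
length(split_segments(x, y, eps = 0.3))  # number of fitted subsets
```

A full implementation would add the merge pass (re-joining adjacent subsets whose combined fit still satisfies the error norm) and support polynomial order m - 1 rather than straight lines.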
[R] Extract an number from a character
Hello,

I have a problem to which I am certain grep or gsub or that family of functions is the solution. However, I just can't seem to wrap my mind around exactly how. I have a dataframe below that has the dimensions of a net. The data are given in the "W x H" format. For calculations I'd like to have each number as a separate column.

I have been using ifelse(). However, that seems like a poor solution to this problem, especially once dataframes get larger and larger. So my question is: can anyone describe a way to extract the numbers from the variable y in the example below? I had also tried substr(), but that falls apart with the 2.5 x 2.5 net.

Thanks in advance!

Sam

Example:

## dataframe
df <- data.frame(x = rnorm(10),
                 y = c("7 x 3", "7 x 3", "7 x 3", "7 x 3", "7 x 3",
                       "2.5 x 2.5", "2.5 x 2.5", "2.5 x 2.5",
                       "2.5 x 2.5", "2.5 x 2.5"))

df$Width  <- as.numeric(ifelse(df$y == "7 x 3", "7", "2.5"))
df$Height <- as.numeric(ifelse(df$y == "7 x 3", "3", "2.5"))

df$Width <- as.numeric(substr(df$y, 5, 5))
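One common approach (a sketch using base R, rather than any solution given in the thread) is to split each string on the "x" separator instead of relying on fixed character positions, which is what makes substr() fail for "2.5 x 2.5":

```r
df <- data.frame(
  x = rnorm(10),
  y = c(rep("7 x 3", 5), rep("2.5 x 2.5", 5)),
  stringsAsFactors = FALSE
)

## Split "W x H" on the x (allowing optional surrounding spaces and either
## case), then convert each piece to numeric.
parts <- strsplit(df$y, "\\s*[xX]\\s*")
df$Width  <- as.numeric(sapply(parts, `[`, 1))
df$Height <- as.numeric(sapply(parts, `[`, 2))
```

This scales to any number of distinct "W x H" values without hand-coding each one in ifelse().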
[R] Choosing columns by number
Hi all,

This is a process question. How do folks efficiently identify column numbers in a dataframe without manually counting them? For example, if I want to choose columns from the iris dataframe, I know of two options. I can do this:

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

or this:

names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

Neither option explicitly identifies the column number so that I can do something like this:

iris[, c(2, 4)]

I feel like there must be a better way to do this, so I wanted to ask the collective wisdom here what people do to accomplish this. Obviously this is a trivial example, but the issue really becomes problematic when you have a large dataframe.

Thanks in advance!

Sam
Re: [R] Choosing columns by number
Thierry's answer of:

data.frame(seq_along(iris), colnames(iris))

is exactly what I was looking for. Apologies for vagueness and HTML. It was unintended.

Sam

On Tue, Aug 25, 2015 at 8:32 AM, stephen sefick ssef...@gmail.com wrote:

?grep

I think this will do what you want.

# something like
a <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
toMatch <- c("a", "d")
grep(paste(toMatch, collapse = "|"), colnames(a))

# to subset
a[, grep(paste(toMatch, collapse = "|"), colnames(a))]
--
Stephen Sefick
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama 36849
sas0...@auburn.edu
http://www.auburn.edu/~sas0025

"Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals." -K. Mullis

"A big computer, a complex algorithm and a long time does not equal science." -Robert Gentleman
[R] Automatically updating a plot from a regularly updated data file
Hi all,

I have a question about using R in a way that may not be correct, but I thought I would ask anyway. I have an instrument that outputs a text file with comma-separated data. A new line is added to the file each time the instrument takes a new reading. Is there any way to configure R such that a script to generate a plot from said text file is re-run each time the file is modified (i.e. a new line is added)? So basically, update an exported plot each time a new line of data is collected.

Is this type of thing possible in R? If not, can anyone recommend some Windows (or Linux if need be) tools that could help me accomplish this, preferably still utilizing R's plotting capabilities? I know that there are other tools that can do all this, but nothing makes figures as nicely as R. I suppose more generally this is a question about ways to automate processes with R to take advantage of R's functionality.

Thanks in advance.

Sam
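If a full event-driven setup isn't required, a simple polling loop in R itself can get close; a sketch (the file path, output name, and plotted columns are placeholders):

```r
## Poll a growing CSV and re-export the plot whenever the file's modification
## time changes. Interrupt with Esc / Ctrl-C; paths and columns are made up.
watch_and_plot <- function(path, interval = 5) {
  last_mtime <- -Inf
  repeat {
    mtime <- as.numeric(file.info(path)$mtime)
    if (!is.na(mtime) && mtime > last_mtime) {
      dat <- read.csv(path)
      png("latest_plot.png")
      plot(dat[[1]], dat[[2]], type = "l")
      dev.off()
      last_mtime <- mtime
    }
    Sys.sleep(interval)
  }
}

# watch_and_plot("instrument_log.csv")
```

Polling trades a little latency for portability: it works the same on Windows and Linux without any external file-watching tool.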
[R] Working with < and > in data sets
Hello,

I am having some trouble figuring out how to deal with data where some observations are detection limits and others are integers, denoted by greater-than and less-than symbols. Ideally I would like a column that has the data as numbers, then another column with values "Measured" or "Limit" or something like that. Data and further clarification below.

## Data
zp <- structure(list(
  variable = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
                         3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L),
                       .Label = c("ZP.1", "ZP.3", "ZP.5", "ZP.7", "ZP.9"),
                       class = "factor"),
  value = structure(c(3L, 4L, 2L, 1L, 7L, 8L, 6L, 5L, 12L, 11L, 10L,
                      9L, 15L, 16L, 14L, 13L, 19L, 18L, 17L, 9L),
                    .Label = c("0.030", "1.2", "1160", "27.3", "0.025",
                               "0.85", "1870", "45.7", "0.0020", "0.050",
                               "31.9", "695", "0.0060", "0.20", "311",
                               "8.84", "0.090", "12", "646"),
                    class = "factor")),
  .Names = c("variable", "value"), row.names = c(NA, -20L),
  class = "data.frame")

## As expected, converting everything to numeric results in a slew of NA values
zp$valuefactor <- as.numeric(as.character(zp$value))

## At this point I am unsure how to proceed.
zp

So I am just wondering how folks deal with this type of data. Any advice would be much appreciated, as I am looking for something that reliably works on a large data set.

Thanks in advance!

Sam
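Assuming the censored readings carry a "<" prefix (e.g. "<0.025"), one sketch of the split into a numeric column plus a Measured/Limit flag; the example values here are made up for illustration:

```r
## Illustrative values: some censored ("<..."), some measured.
value <- c("1160", "<0.025", "27.3", "<0.0020", "695")

is_limit <- grepl("^\\s*<", value)                   # TRUE for "<..." entries
num      <- as.numeric(sub("^\\s*<\\s*", "", value)) # strip the "<", convert

zp_clean <- data.frame(
  value  = num,
  status = ifelse(is_limit, "Limit", "Measured")
)
zp_clean
```

The same two vectorized calls work unchanged on a large data set, and the original censoring information survives in the status column.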
[R] symbols in a data frame
Hello,

I have recently received a dataset from a metal analysis company. The dataset is filled with less-than symbols. What I am looking for is an efficient way to subset for any whole numbers from the dataset. The column is automatically formatted as a factor because of the symbols, making it difficult to deal with the numbers in a useful way. So, in sum, any ideas on how I could subset the example below for only whole numbers?

Thanks in advance!

Sam

# code
metals <- structure(list(
  Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L,
                          12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
                        .Label = c("Antimony", "Arsenic", "Barium",
                                   "Beryllium", "Boron (Hot Water Soluble)",
                                   "Cadmium", "Chromium", "Cobalt", "Copper",
                                   "Lead", "Mercury", "Molybdenum", "Nickel",
                                   "pH 1:2", "Selenium", "Silver", "Thallium",
                                   "Tin", "Vanadium", "Zinc"),
                        class = "factor"),
  Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L,
                            5L, 7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L),
                          .Label = c("<1", "<10", "<100", "<1000", "<200",
                                     "<5", "<500", "0.1", "0.13", "0.5",
                                     "0.8", "1.07", "1.1", "1.4", "1.5",
                                     "137", "154", "163", "165", "169",
                                     "178", "2.3", "2.4", "22", "24", "244",
                                     "27.2", "274", "3", "3.1", "40.2", "43",
                                     "50", "516", "53.3", "550", "569", "65",
                                     "66.1", "68", "7.6", "72", "77", "89",
                                     "951"),
                          class = "factor")),
  .Names = c("Parameter", "Cedar.Creek"), row.names = c(NA, 19L),
  class = "data.frame")
Re: [R] symbols in a data frame
Thanks for all the responses. It is sometimes difficult to outline exactly what you need, and these responses were helpful in getting there. Speaking to Bert's point a bit, I needed a column to identify where the symbol was used. If I knew more about R I think I might be embarrassed to post my solution to that problem, but here is how I used Sarah's solution while still keeping the info about detection limits. I'm sure there is a more elegant way:

metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
  8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
  .Label = c("Antimony", "Arsenic", "Barium", "Beryllium",
  "Boron (Hot Water Soluble)", "Cadmium", "Chromium", "Cobalt", "Copper",
  "Lead", "Mercury", "Molybdenum", "Nickel", "pH 1:2", "Selenium",
  "Silver", "Thallium", "Tin", "Vanadium", "Zinc"), class = "factor"),
  Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L,
  5L, 7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L),
  .Label = c("1", "10", "100", "1000", "200", "5", "500", "<0.1",
  "<0.13", "<0.5", "<0.8", "<1.07", "<1.1", "<1.4", "<1.5", "<137",
  "<154", "<163", "<165", "<169", "<178", "<2.3", "<2.4", "<22", "<24",
  "<244", "<27.2", "<274", "<3", "<3.1", "<40.2", "<43", "<50", "<516",
  "<53.3", "<550", "<569", "<65", "<66.1", "<68", "<7.6", "<72", "<77",
  "<89", "<951"), class = "factor")),
  .Names = c("Parameter", "Cedar.Creek"), row.names = c(NA, 19L),
  class = "data.frame")

metals$temp1 <- metals$Cedar.Creek
metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
metals$temp2 <- metals$temp1 == metals$Cedar.Creek
metals$Detection <- factor(ifelse(metals$temp2 == TRUE, "Measured", "Limit"))
metals[, c(1, 2, 5)]

Thanks again!

Sam

On Wed, Jul 9, 2014 at 10:41 AM, Bert Gunter gunter.ber...@gene.com wrote: Well, ?grep and ?regex are clearly apropos here -- dealing with character data is an essential skill for handling input from diverse sources with various formatting conventions. I suggest you go through one of the many regular expression tutorials on the web to learn more. But this may not be the important issue here at all.
If "<k" means the value is left censored at k -- i.e. we know it's less than k but not how much less -- then Sarah's proposal is not what you want to do. Exactly what you do want to do depends on context, and as it concerns statistical methodology, is not something that should be discussed here. Consult a local statistician if this is a correct guess; otherwise ignore. ... and please post in plain text in future (as requested), as HTML can get garbled.

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi Sam, I'd take the similar tack of removing the "<" instead. Note that if you import the data frame using the stringsAsFactors=FALSE argument, you don't need the first step.

metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)

R> str(metals)
'data.frame': 19 obs. of 2 variables:
 $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
 $ Cedar.Creek: num 100 100 500 100 10 1000 100 516 550 10 ...

Sarah

On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers tonightstheni...@gmail.com wrote: Hello, I have recently received a dataset from a metal analysis company. The dataset is filled with less-than symbols. What I am looking for is an efficient way to subset the dataset for any whole numbers. The column is automatically formatted as a factor because of the symbols, making it difficult to deal with the numbers in a useful way. So, in sum, any ideas on how I could subset the example below for only whole numbers? Thanks in advance!
Sam

[quoted example data snipped -- see the original post above]

--
Sarah Goslee
http://www.functionaldiversity.org
[R] Define a variable on a non-standard year interval (Water Years)
Hello, I am trying to define a different interval for a year. In hydrology, a water year is defined as the period between October 1st and September 30th of the following year. I was wondering how I might do this in R. Say I have a data.frame like the following and I want to extract a variable with the water year as defined above:

df <- data.frame(Date = seq(as.Date("2000/10/1"), as.Date("2003/9/30"), "days"))

## Extract the normal year
df$year <- factor(format(as.Date(df$Date), "%Y"))

So the question is: how might I define a variable that extends from October 1st to September 30th rather than the normal January 1st to December 31st? Thanks in advance!

Sam
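A base-R sketch of one common answer: a date belongs to the water year that ends in its calendar year, so months October through December are simply assigned to the following year.

```r
df <- data.frame(Date = seq(as.Date("2000/10/1"), as.Date("2003/9/30"), "days"))

## October-December count toward the *next* water year
mon <- as.numeric(format(df$Date, "%m"))
yr  <- as.numeric(format(df$Date, "%Y"))
df$water.year <- factor(ifelse(mon >= 10, yr + 1, yr))

table(df$water.year)  ## water years 2001, 2002 and 2003
```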
[R] Drop values of one dataframe based on the value of another
Hello all, Let me first say that this isn't a question about outliers. I am using the outlier function from the outliers package only because it is a convenient wrapper for finding the value with the largest difference between itself and the sample mean. Where I am running into problems is that I have several groups, and I want to find the outlier within each group. Then I want to create two data.frames: one with the outliers and one with those values dropped. Both dataframes need to include the additional columns of data present before the subset. The first case is easy, but I can't seem to figure out how to do the second. So for example:

library(plyr)
library(outliers)

## A dataframe with some obviously extreme values
dfa <- data.frame(Mins = runif(15, 0, 1), Fac = rep(c("Test1", "Test2", "Test3"), each = 5))
df.out <- data.frame(Mins = c(3, 4, 5), Fac = c("Test1", "Test2", "Test3"))
df <- rbind(dfa, df.out)
df$Meta <- runif(18, 4, 5); df

## Dataframe with the extreme values
To_remove <- ddply(df, c("Fac"), subset, Mins == outlier(Mins)); To_remove

So now my question is: how can I use this dataframe (To_remove) to remove all these values from df and create a new dataframe? Given a dataframe (To_remove) with a list of values, how can I choose all the values of another dataframe (df) that aren't in To_remove? There is an rm.outlier function in this same package, but I am having trouble with it and would like to try another approach. Thanks in advance!

Sam
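One base-R sketch of the "everything not in To_remove" step: build a composite key from the grouping factor and the value, then index df with %in%. This assumes a Fac/Mins pair uniquely identifies a row, which holds for the example.

```r
## TRUE for rows of df that do not appear in To_remove
keep <- !(paste(df$Fac, df$Mins) %in% paste(To_remove$Fac, To_remove$Mins))

df.outliers.removed <- df[keep, ]   ## extreme values dropped, Meta column kept
df.outliers.only    <- df[!keep, ]  ## the extreme values themselves
```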
[R] Extract time from irregular date and time data records
Hello, I am having a problem making use of some data output by an instrument in a somewhat odd format. The instrument outputs two columns -- one called JulianDay.Hour and one called Minutes.Seconds. I would like to convert these columns into a single column with a time, so I was using substr() and paste() to extract that info. This works fine for the JulianDay.Hour column, as there are always five characters in an entry. However, in the Minutes.Seconds column any leading zeroes are dropped by the instrument, so if I use substr() to select based on character position I end up with incorrect times. So for example:

## df
df <- structure(list(Temperature = c(18.63, 18.4, 18.18, 16.99, 16.86,
  11.39, 11.39, 11.37, 11.37, 11.37, 11.37),
  JulianDay.Hour = c(22610L, 22610L, 22610L, 22610L, 22610L, 22611L,
  22611L, 22611L, 22611L, 22611L, 22611L),
  Minutes.Seconds = c(4608L, 4611L, 4614L, 4638L, 4641L, 141L, 144L,
  208L, 211L, 214L, 238L)),
  .Names = c("Temperature", "JulianDay.Hour", "Minutes.Seconds"),
  row.names = c(3176L, 3177L, 3178L, 3179L, 3180L, 3079L, 3080L, 3054L,
  3055L, 3056L, 3057L), class = "data.frame")

## Extraction method for times
df$Time.Incorrect <- paste(substr(df$JulianDay.Hour, 4, 5), ":",
                           substr(df$Minutes.Seconds, 1, 2), ":",
                           substr(df$Minutes.Seconds, 3, 4), sep = "")

## Manual generation of the desired times
df$Time.Correct <- c("10:46:08", "10:46:11", "10:46:14", "10:46:38", "10:46:41",
                     "11:01:41", "11:01:44", "11:02:08", "11:02:11", "11:02:14",
                     "11:02:38")

## Note the absence of leading zeroes in Minutes.Seconds leading to incomplete time records (df$Time.Incorrect)
df

## So can anyone recommend a good way to extract a time from variables like these two? Basically this is a string-subsetting issue. Thanks in advance!

Sam
Re: [R] Extract time from irregular date and time data records
Apologies -- I was searching using the wrong search terms. This is clearly a string issue. I've added the solution below.

Sam

On Tue, May 29, 2012 at 11:39 AM, Sam Albers tonightstheni...@gmail.com wrote: [original question and example data snipped -- see the post above]

## Addition of leading zeroes
df$Time.Correct <- paste(substr(df$JulianDay.Hour, 4, 5), ":",
                         substr(formatC(df$Minutes.Seconds, width = 4,
                                        format = "d", flag = "0"), 1, 2), ":",
                         substr(formatC(df$Minutes.Seconds, width = 4,
                                        format = "d", flag = "0"), 3, 4),
                         sep = "")

df
Sam
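For the archives, sprintf() is an equivalent way to pad the leading zeroes -- the same job as the formatC() call above, just written once:

```r
mmss <- sprintf("%04d", df$Minutes.Seconds)   ## e.g. 141 becomes "0141"
df$Time.Correct2 <- paste(substr(df$JulianDay.Hour, 4, 5),
                          substr(mmss, 1, 2),
                          substr(mmss, 3, 4), sep = ":")
```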
[R] Displayed Date Format in Plot Title.
Hello all, I can't seem to figure out how to format a date as a title. I have something like this:

plot(x = 1:10, y = runif(10, 1, 18),
     main = paste(as.Date("2011-05-03", format = "%Y-%m-%d")))

## When I would really like this
plot(x = 1:10, y = runif(10, 1, 18), main = paste("May-03-2011"))

## I thought to try this but that produces an NA.
plot(x = 1:10, y = runif(10, 1, 18),
     main = paste(as.Date("2011-05-03", format = "%Y-%b-%d")))

How do folks usually accomplish something like this? Thanks so much in advance!

Sam
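The piece that is missing above is format(): as.Date() only parses text into a Date, while format() renders a Date back into text in any layout. A sketch (the month name assumes an English locale):

```r
d <- as.Date("2011-05-03")   ## input is already in the default %Y-%m-%d format
format(d, "%b-%d-%Y")        ## "May-03-2011"

plot(x = 1:10, y = runif(10, 1, 18), main = format(d, "%b-%d-%Y"))
```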
[R] Convert day of year back into a date format.
Hello, I am having trouble figuring out how to convert a day-of-year integer back into a Date format. For example, I have the following:

date <- c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05',
          '2008-01-06','2008-01-07','2008-01-08','2008-01-09','2008-01-10',
          '2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
          '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20',
          '2008-01-21','2008-01-22','2008-01-23')

## this is then converted into a number corresponding to the day of the year like so:
dayofyear <- strptime(date, format = "%Y-%m-%d")$yday + 1

## Now my question is how do I get back to a date format (obviously omitting the year).
## The end result is that I'd like to be able to have axis labels as something like Month-Day or just Month
## instead of just integers, which aren't always intuitive for people, but I can't seem to figure out how to tell R
## to recognize an integer as a date. Any suggestions? Many thanks in advance!

Sam
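A base-R sketch: as.Date() accepts a numeric offset from an origin, so day-of-year n is n - 1 days after January 1st of a chosen year, and format() then produces the Month-Day labels.

```r
dayofyear <- c(1, 32, 60)
d <- as.Date(dayofyear - 1, origin = "2008-01-01")  ## 2008 matches the data's leap year
format(d, "%b-%d")   ## "Jan-01" "Feb-01" "Feb-29"
```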
[R] Memory limits for MDSplot in randomForest package
Hello, I am struggling to produce an MDS plot using the randomForest package with a moderately large data set. My data set has one categorical response variable, 7 predictor variables and just under 19000 observations. That means my proximity matrix is approximately 19000 by 19000, which is quite large. To train a random forest on this large a dataset I have to use my institution's high-performance computer. Using this setup I was able to train a randomForest with the proximity argument set to TRUE. At this point I wanted to construct an MDS plot using the following:

MDSplot(nech.rf, nech.d$pd.fl, palette = c(1, 2, 3), pch = as.numeric(nech.d$pd.fl))

where nech.rf is the randomForest object and nech.d$pd.fl is the classification factor. Now, with the architecture listed below, I've been waiting approximately 2 days for this to run. My issue is that I am not sure if it will ever finish. Can anyone recommend a way to tweak the MDSplot function to run a little faster? I tried changing the cmdscale arguments (i.e. eigenvalues) within the MDSplot function, but that didn't seem to have any effect on the overall running time with a much smaller data set. Or could someone comment on whether I am dreaming that this will actually ever run? This is probably the best computer that I will have access to, so I was hoping that somehow I could get this to work. I was just hoping that someone reading the list might have some experience with randomForests and large datasets and might be able to comment on my situation. Below the architecture information I have constructed a dummy example to illustrate what I am doing, but given the nature of the problem, this doesn't completely reflect my situation. Any help would be much appreciated! Thanks!
Sam

Computer specs and sessionInfo():

OS: Suse Linux
Memory: 64 GB
Processors: Intel Itanium 2, 64 x 1500 MHz

And:

> sessionInfo()
R version 2.6.2 (2008-02-08)
ia64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] randomForest_4.6-6

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17

###
# Dummy Example
###
require(randomForest)
set.seed(17)

## Number of points
x <- 10
df <- rbind(
  data.frame(var1 = runif(x, 10, 50), var2 = runif(x, 2, 7),
             var3 = runif(x, 0.2, 0.35), var4 = runif(x, 1, 2),
             var5 = runif(x, 5, 8), var6 = runif(x, 1, 2),
             var7 = runif(x, 5, 8), cls = factor("CLASS-2")),
  data.frame(var1 = runif(x, 10, 50), var2 = runif(x, -3, 3),
             var3 = runif(x, 0.1, 0.25), var4 = runif(x, 1, 2),
             var5 = runif(x, 5, 8), var6 = runif(x, 1, 2),
             var7 = runif(x, 5, 8), cls = factor("CLASS-1"))
)

df.rf <- randomForest(y = df[, 8], x = df[, 1:7], proximity = TRUE, importance = TRUE)
MDSplot(df.rf, df$cls, k = 2, palette = c(1, 2, 3, 4), pch = as.numeric(df$cls))
Re: [R] Lag based on Date objects with non-consecutive values
On Mon, Mar 19, 2012 at 9:11 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Mon, Mar 19, 2012 at 8:03 PM, Sam Albers tonightstheni...@gmail.com wrote: Hello R-ers, I just wanted to update this post. I've made some progress on this but am still not quite where I need to be. I feel like I am close so I just wanted to share my work so far.

Try this:

Lines <- "Date Dis1
1967-06-05 1.146405
1967-06-06 9.732887
1967-06-07 -9.279462
1967-06-08 7.856646
1967-06-09 5.494370
1967-06-15 5.070176
1967-06-16 3.847314
1967-06-17 -5.243094
1967-06-18 9.396560
1967-06-19 4.112792"

# read in data
library(zoo)
z <- read.zoo(text = Lines, header = TRUE)

# process it
g <- seq(start(z), end(z), "day")  # all days
zg <- merge(z, zoo(, g))           # fill in missing days
lag(zg, 0:-2)[time(z)]

Thanks Gabor. I was, however, hoping for a base R solution. I think I've got it, and I will post the result here just to be complete. A big thanks to Brian Cade for an off-list suggestion.

set.seed(32)
df1 <- data.frame(
  Date = seq(as.Date("1967-06-05", "%Y-%m-%d"), by = "day", length = 5),
  Dis1 = rnorm(5, 1, 10)
)
df2 <- data.frame(
  Date = seq(as.Date("1967-06-15", "%Y-%m-%d"), by = "day", length = 5),
  Dis1 = rnorm(5, 1, 10)
)
df <- rbind(df1, df2)
df$Dis2 <- df$Dis1 * 2

lag.base <- function(lag.date, lag.by, lag.var) {
  time_dif <- as.numeric(lag.date) - c(rep(NA, lag.by), head(lag.date, -lag.by))
  lag.tmp <- c(rep(NA, lag.by), head(lag.var, -lag.by))
  lv <- ifelse(time_dif <= lag.by, lag.tmp, NA)
  return(lv)
}

df$lag <- lag.base(lag.date = df$Date, lag.var = df$Dis1, lag.by = 3); df
df$lag2 <- lag.base(lag.date = df$Date, lag.var = df$Dis2, lag.by = 3); df

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
[R] Lag based on Date objects with non-consecutive values
Hello all, I need to figure out a way to lag a variable by a number of days without using the zoo package. I need to use a remote R connection that doesn't have the zoo package installed, and the administrators are unwilling to install it. So, that is, I want a function where I can specify the number of days by which to lag a variable against a Date-formatted column. That is relatively easy to do. The problem arises when I don't have consecutive dates: I can't seem to figure out a way to insert an NA where there is a non-consecutive date. So for example:

## A dataframe with non-consecutive dates
set.seed(32)
df1 <- data.frame(
  Date = seq(as.Date("1967-06-05", "%Y-%m-%d"), by = "day", length = 5),
  Dis1 = rnorm(5, 1, 10)
)
df2 <- data.frame(
  Date = seq(as.Date("1967-06-15", "%Y-%m-%d"), by = "day", length = 5),
  Dis1 = rnorm(5, 1, 10)
)
df <- rbind(df1, df2); df

## A function to lag the variable by a specified number of days
lag.day <- function(lag.by, data) {
  c(rep(NA, lag.by), head(data$Dis1, -lag.by))
}

## Using the function
df$lag1 <- lag.day(lag.by = 1, data = df); df

## returns this data frame
         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176  5.494370
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10 1967-06-19  4.112792  9.396560

## When really what I would like is something like this:
         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176        NA
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10 1967-06-19  4.112792  9.396560

So can anyone recommend a way (either using my function or any other approach) that I might be able to use to consistently lag values based on a lag.by value and consecutive dates? Thanks so much in advance!
Sam
Re: [R] Lag based on Date objects with non-consecutive values
Hello R-ers, I just wanted to update this post. I've made some progress on this but am still not quite where I need to be. I feel like I am close so I just wanted to share my work so far. Thanks in advance!

Sam

On Mon, Mar 19, 2012 at 1:10 PM, Sam Albers tonightstheni...@gmail.com wrote: Hello all, I need to figure out a way to lag a variable by a number of days without using the zoo package. I need to use a remote R connection that doesn't have the zoo package installed, and the administrators are unwilling to install it. So, that is, I want a function where I can specify the number of days by which to lag a variable against a Date-formatted column. That is relatively easy to do. The problem arises when I don't have consecutive dates: I can't seem to figure out a way to insert an NA where there is a non-consecutive date. So for example:

## A dataframe with non-consecutive dates
set.seed(32)
df1 <- data.frame(
  Date = seq(as.Date("1967-06-05", "%Y-%m-%d"), by = "day", length = 5),
  Dis1 = rnorm(5, 1, 10)
)
df2 <- data.frame(
  Date = seq(as.Date("1967-06-15", "%Y-%m-%d"), by = "day", length = 5),
  Dis1 = rnorm(5, 1, 10)
)
df <- rbind(df1, df2); df

## A function to lag the variable by a specified number of days
lag.day <- function(lag.by, data) {
  c(rep(NA, lag.by), head(data$Dis1, -lag.by))
}

## Using the function
df$lag1 <- lag.day(lag.by = 1, data = df); df

## returns this data frame
         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176  5.494370
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10 1967-06-19  4.112792  9.396560

## When really what I would like is something like this:
         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176        NA
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10
1967-06-19  4.112792  9.396560

I've now gotten this far but have realized that my approach is flawed, because if I increase the lag.by value to anything greater than 1, an NA is no longer entered into the correct position. So here is my updated effort:

lag.by <- function(data, lag.by) {
  tmp <- data.frame(
    ## Difference in days between dates
    diff = c(diff(data$Date), NA),
    lag.tmp = c(rep(NA, lag.by), head(data$Dis1, -lag.by))
  )
  ## diff() calculates the difference to the next row, so all the difference
  ## values need to be lagged
  ifelse(c(rep(NA, lag.by), head(tmp$diff, -lag.by)) == 1, tmp$lag.tmp, NA)
}

df$lag <- lag.by(df, lag.by = 1)
df$lag2 <- lag.by(df, lag.by = 2); df

         Date      Dis1       lag      lag2
1  1967-06-05  1.146405        NA        NA
2  1967-06-06  9.732887  1.146405        NA
3  1967-06-07 -9.279462  9.732887  1.146405
4  1967-06-08  7.856646 -9.279462  9.732887
5  1967-06-09  5.494370  7.856646 -9.279462
6  1967-06-15  5.070176        NA  7.856646  <-- need this to be an NA
7  1967-06-16  3.847314  5.070176        NA
8  1967-06-17 -5.243094  3.847314  5.070176
9  1967-06-18  9.396560 -5.243094  3.847314
10 1967-06-19  4.112792  9.396560 -5.243094

So, I should have NAs in the lag2 column at rows 6 and 7. Any help or thoughts would be much appreciated here. So can anyone recommend a way (either using my function or any other approach) that I might be able to use to consistently lag values based on a lag.by value and consecutive dates? Thanks so much in advance!

Sam
[R] Strategies to deal with unbalanced classification data in randomForest
Hello all, I have become somewhat confused by the options available for dealing with a highly unbalanced data set (1 in one class, 50 in the other). As a summary, I am unsure: a) if I am performing the two class-weighting methods properly, b) if the data are too unbalanced for this type of analysis to be appropriate, and c) if there is any interaction between the weighting for class imbalances and the number of trees in a forest. An example will illustrate this best. Say I have a data set like the following:

df <- rbind(
  data.frame(var1 = runif(1, 10, 50), var2 = runif(1, -3, 3),
             var3 = runif(1, 0.1, 0.25), cls = factor("CLASS-1")),
  data.frame(var1 = runif(50, 10, 50), var2 = runif(50, 2, 7),
             var3 = runif(50, 0.2, 0.35), cls = factor("CLASS-2"))
)

## Where the response vector is highly imbalanced like so:
summary(df$cls)

library(randomForest)
set.seed(17)

## Now this is obviously an extreme case, but I am wondering what the options are to deal with something like this.
## The problem with this situation manifests itself when I try to train a random forest
## without accounting for this imbalance
df.rf <- randomForest(cls ~ var1 + var2 + var3, data = df, importance = TRUE)

## Now one option is to down-sample the majority class. However, I can't seem to find exactly
## how to do this. Does this seem correct?
df.rf.downsamp <- randomForest(cls ~ var1 + var2 + var3, data = df,
                               sampsize = c(50, 50), importance = TRUE)
## 50 being the number of observations in the minority class

## The other option, which there seems to be some confusion over, is to establish some class weights
## to balance the error rate.
This approach I've mostly drawn from here:
## http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm#balance
## This might not be appropriate, however; as of September it looks like Breiman's method wasn't used in R
df.rf.weights <- randomForest(cls ~ var1 + var2 + var3, data = df,
                              classwt = c(1, 600), importance = TRUE)

## Nevertheless, what I am concerned about is the effect an unbalanced data set has on my randomForest model.
## For example:
par(mfrow = c(1, 3))
plot(df.rf)
plot(df.rf.downsamp)
plot(df.rf.weights)

presents three very different scenarios, and I am having trouble resolving the issues I mentioned above. I am extremely grateful for all the work that has been done on randomForests in R up to this point. I was hoping that someone with more experience might be able to advise what the best strategy is to deal with this problem. Which of these approaches is best, and am I using them correctly? Thanks so much in advance for any help.

Sam

> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] ggplot2_0.8.9 plyr_1.7.1 tools_2.14.2
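On point (a), the down-sampling call is usually combined with the strata argument so that sampsize is drawn per class rather than overall. A sketch -- the argument names are from the randomForest help page, but treat the 50s as placeholders, one entry per factor level:

```r
df.rf.strat <- randomForest(cls ~ var1 + var2 + var3, data = df,
                            strata = df$cls,
                            sampsize = c(50, 50),  ## per-class sample sizes
                            importance = TRUE)
```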
[R] Combining month and year into a single variable
Hello all,

## I am trying to convert some year and month data into a single variable with a date format so I can plot a proper x axis.
## I've made a few tries at this and searched around, but I haven't found anything. I am looking for something in the format %Y-%m.

## A data.frame
df <- data.frame(x = rnorm(36, 1, 10), month = rep(1:12, each = 3),
                 year = c(2000, 2001, 2002))

## One option. I'm not totally sure why this doesn't work
df$Date <- as.Date(paste(df$year, df$month, sep = "-"), "%Y-%m")

## This adds the month number to the date as days rather than months, and this option
## is messy anyway as I am adding days to the variable
or <- format(ISOdate(df$year - 1, 12, 31), "%Y-%m-%d")
df$Date2 <- as.Date(or) + df$month

## Just a plot to illustrate this.
plot(x ~ Date2, data = df)

## Any thoughts on how I can combine the year and the month into a form that is useful for plotting? Thanks in advance!

Sam
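The %Y-%m attempt returns NA because as.Date() needs a day component; pinning every observation to the first of its month is the usual base-R fix (a sketch):

```r
## Build a full year-month-day string, then parse it
df$Date <- as.Date(paste(df$year, df$month, 1, sep = "-"))  ## "2000-1-1" etc.
plot(x ~ Date, data = df)
```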
[R] Subsetting for the ten highest values by group in a dataframe
Hello, I am looking for a way to subset a data frame by choosing the top ten maximum values from that dataframe, within the levels of a factor.

## I've used plyr here but I'm not married to this approach
require(plyr)

## A data.frame with two groups and an id variable (y)
df <- data.frame(x = rnorm(400, mean = 20), y = 1:400, z = c("A", "B"))

## So using ddply I can find the highest value of x
df.max1 <- ddply(df, c("z"), subset, x == sort(x, TRUE)[1])

## Or the 2nd highest value
df.max2 <- ddply(df, c("z"), subset, x == sort(x, TRUE)[2])

## And so on, but when I try to supply a range of indices like so
## to get the top ten values, I don't get a warning message --
## just two values that don't really make sense to me
df.max <- ddply(df, c("z"), subset, x == sort(x, TRUE)[1:10])

## So no error message when I use the method above, which is clearly wrong.
## But I really am not sure how to diagnose the problem.
## Can anyone suggest a way to subset a data.frame with groups to select the top ten max values in that data.frame for each group?
## Thanks so much in advance!

Sam
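The reason the [1:10] version misbehaves: x == sort(x, TRUE)[1:10] recycles a length-10 vector element-wise against each group's x column rather than testing membership. rank() sidesteps the comparison entirely (a sketch using the same plyr idiom):

```r
require(plyr)

## Keep the rows whose x is among the ten largest within each group
df.max <- ddply(df, c("z"), subset, rank(-x, ties.method = "first") <= 10)
nrow(df.max)  ## 20: the ten largest x per group
```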
[R] Establishing groups using something other than ifelse()
Hello all, This is one of those "Is there a better way to do this?" questions. Say I have a dataframe (df) with a grouping variable (z). This is my base data. Now I know that there is a higher-order level of grouping for my grouping variable, so what I want to do is create a new column that expresses that higher-order grouping based on the values of the sub-group (z in this case). In the past I have used ifelse(), but this tends to get fairly redundant and messy with a large number of sub-groupings (z). I've created a sample dataset below. Can anyone recommend a better way of achieving what I am currently achieving with ifelse()? A long series of ifelse() statements makes me think that there is something better for this.

## Dataframe creation
df <- data.frame(x = runif(36, 0, 120),
                 y = runif(36, 0, 120),
                 z = factor(c("A1", "A1", "A2", "A2", "B1", "B1",
                              "B2", "B2", "C1", "C1", "C2", "C2")))

## Current method of grouping
df$Big.Group <- with(df, ifelse(df$z == "A1", "A",
                         ifelse(df$z == "A2", "A",
                         ifelse(df$z == "B1", "B",
                         ifelse(df$z == "B2", "B", "C")))))

So any suggestions? Thanks in advance!

Sam
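A named lookup vector is one common replacement for nested ifelse() calls: one entry per sub-group, indexed by the factor's character values (a sketch built from the example's groups):

```r
## One entry per sub-group; add rows here instead of nesting more ifelse() calls
lookup <- c(A1 = "A", A2 = "A", B1 = "B", B2 = "B", C1 = "C", C2 = "C")
df$Big.Group <- factor(lookup[as.character(df$z)])
```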
Re: [R] Establishing groups using something other than ifelse()
On Thu, Jan 19, 2012 at 3:34 PM, Justin Haynes jto...@gmail.com wrote:

how about

levels(df$z)[grep('A', levels(df$z))] <- 'A'
levels(df$z)[grep('B', levels(df$z))] <- 'B'
levels(df$z)[grep('C', levels(df$z))] <- 'C'

does that do what you're wanting?

Shoot. I might have made my example confusing, sorry. First of all, I want to retain the information in the sub-group (z) here, but more importantly, I used A1 and A2 to illustrate the grouping under the larger group A; the pattern of the group names is irrelevant for my purposes. So, to modify the example, I wanted to achieve this without pattern matching like the above:

df <- data.frame(x = runif(36, 0, 120), y = runif(36, 0, 120),
                 z = factor(c("G1","G1","G2","G2","H1","H1","H2","H2","I1","I1","I2","I2")))

df$Big.Group <- with(df, ifelse(z == "G1", "A",
                         ifelse(z == "G2", "A",
                         ifelse(z == "H1", "B",
                         ifelse(z == "H2", "B", "C")))))

Thanks for the response!

On Thu, Jan 19, 2012 at 3:05 PM, Sam Albers tonightstheni...@gmail.com wrote: [original post snipped]
Sam
Re: [R] Establishing groups using something other than ifelse()
That is great, Jorge. Thanks! Just to complete this, I will include the use of recode() with this example:

df$Big.Group2 <- recode(df$z, "c('G1','G2')='A'; c('H1','H2')='B'; else='C'")

Sam

On Thu, Jan 19, 2012 at 3:49 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:

Hi Sam, Check the examples in

require(car)
?recode

HTH, Jorge.-

On Thu, Jan 19, 2012 at 6:05 PM, Sam Albers wrote: [original post snipped]
Re: [R] Calculating rolling mean by group
Thanks for getting me on the right path, Gabor! I have one outstanding issue though.

On Mon, Jan 9, 2012 at 4:21 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
On Mon, Jan 9, 2012 at 6:39 PM, Sam Albers tonightstheni...@gmail.com wrote:

Hello all, I am trying to determine how to calculate rolling means in R using a grouping variable. Say I have a data frame like so:

dat1 <- data.frame(x = runif(2190, 0, 125), year = rep(1995:2000, each = 365), jday = 1:365, site = "here")
dat2 <- data.frame(x = runif(2190, 0, 200), year = rep(1995:2000, each = 365), jday = 1:365, site = "there")
dat <- rbind(dat1, dat2)

## What I would like to do is calculate a rolling 7-day mean separately for each site.
## I have looked at both rollmean() in the zoo package and running.mean() in the igraph
## package, but neither seems to have led me to calculating a rolling mean by group.
## My first thought was to use the plyr package, but I am confused by this output:

library(plyr)
library(zoo)
ddply(dat, "site", function(df) return(c(roll = rollmean(df$x, 7))))

## Can anyone recommend a better way to do this or shed some light on this output?

Using dat in the question, try this:

library(zoo)
z <- read.zoo(dat, index = 2:3, split = 4, format = "%Y %j")
zz <- rollmean(z, 7)

The result, zz, is a multivariate zoo series with one column per group.

Using the zoo approach works well, except that a wrinkle in my dataset not reflected in the sample data caused some problems.
I am actually dealing with a situation where there is an unequal number of observations in each group, like the data set below:

library(zoo)
dat1 <- data.frame(x = runif(2190, 0, 125), year = rep(1995:2000, each = 365), jday = 1:365, site = "here")
dat2 <- data.frame(x = runif(4380, 0, 200), year = rep(1989:2000, each = 365), jday = 1:365, site = "there")
dat <- rbind(dat1, dat2)

## When I use read.zoo everything is read in fine
z <- read.zoo(dat, index = 2:3, split = 4, format = "%Y %j")

## But when I use rollmean to get a 7-day average for both the 'here' and 'there'
## columns, only the 'there' column's 7-day average is calculated
zz <- rollmean(z, 7)

Any thoughts on how I can calculate a rolling mean on groups where there is an unequal number of observations in each group? Thanks for the previous post and in advance. Sam

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
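As a sketch of one way around the rectangular-series constraint: smooth each group's raw vector separately in base R, so groups never have to be aligned into one multivariate series. This assumes a plain centred 7-point moving average is acceptable; `roll7` is my hypothetical helper, not part of zoo:

```r
# Centred 7-point moving average via stats::filter; the first and last
# three values of each group come out NA.
roll7 <- function(v, k = 7) {
  as.numeric(stats::filter(v, rep(1 / k, k), sides = 2))
}

dat1 <- data.frame(x = runif(2190, 0, 125), site = "here")
dat2 <- data.frame(x = runif(4380, 0, 200), site = "there")
dat <- rbind(dat1, dat2)

# ave() applies the smoother within each site and returns the result
# in the original row order, so group sizes can differ freely.
dat$roll <- ave(dat$x, dat$site, FUN = roll7)
```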
[R] Calculating rolling mean by group
Hello all, I am trying to determine how to calculate rolling means in R using a grouping variable. Say I have a data frame like so:

dat1 <- data.frame(x = runif(2190, 0, 125), year = rep(1995:2000, each = 365), jday = 1:365, site = "here")
dat2 <- data.frame(x = runif(2190, 0, 200), year = rep(1995:2000, each = 365), jday = 1:365, site = "there")
dat <- rbind(dat1, dat2)

## What I would like to do is calculate a rolling 7-day mean separately for each site.
## I have looked at both rollmean() in the zoo package and running.mean() in the igraph
## package, but neither seems to have led me to calculating a rolling mean by group.
## My first thought was to use the plyr package, but I am confused by this output:

library(plyr)
library(zoo)
ddply(dat, "site", function(df) return(c(roll = rollmean(df$x, 7))))

## Can anyone recommend a better way to do this or shed some light on this output?

Thanks so much in advance! Sam
[R] Specifying argument values in a function
Hello all, I am trying to write a fairly simple function that provides a quick way to calculate several distributions for a data set. I am trying to give the function an argument that specifies which distribution is outputted (here "norm" or "cumu"). I also have a melt argument, but that seems to be working fine. I have been able to get my function working well for just one distribution, but when I add another and try to add a dist.type argument (with potential values "cumu" and "norm"), I get an error message (see below). I am having trouble finding material that explains how to add an argument that isn't a TRUE/FALSE situation. Could anyone explain what I am doing wrong with the second, distribution-specifying argument? I apologize, as I am sure this is a simple problem, but I am just getting my feet wet with this type of thing in R and am having a little trouble diagnosing it.

#Example below:
library(reshape)

dat <- data.frame(`v1` = runif(6, 0, 125), `v2` = runif(6, 50, 75),
                  `v3` = runif(6, 0, 100), `v4` = runif(6, 0, 200))

my.norm <- function(x, melt = TRUE) {
  #Normalized distribution
  N.dist <- as.data.frame(sapply(1:length(x), function(i) (x[[i]] / rowSums(x[, c(1:4)])) * 100))
  norm.melt <- melt.data.frame(N.dist)
  if (melt == TRUE) ##Default is a melted data frame
    return(norm.melt)
  if (melt == FALSE)
    return(N.dist)
}

## So this single-distribution function works fine
my.norm(dat, melt = TRUE)

my.fun <- function(x, melt = TRUE, dist.type = norm) {
  #Normalized distribution
  N.dist <- as.data.frame(sapply(1:length(x), function(i) (x[[i]] / rowSums(x[, c(1:4)])) * 100))
  norm.melt <- melt.data.frame(N.dist)
  if (melt == TRUE && dist.type == norm) ##Default is a melted data frame
    return(norm.melt)
  if (melt == FALSE && dist.type == norm)
    return(N.dist)
  ## Cumulative distribution
  C.dist <- as.data.frame(t(apply(N.dist, 1, cumsum)))
  cumu.melt <- melt.data.frame(C.dist)
  if (melt == TRUE && dist.type == cumu) ##Default is a melted data frame
    return(cumu.melt)
  if (melt == FALSE && dist.type == cumu)
    return(C.dist)
}
## But this function, when used, yields two different error messages depending on the value used for dist.type:

my.fun(dat, melt = TRUE, dist.type = norm)
## Error in dist.type == norm :
##   comparison (1) is possible only for atomic and list types

my.fun(dat, melt = TRUE, dist.type = cumu)
## Error in my.fun(dat, dist.type = cumu) : object 'cumu' not found

Thanks in advance! Sam
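The errors arise because the unquoted `norm` and `cumu` are looked up as objects (norm finds nothing sensible, cumu does not exist). The standard idiom for a multi-choice argument is to quote the candidate values and validate with match.arg(); a minimal sketch, with the function body simplified to a stand-in for the distributions above:

```r
# dist.type defaults to the first choice ("norm"); match.arg() errors
# cleanly on anything outside the listed choices.
my.fun <- function(x, dist.type = c("norm", "cumu")) {
  dist.type <- match.arg(dist.type)
  norm <- 100 * x / rowSums(x)           # row-wise percentages
  if (dist.type == "norm") {
    as.data.frame(norm)
  } else {
    as.data.frame(t(apply(norm, 1, cumsum)))  # cumulative across columns
  }
}
```

Callers then write `my.fun(dat, dist.type = "cumu")`, with the string quoted.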
[R] Utilizing column names to multiply over all columns
## Hello there,
## I have an issue where I need to use the value of column names to multiply with the individual values in a column, and I have many columns to do this over. I have data like this, where the column names are numbers:

mydf <- data.frame(`2.72` = runif(20, 0, 125), `3.2` = runif(20, 50, 75),
                   `3.78` = runif(20, 0, 100), yy = head(letters, 2),
                   check.names = FALSE)

## I had been doing something like this, but it seems rather tedious and clunky. These append the correct values to my data frame, but is there any way that I can do this generally over each column, also using each column name as the multiplier for that column?

mydf$vd2.72 <- mydf$'2.72' * 2.72
mydf$vd3.2 <- mydf$'3.2' * 3.2
mydf$vd3.78 <- mydf$'3.78' * 3.78

## So can I get to this point with a more generalized solution? For now, I would also prefer to keep this in wide format, and I am aware (thanks to the list!) that I could use melt() to get the values I want.

mydf

## Thanks so much in advance! Sam
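A minimal sketch of a generalized version with sweep(), which multiplies each numeric column by its own (numeric) name in one call while skipping the non-numeric yy column; the variable names `num`, `vd`, and `out` are mine:

```r
mydf <- data.frame(`2.72` = runif(20, 0, 125), `3.2` = runif(20, 50, 75),
                   `3.78` = runif(20, 0, 100), yy = head(letters, 2),
                   check.names = FALSE)

num <- sapply(mydf, is.numeric)                       # which columns to touch
vd  <- sweep(mydf[num], 2, as.numeric(names(mydf)[num]), `*`)
names(vd) <- paste0("vd", names(mydf)[num])           # vd2.72, vd3.2, vd3.78
out <- cbind(mydf, vd)                                # stays in wide format
```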
Re: [R] Utilizing column names to multiply over all columns
Thanks for the response, David.

On Tue, Aug 16, 2011 at 1:13 PM, David Winsemius dwinsem...@comcast.net wrote:
On Aug 16, 2011, at 3:37 PM, Sam Albers wrote:

## Hello there,
## I have an issue where I need to use the value of column names to multiply with the individual values in a column, and I have many columns to do this over. I have data like this, where the column names are numbers:

mydf <- data.frame(`2.72` = runif(20, 0, 125), `3.2` = runif(20, 50, 75),
                   `3.78` = runif(20, 0, 100), yy = head(letters, 2),
                   check.names = FALSE)
mydf
        2.72      3.2      3.78 yy
1   31.07874 74.48555 89.306591  a
2  123.68290 74.30030 11.943576  b
3   89.64024 68.26378 97.627211  a
4   81.46604 59.79607 91.005217  b

## I had been doing something like this, but it seems rather tedious and clunky. These append the correct values to my data frame, but is there any way that I can do this generally over each column, also using each column name as the multiplier for that column?

mydf$vd2.72 <- mydf$'2.72' * 2.72
mydf$vd3.2 <- mydf$'3.2' * 3.2
mydf$vd3.78 <- mydf$'3.78' * 3.78

## So can I get to this point with a more generalized solution? For now, I would also prefer to keep this in wide format, and I am aware (thanks to the list!) that I could use melt() to get the values I want.

You will get the warning that the last column is not going right, but otherwise this returns what you asked for:

sapply(1:length(mydf), function(i) mydf[[i]] * as.numeric(names(mydf)[i]))

This suits my purposes well with a couple of slight modifications:

## I made this into a data.frame so I could append it to the other one (mydf)
mydf.vd <- as.data.frame(sapply(1:length(mydf), function(i) mydf[[i]] * as.numeric(names(mydf)[i])))

## I also renamed all the columns accordingly
colnames(mydf.vd) <- paste("vd", names(mydf), sep = "")

## Then added the new data.frame to the old one
out <- cbind(mydf, mydf.vd)

Thanks for your help with this!
(Also thanks, Bert, for the other helpful suggestion)

          [,1]     [,2]      [,3] [,4]
[1,]  84.53416 238.3538 337.57891   NA
[2,] 336.41748 237.7610  45.14672   NA
[3,] 243.82145 218.4441 369.03086   NA
[4,] 221.58762 191.3474 343.99972   NA
[5,]  81.78911 213.0770  97.90072   NA
snipped remainder

-- David Winsemius, MD West Hartford, CT

Sam
[R] Alternative and more efficient data manipulation
Hello list,

## I have been doing the following process to convert data from one form to another for a while, but it occurs to me that there is probably an easier way to do this. I am often given data whose column names are actually data, and I much prefer dealing with data that are sorted by factors. So to convert the columns I have previously made use of make.groups() in the lattice package, which works completely satisfactorily. However, it is a bit clunky for what I am using it for, and I have to carry the other variables forward. Can anyone suggest a better way of converting data like this?

library(lattice)

dat <- data.frame(`x1` = runif(6, 0, 125), `x2` = runif(6, 50, 75),
                  `x3` = runif(6, 0, 100), `x4` = runif(6, 0, 200),
                  date = as.Date(c("2009-09-25","2009-09-28","2009-10-02",
                                   "2009-10-07","2009-10-15","2009-10-21")),
                  yy = head(letters, 2), check.names = FALSE)

## Here is an example of the type of data that NEED converting
dat

dat.group <- with(dat, make.groups(x1, x2, x3, x4))

## Carrying the other variables forward
dat.group$date <- dat$date
dat.group$yy <- dat$yy

## Here is an example of what I would like the data to look like
dat.group

## The point of this all is so that I can use the data in a manner such as this:
with(dat.group, xyplot(data ~ as.numeric(substr(which, 2, 2)) | yy, groups = date))

## So I suppose what I am asking is if there is a more efficient way of doing this? Thanks so much in advance! Sam
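A sketch of the same conversion with base reshape(), which carries the id variables (date, yy) along automatically instead of requiring them to be re-attached; the argument spelling here is one of several workable ones:

```r
dat <- data.frame(x1 = runif(6, 0, 125), x2 = runif(6, 50, 75),
                  x3 = runif(6, 0, 100), x4 = runif(6, 0, 200),
                  date = as.Date(c("2009-09-25","2009-09-28","2009-10-02",
                                   "2009-10-07","2009-10-15","2009-10-21")),
                  yy = head(letters, 2))

# direction = "long" stacks x1..x4 into one "data" column; "which" records
# the source column, and date/yy are repeated for each stacked block.
long <- reshape(dat, direction = "long",
                varying = c("x1", "x2", "x3", "x4"), v.names = "data",
                timevar = "which", times = c("x1", "x2", "x3", "x4"))
```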
[R] Standardizing the number of records by group
Hello R-help,

I have some data collected at regular intervals but for varying lengths of time. I would like to standardize the length of time collected, and I can do this by standardizing the number of records I use for my analysis. Take for example the data set below:

library(plyr)
x <- runif(18, 10, 15)
df <- as.data.frame(x)
df$fac <- factor(c("Test1","Test1","Test1","Test1","Test1","Test1","Test1",
                   "Test2","Test2","Test2","Test2","Test2",
                   "Test3","Test3","Test3","Test3","Test3","Test3"))

## Here is where I would like to standardize the number of records
df.avg <- ddply(df, "fac", function(df) return(c(x.avg = mean(df$x), n = length(df$x))))
df.avg

Here there is a different number of records for each factor level. Say I only wanted to use the first 4 records at each factor level. Prior to taking the mean of these values, how might I drop all the records after 4? Can anyone suggest a good way to do this? I am using R 2.12.1 and Emacs + ESS. Thanks so much in advance. Sam
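A minimal base-R sketch of "first four records per level": number the rows within each level with ave(), subset, then average; `within.n` and `df4` are my names:

```r
x <- runif(18, 10, 15)
df <- data.frame(x = x,
                 fac = factor(rep(c("Test1", "Test2", "Test3"), c(7, 5, 6))))

# 1, 2, 3, ... within each factor level, in the original row order
within.n <- ave(seq_along(df$fac), df$fac, FUN = seq_along)
df4 <- df[within.n <= 4, ]                       # keep the first 4 per level

df.avg <- aggregate(x ~ fac, data = df4, FUN = mean)  # mean of those 4
```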
Re: [R] Italicized greek symbols in PDF plots
Many thanks, Dr. Ripley. I was made aware that the problem for me was that I was sending HTML in my email message. Transgression noted.

On Wed, Jun 29, 2011 at 10:17 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
On Wed, 29 Jun 2011, Sam Albers wrote:

I know that this has been asked before in other variations, but I just can't seem to figure out my particular application from previous posts. My apologies if I have missed the answer to this question somewhere in the archives. I have indeed looked. I am running Ubuntu 11.04, with R 2.12.1 and ESS+Emacs. For journal formatting requirements, I need to italicize all the Greek letters in any plot. This is reasonably straightforward to do, and I accomplished this task like so:

library(ggplot2)

label_parseall <- function(variable, value) {
  plyr::llply(value, function(x) parse(text = paste(x)))
}

dat <- data.frame(x = runif(270, 0, 125), z = rep(LETTERS[1:3], each = 3),
                  yy = 1:9, stringsAsFactors = TRUE)

#unicode italicized delta
dat$gltr <- factor(c("italic(\u03b4)^14*N", "italic(\u03b4)^15*N", "italic(\u03b4)^13*C"))

#So this is what I want my plot to look like:
plt <- ggplot(data = dat, aes(x = yy, y = x)) +
  geom_point(aes(x = yy, y = x, shape = z, group = z),
             alpha = 0.4, position = position_dodge(width = 0.8)) +
  facet_grid(gltr ~ ., labeller = label_parseall, scales = "free_y")
plt

#So then I exported my plot as a PDF like so:
pdf("Times_regular.pdf", family = 'Times')
plt
dev.off()

#The problem with this was that the delta symbols turned into dots.

You forgot to set the encoding: see the ?pdf help file. Greek is most likely not covered by the default encoding (and you also forgot the 'at a minimum' information required by the posting guide, so we don't know what your defaults would be).

Here are the results of sessionInfo(). Is this what you meant by defaults?
sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  grid      methods
[8] base

other attached packages:
[1] Cairo_1.4-9   ggplot2_0.8.9 reshape_0.8.4 plyr_1.5.2    proto_0.3-9.2

loaded via a namespace (and not attached):
[1] digest_0.4.2

I also tried using all the encodings found in /usr/lib/R/library/grDevices/enc:

AdobeStd.enc  CP1250.enc  CP1253.enc  Cyrillic.enc  ISOLatin1.enc  ISOLatin7.enc  KOI8-R.enc  MacRoman.enc  TeXtext.enc
AdobeSym.enc  CP1251.enc  CP1257.enc  Greek.enc     ISOLatin2.enc  ISOLatin9.enc  KOI8-U.enc  PDFDoc.enc    WinAnsi.enc

None of these seemed to produce italicized Greek letters. It seems like encoding is ignored in CairoPDF, so I never tried it with that command.

#I solved this problem using Cairo
library(Cairo)
cairo_pdf("Cairo.pdf")
plt
dev.off()

The problem that I face now is that I am unsure how to output a figure that maintains the Greek symbols but outputs everything in the plot as Times New Roman, another requirement of the journal. So I can produce a Times New Roman PDF plot and an italicized Greek symbol unicode PDF plot, but not both. Does anyone have any idea how I might accomplish both of these things together in a single PDF?

I would use cairo_pdf() in base R (and not package Cairo). Use grid facilities to change font, or use the version in R-devel which has a family= argument.

I tried this using CairoPDF() like so:

CairoPDF("Cairo.pdf", 6, 6, family = "Times")
plt
dev.off()

But this omitted the Greek symbols AND didn't produce the figure in the desired font. It seems like other folks have also experienced this problem before: https://stat.ethz.ch/pipermail/r-help/2011-January/266657.html

Have I missed something?
Are there any other strategies that you could suggest to get italicized Greek letters? Thanks again. Sam

-- Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595
Re: [R] Italicized greek symbols in PDF plots
Thanks for the response, Dr. Ripley. Much appreciated.

On Wed, Jun 29, 2011 at 10:17 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: [quoted exchange snipped; identical to the reply above]

Apologies for the inadequate posting. I will try to be clearer in the future.

Sam
[R] Italicized greek symbols in PDF plots
I know that this has been asked before in other variations, but I just can't seem to figure out my particular application from previous posts. My apologies if I have missed the answer to this question somewhere in the archives. I have indeed looked. I am running Ubuntu 11.04, with R 2.12.1 and ESS+Emacs. For journal formatting requirements, I need to italicize all the Greek letters in any plot. This is reasonably straightforward to do, and I accomplished this task like so:

library(ggplot2)

label_parseall <- function(variable, value) {
  plyr::llply(value, function(x) parse(text = paste(x)))
}

dat <- data.frame(x = runif(270, 0, 125), z = rep(LETTERS[1:3], each = 3),
                  yy = 1:9, stringsAsFactors = TRUE)

#unicode italicized delta
dat$gltr <- factor(c("italic(\u03b4)^14*N", "italic(\u03b4)^15*N", "italic(\u03b4)^13*C"))

#So this is what I want my plot to look like:
plt <- ggplot(data = dat, aes(x = yy, y = x)) +
  geom_point(aes(x = yy, y = x, shape = z, group = z),
             alpha = 0.4, position = position_dodge(width = 0.8)) +
  facet_grid(gltr ~ ., labeller = label_parseall, scales = "free_y")
plt

#So then I exported my plot as a PDF like so:
pdf("Times_regular.pdf", family = 'Times')
plt
dev.off()

#The problem with this was that the delta symbols turned into dots.

#I solved this problem using Cairo
library(Cairo)
cairo_pdf("Cairo.pdf")
plt
dev.off()

The problem that I face now is that I am unsure how to output a figure that maintains the Greek symbols but outputs everything in the plot as Times New Roman, another requirement of the journal. So I can produce a Times New Roman PDF plot and an italicized Greek symbol unicode PDF plot, but not both. Does anyone have any idea how I might accomplish both of these things together in a single PDF?
Thanks so much in advance, Sam
[R] Axes labels, greek letters and spaces
Hello all,

I can't seem to figure out how to use a Greek character in expression() in plot() labels without adding a space. So, for example, when plotting this:

x <- 1:10
plot(x, x^2, xlab = expression(Chlorophyll~italic(a)~mu~g~cm^-2))

the axis label reads as "μ g cm^-2" because I have a space there with a tilde. But if I remove the tilde, then my units are "mug cm^-2". Can anyone recommend a way to modify the axis label to look more like this: "μg cm^-2"?

Thanks in advance! Sam
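In plotmath, `~` inserts a space while `*` juxtaposes terms with no space between them, so gluing the mu to the g is a matter of swapping the operator; a minimal sketch, written to a temporary PDF so it runs non-interactively:

```r
# mu*g renders the mu and g adjacent; the remaining ~ keeps the space
# before cm^-2.
x <- 1:10
lab <- expression(Chlorophyll~italic(a)~mu*g~cm^-2)

pdf(tempfile(fileext = ".pdf"))  # any graphics device works here
plot(x, x^2, xlab = lab)
dev.off()
```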
Re: [R] extract data from a column
If I understand what you want (which I may very well not), you could use something like this. If this is an example of your type of data:

x <- "564589,"

then

x <- substr(x, 1, 6)
as.numeric(x)

Please try to post something more thorough if you would like a better answer. Sam

-- View this message in context: http://r.789695.n4.nabble.com/extract-data-from-a-column-tp3609890p3610030.html Sent from the R help mailing list archive at Nabble.com.
[R] Calculating a mean based on a factor range
Hello all,

I have been using an instrument that collects a temperature profile of a water column. The instrument records the temperature and depth any time it takes a reading. I was sampling many times at discrete depths rather than a complete profile of the water column (e.g. interested in the 5 m, 10 m and 20 m depth positions only). The issue was that these measurements were taken with the instrument hanging off the side of a boat, so a big enough wave moved the instrument such that it recorded a slightly different depth. For my purposes, however, this difference is negligible, and I wish to consider all those readings at nearly the same depth as a single depth. So for example:

library(ggplot2)
eg <- read.csv("http://dl.dropbox.com/u/1574243/example_data.csv", header = TRUE, sep = ",")

## Calculating an average value from all the readings for each depth reading
eg.avg <- ddply(eg, c("site", "depth"),
                function(df) return(c(temp = mean(df$temperature),
                                      num_samp = length(df$temperature))))

## An example of my problem
eg.avg[eg.avg$num_samp > 10 & eg.avg$site == "Station 3", ]

         site    depth     temp num_samp
154 Station 3  1.09000 4.073667       30
159 Station 3  2.49744 3.950972       72
175 Station 3  7.96332 3.903188       69
208 Station 3 19.37708 4.066393       61
209 Station 3 19.54096 4.025385       13

## So here you will notice that records 208 and 209, by my criteria, should be considered samples at the same depth and lumped together. Yet I can't figure out a way to coerce R to calculate a mean value of temperature based on a threshold depth range (say +/- 0.25). Generally speaking, this is calculating a mean (temperature) based on a factor (depth) range. Any thoughts on this? I am using R 2.12.1. Thanks in advance! Sam
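Since the linked CSV may no longer be available, here is a sketch on mock data of one common binning approach: snap each depth to the nearest 0.5 m with `round(depth * 2) / 2`, so readings within +/- 0.25 m of a half-metre mark share one bin, then average per site and bin. The data values are invented for illustration:

```r
eg <- data.frame(site = "Station 3",
                 depth = c(19.38, 19.54, 19.61, 5.02, 4.88),
                 temperature = c(4.07, 4.03, 4.01, 6.2, 6.1))

eg$depth.bin <- round(eg$depth * 2) / 2   # 19.38, 19.54, 19.61 -> 19.5
eg.avg <- aggregate(temperature ~ site + depth.bin, data = eg, FUN = mean)
```

A coarser or finer grid is just a different multiplier (e.g. `round(depth)` for 1 m bins).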
[R] Subsetting depth profiles based on maximum depth
Hello, I am having a little trouble finding the right set of criteria to subset a portion of my data. I am using an instrument that does depth profiles of a water column. The instrument records on the way down as well as on the way up.

## So I am left with data like this:
dat <- data.frame(var = runif(11, 0, 10))
dat$depth <- c(1:5, 5, 5:1) # So for the example
dat

## I am trying to figure out how to subset the data so that all data collected at the maximum depth and those collected on the way UP the water column are kept, while the data collected on the way DOWN through the water column are discarded. I got stumped by the fact that I can't just ask R for all values less than the maximum depth.
## So I've tried determining the row number of the maximum depth value and discarding all values above that, but so far I haven't been able to figure this out.
which.max(dat$depth)

Can anyone recommend a better strategy to figure this out? Thanks so much in advance. Sam
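The which.max() idea in the post is nearly there; a sketch of one way to finish it is to keep the row at the maximum depth and every row after it, i.e. the upcast:

```r
# which.max() returns the index of the FIRST maximum, so the row at the
# deepest reading and everything after it (the ascent) are retained.
set.seed(1)   # arbitrary seed so the example is reproducible
dat <- data.frame(var = runif(11, 0, 10))
dat$depth <- c(1:5, 5, 5:1)
up <- dat[seq_len(nrow(dat)) >= which.max(dat$depth), ]
up$depth   # 5 5 5 4 3 2 1: the max-depth readings and the way up
```

If the downcast should be kept instead, flip `>=` to `<`.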
[R] Subsetting depth profiles based on maximum depth by group with plyr
Hello, Apologies for a similar earlier post. I didn't include enough details in that one. I am having a little trouble subsetting some data based on a grouping variable. I am using an instrument that does depth profiles of a water column. The instrument records on the way down as well as on the way up. Thanks to an off-list reply, I can subset the data so that all data collected at the maximum depth and those collected on the way UP the water column are kept, while the data collected on the way DOWN through the water column are discarded. This is illustrated by the following:

dat1 <- data.frame(var=100*(0:10), depth=c(1:5,5,5:1))
dat1[ seq_len(nrow(dat1)) >= which.max(dat1$depth), ]

However, I have a data frame where I would like to perform this subset for several groups. My data frame looks like the following:

dat1 <- data.frame(var=100*(0:10), depth=c(1:5,5,5:1))
dat1$group <- "A"
dat2 <- data.frame(var=100*(0:10), depth=c(1:5,7,5:1))
dat2$group <- "B"
dat <- rbind(dat1, dat2)

I thought I might be able to use the plyr package to do this, but for some reason the following gives me almost the opposite of what I was hoping for:

library(plyr)
ddply(dat, .(group), function(.df) {
    .df[seq_len(nrow(.df)) >= which.max(.df$depth), ]
})

Can anyone recommend a way to do this subset by a grouping variable? Thanks in advance. Sam
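A base-R sketch of the same group-wise subset (my own, not from the thread), using split()/lapply() in place of plyr; the data frame construction follows the post:

```r
# Split by group, keep each piece from its maximum depth onward, re-combine.
dat1 <- data.frame(var = 100*(0:10), depth = c(1:5, 5, 5:1), group = "A")
dat2 <- data.frame(var = 100*(0:10), depth = c(1:5, 7, 5:1), group = "B")
dat <- rbind(dat1, dat2)
up <- do.call(rbind, lapply(split(dat, dat$group), function(d)
  d[seq_len(nrow(d)) >= which.max(d$depth), ]))
table(up$group)   # group A keeps 7 rows, group B keeps 6
```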
[R] Converting ordinal dates and time into sensible formats
Hello all, I am having a little trouble working with strptime and I was hoping someone might be able to give me a hand. I have an instrument that outputs an ordinal date and a time in two columns, something like this:

  day.hour min.sec
1    12525    2050
2    12518    2029
3    12524    2023
4    12524    2028
5    12507    2035

The problem I am having is converting these numbers into dates and times. I am able to convert them into their respective POSIXlt formats, but I am left with two columns where the hour is stored with the date (data$Date) and the date is stored with the time (data$Time). Can anyone recommend a good way to convert these ordinal dates into something like the following?

  day.hour min.sec            DateTime
1    12511    2033 2011-05-05 11:20:33

## A trivial example
## a data frame
day.hour <- as.integer(runif(5, 12500, 12523)) # First 3 digits are the day of the year, last 2 are the hour of the day
data <- as.data.frame(day.hour)
data$min.sec <- as.integer(runif(5, 2000, 2060)) # First 2 digits are the minute, last 2 are the seconds

## An example of how things get a little jumbled. strptime was easy enough to use.
data$Date <- strptime(data$day.hour, format="%j%H")
data$Time <- strptime(data$min.sec, format="%M%S")
data

Using Ubuntu 10.10 and R 2.11.1. Thanks in advance. Sam

--
Sam Albers
Geography Program
University of Northern British Columbia
University Way
Prince George, British Columbia
Canada, V2N 4Z9
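A sketch of one fix: paste the two columns together and parse them in a single strptime() call, so the day, hour, minute and second all land in one value. The year is not in the instrument output, so it is supplied explicitly here (2011 is assumed, matching the desired output in the post).

```r
data <- data.frame(day.hour = c(12511, 12507), min.sec = c(2033, 2035))
# Zero-pad and concatenate, then parse day-of-year + time in one pass.
data$DateTime <- as.POSIXct(strptime(
  sprintf("2011 %05d%04d", data$day.hour, data$min.sec),
  format = "%Y %j%H%M%S"))
format(data$DateTime[1])   # "2011-05-05 11:20:33"
```

The %05d/%04d padding matters: it protects against values whose leading zeros were dropped when the columns were read as integers.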
[R] Loop Through Columns to Remove Rows
Hello Venerable List, I am trying to loop (I think) an operation through a list of columns in a data frame to remove a set of #DIV/0! values. I am trying to do this like so:

# Data frame
test <- read.csv("http://dl.dropbox.com/u/1574243/sample_data.csv", header=TRUE, sep=",")

# This removes all the rows with #DIV/0! values in the mean column.
only.mean <- test[!test$mean=="#DIV/0!", ]

# This removes the majority of #DIV/0! values, as there is a large block of these values that extends over every column. However, it doesn't remove them all. Can anyone recommend a way where I can cycle through all the columns and remove these values, other than manually like so:
mean.median <- only.mean[!only.mean$median=="#DIV/0!", ]
# and so on through each column?

Can anyone recommend a better way of doing this? Thanks in advance! Sam
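Another angle on this (a sketch, complementary to the apply() answer in the reply): treat "#DIV/0!" as a missing-value marker and drop incomplete rows. With read.csv() this can be done at import time via na.strings; the small stand-in data frame below replaces the Dropbox file from the post.

```r
test <- data.frame(mean   = c("1.2", "#DIV/0!", "3.4"),
                   median = c("1.0", "2.0", "#DIV/0!"),
                   stringsAsFactors = FALSE)
# read.csv(..., na.strings = "#DIV/0!") would do this step at import instead:
test[test == "#DIV/0!"] <- NA
clean <- test[complete.cases(test), ]   # keep only fully populated rows
nrow(clean)   # 1
```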
Re: [R] Loop Through Columns to Remove Rows
Many thanks for this Jorge. Exactly what I was looking for. I've never encountered any() before. Quite useful. Thanks again! Sam

On Wed, Mar 9, 2011 at 1:05 PM, Jorge Ivan Velez wrote:

 Hi Sam, How about this?

 test[apply(test, 1, function(x) !any(x == '#DIV/0!')), ]

 HTH, Jorge
[R] Summarizing a response variable based on an irregular time period
Hello, I have a question about working with dates in R. I would like to summarize a response variable over designated, irregular time periods. The purpose of this is to compare the summarized values (which were sampled daily) to another variable that was sampled less frequently. Below is a trivial example where I would like to summarize the response variable dat$x such that I have average and sum values for Sept 25-27 and Sept 28-Oct 1. Can anyone suggest an efficient way to deal with dates like this? As an extremely tedious previous effort, I simply created another grouping variable, but I had to do this manually. For a large dataset this really isn't a good option. Thanks in advance! Sam

library(plyr)
dat <- data.frame(x = runif(7, 0, 125),
                  date = as.Date(c("2009-09-25","2009-09-26","2009-09-27","2009-09-28",
                                   "2009-09-29","2009-09-30","2009-10-01"), format="%Y-%m-%d"),
                  yy = rep(letters[1:2], length.out = 7),
                  stringsAsFactors = TRUE)

# If I were using a regular factor, I would do something like this, and this is what I would be hoping for as a result (obviously switching date for yy as the grouping variable):
ddply(dat, c("yy"), function(df) return(c(avg=mean(df$x), sum=sum(df$x))))

# This is the data frame that I would like to compare to dat.
dat2 <- data.frame(y = runif(2, 0, 125),
                   date = as.Date(c("2009-09-27","2009-10-01"), format="%Y-%m-%d"))
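A sketch of one approach: cut() accepts Date break points, so the irregular periods can become an ordinary grouping factor. The break dates below are the sampling dates of the second data set (as described in the post); the x values are stand-ins.

```r
dat <- data.frame(x = c(10, 20, 30, 40, 50, 60, 70),
                  date = as.Date("2009-09-25") + 0:6)
# Each break opens a new period; right = FALSE makes intervals [start, end).
breaks <- as.Date(c("2009-09-25", "2009-09-28", "2009-10-02"))
dat$period <- cut(dat$date, breaks = breaks, right = FALSE)
aggregate(x ~ period, data = dat, FUN = mean)
```

The same period factor would slot straight into the ddply() call from the post in place of "yy".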
[R] Drop non-integers
Hello all, I have a fairly simple data manipulation question. Say I have a data frame like this:

dat <- as.data.frame(runif(7, 3, 5))
dat$cat <- factor(c("1", "4", "13", "1", "4", "13", "13A"))
dat
  runif(7, 3, 5) cat
1       3.880020   1
2       4.062800   4
3       4.828950  13
4       4.761850   1
5       4.716962   4
6       3.868348  13
7       3.420944 13A

Under the dat$cat variable, the 13A value is an analytical replicate. For my purposes I would like to drop all rows whose cat value is not an integer (i.e. 13A) from the data frame. Can anyone recommend a way to do this? Sorry for the simple question and thanks in advance. Sam
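A regular-expression sketch (an alternative to the as.numeric() coercion in the reply): keep only the rows whose cat value consists entirely of digits, then drop the now-unused factor levels.

```r
dat <- data.frame(val = c(3.88, 4.06, 4.83, 3.42),
                  cat = factor(c("1", "4", "13", "13A")))
keep <- grepl("^[0-9]+$", as.character(dat$cat))   # TRUE only for all-digit values
dat2 <- droplevels(dat[keep, ])                    # also discards the unused "13A" level
levels(dat2$cat)
```

Unlike the coercion approach, this produces no warnings about NAs.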
Re: [R] Drop non-integers
On Wed, Nov 17, 2010 at 3:49 PM, David Winsemius wrote:

 On Nov 17, 2010, at 6:27 PM, Sam Albers wrote:
 [original question quoted above]

 dat[!is.na(as.numeric(as.character(dat$cat))), ]

 (You do get a warning about coercion to NAs, but that is a good sign, since those are what we were trying to exclude in the first place.)

Apologies. This worked fine, but I didn't quite outline that I also wanted to drop the unused levels of the factor as well. drop=TRUE doesn't seem to work, so can anyone suggest a way to drop the factor levels in addition to the values?

sd <- dat[!is.na(as.numeric(as.character(dat$cat))), ]
Warning message:
In `[.data.frame`(dat, !is.na(as.numeric(as.character(dat$cat))), :
  NAs introduced by coercion
str(sd)
'data.frame':   6 obs. of  2 variables:
 $ runif(7, 3, 5): num  3.88 4.06 4.83 4.76 4.72 ...
 $ cat           : Factor w/ 4 levels "1","13","13A",..: 1 4 2 1 4 2

Sorry for the simple question and thanks in advance. Sam
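The follow-up question (dropping the unused levels) can be answered with droplevels(), which was added in R 2.12.0; on older versions, re-wrapping the column in factor() does the same job. A sketch with stand-in data:

```r
dat <- data.frame(val = 1:4, cat = factor(c("1", "4", "13", "13A")))
sub <- dat[!is.na(suppressWarnings(as.numeric(as.character(dat$cat)))), ]
sub$cat <- droplevels(sub$cat)   # or: factor(sub$cat) on pre-2.12 R
nlevels(sub$cat)   # 3: the "13A" level is gone
```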
David Winsemius, MD, West Hartford, CT
Re: [R] Change plot order in lattice xyplot
Prior to creating a plot, I usually just order the factor levels into the order I want them in. So for your example I would do:

# Create some data
library(lattice)
x <- runif(100, 0, 20)
df <- data.frame(x)
df$y <- (1:10)
df$Month <- c("October", "September", "August", "July", "June")

# Plot the figure
plt <- xyplot(x ~ y | Month, data = df, layout = c(5,1),
    xlab = "Log density from hydroacoustics (integration)",
    ylab = "Log density from Tucker trawl",
    main = "Density estimates, Tucker Trawl", cex = 1.5)

# The factor levels aren't in the order you want them in. Reorder them how you want.
df$Month <- factor(df$Month, levels = c("June", "July", "August", "September", "October"), ordered = TRUE)

# Plot again (re-run the xyplot() call above so it picks up the new level order).

HTH, Sam
[R] Hydrology plots in R
Hello, I am trying to create a plot often seen in hydrodynamic work that includes a contour plot representing the water speed with arrows pointing in the direction of flow. Does anyone have any idea how I might add arrows based on wf$angle (in the example below) to the plot below? Thanks in advance! Sam

library(lattice)
speed <- runif(100, 0, 20)
wf <- data.frame(speed)
wf$width <- (1:10)
wf$length <- rep(1:10, each=10)
wf$angle <- runif(100, 0, 360)

# How do I add arrows based on wf$angle within each coloured box to represent the direction of flow?
# I don't have to use lattice. Just using it as an example.
with(wf, contourplot(speed ~ width*length, region=TRUE, contour=FALSE))
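A possible sketch (my own, not from the thread): draw the arrows inside a custom panel function with panel.arrows(), converting the angle column from degrees to radians. levelplot() is swapped in for contourplot(region=TRUE, contour=FALSE), which draws the same filled grid; the 0.4 half-length of each arrow is an arbitrary scale.

```r
library(lattice)
set.seed(1)   # arbitrary seed for reproducibility
wf <- data.frame(speed  = runif(100, 0, 20),
                 width  = 1:10,
                 length = rep(1:10, each = 10),
                 angle  = runif(100, 0, 360))
p <- levelplot(speed ~ width * length, data = wf,
  panel = function(x, y, z, subscripts, ...) {
    panel.levelplot(x, y, z, subscripts = subscripts, ...)
    xs  <- x[subscripts]; ys <- y[subscripts]
    ang <- wf$angle[subscripts] * pi / 180   # degrees -> radians
    r   <- 0.4                               # half-length of each arrow (arbitrary)
    panel.arrows(xs - r * cos(ang), ys - r * sin(ang),
                 xs + r * cos(ang), ys + r * sin(ang),
                 length = 0.05, col = "black")
  })
# print(p) draws the figure
```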
Re: [R] Can I set default parameters for the default graphics device?
Unless you aren't writing scripts, why wouldn't you just use something like this?

x <- c(1, 2, 3)
pdf("RRules.pdf")
plot(x, x)
dev.off()
Re: [R] k-sample Kolmogorov-Smirnov test?
Hello, I am curious whether anyone has had any success finding an R version of a k-sample Kolmogorov-Smirnov test. Most of the references that I have been able to find on this are fairly old, and I am wondering if this type of analysis has fallen out of favour. If so, how do people tend to compare distributions when they have more than two? Is it reasonable to pursue an adjusted p-value method? That is, could you compare, say, three distributions by performing three two-sample K-S tests and then applying a Bonferroni correction? I am just curious what people's approaches are when they want to compare more than two distributions. Thanks in advance. Sam
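The pairwise fallback described in the post can be sketched in base R: run all two-sample ks.test() comparisons among the k groups and Bonferroni-adjust the p-values with p.adjust(). The three normal samples below are stand-in data.

```r
set.seed(42)
groups <- list(a = rnorm(50), b = rnorm(50, 0.5), c = rnorm(50, 1))
pairs <- combn(names(groups), 2)   # all pairs of group names
p <- apply(pairs, 2, function(g)
  ks.test(groups[[g[1]]], groups[[g[2]]])$p.value)
data.frame(pair  = paste(pairs[1, ], pairs[2, ], sep = "-"),
           p.adj = p.adjust(p, method = "bonferroni"))
```

Whether this controls what you want (family-wise error, not a single omnibus test) is the statistical judgment call raised in the post.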
Re: [R] plotting data when all you have is the summary data
Bimal, in the memisc package, see ?panel.errbars — this might be a good option for you. HTH, Sam
Re: [R] plotmeans in trellis view?
I'm not sure about plotmeans, but this is usually the way I plot means with lattice:

library(lattice)
x <- runif(48, 2, 70)
data <- data.frame(x)
data$factor1 <- factor(c("A", "B", "C", "D"))
data$factor2 <- factor(c("X", "Y", "Z"))
data.mean <- with(data, aggregate(data$x, by=list(factor1=factor1, factor2=factor2), mean))
with(data.mean, xyplot(x ~ factor1 | factor2))

Is this sort of what you were looking for? HTH, Sam
[R] Assumptions on Non-Standard F ratios
Hello there, I am trying to run an ANOVA model using a non-standard F ratio. Imagine that the treatments (treat1 and treat2) are applied to the row, not to individual samples. Thus the row is the experimental unit, and therefore the error term in my ANOVA table should be the error associated with row. The question is: how do I check the assumptions of an ANOVA model when I have a non-standard F ratio? For this type of model I would normally use plot(model) to examine the residuals. However, this doesn't seem to work, and I expect that R is looking for residuals that don't exist. Is there some option I can change in the plot command? Sorry if this is simple, but searching for this answer was a little difficult as plot() has many uses. Below is an example. I am using R 2.10.1 and Ubuntu 9.04. Thanks in advance! Sam

x <- runif(48, 2, 70)
data <- data.frame(x)
data$treat1 <- factor(c("ONE", "TWO", "THREE"))
data$treat2 <- factor(c("PRUNED", "UNPRUNED"))
data$row <- factor(1:12)
model <- with(data, aov(x ~ treat1 + treat2 + treat1*treat2 + Error(row)))
plot(model)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
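One common workaround for balanced designs like this (a sketch of my own, not the thread's answer): since the row is the experimental unit, collapse to one value per row and run the standard diagnostics on an ordinary lm() at that level — those residuals belong to the error stratum the F ratio actually uses.

```r
set.seed(1)   # arbitrary seed; data construction follows the post
x <- runif(48, 2, 70)
data <- data.frame(x,
                   treat1 = factor(c("ONE", "TWO", "THREE")),
                   treat2 = factor(c("PRUNED", "UNPRUNED")),
                   row    = factor(1:12))
# One observation per experimental unit, then an ordinary linear model.
row_means <- aggregate(x ~ row + treat1 + treat2, data = data, FUN = mean)
fit_row <- lm(x ~ treat1 * treat2, data = row_means)
# plot(fit_row) now gives the usual four diagnostic plots on the row stratum.
```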
Re: [R] Error Bars in lattice- barcharts
 Well, when the error message says argument 'lx' is missing, with no default, it really means that argument 'lx' is missing, with no default. Your panel function has an argument 'lx', which you forgot to change to 'ly' as you did with the prepanel function. Hope that helps...

Thanks for the help Felix! It was a bit obvious what the problem was, and I apologize for not thinking about the error message more carefully. I am going to continue this thread, as I think it would be helpful if there were a working example of this type of plot. Unfortunately, I do not have it yet. The modified example below produces a stacked barplot with lovely error bars. However, I can't seem to produce a plot that doesn't stack the bars; stack=FALSE doesn't seem to have any effect. Any thoughts? Thanks in advance. Sam

# Generating the data
library(lattice)
temp <- abs(rnorm(81*5))
err <- as.data.frame(temp)
err$section <- c("down", "down", "down", "mid", "mid", "mid", "up", "up", "up")
err$depth <- c("Surface", "D50", "2xD50")
err$err.date <- c("05/09/2009", "12/09/2009", "13/10/2009", "19/10/2009", "21/09/2009")
err.split <- with(err, split(temp, list(depth, section, err.date)))

# I've tried to alter the panel function according to the thread to produce
# vertical error bars in my barcharts
prepanel.ci <- function(x, y, ly, uy, subscripts, ...) {
    y <- as.numeric(y)
    ly <- as.numeric(ly[subscripts])
    uy <- as.numeric(uy[subscripts])
    list(ylim = range(y, uy, ly, finite = TRUE))
}

panel.ci <- function(x, y, ly, uy, subscripts, pch = 16, ...) {
    x <- as.numeric(x)
    y <- as.numeric(y)
    ly <- as.numeric(ly[subscripts])
    uy <- as.numeric(uy[subscripts])
    panel.arrows(x, ly, x, uy, col = "black", length = 0.25,
        unit = "native", angle = 90, code = 3)
    panel.barchart(x, y, pch = pch, ...)
}

se <- function(x) sqrt(var(x)/length(x))
err.ucl <- sapply(err.split, function(x) {
    st <- boxplot.stats(x)
    c(mean(x), mean(x) + se(x), mean(x) - se(x))
})
err.ucl <- as.data.frame(t(err.ucl))
names(err.ucl) <- c("mean", "upper.se", "lower.se")
err.ucl$label <- factor(rownames(err.ucl), levels = rownames(err.ucl))

# add factor, grouping and by variables
err.ucl$section <- c("down", "down", "down", "mid", "mid", "mid", "up", "up", "up")
err.ucl$depth <- c("Surface", "D50", "2xD50")
err.ucl$err.date <- c("05/09/2009", "12/09/2009", "13/10/2009", "19/10/2009", "21/09/2009")

# This produces the figure I am looking for, minus the error bars.
with(err.ucl, barchart(mean ~ err.date | section, group=depth, layout=c(1,3),
    horizontal=FALSE, scales=list(x=list(rot=45))))

# OK, now that this works and the error bars are drawn, I am curious why stack=FALSE doesn't place the bars beside each other.
with(err.ucl, barchart(mean ~ err.date | section, group=depth, layout=c(1,3),
    horizontal=FALSE, stack=FALSE, scales=list(x=list(rot=45)),
    ly=lower.se, uy=upper.se,
    auto.key = list(points = FALSE, rectangles = TRUE, space = "right",
        title = "Depth", border = TRUE),
    prepanel=prepanel.ci,
    panel=panel.superpose,
    panel.groups=panel.ci))
Re: [R] Error Bars in lattice- barcharts
Hi Ivan, "Can you educate me a little bit on the use of barchart?" Unfortunately, no... For this post I eventually used barplot2() from the gplots package. I got bogged down trying to do it in lattice, so I looked for an alternative. It was quite straightforward, which was nice, and I was able to get what I wanted quite quickly. Sorry I can't be of more help. Sam
[R] Using ifelse and grep
Good morning, I am trying to create a new column of character strings based on the first two letters of a string in another column. I believe that I need to use some combination of ifelse and grep, but I am not totally sure how to combine them, and I am not sure why the command below isn't working. Obviously it isn't finding anything that matches my criteria, but I am not sure why. Any ideas on how I might modify this to get it to work? Below is also a data example of what I would like to achieve with this command.

section <- ifelse(Sample==grep("^BU", Sample), "up",
    ifelse(Sample==grep("^BM", Sample), "mid", "down"))
section
 [1] "down" "down" "down" "down" "down" "down" "down" "down" "down" "down"
[11] "down" "down"

Thanks in advance. Sam

Sample Transmission section
BU1         0.39353      up
BU2         0.38778      up
BU3         0.42645      up
BM1         0.37510     mid
BM2         0.51030     mid
BM3         0.67224     mid
BD1         0.37482    down
BD2         0.54716    down
BD3         0.50866    down
BU1         0.34869      up
BU2         0.32831      up
BU3         0.59877      up
BM1         0.52518     mid
BM2         0.94387     mid
BM3         0.94387     mid
BD1         0.46872    down
BD2         0.63115    down
BD3         0.45239    down
Re: [R] Writing summary.aov results to a file which can be opened in Excel
Julia, I think exporting to Excel takes you a step back. It would likely be easier to work solely in R and sort the p-values there. I had to do something similar, only with a bunch of regressions, a while back. I found this post extremely helpful, as well as the plyr package: http://www.r-bloggers.com/r-calculating-all-possible-linear-regression-models-for-a-given-set-of-predictors/ Not much help, I realize, but maybe it will point you in the right direction. Sam
Re: [R] Using ifelse and grep
Fantastic. Solved. Thanks! On Sat, Apr 3, 2010 at 9:59 AM, Gabor Grothendieck wrote:

 # 1 grep returns an index, not the value, unless you use grep(..., value = TRUE). Easier might be:

 # 2
 Sample2 <- substr(Sample, 1, 2)
 ifelse(Sample2 == "BU", "up", ifelse(Sample2 == "BM", "mid", "down"))

 or

 # 3 the following, which matches the first 2 characters against the given list names and returns the corresponding list values.
 library(gsubfn)
 gsubfn("^(..).", list(BU = "up", BD = "down", BM = "mid"), Sample)

 Note that if Sample is a factor rather than a character vector, use as.character(Sample) in place of Sample in the last line.
[R] Selecting the first row based on a factor
Hello there, I have a situation where I would like to select the first row of a particular factor from a data frame (data example below). That is, I would like to select the first entry when factor1 == "A", then the first row when factor1 == "B", etc. I have thousands of entries, so I need some general way of doing this. I have a minimal example that should illustrate what I am trying to do. I am using R version 2.9.2, ESS version 5.4 and Ubuntu 9.04. Thanks so much in advance! Sam

# Minimal example
x <- rnorm(100)
y <- rnorm(100)
xy <- data.frame(x, y)
xy$factor1 <- c("A", "B", "C", "D")
xy$factor2 <- c("a", "b")
xy <- xy[order(xy$factor1), ] # This simply orders the data to look more like the actual data I am working with

# I am trying to use this approach, but I am not sure that I am selecting the correct row, and the output temp is a total mess.
temp <- with(xy, unlist(lapply(split(xy, list(factor1=factor1, factor2=factor2)),
    function(x) x[1, ])))

xy
              x            y factor1 factor2
1   0.700042585 -2.481633101       A       a   # I would like to select this row
5   1.402677849 -0.691143942       A       a
9   0.188287765 -1.723823157       A       a
13  0.714946028  0.715361315       A       a
17  0.690177271 -0.112394002       A       a
21  0.333101579 -0.316285321       A       a
25  0.439505793 -3.356415326       A       a
89 -1.001153334 -0.739440288       A       a
93  0.135509539  0.949943380       A       a
97 -1.730936150  0.356133105       A       a
2  -0.399355582 -0.843874548       B       b   # Then I would like to select this row, etc.
6   1.285958969  0.958501988       B       b
10  0.495795836 -0.805012667       B       b
14  0.512486789 -0.968247016       B       b
18 -1.189627025  0.455278250       B       b
Re: [R] Selecting the first row based on a factor
Thanks! On Fri, Apr 2, 2010 at 11:35 AM, Erik Iverson er...@ccbr.umn.edu wrote:

Hello, Sam Albers wrote: [quoted question from the original message]

Does

xy[!duplicated(xy$factor1), ]

do what you want?

This most definitely works. What a beautifully elegant solution. Thanks!

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
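For the archives, the accepted approach boils down to this runnable sketch (variable names taken from the original post):

```r
set.seed(42)  # just so the sketch is reproducible
xy <- data.frame(x = rnorm(100), y = rnorm(100),
                 factor1 = c("A", "B", "C", "D"))
xy <- xy[order(xy$factor1), ]

# !duplicated() is TRUE only for the first occurrence of each level
first <- xy[!duplicated(xy$factor1), ]
nrow(first)    # 4: one row per level of factor1
```

Because the data frame is sorted by factor1 first, "first occurrence" and "first row within each level" coincide.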
[R] Vertical subtraction in dataframes
Hello all, I have not been able to find an answer to this problem. I feel like it might be so simple, though, that it might not get a response. Suppose I have a data frame like the one I have copied below (minus the 'calib' column). I wish to create a column like 'calib' by subtracting the 'Count' where 'stain' is 'none' from all other 'Count' values for each value of 'rep'. This is sort of analogous to putting a $ in front of the number that identifies a cell in a spreadsheet environment. Specifically I need something like this:

mydataframe$calib <- Count - (Count where stain == "none" for each value of rep)

Any thoughts on how I might accomplish this? Thanks in advance. Sam

Note: I've already calculated the calib column in gnumeric for clarity.

rep    Count  stain    calib
1       1522  none         0
1        147  syto     -1375
1      544.8  sytolec  -977.2
1     2432.6  sytolec   910.6
1      234.6  sytolec -1287.4
2     5699.8  none         0
2      265.6  syto    -5434.2
2      329.6  sytolec -5370.2
2        383  sytolec -5316.8
2      968.8  sytolec   -4731
3     2466.8  none         0
3       1303  syto    -1163.8
3     1290.6  sytolec -1176.2
3      110.2  sytolec -2356.6
3    15086.8  sytolec   12620

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
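One way to build 'calib' without a spreadsheet, sketched on the first two reps of the posted data (this assumes exactly one 'none' row per rep; the data frame 'd' is just a stand-in):

```r
d <- data.frame(rep   = rep(1:2, each = 3),
                Count = c(1522, 147, 544.8, 5699.8, 265.6, 329.6),
                stain = c("none", "syto", "sytolec"))

# Count of the 'none' row within each rep, indexed by rep
base <- with(d, tapply(Count[stain == "none"], rep[stain == "none"], `[`, 1))
d$calib <- d$Count - base[as.character(d$rep)]
d$calib  # 0 -1375 -977.2 0 -5434.2 -5370.2
```

The lookup `base[as.character(d$rep)]` plays the role of the spreadsheet's $-anchored cell: every row is matched back to its rep's baseline.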
[R] Error Bars in lattice- barcharts
Hello, I am attempting to write a script that adds error bars to a barchart. I am basing my attempt heavily on the following thread: http://tolstoy.newcastle.edu.au/R/e2/help/06/10/2791.html I can't seem to get around the problem that was discussed in the thread. The following example should illustrate my problem. Sorry about the messy example, but I am 1) trying to make it as close as possible to my actual work and 2) my skill level is spotty at best. Can anyone suggest a way to do this, or even another way to make a grouped barchart with error bars? I'm not married to this method, although I prefer working with lattice. Thanks for any help in advance! Sam

# Generating the data
library(lattice)
temp <- abs(rnorm(81 * 5))
err <- as.data.frame(temp)
err$section <- c("down", "down", "down", "mid", "mid", "mid", "up", "up", "up")
err$depth <- c("Surface", "D50", "2xD50")
err$err.date <- rep(c("05/09/2009", "05/10/2009", "12/09/2009", "13/10/2009",
                      "19/10/2009", "21/09/2009", "26/10/2009", "27/09/2009",
                      "28/08/2009"), each = 9)
err.split <- with(err, split(temp, list(depth, section, err.date)))

# I've tried to alter the panel functions according to the thread to
# produce vertical error bars in my barcharts
prepanel.ci <- function(x, y, ly, uy, subscripts, ...) {
    y <- as.numeric(y)
    ly <- as.numeric(ly[subscripts])
    uy <- as.numeric(uy[subscripts])
    list(ylim = range(y, uy, ly, finite = TRUE))
}

panel.ci <- function(x, y, ly, uy, subscripts, pch = 16, ...) {
    x <- as.numeric(x)
    y <- as.numeric(y)
    ly <- as.numeric(ly[subscripts])
    uy <- as.numeric(uy[subscripts])
    panel.arrows(x, ly, x, uy, col = "black", length = 0.25,
                 unit = "native", angle = 90, code = 3)
    panel.barchart(x, y, pch = pch, ...)
}

se <- function(x) sqrt(var(x) / length(x))
err.ucl <- sapply(err.split, function(x) {
    st <- boxplot.stats(x)  # computed but not used below
    c(mean(x), mean(x) + se(x), mean(x) - se(x))
})
err.ucl <- as.data.frame(t(err.ucl))
names(err.ucl) <- c("mean", "upper.se", "lower.se")
err.ucl$label <- factor(rownames(err.ucl), levels = rownames(err.ucl))

# Add factor, grouping and 'by' variables
err.ucl$section <- c("down", "down", "down", "mid", "mid", "mid", "up", "up", "up")
err.ucl$depth <- c("Surface", "D50", "2xD50")
# There has got to be a better way of doing this
err.ucl$err.date <- rep(c("05/09/2009", "05/10/2009", "12/09/2009", "13/10/2009",
                          "19/10/2009", "21/09/2009", "26/10/2009", "27/09/2009",
                          "28/08/2009"), each = 9)

# This produces the figure I am looking for, minus the error bars.
with(err.ucl, barchart(mean ~ err.date | section, group = depth,
                       layout = c(1, 3), horizontal = FALSE,
                       scales = list(x = list(rot = 45))))

# Deepayan's original approach. This is where I run into problems; I am
# unsure how to diagnose the packet error.
with(err.ucl, barchart(mean ~ err.date | section, group = depth,
                       layout = c(1, 3), horizontal = FALSE,
                       scales = list(x = list(rot = 45)),
                       ly = lower.se, uy = upper.se,
                       prepanel = prepanel.ci,
                       panel = panel.superpose,
                       panel.groups = panel.ci))

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]
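Stripped of the grouping, the mechanism from the referenced thread can be sketched on an invented three-bar data frame (the names 'd', 'lo' and 'hi' are made up for illustration; lattice forwards unknown top-level arguments such as lo and hi through to the prepanel and panel functions):

```r
library(lattice)

# Hypothetical summary data: one row per bar
d <- data.frame(grp  = factor(c("a", "b", "c")),
                mean = c(2.0, 3.0, 2.5),
                lo   = c(1.5, 2.4, 2.0),
                hi   = c(2.5, 3.6, 3.0))

barchart(mean ~ grp, data = d, origin = 0,
         lo = d$lo, hi = d$hi,
         prepanel = function(x, y, lo, hi, ...)
             list(ylim = range(0, y, lo, hi, finite = TRUE)),
         panel = function(x, y, lo, hi, subscripts, ...) {
             x <- as.numeric(x)
             panel.barchart(x, y, ...)          # the bars
             panel.arrows(x, lo[subscripts], x, hi[subscripts],
                          angle = 90, code = 3,  # flat caps at both ends
                          length = 0.05, col = "black")
         })
```

Getting this single-panel version working first makes it easier to see whether the grouped failure comes from the panel functions or from how panel.superpose splits the packet.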
[R] augPred and nlme
Hello there, Using 'The R Book' (p. 675-677) I am following instructions on performing a series of nonlinear regressions, fitting the same model to a set of groups. I have been able to fit the model to my data using the following call to nlme:

library(nlme)
inorg.model <- nlme(inorg.grv ~ a * exp(-((numDate - b)^2 / (2 * c^2))),
                    fixed = a + b + c ~ 1,
                    random = a ~ 1 | Sectionf,
                    start = c(a = adi, b = bdi, c = cdi),
                    verbose = TRUE)

Now, again following The R Book, I would like to plot these models using augPred. However I receive the following error:

plot(augPred(inorg.model))
Error in augPred.lme(inorg.model) :
  Data in inorg.model call must evaluate to a data frame

I am not sure even how to diagnose this problem; I basically followed The R Book's directions to the letter. I could plot each of these curves individually, but the prospect of R doing it all for me is too tempting. I am using R 2.8.1-1 and Ubuntu 9.04. Thanks in advance! Sam

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
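That particular error usually means the model was fitted against variables in the workspace rather than a data frame: augPred() needs the fitted object's call to carry an explicit 'data' argument it can re-evaluate. A sketch of the usual fix (the data frame name 'inorg' and the start values are placeholders from the post, so this is not runnable as-is):

```r
library(nlme)

# Fit against a data frame so augPred() can find the data later
inorg.model <- nlme(inorg.grv ~ a * exp(-((numDate - b)^2 / (2 * c^2))),
                    data   = inorg,               # <- the key addition
                    fixed  = a + b + c ~ 1,
                    random = a ~ 1 | Sectionf,
                    start  = c(a = adi, b = bdi, c = cdi))

plot(augPred(inorg.model))  # one fitted curve per Sectionf group
```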
Re: [R] linear model and by()
/09,0.2
Middle,125.31,04/09/09,0.13
Middle,125.31,04/09/09,0.11
Downstream,125.31,04/09/09,0.16
Downstream,125.31,04/09/09,0.17
Downstream,125.31,04/09/09,0.17
Upstream,150.29,04/09/09,0.17
Upstream,150.29,04/09/09,0.19
Upstream,150.29,04/09/09,0.14
Middle,150.29,04/09/09,0.2
Middle,150.29,04/09/09,0.13
Middle,150.29,04/09/09,0.11
Downstream,150.29,04/09/09,0.16
Downstream,150.29,04/09/09,0.17
Downstream,150.29,04/09/09,0.17
Upstream,0,11/09/09,0.12
Upstream,0,11/09/09,0.16
Upstream,0,11/09/09,0.12
Middle,0,11/09/09,0.08
Middle,0,11/09/09,0.12
Middle,0,11/09/09,0.1
Downstream,0,11/09/09,0.11
Downstream,0,11/09/09,0.13
Downstream,0,11/09/09,0.13
Upstream,25,11/09/09,0.12
Upstream,25,11/09/09,0.16
Upstream,25,11/09/09,0.12
Middle,25,11/09/09,0.08
Middle,25,11/09/09,0.12
Middle,25,11/09/09,0.1
Downstream,25,11/09/09,0.11
Downstream,25,11/09/09,0.13
Downstream,25,11/09/09,0.13
Upstream,50,11/09/09,0.12
Upstream,50,11/09/09,0.16
Upstream,50,11/09/09,0.12
Middle,50,11/09/09,0.08
Middle,50,11/09/09,0.12
Middle,50,11/09/09,0.1
Downstream,50,11/09/09,0.11
Downstream,50,11/09/09,0.13
Downstream,50,11/09/09,0.13
Upstream,75,11/09/09,0.12
Upstream,75,11/09/09,0.16
Upstream,75,11/09/09,0.12
Middle,75,11/09/09,0.08
Middle,75,11/09/09,0.12
Middle,75,11/09/09,0.1
Downstream,75,11/09/09,0.11
Downstream,75,11/09/09,0.13
Downstream,75,11/09/09,0.13
Upstream,100,11/09/09,0.12
Upstream,100,11/09/09,0.16
Upstream,100,11/09/09,0.12
Middle,100,11/09/09,0.08
Middle,100,11/09/09,0.12
Middle,100,11/09/09,0.1
Downstream,100,11/09/09,0.11
Downstream,100,11/09/09,0.13
Downstream,100,11/09/09,0.13
Upstream,125.04,11/09/09,0.12
Upstream,125.04,11/09/09,0.16
Upstream,125.04,11/09/09,0.12
Middle,125.04,11/09/09,0.08
Middle,125.04,11/09/09,0.12
Middle,125.04,11/09/09,0.1
Downstream,125.04,11/09/09,0.11
Downstream,125.04,11/09/09,0.13
Downstream,125.04,11/09/09,0.13

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] linear model and by()
,75,11/09/09,0.13
Upstream,100,11/09/09,0.12
Upstream,100,11/09/09,0.16
Upstream,100,11/09/09,0.12
Middle,100,11/09/09,0.08
Middle,100,11/09/09,0.12
Middle,100,11/09/09,0.1
Downstream,100,11/09/09,0.11
Downstream,100,11/09/09,0.13
Downstream,100,11/09/09,0.13
Upstream,125.04,11/09/09,0.12
Upstream,125.04,11/09/09,0.16
Upstream,125.04,11/09/09,0.12
Middle,125.04,11/09/09,0.08
Middle,125.04,11/09/09,0.12
Middle,125.04,11/09/09,0.1
Downstream,125.04,11/09/09,0.11
Downstream,125.04,11/09/09,0.13
Downstream,125.04,11/09/09,0.13

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Dates plotting backwards
Hello, I am having a little trouble formatting my dates correctly. When I plot something using the following commands, R puts the most recent date on the left of the figure and the earliest date on the right. Given that English is read from left to right, I would like the dates on my figure arranged the same way. I am sure this is something fairly simple, but I was wondering if someone could help me out. Here is a minimal example that should reproduce my problem. I've also included some data, thinking that perhaps my data format was the problem. Thanks in advance! Sam

Date <- as.Date(test$Date, format = "%d/%m/%Y")
plot(test$D.2D50.SA ~ test$Date)

Date,D.2D50.SA
28/08/2009,60.67
28/08/2009,66.4
28/08/2009,50.19
28/08/2009,38.19
28/08/2009,50.19
12/09/2009,62.2
12/09/2009,93.77
12/09/2009,49.89
12/09/2009,106.34
12/09/2009,42.22
22/09/2009,24.15
22/09/2009,105.17
22/09/2009,15.04
22/09/2009,23.54
22/09/2009,19.6
05/10/2009,74.41
05/10/2009,34.78
05/10/2009,28.74
05/10/2009,41.29
05/10/2009,42.68
12/10/2009,46.26
12/10/2009,13.31
12/10/2009,29.95
12/10/2009,34.28
12/10/2009,74.51
19/10/2009,33.67
19/10/2009,69.86
19/10/2009,61.3
19/10/2009,21.38
19/10/2009,80.37
26/10/2009,20.69
26/10/2009,63.37
26/10/2009,70.91
26/10/2009,22.7
26/10/2009,23.89

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dates plotting backwards
Thanks in advance! Sam

Date <- as.Date(test$Date, format = "%d/%m/%Y")

Change that to test$Date <- as.Date(...), or plot Date instead of test$Date.

Yes, that worked. Silly mistake. Sometimes those are the hardest ones to spot. Thanks!

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-- 
* Sam Albers Geography Program University of Northern British Columbia University Way Prince George, British Columbia Canada, V2N 4Z9 phone: 250 960-6777 *

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
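Put together, the fix Bill describes looks like this (self-contained sketch using a few rows of the posted data):

```r
test <- data.frame(Date      = c("28/08/2009", "12/09/2009", "22/09/2009"),
                   D.2D50.SA = c(60.67, 62.2, 24.15))

# Convert in place, so the column plot() sees really is of class Date.
# The original code converted into a separate variable 'Date' but then
# plotted the still-character test$Date.
test$Date <- as.Date(test$Date, format = "%d/%m/%Y")
plot(D.2D50.SA ~ Date, data = test)  # x-axis now runs earliest to latest
```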
[R] Web implementation of R?
Hello, Can anyone recommend a good example of a web implementation of R? I can't seem to find anything on my own. Thanks in advance! Sam

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Contour Plot Aspect Ratio
So I managed to solve this for myself in a very roundabout kind of way, so I figured I should share in case anyone else needs something like this.

filled.contour(contour, axes = FALSE, frame.plot = FALSE,
               color = terrain.colors, ylab = "",
               key.title = title(main = "Velocity\n(m/s)"), asp = 2,
               key.axes = axis(4, seq(0, 0.6, by = 0.1)),
               plot.axes = {
                   axis.mult(side = 1, mult = 0.005, mult.label = "Width (cm)")
                   axis(side = 2, at = x, line = -5, labels = colnames(contour))
               })
mtext(side = 2, line = -1.5, "Length Along Flume (m)")

I just removed the frame (frame.plot), then manually shifted the axes and axis labels over until they lined up nicely with the edge of the actual contour plot. Anyway, this is solved; thank you for your help. Sam

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Contour Plot Aspect Ratio
Hello there, I have a fairly simple request (I hope!). I have produced a filled contour plot like this:

library(grDevices)
library(gplots)
library(plotrix)
filled.contour(contour, axes = FALSE, frame.plot = TRUE,
               color = terrain.colors,
               ylab = "Length Along Flume (m)",
               key.title = title(main = "Velocity\n(m/s)"),
               key.axes = axis(4, seq(0, 0.6, by = 0.1)), asp = 2,
               plot.axes = {
                   axis.mult(side = 1, mult = 0.005, mult.label = "Width (cm)")
                   axis(side = 2, at = x, labels = colnames(contour))
               })

Note the asp=2 argument. I would like to make this plot twice as long as it is wide. I accomplish this with asp=2, but the box being plotted is now too big for the data contained within it. Here is what it looks like: http://docs.google.com/Doc?id=ddqdnxbq_30ffthshgk Does anyone know how I might lengthen this graph without it looking like this? I want to pull in the vertical axis so that it is snug with the actual contour plot. Thanks in advance. Sam

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grid building in R
Right, "equidistant" was clearly the wrong word, sorry. I just meant that any given point should be the same distance from each of the four points immediately surrounding it (left, right, above, below), aside from points on the edge, which will obviously only have two or three neighbours. On Wed, Jul 9, 2008 at 3:12 PM, hadley wickham [EMAIL PROTECTED] wrote: What do you mean by equidistant? You can have three points that are equidistant on the plane, but there's no way to add another point and have it be the same distance from all of the existing points. (Unless all the points are in the same place.) Hadley On Wed, Jul 9, 2008 at 5:02 PM, hippie dream [EMAIL PROTECTED] wrote: This might not be possible in R, but I thought I would give it a shot. I have to set up a 40 x 40 cm grid of 181 points equidistant from each other. Is there any way to produce a graph with R that can do this for me? Actual sizes are unimportant as long as it is to scale. Thanks -- View this message in context: http://www.nabble.com/Grid-building-in-R-tp18371874p18371874.html Sent from the R help mailing list archive at Nabble.com. -- http://had.co.nz/

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grid building in R
Basically, I want 181 points equally spaced over a 40 x 40 cm area. I want to be able to specify the number of points and the area they are plotted on. I think you are right that grid is what I am looking for, but I wanted the grid to have axes, which your code below, although appreciated, did not give me. Sorry to be unclear. On Wed, Jul 9, 2008 at 3:48 PM, Erik Iverson [EMAIL PROTECTED] wrote: Still not sure exactly what you want, but it sounds like the 'grid' package may be of some help. It has very flexible ways of partitioning regions for plotting. Is this anything like you're after?

library(grid)
for (i in 0:10)
    for (j in 0:10)
        grid.points(i / 10, j / 10, default.units = "npc")

hippie dream wrote: This might not be possible in R, but I thought I would give it a shot. I have to set up a 40 x 40 cm grid of 181 points equidistant from each other. Is there any way to produce a graph with R that can do this for me? Actual sizes are unimportant as long as it is to scale. Thanks

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grid building in R
Ahhh. That worked perfectly. Thank you very much. On Wed, Jul 9, 2008 at 4:19 PM, Dylan Beaudette [EMAIL PROTECTED] wrote: On Wednesday 09 July 2008, hippie dream wrote: This might not be possible in R, but I thought I would give it a shot. I have to set up a 40 x 40 cm grid of 181 points equidistant from each other. Is there any way to produce a graph with R that can do this for me? Actual sizes are unimportant as long as it is to scale. Thanks

How about:

# 40 cm spacing
spacings <- 0:13 * 40
# a square grid with 196 points
# (sqrt(181) is not an integer, sorry!)
g <- expand.grid(x = spacings, y = spacings)
# check it out
plot(g, pch = 3, cex = 0.5)

-- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
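If the 40 x 40 cm in the question refers to the total extent rather than the spacing, a small variation of Dylan's sketch fits 181 of the 196 grid points into that area (the choice of which 15 points to drop is arbitrary here):

```r
n    <- 181
side <- ceiling(sqrt(n))                  # 14 points per side
sp   <- seq(0, 40, length.out = side)     # equally spaced positions in 40 cm
g    <- expand.grid(x = sp, y = sp)       # 196 candidate points
g    <- g[seq_len(n), ]                   # keep the first 181
plot(g, pch = 3, cex = 0.5, asp = 1,      # asp = 1 keeps the grid square
     xlab = "x (cm)", ylab = "y (cm)")
```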
[R] Graph Order in xyplot
I have constructed a Trellis-style xyplot:

lengthf <- factor(length)
xyplot(SLI$velocity ~ SLI$width | SLI$lengthf, layout = c(2, 7),
       xlab = "Width (cm)", ylab = "Velocity (m/s^2)", col = "black")

This produces a lovely little plot. However, the grouping factor (lengthf) isn't in the right order. My values range from 2-28; the 2 graph is at the bottom left, and the graphs continue sequentially left to right to the top of the page. I would like to have the 2 at the top and the graphs shown in descending order (i.e. have the entire figure read like a book). I tried the following, but it didn't seem to work:

lengthd <- sort(SLI$length, decreasing = TRUE)
lengthdf <- factor(SLI$lengthd)

Then I plotted the graph again:

xyplot(SLI$velocity ~ SLI$width | SLI$lengthdf, layout = c(2, 7),
       xlab = "Width (cm)", ylab = "Velocity (m/s^2)", col = "black")

This simply gave me the same graph, and now I am a little lost. Is there an easier way to do this? Do I have to rearrange my data, or can this be changed in the original xyplot call? Thanks.

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
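Two lattice options address this directly; as.table = TRUE makes the panels fill from the top-left so the figure reads like a book, and reordering the factor levels controls the panel sequence (SLI and its columns are from the post, so this is a sketch rather than tested output):

```r
library(lattice)

# Panels drawn top-left to bottom-right
xyplot(velocity ~ width | factor(length), data = SLI,
       layout = c(2, 7), as.table = TRUE,
       xlab = "Width (cm)", ylab = "Velocity (m/s^2)", col = "black")

# Alternatively, reorder the factor levels themselves (descending);
# sorting the data vector, as in the original attempt, does not change
# the level order that lattice uses.
SLI$lengthf <- factor(SLI$length,
                      levels = sort(unique(SLI$length), decreasing = TRUE))
```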
[R] Problems exporting graphs
I have been trying to figure this out all day, so hopefully the answer isn't too obvious. I am able to view a graph in the viewer window; however, I need to export the graph outside of the viewer window. Here is the script I am using:

png("Compare.png")
plot(compare$DepthSLI, compare$DischargeSLI, col = "blue",
     xlab = "Average Water Depth (cm)", ylab = "Discharge (m^3/s)",
     xlim = c(5, 40), ylim = c(0.03, 0.1),
     main = "Discharge in Flume 1")
dev.off()
null device
          1

When I do this, R simply produces an empty file that gives an error message when I try to open it. I have tried this same script with every file format available under help(Devices), all with the same result. I am running Ubuntu 8.04, so I am unable to simply copy and paste the graph as in Windows. I am fairly sure I have all the right packages installed. I am running R 2.6.2-2. Any suggestions?

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
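A quick diagnostic sketch for this kind of problem (the filename is arbitrary): check that the R build actually supports the device, and remember that nothing is written to disk until dev.off() closes the device.

```r
capabilities("png")                 # must be TRUE for png() to work
png("test.png", width = 800, height = 600)
plot(1:10)                          # stand-in for the real plot call
dev.off()                           # closes the device and flushes the file
file.info("test.png")$size          # should be non-zero if it worked
```

If the plot() call itself raises an error between png() and dev.off(), the file is left behind empty, which matches the symptom described above.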