Re: [R] remove
Val,

Working with R's special missing value indicator (NA) would be useful here. You could use the na.strings argument in read.table() to recognise "-" as a missing value:

dfr <- read.table( text=
'first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 -
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
', header = TRUE, as.is = TRUE, na.strings = c("NA", "-"))

and then modify the function used by ave() or by() to exclude missing values from the count of unique last names. Here's one approach adapting code from earlier in this thread:

err <- ave(dfr$last, dfr$first,
           FUN = function(x) length(unique(x[!is.na(x)])))
res <- dfr[err == 1, ]
res <- res[order(res$first), ]
res

  first week last
2   Bob    1 John
5   Bob    2 John
6   Bob    3 John
3  Cory    1 Jack
4  Cory    2 <NA>

Alternatively, if not using na.strings, change "-" to NA after first reading the data in: identify last names recorded as "-" using an index, assign NA to those elements, and then proceed as above.

Philip

On 13/02/2017 3:18 PM, Val wrote:

Hi Jeff and All,

When I examined the excluded data, i.e. first names with different last names, I noticed that some last names were not recorded. For instance, I modified the data as follows:

DF <- read.table( text=
'first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 -
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
', header = TRUE, as.is = TRUE )

err2 <- ave( seq_along( DF$first )
           , DF[ , "first", drop = FALSE]
           , FUN = function( n ) {
               length( unique( DF[ n, "last" ] ) )
             }
           )
result2 <- DF[ 1 == err2, ]
result2

  first week last
2   Bob    1 John
5   Bob    2 John
6   Bob    3 John

However, I want to keep Cory's records. It is assumed that an unrecorded last name is the same as the recorded one. The final output should be:

first week last
Bob  1 John
Bob  2 John
Bob  3 John
Cory 1 Jack
Cory 2 -

Thank you again!

On Sun, Feb 12, 2017 at 7:28 PM, Val wrote:

Sorry Jeff, I did not finish my email. I accidentally touched the send button.
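The "change '-' to NA after reading" alternative Philip describes in prose can be sketched like this (a minimal base-R sketch, assuming the data were read without na.strings; the object names dfr/err/res follow the thread):

```r
# Read the data as-is; "-" comes in as a literal string, not NA
dfr <- read.table(text =
'first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 -
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
', header = TRUE, as.is = TRUE)

# Recode "-" to NA by index, as described above
dfr$last[dfr$last == "-"] <- NA

# Count unique non-missing last names per first name
err <- ave(dfr$last, dfr$first,
           FUN = function(x) length(unique(x[!is.na(x)])))
res <- dfr[err == 1, ]
res <- res[order(res$first), ]
```

As in Philip's version, Bob and Cory survive (Cory's NA row is kept because the missing value is excluded from the count) while Alex, with two recorded last names, is dropped.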
My question was: when I used

length(unique(result2$first))

vs

dim(result2[!duplicated(result2[,c('first')]),])[1]

I got different results, but now I have found the problem.

Thank you!

On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller wrote:

Your question mystifies me, since it looks to me like you already know the answer.
--
Sent from my phone. Please excuse my brevity.

On February 12, 2017 3:30:49 PM PST, Val wrote:

Hi Jeff and all,

How do I get the number of unique first names in the two data sets? For the first one,

result2 <- DF[ 1 == err2, ]
length(unique(result2$first))

On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller wrote:

The "by" function aggregates and returns a result with generally fewer rows than the original data. Since you are looking to index the rows in the original data set, the "ave" function is better suited because it always returns a vector that is just as long as the input vector:

# I usually work with character data rather than factors if I plan
# to modify the data (e.g. removing rows)
DF <- read.table( text=
'first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
', header = TRUE, as.is = TRUE )

err <- ave( DF$last
          , DF[ , "first", drop = FALSE]
          , FUN = function( lst ) {
              length( unique( lst ) )
            }
          )
result <- DF[ "1" == err, ]
result

Notice that the ave function returns a vector of the same type as was given to it, so even though the function returns a numeric, the err vector is character. If you wanted to be able to examine more than one other column in determining the keep/reject decision, you could do:

err2 <- ave( seq_along( DF$first )
           , DF[ , "first", drop = FALSE]
           , FUN = function( n ) {
               length( unique( DF[ n, "last" ] ) )
             }
           )
result2 <- DF[ 1 == err2, ]
result2

and then you would have the option to re-use the "n" index to look at other columns as well.
Finally, here is a dplyr solution:

library(dplyr)
result3 <- ( DF
   %>% group_by( first )   # like a prep for ave or by
   %>% mutate( err = length( unique( last ) ) )  # similar to ave
   %>% filter( 1 == err )  # drop the rows with too many last names
   %>% select( -err )      # drop the temporary column
   %>% as.data.frame       # convert back to a plain-jane data frame
   )
result3

which uses a small set of verbs in a pipeline of functions to go from input to result in one pass.

If your data set is really big (running-out-of-memory big) then you might want to investigate the data.table or sqlite packages, either of which can be combined with dplyr to get a standardized syntax for managing larger amounts of data. However, most people actually aren't running out of memory, so in most cases the extra horsepower isn't actually needed.
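As a sketch of the data.table route mentioned above (an assumption on my part — the thread only names the package; uniqueN() is data.table's distinct-count helper, and the guard makes the sketch a no-op when the package is not installed):

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Same toy data as in the thread
  DT <- data.table(
    first = c("Alex", "Bob", "Cory", "Cory", "Bob", "Bob", "Alex", "Alex", "Alex"),
    week  = c(1, 1, 1, 2, 2, 3, 2, 3, 4),
    last  = c("West", "John", "Jack", "Jack", "John", "John", "Joseph", "West", "West"))

  # uniqueN() counts distinct last names within each first-name group,
  # added by reference as a new column
  DT[, err := uniqueN(last), by = first]

  # Keep consistent groups; select every column except the helper
  result3 <- DT[err == 1L, !"err"]
}
```

The `:=` assignment works by reference, so no copy of the big table is made — which is the point of reaching for data.table when memory is tight.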
Re: [R] Help with saving user defined functions
ecdf() is part of the stats package, which is (typically) automatically attached on startup. I have no idea what you mean by "splitting" and "saving." This is basically how all of R works -- e.g. see the value of lm() and the (S3) plot method, plot.lm, for "lm" objects. This has nothing to do with free variables and lexical scoping. Perhaps you need to review how functions and S3 methods work?

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Feb 12, 2017 at 5:31 PM, George Trojan - NOAA Federal wrote:
> I want to split my computation into parts. The first script processes the
> data, the second does the graphics. I want to save the results of the
> time-consuming calculations. My example tried to simulate this by
> terminating the session without saving it, so the environment was lost on
> purpose. What confuses me is that ecdf can be saved and restored, but not
> my own derived function.
> Of course I can save parameters and redefine the function in the second
> script.
>
> Reading Chapter 8 of Advanced R; hopefully the book will clear my mind.
>
> On Mon, Feb 13, 2017 at 12:05 AM, Bert Gunter wrote:
>>
>> It worked fine for me:
>>
>> > t <- rnorm(100)
>> > cdf <- ecdf(t)
>> >
>> > trans <- function(x) qnorm(cdf(x) * 0.99)
>> > saveRDS(trans, "/tmp/foo")
>> > trans(1.2)
>> [1] 1.042457
>> > trans1 <- readRDS("/tmp/foo")
>> > trans1(0)
>> [1] 0.1117773
>>
>> Of course, if I remove cdf() from the global environment, it will fail:
>>
>> > rm(cdf)
>> > trans1(0)
>> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>>
>> So it looks like you're clearing your global workspace in between
>> saving and loading?
>>
>> You may need to read up on function closures/lexical scoping: a
>> user-defined function in R includes not only code but also a pointer
>> to the environment in which it was defined -- in your case, the global
>> environment from which you apparently removed cdf(). Note that
>> functions are not evaluated until called, so free variables in
>> functions that do not or will not exist in the function's lexical
>> scope when called will not trigger any errors until the function *is*
>> called.
>>
>> Same comments for your second version -- if tmp is removed the
>> function will fail.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal
>> wrote:
>> > I can't figure out how to save functions to an RDS file. Here is an
>> > example of what I am trying to achieve:
>> >
>> >> t <- rnorm(100)
>> >> cdf <- ecdf(t)
>> >> cdf(0)
>> > [1] 0.59
>> >> saveRDS(cdf, "/tmp/foo")
>> >
>> > Save workspace image? [y/n/c]: n
>> > [gtrojan@asok petproject]$ R
>> >> cdf <- readRDS("/tmp/foo")
>> >> cdf
>> > Empirical CDF
>> > Call: ecdf(t)
>> > x[1:100] = -2.8881, -2.2054, -2.0026, ..., 2.0367, 2.0414
>> >
>> > This works. However, when instead of saving cdf() I try to save the function
>> >
>> >> trans <- function(x) qnorm(cdf(x) * 0.99)
>> >
>> > after restoring the object from file I get an error:
>> >
>> >> trans <- readRDS("/tmp/foo")
>> >> trans(0)
>> > Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>> >
>> > I tried to define and call cdf within the definition of trans, without
>> > success:
>> >
>> >> tmp <- rnorm(100)
>> >> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>> >> saveRDS(trans, "/tmp/foo")
>> > Save workspace image?
[y/n/c]: n
>> >
>> >> trans <- readRDS("/tmp/foo")
>> >> trans
>> > function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>> >> trans(0)
>> > Error in sort(x) : object 'tmp' not found
>> >
>> > So, here the call cdf(0) did not force evaluation of my random sample.
>> > What am I missing?
>> >
>> > George
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
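One way out of the problem this thread describes is a function factory: build trans inside another function so that cdf lives in the closure's own (non-global) environment. saveRDS() serializes such an environment together with the function; the global environment itself is never saved, which is exactly why the original version broke. A minimal sketch -- make_trans is a hypothetical name, not from the thread:

```r
# Factory: everything trans() needs is captured in its enclosing
# environment, which saveRDS() stores together with the function.
make_trans <- function(data) {
  cdf <- ecdf(data)                 # lives in the closure's environment
  function(x) qnorm(cdf(x) * 0.99)
}

set.seed(42)
trans <- make_trans(rnorm(100))

f <- tempfile(fileext = ".rds")
saveRDS(trans, f)
trans2 <- readRDS(f)   # self-contained: no objects needed in the workspace
```

Because cdf is found in the serialized enclosing environment rather than in the global workspace, trans2 keeps working in a fresh session where t, cdf, and tmp never existed.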
Re: [R] remove
Hi Jeff and All,

When I examined the excluded data, i.e. first names with different last names, I noticed that some last names were not recorded. For instance, I modified the data as follows:

DF <- read.table( text=
'first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 -
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
', header = TRUE, as.is = TRUE )

err2 <- ave( seq_along( DF$first )
           , DF[ , "first", drop = FALSE]
           , FUN = function( n ) {
               length( unique( DF[ n, "last" ] ) )
             }
           )
result2 <- DF[ 1 == err2, ]
result2

  first week last
2   Bob    1 John
5   Bob    2 John
6   Bob    3 John

However, I want to keep Cory's records. It is assumed that an unrecorded last name is the same as the recorded one. The final output should be:

first week last
Bob  1 John
Bob  2 John
Bob  3 John
Cory 1 Jack
Cory 2 -

Thank you again!

On Sun, Feb 12, 2017 at 7:28 PM, Val wrote:
> Sorry Jeff, I did not finish my email. I accidentally touched the send
> button.
> My question was: when I used
>
> length(unique(result2$first))
> vs
> dim(result2[!duplicated(result2[,c('first')]),])[1]
>
> I got different results, but now I have found the problem.
>
> Thank you!
>
> On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller wrote:
>> Your question mystifies me, since it looks to me like you already know
>> the answer.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On February 12, 2017 3:30:49 PM PST, Val wrote:
>>> Hi Jeff and all,
>>> How do I get the number of unique first names in the two data sets?
>>>
>>> For the first one,
>>> result2 <- DF[ 1 == err2, ]
>>> length(unique(result2$first))
>>>
>>> On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller wrote:
>>> The "by" function aggregates and returns a result with generally
>>> fewer rows than the original data.
>>> Since you are looking to index the rows in the original data set, the
>>> "ave" function is better suited because it always returns a vector
>>> that is just as long as the input vector:
>>>
>>> # I usually work with character data rather than factors if I plan
>>> # to modify the data (e.g. removing rows)
>>> DF <- read.table( text=
>>> 'first week last
>>> Alex 1 West
>>> Bob 1 John
>>> Cory 1 Jack
>>> Cory 2 Jack
>>> Bob 2 John
>>> Bob 3 John
>>> Alex 2 Joseph
>>> Alex 3 West
>>> Alex 4 West
>>> ', header = TRUE, as.is = TRUE )
>>>
>>> err <- ave( DF$last
>>>           , DF[ , "first", drop = FALSE]
>>>           , FUN = function( lst ) {
>>>               length( unique( lst ) )
>>>             }
>>>           )
>>> result <- DF[ "1" == err, ]
>>> result
>>>
>>> Notice that the ave function returns a vector of the same type as was
>>> given to it, so even though the function returns a numeric, the err
>>> vector is character.
>>>
>>> If you wanted to be able to examine more than one other column in
>>> determining the keep/reject decision, you could do:
>>>
>>> err2 <- ave( seq_along( DF$first )
>>>            , DF[ , "first", drop = FALSE]
>>>            , FUN = function( n ) {
>>>                length( unique( DF[ n, "last" ] ) )
>>>              }
>>>            )
>>> result2 <- DF[ 1 == err2, ]
>>> result2
>>>
>>> and then you would have the option to re-use the "n" index to look at
>>> other columns as well.
>>>
>>> Finally, here is a dplyr solution:
>>>
>>> library(dplyr)
>>> result3 <- ( DF
>>>    %>% group_by( first )   # like a prep for ave or by
>>>    %>% mutate( err = length( unique( last ) ) )  # similar to ave
>>>    %>% filter( 1 == err )  # drop the rows with too many last names
>>>    %>% select( -err )      # drop the temporary column
>>>    %>% as.data.frame       # convert back to a plain-jane data frame
>>>    )
>>> result3
>>>
>>> which uses a small set of verbs in a pipeline of functions to go from
>>> input to result in one pass.
>>>
>>> If your data set is really big (running-out-of-memory big) then you
>>> might want to investigate the data.table or sqlite packages, either of
>>> which can be combined with dplyr to get a standardized syntax for
>>> managing larger amounts of data. However, most people actually aren't
>>> running out of memory, so in most cases the extra horsepower isn't
>>> actually needed.
On Sun, 12 Feb 2017, P Tennant wrote:
> Hi Val,
>
> The by() function could be used here. With the dataframe dfr:
>
> # split the data by first name and check for more than one last name
> # for each first name
> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
> # make the result more easily manipulated
> res <- as.table(res)
> res
> # first
> #  Alex   Bob  Cory
> #  TRUE FALSE FALSE
Re: [R] Converting Excel Date format into R-Date formats
Hi Jeff,

Most likely the "Event Date" field is a factor. Try this:

df$Event.Date <- as.Date(as.character(df$Event.Date), "%d-%b-%y")

Also beware of Excel's habit of silently converting mixed date formats (i.e. dd/mm/yyyy and mm/dd/yyyy) to one or the other format. The only way I know to prevent this is to stick to the international (yyyy-mm-dd) format in Excel.

Jim

On Mon, Feb 13, 2017 at 11:23 AM, Jeff Reichman wrote:
> R-Help Group
>
> What is the proper way to convert Excel date formats to R Date format?
>
> Event ID   Event Date   Event Type
> 250013     1-Jan-09     NSAG Attack
> 250015     1-Jan-09     NSAG Attack
> 250016     1-Jan-09     NSAG Attack
>
> Obviously this is wrong
>
> df$Event.Date <- as.Date(df$Event.Date, "%d-%b-%y")
>
> as it returns "NA"
>
> Jeff
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
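Jim's fix can be sketched end-to-end on the sample rows above (a minimal sketch: the stringsAsFactors line reproduces the factor situation, and the LC_TIME call is my assumption -- %b matches abbreviated month names in the current locale, so "Jan" only parses under an English-style locale):

```r
# Reproduce the problem: the date column read in as a factor
df <- data.frame(Event.ID   = c(250013, 250015, 250016),
                 Event.Date = c("1-Jan-09", "1-Jan-09", "1-Jan-09"),
                 stringsAsFactors = TRUE)

# %b is locale-dependent; pin English month abbreviations for the demo
Sys.setlocale("LC_TIME", "C")

# Factor -> character -> Date, as suggested above
df$Event.Date <- as.Date(as.character(df$Event.Date), "%d-%b-%y")
```

If the dates still come back NA after this, the locale is usually the culprit rather than the factor class.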
Re: [R] Help with saving user defined functions
I want to split my computation into parts. The first script processes the data, the second does the graphics. I want to save the results of the time-consuming calculations. My example tried to simulate this by terminating the session without saving it, so the environment was lost on purpose. What confuses me is that ecdf can be saved and restored, but not my own derived function. Of course I can save parameters and redefine the function in the second script.

Reading Chapter 8 of Advanced R; hopefully the book will clear my mind.

On Mon, Feb 13, 2017 at 12:05 AM, Bert Gunter wrote:
> It worked fine for me:
>
> > t <- rnorm(100)
> > cdf <- ecdf(t)
> >
> > trans <- function(x) qnorm(cdf(x) * 0.99)
> > saveRDS(trans, "/tmp/foo")
> > trans(1.2)
> [1] 1.042457
> > trans1 <- readRDS("/tmp/foo")
> > trans1(0)
> [1] 0.1117773
>
> Of course, if I remove cdf() from the global environment, it will fail:
>
> > rm(cdf)
> > trans1(0)
> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>
> So it looks like you're clearing your global workspace in between
> saving and loading?
>
> You may need to read up on function closures/lexical scoping: a
> user-defined function in R includes not only code but also a pointer
> to the environment in which it was defined -- in your case, the global
> environment from which you apparently removed cdf(). Note that
> functions are not evaluated until called, so free variables in
> functions that do not or will not exist in the function's lexical
> scope when called will not trigger any errors until the function *is*
> called.
>
> Same comments for your second version -- if tmp is removed the
> function will fail.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal wrote:
> > I can't figure out how to save functions to an RDS file. Here is an
> > example of what I am trying to achieve:
> >
> >> t <- rnorm(100)
> >> cdf <- ecdf(t)
> >> cdf(0)
> > [1] 0.59
> >> saveRDS(cdf, "/tmp/foo")
> >
> > Save workspace image? [y/n/c]: n
> > [gtrojan@asok petproject]$ R
> >> cdf <- readRDS("/tmp/foo")
> >> cdf
> > Empirical CDF
> > Call: ecdf(t)
> > x[1:100] = -2.8881, -2.2054, -2.0026, ..., 2.0367, 2.0414
> >
> > This works. However, when instead of saving cdf() I try to save the function
> >
> >> trans <- function(x) qnorm(cdf(x) * 0.99)
> >
> > after restoring the object from file I get an error:
> >
> >> trans <- readRDS("/tmp/foo")
> >> trans(0)
> > Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
> >
> > I tried to define and call cdf within the definition of trans, without
> > success:
> >
> >> tmp <- rnorm(100)
> >> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> >> saveRDS(trans, "/tmp/foo")
> > Save workspace image? [y/n/c]: n
> >
> >> trans <- readRDS("/tmp/foo")
> >> trans
> > function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> >> trans(0)
> > Error in sort(x) : object 'tmp' not found
> >
> > So, here the call cdf(0) did not force evaluation of my random sample.
> > What am I missing?
> >
> > George
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove
Sorry Jeff, I did not finish my email. I accidentally touched the send button.

My question was: when I used

length(unique(result2$first))
vs
dim(result2[!duplicated(result2[,c('first')]),])[1]

I got different results, but now I have found the problem.

Thank you!

On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller wrote:
> Your question mystifies me, since it looks to me like you already know
> the answer.
> --
> Sent from my phone. Please excuse my brevity.
>
> On February 12, 2017 3:30:49 PM PST, Val wrote:
>> Hi Jeff and all,
>> How do I get the number of unique first names in the two data sets?
>>
>> For the first one,
>> result2 <- DF[ 1 == err2, ]
>> length(unique(result2$first))
>>
>> On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller wrote:
>>> The "by" function aggregates and returns a result with generally
>>> fewer rows than the original data. Since you are looking to index the
>>> rows in the original data set, the "ave" function is better suited
>>> because it always returns a vector that is just as long as the input
>>> vector:
>>>
>>> # I usually work with character data rather than factors if I plan
>>> # to modify the data (e.g. removing rows)
>>> DF <- read.table( text=
>>> 'first week last
>>> Alex 1 West
>>> Bob 1 John
>>> Cory 1 Jack
>>> Cory 2 Jack
>>> Bob 2 John
>>> Bob 3 John
>>> Alex 2 Joseph
>>> Alex 3 West
>>> Alex 4 West
>>> ', header = TRUE, as.is = TRUE )
>>>
>>> err <- ave( DF$last
>>>           , DF[ , "first", drop = FALSE]
>>>           , FUN = function( lst ) {
>>>               length( unique( lst ) )
>>>             }
>>>           )
>>> result <- DF[ "1" == err, ]
>>> result
>>>
>>> Notice that the ave function returns a vector of the same type as was
>>> given to it, so even though the function returns a numeric, the err
>>> vector is character.
>>>
>>> If you wanted to be able to examine more than one other column in
>>> determining the keep/reject decision, you could do:
>>>
>>> err2 <- ave( seq_along( DF$first )
>>>            , DF[ , "first", drop = FALSE]
>>>            , FUN = function( n ) {
>>>                length( unique( DF[ n, "last" ] ) )
>>>              }
>>>            )
>>> result2 <- DF[ 1 == err2, ]
>>> result2
>>>
>>> and then you would have the option to re-use the "n" index to look at
>>> other columns as well.
>>>
>>> Finally, here is a dplyr solution:
>>>
>>> library(dplyr)
>>> result3 <- ( DF
>>>    %>% group_by( first )   # like a prep for ave or by
>>>    %>% mutate( err = length( unique( last ) ) )  # similar to ave
>>>    %>% filter( 1 == err )  # drop the rows with too many last names
>>>    %>% select( -err )      # drop the temporary column
>>>    %>% as.data.frame       # convert back to a plain-jane data frame
>>>    )
>>> result3
>>>
>>> which uses a small set of verbs in a pipeline of functions to go from
>>> input to result in one pass.
>>>
>>> If your data set is really big (running-out-of-memory big) then you
>>> might want to investigate the data.table or sqlite packages, either of
>>> which can be combined with dplyr to get a standardized syntax for
>>> managing larger amounts of data. However, most people actually aren't
>>> running out of memory, so in most cases the extra horsepower isn't
>>> actually needed.
>>>
>>> On Sun, 12 Feb 2017, P Tennant wrote:
>>>
>>>> Hi Val,
>>>>
>>>> The by() function could be used here.
>>>> With the dataframe dfr:
>>>>
>>>> # split the data by first name and check for more than one last name
>>>> # for each first name
>>>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>>>> # make the result more easily manipulated
>>>> res <- as.table(res)
>>>> res
>>>> # first
>>>> #  Alex   Bob  Cory
>>>> #  TRUE FALSE FALSE
>>>>
>>>> # then use this result to subset the data
>>>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>>>> # sort if needed
>>>> nw.dfr[order(nw.dfr$first) , ]
>>>>
>>>>   first week last
>>>> 2   Bob    1 John
>>>> 5   Bob    2 John
>>>> 6   Bob    3 John
>>>> 3  Cory    1 Jack
>>>> 4  Cory    2 Jack
>>>>
>>>> Philip
>>>>
>>>> On 12/02/2017 4:02 PM, Val wrote:
>>>>> Hi all,
>>>>> I have a big data set and want to remove rows conditionally.
>>>>> In my data file each person was recorded for several weeks. Somehow,
>>>>> during the recording periods, their last name was misreported. For
>>>>> each person, the last name should be the same; otherwise remove them
>>>>> from the data. For example, in the following data set, Alex was
>>>>> found to have two last names:
>>>>>
>>>>> Alex West
>>>>> Alex Joseph
>>>>>
>>>>> Alex should be removed from the data. If this happens, then I want
>>>>> to remove all rows with Alex. Here is my data set:
>>>>>
>>>>> df <- read.table(header=TRUE, text='first week last
>>>>> Alex 1 West
>>>>> Bob 1 John
>>>>> Cory 1 Jack
>>>>> Cory 2 Jack
>>>>> Bob 2 John
>>>>> Bob 3 John
>>>>> Alex 2 Joseph
>>>>> Alex 3 West
>>>>> Alex 4 West
>>>>> ')
>>>>>
>>>>> Desired output
>>>>>
>>>>>   first week last
>>>>> 1   Bob    1 John
>>>>> 2   Bob    2 John
>>>>> 3   Bob    3 John
>>>>> 4  Cory    1 Jack
>>>>> 5  Cory    2 Jack
>>>>>
>>>>> Thank you in advance
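Val's counting question earlier in this thread -- length(unique()) vs dim(...[!duplicated(...),])[1] -- comes down to two idioms that should agree. A minimal sketch with a toy data frame (the names here are illustrative, not from the thread):

```r
x <- data.frame(first = c("Bob", "Bob", "Bob", "Cory", "Cory"),
                week  = 1:5)

# Idiom 1: count distinct values of the column directly
n1 <- length(unique(x$first))

# Idiom 2: keep only the first row per first name, then count rows;
# dim(...)[1] is just nrow()
n2 <- dim(x[!duplicated(x$first), ])[1]

n1  # 2
n2  # 2
```

A common way to get "different" results is computing the two counts on different objects (e.g. DF vs result2), or reading dim()'s second element instead of the first.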
Re: [R] Help with saving user defined functions
Jeff:

Oh yes! -- and I meant to say so and forgot, so I'm glad you did. Not only might the free variable in the function not be there; worse yet, it might be there but be something else. So it seems like a disaster waiting to happen.

The solution, I would presume, is to have no free variables (make them arguments). Or save and read the function *and* its environment. Namespaces in packages, I think, would also take care of this, right?

Note: If my understanding of any of this is incorrect, I would greatly appreciate someone setting me straight. In particular, as Jeff noted, my understanding is that saving a function (closure) with a free variable depends on the function finding its enclosing environment when it is read back into R via readRDS(). Correct? The man page is silent on this point.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Feb 12, 2017 at 4:26 PM, Jeff Newmiller wrote:
> So doesn't the fact that a function contains a reference to an environment
> suggest that this whole exercise is a really bad idea?
> --
> Sent from my phone. Please excuse my brevity.
>
> On February 12, 2017 4:05:31 PM PST, Bert Gunter wrote:
>> It worked fine for me:
>>
>>> t <- rnorm(100)
>>> cdf <- ecdf(t)
>>>
>>> trans <- function(x) qnorm(cdf(x) * 0.99)
>>> saveRDS(trans, "/tmp/foo")
>>> trans(1.2)
>> [1] 1.042457
>>> trans1 <- readRDS("/tmp/foo")
>>> trans1(0)
>> [1] 0.1117773
>>
>> Of course, if I remove cdf() from the global environment, it will fail:
>>
>>> rm(cdf)
>>> trans1(0)
>> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>>
>> So it looks like you're clearing your global workspace in between
>> saving and loading?
>>
>> You may need to read up on function closures/lexical scoping: a
>> user-defined function in R includes not only code but also a pointer
>> to the environment in which it was defined -- in your case, the global
>> environment from which you apparently removed cdf(). Note that
>> functions are not evaluated until called, so free variables in
>> functions that do not or will not exist in the function's lexical
>> scope when called will not trigger any errors until the function *is*
>> called.
>>
>> Same comments for your second version -- if tmp is removed the
>> function will fail.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal wrote:
>>> I can't figure out how to save functions to an RDS file. Here is an
>>> example of what I am trying to achieve:
>>>
>>>> t <- rnorm(100)
>>>> cdf <- ecdf(t)
>>>> cdf(0)
>>> [1] 0.59
>>>> saveRDS(cdf, "/tmp/foo")
>>> Save workspace image? [y/n/c]: n
>>> [gtrojan@asok petproject]$ R
>>>> cdf <- readRDS("/tmp/foo")
>>>> cdf
>>> Empirical CDF
>>> Call: ecdf(t)
>>> x[1:100] = -2.8881, -2.2054, -2.0026, ..., 2.0367, 2.0414
>>>
>>> This works. However, when instead of saving cdf() I try to save the function
>>>
>>>> trans <- function(x) qnorm(cdf(x) * 0.99)
>>>
>>> after restoring the object from file I get an error:
>>>
>>>> trans <- readRDS("/tmp/foo")
>>>> trans(0)
>>> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>>>
>>> I tried to define and call cdf within the definition of trans, without
>>> success:
>>>
>>>> tmp <- rnorm(100)
>>>> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>>>> saveRDS(trans, "/tmp/foo")
>>> Save workspace image?
[y/n/c]: n
>>>
>>>> trans <- readRDS("/tmp/foo")
>>>> trans
>>> function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>>>> trans(0)
>>> Error in sort(x) : object 'tmp' not found
>>>
>>> So, here the call cdf(0) did not force evaluation of my random sample.
>>> What am I missing?
>>>
>>> George
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
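Bert's "save the function *and* its environment" option can be done explicitly: give the function a small private environment holding its free variable, and saveRDS() will serialize that environment along with the closure (only the global environment itself is skipped during serialization). A minimal sketch; the names e and f are illustrative, not from the thread:

```r
set.seed(1)
trans <- function(x) qnorm(cdf(x) * 0.99)

# Put the free variable in a private environment and make it the
# function's enclosure; it now travels with the function.
e <- new.env(parent = globalenv())
e$cdf <- ecdf(rnorm(100))
environment(trans) <- e

f <- tempfile(fileext = ".rds")
saveRDS(trans, f)        # e (and the cdf inside it) is saved too
trans2 <- readRDS(f)     # cdf is found even if the workspace is empty
```

Note that qnorm and ecdf are still resolved through the stats package on the search path; it is only workspace objects like cdf that need a non-global home to survive the round trip.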
Re: [R] remove
Your question mystifies me, since it looks to me like you already know the answer.
--
Sent from my phone. Please excuse my brevity.

On February 12, 2017 3:30:49 PM PST, Val wrote:
> Hi Jeff and all,
> How do I get the number of unique first names in the two data sets?
>
> For the first one,
> result2 <- DF[ 1 == err2, ]
> length(unique(result2$first))
>
> On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller wrote:
>> The "by" function aggregates and returns a result with generally
>> fewer rows than the original data. Since you are looking to index the
>> rows in the original data set, the "ave" function is better suited
>> because it always returns a vector that is just as long as the input
>> vector:
>>
>> # I usually work with character data rather than factors if I plan
>> # to modify the data (e.g. removing rows)
>> DF <- read.table( text=
>> 'first week last
>> Alex 1 West
>> Bob 1 John
>> Cory 1 Jack
>> Cory 2 Jack
>> Bob 2 John
>> Bob 3 John
>> Alex 2 Joseph
>> Alex 3 West
>> Alex 4 West
>> ', header = TRUE, as.is = TRUE )
>>
>> err <- ave( DF$last
>>           , DF[ , "first", drop = FALSE]
>>           , FUN = function( lst ) {
>>               length( unique( lst ) )
>>             }
>>           )
>> result <- DF[ "1" == err, ]
>> result
>>
>> Notice that the ave function returns a vector of the same type as was
>> given to it, so even though the function returns a numeric, the err
>> vector is character.
>>
>> If you wanted to be able to examine more than one other column in
>> determining the keep/reject decision, you could do:
>>
>> err2 <- ave( seq_along( DF$first )
>>            , DF[ , "first", drop = FALSE]
>>            , FUN = function( n ) {
>>                length( unique( DF[ n, "last" ] ) )
>>              }
>>            )
>> result2 <- DF[ 1 == err2, ]
>> result2
>>
>> and then you would have the option to re-use the "n" index to look at
>> other columns as well.
>>
>> Finally, here is a dplyr solution:
>>
>> library(dplyr)
>> result3 <- ( DF
>>    %>% group_by( first )   # like a prep for ave or by
>>    %>% mutate( err = length( unique( last ) ) )  # similar to ave
>>    %>% filter( 1 == err )  # drop the rows with too many last names
>>    %>% select( -err )      # drop the temporary column
>>    %>% as.data.frame       # convert back to a plain-jane data frame
>>    )
>> result3
>>
>> which uses a small set of verbs in a pipeline of functions to go from
>> input to result in one pass.
>>
>> If your data set is really big (running-out-of-memory big) then you
>> might want to investigate the data.table or sqlite packages, either of
>> which can be combined with dplyr to get a standardized syntax for
>> managing larger amounts of data. However, most people actually aren't
>> running out of memory, so in most cases the extra horsepower isn't
>> actually needed.
>>
>> On Sun, 12 Feb 2017, P Tennant wrote:
>>
>>> Hi Val,
>>>
>>> The by() function could be used here. With the dataframe dfr:
>>>
>>> # split the data by first name and check for more than one last name
>>> # for each first name
>>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>>> # make the result more easily manipulated
>>> res <- as.table(res)
>>> res
>>> # first
>>> #  Alex   Bob  Cory
>>> #  TRUE FALSE FALSE
>>>
>>> # then use this result to subset the data
>>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>>> # sort if needed
>>> nw.dfr[order(nw.dfr$first) , ]
>>>
>>>   first week last
>>> 2   Bob    1 John
>>> 5   Bob    2 John
>>> 6   Bob    3 John
>>> 3  Cory    1 Jack
>>> 4  Cory    2 Jack
>>>
>>> Philip
>>>
>>> On 12/02/2017 4:02 PM, Val wrote:
>>>> Hi all,
>>>> I have a big data set and want to remove rows conditionally.
>>>> In my data file each person was recorded for several weeks. Somehow,
>>>> during the recording periods, their last name was misreported. For
>>>> each person, the last name should be the same; otherwise remove them
>>>> from the data.
>>>> For example, in the following data set, Alex was found to have two
>>>> last names:
>>>>
>>>> Alex West
>>>> Alex Joseph
>>>>
>>>> Alex should be removed from the data. If this happens, then I want
>>>> to remove all rows with Alex. Here is my data set:
>>>>
>>>> df <- read.table(header=TRUE, text='first week last
>>>> Alex 1 West
>>>> Bob 1 John
>>>> Cory 1 Jack
>>>> Cory 2 Jack
>>>> Bob 2 John
>>>> Bob 3 John
>>>> Alex 2 Joseph
>>>> Alex 3 West
>>>> Alex 4 West
>>>> ')
>>>>
>>>> Desired output
>>>>
>>>>   first week last
>>>> 1   Bob    1 John
>>>> 2   Bob    2 John
>>>> 3   Bob    3 John
>>>> 4  Cory    1 Jack
>>>> 5  Cory    2 Jack
>>>>
>>>> Thank you in advance
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with saving user defined functions
So doesn't the fact that a function contains a reference to an environment suggest that this whole exercise is a really bad idea?
--
Sent from my phone. Please excuse my brevity.

On February 12, 2017 4:05:31 PM PST, Bert Gunter wrote:
>It worked fine for me:
>
>> t <- rnorm(100)
>> cdf <- ecdf(t)
>> trans <- function(x) qnorm(cdf(x) * 0.99)
>> saveRDS(trans, "/tmp/foo")
>> trans(1.2)
>[1] 1.042457
>> trans1 <- readRDS("/tmp/foo")
>> trans1(0)
>[1] 0.1117773
>
>Of course, if I remove cdf() from the global environment, it will fail:
>
>> rm(cdf)
>> trans1(0)
>Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>
>So it looks like you're clearing your global workspace in between
>saving and loading?
>
>You may need to read up on function closures/lexical scoping: a
>user-defined function in R includes not only code but also a pointer
>to the environment in which it was defined -- in your case, the global
>environment from which you apparently removed cdf(). Note that
>functions are not evaluated until called, so free variables that do
>not or will not exist in the function's lexical scope when called will
>not trigger any errors until the function *is* called.
>
>Same comments for your second version -- if tmp is removed the
>function will fail.
>
>Cheers,
>Bert
>
>Bert Gunter
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it." -- Opus (aka Berkeley Breathed in his
>"Bloom County" comic strip)
>
>On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal wrote:
>> I can't figure out how to save functions to an RDS file. Here is an
>> example of what I am trying to achieve:
>>
>>> t <- rnorm(100)
>>> cdf <- ecdf(t)
>>> cdf(0)
>> [1] 0.59
>>> saveRDS(cdf, "/tmp/foo")
>> Save workspace image? [y/n/c]: n
>> [gtrojan@asok petproject]$ R
>>> cdf <- readRDS("/tmp/foo")
>>> cdf
>> Empirical CDF
>> Call: ecdf(t)
>>  x[1:100] = -2.8881, -2.2054, -2.0026, ..., 2.0367, 2.0414
>>
>> This works. However, when instead of saving cdf() I try to save the
>> function
>>
>>> trans <- function(x) qnorm(cdf(x) * 0.99)
>>
>> after restoring the object from file I get an error:
>>
>>> trans <- readRDS("/tmp/foo")
>>> trans(0)
>> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>>
>> I tried to define and call cdf within the definition of trans, without
>> success:
>>
>>> tmp <- rnorm(100)
>>> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>>> saveRDS(trans, "/tmp/foo")
>> Save workspace image? [y/n/c]: n
>>> trans <- readRDS("/tmp/foo")
>>> trans
>> function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>>> trans(0)
>> Error in sort(x) : object 'tmp' not found
>>
>> So, here the call cdf(0) did not force evaluation of my random sample.
>> What am I missing?
>>
>> George
[R] Converting Excel Date format into R-Date formats
R-Help Group,

What is the proper way to convert Excel date formats to R's Date format?

Event ID   Event Date   Event Type
250013     1-Jan-09     NSAG Attack
250015     1-Jan-09     NSAG Attack
250016     1-Jan-09     NSAG Attack

Obviously this is wrong, as it returns NA:

df$Event.Date <- as.Date(df$Event.Date, "%d-%b-%y")

Jeff
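[Archive note: the format string "%d-%b-%y" is in fact right for "1-Jan-09". The usual reasons as.Date() returns NA here are (a) the column is a factor, so convert with as.character() first, and (b) %b matches month abbreviations in the current LC_TIME locale, which may not be English. A sketch, assuming English month names in the data; the "C" locale is a portable way to guarantee they parse:]

```r
x <- c("1-Jan-09", "15-Feb-09")   # if x is a factor, use as.character(x)

old <- Sys.getlocale("LC_TIME")   # remember the current time locale
Sys.setlocale("LC_TIME", "C")     # C locale uses English month abbreviations
d <- as.Date(x, format = "%d-%b-%y")
Sys.setlocale("LC_TIME", old)     # restore the user's locale

d   # Dates, no NA: 2009-01-01, 2009-02-15
```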
Re: [R] Help with saving user defined functions
It worked fine for me:

> t <- rnorm(100)
> cdf <- ecdf(t)
> trans <- function(x) qnorm(cdf(x) * 0.99)
> saveRDS(trans, "/tmp/foo")
> trans(1.2)
[1] 1.042457
> trans1 <- readRDS("/tmp/foo")
> trans1(0)
[1] 0.1117773

Of course, if I remove cdf() from the global environment, it will fail:

> rm(cdf)
> trans1(0)
Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"

So it looks like you're clearing your global workspace in between saving and loading?

You may need to read up on function closures/lexical scoping: a user-defined function in R includes not only code but also a pointer to the environment in which it was defined -- in your case, the global environment from which you apparently removed cdf(). Note that functions are not evaluated until called, so free variables that do not or will not exist in the function's lexical scope when called will not trigger any errors until the function *is* called.

Same comments for your second version -- if tmp is removed the function will fail.

Cheers,
Bert

On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal wrote:
> I can't figure out how to save functions to an RDS file. Here is an
> example of what I am trying to achieve:
>
>> t <- rnorm(100)
>> cdf <- ecdf(t)
>> cdf(0)
> [1] 0.59
>> saveRDS(cdf, "/tmp/foo")
> Save workspace image? [y/n/c]: n
> [gtrojan@asok petproject]$ R
>> cdf <- readRDS("/tmp/foo")
>> cdf
> Empirical CDF
> Call: ecdf(t)
>  x[1:100] = -2.8881, -2.2054, -2.0026, ..., 2.0367, 2.0414
>
> This works. However, when instead of saving cdf() I try to save the
> function
>
>> trans <- function(x) qnorm(cdf(x) * 0.99)
>
> after restoring the object from file I get an error:
>
>> trans <- readRDS("/tmp/foo")
>> trans(0)
> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>
> I tried to define and call cdf within the definition of trans, without
> success:
>
>> tmp <- rnorm(100)
>> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>> saveRDS(trans, "/tmp/foo")
> Save workspace image? [y/n/c]: n
>> trans <- readRDS("/tmp/foo")
>> trans
> function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>> trans(0)
> Error in sort(x) : object 'tmp' not found
>
> So, here the call cdf(0) did not force evaluation of my random sample.
> What am I missing?
>
> George
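[Archive note: Bert's point about the captured environment also suggests a fix: build trans inside another function, so cdf lives in the closure's own non-global environment. saveRDS() serializes that environment along with the code (the global environment is deliberately not saved), so the restored function is self-contained. A sketch along those lines; make_trans is my name:]

```r
make_trans <- function(data) {
  cdf <- ecdf(data)                 # lives in this call's environment
  function(x) qnorm(cdf(x) * 0.99)  # closure capturing cdf
}

trans <- make_trans(rnorm(100))
f <- tempfile()
saveRDS(trans, f)                   # cdf travels with the function
trans2 <- readRDS(f)
trans2(0)                           # works even in a fresh, empty workspace
```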
Re: [R] remove
Hi Jeff and all,

How do I get the number of unique first names in the two data sets? For the first one:

result2 <- DF[ 1 == err2, ]
length(unique(result2$first))

On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller wrote:
> The "by" function aggregates and returns a result with generally fewer rows
> than the original data. Since you are looking to index the rows in the
> original data set, the "ave" function is better suited because it always
> returns a vector that is just as long as the input vector:
>
> # I usually work with character data rather than factors if I plan
> # to modify the data (e.g. removing rows)
> DF <- read.table( text=
> 'first week last
> Alex 1 West
> Bob  1 John
> Cory 1 Jack
> Cory 2 Jack
> Bob  2 John
> Bob  3 John
> Alex 2 Joseph
> Alex 3 West
> Alex 4 West
> ', header = TRUE, as.is = TRUE )
>
> err <- ave( DF$last
>           , DF[ , "first", drop = FALSE]
>           , FUN = function( lst ) {
>               length( unique( lst ) )
>             }
>           )
> result <- DF[ "1" == err, ]
> result
>
> Notice that the ave function returns a vector of the same type as was given
> to it, so even though the function returns a numeric the err vector is
> character.
>
> If you wanted to be able to examine more than one other column in
> determining the keep/reject decision, you could do:
>
> err2 <- ave( seq_along( DF$first )
>            , DF[ , "first", drop = FALSE]
>            , FUN = function( n ) {
>                length( unique( DF[ n, "last" ] ) )
>              }
>            )
> result2 <- DF[ 1 == err2, ]
> result2
>
> and then you would have the option to re-use the "n" index to look at other
> columns as well.
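[Archive note: Jeff's point that ave() preserves the type of its input can be seen in two lines. A small sketch with toy vectors:]

```r
last  <- c("West", "John", "John", "Joseph")
first <- c("Alex", "Bob",  "Bob",  "Alex")

# FUN returns a numeric count, but ave() writes the results back into a
# copy of its (character) first argument, so err comes out as character
err <- ave(last, first, FUN = function(x) length(unique(x)))
err                # "2" "1" "1" "2"
is.character(err)  # TRUE
```

This is why Jeff compares against the string "1" in the first version and against the number 1 in the seq_along() version.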
[R] Help with saving user defined functions
I can't figure out how to save functions to an RDS file. Here is an example of what I am trying to achieve:

> t <- rnorm(100)
> cdf <- ecdf(t)
> cdf(0)
[1] 0.59
> saveRDS(cdf, "/tmp/foo")
Save workspace image? [y/n/c]: n
[gtrojan@asok petproject]$ R
> cdf <- readRDS("/tmp/foo")
> cdf
Empirical CDF
Call: ecdf(t)
 x[1:100] = -2.8881, -2.2054, -2.0026, ..., 2.0367, 2.0414

This works. However, when instead of saving cdf() I try to save the function

> trans <- function(x) qnorm(cdf(x) * 0.99)

after restoring the object from file I get an error:

> trans <- readRDS("/tmp/foo")
> trans(0)
Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"

I tried to define and call cdf within the definition of trans, without success:

> tmp <- rnorm(100)
> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> saveRDS(trans, "/tmp/foo")
Save workspace image? [y/n/c]: n
> trans <- readRDS("/tmp/foo")
> trans
function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> trans(0)
Error in sort(x) : object 'tmp' not found

So, here the call cdf(0) did not force evaluation of my random sample. What am I missing?

George
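[Archive note: one way to make the second attempt self-contained is local(): evaluate tmp and cdf once, in a fresh non-global environment, and return the closure from it. That environment is serialized by saveRDS() along with the function, so nothing is looked up in the global workspace after readRDS(). A sketch:]

```r
trans <- local({
  tmp <- rnorm(100)
  cdf <- ecdf(tmp)                  # evaluated here, at definition time
  function(x) qnorm(cdf(x) * 0.99)  # returned closure remembers tmp and cdf
})

f <- tempfile()
saveRDS(trans, f)   # the local() environment (tmp, cdf) is serialized too
rm(trans)
trans <- readRDS(f)
trans(0)            # no "object 'tmp' not found" error
```

Note that in the original second attempt, calling cdf(0) inside the function body could not help: the body is not evaluated until trans() is called, by which point tmp must still exist somewhere the closure can see it.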
Re: [R] object of type 'closure' is not subsettable
> Error in forecast[[d + 1]] = paste(index(lEJReturnsOffset[windowLength]), :
> object of type 'closure' is not subsettable

A 'closure' is a function and you cannot use '[' or '[[' to make a subset of a function. You used forecast[d+1] <- ... in one branch of the 'if' statement and forecasts[d+1] <- ... in the other. Do you see the problem now?

By the way, the code snippet in the error message says '[[d+1]]' but the code you supplied has '[d+1]'. Does the html mangling selectively double brackets or did you not show us the code that generated that message?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Feb 12, 2017 at 4:34 AM, Allan Tanaka wrote:
> Hi.
> I tried to run this R-code but still completely no idea why it still gives
> error message: Error in forecast[[d + 1]] =
> paste(index(lEJReturnsOffset[windowLength]), : object of type 'closure' is
> not subsettable
> Here is the R-code [line breaks were lost in transit]:
> library(rugarch); library(sos);
> library(forecast);library(lattice)library(quantmod); require(stochvol);
> require(fBasics);data = read.table("EURJPY.m1440.csv",
> header=F)names(data)data=ts(data)lEJ=log(data)lret.EJ = 100*diff(lEJ)lret.EJ
> =
> ts(lret.EJ)lret.EJ[as.character(head(index(lret.EJ)))]=0windowLength=500foreLength=length(lret.EJ)-windowLengthforecasts<-vector(mode="character",
> length=foreLength)for (d in 0:foreLength) {
> lEJReturnsOffset=lret.EJ[(1+d):(windowLength+d)] final.aic<-Inf
> final.order<-c(0,0,0) for (p in 0:5) for (q in 0:5) {if(p == 0 && q ==
> 0) { next}arimaFit=tryCatch(arima(lEJReturnsOffset,
> order=c(p,0,q)), error=function(err)FALSE,
> warning=function(err)FALSE)if(!is.logical(arimaFit)) {
> current.aic<-AIC(arimaFit) if(current.aic final.aic<-current.aicfinal.order<-c(p,0,q)
> final.arima<-arima(lEJReturnsOffset, order=final.order) }} else { next} }
> spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder =
> c(1,1)), mean.model = list(armaOrder = c(final.order[1],
> final.order[3]), arfima = FALSE, include.mean = TRUE),
> distribution.model = "sged")fit <- tryCatch(ugarchfit(spec, lEJReturnsOffset,
> solver='gosolnp'), error=function(e) e, warning=function(w) w)if(is(fit,
> "warning")) { forecast[d+1]=paste(index(lEJReturnsOffset[windowLength]), 1,
> sep=",") print(paste(index(lEJReturnsOffset[windowLength]), 1, sep=","))}
> else { fore = ugarchforecast(fit, n.ahead=1) ind = fore@forecast$seriesFor
> forecasts[d+1] = paste(colnames(ind), ifelse(ind[1] < 0, -1, 1), sep=",")
> print(paste(colnames(ind), ifelse(ind[1] < 0, -1, 1), sep=","))
> }}write.csv(forecasts, file="forecasts.csv", row.names=FALSE)
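[Archive note: the error is easy to reproduce. Any attempt to subscript a function raises it, which is why the forecast/forecasts typo fails this way -- after library(forecast), the bare name forecast is a function, not the character vector the loop is filling. A minimal sketch using a stand-in function:]

```r
forecasts <- character(3)    # the data object the loop should fill
forecast  <- function(x) x   # stand-in for a function of the same name

forecasts[1] <- "ok"             # fine: subscripting a vector
msg <- tryCatch(forecast[1],     # typo: subscripting the function instead
                error = function(e) conditionMessage(e))
msg   # "object of type 'closure' is not subsettable"
```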
Re: [R] Query - Merging and conditional replacement of values in a data frame
Thanks for all your help. This is helpful. Best, Bhaskar On Sun, Feb 12, 2017 at 4:35 AM, Jim Lemon wrote: > Hi Bhaskar, > Maybe: > > df1 <-read.table(text="time v1 v2 v3 > 1 2 3 4 > 2 5 6 4 > 3 1 3 4 > 4 1 3 4 > 5 2 3 4 > 6 2 3 4", > header=TRUE) > > > df2 <-read.table(text="time v11 v12 v13 > 3 112 3 4 > 4 112 3 4", > header=TRUE) > > for(time1 in df1$time) { > time2<-which(df2$time==time1) > if(length(time2)) df1[df1$time==time1,]<-df2[time2,] > } > > Jim > > > On Sun, Feb 12, 2017 at 11:13 AM, Bhaskar Mitra > wrote: > > Hello Everyone, > > > > I have two data frames df1 and df2 as shown below. They > > are of different length. However, they have one common column - time. > > > > df1 <- > > time v1 v2 v3 > > 1 2 3 4 > > 2 5 6 4 > > 3 1 3 4 > > 4 1 3 4 > > 5 2 3 4 > > 6 2 3 4 > > > > > > df2 <- > > time v11 v12 v13 > > 3 112 3 4 > > 4 112 3 4 > > > > By matching the 'time' column in df1 and df2, I am trying to modify > column > > 'v1' in df1 by replacing it > > with values in column 'v11' in df2. The modified df1 should look > something > > like this: > > > > df1 <- > > time v1 v2 v3 > > 1 2 3 4 > > 2 5 6 4 > > 3 112 3 4 > > 4 112 3 4 > > 5 2 3 4 > > 6 2 3 4 > > > > I tried to use the 'merge' function to combine df1 and df2 followed by > > the conditional 'ifelse' statement. However, that doesn't seem to work. > > > > Can I replace the values in df1 by not merging the two data frames? > > > > Thanks for your help, > > > > Regards, > > Bhaskar > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
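[Archive note: the loop above can also be written without a loop, using match() to line up rows by the shared time column. This replaces only v1, which is what was asked, whereas the loop copies whole rows. A sketch with the thread's data:]

```r
df1 <- read.table(text = "time v1 v2 v3
1 2 3 4
2 5 6 4
3 1 3 4
4 1 3 4
5 2 3 4
6 2 3 4", header = TRUE)

df2 <- read.table(text = "time v11 v12 v13
3 112 3 4
4 112 3 4", header = TRUE)

# for each row of df1, the matching row of df2 (NA when there is none)
idx <- match(df1$time, df2$time)
hit <- !is.na(idx)
df1$v1[hit] <- df2$v11[idx[hit]]   # replace v1 only where time matched
df1$v1   # 2 5 112 112 2 2
```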
Re: [R] [FORGED] Re: remove
Thank you Rainer,

The question was:
1. Identify those first names with different last names, or more than one last name.
2. Once identified (like Alex), exclude them. This is because the record is not reliable.

On Sun, Feb 12, 2017 at 11:17 AM, Rainer Schuermann wrote:
> I may not be understanding the question well enough but for me
>
> df[ df[ , "first"] != "Alex", ]
>
> seems to do the job:
>
>   first week last
>
> Rainer
>
> On Sunday, 12 February 2017 19:04:19 CET Rolf Turner wrote:
>> On 12/02/17 18:36, Bert Gunter wrote:
>>> Basic stuff!
>>>
>>> Either subscripting or ?subset.
>>>
>>> There are many good R tutorials on the web. You should spend some
>>> (more?) time with some.
>>
>> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't
>> seem basic to me. The only way that I can see how to go at it is via a
>> for loop:
>>
>> rdln <- function(X) {
>>   # Remove discordant last names.
>>   ok <- logical(nrow(X))
>>   for(nm in unique(X$first)) {
>>     xxx <- unique(X$last[X$first==nm])
>>     if(length(xxx)==1) ok[X$first==nm] <- TRUE
>>   }
>>   Y <- X[ok,]
>>   Y <- Y[order(Y$first),]
>>   rownames(Y) <- 1:nrow(Y)
>>   Y
>> }
>>
>> Calling the toy data frame "melvin" rather than "df" (since "df" is the
>> name of the built-in F density function, it is bad form to use it as the
>> name of another object) I get:
>>
>>> rdln(melvin)
>>   first week last
>> 1   Bob    1 John
>> 2   Bob    2 John
>> 3   Bob    3 John
>> 4  Cory    1 Jack
>> 5  Cory    2 Jack
>>
>> which is the desired output. If there is a "basic stuff" way to do this
>> I'd like to see it. Perhaps I will then be toadally embarrassed, but
>> they say that this is good for one.
>>
>> cheers,
>>
>> Rolf
>>
>>> On Sat, Feb 11, 2017 at 9:02 PM, Val wrote:
>>>> Hi all,
>>>> I have a big data set and want to remove rows conditionally.
>>>> In my data file each person was recorded for several weeks. Somehow
>>>> during the recording periods, their last name was misreported. For
>>>> each person, the last name should be the same. Otherwise remove them
>>>> from the data. Example, in the following data set, Alex was found to
>>>> have two last names:
>>>>
>>>> Alex West
>>>> Alex Joseph
>>>>
>>>> Alex should be removed from the data. If this happens then I want to
>>>> remove all rows with Alex. Here is my data set:
>>>>
>>>> df <- read.table(header=TRUE, text='first week last
>>>> Alex 1 West
>>>> Bob  1 John
>>>> Cory 1 Jack
>>>> Cory 2 Jack
>>>> Bob  2 John
>>>> Bob  3 John
>>>> Alex 2 Joseph
>>>> Alex 3 West
>>>> Alex 4 West ')
>>>>
>>>> Desired output:
>>>>
>>>>   first week last
>>>> 1   Bob    1 John
>>>> 2   Bob    2 John
>>>> 3   Bob    3 John
>>>> 4  Cory    1 Jack
>>>> 5  Cory    2 Jack
Re: [R] [FORGED] Re: remove
I may not be understanding the question well enough but for me

df[ df[ , "first"] != "Alex", ]

seems to do the job:

  first week last

Rainer

On Sunday, 12 February 2017 19:04:19 CET Rolf Turner wrote:
> On 12/02/17 18:36, Bert Gunter wrote:
>> Basic stuff!
>>
>> Either subscripting or ?subset.
>>
>> There are many good R tutorials on the web. You should spend some
>> (more?) time with some.
>
> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't
> seem basic to me. The only way that I can see how to go at it is via a
> for loop:
>
> rdln <- function(X) {
>   # Remove discordant last names.
>   ok <- logical(nrow(X))
>   for(nm in unique(X$first)) {
>     xxx <- unique(X$last[X$first==nm])
>     if(length(xxx)==1) ok[X$first==nm] <- TRUE
>   }
>   Y <- X[ok,]
>   Y <- Y[order(Y$first),]
>   rownames(Y) <- 1:nrow(Y)
>   Y
> }
>
> Calling the toy data frame "melvin" rather than "df" (since "df" is the
> name of the built-in F density function, it is bad form to use it as the
> name of another object) I get:
>
>> rdln(melvin)
>   first week last
> 1   Bob    1 John
> 2   Bob    2 John
> 3   Bob    3 John
> 4  Cory    1 Jack
> 5  Cory    2 Jack
>
> which is the desired output. If there is a "basic stuff" way to do this
> I'd like to see it. Perhaps I will then be toadally embarrassed, but
> they say that this is good for one.
>
> cheers,
>
> Rolf
>
>> On Sat, Feb 11, 2017 at 9:02 PM, Val wrote:
>>> Hi all,
>>> I have a big data set and want to remove rows conditionally.
>>> In my data file each person was recorded for several weeks. Somehow
>>> during the recording periods, their last name was misreported. For
>>> each person, the last name should be the same. Otherwise remove them
>>> from the data. Example, in the following data set, Alex was found to
>>> have two last names:
>>>
>>> Alex West
>>> Alex Joseph
>>>
>>> Alex should be removed from the data. If this happens then I want to
>>> remove all rows with Alex. Here is my data set:
>>>
>>> df <- read.table(header=TRUE, text='first week last
>>> Alex 1 West
>>> Bob  1 John
>>> Cory 1 Jack
>>> Cory 2 Jack
>>> Bob  2 John
>>> Bob  3 John
>>> Alex 2 Joseph
>>> Alex 3 West
>>> Alex 4 West ')
>>>
>>> Desired output:
>>>
>>>   first week last
>>> 1   Bob    1 John
>>> 2   Bob    2 John
>>> 3   Bob    3 John
>>> 4  Cory    1 Jack
>>> 5  Cory    2 Jack
Re: [R] [FORGED] Re: remove
My understanding was that the discordant names had already been identified. So in the example the OP gave, removing rows with first = "Alex" is done by:

df[df$first != "Alex", ]

If that is not the case, as others have pointed out, various forms of tapply() (by, ave, etc.) can be used. I agree that that is not so "basic," so I apologize if my understanding was incorrect.

Cheers,
Bert

On Sat, Feb 11, 2017 at 10:04 PM, Rolf Turner wrote:
> On 12/02/17 18:36, Bert Gunter wrote:
>> Basic stuff!
>>
>> Either subscripting or ?subset.
>>
>> There are many good R tutorials on the web. You should spend some
>> (more?) time with some.
>
> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't
> seem basic to me. The only way that I can see how to go at it is via a
> for loop:
>
> rdln <- function(X) {
>   # Remove discordant last names.
>   ok <- logical(nrow(X))
>   for(nm in unique(X$first)) {
>     xxx <- unique(X$last[X$first==nm])
>     if(length(xxx)==1) ok[X$first==nm] <- TRUE
>   }
>   Y <- X[ok,]
>   Y <- Y[order(Y$first),]
>   rownames(Y) <- 1:nrow(Y)
>   Y
> }
>
> Calling the toy data frame "melvin" rather than "df" (since "df" is the
> name of the built-in F density function, it is bad form to use it as the
> name of another object) I get:
>
>> rdln(melvin)
>   first week last
> 1   Bob    1 John
> 2   Bob    2 John
> 3   Bob    3 John
> 4  Cory    1 Jack
> 5  Cory    2 Jack
>
> which is the desired output. If there is a "basic stuff" way to do this
> I'd like to see it. Perhaps I will then be toadally embarrassed, but
> they say that this is good for one.
>
> cheers,
>
> Rolf
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
>> On Sat, Feb 11, 2017 at 9:02 PM, Val wrote:
>>> Hi all,
>>> I have a big data set and want to remove rows conditionally.
>>> In my data file each person was recorded for several weeks. Somehow
>>> during the recording periods, their last name was misreported. For
>>> each person, the last name should be the same. Otherwise remove them
>>> from the data. Example, in the following data set, Alex was found to
>>> have two last names:
>>>
>>> Alex West
>>> Alex Joseph
>>>
>>> Alex should be removed from the data. If this happens then I want to
>>> remove all rows with Alex. Here is my data set:
>>>
>>> df <- read.table(header=TRUE, text='first week last
>>> Alex 1 West
>>> Bob  1 John
>>> Cory 1 Jack
>>> Cory 2 Jack
>>> Bob  2 John
>>> Bob  3 John
>>> Alex 2 Joseph
>>> Alex 3 West
>>> Alex 4 West ')
>>>
>>> Desired output:
>>>
>>>   first week last
>>> 1   Bob    1 John
>>> 2   Bob    2 John
>>> 3   Bob    3 John
>>> 4  Cory    1 Jack
>>> 5  Cory    2 Jack
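[Archive note: since Bert mentions the tapply() family, here is one such non-loop version of Rolf's rdln() -- a sketch that uses tapply() to flag first names carrying more than one last name:]

```r
melvin <- read.table(header = TRUE, text = 'first week last
Alex 1 West
Bob  1 John
Cory 1 Jack
Cory 2 Jack
Bob  2 John
Bob  3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West')

# TRUE for each first name that has a single last name
ok.by.first <- tapply(melvin$last, melvin$first,
                      function(x) length(unique(x)) == 1)
keep <- names(ok.by.first)[ok.by.first]
res <- melvin[melvin$first %in% keep, ]
res <- res[order(res$first), ]
rownames(res) <- NULL
res   # Bob and Cory rows only
```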
Re: [R] object of type 'closure' is not subsettable
By failing to send your email in plain text format on this mailing list, we see a damaged version of what you saw when you sent it. Also, we would need some data to test the code with. Google "r reproducible example" to find discussions of how to ask questions online.

From the error message alone I suspect forecast is the function from the forecast package, and you are trying to create and modify a data object with that same name. At the very least re-using names is unwise, but I think your whole concept of how to create forecasts is deviating from the normal way this is done. But the scrambling of the code isn't helping.
--
Sent from my phone. Please excuse my brevity.

On February 12, 2017 4:34:20 AM PST, Allan Tanaka wrote:
>Hi.
>I tried to run this R-code but still completely no idea why it still
>gives error message: Error in forecast[[d + 1]] =
>paste(index(lEJReturnsOffset[windowLength]), : object of type
>'closure' is not subsettable
>Here is the R-code:
>[same scrambled code listing as quoted earlier in this thread]
Re: [R] remove
Exactly. Sort of like the optimisation of using which.max instead of max followed by which, though ideally the only intermediate vector would be the logical vector that says keep or don't keep. -- Sent from my phone. Please excuse my brevity. On February 11, 2017 11:19:11 PM PST, P Tennant wrote: >Hi Jeff, > >Why do you say ave() is better suited *because* it always returns a >vector that is just as long as the input vector? Is it because that >feature (of equal length), allows match() to be avoided, and as a >result, the subsequent subsetting is faster with very large datasets? > >Thanks, Philip > > >On 12/02/2017 5:42 PM, Jeff Newmiller wrote: >> The "by" function aggregates and returns a result with generally >fewer >> rows than the original data. Since you are looking to index the rows >> in the original data set, the "ave" function is better suited because > >> it always returns a vector that is just as long as the input vector: >> >> # I usually work with character data rather than factors if I plan >> # to modify the data (e.g. removing rows) >> DF <- read.table( text= >> 'first week last >> Alex1 West >> Bob 1 John >> Cory1 Jack >> Cory2 Jack >> Bob 2 John >> Bob 3 John >> Alex2 Joseph >> Alex3 West >> Alex4 West >> ', header = TRUE, as.is = TRUE ) >> >> err <- ave( DF$last >> , DF[ , "first", drop = FALSE] >> , FUN = function( lst ) { >> length( unique( lst ) ) >> } >> ) >> result <- DF[ "1" == err, ] >> result >> >> Notice that the ave function returns a vector of the same type as was > >> given to it, so even though the function returns a numeric the err >> vector is character. 
>> >> If you wanted to be able to examine more than one other column in >> determining the keep/reject decision, you could do: >> >> err2 <- ave( seq_along( DF$first ) >>, DF[ , "first", drop = FALSE] >>, FUN = function( n ) { >> length( unique( DF[ n, "last" ] ) ) >> } >>) >> result2 <- DF[ 1 == err2, ] >> result2 >> >> and then you would have the option to re-use the "n" index to look at > >> other columns as well. >> >> Finally, here is a dplyr solution: >> >> library(dplyr) >> result3 <- ( DF >>%>% group_by( first ) # like a prep for ave or by >>%>% mutate( err = length( unique( last ) ) ) # similar to >ave >>%>% filter( 1 == err ) # drop the rows with too many last >> names >>%>% select( -err ) # drop the temporary column >>%>% as.data.frame # convert back to a plain-jane data >frame >>) >> result3 >> >> which uses a small set of verbs in a pipeline of functions to go from > >> input to result in one pass. >> >> If your data set is really big (running out of memory big) then you >> might want to investigate the data.table or sqlite packages, either >of >> which can be combined with dplyr to get a standardized syntax for >> managing larger amounts of data. However, most people actually aren't > >> running out of memory so in most cases the extra horsepower isn't >> actually needed. >> >> On Sun, 12 Feb 2017, P Tennant wrote: >> >>> Hi Val, >>> >>> The by() function could be used here. 
With the dataframe dfr:
>>>
>>> # split the data by first name and check for more than one last name
>>> # for each first name
>>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>>> # make the result more easily manipulated
>>> res <- as.table(res)
>>> res
>>> # first
>>> #  Alex   Bob  Cory
>>> #  TRUE FALSE FALSE
>>>
>>> # then use this result to subset the data
>>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>>> # sort if needed
>>> nw.dfr[order(nw.dfr$first) , ]
>>>
>>>   first week last
>>> 2   Bob    1 John
>>> 5   Bob    2 John
>>> 6   Bob    3 John
>>> 3  Cory    1 Jack
>>> 4  Cory    2 Jack
>>>
>>> Philip
>>>
>>> On 12/02/2017 4:02 PM, Val wrote:
Hi all,
I have a big data set and want to remove rows conditionally. In my data
file each person was recorded for several weeks. Somehow during the
recording periods, their last name was misreported. For each person, the
last name should be the same; otherwise remove that person from the data.
For example, in the following data set, Alex was found to have two last
names:

Alex West
Alex Joseph

If this happens then I want to remove all rows with Alex. Here is my data set:

df <- read.table(header=TRUE, text='first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West ')

Desired output

  first week last
1   Bob    1 John
2   Bob    2 John
3   Bob    3 John
4  Cory    1 Jack
5  Cory    2 Jack
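Jeff's point upthread — that by() aggregates to one value per group while ave() returns a vector exactly as long as its input, making it directly usable as a row index — can be sketched with a toy example (base R only; the three-row DF here is illustrative, not Val's full data):

```r
# ave() returns one value per row, so it can index the original rows directly;
# by() returns one value per group, which would need match() to map back to rows.
DF <- data.frame(first = c("Alex", "Bob", "Bob"),
                 last  = c("West", "John", "John"),
                 stringsAsFactors = FALSE)

byres  <- by(DF, DF["first"], function(x) length(unique(x$last)))
averes <- ave(DF$last, DF$first, FUN = function(x) length(unique(x)))

length(byres)   # 2 -- one entry per group
length(averes)  # 3 -- one entry per row, same length as the input
```

Note also Jeff's caveat in action: because DF$last is character, averes comes back as the character vector c("1", "1", "1"), which is why his comparison was written as `"1" == err`.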
Re: [R] remove
Jeff, Rolf and Philip. Thank you very much for your suggestion. Jeff, you suggested if your data is big then consider data.table My data is "big" it is more than 200M records and I will see if this function works. Thank you again. On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller wrote: > The "by" function aggregates and returns a result with generally fewer rows > than the original data. Since you are looking to index the rows in the > original data set, the "ave" function is better suited because it always > returns a vector that is just as long as the input vector: > > # I usually work with character data rather than factors if I plan > # to modify the data (e.g. removing rows) > DF <- read.table( text= > 'first week last > Alex1 West > Bob 1 John > Cory1 Jack > Cory2 Jack > Bob 2 John > Bob 3 John > Alex2 Joseph > Alex3 West > Alex4 West > ', header = TRUE, as.is = TRUE ) > > err <- ave( DF$last > , DF[ , "first", drop = FALSE] > , FUN = function( lst ) { > length( unique( lst ) ) > } > ) > result <- DF[ "1" == err, ] > result > > Notice that the ave function returns a vector of the same type as was given > to it, so even though the function returns a numeric the err > vector is character. > > If you wanted to be able to examine more than one other column in > determining the keep/reject decision, you could do: > > err2 <- ave( seq_along( DF$first ) >, DF[ , "first", drop = FALSE] >, FUN = function( n ) { > length( unique( DF[ n, "last" ] ) ) > } >) > result2 <- DF[ 1 == err2, ] > result2 > > and then you would have the option to re-use the "n" index to look at other > columns as well. 
> > Finally, here is a dplyr solution: > > library(dplyr) > result3 <- ( DF >%>% group_by( first ) # like a prep for ave or by >%>% mutate( err = length( unique( last ) ) ) # similar to ave >%>% filter( 1 == err ) # drop the rows with too many last names >%>% select( -err ) # drop the temporary column >%>% as.data.frame # convert back to a plain-jane data frame >) > result3 > > which uses a small set of verbs in a pipeline of functions to go from input > to result in one pass. > > If your data set is really big (running out of memory big) then you might > want to investigate the data.table or sqlite packages, either of which can > be combined with dplyr to get a standardized syntax for managing larger > amounts of data. However, most people actually aren't running out of memory > so in most cases the extra horsepower isn't actually needed. > > > On Sun, 12 Feb 2017, P Tennant wrote: > >> Hi Val, >> >> The by() function could be used here. With the dataframe dfr: >> >> # split the data by first name and check for more than one last name for >> each first name >> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1) >> # make the result more easily manipulated >> res <- as.table(res) >> res >> # first >> # Alex Bob Cory >> # TRUE FALSE FALSE >> >> # then use this result to subset the data >> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ] >> # sort if needed >> nw.dfr[order(nw.dfr$first) , ] >> >> first week last >> 2 Bob1 John >> 5 Bob2 John >> 6 Bob3 John >> 3 Cory1 Jack >> 4 Cory2 Jack >> >> >> Philip >> >> On 12/02/2017 4:02 PM, Val wrote: >>> >>> Hi all, >>> I have a big data set and want to remove rows conditionally. >>> In my data file each person were recorded for several weeks. Somehow >>> during the recording periods, their last name was misreported. For >>> each person, the last name should be the same. Otherwise remove from >>> the data. Example, in the following data set, Alex was found to have >>> two last names . 
>>>
>>> Alex West
>>> Alex Joseph
>>>
>>> Alex should be removed from the data. If this happens then I want to
>>> remove all rows with Alex. Here is my data set
>>>
>>> df <- read.table(header=TRUE, text='first week last
>>> Alex 1 West
>>> Bob 1 John
>>> Cory 1 Jack
>>> Cory 2 Jack
>>> Bob 2 John
>>> Bob 3 John
>>> Alex 2 Joseph
>>> Alex 3 West
>>> Alex 4 West ')
>>>
>>> Desired output
>>>
>>>   first week last
>>> 1   Bob    1 John
>>> 2   Bob    2 John
>>> 3   Bob    3 John
>>> 4  Cory    1 Jack
>>> 5  Cory    2 Jack
>>>
>>> Thank you in advance
[R] object of type 'closure' is not subsettable
Hi. I tried to run this R-code but still have completely no idea why it
still gives this error message:

Error in forecast[[d + 1]] = paste(index(lEJReturnsOffset[windowLength]),  :
  object of type 'closure' is not subsettable

Here is the R-code:

library(rugarch)
library(sos)
library(forecast)
library(lattice)
library(quantmod)
require(stochvol)
require(fBasics)

data = read.table("EURJPY.m1440.csv", header=F)
names(data)
data = ts(data)
lEJ = log(data)
lret.EJ = 100*diff(lEJ)
lret.EJ = ts(lret.EJ)
lret.EJ[as.character(head(index(lret.EJ)))] = 0
windowLength = 500
foreLength = length(lret.EJ) - windowLength
forecasts <- vector(mode="character", length=foreLength)
for (d in 0:foreLength) {
  lEJReturnsOffset = lret.EJ[(1+d):(windowLength+d)]
  final.aic <- Inf
  final.order <- c(0,0,0)
  for (p in 0:5) for (q in 0:5) {
    if(p == 0 && q == 0) { next }
    arimaFit = tryCatch(arima(lEJReturnsOffset, order=c(p,0,q)),
                        error=function(err) FALSE,
                        warning=function(err) FALSE)
    if(!is.logical(arimaFit)) {
      current.aic <- AIC(arimaFit)
      if(current.aic
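The error in the message above is easy to reproduce in isolation: every R function is a "closure", and subsetting one with [ or [[ raises exactly this error. Note that the script preallocates a vector named forecasts, while the error message shows forecast[[d + 1]] being assigned — and forecast is a function once library(forecast) is attached. A minimal base-R illustration (the stand-in function below is hypothetical, not forecast::forecast itself):

```r
# Any R function is a "closure"; subsetting it reproduces the error verbatim.
forecast <- function(x) x   # stand-in for the function of the same name
err <- tryCatch(forecast[[1]], error = function(e) conditionMessage(e))
err                          # "object of type 'closure' is not subsettable"

# A vector, unlike a function, can be indexed and assigned into:
forecasts <- vector(mode = "character", length = 3)
forecasts[1] <- "1,1"
```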
Re: [R] Plotting Landscape in R-Studio
Colleagues who use Word seem to find no problem with .wmf files.

On 11/02/2017 22:08, peter dalgaard wrote:
> On 11 Feb 2017, at 20:13, Jeff Newmiller wrote:
>> While the question AS POSED is off base here (and in fact unlikely to
>> have any satisfactory answer due to the unavoidable squishiness of
>> pasted graphics in Word),
> I did wonder whether it wouldn't be easier just to export to a (PDF?
> WMF?) file and import that in Word. That looks like a no-brainer from
> the RStudio side. Or write directly to the appropriate device. -pd
>> the OP could investigate the ReporteRs package which can export
>> graphics directly to Word files in a fairly predictable manner,
>> including creating landscape oriented sections.
>> -- Sent from my phone. Please excuse my brevity.
>> On February 11, 2017 9:01:47 AM PST, David Winsemius wrote:
>>> On Feb 11, 2017, at 8:26 AM, Jeff Reichman wrote:
>>>> R-Help
>>>> How can I format a plot within RStudio (Plot window) to conform to an
>>>> 8.5 x 11 landscape page, such that when I Export -> Copy to Clipboard
>>>> I can paste the plot into Word?
>>> This is really the wrong venue for asking questions about transferring
>>> graphics from RStudio to Word. Two other options: RStudio has its own
>>> help forum, and this would probably be an OK question if you
>>> constructed a minimal verifiable example to submit to StackOverflow.
>>>> Jeff
>>> David Winsemius
>>> Alameda, CA, USA
--
Michael
http://www.dewey.myzen.co.uk/home.html
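Peter's "write directly to the appropriate device" suggestion can be sketched in two lines of base R: open a PDF device sized as landscape US letter, plot, and close it. The resulting file can then be inserted into Word without clipboard squishiness (the filename below is illustrative):

```r
# 11 x 8.5 inches = US letter in landscape orientation
pdf("landscape-plot.pdf", width = 11, height = 8.5)
plot(1:10, main = "Example plot")
dev.off()   # the file is only complete once the device is closed
```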
Re: [R] Query - Merging and conditional replacement of values in a data frame
Hi Bhaskar,
Maybe:

df1 <- read.table(text="time v1 v2 v3
1 2 3 4
2 5 6 4
3 1 3 4
4 1 3 4
5 2 3 4
6 2 3 4", header=TRUE)
df2 <- read.table(text="time v11 v12 v13
3 112 3 4
4 112 3 4", header=TRUE)

for(time1 in df1$time) {
  time2 <- which(df2$time == time1)
  if(length(time2)) df1[df1$time == time1, ] <- df2[time2, ]
}

Jim

On Sun, Feb 12, 2017 at 11:13 AM, Bhaskar Mitra wrote:
> Hello Everyone,
>
> I have two data frames df1 and df2 as shown below. They are of
> different length. However, they have one common column - time.
>
> df1 <-
> time v1 v2 v3
> 1 2 3 4
> 2 5 6 4
> 3 1 3 4
> 4 1 3 4
> 5 2 3 4
> 6 2 3 4
>
> df2 <-
> time v11 v12 v13
> 3 112 3 4
> 4 112 3 4
>
> By matching the 'time' column in df1 and df2, I am trying to modify
> column 'v1' in df1 by replacing it with the values in column 'v11' of
> df2. The modified df1 should look something like this:
>
> df1 <-
> time v1 v2 v3
> 1 2 3 4
> 2 5 6 4
> 3 112 3 4
> 4 112 3 4
> 5 2 3 4
> 6 2 3 4
>
> I tried to use the 'merge' function to combine df1 and df2 followed by
> a conditional 'ifelse' statement. However, that doesn't seem to work.
>
> Can I replace the values in df1 without merging the two data frames?
>
> Thanks for your help,
>
> Regards,
> Bhaskar
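A vectorized alternative to the loop, using match() instead of iterating over times (column names follow Bhaskar's example; unlike the whole-row assignment above, this replaces only v1 with v11, which is what the question asked for):

```r
df1 <- read.table(text="time v1 v2 v3
1 2 3 4
2 5 6 4
3 1 3 4
4 1 3 4
5 2 3 4
6 2 3 4", header = TRUE)
df2 <- read.table(text="time v11 v12 v13
3 112 3 4
4 112 3 4", header = TRUE)

idx <- match(df1$time, df2$time)   # row of df2 matching each df1 row, NA if none
hit <- !is.na(idx)                 # which df1 rows have a match
df1$v1[hit] <- df2$v11[idx[hit]]   # replace v1 only where a match exists
df1$v1                              # 2 5 112 112 2 2
```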
Re: [R] How to disable verbose grob results in pdf when using knitr with gridExtra?
Sorry for no reproducible example. Using the warnings=FALSE chunk option in
knitr does not help. I found it is not knitr's business. I used

resultpdf <- grid.arrange(facetpoint1, pright1, pright2, pright3, pright4,
                          pright5, pright6, pright7, ncol=2,
                          layout_matrix=cbind(c(1,1,1,1,1,1,1),
                                              c(2,3,4,5,6,7,8)),
                          widths=c(2,1))

and then print(resultpdf) in my chunk. I removed the object resultpdf and
just used the code below to produce the figure:

grid.arrange(facetpoint1, pright1, pright2, pright3, pright4,
             pright5, pright6, pright7, ncol=2,
             layout_matrix=cbind(c(1,1,1,1,1,1,1),
                                 c(2,3,4,5,6,7,8)),
             widths=c(2,1))

Then no verbose output appeared. The problem was using print() with ggplot2.

On Sat, 11 Feb 2017 07:45:54 -0800, Jeff Newmiller wrote:

On Sat, 11 Feb 2017, vod vos wrote:
> Hi every one,
>
> I am using Knitr,

Keep in mind that this list is about R first and foremost. There is a
mailing list for knitr, and the maintainer of the knitr package also
recommends asking questions on stackoverflow.com.

> R and Latex to produce a pdf file. When using gridExtra to set up a
> gtable layout to place multiple grobs on a page,
>
> grid.arrange(facetpoint1, pright1, pright2, pright3, pright4, pright5,
>              pright6, pright7, ncol=2,
>              layout_matrix=cbind(c(1,1,1,1,1,1,1), c(2,3,4,5,6,7,8)),
>              widths=c(2,1))

This is not a reproducible example. No matter where you ask this question
you need to supply a complete short script that exhibits the problem. That
also means including enough data IN THE SCRIPT to allow the script to run.
There are multiple guides online that describe how to do this in detail.
> the verbose information shows before the figure in the pdf file:
>
> ## TableGrob (7 x 2) "arrange": 8 grobs
> ##   z     cells    name           grob
> ## 1 1 (1-7,1-1) arrange gtable[layout]
> ## 2 2 (1-1,2-2) arrange gtable[layout]
> ## 3 3 (2-2,2-2) arrange gtable[layout]
> ## 4 4 (3-3,2-2) arrange gtable[layout]
> ## 5 5 (4-4,2-2) arrange gtable[layout]
> ## 6 6 (5-5,2-2) arrange gtable[layout]
> ## 7 7 (6-6,2-2) arrange gtable[layout]
> ## 8 8 (7-7,2-2) arrange gtable[layout]

None of this appears when I created my own reproducible R example:

 begin code 
library(grid)
library(gridExtra)
facetpoint1 <- pright1 <- pright2 <- pright3 <- pright4 <- pright5 <-
  pright6 <- pright7 <- textGrob("X")
grid.arrange( facetpoint1, pright1, pright2, pright3, pright4, pright5
            , pright6, pright7
            , ncol = 2
            , layout_matrix = cbind( c( 1, 1, 1, 1, 1, 1, 1 )
                                   , c( 2, 3, 4, 5, 6, 7, 8 ) )
            , widths = c( 2, 1 )
            )
 end code 

If the above example produces output for you in R or in a knitted PDF then
something is different about your setup than mine.

> When I ?grid.arrange, no ways were found to disable the verbose output
> in the pdf file. Any ideas?

Does this happen at the R console? If it does, please post a reproducible
example, and the invocation and output of sessionInfo() (mine is below).
If it doesn't, there could be some interaction with knitr going on, and
using the echo=FALSE or warnings=FALSE chunk options could help, or you
may need more specialized help than we can offer here (e.g. via one of the
knitr support areas mentioned above).

> Thanks.
>
> [[alternative HTML version deleted]]

When you don't set your email to plain text, the automatic conversion of
HTML to text is very likely to cause us to see something quite different
than you were looking at. It is in your best interest to figure out how to
set your email program to send plain text.
Please read the Posting Guide:
http://www.R-project.org/posting-guide.html

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gridExtra_2.2.1

loaded via a namespace (and not attached):
[1] backports
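vod vos's finding generalizes: grid.arrange() draws its result as a side effect and returns the gtable, so capturing that return value and print()-ing it is what emits the "## TableGrob ..." summary into a knitr chunk. If the gtable must be kept in an object, build it with arrangeGrob() and draw it with grid.draw() instead of print(). A sketch, assuming the gridExtra package is available (the textGrob placeholders follow Jeff's example):

```r
library(grid)
library(gridExtra)

g <- arrangeGrob(textGrob("A"), textGrob("B"), ncol = 2)  # builds the gtable without drawing
# print(g)   # would write the "TableGrob (1 x 2) ..." summary to the chunk output
grid.newpage()
grid.draw(g)  # draws the layout without printing the summary
```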
[R] How to create HyCa$NIR and octane like the "yarn" of "pls".
I am a user of the package "pls". I am going to draw the NIR spectra of my
own measured data using matplot.

Question:
For example, I have a csv data file, "HyCa.csv", shown below. Would you
please tell me how to create a data object like "yarn"? yarn has the
structure of "NIR" and "density". That is to say, how to create HyCa$NIR
and octane for drawing and analyzing the obtained data?

        X1540    X1560    X1580    X1600 Octane
S001 0.240016 0.232166 0.239428 0.255710   87.3
S002 0.246177 0.237545 0.243874 0.259296   87.0
S003 0.242777 0.234150 0.240941 0.256484   87.1
S004 0.244098 0.237214 0.244729 0.261580   89.7
S005 0.241922 0.231888 0.237418 0.252461   84.9
S006 0.242209 0.232352 0.238188 0.253036   84.7
S007 0.244148 0.237362 0.244701 0.261598   89.3
S008 0.242019 0.234185 0.241428 0.257564   87.6
S009 0.242408 0.232431 0.238130 0.253083   84.5
S010 0.244512 0.238601 0.246392 0.263583   91.7

Detailed explanation of "yarn" (from the pls documentation):

yarn: NIR spectra and density measurements of PET yarns

Description: A training set consisting of 21 NIR spectra of PET yarns,
measured at 268 wavelengths, and 21 corresponding densities. A test set
of 7 samples is also provided. Many thanks to Erik Swierenga.

Usage: yarn

Format: A data frame with components
  NIR      Numeric matrix of NIR measurements
  density  Numeric vector of densities
  train    Logical vector with TRUE for the training samples and FALSE
           for the test samples

Source: Swierenga H., de Weijer A. P., van Wijk R. J., Buydens L. M. C.
(1999) Strategy for constructing robust multivariate calibration models.
Chemometrics and Intelligent Laboratory Systems, 49(1), 1-17.
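One way to get the yarn-style layout — a data frame whose NIR component is a numeric matrix — is to read the plain table and then bind the spectral columns into a single matrix column (base R only; the inline data reproduces the first rows of the posted HyCa.csv excerpt, and in practice you would read the file instead, e.g. dat <- read.csv("HyCa.csv", row.names = 1)):

```r
# Using the posted excerpt inline so the sketch is self-contained
dat <- read.table(header = TRUE, text = "
X1540 X1560 X1580 X1600 Octane
S001 0.240016 0.232166 0.239428 0.255710 87.3
S002 0.246177 0.237545 0.243874 0.259296 87.0
S003 0.242777 0.234150 0.240941 0.256484 87.1
")

HyCa <- data.frame(octane = dat$Octane)
HyCa$NIR <- as.matrix(dat[, grep("^X", names(dat))])  # matrix column, like yarn$NIR
str(HyCa$NIR)   # a numeric matrix, one row per sample, one column per wavelength

# plot each sample's spectrum across wavelengths, as with the pls examples
matplot(t(HyCa$NIR), type = "l", xlab = "wavelength index", ylab = "absorbance")
```

Because the header row has one field fewer than the data rows, read.table automatically takes the S001... labels as row names; assigning a matrix into a data frame with $<- keeps it as a single matrix-valued column, which is exactly how yarn stores NIR.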