Re: [R] remove

2017-02-12 Thread P Tennant

Val,

Working with R's special missing value indicator (NA) would be useful 
here. You could use the na.strings arg in read.table() to recognise "-" 
as a missing value:


dfr <- read.table( text=
'first  week last
Alex 1  West
Bob 1  John
Cory 1  Jack
Cory 2  -
Bob 2  John
Bob 3  John
Alex 2  Joseph
Alex 3  West
Alex 4  West
', header = TRUE, as.is = TRUE, na.strings = c("NA", "-"))

and then modify the function used by ave() or by() to exclude missing 
values from the count of unique last names. Here's one approach adapting 
code from earlier in this thread:


err <- ave(dfr$last, dfr$first, FUN = function(x) 
length(unique(x[!is.na(x)])))

res <- dfr[err == 1 , ]
res <- res[order(res$first) , ]
res

  first week last
2   Bob    1 John
5   Bob    2 John
6   Bob    3 John
3  Cory    1 Jack
4  Cory    2 <NA>


Alternatively, if not using na.strings, change "-" to NA after first 
reading the data in: identify last names recorded as "-" using an index, 
and assign NA to these elements, before proceeding as above.
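
A minimal sketch of that alternative, assuming the data were read in without na.strings. The data frame is typed in directly here for illustration, and not_recorded is just a name chosen for the index:

```r
# Illustrative copy of the data; "-" marks a last name that was not recorded
dfr <- data.frame(
  first = c("Alex", "Bob", "Cory", "Cory", "Bob", "Bob", "Alex", "Alex", "Alex"),
  week  = c(1, 1, 1, 2, 2, 3, 2, 3, 4),
  last  = c("West", "John", "Jack", "-", "John", "John", "Joseph", "West", "West"),
  stringsAsFactors = FALSE
)

# Logical index of the placeholder entries, then assign NA to those elements
not_recorded <- dfr$last == "-"
dfr$last[not_recorded] <- NA

# Same ave() step as before: count distinct non-missing last names per first name
err <- ave(dfr$last, dfr$first,
           FUN = function(x) length(unique(x[!is.na(x)])))
res <- dfr[err == 1, ]
res <- res[order(res$first), ]
res
```

Building the index first keeps the replacement explicit and makes it easy to check how many entries were affected, e.g. with sum(not_recorded).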


Philip

On 13/02/2017 3:18 PM, Val wrote:

> Hi Jeff and All,
>
> When I examined the excluded data, i.e., first names with
> different last names, I noticed that some last names were not
> recorded. For instance, I modified the data as follows:
> DF <- read.table( text=
> 'first  week last
> Alex 1  West
> Bob 1  John
> Cory 1  Jack
> Cory 2  -
> Bob 2  John
> Bob 3  John
> Alex 2  Joseph
> Alex 3  West
> Alex 4  West
> ', header = TRUE, as.is = TRUE )
>
> err2 <- ave( seq_along( DF$first )
>            , DF[ , "first", drop = FALSE]
>            , FUN = function( n ) {
>                length( unique( DF[ n, "last" ] ) )
>              }
>            )
> result2 <- DF[ 1 == err2, ]
> result2
>
>   first week last
> 2   Bob    1 John
> 5   Bob    2 John
> 6   Bob    3 John
>
> However, I want to keep Cory's records. A last name that was not
> recorded is assumed to be the same as the person's other last name.
>
> The final output should be
>
>   first week last
>     Bob    1 John
>     Bob    2 John
>     Bob    3 John
>    Cory    1 Jack
>    Cory    2    -
>
> Thank you again!

Re: [R] Help with saving user defined functions

2017-02-12 Thread Bert Gunter
ecdf() is part of the stats package, which is (typically)
automatically attached on startup.

I have no idea what you mean by "splitting" and "saving." This is
basically how all of R works -- e.g. see the value of lm() and the
(S3) plot method, plot.lm, for "lm"  objects. This has nothing to do
with free variables and lexical scoping. Perhaps you need to review
how functions and S3 methods work?

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Feb 12, 2017 at 5:31 PM, George Trojan - NOAA Federal
 wrote:
> I want to split my computation into parts. The first script processes the
> data, the second does the graphics. I want to save the results of
> time-consuming calculations. My example tried to simulate this by terminating
> the session without saving it, so the environment was lost on purpose. What
> confuses me is that ecdf can be saved and restored, but not my own derived
> function.
> Of course I can save parameters and redefine the function in the second
> script.
>
> I am reading Chapter 8 of Advanced R; hopefully the book will clear my mind.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove

2017-02-12 Thread Val
Hi Jeff and All,

When I examined the excluded data, i.e., first names with
different last names, I noticed that some last names were not
recorded. For instance, I modified the data as follows:
DF <- read.table( text=
'first  week last
Alex 1  West
Bob 1  John
Cory 1  Jack
Cory 2  -
Bob 2  John
Bob 3  John
Alex 2  Joseph
Alex 3  West
Alex 4  West
', header = TRUE, as.is = TRUE )


err2 <- ave( seq_along( DF$first )
   , DF[ , "first", drop = FALSE]
   , FUN = function( n ) {
  length( unique( DF[ n, "last" ] ) )
 }
   )
result2 <- DF[ 1 == err2, ]
result2

  first week last
2   Bob    1 John
5   Bob    2 John
6   Bob    3 John

However, I want to keep Cory's records. A last name that was not
recorded is assumed to be the same as the person's other last name.

The final output should be

  first week last
    Bob    1 John
    Bob    2 John
    Bob    3 John
   Cory    1 Jack
   Cory    2    -

Thank you again!

On Sun, Feb 12, 2017 at 7:28 PM, Val  wrote:
> Sorry  Jeff, I did not finish my email. I accidentally touched the send 
> button.
> My question was about the case
> when I used this one
> length(unique(result2$first))
>  vs
> dim(result2[!duplicated(result2[,c('first')]),])[1]
>
> I did get different results but now I found out the problem.
>
> Thank you!.

Re: [R] Converting Excel Date format into R-Date formats

2017-02-12 Thread Jim Lemon
Hi Jeff,
Most likely the "Event Date" field is a factor. Try this:

df$Event.Date <- as.Date(as.character(df$Event.Date),
 "%d-%b-%y")

Also beware of Excel's habit of silently converting mixed date formats
(i.e. dd/mm/yyyy and mm/dd/yyyy) to one or the other format. The only
way I know to prevent this is to stick to international (yyyy-mm-dd)
format in Excel.
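
As a small illustration of the conversion above, here is a sketch on made-up values (event_date is a hypothetical stand-in for the "Event Date" column). Note that %b matches the abbreviated month name in the current locale, so "Jan" and "Feb" assume an English or C locale:

```r
# Hypothetical stand-in for the "Event Date" column, stored as a factor
event_date <- factor(c("1-Jan-09", "1-Jan-09", "15-Feb-09"))

# factor -> character -> Date, using day, abbreviated month name, 2-digit year
parsed <- as.Date(as.character(event_date), format = "%d-%b-%y")

class(parsed)
format(parsed, "%Y-%m-%d")
```

If the result is still NA, checking str() on the column and the locale reported by Sys.getlocale("LC_TIME") are reasonable next steps.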

Jim


On Mon, Feb 13, 2017 at 11:23 AM, Jeff Reichman  wrote:
> R-Help Group
>
> What is the proper way to convert Excel date formats to R's Date format?
>
> Event ID   Event Date   Event Type
> 250013     1-Jan-09     NSAG Attack
> 250015     1-Jan-09     NSAG Attack
> 250016     1-Jan-09     NSAG Attack
>
> Obviously this is wrong
>
> df$Event.Date <- as.Date(df$Event.Date, "%d-%b-%y")
>
> as it returns "NA"
>
> Jeff
>


Re: [R] Help with saving user defined functions

2017-02-12 Thread George Trojan - NOAA Federal
I want to split my computation into parts. The first script processes the
data, the second does the graphics. I want to save the results of
time-consuming calculations. My example tried to simulate this by terminating
the session without saving it, so the environment was lost on purpose. What
confuses me is that ecdf can be saved and restored, but not my own derived
function.
Of course I can save parameters and redefine the function in the second
script.
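
A sketch of what seems to differ between the two cases, offered as an illustration rather than a definitive account. The seed is arbitrary, and cdf_env is just a name used here:

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
cdf <- ecdf(rnorm(100))
trans <- function(x) qnorm(cdf(x) * 0.99)

# ecdf() returns a function whose enclosing environment is private and holds
# the sorted data; saveRDS() writes that environment along with the function
cdf_env <- environment(cdf)
identical(cdf_env, globalenv())
exists("x", envir = cdf_env, inherits = FALSE)

# When trans is defined at top level, its enclosing environment is the global
# environment, which is saved only as a reference, so cdf is not stored with it
identical(environment(trans), globalenv())
```

So the saved cdf carries its data with it, while the saved trans only carries the name cdf and relies on finding it again in the new session.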

I am reading Chapter 8 of Advanced R; hopefully the book will clear my mind.

On Mon, Feb 13, 2017 at 12:05 AM, Bert Gunter 
wrote:

> It worked fine for me:
>
> > t <- rnorm(100)
> > cdf <- ecdf(t)
> >
> > trans <- function(x) qnorm(cdf(x) * 0.99)
> > saveRDS(trans, "/tmp/foo")
> > trans(1.2)
> [1] 1.042457
> > trans1 <- readRDS("/tmp/foo")
> > trans1(0)
> [1] 0.1117773
>
>
> Of course, if I remove cdf() from the global environment, it will fail:
>
> > rm(cdf)
> > trans1(0)
> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>
> So it looks like you're clearing your global workspace in between
> saving and loading?
>
> You may need to read up on function closures/lexical scoping : A
> user-defined function in R includes not only code but also a pointer
> to the environment in which it was defined, in your case, the global
> environment from which you apparently removed cdf(). Note that
> functions are not evaluated until called, so free variables in the
> functions that do not or will not exist in the function's lexical
> scope when called will not trigger any errors until the function *is*
> called.
>
> Same comments for your second version -- if tmp is removed the
> function will fail.
>
>
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal
>  wrote:
> > I can't figure out how to save functions to an RDS file. Here is an example
> > of what I am trying to achieve:
> >
> >> t <- rnorm(100)
> >> cdf <- ecdf(t)
> >> cdf(0)
> > [1] 0.59
> >> saveRDS(cdf, "/tmp/foo")
> >>
> > Save workspace image? [y/n/c]: n
> > [gtrojan@asok petproject]$ R
> >> cdf <- readRDS("/tmp/foo")
> >> cdf
> > Empirical CDF
> > Call: ecdf(t)
> > x[1:100] = -2.8881, -2.2054, -2.0026,  ..., 2.0367, 2.0414
> >
> > This works. However when instead of saving cdf() I try to save function
> >
> >> trans <- function(x) qnorm(cdf(x) * 0.99)
> >
> > after restoring object from file I get an error:
> >
> >> trans <- readRDS("/tmp/foo")
> >> trans(0)
> > Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
> >
> > I tried to define and call cdf within the definition of trans, without
> > success:
> >
> >> tmp <- rnorm(100)
> >> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> >> saveRDS(trans, "/tmp/foo")
> > Save workspace image? [y/n/c]: n
> >
> >> trans <- readRDS("/tmp/foo")
> >> trans
> > function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> >> trans(0)
> > Error in sort(x) : object 'tmp' not found
> >
> > So, here the call cdf(0) did not force evaluation of my random sample.
> > What am I missing?
> >
> > George
> >


Re: [R] remove

2017-02-12 Thread Val
Sorry  Jeff, I did not finish my email. I accidentally touched the send button.
My question was about the case
when I used this one
length(unique(result2$first))
 vs
dim(result2[!duplicated(result2[,c('first')]),])[1]

I did get different results but now I found out the problem.
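
For reference, a small sketch on an illustrative copy of result2 (typed in by hand here), where both expressions count the same thing when applied to the same data frame. The names n_unique and n_nondup are made up for this example:

```r
# Illustrative stand-in for result2 (rows kept after filtering)
result2 <- data.frame(
  first = c("Bob", "Bob", "Bob", "Cory", "Cory"),
  week  = c(1, 2, 3, 1, 2),
  last  = c("John", "John", "John", "Jack", "Jack"),
  stringsAsFactors = FALSE
)

# Number of distinct first names, counted two ways
n_unique <- length(unique(result2$first))
n_nondup <- dim(result2[!duplicated(result2[, c("first")]), ])[1]

n_unique == n_nondup
```

A shorter equivalent of the second form is sum(!duplicated(result2$first)).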

Thank you!

On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller
 wrote:
> Your question mystifies me, since it looks to me like you already know the 
> answer.
> --
> Sent from my phone. Please excuse my brevity.

Re: [R] Help with saving user defined functions

2017-02-12 Thread Bert Gunter
Jeff:

Oh yes!-- and I meant to say so and forgot, so I'm glad you did. Not
only might the free variable in the function not be there; worse yet,
it might be there but something else. So it seems like a disaster
waiting to happen. The solution, I would presume, is to have no free
variables (make them arguments). Or save and read the function *and*
its environment.  Namespaces in packages I think would also take care
of this, right?

Note: If my understanding on any of this is incorrect, I would greatly
appreciate someone setting me straight. In particular, as Jeff noted,
my understanding is that saving a function (closure)  with a free
variable in the function depends on the function finding its enclosing
environment when it is read back into R via readRDS() .  Correct?  The
man page is silent on this point.
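
One way to read "save and read the function *and* its environment" is a function factory, sketched below under that assumption. make_trans and values are hypothetical names, and the seed is arbitrary. The returned closure keeps cdf in its own enclosing environment rather than in the global one, and saveRDS() serializes such a non-global environment together with the function:

```r
make_trans <- function(values) {
  cdf <- ecdf(values)                 # evaluated now, kept in this call's environment
  function(x) qnorm(cdf(x) * 0.99)    # closure: looks up cdf in that environment
}

set.seed(1)
trans <- make_trans(rnorm(100))

path <- tempfile(fileext = ".rds")
saveRDS(trans, path)

# Reading it back does not depend on a cdf object existing in the workspace
trans1 <- readRDS(path)
trans1(0)
all.equal(trans(0), trans1(0))
```

The function still relies on qnorm() being found on the search path, which holds in any session where the stats package is attached.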

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Feb 12, 2017 at 4:26 PM, Jeff Newmiller
 wrote:
> So doesn't the fact that a function contains a reference to an environment 
> suggest that this whole exercise is a really bad idea?
> --
> Sent from my phone. Please excuse my brevity.

Re: [R] remove

2017-02-12 Thread Jeff Newmiller
Your question mystifies me, since it looks to me like you already know the 
answer. 
-- 
Sent from my phone. Please excuse my brevity.

On February 12, 2017 3:30:49 PM PST, Val  wrote:
>Hi Jeff and all,
> How do I get the  number of unique first names   in the two data sets?
>
>for the first one,
>result2 <- DF[ 1 == err2, ]
>length(unique(result2$first))
>
>
>
>
>On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller
> wrote:
>> The "by" function aggregates and returns a result with generally fewer
>> rows than the original data. Since you are looking to index the rows in
>> the original data set, the "ave" function is better suited because it
>> always returns a vector that is just as long as the input vector:
>>
>> # I usually work with character data rather than factors if I plan
>> # to modify the data (e.g. removing rows)
>> DF <- read.table( text=
>> 'first  week last
>> Alex 1  West
>> Bob 1  John
>> Cory 1  Jack
>> Cory 2  Jack
>> Bob 2  John
>> Bob 3  John
>> Alex 2  Joseph
>> Alex 3  West
>> Alex 4  West
>> ', header = TRUE, as.is = TRUE )
>>
>> err <- ave( DF$last
>>   , DF[ , "first", drop = FALSE]
>>   , FUN = function( lst ) {
>>   length( unique( lst ) )
>> }
>>   )
>> result <- DF[ "1" == err, ]
>> result
>>
>> Notice that the ave function returns a vector of the same type as was
>> given to it, so even though the function returns a numeric the err
>> vector is character.
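
A short sketch of that type behaviour, using small hand-typed vectors rather than the data frame above:

```r
first <- c("Alex", "Bob", "Cory", "Cory", "Bob", "Alex")
last  <- c("West", "John", "Jack", "Jack", "John", "Joseph")

# Character input: the integer counts are coerced to character when they are
# assigned back into the character vector
err <- ave(last, first, FUN = function(lst) length(unique(lst)))
typeof(err)

# Integer input (row indices): the result stays numeric, so 1 == err2 is a
# plain numeric comparison
err2 <- ave(seq_along(first), first,
            FUN = function(n) length(unique(last[n])))
typeof(err2)
```

This is why the first version compares against "1" and the index-based version compares against 1.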
>>
>> If you wanted to be able to examine more than one other column in
>> determining the keep/reject decision, you could do:
>>
>> err2 <- ave( seq_along( DF$first )
>>            , DF[ , "first", drop = FALSE]
>>            , FUN = function( n ) {
>>                length( unique( DF[ n, "last" ] ) )
>>              }
>>            )
>> result2 <- DF[ 1 == err2, ]
>> result2
>>
>> and then you would have the option to re-use the "n" index to look at
>> other columns as well.
>>
>> Finally, here is a dplyr solution:
>>
>> library(dplyr)
>> result3 <- (   DF
>>            %>% group_by( first ) # like a prep for ave or by
>>            %>% mutate( err = length( unique( last ) ) ) # similar to ave
>>            %>% filter( 1 == err ) # drop the rows with too many last names
>>            %>% select( -err ) # drop the temporary column
>>            %>% as.data.frame # convert back to a plain-jane data frame
>>            )
>> result3
>>
>> which uses a small set of verbs in a pipeline of functions to go from
>> input to result in one pass.
>>
>> If your data set is really big (running out of memory big) then you
>> might want to investigate the data.table or sqlite packages, either of
>> which can be combined with dplyr to get a standardized syntax for
>> managing larger amounts of data. However, most people actually aren't
>> running out of memory so in most cases the extra horsepower isn't
>> actually needed.
>>
>>
>> On Sun, 12 Feb 2017, P Tennant wrote:
>>
>>> Hi Val,
>>>
>>> The by() function could be used here. With the dataframe dfr:
>>>
>>> # split the data by first name and check for more than one last name
>>> # for each first name
>>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>>> # make the result more easily manipulated
>>> res <- as.table(res)
>>> res
>>> # first
>>> # Alex   Bob  Cory
>>> # TRUE FALSE FALSE
>>>
>>> # then use this result to subset the data
>>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>>> # sort if needed
>>> nw.dfr[order(nw.dfr$first) , ]
>>>
>>>   first week last
>>> 2   Bob    1 John
>>> 5   Bob    2 John
>>> 6   Bob    3 John
>>> 3  Cory    1 Jack
>>> 4  Cory    2 Jack
>>>
>>>
>>> Philip
>>>
>>> On 12/02/2017 4:02 PM, Val wrote:

>>>> Hi all,
>>>> I have a big data set and want to  remove rows conditionally.
>>>> In my data file  each person were recorded  for several weeks. Somehow
>>>> during the recording periods, their last name was misreported.   For
>>>> each person,   the last name should be the same. Otherwise remove from
>>>> the data. Example, in the following data set, Alex was found to have
>>>> two last names .
>>>>
>>>> Alex   West
>>>> Alex   Joseph
>>>>
>>>> Alex should be removed  from the data.  if this happens then I want
>>>> remove  all rows with Alex. Here is my data set
>>>>
>>>> df<- read.table(header=TRUE, text='first  week last
>>>> Alex   1  West
>>>> Bob    1  John
>>>> Cory   1  Jack
>>>> Cory   2  Jack
>>>> Bob    2  John
>>>> Bob    3  John
>>>> Alex   2  Joseph
>>>> Alex   3  West
>>>> Alex   4  West ')
>>>>
>>>> Desired output
>>>>
>>>>   first  week last
>>>> 1 Bob 1   John
>>>> 2 Bob 2   John
>>>> 3 Bob 3   John
>>>> 4 Cory 1   Jack
>>>> 5 Cory 2   Jack
>>>>
>>>> Thank you in advance
>>>>
>>>> __
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the

Re: [R] Help with saving user defined functions

2017-02-12 Thread Jeff Newmiller
So doesn't the fact that a function contains a reference to an environment 
suggest that this whole exercise is a really bad idea?
-- 
Sent from my phone. Please excuse my brevity.

On February 12, 2017 4:05:31 PM PST, Bert Gunter  wrote:
>It worked fine for me:
>
>> t <- rnorm(100)
>> cdf <- ecdf(t)
>>
>> trans <- function(x) qnorm(cdf(x) * 0.99)
>> saveRDS(trans, "/tmp/foo")
>> trans(1.2)
>[1] 1.042457
>> trans1 <- readRDS("/tmp/foo")
>> trans1(0)
>[1] 0.1117773
>
>
>Of course, if I remove cdf() from the global environment, it will fail:
>
>> rm(cdf)
>> trans1(0)
>Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>
>So it looks like you're clearing your global workspace in between
>saving and loading?
>
>You may need to read up on function closures/lexical scoping: A
>user-defined function in R includes not only code but also a pointer
>to the environment in which it was defined, in your case, the global
>environment from which you apparently removed cdf(). Note that
>functions are not evaluated until called, so free variables in the
>functions that do not or will not exist in the function's lexical
>scope when called will not trigger any errors until the function *is*
>called.
>
>Same comments for your second version -- if tmp is removed the
>function will fail.
>
>
>
>Cheers,
>Bert
>
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal
> wrote:
>> I can't figure out how to save functions to RDS file. Here is an example
>> what I am trying to achieve:
>>
>>> t <- rnorm(100)
>>> cdf <- ecdf(t)
>>> cdf(0)
>> [1] 0.59
>>> saveRDS(cdf, "/tmp/foo")
>>>
>> Save workspace image? [y/n/c]: n
>> [gtrojan@asok petproject]$ R
>>> cdf <- readRDS("/tmp/foo")
>>> cdf
>> Empirical CDF
>> Call: ecdf(t)
>> x[1:100] = -2.8881, -2.2054, -2.0026,  ..., 2.0367, 2.0414
>>
>> This works. However when instead of saving cdf() I try to save function
>>
>>> trans <- function(x) qnorm(cdf(x) * 0.99)
>>
>> after restoring object from file I get an error:
>>
>>> trans <- readRDS("/tmp/foo")
>>> trans(0)
>> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>>
>> I tried to define and call cdf within the definition of trans, without
>> success:
>>
>>> tmp <- rnorm(100)
>>> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>>> saveRDS(trans, "/tmp/foo")
>> Save workspace image? [y/n/c]: n
>>
>>> trans <- readRDS("/tmp/foo")
>>> trans
>> function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>>> trans(0)
>> Error in sort(x) : object 'tmp' not found
>>
>> So, here the call cdf(0) did not force evaluation of my random sample. What
>> am I missing?
>>
>> George
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Converting Excel Date format into R-Date formats

2017-02-12 Thread Jeff Reichman
R-Help Group

What is the proper way to convert Excel date formats to R Date format?

Event ID   Event Date   Event Type
250013     1-Jan-09     NSAG Attack
250015     1-Jan-09     NSAG Attack
250016     1-Jan-09     NSAG Attack

Obviously this is wrong

df$Event.Date <- as.Date(df$Event.Date, "%d-%b-%y")

as it returns NA.
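
The message does not say how the column was stored, so the following is only a sketch of two common situations, not a diagnosis. If the column holds text such as "1-Jan-09", the "%d-%b-%y" format can still yield NA when the session's time locale does not use English month abbreviations. If the column holds Excel serial day numbers, a numeric conversion with an origin is needed instead. The vectors txt and serial are made-up illustrations, and the origin "1899-12-30" assumes the Windows 1900 date system.

```r
# Case 1: text dates. %b is locale dependent, so switch LC_TIME to "C"
# (English month abbreviations) for the conversion and restore it afterwards.
txt <- c("1-Jan-09", "1-Jan-09", "15-Feb-10")   # hypothetical sample values
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
d_text <- as.Date(txt, format = "%d-%b-%y")
Sys.setlocale("LC_TIME", old_locale)

# Case 2: Excel serial day numbers (assumed Windows 1900 date system).
serial <- c(39814, 39814, 40224)                # hypothetical sample values
d_serial <- as.Date(serial, origin = "1899-12-30")

print(d_text)
print(d_serial)
```

Checking class(df$Event.Date) and head(df$Event.Date) first would show which case applies.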

 

Jeff

 


[[alternative HTML version deleted]]



Re: [R] Help with saving user defined functions

2017-02-12 Thread Bert Gunter
It worked fine for me:

> t <- rnorm(100)
> cdf <- ecdf(t)
>
> trans <- function(x) qnorm(cdf(x) * 0.99)
> saveRDS(trans, "/tmp/foo")
> trans(1.2)
[1] 1.042457
> trans1 <- readRDS("/tmp/foo")
> trans1(0)
[1] 0.1117773


Of course, if I remove cdf() from the global environment, it will fail:

> rm(cdf)
> trans1(0)
Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"

So it looks like you're clearing your global workspace in between
saving and loading?

You may need to read up on function closures/lexical scoping: A
user-defined function in R includes not only code but also a pointer
to the environment in which it was defined, in your case, the global
environment from which you apparently removed cdf(). Note that
functions are not evaluated until called, so free variables in the
functions that do not or will not exist in the function's lexical
scope when called will not trigger any errors until the function *is*
called.

Same comments for your second version -- if tmp is removed the
function will fail.
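
One way to make the saved function carry what it needs is to define it inside another function, so that its enclosing environment is not the global one. The sketch below assumes a hypothetical factory named make_trans; saveRDS() serializes a closure together with a non-global enclosing environment, so cdf travels with trans. A temporary file stands in for "/tmp/foo".

```r
# Hypothetical factory: cdf is built once and stored in the environment
# of this call, which becomes the enclosing environment of the result.
make_trans <- function(smp) {
  cdf <- ecdf(smp)
  function(x) qnorm(cdf(x) * 0.99)
}

set.seed(1)
trans <- make_trans(rnorm(100))

path <- tempfile(fileext = ".rds")
saveRDS(trans, path)

# In practice readRDS() could run in a fresh R session.
trans1 <- readRDS(path)
before <- trans(0)
after  <- trans1(0)
print(c(before, after))
```

Saving the data and rebuilding the function after loading is an alternative that avoids serializing environments at all.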



Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Feb 12, 2017 at 2:11 PM, George Trojan - NOAA Federal
 wrote:
> I can't figure out how to save functions to RDS file. Here is an example
> what I am trying to achieve:
>
>> t <- rnorm(100)
>> cdf <- ecdf(t)
>> cdf(0)
> [1] 0.59
>> saveRDS(cdf, "/tmp/foo")
>>
> Save workspace image? [y/n/c]: n
> [gtrojan@asok petproject]$ R
>> cdf <- readRDS("/tmp/foo")
>> cdf
> Empirical CDF
> Call: ecdf(t)
> x[1:100] = -2.8881, -2.2054, -2.0026,  ..., 2.0367, 2.0414
>
> This works. However when instead of saving cdf() I try to save function
>
>> trans <- function(x) qnorm(cdf(x) * 0.99)
>
> after restoring object from file I get an error:
>
>> trans <- readRDS("/tmp/foo")
>> trans(0)
> Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"
>
> I tried to define and call cdf within the definition of trans, without
> success:
>
>> tmp <- rnorm(100)
>> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>> saveRDS(trans, "/tmp/foo")
> Save workspace image? [y/n/c]: n
>
>> trans <- readRDS("/tmp/foo")
>> trans
> function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
>> trans(0)
> Error in sort(x) : object 'tmp' not found
>
> So, here the call cdf(0) did not force evaluation of my random sample. What
> am I missing?
>
> George
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] remove

2017-02-12 Thread Val
Hi Jeff and all,
How do I get the number of unique first names in the two data sets?

For the first one,
result2 <- DF[ 1 == err2, ]
length(unique(result2$first))
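
A minimal sketch of the count for both subsets, assuming the second data set means the excluded rows. The data frame is retyped from the thread, err2 follows the ave() call quoted below, and the names excluded, n_kept and n_excluded are introduced here for illustration only.

```r
DF <- data.frame(
  first = c("Alex", "Bob", "Cory", "Cory", "Bob", "Bob", "Alex", "Alex", "Alex"),
  week  = c(1, 1, 1, 2, 2, 3, 2, 3, 4),
  last  = c("West", "John", "Jack", "Jack", "John", "John", "Joseph", "West", "West"),
  stringsAsFactors = FALSE
)

# Number of distinct last names per first name, repeated on every row.
err2 <- ave(seq_along(DF$first), DF$first,
            FUN = function(n) length(unique(DF[n, "last"])))

result2  <- DF[err2 == 1, ]   # rows kept
excluded <- DF[err2 != 1, ]   # rows dropped

n_kept     <- length(unique(result2$first))
n_excluded <- length(unique(excluded$first))
print(c(kept = n_kept, excluded = n_excluded))
```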




On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller
 wrote:
> The "by" function aggregates and returns a result with generally fewer rows
> than the original data. Since you are looking to index the rows in the
> original data set, the "ave" function is better suited because it always
> returns a vector that is just as long as the input vector:
>
> # I usually work with character data rather than factors if I plan
> # to modify the data (e.g. removing rows)
> DF <- read.table( text=
> 'first  week last
> Alex   1  West
> Bob    1  John
> Cory   1  Jack
> Cory   2  Jack
> Bob    2  John
> Bob    3  John
> Alex   2  Joseph
> Alex   3  West
> Alex   4  West
> ', header = TRUE, as.is = TRUE )
>
> err <- ave( DF$last
>   , DF[ , "first", drop = FALSE]
>   , FUN = function( lst ) {
>   length( unique( lst ) )
> }
>   )
> result <- DF[ "1" == err, ]
> result
>
> Notice that the ave function returns a vector of the same type as was given
> to it, so even though the function returns a numeric the err
> vector is character.
>
> If you wanted to be able to examine more than one other column in
> determining the keep/reject decision, you could do:
>
> err2 <- ave( seq_along( DF$first )
>            , DF[ , "first", drop = FALSE]
>            , FUN = function( n ) {
>                length( unique( DF[ n, "last" ] ) )
>              }
>            )
> result2 <- DF[ 1 == err2, ]
> result2
>
> and then you would have the option to re-use the "n" index to look at other
> columns as well.
>
> Finally, here is a dplyr solution:
>
> library(dplyr)
> result3 <- (   DF
>            %>% group_by( first ) # like a prep for ave or by
>            %>% mutate( err = length( unique( last ) ) ) # similar to ave
>            %>% filter( 1 == err ) # drop the rows with too many last names
>            %>% select( -err ) # drop the temporary column
>            %>% as.data.frame # convert back to a plain-jane data frame
>            )
> result3
>
> which uses a small set of verbs in a pipeline of functions to go from input
> to result in one pass.
>
> If your data set is really big (running out of memory big) then you might
> want to investigate the data.table or sqlite packages, either of which can
> be combined with dplyr to get a standardized syntax for managing larger
> amounts of data. However, most people actually aren't running out of memory
> so in most cases the extra horsepower isn't actually needed.
>
>
> On Sun, 12 Feb 2017, P Tennant wrote:
>
>> Hi Val,
>>
>> The by() function could be used here. With the dataframe dfr:
>>
>> # split the data by first name and check for more than one last name
>> # for each first name
>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>> # make the result more easily manipulated
>> res <- as.table(res)
>> res
>> # first
>> # Alex   Bob  Cory
>> # TRUE FALSE FALSE
>>
>> # then use this result to subset the data
>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>> # sort if needed
>> nw.dfr[order(nw.dfr$first) , ]
>>
>>   first week last
>> 2   Bob    1 John
>> 5   Bob    2 John
>> 6   Bob    3 John
>> 3  Cory    1 Jack
>> 4  Cory    2 Jack
>>
>>
>> Philip
>>
>> On 12/02/2017 4:02 PM, Val wrote:
>>>
>>> Hi all,
>>> I have a big data set and want to  remove rows conditionally.
>>> In my data file  each person were recorded  for several weeks. Somehow
>>> during the recording periods, their last name was misreported.   For
>>> each person,   the last name should be the same. Otherwise remove from
>>> the data. Example, in the following data set, Alex was found to have
>>> two last names .
>>>
>>> Alex   West
>>> Alex   Joseph
>>>
>>> Alex should be removed  from the data.  if this happens then I want
>>> remove  all rows with Alex. Here is my data set
>>>
>>> df<- read.table(header=TRUE, text='first  week last
>>> Alex   1  West
>>> Bob    1  John
>>> Cory   1  Jack
>>> Cory   2  Jack
>>> Bob    2  John
>>> Bob    3  John
>>> Alex   2  Joseph
>>> Alex   3  West
>>> Alex   4  West ')
>>>
>>> Desired output
>>>
>>>first  week last
>>> 1 Bob 1   John
>>> 2 Bob 2   John
>>> 3 Bob 3   John
>>> 4 Cory 1   Jack
>>> 5 Cory 2   Jack
>>>
>>> Thank you in advance
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-

[R] Help with saving user defined functions

2017-02-12 Thread George Trojan - NOAA Federal
I can't figure out how to save functions to an RDS file. Here is an example
of what I am trying to achieve:

> t <- rnorm(100)
> cdf <- ecdf(t)
> cdf(0)
[1] 0.59
> saveRDS(cdf, "/tmp/foo")
>
Save workspace image? [y/n/c]: n
[gtrojan@asok petproject]$ R
> cdf <- readRDS("/tmp/foo")
> cdf
Empirical CDF
Call: ecdf(t)
x[1:100] = -2.8881, -2.2054, -2.0026,  ..., 2.0367, 2.0414

This works. However when instead of saving cdf() I try to save function

> trans <- function(x) qnorm(cdf(x) * 0.99)

after restoring object from file I get an error:

> trans <- readRDS("/tmp/foo")
> trans(0)
Error in qnorm(cdf(x) * 0.99) : could not find function "cdf"

I tried to define and call cdf within the definition of trans, without
success:

> tmp <- rnorm(100)
> trans <- function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> saveRDS(trans, "/tmp/foo")
Save workspace image? [y/n/c]: n

> trans <- readRDS("/tmp/foo")
> trans
function(x) { cdf <- ecdf(tmp); cdf(0); qnorm(cdf(x)) * 0.99 }
> trans(0)
Error in sort(x) : object 'tmp' not found

So, here the call cdf(0) did not force evaluation of my random sample. What
am I missing?

George

[[alternative HTML version deleted]]



Re: [R] object of type 'closure' is not subsettable

2017-02-12 Thread William Dunlap via R-help
> Error in forecast[[d + 1]] = paste(index(lEJReturnsOffset[windowLength]),  : 
> object of type 'closure' is not subsettable

A 'closure' is a function and you cannot use '[' or '[[' to make a
subset of a function.

You used
   forecast[d+1] <- ...
in one branch of the 'if' statement and
   forecasts[d+1] <- ...
in the other.  Do you see the problem now?

By the way, the code snippet in the error message says '[[d+1]]' but
the code you supplied has '[d+1]'.  Does the html mangling selectively
double brackets or did you not show us the code that generated that
message?
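
The error can be reproduced in a few lines. In this sketch the dummy function named forecast is only a stand-in for whatever function of that name is on the search path (for example the one attached by the forecast package); msg is a name introduced for illustration.

```r
# Stand-in for a function called "forecast" found on the search path.
forecast <- function(...) NULL

# Subset-assigning into a function fails; capture the message instead of stopping.
msg <- tryCatch({
  forecast[1] <- "a"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Using the preallocated vector name consistently avoids the problem.
forecasts <- character(3)
for (d in 0:2) {
  forecasts[d + 1] <- paste("step", d, sep = ",")
}
print(forecasts)
```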

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sun, Feb 12, 2017 at 4:34 AM, Allan Tanaka  wrote:
> Hi.
> I tried to run this R-code but still completely no idea why it still gives 
> error message: Error in forecast[[d + 1]] = 
> paste(index(lEJReturnsOffset[windowLength]),  : object of type 'closure' is 
> not subsettable
> Here is the R-code:
> library(rugarch); library(sos); 
> library(forecast);library(lattice)library(quantmod); require(stochvol); 
> require(fBasics);data = read.table("EURJPY.m1440.csv", 
> header=F)names(data)data=ts(data)lEJ=log(data)lret.EJ = 100*diff(lEJ)lret.EJ 
> = 
> ts(lret.EJ)lret.EJ[as.character(head(index(lret.EJ)))]=0windowLength=500foreLength=length(lret.EJ)-windowLengthforecasts<-vector(mode="character",
>  length=foreLength)for (d in 0:foreLength) {  
> lEJReturnsOffset=lret.EJ[(1+d):(windowLength+d)]  final.aic<-Inf  
> final.order<-c(0,0,0)  for (p in 0:5) for (q in 0:5) {if(p == 0 && q == 
> 0) {  next}arimaFit=tryCatch(arima(lEJReturnsOffset, 
> order=c(p,0,q)),  error=function(err)FALSE,   
>warning=function(err)FALSE)if(!is.logical(arimaFit)) {  
> current.aic<-AIC(arimaFit)  if(current.aic final.aic<-current.aicfinal.order<-c(p,0,q)
> final.arima<-arima(lEJReturnsOffset, order=final.order)  }} els!
 e {  next}  }
> spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = 
> c(1,1)), mean.model = list(armaOrder = c(final.order[1], 
> final.order[3]), arfima = FALSE, include.mean = TRUE), 
> distribution.model = "sged")fit <- tryCatch(ugarchfit(spec, lEJReturnsOffset, 
> solver='gosolnp'),  error=function(e) e, warning=function(w) w)if(is(fit, 
> "warning")) {  forecast[d+1]=paste(index(lEJReturnsOffset[windowLength]), 1, 
> sep=",")  print(paste(index(lEJReturnsOffset[windowLength]), 1, sep=","))} 
> else {  fore = ugarchforecast(fit, n.ahead=1)  ind = fore@forecast$seriesFor  
> forecasts[d+1] = paste(colnames(ind), ifelse(ind[1] < 0, -1, 1), sep=",")  
> print(paste(colnames(ind), ifelse(ind[1] < 0, -1, 1), sep=",")) 
> }}write.csv(forecasts, file="forecasts.csv", row.names=FALSE)
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Query - Merging and conditional replacement of values in a data frame

2017-02-12 Thread Bhaskar Mitra
Thanks for all your help. This is helpful.

Best,
Bhaskar
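
For the archive, a loop-free sketch of the same replacement, limited to column v1 as in the original question. It uses match() on the shared time column; the data frames are retyped from the quoted thread, and idx and hit are names introduced here for illustration.

```r
df1 <- data.frame(time = 1:6,
                  v1 = c(2, 5, 1, 1, 2, 2),
                  v2 = c(3, 6, 3, 3, 3, 3),
                  v3 = 4)
df2 <- data.frame(time = 3:4, v11 = 112, v12 = 3, v13 = 4)

# Position of each df1 time in df2, NA where df2 has no such time.
idx <- match(df1$time, df2$time)
hit <- !is.na(idx)

# Overwrite v1 only in the matching rows; no merge is needed.
df1$v1[hit] <- df2$v11[idx[hit]]
print(df1)
```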

On Sun, Feb 12, 2017 at 4:35 AM, Jim Lemon  wrote:

> Hi Bhaskar,
> Maybe:
>
> df1 <-read.table(text="time v1  v2 v3
> 1 2   3  4
> 2 5   6  4
> 3 1   3  4
> 4 1   3  4
> 5 2   3  4
> 6 2   3  4",
> header=TRUE)
>
>
> df2 <-read.table(text="time v11  v12 v13
> 3 112   3  4
> 4 112   3  4",
> header=TRUE)
>
> for(time1 in df1$time) {
>  time2<-which(df2$time==time1)
>  if(length(time2)) df1[df1$time==time1,]<-df2[time2,]
> }
>
> Jim
>
>
> On Sun, Feb 12, 2017 at 11:13 AM, Bhaskar Mitra
>  wrote:
> > Hello Everyone,
> >
> > I have two data frames df1 and df2 as shown below. They
> > are of different length. However, they have one common column - time.
> >
> > df1 <-
> > time v1  v2 v3
> > 1 2   3  4
> > 2 5   6  4
> > 3 1   3  4
> > 4 1   3  4
> > 5 2   3  4
> > 6 2   3  4
> >
> >
> > df2 <-
> > time v11  v12 v13
> > 3 112   3  4
> > 4 112   3  4
> >
> > By matching the 'time' column in df1 and df2, I am trying to modify column
> > 'v1' in df1 by replacing it
> > with values in column 'v11' in df2. The modified df1 should look something
> > like this:
> >
> > df1 <-
> > time v1   v2 v3
> > 1 2   3  4
> > 2 5   6  4
> > 3 112 3  4
> > 4 112 3  4
> > 5 2   3  4
> > 6 2   3  4
> >
> > I tried to use the 'merge' function to combine df1 and df2 followed by
> > the conditional 'ifelse' statement. However, that doesn't seem to work.
> >
> > Can I replace the values in df1 by not merging the two data frames?
> >
> > Thanks for your help,
> >
> > Regards,
> > Bhaskar
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]



Re: [R] [FORGED] Re: remove

2017-02-12 Thread Val
Thank you Rainer,

The question was:
1. Identify those first names that have more than one last name.
2. Once identified (like Alex), exclude them, because their records
are not reliable.

On Sun, Feb 12, 2017 at 11:17 AM, Rainer Schuermann
 wrote:
> I may not be understanding the question well enough but for me
>
> df[ df[ , "first"]  != "Alex", ]
>
> seems to do the job:
>
>   first week last
>
> Rainer
>
>
>
>
> On Sonntag, 12. Februar 2017 19:04:19 CET Rolf Turner wrote:
>>
>> On 12/02/17 18:36, Bert Gunter wrote:
>> > Basic stuff!
>> >
>> > Either subscripting or ?subset.
>> >
>> > There are many good R tutorials on the web. You should spend some
>> > (more?) time with some.
>>
>> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't
>> seem basic to me.  The only way that I can see how to go at it is via
>> a for loop:
>>
>> rdln <- function(X) {
>> # Remove discordant last names.
>>  ok <- logical(nrow(X))
>>  for(nm in unique(X$first)) {
>>  xxx <- unique(X$last[X$first==nm])
>>  if(length(xxx)==1) ok[X$first==nm] <- TRUE
>>  }
>>  Y <- X[ok,]
>>  Y <- Y[order(Y$first),]
>>  rownames(Y) <- 1:nrow(Y)
>>  Y
>> }
>>
>> Calling the toy data frame "melvin" rather than "df" (since "df" is the
>> name of the built in F density function, it is bad form to use it as the
>> name of another object) I get:
>>
>>  > rdln(melvin)
>>   first week last
>> 1   Bob    1 John
>> 2   Bob    2 John
>> 3   Bob    3 John
>> 4  Cory    1 Jack
>> 5  Cory    2 Jack
>>
>> which is the desired output.  If there is a "basic stuff" way to do this
>> I'd like to see it.  Perhaps I will then be toadally embarrassed, but
>> they say that this is good for one.
>>
>> cheers,
>>
>> Rolf
>>
>> > On Sat, Feb 11, 2017 at 9:02 PM, Val  wrote:
>> >> Hi all,
>> >> I have a big data set and want to  remove rows conditionally.
>> >> In my data file  each person were recorded  for several weeks. Somehow
>> >> during the recording periods, their last name was misreported.   For
>> >> each person,   the last name should be the same. Otherwise remove from
>> >> the data. Example, in the following data set, Alex was found to have
>> >> two last names .
>> >>
>> >> Alex   West
>> >> Alex   Joseph
>> >>
>> >> Alex should be removed  from the data.  if this happens then I want
>> >> remove  all rows with Alex. Here is my data set
>> >>
>> >> df <- read.table(header=TRUE, text='first  week last
>> >> Alex   1  West
>> >> Bob    1  John
>> >> Cory   1  Jack
>> >> Cory   2  Jack
>> >> Bob    2  John
>> >> Bob    3  John
>> >> Alex   2  Joseph
>> >> Alex   3  West
>> >> Alex   4  West ')
>> >>
>> >> Desired output
>> >>
>> >>   first  week last
>> >> 1 Bob 1   John
>> >> 2 Bob 2   John
>> >> 3 Bob 3   John
>> >> 4 Cory 1   Jack
>> >> 5 Cory 2   Jack
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] [FORGED] Re: remove

2017-02-12 Thread Rainer Schuermann
I may not be understanding the question well enough but for me

df[ df[ , "first"]  != "Alex", ]

seems to do the job:

  first week last 

Rainer




On Sonntag, 12. Februar 2017 19:04:19 CET Rolf Turner wrote:
> 
> On 12/02/17 18:36, Bert Gunter wrote:
> > Basic stuff!
> >
> > Either subscripting or ?subset.
> >
> > There are many good R tutorials on the web. You should spend some
> > (more?) time with some.
> 
> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't 
> seem basic to me.  The only way that I can see how to go at it is via
> a for loop:
> 
> rdln <- function(X) {
> # Remove discordant last names.
>  ok <- logical(nrow(X))
>  for(nm in unique(X$first)) {
>  xxx <- unique(X$last[X$first==nm])
>  if(length(xxx)==1) ok[X$first==nm] <- TRUE
>  }
>  Y <- X[ok,]
>  Y <- Y[order(Y$first),]
>  rownames(Y) <- 1:nrow(Y)
>  Y
> }
> 
> Calling the toy data frame "melvin" rather than "df" (since "df" is the 
> name of the built in F density function, it is bad form to use it as the 
> name of another object) I get:
> 
>  > rdln(melvin)
>   first week last
> 1   Bob    1 John
> 2   Bob    2 John
> 3   Bob    3 John
> 4  Cory    1 Jack
> 5  Cory    2 Jack
> 
> which is the desired output.  If there is a "basic stuff" way to do this
> I'd like to see it.  Perhaps I will then be toadally embarrassed, but 
> they say that this is good for one.
> 
> cheers,
> 
> Rolf
> 
> > On Sat, Feb 11, 2017 at 9:02 PM, Val  wrote:
> >> Hi all,
> >> I have a big data set and want to  remove rows conditionally.
> >> In my data file  each person were recorded  for several weeks. Somehow
> >> during the recording periods, their last name was misreported.   For
> >> each person,   the last name should be the same. Otherwise remove from
> >> the data. Example, in the following data set, Alex was found to have
> >> two last names .
> >>
> >> Alex   West
> >> Alex   Joseph
> >>
> >> Alex should be removed  from the data.  if this happens then I want
> >> remove  all rows with Alex. Here is my data set
> >>
> >> df <- read.table(header=TRUE, text='first  week last
> >> Alex   1  West
> >> Bob    1  John
> >> Cory   1  Jack
> >> Cory   2  Jack
> >> Bob    2  John
> >> Bob    3  John
> >> Alex   2  Joseph
> >> Alex   3  West
> >> Alex   4  West ')
> >>
> >> Desired output
> >>
> >>   first  week last
> >> 1 Bob 1   John
> >> 2 Bob 2   John
> >> 3 Bob 3   John
> >> 4 Cory 1   Jack
> >> 5 Cory 2   Jack
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



[[alternative HTML version deleted]]



Re: [R] [FORGED] Re: remove

2017-02-12 Thread Bert Gunter
My understanding was that the discordant names had already been identified. So
in the example the OP gave, removing rows with first = "Alex" is done
by:

df[df$first !="Alex",]

If that is not the case, as others have pointed out, various forms of
tapply() (by, ave, etc.) can be used. I agree that that is not so
"basic," so I apologize if my understanding was incorrect.

Cheers,
Bert
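
As an illustration of the tapply() route mentioned above, here is a minimal sketch that first identifies the discordant first names and then drops them. The data frame is retyped from the thread under the name melvin used earlier; n_last, bad and res are names introduced here for illustration.

```r
melvin <- data.frame(
  first = c("Alex", "Bob", "Cory", "Cory", "Bob", "Bob", "Alex", "Alex", "Alex"),
  week  = c(1, 1, 1, 2, 2, 3, 2, 3, 4),
  last  = c("West", "John", "Jack", "Jack", "John", "John", "Joseph", "West", "West"),
  stringsAsFactors = FALSE
)

# Count distinct last names for each first name.
n_last <- tapply(melvin$last, melvin$first, function(x) length(unique(x)))

# First names with more than one last name are the discordant ones.
bad <- names(n_last)[n_last > 1]

# Drop every row belonging to a discordant first name, then sort.
res <- melvin[!melvin$first %in% bad, ]
res <- res[order(res$first), ]
print(res)
```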




Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Feb 11, 2017 at 10:04 PM, Rolf Turner  wrote:
>
> On 12/02/17 18:36, Bert Gunter wrote:
>>
>> Basic stuff!
>>
>> Either subscripting or ?subset.
>>
>> There are many good R tutorials on the web. You should spend some
>> (more?) time with some.
>
>
> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't seem
> basic to me.  The only way that I can see how to go at it is via
> a for loop:
>
> rdln <- function(X) {
> # Remove discordant last names.
> ok <- logical(nrow(X))
> for(nm in unique(X$first)) {
> xxx <- unique(X$last[X$first==nm])
> if(length(xxx)==1) ok[X$first==nm] <- TRUE
> }
> Y <- X[ok,]
> Y <- Y[order(Y$first),]
> rownames(Y) <- 1:nrow(Y)
> Y
> }
>
> Calling the toy data frame "melvin" rather than "df" (since "df" is the name
> of the built in F density function, it is bad form to use it as the name of
> another object) I get:
>
>> rdln(melvin)
>   first week last
> 1   Bob    1 John
> 2   Bob    2 John
> 3   Bob    3 John
> 4  Cory    1 Jack
> 5  Cory    2 Jack
>
> which is the desired output.  If there is a "basic stuff" way to do this
> I'd like to see it.  Perhaps I will then be toadally embarrassed, but they
> say that this is good for one.
>
> cheers,
>
> Rolf
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
>> On Sat, Feb 11, 2017 at 9:02 PM, Val  wrote:
>>>
>>> Hi all,
>>> I have a big data set and want to  remove rows conditionally.
>>> In my data file  each person were recorded  for several weeks. Somehow
>>> during the recording periods, their last name was misreported.   For
>>> each person,   the last name should be the same. Otherwise remove from
>>> the data. Example, in the following data set, Alex was found to have
>>> two last names .
>>>
>>> Alex   West
>>> Alex   Joseph
>>>
>>> Alex should be removed  from the data.  if this happens then I want
>>> remove  all rows with Alex. Here is my data set
>>>
>>> df <- read.table(header=TRUE, text='first  week last
>>> Alex   1  West
>>> Bob    1  John
>>> Cory   1  Jack
>>> Cory   2  Jack
>>> Bob    2  John
>>> Bob    3  John
>>> Alex   2  Joseph
>>> Alex   3  West
>>> Alex   4  West ')
>>>
>>> Desired output
>>>
>>>   first  week last
>>> 1 Bob 1   John
>>> 2 Bob 2   John
>>> 3 Bob 3   John
>>> 4 Cory 1   Jack
>>> 5 Cory 2   Jack



Re: [R] object of type 'closure' is not subsettable

2017-02-12 Thread Jeff Newmiller
Because you did not send your email in plain text format on this mailing list,
we see a damaged version of what you saw when you sent it.

Also, we would need some data to test the code with. Google "r
reproducible example" to find discussions of how to ask questions online.

From the error message alone I suspect forecast is the function from the
forecast package, and you are trying to create and modify a data object with
that same name. At the very least re-using names is unwise, but I think your
whole concept of how to create forecasts is deviating from the normal way this
is done. But the scrambling of the code isn't helping.

-- 
Sent from my phone. Please excuse my brevity.

On February 12, 2017 4:34:20 AM PST, Allan Tanaka  
wrote:
>Hi.
>I tried to run this R-code but still completely no idea why it still
>gives error message: Error in forecast[[d + 1]] =
>paste(index(lEJReturnsOffset[windowLength]),  : object of type
>'closure' is not subsettable
>Here is the R-code:
>library(rugarch); library(sos);
>library(forecast);library(lattice)library(quantmod); require(stochvol);
>require(fBasics);data = read.table("EURJPY.m1440.csv",
>header=F)names(data)data=ts(data)lEJ=log(data)lret.EJ =
>100*diff(lEJ)lret.EJ =
>ts(lret.EJ)lret.EJ[as.character(head(index(lret.EJ)))]=0windowLength=500foreLength=length(lret.EJ)-windowLengthforecasts<-vector(mode="character",
>length=foreLength)for (d in 0:foreLength) { 
>lEJReturnsOffset=lret.EJ[(1+d):(windowLength+d)]  final.aic<-Inf 
>final.order<-c(0,0,0)  for (p in 0:5) for (q in 0:5) {    if(p == 0 &&
>q == 0) {      next    }       
>arimaFit=tryCatch(arima(lEJReturnsOffset, order=c(p,0,q)),             
>        error=function(err)FALSE,                     
>warning=function(err)FALSE)    if(!is.logical(arimaFit)) {     
>current.aic<-AIC(arimaFit)      if(current.aic < final.aic) {        
>final.aic<-current.aic        final.order<-c(p,0,q)       
>final.arima<-arima(lEJReturnsOffset, order=final.order)      }    }
>else {      next    }  }
>spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder =
>c(1,1)),                     mean.model = list(armaOrder =
>c(final.order[1], final.order[3]), arfima = FALSE, include.mean =
>TRUE),                     distribution.model = "sged")fit <-
>tryCatch(ugarchfit(spec, lEJReturnsOffset, solver='gosolnp'), 
>error=function(e) e, warning=function(w) w)if(is(fit, "warning")) { 
>forecast[d+1]=paste(index(lEJReturnsOffset[windowLength]), 1, sep=",") 
>print(paste(index(lEJReturnsOffset[windowLength]), 1, sep=","))} else
>{  fore = ugarchforecast(fit, n.ahead=1)  ind =
>fore@forecast$seriesFor  forecasts[d+1] = paste(colnames(ind),
>ifelse(ind[1] < 0, -1, 1), sep=",")  print(paste(colnames(ind),
>ifelse(ind[1] < 0, -1, 1), sep=",")) }}write.csv(forecasts,
>file="forecasts.csv", row.names=FALSE)
>  
>   [[alternative HTML version deleted]]
>


Re: [R] remove

2017-02-12 Thread Jeff Newmiller
Exactly. Sort of like the optimisation of using which.max instead of max 
followed by which, though ideally the only intermediate vector would be the 
logical vector that says keep or don't keep.
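
A minimal sketch of that point, assuming the example data from this thread rebuilt with data.frame() (the names n_last, n_last2 and keep are made up for illustration): because ave() returns one value per input row, the result can be compared directly to give the keep/don't-keep logical vector, with no match() or %in% step.

```r
DF <- data.frame(
  first = c("Alex", "Bob", "Cory", "Cory", "Bob", "Bob", "Alex", "Alex", "Alex"),
  week  = c(1, 1, 1, 2, 2, 3, 2, 3, 4),
  last  = c("West", "John", "Jack", "Jack", "John", "John", "Joseph", "West", "West"),
  stringsAsFactors = FALSE
)

# One count per row; ave() keeps the type of its first argument,
# so the counts come back as character here.
n_last <- ave(DF$last, DF$first, FUN = function(x) length(unique(x)))

# Same idea driven by row indices, which yields integer counts.
n_last2 <- ave(seq_along(DF$first), DF$first,
               FUN = function(i) length(unique(DF$last[i])))

# The only intermediate needed for subsetting is this logical vector.
keep <- n_last2 == 1L
result <- DF[keep, ]
result
```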
-- 
Sent from my phone. Please excuse my brevity.

On February 11, 2017 11:19:11 PM PST, P Tennant  wrote:
>Hi Jeff,
>
>Why do you say ave() is better suited *because* it always returns a 
>vector that is just as long as the input vector? Is it because that 
>feature (of equal length), allows match() to be avoided, and as a 
>result, the subsequent subsetting is faster with very large datasets?
>
>Thanks, Philip
>
>

Re: [R] remove

2017-02-12 Thread Val
Jeff, Rolf and Philip,
Thank you very much for your suggestions.

Jeff, you suggested considering data.table if the data set is big.
My data is "big": it has more than 200M records, so I will see if this
approach works.

Thank you again.


On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller
 wrote:
> The "by" function aggregates and returns a result with generally fewer rows
> than the original data. Since you are looking to index the rows in the
> original data set, the "ave" function is better suited because it always
> returns a vector that is just as long as the input vector:
>
> # I usually work with character data rather than factors if I plan
> # to modify the data (e.g. removing rows)
> DF <- read.table( text=
> 'first  week last
> Alex1  West
> Bob 1  John
> Cory1  Jack
> Cory2  Jack
> Bob 2  John
> Bob 3  John
> Alex2  Joseph
> Alex3  West
> Alex4  West
> ', header = TRUE, as.is = TRUE )
>
> err <- ave( DF$last
>   , DF[ , "first", drop = FALSE]
>   , FUN = function( lst ) {
>   length( unique( lst ) )
> }
>   )
> result <- DF[ "1" == err, ]
> result
>
> Notice that the ave function returns a vector of the same type as was given
> to it, so even though the function returns a numeric the err
> vector is character.
>
> If you wanted to be able to examine more than one other column in
> determining the keep/reject decision, you could do:
>
> err2 <- ave( seq_along( DF$first )
>, DF[ , "first", drop = FALSE]
>, FUN = function( n ) {
>   length( unique( DF[ n, "last" ] ) )
>  }
>)
> result2 <- DF[ 1 == err2, ]
> result2
>
> and then you would have the option to re-use the "n" index to look at other
> columns as well.
>
> Finally, here is a dplyr solution:
>
> library(dplyr)
> result3 <- (   DF
>%>% group_by( first ) # like a prep for ave or by
>%>% mutate( err = length( unique( last ) ) ) # similar to ave
>%>% filter( 1 == err ) # drop the rows with too many last names
>%>% select( -err ) # drop the temporary column
>%>% as.data.frame # convert back to a plain-jane data frame
>)
> result3
>
> which uses a small set of verbs in a pipeline of functions to go from input
> to result in one pass.
>
> If your data set is really big (running out of memory big) then you might
> want to investigate the data.table or sqlite packages, either of which can
> be combined with dplyr to get a standardized syntax for managing larger
> amounts of data. However, most people actually aren't running out of memory
> so in most cases the extra horsepower isn't actually needed.
>
>
> On Sun, 12 Feb 2017, P Tennant wrote:
>
>> Hi Val,
>>
>> The by() function could be used here. With the dataframe dfr:
>>
>> # split the data by first name and check for more than one last name for
>> each first name
>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>> # make the result more easily manipulated
>> res <- as.table(res)
>> res
>> # first
>> # Alex   Bob  Cory
>> # TRUE FALSE FALSE
>>
>> # then use this result to subset the data
>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>> # sort if needed
>> nw.dfr[order(nw.dfr$first) , ]
>>
>>  first week last
>> 2   Bob1 John
>> 5   Bob2 John
>> 6   Bob3 John
>> 3  Cory1 Jack
>> 4  Cory2 Jack
>>
>>
>> Philip
>>
>> On 12/02/2017 4:02 PM, Val wrote:
>>>
>>> Hi all,
>>> I have a big data set and want to  remove rows conditionally.
>>> In my data file  each person were recorded  for several weeks. Somehow
>>> during the recording periods, their last name was misreported.   For
>>> each person,   the last name should be the same. Otherwise remove from
>>> the data. Example, in the following data set, Alex was found to have
>>> two last names .
>>>
>>> Alex   West
>>> Alex   Joseph
>>>
>>> Alex should be removed  from the data.  if this happens then I want
>>> remove  all rows with Alex. Here is my data set
>>>
>>> df<- read.table(header=TRUE, text='first  week last
>>> Alex1  West
>>> Bob 1  John
>>> Cory1  Jack
>>> Cory2  Jack
>>> Bob 2  John
>>> Bob 3  John
>>> Alex2  Joseph
>>> Alex3  West
>>> Alex4  West ')
>>>
>>> Desired output
>>>
>>>first  week last
>>> 1 Bob 1   John
>>> 2 Bob 2   John
>>> 3 Bob 3   John
>>> 4 Cory 1   Jack
>>> 5 Cory 2   Jack
>>>
>>> Thank you in advance
>>>

[R] object of type 'closure' is not subsettable

2017-02-12 Thread Allan Tanaka
Hi.
I tried to run this R-code but still completely no idea why it still gives 
error message: Error in forecast[[d + 1]] = 
paste(index(lEJReturnsOffset[windowLength]),  : object of type 'closure' is not 
subsettable
Here is the R-code:
library(rugarch); library(sos); 
library(forecast);library(lattice)library(quantmod); require(stochvol); 
require(fBasics);data = read.table("EURJPY.m1440.csv", 
header=F)names(data)data=ts(data)lEJ=log(data)lret.EJ = 100*diff(lEJ)lret.EJ = 
ts(lret.EJ)lret.EJ[as.character(head(index(lret.EJ)))]=0windowLength=500foreLength=length(lret.EJ)-windowLengthforecasts<-vector(mode="character",
 length=foreLength)for (d in 0:foreLength) {  
lEJReturnsOffset=lret.EJ[(1+d):(windowLength+d)]  final.aic<-Inf  
final.order<-c(0,0,0)  for (p in 0:5) for (q in 0:5) {    if(p == 0 && q == 0) 
{      next    }        arimaFit=tryCatch(arima(lEJReturnsOffset, 
order=c(p,0,q)),                      error=function(err)FALSE,                 
     warning=function(err)FALSE)    if(!is.logical(arimaFit)) {      
current.aic<-AIC(arimaFit)      if(current.aic

Re: [R] Plotting Landscape in R-Studio

2017-02-12 Thread Michael Dewey

Colleagues who use Word seem to find no problem with .wmf files.
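
A minimal sketch of writing the plot straight to a file device at landscape letter size, in the spirit of the export-to-file suggestion quoted in this thread. The output path and the use of the built-in cars data are assumptions for illustration only.

```r
# Hypothetical output path; any writable location would do.
out <- file.path(tempdir(), "landscape-plot.pdf")

# 11 x 8.5 inches is US letter size in landscape orientation.
pdf(out, width = 11, height = 8.5)
plot(cars, main = "Stopping distance vs. speed")
dev.off()

# On Windows only, grDevices::win.metafile() takes the same width and
# height arguments and writes a .wmf file instead of a PDF.
```

The resulting file can then be inserted into Word as a picture, which avoids the rescaling that happens with clipboard copies.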

On 11/02/2017 22:08, peter dalgaard wrote:



On 11 Feb 2017, at 20:13 , Jeff Newmiller  wrote:

While the question AS POSED is off base here (and in fact unlikely to have any 
satisfactory answer due to the unavoidable squishiness of pasted graphics in 
Word),


I did wonder whether it wouldn't be easier just to export to a (PDF? WMF?) file 
and import that in Word. That looks like a no-brainer from the RStudio side. Or 
write directly to the appropriate device.

-pd



the OP could investigate the ReporteRs package which can export graphics 
directly to word files in a fairly predictable manner, including creating 
landscape oriented sections.
--
Sent from my phone. Please excuse my brevity.

On February 11, 2017 9:01:47 AM PST, David Winsemius  
wrote:



On Feb 11, 2017, at 8:26 AM, Jeff Reichman wrote:

R-Help

How can I format a plot within R-Studio (Plot Windows) to conform to an 8.5
x 11 landscape page, such that when I Export - Copy to Clipboard I can paste
the plot into Word?



This is really the wrong venue for asking questions about transferring
graphics from RStudio to Word. Two other options: RStudio has its own
help forum and this would probably be an OK question if you constructed
a minimal verifiable example to submit to StackOverflow.





Jeff


[[alternative HTML version deleted]]



David Winsemius
Alameda, CA, USA







--
Michael
http://www.dewey.myzen.co.uk/home.html



Re: [R] Query - Merging and conditional replacement of values in a data frame

2017-02-12 Thread Jim Lemon
Hi Bhaskar,
Maybe:

df1 <-read.table(text="time v1  v2 v3
1 2   3  4
2 5   6  4
3 1   3  4
4 1   3  4
5 2   3  4
6 2   3  4",
header=TRUE)


df2 <-read.table(text="time v11  v12 v13
3 112   3  4
4 112   3  4",
header=TRUE)

for(time1 in df1$time) {
 time2<-which(df2$time==time1)
 if(length(time2)) df1[df1$time==time1,]<-df2[time2,]
}
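
If only column v1 should change, as the question asks, a vectorized variant with match() is one possibility. This is a sketch under the assumption that each time value occurs at most once in df2; the helper names idx and hit are made up for illustration.

```r
df1 <- data.frame(time = 1:6,
                  v1 = c(2, 5, 1, 1, 2, 2),
                  v2 = c(3, 6, 3, 3, 3, 3),
                  v3 = 4)
df2 <- data.frame(time = 3:4, v11 = 112, v12 = 3, v13 = 4)

# Position of each df1 time in df2 (NA where there is no match).
idx <- match(df1$time, df2$time)
hit <- !is.na(idx)

# Overwrite v1 only for the matched rows; v2 and v3 stay untouched.
df1$v1[hit] <- df2$v11[idx[hit]]
df1
```

Unlike the loop above, which copies the whole df2 row into df1, this leaves the other columns of df1 as they were.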

Jim


On Sun, Feb 12, 2017 at 11:13 AM, Bhaskar Mitra
 wrote:
> Hello Everyone,
>
> I have two data frames df1 and df2 as shown below. They
> are of different length. However, they have one common column - time.
>
> df1 <-
> time v1  v2 v3
> 1 2   3  4
> 2 5   6  4
> 3 1   3  4
> 4 1   3  4
> 5 2   3  4
> 6 2   3  4
>
>
> df2 <-
> time v11  v12 v13
> 3 112   3  4
> 4 112   3  4
>
> By matching the 'time' column in df1 and df2, I am trying to modify column
> 'v1' in df1 by replacing it
> with values in column 'v11' in df2. The modified df1 should look something
> like this:
>
> df1 <-
> time v1   v2 v3
> 1 2   3  4
> 2 5   6  4
> 3 112 3  4
> 4 112 3  4
> 5 2   3  4
> 6 2   3  4
>
> I tried to use the 'merge' function to combine df1 and df2 followed by
> the conditional 'ifelse' statement. However, that doesn't seem to work.
>
> Can I replace the values in df1 by not merging the two data frames?
>
> Thanks for your help,
>
> Regards,
> Bhaskar
>
> [[alternative HTML version deleted]]
>



Re: [R] How to disable verbose grob results in pdf when using knitr with gridExtra?

2017-02-12 Thread vod vos
Sorry for not providing a reproducible example.

Using the warnings=FALSE chunk option in knitr does not help. I found that
this is not knitr's business. I used

resultpdf <- grid.arrange(facetpoint1, pright1, pright2, pright3, pright4,
  pright5, pright6, pright7, ncol=2,
  layout_matrix=cbind(c(1,1,1,1,1,1,1), c(2,3,4,5,6,7,8)), widths=c(2,1))

then

print(resultpdf)

in my chunk.

When I removed the object resultpdf and just used the code below to produce
the figure

grid.arrange(facetpoint1, pright1, pright2, pright3, pright4,
  pright5, pright6, pright7, ncol=2,
  layout_matrix=cbind(c(1,1,1,1,1,1,1), c(2,3,4,5,6,7,8)), widths=c(2,1))

the verbose output no longer appeared.

So the problem was the print() call on the stored grid.arrange() result.




 On 星期六, 11 二月 2017 07:45:54 -0800 Jeff Newmiller 
 wrote 




On Sat, 11 Feb 2017, vod vos wrote: 

 

> Hi every one, 

> 

> I am using Knitr, 

 

Keep in mind that this list is about R first and foremost. There is a 

mailing list for Knitr, and also the maintainer of the knitr package 

recommends asking questions on stackoverflow.com. 

 

> R and Latex to produce pdf file. When using gridExtra to set up a 

> gtable layout to place multiple grobs on a page, 

> 

> 
grid.arrange(facetpoint1,pright1,pright2,pright3,pright4,pright5,pright6,pright7,
 ncol=2, layout_matrix=cbind(c(1,1,1,1,1,1,1),c(2,3,4,5,6,7,8)), widths=c(2,1)) 

 

This is not a reproducible example. No matter where you ask this question 

you need to supply a complete short script that exhibits the problem. That 

also means including enough data IN THE SCRIPT to allow the script to 

run. There are multiple guides online that describe how to do this in 

detail. 

 

> this verbose information is shown before the figure in the pdf file:
>
> ## TableGrob (7 x 2) "arrange": 8 grobs
> ##   z     cells    name           grob
> ## 1 1 (1-7,1-1) arrange gtable[layout]
> ## 2 2 (1-1,2-2) arrange gtable[layout]
> ## 3 3 (2-2,2-2) arrange gtable[layout]
> ## 4 4 (3-3,2-2) arrange gtable[layout]
> ## 5 5 (4-4,2-2) arrange gtable[layout]
> ## 6 6 (5-5,2-2) arrange gtable[layout]
> ## 7 7 (6-6,2-2) arrange gtable[layout]
> ## 8 8 (7-7,2-2) arrange gtable[layout]

 

None of this appears when I created my own reproducible R example: 

 

 begin code
library(grid)
library(gridExtra)

facetpoint1 <- pright1 <- pright2 <- pright3 <- pright4 <- pright5 <-
  pright6 <- pright7 <- textGrob("X")
grid.arrange( facetpoint1, pright1, pright2, pright3, pright4, pright5
            , pright6, pright7
            , ncol=2
            , layout_matrix = cbind( c( 1, 1, 1, 1, 1, 1, 1 )
                                   , c( 2, 3, 4, 5, 6, 7, 8 ) )
            , widths = c( 2, 1 )
            )
 end code

 

If the above example produces output for you in R or in a knitted PDF then 

something is different about your setup than mine. 

 

> When I ?grid.arrange, no ways were found to disable the verbose in the pdf 
file. Any ideas? 

 

Does this happen at the R console? If it does, please post a reproducible
example, and the invocation and output of sessionInfo() (mine is below).
If it doesn't, there could be some interaction with knitr going on, and
using the echo=FALSE or warning=FALSE chunk options could help, or you
may need more specialized help than we can offer here (e.g. via one of
the knitr support areas mentioned above).

 

> Thanks. 

> 

> [[alternative HTML version deleted]] 

 

When you don't set your email to plain text, the automatic conversion of 

HTML to text is very likely to cause us to see something quite different 

than you were looking at. It is in your best interest to figure out how to 

set your email program to send plain text. Please read the Posting Guide: 

 


 

---

Jeff Newmiller The . . Go Live... 

DCN: Basics: ##.#. ##.#. Live Go... 

 Live: OO#.. Dead: OO#.. Playing 

Research Engineer (Solar/Batteries O.O#. #.O#. with 

/Software/Embedded Controllers) .OO#. .OO#. rocks...1k 

---



> sessionInfo() 

R version 3.3.2 (2016-10-31) 

Platform: x86_64-pc-linux-gnu (64-bit) 

Running under: Ubuntu 14.04.5 LTS 



locale: 

 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C 

LC_TIME=en_US.UTF-8 

 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 

LC_MESSAGES=en_US.UTF-8 

 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C 

[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 

LC_IDENTIFICATION=C 



attached base packages: 

[1] grid stats graphics grDevices utils datasets methods 

base 



other attached packages: 

[1] gridExtra_2.2.1 



loaded via a namespace (and not attached): 

 [1] backports

[R] How to create HyCa$NIR and octane like the "yarn" of "pls".

2017-02-12 Thread 貝原巳樹雄
I am a user of the "pls" package and would like to draw the NIR spectra of
my own measured data using matplot.

Question
  For example, I have CSV data, "HyCa.csv", shown below.
  Would you please tell me how to create a data object like "yarn"?
  yarn has the components "NIR" and "density".
  That is to say, how can I create HyCa$NIR and octane for drawing and
analyzing the obtained data?

     X1540    X1560    X1580    X1600 Octane
S001 0.240016 0.232166 0.239428 0.255710   87.3
S002 0.246177 0.237545 0.243874 0.259296   87.0
S003 0.242777 0.234150 0.240941 0.256484   87.1
S004 0.244098 0.237214 0.244729 0.261580   89.7
S005 0.241922 0.231888 0.237418 0.252461   84.9
S006 0.242209 0.232352 0.238188 0.253036   84.7
S007 0.244148 0.237362 0.244701 0.261598   89.3
S008 0.242019 0.234185 0.241428 0.257564   87.6
S009 0.242408 0.232431 0.238130 0.253083   84.5
S010 0.244512 0.238601 0.246392 0.263583   91.7

Detailed explanation of "yarn"

yarn: NIR spectra and density measurements of PET yarns
Description
A training set consisting of 21 NIR spectra of PET yarns, measured at 268
wavelengths, and 21 corresponding densities. A test set of 7 samples is also
provided. Many thanks to Erik Swierenga.
Usage
yarn
Format
A data frame with components
NIR      Numeric matrix of NIR measurements
density  Numeric vector of densities
train    Logical vector with TRUE for the training samples and FALSE for the
         test samples
Source
Swierenga H., de Weijer A. P., van Wijk R. J., Buydens L. M. C. (1999)
Strategy for constructing robust multivariate calibration models.
Chemometrics and Intelligent Laboratory Systems, 49(1), 1–17.
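
For illustration, a minimal sketch of one way a yarn-like structure might be built in base R. It assumes the sample IDs act as row names, uses only the first three rows of the table above, and reads them inline with read.table(text = ...) instead of from HyCa.csv; the names raw and wl are made up for this sketch.

```r
# First three rows of the table, read inline; the header has one field
# fewer than the data rows, so the sample IDs become row names.
raw <- read.table(text = "
     X1540    X1560    X1580    X1600 Octane
S001 0.240016 0.232166 0.239428 0.255710 87.3
S002 0.246177 0.237545 0.243874 0.259296 87.0
S003 0.242777 0.234150 0.240941 0.256484 87.1
", header = TRUE)

# Wavelengths taken from the column names shown in the question.
wl <- c(1540, 1560, 1580, 1600)

# Start with the response, then attach the spectra as a single matrix
# column, which is the layout described for yarn (NIR plus a response).
HyCa <- data.frame(octane = raw$Octane, row.names = rownames(raw))
HyCa$NIR <- as.matrix(raw[, paste0("X", wl)])

str(HyCa)
```

With this layout the spectra could then be drawn with something like matplot(wl, t(HyCa$NIR), type = "l"), one line per sample.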

[[alternative HTML version deleted]]
