Re: [R] Calculate daily means from 5-minute interval data

2021-09-03 Thread Rich Shepard

On Thu, 2 Sep 2021, Jeff Newmiller wrote:


> Regardless of whether you use the lower-level split function, or the
> higher-level aggregate function, or the tidyverse group_by function, the
> key is learning how to create the column that is the same for all records
> corresponding to the time interval of interest.
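Here is a minimal base-R sketch of that idea using made-up 5-minute readings. The data frame `dat`, its columns `ts` and `value`, and the fixed-offset zone are assumptions for illustration. The derived `day` column holds the same value for every record in a calendar day, and `aggregate()` averages within it. The same column could equally be passed to `split()` or `dplyr::group_by()`.

```r
# Hypothetical 5-minute readings covering two full days (288 per day)
ts <- seq(as.POSIXct("2021-09-01 00:00", tz = "Etc/GMT+8"),
          by = "5 min", length.out = 2 * 288)
dat <- data.frame(ts = ts, value = seq_along(ts))

# The grouping column: the same value for every record in one calendar day
dat$day <- as.Date(dat$ts, tz = "Etc/GMT+8")

# Daily means; split() or tapply() could use the same column
daily <- aggregate(value ~ day, data = dat, FUN = mean)
print(daily)
```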


Jeff,

I definitely agree with the above.


> If you convert the sampdate to POSIXct, the tz IS important, because most
> of us use local timezones that respect daylight savings time, and a naive
> conversion of standard time will run into trouble if R is assuming
> daylight savings time applies. The lubridate package gets around this by
> always assuming UTC and giving you a function to "fix" the timezone after
> the conversion. I prefer to always be specific about timezones, at least
> by using something like
>    Sys.setenv( TZ = "Etc/GMT+8" )
> which does not respect daylight savings.
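The sketch below shows why the zone matters when data cross a clock change. It assumes two standard tz database names. `America/Los_Angeles` observes daylight saving time. `Etc/GMT+8` is a fixed offset, and because the POSIX-style sign is inverted it means UTC-8, which is PST all year. The date used is 2021-03-14, the US spring-forward date that year.

```r
# Hours between local midnights on the 2021 US spring-forward date
day_length <- function(tz) {
  start <- as.POSIXct("2021-03-14 00:00:00", tz = tz)
  end   <- as.POSIXct("2021-03-15 00:00:00", tz = tz)
  as.numeric(difftime(end, start, units = "hours"))
}

day_length("America/Los_Angeles")  # 23: local clocks skip from 02:00 to 03:00
day_length("Etc/GMT+8")            # 24: fixed offset, no DST shift
```

A day with only 23 hours (or, in autumn, 25) is exactly where a naive conversion of standard-time records can misplace or duplicate readings.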


I'm not following you here. All my projects have always been in a single
time zone, and the data might be recorded on June 19th or November 4th but do
not depend on whether the time is PDT or PST. My hosts all set the hardware
clock to local time, not UTC.

As the location(s) at which data are collected remain fixed geographically, I
don't understand why daylight savings time or non-daylight savings time is
important.


> Regarding using character data for identifying the month, in order to have
> clean plots of the data I prefer to use the trunc function but it returns
> a POSIXlt so I convert it to POSIXct:
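Jeff's own code did not survive in this archive. As a stand-in, here is a minimal sketch of the approach he describes, using made-up timestamps. `trunc()` with `units = "months"` returns a POSIXlt, and `as.POSIXct()` converts it back.

```r
# Hypothetical timestamps in a fixed-offset zone
x <- as.POSIXct(c("2021-09-03 10:15", "2021-09-28 23:55", "2021-10-01 00:05"),
                tz = "Etc/GMT+8")

lt <- trunc(x, units = "months")   # start of each month, as POSIXlt
month_start <- as.POSIXct(lt)      # back to POSIXct for plotting or grouping

format(month_start, "%Y-%m-%d")    # "2021-09-01" "2021-09-01" "2021-10-01"
```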


I don't use character data for months, as far as I know. If a sample date
is, for example, 2021-09-03, then monthly summaries are based on '09', not
'September'.
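Grouping on the month number works for a single year. If the record spans more than one year, however, grouping on the month number alone pools the same month from different years. The small sketch below uses hypothetical dates and values, and borrows the name `sampdate` from Jeff's message.

```r
# Hypothetical sample dates spanning two years, with made-up values
sampdate <- as.Date(c("2020-09-15", "2021-08-30", "2021-09-03", "2021-09-20"))
value <- c(10, 1, 2, 4)

# Month number alone: both Septembers are pooled into "09"
by_m <- tapply(value, format(sampdate, "%m"), mean)
by_m

# Year and month: each September stays separate
by_ym <- tapply(value, format(sampdate, "%Y-%m"), mean)
by_ym
```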

I've always valued your inputs to help me understand what I don't. In this
case I'm really lost in understanding your position.

Have a good Labor Day weekend,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coxph means not equal to means of model matrix

2021-09-03 Thread Therneau, Terry M., Ph.D. via R-help


On 9/3/21 12:59 PM, Bond, Stephen wrote:
>
> I looked at the nocenter and it says (-1,0,1) values but it seems that any
> three-level factor is included in that (represented as 1,2,3 in R).
>
A factor is turned into a set of 0/1 dummy variables, so the nocenter option
applies. I will add more clarification to the documentation.
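Terry's point can be checked directly. Under the default treatment contrasts (as in Stephen's `getOption("contrasts")` output), a three-level factor becomes two 0/1 indicator columns, so every entry falls in the (-1, 0, 1) set. The membership rule below (a column is left uncentered when all its values lie in `nocenter`) is an assumed reading of ?coxph, written as an illustrative check rather than survival's internal code.

```r
# A three-level factor coded 1, 2, 3
f <- factor(c(1, 2, 3, 2, 1, 3))

# Default treatment contrasts: drop the intercept, keep the indicator columns
X <- model.matrix(~ f)[, -1, drop = FALSE]
X

# Assumed reading of ?coxph: a column is left uncentered when all of its
# values lie in nocenter, whose default is c(-1, 0, 1)
in_set <- apply(X, 2, function(z) all(z %in% c(-1, 0, 1)))
in_set   # TRUE for both f2 and f3
```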

> Also, is the baseline curve now showing the reference level and not the
> fictional .428 sex? If I predict the risk for a new row, should I multiply
> the coefficient shown in the output by 1 for a sex=1? It used to be (1-.428)*coef.
>
Yes, the "mean" component is the reference level for predict and survfit. If I
could go back in time it would be labeled as "reference" instead of "mean".
Another opportunity for me to make the documentation clearer.

Good questions,
Terry T

> Thanks for clarifying.
>
> SB
>
> *From:* Therneau, Terry M., Ph.D. 
> *Sent:* Friday, 3 September, 2021 12:37
> *To:* Bond, Stephen 
> *Cc:* R-help 
> *Subject:* Re: coxph means not equal to means of model matrix
>
> --
>
> See ?coxph, in particular the new "nocenter" option.
>
> Basically, the "mean" component is used to center later computations. This
> can be critical for continuous variables, avoiding overflow in the exp
> function, but is not necessary for 0/1 covariates. The fact that the default
> survival curve would be for a sex of .453, say, was off-putting to many.
>
> Terry T.
>
> On 9/3/21 11:01 AM, Bond, Stephen wrote:
>
> Hi,
>
> Please, help me understand what is happening with the means of a Cox model?
>
> I have:
> R version 4.0.2 (2020-06-22) -- "Taking Off Again"
> Copyright (C) 2020 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> getOption("contrasts")
>         unordered           ordered
> "contr.treatment"      "contr.poly"
>
> According to the help, coxph.object has a component holding the means of the X
> (model.matrix). This does not hold any more.
>
> ```
> library(survival)
> test1 <- list(time=c(4,3,1,1,2,2,3),
>    status=c(1,1,1,0,1,1,0),
>    x=c(0,2,1,1,1,0,0),
>    sex=factor(c(0,0,0,0,1,1,1)))
> m1 <- coxph(Surv(time, status) ~ x + sex, test1)
> m1$means
> ##         x      sex1
> ## 0.7142857 0.000
> colMeans(model.matrix(m1))
> ##         x      sex1
> ## 0.7142857 0.4285714
> ```
>
> Will new observations be scored using the zero mean from the object?? Is this
> just a reporting change where $means shows the reference level and no longer
> the mean of the model matrix??
>
> Thanks everybody
>
> --




Re: [R] coxph means not equal to means of model matrix

2021-09-03 Thread Bond, Stephen
I looked at the nocenter and it says (-1,0,1) values but it seems that any
three-level factor is included in that (represented as 1,2,3 in R).
Also, is the baseline curve now showing the reference level and not the 
fictional .428 sex? If I predict the risk for a new row, should I multiply the 
coefficient shown in the output by 1 for a sex=1? It used to be (1-.428)*coef.
Thanks for clarifying.
SB

From: Therneau, Terry M., Ph.D. 
Sent: Friday, 3 September, 2021 12:37
To: Bond, Stephen 
Cc: R-help 
Subject: Re: coxph means not equal to means of model matrix

See ?coxph, in particular the new "nocenter" option.

Basically, the "mean" component is used to center later computations.  This can 
be critical for continuous variables, avoiding overflow in the exp function, 
but is not necessary for 0/1 covariates.   The fact that the default survival 
curve would be for a sex of .453, say, was off-putting to many.

Terry T.

On 9/3/21 11:01 AM, Bond, Stephen wrote:
Hi,

Please, help me understand what is happening with the means of a Cox model?
I have:
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

getOption("contrasts")
unordered   ordered
"contr.treatment"  "contr.poly"

According to the help, coxph.object has a component holding the means of the X
(model.matrix). This does not hold any more.
```
library(survival)
test1 <- list(time=c(4,3,1,1,2,2,3),
   status=c(1,1,1,0,1,1,0),
   x=c(0,2,1,1,1,0,0),
   sex=factor(c(0,0,0,0,1,1,1)))
m1 <- coxph(Surv(time, status) ~ x + sex, test1)
m1$means
##         x      sex1
## 0.7142857 0.000
colMeans(model.matrix(m1))
##         x      sex1
## 0.7142857 0.4285714

```
Will new observations be scored using the zero mean from the object?? Is this 
just a reporting change where $means shows the reference level and no longer 
the mean of the model matrix??

Thanks everybody





Re: [R] coxph means not equal to means of model matrix

2021-09-03 Thread Therneau, Terry M., Ph.D. via R-help
See ?coxph, in particular the new "nocenter" option.

Basically, the "mean" component is used to center later computations. This can be
critical for continuous variables, avoiding overflow in the exp function, but is not
necessary for 0/1 covariates. The fact that the default survival curve would be for a
sex of .453, say, was off-putting to many.

Terry T.
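The sketch below uses Stephen's `test1` example. The names `X`, `b`, `lp_means`, `lp_colmeans`, `new_lp` and `pred_lp` are illustrative. It shows that centering only shifts every linear predictor by the same constant, so coefficients and hazard ratios between subjects do not depend on it. It also shows a new subject's linear predictor measured against the stored `$means`, which Terry describes as the reference used by predict. The assertions should hold whichever centering a given survival version stores.

```r
library(survival)

# Stephen's example data and model
test1 <- list(time   = c(4, 3, 1, 1, 2, 2, 3),
              status = c(1, 1, 1, 0, 1, 1, 0),
              x      = c(0, 2, 1, 1, 1, 0, 0),
              sex    = factor(c(0, 0, 0, 0, 1, 1, 1)))
m1 <- coxph(Surv(time, status) ~ x + sex, test1)

X <- model.matrix(m1)
b <- coef(m1)

# Linear predictors centered at the stored $means and at colMeans(X)
lp_means    <- drop(sweep(X, 2, m1$means) %*% b)
lp_colmeans <- drop(sweep(X, 2, colMeans(X)) %*% b)

# They differ by one constant, so exp(lp_i - lp_j) is the same either way
shift <- lp_means - lp_colmeans
range(shift)

# New subject with x = 1, sex = 1, measured against $means as the reference
new_lp  <- sum(b * (c(1, 1) - m1$means))
pred_lp <- predict(m1, type = "lp",
                   newdata = data.frame(x = 1, sex = factor(1, levels = c(0, 1))))
c(by_hand = new_lp, predict = unname(pred_lp))
```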


On 9/3/21 11:01 AM, Bond, Stephen wrote:
>
> Hi,
>
> Please, help me understand what is happening with the means of a Cox model?
>
> I have:
> R version 4.0.2 (2020-06-22) -- "Taking Off Again"
> Copyright (C) 2020 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> getOption("contrasts")
>         unordered           ordered
> "contr.treatment"      "contr.poly"
>
> According to the help, coxph.object has a component holding the means of the X
> (model.matrix). This does not hold any more.
>
> ```
> library(survival)
> test1 <- list(time=c(4,3,1,1,2,2,3),
>    status=c(1,1,1,0,1,1,0),
>    x=c(0,2,1,1,1,0,0),
>    sex=factor(c(0,0,0,0,1,1,1)))
> m1 <- coxph(Surv(time, status) ~ x + sex, test1)
> m1$means
> ##         x      sex1
> ## 0.7142857 0.000
> colMeans(model.matrix(m1))
> ##         x      sex1
> ## 0.7142857 0.4285714
> ```
>
> Will new observations be scored using the zero mean from the object?? Is this
> just a reporting change where $means shows the reference level and no longer
> the mean of the model matrix??
>
> Thanks everybody
>




[R] coxph means not equal to means of model matrix

2021-09-03 Thread Bond, Stephen
Hi,

Please, help me understand what is happening with the means of a Cox model?
I have:
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

getOption("contrasts")
unordered   ordered
"contr.treatment"  "contr.poly"

According to the help, coxph.object has a component holding the means of the X
(model.matrix). This does not hold any more.
```
library(survival)
test1 <- list(time=c(4,3,1,1,2,2,3),
   status=c(1,1,1,0,1,1,0),
   x=c(0,2,1,1,1,0,0),
   sex=factor(c(0,0,0,0,1,1,1)))
m1 <- coxph(Surv(time, status) ~ x + sex, test1)
m1$means
##         x      sex1
## 0.7142857 0.000
colMeans(model.matrix(m1))
##         x      sex1
## 0.7142857 0.4285714

```
Will new observations be scored using the zero mean from the object?? Is this 
just a reporting change where $means shows the reference level and no longer 
the mean of the model matrix??

Thanks everybody





Re: [R] Calculate daily means from 5-minute interval data

2021-09-03 Thread Rich Shepard

On Thu, 2 Sep 2021, Jeff Newmiller wrote:


> Regardless of whether you use the lower-level split function, or the
> higher-level aggregate function, or the tidyverse group_by function, the
> key is learning how to create the column that is the same for all records
> corresponding to the time interval of interest.


Jeff,

I tried responding to only you but my message bounced:

: host
d9300a.ess.barracudanetworks.com[209.222.82.252] said: 550 permanent
failure for one or more recipients (jdnew...@dcn.davis.ca.us:blocked) (in
reply to end of DATA command)

My response was not pertinent to the entire list, IMO, so I sent it to your
address.

Rich



Re: [R] Splitting a data column randomly into 3 groups

2021-09-03 Thread AbouEl-Makarim Aboueissa
Hi Richard:

Thank you very much for your help in this matter.

with thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Fri, Sep 3, 2021 at 10:25 AM Richard O'Keefe  wrote:

> Your question is ambiguous.
> One reading is
>   n <- length(table$Data)
>   m <- n %/% 3
>   s <- sample(1:n, n)
>   X <- table$Data[s[1:m]]
>   Y <- table$Data[s[(m+1):(2*m)]]
>   Z <- table$Data[s[(m*2+1):(3*m)]]
>
>
>
>
> On Fri, 3 Sept 2021 at 13:31, AbouEl-Makarim Aboueissa
>  wrote:
> >
> > Dear All:
> >
> > How to split a column data *randomly* into three groups. Please see the
> > attached data. I need to split column #2 titled "Data"
> >
> > with many thanks
> > abou
> > __
> >
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> >
> > *Professor, Statistics and Data Science*
> > *Graduate Coordinator*
> >
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
>



Re: [R] Splitting a data column randomly into 3 groups

2021-09-03 Thread AbouEl-Makarim Aboueissa
Hi Avi: good morning

Again, many thanks to all of you. I appreciate all that you are doing. You
are good. I did it in Minitab. It cost me a little more time, but it is
okay.

It was a little confusing for me to do it in R, because in *Step 1:* I
have to select a random sample of size n=204 (say) out of N=700 (say). Then
in *Step 2:* I have to allocate the 204 randomly selected obs. into three
groups of equal sample sizes.
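Here is a minimal R sketch of those two steps. It assumes N = 700, n = 204 and a simulated stand-in for the "Data" column, and the names `population`, `idx`, `grp` and `groups` are illustrative. It relies on n being a multiple of 3. `set.seed()` is only there to make the draw repeatable.

```r
set.seed(2021)                      # for reproducibility only
N <- 700
n <- 204
population <- rnorm(N)              # hypothetical stand-in for the "Data" column

# Step 1: simple random sample of n positions out of N, without replacement
idx <- sample(N, n)

# Step 2: randomly allocate the sampled units to three groups of n/3 each
grp <- sample(rep(1:3, each = n %/% 3))
groups <- split(population[idx], grp)
lengths(groups)                     # 68 68 68
```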

Again, thank you very much, and sorry if I bothered you.


with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Thu, Sep 2, 2021 at 10:42 PM Avi Gross via R-help 
wrote:

> Abou,
>
> I am not trying to be negative. Assuming you are a professor of
> Statistics, your request seems odd as what you are asking about is very
> routine in much of statistical work where you want to make a model or
> something using just part of your data and need to reserve some to check if
> you perhaps trained an algorithm too much for the original data used.
>
> A simple online search before asking questions here is appreciated. I did
> a quick search for something like “R split data into three parts” and see
> several applicable answers.
>
> There are people on this forum who actually get paid to do nontrivial
> tasks and do not mind help in spots but feel sort of used if expected to
> write a serious amount of code and perhaps then be asked to redo it with
> more bells and whistles added. A recent badly phrased request comes to mind
> where several of us provided an answer only to find out it was for a
> different scenario, …
>
> So let me continue with a serious answer. May we assume you KNOW how to
> read the data in to something like a data.frame? If so, and if you see no
> need or value in doing this the hard way, then your question could have
> been to ask if there is an R built-in function or perhaps a package
> already set to solve it quickly. Again, a simple online search can do
> wonders. Here, for example, is a package called caret and this page
> discusses splitting data multiple ways:
>
> https://topepo.github.io/caret/data-splitting.html
>
> There are other such pages suggesting how to do it using base R.
>
> Here is one that gives an example on how to make three unequal partitions
> (the partition() function used here comes from the add-on package
> splitTools, not base R):
>
> inds <- partition(iris$Sepal.Length, p = c(train = 0.6, valid = 0.2, test = 0.2))
>
> There is more to do below but in the above, you would use whatever names
> you want instead of train/valid/test and set all three to 0.33 and so on.
>
> I repeat that what you want to do strikes some of us as a fairly routine
> thing to do and lots of people have written how they have done it and you
> can pick and choose, or redo it on your own. If what you have is a homework
> assignment, the appropriate thing is to have you learn to use some
> technique yourself and perhaps get minor help when it fails. But if you
> will be doing this regularly, use of some packages is highly valuable.
>
> Good Luck.
>
> From: AbouEl-Makarim Aboueissa
> Sent: Thursday, September 2, 2021 9:51 PM
> To: Avi Gross
> Cc: R mailing list
> Subject: Re: [R] Splitting a data column randomly into 3 groups
>
> Sorry, please forget about it. I believe that I was very serious when I
> posted my question.
>
> with thanks
>
> abou
>
> __
>
> AbouEl-Makarim Aboueissa, PhD
>
> Professor, Statistics and Data Science
> Graduate Coordinator
> Department of Mathematics and Statistics
> University of Southern Maine
>
> On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help wrote:
>
> What is stopping you Abou?
>
> Some of us here start wondering if we have better things to do than
> homework for others. Help is supposed to be after they try and encounter
> issues that we may help with.
>
> So think about your problem. You supplied data in a file that is NOT in
> CSV format but is in Tab separated format.
>
> You need to get it in to your program and store it in something. It looks
> like you have 204 items so 1/3 of those would be exactly 68.
>
> So if your data is in an object like a vector or data.frame, you want to
> choose random numbers between 1 and 204. How do you do that? You need 1/3 of
> the length of the object items, in your case 68.
>
> Now extract the items with those indices into say A1. Extract all the
> rest into a temporary item.
>
> Make another 68 random indices, with no overlap, and copy those items into
> A2 and the ones that do not have those into A3 and you are sort of done,
> other than some cleanup or whatever.
>
> There are many ways to do the above and I am sure packages too.
>
> But since you have made no visible effort, I personally am not going to
> pick anything in particular.
>
> Had you shown some 

Re: [R] Splitting a data column randomly into 3 groups

2021-09-03 Thread Richard O'Keefe
Your question is ambiguous.
One reading is
  n <- length(table$Data)
  m <- n %/% 3
  s <- sample(1:n, n)
  X <- table$Data[s[1:m]]
  Y <- table$Data[s[(m+1):(2*m)]]
  Z <- table$Data[s[(m*2+1):(3*m)]]
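In the reading above, when `length(table$Data)` is not a multiple of 3, the last one or two shuffled values are never assigned to a group. Below is a sketch of one alternative that keeps every value, assuming group sizes may differ by one. The vector `data_col` is made up.

```r
# Hypothetical data whose length (10) is not a multiple of 3
data_col <- c(5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9)
set.seed(1)

# Labels 1, 2, 3, 1, 2, 3, ... recycled to the full length, then shuffled
labels <- sample(rep_len(1:3, length(data_col)))
parts <- split(data_col, labels)
lengths(parts)   # group sizes 4, 3, 3
```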




On Fri, 3 Sept 2021 at 13:31, AbouEl-Makarim Aboueissa
 wrote:
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*


Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-03 Thread peter dalgaard
Yes, even

> summary(NA_real_)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
     NA      NA      NA     NaN      NA      NA       1

which is presumably because the mean is an empty sum (= 0) divided by a zero 
count, and 0/0 = NaN.

Notice also the difference between

> mean(NA_real_)
[1] NA
> mean(NA_real_, na.rm=TRUE)
[1] NaN
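Peter's arithmetic can be reproduced step by step. This mimics the empty-sum-over-zero-count reasoning and is not the internal code of `mean()`.

```r
x <- NA_real_
kept <- x[!is.na(x)]          # what na.rm = TRUE leaves: numeric(0)
sum(kept)                     # 0   (empty sum)
length(kept)                  # 0   (zero count)
sum(kept) / length(kept)      # NaN (0/0)
mean(x)                       # NA  (the missing value propagates)
mean(x, na.rm = TRUE)         # NaN
is.nan(mean(x))               # FALSE: NA_real_ is not NaN
```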


> On 3 Sep 2021, at 09:59 , Luigi Marongiu  wrote:
> 
> Fair enough, I'll check the actual data to see if there are indeed any
> NaN (there should not be any, since the data are categories, not generated by
> math).
> Thanks!
> 
> On Fri, Sep 3, 2021 at 8:26 AM PIKAL Petr  wrote:
>> 
>> Hi Luigi.
>> 
>> Weird. But maybe it is the desired behaviour of summary when calculating
>> mean of numeric column full of NAs.
>> 
>> See example
>> 
>> dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))
>> 
>> # change all values in second column to NA
>> dat[,2] <- NA
>> # change some of them to NAN
>> dat[5:6, 2:3] <- 0/0
>> 
>> # see summary
>> summary(dat)
>>       x               y             z
>>  Mode:logical   Min.   : NA   Min.   :-1.9798
>>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                 Median : NA   Median : 0.1745
>>                 Mean   :NaN   Mean   : 0.1856
>>                 3rd Qu.: NA   3rd Qu.: 0.8017
>>                 Max.   : NA   Max.   : 2.5075
>>                 NA's   :110   NA's   :2
>>
>> # change NAN values to NA
>> dat[sapply(dat, is.nan)] <- NA
>>
>> #summary is same
>> summary(dat)
>>       x               y             z
>>  Mode:logical   Min.   : NA   Min.   :-1.9798
>>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                 Median : NA   Median : 0.1745
>>                 Mean   :NaN   Mean   : 0.1856
>>                 3rd Qu.: NA   3rd Qu.: 0.8017
>>                 Max.   : NA   Max.   : 2.5075
>>                 NA's   :110   NA's   :2
>>
>> # but no NAN value in data
>> dat[1:10,]
>>     x  y          z
>> 1  NA NA -0.9148696
>> 2  NA NA  0.7110570
>> 3  NA NA -0.1901676
>> 4  NA NA  0.5900650
>> 5  NA NA         NA
>> 6  NA NA         NA
>> 7  NA NA  0.7987658
>> 8  NA NA -0.5225229
>> 9  NA NA  0.7673103
>> 10 NA NA -0.5263897
>> 
>> So my "nice compact command"
>> dat[sapply(dat, is.nan)] <- NA
>> 
>> works as expected, but summary still gives NaN as the mean.
>> 
>> Cheers
>> Petr
>> 
>>> -Original Message-
>>> From: R-help  On Behalf Of Luigi Marongiu
>>> Sent: Thursday, September 2, 2021 3:46 PM
>>> To: Andrew Simmons 
>>> Cc: r-help 
>>> Subject: Re: [R] How to globally convert NaN to NA in dataframe?
>>> 
>>> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I still get
>>> NaN when using the summary function; for instance, one of the columns gives:
>>> ```
>>> Min.   : NA
>>> 1st Qu.: NA
>>> Median : NA
>>> Mean   :NaN
>>> 3rd Qu.: NA
>>> Max.   : NA
>>> NA's   :110
>>> ```
>>> I tried to implement the second solution but:
>>> ```
>>> df <- lapply(x, function(xx) {
>>>  xx[is.nan(xx)] <- NA
>>> })
>>> > str(df)
>>> List of 1
>>> $ sd_ef_rash_loc___palm: logi NA
>>> ```
>>> What am I getting wrong?
>>> Thanks
>>> 
>>> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons 
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> I would use something like:
>>>>
>>>>
>>>> x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
>>>>   as.data.frame()
>>>> x[] <- lapply(x, function(xx) {
>>>>   xx[is.nan(xx)] <- NA_real_
>>>>   xx
>>>> })
>>>>
>>>>
>>>> This prevents attributes from being changed in 'x', but accomplishes the
>>>> same thing as you have above, I hope this helps!
>>>>
>>>> On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>> I have some NaN values in some elements of a dataframe that I would
>>>>> like to convert to NA.
>>>>> The command `df1$col[is.nan(df1$col)]<-NA` allows me to work column-wise.
>>>>> Is there an alternative for the global modification at once of all
>>>>> instances?
>>>>> I have seen from
>>>>> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
>>>>> that one could use:
>>>>> ```
>>>>>
>>>>> is.nan.data.frame <- function(x)
>>>>> do.call(cbind, lapply(x, is.nan))
>>>>>
>>>>> data123[is.nan(data123)] <- 0
>>>>> ```
>>>>> replacing 0 with NA, but I got
>>>>> ```
>>>>> str(df)
>>>>> > logi NA
>>>>> ```
>>>>> when modifying my dataframe df.
>>>>> What would be the correct syntax?
>>>>> Thank you
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Luigi
>>>>>
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Luigi
>>> 

Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-03 Thread Luigi Marongiu
Fair enough, I'll check the actual data to see if there are indeed any
NaN (there should not be any, since the data are categories, not generated by
math).
Thanks!

On Fri, Sep 3, 2021 at 8:26 AM PIKAL Petr  wrote:
>
> Hi Luigi.
>
> Weird. But maybe it is the desired behaviour of summary when calculating
> mean of numeric column full of NAs.
>
> See example
>
> dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))
>
> # change all values in second column to NA
> dat[,2] <- NA
> # change some of them to NAN
> dat[5:6, 2:3] <- 0/0
>
> # see summary
> summary(dat)
>       x               y             z
>  Mode:logical   Min.   : NA   Min.   :-1.9798
>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>                 Median : NA   Median : 0.1745
>                 Mean   :NaN   Mean   : 0.1856
>                 3rd Qu.: NA   3rd Qu.: 0.8017
>                 Max.   : NA   Max.   : 2.5075
>                 NA's   :110   NA's   :2
>
> # change NAN values to NA
> dat[sapply(dat, is.nan)] <- NA
>
> #summary is same
> summary(dat)
>       x               y             z
>  Mode:logical   Min.   : NA   Min.   :-1.9798
>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>                 Median : NA   Median : 0.1745
>                 Mean   :NaN   Mean   : 0.1856
>                 3rd Qu.: NA   3rd Qu.: 0.8017
>                 Max.   : NA   Max.   : 2.5075
>                 NA's   :110   NA's   :2
>
> # but no NAN value in data
> dat[1:10,]
>     x  y          z
> 1  NA NA -0.9148696
> 2  NA NA  0.7110570
> 3  NA NA -0.1901676
> 4  NA NA  0.5900650
> 5  NA NA         NA
> 6  NA NA         NA
> 7  NA NA  0.7987658
> 8  NA NA -0.5225229
> 9  NA NA  0.7673103
> 10 NA NA -0.5263897
>
> So my "nice compact command"
> dat[sapply(dat, is.nan)] <- NA
>
> works as expected, but summary still gives NaN as the mean.
>
> Cheers
> Petr
>
> > -Original Message-
> > From: R-help  On Behalf Of Luigi Marongiu
> > Sent: Thursday, September 2, 2021 3:46 PM
> > To: Andrew Simmons 
> > Cc: r-help 
> > Subject: Re: [R] How to globally convert NaN to NA in dataframe?
> >
> > `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I still get
> > NaN when using the summary function; for instance, one of the columns gives:
> > ```
> > Min.   : NA
> > 1st Qu.: NA
> > Median : NA
> > Mean   :NaN
> > 3rd Qu.: NA
> > Max.   : NA
> > NA's   :110
> > ```
> > I tried to implement the second solution but:
> > ```
> > df <- lapply(x, function(xx) {
> >   xx[is.nan(xx)] <- NA
> > })
> > > str(df)
> > List of 1
> >  $ sd_ef_rash_loc___palm: logi NA
> > ```
> > What am I getting wrong?
> > Thanks
> >
> > On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons 
> > wrote:
> > >
> > > Hello,
> > >
> > >
> > > I would use something like:
> > >
> > >
> > > x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
> > >   as.data.frame()
> > > x[] <- lapply(x, function(xx) {
> > >   xx[is.nan(xx)] <- NA_real_
> > >   xx
> > > })
> > >
> > >
> > > This prevents attributes from being changed in 'x', but accomplishes the
> > > same thing as you have above, I hope this helps!
> > >
> > > On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu
> > > wrote:
> > >>
> > >> Hello,
> > >> I have some NaN values in some elements of a dataframe that I would
> > >> like to convert to NA.
> > >> The command `df1$col[is.nan(df1$col)]<-NA` allows me to work column-wise.
> > >> Is there an alternative for the global modification at once of all
> > >> instances?
> > >> I have seen from
> > >> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
> > >> that one could use:
> > >> ```
> > >>
> > >> is.nan.data.frame <- function(x)
> > >> do.call(cbind, lapply(x, is.nan))
> > >>
> > >> data123[is.nan(data123)] <- 0
> > >> ```
> > >> replacing 0 with NA, but I got
> > >> ```
> > >> str(df)
> > >> > logi NA
> > >> ```
> > >> when modifying my dataframe df.
> > >> What would be the correct syntax?
> > >> Thank you
> > >>
> > >>
> > >>
> > >> --
> > >> Best regards,
> > >> Luigi
> > >>
> >
> >
> >
> > --
> > Best regards,
> > Luigi
> >



-- 
Best regards,
Luigi


Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-03 Thread PIKAL Petr
Hi Luigi.

Weird. But maybe it is the desired behaviour of summary when calculating
mean of numeric column full of NAs.

See example

dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))

# change all values in second column to NA
dat[,2] <- NA
# change some of them to NAN
dat[5:6, 2:3] <- 0/0

# see summary
summary(dat)
      x               y             z
 Mode:logical   Min.   : NA   Min.   :-1.9798
 NA's:110       1st Qu.: NA   1st Qu.:-0.4729
                Median : NA   Median : 0.1745
                Mean   :NaN   Mean   : 0.1856
                3rd Qu.: NA   3rd Qu.: 0.8017
                Max.   : NA   Max.   : 2.5075
                NA's   :110   NA's   :2

# change NAN values to NA
dat[sapply(dat, is.nan)] <- NA

#summary is same
summary(dat)
      x               y             z
 Mode:logical   Min.   : NA   Min.   :-1.9798
 NA's:110       1st Qu.: NA   1st Qu.:-0.4729
                Median : NA   Median : 0.1745
                Mean   :NaN   Mean   : 0.1856
                3rd Qu.: NA   3rd Qu.: 0.8017
                Max.   : NA   Max.   : 2.5075
                NA's   :110   NA's   :2

# but no NAN value in data
dat[1:10,]
    x  y          z
1  NA NA -0.9148696
2  NA NA  0.7110570
3  NA NA -0.1901676
4  NA NA  0.5900650
5  NA NA         NA
6  NA NA         NA
7  NA NA  0.7987658
8  NA NA -0.5225229
9  NA NA  0.7673103
10 NA NA -0.5263897

So my "nice compact command"
dat[sapply(dat, is.nan)] <- NA

works as expected, but summary still gives NaN as the mean.

Cheers
Petr

> -Original Message-
> From: R-help  On Behalf Of Luigi Marongiu
> Sent: Thursday, September 2, 2021 3:46 PM
> To: Andrew Simmons 
> Cc: r-help 
> Subject: Re: [R] How to globally convert NaN to NA in dataframe?
> 
> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I still get
> NaN when using the summary function; for instance, one of the columns gives:
> ```
> Min.   : NA
> 1st Qu.: NA
> Median : NA
> Mean   :NaN
> 3rd Qu.: NA
> Max.   : NA
> NA's   :110
> ```
> I tried to implement the second solution but:
> ```
> df <- lapply(x, function(xx) {
>   xx[is.nan(xx)] <- NA
> })
> > str(df)
> List of 1
>  $ sd_ef_rash_loc___palm: logi NA
> ```
> What am I getting wrong?
> Thanks
> 
> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons 
> wrote:
> >
> > Hello,
> >
> >
> > I would use something like:
> >
> >
> > x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
> >   as.data.frame()
> > x[] <- lapply(x, function(xx) {
> >   xx[is.nan(xx)] <- NA_real_
> >   xx
> > })
> >
> >
> > This prevents attributes from being changed in 'x', but accomplishes the
> > same thing as you have above, I hope this helps!
> >
> > On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu
> > wrote:
> >>
> >> Hello,
> >> I have some NaN values in some elements of a dataframe that I would
> >> like to convert to NA.
> >> The command `df1$col[is.nan(df1$col)]<-NA` allows me to work column-wise.
> >> Is there an alternative for the global modification at once of all
> >> instances?
> >> I have seen from
> >> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
> >> that one could use:
> >> ```
> >>
> >> is.nan.data.frame <- function(x)
> >> do.call(cbind, lapply(x, is.nan))
> >>
> >> data123[is.nan(data123)] <- 0
> >> ```
> >> replacing 0 with NA, but I got
> >> ```
> >> str(df)
> >> > logi NA
> >> ```
> >> when modifying my dataframe df.
> >> What would be the correct syntax?
> >> Thank you
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Luigi
> >>
> 
> 
> 
> --
> Best regards,
> Luigi
> 
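Luigi's quoted question asks "What am I getting wrong?" about his `lapply()` attempt. There are two problems. First, the anonymous function ends with the assignment, and the value of an assignment is its right-hand side, so each column comes back as a single `NA`. Second, the result is stored in a new `df` instead of back into `x[]`. The small sketch below uses a made-up data frame and the fix Andrew suggested.

```r
x <- data.frame(a = c(1, NaN, 3), b = c(NaN, 2, 2))

# The function's last expression is the assignment, whose value is the
# right-hand side (NA), so lapply() collects a list of plain NAs
broken <- lapply(x, function(xx) {
  xx[is.nan(xx)] <- NA
})
str(broken)

# Returning xx explicitly and assigning into x[] keeps the data frame
x[] <- lapply(x, function(xx) {
  xx[is.nan(xx)] <- NA_real_
  xx
})
str(x)
```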