Re: [R] detect and replace outliers by the average

Rui Barradas Thu, 20 Apr 2023 12:17:37 -0700

At 19:58 on 20/04/2023, Rui Barradas wrote:

At 19:46 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:

Hi Rui:



here is the dataset

factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78
2 98
2 5
2 321 NA

with many thanks
abou
______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*

On Thu, Apr 20, 2023 at 2:44 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:

At 19:36 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:

Dear All:



*Re:* detect and replace outliers by the average



The dataset (please see attached) contains a grouping column “*factor*”
and two columns of data, “x1” and “x2”, with some NA values. I need some
help to detect the outliers and replace them, and the NAs, with the
average within each level (0, 1, 2) for each variable “x1” and “x2”.



I tried the below code, but it did not accomplish what I want to do.





data<-read.csv("G:/20-Spring_2023/Outliers/data.csv", header=TRUE)

data

replace_outlier_with_mean <- function(x) {

replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm=TRUE)) #### , na.rm=TRUE NOT working

}

data[] <- lapply(data, replace_outlier_with_mean)





Thank you all very much for your help in advance.





with many thanks

abou


______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

There is no data set attached, see the posting guide on what file
extensions are allowed as attachments.

As for the question, try computing mean(x, na.rm = TRUE) first, then
use this value in the replace instruction. Without data I'm just guessing.
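
A minimal sketch of that suggestion, on a made-up vector (the values and the names m, out, i and x_new are illustrative only, not the poster's data). It also flags the NAs explicitly, since x %in% out on its own does not mark them:

```r
# Hypothetical vector with one NA and one obvious outlier
x <- c(10, 12, 11, NA, 13, 500)

# Compute the replacement value first, ignoring NAs
m <- mean(x, na.rm = TRUE)

# Values that boxplot.stats() reports as outliers
out <- boxplot.stats(x)$out

# Flag both the NAs and the outliers, then replace them with the mean
i <- is.na(x) | x %in% out
x_new <- replace(x, i, m)
x_new
```

Note that the mean here is taken over all non-missing values, outliers included, so the replacement value is itself pulled towards the outlier.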


Hope this helps,

Rui Barradas

Hello,

Here is a way. It uses ave in the function to group the data by the factor.


df1 <- "factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78 NA
2 98 NA
2 5 NA
2 321 NA"
df1 <- read.table(text = df1, header = TRUE,
                   colClasses = c("factor", "numeric", "numeric"))


replace_outlier_with_mean <- function(x, f) {
   ave(x, f, FUN = \(y) {
     i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
     y[i] <- mean(y, na.rm = TRUE)
     y
   })
}

lapply(df1[-1], replace_outlier_with_mean, f = df1$factor)
#> $x1
#>  [1]  700.0000  700.0000  470.0000  710.0000 1258.1250  610.0000  710.0000
#>  [8]  610.0000  690.0000  580.0000  690.0000 1261.7778  450.0000  700.0000
#> [15]  400.0000 1261.7778  500.0000  680.0000  117.0000  120.0000  130.0000
#> [22]  120.0000  125.0000  194.6923  194.6923  130.0000  123.0000  194.6923
#> [29]   98.0000  194.6923  194.6923
#>
#> $x2
#>  [1]  700.0000  500.0000  470.0000  560.0000  520.0000  720.0000  670.0000
#>  [8] 1767.3750  620.0000  540.0000  690.0000  401.0000  580.0000  700.0000
#> [15] 1406.9000  600.0000  400.0000  650.0000   63.0000   68.0000   73.0000
#> [22]   69.0000   54.0000   70.0000   62.0000  168.4444   70.0000  168.4444
#> [29]  168.4444  168.4444  168.4444


Hope this helps,

Rui Barradas


Hello,

Here is a simpler version of the same function, this time using replace(), as in the OP's code. The results are identical().



replace_outlier_with_mean <- function(x, f) {
  ave(x, f, FUN = \(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    replace(y, i, mean(y, na.rm = TRUE))
  })
}
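
One way to check that claim is to run both versions on the same input and compare. The sketch below uses small made-up vectors, and the names v1 and v2 are hypothetical labels for the two versions posted in this thread. The \(y) lambda syntax needs R 4.1.0 or later.

```r
# Hypothetical toy data: two groups, each with an NA and an outlier
x <- c(10, 12, 11, NA, 13, 500, 1, 2, 3, 2, 90, NA)
f <- factor(rep(c("a", "b"), each = 6))

# First version: assign the group mean into the flagged positions
v1 <- function(x, f) {
  ave(x, f, FUN = \(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    y[i] <- mean(y, na.rm = TRUE)
    y
  })
}

# Second version: the same logic written with replace()
v2 <- function(x, f) {
  ave(x, f, FUN = \(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    replace(y, i, mean(y, na.rm = TRUE))
  })
}

# Compare the two results element by element
identical(v1(x, f), v2(x, f))
```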

Also, the data I copied and pasted from a previous mail is wrong: there are 3 NA's in the wrong column. The following is better.


df1 <- read.table("data.txt", header = TRUE, sep = "\t",
                  colClasses = c("factor", "numeric", "numeric"))
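
If no data.txt file is at hand, the same data frame can probably be built inline. This is only a sketch: it assumes the three misplaced NA's belong in x1, so that the values 78, 98 and 5 are x2 values, which is one reading of the data as originally posted. The name txt is just an illustrative placeholder.

```r
# Inline copy of the posted data.
# ASSUMPTION: in the three rows with only two fields, x1 is the missing value.
txt <- "factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 NA 78
2 NA 98
2 NA 5
2 321 NA"

df1 <- read.table(text = txt, header = TRUE,
                  colClasses = c("factor", "numeric", "numeric"))

# Count the missing values per data column as a quick sanity check
colSums(is.na(df1[-1]))
```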


Hope this helps,

Rui Barradas
