On Mar 18, 2011, at 10:53 AM, Ram H. Sharma wrote:

Thanks, Jim, for the idea.

I tried saving the output as a list. I cannot write it to a table with "write.table", and I could not find a write.list function or an equivalent. Even as a
list, I think it would be more difficult to post-process than a table.

outx<- as.list(apply(datafr1, 2, fout))
write.table (outx, "outlier.csv", sep=",")

Use `dump` to save it as an R object that can later be `source()`-ed, which is what I think you want, or `capture.output` to save the text representation you would see at the console, which would be difficult to restore as an R object.
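
A minimal sketch of both suggestions, assuming a small made-up list `outx` in place of the real result; the file names under `tempdir()` are placeholders as well.

```r
# Hypothetical stand-in for the list of per-variable outliers
outx <- list(var1 = c(17.5, 0.7, -1.2), var2 = c(37.4, -1.7))

# dump() writes R code that recreates the object when source()-d
dump_file <- file.path(tempdir(), "outx_dump.R")
dump("outx", file = dump_file)

# Restore into a separate environment so the original is not overwritten
restored <- new.env()
source(dump_file, local = restored)
same <- identical(restored$outx, outx)

# capture.output() saves the console representation as plain text,
# which is readable but not easily turned back into an R object
txt_file <- file.path(tempdir(), "outx_print.txt")
writeLines(capture.output(print(outx)), txt_file)
```

The dumped file is ordinary R source, so it can be inspected or edited in a text editor before being read back.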

--
David.

Ram



On Fri, Mar 18, 2011 at 10:04 AM, jim holtman <jholt...@gmail.com> wrote:

I think it was suggested that you save your output to a 'list' and
then you will have it in a format that can accept variable numbers of
items in each element and it is also in a form that you can easily
process it to create whatever other output you might need.
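
A minimal sketch of that idea, assuming two columns simulated as in the original post (the seed and the output file name are additions for illustration). It uses `lapply` rather than `apply` so the result is always a list, and `stack` to turn the list into a long two-column table that `write.csv` accepts.

```r
# Reproduce the thread's setup (seed added here only for repeatability)
set.seed(1)
datafr1 <- data.frame(var1 = rnorm(500, 10, 4), var2 = rnorm(500, 20, 8))

fout <- function(x) {
  lim <- median(x) + c(-2, 2) * mad(x)
  x[x < lim[1] | x > lim[2]]
}

# lapply() over the columns always returns a list, whatever the lengths
outx <- lapply(datafr1, fout)

# stack() turns a named list of vectors into a two-column data frame:
# 'values' holds the numbers, 'ind' the name of the originating element
long <- stack(outx)
write.csv(long, file.path(tempdir(), "outlier.csv"), row.names = FALSE)
```

The long layout has one row per extreme value, so it does not matter that each variable contributes a different number of rows.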

On Fri, Mar 18, 2011 at 7:24 AM, Ram H. Sharma <sharma.ra...@gmail.com> wrote:
Hi Dennis and R-users

Thank you for more help. I am pretty close, but the remaining challenge is
forcing output of different lengths into an output data frame.

x <- data.frame(apply(datafr1, 2, fout))
Error in data.frame(var1 = c(-0.70777998321315, 0.418602152926712,
2.08356737154810,  :
arguments imply differing number of rows: 28, 12, 20, 19

As I need to work with >2000 variables, my intention here is to save this
output in such a way that it can be manipulated further. The top-line goal is
to save a data frame holding the extreme values for each variable concerned,
and the bottom line is to automate saving the output printed on the screen to
a text file.
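
One way to get output of differing lengths into a single data frame is to pad the shorter vectors with NA. A minimal sketch, assuming a small made-up list `outx` in place of the real result; the file name is a placeholder.

```r
# Hypothetical ragged result, standing in for apply(datafr1, 2, fout)
outx <- list(var1 = c(17.5, 0.7, -1.2), var2 = c(37.4, -1.7), var3 = 74.3)

# Pad every element with NA up to the length of the longest one
n <- max(sapply(outx, length))
padded <- lapply(outx, function(v) c(v, rep(NA, n - length(v))))

# Equal lengths now, so the list converts cleanly to a data frame
wide <- as.data.frame(padded)

# na = "" writes empty cells instead of the string "NA"
write.csv(wide, file.path(tempdir(), "outlier_wide.csv"),
          row.names = FALSE, na = "")
```

The rows of this wide table carry no meaning across columns; they only line up the padded vectors side by side.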

Thank you for help once again.

Ram


On Fri, Mar 18, 2011 at 3:16 AM, Dennis Murphy <djmu...@gmail.com>
wrote:

Hi:

Is this what you're after?

fout <- function(x) {
    lim <- median(x) + c(-2, 2) * mad(x)
    x[x < lim[1] | x > lim[2]]
  }
apply(datafr1, 2, fout)
$var1
 [1] 17.5462078 18.4548214  0.7083442  1.9207578 -1.2296787 17.4948240
 [7] 19.5702558  1.6181150 20.9791652 -1.3542099  1.8215087 -1.0296303
[13] 20.5237930 17.5366497 18.5657566  0.9335419 19.7519983 17.8607968
[19] 19.1307524 19.6145711 21.8037136 19.1532175 -2.6688409 19.6949309
[25]  1.9712347

$var2
 [1]  37.3822087  35.6490641  35.6000785  38.5981086  -1.6504275  37.1419290
 [7]  37.7605230  40.3508689   0.6639900   2.4695841  38.8209491  39.9087921
[13]  38.9907585  35.8279437   2.7870799  37.0941113   0.6308583  36.4556638
[19] -10.2384849   2.8480199  -7.7680457  35.7076539  -0.5467739   3.4702765
[25]  40.4818580   3.2864273   1.4917174

$var3
 [1]  74.252563  68.396391  68.845461  -5.006545  66.083402  76.036577
 [7]  75.112586  -6.374241  63.883549  64.041216 -19.764360 -15.051017
[13]  -9.782767  64.696013  70.970648  -4.562031 -22.135003  70.549310
[19]  69.495915  -4.095587  86.612375  87.029526  70.072126  -6.421695
[25]  65.737536

$var4
 [1]  81.476483  87.098767 -10.451616  91.927329  86.588952  85.080950
 [7]  84.958645  -9.456368  86.270876 -22.936779  83.314032

Double checks:
apply(datafr1, 2, function(x) median(x) + c(-2, 2) * mad(x))
        var1      var2      var3      var4
[1,]  2.12167  3.779415 -3.736066 -3.471752
[2,] 17.37176 34.929800 62.969733 80.224799
apply(datafr1, 2, range)
         var1      var2      var3      var4
[1,] -2.668841 -10.23848 -22.13500 -22.93678
[2,] 21.803714  40.48186  87.02953  91.92733

Assuming you wanted to do this columnwise (by variable), it appears to be
doing the right thing.
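
A variation on the same function, sketched under the assumption that the row positions are wanted as well as the values (the original loop used `which`). The names `fout_idx`, `res`, and `all_out` are made up for this illustration, and the data are re-simulated with an added seed.

```r
set.seed(1)
datafr1 <- data.frame(var1 = rnorm(500, 10, 4), var2 = rnorm(500, 20, 8))

# Same limits as fout(), but also records where each extreme value sits
fout_idx <- function(x) {
  lim <- median(x, na.rm = TRUE) + c(-2, 2) * mad(x, na.rm = TRUE)
  idx <- which(x < lim[1] | x > lim[2])
  data.frame(row = idx, value = x[idx])
}

# One small data frame per variable
res <- lapply(datafr1, fout_idx)

# Label each piece with its variable name and bind them into one table
pieces <- Map(function(d, nm) data.frame(variable = rep(nm, nrow(d)), d),
              res, names(res))
all_out <- do.call(rbind, pieces)
```

Because every piece has the same three columns, `rbind` works no matter how many extreme values each variable produced.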

HTH,
Dennis


On Thu, Mar 17, 2011 at 7:04 PM, Ram H. Sharma <sharma.ra...@gmail.com> wrote:

Dear R community members

I have been struggling with this simple question but have not found an
appropriate solution. So please help.

# my data, though I have a large number of variables
var1 <- rnorm(500, 10,4)
var2 <- rnorm(500, 20, 8)
var3 <- rnorm(500, 30, 18)
var4 <- rnorm(500, 40, 20)
datafr1 <- data.frame(var1, var2, var3, var4)

# my unsuccessful code
nvar <- ncol(datafr1)
for (i in 1:nvar) {
            out1 <- NULL
            out2 <- NULL
            medianx <- median(getdata[,i], na.rm = TRUE)
            show(madx <- mad(getdata[,i], na.rm = TRUE))
            MD1 <- c(medianx + 2*madx)
            MD2 <- c(medianx - 2*madx)
            out1[i] <- which(getdata[,i] > MD1) # store data that are greater than median + 2 mad
            out2[i] <- which (getdata[,1] < MD2) # store data that are less than median - 2 mad
           resultdf <- data.frame(out1, out2)
           write.table (resultdf, "out.csv", sep=",")
            }


My idea here is to store those values which are either greater than
median + 2*MAD or less than median - 2*MAD. Each variable has a different
length of output.

The following is the last error message:
Error in data.frame(out1, out2) :
arguments imply differing number of rows: 2, 0
In addition: Warning messages:
1: In out1[i] <- which(getdata[, i] > MD1) :
number of items to replace is not a multiple of replacement length
2: In out2[i] <- which(getdata[, 1] < MD2) :
number of items to replace is not a multiple of replacement length
3: In out1[i] <- which(getdata[, i] > MD1) :
number of items to replace is not a multiple of replacement length
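
The warnings arise because `out1[i]` is a single slot of a vector while `which()` returns several positions. A minimal sketch of a loop that avoids this, assuming the containers are lists created once before the loop and filled with `[[i]]`; the names `out_hi` and `out_lo` and the seed are made up for this illustration, and `datafr1` is used where the post has `getdata`.

```r
set.seed(1)
datafr1 <- data.frame(var1 = rnorm(500, 10, 4), var2 = rnorm(500, 20, 8))

nvar <- ncol(datafr1)
# Create the containers once, before the loop, as lists
out_hi <- vector("list", nvar)
out_lo <- vector("list", nvar)
names(out_hi) <- names(out_lo) <- names(datafr1)

for (i in seq_len(nvar)) {
  x <- datafr1[, i]
  medianx <- median(x, na.rm = TRUE)
  madx <- mad(x, na.rm = TRUE)
  # [[i]] stores a whole vector in one list slot, whatever its length
  out_hi[[i]] <- which(x > medianx + 2 * madx)
  out_lo[[i]] <- which(x < medianx - 2 * madx)
}
```

Writing the file once after the loop, rather than on every iteration, also keeps earlier results from being overwritten.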

Thank you in advance for helping me.

Best regards;
RHS

      [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.








--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



David Winsemius, MD
West Hartford, CT
