Hi Dennis and R-users

Thank you for the additional help. I am pretty close, but the remaining
challenge is forcing outputs of different lengths into one output data frame.

> x <- data.frame(apply(datafr1, 2, fout))
Error in data.frame(var1 = c(-0.70777998321315, 0.418602152926712,
2.08356737154810,  :
  arguments imply differing number of rows: 28, 12, 20, 19
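
The error arises because data.frame() needs every column to have the same
number of rows, while fout() returns a different number of values for each
variable. One possible way around it, as a minimal sketch only: keep the
results as a list and stack them into a long two-column data frame. The
simulated data follow the original post quoted below; the seed, the na.rm
handling and the names out_list and out_long are illustrative assumptions.

```r
# Illustrative data, same shape as in the original post (seed is arbitrary)
set.seed(1)
datafr1 <- data.frame(var1 = rnorm(500, 10, 4),
                      var2 = rnorm(500, 20, 8),
                      var3 = rnorm(500, 30, 18),
                      var4 = rnorm(500, 40, 20))

# Values outside median +/- 2 * MAD (Dennis's fout, with NA handling added)
fout <- function(x) {
  lim <- median(x, na.rm = TRUE) + c(-2, 2) * mad(x, na.rm = TRUE)
  x[!is.na(x) & (x < lim[1] | x > lim[2])]
}

# lapply() always returns a list, one element per column;
# the elements are allowed to have different lengths
out_list <- lapply(datafr1, fout)

# stack() turns a named list of vectors into a long data frame with
# columns 'values' (the extreme values) and 'ind' (the variable name)
out_long <- stack(out_list)
head(out_long)
table(out_long$ind)   # number of extreme values per variable
```

The long format avoids padding altogether, and subsetting by variable is
still easy, e.g. out_long[out_long$ind == "var1", ].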

As I need to work with >2000 variables, my intention here is to save this
output in a form that can be manipulated further. The main goal is to save
the extreme values for each variable in a data frame, and the secondary
goal is to automatically save the output printed on the screen to a
text file.
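
The two goals above might be sketched as follows, assuming out_list is a
named list of numeric vectors such as the one returned by fout() for each
variable. The toy values, the helper name pad_to and the file names are
hypothetical. Shorter vectors are padded with NA so they fit in one data
frame, and the printed output is captured to a text file.

```r
# Toy stand-in for the list of extreme values (values are made up)
out_list <- list(var1 = c(17.5, 0.7, -1.2),
                 var2 = c(37.4, -1.7),
                 var3 = c(74.3, 68.4, -5.0, 66.1))

# Hypothetical helper: extend a vector to length n, filling with NA
pad_to <- function(x, n) c(x, rep(NA, n - length(x)))

# Pad every element to the length of the longest one, then combine
n_max <- max(lengths(out_list))
out_wide <- data.frame(lapply(out_list, pad_to, n = n_max))

# Save the padded data frame as CSV (file name is an assumption)
csv_file <- file.path(tempdir(), "extremes.csv")
write.csv(out_wide, csv_file, row.names = FALSE)

# Save exactly what print() would show on the screen to a text file
txt_file <- file.path(tempdir(), "extremes.txt")
writeLines(capture.output(print(out_list)), txt_file)
```

With >2000 variables the wide, NA-padded form can become unwieldy, so
keeping the list itself (for example with saveRDS()) may be the easier
starting point for further manipulation.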

Thank you for help once again.

Ram


On Fri, Mar 18, 2011 at 3:16 AM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> Is this what you're after?
>
> fout <- function(x) {
>      lim <- median(x) + c(-2, 2) * mad(x)
>      x[x < lim[1] | x > lim[2]]
>    }
> > apply(datafr1, 2, fout)
> $var1
>  [1] 17.5462078 18.4548214  0.7083442  1.9207578 -1.2296787 17.4948240
>  [7] 19.5702558  1.6181150 20.9791652 -1.3542099  1.8215087 -1.0296303
> [13] 20.5237930 17.5366497 18.5657566  0.9335419 19.7519983 17.8607968
> [19] 19.1307524 19.6145711 21.8037136 19.1532175 -2.6688409 19.6949309
> [25]  1.9712347
>
> $var2
>  [1]  37.3822087  35.6490641  35.6000785  38.5981086  -1.6504275  37.1419290
>  [7]  37.7605230  40.3508689   0.6639900   2.4695841  38.8209491  39.9087921
> [13]  38.9907585  35.8279437   2.7870799  37.0941113   0.6308583  36.4556638
> [19] -10.2384849   2.8480199  -7.7680457  35.7076539  -0.5467739   3.4702765
> [25]  40.4818580   3.2864273   1.4917174
>
> $var3
>  [1]  74.252563  68.396391  68.845461  -5.006545  66.083402  76.036577
>  [7]  75.112586  -6.374241  63.883549  64.041216 -19.764360 -15.051017
> [13]  -9.782767  64.696013  70.970648  -4.562031 -22.135003  70.549310
> [19]  69.495915  -4.095587  86.612375  87.029526  70.072126  -6.421695
> [25]  65.737536
>
> $var4
>  [1]  81.476483  87.098767 -10.451616  91.927329  86.588952  85.080950
>  [7]  84.958645  -9.456368  86.270876 -22.936779  83.314032
>
> Double checks:
> > apply(datafr1, 2, function(x) median(x) + c(-2, 2) * mad(x))
>          var1      var2      var3      var4
> [1,]  2.12167  3.779415 -3.736066 -3.471752
> [2,] 17.37176 34.929800 62.969733 80.224799
> > apply(datafr1, 2, range)
>           var1      var2      var3      var4
> [1,] -2.668841 -10.23848 -22.13500 -22.93678
> [2,] 21.803714  40.48186  87.02953  91.92733
>
> Assuming you wanted to do this columnwise (by variable), it appears to be
> doing the right thing.
>
> HTH,
> Dennis
>
>
> On Thu, Mar 17, 2011 at 7:04 PM, Ram H. Sharma <sharma.ra...@gmail.com> wrote:
>
>> Dear R community members
>>
>> I have been struggling with this simple question, but have not found an
>> appropriate solution. So please help.
>>
>>  # my data, though I have a large number of variables
>> var1 <- rnorm(500, 10,4)
>> var2 <- rnorm(500, 20, 8)
>> var3 <- rnorm(500, 30, 18)
>> var4 <- rnorm(500, 40, 20)
>> datafr1 <- data.frame(var1, var2, var3, var4)
>>
>> # my unsuccessful code
>>  nvar <- ncol(datafr1)
>> for (i in 1:nvar) {
>>              out1 <- NULL
>>              out2 <- NULL
>>              medianx <- median(getdata[,i], na.rm = TRUE)
>>              show(madx <- mad(getdata[,i], na.rm = TRUE))
>>              MD1 <- c(medianx + 2*madx)
>>              MD2 <- c(medianx - 2*madx)
>>              out1[i] <- which(getdata[,i] > MD1) # store data that are greater than median + 2 mad
>>              out2[i] <- which (getdata[,1] < MD2) # store data that are less than median - 2 mad
>>             resultdf <- data.frame(out1, out2)
>>             write.table (resultdf, "out.csv", sep=",")
>>              }
>>
>>
>> My idea here is to store those values that are either greater than
>> median + 2*MAD or less than median - 2*MAD. Each variable has a
>> different length of output.
>>
>> The last error message was the following:
>> Error in data.frame(out1, out2) :
>>  arguments imply differing number of rows: 2, 0
>> In addition: Warning messages:
>> 1: In out1[i] <- which(getdata[, i] > MD1) :
>>  number of items to replace is not a multiple of replacement length
>> 2: In out2[i] <- which(getdata[, 1] < MD2) :
>>  number of items to replace is not a multiple of replacement length
>> 3: In out1[i] <- which(getdata[, i] > MD1) :
>>  number of items to replace is not a multiple of replacement length
>>
>> Thank you in advance for helping me.
>>
>> Best regards;
>> RHS
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

