On Mar 18, 2011, at 10:53 AM, Ram H. Sharma wrote:
Thanks, Jim, for the idea.
I tried saving the output as a list, but I cannot write it to a table with
"write.table", and I could not find a write.list function or an equivalent.
Even as a list, I think it would be more difficult to post-process than a table.
outx<- as.list(apply(datafr1, 2, fout))
write.table (outx, "outlier.csv", sep=",")
Use `dump` to save it as an R object that can later be `source()`d,
which is what I think you want, or `capture.output` to save the text
representation you would see at the console, which would be difficult
to restore as an R object.
--
David.
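A minimal sketch of the two options mentioned above. The small `outx` list and the file names are made up for illustration; they stand in for the real result of apply(datafr1, 2, fout).

```r
# Hypothetical stand-in for the ragged outlier list
outx <- list(var1 = c(17.5, -1.2, 20.9), var2 = c(37.4, -10.2))

# dump() writes R code that recreates the object (file name is an assumption)
dump_file <- file.path(tempdir(), "outlier.R")
dump("outx", file = dump_file)

# source() re-runs that code; a separate environment keeps the copy apart
restored <- new.env()
source(dump_file, local = restored)
str(restored$outx)

# capture.output() keeps only the printed text, exactly as shown at the console
txt_file <- file.path(tempdir(), "outlier.txt")
writeLines(capture.output(print(outx)), txt_file)
```

The dumped file can be read back as an object, whereas the captured text is only convenient for reading by eye.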
Ram
On Fri, Mar 18, 2011 at 10:04 AM, jim holtman <jholt...@gmail.com> wrote:
I think it was suggested that you save your output to a 'list' and
then you will have it in a format that can accept variable numbers of
items in each element and it is also in a form that you can easily
process it to create whatever other output you might need.
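As a rough illustration of the list approach, the sketch below keeps the outliers in a named list and then reshapes them into a long two-column table that write.csv can handle. The seed, the two-column example data and the output file name are assumptions; `fout` is Dennis's function from further down the thread.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
datafr1 <- data.frame(var1 = rnorm(500, 10, 4), var2 = rnorm(500, 20, 8))

fout <- function(x) {
  lim <- median(x) + c(-2, 2) * mad(x)
  x[x < lim[1] | x > lim[2]]
}

# lapply() on a data frame always returns a list, one element per column,
# so elements of different lengths are not a problem
outx <- lapply(datafr1, fout)

# stack() turns the named list into two columns:
# `values` (the outliers) and `ind` (the variable they came from)
out_long <- stack(outx)

write.csv(out_long, file.path(tempdir(), "outlier.csv"), row.names = FALSE)
```

The long format has one row per outlier, so it scales to thousands of variables without any padding.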
On Fri, Mar 18, 2011 at 7:24 AM, Ram H. Sharma <sharma.ra...@gmail.com> wrote:
Hi Dennis and R-users
Thank you for more help. I am pretty close, but the remaining challenge is
forcing output of differing lengths into an output data frame.
x <- data.frame(apply(datafr1, 2, fout))
Error in data.frame(var1 = c(-0.70777998321315, 0.418602152926712,
2.08356737154810, :
arguments imply differing number of rows: 28, 12, 20, 19
As I need to work with >2000 variables, my intention here is to save this
output in such a way that it can be further manipulated. The top-line goal
is to save the extreme values for each variable in a data frame, and the
bottom-line goal is to automate saving the output printed on the screen to
a text file.
Thank you for your help once again.
Ram
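One possible way around the "differing number of rows" error, if a rectangular data frame is really needed, is to pad the shorter vectors with NA. This is only a sketch; the small `outx` list is invented and stands in for the real outlier list.

```r
# Hypothetical ragged result, one vector of outliers per variable
outx <- list(var1 = c(17.5, -1.2, 20.9), var2 = c(37.4, -10.2), var3 = 74.3)

# Length of the longest element decides the number of rows
n_max <- max(lengths(outx))

# Pad every element with NA up to n_max so all columns have equal length
padded <- lapply(outx, function(v) c(v, rep(NA_real_, n_max - length(v))))

x <- as.data.frame(padded)
print(x)
```

The NA cells carry no information, so rows of this data frame should not be read as belonging together.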
On Fri, Mar 18, 2011 at 3:16 AM, Dennis Murphy <djmu...@gmail.com> wrote:
Hi:
Is this what you're after?
fout <- function(x) {
lim <- median(x) + c(-2, 2) * mad(x)
x[x < lim[1] | x > lim[2]]
}
apply(datafr1, 2, fout)
$var1
 [1] 17.5462078 18.4548214  0.7083442  1.9207578 -1.2296787 17.4948240
 [7] 19.5702558  1.6181150 20.9791652 -1.3542099  1.8215087 -1.0296303
[13] 20.5237930 17.5366497 18.5657566  0.9335419 19.7519983 17.8607968
[19] 19.1307524 19.6145711 21.8037136 19.1532175 -2.6688409 19.6949309
[25]  1.9712347
$var2
 [1]  37.3822087  35.6490641  35.6000785  38.5981086  -1.6504275  37.1419290
 [7]  37.7605230  40.3508689   0.6639900   2.4695841  38.8209491  39.9087921
[13]  38.9907585  35.8279437   2.7870799  37.0941113   0.6308583  36.4556638
[19] -10.2384849   2.8480199  -7.7680457  35.7076539  -0.5467739   3.4702765
[25]  40.4818580   3.2864273   1.4917174
$var3
 [1]  74.252563  68.396391  68.845461  -5.006545  66.083402  76.036577
 [7]  75.112586  -6.374241  63.883549  64.041216 -19.764360 -15.051017
[13]  -9.782767  64.696013  70.970648  -4.562031 -22.135003  70.549310
[19]  69.495915  -4.095587  86.612375  87.029526  70.072126  -6.421695
[25]  65.737536
$var4
 [1]  81.476483  87.098767 -10.451616  91.927329  86.588952  85.080950
 [7]  84.958645  -9.456368  86.270876 -22.936779  83.314032
Double checks:
apply(datafr1, 2, function(x) median(x) + c(-2, 2) * mad(x))
var1 var2 var3 var4
[1,] 2.12167 3.779415 -3.736066 -3.471752
[2,] 17.37176 34.929800 62.969733 80.224799
apply(datafr1, 2, range)
var1 var2 var3 var4
[1,] -2.668841 -10.23848 -22.13500 -22.93678
[2,] 21.803714 40.48186 87.02953 91.92733
Assuming you wanted to do this columnwise (by variable), it appears to be
doing the right thing.
HTH,
Dennis
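For anyone who wants to repeat the double check on their own data, here is a small sketch. The seed and the simulated vector are assumptions. It relies on the documented default of mad(), which scales the median absolute deviation by the constant 1.4826.

```r
set.seed(42)  # arbitrary seed for a repeatable sketch
x <- rnorm(500, 10, 4)

# mad() with defaults is 1.4826 * median(|x - median(x)|)
manual_mad <- 1.4826 * median(abs(x - median(x)))
stopifnot(isTRUE(all.equal(manual_mad, mad(x))))

# Same limits and filter as in fout()
lim <- median(x) + c(-2, 2) * mad(x)
flagged <- x[x < lim[1] | x > lim[2]]

# Every flagged value must lie outside the limits
stopifnot(all(flagged < lim[1] | flagged > lim[2]))

# Share of observations flagged as extreme
mean(x < lim[1] | x > lim[2])
```

With that scaling constant the MAD estimates the standard deviation for normal data, so the cutoff behaves roughly like a two-standard-deviation rule.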
On Thu, Mar 17, 2011 at 7:04 PM, Ram H. Sharma <sharma.ra...@gmail.com> wrote:
Dear R community members
I have been struggling with this simple question but have never found an
appropriate solution. So please help.
# my data, though I have a large number of variables
var1 <- rnorm(500, 10,4)
var2 <- rnorm(500, 20, 8)
var3 <- rnorm(500, 30, 18)
var4 <- rnorm(500, 40, 20)
datafr1 <- data.frame(var1, var2, var3, var4)
# my unsuccessful codes
nvar <- ncol(datafr1)
for (i in 1:nvar) {
out1 <- NULL
out2 <- NULL
medianx <- median(getdata[,i], na.rm = TRUE)
show(madx <- mad(getdata[,i], na.rm = TRUE))
MD1 <- c(medianx + 2*madx)
MD2 <- c(medianx - 2*madx)
out1[i] <- which(getdata[,i] > MD1) # store data that are greater than median + 2 mad
out2[i] <- which (getdata[,1] < MD2) # store data that are less than median - 2 mad
resultdf <- data.frame(out1, out2)
write.table (resultdf, "out.csv", sep=",")
}
My idea here is to store those values that are either greater than
median + 2*MAD or less than median - 2*MAD. Each variable has a different
length of output.
The following is the last error message:
Error in data.frame(out1, out2) :
arguments imply differing number of rows: 2, 0
In addition: Warning messages:
1: In out1[i] <- which(getdata[, i] > MD1) :
number of items to replace is not a multiple of replacement length
2: In out2[i] <- which(getdata[, 1] < MD2) :
number of items to replace is not a multiple of replacement length
3: In out1[i] <- which(getdata[, i] > MD1) :
number of items to replace is not a multiple of replacement length
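The warnings above appear because which() can return several row indices, while out1[i] has room for only one element. In addition, out1 and out2 are reset to NULL on every pass through the loop. The sketch below shows one way the loop could be restructured, assuming `getdata` is the same data frame as `datafr1` (the example data and seed here are made up).

```r
set.seed(1)  # arbitrary seed for a repeatable sketch
getdata <- data.frame(var1 = rnorm(500, 10, 4), var2 = rnorm(500, 20, 8))

nvar <- ncol(getdata)

# Create the containers once, outside the loop, as lists
out1 <- vector("list", nvar)
out2 <- vector("list", nvar)
names(out1) <- names(out2) <- names(getdata)

for (i in seq_len(nvar)) {
  medianx <- median(getdata[, i], na.rm = TRUE)
  madx <- mad(getdata[, i], na.rm = TRUE)
  # [[i]] stores a whole vector of row indices in one list slot
  out1[[i]] <- which(getdata[, i] > medianx + 2 * madx)
  out2[[i]] <- which(getdata[, i] < medianx - 2 * madx)
}

str(out1)
str(out2)
```

Note that which() gives row positions rather than the values themselves; indexing the column with those positions returns the values.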
Thank you in advance for helping me.
Best regards;
RHS
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
David Winsemius, MD
West Hartford, CT