Re: [R] Entropy based feature selection in R
Hello everyone, Any thoughts on this one, please? The only thing I found was the FSelector package (http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Dimensionality_Reduction/Feature_Selection#Aviable_Feature_Ranking_Techniques_in_FSelector_Package). Unfortunately, it seems to be far from scalable on my data (~300k features, ~10k instances). I would appreciate some advice on this. Thanks in advance. Andy

-- View this message in context: http://r.789695.n4.nabble.com/Entropy-based-feature-selection-in-R-tp3708056p3740878.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
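[A follow-up thought: since FSelector materializes large intermediate structures, a hand-rolled entropy/information-gain ranking that visits one feature at a time may scale better. This is only a minimal sketch, not the FSelector API; `entropy`, `info.gain`, and the toy data below are all made up for illustration, and it assumes discrete (or pre-binned) features.]

```r
# Rank features by information gain, one feature at a time so memory stays bounded.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

info.gain <- function(x, y) {
  H.y <- entropy(table(y) / length(y))          # entropy of the class
  tab <- table(x, y)                            # feature-by-class counts
  px  <- rowSums(tab) / sum(tab)                # P(x = each level)
  H.y.given.x <- sum(px * apply(tab, 1, function(r) entropy(r / sum(r))))
  H.y - H.y.given.x                             # mutual information I(X; Y)
}

# Toy check: the first feature is perfectly predictive, the second is noise.
set.seed(1)
y  <- factor(rep(c("a", "b"), each = 50))
x1 <- ifelse(y == "a", "low", "high")
x2 <- sample(c("low", "high"), 100, replace = TRUE)
gains <- c(f1 = info.gain(x1, y), f2 = info.gain(x2, y))
```

With ~300k features one would loop (or mclapply) over columns, keeping only the running top-k gains.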
Re: [R] Automating an R function call
Thanks for your help everyone! I'm happy enough with an asynchronous solution here. Thanks! Robert

-- View this message in context: http://r.789695.n4.nabble.com/Automating-an-R-function-call-tp3740070p3740333.html
[R] Adjacency Matrix help
I have created an adjacency matrix but have not been able to figure something out. I need to put zeros on the diagonal of the adjacency matrix; for instance, I need location (i,i) to equal 0. Please help. Thanks

-- View this message in context: http://r.789695.n4.nabble.com/Adjacency-Matrix-help-tp3740946p3740946.html
[R] Finding an average time spent
Hello R help! I am extremely new to R (as in just 3 days) and I've been using it to do some pretty basic things. I am frustratingly stuck on one point, and am so, so, so close to figuring it out, but just far enough away to ask for some (perhaps embarrassingly easy) help. I have a dataset, visitors, that has a variable called Time.Spent. Time.Spent consists of times in the format hh:mm:ss, and it is a measurement, kind of like a timer, of the amount of time someone spent in a museum exhibit. I need to find the average time spent. I figured the easiest way to do this would be to convert it into seconds. I found a function that someone wrote on how to do this here: http://stackoverflow.com/questions/1389428/dealing-with-time-periods-such-as-5-minutes-and-30-seconds-in-r I thought this would be the answer! However, when I run the code, it works perfectly for the first value in the first observation, but then repeats the same answer all the way down the rows. Sorry for the wordiness; here's the code I have:

# The function to convert hh:mm:ss into just seconds:
time.to.seconds <- function(time) {
  time <- strsplit(time, ":")[[1]]
  return((as.numeric(time[1]) * 60 * 60) + (as.numeric(time[2]) * 60) + (as.numeric(time[3])))
}

# I've tried many things to then create a new variable in the dataset visitors:
visitors$TimeInSeconds <- time.to.seconds(time=c(visitors$Time.Spent))
# Or
visitors$TimeInSeconds <- time.to.seconds(visitors$Time.Spent)

I figure it has something to do with the fact that strsplit() makes a list? Do I need a loop to go through each value? I know this is a huge question but any hints at all would be very much appreciated.

-- View this message in context: http://r.789695.n4.nabble.com/Finding-an-average-time-spent-tp3740391p3740391.html
Re: [R] sapply to bind columns, with repeat?
Hi Weidong Gu, This works! For my clarity, and so I can repeat this process if need be: 'mat' generates a matrix from whatever is supplied as x (i.e. coop.dat), using the columns from position 9:length(x), with 6 columns (filled by row). 'rem.col' generates a matrix of the first columns 1:8, with 8 columns. The 'return' statement cbinds rem.col and mat together. Then 'apply' runs this over coop.dat, by rows, using function reorg. Is this correct? Thank you very much, Katrina

On Fri, Aug 12, 2011 at 10:28 AM, Weidong Gu anopheles...@gmail.com wrote: Katrina, try this.

reorg <- function(x){
  mat <- matrix(x[9:length(x)], ncol=6, byrow=T)
  rem.col <- matrix(rep(x[1:8], nrow(mat)), byrow=T, ncol=8)
  return(data.frame(cbind(rem.col, mat)))
}
co <- do.call('rbind', apply(coop.dat, 1, function(x) reorg(x)))

You may need to tweak a bit to fit exactly what you want. Weidong Gu

On Fri, Aug 12, 2011 at 2:35 AM, Katrina Bennett kebenn...@alaska.edu wrote: Hi R-help, I am working with US COOP network station data and the files are concatenated in single rows for all years, but I need to pull these apart into rows for each day. To do this, I need to extract part of each row such as station id, year, mo, and repeat this against other variables in the row (days). My problem is that there are repeated values for each day, and the files are fixed width field without order. Here is an example of just one line of data.
coop.raw <- c("DLY09752806TMAX F2010010620107 00049 20107 00062 B0207 00041 20207 00049 B0307 00040 20307 00041 B0407 00042 20407 00040 B0507 00041 20507 00042 B0607 00043 20607 00041 B0707 00055 20707 00043 B0807 00039 20807 00055 B0907 00037 20907 00039 B1007 00038 21007 00037 B1107 00048 21107 00038 B1207 00050 21207 00048 B1307 00051 21307 00050 B1407 00058 21407 00051 B1507 00068 21507 00058 B1607 00065 21607 00068 B1707 00068 21707 00065 B1807 00067 21807 00068 B1907 00068 21907 00067 B2007 00069 22007 00068 B2107 00057 22107 00069 B2207 00048 22207 00057 B2307 00051 22307 00048 B2407 00073 22407 00051 B2507 00062 22507 00073 B2607 00056 22607 00062 B2707 00053 22707 00056 B2807 00064 22807 00053 B2907 00057 22907 00064 B3007 00047 23007 00057 B3107 00046 23107 00047 B")
write.csv(coop.raw, "coop.tmp", row.names=F, quote=F)
coop.dat <- read.fwf("coop.tmp", widths = c(c(3,8,4,2,4,2,4,3), rep(c(2,2,1,5,1,1), 62)), na.strings=c(""), skip=1, as.is=T)
rep.name <- rep(c("day","hr","met","dat","fl1","fl2"), 62)
rep.count <- rep(c(1:62), each=6, 1)
names(coop.dat) <- c("rect","id","elem","unt","year","mo","fill","numval", paste(rep.name, rep.count, sep="_"))

I would like to generate output that contains in one row the columns id, elem, unt, year, mo, and numval. Bound to these initial columns, I would like only day_1, hr_1, met_1, dat_1, fl1_1, and fl2_1. Then, in the next row I would like the initial columns id, elem, unt, year, mo, and numval repeated, and then day_2, hr_2, met_2, dat_2, fl1_2, and fl2_2 bound on, and so on until all the data for the row has been allocated. Then, move on to the next row and repeat. I think I should be able to do this with some sort of sapply or lapply function, but I'm struggling with the format for repeating the initial columns, and then skipping through the next columns.
Thank you, Katrina
[R] post
Hello, I was trying to plot multiple graphs using par(mfrow=c(3,2)). But this is giving me the following error:

Error in axis(side = side, at = at, labels = labels, ...) :
  X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 8 could not be loaded

Could someone decode this error, please? Thank you
Re: [R] Adjacency Matrix help
diag(adjMatrix) <- 0

On Aug 13, 2011, at 7:34 AM, collegegurl69 wrote: I have created an adjacency matrix but have not been able to figure something out. I need to put zeros on the diagonal of the adjacency matrix. For instance, location (i,i) to equal 0. Please help. Thanks
Re: [R] Finding an average time spent
On Fri, Aug 12, 2011 at 4:23 PM, erinbspace erin.brasw...@gmail.com wrote: Hello R help! I am extremely new to R (as in 3 just days) and I've been using it to do some pretty basic things. I am frustratingly stuck on one point, and am so so so close to figuring it out, but just far enough away to ask for some (perhaps embarrassingly easy) help. I have a dataset, visitors, that has a variable called Time.Spent. Time.Spent consists of times in the format hh:mm:ss, and it is a measurement, kind of like a timer, of the amount of time someone spent in a museum exhibit. I need to find the average time spent. I've figured the easiest way to do

library(chron)
Time.Spent <- c("12:12:10", "13:12:10")
times(mean(as.numeric(times(Time.Spent))))
[1] 12:42:10

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] Adjacency Matrix help
Thanks so much for your quick reply. It seems to work. The problem is that it now places actual zeros on the diagonal, whereas the rest of the adjacency matrix has dots to represent zeroes. Do you have any ideas on how to change these zeros to dots like in the rest of the adjacency matrix? Or is it the same thing? Thanks.

-- View this message in context: http://r.789695.n4.nabble.com/Adjacency-Matrix-help-tp3740946p3740996.html
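[If the dots come from a sparse matrix built with the Matrix package, they mark structural (unstored) zeros, while an assigned 0 is stored explicitly and prints as "0". Numerically the two are identical. A sketch, assuming a Matrix-package sparse matrix; the tiny example matrix is made up:]

```r
library(Matrix)

# A 2x2 sparse adjacency matrix: "." marks an unstored (structural) zero.
adj <- Matrix(c(0, 1, 1, 0), 2, 2, sparse = TRUE)
diag(adj) <- 0     # diagonal zeros are now stored explicitly, printing as 0
adj <- drop0(adj)  # drop the explicitly stored zeros so they print as "." again
```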
[R] optimization problems
Dear R users, I am trying to use OPTIMX (OPTIM) for nonlinear optimization. There is no error in my code but the results are weird (see below). The initial values are theta0 = c(0.6, 1.6, 0.6, 1.6, 0.7). (In fact the true values are 0.5, 1.0, 0.8, 1.2, 0.6.) When I ran it via OPTIM:

optim(par=theta0, fn=obj.fy, method="BFGS", control=list(trace=1, maxit=1), hessian=T)
initial value -0.027644
final value -0.027644
converged
$par
[1] 0.6 1.6 0.6 1.6 0.7
$value
[1] -0.02764405
$counts
function gradient
       1        1
$convergence
[1] 0
$message
NULL
$hessian
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0
[3,]    0    0    0    0    0
[4,]    0    0    0    0    0
[5,]    0    0    0    0    0

When I ran it via OPTIMX:

optimx(par=theta0, fn=obj.fy, method="BFGS", control=list(maxit=1), hessian=T)
                      par     fvalues method fns grs itns conv KKT1 KKT2 xtimes
1 0.6, 1.6, 0.6, 1.6, 0.7 -0.02764405   BFGS   1   1 NULL    0 TRUE   NA   8.71

Whenever I use different initial values, the initial ones come back as the answer from OPTIMX (OPTIM). Would you please explain why this happens? Any suggestion will be greatly appreciated. Regards, Kathryn Lord

-- View this message in context: http://r.789695.n4.nabble.com/optimization-problems-tp3741005p3741005.html
Re: [R] optimization problems
To be honest, the first derivative of my objective function is very complicated, so I ignored it. Could that lead to this sort of problem? Kathie

-- View this message in context: http://r.789695.n4.nabble.com/optimization-problems-tp3741005p3741010.html
Re: [R] Any alternatives to draw.colorkey from lattice package?
On 08/13/2011 04:34 AM, Mikhail Titov wrote: Hello! I'd like to have a continuous color bar on my lattice xyplot, with colors let's say from topo.colors, such that it has tick labels at a few specific points only. Right now I use do.breaks and level.colors with a somewhat large number of steps. The problem is that a color change point doesn't necessarily correspond to the value I'd like to label. Since I have many color steps and I don't need high precision, I generate labels like this:

labels <- ifelse(sapply(at, function(x) any(abs(att-x) < .03)), sprintf("depth= %s ft", at), "")

where `att` has my points of interest on the color scale bar and `at` corresponds to the color change points used with level.colors. It is a bit inconvenient, as I have to adjust the threshold `.03` and the number of color steps so that only the color change point adjacent to each point of interest gets labeled. Q: Are there any ready to use functions that would generate some kind of GRaphical OBject with a continuous color scale bar/key with custom at/labels, such that it would work with the `legend` argument of xyplot from lattice?

Hi Mikhail, I think that color.legend in the plotrix package will do what you are asking, but it is in base graphics, and may not work with lattice. Jim
Re: [R] Finding an average time spent
On 08/13/2011 06:23 AM, erinbspace wrote: Hello R help! I am extremely new to R (as in 3 just days) and I've been using it to do some pretty basic things. I am frustratingly stuck on one point, and am so so so close to figuring it out, but just far enough away to ask for some (perhaps embarrassingly easy) help. I have a dataset, visitors, that has a variable called Time.Spent. Time.Spent consists of times in the format hh:mm:ss, and it is a measurement, kind of like a timer, of the amount of time someone spent in a museum exhibit. I need to find the average time spent. I've figured the easiest way to do this would be to convert it into seconds. I found a function that someone wrote on how to do this here: http://stackoverflow.com/questions/1389428/dealing-with-time-periods-such-as-5-minutes-and-30-seconds-in-r I thought this would be the answer! However, when I run the code, it works perfectly for the first variable in the first observation, but then repeats the same answer all the way down the rows. Sorry for the wordiness, here's the code I have:

# The function to convert hh:mm:ss into just seconds:
time.to.seconds <- function(time) {
  time <- strsplit(time, ":")[[1]]
  return((as.numeric(time[1]) * 60 * 60) + (as.numeric(time[2]) * 60) + (as.numeric(time[3])))
}

# I've tried many things to then create a new variable in the dataset visitors:
visitors$TimeInSeconds <- time.to.seconds(time=c(visitors$Time.Spent))
# Or
visitors$TimeInSeconds <- time.to.seconds(visitors$Time.Spent)

I figure it has something to do with the fact that strsplit() makes a list? Do I need a loop to go through each variable? I know this is a huge question but any hints at all would be very much appreciated.

Hi erinbspace, By hard-coding the [[1]] in your function, you are automatically taking the first element of any list.
If you want to convert a vector of times, try this:

time.to.seconds <- function(time) {
  time <- strsplit(time, ":")[[1]]
  return(as.numeric(time[1]) * 3600 + as.numeric(time[2]) * 60 + as.numeric(time[3]))
}
watch.times <- c("0:2:31", "0:4:12", "0:0:47")
# use sapply to step through the vector of times
sapply(watch.times, time.to.seconds)
0:2:31 0:4:12 0:0:47
   151    252     47

Jim
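[The sapply() loop can also be avoided entirely if the times are clean; a vectorized sketch, where the function name `to.seconds` is mine:]

```r
# Split every time string at once, then combine the pieces with matrix indexing.
to.seconds <- function(times) {
  parts <- do.call(rbind, strsplit(times, ":"))
  as.numeric(parts[, 1]) * 3600 +
    as.numeric(parts[, 2]) * 60 +
    as.numeric(parts[, 3])
}

secs <- to.seconds(c("0:2:31", "0:4:12", "0:0:47"))
mean(secs)  # the average time in seconds: 150
```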
Re: [R] Any alternatives to draw.colorkey from lattice package?
You can just specify the label positions; you don't need to give labels for every color change point (there is an 'at' for the color changes and a 'labels$at' for the labels):

levelplot(rnorm(100) ~ x * y, expand.grid(x = 1:10, y = 1:10),
          colorkey = list(at = seq(-3, 3, length = 100),
                          labels = list(labels = paste(-3:3, "units"), at = -3:3)))

On 13 August 2011 19:59, Jim Lemon j...@bitwrit.com.au wrote: On 08/13/2011 04:34 AM, Mikhail Titov wrote: Hello! I'd like to have a continuous color bar on my lattice xyplot with colors, let's say from topo.colors, such that it has tick labels at a few specific points only. Right now I use do.breaks and level.colors with a somewhat large number of steps. The problem is that a color change point doesn't necessarily correspond to the value I'd like to label. Since I have many color steps and I don't need high precision, I generate labels like this: labels <- ifelse(sapply(at, function(x) any(abs(att-x) < .03)), sprintf("depth= %s ft", at), ""), where `att` has my points of interest on the color scale bar and `at` corresponds to the color change points used with level.colors. It is a bit inconvenient as I have to adjust the threshold `.03` and the number of color steps so that it labels only the adjacent color change point with my labels. Q: Are there any ready to use functions that would generate some kind of GRaphical OBject with a continuous color scale bar/key with custom at/labels such that it would work with the `legend` argument of xyplot from lattice? Hi Mikhail, I think that color.legend in the plotrix package will do what you are asking, but it is in base graphics, and may not work with lattice. Jim
-- Felix Andrews / 安福立 http://www.neurofractal.org/felix/
[R] linear regression
dear R users, my data looks like this:

   PM10       Ref        UZ   JZ     WT         RH   FT   WR
1  10.973195   4.338874  nein Winter Dienstag   ja   nein West
2   6.381684   2.250446  nein Sommer Sonntag    nein ja   Süd
3  62.586512  66.304869  ja   Sommer Sonntag    nein nein Ost
4   5.590101   8.526152  ja   Sommer Donnerstag nein nein Nord
5  30.925054  16.073091  nein Winter Sonntag    nein nein Ost
6  10.750567   2.285075  nein Winter Mittwoch   nein nein Süd
7  39.118316  17.128691  ja   Sommer Sonntag    nein nein Ost
8   9.327564   7.038572  ja   Sommer Montag     nein nein Nord
9  52.271744  15.021977  nein Winter Montag     nein nein Ost
10 27.388416  22.449102  ja   Sommer Montag     nein nein Ost
. . . . (until row 200)

I'm trying to make a linear regression between PM10 and Ref for each of the four WR values. I've tried this:

plot(Nord$PM10 ~ Nord$Ref, main="Nord", xlab="Ref", ylab="PM10")

but it does not work, because object 'Nord' cannot be found. What went wrong? How can I do it? Please help me.
Re: [R] Passing on groups argument to xyplot within a plotting function
The problem is that xyplot tries to evaluate 'groups' in 'data' or in the formula environment. Your local function environment (where the variable named groups is defined) is neither of these. There are a couple of ways to get the evaluation to work out; here is one:

pb <- list(F1 = 1:8, F2 = 1:8, Type = c('a','a','a','a','b','b','b','b'))
foo <- function(x, data, groups, ...){
  ccall <- quote(xyplot(x, data=data, ...))
  ccall$groups <- substitute(groups)
  eval(ccall)
}
foo(F1 ~ F2, pb, groups = Type)

Hope that helps -Felix

On 11 August 2011 19:42, Fredrik Karlsson dargo...@gmail.com wrote: Hi, I am constructing a plotting function that I would like to behave like plotting functions within the lattice package. It takes a groups argument, which I process, and then I would like to pass that argument on to the xyplot function for the actual plotting. However, whatever I do, I get an error that the variable is missing. A short illustration: Given the data set

names(pb)
[1] "Type"    "Sex"     "Speaker" "Vowel"   "IPA"     "F0"      "F1"
[8] "F2"      "F3"

and these test functions:

TESTFUN <- function(x, data, groups){
  xyplot(x, data=data, groups=groups)
}
TESTFUN2 <- function(x, data, groups){
  xyplot(x, data=data, groups=substitute(groups))
}
TESTFUN3 <- function(x, data, groups){
  groups <- eval(substitute(groups), data, environment(x))
  xyplot(x, data=data, groups=groups)
}

I fail to get groups passed on to xyplot correctly:

TESTFUN(F1 ~ F2, data=pb, groups=Type)
Error in eval(expr, envir, enclos) : object 'groups' not found
TESTFUN2(F1 ~ F2, data=pb, groups=Type)
Error in prepanel.default.function(groups = groups, x = c(2280L, 2400L, : object 'groups' not found
TESTFUN3(F1 ~ F2, data=pb, groups=Type)
Error in eval(expr, envir, enclos) : object 'groups' not found

Please help me understand what I am doing wrong. /Fredrik -- Life is like a trumpet - if you don't put anything into it, you don't get anything out of it.
-- Felix Andrews / 安福立 http://www.neurofractal.org/felix/
[R] Excluding NAs from round correlation
Hello, I am quite new to R and I am trying to get a round(cor()) correlation matrix from a table with dozens of columns. However, all the columns contain several blank places, which show up to me as NAs. Then, when I type round(cor(data),2), I get no results: everything (except the correlation of each column with itself, of course) is NA. I do not want to replace NA with zero, because it would ruin the results. I just want R not to look at the NAs and correlate just the places with numbers. Is that possible? Thank you very much for your help!

-- View this message in context: http://r.789695.n4.nabble.com/Excluding-NAs-from-round-correlation-tp3741296p3741296.html
Re: [R] linear regression
Your data frame needs to be called Nord. If it is not, then replace Nord with the actual name of your data frame.

On Sat, Aug 13, 2011 at 10:43 PM, maggy yan kiot...@googlemail.com wrote: dear R users, my data looks like this PM10 Ref UZ JZ WT RH FT WR 1 10.973195 4.338874 nein Winter Dienstag ja nein West 2 6.381684 2.250446 nein Sommer Sonntag nein ja Süd 3 62.586512 66.304869 ja Sommer Sonntag nein nein Ost 4 5.590101 8.526152 ja Sommer Donnerstag nein nein Nord 5 30.925054 16.073091 nein Winter Sonntag nein nein Ost 6 10.750567 2.285075 nein Winter Mittwoch nein nein Süd 7 39.118316 17.128691 ja Sommer Sonntag nein nein Ost 8 9.327564 7.038572 ja Sommer Montag nein nein Nord 9 52.271744 15.021977 nein Winter Montag nein nein Ost 10 27.388416 22.449102 ja Sommer Montag nein nein Ost . . . . til 200 I'm trying to make a linear regression between PM10 and Ref for each of the four WR, I've tried this: plot(Nord$PM10 ~ Nord$Ref, main="Nord", xlab="Ref", ylab="PM10") but it does not work, because Nord cannot be found what was wrong? how can I do it? please help me
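[To create the Nord object in the first place, the data frame can be split by the WR column with subset(); a minimal sketch, with a few rows from the posted data standing in for the full data frame (assumed to be called dat):]

```r
# Toy stand-in for the full 200-row data frame.
dat <- data.frame(PM10 = c(5.590101, 9.327564, 62.586512, 39.118316),
                  Ref  = c(8.526152, 7.038572, 66.304869, 17.128691),
                  WR   = c("Nord", "Nord", "Ost", "Ost"))

Nord <- subset(dat, WR == "Nord")   # rows whose wind direction is Nord
plot(PM10 ~ Ref, data = Nord, main = "Nord", xlab = "Ref", ylab = "PM10")
abline(lm(PM10 ~ Ref, data = Nord))  # add the fitted regression line
```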
Re: [R] Excluding NAs from round correlation
Check ?cor, and note in particular the 'use' argument. Weidong Gu

On Sat, Aug 13, 2011 at 9:06 AM, Julie julie.novak...@gmail.com wrote: Hello, I am quite new to R and I am trying to get a round correlation from a table with dozens of columns. However, all the columns contain several blank places which show to me as NAs. Then, when I type round(cor(data),2), I get no results - everything (except correlation of one column with the same one, of course) is NA. I do not want to replace NA with zero, because it would ruin the results. I just want R not to look at NA and correlate just places with numbers. Is it possible? Thank you very much for help! -- View this message in context: http://r.789695.n4.nabble.com/Excluding-NAs-from-round-correlation-tp3741296p3741296.html
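[A minimal sketch of what that looks like: with use = "pairwise.complete.obs", each pairwise correlation is computed from only the rows where both columns are non-missing. The toy data frame here is made up.]

```r
data <- data.frame(a = c(1, 2, 3, 4, NA),
                   b = c(2, 4, 6, NA, 10),
                   c = c(5, 3, NA, 2, 1))

# cor(data) alone would be almost all NA; pairwise deletion keeps the rest:
cm <- cor(data, use = "pairwise.complete.obs")
round(cm, 2)
```

use = "complete.obs" is the stricter alternative: it drops any row containing an NA anywhere before computing all the correlations.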
Re: [R] post
On 13.08.2011 06:52, bdeep...@ibab.ac.in wrote: Hello, I was trying to plot multiple graph using par(mfrow=c(3,2)). But this is giving me the following error: Error in axis(side = side, at = at, labels = labels, ...) : X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 8 could not be loaded

The font is missing. You may want to install some more fonts on your machine. Uwe Ligges

Could someone decode this error, please. Thank you
Re: [R] optimization problems
optimx with BFGS uses optim, so you actually incur some overhead unnecessarily. And BFGS really needs good gradients (as do Rvmmin and Rcgmin, which are updated BFGS and CG, but all in R and with bounds or box constraints). From the Hessian, your function is (one of the many!) that have pretty bad numerical properties. With all 0s, Newton is spinning in his grave. Probably the gradient is small also, so the optimizers decide they are at a minimum. As a first step, I'd suggest:
- checking that the function is computed correctly. That is, does your function give the correct value?
- trying a few other points nearby. Are any lower than your first point?
- using numDeriv to get the gradient (and possibly Hessian) at each of these nearby points.
These steps may reveal either that you have a bug in the function, or that it is pretty nasty numerically. In the latter case, you really need to try to find an equivalent function, e.g. log(f), that can be minimized more easily. For information, I'm rather slowly working on a function test suite to do this. Also a lot of changes are going on in optimx to try to catch some of the various nasties. These appear first in the R-forge development versions. Use and comments welcome. If you DO find a lower point, then I'd give Nelder-Mead a try. Ravi Varadhan has a variant of this that may do a little better in dfoptim. You could also be a bit lazy and try optimx with the control all.methods=TRUE. Not recommended for production use, but often helpful in seeing if any method can get some traction. Cheers, JN

On 08/13/2011 06:00 AM, r-help-requ...@r-project.org wrote:

Message: 47
Date: Sat, 13 Aug 2011 01:12:09 -0700 (PDT)
From: Kathie kathryn.lord2...@gmail.com
To: r-help@r-project.org
Subject: [R] optimization problems
Message-ID: 1313223129383-3741005.p...@n4.nabble.com
Content-Type: text/plain; charset=us-ascii

Dear R users I am trying to use OPTIMX(OPTIM) for nonlinear optimization.
There is no error in my code but the results are so weird (see below). When I ran via OPTIM, the results are that Initial values are that theta0 = 0.6 1.6 0.6 1.6 0.7. (In fact true values are 0.5, 1.0, 0.8, 1.2, 0.6.)

optim(par=theta0, fn=obj.fy, method="BFGS", control=list(trace=1, maxit=1), hessian=T)
initial value -0.027644
final value -0.027644
converged
$par
[1] 0.6 1.6 0.6 1.6 0.7
$value
[1] -0.02764405
$counts
function gradient
       1        1
$convergence
[1] 0
$message
NULL
$hessian
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0
[3,]    0    0    0    0    0
[4,]    0    0    0    0    0
[5,]    0    0    0    0    0

When I ran via OPTIMX, the results are that

optimx(par=theta0, fn=obj.fy, method="BFGS", control=list(maxit=1), hessian=T)
                      par     fvalues method fns grs itns conv KKT1 KKT2 xtimes
1 0.6, 1.6, 0.6, 1.6, 0.7 -0.02764405   BFGS   1   1 NULL    0 TRUE   NA   8.71

Whenever I used different initial values, the initial ones are the answer of OPTIMX(OPTIM). Would you plz explain why it happened? or any suggestion will be greatly appreciated. Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/optimization-problems-tp3741005p3741005.html
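[A sketch of the gradient check suggested above. numDeriv::grad is the robust tool for this; a hand-rolled central difference stands in here so the example is self-contained. obj.fy below is a made-up stand-in with a known minimum; substitute the real objective.]

```r
# Central-difference numerical gradient (numDeriv::grad is more careful).
num.grad <- function(f, x, h = 1e-6) {
  sapply(seq_along(x), function(i) {
    xp <- x; xm <- x
    xp[i] <- xp[i] + h
    xm[i] <- xm[i] - h
    (f(xp) - f(xm)) / (2 * h)
  })
}

# Stand-in objective with minimum at c(0.5, 1.0, 0.8, 1.2, 0.6):
obj.fy <- function(theta) sum((theta - c(0.5, 1.0, 0.8, 1.2, 0.6))^2)
theta0 <- c(0.6, 1.6, 0.6, 1.6, 0.7)

f0 <- obj.fy(theta0)
g0 <- num.grad(obj.fy, theta0)
# If g0 were ~0 here and nearby values equaled f0, the optimizer would
# (correctly, from its point of view) declare the start values a minimum.
```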
[R] optimization problems
Kathie, It is very difficult to help without adequate information. What does your objective function look like? Are you maximizing (in which case you have to make sure that the sign of the objective function is correct) or minimizing? Can you try optimx with the control option all.methods=TRUE? Hope this is helpful, Ravi.
Re: [R] define variables from a matrix
There may well be more efficient ways to do this, but here's one attempt:

foo <- function(x, val) if (any(x == val, na.rm = TRUE)) which(x == val) else NA
u <- apply(A, 1, function(x) foo(x, 20L))
v <- apply(A, 1, function(x) foo(x, 100L))
ifelse(u < v, v, NA)
[1]  3  5 NA NA NA

HTH, Dennis

On Fri, Aug 12, 2011 at 7:18 PM, gallon li gallon...@gmail.com wrote: I have the following matrix and wish to define two variables based on it:

A=matrix(0,5,5)
A[1,]=c(30,20,100,120,90)
A[2,]=c(40,30,20,50,100)
A[3,]=c(50,50,40,30,30)
A[4,]=c(30,20,40,50,50)
A[5,]=c(30,50,NA,NA,100)
A
     [,1] [,2] [,3] [,4] [,5]
[1,]   30   20  100  120   90
[2,]   40   30   20   50  100
[3,]   50   50   40   30   30
[4,]   30   20   40   50   50
[5,]   30   50   NA   NA  100

X is the first column in each row that is equal to 20. For example, for the first row, I need X=2; 2nd row, X=3; 3rd row, X=NA; 4th row, X=2; 5th row, X=NA. Y is then the first column in each row that is equal to 100, provided a 20 has been reached before it. For example, for the first row, Y=3; 2nd row, Y=5; 3rd row, Y=NA; 4th row, Y=NA; 5th row, Y=NA. The matrix may involve NA as well. How can I define these two variables quickly?
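[An alternative sketch using match(), which returns the first matching position and already yields NA when the value is absent, so no wrapper function is needed (same toy matrix as in the question):]

```r
A <- matrix(0, 5, 5)
A[1, ] <- c(30, 20, 100, 120, 90)
A[2, ] <- c(40, 30, 20, 50, 100)
A[3, ] <- c(50, 50, 40, 30, 30)
A[4, ] <- c(30, 20, 40, 50, 50)
A[5, ] <- c(30, 50, NA, NA, 100)

X    <- apply(A, 1, function(r) match(20, r))   # first column equal to 20, NA if none
Y100 <- apply(A, 1, function(r) match(100, r))  # first column equal to 100, NA if none
# Y is the first 100 only when a 20 occurs before it:
Y <- ifelse(!is.na(X) & !is.na(Y100) & X < Y100, Y100, NA)
```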
Re: [R] linear regression
Hi: Try something like this, using dat as the name of your data frame:

xyplot(PM10 ~ Ref | WR, data = dat, type = c('p', 'r'))

The plot looks silly with the data snippet you provided, but should hopefully look more sensible with the complete data. The code creates a four-panel plot, one per direction, with points and a least-squares regression line fit in each panel. The regression line is specific to a data subset, not the entire data frame. HTH, Dennis

On Sat, Aug 13, 2011 at 5:43 AM, maggy yan kiot...@googlemail.com wrote: dear R users, my data looks like this

        PM10       Ref   UZ     JZ         WT   RH   FT   WR
1  10.973195  4.338874 nein Winter   Dienstag   ja nein West
2   6.381684  2.250446 nein Sommer    Sonntag nein   ja  Süd
3  62.586512 66.304869   ja Sommer    Sonntag nein nein  Ost
4   5.590101  8.526152   ja Sommer Donnerstag nein nein Nord
5  30.925054 16.073091 nein Winter    Sonntag nein nein  Ost
6  10.750567  2.285075 nein Winter   Mittwoch nein nein  Süd
7  39.118316 17.128691   ja Sommer    Sonntag nein nein  Ost
8   9.327564  7.038572   ja Sommer     Montag nein nein Nord
9  52.271744 15.021977 nein Winter     Montag nein nein  Ost
10 27.388416 22.449102   ja Sommer     Montag nein nein  Ost
. . . . til 200

I'm trying to make a linear regression between PM10 and Ref for each of the four WR. I've tried this:

plot(Nord$PM10 ~ Nord$Ref, main="Nord", xlab="Ref", ylab="PM10")

but it does not work, because Nord cannot be found. What was wrong? How can I do it? Please help me.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] define variables from a matrix
On Aug 12, 2011, at 7:18 PM, gallon li wrote: I have the following matrix and wish to define a variable based on it:

A <- matrix(0,5,5)
A[1,] <- c(30,20,100,120,90)
A[2,] <- c(40,30,20,50,100)
A[3,] <- c(50,50,40,30,30)
A[4,] <- c(30,20,40,50,50)
A[5,] <- c(30,50,NA,NA,100)

I want to define two variables: X is the first column in each row that is equal to 20, for example, for the first row, I need X=2; 2nd row, X=3; 3rd row, X=NA; 4th row, X=2; 5th row, X=NA.

X <- apply(A, 1, function(x) which(x == 20))
is.na(X) <- !unlist(lapply(X, length))
X

The first command seems obvious, but the second might be a bit obscure. It says: assign NA to any element of X whose length is zero (lapply(X, length) returns 0 where there was no match, and !0 is TRUE).

Y is then the first column in each row that is equal to 100 if before this a 20 has been reached, for example, for the first row, Y=3; 2nd row, Y=5; 3rd row, Y=NA; 4th row, Y=NA; 5th row, Y=NA.

Y <- apply(A, 1, function(x) which(x == 100) * (which(x == 20) < which(x == 100)))
is.na(Y) <- !unlist(lapply(Y, length))
Y

-- David.

The matrix may involve NA as well. How can I define these two variables quickly?

David Winsemius, MD Heritage Laboratories West Hartford, CT

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Any alternatives to draw.colorkey from lattice package?
Felix: Thank you! Perhaps I should read the documentation more carefully, as I missed that other `at`. lattice and latticeExtra are so marvelous that I hardly want to use anything else. Mikhail

On 08/13/2011 07:31 AM, Felix Andrews wrote: You can just specify the label positions; you don't need to give labels for every color change point (there is an 'at' for the color changes and a 'labels$at' for the labels):

levelplot(rnorm(100) ~ x * y, expand.grid(x = 1:10, y = 1:10),
          colorkey = list(at = seq(-3, 3, length = 100),
                          labels = list(labels = paste(-3:3, "units"), at = -3:3)))

On 13 August 2011 19:59, Jim Lemon j...@bitwrit.com.au wrote: On 08/13/2011 04:34 AM, Mikhail Titov wrote: Hello! I’d like to have a continuous color bar on my lattice xyplot, with colors, let's say, from topo.colors, such that it has tick labels at a few specific points only. Right now I use do.breaks and level.colors with a somewhat large number of steps. The problem is that a color change point doesn’t necessarily correspond to the value I’d like to label. Since I have many color steps and I don’t need high precision, I generate labels like this:

labels <- ifelse(sapply(at, function(x) any(abs(att - x) < .03)),
                 sprintf("depth= %s ft", at), "")

where `att` has my points of interest on the color scale bar and `at` corresponds to the color change points used with level.colors. It is a bit inconvenient, as I have to adjust the threshold `.03` and the number of color steps so that it labels only the adjacent color change point with my labels. Q: Are there any ready-to-use functions that would generate some kind of GRaphical OBject (grob) with a continuous color scale bar/key with custom at/labels, such that it would work with the `legend` argument of xyplot from lattice?

Hi Mikhail, I think that color.legend in the plotrix package will do what you are asking, but it is in base graphics, and may not work with lattice.
Jim

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R's handling of high dimensional data
Hello all, I am looking at doing text classification on very high dimensional data (about 300,000 or more features) and up to 2000 documents. I am quite new to R, though, and was just wondering whether R and its libraries would scale to such high dimensions. Any thoughts will be much appreciated. Thanks. Andy -- View this message in context: http://r.789695.n4.nabble.com/R-s-handling-of-high-dimensional-data-tp3741758p3741758.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
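One note for readers weighing scalability: base R matrices are dense, so 300,000 features by 2,000 documents of doubles is roughly 4.8 GB before any copying. Document-term data is overwhelmingly zeros, though, and the Matrix package (a recommended package shipped with standard R distributions) stores only the non-zero entries. A minimal sketch with an invented three-document triplet; the indices here are made up, not from any real corpus:

```r
library(Matrix)   # recommended package, ships with R

# Toy document-term counts in triplet form: (doc i, term j, count x).
# In practice i/j/x would come from a tokenizer; these are invented.
i <- c(1, 1, 2, 3)
j <- c(1, 5, 2, 5)
x <- c(2, 1, 4, 3)
dtm <- sparseMatrix(i = i, j = j, x = x, dims = c(3, 300000))

dim(dtm)        # 3 x 300000, yet only 4 values are actually stored
length(dtm@x)   # 4 -- memory grows with the non-zeros, not the dimensions
```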
[R] degrees of freedom does not appear in the summary lmer :(
Hi, could someone please help me with this topic? I don't know how I can extract the degrees of freedom from my model! Thanks, Sophie -- View this message in context: http://r.789695.n4.nabble.com/degrees-of-freedom-does-not-appear-in-the-summary-lmer-tp3741327p3741327.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plotting and quantiles
Dear R users, This is most likely a very basic question, but I am new to R and would really appreciate some tips on these two problems. 1) I need to plot variables from a data frame. Because of a few high numbers my graph is really strange looking. How could I plot a fraction of the samples (like 0.1 (10%), 0.2, up to for example 0.6) on the x axis and value 'boundaries' (like any value '< 100', '101-200' and '> 201') on the y axis? This needs to be a simple line plot like the one I attached as an example. The values would come from one column. 2) I have a data frame with values and need to subset the rows based on the values. I wanted to order them (with increasing values) and divide them into 3-4 groups. I thought about using quantile, but I want the groups to be something like '1-25', '26-50', '51-75', '75-100' (ordered, and for example 25th percentile, 26-50th etc). I could just look for a median, divide into two and then again (or use quantiles 0.25, 0.5, 0.7 and 1 and then get rid of all rows in 0.25 that are in 0.5 etc), but surely there must be a faster and simpler way to do that (I need to do this a lot on different columns)? Thanks for your help, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Individual p-values for correlation matrices
Dear all, I am calculating each-against-each correlations for a number of variables, in order to use the correlations as distances. This is easy enough using just cor(), but not if I want to have a p-value for each calculated correlation, and especially if I want to correct them for multiple testing (but see below). I currently do that on foot, looping over the variables to apply cor.test to each combination of two variables. Is there a function or a package that would do that for me? Specifically, what I do is:

# a is the data matrix
for( i in 1:(ncol(a) - 1) ) {
  for( j in (i+1):ncol(a) ) {
    result <- cor.test( a[,i], a[,j], method = "spearman" )
    # store the result somehow
  }
}

This is slow, and I seek a better solution. As I mentioned before, I correct the p-values using the Bonferroni correction, which does not assume independence of the hypotheses being tested (obviously they are not independent here). However, is there a better method? Bonferroni results in a large number of false negatives. Kind regards, j. -- Dr. January Weiner 3 -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
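A base-R way to flatten the double loop is to enumerate the column pairs with combn(); and p.adjust() offers corrections that are less conservative than Bonferroni, such as Benjamini-Hochberg ("BH") false-discovery-rate control. A sketch on toy data standing in for the real matrix `a`:

```r
set.seed(1)
a <- matrix(rnorm(50 * 6), ncol = 6)    # toy stand-in for the real data matrix

pairs <- combn(ncol(a), 2)              # all i < j column index pairs
pvals <- apply(pairs, 2, function(idx)
  cor.test(a[, idx[1]], a[, idx[2]], method = "spearman")$p.value)

# Benjamini-Hochberg FDR control instead of Bonferroni:
padj <- p.adjust(pvals, method = "BH")

length(pvals)   # choose(6, 2) = 15, one p-value per pair
```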
[R] Own R function doubt
Hi to all the people again, I was writing a simple function in R, and wish to collect the results in an Excel file. The work goes as follows:

Ciervos <- function(K1, K0, A, R, M, Pi, Hembras) {
  B  <- (K1-K0)/A
  T1 <- (R*Pi*Hembras-M*Pi+B)/(Pi-M*Pi+R*Pi*Hembras); P1 <- Pi-B;  R1 <- P1*Hembras*R;  M1 <- P1*M
  T2 <- (R1-M1+B)/(P1-M1+R1);   P2 <- P1-B;  R2 <- P2*Hembras*R;  M2 <- P2*M
  T3 <- (R2-M2+B)/(P2-M2+R2);   P3 <- P2-B;  R3 <- P3*Hembras*R;  M3 <- P3*M
  T4 <- (R3-M3+B)/(P3-M3+R3);   P4 <- P3-B;  R4 <- P4*Hembras*R;  M4 <- P4*M
  T5 <- (R4-M4+B)/(P4-M4+R4);   P5 <- P4-B;  R5 <- P5*Hembras*R;  M5 <- P5*M
  T6 <- (R5-M5+B)/(P5-M5+R5);   P6 <- P5-B;  R6 <- P6*Hembras*R;  M6 <- P6*M
  T7 <- (R6-M6+B)/(P6-M6+R6);   P7 <- P6-B;  R7 <- P7*Hembras*R;  M7 <- P7*M
  T8 <- (R7-M7+B)/(P7-M7+R7);   P8 <- P7-B;  R8 <- P8*Hembras*R;  M8 <- P8*M
  T9 <- (R8-M8+B)/(P8-M8+R8);   P9 <- P8-B;  R9 <- P9*Hembras*R;  M9 <- P9*M
  T10 <- (R9-M9+B)/(P9-M9+R9);  P10 <- P9-B; R10 <- P10*Hembras*R; M10 <- P10*M
  result <- list(B,T1,P1,R1,M1,T2,P2,R2,M2,T3,P4,R4,M4,T5,P5,R5,M5,T6,P6,R6,T6,P7,R7,M7,T8,P8,R8,M8,T9,P9,R9,M9,T10,P10,R10,M10)
  return(result)
}

library(memisc)
Gestion <- as.data.frame(Simulate(Ciervos(K1, K0, A, R, M, Pi, Hembras),
  expand.grid(K1=c(420,580), K0=c(300,600), A=3, R=0.4, M=0.1, Pi=420, Hembras=0.5),
  nsim=1, seed=1))
xls.getshlib()
write.xls(Gestion, "PoblacionCiervos.xls")

All is fine with the function, but the results (the parameters from B to M10) are collected in Excel under the column names result 1, result 2, etc., and I wish to collect the results with their proper names (B instead of result 1; T1 instead of result 2, etc.). I will acknowledge any help, many thanks, pablo -- View this message in context: http://r.789695.n4.nabble.com/Own-R-function-doubt-tp3741463p3741463.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
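As an aside on the repetitive structure: the ten near-identical blocks can be folded into a loop that builds a named vector, so the names (B, T1, P1, ...) survive into the data frame and hence into the Excel column headers. The original result list also appears to contain slips (for example, it jumps from T3 to P4 and repeats T6), which a loop avoids. `Ciervos2` below is an illustrative rewrite under those assumptions, not the poster's code:

```r
# Same recursion as the Ciervos() blocks above, written as a loop.
Ciervos2 <- function(K1, K0, A, R, M, Pi, Hembras, years = 10) {
  B   <- (K1 - K0) / A
  out <- c(B = B)
  P  <- Pi                 # population at the start of the current year
  Rv <- Pi * Hembras * R   # recruitment
  Mv <- Pi * M             # mortality
  for (t in seq_len(years)) {
    Tt <- (Rv - Mv + B) / (P - Mv + Rv)   # same update equation as T1..T10
    P  <- P - B
    Rv <- P * Hembras * R
    Mv <- P * M
    out[paste0(c("T", "P", "R", "M"), t)] <- c(Tt, P, Rv, Mv)
  }
  out   # named vector: the names carry through as.data.frame() column names
}
```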
Re: [R] fit.mult.impute() in Hmisc
On Thu, Mar 31, 2011 at 2:56 PM, Yuelin Li li...@mskcc.org wrote: I tried multiple imputation with aregImpute() and fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1. The warning message below suggests that summary(f) of fit.mult.impute() would only use the last imputed data set. Thus, the whole imputation process is ignored. "Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e."

Hello. I fiddled around with rms multiple imputation when I was preparing these notes from our R summer course. I ran into the same thing you did, and my conclusion is slightly different from yours. http://pj.freefaculty.org/guides/Rcourse/multipleImputation/multipleImputation-1-lecture.pdf Look down to slide 80 or so, where I launch off into that question. It appears to me that aregImpute will give the right answer for fitters from rms, but if you want to feel confident about the results for other fitters, you should use mitools or some other parameter-combining approach. My conclusion (slide 105) is: "Please note: the standard errors in the output based on lrm match the std. errors estimated by MItools. Thus I conclude sqrt(diag(cov(fit.mult.impute.object))) did not give correct results."

But the standard errors in summary(f) agree with the values from sqrt(diag(vcov(f))) to the 4th decimal point. It would seem that summary(f) actually adjusts for multiple imputation? Does summary(f) in Hmisc 3.8-3 actually adjust for MI? If it does not adjust for MI, then how do I get the MI-adjusted coefficients and standard errors? I can't seem to find answers in the documentation, including rereading section 8.10 of the Harrell (2001) book. Googling located a thread in R-help back in 2003, which seemed dated. Many thanks in advance for the help, Yuelin.
http://idecide.mskcc.org
---
library(Hmisc)
Loading required package: survival
Loading required package: splines
data(kyphosis, package = "rpart")
kp <- lapply(kyphosis, function(x) { is.na(x) <- sample(1:length(x), size = 10); x })
kp <- data.frame(kp)
kp$kyp <- kp$Kyphosis == "present"
set.seed(7)
imp <- aregImpute(~ kyp + Age + Start + Number, dat = kp, n.impute = 10,
                  type = "pmm", match = "closest")
Iteration 13
f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp,
                     family = binomial, data = kp)

Variance Inflation Factors Due to Imputation:
(Intercept)    Age  Start  Number
       1.06   1.28   1.17    1.12
Rate of Missing Information:
(Intercept)    Age  Start  Number
       0.06   0.22   0.14    0.10
d.f. for t-distribution for Tests of Single Coefficients:
(Intercept)     Age   Start  Number
    2533.47  193.45  435.79  830.08
The following fit components were averaged over the 10 model fits: fitted.values linear.predictors
Warning message:
In fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp, :
  Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e.

f
Call: fitter(formula = formula, family = binomial, data = completed.data)
Coefficients:
(Intercept)      Age    Start   Number
    -3.6971   0.0118  -0.1979   0.6937
Degrees of Freedom: 80 Total (i.e. Null); 77 Residual
Null Deviance: 80.5
Residual Deviance: 58   AIC: 66

sqrt(diag(vcov(f)))
(Intercept)       Age     Start    Number
  1.5444782 0.0063984 0.0652068 0.2454408

-0.1979/0.0652068
[1] -3.0350

summary(f)
Call: fitter(formula = formula, family = binomial, data = completed.data)
Deviance Residuals:
   Min     1Q Median     3Q    Max
-1.240 -0.618 -0.288 -0.109  2.409
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.6971     1.5445   -2.39   0.0167
Age           0.0118     0.0064    1.85   0.0649
Start        -0.1979     0.0652   -3.03   0.0024
Number        0.6937     0.2454    2.83   0.0047
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 80.508 on 80 degrees of freedom
Residual deviance: 57.965 on 77 degrees of freedom
AIC: 65.97
Number of Fisher Scoring iterations: 5
Re: [R] fit.mult.impute() in Hmisc
For your approach, how do you know that either summary or vcov used multiple imputation? You are using a non-rms fitting function, so be careful. Compare with using the lrm fitting function. Also, replace Design with the rms package. Please omit confidentiality notices from your e-mails. Frank

[Yuelin's original question quoted in full; see above.]
[Yuelin's R session transcript quoted in full; see the original message above.]

- Frank Harrell, Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/fit-mult-impute-in-Hmisc-tp3419037p3741881.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
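For fitters where one cannot rely on the package to do the pooling, the combining step itself is just Rubin's rules; a minimal base-R sketch, where `pool_mi` is an illustrative name and the inputs are assumed to be one row per imputation:

```r
# Rubin's rules for combining m multiply-imputed fits.
# est: m x p matrix of coefficient estimates (one row per imputation)
# se2: m x p matrix of squared standard errors
pool_mi <- function(est, se2) {
  m    <- nrow(est)
  qbar <- colMeans(est)               # pooled point estimates
  W    <- colMeans(se2)               # within-imputation variance
  B    <- apply(est, 2, stats::var)   # between-imputation variance
  Tvar <- W + (1 + 1/m) * B           # total variance
  list(coef = qbar, se = sqrt(Tvar))
}
```

This is the calculation mitools and similar tools perform internally, shown here only with made-up inputs, not the thread's kyphosis fit.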
[R] seeking advice about rounding error and %%
A client came into our consulting center with some data that had been damaged by somebody who opened it in MS Excel. The columns were supposed to be integer valued, 0 through 5, but some of the values were mysteriously damaged. There were scores like 1.18329322 and such in there. Until he tracks down the original data and finds out what went wrong, he wants to take all fractional valued scores and convert them to NA. As a quick hack, I suggest an approach using %%:

x <- c(1, 2, 3, 1.1, 2.12131, 2.001)
x %% 1
[1] 0.00000 0.00000 0.00000 0.10000 0.12131 0.00100
which(x %% 1 > 0)
[1] 4 5 6
xbad <- which(x %% 1 > 0)
x[xbad] <- NA
x
[1]  1  2  3 NA NA NA

I worry about whether x %% 1 may ever return a non-zero result for an integer because of rounding error. Is there a recommended approach? What about zapsmall on the left, but what on the right of >?

which( zapsmall(x %% 1) > 0 )

Thanks in advance -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
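One robust pattern compares against round(x) with an explicit tolerance, in the spirit of the is.wholenumber() example on the ?integer help page:

```r
x <- c(1, 2, 3, 1.1, 2.12131, 2.001, 3 - 1e-14)

# TRUE when x is within tol of an integer; tol absorbs rounding error.
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

x[!is_whole(x)] <- NA
x   # the 3 - 1e-14 entry survives as "whole"; 1.1 etc. become NA
```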
Re: [R] Own R function doubt
It sounds like the data frame produced by Simulate() doesn't set the names you want. You can probably fix this by including

colnames(Gestion) <- c("B", "T1", ...)  # etc.

immediately after the simulation. I can't confirm this without knowing which of the Excel/R interface packages you're using, but I'd be willing to bet that if you asked R for colnames(Gestion) you'd see the result 1, result 2, etc. that show up in Excel later. Hope this helps -- feel free to let me know if this doesn't work, Michael Weylandt

On Sat, Aug 13, 2011 at 10:50 AM, garciap garc...@usal.es wrote: [original message with the Ciervos() function quoted in full; see above.] -- View this message in context: http://r.789695.n4.nabble.com/Own-R-function-doubt-tp3741463p3741463.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] degrees of freedom does not appear in the summary lmer :(
Hi Sophie, It is not clear what the degrees of freedom should be in an lmer model, so their not appearing is intentional. There is fairly extensive discussion of this topic in the archives for the R-sig-mixed list. See, for example: http://rwiki.sciviews.org/doku.php?id=guides:lmer-tests Cheers, Josh On Sat, Aug 13, 2011 at 6:31 AM, xy wtemptat...@hotmail.co.uk wrote: Hi , Could someone pls help me about this topic, I dont know how can i extract them from my model!! Thanks, Sophie -- View this message in context: http://r.789695.n4.nabble.com/degrees-of-freedom-does-not-appear-in-the-summary-lmer-tp3741327p3741327.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] linear regression
Don't forget to load the `lattice` package. `latticeExtra` with `panel.ablineq` can also be helpful. That was, however, for plotting. For a subset regression by each WR without plotting, you'd use something like `lapply` or `sapply`:

ans <- sapply(unique(data$WR), function(dir) {
  out <- list(lm(PM10 ~ Ref, subset(data, WR == dir)))
  names(out) <- dir
  out
})

`ans$West` will return one of the results. There are many ways to skin a cat; perhaps this was not the best one. Mikhail

On 08/13/2011 11:30 AM, Dennis Murphy wrote: [Dennis's reply and the original question quoted in full; see above.]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
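An equivalent base-R idiom for per-direction fits is split() plus lapply(), sketched here on an invented stand-in for the thread's data frame (columns PM10, Ref, WR as in the question; Süd is spelled Sued to keep the sketch ASCII):

```r
set.seed(42)
dat <- data.frame(PM10 = rnorm(40, 20, 5),
                  Ref  = rnorm(40, 15, 5),
                  WR   = rep(c("Nord", "Ost", "Sued", "West"), each = 10))

# One lm() per wind direction; the resulting list is named by direction.
fits <- lapply(split(dat, dat$WR), function(d) lm(PM10 ~ Ref, data = d))

names(fits)       # one entry per direction
coef(fits$West)   # intercept and slope for the West subset
```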
Re: [R] Plotting and quantiles
I believe you received an informative answer to both these questions from Daniel Maiter an hour and twenty-five minutes after sending your question; I repeat it here just in case you didn't get it.

--

Q1 is very opaque, because you are not even saying what kind of plot you want. For a regular scatterplot, you have multiple options: a) select only the data in the given intervals and plot those data; b) plot the entire data, but restrict the graph region to the intervals you are interested in; or c) winsorize the data (i.e., set values below the lower cutoff and above the upper cutoff to the cutoff values). Which one you want depends on which makes the most sense given the purpose of your analysis. Say:

x <- rnorm(100)
y <- x + rnorm(100)

Then a)

plot(y ~ x, data = data.frame(x, y)[ x < 2 & x > -2 , ])  # plots y against x only for xs between -2 and 2

b)

plot(y ~ x, xlim = c(-2, 2))  # plots all y against x, but restricts the plotting region to -2..2 on the x-axis

c)

x <- replace(x, x > 2, 2)
x <- replace(x, x < -2, -2)
plot(y ~ x)  # sets all x-values below -2 and above 2 to these cutoffs

Q2: look at the cut() function. ?cut

HTH, Daniel

-

If you need more information, a different solution, or further clarification, please ask new questions. Michael Weylandt

On Sat, Aug 13, 2011 at 10:10 AM, Mark D. d.mar...@ymail.com wrote: [original question quoted in full; see above.]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
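Picking up the pointer to cut() for Q2: combining it with quantile() gives the ordered quartile groups in one step. A sketch on skewed toy data:

```r
set.seed(1)
vals <- rexp(100, rate = 0.01)   # skewed toy data with a few large values

# Quartile membership for every value; the labels are illustrative.
grp <- cut(vals,
           breaks = quantile(vals, probs = seq(0, 1, by = 0.25)),
           labels = c("1-25", "26-50", "51-75", "76-100"),
           include.lowest = TRUE)

table(grp)                            # 25 values per quartile group
rows <- split(seq_along(vals), grp)   # row indices per group, ready for subsetting
```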
Re: [R] efficient use of lm over a matrix vs. using apply over rows
God bless you, Duncan. Your explanation is crisp and to the point. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/efficient-use-of-lm-over-a-matrix-vs-using-apply-over-rows-tp870810p3742043.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] NA in lm model summary, no NA in data table!
Dear users of R! I have problems with a linear model summary. I do not have any NA values in my data table, but in the summary of the linear model there are some NA instead of results. I don't know why :( I am interested in ecological factors influencing temperature in ant nests. I have data concerning ant nest temperature and air temperature, and some nest parameters as explanatory variables. Nest and air temperatures are different every day; nest parameters such as size, moisture, and shading do not change with the date. It looks like this:

date nest nest.temp nest.t.fluctuation air.temperature nest.size nest.GPS rain sun
1.3. A1 25.3 5.02 12.3 1.06856 225 1247
2.3. A1 23.1 4.5 11.9 1.06856 225 1247
...

In the results I can see:

summary(model3)

Call:
lm(formula = t.change ~ nest + dat.1 + GPS + moist + volume + T.prum + a.flukt + sun.year)

Residuals:
    Min      1Q  Median      3Q     Max
-5.0853 -0.1879  0.0104  0.1874  4.0023

Coefficients: (3 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.035e+00  3.805e+00  -1.323    0.186
nestA2      -1.742e+01  2.672e+00  -6.522 1.02e-10 ***
nestA3      -2.371e+00  2.880e-01  -8.232 4.75e-16 ***
nestA4      -7.886e+00  1.140e+00  -6.920 7.35e-12 ***
nestB1      -5.000e+00  7.298e-01  -6.852 1.16e-11 ***
nestB2       7.874e-01  1.897e-01   4.151 3.54e-05 ***
nestB3       2.435e+00  3.852e-01   6.321 3.66e-10 ***
nestB4      -4.804e+00  7.522e-01  -6.387 2.41e-10 ***
nestC1      -1.985e+01  3.002e+00  -6.613 5.67e-11 ***
nestC2      -8.721e+00  1.291e+00  -6.753 2.25e-11 ***
nestC3      -2.143e+01  3.254e+00  -6.585 6.80e-11 ***
nestC4      -6.586e+00  9.884e-01  -6.663 4.08e-11 ***
dat.1       -1.610e-04  1.026e-04  -1.568    0.117
GPS                 NA         NA      NA       NA
moist        5.675e-01  8.669e-02   6.546 8.75e-11 ***
volume              NA         NA      NA       NA
T.prum      -1.138e-02  1.608e-03  -7.078 2.48e-12 ***
a.flukt     -1.584e-02  2.415e-03  -6.558 8.10e-11 ***
sun.year            NA         NA      NA       NA
---
Signif.
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4568 on 1199 degrees of freedom (92 observations deleted due to missingness)
Multiple R-squared: 0.2305, Adjusted R-squared: 0.2209
F-statistic: 23.95 on 15 and 1199 DF, p-value: < 2.2e-16

help.search("singularities")
No help files found with alias or concept or title matching ‘singularities’ using fuzzy matching.

I count data separately for each season of the year (spring, summer...); together I have more than 1000 rows in the table (91 days for each of 12 nests). There are no NA values in my data; most of the variables are numeric vectors, there are only 2 factors. I have checked whether the factors are saved as factors, I have searched for NA values... I have tried to load the data many times... but nothing. When I build the model forward, when I start with GPS it is ok, the summary shows DF, sum of squares, p; but when I fit the whole model and update it by taking away non-significant variables the model shows NA in the summary. It writes something about singularities but I can't find it in the help. The strangest thing is that this problem occurs only in some data sheets; for example it occurs in spring but not in summer. But the data arrangement and the process of computing in R I have used are identical. Please, could you help me? Thank you very much Stefy -- View this message in context: http://r.789695.n4.nabble.com/NA-in-lm-model-summary-no-NA-in-data-table-tp3741822p3741822.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
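The NA rows in the summary are what lm() reports for aliased (perfectly collinear) terms: if GPS, volume and sun.year do not vary within a nest, they are exact linear combinations of the nest factor and cannot be estimated once nest is in the model. A minimal sketch with invented variable names reproduces the symptom:

```r
# x2 is an exact linear function of x1, so lm() cannot estimate both;
# it drops x2 and reports NA, with the "not defined because of
# singularities" note in the summary
set.seed(1)
x1 <- rnorm(50)
x2 <- 2 * x1 + 1          # perfectly collinear with x1
y  <- x1 + rnorm(50)
fit <- lm(y ~ x1 + x2)
coef(fit)                  # coefficient for x2 is NA
alias(fit)                 # shows which terms are linearly dependent
```

See ?alias: it names exactly which predictors are redundant given the others, which is the question Stefy's data needs answered.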
Re: [R] Excluding NAs from round correlation
Thank you, I found this in the help page: use: an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". I should probably use "na.or.complete" when I want to get results instead of NA, is that right? -- View this message in context: http://r.789695.n4.nabble.com/Excluding-NAs-from-round-correlation-tp3741296p3741924.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] what is Inverse link functions in linear modelling of location
Hello I have a problem with the following function (http://www.oga-lab.net/RGM2/func.php?rd_id=ismev:gev.fit): gev.fit(xdat, ydat = NULL, mul = NULL, sigl = NULL, shl = NULL, mulink = identity, siglink = identity, shlink = identity, muinit = NULL, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 1, ...) For the parameter mulink I need to pass the inverse link function for generalized linear modelling of the location. Is it possible to define a linear trend, where the slope is fitted by this function? Thank you! Best regards -- View this message in context: http://r.789695.n4.nabble.com/what-is-Inverse-link-functions-in-linear-modelling-of-location-tp3742010p3742010.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Excluding NAs from round correlation
The help page says: use: an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". If I used "everything", the results would be NAs again. "all.obs" would result in an error. "complete.obs" gives me an error too. "na.or.complete" gives me all NAs... But "pairwise.complete.obs" finally got the right results. -- View this message in context: http://r.789695.n4.nabble.com/Excluding-NAs-from-round-correlation-tp3741296p3742039.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
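For reference, a small made-up matrix shows why "pairwise.complete.obs" succeeded where "everything" returned NAs: each correlation is computed from the rows that are complete for that particular pair of columns, so no single column of NAs poisons the whole result.

```r
# Toy matrix with an NA in each column (invented data, not the poster's)
m <- cbind(a = c(1, 2, 3, 4, NA),
           b = c(2, 4, 6, NA, 10),
           c = c(5, 3, NA, 1, 0))

cor(m, use = "everything")              # any NA propagates -> NA entries
cor(m, use = "pairwise.complete.obs")   # each pair uses its own complete rows
```

The trade-off is that with "pairwise.complete.obs" different cells of the correlation matrix can be based on different subsets of rows, which is worth keeping in mind when comparing entries.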
Re: [R] degrees of freedom does not appear in the summary lmer :(
Hi: This is worth reading and bookmarking: http://glmm.wikidot.com/faq HTH, Dennis On Sat, Aug 13, 2011 at 6:31 AM, xy wtemptat...@hotmail.co.uk wrote: Hi, Could someone please help me with this topic? I don't know how I can extract them from my model!! Thanks, Sophie -- View this message in context: http://r.789695.n4.nabble.com/degrees-of-freedom-does-not-appear-in-the-summary-lmer-tp3741327p3741327.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] efficient use of lm over a matrix vs. using apply over rows
God Bless you Duncan. Your explanation is crisp and to the point. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/efficient-use-of-lm-over-a-matrix-vs-using-apply-over-rows-tp870810p3742058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Casualty Actuarial Society request for proposals for R Workshop
I'm a property-casualty actuary, use R at my job, and lurk on the list. In conjunction with one of its meetings, the Casualty Actuarial Society (I'm a member) is looking for proposals from people to teach a workshop in R and I thought members of the list might be interested. I've pasted the information below. My apologies if this posting violates list rules. Thanks. Kevin http://www.casact.org/cms/index.cfm?fa=viewArticle&articleID=1613 2012 RPM Seminar Committee Welcomes Proposals for New R Workshop 08/08/2011 — I. Casualty Actuarial Society The Casualty Actuarial Society (CAS) was organized in 1914 as a professional society with the purpose of advancing the body of knowledge of actuarial science applied to property, casualty and similar risk exposures. This is accomplished through communication with the publics affected by insurance, the presentation and discussion of papers, attendance at seminars and workshops, collection of a library, funded research activities, and other means. The membership of the CAS includes over 4,000 actuaries employed by insurance companies, consulting firms, brokers, and the government. Additional information about the CAS can be found on the CAS website. II. CAS Ratemaking/Product Management Seminar The CAS Ratemaking/Product Management (“RPM”) Seminar is scheduled to take place in Philadelphia, PA on March 19-21, 2012. As with previous RPM Seminars, full day workshops on 4 different subject areas of particular interest are scheduled to be offered on the first day, or Monday, March 19, 2012. Examples of schedules, workshop descriptions and presentations from previous such workshops can be found on the CAS website. In response to feedback received from previous RPM Seminar attendees, the RPM Seminar Planning Committee (“Committee”) intends to include Introduction to R as one of the workshop topics at the 2012 seminar, to provide hands-on R training for beginners. III.
Project Specifications The Committee wishes to enlist subject matter experts to develop and conduct the above described 1-day workshop on R. The workshop should be customized to focus on the critical issues surrounding creation of an R program. The Committee is most interested in providing training in the following areas, but is open to considering additional steps that are offered by respondents:

- R interface
- Programming in R
- R datasets
- R packages
- Actuarial models in R

The workshop will need to include a dataset for analysis during the session(s), to be provided to attendees with sufficient lead time that they can become knowledgeable about the dataset before the workshop begins. An assignment could accompany the dataset so that attendees can review the data and perform necessary data analysis. It is expected that the presenters will use a computer and an LCD projector, and that attendees will be able to use their own computers to conduct analysis on the dataset prior to and during the workshop. The participants should have access to the R software during the workshop and should be provided instructions on how to download the required version of R prior to the workshop along with any required packages. Expected workshop attendance would be 50 persons. The presenters must adhere to the same requirements and deadlines imposed on all workshop/RPM Seminar presenters, in terms of working with Committee session coordinators and making materials available to attendees prior to the seminar. IV. Proposal Requirements Proposals are due by September 12, 2011 and should include the following items:

- A clear description of seminar education content
- Demonstrated experience within the field
- Three professional references

All submitted proposals will be evaluated equally. The Committee will, by October 10, 2011, select the respondent who, in the judgment of the Committee, is best able to perform the work as specified herein.
The Committee reserves the right not to accept any proposal if an acceptable proposal is not received. Interested parties should submit their proposals and any questions in writing via e-mail to Vincent Edwards, CAS Manager, Professional Education, at vedwa...@casact.org. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adjacency Matrix help
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of collegegurl69 Sent: Saturday, August 13, 2011 1:01 AM To: r-help@r-project.org Subject: Re: [R] Adjacency Matrix help Thanks so much for your quick reply. It seems to work. The problem is that it now places actual zeros on the diagonal whereas the rest of the adjacency matrix has dots to represent zeros. Do you have any ideas on how to change these zeros to dots like in the rest of the adj matrix? Or is it the same thing? Thanks. This is one of the reasons why it is useful/important to provide a reproducible example. When I think of an adjacency matrix, my default mental representation is a numeric matrix, which was reinforced by the request for zeros on the diagonal. So, how did you create this matrix? Could you post a self-contained, reproducible example as the posting guide requests? At a minimum, can you apply str() to your matrix and post the output? Dan Daniel Nordlund Bothell, WA USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
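A guess at what is happening, pending the str() output Dan asked for: dots in the printout are how sparse matrices from the Matrix package display structural zeros, while an assigned 0 is stored explicitly and printed as 0. drop0() converts explicit zeros back to structural ones. A sketch, assuming the adjacency matrix really is a sparse Matrix:

```r
library(Matrix)  # recommended package, ships with R

# Invented 5x5 0/1 adjacency matrix in sparse form
set.seed(1)
adj <- Matrix(rbinom(25, 1, 0.4), nrow = 5, sparse = TRUE)

diag(adj) <- 0     # put zeros on the diagonal: entry (i, i) becomes 0
adj <- drop0(adj)  # drop explicit zeros so they print as dots again
print(adj)
```

Numerically the explicit 0 and the dot are the same thing; drop0() only changes the storage (and hence the printed representation).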
[R] compiling r from source on Windows 7 (64 bit)
Dear R People: Hope you're having a nice Saturday. I'm trying to compile R-2.13.1 from source on Windows 7 (64 bit). I've been able to compile on a 32 bit without any problems. I changed my BINPREF64, WIN, DEFS_W64 in MkRules.local and did the usual stuff with the jpeg, etc. But things are jogging along and I get the following: Makefile.win:28: ../../../../etc/x64/Makeconf: No such file or directory Has anyone run across this, please? Should I possibly just switch back to 32 bit, do you think, please? I need to compile from source because I'm building packages. Thanks for any help. Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] compiling r from source on Windows 7 (64 bit)
Erin, You can build packages without compiling from source (what made you think you couldn't?). Did you make sure when you installed the Rtools (I am assuming you are using those rather than going out and getting everything you need on your own) that you included everything for 64 bit builds? When switching between 32 and 64, I typically only switch between WIN = 32 and WIN = 64. Josh On Sat, Aug 13, 2011 at 6:18 PM, Erin Hodgess erinm.hodg...@gmail.com wrote: Dear R People: Hope you're having a nice Saturday. I'm trying to compile R-2.13.1 from source on Windows 7 (64 bit). I've been able to compile on a 32 bit without any problems. I changed my BINPREF64, WIN, DEFS_W64 in MkRules.local and did the usual stuff with the jpeg, etc. But things are jogging along and I get the following: Makefile.win:28: ../../../../etc/x64/Makeconf: No such file or directory Has anyone run across this, please? Should I possibly just switch back to 32 bit, do you think, please? I need to compile from source because I'm building packages. Thanks for any help. Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] seeking advice about rounding error and %%
Hi Paul, What about using:

x[x != as.integer(x)] <- NA

I cannot think of a situation off hand where this would fail to turn every non-integer to missing. I wonder if there is really a point to this? Can the client proceed with data analysis with any degree of confidence when an unknown mechanism has altered data in unknown ways? Could Excel have sometimes changed one integer to another (e.g., 4s became 1.18whatever, but 3s became 1s or)? Cheers, Josh On Sat, Aug 13, 2011 at 12:42 PM, Paul Johnson pauljoh...@gmail.com wrote: A client came into our consulting center with some data that had been damaged by somebody who opened it in MS Excel. The columns were supposed to be integer valued, 0 through 5, but some of the values were mysteriously damaged. There were scores like 1.18329322 and such in there. Until he tracks down the original data and finds out what went wrong, he wants to take all fractional valued scores and convert to NA. As a quick hack, I suggest an approach using %%:

x <- c(1, 2, 3, 1.1, 2.12131, 2.001)
x %% 1
[1] 0.00000 0.00000 0.00000 0.10000 0.12131 0.00100
which(x %% 1 > 0)
[1] 4 5 6
xbad <- which(x %% 1 > 0)
x[xbad] <- NA
x
[1] 1 2 3 NA NA NA

I worry about whether x %% 1 may ever return a non-zero result for an integer because of rounding error. Is there a recommended approach? What about zapsmall on the left, but what on the right of >?

which( zapsmall(x %% 1) > 0 )

Thanks in advance -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D.
Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
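One tolerance-based variant of Paul's check, as a sketch (the tolerance choice is an assumption, not a recommendation from the thread): compare each value against round() within a small tolerance, so floating-point fuzz such as 3 + 1e-15 is not flagged while genuinely fractional scores become NA.

```r
# Invented sample mimicking the damaged column
x <- c(1, 2, 3, 1.18329322, 2.12131, 3 + 1e-15)

tol <- sqrt(.Machine$double.eps)   # conventional tolerance, about 1.5e-8
is_whole <- abs(x - round(x)) < tol
x[!is_whole] <- NA
x   # the 1e-15 fuzz survives as (effectively) 3; real fractions become NA
```

This sidesteps the %% 1 worry entirely: abs(x - round(x)) is the distance to the nearest integer, which is what the check really wants to measure.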
[R] Compiling R from source on Windows 7 (64 bit) solved
Hello again. Due to the excellent help from Josh Wiley, I ran back in the C:/R directory with only changing WIN = 64 in the MkRules.local file (other than the JPEG, etc). All was well. Thanks, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using get() or similar function to access more than one element in a vector
Dear R-users, I've written a script that produces a frequency table for a group of texts. The table has a total frequency for each word type and individual frequency counts for each of the files. (I have not included the code for creating the column headers.) Below is a sample:

Word Total 01.txt 02.txt 03.txt 04.txt 05.txt
the 22442 2667 3651 1579 2132 3097
I 18377 3407 454 824 449 3746
and 15521 2377 2174 891 1006 2450
to 13598 1716 1395 905 1021 1983
of 12834 1647 1557 941 1127 1887
it 12440 2160 916 497 493 2449
you 12036 2283 356 293 106 2435

I've encountered two problems when I try to construct and save the file. The combined.sorted.freq.list is a named integer vector in which the integers are the total frequency counts for each word. The names are the words. For each of the individual lists I've created frequency lists that are sorted in the order of the combined list. (NAs have been replaced with 0.) These are called combined. plus the number of the file. If I were to write the line to save the file manually, it would look like this:

combined.table <- paste(names(combined.sorted.freq.list), combined.sorted.freq.list, combined.01, combined.02, combined.03, combined.04, combined.05, combined.06, combined.07, combined.08, combined.09, combined.10, combined.11, combined.12, sep="\t") # creates a table with columns for the combined and all of the component lists

However, each time I run the script, there may be a differing number of text files. I created a list of the names of the individual frequency counts called combined.file.list:

combined.file.count <- 1:length(selected.files) # counts number of files originally selected
combined.file.list <- paste("combined", combined.file.count, sep=".") # creates the names for the combined lists by concatenating "combined" with each file number separated by a period, recycling the string "combined" for each number

I then tried to include it as one of the elements to be pasted by using get().
combined.table <- paste(names(combined.sorted.freq.list), combined.sorted.freq.list, get(combined.file.list[]), sep="\t") # intended to create a table with columns for the combined and all of the component lists

Unfortunately, the get() function only gets the first component list since get() can apparently only access one object. This results in a table with only the total frequency and the counts for the first text:

Word Total 01.txt
the 22442 2667
I 18377 3407
and 15521 2377
to 13598 1716
of 12834 1647
it 12440 2160
you 12036 2283

If I try to construct the file piece by piece as the lists are created, I get an error message that a vector of more than 1.3 Gb cannot be created. Does anyone know how I could use get() or some other method to access all of the files named in a vector? Many thanks for any help you can offer! Joseph __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] seeking advice about rounding error and %%
How about something like:

if (round(x) != x) { zap }

not exactly working code but might help Ken On Aug 13, 2554 BE, at 3:42 PM, Paul Johnson pauljoh...@gmail.com wrote: A client came into our consulting center with some data that had been damaged by somebody who opened it in MS Excel. The columns were supposed to be integer valued, 0 through 5, but some of the values were mysteriously damaged. There were scores like 1.18329322 and such in there. Until he tracks down the original data and finds out what went wrong, he wants to take all fractional valued scores and convert to NA. As a quick hack, I suggest an approach using %%:

x <- c(1, 2, 3, 1.1, 2.12131, 2.001)
x %% 1
[1] 0.00000 0.00000 0.00000 0.10000 0.12131 0.00100
which(x %% 1 > 0)
[1] 4 5 6
xbad <- which(x %% 1 > 0)
x[xbad] <- NA
x
[1] 1 2 3 NA NA NA

I worry about whether x %% 1 may ever return a non-zero result for an integer because of rounding error. Is there a recommended approach? What about zapsmall on the left, but what on the right of >?

which( zapsmall(x %% 1) > 0 )

Thanks in advance -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using get() or similar function to access more than one element in a vector
Hi Joseph, Without a reproducible example, you probably will not get the precise code for a solution but look at ?list Rather than doing what you are doing now, put everything into a list, and then you will not need to use get() at all. You will just work with the whole list. It can take a bit to get used to working that way, but it is worth it. Cheers, Josh On Sat, Aug 13, 2011 at 9:34 PM, Joseph Sorell josephsor...@gmail.com wrote: Dear R-users, I've written a script that produces a frequency table for a group of texts. The table has a total frequency for each word type and individual frequency counts for each of the files. (I have not included the code for creating the column headers.) Below is a sample:

Word Total 01.txt 02.txt 03.txt 04.txt 05.txt
the 22442 2667 3651 1579 2132 3097
I 18377 3407 454 824 449 3746
and 15521 2377 2174 891 1006 2450
to 13598 1716 1395 905 1021 1983
of 12834 1647 1557 941 1127 1887
it 12440 2160 916 497 493 2449
you 12036 2283 356 293 106 2435

I've encountered two problems when I try to construct and save the file. The combined.sorted.freq.list is a named integer vector in which the integers are the total frequency counts for each word. The names are the words. For each of the individual lists I've created frequency lists that are sorted in the order of the combined list. (NAs have been replaced with 0.) These are called combined. plus the number of the file. If I were to write the line to save the file manually, it would look like this:

combined.table <- paste(names(combined.sorted.freq.list), combined.sorted.freq.list, combined.01, combined.02, combined.03, combined.04, combined.05, combined.06, combined.07, combined.08, combined.09, combined.10, combined.11, combined.12, sep="\t") # creates a table with columns for the combined and all of the component lists

However, each time I run the script, there may be a differing number of text files.
I created a list of the names of the individual frequency counts called combined.file.list:

combined.file.count <- 1:length(selected.files) # counts number of files originally selected
combined.file.list <- paste("combined", combined.file.count, sep=".") # creates the names for the combined lists by concatenating "combined" with each file number separated by a period, recycling the string "combined" for each number

I then tried to include it as one of the elements to be pasted by using get().

combined.table <- paste(names(combined.sorted.freq.list), combined.sorted.freq.list, get(combined.file.list[]), sep="\t") # intended to create a table with columns for the combined and all of the component lists

Unfortunately, the get() function only gets the first component list since get() can apparently only access one object. This results in a table with only the total frequency and the counts for the first text:

Word Total 01.txt
the 22442 2667
I 18377 3407
and 15521 2377
to 13598 1716
of 12834 1647
it 12440 2160
you 12036 2283

If I try to construct the file piece by piece as the lists are created, I get an error message that a vector of more than 1.3 Gb cannot be created. Does anyone know how I could use get() or some other method to access all of the files named in a vector? Many thanks for any help you can offer! Joseph __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
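To make Josh's suggestion concrete, here is a sketch with invented toy data (not Joseph's real frequency lists): the per-file count vectors live in one named list, and do.call() hands every list element to paste() in a single call, so no get() is needed and the number of files can vary freely.

```r
# Invented stand-ins for the real objects: a named total-frequency vector
# and a named list holding one count vector per file
totals <- c(the = 22442, I = 18377, and = 15521)
per_file <- list(
  "01.txt" = c(2667, 3407, 2377),
  "02.txt" = c(3651, 454, 2174)
)

# do.call() passes every element of the argument list to paste(),
# however many files there happen to be this run
rows <- do.call(paste,
                c(list(names(totals), totals), per_file, list(sep = "\t")))
rows
writeLines(rows)  # one tab-separated row per word, ready to save
```

If the per-file vectors really must start life as separate variables, mget(combined.file.list) would collect them into such a list in one step, but building them in a list from the beginning (e.g. with lapply over the file names) avoids the problem entirely.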