Re: [R] linear regression for grouped data
Hi:

There are some advantages to taking a plyr approach to this type of problem. The basic idea is to fit a linear model to each subgroup and save the results in a list, from which you can extract what you want piece by piece.

library(plyr)

# One of those SAS style data sets...
> df <- data.frame(matrix(scan(), ncol = 3, byrow = TRUE))
1: 76 36476 15.8 76 36493 66.9 76 36579 65.6 111 35465 10.3 111 35756 4.8
16: 121 38183 16 121 38184 15 121 38254 9.6 121 38255 7 168 37727 21.9 168
32: 37739 29.7 168 37746 97.4
37:
Read 36 items

# A little cleanup:
names(df) <- c('ID', 'x', 'y')
df$ID <- factor(df$ID)

# Fit a linear model to each sub-data frame identified by ID
# and send the results to a list object
# dlply takes a data frame as input and outputs a list
# the grouping variable is ID
# the argument d in the function is the sub-data frame of a given ID
lr1 <- dlply(df, .(ID), function(d) lm(y ~ x, data = d))

# So you can do things like:
# Grab the model coefficients
# (input is a list, output is a data frame)
> ldply(lr1, function(m) m$coef)
   ID  (Intercept)           x
1  76  -11699.      0.32176123
2 111     680.6007 -0.01890034
3 121    3900.5051 -0.10174534
4 168 -136322.4296  3.61371841

# export the R^2 values
> ldply(lr1, function(m) summary(m)$r.squared)
   ID        V1
1  76 0.3718840
2 111 1.000
3 121 0.9367437
4 168 0.6993811

# Extract the residuals and predicted values to another list
> llply(lr1, function(m) cbind(m$resid, m$fitted))
$`76`
        [,1]     [,2]
1 -20.762884 36.56288
2  24.867175 42.03282
3  -4.104291 69.70429

$`111`
  [,1] [,2]
4    0 10.3
5    0  4.8

$`121`
        [,1]      [,2]
6  0.4371678 15.562832
7 -0.4610869 15.461087
8  1.2610869  8.338913
9 -1.2371678  8.237168

$`168`
        [,1]     [,2]
10   9.57509 12.32491
11 -25.98953 55.68953
12  16.41444 80.98556

# Plot the residuals vs. fitted values for each model (don't blink :)
# the _ means that no object is returned; the plot is a side effect
l_ply(lr1, function(d) plot(resid(d) ~ fitted(d)))

These are just some examples; clearly, there is a lot more one could do with this type of structure.

HTH,
Dennis

On Tue, Dec 28, 2010 at 6:23 PM, Entropi ntrp wrote:

> Hi,
> I have been examining large data and need to do simple linear regression
> with the data which is grouped based on the values of a particular
> attribute. For instance, consider three columns : ID, x, y, and I need to
> regress x on y for each distinct value of ID. Specifically, for the set of
> data corresponding to each of the 4 values of ID (76,111,121,168) in the
> below data, I should invoke linear regression 4 times. The challenge is
> that, the length of the ID vector is around 2 and therefore linear
> regression must be done automatically for each distinct value of ID.
>
> ID  x     y
> 76  36476 15.8
> 76  36493 66.9
> 76  36579 65.6
> 111 35465 10.3
> 111 35756 4.8
> 121 38183 16
> 121 38184 15
> 121 38254 9.6
> 121 38255 7
> 168 37727 21.9
> 168 37739 29.7
> 168 37746 97.4
>
> I was wondering whether there is an easy way to group data based on the
> values of ID in R so that linear regression can be done easily for each
> group determined by each value of ID. Or, is the only way to construct
> loops with 'for' or 'while' in which a matrix is generated for each
> distinct value of ID that stores corresponding values of x and y by
> screening the entire ID vector?
>
> Thanks in advance,
>
> Yasin
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Re: [R] linear regression for grouped data
library(nlme)
lmList(y ~ x | factor(ID), myData)

This gives a list of fitted model objects.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Entropi ntrp
Sent: Wednesday, 29 December 2010 12:24 PM
To: r-help@r-project.org
Subject: [R] linear regression for grouped data

Hi,
I have been examining large data and need to do simple linear regression with the data which is grouped based on the values of a particular attribute. For instance, consider three columns : ID, x, y, and I need to regress x on y for each distinct value of ID. Specifically, for the set of data corresponding to each of the 4 values of ID (76,111,121,168) in the below data, I should invoke linear regression 4 times. The challenge is that, the length of the ID vector is around 2 and therefore linear regression must be done automatically for each distinct value of ID.

ID  x     y
76  36476 15.8
76  36493 66.9
76  36579 65.6
111 35465 10.3
111 35756 4.8
121 38183 16
121 38184 15
121 38254 9.6
121 38255 7
168 37727 21.9
168 37739 29.7
168 37746 97.4

I was wondering whether there is an easy way to group data based on the values of ID in R so that linear regression can be done easily for each group determined by each value of ID. Or, is the only way to construct loops with 'for' or 'while' in which a matrix is generated for each distinct value of ID that stores corresponding values of x and y by screening the entire ID vector?

Thanks in advance,

Yasin
Re: [R] filling up holes
Dear 'analyst41' (it would be a courtesy to know who you are)

Here is a low-level way to do it.

First create some dummy data

> allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
> client_ID <- sample(LETTERS[1:5], 50, rep = TRUE)
> value <- 1:50
> date <- sample(allDates)
> clientData <- data.frame(client_ID, date, value)

At this point clientData has 50 rows, with 5 clients, each with a sample of dates. Everything is in random order except "value".

Now write a little function to fill out a subset of the data consisting of one client's data only:

> fixClient <- function(cData) {
+   dateRange <- range(cData$date)
+   dates <- seq(dateRange[1], dateRange[2], by = 1)
+   fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
+                         date = dates, value = NA)
+
+   fullSet$value[match(cData$date, dates)] <- cData$value
+   fullSet
+ }

Now split up the data, apply the fixClient function to each section and re-combine them again:

> allData <- do.call(rbind,
+     lapply(split(clientData, clientData$client_ID), fixClient))

Check:

> head(allData)
    client_ID       date value
A.1         A 2010-01-04    36
A.2         A 2010-01-05    18
A.3         A 2010-01-06    NA
A.4         A 2010-01-07    NA
A.5         A 2010-01-08    NA
A.6         A 2010-01-09    49
>

Seems OK. At this point the data are in sorted order by client and date, but that should not matter.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of analys...@hotmail.com
Sent: Wednesday, 29 December 2010 10:45 AM
To: r-help@r-project.org
Subject: [R] filling up holes

I have a data frame with three columns

client ID | date | value

For each client ID I want to determine Min date and Max date and for any dates in between that are missing I want to insert a row

Client ID | date | NA

Any help would be appreciated.
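The same fill-in can also be sketched with merge() instead of match(). This is only an illustrative alternative, not Bill's method: the helper name fill_dates and the four-row toy data frame are made up for the example, and it assumes the columns are called client_ID, date and value as in the reply above.

```r
# Hypothetical helper: expand one client's rows to every calendar day between
# that client's first and last date; days with no record get value = NA.
fill_dates <- function(cd) {
  full <- data.frame(client_ID = cd$client_ID[1],
                     date = seq(min(cd$date), max(cd$date), by = "day"))
  # Left join: keep every day in 'full', pull in 'value' where it exists
  merge(full, cd, by = c("client_ID", "date"), all.x = TRUE)
}

# Toy data (made up): client A is missing 2010-01-02 and 2010-01-03
clientData <- data.frame(
  client_ID = c("A", "A", "B", "B"),
  date = as.Date(c("2010-01-01", "2010-01-04", "2010-01-02", "2010-01-03")),
  value = c(1, 2, 3, 4)
)

filled <- do.call(rbind,
                  lapply(split(clientData, clientData$client_ID), fill_dates))
filled
```

merge() sorts on the key columns by default, so each client's rows come back in date order.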
Re: [R] linear regression for grouped data
On Dec 28, 2010, at 9:23 PM, Entropi ntrp wrote:

> Hi,
> I have been examining large data and need to do simple linear regression
> with the data which is grouped based on the values of a particular
> attribute. For instance, consider three columns : ID, x, y, and I need to
> regress x on y for each distinct value of ID. Specifically, for the set of
> data corresponding to each of the 4 values of ID (76,111,121,168) in the
> below data, I should invoke linear regression 4 times. The challenge is
> that, the length of the ID vector is around 2 and therefore linear
> regression must be done automatically for each distinct value of ID.
>
> ID  x     y
> 76  36476 15.8
> 76  36493 66.9
> 76  36579 65.6
> 111 35465 10.3
> 111 35756 4.8
> 121 38183 16
> 121 38184 15
> 121 38254 9.6
> 121 38255 7
> 168 37727 21.9
> 168 37739 29.7
> 168 37746 97.4

Let's say that is a dataframe named "indat". Try:

lapply(split(indat, as.factor(indat$ID)), function(df) {lm(y ~ x, data=df)} )

> I was wondering whether there is an easy way to group data based on the
> values of ID in R so that linear regression can be done easily for each
> group determined by each value of ID. Or, is the only way to construct
> loops with 'for' or 'while' in which a matrix is generated for each
> distinct value of ID that stores corresponding values of x and y by
> screening the entire ID vector?
>
> Thanks in advance,
>
> Yasin

--
David Winsemius, MD
West Hartford, CT
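As a possible follow-up to the split/lapply call above, the coefficients can be pulled out of the resulting list with sapply(). This is a minimal sketch: the data frame is typed in from the posted data, and the object names fits and coefs are just illustrative choices.

```r
# The posted data, entered by hand
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# One lm fit per ID; the list is named by ID
fits <- lapply(split(indat, indat$ID), function(d) lm(y ~ x, data = d))

# sapply() returns one column per group, so transpose to get one row per ID
coefs <- t(sapply(fits, coef))
coefs
```

Group 111 has only two observations, so its line passes through both points exactly and its fit carries no information about residual variability.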
[R] linear regression for grouped data
Hi,
I have been examining large data and need to do simple linear regression with the data which is grouped based on the values of a particular attribute. For instance, consider three columns : ID, x, y, and I need to regress x on y for each distinct value of ID. Specifically, for the set of data corresponding to each of the 4 values of ID (76,111,121,168) in the below data, I should invoke linear regression 4 times. The challenge is that, the length of the ID vector is around 2 and therefore linear regression must be done automatically for each distinct value of ID.

ID  x     y
76  36476 15.8
76  36493 66.9
76  36579 65.6
111 35465 10.3
111 35756 4.8
121 38183 16
121 38184 15
121 38254 9.6
121 38255 7
168 37727 21.9
168 37739 29.7
168 37746 97.4

I was wondering whether there is an easy way to group data based on the values of ID in R so that linear regression can be done easily for each group determined by each value of ID. Or, is the only way to construct loops with 'for' or 'while' in which a matrix is generated for each distinct value of ID that stores corresponding values of x and y by screening the entire ID vector?

Thanks in advance,

Yasin
[R] filling up holes
I have a data frame with three columns

client ID | date | value

For each client ID I want to determine Min date and Max date and for any dates in between that are missing I want to insert a row

Client ID | date | NA

Any help would be appreciated.
Re: [R] using lapply and split to plot up subsets of a vector
The data= argument to plot only makes sense if the first argument is a formula. So if you change the plot command in your function to

   plot(ln.o2con~lnbm,data=df)

you might get what you want. But I would suggest you take a look at the plot produced by

   library(lattice)
   xyplot(ln.o2con~lnbm|sp.id,data=one)

which might be more useful.

   - Phil Spector
     Statistical Computing Facility
     Department of Statistics
     UC Berkeley
     spec...@stat.berkeley.edu

On Tue, 28 Dec 2010, karmakiller wrote:

> Hi,
>
> I would like to be able to plot data from each of the sp.id on individual
> plots. At the moment I can plot all the data on one graph with the following
> commands but I cannot figure out how to get individual graph for each sp.id.
>
> i<- function(df)plot(lnbm,ln.o2con,data=df)
> j<- lapply(split(one,one$sp.id),i)
>
> I have searched on the net and through the threads here but I cannot find
> anything that matches what I am trying to do.
>
> Any help would be greatly appreciated.
> Thanx
> --
> View this message in context: http://r.789695.n4.nabble.com/using-lapply-and-split-to-plot-up-subsets-of-a-vector-tp3166634p3166634.html
> Sent from the R help mailing list archive at Nabble.com.
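The formula fix can be combined with the original split/lapply idea to get one base-graphics panel per sp.id. This is a sketch under assumptions: the data frame one is simulated here (the real data were not posted), and plot_one is a hypothetical helper name.

```r
# Simulated stand-in for the poster's data frame 'one' (values are made up)
set.seed(1)
one <- data.frame(
  sp.id    = rep(c("a", "b", "c"), each = 5),
  lnbm     = rnorm(15),
  ln.o2con = rnorm(15)
)

# The formula interface lets plot() look the variables up in 'df'
plot_one <- function(df) {
  plot(ln.o2con ~ lnbm, data = df, main = as.character(df$sp.id[1]))
}

groups <- split(one, one$sp.id)

op <- par(mfrow = c(1, length(groups)))  # one panel per group, side by side
invisible(lapply(groups, plot_one))      # called for the plotting side effect
par(op)                                  # restore the previous layout
```

With many groups a single row of panels gets cramped; the lattice call suggested above handles the layout automatically.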
[R] using lapply and split to plot up subsets of a vector
Hi,

I would like to be able to plot data from each of the sp.id on individual plots. At the moment I can plot all the data on one graph with the following commands but I cannot figure out how to get individual graph for each sp.id.

i<- function(df)plot(lnbm,ln.o2con,data=df)
j<- lapply(split(one,one$sp.id),i)

I have searched on the net and through the threads here but I cannot find anything that matches what I am trying to do.

Any help would be greatly appreciated.
Thanx
--
View this message in context: http://r.789695.n4.nabble.com/using-lapply-and-split-to-plot-up-subsets-of-a-vector-tp3166634p3166634.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Problem applying McNemar's - Different values in SPSS and R
Marc Schwartz [Tue, Dec 28, 2010 at 07:14:49PM CET]:
[...]
> > An old question of mine: Is there any reason not to use binom.test()
> > other than historical reasons?
>

(I meant "in lieu of the McNemar approximation", sorry if some misunderstanding ensued).

> I may be missing the context of your question, but I frequently see
> exact binomial tests being used when one is comparing the
> presumptively known probability of some dichotomous characteristic
> versus that which is observed in an independent sample. For example,
> in single arm studies where one is comparing an observed event rate
> against a point estimate for a presumptive historical control.

In the McNemar context (as used by SPSS) the null hypothesis is p=0.5.

> I also see the use of exact binomial (Clopper-Pearson) confidence
> intervals being used when one wants to have conservative CI's, given
> that the nominal coverage of these are at least as large as
> requested. That is, 95% exact CI's will be at least that large, but
> in reality can tend to be well above that, depending upon various
> factors. This is well documented in various papers.

Confidence intervals are not used that regularly in the McNemar context, as the conditional probability "a > b given they are unequal" is not as interpretable a quantity as the event probability in a single arm study.

> I generally tend to use Wilson CI's for binomial proportions when
> reporting analyses. I have my own code but these are implemented in
> various R functions, including Frank's binconf() in Hmisc.

Thanks for the hint.

--
Johannes Hüsing
mailto:johan...@huesing.name
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, "Life on the Mississippi")
Re: [R] levelplot blocks size
Here is a basic example:

tmp.df <- expand.grid( x= 1:100, y=1:100 )
tmp.df$z <- with(tmp.df, x+2*y)

library(lattice)
levelplot( z ~ x + y, data=tmp.df )

tx2 <- with(tmp.df, cut(x, seq(0.5, 100.5, 10) ) )
ty2 <- with(tmp.df, cut(y, seq(0.5, 100.5, 20) ) )
tmp.df2 <- aggregate(tmp.df, list( tx2, ty2 ), mean )
levelplot( z ~ x + y, data=tmp.df2 )

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jonathan
> Sent: Monday, December 27, 2010 7:00 PM
> To: r-help@r-project.org
> Subject: Re: [R] levelplot blocks size
>
> Thanks for your help.
>
> Might you be able to explain in a little more detail how to use those
> functions to solve this specific problem?
>
> I'm happy to put in the work myself and have looked up those functions but
> am new to R and still a little unsure about how I would go about using those
> functions to solve my problem.
>
> Thanks,
>
> Jonathan
> --
> View this message in context: http://r.789695.n4.nabble.com/levelplot-blocks-size-tp3089972p3165638.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] batch file output
On Tue, Dec 28, 2010 at 5:09 AM, Mikkel Grum wrote:
> I run a batch file with the following command in Windows XP:
>
> C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore > C:\users\me\file.out 2>&1

I'm a bit surprised this worked for you... did you customize your build so that Rterm.exe is in \bin\ rather than a subfolder for its specific architecture?

> Is there any way to get only the output of R in file.out, without getting all
> the code from file.R too?

I did not see anyone else mention this, so I wanted to add that with R CMD BATCH you can add the --slave argument to avoid needing to add options(echo = FALSE) to all your scripts. The --no-timing option stops proc.time() from running at the end. For example from the command prompt I can run 'sample.R' using 32 bit R:

C:\R\R-2.12.1\bin\i386\R CMD BATCH --slave --no-timing "sample.R" "sampleout.txt"

HTH,

Josh

> Any help greatly appreciated,
> Mikkel

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Re: [R] another superscript problem
Hi,

this seems to work,

plot.new()
legend("topleft", legend=as.expression(c(bquote(.(txt) == .(obv)*degree), "Von Mises distribution")))

HTH,

baptiste

On 28 December 2010 20:17, Tyler Dean Rudolph wrote:
> legend("topleft", legend=c(bquote(.(txt) == .(obv)*degree), "Von Mises distribution"))
Re: [R] Error in combined for() and if() code
Hi all,

I haven't solved the problem of filtering the data, but I have managed to find all the peaks in the data despite their relatively flat nature using peaks() in the IDPmisc package. It works really well for my data and the ability to set a lower threshold for peaks to report is convenient as well.

Maybe I'll come back to the data filtering problem later.

Thanks for your help and comments,
Nate

On Tue, Dec 28, 2010 at 10:49 AM, David Winsemius wrote:

> On Dec 28, 2010, at 1:08 PM, Nathan Miller wrote:
>
>> Hello,
>>
>> I am trying to filter a data set like below so that the peaks in the Phase
>> value are more obvious and can be identified by a peak finding function
>> following the useful advise of Carl Witthoft. I have written the following
>>
>> for(i in length(data$Phase)){
>> newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){
>> data$Phase[i+1]
>> }else{data$Phase[i]
>> }
>> }
>>
>> I get the following error which I have not seen before when I paste the
>> code into R
>>
>> Error in if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) { :
>> missing value where TRUE/FALSE needed
>>
>> I don't have much experience with such loops as I have tried to avoid
>> using them in the past. Can anyone identify the error(s) in the code I have
>> written or a simpler means of writing such a filter?
>
> Sometimes it's more informative to look at the data first. Here's a plot of
> the data with the first and second differences underneath>
>
> plot(data, ylim=c(-5, max(data$Phase)) )
> lines(data$Time[-1], diff(data$Phase) )
> lines(data$Time[-(1:2)], diff(diff(data$Phase)), col="red")
>
> Your data had rather flat-topped maxima. These maxima are defined by the
> interval between the times when the first differences are zero (OR go from
> positive to negative) AND the second differences are negative (OR zero).
>
> There is a package on CRAN:
>
> http://cran.r-project.org/web/packages/msProcess/index.html
>
> that purports to do peak finding.
I would think the local maxima in > you data might need some filtering and presumably the mass-spec people have > need of that too. > > > > > >> Thank you, >> Nate >> >> >> data= >> Time Phase >> 1 0.000 15.18 >> 2 0.017 13.42 >> 3 0.034 11.40 >> 4 0.051 18.31 >> 5 0.068 25.23 >> 6 0.085 33.92 >> 7 0.102 42.86 >> 8 0.119 42.87 >> 9 0.136 42.88 >> 10 0.153 42.88 >> 11 0.170 42.87 >> 12 0.186 42.88 >> 13 0.203 42.88 >> 14 0.220 42.78 >> 15 0.237 33.50 >> 16 0.254 24.81 >> 17 0.271 17.20 >> 18 0.288 10.39 >> 19 0.305 13.97 >> 20 0.322 16.48 >> 21 0.339 14.75 >> 22 0.356 20.80 >> 23 0.373 25.79 >> 24 0.390 31.25 >> 25 0.407 39.89 >> 26 0.423 40.04 >> 27 0.440 40.05 >> 28 0.457 40.05 >> 29 0.474 40.05 >> 30 0.491 40.05 >> 31 0.508 40.06 >> 32 0.525 40.07 >> 33 0.542 32.23 >> 34 0.559 23.90 >> 35 0.576 17.86 >> 36 0.592 11.63 >> 37 0.609 12.78 >> 38 0.626 13.12 >> 39 0.643 10.93 >> 40 0.660 10.63 >> 41 0.677 10.82 >> 42 0.694 11.84 >> 43 0.711 20.44 >> 44 0.728 27.33 >> 45 0.745 34.22 >> 46 0.762 41.55 >> 47 0.779 41.55 >> 48 0.796 41.55 >> 49 0.813 41.53 >> 50 0.830 41.53 >> 51 0.847 41.52 >> 52 0.864 41.52 >> 53 0.880 41.53 >> 54 0.897 41.53 >> 55 0.914 33.07 >> 56 0.931 25.12 >> 57 0.948 19.25 >> 58 0.965 11.30 >> 59 0.982 12.48 >> 60 0.999 13.85 >> 61 1.016 13.62 >> 62 1.033 12.62 >> 63 1.050 19.39 >> 64 1.067 25.48 >> 65 1.084 31.06 >> 66 1.101 39.49 >> 67 1.118 39.48 >> 68 1.135 39.46 >> 69 1.152 39.45 >> 70 1.169 39.43 >> 71 1.185 39.42 >> 72 1.202 39.42 >> 73 1.219 39.41 >> 74 1.236 39.41 >> 75 1.253 37.39 >> 76 1.270 29.03 >> 77 1.287 20.61 >> 78 1.304 14.07 >> 79 1.321 9.12 >> >>[[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
>> > > David Winsemius, MD > West Hartford, CT > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] another superscript problem
Part of the reason I was having difficulty is that I'm trying to add a legend with more than one element:

plot(1,1)
obv = 5
txt = "Pop mean"

# this works
legend("topleft", legend=bquote(.(txt) == .(obv)*degree))

# but this doesn't
legend("topleft", legend=c(bquote(.(txt) == .(obv)*degree), "Von Mises distribution"))

How can I go about using multiple legend elements with mathematical/latin annotation in both?

Tyler

On Mon, Dec 27, 2010 at 8:22 PM, Peter Ehlers wrote:

> On 2010-12-27 16:51, David Winsemius wrote:
>
>> On Dec 27, 2010, at 6:40 PM, T.D. Rudolph wrote:
>>
>>> I've exceeded the maximum time I am willing to accept for solving simple
>>> problems so I thank all in advance for your assistance.
>>>
>>> I am trying to plot text combined with an object value and a superscript.
>>>
>>> obv = 5
>>> text = "Population mean ="
>>> ss = ^o # degrees
>>>
>>> Something like this (very naive so you get the idea):
>>> expression(text, obv, ss)
>>>
>>> paste(text, obv) # works ...but of course I either lose the value of obv or
>>> the superscript in the translation using expression, and bquote doesn't seem
>>> to accept the asterisk before the first element.
>>
>> I had trouble figuring out your real intent, since you have only been
>> describing what didn't work but see if this his halfway there:
>>
>> plot(1,1)
>> obv = 5
>> text = "Population mean =" # you should really avoid using function names for variables!
>> text(.8,.8, bquote(.(text)~.(obv)^o) )
>>
>> The ^o seems a bit of a dodge but it looks ok so if you're happy, go
>> with it.
>
> Instead of ^o, use the word 'degree' (see ?plotmath)
>
> text(.8,.8, bquote(.(text)~.(obv)*degree) )
>
> and, personally, I would let R handle the '=' sign:
>
> txt <- "Pop mean"
> text(1, 1.1, bquote(.(txt) == .(obv)*degree))
>
> Peter Ehlers
>
>>> I am a little bungled by the varying syntax used for bquote and all the
>>> rest; sometimes R seems more complicated than it needs to be for a
>>> relatively simple problem (and for me this is one of those cases!)...
>>>
>>> Tyler
>>
>> David Winsemius, MD
>> West Hartford, CT
Re: [R] Gamma & Lognormal Model
Thank you Michael! -- View this message in context: http://r.789695.n4.nabble.com/Gamma-Lognormal-Model-tp3165408p3166318.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in combined for() and if() code
On Dec 28, 2010, at 1:08 PM, Nathan Miller wrote:

> Hello,
>
> I am trying to filter a data set like below so that the peaks in the Phase
> value are more obvious and can be identified by a peak finding function
> following the useful advise of Carl Witthoft. I have written the following
>
> for(i in length(data$Phase)){
> newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){
> data$Phase[i+1]
> }else{data$Phase[i]
> }
> }
>
> I get the following error which I have not seen before when I paste the
> code into R
>
> Error in if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) { :
> missing value where TRUE/FALSE needed
>
> I don't have much experience with such loops as I have tried to avoid using
> them in the past. Can anyone identify the error(s) in the code I have
> written or a simpler means of writing such a filter?

Sometimes it's more informative to look at the data first. Here's a plot of the data with the first and second differences underneath:

plot(data, ylim=c(-5, max(data$Phase)) )
lines(data$Time[-1], diff(data$Phase) )
lines(data$Time[-(1:2)], diff(diff(data$Phase)), col="red")

Your data had rather flat-topped maxima. These maxima are defined by the interval between the times when the first differences are zero (OR go from positive to negative) AND the second differences are negative (OR zero).

There is a package on CRAN:

http://cran.r-project.org/web/packages/msProcess/index.html

that purports to do peak finding. I would think the local maxima in your data might need some filtering and presumably the mass-spec people have need of that too.
Thank you, Nate data= Time Phase 1 0.000 15.18 2 0.017 13.42 3 0.034 11.40 4 0.051 18.31 5 0.068 25.23 6 0.085 33.92 7 0.102 42.86 8 0.119 42.87 9 0.136 42.88 10 0.153 42.88 11 0.170 42.87 12 0.186 42.88 13 0.203 42.88 14 0.220 42.78 15 0.237 33.50 16 0.254 24.81 17 0.271 17.20 18 0.288 10.39 19 0.305 13.97 20 0.322 16.48 21 0.339 14.75 22 0.356 20.80 23 0.373 25.79 24 0.390 31.25 25 0.407 39.89 26 0.423 40.04 27 0.440 40.05 28 0.457 40.05 29 0.474 40.05 30 0.491 40.05 31 0.508 40.06 32 0.525 40.07 33 0.542 32.23 34 0.559 23.90 35 0.576 17.86 36 0.592 11.63 37 0.609 12.78 38 0.626 13.12 39 0.643 10.93 40 0.660 10.63 41 0.677 10.82 42 0.694 11.84 43 0.711 20.44 44 0.728 27.33 45 0.745 34.22 46 0.762 41.55 47 0.779 41.55 48 0.796 41.55 49 0.813 41.53 50 0.830 41.53 51 0.847 41.52 52 0.864 41.52 53 0.880 41.53 54 0.897 41.53 55 0.914 33.07 56 0.931 25.12 57 0.948 19.25 58 0.965 11.30 59 0.982 12.48 60 0.999 13.85 61 1.016 13.62 62 1.033 12.62 63 1.050 19.39 64 1.067 25.48 65 1.084 31.06 66 1.101 39.49 67 1.118 39.48 68 1.135 39.46 69 1.152 39.45 70 1.169 39.43 71 1.185 39.42 72 1.202 39.42 73 1.219 39.41 74 1.236 39.41 75 1.253 37.39 76 1.270 29.03 77 1.287 20.61 78 1.304 14.07 79 1.321 9.12 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
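The description of flat-topped maxima above (first differences near zero, bracketed by a rise and then a fall) can be sketched in base R with rle(). This is only an illustration of the idea, not what msProcess or IDPmisc do internally: find_plateaus is a hypothetical helper, and the tolerance tol = 0.5 is an assumed value chosen by eye for these data.

```r
# Hypothetical helper: locate flat-topped maxima in a numeric vector.
# A plateau is a run of near-zero first differences that follows a rise
# and precedes a fall; 'tol' (assumed) decides what counts as "flat".
find_plateaus <- function(x, tol = 0.5) {
  d <- diff(x)
  s <- sign(d)
  s[abs(d) < tol] <- 0              # treat small changes as flat
  r <- rle(s)
  ends   <- cumsum(r$lengths)       # last diff index of each run
  starts <- ends - r$lengths + 1    # first diff index of each run
  k <- which(r$values == 0)
  k <- k[k > 1 & k < length(r$values)]   # need a neighbouring run on both sides
  k <- k[r$values[k - 1] == 1 & r$values[k + 1] == -1]
  # diff index i compares x[i] with x[i + 1], so the plateau ends one step later
  data.frame(start = starts[k], end = ends[k] + 1)
}

# First 18 Phase values from the posted data
phase <- c(15.18, 13.42, 11.40, 18.31, 25.23, 33.92, 42.86, 42.87, 42.88,
           42.88, 42.87, 42.88, 42.88, 42.78, 33.50, 24.81, 17.20, 10.39)
p <- find_plateaus(phase)
p
```

The returned start and end are row indices into the vector, so the plateau's time span could be read off as data$Time[p$start] and data$Time[p$end]. Sharp peaks, where a rise is followed directly by a fall, are not reported by this sketch.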
Re: [R] Error in combined for() and if() code
On 28/12/2010 1:13 PM, Uwe Ligges wrote:
> On 28.12.2010 19:08, Nathan Miller wrote:
>> Hello,
>>
>> I am trying to filter a data set like below so that the peaks in the Phase
>> value are more obvious and can be identified by a peak finding function
>> following the useful advise of Carl Witthoft. I have written the following
>>
>> for(i in length(data$Phase)){
>
> Nonsense: In this case the loop will only run once for i=length(data$Phase)

Yes, I missed that.

> you probably want
> for(i in seq_along(data$Phase)){
>
>> newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){
>
> Nonsense:
> 1. if()... won't return any useful result.

if (cond) v1 else v2

does return a value (either v1 or v2). So the construction

newphase = if (abs(data$Phase[i+1]

will set newphase to a new value each time through the loop. That's probably not what was intended...

> 2. i+1 is not within your data

That's the only one I saw.

Duncan Murdoch

> Uwe Ligges
>
>> data$Phase[i+1]
>> }else{data$Phase[i]
>> }
>> }
>>
>> I get the following error which I have not seen before when I paste the code
>> into R
>>
>> Error in if (abs(data$Phase[i + 1] - data$Phase[i])> 6) { :
>> missing value where TRUE/FALSE needed
>>
>> I don't have much experience with such loops as I have tried to avoid using
>> them in the past. Can anyone identify the error(s) in the code I have
>> written or a simpler means of writing such a filter?
Thank you, Nate data= Time Phase 1 0.000 15.18 2 0.017 13.42 3 0.034 11.40 4 0.051 18.31 5 0.068 25.23 6 0.085 33.92 7 0.102 42.86 8 0.119 42.87 9 0.136 42.88 10 0.153 42.88 11 0.170 42.87 12 0.186 42.88 13 0.203 42.88 14 0.220 42.78 15 0.237 33.50 16 0.254 24.81 17 0.271 17.20 18 0.288 10.39 19 0.305 13.97 20 0.322 16.48 21 0.339 14.75 22 0.356 20.80 23 0.373 25.79 24 0.390 31.25 25 0.407 39.89 26 0.423 40.04 27 0.440 40.05 28 0.457 40.05 29 0.474 40.05 30 0.491 40.05 31 0.508 40.06 32 0.525 40.07 33 0.542 32.23 34 0.559 23.90 35 0.576 17.86 36 0.592 11.63 37 0.609 12.78 38 0.626 13.12 39 0.643 10.93 40 0.660 10.63 41 0.677 10.82 42 0.694 11.84 43 0.711 20.44 44 0.728 27.33 45 0.745 34.22 46 0.762 41.55 47 0.779 41.55 48 0.796 41.55 49 0.813 41.53 50 0.830 41.53 51 0.847 41.52 52 0.864 41.52 53 0.880 41.53 54 0.897 41.53 55 0.914 33.07 56 0.931 25.12 57 0.948 19.25 58 0.965 11.30 59 0.982 12.48 60 0.999 13.85 61 1.016 13.62 62 1.033 12.62 63 1.050 19.39 64 1.067 25.48 65 1.084 31.06 66 1.101 39.49 67 1.118 39.48 68 1.135 39.46 69 1.152 39.45 70 1.169 39.43 71 1.185 39.42 72 1.202 39.42 73 1.219 39.41 74 1.236 39.41 75 1.253 37.39 76 1.270 29.03 77 1.287 20.61 78 1.304 14.07 79 1.321 9.12 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
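A minimal sketch of the two points made in this reply: if/else is an expression that returns a value, and assigning it to a single name inside a loop overwrites the result on every pass. The vector below holds only the first eight Phase readings from the post, and the names phase, val and newphase are illustrative assumptions, not the poster's eventual solution.

```r
# Illustrative values only: the first eight Phase readings from the post.
phase <- c(15.18, 13.42, 11.40, 18.31, 25.23, 33.92, 42.86, 42.87)

# if/else is an expression, so its value can be assigned directly.
val <- if (abs(phase[4] - phase[3]) > 6) phase[4] else phase[3]

# Keep one result per iteration in a preallocated vector, and stop at
# n - 1 so that the index i + 1 never runs past the end of the data.
n <- length(phase)
newphase <- numeric(n - 1)
for (i in seq_len(n - 1)) {
  newphase[i] <- if (abs(phase[i + 1] - phase[i]) > 6) phase[i + 1] else phase[i]
}

print(val)
print(newphase)
```

Stopping the loop at n - 1 is one possible choice; what should happen to the last observation is left open in the thread.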
Re: [R] Problem applying McNemar's - Different values in SPSS and R
On Dec 28, 2010, at 11:47 AM, Johannes Huesing wrote:

> Marc Schwartz [Tue, Dec 28, 2010 at 06:30:59PM CET]:
>>
>> On Dec 28, 2010, at 11:05 AM, Manoj Aravind wrote:
>>
>>> Hi friends,
>>> I get different values for McNemar's test in R and SPSS. Which one
>>> should i rely on when the p values differ.
>
> [...]
>>
>> The SPSS test appears to be an exact test, whereas the default R
>> function does not perform an exact test, so you are not comparing
>> Apples to Apples...
>
> Indeed, binom.test(11, 14) renders the same p-value as SPSS, whereas
> mcnemar.test() uses the approximation (|a_12 - a_21| - 1)²/(a_21 + a_12)
> with the "-1" removed if correct=FALSE.
>
> An old question of mine: Is there any reason not to use binom.test()
> other than historical reasons?

I may be missing the context of your question, but I frequently see exact
binomial tests used when one is comparing the presumptively known
probability of some dichotomous characteristic with the probability
observed in an independent sample. For example, in single arm studies
where one is comparing an observed event rate against a point estimate
for a presumptive historical control.

I also see exact binomial (Clopper-Pearson) confidence intervals used
when one wants conservative CIs, given that their actual coverage is at
least as large as the nominal level requested. That is, the coverage of
95% exact CIs will be at least 95%, but in reality can be well above
that, depending upon various factors. This is well documented in various
papers.

I generally tend to use Wilson CIs for binomial proportions when
reporting analyses. I have my own code but these are implemented in
various R functions, including Frank's binconf() in Hmisc.

HTH,

Marc
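To make the comparison of interval types concrete, here is a small sketch using base R only. It assumes illustrative counts of 11 successes in 14 trials (borrowed from the discordant pairs in this thread purely as an example). binom.test() reports the Clopper-Pearson interval, and prop.test() without continuity correction reports the Wilson score interval; the hand calculation is included only to show which formula is meant.

```r
# Illustrative counts only (an assumption for this sketch).
x <- 11
n <- 14

# Exact (Clopper-Pearson) interval from binom.test().
cp <- binom.test(x, n)$conf.int

# Wilson score interval: prop.test() without continuity correction.
wilson <- prop.test(x, n, correct = FALSE)$conf.int

# The same Wilson interval computed by hand.
z      <- qnorm(0.975)
p_hat  <- x / n
centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson_by_hand <- c(centre - half, centre + half)

print(cp)
print(wilson)
print(wilson_by_hand)
```

For these counts the exact interval is the wider of the two, which is in line with the remark about conservative coverage.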
Re: [R] Error in combined for() and if() code
On 28.12.2010 19:08, Nathan Miller wrote:
> Hello,
>
> I am trying to filter a data set like below so that the peaks in the
> Phase value are more obvious and can be identified by a peak finding
> function following the useful advice of Carl Witthoft. I have written
> the following
>
> for(i in length(data$Phase)){

Nonsense: In this case the loop will only run once for
i=length(data$Phase)

you probably want

for(i in seq_along(data$Phase)){

> newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){

Nonsense:
1. if()... won't return any useful result.
2. i+1 is not within your data

Uwe Ligges

> data$Phase[i+1]
> }else{data$Phase[i]
> }
> }
>
> I get the following error which I have not seen before when I paste
> the code into R
>
> Error in if (abs(data$Phase[i + 1] - data$Phase[i])> 6) { :
>   missing value where TRUE/FALSE needed
>
> I don't have much experience with such loops as I have tried to avoid
> using them in the past. Can anyone identify the error(s) in the code I
> have written or a simpler means of writing such a filter?
>
> Thank you,
> Nate
>
> [...]
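The difference between the two loop headers discussed above can be checked directly. This is a small sketch on a made-up three-element vector; the names x, visited_len and visited_seq are assumptions for the illustration.

```r
x <- c(10, 20, 30)   # made-up data

# length(x) is the single number 3, so this loop body runs exactly once.
visited_len <- integer(0)
for (i in length(x)) visited_len <- c(visited_len, i)

# seq_along(x) is 1, 2, 3, so this loop visits every element.
visited_seq <- integer(0)
for (i in seq_along(x)) visited_seq <- c(visited_seq, i)

print(visited_len)
print(visited_seq)
```

seq_along() also behaves sensibly for an empty vector, where 1:length(x) would count from 1 down to 0.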
Re: [R] Error in combined for() and if() code
On 28/12/2010 1:08 PM, Nathan Miller wrote:
> Hello,
>
> I am trying to filter a data set like below so that the peaks in the
> Phase value are more obvious and can be identified by a peak finding
> function following the useful advice of Carl Witthoft. I have written
> the following
>
> for(i in length(data$Phase)){
> newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){

When i is at its maximum, i+1 will be beyond the length of data$Phase, so
you shouldn't use it as an index.

Duncan Murdoch

> data$Phase[i+1]
> }else{data$Phase[i]
> }
> }
>
> I get the following error which I have not seen before when I paste
> the code into R
>
> Error in if (abs(data$Phase[i + 1] - data$Phase[i])> 6) { :
>   missing value where TRUE/FALSE needed
>
> I don't have much experience with such loops as I have tried to avoid
> using them in the past. Can anyone identify the error(s) in the code I
> have written or a simpler means of writing such a filter?
>
> Thank you,
> Nate
>
> [...]
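The link between the out-of-range index and the reported error can be reproduced on a tiny made-up vector. In this sketch (names are illustrative assumptions), indexing one past the end yields NA, and an NA condition is what if() refuses to evaluate; tryCatch() is used only so the script keeps running.

```r
x <- c(1.5, 2.5, 3.5)   # made-up data

# Indexing one position past the end does not fail; it returns NA.
past_end <- x[length(x) + 1]

# Any comparison involving NA is NA, and if() cannot branch on NA.
msg <- tryCatch(
  if (abs(past_end - x[3]) > 6) "big" else "small",
  error = function(e) conditionMessage(e)
)

print(past_end)
print(msg)
```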
[R] Error in combined for() and if() code
Hello,

I am trying to filter a data set like below so that the peaks in the Phase
value are more obvious and can be identified by a peak finding function
following the useful advice of Carl Witthoft. I have written the following

for(i in length(data$Phase)){
newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){
data$Phase[i+1]
}else{data$Phase[i]
}
}

I get the following error which I have not seen before when I paste the
code into R

Error in if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) { :
  missing value where TRUE/FALSE needed

I don't have much experience with such loops as I have tried to avoid
using them in the past. Can anyone identify the error(s) in the code I
have written or a simpler means of writing such a filter?

Thank you,
Nate

data=
    Time Phase
1  0.000 15.18
2  0.017 13.42
3  0.034 11.40
4  0.051 18.31
5  0.068 25.23
6  0.085 33.92
7  0.102 42.86
8  0.119 42.87
9  0.136 42.88
10 0.153 42.88
11 0.170 42.87
12 0.186 42.88
13 0.203 42.88
14 0.220 42.78
15 0.237 33.50
16 0.254 24.81
17 0.271 17.20
18 0.288 10.39
19 0.305 13.97
20 0.322 16.48
21 0.339 14.75
22 0.356 20.80
23 0.373 25.79
24 0.390 31.25
25 0.407 39.89
26 0.423 40.04
27 0.440 40.05
28 0.457 40.05
29 0.474 40.05
30 0.491 40.05
31 0.508 40.06
32 0.525 40.07
33 0.542 32.23
34 0.559 23.90
35 0.576 17.86
36 0.592 11.63
37 0.609 12.78
38 0.626 13.12
39 0.643 10.93
40 0.660 10.63
41 0.677 10.82
42 0.694 11.84
43 0.711 20.44
44 0.728 27.33
45 0.745 34.22
46 0.762 41.55
47 0.779 41.55
48 0.796 41.55
49 0.813 41.53
50 0.830 41.53
51 0.847 41.52
52 0.864 41.52
53 0.880 41.53
54 0.897 41.53
55 0.914 33.07
56 0.931 25.12
57 0.948 19.25
58 0.965 11.30
59 0.982 12.48
60 0.999 13.85
61 1.016 13.62
62 1.033 12.62
63 1.050 19.39
64 1.067 25.48
65 1.084 31.06
66 1.101 39.49
67 1.118 39.48
68 1.135 39.46
69 1.152 39.45
70 1.169 39.43
71 1.185 39.42
72 1.202 39.42
73 1.219 39.41
74 1.236 39.41
75 1.253 37.39
76 1.270 29.03
77 1.287 20.61
78 1.304 14.07
79 1.321  9.12
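Since the post also asks for a simpler way to write the filter, here is one possible loop-free sketch. It is an assumption about what the filter is meant to do (take the next value when the jump exceeds 6, otherwise keep the current one), applied to the first eight Phase readings only. Carrying the last reading over unchanged is also an assumption, since the post does not say how the final point should be treated.

```r
# Illustrative values only: the first eight Phase readings from the post.
phase <- c(15.18, 13.42, 11.40, 18.31, 25.23, 33.92, 42.86, 42.87)
n <- length(phase)

# TRUE where the step to the next reading is larger than 6.
jump <- abs(diff(phase)) > 6

# Vectorised choice between the next and the current reading;
# the final reading has no successor and is kept as it is (assumption).
newphase <- c(ifelse(jump, phase[-1], phase[-n]), phase[n])

print(newphase)
```

Because diff() and ifelse() work on whole vectors, no index ever reaches past the end of the data, so the NA condition from the loop version cannot arise.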
Re: [R] Faster way to do it??...using apply?
I don't know if it's any faster, but it is also possible this way:

y <- ifelse(x == 1, round(runif(x)), sign(x))

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly

r-help-boun...@r-project.org wrote on 12/28/2010 12:48:04 PM:

> Re: [R] Faster way to do it??...using apply?
>
> Henrique Dallazuanna to: M.Ribeiro, 12/28/2010 12:51 PM
> Cc: r-help
>
> Try this indeed
>
> replace(replace(x, x == 1, sample(0:1, sum(x == 1), rep = TRUE)), x == 2, 1)
>
> On Tue, Dec 28, 2010 at 3:14 PM, M.Ribeiro wrote:
>
> > Hi Henrique,
> > Thanks for the fast answer,
> > The only problem in your code, which I think I didn't mention in my
> > message is that I would like one different random sampling procedure
> > for each 1 in my vector
> >
> > [...]
Re: [R] Faster way to do it??...using apply?
Try this indeed

replace(replace(x, x == 1, sample(0:1, sum(x == 1), rep = TRUE)), x == 2, 1)

On Tue, Dec 28, 2010 at 3:14 PM, M.Ribeiro wrote:

> Hi Henrique,
> Thanks for the fast answer,
> The only problem in your code, which I think I didn't mention in my
> message is that I would like one different random sampling procedure
> for each 1 in my vector
>
> The way it was written, it samples only once and replace by every 1:
> > x = as.matrix(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
> > replace(replace(x, x == 1, sample(0:1, 1)), x == 2, 1)
>       [,1]
>  [1,]    1
>  [2,]    1
>  [3,]    1
>  [4,]    1
>  [5,]    1
>  [6,]    1
>  [7,]    1
>  [8,]    1
>  [9,]    1
> [10,]    1
> [11,]    1
> [12,]    1
> [13,]    1
> [14,]    1
> [15,]    1
>
> Thanks
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Faster-way-to-do-it-using-apply-tp3166161p3166203.html
> Sent from the R help mailing list archive at Nabble.com.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
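The difference between the two versions of the call comes down to how many random draws are made. The sketch below uses made-up vectors and an arbitrary seed (both assumptions): a single draw is recycled over every position, whereas asking sample() for sum(x == 1) values gives each position its own draw. The argument is spelled replace = TRUE here; rep = TRUE in the message relies on partial argument matching.

```r
set.seed(42)        # arbitrary seed, only so the sketch is repeatable
x <- rep(1, 15)     # made-up vector of fifteen 1s, as in the message

# One draw, recycled over all fifteen positions: every value is the same.
one_draw <- replace(x, x == 1, sample(0:1, 1))

# One independent draw per position.
per_elem <- replace(x, x == 1, sample(0:1, sum(x == 1), replace = TRUE))

# The full rule on a mixed vector: 0 stays 0, 2 becomes 1, 1 is randomised.
x2 <- c(0, 2, 1, 2, 0, 1, 2)
y2 <- replace(replace(x2, x2 == 1, sample(0:1, sum(x2 == 1), replace = TRUE)),
              x2 == 2, 1)

print(one_draw)
print(per_elem)
print(y2)
```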
Re: [R] Problem applying McNemar's - Different values in SPSS and R
Marc Schwartz [Tue, Dec 28, 2010 at 06:30:59PM CET]:
>
> On Dec 28, 2010, at 11:05 AM, Manoj Aravind wrote:
>
> > Hi friends,
> > I get different values for McNemar's test in R and SPSS. Which one
> > should i rely on when the p values differ.

[...]
>
> The SPSS test appears to be an exact test, whereas the default R
> function does not perform an exact test, so you are not comparing
> Apples to Apples...
>

Indeed, binom.test(11, 14) renders the same p-value as SPSS, whereas
mcnemar.test() uses the approximation (|a_12 - a_21| - 1)²/(a_21 + a_12)
with the "-1" removed if correct=FALSE.

An old question of mine: Is there any reason not to use binom.test()
other than historical reasons?

--
Johannes Hüsing
mailto:johan...@huesing.name
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale
returns of conjecture from such a trifling investment of fact.
(Mark Twain, "Life on the Mississippi")
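The two calculations contrasted in this message can be reproduced in base R from the table posted in the thread. This is a sketch; the variable names are assumptions made for the illustration.

```r
# The 2 x 2 table from the original post.
tab <- matrix(c(40, 11, 3, 7), nrow = 2,
              dimnames = list(TVS = c("ABN", "NE"), HSC = c("ABN", "NE")))

a12 <- tab[1, 2]   # discordant pairs: TVS ABN, HSC NE  (3)
a21 <- tab[2, 1]   # discordant pairs: TVS NE,  HSC ABN (11)

# Continuity-corrected chi-squared approximation used by mcnemar.test().
stat     <- (abs(a12 - a21) - 1)^2 / (a12 + a21)
p_approx <- pchisq(stat, df = 1, lower.tail = FALSE)

# Exact version: a two-sided binomial test on the discordant pairs.
p_exact <- binom.test(a21, a12 + a21)$p.value

# For comparison, the built-in test on the whole table.
p_builtin <- mcnemar.test(tab)$p.value

print(c(statistic = stat, approx = p_approx, exact = p_exact, builtin = p_builtin))
```

The statistic works out to 49/14 = 3.5, matching the R output quoted in the thread, and the exact p-value is 940/16384, which rounds to the .057 reported by SPSS.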
Re: [R] Jaccard dissimilarity matrix for PCA
This sounds like something I could use. I'm kind of new to R, meaning I'm
having some minor troubles all the time...

Say I have a range of binary (0/1) variables X1 to Xn, with missing data
for different cases. At the moment my data is a binary indicator matrix;
rows represent the i individuals or subjects, columns represent
presence (1) / absence (0) of various characteristics.

Actually I have 5 groups of variables (102 variables in total),
describing different aspects of the subjects I'm studying (people, i.e.
refugees):

O  -> O1 to O43
A  -> A1 to A38
R  -> R1 to R6
AP -> AP1 to AP8
PT -> PT1 to PT7

Can someone help me with the programming of a Jaccard matrix in prabclus
(or in any other package)? I think I'm having trouble defining the input
object for the function. I get error messages like:

'x' must be an array of at least two dimensions
ERROR: argument is not a matrix

Jacob

Christian Hennig wrote:
>
> jaccard in package prabclus computes a Jaccard matrix for you.
>

--
View this message in context:
http://r.789695.n4.nabble.com/Jaccard-dissimilarity-matrix-for-PCA-tp3165982p3166205.html
Sent from the R help mailing list archive at Nabble.com.
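The call that produced these errors is not shown, so the following is only a guess at the cause: a data frame or a single column was passed where a numeric matrix was expected. As a base R sketch that avoids any particular package, dist() with method = "binary" computes the Jaccard dissimilarity between the rows of a 0/1 matrix. The toy data frame and its column names are hypothetical.

```r
# Hypothetical toy data: three subjects, four binary characteristics.
dat <- data.frame(O1 = c(1, 1, 0),
                  O2 = c(1, 0, 0),
                  A1 = c(0, 1, 1),
                  A2 = c(0, 0, 1))

# Convert explicitly to a numeric matrix (subjects in rows) before use.
m <- as.matrix(dat)

# "binary" distance: among the columns where at least one of the two rows
# is 1, the proportion where only one of them is 1 (Jaccard dissimilarity).
d <- dist(m, method = "binary")

print(d)
```

According to the documentation of dist(), missing values are allowed and are left out of the computations for the rows in which they occur; whether that treatment suits this particular data set is a separate question.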
Re: [R] Faster way to do it??...using apply?
Hi Henrique,
Thanks for the fast answer,
The only problem in your code, which I think I didn't mention in my
message is that I would like one different random sampling procedure for
each 1 in my vector

The way it was written, it samples only once and replace by every 1:

> x = as.matrix(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
> replace(replace(x, x == 1, sample(0:1, 1)), x == 2, 1)
      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    1
 [8,]    1
 [9,]    1
[10,]    1
[11,]    1
[12,]    1
[13,]    1
[14,]    1
[15,]    1

Thanks

--
View this message in context:
http://r.789695.n4.nabble.com/Faster-way-to-do-it-using-apply-tp3166161p3166203.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Problem applying McNemar's - Different values in SPSS and R
On Dec 28, 2010, at 11:05 AM, Manoj Aravind wrote:

> Hi friends,
> I get different values for McNemar's test in R and SPSS. Which one
> should i rely on when the p values differ.
> I came across this problem when i started learning R and seriously give
> up on SPSS or any other proprietary software.
> Thank u in advance
>
> Output in SPSS follows
>
> *Crosstab*
>
>                         hsc
>                   ABN       NE      Total
> tvs    ABN  Count   40        3        43
>             Row %   93.0%     7.0%   100.0%
>             COL%    78.4%    30.0%    70.5%
>        NE   Count   11        7        18
>             Row %   61.1%    38.9%   100.0%
>             COL%    21.6%    70.0%    29.5%
> Total       Count   51       10        61
>             Row %   83.6%    16.4%   100.0%
>             COL%   100.0%   100.0%   100.0%
>
> *Chi-Square Tests*
>
>                     Value   Exact Sig. (2-sided)
> McNemar Test                .057(a)
> N of Valid Cases    61
> a  Binomial distribution used.
>
> Output from R is as follows
>
>> tvshsc<-
> + matrix(c(40,11,3,7),
> + nrow=2,
> + dimnames=list("TVS"=c("ABN","NE"),
> + "HSC"=c("ABN","NE")))
>> tvshsc
>      HSC
> TVS   ABN NE
>   ABN  40  3
>   NE   11  7
>> mcnemar.test(tvshsc)
>
>         McNemar's Chi-squared test with continuity correction
>
> data:  tvshsc
> McNemar's chi-squared = 3.5, df = 1, p-value = 0.06137
>
> Regards
>
> Dr. B Manoj Aravind

The SPSS test appears to be an exact test, whereas the default R function
does not perform an exact test, so you are not comparing Apples to
Apples...

Try this using the 'exact2x2' CRAN package:

> require(exact2x2)
Loading required package: exact2x2
Loading required package: exactci
> mcnemar.exact(matrix(c(40, 11, 3, 7), 2, 2))

        Exact McNemar test (with central confidence intervals)

data:  matrix(c(40, 11, 3, 7), 2, 2)
b = 3, c = 11, p-value = 0.05737
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.04885492 1.03241985
sample estimates:
odds ratio
 0.2727273

HTH,

Marc Schwartz
[R] Problem applying McNemar's - Different values in SPSS and R
Hi friends,
I get different values for McNemar's test in R and SPSS. Which one should
I rely on when the p values differ?
I came across this problem when I started learning R and seriously giving
up on SPSS or any other proprietary software.
Thank you in advance

Output in SPSS follows

*Crosstab*

                        hsc
                  ABN       NE      Total
tvs    ABN  Count   40        3        43
            Row %   93.0%     7.0%   100.0%
            COL%    78.4%    30.0%    70.5%
       NE   Count   11        7        18
            Row %   61.1%    38.9%   100.0%
            COL%    21.6%    70.0%    29.5%
Total       Count   51       10        61
            Row %   83.6%    16.4%   100.0%
            COL%   100.0%   100.0%   100.0%

*Chi-Square Tests*

                    Value   Exact Sig. (2-sided)
McNemar Test                .057(a)
N of Valid Cases    61
a  Binomial distribution used.

Output from R is as follows

> tvshsc<-
+ matrix(c(40,11,3,7),
+ nrow=2,
+ dimnames=list("TVS"=c("ABN","NE"),
+ "HSC"=c("ABN","NE")))
> tvshsc
     HSC
TVS   ABN NE
  ABN  40  3
  NE   11  7
> mcnemar.test(tvshsc)

        McNemar's Chi-squared test with continuity correction

data:  tvshsc
McNemar's chi-squared = 3.5, df = 1, p-value = 0.06137

Regards

Dr. B Manoj Aravind
Re: [R] Faster way to do it??...using apply?
Try this:

replace(replace(x, x == 1, sample(0:1, 1)), x == 2, 1)

On Tue, Dec 28, 2010 at 2:43 PM, M.Ribeiro wrote:

> Hi,
> I have a simple task, but I am looking for a clever and fast way to do
> it:
>
> I have a vector x with 0,1 or 2 and I want to create another vector y
> with the same length following the rules:
> If the element in x is equal to 0, the element in y is equal to 0
> If the element in x is equal to 2, the element in y is equal to 1
> If the element in x is equal to 1, the element in y is either 0 or 1
> (sample from c(0,1))
>
> thus the vector
> > x
>      [,1]
> [1,]    0
> [2,]    2
> [3,]    1
> [4,]    2
> [5,]    0
> [6,]    1
> [7,]    2
>
> could produce the vector y (this is one of the possibilities since
> y|x=1 is either 0 or 1
>
> > y
>      [,1]
> [1,]    0
> [2,]    1
> [3,]    1
> [4,]    1
> [5,]    0
> [6,]    0
> [7,]    1
>
> I know how to do this using for loops but I was wondering if you guys
> could suggest a better way
> Thanks
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Faster-way-to-do-it-using-apply-tp3166161p3166161.html
> Sent from the R help mailing list archive at Nabble.com.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
[R] Faster way to do it??...using apply?
Hi,
I have a simple task, but I am looking for a clever and fast way to do it:

I have a vector x with 0, 1 or 2 and I want to create another vector y
with the same length following the rules:
If the element in x is equal to 0, the element in y is equal to 0
If the element in x is equal to 2, the element in y is equal to 1
If the element in x is equal to 1, the element in y is either 0 or 1
(sample from c(0,1))

thus the vector

> x
     [,1]
[1,]    0
[2,]    2
[3,]    1
[4,]    2
[5,]    0
[6,]    1
[7,]    2

could produce the vector y (this is one of the possibilities, since
y|x=1 is either 0 or 1)

> y
     [,1]
[1,]    0
[2,]    1
[3,]    1
[4,]    1
[5,]    0
[6,]    0
[7,]    1

I know how to do this using for loops but I was wondering if you guys
could suggest a better way
Thanks

--
View this message in context:
http://r.789695.n4.nabble.com/Faster-way-to-do-it-using-apply-tp3166161p3166161.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] batch file output
On Dec 28, 2010, at 10:38 AM, Mikkel Grum wrote:

> Thanks. The way I run it, I can determine what version of R to run with
> which script. Don't know how to do that with R CMD BATCH.

Seems as though something like this (using absolute path to the instance
of R.exe) should work:

C:\R\R-2.12.1\bin\R CMD BATCH [options] infile [outfile]

At least if I remember my command line Windows conventions ... it's been
a few years.

--
David.

> Placing options(echo = FALSE) in the infile solves my problem. I got
> that from the page you linked to.
>
> Mikkel
>
> --- On Tue, 12/28/10, David Winsemius wrote:
>
>> From: David Winsemius
>> Subject: Re: [R] batch file output
>> To: "David Winsemius"
>> Cc: "Mikkel Grum" , r-help@r-project.org
>> Date: Tuesday, December 28, 2010, 8:30 AM
>>
>> On Dec 28, 2010, at 8:27 AM, David Winsemius wrote:
>>
>>> On Dec 28, 2010, at 8:09 AM, Mikkel Grum wrote:
>>>
>>>> I run a batch file with the following command in Windows XP:
>>>>
>>>> C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore C:\users\me\file.out 2>&1
>>>>
>>>> Is there any way to get only the output of R in file.out, without
>>>> getting all the code from file.R too?
>>>
>>> Put a sink(file="C:\users\me\file2.out")
>>
>> Would probably work better to use forward slashes.
>>
>>> in the file.R would be one way but your general strategy looks a bit
>>> strange. One does not generally use the interactive version of R for
>>> batch execution. See:
>>>
>>> http://stat.ethz.ch/R-manual/R-patched/library/utils/html/BATCH.html
>>>
>>> --
>>> David Winsemius, MD
>>> West Hartford, CT

David Winsemius, MD
West Hartford, CT
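A short sketch of the sink() suggestion, written so that it runs anywhere. The output location under tempdir() and the name out_file are assumptions for the illustration; on Windows the path from the thread would be written "C:/users/me/file2.out" (or with doubled backslashes), because in an R string a backslash followed by u starts a Unicode escape and "C:\users" is rejected by the parser.

```r
# Hypothetical output location; tempdir() is used only to keep the
# sketch portable.
out_file <- file.path(tempdir(), "file2.out")

# options(echo = FALSE)   # the setting reported in the thread to stop the
#                         # commands being echoed when a script is fed to Rterm

sink(out_file)                  # divert printed output to the file
print(summary(c(1, 2, 3, 4)))
sink()                          # restore output to the console

captured <- readLines(out_file)
print(captured)
```

Only what the script prints ends up in the file; the commands themselves are not written to it.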
Re: [R] Jaccard dissimilarity matrix for PCA
jaccard in package prabclus computes a Jaccard matrix for you.

By the way, if you want to do hierarchical clustering, it doesn't seem to
be a good idea to me to run PCA first. Why not cluster the dissimilarity
matrix directly without information loss by PCA? (I should not make too
general statements on this because generally how to cluster data always
depends on the aim of clustering, the cluster concept you are interested
in etc.)

prabclus also contains clustering methods for such data; have a look at
the functions prabclust and hprabclust (however, they are documented as
functions for clustering species distribution ranges, so if your
application is different, you may have to think about whether and how to
adapt them).

Hope this helps,
Christian

On Tue, 28 Dec 2010, Flabbergaster wrote:

> Hi
> I have a large dataset, containing a wide range of binary variables.
> I would like first of all to compute a jaccard matrix, then do a PCA on
> this matrix, so that I finally can do a hierarchical clustering on the
> principal components.
> My problem is, that I don't know how to compute the jaccard
> dissimilarity matrix in R? Which package to use, and so on...
> Can anybody help me?
> Alternatively I'm search for another way to explore the clusters
> present in my data.
> Another problem is, that I have cases with missing values on different
> variables.
>
> Jacob
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Jaccard-dissimilarity-matrix-for-PCA-tp3165982p3165982.html
> Sent from the R help mailing list archive at Nabble.com.

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
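The suggestion to cluster the dissimilarity matrix directly, without a PCA step, might look like the sketch below. It uses base R only (dist() with method = "binary" for the Jaccard dissimilarity, then hclust()) rather than the prabclus functions named above. The toy matrix, the average-linkage method and the choice of two clusters are all assumptions made for the illustration.

```r
# Hypothetical toy data: four subjects (rows), four binary variables.
m <- rbind(s1 = c(1, 1, 0, 0),
           s2 = c(1, 1, 1, 0),
           s3 = c(0, 0, 1, 1),
           s4 = c(0, 0, 0, 1))

# Jaccard dissimilarity between subjects.
d <- dist(m, method = "binary")

# Hierarchical clustering straight from the dissimilarities.
hc <- hclust(d, method = "average")

# Cut the tree into two groups (an arbitrary choice for this sketch).
groups <- cutree(hc, k = 2)

print(round(as.matrix(d), 3))
print(groups)
```

Which linkage method and how many clusters are appropriate depends on the aim of the clustering, as the message itself points out.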
Re: [R] batch file output
Thanks. The way I run it, I can determine what version of R to run with
which script. Don't know how to do that with R CMD BATCH.

Placing options(echo = FALSE) in the infile solves my problem. I got that
from the page you linked to.

Mikkel

--- On Tue, 12/28/10, David Winsemius wrote:

> From: David Winsemius
> Subject: Re: [R] batch file output
> To: "David Winsemius"
> Cc: "Mikkel Grum" , r-help@r-project.org
> Date: Tuesday, December 28, 2010, 8:30 AM
>
> On Dec 28, 2010, at 8:27 AM, David Winsemius wrote:
>
> > On Dec 28, 2010, at 8:09 AM, Mikkel Grum wrote:
> >
> >> I run a batch file with the following command in Windows XP:
> >>
> >> C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore C:\users\me\file.out 2>&1
> >>
> >> Is there any way to get only the output of R in file.out, without
> >> getting all the code from file.R too?
> >
> > Put a sink(file="C:\users\me\file2.out")
>
> Would probably work better to use forward slashes.
>
> > in the file.R would be one way but your general strategy looks a bit
> > strange. One does not generally use the interactive version of R for
> > batch execution. See:
> >
> > http://stat.ethz.ch/R-manual/R-patched/library/utils/html/BATCH.html
> >
> > --
>
> David Winsemius, MD
> West Hartford, CT
Re: [R] Jaccard dissimilarity matrix for PCA
Jacob,

You might have a look at the vegan package. It might compute the Jaccard
distance and it might have some other tools that you might be interested
in.

Dave

From: Flabbergaster
To: r-help@r-project.org
Date: 12/28/2010 08:26 AM
Subject: [R] Jaccard dissimilarity matrix for PCA
Sent by: r-help-boun...@r-project.org

> Hi
> I have a large dataset, containing a wide range of binary variables.
> I would like first of all to compute a jaccard matrix, then do a PCA on
> this matrix, so that I finally can do a hierarchical clustering on the
> principal components.
> My problem is, that I don't know how to compute the jaccard
> dissimilarity matrix in R? Which package to use, and so on...
> Can anybody help me?
> Alternatively I'm search for another way to explore the clusters
> present in my data.
> Another problem is, that I have cases with missing values on different
> variables.
>
> Jacob
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Jaccard-dissimilarity-matrix-for-PCA-tp3165982p3165982.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] Jaccard dissimilarity matrix for PCA
Flabbergaster gmail.com> writes: > My problem is, that I don't know how to compute the jaccard dissimilarity > matrix in R? Which package to use, and so on... http://rss.acs.unt.edu/Rdoc/library/arules/html/dissimilarity.html http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/vegdist.html
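For what it's worth, the workflow Jacob describes can be sketched in base R alone. This is only an illustration under assumptions: the toy matrix `x` is made up, `dist(..., method = "binary")` is used as the Jaccard dissimilarity for 0/1 data, and classical scaling (`cmdscale`) stands in for "PCA on a dissimilarity matrix". With the vegan package linked above, `vegdist(x, method = "jaccard", binary = TRUE)` should be the corresponding call.

```r
# Toy binary data: 6 cases, 5 yes/no variables (made-up values)
x <- matrix(c(1, 1, 0, 0, 1,
              1, 1, 0, 0, 0,
              0, 0, 1, 1, 0,
              0, 0, 1, 1, 1,
              1, 0, 0, 0, 1,
              0, 1, 1, 0, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(paste0("case", 1:6), paste0("v", 1:5)))

# Jaccard dissimilarity: among variables where at least one case is 1,
# the proportion where only one of the two cases is 1
d <- dist(x, method = "binary")

# Principal coordinates (classical scaling) of the dissimilarities
pco <- cmdscale(d, k = 2)

# Hierarchical clustering on the coordinates, cut into 2 groups
hc <- hclust(dist(pco), method = "average")
groups <- cutree(hc, k = 2)
groups
```

Clustering directly on `d` with `hclust(d)` is a simpler alternative if the principal-coordinates step is not needed. Missing values are a separate matter: `dist` drops the affected variables pair by pair, which may or may not be what is wanted.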
Re: [R] Reading sas7bdat files into R
Whoops - thought I was replying to google medstats instead of r-help. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Reading-sas7bdat-files-into-R-tp3165608p3166047.html Sent from the R help mailing list archive at Nabble.com.
[R] Jaccard dissimilarity matrix for PCA
Hi I have a large dataset, containing a wide range of binary variables. I would like first of all to compute a jaccard matrix, then do a PCA on this matrix, so that I finally can do a hierarchical clustering on the principal components. My problem is, that I don't know how to compute the jaccard dissimilarity matrix in R? Which package to use, and so on... Can anybody help me? Alternatively I'm searching for another way to explore the clusters present in my data. Another problem is, that I have cases with missing values on different variables. Jacob -- View this message in context: http://r.789695.n4.nabble.com/Jaccard-dissimilarity-matrix-for-PCA-tp3165982p3165982.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Gamma & Lognormal Model
At 20:08 27/12/2010, Louisa wrote: > Dear, I'm very new to R Gui and I have to make an assignment on Gamma Regressions. > Surfing on the web doesn't help me very much so i hope this forum may be a step forward. Well, since you are so honest about it being homework, try Googling for "lognormal gamma regression". The top hit from where I am sitting is an extensive set of notes with examples in R, although beware the use of _ for <- > The question sounds as follows: The data set is in the library MASS > first install library(MASS) > then type data(mammals) > attach(mammals) At this point you should complain that you are being taught poor practice, as it is nearly always better to use the data= parameter and not attach data frames. > Assignment: Fit the gamma model and lognormal model for the mammals data. > I appreciate any help you can provide. > Best Wishes, Louisa > -- View this message in context: http://r.789695.n4.nabble.com/Gamma-Lognormal-Model-tp3165408p3165408.html Sent from the R help mailing list archive at Nabble.com. Michael Dewey http://www.aghmed.fsnet.co.uk
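To illustrate the data= point only: a minimal sketch of fitting both kinds of model without attach(). The model formula is an assumption, since the assignment text does not say which variable is the response; brain weight modelled on log body weight is used here purely as an example.

```r
library(MASS)  # provides the mammals data frame (columns: body, brain)

# Gamma regression with a log link; the formula is an illustrative assumption
fit_gamma <- glm(brain ~ log(body), family = Gamma(link = "log"),
                 data = mammals)

# Lognormal model: least squares on the log of the response
fit_lnorm <- lm(log(brain) ~ log(body), data = mammals)

coef(fit_gamma)
coef(fit_lnorm)
```

Because both calls name the data frame explicitly, nothing is left on the search path afterwards, which is the practice recommended above.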
Re: [R] batch file output
On Tue, Dec 28, 2010 at 8:09 AM, Mikkel Grum wrote: > I run a batch file with the following command in Windows XP: > > C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore > C:\users\me\file.out 2>&1 > > Is there any way to get only the output of R in file.out, without getting all > the code from file.R too? > > Any help greatly appreciated, > Mikkel Try Rscript.exe in your R distribution. Also in the batchfiles distribution, http://batchfiles.googlecode.com, there is a file #Rscript.bat, that can be used to turn an R script into a Windows batch file. #Rscript without arguments gives instructions. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] batch file output
On Dec 28, 2010, at 8:27 AM, David Winsemius wrote: > On Dec 28, 2010, at 8:09 AM, Mikkel Grum wrote: >> I run a batch file with the following command in Windows XP: >> C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore \file.R> C:\users\me\file.out 2>&1 >> Is there any way to get only the output of R in file.out, without getting all the code from file.R too? > Put a sink(file="C:\users\me\file2.out") Would probably work better to use forward slashes. > in the file.R would be one way but your general strategy looks a bit strange. One does not generally use the interactive version of R for batch execution. See: > http://stat.ethz.ch/R-manual/R-patched/library/utils/html/BATCH.html -- David Winsemius, MD West Hartford, CT
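A small sketch of the sink() suggestion, for illustration only. The output location below is a hypothetical stand-in built with tempdir(); on the poster's machine it would be something like "C:/users/me/file2.out". Forward slashes matter because the backslash is an escape character in R strings, so "C:\users" fails to parse ("\u" is read as the start of a Unicode escape).

```r
# Sketch of what could go at the top of file.R.
# out_file is a hypothetical path chosen so the example runs anywhere.
out_file <- file.path(tempdir(), "file2.out")

sink(out_file)                   # divert printed output to the file
print(summary(c(2, 4, 6, 8)))    # only this result lands in out_file
sink()                           # restore output to the console

readLines(out_file)              # inspect what was written
```

Only printed results are diverted this way; the echoed commands still go wherever the batch command sends them, which is why Mikkel's options(echo = FALSE) is a reasonable complement.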
Re: [R] batch file output
On Dec 28, 2010, at 8:09 AM, Mikkel Grum wrote: > I run a batch file with the following command in Windows XP: > C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore \file.R> C:\users\me\file.out 2>&1 > Is there any way to get only the output of R in file.out, without getting all the code from file.R too? Put a sink(file="C:\users\me\file2.out") in the file.R would be one way but your general strategy looks a bit strange. One does not generally use the interactive version of R for batch execution. See: http://stat.ethz.ch/R-manual/R-patched/library/utils/html/BATCH.html -- David Winsemius, MD West Hartford, CT
Re: [R] Link prediction in social network with R
Dear Eu, On Wed, Dec 22, 2010 at 12:00 AM, EU JIN LOK wrote: > > Dear R users > > I'm a novice user of R and have absolutely no prior knowledge of social > network analysis, so apologies if my question is trivial. I've spent a lot of > time trying to solve this on my own but I really can't so hope someone here > can help me out. Cheers! > > The dataset: > I'm trying to predict the existence of links (True or False) in a test set > using a training set. Both data sets are in an "edgelist" format, where User > IDs represent nodes in both columns with the 1st column directing to the 2nd > column (see figure 1 below). Using the AUC to evaluate the performance, I am > looking for the best algorithm to predict the existence of links in the test > data (50% are true and rest are false). > > Figure 1: >> training > Vertices: 1133143 > Edges: 999 > Directed: TRUE > Edges: > > [0] 105 -> 850956 > [1] 105 -> 1073420 > [2] 105 -> 1102667 > [3] 165 -> 888346 > [4] 165 -> 579649 > [5] 165 -> 136665 > etc.. > > I'm having problems obtaining the probability scores for the links / edges as > most of the scores are for the nodes. An example of this is the graph.knn and > page.rank module in igraph. > > So my questions are: > 1) What do I need to do to obtain the scores for the links instead of the > nodes (I presume it must be a data preparation step that I must be missing > out)? In general, most people are interested in the nodes of the network, so most network indices are node level. If you want edge-level indices, you can create another graph from yours, by transforming the edges into vertices and vice-versa. Two vertices are connected in the new graph, if the corresponding two edges in the old graph share an incident vertex. However, I am sure that there are some vertex measures that don't make sense for edges at all, so you need to be careful with this, especially with the interpretation of the results. Another possibility is to use the few edge-level indices, e.g. edge betweenness, or just define analogous edge measures for the existing vertex measures. > 2) Which R package would be the best for running the various techniques - > Jaccard index, Adamic-Adar, common neighbours, PropFlow, etc The first three are implemented in igraph if I remember well. > 3) How to implement a supervised learning method such as random forest (I am > guessing I need to obtain a feature list but again, how can I get the scores > for the edges)? I am not an expert on this, but there are several R packages for supervised methods, random forests as well, look around on CRAN. I hope this helps, Best, Gabor > Hope I've explained my questions well but do let me know if more clarification > is needed. > > Thanks in advance > Eu Jin > [[alternative HTML version deleted]] -- Gabor Csardi UNIL DGM
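The two routes to edge-level scores mentioned above can be sketched roughly as follows. Assumptions, stated loudly: a recent igraph is installed (function names such as make_line_graph and edge_betweenness are the current spellings, not those of the 2010 release), and the edge list is made up. Note also that for a directed graph igraph's line graph links edge e1 to edge e2 when the head of e1 is the tail of e2, which is narrower than "share an incident vertex".

```r
library(igraph)  # assumes a recent igraph is installed

# Made-up directed edge list; IDs are characters so they act as vertex names
el <- data.frame(from = c("a", "a", "b", "c", "d"),
                 to   = c("b", "c", "c", "d", "a"))
g <- graph_from_data_frame(el, directed = TRUE)

# Route 1: an index that is already defined per edge
eb <- edge_betweenness(g)

# Route 2: turn edges into vertices, then reuse a vertex-level index
lg <- make_line_graph(g)          # one vertex per edge of g
pr_edges <- page_rank(lg)$vector  # PageRank of each original edge

# Pairwise vertex similarity, one possible feature for link prediction
jac <- similarity(g, method = "jaccard", mode = "all")
```

Scores like these could then be collected into a feature table, one row per candidate vertex pair, for whichever supervised learner is chosen; how well line-graph scores translate into link probabilities is an open question here, not something this sketch settles.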
[R] batch file output
I run a batch file with the following command in Windows XP: C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore C:\users\me\file.out 2>&1 Is there any way to get only the output of R in file.out, without getting all the code from file.R too? Any help greatly appreciated, Mikkel
Re: [R] Bayesian Belief Networks in R
On Thu, Dec 23, 2010 at 09:12:41AM -0500, Data Analytics Corp. wrote: > Hi, > > Does anyone know of a package for or any implementation of a Bayesian > Belief Network in R? Different types of graphical models in R including Bayesian networks are described in CRAN Task View gR http://cran.at.r-project.org/web/views/gR.html Petr Savicky.
[R] foreach + dopar: how to check progress of parallel computations?
Dear expeRts, I use foreach to do parallel computations. Is it possible to have some progress output written while the computations are done? In the minimal example below, I just print a number ("n") to check the progress. If you run this example with "%do%" instead of "%dopar%", then the computations are done sequentially and the number n is printed to the console. I am looking for something similar but with %dopar%. In the minimal example you can see that n is not written to the console if the computations are done in parallel. How [with which construction] can I check the progress? Cheers, Marius

## load packages
library(doSNOW)
library(Rmpi)
library(foreach)

## parameters
param.1 <- 1:2 # c("a1", "b1")
param.2 <- 1:4 # c("a2", "b2", "c2", "d2")

## setup cluster
cl <- makeCluster(mpi.universe.size(), type = "MPI")
registerDoSNOW(cl)

## main work
n <- 1
res <- foreach(p1 = param.1) %:% foreach(p2 = param.2) %dopar% {
    print(n)
    p1 * p2
    n <- n + 1
}

stopCluster(cl) # stop cluster
res # result