Re: [R] Defining multiple variables in a loop
Taylor White wrote:

Good day,

For lack of a better solution (or perhaps I am ignorant of something more elegant), I have been bootstrapping panel data by hand, so to speak, and I would like to know if there is a way to define multiple variables in a loop using the loop variable. I found a post (here: https://stat.ethz.ch/pipermail/r-help/2002-October/026305.html ) that discussed naming multiple variables, but it didn't seem to let me define the variables as something more useful. I tried the code immediately below (plus some variations) and it just didn't work.

for (i in 1:20) {
  assign(paste("country.", i, sep = "")) <- subset(OECDFiscal2, Country == i)
}

Leaving aside the question whether this is a good way of doing what you want, this is the wrong syntax. ?assign will show you the correct syntax:

assign(paste("country.", i, sep = ""), subset(OECDFiscal2, Country == i))

Berend

--
View this message in context: http://r.789695.n4.nabble.com/Defining-multiple-variables-in-a-loop-tp4634361p4634387.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
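A common alternative to assign() in a loop is to keep the per-country subsets in a single list via split(), which avoids cluttering the workspace with twenty numbered objects. A sketch, using a toy stand-in for the poster's OECDFiscal2 data frame (its real columns are unknown here):

```r
## toy stand-in for the poster's OECDFiscal2 data frame (hypothetical columns)
OECDFiscal2 <- data.frame(Country = rep(1:3, each = 4),
                          Deficit = rnorm(12))

## one list element per country, instead of country.1, country.2, ...
by.country <- split(OECDFiscal2, OECDFiscal2$Country)

nrow(by.country[["2"]])  # the rows for country 2
```

Individual pieces are then reached with by.country[["2"]] or iterated over with lapply(), which is usually easier to work with than twenty free-standing variables.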
[R] Loop for multiple plots in figure
Hello, I have longitudinal data of the form below from N subjects. I am trying to create a figure with N small subplots on a single page, in which each plot is from only one subject, and in each plot there is a separate curve for each value of param1. So in this case there would be four plots on the page (one each for Bob, Steve, Kevin and Dave), and each plot would have two separate curves (one for param1 = 1 and one for param1 = 0). The main title of each plot should be the subject name. I also need to sort the order of the plots on the page by param2. I can do this with a small number of subjects using manual commands. For a larger number I know that a 'for' loop is called for, but I can't figure out how to get each of the subjects to plot separately, and could not figure it out from the existing posts. For now I want to do this in the base graphics environment, though I know that lattice could also work (might try that later). Any help appreciated.

tC <- textConnection("
Subject Xvar Yvar param1 param2
bob 9 100 1 100
bob 0 250 1 200
steve 2 454 1 50
bob -5 271 0 35
bob 3 10 0 74
steve 1 500 1 365
kevin 5 490 1 546
bob 8 855 0 76
dave 2 233 0 343
steve -10 388 0 556
steve -7 284 1 388
dave 3 568 1 555
kevin 4 247 0 57
bob 6 300 1 600
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)
par(mfrow=c(2,2))

--
View this message in context: http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390.html
Re: [R] equality of values
Hello,

Just as a complement, if you want to see a solution, and why it works.

x <- 3/5
y <- 2/5
z <- 1/5

x - (y + z)  # not zero
x == y + z
all.equal(x, y + z)

x - y - z  # not zero
x - y == z
all.equal(x - y, z)

# To see why it works: equal if abs difference is less than the tolerance
equal <- function(x, y, tol = .Machine$double.eps) abs(x - y) < tol
equal(x, y + z)
equal(x - y, z)

Hope this helps,

Rui Barradas

Em 24-06-2012 23:14, Andreia Leite escreveu: Thanks a lot for your explanation! I see the point and I think I'll solve my question using a different method! Andreia Leite

On Sun, Jun 24, 2012 at 10:53 PM, Oliver Ruebenacker cur...@gmail.com wrote:

Hello, The FAQ answer 7.31 may be confusing to noobs and the referenced paper is long. The essence is:
(1) It's not an R issue; it is a floating-point (flop) arithmetic issue.
(2) The issue is that flop arithmetic is almost always only approximate.
(3) Therefore, asking for exact equality of two flops is almost always a mistake.
(4) In simple cases, you can take the difference of two flops and ask whether it is small enough.
(5) For extended calculations, you need to keep in mind that errors propagate and accumulate.
(6) Some ill-defined problems may lead to solutions that aren't solutions (e.g. trying to invert a singular matrix).
(7) Some well-defined problems may defy a straightforward approach (e.g. solving stiff differential equations).
Take care, Oliver

On Sun, Jun 24, 2012 at 5:29 PM, Sarah Goslee sarah.gos...@gmail.com wrote: Please read R FAQ 7.31, "Why doesn't R think these numbers are equal?" Sarah

On Sun, Jun 24, 2012 at 1:58 PM, Andreia Leite andreiaheitorle...@gmail.com wrote: Dear list, I'm trying to write a loop in which I choose a cell of a data frame by comparing it with another. The code actually works for the majority of the loop, but in some cases it doesn't and I can't figure out why.
When I tried to understand why, I noticed this:

dif[11]
[1] 118.8333
linhasUpdate[10,6]
[1] 118.8333
dif[11] == linhasUpdate[10,6]
[1] FALSE

Even though the values look the same, R says they aren't! This happens for some of the values (44), while for the others (128) it works fine. Does anybody know why this is happening, or a way to fix it? I'm using 2.15.0. Thanks a lot! Andreia Leite

--
Sarah Goslee
http://www.functionaldiversity.org

--
Oliver Ruebenacker, Bioinformatics and Network Analysis Consultant
President and Founder of Knowomics (http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Consultant at Predictive Medicine (http://predmed.com/people/oliverruebenacker.html)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
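In a loop like the poster's, the practical fix is to replace `==` with a tolerance-based comparison; a minimal sketch using the same 3/5, 2/5, 1/5 values from Rui's example:

```r
x <- 3/5; y <- 2/5; z <- 1/5

x - y == z                   # FALSE: exact comparison of doubles
isTRUE(all.equal(x - y, z))  # TRUE: comparison within a tolerance
abs((x - y) - z) < 1e-8      # TRUE: the same idea done by hand
```

Note the isTRUE() wrapper: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so it should not be used bare inside if().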
Re: [R] Loop for multiple plots in figure
Hi,

Here's one approach:

plot_one <- function(d){
  with(d, plot(Xvar, Yvar, type="n")) # set limits
  with(d[d$param1 == 0,], lines(Xvar, Yvar, lty=1)) # first line
  with(d[d$param1 == 1,], lines(Xvar, Yvar, lty=2)) # second line
}

par(mfrow=c(2,2))
plyr::d_ply(data, "Subject", plot_one)

HTH,
b.

On 24 June 2012 23:06, Marcel Curlin cemar...@u.washington.edu wrote:

Hello, I have longitudinal data of the form below from N subjects. I am trying to create a figure with N small subplots on a single page, in which each plot is from only one subject, and in each plot there is a separate curve for each value of param1. So in this case there would be four plots on the page (one each for Bob, Steve, Kevin and Dave), and each plot would have two separate curves (one for param1 = 1 and one for param1 = 0). The main title of each plot should be the subject name. I also need to sort the order of the plots on the page by param2. I can do this with a small number of subjects using manual commands. For a larger number I know that a 'for' loop is called for, but I can't figure out how to get each of the subjects to plot separately, and could not figure it out from the existing posts. For now I want to do this in the base graphics environment, though I know that lattice could also work (might try that later). Any help appreciated.

tC <- textConnection("
Subject Xvar Yvar param1 param2
bob 9 100 1 100
bob 0 250 1 200
steve 2 454 1 50
bob -5 271 0 35
bob 3 10 0 74
steve 1 500 1 365
kevin 5 490 1 546
bob 8 855 0 76
dave 2 233 0 343
steve -10 388 0 556
steve -7 284 1 388
dave 3 568 1 555
kevin 4 247 0 57
bob 6 300 1 600
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)
par(mfrow=c(2,2))

--
View this message in context: http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390.html
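Since the poster asked for a base-graphics version, here is one possible for-loop sketch of the same idea, including the param2 ordering (column names are taken from the posted data; the sorting rule, by each subject's smallest param2, is an assumption since the poster did not specify one):

```r
tC <- textConnection("
Subject Xvar Yvar param1 param2
bob 9 100 1 100
bob 0 250 1 200
steve 2 454 1 50
bob -5 271 0 35
bob 3 10 0 74
steve 1 500 1 365
kevin 5 490 1 546
bob 8 855 0 76
dave 2 233 0 343
steve -10 388 0 556
steve -7 284 1 388
dave 3 568 1 555
kevin 4 247 0 57
bob 6 300 1 600
")
data <- read.table(header = TRUE, tC); close(tC)

## order subjects by their smallest param2 value (one assumption of many possible)
ord <- names(sort(tapply(data$param2, data$Subject, min)))

par(mfrow = c(2, 2))
for (s in ord) {
  d <- data[data$Subject == s, ]
  plot(d$Xvar, d$Yvar, type = "n", main = s, xlab = "Xvar", ylab = "Yvar")
  for (p in 0:1) {                    # one curve per param1 value
    di <- d[d$param1 == p, ]
    di <- di[order(di$Xvar), ]        # connect points in x order
    lines(di$Xvar, di$Yvar, lty = p + 1)
  }
}
```

The outer loop draws one panel per subject (title from the Subject column); the inner loop overlays one line type per param1 level.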
[R] cannot use simple=TRUE in boot
I am doing boot with a large database, and thus want to use simple=TRUE to reduce the memory used. I have already set sim="ordinary" and stype="i", but I don't know how to set n=0. In fact, I don't know what n=0 means. For example:

a <- c(1:1000)
b <- c(2:1001)
c <- cbind(a,b)
library(boot)
table <- function(c, indices){
  d <- c[indices,]
  e <- mean(d[,1])
  return(e)
}
boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)

And R gives:

Error in statistic(data, original, ...) : unused argument(s) (simlpe = TRUE)

Thanks a lot!
Re: [R] axis in r plot
I have a graph plotted with an x axis from -50 to 250 and a y axis from -50 to 500. I need the x axis values (-50 to 250) with a spacing between two tick marks of 1 or 0.1. The graph should be wider to get enough resolution.

--
View this message in context: http://r.789695.n4.nabble.com/axis-in-r-plot-tp4634199p4634391.html
Re: [R] significance level (p) for t-value in package zelig
According to standard likelihood theory these are actually not t-values but z-values, i.e., they asymptotically follow a standard normal distribution under the null hypothesis. This means that you could use pnorm instead of pt to get the p-values, but an easier solution is probably to use the clm function (for Cumulative Link Models) from the ordinal package - there you get the p-values automatically.

Cheers,
Rune

On 23 June 2012 07:02, Bert Gunter gunter.ber...@gene.com wrote:

This advice is almost certainly false! A t-statistic can be calculated, but the distribution will not necessarily be Student's t, nor will the df be those of the rse. See, for example, rlm() in MASS, where values of the t-statistic are given without p values. If Brian Ripley says that p values cannot be straightforwardly calculated by pt(), then believe it!

-- Bert

On Fri, Jun 22, 2012 at 9:30 PM, Özgür Asar oa...@metu.edu.tr wrote:

Michael, Try ?pt

Best
Ozgur

--
View this message in context: http://r.789695.n4.nabble.com/significance-level-p-for-t-value-in-package-zelig-tp4634252p4634271.html

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

--
Rune Haubo Bojesen Christensen
PhD Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54
DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122, DK-2800 Kgs. Lyngby, Denmark
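For completeness: if one does treat the reported statistic as approximately standard normal (itself a debatable assumption, as the rest of this thread shows), the two-sided p-value would come from pnorm rather than pt. A sketch with a made-up statistic:

```r
z <- 2.1                 # hypothetical value/std.error ratio from a model summary
p <- 2 * pnorm(-abs(z))  # two-sided p-value under a N(0, 1) reference
p
```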
Re: [R] cannot use simple=TRUE in boot
Hello,

Your code works for me, unchanged:

boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)
boot

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = c, statistic = table, R = 100, sim = "ordinary", stype = "i", simple = TRUE)

Bootstrap Statistics :
    original    bias    std. error
t1*    500.5  0.76209    8.876232

But your error message shows a typo, 'simlpe', so check your typing. Also, 'c', 'table' and 'boot' are really bad names for data objects or functions; they are already R functions.

Hope this helps,

Rui Barradas

Em 25-06-2012 08:26, he chen escreveu:

I am doing boot with a large database, and thus want to use simple=TRUE to reduce the memory used. I have already set sim="ordinary" and stype="i", but I don't know how to set n=0. In fact, I don't know what n=0 means. For example:

a <- c(1:1000)
b <- c(2:1001)
c <- cbind(a,b)
library(boot)
table <- function(c, indices){
  d <- c[indices,]
  e <- mean(d[,1])
  return(e)
}
boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)

And R gives:

Error in statistic(data, original, ...) : unused argument(s) (simlpe = TRUE)

Thanks a lot!
Re: [R] significance level (p) for t-value in package zelig
On 25/06/2012 09:32, Rune Haubo wrote:

> According to standard likelihood theory these are actually not t-values, but z-values, i.e., they asymptotically follow a standard normal distribution under the null hypothesis.

Whose 'standard'? It is conventional to call a value of a t-like statistic (i.e. a ratio of the form value/standard error) a 't-value'. And that has nothing to do with 'likelihood theory' (t statistics predate the term 'likelihood'!). A separate issue is whether a t statistic is even approximately t-distributed (and if so, on what df?), and another is whether it is asymptotically normal. For the latter you have to say what you mean by 'asymptotic': we have lost a lot of the context, but as this does not appear to be IID univariate observations:

- 'standard likelihood theory' is unlikely to apply.
- standard asymptotics may well not be a good approximation (in regression modelling, people tend to fit more complex models to large datasets, which is often why a large dataset was collected).
- even for IID observations the derivation of the t distribution assumes normality.

The difference between a t distribution and a normal distribution is practically insignificant unless the df is small. And if the df is small, one can rarely rely on the CLT for approximate normality.

> This means that you could use pnorm instead of pt to get the p-values, but an easier solution is probably to use the clm function (for Cumulative Link Models) from the ordinal package - there you get the p-values automatically.
> Cheers, Rune

On 23 June 2012 07:02, Bert Gunter gunter.ber...@gene.com wrote: This advice is almost certainly false! A t-statistic can be calculated, but the distribution will not necessarily be Student's t, nor will the df be those of the rse. See, for example, rlm() in MASS, where values of the t-statistic are given without p values. If Brian Ripley says that p values cannot be straightforwardly calculated by pt(), then believe it!
-- Bert

On Fri, Jun 22, 2012 at 9:30 PM, Özgür Asar oa...@metu.edu.tr wrote: Michael, Try ?pt Best Ozgur

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] axis in r plot
Hi

> I have a graph plotted with an x axis from -50 to 250 and a y axis from -50 to 500. I need the x axis values (-50 to 250) with a spacing between two tick marks of 1 or 0.1. The graph should be wider to get enough resolution.

For the 0.1 case you would need a rather special display, as the screen would have to be at least 15000 pixels wide:

length(seq(-50, 250, .1))
[1] 3001

I assume 4-5 pixels between ticks as the minimum spacing. For a spacing of 1 it is slightly better; however, your screen resolution would still need to be somewhere near 2000 pixels:

length(seq(-50, 250, 1))
[1] 301

According to Wikipedia, a 27-30 inch HD display can probably cope with your request.

Regards
Petr
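Petr's pixel arithmetic can also be turned into a device size directly: count the ticks, multiply by a minimum pixel gap, and open a bitmap device that wide. A sketch (the 4-pixel gap is an assumption, as in Petr's estimate):

```r
ticks <- seq(-50, 250, by = 1)      # 301 tick positions at spacing 1
gap.px <- 4                         # assumed minimum pixels between ticks
width.px <- length(ticks) * gap.px  # pixels needed across the x axis

width.px                            # 1204: feasible on an HD display
## e.g. png("wide.png", width = width.px, height = 480), then plot with
## xaxt="n" and axis(1, at = ticks), and dev.off()
```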
[R] Replacing text with a carriage return
I have a comma-separated data file with no carriage returns, and what I'd like to do is:

1. Read the data as a block of text.
2. Search for the string that starts each record, "record_start,", and replace it with a carriage return. Replacing will do, or just adding a carriage return before it. The string is the same for each record, but it is enclosed in double quote marks in the file.
3. Write the results out as a csv file.

Let's say file.txt looked like this:

"record_start", data item 1, data item 2, "record_start", data item 3, data item 4

and I wanted:

,data item 1, data item 2,
data item 3, data item 4

text <- readLines("file.txt", encoding="UTF-8")
text <- gsub("record_start", "/n", text)
write.csv(text, "file2.csv")

This gives me /n in the text file, enclosed in the double quotes that were there in the file around record_start already. Even if the double quotes weren't there, though, I'm still not sure this would work. (BTW, I can live with the first incorrect comma in the output file because I can just remove it manually.) Can anyone suggest a solution?

Thank you,

Thomas Chesney

This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to me, and immediately delete it. Please do not use, copy or disclose the information contained in this message or in any attachment. Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham. This message has been checked for viruses but the contents of an attachment may still contain software viruses which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.
Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?
On 12-06-24 4:50 PM, Søren Højsgaard wrote:

Dear all, Indexing matrices from the Matrix package with [i,j] seems to be very slow. For example:

library(rbenchmark)
library(Matrix)
mm <- matrix(c(1,0,0,0,0,0,0,0), nr=20, nc=20)
MM <- as(mm, "Matrix")
lookup <- function(mat){
  for (i in 1:nrow(mat)){
    for (j in 1:ncol(mat)){
      mat[i,j]
    }
  }
}
benchmark(lookup(mm), lookup(MM),
          columns=c("test", "replications", "elapsed", "relative"),
          replications=50)
        test replications elapsed relative
1 lookup(mm)           50    0.01        1
2 lookup(MM)           50    8.77      877

I would have expected a small overhead when indexing a matrix from the Matrix package, but this result is really surprising... Does anybody know if there are faster alternatives to [i,j]?

There's also a large overhead when indexing a dataframe, though Matrix appears to be slower. It's designed to work on whole matrices at a time, not single entries. So I'd suggest that if you need to use [i,j] indexing, you try to arrange your code to localize the access, and extract a submatrix as a regular fast matrix first. (Or, if it will fit in memory, convert the whole thing to a matrix just for the access. If I just add the line mat <- as.matrix(mat) at the start of your lookup function, it becomes several hundred times faster.)
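Duncan's last suggestion as a self-contained sketch: convert the Matrix object to a plain matrix once, then do all the scalar indexing on that (checked here only for correctness, not timed; the data are a toy 20x20 pattern):

```r
library(Matrix)  # shipped with R as a recommended package

mm <- matrix(rep(c(1, 0), 200), nrow = 20, ncol = 20)
MM <- as(mm, "Matrix")            # Matrix-class version, slow at [i,j]

lookup <- function(mat) {
  mat <- as.matrix(mat)           # one-off conversion: [i,j] is now cheap
  s <- 0
  for (i in seq_len(nrow(mat)))
    for (j in seq_len(ncol(mat)))
      s <- s + mat[i, j]
  s
}

lookup(MM) == sum(mm)             # same result, far fewer Matrix method dispatches
```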
Re: [R] Boxplot with Log10 and base-exponent axis
Dear Mr Dunlap,

Your solution works really well. Thank you for your time.

Best wishes,
Luigi

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: 23 June 2012 18:48
To: Luigi
Cc: 'Martin Maechler'; r-help@r-project.org
Subject: RE: [R] Boxplot with Log10 and base-exponent axis

To change the y-limits you need to add ylim=c(min,max) to the call to boxplot. The limits there should be in the units of the y variable (not its log10). par("usr")[3:4] reports the y-limits on the log10 scale (they will be expanded a bit from your desired limits so the plot does not extend all the way to the edge).

boxplot(abs(tan(1:100)), log="y", ylim=c(.001, 1000))
par("usr")[3:4]
[1] -3.24  3.24
10 ^ par("usr")[3:4]
[1] 5.754399e-04 1.737801e+03

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: Luigi [mailto:marongiu.lu...@gmail.com]
Sent: Saturday, June 23, 2012 8:49 AM
To: William Dunlap
Cc: 'Martin Maechler'; r-help@r-project.org
Subject: RE: [R] Boxplot with Log10 and base-exponent axis

Thank you! This works well. I understand that the values are now on a log10 scale, although I did not understand what is happening at line 4 of your script. I would like to ask how I can change the y limits, since they now depend on par("usr"). What if I'd like y limits extending from 1000 to 1 000 000? I've tried to modify ylim (line 3) and ceiling (line 4) but I obtained errors.

Best wishes,
Luigi

### UPDATED EXAMPLE
# generating random numbers
x <- runif(100, min=0, max=10)
# plotting
boxplot(x, log="y", yaxt="n")
ylim <- par("usr")[3:4]
log10AtY <- seq(ceiling(ylim[1]), floor(ylim[2]))
axis(side=2, at=10^log10AtY,
     lab=as.expression(lapply(log10AtY, function(y) bquote(10^.(y)))))

# -- Your Response
The key is to supply an expression, not text, to the labels argument of axis. See help(plotmath) for details.
Here is an example:

1 x <- list(One=10^(sin(1:10)+5), Two=10^(cos(1:30)*2))
2 boxplot(x, log="y", yaxt="n")
3 ylim <- par("usr")[3:4]
4 log10AtY <- seq(ceiling(ylim[1]), floor(ylim[2]))
5 axis(side=2, at=10^log10AtY, lab=as.expression(lapply(log10AtY, function(y) bquote(10^.(y)))))

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Luigi
Sent: Friday, June 22, 2012 7:54 AM
To: r-help@r-project.org
Subject: [R] Boxplot with Log10 and base-exponent axis

Dear all, I would like to (i) produce boxplot graphs with the axis in logarithm base 10 and (ii) show the values on the axis in 10^exponent format rather than 10E+exponent. To illustrate with an example: I have some widely spread data that I plot using boxplot() [figure on the left]; with the log="y" option of boxplot() I obtained the logarithmic conversion of the data but the unfriendly notation baseE+exponent [figure in the centre]; if I log10 the data I obtain the desired plot, but the axis shows only the exponent [figure on the right]. Can anybody help?

Best regards
Luigi Marongiu, MSc

### EXAMPLE
# generating random numbers
x <- runif(100, min=0, max=10)
# create plot
par(mfrow = c(1,3))
# plotting on the left side
boxplot(x, xlab="Linear values")
# plotting in the centre
boxplot(x, log="y", xlab="y axis logged")
# creating log10 values and plotting on the right side
Log.base10.x <- log10(x)
boxplot(Log.base10.x, xlab="LOG10 of data")
Re: [R] Replacing text with a carriage return
Hello,

Instead of replacing "record_start" with newlines, you can split the string by it, and use gsub to make the result prettier.

x <- readLines("test.txt")
x
y <- gsub("\"", "", x) # remove the double quotes
y <- unlist(strsplit(y, "record_start,")) # do the split, with comma
# remove leading blanks and trailing blanks/commas
y <- gsub("^[[:blank:]]|[[:blank:]]$|,[[:blank:]]$", "", y)
# see the result and save it to file
y
writeLines(text=y, con="result.txt")

Hope this helps,

Rui Barradas

Em 25-06-2012 10:17, Thomas escreveu:

I have a comma-separated data file with no carriage returns, and what I'd like to do is: 1. Read the data as a block of text. 2. Search for the string that starts each record, "record_start,", and replace it with a carriage return. Replacing will do, or just adding a carriage return before it. The string is the same for each record, but it is enclosed in double quote marks in the file. 3. Write the results out as a csv file.

Let's say file.txt looked like this:

"record_start", data item 1, data item 2, "record_start", data item 3, data item 4

and I wanted:

,data item 1, data item 2,
data item 3, data item 4

text <- readLines("file.txt", encoding="UTF-8")
text <- gsub("record_start", "/n", text)
write.csv(text, "file2.csv")

This gives me /n in the text file, enclosed in the double quotes that were there in the file around record_start already. Even if the double quotes weren't there, though, I'm still not sure this would work. (BTW, I can live with the first incorrect comma in the output file because I can just remove it manually.) Can anyone suggest a solution?

Thank you,

Thomas Chesney
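For the question as literally asked, two small fixes also make the original approach work: the newline escape is "\n" (not "/n"), and writeLines() should replace write.csv() so that no extra quoting is added. A sketch with the file contents mocked inline (the exact quoting around the marker in the real file is an assumption):

```r
## mock of the one-line input file, with the quoted record marker
x <- '"record_start", data item 1, data item 2, "record_start", data item 3, data item 4'

y <- gsub('"record_start", ', "\n", x, fixed = TRUE)  # "\n", not "/n"
y <- sub("^\n", "", y)    # drop the blank line before the first record

cat(y, "\n")
## then: writeLines(y, "file2.csv")  # writeLines adds no quoting, unlike write.csv
```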
Re: [R] cannot use simple=TRUE in boot
Dear Rui,

Thanks. I checked the spelling in the code; there is nothing wrong (the word is misspelled only in the error message). After I downloaded the newest 2.15 version, the code works fine for me too. I guess the problem resulted from the old version (2.7.1) I was using.

2012/6/25 Rui Barradas ruipbarra...@sapo.pt:

Hello, Your code works for me, unchanged:

boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)
boot

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = c, statistic = table, R = 100, sim = "ordinary", stype = "i", simple = TRUE)

Bootstrap Statistics :
    original    bias    std. error
t1*    500.5  0.76209    8.876232

But your error message shows a typo, 'simlpe', so check your typing. Also, 'c', 'table' and 'boot' are really bad names for data objects or functions; they are already R functions.

Hope this helps,

Rui Barradas

--
Best Regards,
Yours, Chen He
PhD Candidate
Institute of Population Research
Peking University, Beijing, China
[R] Very simple question R5 setRefClass and Initialize
Hi,

I am new to this list, and glad I found R5, which is a great improvement to R for my work. Something seems odd, though: when I do a setRefClass, the initialize method executes at the class definition. I would like it to execute only when an instance is created. How can I do that? I found no way to test whether the code is executing when the class definition happens or when a $new instance is created. Thanks for your help. For example, my class definition looks like this:

mongoDbUser = setRefClass("mongoDbUser",
  fields = list(
    auth = "logical",
    host = "character",
    username = "character",
    password = "character",
    db = "character",
    connector = "mongo"
  ),
  methods = list(
    initialize = function(auth, host, username, password, db, connector){
      print("initialization")
    }
  )
)

If I execute this code, it says "initialization", but it shouldn't: I created no instance, right?? I would like "initialization" to appear only when I do:

mongoDbUser$new(...)
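For reference, the initialize pattern usually recommended for reference classes takes "..." and forwards to initFields()/callSuper() rather than naming every field. A sketch with a simplified class (the mongo connector field is omitted here, and whether this alone changes the definition-time behaviour the poster sees may depend on the R version):

```r
Account <- setRefClass("Account",
  fields = list(
    owner   = "character",
    balance = "numeric"
  ),
  methods = list(
    initialize = function(...) {
      message("initialization")   # expected only when $new() runs
      initFields(balance = 0)     # set field defaults first
      callSuper(...)              # then the usual field assignment from ...
    }
  )
)

a <- Account$new(owner = "ann")   # triggers initialize
c(a$owner, a$balance)
```

Taking "..." and calling callSuper(...) also keeps the class usable by the methods package itself, which may create objects with no arguments at all.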
[R] Fractional Factorial - Wrong values using lm-function
Hello. I'm a new user of R, and I have a question regarding the use of the aov and lm functions. I'm doing a fractional factorial experiment at our production site, and I need to familiarize myself with the analysis before I conduct the experiment. I've been working my way through the examples provided at http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm , but I can't get the results provided in the trial model calculations. Why is this? Here is how I have tried to do it:

data.catapult = read.table("Fractional.txt", header = TRUE) # Read the data in the table provided in the example.
data.catapult
   Distance    h  s b l  e
1     28.00 3.25  0 1 0 80
2     99.00 4.00 10 2 2 62
3    126.50 4.75 20 2 4 80
4    126.50 4.75  0 2 4 45
5     45.00 3.25 20 2 4 45
6     35.00 4.75  0 1 0 45
7     45.00 4.00 10 1 2 62
8     28.25 4.75 20 1 0 80
9     85.00 4.75  0 1 4 80
10     8.00 3.25 20 1 0 45
11    36.50 4.75 20 1 4 45
12    33.00 3.25  0 1 4 45
13    84.50 4.00 10 2 2 62
14    28.50 4.75 20 2 0 45
15    33.50 3.25  0 2 0 45
16    36.00 3.25 20 2 0 80
17    84.00 4.75  0 2 0 80
18    45.00 3.25 20 1 4 80
19    37.50 4.00 10 1 2 62
20   106.00 3.25  0 2 4 80

aov.catapult = aov(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e, data=data.catapult)
summary(aov.catapult)
            Df Sum Sq Mean Sq F value  Pr(>F)
h            1   2909    2909  15.854 0.01638 *
s            1   1964    1964  10.701 0.03076 *
b            1   7537    7537  41.072 0.00305 **
l            1   6490    6490  35.369 0.00401 **
e            1   2297    2297  12.518 0.02406 *
h:s          1    122     122   0.667 0.45998
h:b          1    345     345   1.878 0.24247
h:l          1    354     354   1.929 0.23724
h:e          1      0       0   0.001 0.97578
s:b          1    161     161   0.877 0.40199
s:l          1     20      20   0.107 0.75966
s:e          1    114     114   0.622 0.47427
b:l          1    926     926   5.049 0.08795 .
b:e          1    124     124   0.677 0.45689
l:e          1    158     158   0.860 0.40623
Residuals    4    734     184
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This seems just about right to me.
However, when I attempt to make the linear model, based on main factors and two-factor interactions, I get a completely different result:

lm.catapult = lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e, data=data.catapult)
summary(lm.catapult)

Call:
lm(formula = Distance ~ h + s + b + l + e + h * s + h * b + h * l + h * e + s * b + s * l + s * e + b * l + b * e + l * e, data = data.catapult)

Residuals:
      1       2       3       4       5       6       7       8       9      10
-0.8100 22.3875 -3.6763 -3.8925 -3.8925 -0.8576  7.0852 -0.8100 -0.8100 -0.8576
     11      12      13      14      15      16      17      18      19      20
-0.8576 -0.8576  7.8875 -3.8925 -3.8925 -3.6763 -3.6763 -0.8100 -0.4148 -3.6763

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.031042 100.791955   0.248   0.8161
h           -3.687500  22.466457  -0.164   0.8776
s            0.475446   2.446791   0.194   0.8554
b          -39.417973  44.906164  -0.878   0.4296
l          -18.938988  12.233954  -1.548   0.1965
e           -0.158449   1.230683  -0.129   0.9038
h:s         -0.368750   0.451546  -0.817   0.4600
h:b         12.375000   9.030925   1.370   0.2425
h:l          3.135417   2.257731   1.389   0.2372
h:e          0.008333   0.258026   0.032   0.9758
s:b         -0.634375   0.677319  -0.937   0.4020
s:l         -0.055469   0.169330  -0.328   0.7597
s:e          0.015268   0.019352   0.789   0.4743
b:l          7.609375   3.386597   2.247   0.0879 .
b:e          0.318397   0.387008   0.823   0.4569
l:e          0.089732   0.096760   0.927   0.4062
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.55 on 4 degrees of freedom
Multiple R-squared: 0.9697, Adjusted R-squared: 0.8563
F-statistic: 8.545 on 15 and 4 DF, p-value: 0.02559

This result is nothing like the results provided in the example. Why is this? Any help is very much appreciated. Regards, Ståle.

--
View this message in context: http://r.789695.n4.nabble.com/Fractional-Factorial-Wrong-values-using-lm-function-tp4634400.html Sent from the R help mailing list archive at Nabble.com.
[R] read.spss function
Hi dear all, I am trying to import a dataset from SPSS into R (R x64 2.15.0). I used both

read.spss("D:/Untitled2.sav", use.value.labels = TRUE, to.data.frame = TRUE, max.value.labels = Inf, trim.factor.names = FALSE, trim_values = TRUE, reencode = NA, use.missings = to.data.frame)

and

read.spss("D:\\Untitled2.sav", to.data.frame = TRUE, stringsAsFactors = FALSE)

but the result is:

Error: could not find function "read.spss"

What should I do to import the data set into R?
_
Best Regards
Niloofar.Javanrouh
MSc of BioStatistics

[[alternative HTML version deleted]]
[R] using multiple cpu's - scaling in processing power
Hi All. In the past I have worked with parallel processing in R where a function F is applied to the elements of a list L. The more cpu cores one has, the faster the process will run. At the time of launching the process for (F,L) I will have a certain fixed number of cpu's that I can use. I have tested this approach and it works fine (i.e. package 'multicore', using 'mapply'). But now I am encountering a slightly different situation. I have a task (F,L) that will run for a *long* time (a week) even if I have N cpu's processing it. N is the maximum possible number of cpus that I can use; however, they will not all be available when I start the process. So the problem is that, when I start the process, I may have only n1 < N cpu's at my disposal. After some time, I then have n1 < n2 < N cpu's at my disposal. After some more time, I have n2 < n3 < N cpu's, and finally, at one point, I will have N cpu's that I can work with. I scale in cpu power over the duration of the process. Why this is the case does not matter. Essentially I cannot control when new cpu's become available, nor how many of them will become available at that point. With this I cannot use the standard approach above, where all the cpu cores have to be available before I launch the process! It would help me if someone knew whether R offers a solution for this type of processing. But I would also be happy for pointers to non-R resources that could deal with this. Thanks Soren - http://censix.com

--
View this message in context: http://r.789695.n4.nabble.com/using-multiple-cpu-s-scaling-in-processing-power-tp4634405.html Sent from the R help mailing list archive at Nabble.com.
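[Editorial note: one workaround that stays inside R is to process L in chunks and re-check the worker count between chunks. This is a sketch, not the poster's setup: available_cores() is a hypothetical function you must supply, e.g. one that reads a file your scheduler updates as cpus come online; here it just queries the local machine.]

```r
library(parallel)

# Hypothetical: report how many cores may be used *right now*.
# Replace with whatever mechanism announces newly available cpus.
available_cores <- function() max(1L, detectCores() - 1L)

F <- function(x) x^2   # stand-in for the real long-running task
L <- as.list(1:100)

results <- vector("list", length(L))
done <- 0L
while (done < length(L)) {
  n <- available_cores()                       # may grow between chunks
  idx <- seq(done + 1L, min(done + 10L * n, length(L)))
  # mclapply forks on Unix; on Windows use parLapply with a fresh cluster
  results[idx] <- mclapply(L[idx], F, mc.cores = n)
  done <- idx[length(idx)]
}
```

The chunk size (here 10 tasks per core) trades scheduling overhead against how quickly newly arrived cpus get picked up.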
Re: [R] read.spss function
You have to load the 'foreign' package first.

Sent from my iPad

On Jun 25, 2012, at 5:12, niloo javan niloo_ja...@yahoo.com wrote:

Hi dear all, I am trying to import a dataset from SPSS into R (R x64 2.15.0). I used both read.spss("D:/Untitled2.sav", use.value.labels = TRUE, to.data.frame = TRUE, max.value.labels = Inf, trim.factor.names = FALSE, trim_values = TRUE, reencode = NA, use.missings = to.data.frame) and read.spss("D:\\Untitled2.sav", to.data.frame = TRUE, stringsAsFactors = FALSE) but the result is: Error: could not find function "read.spss" What should I do to import the data set into R?
Best Regards
Niloofar.Javanrouh
MSc of BioStatistics
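[Editorial note: in full, the fix is to attach the foreign package (shipped with R) before calling read.spss(). The path below is the original poster's and is only illustrative, so the call itself is left commented out:]

```r
library(foreign)     # provides read.spss()
exists("read.spss")  # TRUE once foreign is attached

# "D:/Untitled2.sav" is the original poster's file; substitute your own path:
# dat <- read.spss("D:/Untitled2.sav", use.value.labels = TRUE, to.data.frame = TRUE)
```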
[R] special simbol (±) in a legend
Dear researchers, I am looking for an expression for a special symbol (±) in a legend, and to add a bracket symbol in front of "Mean ± SD". Sorry if the example is not great:

disptest <- matrix(rnorm(10), nrow = 10)
disptest.means <- rowMeans(disptest)
plot(1:10, disptest.means)
dispersion(1:10, disptest.means, (disptest.means + disptest), (disptest.means - disptest), intervals = FALSE, arrow.cap = 0.001, arrow.gap = 0)
legend("topleft", legend = c("Mean", "Mean +/- SD"), pch = c(1, NA), bty = "n", cex = 1.3)

Thanks for all suggestions. Gianni

[[alternative HTML version deleted]]
Re: [R] Apply() on columns
I now know how to navigate through the table. But my question is: what kind of graphical and numerical summaries can I make, keeping in mind that I am interested in the average working hours per employee?

--
View this message in context: http://r.789695.n4.nabble.com/Apply-on-columns-tp4633468p4634411.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] special simbol (±) in a legend
The posting guide asked for 'at a minimum' information, which you have not provided (nor do we know what graphics device this is). But on many OSes and devices you can simply use the Unicode character "\u00b1". Or use plotmath (see ?plotmath).

On 25/06/2012 12:03, gianni lavaredo wrote:

Dear researchers, I am looking for an expression for a special symbol (±) in a legend, and to add a bracket symbol in front of "Mean ± SD". Sorry if the example is not great:

disptest <- matrix(rnorm(10), nrow = 10)
disptest.means <- rowMeans(disptest)
plot(1:10, disptest.means)
dispersion(1:10, disptest.means, (disptest.means + disptest), (disptest.means - disptest), intervals = FALSE, arrow.cap = 0.001, arrow.gap = 0)
legend("topleft", legend = c("Mean", "Mean +/- SD"), pch = c(1, NA), bty = "n", cex = 1.3)

Thanks for all suggestions. Gianni

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
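[Editorial note: both suggestions applied to the poster's legend call, as a sketch; the dispersion() call from the original is omitted so the snippet runs with base graphics alone:]

```r
disptest <- matrix(rnorm(10), nrow = 10)
disptest.means <- rowMeans(disptest)
plot(1:10, disptest.means)

# Unicode escape: "\u00b1" renders as the plus-minus sign
legend("topleft", legend = c("Mean", "Mean \u00b1 SD"),
       pch = c(1, NA), bty = "n", cex = 1.3)

# plotmath alternative: %+-% is the plus-minus operator
legend("bottomright", legend = c("Mean", expression(Mean %+-% SD)),
       pch = c(1, NA), bty = "n", cex = 1.3)
```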
Re: [R] Apply() on columns
I now know how to navigate through the table. But my question is, what kind of graphical and numerical summaries can I make, keeping in mind that I am interested in the average working hours per employee?

Rather vague question without data or code, so rather vague answer. For numerical values summarised according to some factor, see

?aggregate
?by
?tapply

For a graphical summary, maybe

?dotplot
?boxplot

or maybe you could try package ggplot2.

Regards
Petr

--
View this message in context: http://r.789695.n4.nabble.com/Apply-on-columns-tp4633468p4634411.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Apply() on columns
Hello, It's not that complicated.

f <- function(index, dat, offset = 5){
  x <- data.frame(Emp = dat[, index], Hours = dat[, index + offset])
  aggregate(Hours ~ Emp, data = x, sum)
}

sp <- read.table("supermarkt.txt")
lapply(1:5, f, sp)

Also, please provide context. (Quote the post you are referring to.)

Hope this helps, Rui Barradas

Em 25-06-2012 11:43, faelsendoorn escreveu: I now know how to navigate through the table. But my question is, what kind of graphical and numerical summaries can I make, keeping in mind that I am interested in the average working hours per employee?

--
View this message in context: http://r.789695.n4.nabble.com/Apply-on-columns-tp4633468p4634411.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Apply() on columns
Hello again. Sorry, I used 'sum' when it should be 'mean'.

Rui Barradas

Em 25-06-2012 12:49, Rui Barradas escreveu:

Hello, It's not that complicated.

f <- function(index, dat, offset = 5){
  x <- data.frame(Emp = dat[, index], Hours = dat[, index + offset])
  aggregate(Hours ~ Emp, data = x, sum)
}

sp <- read.table("supermarkt.txt")
lapply(1:5, f, sp)

Also, please provide context. (Quote the post you are referring to.)

Hope this helps, Rui Barradas

Em 25-06-2012 11:43, faelsendoorn escreveu: I now know how to navigate through the table. But my question is, what kind of graphical and numerical summaries can I make, keeping in mind that I am interested in the average working hours per employee?
[R] Arules - predict function issues - subscript out of bounds
Hi, I'm doing a Market Basket Analysis in which I have a list of transaction id's in column 2 and transactions (product names) in column 1. I read this into a transactions object using

txn <- read.transactions(file = "data.csv", format = 'single', rm.duplicates = FALSE, cols = c(1, 2))

If I want to use the apriori algorithm, everything seems to run fine. However, it is when I want to form clusters of these transactional patterns that I find problems. I formed clusters using the following code:

s <- sample(txn, 1000)
d <- dissimilarity(s, method = "Jaccard")
clustering <- pam(d, k = 5)

But when I try to predict this on the larger set, it keeps throwing a subscript out of bounds error:

Label <- predict(s[clustering$mediods], txn, method = "Jaccard")

Can anyone explain to me why this keeps happening?? I've tried this on other datasets like Groceries/Adult in the arules package and it seems to work fine!

Thanks, Ankur

[[alternative HTML version deleted]]
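[Editorial note: one detail worth checking — an assumption about the cause, since pam() stores its result under the component name `medoids`, not `mediods`. In R, a misspelled list component silently returns NULL rather than an error, and indexing with that NULL is exactly the kind of thing that surfaces later as a subscript error. A self-contained illustration with a hypothetical list:]

```r
# Misspelled component names return NULL rather than an error:
obj <- list(medoids = c(1, 3, 5))
obj$mediods             # NULL -- note the transposed letters; no warning
x <- 1:10
length(x[obj$mediods])  # selects nothing (length 0)
```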
Re: [R] using multiple cpu's - scaling in processing power
Hi Soren. Have you looked into using the condor scheduler (http://research.cs.wisc.edu/condor/)? I'm not aware of it linking up with multicore or other parallel processing code inside R, but I've used it to run multiple R processes on a variable number of processors, where N can both increase and decrease over time. hth

On Mon, Jun 25, 2012 at 5:11 AM, soren wilkening m...@censix.com wrote:

Hi All. In the past I have worked with parallel processing in R where a function F is applied to the elements of a list L. The more cpu cores one has, the faster the process will run. At the time of launching the process for (F,L) I will have a certain fixed number of cpu's that I can use. I have tested this approach and it works fine (i.e. package 'multicore', using 'mapply'). But now I am encountering a slightly different situation. I have a task (F,L) that will run for a *long* time (a week) even if I have N cpu's processing it. N is the maximum possible number of cpus that I can use; however, they will not all be available when I start the process. So the problem is that, when I start the process, I may have only n1 < N cpu's at my disposal. After some time, I then have n1 < n2 < N cpu's at my disposal. After some more time, I have n2 < n3 < N cpu's, and finally, at one point, I will have N cpu's that I can work with. I scale in cpu power over the duration of the process. Why this is the case does not matter. Essentially I cannot control when new cpu's become available, nor how many of them will become available at that point. With this I cannot use the standard approach above, where all the cpu cores have to be available before I launch the process! It would help me if someone knew whether R offers a solution for this type of processing. But I would also be happy for pointers to non-R resources that could deal with this.
Thanks Soren - http://censix.com

--
View this message in context: http://r.789695.n4.nabble.com/using-multiple-cpu-s-scaling-in-processing-power-tp4634405.html Sent from the R help mailing list archive at Nabble.com.

--
Drew Tyre
School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974
phone: +1 402 472 4054
fax: +1 402 472 2946
email: aty...@unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo
[R] do.call or something instead of for
Dear R users, I'd like to compute X like below:

X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

where the W_{i,t} are from Uniform(0,2) and X_{i,0} = 1 + 5*W_{i,0}.

Of course, I can do this with a for statement, but I don't think that's a good idea because i and t are too big. So my question is: is there any better way to avoid the for statement for this problem? Thank you in advance. Kathie

--
View this message in context: http://r.789695.n4.nabble.com/do-call-or-something-instead-of-for-tp4634421.html Sent from the R help mailing list archive at Nabble.com.
[R] tune.svm (e1071) takes forever
Hi, I am in the process of migrating my machine learning scripts from bash + libsvm to R, and for the past weeks I have been battling a substantial problem when using the tune.svm function from the e1071 package. The tune.svm function, which performs the grid search for the best cost and gamma parameters, takes a really long time compared to the grid search provided by the libsvm library — about 10 times longer. For a very large data set this is too slow. I would like to ask if there is any way to speed up the grid search. Here is the function call I use:

tune.obj <- tune.svm(formula, data = dataset, cost = 10^(0:2), gamma = 10^(-6:-4))

Ideally I would like to search over cost = 10^(0:3) and gamma = 10^(-6:-3); however, that takes even longer. Thanks, Aram.

--
View this message in context: http://r.789695.n4.nabble.com/tune-svm-e1071-takes-forever-tp4634415.html Sent from the R help mailing list archive at Nabble.com.
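[Editorial note: one common source of the slowdown is that e1071's tune() defaults to 10-fold cross-validation for every (cost, gamma) combination. A cheaper sampling scheme can be requested through tune.control(). This is a sketch on the built-in iris data, not the poster's dataset:]

```r
library(e1071)

# 3-fold CV instead of the default 10-fold cuts the number of svm() fits
# per parameter combination from 10 to 3; sampling = "fix" (a single
# train/validation split) would be cheaper still.
tune.obj <- tune.svm(Species ~ ., data = iris,
                     cost  = 10^(0:2),
                     gamma = 10^(-6:-4),
                     tunecontrol = tune.control(cross = 3))
tune.obj$best.parameters
```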
Re: [R] Very simple question R5 setRefClass and Initialize
Help, anyone? Is this impossible? I tried to use the initFields method, but it did not change anything.

Le 25/06/2012 10:55, Julien Cochennec a écrit :

Hi, New to this list, and glad I found R5, which is a great improvement to R for my work. Something seems odd, though: when I do a setRefClass, the initialize method executes at the class definition. I would like it to execute only when an instance is created. How can I do that? I found no way to test whether the code runs when the class definition happens or when a $new instance is created. Thanks for your help. For example, my class definition looks like this:

mongoDbUser <- setRefClass("mongoDbUser",
  fields = list(
    auth = "logical",
    host = "character",
    username = "character",
    password = "character",
    db = "character",
    connector = "mongo"
  ),
  methods = list(
    initialize = function(auth, host, username, password, db, connector){
      print("initialization")
    }
  )
)

If I execute this code, it prints "initialization", but it shouldn't — I created no instance, right?? I would like "initialization" to appear only when I do: mongoDbUser$new(...)
Re: [R] Fractional Factorial - Wrong values using lm-function
Hello. Thank you for the help. However, I'm not sure your reply answers my question. Let me rephrase: I'm trying to reproduce the values in the second table in the example at http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm. The table shows the summary of the linear model, and those are the values I'm trying to reproduce using the input in the example. When I use the lm function on the data, I get values completely different from those given in the example (I provided these values in my first post). Obviously I'm missing something — why can't I reproduce the values in the example using:

lm.catapult = lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e, data=data.catapult)
summary(lm.catapult)

? I hope this was clearer. Regards, Ståle Nordås

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: 25. juni 2012 13:50
To: Ståle Nordås
Cc: R help
Subject: Re: [R] Fractional Factorial - Wrong values using lm-function

Hi, You need to use:

anova(lm.catapult)

Analysis of Variance Table

Response: Distance
          Df Sum Sq Mean Sq F value   Pr(>F)
h          1 2909.3  2909.3 15.8538 0.016378 *
s          1 1963.6  1963.6 10.7005 0.030755 *
b          1 7536.9  7536.9 41.0720 0.003046 **
l          1 6490.3  6490.3 35.3687 0.004010 **
e          1 2297.0  2297.0 12.5177 0.024056 *
h:s        1  122.4   122.4  0.6669 0.459978
h:b        1  344.6   344.6  1.8777 0.242467
h:l        1  353.9   353.9  1.9286 0.237236
h:e        1    0.2     0.2  0.0010 0.975783
s:b        1  161.0   161.0  0.8772 0.401991
s:l        1   19.7    19.7  0.1073 0.759658
s:e        1  114.2   114.2  0.6225 0.474270
b:l        1  926.4   926.4  5.0486 0.087946 .
b:e        1  124.2   124.2  0.6769 0.456887
l:e        1  157.8   157.8  0.8600 0.406226
Residuals  4  734.0   183.5

# The summary result you got is the summary of the linear model, while the summary of aov is the anova summary.

A.K.

----- Original Message -----
From: Staleno s...@bergen-plastics.no
To: r-help@r-project.org
Cc:
Sent: Monday, June 25, 2012 5:26 AM
Subject: [R] Fractional Factorial - Wrong values using lm-function

Hello.
I'm a new user of R, and I have a question regarding the use of the aov and lm functions. I'm doing a fractional factorial experiment at our production site, and I need to familiarize myself with the analysis before I conduct the experiment. I've been working my way through the examples provided at http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm , but I can't get the results provided in the trial model calculations. Why is this? Here is how I have tried to do it:

data.catapult = read.table("Fractional.txt", header = T) # Read the data in the table provided in the example.
data.catapult
   Distance    h  s b l  e
1     28.00 3.25  0 1 0 80
2     99.00 4.00 10 2 2 62
3    126.50 4.75 20 2 4 80
4    126.50 4.75  0 2 4 45
5     45.00 3.25 20 2 4 45
6     35.00 4.75  0 1 0 45
7     45.00 4.00 10 1 2 62
8     28.25 4.75 20 1 0 80
9     85.00 4.75  0 1 4 80
10     8.00 3.25 20 1 0 45
11    36.50 4.75 20 1 4 45
12    33.00 3.25  0 1 4 45
13    84.50 4.00 10 2 2 62
14    28.50 4.75 20 2 0 45
15    33.50 3.25  0 2 0 45
16    36.00 3.25 20 2 0 80
17    84.00 4.75  0 2 0 80
18    45.00 3.25 20 1 4 80
19    37.50 4.00 10 1 2 62
20   106.00 3.25  0 2 4 80

aov.catapult = aov(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e, data=data.catapult)
summary(aov.catapult)
            Df Sum Sq Mean Sq F value  Pr(>F)
h            1   2909    2909  15.854 0.01638 *
s            1   1964    1964  10.701 0.03076 *
b            1   7537    7537  41.072 0.00305 **
l            1   6490    6490  35.369 0.00401 **
e            1   2297    2297  12.518 0.02406 *
h:s          1    122     122   0.667 0.45998
h:b          1    345     345   1.878 0.24247
h:l          1    354     354   1.929 0.23724
h:e          1      0       0   0.001 0.97578
s:b          1    161     161   0.877 0.40199
s:l          1     20      20   0.107 0.75966
s:e          1    114     114   0.622 0.47427
b:l          1    926     926   5.049 0.08795 .
b:e          1    124     124   0.677 0.45689
l:e          1    158     158   0.860 0.40623
Residuals    4    734     184
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This seems just about right to me.
However, when I attempt to make the linear model, based on main factors and two-factor interactions, I get a completely different result:

lm.catapult = lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e, data=data.catapult)
summary(lm.catapult)

Call:
lm(formula = Distance ~ h + s + b + l + e + h * s + h * b + h * l + h * e + s * b + s * l + s * e + b * l + b * e + l * e, data = data.catapult)

Residuals:
      1       2       3       4
-0.8100 22.3875 -3.6763 -3.8925
Re: [R] do.call or something instead of for
Hi: Use arima.sim, because when you have recursive relationships like that, there's no way not to loop. If arima.sim doesn't allow for the trend piece (0.1*t), then you can do that part separately using cumsum(0.1*(1:t)) and then add it back in to what arima.sim gives you. I don't remember if arima.sim allows for a trend. It might. Mark

On Mon, Jun 25, 2012 at 8:39 AM, Kathie kathryn.lord2...@gmail.com wrote:

Dear R users, I'd like to compute X like below:

X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

where the W_{i,t} are from Uniform(0,2) and X_{i,0} = 1 + 5*W_{i,0}.

Of course, I can do this with a for statement, but I don't think that's a good idea because i and t are too big. So my question is: is there any better way to avoid the for statement for this problem? Thank you in advance. Kathie

[[alternative HTML version deleted]]
Re: [R] do.call or something instead of for
Hi Kathryn: I'm sorry — I didn't read your question carefully enough. arima.sim won't help, because you don't have a normal error term. I think you have to loop, unless someone else knows of a way that I'm not aware of. Good luck.

On Mon, Jun 25, 2012 at 8:39 AM, Kathie kathryn.lord2...@gmail.com wrote:

Dear R users, I'd like to compute X like below:

X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

where the W_{i,t} are from Uniform(0,2) and X_{i,0} = 1 + 5*W_{i,0}.

Of course, I can do this with a for statement, but I don't think that's a good idea because i and t are too big. So my question is: is there any better way to avoid the for statement for this problem? Thank you in advance. Kathie

[[alternative HTML version deleted]]
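[Editorial note: for what it's worth, when the recursion is linear in X (as it is here), base R's stats::filter() with method = "recursive" can replace the loop over t regardless of the error distribution. A sketch for a single series i, cross-checked against the explicit loop; n is kept small because X grows like 2^t:]

```r
set.seed(1)
n  <- 50                      # number of time steps
w  <- runif(n, 0, 2)          # W_t ~ Uniform(0, 2)
x0 <- 1 + 5 * runif(1, 0, 2)  # X_0 = 1 + 5*W_0

# X_t = (0.1*t + W_t) + 2*X_{t-1}: feed the non-recursive part as input,
# the coefficient 2 as the recursive filter, and X_0 as the initial value.
x_filter <- stats::filter(0.1 * seq_len(n) + w, filter = 2,
                          method = "recursive", init = x0)

# Explicit-loop cross-check
x_loop <- numeric(n)
prev <- x0
for (t in seq_len(n)) {
  x_loop[t] <- 0.1 * t + 2 * prev + w[t]
  prev <- x_loop[t]
}
all.equal(as.numeric(x_filter), x_loop)  # TRUE
```

For many series i at once, one cheap outer loop (or apply() over the columns of a matrix of W draws) is all that remains.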
Re: [R] Possible bug in is.na.data.frame(): input without columns
On 06/20/2012 05:36 PM, Mikko Korpela wrote:

Hello list! Let's construct a matrix / data.frame with 0 columns, but > 0 rows, and non-NULL rownames. Then, call is.na() on both the data.frame and the matrix. We find that is.na.data.frame() gives an error. When row.names are removed, is.na.data.frame() returns NULL. I think that the NULL result is also wrong. From ?is.na: The method ‘is.na.data.frame’ returns a logical matrix with the same dimensions as the data frame, and with dimnames taken from the row and column names of the data frame. No problems are seen when is.na() is run on the matrix. What do you think — should a formal bug report be filed?

Bug report filed to R bugzilla.

--
Mikko Korpela
Aalto University School of Science
Department of Information and Computer Science
Re: [R] Questions about doing analysis based on time
Arrgh, yes, I did mean dput(head(mydata, 100)). Thanks for catching it. John Kane, Kingston ON Canada

-----Original Message-----
From: michael.weyla...@gmail.com
Sent: Fri, 22 Jun 2012 14:25:30 -0500
To: jrkrid...@inbox.com
Subject: Re: [R] Questions about doing analysis based on time

On Fri, Jun 22, 2012 at 2:18 PM, John Kane jrkrid...@inbox.com wrote:

Hi and welcome to the R-help list. It would be much better for readers to get your data in a more easily used format. There is a function called dput() that will output your data in a way that R can read easily. We don't need to see all the data, but perhaps a hundred lines of it would be nice. Try this, where your file is called mydata:

# just copy the line below and paste into R
head(mydata, 100)

I think you mean dput(head(mydata, 100))

OP: Once you put this up I'll give more reply, but for now I'd suggest you try to put your data in a proper time series class (zoo/xts, if I might give a personal-ish plug), which will make all these calculations much easier. Best, Michael

# Now copy the output and paste it into your word processor as a reply to the list and we will have decent data to work with.

John Kane, Kingston ON Canada

-----Original Message-----
From: mikeedinge...@gmail.com
Sent: Fri, 22 Jun 2012 09:21:40 -0700 (PDT)
To: r-help@r-project.org
Subject: [R] Questions about doing analysis based on time

I have a spreadsheet that I've read into R using read.csv. I've also attached it. It looks like this (except there are 1600+ entries):

Sunday
          SunDate SunTime SunScore
1   5/9/2010 0:00    0:00      127
2  6/12/2011 0:00    0:00      125
3  6/15/2008 0:04    0:04       98
4   8/3/2008 0:07    0:07      118
5  7/24/2011 0:07    0:07      122
6  5/25/2008 0:09    0:09      104
7  5/20/2012 0:11    0:11      124
8 10/18/2009 0:12    0:12      121
9  3/14/2010 0:12    0:12      117
10  1/2/2011 0:12    0:12      131

SunDate and SunTime are both factors.
In order to change the class to something I can work with, I use the following:

Sunday$SunTime <- as.POSIXlt(Sunday$SunTime, tz = "", format = "%H:%M")
Sunday$SunDate <- as.POSIXlt(Sunday$SunDate, tz = "", format = "%m/%d/%Y %H:%M")

Now, the str(Sunday) command yields:

'data.frame': 1644 obs. of 3 variables:
 $ SunDate : POSIXlt, format: "2010-05-09 00:00:00" "2011-06-12 00:00:00" ...
 $ SunTime : POSIXlt, format: "2012-06-18 00:00:00" "2012-06-18 00:00:00" ...
 $ SunScore: int 127 125 98 118 122 104 124 121 117 131 ...

I think all the elements in Sunday are correct for me to do what I want to do, but I don't know how to do them.

1. How can I get the mean score by hour? For example, I want the mean score of all the entries between 0:00 and 0:59, then 1:00 and 1:59, etc.

2. Is it possible for me to create a histogram by hour for each score over a certain point? For example, I want to make a histogram of all scores above 140 by the hour they occurred in. Is that possible?

These last few might not be possible (at least with R), but I'll ask anyway. I've got another data set similar to the one above, except it's got 12,000 entries over four years. If I do the same commands as above to turn Date and Time into POSIXlt, is it possible for me to do the following:

1. The data was recorded at irregular intervals, and the difference between recorded points can range anywhere from 1 hour up to 7. Is it possible, when data isn't recorded between two points, to insert the hours that are unrecorded along with an average value for those hours? This is sort of a pre-requisite for the next two.

2. If one of the entries has a Score above a certain point, is it possible to determine how long it was above that point and determine the mean for all the instances this occurred? For example:

01/01/11 01:00 AM 101
01/01/11 02:21 AM 142
01/01/11 03:36 AM 156
01/01/11 04:19 AM 130
01/01/11 05:12 AM 146
01/01/11 06:49 AM 116
01/01/11 07:09 AM 111

There are two spans where it's above 140.
The two and three o'clock hours, and the 5 o'clock hour. So the mean time would be 1.5 hours. Is it possible for R to do this over a much larger time period?

3. If a score reaches a certain point, is it possible for R to determine the average time between that and when the score reaches another point? For example:

01/01/11 01:01 AM 101
01/01/11 02:21 AM 121
01/01/11 03:14 AM 134
01/01/11 04:11 AM 149
01/01/11 05:05 AM 119
01/01/11 06:14 AM 121
01/01/11 07:19 AM 127
01/01/11 08:45 AM 134
01/01/11 09:11 AM 142
01/01/11 10:10 AM 131

The score goes above 120 during the 2 AM hour and doesn't go above 140 until the 4 AM hour. Then it goes above 120 again in the 6 AM hour, but doesn't go above 140 until the 9 AM hour. So the average time to go from 120 to 140 is 2.5 hours. Can R do this over a much larger time frame? If anyone knows
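For question 1 above (and the histogram in question 2), a base-R sketch on made-up stand-in data -- the real 'Sunday' frame has the same column names, but these times and scores are invented:

```r
# Invented stand-in for the poster's data
Sunday <- data.frame(
  SunTime  = as.POSIXct(c("0:04", "0:07", "1:15", "1:59", "2:30"),
                        format = "%H:%M", tz = "UTC"),
  SunScore = c(98, 118, 120, 130, 150)
)

# Mean score by hour: bin on the "%H" field of the time stamp
hr <- format(Sunday$SunTime, "%H")
by.hour <- tapply(Sunday$SunScore, hr, mean)
by.hour  # "00" -> 108, "01" -> 125, "02" -> 150

# Histogram of the hours in which scores above a cutoff occurred
high <- Sunday$SunScore > 140
hist(as.numeric(hr[high]), breaks = 0:24,
     main = "Scores above 140 by hour", xlab = "Hour of day")
```

The same tapply() idea extends to any grouping expression, e.g. format(SunDate, "%Y-%m") for monthly means.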
Re: [R] Questions about doing analysis based on time
On Fri, Jun 22, 2012 at 3:45 PM, APOCooter mikeedinge...@gmail.com wrote: [snip and rearrange] try to put your data in a proper time series class (zoo/xts if I might give a personal-ish plug) which will make all these calculations much easier. I thought that's what I was doing with as.POSIXlt? as.POSIXlt converts your time stamps to one of many formats (though it's really probably better to use as.POSIXct instead) -- zoo/xts objects are containers for time series data and provide you with methods/wrappers to do many common time-series style calculations. E.g., rollapply() or period.apply() Best, Michael
Re: [R] do.call or something instead of for
On Mon, Jun 25, 2012 at 05:39:45AM -0700, Kathie wrote: Dear R users, I'd like to compute X like below. X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}, where the W_{i,t} are from Uniform(0,2) and X_{i,0} = 1 + 5*W_{i,0}. Of course, I can do this with a for statement, but I don't think it's a good idea because i and t are too big. So, my question is: is there any better idea to avoid a for statement for this problem?

Hi. The recurrence does not use index i, so it is possible to vectorize over i. Try the following.

m <- 10   # bound on i
n <- 6    # bound on tt
W <- matrix(runif(m*n, max=2), nrow=m, ncol=n)
X <- matrix(nrow=m, ncol=n)
X[, 1] <- 1 + 5*W[, 1]
for (tt in seq.int(from=2, length=n-1)) {
  X[, tt] <- 0.1*(tt-1) + 2*X[, tt-1] + W[, tt]
}

The code uses tt instead of t to avoid confusion with the transpose function. If n is always at least 2, then seq.int(from=2, length=n-1) may be replaced by 2:n. Hope this helps. Petr Savicky.
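A quick sanity check (small sizes, my own choice) that the vectorised loop above matches the naive element-wise double loop:

```r
set.seed(1)
m <- 4; n <- 5
W <- matrix(runif(m * n, max = 2), m, n)

# Vectorised over i, as in the suggestion above
X <- matrix(NA_real_, m, n)
X[, 1] <- 1 + 5 * W[, 1]
for (tt in 2:n) X[, tt] <- 0.1 * (tt - 1) + 2 * X[, tt - 1] + W[, tt]

# Naive double loop, directly transcribing the recurrence
Xref <- matrix(NA_real_, m, n)
for (i in 1:m) {
  Xref[i, 1] <- 1 + 5 * W[i, 1]
  for (tt in 2:n) Xref[i, tt] <- 0.1 * (tt - 1) + 2 * Xref[i, tt - 1] + W[i, tt]
}

all.equal(X, Xref)  # TRUE
```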
Re: [R] do.call or something instead of for
thanks peter. I was thinking more about t, but you're right in that there's an i there also. my bad (twice). On Mon, Jun 25, 2012 at 9:37 AM, Petr Savicky savi...@cs.cas.cz wrote: The recurrence does not use index i, so it is possible to vectorize over i. ...
Re: [R] Fractional Factorial - Wrong values using lm-function
Staleno: As always, you need to read the Help file carefully. From ?aov: The main difference from lm is in the way print, summary and so on handle the fit: this is expressed in the traditional language of the analysis of variance rather than that of linear models. summary.aov() computes sequential SS. lm() uses the t-statistics for the estimated coefficients. They are not the same if non-orthogonal contrasts are used. Of course, the coefficients and fits **are** identical. If you don't know what this means, consult a statistician or linear models references. -- Bert

On Mon, Jun 25, 2012 at 2:26 AM, Staleno s...@bergen-plastics.no wrote: Hello. I'm a new user of R, and I have a question regarding the use of the aov and lm functions. I'm doing a fractional factorial experiment at our production site, and I need to familiarize myself with the analysis before I conduct the experiment. I've been working my way through the examples provided at http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm , but I can't get the results provided in the trial model calculations. Why is this? Here is how I have tried to do it:

data.catapult <- read.table("Fractional.txt", header=TRUE)  # Read the data in the table provided in the example.
data.catapult

   Distance    h  s b l  e
1     28.00 3.25  0 1 0 80
2     99.00 4.00 10 2 2 62
3    126.50 4.75 20 2 4 80
4    126.50 4.75  0 2 4 45
5     45.00 3.25 20 2 4 45
6     35.00 4.75  0 1 0 45
7     45.00 4.00 10 1 2 62
8     28.25 4.75 20 1 0 80
9     85.00 4.75  0 1 4 80
10     8.00 3.25 20 1 0 45
11    36.50 4.75 20 1 4 45
12    33.00 3.25  0 1 4 45
13    84.50 4.00 10 2 2 62
14    28.50 4.75 20 2 0 45
15    33.50 3.25  0 2 0 45
16    36.00 3.25 20 2 0 80
17    84.00 4.75  0 2 0 80
18    45.00 3.25 20 1 4 80
19    37.50 4.00 10 1 2 62
20   106.00 3.25  0 2 4 80

aov.catapult = aov(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e, data=data.catapult)
summary(aov.catapult)

            Df Sum Sq Mean Sq F value  Pr(>F)
h            1   2909    2909  15.854 0.01638 *
s            1   1964    1964  10.701 0.03076 *
b            1   7537    7537  41.072 0.00305 **
l            1   6490    6490  35.369 0.00401 **
e            1   2297    2297  12.518 0.02406 *
h:s          1    122     122   0.667 0.45998
h:b          1    345     345   1.878 0.24247
h:l          1    354     354   1.929 0.23724
h:e          1      0       0   0.001 0.97578
s:b          1    161     161   0.877 0.40199
s:l          1     20      20   0.107 0.75966
s:e          1    114     114   0.622 0.47427
b:l          1    926     926   5.049 0.08795 .
b:e          1    124     124   0.677 0.45689
l:e          1    158     158   0.860 0.40623
Residuals    4    734     184
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This seems just about right to me. However, when I attempt to make the linear model, based on main factors and two-factor interactions, I get a completely different result:

lm.catapult = lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e, data=data.catapult)
summary(lm.catapult)

Call:
lm(formula = Distance ~ h + s + b + l + e + h * s + h * b + h * l + h * e + s * b + s * l + s * e + b * l + b * e + l * e, data = data.catapult)

Residuals:
      1       2       3       4       5       6       7       8       9      10
-0.8100 22.3875 -3.6763 -3.8925 -3.8925 -0.8576  7.0852 -0.8100 -0.8100 -0.8576
     11      12      13      14      15      16      17      18      19      20
-0.8576 -0.8576  7.8875 -3.8925 -3.8925 -3.6763 -3.6763 -0.8100 -0.4148 -3.6763

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.031042 100.791955   0.248   0.8161
h           -3.687500  22.466457  -0.164   0.8776
s            0.475446   2.446791   0.194   0.8554
b          -39.417973  44.906164  -0.878   0.4296
l          -18.938988  12.233954  -1.548   0.1965
e           -0.158449   1.230683  -0.129   0.9038
h:s         -0.368750   0.451546  -0.817   0.4600
h:b         12.375000   9.030925   1.370   0.2425
h:l          3.135417   2.257731   1.389   0.2372
h:e          0.008333   0.258026   0.032   0.9758
s:b         -0.634375   0.677319  -0.937   0.4020
s:l         -0.055469   0.169330  -0.328   0.7597
s:e          0.015268   0.019352   0.789   0.4743
b:l          7.609375   3.386597   2.247   0.0879 .
b:e          0.318397   0.387008   0.823   0.4569
l:e          0.089732   0.096760   0.927   0.4062
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.55 on 4 degrees of freedom
Multiple R-squared: 0.9697, Adjusted R-squared: 0.8563
F-statistic: 8.545 on 15 and 4 DF, p-value: 0.02559

This result is nothing like the results provided in the example. Why is this? Any help is very much appreciated. Regards, Ståle.
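Bert's point above -- that aov() and lm() produce identical fits and differ only in how summary() reports them -- can be checked on any built-in data set, e.g.:

```r
# Same model through both interfaces
fit.aov <- aov(dist ~ speed, data = cars)
fit.lm  <- lm(dist ~ speed, data = cars)

# Identical coefficients and fitted values ...
all.equal(coef(fit.aov), coef(fit.lm))      # TRUE
all.equal(fitted(fit.aov), fitted(fit.lm))  # TRUE

# ... but summary() reports sequential ANOVA SS for one
# and per-coefficient t-tests for the other
summary(fit.aov)
summary(fit.lm)
```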
Re: [R] Fractional Factorial - Wrong values using lm-function
They are coding the variables as factors and using orthogonal polynomial contrasts. This:

data.catapult <- data.frame(data.catapult$Distance,
    do.call(data.frame, lapply(data.catapult[-1], factor, ordered=TRUE)))
contrasts(data.catapult$h) <- contrasts(data.catapult$s) <-
    contrasts(data.catapult$l) <- contrasts(data.catapult$e) <-
    contr.poly(3, contrasts=FALSE)
contrasts(data.catapult$b) <- contr.poly(2, contrasts=FALSE)
lm1 <- lm(Distance ~ .^2, data=data.catapult)
summary(lm1)

gets you closer (same intercept at least), but I can't explain the remaining differences. I'm not even sure why the results look like they do (interaction terms like a*b, not a:b, and one level for each interaction). Hope that helps, Simon Knapp
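For readers unfamiliar with contr.poly(): it returns columns of orthogonal polynomial contrasts (linear, quadratic, ...), which is what makes the NIST coding differ from R's default treatment contrasts. A small illustration:

```r
# Orthogonal polynomial contrasts for a 3-level factor:
# one linear (.L) and one quadratic (.Q) column
cp <- contr.poly(3)
cp

# The columns are orthonormal: t(cp) %*% cp is the identity
crossprod(cp)
```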
Re: [R] R-help Digest, Vol 112, Issue 25
While lm() is for linear modeling, the constraints make this problem easier to solve with a nonlinear tool. Both my packages Rvmmin and Rcgmin (I recommend the R-forge versions as more up-to-date) have bounds constraints and masks, i.e., fixed parameters. I am actually looking for example problems of this type that are more recent than the ones that got me into this 30 years ago. Do contact me off-list if you have something that could be shared. I'd also welcome discussion on appropriate tools for such constrained linear modeling problems. They are, I believe, more or less present in most linear modeling situations, but we rarely impose the constraints explicitly, and tend to use lm() and (hopefully) check if the solution obeys the conditions. Best, John Nash

On 06/25/2012 06:00 AM, r-help-requ...@r-project.org wrote: Message: 5 Date: Sun, 24 Jun 2012 03:34:10 -0700 (PDT) From: rgoodman rosa.good...@gmail.com To: r-help@r-project.org Subject: Re: [R] Constrained coefficients in lm (correction) Hi Jorge, Did you ever figure this out? I want to do the same thing with the additional constraint of the coef for x1 = 2.

lm(Y ~ offset(2*x1) + x2 + x3, data=mydata)

where b = coeff for x2, c = coeff for x3, b + c = 1 and b and c > 0. I've loaded the systemfit package, but the suggestion R*beta0 = q, where R is R.restr and q is q.restr in the function call, makes no sense to me. Cheers, Rosie
Re: [R] Fractional Factorial - Wrong values using lm-function
... but this is tantalisingly close:

dat1 <- with(data.catapult, data.frame(
    Distance,
    h=C(h, poly, 1), s=C(s, poly, 1), l=C(l, poly, 1),
    e=C(e, poly, 1), b=C(b, poly, 1)))
lm4 <- lm(Distance ~ .^2, data = dat1)
summary(lm4)

... wish I knew what it meant.

On Tue, Jun 26, 2012 at 12:18 AM, Simon Knapp sleepingw...@gmail.com wrote: They are coding the variables as factors and using orthogonal polynomial contrasts. ... gets you closer (same intercept at least), but I can't explain the remaining differences.
Re: [R] Fractional Factorial - Wrong values using lm-function
and finally...

thingy <- function(x) {
    x <- C(x, poly, 1)
    tmp <- contrasts(x)
    contrasts(x, 1) <- 2 * tmp / sum(abs(tmp))
    x
}

dat2 <- with(data.catapult, data.frame(
    Distance,
    h=thingy(h), s=thingy(s), l=thingy(l),
    e=thingy(e), b=thingy(b)))
lm5 <- lm(Distance ~ .^2, data = dat2)
summary(lm5)

On Tue, Jun 26, 2012 at 12:35 AM, Simon Knapp sleepingw...@gmail.com wrote: ... but this is tantalisingly close: ... wish I knew what it meant.
Re: [R] mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?
Martin, I had a nagging feeling that there must be more to this than my previous reply suggested, and have been digging a bit further. Basically I would expect these p-values not to be great if you had nested random effects (such as main effects + their interaction), but your case looked rather straightforward... After some digging it turns out that the p-value for Placename is wrong in this case, as you suspected. The R routine 'qr' has a 'tol' argument that by default is set to 1e-7. In the computation of the test statistic for "re" terms I had called qr without changing this default to the value 0 that is actually appropriate (I should have noticed this; I've been caught out by it before). With the correct 'tol', Placename is non-significant. This will be fixed in the next release, but that won't happen until I've tested the random effects p-value approximations a bit more thoroughly. best, Simon

On 11/06/12 14:56, Martijn Wieling wrote: Dear Simon, I ran an additional analysis using bam (mgcv 1.7-17) with three random intercepts and no non-linearities, and compared these to the results of lmer (lme4). Using bam results in a significant random intercept (even though it has a very low edf value), while the lmer results show no variance associated with the random intercept of Placename. Should I drop the random intercept of Placename, and if so, how is this apparent from the results of bam? Summaries of both models are shown below. With kind regards, Martijn

l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) + (1|Placename), data=wrddst); print(l1, cor=FALSE)

Linear mixed model fit by REML
Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1 | Placename)
   Data: wrddst
    AIC    BIC logLik deviance REMLdev
 -44985 -44927  22498   -45009  -44997

Random effects:
 Groups    Name        Variance  Std.Dev.
 Word      (Intercept) 0.0800944 0.283009
 Key       (Intercept) 0.0013641 0.036933
 Placename (Intercept) 0.0000000 0.000000
 Residual              0.0381774 0.195390
Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.00342    0.01513   -0.23
geogamfit    0.99249    0.02612   37.99

m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs="re") + s(Key, bs="re") + s(Placename, bs="re"), data=wrddst, method="REML"); summary(m1, freq=FALSE)

Family: gaussian
Link function: identity
Formula: RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key, bs = "re") + s(Placename, bs = "re")

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.00342    0.01513  -0.226    0.821
geogamfit     0.99249    0.02612  37.991   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                   edf Ref.df       F p-value
s(Word)      3.554e+02    347 634.716  <2e-16 ***
s(Key)       2.946e+02    316  23.054  <2e-16 ***
s(Placename) 1.489e-04     38   7.282  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.693   Deviance explained = 69.4%
REML score = -22498   Scale est. = 0.038177   n = 112608

On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R] ml-node+s789695n4631060...@n4.nabble.com wrote: Having looked at this further, I've made some changes in mgcv_1.7-17 to the p-value computations for terms that can be penalized to zero during fitting (e.g. s(x, bs="re"), s(x, m=1) etc). The Wald statistic based p-values from summary.gam and anova.gam (i.e. what you get from e.g. anova(a) where a is a fitted gam object) are quite well founded for smooth terms that are non-zero under full penalization (e.g. a cubic spline is a straight line under full penalization). For such smooths, an extension of Nychka's (1988) result on CIs for splines gives a well-founded distributional result on which to base a Wald statistic.
However, the Nychka result requires the smoothing bias to be substantially less than the smoothing estimator variance, and this will often not be the case if smoothing can actually penalize a term to zero (to understand why, see the argument in the appendix of Marra & Wood, 2012, Scandinavian Journal of Statistics, 39, 53-74). Simulation testing shows that this theoretical concern has serious practical consequences. So for terms that can be penalized to zero, alternative approximations have to be used, and these are now implemented in mgcv_1.7-17 (see ?summary.gam). The approximate test performed by anova(a,b) (a and b are fitted gam objects) is less well founded. It is a reasonable approximation when each smooth term in the models could in principle be well approximated by an unpenalized term of rank approximately equal to the edf of the smooth term, but otherwise the p-values produced are likely to be much too small. In particular, simulation testing suggests that the test is not to be trusted with s(..., bs="re") terms, and can be poor if the models
[R] compare between mixdist models
Hi all! Is there any way to compare mixdist models with a different number of components using AIC or BIC? I'm looking for some similar functionality as in flexmix, but am not sure which to stick with: mixdist or flexmix. Yakir Gagnon cell +1 919 886 3877 office +1 919 684 7188 Johnsen Lab Biology Department Box 90338 Duke University Durham, NC 27708 BioSci Building Room 307 http://fds.duke.edu/db/aas/Biology/postdoc/yg32 http://www.biology.duke.edu/johnsenlab/people/yakir.html
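I can't vouch for mixdist objects in particular, but the general mechanism in R is that AIC() and BIC() compare any fitted objects that provide a logLik() method. A sketch with plain lm() fits standing in for the mixture fits:

```r
# Two candidate models on a built-in data set
fit1 <- lm(dist ~ speed, data = cars)
fit2 <- lm(dist ~ poly(speed, 2), data = cars)

# One row per model; lower is better
AIC(fit1, fit2)
BIC(fit1, fit2)
```

If the mixture fit objects do not support logLik(), the criteria can still be computed by hand from the reported log-likelihood and parameter count.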
Re: [R] using multiple cpu's - scaling in processing power
Thanks Drew. A first glance at Condor tells me that this will very likely do what I need in terms of flexible allocation of (CPU) resources. Unfortunately there is not much R integration, so I will need to see if setting up Condor and submitting R jobs to it is worth the effort. Cheers Soren - http://censix.com -- View this message in context: http://r.789695.n4.nabble.com/using-multiple-cpu-s-scaling-in-processing-power-tp4634405p4634438.html Sent from the R help mailing list archive at Nabble.com.
[R] Tight Axes in Prepanel Function
How do I specify a tight y-axis, where the plot completely fills the y-axis range, inside the prepanel function? For example, consider the following code:

require(lattice)
set.seed(12345)
x <- 1:1000
y <- cumsum(rnorm(length(x)))
prepanel.test <- function(x, y, groups = NULL, subscripts = NULL, ...) {
    if (is.null(groups)) {
        result <- list(ylim = range(y))
    } else {
        result <- list(ylim = range(y, finite = TRUE))
    }
    return(result)
}
print(xyplot(y ~ x, prepanel = prepanel.test, type = 'l', grid = TRUE),
      split = c(1,1,1,2), more = TRUE)
print(xyplot(y ~ x, ylim = range(y, finite = TRUE), type = 'l', grid = TRUE),
      split = c(1,2,1,2))

The top plot has extra whitespace at the top and bottom. Is there any way to eliminate this without having to specify 'ylim' directly in the call to 'xyplot'? (The reason I want to include this in the prepanel function is that I want to add conditioning and use the combineLimits function in the latticeExtra package.) Thanks. - Elliot
Re: [R] using multiple cpu's - scaling in processing power
Oh, I was wrong. The R package 'GridR' seems to be what one needs to run R scripts on Condor directly ... interesting. Thanks again. - http://censix.com -- View this message in context: http://r.789695.n4.nabble.com/using-multiple-cpu-s-scaling-in-processing-power-tp4634405p4634442.html Sent from the R help mailing list archive at Nabble.com.
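For completeness: if the goal is only to use the cores of a single machine (rather than a Condor pool), the base 'parallel' package (shipped with R >= 2.14) may already be enough. A sketch -- note that mc.cores > 1 is only available on Unix-alikes, and slow_task is a placeholder for real per-job work:

```r
library(parallel)

slow_task <- function(i) i^2  # placeholder for a real per-job computation

# Use all but one core on Unix-alikes; fall back to serial elsewhere
n_cores <- 1L
if (.Platform$OS.type == "unix")
  n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)

res <- mclapply(1:8, slow_task, mc.cores = n_cores)
unlist(res)  # 1 4 9 16 25 36 49 64
```

For cross-machine scaling, parallel's makeCluster()/parLapply() or the cluster systems discussed in this thread are the next step up.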
[R] combineLimits and Dates
I'm having some trouble using the latticeExtra 'combineLimits' function with a Date x-variable:

require(lattice)
require(latticeExtra)
set.seed(12345)
dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), "days")
dat <- data.frame(d = rep(dates, 4),
                  g = factor(rep(rep(c(1,2), each = length(dates)), 2)),
                  h = factor(rep(c("a", "b"), each = length(dates)*2)),
                  x = rnorm(4 * length(dates)),
                  y = rnorm(4 * length(dates)))
plt1 <- xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
               scales = list(relation = "free"), auto.key = TRUE)
plt1 <- useOuterStrips(plt1)
plt1 <- combineLimits(plt1)

The x-axis labels are correct after the call to 'useOuterStrips', but they get converted to numeric after the call to 'combineLimits'. How do I keep them as date labels? Thanks. - Elliot -- Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC 134 Mount Auburn Street | Cambridge, MA | 02138 Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com
Re: [R] array complexity for the MH test
I will try again. 1. If you have 5 groups, and split each in half, then you have 10 groups. If MH is applicable to 5 groups, then it should be applicable to the 10 groups as long as they are disjoint groups. 2. I don't think MH applies to your example because the groups do not have similar behavior. That is why I suggested you look at dose response behavior instead. 3. vcd is by David Meyer [aut, cre], Achim Zeileis [aut], Kurt Hornik [aut], Michael Friendly [ctb] Rich On Sun, Jun 24, 2012 at 5:30 AM, francogrex francog...@mail.com wrote: Thanks for your answer. The answer advertises your vcd package, which by the way is a very nice package that I use and recommend for everyone doing such kind of data analysis. However if you really examine the answer you gave me, it does not really or specifically answer my question. -- View this message in context: http://r.789695.n4.nabble.com/array-complexity-for-the-MH-test-tp4633999p4634318.html Sent from the R help mailing list archive at Nabble.com.
[R] setdiff datframes
hi, I have 2 files example 1 and example 2 and would like to know what is in example2 and not in example1 (attached) V1 contain data which could be in duplicated which I am using as identifiers I used setdiff(example2$V1,example1$V1) to find the identifiers which are specific to example2: [1] rs2276598 rs17253672 I am looking for a way to get an output with all columns (V1 to V14) for these 2 identifiers thanks for any suggestions format example1 V1 V2 V3 V4 V5 V6 1 rs4685 2:198257795 C ENSG0115524 ENST0424674 Transcript 2 rs4685 2:198257795 C ENSG0115524 ENST0335508 Transcript 3rs788018 2:198265526 G ENSG0115524 ENST0335508 Transcript 4rs788023 2:198283305 C ENSG0115524 ENST0335508 Transcript 5 rs41284843 2:25536827 A ENSG0119772 ENST0406659 Transcript 6 rs41284843 2:25536827 A ENSG0119772 ENST0321117 Transcript 7 rs41284843 2:25536827 A ENSG0119772 ENST0264709 Transcript 8 rs41284843 2:25536827 A ENSG0119772 ENST0380756 Transcript 9 rs3729680 3:178927410 G ENSG0121879 ENST0263967 Transcript 10 rs61744960 4:106156163 A ENSG0168769 ENST0305737 Transcript 11 rs61744960 4:106156163 A ENSG0168769 ENST0413648 Transcript 12 rs61744960 4:106156163 A ENSG0168769 ENST0540549 Transcript 13 rs61744960 4:106156163 A ENSG0168769 ENST0545826 Transcript 14 rs61744960 4:106156163 A ENSG0168769 ENST0380013 Transcript 15 rs61744960 4:106156163 A ENSG0168769 ENST0535110 Transcript 16 rs61744960 4:106156163 A ENSG0168769 ENST0394764 Transcript 17 rs61744960 4:106156163 A ENSG0168769 ENST0513237 Transcript 18 rs61744960 4:106156163 A ENSG0168769 ENST0265149 Transcript 19 rs2454206 4:106196951 G ENSG0168769 ENST0540549 Transcript 20 rs2454206 4:106196951 G ENSG0168769 ENST0513237 Transcript V7 V8 V9 V10 V11 V12V13 1 SYNONYMOUS_CODING 704 705 235 V gtA/gtG rs4685 2 SYNONYMOUS_CODING 3749 3657 1219 V gtA/gtG rs4685 3 SYNONYMOUS_CODING 2723 2631 877 G ggT/ggC rs788018 4 SYNONYMOUS_CODING 515 423 141 K aaA/aaG rs788023 5 SYNONYMOUS_CODING 365 279 P ccC/ccT rs41284843 6 SYNONYMOUS_CODING 
264 279 P ccC/ccT rs41284843 7 SYNONYMOUS_CODING 365 279 P ccC/ccT rs41284843 8 NMD_TRANSCRIPT,SYNONYMOUS_CODING 264 279 P ccC/ccT rs41284843 9 NON_SYNONYMOUS_CODING 1330 1173 391 I/M atA/atG rs3729680 10NON_SYNONYMOUS_CODING 1468 1064 355 G/D gGt/gAt rs61744960 11NON_SYNONYMOUS_CODING 1204 1064 355 G/D gGt/gAt rs61744960 12NON_SYNONYMOUS_CODING 1924 1064 355 G/D gGt/gAt rs61744960 13NON_SYNONYMOUS_CODING 1924 1064 355 G/D gGt/gAt rs61744960 14NON_SYNONYMOUS_CODING 1450 1064 355 G/D gGt/gAt rs61744960 15NON_SYNONYMOUS_CODING 1167 1064 355 G/D gGt/gAt rs61744960 16NON_SYNONYMOUS_CODING 1450 1064 355 G/D gGt/gAt rs61744960 17NON_SYNONYMOUS_CODING 1924 1127 376 G/D gGt/gAt rs61744960 18 NMD_TRANSCRIPT,NON_SYNONYMOUS_CODING 1450 1064 355 G/D gGt/gAt rs61744960 19NON_SYNONYMOUS_CODING 6144 5284 1762 I/V Ata/Gta rs2454206 20NON_SYNONYMOUS_CODING 6144 5347 1783 I/V Ata/Gta rs2454206 V14 1 ENSP=ENSP0409435;HGNC=SF3B1 2 ENSP=ENSP0335321;HGNC=SF3B1 3 ENSP=ENSP0335321;HGNC=SF3B1 4 ENSP=ENSP0335321;HGNC=SF3B1 5 ENSP=ENSP0384852;HGNC=DNMT3A 6 ENSP=ENSP0324375;HGNC=DNMT3A 7 ENSP=ENSP0264709;HGNC=DNMT3A 8 ENSP=ENSP0370132;HGNC=DNMT3A 9 ENSP=ENSP0263967;PolyPhen=benign(0.019);SIFT=tolerated(0.13);HGNC=PIK3CA 10 ENSP=ENSP0306705;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2 11 ENSP=ENSP0391448;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2 12 ENSP=ENSP0442788;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2 13 ENSP=ENSP0442867;PolyPhen=probably_damaging(0.952);SIFT=deleterious(0.01);HGNC=TET2 14 ENSP=ENSP0369351;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2 15 ENSP=ENSP0438851;PolyPhen=probably_damaging(0.998);SIFT=deleterious(0.01);HGNC=TET2 16 ENSP=ENSP0378245;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2 17 ENSP=ENSP0425443;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2 18 ENSP=ENSP0265149;PolyPhen=probably_damaging(0.952);SIFT=deleterious(0.01);HGNC=TET2 19 
ENSP=ENSP0442788;PolyPhen=benign(0.029);SIFT=tolerated(0.15);HGNC=TET2 20 ENSP=ENSP0425443;PolyPhen=benign(0.029);SIFT=tolerated(0.15);HGNC=TET2 example2 V1 V2 V3
Re: [R] setdiff datframes
Hello, It is recomended to post data using dput(). Something like dput(head(DF, 20)) # or 30 Then, paste the output of this command in a post. Untested, but I think it should work: # Create a logical index into 'example2' ix - example2$V1 %in% setdiff(example2$V1,example1$V1) example2[ix, ] # which rows does it index? You could also save the results of setdiff() and then see which of example2$V1 are %in% it. If the data.frame has many rows, this way could save some memory. (setdifff(9 gives only 2 character values, versus 2 TRUE plus n-2 FALSE). Hope this helps, Rui Barradas Em 25-06-2012 17:37, nathalie escreveu: hi, I have 2 files example 1 and example 2 and would like to know what is in example2 and not in example1 (attached) V1 contain data which could be in duplicated which I am using as identifiers I used setdiff(example2$V1,example1$V1) to find the identifiers which are specific to example2: [1] rs2276598 rs17253672 I am looking for a way to get an output with all columns (V1 to V14) for these 2 identifiers thanks for any suggestions format example1 V1 V2 V3 V4 V5 V6 1 rs4685 2:198257795 C ENSG0115524 ENST0424674 Transcript 2 rs4685 2:198257795 C ENSG0115524 ENST0335508 Transcript 3rs788018 2:198265526 G ENSG0115524 ENST0335508 Transcript 4rs788023 2:198283305 C ENSG0115524 ENST0335508 Transcript 5 rs41284843 2:25536827 A ENSG0119772 ENST0406659 Transcript 6 rs41284843 2:25536827 A ENSG0119772 ENST0321117 Transcript 7 rs41284843 2:25536827 A ENSG0119772 ENST0264709 Transcript 8 rs41284843 2:25536827 A ENSG0119772 ENST0380756 Transcript 9 rs3729680 3:178927410 G ENSG0121879 ENST0263967 Transcript 10 rs61744960 4:106156163 A ENSG0168769 ENST0305737 Transcript 11 rs61744960 4:106156163 A ENSG0168769 ENST0413648 Transcript 12 rs61744960 4:106156163 A ENSG0168769 ENST0540549 Transcript 13 rs61744960 4:106156163 A ENSG0168769 ENST0545826 Transcript 14 rs61744960 4:106156163 A ENSG0168769 ENST0380013 Transcript 15 rs61744960 4:106156163 A ENSG0168769 ENST0535110 
Transcript
16 rs61744960 4:106156163 A ENSG0168769 ENST0394764 Transcript
17 rs61744960 4:106156163 A ENSG0168769 ENST0513237 Transcript
18 rs61744960 4:106156163 A ENSG0168769 ENST0265149 Transcript
19 rs2454206 4:106196951 G ENSG0168769 ENST0540549 Transcript
20 rs2454206 4:106196951 G ENSG0168769 ENST0513237 Transcript
V7 V8 V9 V10 V11 V12 V13
1 SYNONYMOUS_CODING 704 705 235 V gtA/gtG rs4685
2 SYNONYMOUS_CODING 3749 3657 1219 V gtA/gtG rs4685
3 SYNONYMOUS_CODING 2723 2631 877 G ggT/ggC rs788018
4 SYNONYMOUS_CODING 515 423 141 K aaA/aaG rs788023
5 SYNONYMOUS_CODING 365 27 9 P ccC/ccT rs41284843
6 SYNONYMOUS_CODING 264 27 9 P ccC/ccT rs41284843
7 SYNONYMOUS_CODING 365 27 9 P ccC/ccT rs41284843
8 NMD_TRANSCRIPT,SYNONYMOUS_CODING 264 27 9 P ccC/ccT rs41284843
9 NON_SYNONYMOUS_CODING 1330 1173 391 I/M atA/atG rs3729680
10 NON_SYNONYMOUS_CODING 1468 1064 355 G/D gGt/gAt rs61744960
11 NON_SYNONYMOUS_CODING 1204 1064 355 G/D gGt/gAt rs61744960
12 NON_SYNONYMOUS_CODING 1924 1064 355 G/D gGt/gAt rs61744960
13 NON_SYNONYMOUS_CODING 1924 1064 355 G/D gGt/gAt rs61744960
14 NON_SYNONYMOUS_CODING 1450 1064 355 G/D gGt/gAt rs61744960
15 NON_SYNONYMOUS_CODING 1167 1064 355 G/D gGt/gAt rs61744960
16 NON_SYNONYMOUS_CODING 1450 1064 355 G/D gGt/gAt rs61744960
17 NON_SYNONYMOUS_CODING 1924 1127 376 G/D gGt/gAt rs61744960
18 NMD_TRANSCRIPT,NON_SYNONYMOUS_CODING 1450 1064 355 G/D gGt/gAt rs61744960
19 NON_SYNONYMOUS_CODING 6144 5284 1762 I/V Ata/Gta rs2454206
20 NON_SYNONYMOUS_CODING 6144 5347 1783 I/V Ata/Gta rs2454206
V14
1 ENSP=ENSP0409435;HGNC=SF3B1
2 ENSP=ENSP0335321;HGNC=SF3B1
3 ENSP=ENSP0335321;HGNC=SF3B1
4 ENSP=ENSP0335321;HGNC=SF3B1
5 ENSP=ENSP0384852;HGNC=DNMT3A
6 ENSP=ENSP0324375;HGNC=DNMT3A
7 ENSP=ENSP0264709;HGNC=DNMT3A
8 ENSP=ENSP0370132;HGNC=DNMT3A
9 ENSP=ENSP0263967;PolyPhen=benign(0.019);SIFT=tolerated(0.13);HGNC=PIK3CA
10 ENSP=ENSP0306705;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2
11
ENSP=ENSP0391448;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
12 ENSP=ENSP0442788;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
13 ENSP=ENSP0442867;PolyPhen=probably_damaging(0.952);SIFT=deleterious(0.01);HGNC=TET2
14
Re: [R] Very simple question R5 setRefClass and Initialize
When I try your code I'm told that 'mongo' is not a defined class. Removing that field from the definition, I do not get a print on definition.

mongoDbUser <- setRefClass("mongoDbUser",
  fields = list(
    auth = "logical",
    host = "character",
    username = "character",
    password = "character",
    db = "character"
  ),
  methods = list(
    initialize = function(auth, host, username, password, db, connector) {
      print("initialization")
    }
  )
)

R.Version()$version.string
[1] "R version 2.15.1 Patched (2012-06-22 r59603)"

So I guess there's more to the story. In S4-land initialize methods can be tricky (e.g., invoked when an object is coerced between types) and a common pattern is to write a constructor rather than an initialize method:

MongoDbUser <- function(username, password, auth, host, db, connector) {
    ## initialize, then
    mongoDbUser$new(auth, host, username, password, db, connector)
}

This also separates implementation from interface and provides the opportunity to give hints to the user about data types. Martin Morgan - Original Message - Help anyone? Is this impossible? I tried to use the initFields method but it did not change anything. On 25/06/2012 10:55, Julien Cochennec wrote: Hi, New to this list, and glad I found R5 which is a great improvement to R for my work. Something seems odd anyway: when I do a setRefClass, the initialize method executes at the class definition. I would like it to execute only when an instance is created. How can I do that? I found no way to test whether the code executes when the class definition happens or when a $new instance is created. Thanks for your help.
For example my class definition looks like this:

mongoDbUser <- setRefClass("mongoDbUser",
  fields = list(
    auth = "logical",
    host = "character",
    username = "character",
    password = "character",
    db = "character",
    connector = "mongo"
  ),
  methods = list(
    initialize = function(auth, host, username, password, db, connector) {
      print("initialization")
    }
  )
)

If I execute this code, it says "initialization", but it shouldn't, I created no instance, right?? I would like "initialization" to appear only when I do: mongoDbUser$new(...) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
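For reference, here is a minimal, self-contained sketch of the constructor pattern Martin describes above. Because the 'mongo' class from the original post is not defined here, the connector field is typed "ANY", and the defaults and the MongoDbUser name are only illustrative:

```r
## Sketch of the constructor pattern: keep setRefClass() free of an
## initialize method and wrap $new() in an ordinary function instead.
## The 'connector' field is typed "ANY" because the original 'mongo'
## class is hypothetical and not available here.
mongoDbUser <- setRefClass("mongoDbUser",
  fields = list(
    auth = "logical",
    host = "character",
    username = "character",
    password = "character",
    db = "character",
    connector = "ANY"
  )
)

## Constructor: runs only when explicitly called, never at class definition.
MongoDbUser <- function(username, password, auth = TRUE,
                        host = "localhost", db = "test", connector = NULL) {
  message("initialization")  # side effect happens here, not at definition time
  mongoDbUser$new(auth = auth, host = host, username = username,
                  password = password, db = db, connector = connector)
}

u <- MongoDbUser("julien", "secret")
u$host  # the default "localhost" set by the constructor
```

Nothing is printed when the class is defined; "initialization" appears only on each MongoDbUser(...) call.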
[R] graph displays
Good Afternoon, I'm trying to create a graph that best displays the following information, for instance organized as a bar graph by A, B, C:

Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
1 A 47 63 75 116 125 129 131 131 131 131
2 B 37 64 45 11 25 19 61 131 186 186
3 C 17 62 25 66 12 29 91 171 186 186

thanks -- View this message in context: http://r.789695.n4.nabble.com/graph-displays-tp4634448.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] graph displays
There's no way we can tell you the best way to display your information, because we don't know anything about it. The best display method has a lot to do with what the data are, and what you're trying to illustrate. That said, here are two possibilities, one using the bar graph you requested, and both using only base graphics.

# PLEASE use dput() to provide your data, rather than pasting it in
testdata <- structure(list(Source = c("A", "B", "C"), X1000s = c(47L, 37L,
17L), X600s = c(63L, 64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L,
11L, 66L), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L),
X10s = c(131L, 61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L,
186L, 186L), X1s = c(131L, 186L, 186L)), .Names = c("Source",
"X1000s", "X600s", "X500s", "X250s", "X100s", "X50s", "X10s",
"X5s", "X3s", "X1s"), class = "data.frame", row.names = c("1", "2", "3"))

barplot(as.matrix(testdata[, -1]), beside = TRUE, col = 1:3)
legend("topleft", LETTERS[1:3], col = 1:3, pch = 15)

plot(1:10, testdata[1, -1], type = "b", ylim = c(0, 200), xaxt = "n", col = 1)
axis(1, 1:10, colnames(testdata)[-1])
lines(1:10, testdata[2, -1], type = "b", col = 2)
lines(1:10, testdata[3, -1], type = "b", col = 3)
legend("topleft", LETTERS[1:3], col = 1:3, pch = 15)

Sarah On Mon, Jun 25, 2012 at 12:24 PM, MSousa ricardosousa2...@clix.pt wrote: Good Afternoon, I'm trying to create a graph that best displays the following information, for instance organized as a bar graph by A, B, C:

Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
1 A 47 63 75 116 125 129 131 131 131 131
2 B 37 64 45 11 25 19 61 131 186 186
3 C 17 62 25 66 12 29 91 171 186 186

thanks -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] fitting a Bi-Weibull
I want to fit a bi-Weibull distribution, as defined by Evans et al. in their book Statistical distributions: http://www.scribd.com/doc/57260630/164/FIVE-PARAMETER-BI-WEIBULL-DISTRIBUTION I would like to know if there is any package in R that already does it, or any quick procedure to accomplish it. Thank you in advance, Luís Borda de Água __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
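I'm not aware of a dedicated package, but a bi-Weibull can in principle be fitted by direct maximum likelihood with optim(). The sketch below fits a simpler two-component Weibull mixture as a stand-in (an assumption: the Evans et al. five-parameter bi-Weibull has its own parameterization, whose density you would swap in once you have the formula); the data and starting values are invented for illustration:

```r
## Hypothetical sketch: ML fit of a two-component Weibull mixture via optim().
## NOT the exact Evans et al. five-parameter bi-Weibull; substitute that
## density once its formula is in hand.
set.seed(1)
x <- c(rweibull(150, shape = 1.5, scale = 2),   # synthetic data, two regimes
       rweibull(50, shape = 4, scale = 8))

## Negative log-likelihood of p*Weibull(sh1,sc1) + (1-p)*Weibull(sh2,sc2)
nll <- function(par) {
  p   <- plogis(par[1])                    # keep mixing weight in (0, 1)
  sh1 <- exp(par[2]); sc1 <- exp(par[3])   # keep shapes/scales positive
  sh2 <- exp(par[4]); sc2 <- exp(par[5])
  -sum(log(p * dweibull(x, sh1, sc1) + (1 - p) * dweibull(x, sh2, sc2)))
}

fit <- optim(c(qlogis(0.7), log(1.5), log(2), log(4), log(8)), nll)
c(p = plogis(fit$par[1]), exp(fit$par[-1]))  # back-transform the estimates
```

The log/logit transforms avoid constrained optimization; for standard errors one would add hessian = TRUE to the optim() call.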
Re: [R] setdiff dataframes
thanks it works brilliantly

Rui Barradas wrote: Hello, It is recommended to post data using dput(). Something like dput(head(DF, 20)) # or 30 Then paste the output of this command in a post. Untested, but I think it should work: # Create a logical index into 'example2' ix <- example2$V1 %in% setdiff(example2$V1, example1$V1) example2[ix, ] # which rows does it index? You could also save the results of setdiff() and then see which of example2$V1 are %in% it. If the data.frame has many rows, this way could save some memory (setdiff() gives only 2 character values, versus 2 TRUE plus n-2 FALSE). Hope this helps, Rui Barradas [...]
Re: [R] Questions about doing analysis based on time
Oh, whoops. When I responded the first time, I had done dput(head(Sunday, 100)) too, but it didn't look useful. It was only just now that I saw that it basically prints out the vectors. Sorry about that. Anyways, here's dput(head(Sunday, 100)): dput(head(Sunday, 100)) structure(list(SunDate = structure(c(1046L, 1058L, 1080L, 1462L, 1268L, 977L, 935L, 158L, 646L, 46L, 1357L, 1131L, 465L, 512L, 1562L, 1555L, 1401L, 503L, 294L, 200L, 1368L, 701L, 364L, 318L, 1120L, 1316L, 486L, 338L, 1604L, 1496L, 925L, 201L, 1007L, 1339L, 758L, 615L, 896L, 38L, 969L, 1543L, 1212L, 612L, 904L, 1197L, 372L, 842L, 571L, 1596L, 1229L, 408L, 912L, 1026L, 1452L, 473L, 409L, 714L, 578L, 1059L, 743L, 579L, 495L, 1280L, 460L, 1081L, 1470L, 1348L, 865L, 346L, 725L, 118L, 604L, 604L, 254L, 397L, 1605L, 149L, 389L, 785L, 1163L, 1206L, 707L, 1189L, 14L, 1590L, 1269L, 26L, 1476L, 513L, 662L, 874L, 283L, 670L, 1613L, 1075L, 1112L, 263L, 623L, 1425L, 1240L, 1240L), .Label = c(1/1/2012 10:27, 1/1/2012 13:59, 1/1/2012 14:50, 1/1/2012 15:44, 1/1/2012 16:48, 1/1/2012 20:43, 1/1/2012 22:41, 1/1/2012 3:43, 1/1/2012 4:57, 1/10/2010 12:37, 1/10/2010 17:03, 1/10/2010 21:04, 1/10/2010 9:19, 1/11/2009 1:25, 1/11/2009 10:51, 1/11/2009 14:13, 1/11/2009 18:44, 1/11/2009 22:45, 1/11/2009 6:25, 1/15/2012 10:33, 1/15/2012 13:48, 1/15/2012 18:54, 1/15/2012 21:58, 1/15/2012 7:00, 1/15/2012 7:06, 1/16/2011 1:26, 1/16/2011 11:30, 1/16/2011 16:15, 1/16/2011 18:36, 1/16/2011 20:42, 1/16/2011 5:23, 1/17/2010 10:56, 1/17/2010 14:25, 1/17/2010 19:13, 1/17/2010 21:00, 1/17/2010 21:55, 1/17/2010 6:54, 1/18/2009 0:45, 1/18/2009 12:10, 1/18/2009 15:13, 1/18/2009 18:27, 1/18/2009 20:39, 1/18/2009 22:19, 1/18/2009 3:12, 1/18/2009 8:38, 1/2/2011 0:12, 1/2/2011 13:05, 1/2/2011 17:32, 1/2/2011 18:06, 1/2/2011 19:28, 1/2/2011 20:42, 1/2/2011 23:40, 1/2/2011 6:43, 1/2/2011 6:44, 1/2/2011 7:20, 1/22/2012 12:05, 1/22/2012 12:59, 1/22/2012 17:13, 1/22/2012 19:25, 1/22/2012 20:08, 1/22/2012 23:11, 1/22/2012 6:30, 1/23/2011 13:09, 1/23/2011 
18:40, 1/23/2011 2:35, 1/23/2011 20:15, 1/23/2011 23:44, 1/23/2011 6:25, 1/23/2011 9:46, 1/24/2010 11:47, 1/24/2010 14:13, 1/24/2010 16:20, 1/24/2010 18:06, 1/24/2010 19:52, 1/24/2010 6:56, 1/25/2009 15:13, 1/25/2009 19:47, 1/25/2009 5:18, 1/25/2009 6:49, 1/25/2009 9:26, 1/29/2012 11:10, 1/29/2012 14:06, 1/29/2012 14:13, 1/29/2012 18:00, 1/29/2012 3:22, 1/29/2012 7:13, 1/3/2010 16:07, 1/3/2010 19:55, 1/3/2010 19:56, 1/3/2010 2:10, 1/3/2010 8:31, 1/30/2011 11:27, 1/30/2011 15:38, 1/30/2011 16:58, 1/30/2011 2:34, 1/30/2011 21:01, 1/30/2011 23:58, 1/30/2011 7:08, 1/31/2010 10:31, 1/31/2010 13:36, 1/31/2010 16:29, 1/31/2010 18:09, 1/4/2009 11:15, 1/4/2009 16:29, 1/4/2009 17:06, 1/4/2009 20:14, 1/4/2009 21:58, 1/4/2009 8:49, 1/8/2012 14:02, 1/8/2012 15:29, 1/8/2012 19:37, 1/8/2012 22:25, 1/8/2012 6:17, 1/9/2011 11:46, 1/9/2011 15:53, 1/9/2011 19:12, 1/9/2011 5:08, 10/10/2010 1:13, 10/10/2010 12:05, 10/10/2010 13:12, 10/10/2010 16:06, 10/10/2010 17:40, 10/10/2010 2:46, 10/10/2010 20:46, 10/10/2010 23:52, 10/11/2009 11:26, 10/11/2009 16:07, 10/11/2009 17:47, 10/11/2009 20:41, 10/11/2009 3:20, 10/11/2009 7:03, 10/12/2008 13:13, 10/12/2008 18:13, 10/12/2008 2:19, 10/12/2008 21:22, 10/12/2008 5:50, 10/12/2008 9:45, 10/16/2011 1:57, 10/16/2011 10:43, 10/16/2011 12:09, 10/16/2011 15:23, 10/16/2011 16:58, 10/16/2011 18:16, 10/16/2011 19:14, 10/16/2011 23:49, 10/16/2011 6:27, 10/16/2011 9:00, 10/16/2011 9:35, 10/17/2010 1:18, 10/17/2010 11:02, 10/17/2010 15:30, 10/17/2010 17:06, 10/17/2010 19:31, 10/17/2010 20:33, 10/17/2010 22:04, 10/17/2010 3:24, 10/17/2010 6:07, 10/18/2009 0:12, 10/18/2009 10:15, 10/18/2009 14:54, 10/18/2009 16:52, 10/18/2009 18:49, 10/18/2009 23:10, 10/18/2009 5:17, 10/19/2008 11:52, 10/19/2008 14:11, 10/19/2008 15:15, 10/19/2008 16:47, 10/19/2008 18:40, 10/19/2008 22:53, 10/19/2008 4:31, 10/19/2008 7:02, 10/19/2008 9:20, 10/2/2011 12:25, 10/2/2011 14:21, 10/2/2011 15:20, 10/2/2011 16:04, 10/2/2011 5:06, 10/2/2011 7:33, 10/23/2011 10:41, 10/23/2011 12:41, 
10/23/2011 13:15, 10/23/2011 13:53, 10/23/2011 14:07, 10/23/2011 16:59, 10/23/2011 20:25, 10/23/2011 5:39, 10/24/2010 12:05, 10/24/2010 16:24, 10/24/2010 18:16, 10/24/2010 20:55, 10/24/2010 22:10, 10/24/2010 3:24, 10/24/2010 8:49, 10/25/2009 11:07, 10/25/2009 18:12, 10/25/2009 22:17, 10/25/2009 5:06, 10/25/2009 8:08, 10/26/2008 0:23, 10/26/2008 0:37, 10/26/2008 12:24, 10/26/2008 15:47, 10/26/2008 21:08, 10/26/2008 23:55, 10/26/2008 4:28, 10/26/2008 7:58, 10/3/2010 16:35, 10/3/2010 19:25, 10/3/2010 2:38, 10/3/2010 7:43, 10/30/2011 11:29, 10/30/2011 13:15, 10/30/2011 14:22, 10/30/2011 16:01, 10/30/2011 16:02, 10/30/2011 22:28, 10/30/2011 3:05, 10/30/2011 5:48, 10/30/2011 8:01, 10/31/2010 10:47, 10/31/2010 14:57, 10/31/2010 15:48, 10/31/2010 16:57, 10/31/2010 18:20, 10/31/2010 2:34, 10/31/2010 22:10, 10/31/2010 3:32, 10/31/2010 5:59, 10/4/2009 16:19, 10/4/2009 2:20, 10/4/2009 20:23, 10/4/2009 23:09, 10/5/2008 10:56,
Re: [R] Conditioned Latin Hypercube Sampling within determined distance
Hi, I'm sorry for not explaining well. I'm trying to do my best, but it's a little complicated for me. I will send what I've tried, but I think I should also provide some files to make it possible for you to run the code I'm running, and I couldn't find a way to do that. I have 2 rasters representing environmental characteristics of my area (slasc.asc and weasc.asc). I also have a raster containing the cost of reaching every place in the area (euc_.asc). The clhs function allows sampling according to lowest reaching cost, meaning clhs will choose points where it is easier to go. To do this, I should use the code res <- clhs(s, size = 30, iter = 2000, progress = FALSE, simple = FALSE, cost = cost). Then R returns the error: Error in .local(x, ...) : Could not find the cost attribute I don't know what I'm doing wrong. I was wondering if somebody knows what I should try. The R help says cost is "a character giving the name or an integer giving the index of the attribute in x that gives a cost that can be used to constrain the cLHS sampling. If NULL (default), the cost-constrained implementation is not used." Below is the code I'm using:

require(raster)
require(clhs)
require(proj4)
require(rgdal)

# 1 - load the data
r1 <- raster("D:/Sg/Lavr/R/slasc.asc")
r2 <- raster("D:/Sg/Lavr/R/weasc.asc")

# 2 - load cost raster
# I'M NOT SURE IF THIS STEP IS CORRECT
rcost <- raster("D:/Sg/Lavr/R/euc_.asc")

# 3 - set spatial references
projection(r1) <- "+proj=utm +zone=23 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
projection(r2) <- "+proj=utm +zone=23 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
projection(rcost) <- "+proj=utm +zone=23 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"

# 4 - create a multi-layer raster object
s <- stack(r1, r2, rcost)

# 5 - conditioned Latin hypercube sampling
res <- clhs(s, size = 30, iter = 2000, progress = FALSE, simple = FALSE, cost = rcost)
Error in .local(x, ...) : Could not find the cost attribute.

Thank you!
Sergio -- View this message in context: http://r.789695.n4.nabble.com/Conditioned-Latin-Hypercube-Sampling-within-determined-distance-tp4633974p4634457.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
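Judging from the help text quoted in the message above, cost expects the name or index of a layer inside x itself, not a separate raster object, which would explain the "Could not find the cost attribute" error. A sketch under that assumption; the synthetic rasters and the layer name "cost" are invented stand-ins for the original .asc files:

```r
## Hypothetical sketch: pass the cost layer to clhs() by name or by index
## within the stack, not as a standalone raster object.
library(raster)
library(clhs)

r1 <- raster(matrix(runif(100), 10))     # stands in for slasc.asc
r2 <- raster(matrix(runif(100), 10))     # stands in for weasc.asc
rcost <- raster(matrix(runif(100), 10))  # stands in for euc_.asc
names(rcost) <- "cost"                   # give the cost layer a findable name

s <- stack(r1, r2, rcost)

## Either the layer name ...
res <- clhs(s, size = 10, iter = 500, progress = FALSE, simple = FALSE,
            cost = "cost")
## ... or its index in the stack (here the 3rd layer) should work:
res <- clhs(s, size = 10, iter = 500, progress = FALSE, simple = FALSE,
            cost = 3)
```

This mirrors the documented signature ("a character giving the name or an integer giving the index of the attribute in x"); whether the installed clhs version supports Raster input directly should be checked against its help page.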
Re: [R] graph displays
xx <- structure(list(X1000s = c(47L, 37L, 17L), X600s = c(63L, 64L, 62L),
X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L), X100s = c(125L,
25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L, 61L, 91L),
X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L), X1s = c(131L,
186L, 186L)), .Names = c("X1000s", "X600s", "X500s", "X250s",
"X100s", "X50s", "X10s", "X5s", "X3s", "X1s"), class = "data.frame",
row.names = c("A", "B", "C"))

# then this should do
barplot(t(xx))

John Kane Kingston ON Canada -Original Message- From: ricardosousa2...@clix.pt Sent: Mon, 25 Jun 2012 09:24:36 -0700 (PDT) To: r-help@r-project.org Subject: [R] graph displays Good Afternoon, I'm trying to create a graph that best displays the following information, for instance organized as a bar graph by A, B, C:

Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
1 A 47 63 75 116 125 129 131 131 131 131
2 B 37 64 45 11 25 19 61 131 186 186
3 C 17 62 25 66 12 29 91 171 186 186

thanks -- View this message in context: http://r.789695.n4.nabble.com/graph-displays-tp4634448.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.spss function
On Jun 25, 2012, at 12:30 , Jim Holtman wrote: you have to load the 'foriegn' package first. 'foreign' works better Sent from my iPad ...with spellcheck disabled, it seems ;-) -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] combineLimits and Dates
Hi Elliot This works on Win 7, R 2.15:

useOuterStrips(combineLimits(
  xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
         scales = list(y = list(relation = "free"),
                       x = list(at = seq(from = as.Date("2011-01-01"),
                                         to = as.Date("2011-10-01"),
                                         by = "3 month"),
                                labels = format(seq(from = as.Date("2011-01-01"),
                                                    to = as.Date("2011-10-01"),
                                                    by = "3 month"), "%b"))),
         auto.key = TRUE)
))

amend the seq as required, and the format if required; see ?strptime for format codes HTH Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au At 02:28 26/06/2012, you wrote: I'm having some trouble using the latticeExtra 'combineLimits' function with a Date x-variable:

require(lattice)
require(latticeExtra)  # for useOuterStrips() and combineLimits()
set.seed(12345)
dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), "days")
dat <- data.frame(d = rep(dates, 4),
                  g = factor(rep(rep(c(1, 2), each = length(dates)), 2)),
                  h = factor(rep(c("a", "b"), each = length(dates) * 2)),
                  x = rnorm(4 * length(dates)),
                  y = rnorm(4 * length(dates)))
plt1 <- xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
               scales = list(relation = "free"), auto.key = TRUE)
plt1 <- useOuterStrips(plt1)
plt1 <- combineLimits(plt1)

The x-axis labels are right after the call to 'useOuterStrips' but they get converted to numeric after the call to 'combineLimits'. How do I keep them as date labels? Thanks. - Elliot -- Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC 134 Mount Auburn Street | Cambridge, MA | 02138 Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rrdf package for mac not working
rrdf is incredibly helpful, but I've noticed that the rrdf package for Mac hasn't been working for some time: http://goo.gl/5Ukpn . I am wondering if there is still a plan to maintain it in the long run, or if there is some other alternative for reading RDF files. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Passing Arima Parameters from Excel to R
Hello everyone, I would like to have an Excel spreadsheet with a macro that allows me to select between the following models (in the form of a combobox): AR, MA, ARMA and ARIMA, and, depending on the model chosen, give Excel the parameters (for p, q, d and P, D, Q). I want to find a way to connect R with Excel in such a way that, if I select the desired model and the required parameters, I could pass those to the arima fitting function and then generate the graph and produce the forecasts from R. Is there a way to do this? Best regards, Paul -- View this message in context: http://r.789695.n4.nabble.com/Passing-Arima-Parameters-from-Excel-to-R-tp4634467.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
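One simple approach (among others, such as the RExcel add-in) is a file-based handoff: the Excel macro writes the chosen model and orders to a small CSV and then shells out to Rscript. The R side of that handoff might look like the sketch below, where the file name "params.csv", its column layout, and the simulated series are all invented for illustration:

```r
## Hypothetical R side of an Excel -> R handoff. The macro is assumed to
## write params.csv with columns model, p, d, q before running this script;
## here a hard-coded data.frame stands in for read.csv("params.csv").
params <- data.frame(model = "ARIMA", p = 1, d = 1, q = 1)

## A simulated series stands in for the data Excel would export.
set.seed(42)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 200))

fit <- arima(y, order = c(params$p, params$d, params$q))
fc <- predict(fit, n.ahead = 12)  # 12-step-ahead forecasts

## Write forecasts back somewhere Excel can pick them up, e.g.:
## write.csv(data.frame(forecast = fc$pred), "forecast.csv", row.names = FALSE)
```

The same dispatch works for AR/MA/ARMA, since each is an arima() order with some of p, d, q set to zero; seasonal P, D, Q would go in arima()'s seasonal argument.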
[R] Fitting binomial data to a probit distribution without an equation
Hey everyone, I've been reading an old scientific paper (well, not that old, about 15 years) and I want to verify the authors' statistical results. The paper is fairly unclear about what exactly they did, and none of the relatively simple commands I'm familiar with are producing results similar to theirs. The data is dose-response, recorded as binomial data:

structure(list(X1 = c(10, 10, 12, 13, 14, 15, 16, 18, 20, 20, 23, 23, 25, 30, 45, 46, 46, 48, 50, 52), X2 = c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)), .Names = c("X1", "X2"), row.names = c(NA, 20L), class = "data.frame")

The quotes from the paper are as follows: "Maximum likelihood was applied to probit models (normal, lognormal, weibull, logistic, and log-logistic) and the data to determine the incidence relationships. The lognormal and loglogistic models best represent the data." and later: "Binary data are utilized for the maximum likelihood dose response calculations". I tried using a simple glm() with a probit link, but that produced borderline nonsense results. This made sense when I thought about it more, as R was trying to fit a regression line to raw binary data as opposed to binned/high-repetition binary data. I've also messed around with maximum likelihood but they're not clear about what equation they're using. In the end, I guess what I'm trying to do is figure out how they're estimating their probit parameters from a binary data set. To me, estimating parameters seems very different from doing a GLM. Is this even possible to do? Is there a package out there that performs this function, or is it in the basic functionality of R and I'm just being dumb?? -Mort (apologies if this is too much theoretical statistics and not enough 'R') -- View this message in context: http://r.789695.n4.nabble.com/Fitting-binomial-data-to-a-probit-distribution-without-an-equation-tp4634466.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
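For what it's worth, glm() handles ungrouped 0/1 responses by maximum likelihood directly, and a probit link applied to the log dose is the usual way to get the "lognormal" tolerance model the paper mentions (a logit link on log dose would give the log-logistic one). Whether this matches the authors' exact procedure is a guess; a sketch using the posted data:

```r
## Sketch: ML probit fit on log(dose) for binary (0/1) dose-response data.
## probit + log(dose) corresponds to a lognormal tolerance distribution;
## logit + log(dose) would correspond to the log-logistic one.
dat <- data.frame(
  dose = c(10, 10, 12, 13, 14, 15, 16, 18, 20, 20,
           23, 23, 25, 30, 45, 46, 46, 48, 50, 52),
  resp = c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)
)

fit <- glm(resp ~ log(dose), family = binomial(link = "probit"), data = dat)
summary(fit)$coefficients  # intercept and log-dose slope with SEs

## Implied median effective dose (ED50), assuming the lognormal model holds:
exp(-coef(fit)[1] / coef(fit)[2])
```

No binning is needed for the fit itself; binning only matters for visual goodness-of-fit checks.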
Re: [R] setdiff dataframes
Hi, Try this:

example1 <- data.frame(V1 = c("rs4685", "rs4685", "rs788018", "rs788023"),
                       V2 = c("2:198257795", "2:198257795", "2:198265526", "2:198283305"))
example2 <- data.frame(V1 = c("rs4685", "rs4675", "rs788018", "rs788023"),
                       V2 = c("2:198257795", "2:198258795", "2:198265526", "2:198283305"))
subset(example2, !(V1 %in% example1$V1))
      V1          V2
2 rs4675 2:198258795

A.K. - Original Message - From: nathalie n...@sanger.ac.uk To: r-help@r-project.org Cc: Sent: Monday, June 25, 2012 12:37 PM Subject: [R] setdiff dataframes hi, I have 2 files example 1 and example 2 and would like to know what is in example2 and not in example1 (attached) V1 contains data which could be duplicated and which I am using as identifiers I used setdiff(example2$V1,example1$V1) to find the identifiers which are specific to example2: [1] rs2276598 rs17253672 I am looking for a way to get an output with all columns (V1 to V14) for these 2 identifiers thanks for any suggestions [...]
Re: [R] Questions about doing analysis based on time
Hello,

The following worked with the supplied dataset, named 'Sunday'.

library(chron)   # for hours()

dat <- Sunday
dat$SunDate <- as.POSIXct(Sunday$SunDate, format = "%m/%d/%Y %H:%M")
dat <- na.exclude(dat)
h <- hours(dat$SunDate)

# 1. Two different displays
aggregate(dat$SunScore, by = list(h), mean)
aggregate(SunScore ~ h, data = dat, mean)

# 2. Several graphs, including a histogram
ix <- dat$SunScore > 140   # comparison operator lost in the archive; '>' assumed
hist(h[ix])
boxplot(SunScore[ix] ~ h[ix], data = dat)
plot(SunScore[ix] ~ jitter(h[ix], 0.5), data = dat, col = h[ix] + 2, pch = 16)

I haven't thought about the other questions, but these seem to be a start.

Hope this helps,

Rui Barradas

On 25-06-2012 18:56, APOCooter wrote:

Oh, whoops. When I responded the first time, I had done dput(head(Sunday, 100)) too, but it didn't look useful. It was only just now that I saw that it basically prints out the vectors. Sorry about that. Anyways, here's dput(head(Sunday, 100)):

[Quoted dput() output, truncated by the archive: a data frame whose SunDate column is a factor with m/d/Y H:M timestamp levels spanning 2008-2012.]
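As an aside, the reply's code depends on the poster's 'Sunday' data, which is truncated above. A self-contained sketch of the same hour-of-day aggregation, using made-up data and base R's as.POSIXlt()$hour in place of chron::hours() so no extra package is needed, might look like this:

```r
# Hypothetical stand-in for the 'Sunday' data: 200 random timestamps over
# one month with a numeric score attached.
set.seed(1)
dat <- data.frame(
  SunDate  = as.POSIXct("2012-01-01 00:00", tz = "UTC") +
             sample(0:(30 * 24 * 3600), 200),
  SunScore = round(runif(200, 100, 180))
)

# Hour of day without chron: as.POSIXlt() exposes an integer 'hour' field.
h <- as.POSIXlt(dat$SunDate)$hour

# Mean score by hour of day, formula interface as in the reply.
agg <- aggregate(SunScore ~ h, data = dat, mean)
head(agg)
```

The same `h` vector drops into the hist()/boxplot()/plot() calls from the reply unchanged.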
[R] weighted logistic regression
I have an epidemiologic dataset and want to do a weighted logistic regression. I have a continuous exposure variable and I want to see whether it is associated with a binary outcome. The accuracy of the exposure values is highly variable, and I want the logistic regression to assign greater weight/importance to exposure values that are highly accurate, and lower weight to exposure values that are less accurate (there is a third variable that specifies the quality of each exposure value). Is the 'weights' argument of the glm command appropriate for this?

Thanks,
Katie

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
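For what it's worth, the mechanics of passing such precision weights look like the sketch below; all data and names here are made up, and whether prior weights in glm() answer the statistical question is a separate issue. Note that for family = binomial, glm() treats 'weights' as prior weights on the log-likelihood, and non-integer weights with 0/1 outcomes produce a "non-integer #successes" warning.

```r
# Made-up data: a continuous exposure, a binary outcome, and a 'quality'
# score standing in for the accuracy of each exposure measurement.
set.seed(1)
n        <- 200
exposure <- rnorm(n)
quality  <- runif(n, 0.2, 1)
outcome  <- rbinom(n, 1, plogis(0.5 * exposure))

# Prior weights rescaled to mean 1 so the effective sample size is unchanged;
# observations with higher 'quality' contribute more to the fit.
fit <- glm(outcome ~ exposure, family = binomial,
           weights = quality / mean(quality))
coef(summary(fit))
```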
Re: [R] do.call or something instead of for
On 26/06/12 00:39, Kathie wrote:

Dear R users,

I'd like to compute X as below:

X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

where the W_{i,t} are from Uniform(0,2) and X_{i,0} = 1 + 5*W_{i,0}.

Of course I can do this with a for statement, but I don't think that's a good idea because i and t are too big. So my question is: is there a better way to avoid a for statement for this problem? Thank you in advance.

You could use filter(), as follows:

foo <- function(n) {
    w0 <- 1 + 5*runif(1, 0, 2)
    w  <- c(w0, runif(n - 1, 0, 2))
    y  <- filter(w, filter = 2, method = "recursive")
    mu <- sapply(0:(n - 1), function(k) {
              if (k == 0) return(0)
              sum((1:k)*2^((k - 1):0))*0.1
          })
    mu + y
}

(This of course just does a univariate time series and ignores the subscript i. To get multivariate time series, just use foo() to generate as many components as you want; they are independent.)

Check:

set.seed(42)
x <- foo(10)
x
Time Series:
Start = 1
End = 10
Frequency = 1
 [1]   10.14806   22.27027   45.31282   92.58654  186.85657  375.25133
 [7]  752.57585 1506.12103 3014.35603 6031.02220

set.seed(42)
w0 <- 1 + 5*runif(1, 0, 2)
w  <- c(w0, runif(9, 0, 2))
0.1 + 2*x[1] + w[2]
[1] 22.27027   # Bingo.
0.2 + 2*x[2] + w[3]
[1] 45.31282   # Bingo.
etc.

Notice that the generated series explodes rather rapidly.

cheers,

Rolf Turner
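The multivariate case mentioned in passing can be sketched with replicate(), since the i components are independent; foo() is restated from the reply so this block stands alone:

```r
# foo() as in the reply: one series of length n from
# X_t = 0.1*t + 2*X_{t-1} + W_t, with X_0 = 1 + 5*W_0 and W_t ~ Uniform(0, 2).
foo <- function(n) {
  w0 <- 1 + 5 * runif(1, 0, 2)
  w  <- c(w0, runif(n - 1, 0, 2))
  y  <- filter(w, filter = 2, method = "recursive")   # y_t = w_t + 2*y_{t-1}
  mu <- sapply(0:(n - 1), function(k) {
    if (k == 0) return(0)
    sum((1:k) * 2^((k - 1):0)) * 0.1                  # accumulated 0.1*t drift
  })
  mu + y
}

# Each column of X is one independent component, i.e. one value of i.
set.seed(42)
X <- replicate(5, foo(10))
dim(X)   # 10 time points by 5 series
```

Because the components share no state, this is also trivially parallel (e.g. parallel::mclapply) if i really is large.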