Re: [R] shapiro wilk normality test
You may consider the nortest package.
http://cran.r-project.org/web/packages/nortest/index.html

Regards,
CH

On Sat, Jul 12, 2008 at 11:30 PM, Bunny, lautloscrew.com [EMAIL PROTECTED] wrote:
> Hi everybody,
>
> somehow I don't get the Shapiro-Wilk test for normality. I just can't find what the H0 is. I tried:
>
>   shapiro.test(rnorm(5000))
>
>           Shapiro-Wilk normality test
>
>   data:  rnorm(5000)
>   W = 0.9997, p-value = 0.6205
>
> If normality is the H0, the test says it's probably not normal, doesn't it? 5000 is the biggest n allowed by the test... Are there any other tests? (I know qqnorm already ;)
>
> thanks in advance
> matthias
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
CH Chan
Research Assistant - KWH
http://www.macgrass.com
Re: [R] Excel Trend Function
Hi Felipe,

Daniel mentions imputation is a disputed practice. There are recommendations and rules of thumb for its use. I am not sure that imputation is disputed. I would be interested to see some links to articles recommending against its use.

Paul

----- Original Message -----
From: Felipe Carrillo [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 13, 2008 5:46 AM
Subject: [R] Excel Trend Function

> Hi:
> I have a dataset and need to interpolate for missing days. In Excel I either average from sampled days from above and below the missing days or use the TREND function to make up for the missing values. I have been reading about na.approx; is this function similar to the TREND function? Which is the best recommendable way to make up for missing data? Here's my dataset: weeks 17, 18, 26 and 46 have 0 daysSamp.
>
> Year Week daysSamp   Lower TotalPD   Upper  varTotalPD
> 2006   47        6  126988  188259  249530  1045878675
> 2006   48        7  189155  253350  317545  1148102355
> 2006   49        7  103300  132741  162182   241480186
> 2006   50        6   11801  252576  493352 16151006813
> 2006   51        7    2348    3671    4994      487926
> 2006   52        5    2606   29901   57196   215454181
> 2006    2        7    2968    4513    6058      664723
> 2006    3        7    1128    1889    2650      161231
> 2006    4        7     479     963    1447       65196
> 2006    5        7    2819    4413    6007      708094
> 2006    6        6   -1009    3128    7264     4766743
> 2006    7        7   -5239   10769   26777    71387835
> 2006    8        7     150     503     856       34685
> 2006    9        7    1858    2989    4120      356562
> 2006   10        7     193     494     795       25281
> 2006   11        7     125     346     567       13627
> 2006   12        7     432     767    1102       31189
> 2006   13        7    1229    1867    2505      113569
> 2006   14        7     813    1339    1865       77140
> 2006   15        4     -66     124     315       10105
> 2006   16        7     152     903    1654      157242
> 2006   17        0
> 2006   18        0
> 2006   19        5       0       0       0           0
> 2006   20        4       0       0       0           0
> 2006   21        5       0       0       0           0
> 2006   22        6       0       0       0           0
> 2006   23        7     -65     285     635       34112
> 2006   24        6       0       0       0           0
> 2006   25        7       0       0       0           0
> 2006   26        0
> 2006   27        4     228     931    1634      137726
> 2006   28        4     801    2231    3662      569977
> 2006   29        4    4544    9242   13939     6147522
> 2006   30        5   15798   28465   41131    44697915
> 2006   31        5   25398   41049   56701    68245523
> 2006   32        5   48197   82216  116235   322416917
> 2006   33        5  142980  230411  317841  2129630128
> 2006   34        5  227141  360468  493794  4952314336
> 2006   35        5  467244  756325 1045405 23281569629
> 2006   36        5  281049  463331  645614  9256900449
> 2006   37        2  227636  620330 1013023 42961663047
> 2006   38        3  478990  983472 1487954 70903343603
> 2006   39        7  539690  846522 1153354 26228718974
> 2006   40        7  320959  457866  594773  5221891252
> 2006   41        7  427561  582452  737343  6683813344
> 2006   42        7  271788  351103  430418  1752614293
> 2006   43        7  165019  208853  252687   535301133
> 2006   44        7   91514  117390  143266   186537178
> 2006   45        7   59061   79187   99313   112842787
> 2006   46        0
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA
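A note on the question in this thread: as far as I know, Excel's TREND fits a least-squares straight line through the known points, whereas na.approx (from the zoo package) interpolates linearly between neighbouring observations, so the two are not the same thing. The sketch below illustrates the difference in base R only, using approx() for the interpolation and lm() for a straight-line fit. It takes just weeks 14 to 19 of the TotalPD column from the table above; the variable names are made up for illustration, and whether either approach is appropriate for the real data is a separate question.

```r
# Weeks 14-19 of TotalPD from the posted table; weeks 17 and 18 were not sampled
week    <- 14:19
totalPD <- c(1339, 124, 903, NA, NA, 0)

# Linear interpolation between the nearest sampled weeks (what na.approx does)
known  <- !is.na(totalPD)
filled <- approx(x = week[known], y = totalPD[known], xout = week)$y
print(data.frame(week, totalPD, filled))

# Straight-line regression fit, closer in spirit to Excel's TREND
fit   <- lm(totalPD ~ week)                 # rows with NA are dropped by default
trend <- predict(fit, newdata = data.frame(week = week))
print(round(trend))
```

The interpolated values depend only on the two neighbouring sampled weeks, while the regression values depend on every point used in the fit, so the two columns will generally disagree.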
Re: [R] a warning message from lmer
Thanks for reminding me. But the problem still exists when I combine these two 'joy' levels.

Quoting Douglas Bates [EMAIL PROTECTED]:
> By the way, did you notice that the levels of Emotion include both "joy" and "joy "? You may want to correct that.
>
> On Sat, Jul 12, 2008 at 7:47 AM, Douglas Bates [EMAIL PROTECTED] wrote:
>> On Sat, Jul 12, 2008 at 6:23 AM, Lan Wei [EMAIL PROTECTED] wrote:
>>> Hi all,
>>> I have a problem when running lmer. In my data set, Agree is a binary (0/1) response. WalkerID and ObsID are the identification numbers of the subjects. The other variables are as follows:
>>>
>>>   levels(regdat$Display)
>>>   [1] "Dynamic" "Static"
>>>   levels(regdat$Survey)
>>>   [1] "HM1_A" "HM1_B" "HM1_C" "HM2_A" "HM2_B" "HM2_C" "ST_A" "ST_B" "ST_C"
>>>   levels(regdat$Emotion)
>>>   [1] "aneu" "ang" "con" "joy" "joy " "sad"
>>>   levels(regdat$ObsGender)
>>>   [1] "F" "M"
>>>   levels(regdat$WalkerGender)
>>>   [1] "F" "M"
>>>
>>> The warning is:
>>>
>>>   fit1 <- lmer(Agree ~ Display + Survey + Emotion + WalkerGender + ObsGender + (1|WalkerID) + (1|ObsID), family = binomial(link = 'logit'), data = regdat)
>>>   Warning message:
>>>   In mer_finalize(ans, verbose) : gr cannot be computed at initial par (65)
>>>
>>> Does anybody have some hint to solve this problem? I'd very much appreciate it!
>>
>> In situations like this it is best to add the argument verbose = TRUE in the call to lmer so that you can see the progress of the iterations. (Also, you may want to call glmer directly. When you call lmer with a non-gaussian family it simply calls glmer. You can avoid the extra step.)
>>
>> This call is returning a warning about evaluation of the gradient at the initial values of the parameters. I'm not sure if it then goes on to optimize the approximated deviance. If the approximated deviance is not being minimized for this model you may want to start with a simpler model, omitting some of the terms in the fixed effects.
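For readers following the point about the duplicated "joy" level: a stray trailing space is enough to create a second factor level. The sketch below shows one way to merge such levels on a toy factor; it is not the poster's data, and the object names are invented. It uses trimws(), which exists in current base R but was not available in 2008 (sub() would have been the tool then). Note that Lan Wei reports the warning persists after merging, so this only addresses the data-cleaning step, not the lmer warning itself.

```r
# Toy factor with the same stray level as in the thread ("joy" vs "joy ")
emotion <- factor(c("aneu", "ang", "con", "joy", "joy ", "sad", "joy "))
print(levels(emotion))

# Strip surrounding whitespace and rebuild the factor so the two levels merge
emotion_clean <- factor(trimws(as.character(emotion)))
print(levels(emotion_clean))
print(table(emotion_clean))
```

Rebuilding the factor from the cleaned character values also drops the now-unused "joy " level, which a plain recode of the values would leave behind.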
Re: [R] Position in a vector of the last value > n - *SOLVED*
Yes, your version (func2) is quick, quickest for longer vectors:

  m <- matrix(rexp(6e6, rate=0.05), nrow=5) # 120 cols
  m[m<20] <- 20
  func1 <- function(v, cut=20) max(which(v>cut))
  func2 <- function(v, cut=20) {
      x <- which(v>cut)
      x[length(x)]
  }
  func3 <- function(v, cut=20) tail(which(v>cut), 1)
  system.time(apply(m, 2, func1))
     user  system elapsed
     0.58    0.01    0.59
  system.time(apply(m, 2, func2))
     user  system elapsed
     0.48    0.04    0.53
  system.time(apply(m, 2, func3))
     user  system elapsed
     0.55    0.00    0.56

-John Thaden

-----Original Message-----
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 12, 2008 6:56 AM
To: Thaden, John J
Cc: r-help@r-project.org
Subject: Re: [R] Position in a vector of the last value > n - *SOLVED*

> A slight modification gives the equivalent results instead of using 'tail':
>
>   m <- matrix(rexp(6e6, rate=0.05), nrow=600) # 5,000 cols
>   m[m<20] <- 20
>   func1 <- function(v, cut=20) max(which(v>20))
>   func2 <- function(v, cut=20) {
>       x <- which(v>20)
>       x[length(x)]
>   }
>   system.time(apply(m, 2, func1))
>      user  system elapsed
>      1.33    0.05    1.47
>   #  user  system elapsed
>   #  0.40    0.02    0.42
>   system.time(apply(m, 2, func2))
>      user  system elapsed
>      1.31    0.08    1.44
>   #  user  system elapsed
>   #  0.70    0.05    0.75
>
> Here is another view using Rprof on the first version. You can see that 'tail' takes a fair amount of time, which accounts for the differences in timing:
>
>   /cygdrive/c: perl perf/bin/readrprof.pl tempxx.txt
>   0 2.7 root
>   1. 1.8 system.time
>   2. . 1.7 eval
>   3. . . 1.7 eval
>   4. . . . 1.7 apply
>   5. . . . | 1.5 FUN
>   6. . . . | . 0.8 tail
>   7. . . . | . . 0.5 which
>   8. . . . | . . . 0.1
>   8. . . . | . . . 0.0
>   8. . . . | . . . 0.0 !
>   7. . . . | . . 0.3 tail.default
>   8. . . . | . . . 0.2 stopifnot
>   9. . . . | . . . . 0.1 eval
>   9. . . . | . . . . 0.0 match.call
>   9. . . . | . . . . 0.0 any
>   6. . . . | . 0.5 which
>   7. . . . | . . 0.1
>   7. . . . | . . 0.1
>   7. . . . | . . 0.0 names<-
>   7. . . . | . . 0.0 is.na
>   5. . . . | 0.1 aperm
>   5. . . . | 0.0 unlist
>   6. . . . | . 0.0 lapply
>   5. . . . | 0.0 is.null
>   2. . 0.1 gc
>   1. 0.8 matrix
>   2. . 0.7 as.vector
>   3. . . 0.6 rexp
>   1. 0.1
>   /cygdrive/c:
>
> On Fri, Jul 11, 2008 at 12:23 PM, Thaden, John J [EMAIL PROTECTED] wrote:
>> I had written asking for a simple way to extract the index of the last value in a vector greater than some cutoff, e.g., the index, 6, for a cutoff of 20 and this example vector:
>>
>>   v <- c(20, 134, 45, 20, 24, 500, 20, 20, 20)
>>
>> Thank you, Alain Guillet, for this simple solution sent to me offlist:
>>
>>   max(which(v > 20))
>>
>> Also, thank you Lisa Readdy for a lengthier solution. Other offerings yielded the value instead of the index (the phrasing of my question apparently was misleading):
>>
>>   v[max(which(v > 20))]   (Henrique Dallazuanna)
>>   tail(v[v>20], 1)        (Jim Holtman)
>>
>> Jim's use of tail() suggests a variant to Alain's solution:
>>
>>   tail(which(v > 20), 1)
>>
>> This is faster than the max() version with long vectors, but, to my surprise, slower (on my WinXP Lenovo T61 laptop) in a rough mockup of my column-wise apply() usage:
>>
>>   m <- matrix(rexp(3e6, rate=0.05), nrow=600) # 5,000 cols
>>   m[m<20] <- 20
>>   func1 <- function(v, cut=20) max(which(v>20))
>>   func2 <- function(v, cut=20) tail(which(v>20), 1)
>>   system.time(apply(m, 2, func1))
>>   #  user  system elapsed
>>   #  0.40    0.02    0.42
>>   system.time(apply(m, 2, func2))
>>   #  user  system elapsed
>>   #  0.70    0.05    0.75
>>
>> Thank you again, Alain and others.
>> John
>>
>> On Thu, Jul 10, 2008 at 9:41 AM, John Thaden wrote:
>>> This shouldn't be hard, but it's just not coming to me: Given a vector, e.g.,
>>>
>>>   v <- c(20, 134, 45, 20, 24, 500, 20, 20, 20)
>>>
>>> how can I get the index of the last value in the vector that has a value greater than n, in the example, with n = 20? I'm looking for an efficient function I can use on very large matrices, as the FUN argument in the apply() command.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
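The three variants discussed in this thread can be checked side by side on the example vector. The sketch below is only a small correctness check, not a benchmark; the function names are invented here, and the two-column matrix is a toy stand-in for the large matrices mentioned in the thread. It also shows one difference the thread does not discuss: what each variant returns when no value exceeds the cutoff.

```r
v <- c(20, 134, 45, 20, 24, 500, 20, 20, 20)

# The three approaches from the thread, with the cutoff as an argument
last_max  <- function(v, cut = 20) max(which(v > cut))
last_len  <- function(v, cut = 20) { x <- which(v > cut); x[length(x)] }
last_tail <- function(v, cut = 20) tail(which(v > cut), 1)

print(c(last_max(v), last_len(v), last_tail(v)))

# Column-wise use through apply(), as in the original question
m <- cbind(a = v, b = rev(v))
print(apply(m, 2, last_tail))

# Edge case: nothing above the cutoff.
# last_len() and last_tail() return integer(0); max(which(...)) would
# instead return -Inf with a warning.
print(length(last_tail(c(1, 2, 3))))
```

If a column can lack any value above the cutoff, apply() will return a list rather than a vector for the integer(0) variants, so that case may need explicit handling.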
Re: [R] R crash with ATLAS precompiled Rblas.dll on Windows XP Core2 Duo
Yes, that Rblas.dll is known to be faulty, and the person who built it is unable to re-build it. It needs to be removed from CRAN.

(I've also tried to build on Core 2 Duo, and my Cygwin installation has a compiler crash during the build.)

On Tue, 8 Jul 2008, Law, Jason wrote:
> I noticed a problem using R 2.7.1 on Windows XP SP2 with the precompiled Atlas Rblas.dll. Running the code below causes R to crash. I started R using Rgui --vanilla and am using the precompiled Atlas Rblas.dll from cran.fhcrc.org dated 17-Jul-2007 05:04 for Core2 Duo.
>
> The code that causes the crash:
>
>   x <- rnorm(100)
>   y <- rnorm(100)
>   z <- rnorm(100)
>   loess(z ~ x * y)
>
> loess(z ~ x) does not cause a crash using the Atlas BLAS and neither does running the above code with the Rblas.dll that came with R 2.7.1. In addition, the code runs fine using the Atlas BLAS under R 2.6.2.
>
> The Windows error information that is printed to the screen when R closes:
>
>   AppName: rgui.exe  AppVer: 2.71.45970.0  ModName: rblas.dll
>   ModVer: 2.51.42199.0  Offset: 000501cc
>
> sessionInfo returns:
>
>   R version 2.7.1 (2008-06-23)
>   i386-pc-mingw32
>
>   locale:
>   LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
>   attached base packages:
>   [1] stats graphics grDevices utils datasets methods base
>
> I checked the R FAQ, R for Windows FAQ, and the README associated with the Atlas BLAS on CRAN and couldn't find any information related to possible crash causes. I've used the ATLAS BLAS for about 6 months on this machine (it's a new machine) with R 2.6.2.
>
> Using debug(stats:::simpleLoess), I've found that the crash occurs on the first iteration of the line:
>
>   z <- .C(R_loess_raw, as.double(y), as.double(x), as.double(weights),
>           as.double(robust), as.integer(D), as.integer(N), as.double(span),
>           as.integer(degree), as.integer(nonparametric),
>           as.integer(order.drop.sqr), as.integer(sum.drop.sqr),
>           as.double(span * cell), as.character(surf.stat),
>           fitted.values = double(N), parameter = integer(7),
>           a = integer(max.kd), xi = double(max.kd), vert = double(2 * D),
>           vval = double((D + 1) * max.kd), diagonal = double(N),
>           trL = double(1), delta1 = double(1), delta2 = double(1),
>           as.integer(surf.stat == "interpolate/exact"))
>
> After that, I'm kind of stuck in terms of tracking it down.
>
> Thanks for any input,
> Jason Law
> City of Portland, OR

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [R] How to build a package which loads Rgraphviz (if installed)...
Søren Højsgaard wrote:
> The tricky part is not getting it through the checks on my computer. It is when I upload to CRAN that I get the problems, because their computers need Rgraphviz as well... (Suggests does not seem to be the solution...)

Søren,

CRAN (which means my machine in the case when Windows is concerned) has a running version of Rgraphviz nowadays (since a week or so). If you like I can trigger updates of your packages.

Best wishes,
Uwe

> Cheers
> Søren
>
> From: Duncan Murdoch [mailto:[EMAIL PROTECTED]
> Sent: Sun 13-07-2008 00:36
> To: Søren Højsgaard
> Cc: William Revelle; [EMAIL PROTECTED]
> Subject: Re: [R] How to build a package which loads Rgraphviz (if installed)...
>
> On 12/07/2008 6:27 PM, Søren Højsgaard wrote:
>> Bill,
>> Thanks for the suggestion, but it does not solve the problem; I get the same warning from rcmd check. I suspect that rcmd check actually checks that any package referred to in require() is declared in the DESCRIPTION file. From the version numbers from your 'psych' package I guess you are stuck with the same problem???
>
> There are varying degrees of dependence. Probably Suggests is what you want. Note that *you* need to have Rgraphviz to make it through the checks, but other users won't need it.
>
> Duncan Murdoch
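As an illustration of the "Suggests" approach Duncan describes: the optional package is listed under Suggests in DESCRIPTION, and the code tests for it at run time before using it. The sketch below uses requireNamespace(), which is the usual idiom in current R (the thread itself, from 2008, talks about require()). The helper names has_optional() and draw_graph() are hypothetical, and the actual Rgraphviz plotting call is deliberately left as a placeholder comment rather than guessed.

```r
# Hypothetical helper: TRUE if an optional (Suggests) package can be loaded
has_optional <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Hypothetical plotting wrapper that degrades gracefully without Rgraphviz
draw_graph <- function(g) {
  if (!has_optional("Rgraphviz")) {
    message("Rgraphviz is not installed; skipping the plot.")
    return(invisible(FALSE))
  }
  # ... the Rgraphviz-based plotting code for 'g' would go here ...
  invisible(TRUE)
}

print(has_optional("stats"))   # a package that ships with R
drawn <- draw_graph(NULL)      # TRUE or FALSE depending on the installation
```

With this pattern, users without the optional package still get a working function; only the machine running the package checks needs the suggested package installed.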
Re: [R] R crash with ATLAS precompiled Rblas.dll on Windows XP Core2 Duo
Prof Brian Ripley wrote:
> Yes, that Rblas.dll is known to be faulty, and the person who built it is unable to re-build it. It needs to be removed from CRAN.

Whoops, I forgot to remove it and will do so this afternoon.

Uwe

> (I've also tried to build on Core 2 Duo, and my Cygwin installation has a compiler crash during the build.)
>
> On Tue, 8 Jul 2008, Law, Jason wrote:
>> I noticed a problem using R 2.7.1 on Windows XP SP2 with the precompiled Atlas Rblas.dll. Running the code below causes R to crash. I started R using Rgui --vanilla and am using the precompiled Atlas Rblas.dll from cran.fhcrc.org dated 17-Jul-2007 05:04 for Core2 Duo.
>> [...]
Re: [R] Installing RWinEdt
[EMAIL PROTECTED] wrote:
> From the R console I invoke:
>
>   install.packages("RWinEdt")
>
> and get:
>
>   Warning in install.packages("RWinEdt") :
>     argument 'lib' is missing: using 'F:\Users\Kevin\Documents/R/win-library/2.7'
>   --- Please select a CRAN mirror for use in this session ---
>   trying URL 'http://streaming.stat.iastate.edu/CRAN/bin/windows/contrib/2.7/RWinEdt_1.8-0.zip'
>   Content type 'application/zip' length 361598 bytes (353 Kb)
>   opened URL
>   downloaded 353 Kb
>
>   package 'RWinEdt' successfully unpacked and MD5 sums checked
>
>   The downloaded packages are in
>     F:\Users\Kevin\AppData\Local\Temp\RtmpOIlW0F\downloaded_packages
>   updating HTML package descriptions
>
> So it seems to have worked. But when I use the 'library' command I get:
>
>   library(RWinEdt)
>   Error in file(file, "r") : cannot open the connection
>   In addition: Warning message:
>   In file(file, "r") :
>     cannot open file 'F:\Program Files (x86)\WinEdt Team\WinEdt\R.ver': No such file or directory
>   Error : .onAttach failed in 'attachNamespace'
>   Error: package/namespace load failed for 'RWinEdt'
>
> Any ideas on how I can install this package?

With administrator privileges, since it needs to write some files into the WinEdt directory.

Best wishes,
Uwe Ligges

> Thank you.
> Kevin
Re: [R] shapiro wilk normality test
Hi!

Well, if you look at the output:

  shapiro.test(rnorm(5000))

          Shapiro-Wilk normality test

  data:  rnorm(5000)
  W = 0.9997, p-value = 0.6205

you can see that the p-value is 0.6205, so you can't reject the normality hypothesis.

  H0: normal data  vs  H1: not normal

So the Shapiro-Wilk test is saying that your data are normal, and it's correct!

Bye
Marta

----- Original Message -----
From: C.H. [EMAIL PROTECTED]
To: Bunny, lautloscrew.com [EMAIL PROTECTED]
Cc: r-help@r-project.org
Sent: Sunday, July 13, 2008, 7:27:43
Subject: Re: [R] shapiro wilk normality test

> You may consider the nortest package.
> http://cran.r-project.org/web/packages/nortest/index.html
>
> Regards,
> CH
>
> On Sat, Jul 12, 2008 at 11:30 PM, Bunny, lautloscrew.com [EMAIL PROTECTED] wrote:
>> Hi everybody,
>>
>> somehow I don't get the Shapiro-Wilk test for normality. I just can't find what the H0 is. I tried:
>>
>>   shapiro.test(rnorm(5000))
>>
>>           Shapiro-Wilk normality test
>>
>>   data:  rnorm(5000)
>>   W = 0.9997, p-value = 0.6205
>>
>> If normality is the H0, the test says it's probably not normal, doesn't it? 5000 is the biggest n allowed by the test... Are there any other tests? (I know qqnorm already ;)
>>
>> thanks in advance
>> matthias
>
> --
> CH Chan
> Research Assistant - KWH
> http://www.macgrass.com
Re: [R] Assoociative array?
The reason for the empty levels was that I did not put drop=TRUE on the split to remove unused levels. Here is the revised script:

  set.seed(1)  # start with a known number
  x <- data.frame(cat=sample(LETTERS[1:3], 20, TRUE), a=sample(letters[1:4], 20, TRUE), b=runif(20))
  x
     cat a          b
  1    A d 0.82094629
  2    B a 0.64706019
  3    B c 0.78293276
  4    C a 0.55303631
  5    A b 0.52971958
  6    C b 0.78935623
  7    C a 0.02333120
  8    B b 0.47723007
  9    B d 0.73231374
  10   A b 0.69273156
  11   A b 0.47761962
  12   A c 0.86120948
  13   C b 0.43809711
  14   B a 0.24479728
  15   C d 0.07067905
  16   B c 0.09946616
  17   C d 0.31627171
  18   C a 0.51863426
  19   B c 0.66200508
  20   C b 0.40683019
  # drop unused groups from the split
  (z <- split(x, list(x$cat, x$a), drop=TRUE))
  $B.a
     cat a         b
  2    B a 0.6470602
  14   B a 0.2447973

  $C.a
     cat a          b
  4    C a 0.55303631
  7    C a 0.02333120
  18   C a 0.51863426

  $A.b
     cat a         b
  5    A b 0.5297196
  10   A b 0.6927316
  11   A b 0.4776196

  $B.b
    cat a         b
  8   B b 0.4772301

  $C.b
     cat a         b
  6    C b 0.7893562
  13   C b 0.4380971
  20   C b 0.4068302

  $A.c
     cat a         b
  12   A c 0.8612095

  $B.c
     cat a          b
  3    B c 0.78293276
  16   B c 0.09946616
  19   B c 0.66200508

  $A.d
    cat a         b
  1   A d 0.8209463

  $B.d
    cat a         b
  9   B d 0.7323137

  $C.d
     cat a          b
  15   C d 0.07067905
  17   C d 0.31627171

  # access the value ('b' in this instance); two ways, should be the same
  z[[1]]$b
  [1] 0.6470602 0.2447973
  z$B.a$b
  [1] 0.6470602 0.2447973

On Sun, Jul 13, 2008 at 1:26 AM, [EMAIL PROTECTED] wrote:

This is almost it. Maybe it is as good as can be expected. The only problem that I see is that this seems to form a Category/SubCategory pair where none existed in the original data. For example, A might have two sub-categories a and b, and B might have two categories c and d. As far as I can tell the method that you outlined forms a Category/SubCategory pair like B a or B b where none existed. This results in a lot of empty lists and it seems to take a long time to generate. But if that is as good as it gets then I can live with it.

I know that I said one more question. But I have run into a problem.
c - split(x, x$Category) returns a vector of the rows in each of the categories. Now I would like to access the Quantity column within this split vector. I can see it listed. I just can't access it. I have tried c[1]$Quantity and c[1,2] both which give me errors. Any ideas? Sorry this is so hard for me. I am more used to C type arrays and C type arrays of structures. This seems to be somewhat different. Thank you. Kevin jim holtman [EMAIL PROTECTED] wrote: Is this something like what you were asking for? The output of a 'split' will be a list of the dataframe subsets for the categories you have specified. x - data.frame(g1=sample(LETTERS[1:2],30,TRUE), + g2=sample(letters[1:2], 30, TRUE), + g3=1:30) y - split(x, list(x$g1, x$g2)) str(y) List of 4 $ A.a:'data.frame':7 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 1 1 1 1 1 1 1 ..$ g2: Factor w/ 2 levels a,b: 1 1 1 1 1 1 1 ..$ g3: int [1:7] 3 4 6 8 9 13 24 $ B.a:'data.frame':7 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 2 2 2 2 2 2 2 ..$ g2: Factor w/ 2 levels a,b: 1 1 1 1 1 1 1 ..$ g3: int [1:7] 10 11 16 17 18 20 25 $ A.b:'data.frame':6 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 1 1 1 1 1 1 ..$ g2: Factor w/ 2 levels a,b: 2 2 2 2 2 2 ..$ g3: int [1:6] 2 12 23 26 27 29 $ B.b:'data.frame':10 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 2 2 2 2 2 2 2 2 2 2 ..$ g2: Factor w/ 2 levels a,b: 2 2 2 2 2 2 2 2 2 2 ..$ g3: int [1:10] 1 5 7 14 15 19 21 22 28 30 y $A.a g1 g2 g3 3 A a 3 4 A a 4 6 A a 6 8 A a 8 9 A a 9 13 A a 13 24 A a 24 $B.a g1 g2 g3 10 B a 10 11 B a 11 16 B a 16 17 B a 17 18 B a 18 20 B a 20 25 B a 25 $A.b g1 g2 g3 2 A b 2 12 A b 12 23 A b 23 26 A b 26 27 A b 27 29 A b 29 $B.b g1 g2 g3 1 B b 1 5 B b 5 7 B b 7 14 B b 14 15 B b 15 19 B b 19 21 B b 21 22 B b 22 28 B b 28 30 B b 30 y[[2]] g1 g2 g3 10 B a 10 11 B a 11 16 B a 16 17 B a 17 18 B a 18 20 B a 20 25 B a 25 On Sat, Jul 12, 2008 at 8:51 PM, [EMAIL PROTECTED] wrote: OK. Now I know that I am dealing with a data frame. 
One last question on this topic. a - read.csv() gives me a dataframe. If I have 'c - split(x, x$Category), then what is returned by split in this case? c[1] seems to be OK but c[2] is not right in my mind. If I run ci - split(nrow(a), a$Category). And then ci[1] seems to be the rows associated with the first category, c[2] is the indices/rows associated with the second category, etc. But this seems different than c[1], c[2], etc. Using the techniques below I
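On the question of how to reach a column inside the result of split(): the result is a named list of data frames, so single brackets return a one-element list while double brackets (or $ with the group name) return the data frame itself. The sketch below uses a made-up data frame; only the column names Category and Quantity are taken from the messages above. It also shows seq_len(nrow(a)) as the thing to split when row indices per category are wanted, which is an assumption about what split(nrow(a), a$Category) was meant to do.

```r
# Toy data; 'Category' and 'Quantity' follow the column names used in the thread
a <- data.frame(Category = c("A", "B", "A", "C", "B"),
                Quantity = c(10, 20, 30, 40, 50))

by_cat <- split(a, a$Category)     # named list of data frames, one per category

print(class(by_cat[1]))            # single brackets keep the list wrapper
print(class(by_cat[[1]]))          # double brackets extract the data frame
print(by_cat[[1]]$Quantity)        # Quantity values for the first category
print(by_cat$B$Quantity)           # the same idea, addressed by name

# Row indices per category, the closest analogue to an associative array
idx <- split(seq_len(nrow(a)), a$Category)
print(idx[["B"]])
```

Coming from C, it may help to think of by_cat as a map from category name to a small table, with [[ ]] as the lookup operator.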
Re: [R] Reading Multi-value data fields for descriptive analysis
This may do what you want:

  x <- read.table("/tempxx.txt", comment="", quote="", sep="|", header=TRUE, as.is=TRUE)
  # split out by name
  z <- lapply(seq(nrow(x)), function(.row){
      .result <- NULL
      # construct the data output
      for (i in c('picnic', 'food', 'other')){
          .split <- strsplit(x[.row,][[i]], ";#")
          .result <- rbind(.result, cbind(name=x[.row,][['name']], field=i, value=unlist(.split)))
      }
      .result
  })
  z
  [[1]]
       name        field    value
  [1,] "Yogi Bear" "picnic" "Yes"
  [2,] "Yogi Bear" "food"   "Hamburgers"
  [3,] "Yogi Bear" "food"   "Hot Dogs"
  [4,] "Yogi Bear" "food"   "I rely on others to bring the good stuff"
  [5,] "Yogi Bear" "other"  "\"Softball"
  [6,] "Yogi Bear" "other"  "Blanket"
  [7,] "Yogi Bear" "other"  "I bring boo-boo, but he hides\""

  [[2]]
       name      field    value
  [1,] "Boo-Boo" "picnic" "Yes"
  [2,] "Boo-Boo" "food"   "Potato Salad"
  [3,] "Boo-Boo" "food"   "Cole Slaw"
  [4,] "Boo-Boo" "food"   "whatever Yogi doesn't eat"
  [5,] "Boo-Boo" "other"  "Lawn Chairs"
  [6,] "Boo-Boo" "other"  "Blanket"
  [7,] "Boo-Boo" "other"  "my running shoes"

  [[3]]
       name          field    value
  [1,] "Ranger Rick" "picnic" "No"
  [2,] "Ranger Rick" "food"   "I told you I don't picnic"
  [3,] "Ranger Rick" "other"  "a big net and handcuffs"

  [[4]]
        name              field    value
   [1,] "Magilla Gorilla" "picnic" "Yes"
   [2,] "Magilla Gorilla" "food"   "Hamburgers"
   [3,] "Magilla Gorilla" "food"   "Hot Dogs"
   [4,] "Magilla Gorilla" "food"   "Potato Salad"
   [5,] "Magilla Gorilla" "food"   "Cole Slaw"
   [6,] "Magilla Gorilla" "food"   "BBQ Chicken"
   [7,] "Magilla Gorilla" "other"  "Softball"
   [8,] "Magilla Gorilla" "other"  "Volleyball"
   [9,] "Magilla Gorilla" "other"  "Lawn Chairs"
  [10,] "Magilla Gorilla" "other"  "Blanket"

On Sun, Jul 13, 2008 at 12:56 AM, Hohm, Dale [EMAIL PROTECTED] wrote:
> Thanks for the reply Jim.
>
> Here is a representation of the data I want to analyze - 10 records as requested. Each line can easily include an ID number as below.
>
> So I want to determine a frequency or percentage of respondents that bring each of the 5 foods (Hamburgers, Hot Dogs, Potato Salad, Cole Slaw and BBQ Chicken) and how many Other write-ins there are. The same for what else is brought besides food (Softball, Volleyball, Lawn Chairs and Blanket) as well as a count of Other write-ins. I'll also need to be able to discern how many brought Hamburgers AND a Blanket or how many brought a Softball AND a Volleyball etc.
>
> ID|Your Name|Do you picnic?|What is your favorite picnic food?|What do you bring besides food?
> 1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|Softball;#Blanket;#I bring boo-boo, but he hides
> 2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes
> 3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs
> 4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket
> 5|Foghorn Leghorn|Yes|Hot Dogs;#Cole Slaw;#I say, I say, BBQ Chicken?|Softball;#Blanket
> 6|Peter Potamus|Yes|Hamburgers;#Hot Dogs;#anything, just a lot of it|Softball;#Lawn Chairs;#hot air balloon
> 7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit
> 8|Fleegle, Bingo, Drooper and Snorky|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#A banana split|a laugh track
> 9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my flying car
> 10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Heavens to Murgatroyd! Exit stage left!
>
> Thanks in advance,
> Dale
>
> -----Original Message-----
> From: jim holtman [mailto:[EMAIL PROTECTED]
> Sent: Saturday, July 12, 2008 11:32 AM
> To: Hohm, Dale
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading Multi-value data fields for descriptive analysis
>
>> Can you provide a more complete example (say 10 lines) of what the input is like? Does each line have a unique index that can be related to it? Do you want to summarize all the multi1-n values of Col2? Do you want to know the percentage of input lines that have a Col3/multi-value4 on them?
>>
>> You could read in the data as you have indicated below and add a column that is the record number, and therefore you would not have to worry about trying to say if it existed or not. For example, you might have:
>>
>>   Rec#|col#|value
>>   1|1|single
>>   1|2|multi1
>>   1|2|multi2
>>   1|3|multi1
>>   2|1|single
>>   3|1|single
>>   3|2|multi1
>>
>> There are a number of potential ways of representing the data, but a lot depends on what you want to do with it, so a more extensive example of the input, along with the type of output you would like, will help in providing an answer.
>>
>> On Sat, Jul 12, 2008 at 12:37 PM, Hohm, Dale [EMAIL PROTECTED] wrote:
>>> Hello,
>>> I'm looking for help on the best approach to get multi-value data fields into R for simple descriptive analysis. - I am new to this list and new to R, but I really want to get over the hump and get productive with it. Some help with how to best get the
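The counting step Dale asks about (frequencies per food, write-ins, and co-occurrence such as Hamburgers AND a Blanket) can be sketched once the multi-value fields are split. The example below is a minimal illustration, not the code from the thread: it types in records 1, 2 and 4 from the posted data by hand instead of reading a file, and the helper has_item() and the other object names are invented here.

```r
# Records 1, 2 and 4 from the posted data, typed in by hand
food  <- c("Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff",
           "Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat",
           "Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken")
other <- c("Softball;#Blanket;#I bring boo-boo, but he hides",
           "Lawn Chairs;#Blanket;#my running shoes",
           "Softball;#Volleyball;#Lawn Chairs;#Blanket")

known_food <- c("Hamburgers", "Hot Dogs", "Potato Salad", "Cole Slaw", "BBQ Chicken")

# Split each multi-value field on the literal ";#" separator
food_items  <- strsplit(food,  ";#", fixed = TRUE)
other_items <- strsplit(other, ";#", fixed = TRUE)

# Hypothetical helper: for each respondent, was 'choice' among the answers?
has_item <- function(items, choice) {
  vapply(items, function(v) choice %in% v, logical(1))
}

# One logical column per known food, one row per respondent
food_flags <- sapply(known_food, function(ch) has_item(food_items, ch))
print(colSums(food_flags))            # respondents per food
print(colMeans(food_flags) * 100)     # the same, as percentages

# Number of write-in answers (anything not in the known list) per respondent
writeins <- vapply(food_items, function(v) sum(!v %in% known_food), integer(1))
print(writeins)

# Co-occurrence across fields: Hamburgers AND a Blanket
both <- sum(has_item(food_items, "Hamburgers") & has_item(other_items, "Blanket"))
print(both)
```

Keeping the split answers as lists and deriving logical flags from them makes the AND-type questions simple element-wise combinations of two flag vectors.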
Re: [R] shapiro wilk normality test
Marta Colombo wrote: Hi! Well, if you look at the output: shapiro.test(rnorm(5000))     Shapiro-Wilk normality test data: rnorm(5000) W = 0.9997, p-value = 0.6205 You can see that the p-value is 0.6205 so you can't refuse the normality hypotesis. H0: normal data   vs H1: not normal So shapiro.wilk test is saying that your data are normal and it's correct! Bye Marta A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence. Your first sentence is correct, but not the second. Why test for normality? What downstream method depends on it? If normality is in doubt why not use a method that doesn't require it? Frank Harrell - Messaggio originale - Da: C.H. [EMAIL PROTECTED] A: Bunny, lautloscrew.com [EMAIL PROTECTED] Cc: r-help@r-project.org Inviato: Domenica 13 luglio 2008, 7:27:43 Oggetto: Re: [R] shapiro wilk normality test You may consider the nortest package. http://cran.r-project.org/web/packages/nortest/index.html Regards, CH On Sat, Jul 12, 2008 at 11:30 PM, Bunny, lautloscrew.com [EMAIL PROTECTED] wrote: Hi everybody, somehow i dont get the shapiro wilk test for normality. i just can´t find what the H0 is . i tried :  shapiro.test(rnorm(5000))     Shapiro-Wilk normality test data: rnorm(5000) W = 0.9997, p-value = 0.6205 If normality is the H0, the test says it´s probably not normal, doesn´t it ? 5000 is the biggest n allowed by the test... are there any other test ? ( i know qqnorm already ;) thanks in advance matthias __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- 
Frank E Harrell Jr, Professor and Chair, School of Medicine, Department of Biostatistics, Vanderbilt University
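Frank's point, that a large p-value is not evidence of normality, can be illustrated with a small simulation. This is only a sketch added for illustration, not something posted in the thread; the exponential distribution, the sample sizes, and the seed are arbitrary assumptions.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Clearly non-normal data: draws from an exponential distribution.
small <- rexp(8)      # very small sample
large <- rexp(2000)   # larger sample from the same distribution

# H0 of shapiro.test() is "the data come from a normal distribution".
p_small <- shapiro.test(small)$p.value
p_large <- shapiro.test(large)$p.value

# With few observations the test has little power, so p_small may well be
# large even though the data are not normal; p_large should be tiny.
print(c(small = p_small, large = p_large))
```

A non-significant result for the small sample would say more about the sample size than about the shape of the distribution.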
Re: [R] shapiro wilk normality test
On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
> [...] A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence.

Is that ironic, Frank, or is there really a classic paper with that title? If so, I'd be pleased to have a reference to it!

Thanks, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jul-08 Time: 15:55:35
-- XFMail --
Re: [R] shapiro wilk normality test
http://www.bmj.com/cgi/content/full/311/7003/485

Charles Annis, P.E.
[EMAIL PROTECTED]
phone: 561-352-9699
eFax: 614-455-3265
http://www.StatisticalEngineering.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ted Harding
Sent: Sunday, July 13, 2008 10:56 AM
To: Frank E Harrell Jr
Cc: r-help@r-project.org
Subject: Re: [R] shapiro wilk normality test

> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>> [...] A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence.
>
> Is that ironic, Frank, or is there really a classic paper with that title? If so, I'd be pleased to have a reference to it!
>
> Thanks, Ted.
Re: [R] shapiro wilk normality test
G'day all,

On Sun, 13 Jul 2008 15:55:38 +0100 (BST) (Ted Harding) [EMAIL PROTECTED] wrote:
> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>> [...] A large P-value means nothing more than needing more data. No conclusion is possible.

I would have thought that "we need more data" would qualify as a conclusion. :)

>> Please read the classic paper Absence of Evidence is not Evidence for Absence.
>
> Is that ironic, Frank, or is there really a classic paper with that title? If so, I'd be pleased to have a reference to it!

Of course, I do not know for sure which paper Frank has in mind, but Google and Google Scholar readily come up with papers/editorials that have a nearly identical title:

http://www.bmj.com/cgi/content/full/311/7003/485
http://bmj.bmjjournals.com/cgi/content/full/328/7438/476
(see also http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=351831)
http://www.ncbi.nlm.nih.gov/pubmed/6829975

My money is on Frank having the first of these publications in mind.

Cheers, Berwin

=== Full address ===
Berwin A Turlach                              Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability          +65 6516 6650 (self)
Faculty of Science                            FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7           e-mail: [EMAIL PROTECTED]
Singapore 117546                              http://www.stat.nus.edu.sg/~statba
Re: [R] Another packaging question
Uwe Ligges [EMAIL PROTECTED] [Sat, Jul 12, 2008 at 11:48:38AM CEST]:
> Johannes Huesing wrote:
>> I am still trying to build a package. At the moment I am stuck with a "file not found" error message when processing R code from the tests subdirectory. What would be the accurate relative path for files in the tests directory to access files in the data directory?
>
> You do not need a path. Just say data(your_data's_name) (if data is not already under Lazy Loading) in the R code.
> I guess you rely on an installed package when the tests are executed, don't you?

Yes, that did it, thank you.

Yes, and I had to learn that I have to include library(packagename) in the example files.

-- 
Johannes Hüsing   mailto:[EMAIL PROTECTED]   http://derwisch.wikidot.com
There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, Life on the Mississippi)
[R] initialize a factor vector
What is the least surprising way of initializing a factor with predefined levels and with length 0?

    as.factor(c("eins", "zwei", "drei"))[FALSE]

does the job but looks a bit weird.

-- 
Johannes Hüsing   mailto:[EMAIL PROTECTED]   http://derwisch.wikidot.com
There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, Life on the Mississippi)
Re: [R] Another packaging question
Johannes Huesing wrote:
> Uwe Ligges [EMAIL PROTECTED] [Sat, Jul 12, 2008 at 11:48:38AM CEST]:
>> Johannes Huesing wrote:
>>> I am still trying to build a package. At the moment I am stuck with a "file not found" error message when processing R code from the tests subdirectory. What would be the accurate relative path for files in the tests directory to access files in the data directory?
>>
>> You do not need a path. Just say data(your_data's_name) (if data is not already under Lazy Loading) in the R code.
>> I guess you rely on an installed package when the tests are executed, don't you?
>
> Yes, that did it, thank you.
> Yes, and I had to learn that I have to include library(packagename) in the example files.

Not in the examples but in the tests, as far as I know.

Uwe
Re: [R] shapiro wilk normality test
Many thanks to Berwin, and also to Charles Annis, for the references. They're good!

Ted.

On 13-Jul-08 15:22:03, Berwin A Turlach wrote:
> G'day all,
>
> On Sun, 13 Jul 2008 15:55:38 +0100 (BST) (Ted Harding) [EMAIL PROTECTED] wrote:
>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>>> [...] A large P-value means nothing more than needing more data. No conclusion is possible.
>
> I would have thought that "we need more data" would qualify as a conclusion. :)
>
>>> Please read the classic paper Absence of Evidence is not Evidence for Absence.
>>
>> Is that ironic, Frank, or is there really a classic paper with that title? If so, I'd be pleased to have a reference to it!
>
> Of course, I do not know for sure which paper Frank has in mind, but Google and Google Scholar readily come up with papers/editorials that have a nearly identical title:
>
> http://www.bmj.com/cgi/content/full/311/7003/485
> http://bmj.bmjjournals.com/cgi/content/full/328/7438/476
> (see also http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=351831)
> http://www.ncbi.nlm.nih.gov/pubmed/6829975
>
> My money is on Frank having the first of these publications in mind.
>
> Cheers, Berwin

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jul-08 Time: 18:01:51
-- XFMail --
Re: [R] how do I read only specific columns using read.csv or other read function
I was not able to follow the solution posted. Could you demonstrate this technique on an example data set? Thanks!

    dat <- data.frame(a = letters[1:3], b = LETTERS[1:3], c = 1:3, d = 3:1)

On Wed, Jul 2, 2008 at 1:13 PM, Charles C. Berry [EMAIL PROTECTED] wrote:
> On Wed, 2 Jul 2008, Ben Tupper wrote:
>> On Jul 2, 2008, at 6:53 AM, Philip James Smith wrote:
>>> Hi R people: I have huge files with as many as 5000 columns. I'd really like to read only certain columns of those files. I know the column names I want to read. I looked at the documentation of read.csv. Although there is a col.names option, it allows users to specify the names of the columns, rather than to pick the columns of interest. Any suggestions on how to pick the columns I want to read only, rather than the entire file, would be greatly appreciated.
>
> There is a unix utility called 'cut' that enables stuff like
>
>     columns.1.3.5.to.7 <- read.csv( pipe( "cut -d, -f1,3,5-7 your.file" ) )
>
> and using
>
>     col.pos <- match(names.of.variables.you.want, scan("your.file", what=character(0), nlines=1))
>
> will enable you to set up the call to pipe.
>
> HTH, Chuck
>
>> Hello, I think you want to explicitly set the colClasses argument such that the columns you *don't* want are set to NULL and all others are set to appropriate classes.
>> Cheers, Ben
>>
>>> Phil Smith
>>> Duluth, GA
>>
>> Ben Tupper [EMAIL PROTECTED]
>
> Charles C. Berry                    (858) 534-2098
> Dept of Family/Preventive Medicine
> E mailto:[EMAIL PROTECTED]          UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
Re: [R] mpirun question with Rmpi
Erin Hodgess [EMAIL PROTECTED] writes:
> Dear R People: I'm running Rmpi on a single machine and I have the following statement from the command line:
>
>     mpirun -np 3 ./R --no-save < eek1.in > stuff4.out

All three versions of eek1.in write to the same location, over-writing one another. You happen to see the results of the third process; some other time you might see the three outputs intermingled.

The solutions are to have eek1.in create an appropriate output file (e.g., using mpi.comm.rank() to uniquify a base name) or to design eek1.in so that one of the nodes collates and outputs the results from all the others (i.e., only one node writes; a typical solution might include mpi.gather.Robj followed by a conditional based on mpi.comm.rank).

Hope that helps.

Martin

> The stuff4.out file only contains the third result. Is there a way to fix this such that it shows all 3 sets, please?
> Thanks in advance, Erin
> --
> Erin Hodgess, Associate Professor, Department of Computer and Mathematical Sciences, University of Houston - Downtown
> mailto: [EMAIL PROTECTED]

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
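The first of Martin's two suggestions (one output file per process) might be sketched as follows. This is an illustration only: rank_file() is a hypothetical helper, the file-name scheme is an assumption, and the rank is hard-coded so the sketch runs without MPI. Under mpirun the rank would come from Rmpi's mpi.comm.rank(), possibly with a communicator argument suited to the setup.

```r
# Hypothetical helper: build a per-process output file name from a rank.
rank_file <- function(base, rank) sprintf("%s_rank%d.out", base, rank)

# Under mpirun the rank would be obtained from Rmpi (assumed installed):
#   library(Rmpi)
#   rank <- mpi.comm.rank()   # communicator argument may need adjusting
rank <- 0L  # stand-in value so this sketch runs without MPI

# Each process writes to its own file, so nothing is overwritten.
out_path <- file.path(tempdir(), rank_file("stuff4", rank))
writeLines(sprintf("result from process %d", rank), out_path)
```

The alternative, gathering everything on one process and letting only that process write, avoids having several files to merge afterwards.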
Re: [R] initialize a factor vector
On Sun, 2008-07-13 at 18:47 +0200, Johannes Huesing wrote:
> What is the least surprising way of initializing a factor with predefined levels and with length 0?
>
>     as.factor(c("eins", "zwei", "drei"))[FALSE]
>
> does the job but looks a bit weird.

Notice that one does not need to specify any data as argument 'x' to factor() because, by default, x = character(). Therefore, we need only specify the levels we want:

    > factor(levels = c("one","two","three"))
    factor(0)
    Levels: one two three

HTH

G
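A short sketch of how such an empty factor behaves once it exists. The level names are the example values from the reply above; the assignments are added here only for illustration.

```r
# Empty factor with predefined levels.
f <- factor(levels = c("one", "two", "three"))
n0 <- length(f)   # 0
levels(f)         # "one" "two" "three"

# Filling it by index keeps the predefined level set.
# A value outside the levels would become NA, with a warning.
f[1] <- "two"
f[2] <- "one"
f
```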
Re: [R] Excel Trend Function
"Disputed" was probably not the correct wording for it. However, imputation means that you make assumptions regarding the distribution of your missing data dependent on the data that is available to you. Felipe had time series data, and it is common to predict from the past to the future in such models. However, I tried to outline why making these assumptions may be critical in Felipe's case due to the environment in which his data was missing (namely that the distribution of his variable around the missing values seemed odd and raised questions whether or not they were measured without error).

The rule of thumb for imputation that I remember is: if you are not sure how plausible the assumptions are that you would have to make, drop it. But better, since newer, advice may be available on that. However, I cannot remember reading a single study in my field in which data was imputed (I don't say there is none, but if they exist, they must be very rare), meaning that observations with missing data are typically dropped. Therefore, how you will be received for imputing data does not only depend on the particular application, but also on the research tradition of your field.

General sources that review data imputation as well as associated problems are:
Horton and Lipsitz, 2001, The American Statistician
Schafer and Graham, 2002, Psychological Methods

Best, Daniel

- cuncta stricte discussurus -

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of paulandpen
Sent: Sunday, July 13, 2008 4:27 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Excel Trend Function

> Hi Felipe, Daniel mentions imputation is a disputed practice. There are recommendations and rules of thumb for its use. I am not sure that imputation is disputed. I would be interested to see some links to articles recommending against its use.
> Paul
>
> ----- Original Message -----
> From: Felipe Carrillo [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: Sunday, July 13, 2008 5:46 AM
> Subject: [R] Excel Trend Function
>
> Hi: I have a dataset and need to interpolate for missing days. In Excel I either average from sampled days from above and below the missing days or use the TREND function to make up for the missing values. I have been reading about na.approx; is this function similar to the TREND function? Which is the best recommendable way to make up for missing data? Here's my dataset: weeks 17, 18, 26 and 46 have 0 daysSamp.
>
> Year Week daysSamp Lower TotalPD Upper varTotalPD
> 2006 47 6 126988 188259 249530 1045878675
> 2006 48 7 189155 253350 317545 1148102355
> 2006 49 7 103300 132741 162182 241480186
> 2006 50 6 11801 252576 493352 16151006813
> 2006 51 7 2348 3671 4994 487926
> 2006 52 5 2606 29901 57196 215454181
> 2006 2 7 2968 4513 6058 664723
> 2006 3 7 1128 1889 2650 161231
> 2006 4 7 479 963 1447 65196
> 2006 5 7 2819 4413 6007 708094
> 2006 6 6 -1009 3128 7264 4766743
> 2006 7 7 -5239 10769 26777 71387835
> 2006 8 7 150 503 856 34685
> 2006 9 7 1858 2989 4120 356562
> 2006 10 7 193 494 795 25281
> 2006 11 7 125 346 567 13627
> 2006 12 7 432 767 1102 31189
> 2006 13 7 1229 1867 2505 113569
> 2006 14 7 813 1339 1865 77140
> 2006 15 4 -66 124 315 10105
> 2006 16 7 152 903 1654 157242
> 2006 17 0
> 2006 18 0
> 2006 19 5 0 0 0 0
> 2006 20 4 0 0 0 0
> 2006 21 5 0 0 0 0
> 2006 22 6 0 0 0 0
> 2006 23 7 -65 285 635 34112
> 2006 24 6 0 0 0 0
> 2006 25 7 0 0 0 0
> 2006 26 0
> 2006 27 4 228 931 1634 137726
> 2006 28 4 801 2231 3662 569977
> 2006 29 4 4544 9242 13939 6147522
> 2006 30 5 15798 28465 41131 44697915
> 2006 31 5 25398 41049 56701 68245523
> 2006 32 5 48197 82216 116235 322416917
> 2006 33 5 142980 230411 317841 2129630128
> 2006 34 5 227141 360468 493794 4952314336
> 2006 35 5 467244 756325 1045405 23281569629
> 2006 36 5 281049 463331 645614 9256900449
> 2006 37 2 227636 620330 1013023 42961663047
> 2006 38 3 478990 983472 1487954 70903343603
> 2006 39 7 539690 846522 1153354 26228718974
> 2006 40 7 320959 457866 594773 5221891252
> 2006 41 7 427561 582452 737343 6683813344
> 2006 42 7 271788 351103 430418 1752614293
> 2006 43 7 165019 208853 252687 535301133
> 2006 44 7 91514 117390 143266 186537178
> 2006 45 7 59061 79187 99313 112842787
> 2006 46 0
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior, US Fish & Wildlife Service
> California, USA
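For readers wondering what linear interpolation would do to one of the gaps in Felipe's data, here is a minimal sketch using base R's approx() on the TotalPD values around weeks 17 and 18. It is an illustration only, not a recommendation given the caveats discussed in this thread. As far as I know, na.approx in the zoo package performs the same kind of linear interpolation, whereas Excel's TREND fits a least-squares line, so the two need not agree.

```r
# TotalPD for the sampled weeks on either side of the gap (from the data above).
week    <- c(16, 19)
totalPD <- c(903, 0)

# Linear interpolation at the unsampled weeks 17 and 18.
filled <- approx(x = week, y = totalPD, xout = c(17, 18))$y
filled
```

This corresponds to the "average from the sampled values above and below" approach, weighted by distance to each neighbour.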
[R] rm anova
Dear all,

I am new to R and most happy for now :) I would like to ask about an issue with repeated-measures ANOVA. I have data from an experiment with 24 subjects, 3 treatments (8 replicates for each treatment) and 8 samplings through time. The data sheet is something like this (just an example, not real):

sample id,response(tp),treatment,date
a1,119,HP,27 june
a2,120,MP,27 june
a3,150,C,27 june
a4,100,C,27 june
..
..
a1,90,HP,7 july
a2,80,MP,7 july
a3,170,C,7 july
a4,50,C,7 july
.
.
.

Is it correct to formulate the rm-anova as

    demo <- aov(tn_mgl ~ factor(TN)*factor(prefix) + Error(sample/(factor(TN)+factor(prefix))))

Thanks in advance, best regards,
korhan ozkan
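For comparison, a commonly used aov() layout for this kind of design (treatment varying between subjects, time varying within subjects) is sketched below on simulated data. All column names (subject, treatment, time, response) and the random numbers are assumptions made for illustration; this is not a verdict on the formula above, and whether the Error() term suits the real experiment depends on how it was run.

```r
set.seed(42)  # arbitrary seed; the responses below are pure noise

# 24 subjects x 8 sampling dates, 8 subjects per treatment (simulated layout).
d <- expand.grid(time = factor(1:8), subject = factor(1:24))
d$treatment <- factor(rep(c("HP", "MP", "C"), each = 8)[as.integer(d$subject)])
d$response  <- rnorm(nrow(d), mean = 100, sd = 20)

# Treatment is tested against between-subject variation,
# time and treatment:time against within-subject variation.
fit <- aov(response ~ treatment * time + Error(subject / time), data = d)
summary(fit)
```

The summary prints one ANOVA table per error stratum, which makes it easy to see which effects are tested at the subject level.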
Re: [R] Quick plotmath question
While the earlier solutions involving expression() and paste() work great, unfortunately Gabor's first suggestion doesn't display on the OS X default quartz device, and Gabor's second suggestion displays on quartz, but not on the pdf() device. In any event, the first replies in this thread provide a sufficient solution for me, so thanks all!

Mike

On 12-Jul-08, at 1:40 PM, Gabor Grothendieck wrote:
> And this gives a slightly different one:
>
>     plot(1, main = "\u394i \ubb 0")
>
> 2008/7/12 Gabor Grothendieck [EMAIL PROTECTED]:
>> This works on my Windows Vista system:
>>
>>     plot(1, main = "\u394i \u300b 0")
>>
>> See:
>> http://www.fileformat.info/info/unicode/char/300b/index.htm
>> http://www.fileformat.info/info/unicode/char/394/index.htm
>>
>> On Sat, Jul 12, 2008 at 10:12 AM, Mike Lawrence [EMAIL PROTECTED] wrote:
>>> Hi all, I've looked around for a while on this to no avail. I'm trying to create a plotmath expression that achieves:
>>>
>>>     Δi >> 0
>>>
>>> and while
>>>
>>>     expression(Delta*i>0)
>>>
>>> comes close, I'd prefer to have the ">>" (denoting "very much greater than"). Maybe ">>" is a non-standard expression and therefore not supported?
>>>
>>> Mike

-- 
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University
www.memetic.ca

The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less. - Piet Hein
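The earlier replies that Mike refers to are not reproduced in this digest. One way to combine expression() and paste() for this label might look like the sketch below; it is a guess at the kind of solution meant, not a quote from the thread.

```r
# Delta is drawn as the Greek letter by plotmath; the rest is a plain string,
# so ">>" is shown as two ordinary greater-than signs.
lab <- expression(paste(Delta, "i >> 0"))

pdf(NULL)             # null device so the sketch runs non-interactively;
plot(1, main = lab)   # omit pdf(NULL)/dev.off() to draw on screen
invisible(dev.off())
```

Because only plain ASCII and a plotmath symbol are involved, this avoids depending on how a given device handles Unicode characters.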
Re: [R] initialize a factor vector
Gavin Simpson [EMAIL PROTECTED] [Sun, Jul 13, 2008 at 08:18:37PM CEST]:
> On Sun, 2008-07-13 at 18:47 +0200, Johannes Huesing wrote:
>> [...]
>>
>>     as.factor(c("eins", "zwei", "drei"))[FALSE]
>>
>> does the job but looks a bit weird.
>
> [...]
>
>     > factor(levels = c("one","two","three"))
>     factor(0)
>     Levels: one two three

Ah, ok, I was unaware of factor(), only knew as.factor(). Many thanks,

Johannes

-- 
Johannes Hüsing   mailto:[EMAIL PROTECTED]   http://derwisch.wikidot.com
There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, Life on the Mississippi)
Re: [R] shapiro wilk normality test
Frank E Harrell Jr [EMAIL PROTECTED] [Sun, Jul 13, 2008 at 08:07:37PM CEST]:
> (Ted Harding) wrote:
>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>>> [...] A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence.
>> [...]
>
> It's real. Full text is available to all:
> http://www.bmj.com/cgi/content/full/311/7003/485

The quotation is attributed to the late Carl Sagan, who seemed to have used it as a strawman argument; see http://oyhus.no/AbsenceOfEvidence.html.

-- 
Johannes Hüsing   mailto:[EMAIL PROTECTED]   http://derwisch.wikidot.com
There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, Life on the Mississippi)
Re: [R] how do I read only specific columns using read.csv or other read function
On Sun, 13 Jul 2008, Juliet Hannah wrote:
> I was not able to follow the solution posted. Could you demonstrate this technique on an example data set? Thanks!
>
>     dat <- data.frame(a = letters[1:3], b = LETTERS[1:3], c = 1:3, d = 3:1)

Using your example:

    > dat <- data.frame(a = letters[1:3], b = LETTERS[1:3], c = 1:3, d = 3:1)
    > write.csv(dat, file="yourFrame.csv")
    > col.pos <- match(c("b","d"), scan("yourFrame.csv", sep=',', what=character(0), nlines=1))
    Read 5 items
    > con <- pipe( paste( "cut -d, -f", paste(col.pos, collapse=','), " yourFrame.csv", sep=''))
    > cols.b.d <- read.csv( con )
    > cols.b.d
      b d
    1 A 3
    2 B 2
    3 C 1

HTH, Chuck

> On Wed, Jul 2, 2008 at 1:13 PM, Charles C. Berry [EMAIL PROTECTED] wrote:
>> On Wed, 2 Jul 2008, Ben Tupper wrote:
>>> On Jul 2, 2008, at 6:53 AM, Philip James Smith wrote:
>>>> Hi R people: I have huge files with as many as 5000 columns. I'd really like to read only certain columns of those files. [...]
>>
>> There is a unix utility called 'cut' that enables stuff like
>>
>>     columns.1.3.5.to.7 <- read.csv( pipe( "cut -d, -f1,3,5-7 your.file" ) )
>>
>> and using
>>
>>     col.pos <- match(names.of.variables.you.want, scan("your.file", what=character(0), nlines=1))
>>
>> will enable you to set up the call to pipe.
>>
>> HTH, Chuck
>>
>>> Hello, I think you want to explicitly set the colClasses argument such that the columns you *don't* want are set to NULL and all others are set to appropriate classes.
>>> Cheers, Ben

Charles C. Berry                    (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]          UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
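Ben Tupper's colClasses suggestion, quoted above, can be sketched on the same toy data. This avoids the external cut utility and so should also work where no Unix tools are available. The temporary file and the variable names (path, wanted, hdr, cc, sel) are assumptions made for illustration.

```r
# Write a small example file (the same toy data used in the thread).
dat <- data.frame(a = letters[1:3], b = LETTERS[1:3], c = 1:3, d = 3:1)
path <- tempfile(fileext = ".csv")
write.csv(dat, file = path, row.names = FALSE)

wanted <- c("b", "d")

# Read only the header line to learn the column names.
hdr <- scan(path, what = character(0), sep = ",", nlines = 1, quiet = TRUE)

# "NULL" skips a column; NA lets read.csv choose the class as usual.
cc <- rep("NULL", length(hdr))
cc[hdr %in% wanted] <- NA

sel <- read.csv(path, colClasses = cc)
sel
```

With thousands of columns the header scan stays cheap, and skipped columns are never converted or stored, although read.csv still has to parse every line of the file.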
[R] (no subject)
Hello everyone,

I am using the following code to try to calculate the mean:

    dat <- read.table(file="C:\\Documents and Settings.txt")
    dat <- as.numeric(dat)
    x1.m <- mean(dat)

I am getting the following error message:

    Error in eval.with.vis(expr,envir,enclos): (list) object cannot be coerced to type 'double'

I do not understand what is wrong, as I thought that I had changed dat to numeric. Whenever I list x1.m all I get are NA.

Thank you,
Paul
Re: [R] (no subject)
What does 'str(dat)' show? The statement

    dat <- as.numeric(dat)

says you are trying to make an entire data frame numeric. This is probably not what you want to do. What is it you want to do? Have you tried summary(dat)? E.g.,

    > x <- data.frame(a=1:10, b=101:110, c=letters[1:10])
    > summary(x)
           a               b               c
     Min.   : 1.00   Min.   :101.0   a      :1
     1st Qu.: 3.25   1st Qu.:103.2   b      :1
     Median : 5.50   Median :105.5   c      :1
     Mean   : 5.50   Mean   :105.5   d      :1
     3rd Qu.: 7.75   3rd Qu.:107.8   e      :1
     Max.   :10.00   Max.   :110.0   f      :1
                                     (Other):4

On Sun, Jul 13, 2008 at 4:05 PM, Paul Adams [EMAIL PROTECTED] wrote:
> Hello everyone, I am using the following code to try to calculate the mean:
>
>     dat <- read.table(file="C:\\Documents and Settings.txt")
>     dat <- as.numeric(dat)
>     x1.m <- mean(dat)
>
> I am getting the following error message:
>
>     Error in eval.with.vis(expr,envir,enclos): (list) object cannot be coerced to type 'double'
>
> I do not understand what is wrong, as I thought that I had changed dat to numeric. Whenever I list x1.m all I get are NA.
> Thank you, Paul

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
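If the goal is simply a mean per column, a minimal sketch along the following lines may help. It reuses the toy data frame from the reply above; the variable names num_cols, col_means and mean_a are made up for illustration.

```r
# A data frame is a list of columns, so it cannot be converted to a single
# numeric vector with as.numeric(); work column by column instead.
dat <- data.frame(a = 1:10, b = 101:110, c = letters[1:10])

# Which columns are numeric?
num_cols <- sapply(dat, is.numeric)

# Means of all numeric columns at once ...
col_means <- colMeans(dat[, num_cols, drop = FALSE])

# ... or the mean of one column picked by name.
mean_a <- mean(dat$a)

col_means
```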
[R] stem and leaf plot: how to edit the stem-values
Hi,

I would like to make a stem and leaf plot and I want to edit the category names. So, by doing this:

    x <- c(1,2,2,3,3,3,3,2,2,1)
    stem(x)

I get:

    1 | 00
    1 |
    2 |
    2 |
    3 |

First question: why do I get gaps between the categories (like in line 2 and line 4)?

And second: how can I edit the categories so that I can create something like this:

    category A | 00
    category B |
    category C |
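No reply to this question appears in this digest. The empty rows most likely arise because stem() gave each stem two lines here (one for leaves 0 to 4, one for 5 to 9), and integer data only ever fill the first of them. For custom row labels, one option is to build the display by hand, as in the sketch below; the labels and the choice of "0" as the leaf character are assumptions for illustration.

```r
x <- c(1, 2, 2, 3, 3, 3, 3, 2, 2, 1)
labels <- c("category A", "category B", "category C")  # example labels

# Count how often each value 1, 2, 3 occurs.
counts <- as.integer(table(factor(x, levels = 1:3)))

# One row per category: the label, a bar, then one "0" leaf per observation.
rows <- sprintf("%s | %s", labels, strrep("0", counts))
cat(rows, sep = "\n")
```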
Re: [R] Assoociative array?
Thank you, I will try drop=TRUE. In the meantime, do you know how I can access the members (for lack of a better term) of the results of a split? In the sample you provided below you have:

    z <- split(x, list(x$cat, x$a), drop=TRUE)

Now I can print out z[1], z[2], etc. This is nice, but what if I want to access/iterate through all of the members of a particular column in z? You have given some methods like z[[1]]$b to access the specific columns in z. I notice for your example z[[1]]$b prints out two values. Can I assume that z[[1]]$b is a vector? So if I want to find the mean I can use mean(z[[1]]$b) and it will give me the mean value of the b column in z (similarly sum, range, etc.)? Does nrows(z[[1]]$b) return two in your example below? I would like to find out how many elements are in z[1]. Or would it be just as fast to do nrows(z[1])?

Thank you for this extended session on data frames, matrices, and vectors. I feel much more comfortable with the concepts now.

Kevin

jim holtman [EMAIL PROTECTED] wrote:
> The reason for the empty levels was I did not put drop=TRUE on the split to remove unused levels.
> Here is the revised script:
>
>     > set.seed(1)  # start with a known number
>     > x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE), a=sample(letters[1:4], 20, TRUE), b=runif(20))
>     > x
>        cat a          b
>     1    A d 0.82094629
>     2    B a 0.64706019
>     3    B c 0.78293276
>     4    C a 0.55303631
>     5    A b 0.52971958
>     6    C b 0.78935623
>     7    C a 0.02333120
>     8    B b 0.47723007
>     9    B d 0.73231374
>     10   A b 0.69273156
>     11   A b 0.47761962
>     12   A c 0.86120948
>     13   C b 0.43809711
>     14   B a 0.24479728
>     15   C d 0.07067905
>     16   B c 0.09946616
>     17   C d 0.31627171
>     18   C a 0.51863426
>     19   B c 0.66200508
>     20   C b 0.40683019
>     > # drop unused groups from the split
>     > (z <- split(x, list(x$cat, x$a), drop=TRUE))
>     $B.a
>        cat a         b
>     2    B a 0.6470602
>     14   B a 0.2447973
>
>     $C.a
>        cat a          b
>     4    C a 0.55303631
>     7    C a 0.02333120
>     18   C a 0.51863426
>
>     $A.b
>        cat a         b
>     5    A b 0.5297196
>     10   A b 0.6927316
>     11   A b 0.4776196
>
>     $B.b
>       cat a         b
>     8   B b 0.4772301
>
>     $C.b
>        cat a         b
>     6    C b 0.7893562
>     13   C b 0.4380971
>     20   C b 0.4068302
>
>     $A.c
>        cat a         b
>     12   A c 0.8612095
>
>     $B.c
>        cat a          b
>     3    B c 0.78293276
>     16   B c 0.09946616
>     19   B c 0.66200508
>
>     $A.d
>       cat a         b
>     1   A d 0.8209463
>
>     $B.d
>       cat a         b
>     9   B d 0.7323137
>
>     $C.d
>        cat a          b
>     15   C d 0.07067905
>     17   C d 0.31627171
>
>     > # access the value ('b' in this instance); two ways - should be the same
>     > z[[1]]$b
>     [1] 0.6470602 0.2447973
>     > z$B.a$b
>     [1] 0.6470602 0.2447973
>
> On Sun, Jul 13, 2008 at 1:26 AM, [EMAIL PROTECTED] wrote:
>> This is almost it. Maybe it is as good as can be expected. The only problem that I see is that this seems to form a Category/SubCategory pair where none existed in the original data. For example, A might have two sub-categories a and b, and B might have two categories c and d. As far as I can tell the method that you outlined forms a Category/SubCategory pair like B a or B b where none existed. This results in a lot of empty lists and it seems to take a long time to generate. But if that is as good as it gets then I can live with it.
>>
>> I know that I said one more question. But I have run into a problem. c <- split(x, x$Category) returns a vector of the rows in each of the categories.
Now I would like to access the Quantity column within this split vector. I can see it listed. I just can't access it. I have tried c[1]$Quantity and c[1,2] both which give me errors. Any ideas? Sorry this is so hard for me. I am more used to C type arrays and C type arrays of structures. This seems to be somewhat different. Thank you. Kevin jim holtman [EMAIL PROTECTED] wrote: Is this something like what you were asking for? The output of a 'split' will be a list of the dataframe subsets for the categories you have specified. x - data.frame(g1=sample(LETTERS[1:2],30,TRUE), + g2=sample(letters[1:2], 30, TRUE), + g3=1:30) y - split(x, list(x$g1, x$g2)) str(y) List of 4 $ A.a:'data.frame':7 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 1 1 1 1 1 1 1 ..$ g2: Factor w/ 2 levels a,b: 1 1 1 1 1 1 1 ..$ g3: int [1:7] 3 4 6 8 9 13 24 $ B.a:'data.frame':7 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 2 2 2 2 2 2 2 ..$ g2: Factor w/ 2 levels a,b: 1 1 1 1 1 1 1 ..$ g3: int [1:7] 10 11 16 17 18 20 25 $ A.b:'data.frame':6 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 1 1 1 1 1 1 ..$ g2: Factor w/ 2 levels a,b: 2 2 2 2 2 2 ..$ g3: int [1:6] 2 12 23 26 27 29 $ B.b:'data.frame':10 obs. of 3 variables: ..$ g1: Factor w/ 2 levels A,B: 2 2 2 2 2 2 2 2 2 2 ..$ g2: Factor w/ 2 levels a,b: 2 2 2 2 2 2 2 2 2 2 ..$ g3: int [1:10] 1 5 7 14 15 19 21 22 28 30 y
Re: [R] Difficultes with grep
Thank you, but this is not exactly what I want. I would like to call the function myfun with this script:

> table <- sample(LETTERS[1:5], 20,TRUE)
> name <- "A"
> myfun <- function(name) {
+ r <- grep (name[^0-9], table )
+ return (r) }

but if I do this, R doesn't accept it. I want this because I have in table (a data frame) a list of elements of the form hsa-mir-N (where N is any number). So if I put hsa-mir-70 in the argument name, the function matches hsa-mir-70, but also (for instance) hsa-mir-700, 710, 724 and so on. That is why I inserted the square brackets [^0-9]: to exclude any other digit after the ones I have given in name. It's a slightly complex situation. :(

P.S.: I can't simply put hsa-mir-20 in name to match all hsa-mir-20 entries in the data frame, because I need to match hsa-mir-20 but also (for instance) hsa-mir-20-3, hsa-mir-20b or hsa-mir-20a, and not those elements with another digit after the 20.

jholtman wrote:

I think this is what you want:

> table <- sample(LETTERS[1:5], 20,TRUE)
> name <- "A"
> myfun <- function(name) {
+ r <- grep (name, table )
+ return (r) }
> myfun(name)
[1]  4  7 14 18
> table
 [1] "E" "B" "D" "A" "B" "B" "A" "B" "E" "B" "C" "C" "C" "A" "E" "D" "D" "A" "D" "C"

On Fri, Jul 11, 2008 at 1:57 PM, Fran100681 [EMAIL PROTECTED] wrote:

Hello everybody! I'm using R and I have a little problem with the function grep. I have to write a new function in which grep is used. The first argument of grep is the string we want to find, OK. But in this case I first define a function x; x receives an argument in an object name (for instance), and then inside function x I call grep. So I want to use as the pattern (1st argument of grep) what I put in name, and not the string "name". How do I do that?

ex:

name <- "Tom"
myfun <- function(name) {
r <- grep ("name", table )
return (r) }

It returns nothing because it searches for the word "name" in table rather than "Tom". I hope to receive a little help because this is holding me up in my project :/ and I'm not able to reach a solution! Thanks a lot!
Re: [R] Installing RWinEdt
I checked and reapplied full access to the whole WinEdt directory. I am not the administrator, but I am a member of the administrators group. This, coupled with the fact that I have given full control to the WinEdt directory, suggests that the problem is elsewhere. But this is not the first time that I have run into permission problems with Windows 2008 server.

Kevin

Uwe Ligges [EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] wrote:

From the R console I invoke:

> install.packages("RWinEdt")

and get:

Warning in install.packages("RWinEdt") :
  argument 'lib' is missing: using 'F:\Users\Kevin\Documents/R/win-library/2.7'
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://streaming.stat.iastate.edu/CRAN/bin/windows/contrib/2.7/RWinEdt_1.8-0.zip'
Content type 'application/zip' length 361598 bytes (353 Kb)
opened URL
downloaded 353 Kb

package 'RWinEdt' successfully unpacked and MD5 sums checked

The downloaded packages are in
        F:\Users\Kevin\AppData\Local\Temp\RtmpOIlW0F\downloaded_packages
updating HTML package descriptions

So it seems to have worked. But when I use the 'library' command I get:

> library(RWinEdt)
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
  cannot open file 'F:\Program Files (x86)\WinEdt Team\WinEdt\R.ver': No such file or directory
Error : .onAttach failed in 'attachNamespace'
Error: package/namespace load failed for 'RWinEdt'

Any ideas on how I can install this package?

With administrator privileges, since it needs to write some files into the WinEdt directory.

Best wishes,
Uwe Ligges

Thank you.

Kevin
Re: [R] shapiro wilk normality test
Ted Harding [EMAIL PROTECTED] [Sun, Jul 13, 2008 at 10:59:21PM CEST]:
On 13-Jul-08 19:53:47, Johannes Huesing wrote:
Frank E Harrell Jr [EMAIL PROTECTED] [Sun, Jul 13, 2008 at 08:07:37PM CEST]:
(Ted Harding) wrote:
On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:

[...] A large P-value means nothing more than needing more data. No conclusion is possible. [...]

But absence of evidence, in my interpretation (which I believe is right for the statistical context of non-significant P-values), means that we do not know about A: we do not have enough information.

What would the p-value have to be like in your opinion to make the null hypothesis look more likely after the experiment than before?

The proof is, basically, given in terms of a 2-valued logic where every term is either TRUE or FALSE. In the real world we have at least a third possible value: UNKNOWN (or, as R would put it, NA).

How would the probabilities that A is NA be affected by the outcome of an experiment like this? If this probability is affected, how does this leave the probability that A is T or F unaffected? Or do you assign the NA status to the data collected? A high p-value does not always mean that you might as well have collected nothing but missing values.

Of course I buy into the notion that a point estimate with a measure of accuracy is much better suited to describe your data; but a high p-value as a result of a test procedure that can be claimed to be adequately powered may defensibly be taken as a hint that we can for now stick with the null hypothesis.

-- 
Johannes Hüsing
mailto:[EMAIL PROTECTED]
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, Life on the Mississippi)
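The point about an "adequately powered" test can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the helper name reject_rate, the choice of exponential data as the non-normal alternative, and the sample sizes are illustrative and do not come from the thread.

```r
# Estimate how often shapiro.test() rejects at level alpha.
# 'rgen' is any function that draws a sample of size n (hypothetical helper).
reject_rate <- function(rgen, n, reps = 500, alpha = 0.05) {
  p <- replicate(reps, shapiro.test(rgen(n))$p.value)
  mean(p < alpha)
}

set.seed(42)
null_rate   <- reject_rate(rnorm, n = 50)    # data really are normal
power_small <- reject_rate(rexp,  n = 10)    # skewed data, small sample
power_large <- reject_rate(rexp,  n = 200)   # skewed data, larger sample
round(c(null = null_rate, small = power_small, large = power_large), 2)
```

With the small sample a large p-value is quite common even though the data are clearly not normal, so it says little; with the larger sample a large p-value is rare under this alternative, which is the sense in which it becomes a usable hint.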
[R] any way to set defaults for par?
I know how to set graphic parameters by calling par(), but what I'd like is a way to set the default values so that subsequent calls to par() use my defaults. The reason to want this is that every time I create a new graphic window (I'm using quartz on OSX, and so far no answers in the Mac mailing list), my parameters get reset to the builtin defaults. I read about the unexported variable .Pars, but would like to know if there's any way to manipulate that variable.

thanks
Carl
Re: [R] Difficultes with grep
On Sun, 13 Jul 2008, Fran100681 wrote:

Thank you, but this is not what i want exactly.. i would want to launch function myfun with this script:

> table <- sample(LETTERS[1:5], 20,TRUE)
> name <- "A"
> myfun <- function(name) {
+ r <- grep (name[^0-9], table )

This is not correct syntax, and R likely told you so. If you intend name[^0-9] to be read as character, you need to quote it. Perhaps you want a more complicated regex than the one Jim handed you, like

    name <- "A[^0-9]"

??

+ return (r) }

but if I do it, R doesn't accept this..

[...]

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
Re: [R] Assoociative array?
On Sun, Jul 13, 2008 at 5:45 PM, [EMAIL PROTECTED] wrote:

Thank you I will try drop=TRUE. In the mean time do you know how I can access the members (for lack of a better term) of the results of a split? In the sample you provided below you have:

    z <- split(x, list(x$cat, x$a), drop=TRUE)

You can do 'str(z)' to see the structure of 'z'. In most cases, you should be able to reference by the keys, if they exist:

> n <- 20
> set.seed(1)
> x <- data.frame(a=sample(LETTERS[1:2], n,TRUE), b=sample(letters[1:4], n, TRUE), val=runif(n))
> z <- split(x, list(x$a, x$b), drop=TRUE)
> str(z)
List of 8
 $ A.a:'data.frame':    2 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 1 1
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 1 1
  ..$ val: num [1:2] 0.647 0.245
 $ B.a:'data.frame':    3 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 2 2 2
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 1 1 1
  ..$ val: num [1:3] 0.5530 0.0233 0.5186
 $ A.b:'data.frame':    3 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 1 1 1
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 2 2 2
  ..$ val: num [1:3] 0.530 0.693 0.478
 $ B.b:'data.frame':    4 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 2 2 2 2
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 2 2 2 2
  ..$ val: num [1:4] 0.789 0.477 0.438 0.407
 $ A.c:'data.frame':    3 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 1 1 1
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 3 3 3
  ..$ val: num [1:3] 0.8612 0.0995 0.6620
 $ B.c:'data.frame':    1 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 2
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 3
  ..$ val: num 0.783
 $ A.d:'data.frame':    1 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 1
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 4
  ..$ val: num 0.821
 $ B.d:'data.frame':    3 obs. of  3 variables:
  ..$ a  : Factor w/ 2 levels "A","B": 2 2 2
  ..$ b  : Factor w/ 4 levels "a","b","c","d": 4 4 4
  ..$ val: num [1:3] 0.7323 0.0707 0.3163

Here are some examples of accessing the data:

> z$B.d
   a b        val
9  B d 0.73231374
15 B d 0.07067905
17 B d 0.31627171
> # or just the value (it is a vector)
> z$B.d$val
[1] 0.73231374 0.07067905 0.31627171
> # or by name
> z[["B.d"]]$val
[1] 0.73231374 0.07067905 0.31627171
> # or by absolute number
> z[[8]]$val
[1] 0.73231374 0.07067905 0.31627171
> # take the mean
> mean(z$B.d$val)
[1] 0.3730882
> # get the length
> length(z$B.d$val)
[1] 3

Now I can print out 'z[1], z[2] etc' This is nice but what if I want the access/iterate through all of the members of a particular column in z. You have given some methods like z[[1]]$b to access the specific columns in z. I notice for your example z[[1]]$b prints out two values. Can I assume that z[[1]]$b is a vector? So if I want to find the mean i can 'mean(z[[1]]$b)' and it will give me the mean value of the b columns in z? (similarly sum, and range, etc.). Does nrows(z[[1]]$b) return two in your example below? I would like to find out how many elements are in z[1]. Or would it be just as fast to do 'nrows(z[1])'?

Thank you for this extended session on data frames, matrices, and vectors. I feel much more comfortable with the concepts now.

Kevin

[...]
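For the question about iterating over every group of a split rather than one at a time, sapply() over the list is one common approach. This is a minimal sketch that reuses the data-frame layout from the script above; the names group_n, group_mean and group_rng are made up for illustration, and because R's default sampling algorithm has changed since 2008 the random values will not match the printed output in this thread.

```r
set.seed(1)
x <- data.frame(cat = sample(LETTERS[1:3], 20, TRUE),
                a   = sample(letters[1:4], 20, TRUE),
                b   = runif(20))
z <- split(x, list(x$cat, x$a), drop = TRUE)

is.vector(z[[1]]$b)   # the column of a single group is a plain numeric vector
nrow(z[[1]])          # number of rows in the first group (nrow, not nrows)

group_n    <- sapply(z, nrow)                        # rows per group
group_mean <- sapply(z, function(d) mean(d$b))       # mean of b per group
group_rng  <- t(sapply(z, function(d) range(d$b)))   # min and max of b per group
```

Note that z[1] is still a list of length one, so nrow(z[1]) returns NULL; the double brackets z[[1]] are what extract the data frame itself.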
Re: [R] Difficultes with grep
Or your function could look like this, where you dynamically create the string:

myfun <- function(name) {
    r <- grep(paste(name, "[^0-9]", sep=""), table)
    return(r)
}

On Sun, Jul 13, 2008 at 7:24 AM, Fran100681 [EMAIL PROTECTED] wrote:

Thank you, but this is not what i want exactly..

[...]

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
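One detail worth checking with the pasted pattern: "[^0-9]" requires some character after the name, so an entry that is exactly hsa-mir-20 would not match. A possible variant, offered only as a sketch, accepts either a non-digit or the end of the string. The vector ids and the second argument x are assumptions standing in for the relevant column of the poster's data frame.

```r
# 'ids' stands in for the column that holds the miRNA names (made-up values)
ids <- c("hsa-mir-20", "hsa-mir-20a", "hsa-mir-20-3", "hsa-mir-200", "hsa-mir-2")

# 'x' is passed explicitly instead of relying on a global object called 'table'
myfun <- function(name, x) {
  # the name at the start, followed by a non-digit or by the end of the string
  pattern <- paste("^", name, "([^0-9]|$)", sep = "")
  grep(pattern, x)
}

hits <- myfun("hsa-mir-20", ids)
ids[hits]
```

If a name could contain regular-expression metacharacters such as "." or "+", they would need escaping before being pasted into the pattern.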
Re: [R] shapiro wilk normality test
See at end.

On 13-Jul-08 21:42:19, Johannes Huesing wrote:

[...]

How would the probabilities that A is NA be affected by the outcome of an experiment like this? If this probability is affected, how does this leave the probability that A is T or F unaffected? Or do you assign the NA status to the data collected? A high p-value does not always equate that you might as well have collected nothing but missing values.

Of course I buy into the notion that a point estimate with a measure of accuracy is much better suited to describe your data; but a high p-value as a result of a test procedure that can be claimed to be adequately powered may defensibly be taken as a hint that we can for now stick with the null hypothesis.

-- Johannes Hüsing

I shall perhaps try later to respond in more detail to specific points above. But, for the moment, let me say that I think your statement

  "a high p-value as a result of a test procedure that can be claimed to be adequately powered may defensibly be taken as a hint that we can for now stick with the null hypothesis"

is the main key.
The power function of a test (which of course depends on the design of the investigation and on its size, i.e. the number of data gathered) is basically much the same (in my mind) as the "amount of evidence". A high P-value with a very powerful test serves to exclude all alternatives to the Null Hypothesis except those which lie very close to the Null Hypothesis. In that sense, we do in fact have a lot of evidence against all hypotheses except those which are very similar to the Null. So we are not in an "absence of evidence" situation, and we do have "evidence of absence".

The basic logic of a Hypothesis Test (in its standard sense) is the generalisation, to a logic where certainty is at best probabilistic, of the classical-logic argument:

  Given (as a matter of fact): If A, then B
  Observed: B is FALSE
  Conclusion: A is FALSE

Probabilistically:

  Given: If A (H0), then B has high probability
  Observed: B is FALSE
  Conclusion: An event (not-B) has occurred which has very small probability if A is TRUE.

Hence we (as George Barnard used to put it) apply "The Principle of Disbelief in Tall Stories" and disbelieve A to the extent that we disbelieve not-B as a possible outcome from A (H0).

In applications, the event B will be specified in terms of a set of possible values of a Test Statistic T, devised so as to represent an interesting measure of discrepancy between the data and the hypothesis H0 (e.g. the t-statistic for testing whether two samples are drawn from populations with equal means: if that is the case, then E(T) = 0, and the set of values {abs(T) > T0} will be a "discrepant" set). By choosing T0 to be such that Prob(abs(T) > T0) = p0, a small value which we choose to suit ourselves, we are defining the threshold at which we are prepared to deem that the claim that "abs(T) > T0 is compatible with H0" is too unlikely to be plausible.
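As a small worked instance of choosing the threshold T0, here is how it could be computed for a two-sample t-test. The values p0 = 0.05 and 18 degrees of freedom (two samples of 10 with equal variances) are assumptions picked purely for illustration.

```r
p0 <- 0.05                   # chosen threshold probability (assumed)
df <- 18                     # two samples of 10, equal-variance t-test (assumed)

T0    <- qt(1 - p0 / 2, df)  # two-sided critical value
check <- 2 * pt(-T0, df)     # Prob(abs(T) > T0) under H0, should equal p0
c(T0 = T0, prob = check)
```

Because the null distribution of T is symmetric about zero, the two tails each carry p0/2, which is why the quantile is taken at 1 - p0/2.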
The cleanest example in real life can be drawn from the basic principle in criminal law for concluding that an accused person is guilty, namely "The accused is deemed innocent until proved guilty beyond reasonable doubt." What constitutes "reasonable doubt" can become a very interesting question, but there are some crimes for which it has a definite statistical interpretation, typically exceeding some authorised limit (of speed in a vehicle, of alcohol content in the blood while driving a vehicle, of a factory plant exceeding permitted levels of polluting emissions [which in the UK, under the Environmental Protection Act, is a criminal offence]).

In the days when blood alcohol was determined by laboratory analysis of a blood sample, it was possible to determine that the margin of error corresponded to a P-value less than or equal to 0.001 (i.e. if the lab analysis yielded a result in excess of the legal limit + 2*SE, then the inevitable result was a conviction unless it could be independently proved in defence that the statutory procedures were carried out in a flawed manner). So, in that case,
Re: [R] Reading Multi-value data fields for descriptive analysis
Thanks Jim,

I wish I were comfortable enough with the language for the fix needed to the syntax to be obvious, but it is not yet. With your example, I get:

Error in strsplit(x[.row, ][[i]], ";#") : non-character argument

x appears to be filled properly, but z is not, due to the error. Also, if you were willing to provide some brief annotation or describe the overall logic in the code you supplied, it would help me immensely.

Thanks,
Dale

-----Original Message-----
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Sunday, July 13, 2008 6:35 AM
To: Hohm, Dale
Cc: r-help@r-project.org
Subject: Re: [R] Reading Multi-value data fields for descriptive analysis

This may do what you want:

> x <- read.table("/tempxx.txt", comment="", quote="", sep="|", header=TRUE, as.is=TRUE)
> # split out by name
> z <- lapply(seq(nrow(x)), function(.row){
+ .result <- NULL
+ # construct the data output
+ for (i in c('picnic', 'food', 'other')){
+ .split <- strsplit(x[.row,][[i]], ";#")
+ .result <- rbind(.result, cbind(name=x[.row,][['name']], field=i, value=unlist(.split)))
+ }
+ .result
+ })
> z
[[1]]
     name        field    value
[1,] "Yogi Bear" "picnic" "Yes"
[2,] "Yogi Bear" "food"   "Hamburgers"
[3,] "Yogi Bear" "food"   "Hot Dogs"
[4,] "Yogi Bear" "food"   "I rely on others to bring the good stuff"
[5,] "Yogi Bear" "other"  "\"Softball"
[6,] "Yogi Bear" "other"  "Blanket"
[7,] "Yogi Bear" "other"  "I bring boo-boo, but he hides\""

[[2]]
     name      field    value
[1,] "Boo-Boo" "picnic" "Yes"
[2,] "Boo-Boo" "food"   "Potato Salad"
[3,] "Boo-Boo" "food"   "Cole Slaw"
[4,] "Boo-Boo" "food"   "whatever Yogi doesn't eat"
[5,] "Boo-Boo" "other"  "Lawn Chairs"
[6,] "Boo-Boo" "other"  "Blanket"
[7,] "Boo-Boo" "other"  "my running shoes"

[[3]]
     name          field    value
[1,] "Ranger Rick" "picnic" "No"
[2,] "Ranger Rick" "food"   "I told you I don't picnic"
[3,] "Ranger Rick" "other"  "a big net and handcuffs"

[[4]]
      name              field    value
 [1,] "Magilla Gorilla" "picnic" "Yes"
 [2,] "Magilla Gorilla" "food"   "Hamburgers"
 [3,] "Magilla Gorilla" "food"   "Hot Dogs"
 [4,] "Magilla Gorilla" "food"   "Potato Salad"
 [5,] "Magilla Gorilla" "food"   "Cole Slaw"
 [6,] "Magilla Gorilla" "food"   "BBQ Chicken"
 [7,] "Magilla Gorilla" "other"  "Softball"
 [8,] "Magilla Gorilla" "other"  "Volleyball"
 [9,] "Magilla Gorilla" "other"  "Lawn Chairs"
[10,] "Magilla Gorilla" "other"  "Blanket"

On Sun, Jul 13, 2008 at 12:56 AM, Hohm, Dale [EMAIL PROTECTED] wrote:

Thanks for the reply Jim.

Here is a representation of the data I want to analyze: 10 records as requested. Each line can easily include an ID number as below.

So I want to determine a frequency or percentage of respondents that bring each of the 5 foods (Hamburgers, Hot Dogs, Potato Salad, Cole Slaw and BBQ Chicken) and how many "Other" write-ins there are. The same for what else is brought besides food (Softball, Volleyball, Lawn Chairs and Blanket) as well as a count of "Other" write-ins. I'll also need to be able to discern how many brought Hamburgers AND a Blanket, or how many brought a Softball AND a Volleyball, etc.

ID|Your Name|Do you picnic?|What is your favorite picnic food?|What do you bring besides food?
1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|Softball;#Blanket;#I bring boo-boo, but he hides
2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes
3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs
4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket
5|Foghorn Leghorn|Yes|Hot Dogs;#Cole Slaw;#I say, I say, BBQ Chicken?|Softball;#Blanket
6|Peter Potamus|Yes|Hamburgers;#Hot Dogs;#anything, just a lot of it|Softball;#Lawn Chairs;#hot air balloon
7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit
8|Fleegle, Bingo, Drooper and Snorky|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#A banana split|a laugh track
9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my flying car
10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Heavens to Murgatroyd! Exit stage left!

Thanks in advance,
Dale

-----Original Message-----
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 12, 2008 11:32 AM
To: Hohm, Dale
Cc: r-help@r-project.org
Subject: Re: [R] Reading Multi-value data fields for descriptive analysis

Can you provide a more complete example (say 10 lines) of what the input is like? Does each line have a unique index that can be related to it? Do you want to summarize all the multi1-n values of Col2? Do you want to know the percentage of input lines that have a Col3/multi-value4 on them? You could read in the data as you have indicated below and add a column that is the record number and therefore you would have to worry about trying to say if it existed or not. For example, you might have:

Rec#|col#|value
1|1|single
1|2|multi1
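For the counts Dale asks about (how many respondents chose each listed food, how many write-ins there are, and combinations such as Hamburgers AND a Blanket), one possible approach is to split each field on ";#" and test membership per respondent. This is a sketch only: it uses the first four records with the shortened column names, the helper has_item is a made-up name, and as.character() is there merely as a guard in case a column was read as a factor.

```r
txt <- c(
  "ID|name|picnic|food|other",
  "1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|Softball;#Blanket;#I bring boo-boo, but he hides",
  "2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes",
  "3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs",
  "4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket"
)
x <- read.table(text = txt, sep = "|", header = TRUE, quote = "",
                comment.char = "", as.is = TRUE)

foods <- c("Hamburgers", "Hot Dogs", "Potato Salad", "Cole Slaw", "BBQ Chicken")

# TRUE/FALSE per respondent: does the ';#'-separated field contain 'item'?
has_item <- function(field, item) {
  parts <- strsplit(as.character(field), ";#", fixed = TRUE)
  vapply(parts, function(p) item %in% p, logical(1))
}

# respondents per listed food
food_counts <- sapply(foods, function(f) sum(has_item(x$food, f)))

# entries in the food column that are not one of the listed foods
all_food  <- unlist(strsplit(x$food, ";#", fixed = TRUE))
write_ins <- sum(!(all_food %in% foods))

# respondents who listed Hamburgers AND brought a Blanket
burger_and_blanket <- sum(has_item(x$food, "Hamburgers") &
                          has_item(x$other, "Blanket"))
```

Dividing food_counts by nrow(x) would give the proportions; the same helper works for the "other" column with its own list of known items.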
Re: [R] any way to set defaults for par?
One way that I do it is to save the default parameters with the following statement in my profile:

assign('Default.par', par(no.readonly=TRUE))

And then I have a function which will reset them:

plotReset <- function() {
    # reset plotting window
    par(Default.par)
    windows(width=7.5, height=4.7, record=TRUE, pointsize=10)
}

On Sun, Jul 13, 2008 at 6:31 PM, Carl Witthoft [EMAIL PROTECTED] wrote:

I know how to set graphic parameters by calling par(), but what I'd like is a way to set the default values so that subsequent calls to par() use my defaults. The reason to want this is that every time I create a new graphic window (I'm using quartz on OSX, and so far no answers in the Mac mailing list), my parameters get reset to the builtin defaults. I read about the unexported variable .Pars, but would like to know if there's any way to manipulate that variable.

thanks
Carl

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
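A variation on the same idea, for the case where personal settings should be applied to every newly opened window, is a small wrapper that opens the device and then calls par(). This is a sketch under assumptions: the list my_par, its values, and the function name new_plot_window are invented for illustration and are not part of base R.

```r
# Hypothetical personal defaults; adjust to taste
my_par <- list(mar = c(4, 4, 1, 1), las = 1, cex.axis = 0.8)

# Open a device with 'opener' (dev.new by default), then apply my_par to it
new_plot_window <- function(opener = dev.new, ...) {
  opener(...)
  par(my_par)              # settings apply to the device that was just opened
  invisible(dev.cur())
}
```

On a Mac one would presumably call new_plot_window(quartz, width = 7, height = 5), and on Windows new_plot_window(windows); putting the definitions in a profile file makes them available in every session. Since par() settings belong to a device, they have to be reapplied after each new device is opened, which is what the wrapper does.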
[R] Moran's I test- Ordinal Logistic Regression Model
Hi,

I am trying to do a Moran's I test on an ordinal logistic regression model. I have a simple spatial weights matrix, listed below, that I would like to use.

Y=
1 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 1 1
0 1 1 0 0 0 0 1 1
0 0 0 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0
0 1 1 0 0 0 0 1 1
0 0 1 0 0 0 0 1 1

I try to run the test as follows:

    moran.test(order$resid, y)

It then gives me an error:

    Error in moran.test(resid(order), y) : y is not a listw object

Can I transform my matrix into a listw object, or use some other test where I can use my simple matrix to perform the test? Also, is using $resid for the ordinal logistic regression the proper way to run the Moran's I test?

Thanks for any help you can provide me.

Lisa
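For readers with the same question, one route is the spdep package's mat2listw(), which converts a square weights matrix into the listw object that moran.test() expects. Separately, the statistic itself can be computed straight from a plain matrix, as in the base-R sketch below. Everything in it is an assumption for illustration: morans_i is a hypothetical helper, the 4-site chain and the values in x are made up, and it returns only the statistic, not a significance test. Whether residuals from an ordinal logistic model are a suitable input is a separate question that this sketch does not settle.

```r
# Minimal base-R sketch of Moran's I; 'morans_i' is a hypothetical helper.
morans_i <- function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == ncol(w), nrow(w) == length(x))
  z  <- x - mean(x)   # deviations from the mean
  s0 <- sum(w)        # sum of all weights
  (length(x) / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Made-up example: four sites on a line, each linked to its neighbours
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)          # make the weights symmetric

x <- c(1, 2, 3, 4)     # made-up values, e.g. residuals
i_obs <- morans_i(x, w)
```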
Re: [R] Reading Multi-value data fields for descriptive analysis
Is one of the rows NULL? What does 'str(x)' show? The example you sent seems to work with the code. Are you reading in a different set of data?

I think I know what happened. I shortened the names on your example so it was easier to access. Here is the data I used:

ID|name|picnic|food|other
1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|Softball;#Blanket;#I bring boo-boo, but he hides
2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes
3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs
4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket
5|Foghorn Leghorn|Yes|Hot Dogs;#Cole Slaw;#I say, I say, BBQ Chicken?|Softball;#Blanket
6|Peter Potamus|Yes|Hamburgers;#Hot Dogs;#anything, just a lot of it|Softball;#Lawn Chairs;#hot air balloon
7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit
8|Fleegle, Bingo, Drooper and Snorky|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#A banana split|a laugh track
9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my flying car
10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Heavens to Murgatroyd! Exit stage left!

Here is the code with a few more comments. The basic structure was to loop through the three data columns since they had the same format.

    x <- read.table("/tempxx.txt", comment.char="", quote="", sep="|", header=TRUE, as.is=TRUE)
    # split out by name. the 'lapply' will cycle through for each row in the data and
    # the index of the row is passed to the '.row' parameter of the function
    z <- lapply(seq(nrow(x)), function(.row){
        # this sets the result to NULL so that we can accumulate the data as it is processed
        .result <- NULL
        # construct the data output
        # the three columns were shortened to the following names. the 'for' loop will
        # iterate through each of the three columns, taking the data in the columns and
        # splitting them by your separator ';#'
        for (i in c('picnic', 'food', 'other')){
            # this will access the specific column (given in the variable 'i')
            .split <- strsplit(x[.row,][[i]], ";#")
            # this appends to '.result' the contents of this column after creating
            # the three columns of data which are the name, the column ID and the value
            # from that column
            .result <- rbind(.result, cbind(name=x[.row,][['name']], field=i, value=unlist(.split)))
        }
        .result  # return the result
    })
    z

On Sun, Jul 13, 2008 at 7:25 PM, Hohm, Dale [EMAIL PROTECTED] wrote:
> Thanks Jim,
> I wish I were comfortable enough with the language for the fix needed to the syntax to be obvious, but it is not yet. With your example, I get:
>     Error in strsplit(x[.row, ][[i]], ";#") : non-character argument
> x appears to be filled properly, but z is not due to the error.
> Also, if you were willing to provide some brief annotation or describe the overall logic in the code you supplied it would help me immensely.
> Thanks,
> Dale

-----Original Message-----
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Sunday, July 13, 2008 6:35 AM
To: Hohm, Dale
Cc: r-help@r-project.org
Subject: Re: [R] Reading Multi-value data fields for descriptive analysis

This may do what you want:

> x <- read.table("/tempxx.txt", comment.char="", quote="", sep="|", header=TRUE, as.is=TRUE)
> # split out by name
> z <- lapply(seq(nrow(x)), function(.row){
+     .result <- NULL
+     # construct the data output
+     for (i in c('picnic', 'food', 'other')){
+         .split <- strsplit(x[.row,][[i]], ";#")
+         .result <- rbind(.result, cbind(name=x[.row,][['name']], field=i, value=unlist(.split)))
+     }
+     .result
+ })
> z
[[1]]
     name       field   value
[1,] Yogi Bear  picnic  Yes
[2,] Yogi Bear  food    Hamburgers
[3,] Yogi Bear  food    Hot Dogs
[4,] Yogi Bear  food    I rely on others to bring the good stuff
[5,] Yogi Bear  other   \Softball
[6,] Yogi Bear  other   Blanket
[7,] Yogi Bear  other   I bring boo-boo, but he hides\

[[2]]
     name     field   value
[1,] Boo-Boo  picnic  Yes
[2,] Boo-Boo  food    Potato Salad
[3,] Boo-Boo  food    Cole Slaw
[4,] Boo-Boo  food    whatever Yogi doesn't eat
[5,] Boo-Boo  other   Lawn Chairs
[6,] Boo-Boo  other   Blanket
[7,] Boo-Boo  other   my running shoes

[[3]]
     name         field   value
[1,] Ranger Rick  picnic  No
[2,] Ranger Rick  food    I told you I don't picnic
[3,] Ranger Rick  other   a big net and handcuffs

[[4]]
      name             field   value
 [1,] Magilla Gorilla  picnic  Yes
 [2,] Magilla Gorilla  food    Hamburgers
 [3,] Magilla Gorilla  food    Hot Dogs
 [4,] Magilla Gorilla  food    Potato Salad
 [5,] Magilla Gorilla  food    Cole Slaw
 [6,] Magilla Gorilla  food    BBQ Chicken
 [7,] Magilla Gorilla  other   Softball
 [8,] Magilla Gorilla  other   Volleyball
 [9,] Magilla Gorilla  other   Lawn Chairs
[10,] Magilla
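The row-by-row splitting discussed in this thread can also be tried on a small inline sample, with no file on disk. The sketch below is an illustration rather than the code from the thread: the two-record sample is cut down from the data above, the names txt, fields, long and counts are hypothetical, and it builds a data frame instead of a list of matrices so that the answers can be tabulated directly.

```r
# Hypothetical two-record sample in the same pipe-delimited layout
txt <- "ID|name|picnic|food|other
1|Yogi Bear|Yes|Hamburgers;#Hot Dogs|Softball;#Blanket
3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs"

# quote = "" and comment.char = "" keep the apostrophe and '#' as plain data;
# as.is = TRUE keeps text columns as character, which strsplit() requires
x <- read.table(text = txt, sep = "|", header = TRUE, quote = "",
                comment.char = "", as.is = TRUE)

fields <- c("picnic", "food", "other")

# One output row per (record, field, individual answer)
long <- do.call(rbind, lapply(seq_len(nrow(x)), function(r) {
  do.call(rbind, lapply(fields, function(f) {
    values <- strsplit(x[[f]][r], ";#", fixed = TRUE)[[1]]
    data.frame(name = x$name[r], field = f, value = values,
               stringsAsFactors = FALSE)
  }))
}))

# How often each answer occurs within each field
counts <- table(long$field, long$value)
```

In this long layout, a descriptive summary is a single table() call, and free-text answers simply show up as values that occur once.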
[R] Computing row means for sets of 2 columns
Is there a better or more efficient approach than this, without the use of t()?

> (m <- matrix(1:40, ncol=4))
      [,1] [,2] [,3] [,4]
 [1,]    1   11   21   31
 [2,]    2   12   22   32
 [3,]    3   13   23   33
 [4,]    4   14   24   34
 [5,]    5   15   25   35
 [6,]    6   16   26   36
 [7,]    7   17   27   37
 [8,]    8   18   28   38
 [9,]    9   19   29   39
[10,]   10   20   30   40
> (groups <- rep(1:2, each=2))
[1] 1 1 2 2
> (m.mean <- t(aggregate(t(m), by=list(groups), mean)))
        [,1] [,2]
Group.1    1    2
V1         6   26
V2         7   27
V3         8   28
V4         9   29
V5        10   30
V6        11   31
V7        12   32
V8        13   33
V9        14   34
V10       15   35
Re: [R] Computing row means for sets of 2 columns
    m <- matrix(1:40, ncol=4);
    groups <- rep(1:2, each=2);
    uGroups <- unique(groups);
    mMeans <- matrix(NA, nrow=nrow(m), ncol=length(uGroups));
    for (gg in seq(along=uGroups)) {
      mMeans[,gg] <- rowMeans(m[,groups == uGroups[gg], drop=FALSE]);
    }

(Preallocation of the result matrix is more memory efficient than using cbind() or similar!)

/Henrik

On Sun, Jul 13, 2008 at 6:03 PM, Daren Tan [EMAIL PROTECTED] wrote:
> Is there a better or more efficient approach than this, without the use of t()?
>
> > (m <- matrix(1:40, ncol=4))
>       [,1] [,2] [,3] [,4]
>  [1,]    1   11   21   31
>  [2,]    2   12   22   32
>  [3,]    3   13   23   33
>  [4,]    4   14   24   34
>  [5,]    5   15   25   35
>  [6,]    6   16   26   36
>  [7,]    7   17   27   37
>  [8,]    8   18   28   38
>  [9,]    9   19   29   39
> [10,]   10   20   30   40
> > (groups <- rep(1:2, each=2))
> [1] 1 1 2 2
> > (m.mean <- t(aggregate(t(m), by=list(groups), mean)))
>         [,1] [,2]
> Group.1    1    2
> V1         6   26
> V2         7   27
> V3         8   28
> V4         9   29
> V5        10   30
> V6        11   31
> V7        12   32
> V8        13   33
> V9        14   34
> V10       15   35
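The preallocated loop above can be wrapped in a small function and checked against a more compact variant. Both are sketches: rowMeansByGroup is a hypothetical name, not something from base R, and the sapply() version is an alternative swapped in for comparison rather than part of the reply.

```r
# Hypothetical helper wrapping the preallocated loop
rowMeansByGroup <- function(m, groups) {
  uGroups <- unique(groups)
  out <- matrix(NA_real_, nrow = nrow(m), ncol = length(uGroups))
  for (gg in seq_along(uGroups)) {
    # drop = FALSE keeps a one-column group as a matrix for rowMeans()
    out[, gg] <- rowMeans(m[, groups == uGroups[gg], drop = FALSE])
  }
  out
}

m <- matrix(1:40, ncol = 4)
groups <- rep(1:2, each = 2)

byLoop <- rowMeansByGroup(m, groups)

# Alternative: split the column indices by group, then average each block
bySapply <- sapply(split(seq_along(groups), groups),
                   function(j) rowMeans(m[, j, drop = FALSE]))
```

Neither version transposes the matrix. The sapply() form is shorter and labels its columns with the group values, while the loop makes the preallocation explicit.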
Re: [R] stem and leaf plot: how to edit the stem-values
On 13/07/2008 5:40 PM, Jörg Groß wrote:
> Hi,
> I would like to make a stem and leaf plot and I want to edit the category-names.

If you have a computer you can do much better histograms. But since you have chosen to do this, one way is to edit the underlying C code. It's available in https://svn.r-project.org/R/src/appl/stem.c

Another way is to save the plot into a file, and manually edit the file.

The best way is to produce the whole thing by hand, using pen and paper, while sitting on a tropical island without access to a computer. I recommend Half Moon Caye, http://maps.google.com/maps?f=qhl=engeocode=q=half+moon+cayesll=37.0625,-95.677068sspn=93.342821,105.292969ie=UTF8ll=17.206312,-87.532454spn=0.05854,0.051413t=hz=14

Duncan Murdoch

> So, by doing this:
>
>     x <- c(1,2,2,3,3,3,3,2,2,1)
>     stem(x)
>
> I get:
>
>     1 | 00
>     1 |
>     2 |
>     2 |
>     3 |
>
> First question: Why do I get gaps between the categories? (like in line 2 and line 4)
> And second: How can I edit the categories so that I can create something like this:
>
>     category A | 00
>     category B |
>     category C |
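For the second question, a display with custom category names can be assembled in a few lines of R without touching stem() or its C source. This is only a sketch under assumptions: the labels are the hypothetical ones from the question, every observation is drawn as a "0" leaf, and strrep() requires R 3.3.0 or later.

```r
x <- c(1, 2, 2, 3, 3, 3, 3, 2, 2, 1)

# Hypothetical labels, one per distinct value of x
labels <- c("category A", "category B", "category C")

# Count the observations in each category
counts <- table(factor(x, levels = 1:3))

# One line per category: the label, a bar, then one "0" per observation
stem_lines <- sprintf("%s | %s", labels, strrep("0", as.integer(counts)))
writeLines(stem_lines)
```

Because each line is an ordinary character string, the labels and the leaf symbol can be changed freely before printing.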