[R] Best SVM Performance measure?
Hi,

This is probably going to be one of those "it depends what you want" kinds of answers, but I'm very curious to see if the group has an opinion or some general suggestions. The actual experiment is too complicated for a quick e-mail, but I'll summarize well enough (hopefully) to get the concepts across.

- Binary classification problem
- Using an SVM (e1071) to train a model
- Experimenting with different features, costs, etc.
- Training data and test data are completely separate data sets drawn from the same population.

The general concept was to train on a large set of data and then test on a medium-sized set of unseen data. We're looking for the best classification performance on future unlabeled data.

Here is the puzzle, comparing two versions of the model:

A - Lower R2 (r squared) score but higher percentage labeled correctly on test data
B - Higher R2 score but lower percentage labeled correctly on test data

We're using the val.prob function from the Design library to evaluate our model. Additionally, the graphs from val.prob are interesting:

A - Our non-parametric line mostly parallels the ideal line but is just a bit above.
B - Our non-parametric line mostly parallels the ideal line but is just a bit below.

If I understand things correctly, with model A the actual probability is slightly higher than our predicted probability (not a bad thing for our application - better to under-predict than over-predict). One thought was that R2 measures the distance from the ideal line: with model A we are a touch further from the ideal line, but in a better position than model B.

Does anybody have any insight?

Thanks, -N

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Possible bug in plot.POSIXct regarding x axis
In article 80b45a8c0910192051nd558023p9b76566464fc3...@mail.gmail.com, remkoduur...@gmail.com says... I think it is a bug. Apparently plot.POSIXct calls axis.POSIXct for both y (correct) and x (incorrect) axes: Thanks. Reported as bug #14016. -- Karl Ove Hufthammer
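For anyone wanting to reproduce the report, a minimal sketch (dates and values are arbitrary placeholders, not from the original posts) that dispatches to plot.POSIXct with times on the x axis:

```r
## Ten daily timestamps on the x axis against random values on the y axis;
## plot.POSIXct sets up the time axis discussed above.
x <- as.POSIXct("2009-10-01", tz = "GMT") + (0:9) * 86400
y <- rnorm(10)
plot(x, y)
```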
[R] how to draw stacked ellipses to illustrate the shared and specific of multiple objects using R
Dear R-help listers,

I am asking for help on how to draw stacked ellipses to illustrate the shared and specific haplotypes of multiple objects using R. My problem comes from my population genetics study. I genotyped three species, and I have determined the amount of shared and specific haplotypes in each of the species and their combinations. I want to illustrate this result as three stacked ellipses, with the shared and unique areas of the ellipses annotated with the relative amounts.

My data are as follows:

hap.type   hap.dis.num  hap.dis.per  types
hap.Pd     8            27.5862069   Pd specific
hap.Pt     6            20.68965517  Pt specific
hap.Py     9            31.03448276  Py specific
hap.PdPt   3            10.34482759  PdPt shared
hap.PdPy   3            10.34482759  PdPy shared
hap.PtPy   0            0            PtPy shared
hap.3P     0            0            3 species shared

Several individuals of 3 species (Pd, Pt, Py) were genotyped; the summary of the haplotype distribution at the species level is contained in the dataset above.

Could you please give any direction on this graphical problem? Thank you in advance.

Best regards, Mao J-F
Re: [R] Sweave file generation
I use LyX (www.lyx.org) with the Sweave noweb report or article class for the same purpose (if I understand you correctly). LyX is a LaTeX front-end in which you can embed long and short R code; you can write the complete R program inside the document. For LaTeX tables the most elegant solution is sometimes to write the LaTeX to a file and put the file as an include in the LyX document. I have a project report that I run every morning to give a refined and up-to-date document in a matter of minutes.

2009/10/19 Gabriel Koutilellis kgabr...@in.com

Dear list, I have read really a lot the past few days, but I haven't found a matching solution for my problem. I have R 2.9.2 on Windows XP and MiKTeX 2.8 installed. What I want to do is to automate the Sweave file generation. I thought I could use R2Sweave, RweaveLatex, and Sweave in a combination so that I won't need to do anything, perhaps with some minor modifications at the last step. My purpose is to print text (summaries) and plots in the same pdf file, fast and easily. If later I need to produce a much more elegant paper, I know I will need to fix the LaTeX part of my code. Let's say I have a file of R code, script.R; so in R:

R2Sweave("script.R")
RweaveLatex()
Sweave("script.Rnw")
texi2dvi("script.tex", pdf = TRUE)

would produce something like the pdf with the summaries and plots coded inside script.R. Thank you
[R] Problems importing Unix SAS .ssd04 file to R (Win)
Hello,

I'm trying to import a SAS file made using SAS on Unix. Currently I'm using SAS on Windows and I'm trying to import that .ssd04 file to R. The file name is testfile.ssd04 and it is located in 'M:\sasuser'. I'm using Windows XP and R 2.9.1. Basically what I'm doing is:

## R code
library(foreign)
sashome <- "C:/Program Files/SAS Institute/SAS/V8"
folder_for_datafiles <- "M:/sasuser"
read.ssd(folder_for_datafiles, "testfile", sascmd = file.path(sashome, "sas.exe"))

which gives:

SAS failed. SAS program at C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file41bb5af1.sas
The log file will be file41bb5af1.log in the current directory
NULL
Warning message:
In read.ssd(folder_for_datafiles, "testfile", sascmd = file.path(sashome, :
  SAS return code was 2

The temporary SAS file 'file41bb5af1.sas' looks like this:

## SAS code
option validvarname = v6;
libname src2rd 'M:/sasuser';
libname rd xport 'C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file6df11649';
proc copy in=src2rd out=rd;
select testfile;

Any ideas what I'm doing wrong?

sessionInfo()
R version 2.9.1 (2009-06-26) i386-pc-mingw32
locale: LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252
attached base packages: [1] graphics grDevices utils datasets grid stats methods base
other attached packages: [1] foreign_0.8-38 gregmisc_2.1.1 gplots_2.7.1 caTools_1.9 bitops_1.0-4.1 gtools_2.6.1 gmodels_2.15.0 gdata_2.6.1
loaded via a namespace (and not attached): [1] MASS_7.2-49
[R] RPgSQL installation problem
Hi everybody,

I am trying to install RPgSQL and get the following error message. When I do ./configure from the untarred source directory I get:

loading cache ./config.cache
checking for crypt in -lcrypt... no
No crypt function found

When I use the package installer in R I get:

install.packages("/Users/christiaanpauw/tmp/RPgSQL/", , NULL, type = "source")
Warning in install.packages("/Users/christiaanpauw/tmp/RPgSQL/", , NULL, :
  argument 'lib' is missing: using '/Users/christiaanpauw/Library/R/2.8/library'
* Installing *source* package 'RPgSQL' ...
creating cache ./config.cache
checking for crypt in -lcrypt... no
No crypt function found

I have a threefold question: what is the crypt function, do I need it for the installation to work, and if so, where do I get it?

Thanks in advance, Christiaan
[R] How to determine the number of components of a mixture model stratified by age
Hi all,

With my data there are more than 1000 quantitative results of antibody concentrations. There may be 2 components (positive and negative), 3 components (perhaps strong positive, positive, and negative), or 4-6 components. Could you tell me how to determine the number of components of the mixture model? The anova.mix function in the mixdist package does not seem to work.

My data are a little complicated (TABLE below): there are more than 1000 quantitative results of antibody concentrations stratified by age. The overall density of results at age j, Fj, is a mixture of the component densities, so if there are 5 age groups, then there will be 5 mixture models. Do I have to analyse each stratum separately?

TABLE
age  Bin  length  freq
1    1    19.75   4
1    2    21.75   10
1    12   41.75   36
2    1    19.75   4
2    2    21.75   10
2    12   41.75   36

Appreciated, lybao
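Since the question went unanswered here, a hedged sketch of one common approach (not from the original post): if the raw concentrations are available rather than only the binned table, fit mixtures with 1 to 6 components and compare them by BIC using the mclust package. `dat`, with columns `conc` and `age`, is a hypothetical stand-in for lybao's data.

```r
library(mclust)
## Fit each age stratum separately and let BIC pick the number of components.
for (a in unique(dat$age)) {
  conc <- dat$conc[dat$age == a]   # antibody concentrations in stratum a
  fit  <- Mclust(conc, G = 1:6)    # compares 1- to 6-component mixtures
  cat("age", a, ": best number of components =", fit$G, "\n")
}
```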
Re: [R] how to draw stacked ellipses to illustrate the shared and specific of multiple objects using R
On 10/20/2009 06:10 PM, Mao Jianfeng wrote: [question about drawing stacked ellipses to show shared and specific haplotypes, quoted in full above]

Hi Mao,

I can't help you with ellipses, but intersectDiagram in the plotrix package may do what you want in a different way. Try this:

library(plotrix)
hapIntList <- getIntersectList(3, xnames = c("hap.Pd", "hap.Pt", "hap.Py"))
# enter the data as follows
# Number of elements in hap.Pd - 1: 27.586
# Number of elements in hap.Pt - 1: 20.689
# Number of elements in hap.Py - 1: 31.034
# Number of elements in hap.Pd&hap.Pt - 1: 10.345
# Number of elements in hap.Pd&hap.Py - 1: 10.345
# Number of elements in hap.Pt&hap.Py - 1: 0
# Number of elements in hap.Pd&hap.Pt&hap.Py - 1: 0
# Total number of elements - 1: 99.999
# this is a small bug that I have fixed and will
# not be necessary from v2.7-2 onward
class(hapIntList) <- "intersectList"
intersectDiagram(hapIntList)

I'm quite interested in whether this is of any use to you, as I had not even thought of this particular use for intersectDiagram.
Jim
Re: [R] Spatstat: xy binary data into mask type to use in owin(mask=)
If 'X' is a data frame containing columns 'x', 'y' and 'value', try:

m <- with(X, tapply(value, list(y, x), all))
z <- im(m, xcol = sort(unique(X$x)), yrow = sort(unique(X$y)))
w <- as.owin(z)

This will only work if the x, y values form a rectangular grid in some order.

Adrian Baddeley

=

Javier PB wrote:

Dear users, I am trying to export polygons from ArcMap into Spatstat to run some simulations using functions available in the spatstat package. One particular area to be exported is formed by a number of polygons defining the external boundaries of the area (as a group of islands) and a number of polygons inside the previous ones, as "holes" not to be considered part of the area. I have managed to export one of these areas using owin(poly=list()), including in the list the external boundaries and holes, but the number of polygons that constitutes each area is large, and I was wondering if this could be done more easily using owin(mask=my_area), with a single matrix my_area of TRUE/FALSE values defining the whole area.

Here is where I got stuck. The file I have created in ArcMap contains three columns: x and y coordinates, and the values at points x[i], y[j] (1 if the area is TRUE and 0 if the area is FALSE). How can I convert this xy file into a mask my_area that I can read using owin(mask=my_area)?
Re: [R] Re posting various problems with two-way anova, lme, etc.
Hi Michael,

"How do you control what is the (intercept) in the model returned by the lme function, and is there a way to still be able to refer to all groups and timepoints in there without referring to the intercept?"

Here is some general help. The intercept is controlled by the contrasts that you used when fitting your model. By default these are treatment (i.e. Dunnett) contrasts, but you can change this:

## ?options
options("contrasts")
?contrasts
options(contrasts = c("contr.sum", "contr.poly"))
options("contrasts")

You can design your own contrasts or you can use those provided by base R:

## ?contr.treatment

(There is also contr.sdif in MASS, which is a very useful type for environmental studies.) See http://users.monash.edu.au/~murray/stats/BIO4200/LinearModels.pdf, which covers the main ready-made types in R. For ANOVA, sum-to-zero contrasts are commonly used. If you do use treatment contrasts you can (usually) change your reference level on the fly, or you can set it when you make your factor:

## ?factor
?levels
?reorder
?relevel

If you are using the multcomp package then look at ?contrMat. An example of how to use it is given under the glht function.

Regards, Mark.

Michael Schacht Hansen wrote:

Hi, I posted the message below last week, but no answers, so I'm giving it another attempt in case somebody who would be able to help might have missed it and it has now dropped off the end of the list of mails. I am fairly new to R and still trying to figure out how it all works, and I have run into a few issues. I apologize in advance if my questions are a bit basic; I'm also no statistics wizard, so part of my problem may be a more fundamental lack of knowledge in the field. I have a dataset that looks something like this:

Week  Subj  Group  Readout
0     1     A      352.2
1     1     A      366.6
2     1     A      377.3
3     1     A      389.8
4     1     A      392.5
0     2     B      344.7
1     2     B      360.9
.     .     .      .

So basically I have a number of subjects that are divided into different groups receiving different interventions/treatments.
Observations on these subjects are made on 5 occasions (Week 0-4). I would like to see if there is a difference between the treatment groups and if the observations that we are making change in the individual groups over time. According to my very limited statistics training, that means I would have to do a two-way ANOVA with repeated measures, because the same subjects are observed on the different weeks. Now in R I can do something like this:

MyData$fWeek <- factor(MyData$Week)  # Convert week to factor
m1 <- aov(Readout ~ Group*fWeek + Error(Subj/fWeek), data = MyData)

My first naive question is: is the Error(...) term correct for the design that I describe? I am not quite sure I understand the syntax correctly. In any case, it actually seems to work fine, but I can't figure out how to do post hoc testing on this. TukeyHSD does not work for the aovlist which is returned, so I am kind of stuck. Is there a simple way to do the post hoc test based on the aovlist?

I have been reading various questions on the list and I think I have understood that I should be using lme from the nlme package, and this is where I run into some problems. As I understand it, the lme command that I would need would look something like this:

m1 <- lme(Readout ~ Group*fWeek, random = ~1|Subj/fWeek, data = MyData)

Now this actually fails with an error message like:

Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1

The reason (I believe) is that I have some weeks where there are no observations for a specific group. In this case, one of the groups had a lot of drop-out and at Week 4 there are no subjects left in one of the groups, and that seems to be causing the problem. I can get it to run by excluding the last week with something like:

m1 <- lme(Readout ~ Group*fWeek, random = ~1|Subj/fWeek, data = MyData[which(MyData$Week < 4), ])

My next question is: is there another way around that?
I would still like to run the analysis for the remaining groups at the last time point, and I believe it should somehow be included in the entire analysis. I have also managed to find a few postings with similar problems, but no real solutions, so any helpful comments would be much appreciated.

My final issue is how to do the post hoc testing on the model. I understand (I think) that I should use the glht function from the multcomp package. For instance, I could do something like:

summary(glht(m1, linfct = c("GroupB:fWeek1 - GroupC:fWeek1 = 0", "GroupB:fWeek2 - GroupC:fWeek2 = 0")))

And that actually works fine. My problem is that Group A and fWeek 0 seem to have turned into the (intercept), so I can't write something like:

summary(glht(m1, linfct = c("GroupB:fWeek0 - GroupB:fWeek1 = 0")))

to check for changes between baseline and week 1 in Group B.
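On the intercept question that closes the thread, a hedged sketch of the relevel route Mark mentions (variable names follow Michael's data; untested against his dataset): refit with a different reference level so the week-0 effect appears as a named coefficient instead of being absorbed into the intercept.

```r
## With week 1 as the reference level, fWeek0 becomes an explicit
## coefficient, so glht contrasts can refer to the week-0 level by name.
MyData$fWeek <- relevel(factor(MyData$Week), ref = "1")
m2 <- lme(Readout ~ Group * fWeek, random = ~1 | Subj/fWeek, data = MyData)
```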
[R] system() or shell() with python script
Hi all,

I am having some problems calling a python script from R that resides in a folder that is in the path (Windows XP):

system("quickPadTool.py")
Warning message:
In system("quickPadTool.py") : quickPadTool.py not found

# I also tried 'shell' (and shell.exec as well).
shell("quickPadTool.py")
'quickPadTool.py' is not recognized as an internal or external command, operable program or batch file.
Warning message:
In shell("quickPadTool.py") : 'quickPadTool.py' execution failed with error code 1

I can run the script just fine from a command window, from the same directory. Any pointers?

thanks, Remko

-----
Remko Duursma Post-Doctoral Fellow Centre for Plants and the Environment University of Western Sydney Hawkesbury Campus Richmond NSW 2753 Dept of Biological Science Macquarie University North Ryde NSW 2109 Australia Mobile: +61 (0)422 096908 www.remkoduursma.com
Re: [R] system() or shell() with python script
Remko Duursma wrote: [question about calling a python script from R, quoted in full above]

Hi Remko,

Some suggestions:

1. Try to see if the path that R has from a call to system is correct (i.e. the same as from cmd): system("path")
2. Try calling it with python added in front: system("python script.py")
3. Add a shebang line to the top of your script, like #! c:/Program Files/Python/python.exe. This tells the OS which program you want to use to run the script.

cheers, Paul

ps maybe superfluous, but try the python getopt package for reading command-line arguments.

--
Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul
Re: [R] system() or shell() with python script
Paul Hiemstra wrote:
> 1. Try to see if the path that R has from a call to system is correct (i.e. the same as from cmd): system("path")

That won't work. You would use Sys.getenv("PATH") for this.

> 3. Add a shebang line to the top of your script like: #! c:/Program Files/Python/python.exe. This tells the OS which program you want to use to run the script.

That's unlikely to work on Windows. If file associations are set to do it, then shell.exec("script.py") should work. (That should do the same as double-clicking the file in Explorer.)

Duncan
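Pulling the thread's suggestions together, a hedged sketch for Windows (the script path is a placeholder, not from the original posts):

```r
## Explicit interpreter call, per Paul's suggestion 2:
system('python "C:/path/to/quickPadTool.py"')
## Or rely on the Windows .py file association, per Duncan's note:
shell.exec("C:/path/to/quickPadTool.py")
```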
[R] Problem using the source-function within R-functions
Dear R community,

You may have the solution to how to construct a function using source() inside the function body, i.e.:

myfunction <- function(...){
  source('file1.r')
  source('file2.r')
}

After compiling and installing myfunction in R, calling myfunction gives an error because the content of 'file1.r' and 'file2.r' seems to be missing. Anyone have the trick to overcome this problem?

Thanks in advance!

best wishes, Johan

PS: My function is:

run_accumm_value <- function(ind_noder_0, ind_loc_val, ind_retention, downstream){
  ## Preprocessing of looping calculations:
  koersel_uden_ret <- length(unique(ind_noder_0$oplid))
  opsaml_b_0_2 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_1 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_2 <- seq(1:koersel_uden_ret)
  ## Preprocessing of topology and local values to be summed:
  source('preproces_topology.r', local = T)
  source('preproces_loc_val.r', local = T)
  # Loop for each grouping factor (column in ind_noder_0: oplid):
  for(j in 1:koersel_uden_ret){
    source('matrix_0.r', local = T)
    source('matrix.r', local = T)
    source('local_value.r', local = T)
    source('fordeling.r', local = T)
    source('fordeling_manuel.r', local = T)
    source('local_ret.r', local = T)
    source('Ax=b.r', local = T)
    source('opsamling_x_0_acc.r', local = T)
  }
  source('opsamling_b_1.r', local = T)
  opsaml_b_2
}

--
Johan Lassen Environment Center Nykøbing F Denmark
[R] Interpretation of VarCorr results
Dear all,

I'm working with a model to estimate components of variance using the lme() function. The model is the following:

model <- lme(fixed = X ~ 1 + as.factor(station), data = myData, na.action = na.exclude, random = ~ N + 1 | spliceOrHoming)

Where X is the measured response variable, station is treated as a fixed-effects factor, N is a continuous variable, and spliceOrHoming is an (ordered) factor. The idea is that the response variable (X) varies almost linearly with N (in a decreasing fashion), but the slope and intercept of this variation change with the levels of the spliceOrHoming random factor.

Now, the REML-estimated standard deviation components are:

            StdDev        Corr
(Intercept) 5.274544e-01  (Intr)
N           7.912717e-05  -0.58
Residual    1.508647e-01

My questions are:

- Is the model correctly specified?
- If yes, how should I read the standard deviation estimate for N (7.912717e-05)? Is this the standard deviation for a unitary variation of N? Or is it the standard deviation due to the global contribution of the N independent variable? (Note that I have a strong feeling that N contributes a lot to the final variance, so the value that I obtain here seems too small for my beliefs.)
- Are there any methods/functions to obtain the variance components for the station factor too? (Please give me some references, at least.)

Thank you a lot in advance for any suggestions you'll give me.

Enrico
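Not an answer to the specification questions, but for inspecting the fit: VarCorr() on the fitted object reproduces the table above and also reports a Variance column alongside StdDev. A sketch, where `model` is Enrico's fitted lme object:

```r
library(nlme)
vc <- VarCorr(model)   # random-effect variances and standard deviations
vc                     # (Intercept), N slope, and Residual rows as above
```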
Re: [R] Problem using the source-function within R-functions
Hi Johan,

Although it's not clear to me why you're putting this in a function in the first place... The way you've written your function, it will only work if your working directory is the same as the directory where file1.r and file2.r are stored, and I suspect this is the problem. You can 1) set the working directory in your function or 2) specify the full path to file1.r and file2.r in your function.

Option 1 will look like:

myfunction <- function(...){
  setwd("/path/where/files/are/saved")
  source('file1.r')
  source('file2.r')
}

and option 2 will look like:

myfunction <- function(...){
  source('/path/where/files/are/saved/file1.r')
  source('/path/where/files/are/saved/file2.r')
}

source() also has a chdir option that you could investigate. See ?source.

-Ista

On Tue, Oct 20, 2009 at 7:00 AM, Johan Lassen jle...@gmail.com wrote: [original question and function quoted in full above]

--
Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
[R] How to get slope estimates from a four parameter logistic with SSfpl?
Is the following helpful?

pdd <- deriv(~ a + (b - a)/(1 + exp((c - t)/d)), "d")
pdd
expression({
    .expr1 <- b - a
    .expr2 <- c - t
    .expr4 <- exp(.expr2/d)
    .expr5 <- 1 + .expr4
    .value <- a + .expr1/.expr5
    .grad <- array(0, c(length(.value), 1L), list(NULL, c("d")))
    .grad[, "d"] <- .expr1 * (.expr4 * (.expr2/d^2))/.expr5^2
    attr(.value, "gradient") <- .grad
    .value
})

Or perhaps you want it with respect to t?

JN

Message: 46 Date: Mon, 19 Oct 2009 14:50:15 +0100 From: Weber, Sam sam.we...@exeter.ac.uk Subject: [R] How to get slope estimates from a four parameter logistic with SSfpl? To: r-help@R-project.org Message-ID: 5bb78285b60f9b4db2c6a4da8f3e788c12c64b3...@exchmbs04.isad.isadroot.ex.ac.uk Content-Type: text/plain

Hi,

I was hoping to get some advice on how to derive estimates of slopes from four-parameter logistic models fit with SSfpl. I fit the model using:

model <- nls(temp ~ SSfpl(time, a, b, c, d))
summary(model)

I am interested in the values of the lower and upper asymptotes (parameters a and b), but also in the gradient of the line at the inflection point (c), which I assume tells me my rate of increase when it is steepest (?). However, I cannot work out how to derive a slope estimate. I'm guessing it has something to do with the scaling parameter d, but having searched the internet for hours I have not made any progress, and it is probably quite simple. Any help would be hugely appreciated!

All the best, Sam
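Following up JN's closing question: differentiating the SSfpl form a + (b - a)/(1 + exp((c - t)/d)) with respect to t and evaluating at the inflection point t = c, where exp((c - t)/d) = 1, gives the steepest slope (b - a)/(4*d). A sketch:

```r
## Gradient with respect to t rather than d:
pdt <- deriv(~ a + (b - a)/(1 + exp((c - t)/d)), "t")
## At t = c the gradient simplifies to (b - a)/(4 * d); with a fitted
## model as in Sam's post (hypothetical coefficient extraction):
## cf <- coef(model); slope <- (cf["b"] - cf["a"]) / (4 * cf["d"])
```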
[R] (no subject) -- rename to optim() crash due to NA returned
While developing updates to optimization tools for R (I was behind 3 of the 5 codes in optim(), but Brian Ripley did the implementation), I've been seeing this kind of error when the objective function cannot be computed and so returns NA. Examples: attempts to divide by 0, take sqrt() of a negative number, or log(0). A crude work-around is to detect bad situations when evaluating the function and return a very large number, e.g., something using .Machine$double.xmax (I used this divided by 4 or so to avoid troubles if algorithms try to do some arithmetic, as in gradient-based routines). Most routines will bounce off such a value, but the scaling of the surface is messed up, so it is not always efficient. A better way is if optimization codes can handle a failed-evaluation flag and simply back off from a step or whatever is needed, but no R tools do this as yet -- I'm hoping to put this in some of my codes, using returned values larger than some threshold as the flag. My 1980s BASIC codes had this (and bounds and masks); however, it takes time. For information, I'd like to know if this turns out to be the problem, as it would raise the priority for working on such issues. JN

Message: 57 Date: Mon, 19 Oct 2009 17:48:12 +0200 From: Reynaerts, Jo jo.reynae...@econ.kuleuven.be Subject: [R] (no subject) To: r-help@r-project.org r-help@r-project.org Message-ID: 603d249cbe0b304490d0f774142b8e0f01cdcb9a4...@econsrvex6.econ.kuleuven.ac.be Content-Type: text/plain

Dear R users, I have the following problem when calling optim() to minimize a function f.obj (= outer loop) that calls another function f.con (a contraction mapping, = inner loop). It seems to me that it is a numerical problem that I currently fail to take into account when coding. Calling optim(), for the first few iterations of the outer loop, everything seems fine; the contraction mapping is calculated in each run.
However, after a number of outer loop iterations, an error occurs and the following message is displayed:

Error in while (max.dev >= tol.in) { : missing value where TRUE/FALSE needed

The previous conditional statement ensures that the iteration in the inner loop should run as long as max.dev <- max(abs(x - x.in)) is greater than the inner loop tolerance level (tol.in <- 1E-9), where x is computed by the contraction mapping using x.in. I used different stopping rules and tolerance levels, but this gives the same result. As I said, I think it's a numerical problem. Has anyone had similar experiences using optim() and could you give some coding advice? Thanks in advance, Jo Reynaerts

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
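The crude work-around described above can be sketched as a wrapper: if an evaluation of the objective fails (NA/NaN/Inf), return a very large finite value that the optimizer will simply bounce off. The function f below and all names are invented for illustration, not taken from the poster's code:

```r
# Wrap an objective so that failed evaluations return a huge finite value
# (roughly .Machine$double.xmax/4, as suggested above) instead of NA/NaN/Inf.
safe_fn <- function(fn, big = .Machine$double.xmax / 4) {
  function(par, ...) {
    val <- tryCatch(fn(par, ...), error = function(e) NA_real_)
    if (is.na(val) || !is.finite(val)) big else val
  }
}

# Toy objective: log() fails (NaN/-Inf) for p[1] <= 0, exactly the kind of
# bad region that triggers "missing value where TRUE/FALSE needed" errors.
f <- function(p) (p[1] - 2)^2 + log(p[1])
opt <- optim(1.5, safe_fn(f), method = "BFGS")
opt$par   # near 1 + sqrt(2)/2, the true minimizer
```

This keeps optim() running through the bad region, at the cost of a distorted surface near the failure boundary, as the post notes.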
Re: [R] how to get rid of 2 for-loops and optimize runtime
Hi Joris, The amount of a month ago is normally one value from another row. But I used 'sum <- sum + dataset[i,22]' because I would like to reuse the code also for other tables. In some tables it is possible that the value of last month is the sum of values from different rows. Thank you for your time. Greetings, Ian

-----Original Message----- From: joris meys [mailto:jorism...@gmail.com] Sent: Monday, 19 October 2009 16:12 To: Ian Willems CC: r-help@r-project.org Subject: Re: [R] how to get rid of 2 for-loops and optimize runtime

Hi Ian, first of all, take a look at the functions sapply, mapply, lapply, tapply, ...: they are the more efficient way of implementing loops. Second, could you elaborate a bit further on the data set: the amount of the month ago, is that one value from another row, or the sum of all values in the previous month? I saw in your example dataset that the last month has 2 rows, but couldn't figure out whether that's a typo or really means something. That's necessary information to optimize your code. 129s is indeed far too long for a simple action. Cheers Joris

On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems ian.will...@uz.kuleuven.ac.be wrote:

Short: get rid of the loops I use and optimize runtime

Dear all, I want to calculate for each row the amount of the month ago. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; nrows of other matrices can easily be more than 10)

Table before:

Year month quarter yearmonth Service ... Amount
2009     9      Q3    092009       A ...    120
2009     9      Q3    092009       B ...     80
2009     8      Q3    082009       A ...     40
2009     7      Q3    072009       A ...     50

The result I want:

Year month quarter yearmonth Service ... Amount amound_lastmonth
2009     9      Q3    092009       A ...    120               40
2009     9      Q3    092009       B ...     80              ...
2009     8      Q3    082009       A ...     40               50
2009     7      Q3    072009       A ...     50              ...

The table is not exactly the same but gives a good idea of what I have and what I want. The code I have written (see below) does what I want but it is very, very slow. It takes 129s for 400 rows.
And the time gets four times higher each time I double the amount of rows. I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output see below), but I don't really understand the output. I guess I need code that requires linear time, and need to get rid of the 2 for-loops. Can someone help me or tell me what else I can do to optimize my runtime? I use R 2.9.2, Windows XP Service Pack 3. Thank you in advance. Best regards, Willems Ian

* dataset[,5]  = month
* dataset[,3]  = year
* dataset[,22] = amount
* dataset[,14] = servicetype

[CODE]
# for each row of the matrix, check if each row has..
for (j in 1:Number_rows) {
  sum <- 0
  for (i in 1:Number_rows) {
    if (dataset[j,14] == dataset[i,14]) {        # ..the same service type
      if (dataset[j,18] == dataset[i,18]) {      # ..the same department
        if (dataset[j,5] == 1) {                 # if month = 1, month ago is 12 and year is -1
          if (12 == dataset[i,5]) {
            if ((dataset[j,3] - 1) == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        } else {
          if ((dataset[j,5] - 1) == dataset[i,5]) {   # if month != 1, month ago is month - 1
            if (dataset[j,3] == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        }
      }
    }
  }
}
[/CODE]

summaryRprof() $by.self:

             self.time self.pct total.time total.pct
[.data.frame     33.92     26.2      80.90      62.5
NextMethod       12.68      9.8      12.68       9.8
[.factor          8.60      6.6      18.36      14.2
Ops.factor        8.10      6.3      40.08      31.0
sort.int          6.82      5.3      13.70      10.6
[                 6.70      5.2      85.44      66.0
names             6.54      5.1       6.54       5.1
length            5.66      4.4       5.66       4.4
==                5.04      3.9      44.92      34.7
levels            4.80      3.7       5.56       4.3
is.na             4.24      3.3       4.24       3.3
dim               3.66      2.8       3.66       2.8
switch            3.60      2.8       3.80       2.9
vector            2.68      2.1       8.02       6.2
inherits          1.90      1.5       1.90       1.5
any               1.68      1.3       1.68       1.3
noNA.levels       1.46      1.1       7.84       6.1
.Call             1.40      1.1       1.40       1.1
!                 1.26      1.0       1.26       1.0
attr<-            1.06      0.8       1.06       0.8
.subset           1.00      0.8       1.00       0.8
class<-           0.82      0.6 [output truncated]
[R] underflow of fisher.test result
fisher.test() gives a very small p-value, which underflows on my machine. However, its log should not underflow. I'm wondering if there is a way to get the log() of a small p-value. An approximation is acceptable in this case. Thank you!

fisher.test(rbind(c(1,10),c(10,1000)))$p.value
[1] 0

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random Forest - partial dependence plot
Are you talking about the y-axis or the x-axis? If you're talking about the y-axis, that range isn't really very meaningful. The partial dependence function basically gives you the average trend of that variable (integrating out all others in the model). It's the shape of that trend that is important. You may interpret the relative range of these plots from different predictor variables, but not the absolute range. Hope that helps. Andy

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Carlos M. Zambrana-Torrelio Sent: Monday, October 19, 2009 3:47 PM To: r-help@r-project.org Subject: [R] Random Forest - partial dependence plot

Hi everybody, I used random forest regression to explain the patterns of species richness with a bunch of climate variables (e.g. temperature, precipitation, etc.). All are continuous variables. My results are really interesting and my model explained 96.7% of the variance. Now I am trying to take advantage of the variable importance function and depict the observed patterns using partial dependence plots. However, I found a really strange (at least for me...) behavior: the species number ranges between 1 and 150, but when I make the partial plot the graphic only represents values between 43 and 50!! I use the following code to get the partial plot:

partialPlot(ampric.rf, amp.data, Temp)

where ampric.rf is the random forest object, amp.data are the data and Temp is the variable I am interested in. How can I have a partial plot explaining all species numbers (from 1 to 150)?? Also, I read the RF documentation and I was wondering what is the meaning of "marginal effect of a variable". Thanks for your help, Carlos

-- Carlos M.
Zambrana-Torrelio Department of Biology University of Puerto Rico - RP PO BOX 23360 San Juan, PR 00931-3360

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to get rid of 2 for-loops and optimize runtime
Hi William, Your program works perfectly, and very fast, for the table I'm using right now (only one match per row). If I want to reuse this code on other tables, it can match more than one row. Is it possible to adapt your code easily, if I have to sum the values of last month from different rows? Thank you for your help. Regards, Ian

-----Original Message----- From: William Dunlap [mailto:wdun...@tibco.com] Sent: Monday, 19 October 2009 18:08 To: Ian Willems; r-help@r-project.org Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Willems Sent: Monday, October 19, 2009 6:50 AM To: 'r-help@r-project.org' Subject: [R] how to get rid of 2 for-loops and optimize runtime

Short: get rid of the loops I use and optimize runtime

Dear all, I want to calculate for each row the amount of the month ago. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; nrows of other matrices can easily be more than 10)

Table before:

Year month quarter yearmonth Service ... Amount
2009     9      Q3    092009       A ...    120
2009     9      Q3    092009       B ...     80
2009     8      Q3    082009       A ...     40
2009     7      Q3    072009       A ...     50

The result I want:

Year month quarter yearmonth Service ... Amount amound_lastmonth
2009     9      Q3    092009       A ...    120               40
2009     9      Q3    092009       B ...     80              ...
2009     8      Q3    082009       A ...     40               50
2009     7      Q3    072009       A ...     50              ...

The table is not exactly the same but gives a good idea of what I have and what I want. The code I have written (see below) does what I want but it is very, very slow. It takes 129s for 400 rows. And the time gets four times higher each time I double the amount of rows. I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output see below), but I don't really understand the output. I guess I need code that requires linear time, and need to get rid of the 2 for-loops.
Can someone help me or tell me what else I can do to optimize my runtime? I use R 2.9.2, Windows XP Service Pack 3. Thank you in advance. Best regards, Willems Ian

* dataset[,5]  = month
* dataset[,3]  = year
* dataset[,22] = amount
* dataset[,14] = servicetype

[CODE]
# for each row of the matrix, check if each row has..
for (j in 1:Number_rows) {
  sum <- 0
  for (i in 1:Number_rows) {
    if (dataset[j,14] == dataset[i,14]) {        # ..the same service type
      if (dataset[j,18] == dataset[i,18]) {      # ..the same department
        if (dataset[j,5] == 1) {                 # if month = 1, month ago is 12 and year is -1
          if (12 == dataset[i,5]) {
            if ((dataset[j,3] - 1) == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        } else {
          if ((dataset[j,5] - 1) == dataset[i,5]) {   # if month != 1, month ago is month - 1
            if (dataset[j,3] == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        }
      }
    }
  }
}
[/CODE]

match() is often useful for quickly finding the locations of many items in a vector. It has no special methods for data.frames, so you must combine the columns of interest into a character vector of keys and use match on the key vectors. E.g.

# your test data in a format that mail readers
# can copy and paste into R:
d <- read.table(header=TRUE, textConnection("
Year month quarter yearmonth Service Amount
2009 9 Q3 092009 A 120
2009 9 Q3 092009 B 80
2009 8 Q3 082009 A 40
2009 7 Q3 072009 A 50
"))
# The key functions
dKey <- function(d) {
  with(d, paste(d$Year, d$month, d$Service, sep=";"))
}
keyThisMonth <- function(d) dKey(d)
keyPrevMonth <- function(d) {
  stopifnot(!is.null(d$Year), !is.null(d$month), !is.null(d$Service))
  isJan <- d$month == 1
  d$Year[isJan] <- d$Year[isJan] - 1
  d$month[isJan] <- 12
  d$month[!isJan] <- d$month[!isJan] - 1
  dKey(d)
}
# Make the new column:
d$AmountPrevMonth <- d$Amount[ match(keyPrevMonth(d), keyThisMonth(d)) ]
# The result
print(d)
  Year month quarter yearmonth Service Amount AmountPrevMonth
1 2009     9      Q3     92009       A    120              40
2 2009     9      Q3     92009       B     80              NA
3 2009     8      Q3     82009       A     40              50
4 2009     7      Q3     72009       A     50              NA

This assumes there is only one match per row.
Is this the result you are looking for? Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
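To answer the follow-up about summing over several matching rows: one option is to aggregate the amounts per key with tapply() first, and then index the totals by each row's previous-month key. A sketch on invented toy data, using a simplified year/month/service key rather than Bill's exact helper functions:

```r
# Toy data: two rows share Year=2009, month=9, Service=A, so the
# previous-month lookup for month 10 would need their sum (150).
d <- read.table(header = TRUE, textConnection("
Year month Service Amount
2009 9 A 120
2009 9 A 30
2009 8 A 40
2009 7 A 50
"))
key <- function(Year, month, Service) paste(Year, month, Service, sep = ";")
# Previous-month key for each row (January rolls back to December of Year-1):
prevKey <- with(d, key(Year - (month == 1),
                       ifelse(month == 1, 12, month - 1), Service))
# Sum the amounts per key, then look the totals up by the previous-month key:
totals <- tapply(d$Amount, key(d$Year, d$month, d$Service), sum)
d$AmountPrevMonth <- as.vector(totals[prevKey])
d   # rows for month 9 both get 40; month 8 gets 50; month 7 gets NA
```

Rows whose previous month is absent from the table get NA, just as with the match() version.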
[R] rbind with different columns
Hello there, with the following dummy code I'd like to give you an idea how my data looks:

df1 <- data.frame(a = rnorm(10), b = rnorm(10))
df2 <- data.frame(a = rnorm(10), c = rnorm(10))
df3 <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
myList <- list(df1, df2, df3)

# (myList is representing the data structure I have to handle; it's read from single files with lapply)

In every list entry is a data.frame, but the columns are only partially the same. If I had exactly the same columns, I could do the following command to combine my data:

do.call(rbind, myList)

but of course it does not work with different column names. Is there any easy way to retrieve a combined table like this:

            a          b          c
1  -0.54586587 -0.3607873         NA
2   1.10876842  1.1439414         NA
3   0.57357988 -1.2117743         NA
4  -1.40975759 -1.2390305         NA
5   0.03371093 -1.8860748         NA
6  -1.27682961  0.9990840         NA
7  -1.78715858  0.8400642         NA
8  -0.22663310  1.5224016         NA
9   0.45703787  0.0599217         NA
10 -1.21984635  1.1991689         NA
11 -2.58848301         NA  0.2394272
12  0.71155177         NA -0.7107332
13 -2.16440676         NA -0.1744845
14  1.33043121         NA  0.5951272
15  1.51034297         NA  0.1956956
16  1.00844947         NA  0.6726101
17  0.78693840         NA  1.2189904
18  0.68622170         NA  1.2230500
19 -1.09376863         NA  0.4267472
20  2.23647873         NA  0.7328574
21 -0.38144792  0.1532647  1.4824618
22  0.27078024 -0.4264737  0.1317450
23  1.10812086  1.2550117  0.1677935
24  0.14881701 -0.2928157 -1.4081529
25 -1.00635045 -0.7885968 -0.3502532
26  0.32024094  0.4681016 -1.5477557
27  0.82974710  0.2345186 -0.6572728
28  0.49608133  1.7463265  0.6493405
29 -0.33022738  1.9510503 -1.7930093
30 -0.62615365  0.7330671 -0.4032405

The only thing I can think of is checking the names of each list entry and adding NA-columns before combining them. Is there any other way to do so? Antje

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
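The NA-padding idea mentioned at the end of the post works fine in base R and can be written compactly — a sketch, reusing the df1/df2/df3 setup from above:

```r
# Pad each data.frame with the columns it is missing (filled with NA),
# put the columns in a common order, then rbind as usual.
df1 <- data.frame(a = rnorm(10), b = rnorm(10))
df2 <- data.frame(a = rnorm(10), c = rnorm(10))
df3 <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
myList <- list(df1, df2, df3)

allNames <- unique(unlist(lapply(myList, names)))
padded <- lapply(myList, function(df) {
  for (nm in setdiff(allNames, names(df))) df[[nm]] <- NA  # create absent columns
  df[allNames]                                             # common column order
})
combined <- do.call(rbind, padded)
str(combined)   # 30 rows, columns a, b, c, with NA where a column was absent
```

This gives the same shape of result as the rbind.fill() answer below in the thread, without an extra package.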
Re: [R] AFT-model with time-dependent covariates
Sorry for being late in responding to this thread, but I was made aware of it only two weeks ago. In my package 'eha' there is a function 'aftreg', which performs what is asked for, given that the time-varying covariates are step functions of time and that the observation period for each individual is an interval. Left truncation works if it can be assumed that the covariate values during the first non-observable interval are the same as at the beginning of the first interval under observation. As Terry wrote, there is a lot of book-keeping in writing down the likelihood function, and even more for scores and the hessian, so I adopted a lazy way in the programming: I have only coded the log likelihood function, and I use 'optim' (in R code) and 'BFGS', without gradient, and I ask for a numerically differentiated Hessian, which I use for calculating standard errors and p-values. Tests show that this works surprisingly well, but for huge data sets it is very slow. If it were possible to ask for a hessian in the C version of BFGS, things would improve a lot. Also note that you need the latest version (1.2-12) of eha for this to work. aftreg in earlier versions only works (correctly) for time-constant covariates. And, this is not well tested, so care is needed. And I appreciate performance and bug reports, of course. Göran

On Wed, May 13, 2009 at 9:34 PM, spencerg spencer.gra...@prodsyse.com wrote: To see what's available in other packages, try the following:

library(RSiteSearch)
AFT <- RSiteSearch.function('AFT model')
summary(AFT)  # 24 help files found in 8 different packages
HTML(AFT)     # opens a table with 24 rows in a web browser.

There may be nothing here that will help you, but this provides a quick overview of what's available. If this doesn't find what you want, it either has not been contributed or its help page does not use the phrase "AFT model". Hope this helps.
Spencer Graves Terry Therneau wrote: The coding for an AFT model with time-dependent covariates will be very hard, and I don't know of anyone who has done it. (But I don't keep watch of other survival packages, so something might be there). In a Cox model, a subject's risk depends only on the current value of his/her covariates; in an AFT model the risk depends on the entire covariate history. (My 'accelerated age' is the sum of all the extra years I have ever gained). Coding this is not theoretically complex, but would be a pain-in-the-rear amount of bookkeeping. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Göran Broström __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rbind with different columns
In article 4addc1d0.2040...@yahoo.de, niederlein-rs...@yahoo.de says... In every list entry is a data.frame but the columns are only partially the same. If I have exactly the same columns, I could do the following command to combine my data: do.call(rbind, myList) but of course it does not work with differnt column names. Is there any easy way to retrieve a combined table like this: You're in luck. 'rbind.fill' in the 'plyr' package does exactly what you want. -- Karl Ove Hufthammer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Counting
Hi All, Assume that I have the following data set with two variables, and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

x1 x2
 1  1
 1  0
 0  1
 0  1
 0  0
 1  1
 0  1

The output should be:

x1 changed
 0 3   # has changed 3 times
 1 1   # has changed 1 time

x1 unchanged
 0 1   # has stayed unchanged only 1 time
 1 2   # has stayed unchanged 2 times

Can someone help me how to do it in R? Thanks in advance

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
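Both counts fall straight out of table() applied to x1 and a changed/unchanged indicator — a sketch on the data above:

```r
# Cross-tabulate x1 against whether the value changed from x1 to x2.
x1 <- c(1, 1, 0, 0, 0, 1, 0)
x2 <- c(1, 0, 1, 1, 0, 1, 1)
tab <- table(x1, changed = x1 != x2)
tab
# The TRUE column gives the "changed" counts (x1=0: 3, x1=1: 1) and the
# FALSE column the "unchanged" counts (x1=0: 1, x1=1: 2).
```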
Re: [R] underflow of fisher.test result
On 20-Oct-09 13:34:49, Peng Yu wrote: fisher.test() gives a very small p-value, which is underflow on my machine. However, the log of it should not be underflow. I'm wondering if there is a way get log() of a small p-value. An approximation is acceptable in this case. Thank you! fisher.test(rbind(c(1,10),c(10,1000)))$p.value [1] 0 I have not attempted an exact approach (though may do so later), but the P-value in question is about 1e-15000 so (using log to base e) log(P) approx = -33000 In such a case, P=0 is a pretty good approximation! Which prompts the question: Why the interest in having the value of such a very small number? Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-09 Time: 15:14:26 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
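If the log p-value really is wanted, the 2x2 case can be computed entirely in log space from the hypergeometric log-probabilities, so nothing underflows. A sketch (not part of fisher.test() itself, but following the same two-sided rule: sum the probabilities of all tables, with margins fixed, that are no more likely than the observed one):

```r
# Log of the two-sided Fisher exact p-value for a 2x2 table, in log space.
log_fisher_2x2 <- function(tab) {
  a <- tab[1, 1]
  m <- sum(tab[1, ])              # row 1 total
  n <- sum(tab[2, ])              # row 2 total
  k <- sum(tab[, 1])              # column 1 total
  x <- max(0, k - n):min(k, m)    # support of the (1,1) cell
  logp <- dhyper(x, m, n, k, log = TRUE)
  # keep tables no more likely than the observed one (with a small
  # relative tolerance, as fisher.test() uses):
  keep <- logp <= dhyper(a, m, n, k, log = TRUE) + log(1 + 1e-7)
  lmax <- max(logp[keep])         # log-sum-exp keeps everything finite
  lmax + log(sum(exp(logp[keep] - lmax)))
}
log_fisher_2x2(rbind(c(1, 10), c(10, 1000)))   # natural log of the p-value
```

For tables where the p-value does not underflow, exp() of this agrees with fisher.test()$p.value.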
Re: [R] rbind with different columns
Nice! I was going to recommend merge(merge(myList[[1]], myList[[2]], all=TRUE, sort=FALSE), myList[[3]], all=TRUE, sort=FALSE) but rbind.fill is better. -Ista On Tue, Oct 20, 2009 at 10:05 AM, Karl Ove Hufthammer k...@huftis.org wrote: In article 4addc1d0.2040...@yahoo.de, niederlein-rs...@yahoo.de says... In every list entry is a data.frame but the columns are only partially the same. If I have exactly the same columns, I could do the following command to combine my data: do.call(rbind, myList) but of course it does not work with differnt column names. Is there any easy way to retrieve a combined table like this: You're in luck. 'rbind.fill' in the 'plyr' package does exactly what you want. -- Karl Ove Hufthammer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2x2 Contingency table with much sampling zeroes
Hi, I'm analyzing experimental results where two different events (T1 and T2) can occur or not during an experiment. I made my experiments with one factor (Substrate) with two levels (Sand and Clay). I would like to know whether or not Substrate affects the occurrence probability of the two events. Moreover, for each condition I would like to test the heterogeneity of my experimental contingency table against a theoretical one (from simulations). However, my problem is that several cells have sampling zeroes. My experiments can't be done again to fill these cells. Thus chi-square requirements are not fulfilled and I have to find another statistical method. After spending hours searching for a solution, I thought I could use a loglinear model to answer my questions, but:

- I'm not sure I can use a loglinear model = do I fulfill the required conditions?
- would this method answer my hypothesis?
- I'm not sure I really understand how to use loglin()…

Here is the data frame of my results:

DF <- data.frame(Subs = c(rep("Sand", 4), rep("Clay", 4)),
                 T1 = rep(c("YES", "YES", "NO", "NO"), 2),
                 T2 = rep(c("YES", "NO", "YES", "NO"), 2),
                 Freq = c(12, 5, 0, 7, 24, 1, 0, 0))

What do you think of such data? Can I use any statistical method to test my hypothesis? Any advice? Thanks, Etienne Toffin

--- Etienne Toffin, PhD Student Unit of Social Ecology Université Libre de Bruxelles, CP 231 Boulevard du Triomphe B-1050 Brussels Belgium Tel: +32(0)2/650.55.30 Fax: +32(0)2/650.57.67 http://www.ulb.ac.be/sciences/use/toffin.html

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
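With cells this small, one option that sidesteps the chi-squared large-count requirement entirely is an exact test on a collapsed 2x2 margin. A hedged sketch (this answers only one of the questions — "does Substrate affect T1?" — and is not the loglinear analysis the poster asks about):

```r
# Collapse the poster's data over T2 and apply Fisher's exact test to the
# resulting Substrate x T1 table; exact tests are valid with zero cells.
DF <- data.frame(Subs = c(rep("Sand", 4), rep("Clay", 4)),
                 T1 = rep(c("YES", "YES", "NO", "NO"), 2),
                 T2 = rep(c("YES", "NO", "YES", "NO"), 2),
                 Freq = c(12, 5, 0, 7, 24, 1, 0, 0))
tab <- xtabs(Freq ~ Subs + T1, data = DF)
tab                          # 2x2 table of counts, zeros included
fisher.test(tab)$p.value     # exact two-sided p-value
```

The same collapse with `Freq ~ Subs + T2` tests the T2 question; the comparison against a simulated theoretical table is a separate problem.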
Re: [R] Counting
Hi Ashta, Take a look at ?rle, e.g.

rle(x1)
Run Length Encoding
  lengths: int [1:4] 2 3 1 1
  values : num [1:4] 1 0 1 0
rle(x1)$lengths
[1] 2 3 1 1
rle(x1)$values
[1] 1 0 1 0

HTH, Jorge

On Tue, Oct 20, 2009 at 10:10 AM, Ashta wrote: Hi All, Assume that I have the following data set with two variables and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

x1 x2
 1  1
 1  0
 0  1
 0  1
 0  0
 1  1
 0  1

The output should be:

x1 changed
 0 3   # has changed 3 times
 1 1   # has changed 1 time

x1 unchanged
 0 1   # has stayed unchanged only 1 time
 1 2   # has stayed unchanged 2 times

Can someone help me how to do it in R? Thanks in advance

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Copulas
nobody? emkayenne wrote: Hello! I am currently using R to deal empirically with copulas. I am using financial return data to construct copula models that seem appropriate for my data sets. Is there anyone who has done something similar and who is interested in talking (or writing...) with me about this topic (difficulties, experiences...)? I have extensive experience in this area now, but there are still some things, especially revolving around testing copula models, that are slightly messy. Waiting for your replies..., emkay

-- View this message in context: http://www.nabble.com/Copulas-tp25967876p25974053.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to get slope estimates from a four parameter logistic with SSfpl?
Yes that worked perfectly. Many thanks.

From: Peter Ehlers [ehl...@ucalgary.ca] Sent: 19 October 2009 19:17 To: Weber, Sam Cc: r-help@R-project.org Subject: Re: [R] How to get slope estimates from a four parameter logistic with SSfpl?

Weber, Sam wrote: Hi, I was hoping to get some advice on how to derive estimates of slopes from four parameter logistic models fit with SSfpl. I fit the model using:

model <- nls(temp ~ SSfpl(time, a, b, c, d))
summary(model)

I am interested in the values of the lower and upper asymptotes (parameters a and b), but also in the gradient of the line at the inflection point (c), which I assume tells me my rate of increase when it is steepest (?). However, I cannot work out how to derive a slope estimate. I'm guessing it has something to do with the scaling parameter d, but having searched the internet for hours I have not made any progress, and it is probably quite simple. Any help would be hugely appreciated!

Sam, If I understand you correctly, all you want is the derivative of the fpl wrt the input. You can differentiate the fpl function or you can use the result provided by predict.nls() in the "gradient" attribute. The gradient wrt 'c' is the negative of the slope you want. The maximum slope will occur at time = c. So this should give you the slope:

ypred <- predict(model, newdata=data.frame(Time=coef(model)[3]))
myslope <- -attr(ypred, "gradient")[,3]

-Peter Ehlers

All the best Sam

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to create a legend that automatically reads the values from two vectors?
Hi all!!! How can I create a legend for my plot that automatically reads the values from two vectors?

betav <- c(0.78, 0.94, 0.88, 0.41, 0.59, 4.68)
etav <- c(235.6, 59.5, 31.2, 8.7, 3.2, 1174)

I want my legend to show 6 lines:

beta (in Greek) = 0.78, eta (in Greek) = 235.6
beta (in Greek) = 0.94, eta (in Greek) = 59.5
...
beta (in Greek) = 4.68, eta (in Greek) = 1174

I know how to do it by hand, but I have to write them one by one and this is not efficient. How can I do that? I guess it has to be done with the lapply or rapply functions, but I can't really get it. Many thanks in advance. Best regards, Javier

-- View this message in context: http://www.nabble.com/How-to-create-a-legend-that-automatically-reads-the-values-from-two-vectors--tp25976164p25976164.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
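One way to build the labels programmatically is bquote() inside mapply(), which substitutes each value into a plotmath expression — a sketch (the plot itself is just a placeholder):

```r
# Build one plotmath label per (beta, eta) pair, with the Greek letters
# rendered and the numeric values pasted in via .().
betav <- c(0.78, 0.94, 0.88, 0.41, 0.59, 4.68)
etav  <- c(235.6, 59.5, 31.2, 8.7, 3.2, 1174)
labs <- mapply(function(b, e) bquote(beta == .(b) * "," ~ eta == .(e)),
               betav, etav, SIMPLIFY = FALSE)
plot(1, type = "n")
legend("topright", legend = as.expression(labs),
       lty = 1, col = seq_along(betav))
```

as.expression() turns the list of calls into the expression vector that legend() needs for plotmath rendering.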
[R] LDA Precdict - Seems to be predicting on the Training Data
When I import a simple dataset, run LDA, and then try to use the model to forecast out-of-sample data, I get a forecast for the training set, not the out-of-sample set. Others have posted this question, but I do not see the answers to their posts. Here is some sample data:

Date      Names  v1          v2          v3          c1
1/31/2009 Name1  0.714472361 0.902552278 0.783353694 a
1/31/2009 Name2  0.512158919 0.770451596 0.111853346 a
1/31/2009 Name3  0.470693282 0.129200065 0.800973877 a
1/31/2009 Name4  0.24236898  0.472219638 0.486599763 b
1/31/2009 Name5  0.785619735 0.628511593 0.106868172 b
1/31/2009 Name6  0.718718387 0.697257275 0.690326648 b
1/31/2009 Name7  0.327331186 0.01715109  0.861421706 c
1/31/2009 Name8  0.632011743 0.599040196 0.320741634 c
1/31/2009 Name9  0.302804404 0.475166304 0.907143632 c
1/31/2009 Name10 0.545284813 0.967196462 0.945163717 a
1/31/2009 Name11 0.563720418 0.024862018 0.970685281 a
1/31/2009 Name12 0.357614427 0.417490445 0.415162276 a
1/31/2009 Name13 0.154971203 0.425227967 0.856866993 b
1/31/2009 Name14 0.935080173 0.488659307 0.194967973 a
1/31/2009 Name15 0.363069339 0.334206603 0.639795596 b
1/31/2009 Name16 0.862889297 0.821752532 0.549552875 a

Attached is the code:

myDat <- read.csv(file="f:\\Systematiq\\data\\TestData.csv", header=TRUE, sep=",")
myData <- data.frame(myDat)
length(myDat[,1])
train <- myDat[1:10,]
outOfSample <- myDat[11:16,]
outOfSample <- cbind(outOfSample$v1, outOfSample$v2, outOfSample$v3)
outOfSample <- data.frame(outOfSample)
length(train[,1])
length(outOfSample[,1])
fit <- lda(train$c1 ~ train$v1 + train$v2 + train$v3)
forecast <- predict(fit, outOfSample)$class
length(forecast)  # I am expecting this to be the same as length(outOfSample[,1]), which is 6

Output:

length(forecast)  # I am expecting this to be the same as length(outOfSample[,1]), which is 6
[1] 10

-- View this message in context: http://www.nabble.com/LDA-Precdict---Seems-to-be-predicting-on-the-Training-Data-tp25976178p25976178.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
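The usual cause of this symptom is fitting with `train$`-prefixed terms: the formula then hard-codes the training columns, so predict() cannot map the new data onto the model terms and falls back to the training set. A likely fix, sketched on simulated data (not the poster's file), is to fit with a data argument and keep matching column names in the new data:

```r
# Fit lda() with a data= argument so predict(fit, newdata=...) can supply
# new rows under the same variable names v1, v2, v3.
library(MASS)
set.seed(1)
d <- data.frame(v1 = runif(16), v2 = runif(16), v3 = runif(16),
                c1 = rep(c("a", "b"), 8))
train <- d[1:10, ]
outOfSample <- d[11:16, c("v1", "v2", "v3")]   # names must match the formula
fit <- lda(c1 ~ v1 + v2 + v3, data = train)
forecast <- predict(fit, newdata = outOfSample)$class
length(forecast)   # 6 -- one prediction per out-of-sample row
```

With `lda(train$c1 ~ train$v1 + ...)` the same predict() call silently returns the 10 training-set predictions, which is exactly the `[1] 10` the poster sees.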
[R] Perl error - Can't locate R/Rdconv.pm in @INC
Dear list, I have a package, myPkg, which was developed on R v2.6.1. I have installed R v2.9.2 from source, with a shared library and tcltk support. The package, myPkg, depends on RSPerl and imports it in its NAMESPACE. I downloaded and installed RSPerl using the command:

R CMD INSTALL -c --configure-args=--with-modules='IO \
Fcntl Socket Storable XML::Parser::Expat DB_File File::Glob \
GDBM_File SDBM_File POSIX' RSPerl

When installing myPkg, I use the following command:

R CMD INSTALL myPkg
* Installing to library ‘/usr/local/lib/R/library’
* Installing *source* package ‘myPkg’ ...
** R
** inst
** preparing package for lazy loading
Warning: 'Sys.putenv' is deprecated. Use 'Sys.setenv' instead. See help(Deprecated)
** help
*** installing help indices
Can't locate R/Rdconv.pm in @INC (@INC contains: /usr/local/lib/R/library/RSPerl/perl/i486-linux-gnu-thread-multi /usr/local/lib/R/library/RSPerl/perl /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at /usr/local/lib/R/share/perl/build-help.pl line 21.
BEGIN failed--compilation aborted at /usr/local/lib/R/share/perl/build-help.pl line 21.
ERROR: building help failed for package ‘myPkg’

So it can't find the perl module, Rdconv.pm, on the perl path, to build the help modules. Doing a quick search for this module, it's found at the location below (a location not in the perl path):

find / -name Rdconv.pm
/usr/local/lib/R/share/perl/R/Rdconv.pm

I changed the NAMESPACE file in my myPkg package so as not to import RSPerl, and I altered build-help.pl in R to print out the perl path. Path when RSPerl is not imported:

/usr/local/lib/R/share/perl /usr/local/lib/R/share/perl /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl

(Also, the perl path when installing RSPerl (i.e.
before installing myPkg):

/usr/local/lib/R/share/perl /usr/local/lib/R/share/perl /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl)

Obviously, if I run R CMD INSTALL --no-docs myPkg it installs fine. It also installs with older versions of R. I'm running Ubuntu 9.04. I have already posted (probably erroneously) this help plea on the r-devel mailing list - I apologise for cross-posting. Any ideas what I should look for in the code in order to fix this? Any pointers in the right direction, or what to look for, would be very much appreciated. Thanks in advance! Grainne.

sessionInfo()
R version 2.9.2 (2009-08-24)
i686-pc-linux-gnu
locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base
Re: [R] Putting names on a ggplot
On Sun, Oct 18, 2009 at 10:29 AM, John Kane jrkrid...@yahoo.ca wrote: Thanks Stefan, the annotate approach works beautifully. I had not got that far in Hadley's book, apparently :( I'm not convinced though that the explanation "you shouldn't use aes in this case since nampost, temprange, ... are not part of the dataframe year" makes sense, since it seems to work in this case, unless I am missing something obvious. Mind you, I'm good at missing the obvious, especially in ggplot.

It currently works (because I can't figure out how to make it an error) but you really should not do it.

Hadley
-- http://had.co.nz/
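For readers arriving at this thread: a minimal sketch of the annotate() approach being discussed (the data frame and values here are made up for illustration, not taken from John's plot):

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 5, 3))
p <- ggplot(df, aes(x, y)) + geom_point()

## Values that are NOT columns of `df` go in via annotate(), not aes():
p2 <- p + annotate("text", x = 2, y = 4, label = "a label not stored in df")
```

Putting such constants inside aes() may appear to work, but as Hadley notes above it is relying on unspecified behaviour.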
[R] RGtk2:::gdkColorToString throws an error
Dear all, I try to use RGtk2:::gdkColorToString, but it throws an error:

Error in .RGtkCall("S_gdk_color_to_string", object, PACKAGE = "RGtk2") :
  gdk_color_to_string exists only in Gdk >= 2.12.0

I know what it means, but I don't know how to solve this problem, because I don't know where I can download the referred GDK library. Any information? Thank you.
-- HUANG Ronggui, Wincent Doctoral Candidate Dept of Public and Social Administration City University of Hong Kong Home page: http://asrr.r-forge.r-project.org/rghuang.html
[R] Time Zone names on Windows
Dear R-users, I tried to read recorded data into R, but ran into a problem with time zones. The data were recorded in GMT+1 (without summer time changes), and at first I tried:

data <- read.zoo("data.txt", header=FALSE, sep=",", format="%H:%M:%S %d.%m.%Y", strip.white=TRUE, tz="GMT+1", skip=3)

However, it failed. Then I found the description of Time Zones in the R help and noticed the Olson database is stored in a specific folder under Windows, which is the OS I'm working on. If I understood correctly, there is no file related to a GMT+1 time zone. So far my workaround is to set tz to "Africa/Lagos", a city in Africa sharing the same time zone. However, it's a little bit weird, because the data were not collected in that city or anywhere near it. My question is whether there is another way to set tz to GMT+1, instead of my workaround, so that it has a neutral meaning in my script. Regards, Keith
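One neutral-sounding possibility worth trying (assuming the Olson database on the machine ships the Etc/ zones): the POSIX-style names "Etc/GMT-1", "Etc/GMT+5", etc. are fixed offsets with no daylight-saving rules, with the sign reversed relative to everyday usage, so GMT+1 is spelled "Etc/GMT-1":

```r
## "Etc/GMT-1" is a fixed offset one hour EAST of GMT (POSIX sign
## convention is reversed) with no daylight-saving transitions.
x <- as.POSIXct("2009-10-20 12:00:00", tz = "Etc/GMT-1")
format(x, tz = "GMT", usetz = TRUE)  # "2009-10-20 11:00:00 GMT"
```

If "Etc/GMT-1" is absent from the installed zone files this will not help, but where it is present it avoids tying the script to any particular city.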
Re: [R] LDA Precdict - Seems to be predicting on the Training Data
Maybe you're getting strange results because you're not supplying a data object to lda() when you build your fit. When I do it the standard way, predict.lda() uses the new data and produces a result of length 6 as expected:

myDat <- read.csv("clipboard", sep="\t")
fit <- lda(c1 ~ v1 + v2 + v3, data=myDat[1:10,])
predict(fit, myDat[11:16,])
$class
[1] c c c b c a
Levels: a b c
...

-- Tony Plate

BostonR wrote: When I import a simple dataset, run LDA, and then try to use the model to forecast out of sample data, I get a forecast for the training set not the out of sample set. Others have posted this question, but I do not see the answers to their posts. [...]
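A self-contained sketch of the gotcha discussed in this thread (the data below are made up; only the variable names follow the original post). When the formula hard-codes train$..., predict() cannot find those terms in the new data, so it silently falls back on the training rows:

```r
library(MASS)  # for lda()

set.seed(1)
d <- data.frame(v1 = runif(16), v2 = runif(16), v3 = runif(16),
                c1 = factor(rep(c("a", "b"), 8)))
train  <- d[1:10, ]
newdat <- d[11:16, ]

## Formula hard-codes train$...: predict() cannot match these terms as
## columns of newdat, so it re-evaluates them in the calling environment
## and returns one prediction per TRAINING row.
fit.bad <- lda(train$c1 ~ train$v1 + train$v2 + train$v3)
length(predict(fit.bad, newdat)$class)  # 10

## Bare column names plus data=: predict() uses newdat.
fit.ok <- lda(c1 ~ v1 + v2 + v3, data = train)
length(predict(fit.ok, newdat)$class)   # 6
```

This reproduces the "[1] 10" in the original post and the length-6 result in Tony's reply.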
Re: [R] LDA Precdict - Seems to be predicting on the Training Data
This is not an explanation, but it gives you a solution. Instead of using lda() with a formula, do it by giving the variables and the classification factor as arguments, based on your example and data:

outOfSample <- myDat[11:16,]
train <- myDat[1:10,]
outOfSample <- outOfSample[,3:5]
train2 <- train[,3:5]
fit <- lda(train2, train$c1)
forecast <- predict(fit, outOfSample)$class
length(forecast)
[1] 6

It seems that the problem arises when predict.lda works on an lda fit applied to a formula-class object. Hope this helps, Gabriela.

__ Lic. María Gabriela Cendoya Magíster en Biometría Profesor Adjunto Cátedra de Estadística y Diseño Facultad de Ciencias Agrarias Universidad Nacional de Mar del Plata __

----- Original Message ----- From: BostonR dp...@capitaliq.com To: r-help@r-project.org Sent: Tuesday, October 20, 2009 11:31 AM Subject: [R] LDA Precdict - Seems to be predicting on the Training Data

When I import a simple dataset, run LDA, and then try to use the model to forecast out of sample data, I get a forecast for the training set not the out of sample set. Others have posted this question, but I do not see the answers to their posts. [...]
Re: [R] Counting
How about

unch <- aggregate(x2==x1, by = list(x1=x1), FUN = sum)
chgd <- aggregate(x2!=x1, by = list(x1=x1), FUN = sum)

-Peter Ehlers

Ashta wrote: Hi All, Assume that I have the following data set with two variables, and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

x1 x2
1  1
1  0
0  1
0  1
0  0
1  1
0  1

The output should be

x1 changed
0  3   # has changed 3 times
1  1   # has changed 1 time

x1 unchanged
0  1   # has unchanged only 1 time
1  2   # has unchanged 2 times

Can someone help me how to do it in R? Thanks in advance
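Peter's two aggregate() calls can be checked against the poster's expected output with a short, self-contained snippet (data typed in from the post):

```r
x1 <- c(1, 1, 0, 0, 0, 1, 0)
x2 <- c(1, 0, 1, 1, 0, 1, 1)

## Sum the logical comparisons within each level of x1:
chgd <- aggregate(x2 != x1, by = list(x1 = x1), FUN = sum)
unch <- aggregate(x2 == x1, by = list(x1 = x1), FUN = sum)

chgd  # x1 = 0 changed 3 times, x1 = 1 changed once
unch  # x1 = 0 unchanged once,  x1 = 1 unchanged twice
```

This matches the counts the original poster asked for.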
[R] kendall.global
Hi everybody: I need some help with kendall.global. The example in the manual does not seem to work, and it cannot be used with my data; I always get the same error.

library(vegan)
data(mite)
mite.hel <- decostand(mite, "hel")
# Reproduce the results shown in Table 2 of Legendre (2005), a single group
mite.small <- mite.hel[c(4,9,14,22,31,34,45,53,61,69), c(13:15,23)]
kendall.global(mite.small)
Errore in FUN(newX[, i], ...) :
  .Random.seed is not an integer vector but of type 'list'
kendall.post(mite.small, mult="holm")
Errore in sample(R.gr[, j]) :
  .Random.seed is not an integer vector but of type 'list'
# Reproduce the results shown in Tables 3 and 4 of Legendre (2005), 2 groups
group <- c(1,1,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,1,1,1,1,2,1,2,1,1,1,1,1,2,2,2,2,2)
kendall.global(mite.hel, group=group)
Errore in FUN(newX[, i], ...) :
  .Random.seed is not an integer vector but of type 'list'
kendall.post(mite.hel, group=group, mult="holm", nperm=99)
Errore in sample(R.gr[, j]) :
  .Random.seed is not an integer vector but of type 'list'

Thank you very much if you know how to solve it. Rosa.
Re: [R] Time Zone names on Windows
Keith, If you are working within a single time zone, including time zone information with each record does not seem necessary, and you probably are not recording times to the sub-second. The best solution may thus be to use a simpler date/time class that does not include time zone information. You should consider the chron package, and see R News 4/1 for helpful hints. chron makes construction and use of date/time records very easy:

library(chron)
dt <- c("10/20/2009")
tm <- c("11:02:00")
chron(dt, tm)
[1] (10/20/09 11:02:00)

Good luck! Glen

Keith-119 wrote: My question is that if there is other way to set the tz to GMT+1 instead using my workaround to have a neutral meaning in my script?

-- View this message in context: http://www.nabble.com/Time-Zone-names-on-Windows-tp25977196p25977926.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Counting
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Ehlers Sent: Tuesday, October 20, 2009 8:48 AM To: Ashta Cc: R help Subject: Re: [R] Counting

How about
unch <- aggregate(x2==x1, by = list(x1=x1), FUN = sum)
chgd <- aggregate(x2!=x1, by = list(x1=x1), FUN = sum)
-Peter Ehlers

When I hear 'count' I think first of the table() function. E.g.,

d <- data.frame(x1=c(1,1,0,0,0,1,0), x2=c(1,0,1,1,0,1,1))
with(d, table(x1, x1==x2))
x1  FALSE TRUE
  0     3    1
  1     1    2

or

with(d, table(x1, factor(x1==x2, labels=c("Changed","Unchanged"))))
x1  Changed Unchanged
  0       3         1
  1       1         2

or use dimnames<- to change the labels on the table itself.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

Ashta wrote: Hi All, Assume that I have the following data set with two variables and I want count the number of observation with identical values and number of time each factor changed from x1 to x2. [...]
Re: [R] 2x2 Contingency table with much sampling zeroes
On Tue, 20 Oct 2009, Etienne Toffin wrote: Hi, I'm analyzing experimental results where two different events (T1 and T2) can occur or not during an experiment. I made my experiments with one factor (Substrate) with two levels (Sand and Clay). I would like to know whether or not Substrate affects the occurrence probability of the two events.

It is not clear to me what you mean by 'affects the occurrence ...'. This sounds like 'independence of Substrate from the two other variables', which is a 3-degree-of-freedom hypothesis (at least in the example you give). Is that what you are after, or are only some of those contrasts interesting?

Moreover, for each condition I would like to test the heterogeneity of my experimental contingency table against a theoretical one (from simulations).

Do you mean you have some prior values for the counts or proportions? If so, a standard goodness-of-fit test should do. If not, you need to describe the problem in more detail.

However, my problem is that several cells have sampling zeroes. My experiments can't be done again to fill these cells. Thus the chi-square requirements are not fulfilled and I have to find another statistical method.

Sampling zeroes in the cells are not a problem as long as the marginal tables do not have such zeroes. Depending on the hypotheses you want to test, the marginal tables may be OK. 'Substrate' is OK and so is 'T1 by T2', so you can do the 3-degree-of-freedom test implied by those margins.

After spending hours searching for a solution, I thought I could use a loglinear model to answer my questions, but: - I'm not sure I can use a loglinear model - do I fulfill the required conditions?

Have you studied the Agresti reference listed in the help page?? I'll bet it addresses 'the required conditions' - which go to the sampling distribution of the counts.

- would this method answer my hypothesis?
- I'm not sure I really understand how I have to use loglin()…

Run example(loglin) and reread ?loglin. The example is the same setup as you have here (albeit with more degrees of freedom), so you might emulate it.

Here is the data frame of my results:

DF <- data.frame(Subs=c(rep("Sand",4), rep("Clay",4)),
                 T1=rep(c("YES","YES","NO","NO"), 2),
                 T2=rep(c("YES","NO","YES","NO"), 2),
                 Freq=c(12,5,0,7,24,1,0,0))

What do you think of such data? Can I use any statistical method to test my hypothesis? Any advice?

Recruit a statistician to your committee. Questions like these are better hashed out in front of a blackboard than over the internet.

HTH, Chuck

Thanks, Etienne Toffin
--- Etienne Toffin, PhD Student, Unit of Social Ecology, Université Libre de Bruxelles, CP 231, Boulevard du Triomphe, B-1050 Brussels, Belgium. Tel: +32(0)2/650.55.30 Fax: +32(0)2/650.57.67 http://www.ulb.ac.be/sciences/use/toffin.html

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
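The 3-degree-of-freedom test Chuck describes can be sketched directly from the DF in the post. The margin specification below is one reasonable reading of "the test implied by those margins" (Substrate independent of the joint T1/T2 outcome), not necessarily the only analysis Etienne should run:

```r
DF <- data.frame(Subs = c(rep("Sand", 4), rep("Clay", 4)),
                 T1   = rep(c("YES", "YES", "NO", "NO"), 2),
                 T2   = rep(c("YES", "NO", "YES", "NO"), 2),
                 Freq = c(12, 5, 0, 7, 24, 1, 0, 0))
tab <- xtabs(Freq ~ Subs + T1 + T2, data = DF)

## Margins {Subs} and {T1, T2}: Substrate independent of the joint
## (T1, T2) outcome.  2x2x2 table: 8 cells - 1 - (1 + 3) params = 3 d.f.
res <- loglin(tab, margin = list(1, c(2, 3)), print = FALSE)
res$df   # 3
res$lrt  # likelihood-ratio statistic, compare to chi-square on 3 d.f.
```

Note that permutation or exact methods may still be preferable here given how sparse the table is, which is part of why Chuck recommends a statistician on the committee.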
Re: [R] kendall.global
Hi, I seem to be able to run it without any problems:

kendall.global(mite.small)
$Concordance_analysis
              Group.1
W          0.44160305
F          2.37252221
Prob.F     0.04403791
Chi2      15.89770992
Prob.perm      0.0450

attr(,"class")
[1] "kendall.global"

It seems to be .Random.seed that is causing the problem, so maybe you can do this before running any of the commands:

set.seed(1)

HTH, Vik

Rosa Manrique wrote: Hi every body: I need some help with kendall.global. The example in the manual seems not working well, and cannot used with my data, always the same error. [...]

-- View this message in context: http://www.nabble.com/kendall.global-tp25977847p25978086.html Sent from the R help mailing list archive at Nabble.com.
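The error message reads like a corrupt .Random.seed in the workspace, e.g. one restored from an old .RData file. A sketch of Vik's fix, made explicit (the assumption here is that the bad seed lives in the global environment, which is where R keeps it):

```r
## If a saved workspace left a bad .Random.seed behind, discard it and
## re-seed before calling the permutation-test functions:
if (exists(".Random.seed", envir = globalenv()))
  rm(".Random.seed", envir = globalenv())
set.seed(1)

sample(3)  # the generator works again; kendall.global() etc. should too
```

Starting R with --no-restore (so the old workspace is not loaded) would avoid re-importing the corrupt seed in future sessions.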
Re: [R] how to get rid of 2 for-loops and optimize runtime
-----Original Message----- From: Ian Willems [mailto:ian.will...@uz.kuleuven.ac.be] Sent: Tuesday, October 20, 2009 6:46 AM To: William Dunlap; r-help@r-project.org Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime

Hi William, Your program works perfectly, and very fast, for the table I'm using right now (only one match per row). If I want to reuse this code on other tables, it can match more than one row. Is it possible to adapt your code easily, if I have to sum the values of last month from different rows?

You can use aggregate() with one of those keys to sum up the values with a common key value.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

Thank you for your help. Regards, Ian

-----Original Message----- From: William Dunlap [mailto:wdun...@tibco.com] Sent: Monday, 19 October 2009 18:08 To: Ian Willems; r-help@r-project.org Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Willems Sent: Monday, October 19, 2009 6:50 AM To: 'r-help@r-project.org' Subject: [R] how to get rid of 2 for-loops and optimize runtime

Short: get rid of the loops I use and optimize runtime.

Dear all, I want to calculate for each row the amount of the month before. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; nrows of other matrices can easily be more than 10)

Table before:

Year month quarter yearmonth Service ... Amount
2009  9    Q3      092009    A       ... 120
2009  9    Q3      092009    B       ...  80
2009  8    Q3      082009    A       ...  40
2009  7    Q3      072009    A       ...  50

The result I want:

Year month quarter yearmonth Service ... Amount amount_lastmonth
2009  9    Q3      092009    A       ... 120    40
2009  9    Q3      092009    B       ...  80    ...
2009  8    Q3      082009    A       ...  40    50
2009  7    Q3      072009    A       ...  50    ...

The table is not exactly the same but gives a good idea of what I have and what I want. The code I have written (see below) does what I want, but it is very, very slow. It takes 129s for 400 rows.
And the time gets four times higher each time I double the number of rows. I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output see below), though I don't really understand the output. I guess I need code that runs in linear time, and need to get rid of the 2 for-loops. Can someone help me, or tell me what else I can do to optimize my runtime? I use R 2.9.2, Windows XP Service Pack 3. Thank you in advance. Best regards, Willems Ian

* dataset[,5] = month; dataset[,3] = year; dataset[,22] = amount; dataset[,14] = servicetype

[CODE]
# for each row of the matrix check if each row has..
for (j in 1:Number_rows) {
  sum <- 0
  for (i in 1:Number_rows) {
    if (dataset[j,14] == dataset[i,14])        # ..the same service type
    { if (dataset[j,18] == dataset[i,18])      # ..the same department
      { if (dataset[j,5] == 1)                 # if month==1, month ago is 12 and year is -1
        { if (12 == dataset[i,5])
          { if ((dataset[j,3]-1) == dataset[i,3])
            { sum <- sum + dataset[i,22] }
          } }
        else {
          if ((dataset[j,5]-1) == dataset[i,5])  # if month != 1, month ago is month-1
          { if (dataset[j,3] == dataset[i,3])
            { sum <- sum + dataset[i,22] }
          } }

match() is often useful for quickly finding the locations of many items in a vector. It has no special methods for data.frames, so you must combine the columns of interest into a character vector of keys and use match on the key vectors. E.g.,
# your test data in a format that mail readers
# can copy and paste into R:
d <- read.table(header=TRUE, textConnection("
Year month quarter yearmonth Service Amount
2009 9 Q3 092009 A 120
2009 9 Q3 092009 B 80
2009 8 Q3 082009 A 40
2009 7 Q3 072009 A 50
"))
# The key functions
dKey <- function(d) {
  with(d, paste(d$Year, d$month, d$Service, sep=";"))
}
keyThisMonth <- function(d) dKey(d)
keyPrevMonth <- function(d) {
  stopifnot(!is.null(d$Year), !is.null(d$month), !is.null(d$Service))
  isJan <- d$month == 1
  d$Year[isJan] <- d$Year[isJan] - 1
  d$month[isJan] <- 12
  d$month[!isJan] <- d$month[!isJan] - 1
  dKey(d)
}
# Make the new column:
d$AmountPrevMonth <- d$Amount[ match(keyPrevMonth(d), keyThisMonth(d)) ]
# The result
print(d)
Year month quarter yearmonth
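Bill's aggregate() suggestion upthread (for tables where several rows share a key, so last month's amounts must be summed before matching) might look like this sketch. The column names follow the example data; the rest is illustrative:

```r
d <- data.frame(Year    = c(2009, 2009, 2009, 2009),
                month   = c(9, 9, 8, 7),
                Service = c("A", "B", "A", "A"),
                Amount  = c(120, 80, 40, 50))

## Sum Amount within each (Year, month, Service) key ...
key <- with(d, paste(Year, month, Service, sep = ";"))
tot <- aggregate(d$Amount, by = list(key = key), FUN = sum)

## ... then match each row's previous-month key against the totals.
prevkey <- with(d, paste(ifelse(month == 1, Year - 1, Year),
                         ifelse(month == 1, 12, month - 1),
                         Service, sep = ";"))
d$AmountPrevMonth <- tot$x[match(prevkey, tot$key)]
d   # AmountPrevMonth: 40, NA, 50, NA
```

Both the aggregation and the match are vectorized, so this stays close to linear time instead of the quadratic double loop.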
Re: [R] Counting
Nice solution, Bill. -Peter Ehlers

William Dunlap wrote: [...]
[R] plotw5.png
I've shared a document with you: plotw5.png http://docs.google.com/Doc?docid=0AfJ5_yv8GrERZGY3NmhuMnNfMWZtcGhjc2d3hl=eninvite=CPzN4ZsF It's not an attachment -- it's stored online at Google Docs. To open this document, just click the link above.
Re: [R] Problems importing Unix SAS .ssd04 file to R (Win)
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of johannes rara Sent: Tuesday, October 20, 2009 12:26 AM To: r-help@r-project.org Subject: [R] Problems importing Unix SAS .ssd04 file to R (Win)

Hello, I'm trying to import a SAS file made using SAS on Unix. Currently I'm using SAS on Windows, and I'm trying to import that .ssd04 file to R. The file name is testfile.ssd04 and it is located in 'M:\sasuser'. I'm using Windows XP and R 2.9.1. Basically what I'm doing is

## r code ##
library(foreign)
sashome <- "C:/Program Files/SAS Institute/SAS/V8"
folder_for_datafiles <- "M:/sasuser"
read.ssd(folder_for_datafiles, "testfile", sascmd=file.path(sashome, "sas.exe"))
SAS failed.  SAS program at C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file41bb5af1.sas
The log file will be file41bb5af1.log in the current directory
NULL
Warning message:
In read.ssd(folder_for_datafiles, "testfile", sascmd = file.path(sashome, :
  SAS return code was 2
##

This temporary SAS file 'file41bb5af1.sas' looks like this:

## sas code ##
option validvarname = v6;
libname src2rd 'M:/sasuser';
libname rd xport 'C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file6df11649';
proc copy in=src2rd out=rd;
select testfile ;
##

Any ideas what I'm doing wrong?

sessionInfo()
R version 2.9.1 (2009-06-26) i386-pc-mingw32
locale: LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252
attached base packages: [1] graphics grDevices utils datasets grid stats methods base
other attached packages: [1] foreign_0.8-38 gregmisc_2.1.1 gplots_2.7.1 caTools_1.9 bitops_1.0-4.1 gtools_2.6.1 gmodels_2.15.0 gdata_2.6.1
loaded via a namespace (and not attached): [1] MASS_7.2-49

Can you read that dataset just using your Windows SAS v8?

Dan

Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204
[R] descriptive statistics qn
This is day one on R for me, I am trying to figure out how to do simple computations. For example I have a data set with 200 observations. I am trying to compute the mean and variance in r for 1:25 (first 25 observations); 1:50 (first 50 obs) ; 1:100th observation etc. Here is the dataset: Id value 1 2.2338 2 3.13597 3 1.98685 4 4.35593 5 2.43963 6 4.20262 7 3.12131 8 4.79583 9 3.13937 10 3.47266 11 3.79954 12 4.32285 13 3.00058 14 1.26102 15 4.07329 16 3.42037 17 1.99195 18 5.4461 19 3.03448 2 2.63395 21 0.99851 22 4.59577 23 2.77719 24 4.82748 25 4.11373 26 2.38615 27 1.81247 28 0.920195 29 2.78129 30 3.10531 31 3.60793 32 1.66597 33 4.80673 34 4.80266 35 5.35791 36 2.44573 37 2.63854 38 4.13737 39 1.68496 40 3.86962 41 1.60028 42 4.35621 43 3.60847 44 2.38884 45 1.03786 46 5.48904 47 1.95086 48 2.75401 49 2.25081 50 3.46435 51 0.309484 52 2.49405 53 4.52522 54 1.17947 55 2.50136 56 1.74537 57 5.48072 58 1.81081 59 2.67793 60 4.40603 61 3.21767 62 2.68124 63 1.9723 64 1.25973 65 2.64177 66 3.28414 67 3.2394 68 4.07508 69 2.32812 70 3.42763 71 2.40944 72 2.43776 73 0.072419 74 6.29773 75 2.29704 76 6.07705 77 1.59146 78 4.74327 79 5.29088 80 4.17874 81 1.80639 82 3.70503 83 3.13294 84 5.44017 85 3.50002 86 2.1762 87 2.27356 88 4.45687 89 1.99272 90 3.74362 91 3.89474 92 0.93481 93 4.99464 94 1.02872 95 2.63587 96 3.85226 97 4.82513 98 1.1171 99 1.09368 100 3.91599 101 3.91832 102 1.81166 103 1.48144 104 5.24373 105 3.15134 106 3.03343 107 3.373 108 6.10409 109 4.84878 110 1.95195 111 2.25949 112 1.45546 113 3.60767 114 3.10228 115 3.22694 116 0.861855 117 3.93392 118 2.81212 119 1.62434 120 2.41149 121 4.42936 122 2.4765 123 0.979153 124 3.3482 125 1.20345 126 1.56576 127 5.1418 128 3.5585 129 3.21124 130 3.52901 131 2.38191 132 3.6129 133 3.56378 134 2.70181 135 0.355836 136 3.10734 137 6.22159 138 4.7172 139 2.88295 140 2.87566 141 2.01538 142 2.78835 143 2.26519 144 1.38217 145 2.63885 146 1.0584 147 1.93247 148 3.10316 149 2.24714 150 5.08326 151 2.21789 152 
4.63046 153 3.90057 154 2.19171 155 5.46094 156 1.84918 157 1.04059 158 4.57513 159 5.06956 160 3.65212 161 1.14078 162 5.89406 163 2.86261 164 1.87261 165 4.47285 166 3.13892 167 1.59146 168 1.07166 169 2.82925 170 2.83742 171 5.13528 172 3.20376 173 1.56253 174 3.08726 175 3.52339 176 3.66837 177 2.70302 178 2.00471 179 3.6823 180 2.90728 181 2.39302 182 2.54367 183 3.98498 184 4.17124 185 1.99922 186 2.61908 187 2.24057 188 3.29836 189 2.59912 190 4.49458 191 1.44657 192 2.99004 193 5.15769 194 2.64342 195 1.81755 196 5.1977 197 4.10043 198 4.63281 199 3.20347 200 2.08211
Re: [R] Counting
Hi Bill and all,

On Tue, Oct 20, 2009 at 12:09 PM, William Dunlap wdun...@tibco.com wrote:

 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Ehlers Sent: Tuesday, October 20, 2009 8:48 AM To: Ashta Cc: R help Subject: Re: [R] Counting

 How about

 unch <- aggregate(x2 == x1, by = list(x1 = x1), FUN = sum)
 chgd <- aggregate(x2 != x1, by = list(x1 = x1), FUN = sum)

 -Peter Ehlers

 When I hear 'count' I think first of the table() function. E.g.,

 d <- data.frame(x1 = c(1,1,0,0,0,1,0), x2 = c(1,0,1,1,0,1,1))
 with(d, table(x1, x1 == x2))
 x1  FALSE TRUE
  0      3    1
  1      1    2

 or

 with(d, table(x1, factor(x1 == x2, labels = c("Changed", "Unchanged"))))
 x1  Changed Unchanged
  0        3         1
  1        1         2

 or use dimnames<- to change the labels on the table itself.

This works very well for numeric. How about if the factors are character, such as "F" and "M" (male and female)?

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

 Ashta wrote: Hi All, Assume that I have the following data set with two variables, and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

 x1 x2
  1  1
  1  0
  0  1
  0  1
  0  0
  1  1
  0  1

 The output should be

 x1 changed
  0  3   # has changed 3 times
  1  1   # has changed 1 time

 x1 unchanged
  0  1   # has remained unchanged only 1 time
  1  2   # has remained unchanged 2 times

 Can someone help me how to do it in R? Thanks in advance
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Hello Caveman! I am able to append records using other applications. My credentials have the authority to append. I am able to query to select records, and even delete records from the remote DB2 table using RODBC. I'm just stuck when it comes to APPEND. The DB2 database is on an AIX server, and I am on a Windows XP laptop.

Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com
Re: [R] descriptive statistics qn
foo is the data frame you provided:

mean(foo[1:25, 2])
var(foo[1:25, 2])

Indexing key: foo[rows, cols]

hth
Stephen Sefick

On Tue, Oct 20, 2009 at 11:04 AM, Sibusiso Moyo sm...@vt.edu wrote: This is day one on R for me, I am trying to figure out how to do simple computations. For example I have a data set with 200 observations. I am trying to compute the mean and variance in r for 1:25 (first 25 observations); 1:50 (first 50 obs); 1:100th observation etc.

--
Stephen Sefick
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
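[Editor's note] A hedged extension of the answer above: the same indexing idea computes the mean and variance for several cutoffs in one pass. `foo` stands in for the posted data frame (columns Id, value); it is simulated here so the sketch is self-contained.

```r
# 'foo' is a stand-in for the posted 200-row data frame (Id, value).
set.seed(123)
foo <- data.frame(Id = 1:200, value = rnorm(200, mean = 3, sd = 1.3))

# Mean and variance of the first 25, 50, 100, and 200 observations.
cutoffs <- c(25, 50, 100, 200)
stats <- t(sapply(cutoffs, function(n) {
  x <- foo[1:n, 2]                      # foo[rows, cols]
  c(n = n, mean = mean(x), var = var(x))
}))
stats
```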
Re: [R] underflow of fisher.test result
On Tue, Oct 20, 2009 at 9:14 AM, Ted Harding ted.hard...@manchester.ac.uk wrote:

 On 20-Oct-09 13:34:49, Peng Yu wrote:
  fisher.test() gives a very small p-value, which underflows on my machine. However, the log of it should not underflow. I'm wondering if there is a way to get the log() of a small p-value. An approximation is acceptable in this case. Thank you!
  fisher.test(rbind(c(1,10),c(10,1000)))$p.value
  [1] 0

 I have not attempted an exact approach (though may do so later), but the P-value in question is about 1e-15000, so (using log to base e) log(P) approx = -33000. In such a case, P=0 is a pretty good approximation! Which prompts the question: why the interest in having the value of such a very small number?

This is just an example showing that the result underflows. You don't have to take it too literally.
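[Editor's note] One way to get the log of such a p-value, sketched under an assumption: for a *one-sided* Fisher test the p-value is a hypergeometric tail, which phyper() can return on the log scale (a two-sided p-value would need explicit summation of point probabilities). The 2x2 table below is hypothetical, chosen so the tail underflows on the ordinary scale.

```r
# Hypothetical extreme 2x2 table (not the poster's):
m  <- matrix(c(1000, 10, 10, 1000), nrow = 2)
x  <- m[1, 1]       # observed count in cell (1,1)
wh <- sum(m[, 1])   # first column total  ("white balls")
bl <- sum(m[, 2])   # second column total ("black balls")
k  <- sum(m[1, ])   # first row total     (number drawn)

# One-sided upper-tail probability P(X >= x), on the log scale:
log_p <- phyper(x - 1, wh, bl, k, lower.tail = FALSE, log.p = TRUE)
log_p               # finite, even though exp(log_p) underflows to 0
```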
Re: [R] descriptive statistics qn
Type
?mean
?sd
at the R prompt. For your specific goal, type mean(r[1:25]) at the prompt, and proceed analogously for sd. Just download one of the very many beginner's manuals or introductory course materials that you can get for free on the internet, and get Ricci's R-refcard.

Daniel

- cuncta stricte discussurus -

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sibusiso Moyo Sent: Tuesday, October 20, 2009 12:05 PM To: r-help@r-project.org Subject: [R] descriptive statistics qn

This is day one on R for me, I am trying to figure out how to do simple computations. For example I have a data set with 200 observations. I am trying to compute the mean and variance in r for 1:25 (first 25 observations); 1:50 (first 50 obs); 1:100th observation etc.
Re: [R] Problem using the source-function within R-functions
The problem probably lies in the source-ing part: look at
getwd()
setwd()
HTH, Giovanni

Date: Tue, 20 Oct 2009 13:00:02 +0200 From: Johan Lassen jle...@gmail.com

Dear R community, You may have the solution to how to construct a function using the function source() to build the function; i.e.

myfunction <- function(...) {
  source('file1.r')
  source('file2.r')
}

After compiling and installing myfunction in R, calling myfunction gives an error because the content of 'file1.r' and 'file2.r' seems to be missing. Anyone has the trick to overcome this problem? Thanks in advance! best wishes, Johan

PS: My function is:

run_accumm_value <- function(ind_noder_0, ind_loc_val, ind_retention, downstream) {
  ## Preprocessing of looping calculations:
  koersel_uden_ret <- length(unique(ind_noder_0$oplid))
  opsaml_b_0_2 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_1 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_2 <- seq(1:koersel_uden_ret)
  ## Preprocessing of topology and local values to be summed:
  source('preproces_topology.r', local = TRUE)
  source('preproces_loc_val.r', local = TRUE)
  # Loop for each grouping factor (column in ind_noder_0: oplid):
  for (j in 1:koersel_uden_ret) {
    source('matrix_0.r', local = TRUE)
    source('matrix.r', local = TRUE)
    source('local_value.r', local = TRUE)
    source('fordeling.r', local = TRUE)
    source('fordeling_manuel.r', local = TRUE)
    source('local_ret.r', local = TRUE)
    source('Ax=b.r', local = TRUE)
    source('opsamling_x_0_acc.r', local = TRUE)
  }
  source('opsamling_b_1.r', local = TRUE)
  opsaml_b_2
}

--
Johan Lassen
Environment Center Nykøbing F
Denmark

--
Giovanni Petris gpet...@uark.edu
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
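[Editor's note] A minimal sketch of Giovanni's hint: source() resolves bare file names against the current working directory, which is generally not where an installed package's files live, so a packaged function must build an explicit path instead. `script_dir` and the file names are hypothetical.

```r
# Source a set of scripts from an explicit directory into this
# function's own frame, instead of relying on getwd().
myfunction <- function(script_dir, files = c("file1.r", "file2.r")) {
  for (f in files) {
    path <- file.path(script_dir, f)
    if (!file.exists(path)) stop("cannot find ", path)
    source(path, local = TRUE)   # evaluate inside this function's frame
  }
  environment()                  # frame now holds the sourced objects
}

# For files shipped inside an installed package 'mypackage',
# system.file() locates them reliably:
# source(system.file("scripts", "file1.r", package = "mypackage"),
#        local = TRUE)
```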
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Hi Felipe, Thanks for your message. Your approach works for an Access database. I am trying to insert records into a table in a DB2 database on a remote AIX server. My userid has the authority to insert records, and I am able to do that using another application (DB2 Client command line). Sincerely, Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com
[R] Systemfit package
Dear Arne Henningsen, I am writing to you because I have a question regarding the systemfit package. I hope you can answer my request. I estimated a system of equations by using the SUR method. The function summary(xx) gives me a summary of the estimated equation system. However, this function does not give me the value of the Durbin-Watson statistic for each of my equations (to check for serial correlation). So my question is: is there any function in the systemfit package that returns the value of the Durbin-Watson statistic, or should I write my own program? Thank you in advance
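[Editor's note] In case systemfit has no built-in helper, the Durbin-Watson statistic is simple to compute by hand from each equation's residuals: DW = sum(diff(e)^2) / sum(e^2). Whether residuals() on a systemfit object returns one column of residuals per equation is an assumption to verify locally.

```r
# Durbin-Watson statistic from a residual vector.
dw <- function(e) sum(diff(e)^2) / sum(e^2)

# Hypothetical use with a two-equation SUR fit:
# library(systemfit)
# fit <- systemfit(list(eq1 = y1 ~ x1, eq2 = y2 ~ x2),
#                  method = "SUR", data = mydata)
# sapply(residuals(fit), dw)   # one DW value per equation

# Sanity check: for serially uncorrelated residuals DW should be near 2.
set.seed(42)
e <- rnorm(500)
dw(e)
```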
Re: [R] AFT-model with time-dependent covariates
Goran, You had stated that: I have only coded the log likelihood function, and I use 'optim' (in R code) and 'BFGS', without gradient, and I ask for a numerically differentiated Hessian, which I use for calculating standard errors and p-values. Tests show that this works surprisingly well, but for huge data sets it is very slow. If it was possible to ask for a hessian in the C version of BFGS, things would improve a lot. Here is a suggestion that could potentially work well for large data sets. There is a function called `bobyqa' in a package called minqa that is currently on R-forge (soon to be released; Kate Mullen is the author). http://r-forge.r-project.org/R/?group_id=395 This does not use derivatives, and might be faster on large-data sets. You could then use the numDeriv package to calculate the hessian of the log-likelihood at the converged parameter value. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Göran Broström Sent: Tuesday, October 20, 2009 10:07 AM To: spencerg Cc: r-help@r-project.org; Terry Therneau Subject: Re: [R] AFT-model with time-dependent covariates Sorry for being late in responding to this thread, but I was made aware of it only two weeks ago. In my package 'eha' there is a function 'aftreg', which performs what is asked for, given that the time-varying covariates are step functions of time and that the observation period for each individual is an interval. Left truncation works if it can be assumed that the covariate values during the first non-observable interval are the same as at the beginning of the first interval under observation. 
As Terry wrote, there is a lot of book-keeping in writing down the likelihood function, and even more for scores and the hessian, so I adopted a lazy way in the programming: I have only coded the log likelihood function, and I use 'optim' (in R code) and 'BFGS', without gradient, and I ask for a numerically differentiated Hessian, which I use for calculating standard errors and p-values. Tests show that this works surprisingly well, but for huge data sets it is very slow. If it was possible to ask for a hessian in the C version of BFGS, things would improve a lot. Also note that you need the latest version (1.2-12) of eha for this to work. aftreg in earlier versions only works (correctly) for time-constant covariates. And, this is not well tested, so care is needed. And I appreciate performance and bug reports, of course.

Göran

On Wed, May 13, 2009 at 9:34 PM, spencerg spencer.gra...@prodsyse.com wrote: To see what's available in other packages, try the following:

library(RSiteSearch)
AFT <- RSiteSearch.function('AFT model')
summary(AFT)  # 24 help files found in 8 different packages
HTML(AFT)     # opens a table with 24 rows in a web browser

There may be nothing here that will help you, but this provides a quick overview of what's available. If this doesn't find what you want, it either has not been contributed or its help page does not use the phrase "AFT model". Hope this helps. Spencer Graves

Terry Therneau wrote: The coding for an AFT model with time-dependent covariates will be very hard, and I don't know of anyone who has done it. (But I don't keep watch of other survival packages, so something might be there). In a Cox model, a subject's risk depends only on the current value of his/her covariates; in an AFT model the risk depends on the entire covariate history. (My 'accelerated age' is the sum of all the extra years I have ever gained). Coding this is not theoretically complex, but would be a pain-in-the-rear amount of bookkeeping.

Terry Therneau

--
Göran Broström
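[Editor's note] A toy sketch of the recipe Göran and Ravi describe: minimize a coded negative log-likelihood without gradients, then use a numerically differentiated Hessian for standard errors. A simple normal likelihood stands in for the (much more involved) AFT likelihood; all names here are illustrative.

```r
# Negative log-likelihood for N(mu, sigma), parameterized with
# log(sigma) so the optimizer works on an unconstrained scale.
negll <- function(par, x) {
  mu    <- par[1]
  sigma <- exp(par[2])
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

set.seed(7)
x <- rnorm(200, mean = 5, sd = 2)

# Derivative-free minimization (Nelder-Mead), with a numerically
# differentiated Hessian at the optimum:
fit <- optim(c(0, 0), negll, x = x, hessian = TRUE)
se  <- sqrt(diag(solve(fit$hessian)))  # SEs from inverse observed information
fit$par   # estimates of mu and log(sigma)
se
# numDeriv::hessian(negll, fit$par, x = x) gives a more accurate Hessian
# if the numDeriv package is available.
```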
Re: [R] Interpretation of VarCorr results
On Tue, Oct 20, 2009 at 5:09 AM, r-quantide r...@quantide.com wrote: Dear all, I'm working with a model to estimate components of variance by using the lme() function. The model is the following:

model <- lme(fixed = X ~ 1 + as.factor(station), data = myData,
             na.action = na.exclude, random = ~ N + 1 | spliceOrHoming)

Where X is the measured response variable, station is treated as a fixed-effects factor, N is a continuous variable, and spliceOrHoming is an (ordered) factor. The idea is that the response variable (X) varies almost linearly with N (in a decreasing fashion), but the slope and intercept of this variation change with the levels of the spliceOrHoming random factor. Now, the REML-estimated standard deviation components are:

            StdDev        Corr
(Intercept) 5.274544e-01  (Intr)
N           7.912717e-05  -0.58
Residual    1.508647e-01

My questions are:
. Is the model correctly specified?

This depends on the questions you are trying to approach with the model...

. If yes, how should I read the standard deviation estimate for N (7.912717e-05)? Is this the standard deviation for a unitary variation of N? Or is it the standard deviation due to the global contribution of the N independent variable? (note that I have a strong feeling that N contributes a lot to the final variance, so the value I obtain here seems too small relative to my beliefs)

Generally if you have a random slope you would also include that covariate in the fixed component, producing an estimate of the population (or marginal) slope in the fixed effects output, and an estimate of the variation in slopes around that population slope among the levels of the grouping variable (here spliceOrHoming) in the random effects output.

. Is there any method/function to obtain the variance components for the station factor too? (please give me some references, at least.)

With the proper design (i.e. enough data in the appropriate levels) you might specify something like:

random = ~ N + station - 1 | spliceOrHoming

to get estimates of the between-group variation for each of the levels of the station factor, as well as for the N slope.

hth, Kingsford Jones

Thank you a lot in advance for any suggestions you'll give me. Enrico
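[Editor's note] A hypothetical illustration of Kingsford's advice on simulated data: put N in both the fixed and the random parts, so lme() reports a population slope for N plus its between-group standard deviation. The names mimic the poster's (X, N, spliceOrHoming); the station factor is omitted for brevity, and the data are invented.

```r
library(nlme)  # ships with R as a recommended package

# Simulate 10 groups, each with its own intercept and N-slope.
set.seed(1)
g  <- factor(rep(1:10, each = 30))              # plays spliceOrHoming
b0 <- rnorm(10, mean =  1, sd = 0.3)[as.integer(g)]
b1 <- rnorm(10, mean = -2, sd = 0.5)[as.integer(g)]
dat <- data.frame(N = runif(300), spliceOrHoming = g)
dat$X <- b0 + b1 * dat$N + rnorm(300, sd = 0.1)

# N appears in the fixed part (population slope) AND the random part
# (between-group variation in slopes):
m <- lme(fixed = X ~ N, random = ~ N | spliceOrHoming, data = dat)
VarCorr(m)   # SD of intercepts, SD of the N slope, residual SD
fixef(m)     # population intercept and slope
```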
Re: [R] Problems importing Unix SAS .ssd04 file to R (Win)
Thanks Daniel for your response. Actually I haven't tried that yet, because I'm not so fluent with SAS. I got this file from a colleague. I was thinking that R can read these files, but maybe the problem is with SAS. -Johannes

 Can you read that dataset just using your Windows SAS v8?
 Dan
 Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Now it becomes strange. One thing I note from this generated query:

INSERT INTO STORAGE.TEST_APPEND2 ( MACRONAME, MACROUSER, MACRO_RT ) VALUES ( 's_ej_mach_config_vz', 'jones2', 5 )

The names of the variables are in double quotes; that is a problem. Can you try to run this query in another application and see what you get.

Caveman

On Tue, Oct 20, 2009 at 7:05 PM, Elaine Jones jon...@us.ibm.com wrote: Hi Felipe, Thanks for your message. Your approach works for an Access database. I am trying to insert records into a table in a DB2 database on a remote AIX server. My userid has the authority to insert records, and I am able to do that using another application (DB2 Client command line). Sincerely, Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com
Re: [R] Interpretation of VarCorr results
On Tue, Oct 20, 2009 at 5:09 AM, r-quantide r...@quantide.com wrote: [snip] . Is there any method/function to obtain the variance components for the station factor too? (please give me some references, at least.)

Pinheiro and Bates (2000) is (practically) a prerequisite for intelligent use of the nlme package...

Thank you a lot in advance for any suggestions you'll give me. Enrico
Re: [R] Counting
Ashta wrote:
 Hi Bill and all,
 On Tue, Oct 20, 2009 at 12:09 PM, William Dunlap wdun...@tibco.com wrote:
  When I hear 'count' I think first of the table() function. E.g.,
  d <- data.frame(x1 = c(1,1,0,0,0,1,0), x2 = c(1,0,1,1,0,1,1))
  with(d, table(x1, x1 == x2))
  x1  FALSE TRUE
   0      3    1
   1      1    2
  or
  with(d, table(x1, factor(x1 == x2, labels = c("Changed", "Unchanged"))))
  x1  Changed Unchanged
   0        3         1
   1        1         2
  or use dimnames<- to change the labels on the table itself.
 This works very well for numeric. How about if the factors are character, such as "F" and "M" (male and female)?

Did you try it? Works fine for me.

-Peter Ehlers
Re: [R] RODBC sqlSave does not append the records to a DB2 table
The query generated by R looks OK to me. I went ahead and pasted it into my DB2 client command editor, and I got the confirmation message:
INSERT INTO STORAGE.TEST_APPEND2 ( MACRONAME, MACROUSER, MACRO_RT ) VALUES ( 's_ej_mach_config_vz', 'jones2', 5 )
DB2I The SQL command completed successfully.
And I confirmed the record was added to the table. Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] New Award Announcement ASA Stat Comp/Graph Sections: The Statistical Computing and Graphics Award
The Statistical Computing and Graphics Award The ASA Sections of Statistical Computing and Statistical Graphics have established the Statistical Computing and Graphics Award to recognize an individual or team for innovation in computing, software, or graphics that has had a great impact on statistical practice or research. Typically, awards are granted bi-annually. The prize carries with it a cash award of $5,000 plus an allowance of up to $1,000 for travel to the annual Joint Statistical Meetings (JSM), where the award will be presented. Qualifications The prize-winning contribution will have had significant and lasting impact on statistical computing, software or graphics. The Awards Committee depends on the American Statistical Association membership to submit nominations. Committee members will review the nominations and make the final determination of who, if anyone, should receive the award. The award may not be given to a sitting member of the Awards Committee or a sitting member of the Executive Committee of the Section of Statistical Computing or the Section of Statistical Graphics. Nomination and Award Dates Nominations are due by December 15 of the award year. The award is presented at the Joint Statistical Meetings in August of the same year. The first award will be given in 2010, and subsequent awards are to be made at most bi-annually at the discretion of the Awards Committee. Nominations should be submitted as a complete packet, consisting of: * nomination letter, no longer than four pages, addressing points in the selection criteria * nominee's curriculum vitae(s) * maximum of four supporting letters, each no longer than two pages Selection Process The Awards Committee will consist of the Chairs and Past Chairs of the Sections on Statistical Computing and Statistical Graphics. The selection process will be handled by the Awards Chair of the Statistical Computing Section. Nominations and questions are to be sent to the e-mail address below. 
Fei Chen Avaya Labs 233 Mt Airy Rd Basking Ridge, NJ 07920 f...@avaya.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ScatterPlot
I am trying to make a scatterplot from a file containing three columns. I know how to plot the two numeric columns, but the third column consists of M or F (male or female), and I don't know how to separate the data so I can make two separate regression lines on the same graph. meta # name of file plot(meta$mass, meta$rate, xlab="mass", ylab="rate") I can make the regression line with all the data, but I just don't know how to split everything up. abline(lm(rate ~ mass, data=meta), col="red") Thanks, Marsha __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
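One common approach is to colour the points by the grouping column and fit a separate lm() per group. A hedged sketch: the column names mass and rate follow the post, but the file name meta.csv and the column name sex are assumptions:

```r
# Sketch: separate regression lines per group on one scatterplot.
# Assumes meta has columns mass, rate, and a grouping column 'sex' (M/F).
meta <- read.csv("meta.csv")   # hypothetical file name
plot(meta$mass, meta$rate, xlab = "mass", ylab = "rate",
     col = ifelse(meta$sex == "M", "blue", "red"))
abline(lm(rate ~ mass, data = subset(meta, sex == "M")), col = "blue")
abline(lm(rate ~ mass, data = subset(meta, sex == "F")), col = "red")
legend("topleft", legend = c("M", "F"), col = c("blue", "red"), lty = 1)
```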
Re: [R] New Award Announcement ASA Stat Comp/Graph Sections: The Statistical Computing and Graphics Award
Apologies for a mistake in the announcement. To clarify: for the 2010 Award, the nomination deadline is Dec 15, 2009, not Dec 15, 2010 as the announcement had implied. From: f...@live.com To: r-help@r-project.org Subject: New Award Announcement ASA Stat Comp/Graph Sections: The Statistical Computing and Graphics Award Date: Tue, 20 Oct 2009 15:03:43 -0400 [snip] Fei Chen Avaya Labs 233 Mt Airy Rd Basking Ridge, NJ 07920 f...@avaya.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Dummy variables or factors?
Dear R-people, I am analyzing epidemiological data with a GLMM using the lmer package. I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their midpoints. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0/1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are desirable) in R! Instead, one should make use of factor variables. They are much easier to work with than dummy variables, and the model itself will create the necessary dummy variables. Having said that, if my data violate the linearity assumption, does the use of a factor for the variable in question help overcome the lack of linearity? Thanks in advance, Luciano __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fixed: Re: Re: [stats-rosuda-devel] how to install JGR manually?
To follow up: I believe I had a version mismatch between one of the packages and R. I made sure to install the latest versions of all of them, at which point jgr.exe launched just fine. Thanks to Simon and Liviu for their comments. Just one note: the Windows launcher does not provide any info if it can't reach the 'net (as opposed to listing missing packages, as suggested previously). Carl Oct 16, 2009 10:31:48 AM, simon.urba...@r-project.org wrote: === On Oct 16, 2009, at 9:14 , Liviu Andronic wrote: (cc'ing JGR specific list) Yes, thanks, stats-rosuda-devel is where this post should go. Carl, On 10/15/09, Carl Witthoft wrote: Here's the problem: on Windows, the 'jgr.exe' tool starts up by checking for a connection to the 'net in order to grab the support packages. Well, we have machines at work that are not and never will be connected to the Internet. I tried manually installing all the packages (JGR, rJava, etc.), but jgr.exe still tries to find a connection. What this means is that you have either installed them in the wrong library or you have not installed them all. The launcher actually tells you exactly which packages are missing, so it should be easy to install them manually. If in doubt, remove your preferences file .JGRprefsrc (in your home). If you're still stuck, please tell us exactly your setup (where you installed R, the packages etc.). Also try to run JGR with --debug and it will create a file C:\JGRdebug.txt with additional information that may be helpful. Cheers, Simon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glm.fit to use LAPACK instead of LINPACK
Hi, I understand that glm.fit calls LINPACK Fortran routines instead of LAPACK because LINPACK can handle the rank-deficiency problem. If my data matrix is not rank deficient, would a glm.fit function which runs on LAPACK be faster? Would it be worthwhile to convert glm.fit to use LAPACK? Has anyone done this already? What is the best way to do this? I'm looking at very large datasets (thousands of glm calls), and would like to know if it's worth the effort for performance reasons. Thanks, Ted - Ted Chiang Bioinformatics Analyst Centre for Computational Biology Hospital for Sick Children, Toronto 416.813.7028 tchi...@sickkids.ca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
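Before rewriting glm.fit, it may be worth timing the two QR paths directly on data of the relevant size: base R's qr() accepts a LAPACK argument, so the underlying decompositions can be compared without touching glm.fit at all. A rough sketch (the matrix dimensions are arbitrary):

```r
# Rough timing comparison of the LAPACK vs LINPACK QR paths in base R.
set.seed(1)
x <- matrix(rnorm(2e4 * 20), ncol = 20)
y <- rnorm(2e4)
system.time(for (i in 1:20) qr.coef(qr(x, LAPACK = TRUE),  y))  # LAPACK
system.time(for (i in 1:20) qr.coef(qr(x, LAPACK = FALSE), y))  # LINPACK
```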
Re: [R] Systemfit package
Hi Axel, On Tue, Oct 20, 2009 at 7:09 PM, Axel Leroix axel.ler...@yahoo.fr wrote: I estimated a system of equations by the SUR method. The function summary(xx) gives me a summary of the estimated equation system. However, it does not give me the value of the Durbin-Watson statistic for each of my equations (to check for serial correlation). Thus, my question is: is there any function in the systemfit package which returns the value of the Durbin-Watson statistic, or should I write my own program? The systemfit package does not provide any tools for the Durbin-Watson test statistic (yet). So, you are invited to write this tool; you could either add it via the SVN repo on R-Forge or send it to me by email :-) /Arne -- Arne Henningsen http://www.arne-henningsen.name __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
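Until such a tool exists, the statistic is easy to compute by hand from each equation's residuals, since DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). A hedged sketch: fitsur stands in for the fitted systemfit object, and it assumes residuals() on that object returns one column per equation (check your version's documentation):

```r
# Durbin-Watson statistic from a vector of residuals.
dw <- function(e) sum(diff(e)^2) / sum(e^2)

# 'fitsur' is assumed: the object returned by systemfit(..., method = "SUR").
res <- residuals(fitsur)   # assumed: data frame with one column per equation
sapply(res, dw)            # one DW value per equation
```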
[R] TukeyHSD no longer working with aov output?
I can prove I've done this before, but I recently installed RExcel (and it was easiest to reinstall R and some other bits to make it work), and now it's no longer working. Before, I would do an ANOVA and a Tukey post-hoc like this: data1.aov = aov(result ~ factor1 * factor2, data = data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice Tukey table of all the pairwise comparisons. Now, though, it gives me: Error in UseMethod("TukeyHSD") : no applicable method for 'TukeyHSD' Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random Forest - partial dependence plot
Thanks Andy, I'm sorry, I didn't make myself clear. I was talking about the y-axis, so your explanation was very helpful. On Tue, Oct 20, 2009 at 9:39 AM, Liaw, Andy andy_l...@merck.com wrote: Are you talking about the y-axis or the x-axis? If you're talking about the y-axis, that range isn't really very meaningful. The partial dependence function basically gives you the average trend of that variable (integrating out all others in the model). It's the shape of that trend that is important. You may interpret the relative range of these plots from different predictor variables, but not the absolute range. Hope that helps. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Carlos M. Zambrana-Torrelio Sent: Monday, October 19, 2009 3:47 PM To: r-help@r-project.org Subject: [R] Random Forest - partial dependence plot Hi everybody, I used random forest regression to explain the patterns of species richness with a bunch of climate variables (e.g. temperature, precipitation, etc.). All are continuous variables. My results are really interesting, and my model explained 96.7% of the variance. Now I am trying to take advantage of the variable importance function and depict the observed patterns using partial dependence plots. However, I found a really strange (at least for me...) behavior: the species number ranges between 1 and 150, but when I make the partial plot, the graphic only represents values between 43 and 50!! I use the following code to get the partial plot: partialPlot(ampric.rf, amp.data, Temp) where ampric.rf is the random forest object, amp.data are the data, and Temp is the variable I am interested in. How can I have a partial plot covering all species numbers (from 1 to 150)? Also, I read the RF documentation, which I found really interesting, and I was wondering what the meaning of the 'marginal effect of a variable' is. Thanks for your help, Carlos -- Carlos M. 
Zambrana-Torrelio Department of Biology University of Puerto Rico - RP PO BOX 23360 San Juan, PR 00931-3360 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
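Andy's description (the average trend of the variable, with all others integrated out) can be reproduced by hand, which also explains the narrow 43-50 range: the plotted value is an average of predictions, not the raw response. A hedged sketch using the object names from Carlos's post (ampric.rf and amp.data are assumed from the thread and not defined here):

```r
# Manual partial dependence of the response on Temp.
# ampric.rf (a randomForest fit) and amp.data are assumed from the thread.
library(randomForest)
grid <- seq(min(amp.data$Temp), max(amp.data$Temp), length.out = 25)
pd <- sapply(grid, function(v) {
  tmp <- amp.data
  tmp$Temp <- v                  # hold Temp fixed for every observation
  mean(predict(ampric.rf, tmp))  # average out all the other predictors
})
plot(grid, pd, type = "l", xlab = "Temp", ylab = "partial dependence")
```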
Re: [R] TukeyHSD no longer working with aov output?
Hi Clayton, I don't think you need summary(). TukeyHSD(data1.aov) should work. -Ista On Tue, Oct 20, 2009 at 4:17 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I can prove I've done this before, but I recently installed Rexcel (and it was easiest to reinstall R and some other bits to make it work) and now its no longer working. Before I would do an ANOVA and a tukey post-hoc like this: data1.aov=aov(result~factor1*factor2, data=data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice tukey table of all the pairwise comparisons, now though it gives me: in UseMethod(TukeyHSD) : no applicable method for TukeyHSD Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted Logistic Regressions using svyglm
On Mon, 19 Oct 2009, Fulton wrote: I'm running some logistic regressions and I've been trying to include weights in the equation. However, when I run the model, I get this warning: Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! I think it is because the weights are non-integer values. Yes. What is a good way to run logistic regressions in R when using non-integer weights? You should use family=quasibinomial(). For svyglm() this is identical to binomial() except that it doesn't warn about non-integer weights. I'll update the documentation for svyglm() to make this explicit. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
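A minimal sketch of Thomas's suggestion; the design object, data frame, and variable names below are invented for illustration:

```r
# Weighted logistic regression with non-integer weights via the survey package.
library(survey)
des <- svydesign(ids = ~1, weights = ~wt, data = mydata)  # 'mydata', 'wt' assumed
fit <- svyglm(y ~ x1 + x2, design = des, family = quasibinomial())
summary(fit)
```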
[R] Transparent Bands in R
Hello All, My question is regarding the attached plot. I would like to have multiple transparent green bands running the length (y-axis) of the plot, the width of which is determined by the green lines at y=0 in the plot. Can you suggest a way to do it? For those who can't or are unwilling to download the file, the plot is at http://www.twitpic.com/ma8w0 Thanks! http://www.nabble.com/file/p25983169/foo.ps foo.ps -- View this message in context: http://www.nabble.com/Transparent-Bands-in-R-tp25983169p25983169.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] editors for R
Hello all, After reviewing the IDE/Script Editors article at sciviews.org, I wanted to pose a quick question here to see if anyone can offer an opinion or commentary about GUI editors that can be installed in a Windoze environment that allow editing/saving of remote .R files and running R programs from within a shell that is housed in the editor (although R itself is installed in a Linux environment). Windoze would essentially be providing a GUI-based tool to edit, save, and execute without making the user copy files back and forth and switch between various programs to execute their routines. Thus far, BlueFish seems to come closest to this; but other recommendations will be most appreciated. Best, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] editors for R
ESS does everything you requested. ess.r-project.org ess-remote is the specific feature that allows you to run a program on another computer from within a buffer on the local computer. Rich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
I've tried that as well, and I get an error like:
TukeyHSD(phenolic.aov)
Error in rep.int(n, length(means)) : unimplemented type 'NULL' in 'rep'
In addition: Warning messages:
1: In replications(paste("~", xx), data = mf) : non-factors ignored: time
2: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time
3: In replications(paste("~", xx), data = mf) : non-factors ignored: time, cat
4: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time, cat
However, I think it's a problem with my data, something about how RExcel imported it, because when I do it on the example data in the HSAUR package, it works. I'll try importing the data manually. On Tue, Oct 20, 2009 at 3:33 PM, Ista Zahn istaz...@gmail.com wrote: Hi Clayton, I don't think you need summary(). TukeyHSD(data1.aov) should work. -Ista On Tue, Oct 20, 2009 at 4:17 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I can prove I've done this before, but I recently installed RExcel (and it was easiest to reinstall R and some other bits to make it work) and now it's no longer working. Before I would do an ANOVA and a Tukey post-hoc like this: data1.aov = aov(result ~ factor1 * factor2, data = data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice Tukey table of all the pairwise comparisons; now, though, it gives me: Error in UseMethod("TukeyHSD") : no applicable method for 'TukeyHSD' Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transparent Bands in R
Hi, Maybe use polygon() ? Good luck, Megha. 2009/10/20 mnstn pavan.n...@gmail.com: Hello All, My question is regarding the attached plot. I would like to have multiple transparent green bands running the length (yaxis) of the plot the width of which is determined by the green lines at y=0 in the plot. Can you suggest a way to do it? For those who can't or are unwilling to download the file the plot is at http://www.twitpic.com/ma8w0 Thanks! http://www.nabble.com/file/p25983169/foo.ps foo.ps -- View this message in context: http://www.nabble.com/Transparent-Bands-in-R-tp25983169p25983169.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
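For vertical bands specifically, rect() with a semi-transparent fill can be even simpler than polygon(). A sketch with made-up band positions; note that semi-transparency needs a graphics device that supports it (e.g. pdf() or png()), not the postscript() device the original foo.ps was made with:

```r
# Semi-transparent vertical bands over an existing plot.
plot(1:100, rnorm(100), type = "l")
usr <- par("usr")                       # plot region: xmin, xmax, ymin, ymax
bands <- rbind(c(20, 30), c(55, 70))    # hypothetical xleft/xright pairs
rect(bands[, 1], usr[3], bands[, 2], usr[4],
     col = rgb(0, 1, 0, alpha = 0.2), border = NA)
```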
Re: [R] Dummy variables or factors?
On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote: Dear R-people, I am analyzing epidemiological data with a GLMM using the lmer package. I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their midpoints. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0/1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are desirable) in R! Instead, one should make use of factor variables. They are much easier to work with than dummy variables, and the model itself will create the necessary dummy variables. Having said that, if my data violate the linearity assumption, does the use of a factor for the variable in question help overcome the lack of linearity? No. If done by dividing into small numbers of categories after looking at the data, it merely creates other (and probably more severe) problems. If you are in the unusual (although desirable) position of having a large number of events across the range of the covariates in your data, you may be able to cut your variable into quintiles or deciles and analyze the resulting factor, but the preferred approach would be to fit a regression spline of sufficient complexity. -- David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
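A sketch of the regression-spline approach David recommends, using a natural spline from the splines package; the variable and data names are invented, and the same ns() term can go into an lmer() formula:

```r
# Logistic regression with a natural spline instead of a categorised covariate.
library(splines)
fit_lin <- glm(outcome ~ age,             family = binomial, data = mydata)
fit_spl <- glm(outcome ~ ns(age, df = 3), family = binomial, data = mydata)
anova(fit_lin, fit_spl, test = "Chisq")   # test for departure from linearity
```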
Re: [R] TukeyHSD no longer working with aov output?
Problem solved! Lesson learned (I think): TukeyHSD doesn't like it when you use numbers as names for the factors, i.e. factor time cannot be 24 and 48, but twenty-four and forty-eight work fine. On Tue, Oct 20, 2009 at 5:36 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I've tried that as well, and I get an error like:
TukeyHSD(phenolic.aov)
Error in rep.int(n, length(means)) : unimplemented type 'NULL' in 'rep'
In addition: Warning messages:
1: In replications(paste("~", xx), data = mf) : non-factors ignored: time
2: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time
3: In replications(paste("~", xx), data = mf) : non-factors ignored: time, cat
4: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time, cat
However, I think it's a problem with my data, something about how RExcel imported it, because when I do it on the example data in the HSAUR package, it works. I'll try importing the data manually. On Tue, Oct 20, 2009 at 3:33 PM, Ista Zahn istaz...@gmail.com wrote: Hi Clayton, I don't think you need summary(). TukeyHSD(data1.aov) should work. -Ista On Tue, Oct 20, 2009 at 4:17 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I can prove I've done this before, but I recently installed RExcel (and it was easiest to reinstall R and some other bits to make it work) and now it's no longer working. Before I would do an ANOVA and a Tukey post-hoc like this: data1.aov = aov(result ~ factor1 * factor2, data = data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice Tukey table of all the pairwise comparisons; now, though, it gives me: Error in UseMethod("TukeyHSD") : no applicable method for 'TukeyHSD' Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. 
Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
I think its a problem with my data, something about how Rexcel imported it We don't have enough information to be sure. My guess is that your data in Excel are integers which are intended to be levels of a factor. Excel doesn't distinguish between integers and integers that might be factor levels. This is an Excel issue, not an RExcel issue. When you sent your data to R outside of RExcel, what did you do to force the levels to be interpreted as factors? Something similar must be done when you send the data to R from within RExcel. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
interesting to know, that might explain it. When I imported I just saved as CSV and imported with read.csv, I've never done anything to specify the integers as factors (this is the first time I've used numbers as names). Here are the precise commands I use(d): phen=read.csv(file=phenolics.csv) phen.aov=aov(phenolics~sink*time*cat, data=phen) summary(phen.aov) Df Sum Sq Mean Sq F valuePr(F) sink2 1817.79 908.89 59.5934 2.2e-16 *** time1 19.07 19.07 1.2501 0.265039 cat 1 125.95 125.95 8.2581 0.004548 ** sink:time 2 12.026.01 0.3942 0.674834 sink:cat2 305.68 152.84 10.0213 7.493e-05 *** time:cat1 106.53 106.53 6.9849 0.008949 ** sink:time:cat 24.222.11 0.1384 0.870867 Residuals 179 2730.03 15.25 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 1 observation deleted due to missingness TukeyHSD(phen.aov) Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = phenolics ~ sink * time * cat, data = phen) $sink diff lwr upr p adj Cut-Bolted -0.3491591 -1.980764 1.282446 0.8686316 Veg-Bolted 6.3800257 4.741959 8.018092 0.000 Veg-Cut 6.7291848 5.091118 8.367252 0.000 $time ...etc. I can see how the data format of the cell in Excel might confuse R if you imported it as an .xls file, or used Rexcel, thats avoided by using .csv. Whats interesting is that the aov function has no problem determining the levels of the factor be they integers or text, TukeyHSD doesn't like it though. When I did the summary(aov) with the data formatted such that it would not go into TukeyHSD it was imported with read.csv, but it had numbers/integers as the factor levels. TukeyHSD didn't work. When I reformatted those numbers to text (ie 24 to twenty-four), TukeyHSD worked fine. I assume it has something to do with how aov and TukeyHSD determine factor levels, aov may see anything in the specified factor-vectors as characters, whereas TukeyHSD will read integers as integers and get confused. On Tue, Oct 20, 2009 at 5:50 PM, Richard M. 
Heiberger r...@temple.edu wrote:

I think it's a problem with my data, something about how RExcel imported it

We don't have enough information to be sure. My guess is that your data in Excel are integers which are intended to be levels of a factor. Excel doesn't distinguish between integers and integers that might be factor levels. This is an Excel issue, not an RExcel issue. When you sent your data to R outside of RExcel, what did you do to force the levels to be interpreted as factors? Something similar must be done when you send the data to R from within RExcel.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
Now I am worried that you have a wrong analysis. The aov function is perfectly happy using either factors or numeric variables. Are there really only two levels of time, which is what one degree of freedom for time suggests? Or are there more than two levels, but since aov() sees time as a numeric variable it has only one df? csv files can interfere with the interpretation of data values, but not likely in this case: they too will have numbers interpreted as numeric.
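[Editor's note] Richard's point can be checked directly. A small sketch with simulated data (three time points are used here, rather than the poster's two, so that the df difference is actually visible):

```r
## With a numeric predictor, aov() fits a single slope (1 df); with a
## factor, it fits one mean per level (levels - 1 df). TukeyHSD() needs
## the factor version.
set.seed(1)
d <- data.frame(time = rep(c(24, 48, 72), each = 10), y = rnorm(30))

fit_num <- aov(y ~ time, data = d)           # time treated as numeric
fit_fac <- aov(y ~ factor(time), data = d)   # time treated as factor

anova(fit_num)["time", "Df"]           # 1 df: a single slope
anova(fit_fac)["factor(time)", "Df"]   # 2 df: three group means

TukeyHSD(fit_fac)   # works; TukeyHSD(fit_num) fails ("no factors in the fitted model")
```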
[R] List of Windows time zones?
Hello, I would like to adapt the following code:

    data$datetime3 <- format(data$datetime2, tz = "EST")

for different time zones, for example North America's CST and MDT (but those codes don't work). I have read ?Sys.timezone but I'm afraid it isn't very helpful. I'm just looking for a list of the time zones, not a history of how time zones have been dealt with throughout R's history. Previously asked messages (and replies) on R-help provide confusing and sometimes contradictory advice. Is there a simple list of time zones for R on Windows? Or can they be calculated on the fly (e.g., "GMT+6" does not work in the above code either)? Any help would be much appreciated as I am getting a bit frustrated, thanks! Mark Na
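[Editor's note] A sketch of one answer, assuming a current R (>= 3.1), where the bundled Olson/IANA time zone database is used on Windows as well and OlsonNames() lists the valid names:

```r
## List the available Olson/IANA time zone names:
tzs <- OlsonNames()
head(grep("^America/", tzs, value = TRUE))

## North American zones are addressed by location rather than by
## abbreviation, so CST/CDT is "America/Chicago" and MST/MDT is
## "America/Denver":
x <- as.POSIXct("2009-10-20 12:00:00", tz = "UTC")
format(x, tz = "America/Chicago")
format(x, tz = "America/Denver")

## Fixed offsets use the POSIX-style, sign-reversed "Etc/GMT" names:
format(x, tz = "Etc/GMT+6")   # UTC-6, i.e. what "GMT+6" was meant to do
```

Note the sign reversal in the "Etc/GMT" convention: "Etc/GMT+6" is six hours *behind* UTC.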
Re: [R] ScatterPlot
Here's an example:

    x <- runif(100)
    sex <- sample(0:1, 100, replace = TRUE)
    y <- sex + 2*x + x*sex + rnorm(100)
    dfr <- data.frame(y, x, sex)
    f1 <- lm(y ~ x, data = dfr, subset = sex == 0)
    f2 <- lm(y ~ x, data = dfr, subset = sex == 1)
    plot(x, y, pch = sex + 1)
    abline(f1, col = 2, lty = 2)
    abline(f2, col = 3, lty = 3)
    legend('topleft', c('male', 'female'), lty = 2:3, col = 2:3)

For some fancier plots see the ggplot2 examples here: http://had.co.nz/ggplot2/stat_smooth.html

hth, Kingsford Jones

On Tue, Oct 20, 2009 at 1:20 PM, Marsha Melnyk mmel...@stevens.edu wrote:

I am trying to make a scatterplot containing three columns. I know how to plot two of the columns, but the third column consists of M or F (male or female) and I don't know how to separate the data so I can make two separate regression lines on the same graph.

    meta  # name of file
    plot(meta$mass, meta$rate, xlab = "mass", ylab = "rate")

I can make the regression line with all the data, but I just don't know how to split everything up.

    abline(lm(rate ~ mass, data = meta), col = "red")

Thanks, Marsha
Re: [R] TukeyHSD no longer working with aov output?
There are two factors of time, but they are evenly replicated across all the other factors/levels. The experiment is perfectly balanced except for one lost sample, which is deleted automatically in the aov. I am very certain the analysis is correct. I think it's merely a discrepancy between how aov and TukeyHSD group the levels of the factor variable. Probably aov handles any vector specified as a character vector, whereas TukeyHSD merely assumes it's a character vector. It might be something to refine in the stats package, so the functions, which are designed somewhat to be used together, can work on the same kinds of data.

On Tue, Oct 20, 2009 at 6:13 PM, Richard M. Heiberger r...@temple.edu wrote: [earlier message quoted in full; snipped]
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Well, I do not know what could be happening. I tried to append records to a table on DB2 (Community Edition 9.5 on a Linux machine), and it works fine. The only odd thing (maybe because of my low skills on DB2) is that I could neither create nor append to a table on a schema different from dbuser (my account was actually dbuser). My R version is 2.7.1 under another Linux PC and RODBC is 1.2-3. Caveman

On Tue, Oct 20, 2009 at 8:45 PM, Elaine Jones jon...@us.ibm.com wrote:

The query generated by R looks OK to me. I went ahead and pasted it into my DB2 client Command Editor, and I got the confirmation message:

    INSERT INTO STORAGE.TEST_APPEND2 ( MACRONAME, MACROUSER, MACRO_RT )
    VALUES ( 's_ej_mach_config_vz', 'jones2', 5 )

    DB20000I  The SQL command completed successfully.

And I confirmed the record was added to the table.

Elaine McGovern Jones
ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering
Phone: 408 705-9588  Internal tieline: 587-9588
jon...@us.ibm.com
Re: [R] Dummy variables or factors?
The following is *significantly* easier to do than trying to add in dummy variables, although the dummy-variable approach is going to give you exactly the same answer as the factor method, but possibly with a different baseline. Basically, you might want to search the lm help and possibly consult a stats book for information about how the design matrix is constructed in both cases.

    xF <- factor(1:10)
    N <- 1000
    xFs <- sample(x = xF, N, replace = TRUE)
    yFs <- rnorm(N, mean = as.numeric(xFs))
    lm(yFs ~ xFs)

    Call:
    lm(formula = yFs ~ xFs)

    Coefficients:
    (Intercept)    xFs2    xFs3    xFs4    xFs5    xFs6    xFs7    xFs8
         0.7845  1.1620  2.1474  3.1391  4.2183  5.2621  6.0814  7.4170
           xFs9   xFs10
         8.2193  9.2987

    lm(yFs ~ diag(10)[, 1:9][xFs, ])

    Call:
    lm(formula = yFs ~ diag(10)[, 1:9][xFs, ])

    Coefficients:
                (Intercept)  diag(10)[, 1:9][xFs, ]1  diag(10)[, 1:9][xFs, ]2  diag(10)[, 1:9][xFs, ]3
                     10.083                   -9.299                   -8.137                   -7.151
    diag(10)[, 1:9][xFs, ]4  diag(10)[, 1:9][xFs, ]5  diag(10)[, 1:9][xFs, ]6  diag(10)[, 1:9][xFs, ]7
                     -6.160                   -5.080                   -4.037                   -3.217
    diag(10)[, 1:9][xFs, ]8  diag(10)[, 1:9][xFs, ]9
                     -1.882                   -1.079

On Oct 21, 9:44 am, David Winsemius dwinsem...@comcast.net wrote:

On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote:

Dear R-people, I am analyzing epidemiological data using GLMM, using the lmer package. I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their midpoints. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0/1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are they desirable) in R! Instead, one should make use of factor variables. That is much easier to work with than dummy variables, and the model itself will create the necessary dummy variables.
Having said that, if my data violate the linearity assumption, does the use of a factor for the variable in question help overcome the lack of linearity?

No. If done by dividing into small numbers of categories after looking at the data, it merely creates other (and probably more severe) problems. If you are in the unusual (although desirable) position of having a large number of events across the range of the covariates in your data, you may be able to cut your variable into quintiles or deciles and analyze the resulting factor, but the preferred approach would be to fit a regression spline of sufficient complexity.

Thanks in advance.

-- David Winsemius, MD
Heritage Laboratories
West Hartford, CT
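[Editor's note] One can see directly that R's factor handling *is* dummy-variable coding by inspecting the design matrix. A minimal sketch:

```r
## model.matrix() shows the dummy (treatment-contrast) coding R generates
## for a factor: one 0/1 indicator column per non-baseline level.
x  <- factor(c("a", "b", "c", "a"))
mm <- model.matrix(~ x)
colnames(mm)   # "(Intercept)" "xb" "xc" -- level "a" is the baseline
mm[, "xb"]     # indicator for level "b": 0 1 0 0
mm[, "xc"]     # indicator for level "c": 0 0 1 0
```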
Re: [R] Dummy variables or factors?
Oh dear, that doesn't look right at all. I shall have a think about what I did wrong, and maybe follow my own advice and consult the doco myself!

On Oct 21, 2:45 pm, andrew andrewjohnro...@gmail.com wrote: [earlier message quoted in full; snipped]
Re: [R] Dummy variables or factors?
Sorry for this third posting - the second method is the same as the first after all: the coefficients of the first linear model *are* a linear transformation of the coefficients of the second. Just got confused with the pasting, 'tis all.

On Oct 21, 2:51 pm, andrew andrewjohnro...@gmail.com wrote: [earlier messages quoted in full; snipped]
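[Editor's note] Andrew's conclusion is easy to verify: the two parameterisations span the same column space, so the fitted models are identical even though the coefficients differ. A quick sketch re-using his simulation:

```r
set.seed(42)
xF  <- factor(1:10)
N   <- 1000
xFs <- sample(xF, N, replace = TRUE)
yFs <- rnorm(N, mean = as.numeric(xFs))

fit_factor <- lm(yFs ~ xFs)                     # baseline = level 1
fit_dummy  <- lm(yFs ~ diag(10)[, 1:9][xFs, ])  # baseline = level 10

## Different coefficients, same model: the fitted values agree exactly.
all.equal(unname(fitted(fit_factor)), unname(fitted(fit_dummy)))   # TRUE
```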
I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their mid points. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0,1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are desireble) in R! Instead, one should make use of factors variable. That is much easier to work with than dummy variables and the model itself will create the necessary dummy variables. Having said that, if my data violates the linearity assumption, does the use of a factors for the variable in question helps overcome the lack of linearity? No. If done by dividing into samall numbers of categories after looking at the data, it merely creates other (and probably more severe) problems. If you are in the unusal (although desirable) position of having a large number of events across the range of the covariates in your data, you may be able to cut your variable into quintiles or deciles and analyze the resulting factor, but the preferred approach would be to fit a regression spline of sufficient complexity. Thanks in advance. -- David Winsemius, MD Heritage Laboratories West Hartford, CT __ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
I think that Richard may be correct. In my experience it's always a bad idea to use numerical labels for factors, especially when importing data from Excel. But why not do a str() on your data to see whether R thinks that time is a factor or not? And if not, why not convert time to a factor before passing it to aov()? I'm a bit uneasy over your use of the phrase "there are two factors of time"; that's not how most stats people would use the terminology, and it may indicate a misunderstanding. You probably mean that there are two levels of the factor 'time'. Anyway, do str(your_data) and see what it says.

-Peter Ehlers

Clayton Coffman wrote: [earlier messages quoted in full; snipped]
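[Editor's note] To make Peter's suggestion concrete, a sketch using a made-up stand-in for the poster's data (variable names assumed from the thread):

```r
## Check how R is storing 'time', and convert it to a factor if needed,
## before calling aov()/TukeyHSD():
set.seed(1)
phen <- data.frame(
  phenolics = rnorm(12),
  time      = rep(c(24, 48), each = 6),         # numeric labels: suspect
  sink      = rep(c("Bolted", "Cut", "Veg"), 4)
)
str(phen)                 # 'time' shows up as num, not Factor

phen$time <- factor(phen$time)
str(phen)                 # now a Factor with 2 levels: "24", "48"

fit <- aov(phenolics ~ sink * time, data = phen)
TukeyHSD(fit, "time")     # works once 'time' really is a factor
```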