[R] Bubble chart
I have a matrix of p-values for each explanatory variable. Each row is an area
of the response variable and each column is an explanatory variable, e.g.:

   PSA   pval_DOY  pval_PDSIconcurrent  pval_PDSIantecedent_annual_average  pval_TMAXanomaly  pval_FM100anomaly
1  NC06  0.96747495  0.6092668          0.53353019                          0.9301115         0.99801334
2  NC04  0.04699659  0.2759152          0.07024752                          0.6046828         0.03224094
3  NC01  0.71437394  0.9979173          0.85296024                          0.9977558         0.99833623
4  NC08  0.67315904  0.9970511          0.51756714                          0.7809994         0.99626038
5  NC07  0.55221280  0.5784208          0.43975219                          0.3669491         0.34898877
6  NC05  0.52089881  0.7191645          0.91972153                          0.4487460         0.94922430

I want to create a visual display such that instead of numbers for the
p-values, I have a circle sized to represent the size of the p-value.
symbols() is what I've found to do this, but it isn't clear to me how to
format the input for the function. Any suggestions or help? Even a link would
be helpful if someone else has asked the question and it's been answered.
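One way to feed symbols() -- a minimal sketch with made-up data rather than
the matrix above: put the areas on the y-axis, the variables on the x-axis,
and pass one circle radius per cell. The sqrt() makes circle area, rather than
radius, proportional to the p-value.

## runif() stands in for the real p-value matrix
pv <- matrix(runif(30), nrow = 6,
             dimnames = list(paste("NC0", 1:6, sep = ""),
                             c("DOY", "PDSIcon", "PDSIant", "TMAX", "FM100")))
xy <- expand.grid(x = seq_len(ncol(pv)), y = seq_len(nrow(pv)))
## one circle per cell; t(pv) puts the values in the same order as xy
symbols(xy$x, xy$y, circles = sqrt(as.vector(t(pv))) / 2.5,
        inches = FALSE, xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(1, at = seq_len(ncol(pv)), labels = colnames(pv))
axis(2, at = seq_len(nrow(pv)), labels = rownames(pv), las = 1)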
Re: [R] Automatically adjust axis scales
On 03/22/2012 05:42 AM, Alaios wrote:
> Would it be possible to change the axis at the end? My data sets seem to be
> quite large so I was thinking for the plot and the consequent lines to keep
> always the current minimum and maximum like
>
> plot(x)
> minimum=min(x)
> lines(x1)
> minimum=c(minimum,x1)
> lines(x2)
> minimum=c(minimum,x2)
>
> then if there is a good way for altering axis lines I could put the limits
> I want to.

Hi Alex,
One way or another you are going to have to set the limits in your first call
to "plot". You could do the plot first and collect the limits along the way,
then do it again, but I can't see that this has any advantage over calculating
the limits for the entire set of data that is to be plotted at the beginning
and then doing the plots. If you want to get the limits for each data set
separately, maybe:

range1<-range(x1)
range2<-range(x2)
range3<-range(x3)
ylim<-range(c(range1,range2,range3))

However, I suspect that this is of more use in helping you understand what is
happening than actually getting it done.

Jim
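A compact version of Jim's suggestion, as a sketch with made-up series: keep
the data sets in a list, take the range over all of them once, then plot.

xs <- list(x1 = rnorm(100), x2 = rnorm(100, 2), x3 = rnorm(100, -2))
ylim <- range(unlist(lapply(xs, range)))   # limits over every series at once
plot(xs[[1]], type = "l", ylim = ylim)
for (x in xs[-1]) lines(x)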
[R] Any package recommended for time series graphics in R?
Hello, I would like to produce a few time series graphs with R. My data are
usually quarterly or monthly. I sometimes need to have two y-axes, one at the
left, the other at the right. Is there any package producing nice-looking
time series graphs?

Thanks, miao
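For base graphics, a second y-axis is usually done with par(new=TRUE) and
axis(4); a minimal sketch with made-up monthly data (packages such as zoo
also plot multiple time series conveniently):

tm <- seq(as.Date("2010-01-01"), by = "month", length.out = 36)
y1 <- cumsum(rnorm(36))
y2 <- 100 + cumsum(rnorm(36, sd = 5))
par(mar = c(5, 4, 4, 4) + 0.1)          # leave room for the right-hand axis
plot(tm, y1, type = "l", xlab = "", ylab = "series 1")
par(new = TRUE)                          # overlay a second coordinate system
plot(tm, y2, type = "l", col = "red", axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = "red")                # right-hand axis for series 2
mtext("series 2", side = 4, line = 3, col = "red")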
[R] calling java from R and using java time series double precision array
I haven't had time to try using R for over a year, but have a colleague who
wants to. We work with time series, and the current version of our
calendar-time subroutines in Java converts both directions between linear
time and calendar time. We have used calendar time since 1965, starting out
then with Fortran. Calendar time can be CnYrMoDa | CnYrMoDaHr | CnYrMoDaHrMn |
CnYrMoDaHrMnSc | CnYrMoDaHrMnScD | CnYrMoDaHrMnScDC | CnYrMoDaHrMnScDCM |
CnYrMoDaHrMnScDCMQ. We use calendar time in string format and linear time
numbers in double precision. Thus we can get to calendar quadiseconds (tenths
of milliseconds), and we must always have thoroughly tested subroutines. The
unit of linear time can be converted to: da | hr | mn | sc | ds | cs | ms | qs.

Limiting this discussion to reading time series data into R, I believe I want
to have R call a Java method that reads a time series into a two-dimensional
array, with the first column being the linear time and the other columns
being the various time series data (functions) at those times. I am not sure
that the double precision Java array can be used by R. And I would appreciate
any other comments about this kind of interface with Java.
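rJava is the usual bridge here, and a Java double maps onto an R numeric,
which is also a 64-bit double, so no precision is lost crossing the boundary.
A hedged sketch -- the jar, class, and method names below are hypothetical
stand-ins for your own reader:

library(rJava)
.jinit(classpath = "ourTimeSeries.jar")      # hypothetical jar
reader <- .jnew("TsReader")                  # hypothetical Java class
## a method with Java signature: double[][] readSeries(String file)
rows <- .jcall(reader, "[[D", "readSeries", "data.ts")
## the double[][] comes back as a list of row references; unwrap and bind
m <- do.call(rbind, lapply(rows, .jevalArray))  # column 1 = linear time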
[R] read wiff extension files into R
Does anyone know how to read or import a wiff (AB SCIEX windows interchange
file format) extension into R? Thanks a lot.
Re: [R] sqrt(-x) vs. -x^0.5
Mike Williamson writes:
> Thanks Sarah, All,
>
> I guess I never thought of a negative sign as an "operation", but
> knowing that it is considered an operation explains everything nicely.
> Somewhere in its underbelly, I suppose -4 is represented as "0 - 4"?

Not exactly. Here is the list of constituent elements of the expressions
for -4 and 0 - 4:

> as.list(quote(-4))
[[1]]
`-`

[[2]]
[1] 4

> as.list(quote(0-4))
[[1]]
`-`

[[2]]
[1] 0

[[3]]
[1] 4

The first case is unary, the second is binary.

Exercise: See ?Syntax. Write the result of this line:

lapply( as.list( quote(-1-4) ), as.list )

and (thereby) explain why the result of -1-4 isn't 3.

HTH,
Chuck

[rest deleted]

--
Charles C. Berry                Dept of Family/Preventive Medicine
cberry at ucsd edu              UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/   La Jolla, San Diego 92093-0901
[R] Summary values from Glm function (rms package)
Dear fellow R-users,

I'm using the Glm function (gamma family of distributions) from the rms
package to compare 2 groups on cost data. Although the summary function does
provide the mean cost difference and standard errors, I believe these values
are on the (natural) log scale. Is there a function to transform these values
back to the original scale of the response variable (i.e., dollars), such
that I could obtain the mean adjusted cost difference (and the 95% CI)?

Many thanks to everyone in advance!
YH
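One common route back to the dollar scale -- a hedged sketch using base glm()
rather than rms::Glm, with hypothetical variable names (cost, group, age):
fit on the log link, predict every subject under each group, and average the
difference of the response-scale predictions (marginal standardization); a
bootstrap would give the 95% CI.

fit <- glm(cost ~ group + age, family = Gamma(link = "log"), data = d)
p1 <- predict(fit, newdata = transform(d, group = "treated"), type = "response")
p0 <- predict(fit, newdata = transform(d, group = "control"), type = "response")
mean(p1) - mean(p0)   # adjusted mean cost difference, in dollars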
[R] predict () for LDA and GLM
Hi! I'm using GLM, LDA and NaiveBayes for binomial classification. My
training set is 70 rows long with 32 features, and my test set is 30 rows
long with 32 features. Using Naive Bayes, I can train a model, and then
predict the test set with it like so:

ass4q1.dLDA = lda(ass4q1.trainSet[,1]~ass4q1.trainSet[,2:3])
table(predict(ass4q1.dNB, ass4q1.testSetDF[,2:3]), ass4q1.testSetDF[,1])

However, when the same is done for LDA or GLM, suddenly it tells me that the
number of rows doesn't match and doesn't predict my test data. The error for
GLM, as an example, is:

Error in table(predict(ass4q1.dGLM, ass4q1.testSetDF[, 2:3]), ass4q1.testSetDF[, :
  all arguments must have the same length
In addition: Warning message:
'newdata' had 30 rows but variable(s) found have 70 rows

What am I missing?
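The usual cause of that warning is a formula that hard-codes the training
matrix (trainSet[,1] ~ trainSet[,2:3]), so predict() keeps finding the 70
training rows instead of using newdata. A hedged sketch of the standard fix,
reusing the poster's object names: fit against named columns of a data frame
and hand predict() a newdata frame with the same column names.

library(MASS)
train <- data.frame(y = factor(ass4q1.trainSet[, 1]), ass4q1.trainSet[, 2:3])
test  <- data.frame(ass4q1.testSetDF[, 2:3])
names(test) <- names(train)[-1]          # predictor names must match
fit <- lda(y ~ ., data = train)
table(predict(fit, newdata = test)$class, ass4q1.testSetDF[, 1])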
[R] Macro or Loop info/help needed
I'm very new to R having recently made the transition from SPSS and SAS. In a
dataset named t4, I have about 20 variables that are named in a somewhat
chronological order (e.g., q100ayr, q101ayr, q102ayr, q103ayr, etc.). Each
variable contains a 2-digit year designation (e.g., 73, 74, 75, 76, etc.),
which corresponds to the year something occurred (e.g., 73=1973). Each of the
20 variables is similarly designated, but corresponds to a different type of
event/occurrence. I need to reorder the data in this fashion:

# first event
t4$eventX1973 <- 0
t4$eventX1973[t4$q100ayr == 73] <- 1
t4$eventX1974 <- 0
t4$eventX1974[t4$q100ayr == 74] <- 1
t4$eventX1975 <- 0
t4$eventX1975[t4$q100ayr == 75] <- 1
# etc. for years between 73 and 88

# second event
t4$eventZ1973 <- 0
t4$eventZ1973[t4$q101ayr == 73] <- 1
t4$eventZ1974 <- 0
t4$eventZ1974[t4$q101ayr == 74] <- 1
t4$eventZ1975 <- 0
t4$eventZ1975[t4$q101ayr == 75] <- 1
# etc.

The code above will work, but would be many lines long. I'm hoping someone
can give me a quick introduction to what would work best in R. I assume some
type of macro or loop. Thanks in advance.

Jeff
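A double loop reproduces the pattern above in a few lines -- a minimal
sketch, with the source-variable list shortened to two entries for
illustration:

events <- c(eventX = "q100ayr", eventZ = "q101ayr")   # extend to all 20 pairs
for (ev in names(events)) {
  src <- events[[ev]]
  for (yr in 73:88) {
    new <- paste(ev, 1900 + yr, sep = "")             # e.g. "eventX1973"
    t4[[new]] <- as.integer(!is.na(t4[[src]]) & t4[[src]] == yr)
  }
}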
Re: [R] Reshape from long to wide
Thanks a lot. I tried one of the ways you guys showed me and it totally
worked. Just for fun, I tried all the others, and with some modifications
here and there they work fine too. It was time consuming but definitely worth
it as a good learning experience. Thanks again.
[R] To overlay my raster and its boundary
Hi, I want to overlay my raster and its boundary, which is a shapefile. When
I use this code separately, all is ok:

# open raster
Image<-read.table("C:\\Users\\Documents\\Nouveau\\Frequence.txt",sep="",dec=",",header=TRUE)
testo<-rasterFromXYZ(Image)
plot(testo)
testo2 <- aggregate(testo,fact=10, fun=mean)
plot(testo2)

# open shapefile = boundary
library(rgdal)
test1<-readOGR(dsn="C:\\Users\\Documents\\DISC D\\Nouveau",layer="pays")
plot(test1)

But how do I overlay both layers (raster and shapefile) to get a single map?
Thank you in advance.
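The usual base-graphics answer, as a short sketch with the objects above:
draw the raster first, then add the vector layer on top (this assumes the two
layers share the same coordinate reference system).

plot(testo2)             # raster as the base layer
plot(test1, add = TRUE)  # shapefile boundary drawn over it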
Re: [R] sqrt(-x) vs. -x^0.5
On Wed, Mar 21, 2012 at 5:21 PM, Mike Williamson wrote:
> Thanks Sarah, All,
>
> I guess I never thought of a negative sign as an "operation", but
> knowing that it is considered an operation explains everything nicely.
> Somewhere in its underbelly, I suppose -4 is represented as "0 - 4"?

Sheesh! And why do you suppose that?! Read up on floating point arithmetic to
find out how it is done.

-- Bert

> Either way, I'm glad it is consistent & accurate, so that I didn't find
> myself in another pickle like "weak" typing and attempts to use time/date
> classes in 'R' have brought me.
>
> Thanks!
> Mike
>
> On Wed, Mar 21, 2012 at 4:37 PM, Rolf Turner wrote:
>> On 22/03/12 12:17, Sarah Goslee wrote:
>>> It's order of operations, and a good reason to always use
>>> parentheses: which is evaluated first, the unary minus or
>>> the raising-to-powers?
>>>
>>> (-4)^0.5
>>> -(4^0.5)
>>>
>>> sqrt(-4)
>>> -sqrt(4)
>>
>> If the OP *really* wants the square root of -4 he could do
>> sqrt(-4+0i) or (-4+0i)^0.5 (and get 0+2i in either case).
>>
>> cheers,
>> Rolf Turner

--
Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] sqrt(-x) vs. -x^0.5
Thanks Sarah, All,

I guess I never thought of a negative sign as an "operation", but knowing
that it is considered an operation explains everything nicely. Somewhere in
its underbelly, I suppose -4 is represented as "0 - 4"?

Either way, I'm glad it is consistent & accurate, so that I didn't find
myself in another pickle like "weak" typing and attempts to use time/date
classes in 'R' have brought me.

Thanks!
Mike

On Wed, Mar 21, 2012 at 4:37 PM, Rolf Turner wrote:
> On 22/03/12 12:17, Sarah Goslee wrote:
>> It's order of operations, and a good reason to always use
>> parentheses: which is evaluated first, the unary minus or
>> the raising-to-powers?
>>
>> (-4)^0.5
>> -(4^0.5)
>>
>> sqrt(-4)
>> -sqrt(4)
>
> If the OP *really* wants the square root of -4 he could do
> sqrt(-4+0i) or (-4+0i)^0.5 (and get 0+2i in either case).
>
> cheers,
> Rolf Turner
Re: [R] sqrt(-x) vs. -x^0.5
On 22/03/12 12:17, Sarah Goslee wrote:
> It's order of operations, and a good reason to always use
> parentheses: which is evaluated first, the unary minus or
> the raising-to-powers?
>
> (-4)^0.5
> -(4^0.5)
>
> sqrt(-4)
> -sqrt(4)

If the OP *really* wants the square root of -4 he could do sqrt(-4+0i) or
(-4+0i)^0.5 (and get 0+2i in either case).

cheers,
Rolf Turner
Re: [R] sqrt(-x) vs. -x^0.5
It's order of operations, and a good reason to always use parentheses: which
is evaluated first, the unary minus or the raising-to-powers?

(-4)^0.5
-(4^0.5)

sqrt(-4)
-sqrt(4)

Sarah

On Wed, Mar 21, 2012 at 7:09 PM, Mike Williamson wrote:
> Hi Everyone,
>
> I did a search through the archives and did not find an answer,
> although I must admit it is a hard search to do ( ^0.5 is tough to
> explicitly search for ).
>
> I am sure there is some mathematically accurate reason to explain the
> following, but I guess I either never learned it or have since forgotten it.
>
> In 'R', when I type (for instance):
>
> sqrt(-4)
>
> I get
>
> NaN
>
> but if I type in:
>
> -4 ^ 0.5
>
> I get
>
> -2
>
> I presume this is on purpose and correct, but it's the first time I've
> come across any theoretical difference between ^0.5 and sqrt. What is the
> theoretical difference / meaning between these two operations?
>
> Thanks!
> Mike

--
Sarah Goslee
http://www.functionaldiversity.org
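For the record, here is what each of those forms returns -- unary minus has
lower precedence than ^, so -4^0.5 parses as -(4^0.5):

(-4)^0.5   # NaN: an even root of a negative real number
-(4^0.5)   # -2: the square root is taken first, then negated
sqrt(-4)   # NaN, with a "NaNs produced" warning
-sqrt(4)   # -2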
[R] sqrt(-x) vs. -x^0.5
Hi Everyone,

I did a search through the archives and did not find an answer, although I
must admit it is a hard search to do ( ^0.5 is tough to explicitly search
for ).

I am sure there is some mathematically accurate reason to explain the
following, but I guess I either never learned it or have since forgotten it.
In 'R', when I type (for instance):

sqrt(-4)

I get

NaN

but if I type in:

-4 ^ 0.5

I get

-2

I presume this is on purpose and correct, but it's the first time I've come
across any theoretical difference between ^0.5 and sqrt. What is the
theoretical difference / meaning between these two operations?

Thanks!
Mike
Re: [R] Using extract function for dates in sqldf
On Wed, Mar 21, 2012 at 11:31 AM, wrote:
>
> I'm trying to use sqldf to query for the earliest date of a blood test when
> patients have had multiple tests in a given year. My query looks like this:
>
> test11 <- sqldf("select CHILD_ID, min(SAMP_DATE)
>                  from lab
>                  group by CHILD_ID
>                  having extract(year from SAMP_DATE) = 2011")
>
> SAMP_DATE has class "Date". I get the error message
>
> Error in sqliteExecStatement(con, statement, bind.data) :
>   RS-DBI driver: (error in statement: near "from": syntax error)

extract is not supported by SQLite. Check the SQLite date and time functions
link on the left side of the sqldf home page. (The other three databases
supported by sqldf all do support extract.)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
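A hedged sketch of the SQLite route: a "Date" column reaches SQLite as days
since 1970-01-01, so convert it to seconds and let strftime() pull out the
year (filtering with where before grouping keeps only the 2011 tests):

test11 <- sqldf("select CHILD_ID, min(SAMP_DATE)
                 from lab
                 where strftime('%Y', SAMP_DATE * 86400, 'unixepoch') = '2011'
                 group by CHILD_ID")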
Re: [R] Trouble installing the XML package
The first thing to do is update your rather old version of R. If the 2.14
binaries aren't available for your version of Ubuntu in the main repo, there
are instructions here:

http://cran.r-project.org/bin/linux/ubuntu/

If that still doesn't work, please provide the list with details of your
system and R installation.

Sarah

On Wed, Mar 21, 2012 at 3:34 PM, Felix Jarman wrote:
> Hello everyone,
>
> I am probably not the only one having trouble with this package but here
> goes. I want to install XML on Ubuntu. I installed libxml2-dev and
> everything works out fine until I get the following:
>
> Error in reconcilePropertiesAndPrototype(name, slots, prototype,
> superClasses, :
>   No definition was found for superclass "namedList" in the
>   specification of class "XMLOutputStream"
> Error : unable to load R code in package 'XML'
> ERROR: lazy loading failed for package 'XML'
>
> Googling this was no help.
>
> [the full console output is quoted in the original post below]
[R] fwdmsa package: Error in search.normal(X[samp, ], verbose = FALSE) : At least one item has no variance
I'm using the fwdmsa package to identify deviant cases in a Mokken scale
analysis. I've run into a problem, separate from the one I posted previously.
The problem comes with items that are "easy" by IRT standards. A good scale
should include a range of difficulties; yet when I include "easy" items in a
forward search, I continuously run into the problem that these items
demonstrate no variability in the smaller subsamples selected for the forward
search. I can resolve the problem by running the analysis without the easier
items, but this is not ideal for the analysis.

The data are 150 responses to a 37-item test. For the first 15 items, they
look like this (the variable names are cumbersome and so have been removed):

> head(by364.data)
1 11101011010111 1
2 11101111110011 1
3 01110010111011 0
4 11100111111101 1
5 11110101111011 0
6 11111111111111 1

You can see that items 1, 2, 9 and 13 demonstrate no variance in this small
sample and thus muck up the analysis. This happens even when I have items
with plenty of variance; but seeing as they are easy items, they might have a
0.8-0.9 probability of endorsement, so even in samples of 25-50 I still run
into this variance problem. I have seeded the forward search with specific
cases that create the necessary variance, but these artificial subsamples
tend to be loaded with deviant observations and thus are excluded after a
couple of steps, recreating the lack-of-variance problem. Any suggestions?

Robert
Re: [R] small scales in fwdmsa
Sorry, I mistyped the "fs.MSA(by364.data)" in the previous post.
[R] small scales in fwdmsa
I'm using the fwdmsa package to identify deviant cases in a Mokken scale
analysis. I've run into a problem. When I use scales comprising a few items,
I tend to get an error:

Error in y[order(res[-msamp])][1:(length(samp) + 1 - length(msamp))] :
  only 0's may be mixed with negative subscripts

I understand that the error is triggered when the algorithm is fetching cases
to enter into the next step of the forward search. I don't understand what I
can do to remedy this error. The data are dichotomized (1,0) responses from a
multiple-choice exam that 150 students have completed. If I run the entire
test (37 items), the fwd.msa algorithm works fine. However, the entire test
is not unidimensional, so I want to perform separate analyses with the
several unidimensional scales comprised by the entire test. Yet when I select
those 4-5 item scales, I run into this error. Any ideas how to proceed?

The data are 150 responses to a 37-item test. For the first 15 items, they
look like this (the variable names are cumbersome and so have been removed):

> head(by364.data)
1 11101011010111 1
2 11101111110011 1
3 01110010111011 0
4 11100111111101 1
5 11110101111011 0
6 11111111111111 1

When I run fwd.MSA(by364.data) on the full data set it "works" fine. However,
with shorter (for reasons of unidimensionality) scales, I continuously run
into the error above. Any suggestions?

Robert
[R] Trouble installing the XML package
Hello everyone,

I am probably not the only one having trouble with this package but here
goes. I want to install XML on Ubuntu. I installed libxml2-dev and everything
works out fine until I get the following:

Error in reconcilePropertiesAndPrototype(name, slots, prototype, superClasses, :
  No definition was found for superclass "namedList" in the
  specification of class "XMLOutputStream"
Error : unable to load R code in package 'XML'
ERROR: lazy loading failed for package 'XML'

Googling this was no help.

This is the entire R console output:

> install.packages("XML",dependencies=TRUE,configure.args=c("library","clean"))
Warning in install.packages("XML", dependencies = TRUE, configure.args = c("library", :
  argument 'lib' is missing: using '/home/yeti/R/i486-pc-linux-gnu-library/2.10'
trying URL 'http://ftp.osuosl.org/pub/cran/src/contrib/XML_3.9-4.tar.gz'
Content type 'application/x-gzip' length 923501 bytes (901 Kb)
opened URL
==
downloaded 901 Kb

* installing *source* package 'XML' ...
configure: WARNING: you should use --build, --host, --target
configure: WARNING: you should use --build, --host, --target
checking for library-gcc... no
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
No ability to remove finalizers on externalptr objects in this verison of R
checking for sed... /bin/sed
checking for pkg-config... /usr/bin/pkg-config
checking for xml2-config... /usr/bin/xml2-config
USE_XML2 = yes
SED_EXTENDED_ARG: -E
Minor 7, Patch 6 for 2.7.6
Located parser file -I/usr/include/libxml2/parser.h
Checking for 1.8: -I/usr/include/libxml2
Using libxml2.*
checking for gzopen in -lz... yes
checking for xmlParseFile in -lxml2... yes
checking for xmlHashSize in -lxml2... yes
Using built-in xmlHashSize
Checking DTD parsing (presence of externalSubset)...
checking for xmlHashSize in -lxml2... yes
Found xmlHashSize
checking for xmlOutputBufferCreateBuffer in -lxml2... yes
have xmlOutputBufferCreateBuffer()
checking for xmlDocDumpFormatMemoryEnc in -lxml2... yes
checking libxml/xmlversion.h usability... yes
checking libxml/xmlversion.h presence... yes
checking for libxml/xmlversion.h... yes
Expat: FALSE
Checking for return type of xmlHashScan element routine.
No return value for xmlHashScan
xmlNs has a context field
Checking for cetype_t enumeration
No cetype_t enumeration defined in R headers.
checking for xmlsec1-config... no
nodegc default
Version has XML_WITH_ZLIB
Version has xmlHasFeature()

Configuration information:

Libxml settings

libxml include directory: -I/usr/include/libxml2
libxml library directory: -lxml2 -lz -lxml2
libxml 2: -DLIBXML2=1

Compilation flags: -DLIBXML -I/usr/include/libxml2 -DUSE_EXTERNAL_SUBSET=1
-DROOT_HAS_DTD_NODE=1 -DDUMP_WITH_ENCODING=1 -DUSE_XML_VERSION_H=1
-DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1 -DNO_XML_HASH_SCANNER_RETURN=1
-DLIBXML_NAMESPACE_HAS_CONTEXT=1 -DHAVE_XML_WITH_ZLIB=1
-DHAVE_XML_HAS_FEATURE=1 -DUSE_R=1 -D_R_=1 -DHAVE_VALIDITY=1
-DXML_REF_COUNT_NODES=1
Link flags: -lxml2 -lz -lxml2

configure: creating ./config.status
config.status: creating src/Makevars
config.status: creating R/supports.R
config.status: creating inst/scripts/RSXML.csh
config.status: creating inst/scripts/RSXML.bsh
** libs
gcc -std=gnu99 -I/usr/share/R/include -DLIBXML -I/usr/include/libxml2
[same -D flags as above] -I. -DLIBXML2=1 -fpic -g -O2 -c DocParse.c -o DocParse.o
In file included from DocParse.c:13:
Utils.h:230:2: warning: #warning "Redefining COPY_TO_USER_STRING to
use encoding from XML parser"
gcc -std=gnu99 -I/usr/share/R/include -DLIBXML -I/usr/include/libxml2
[same -D flags as above] -I. -DLIBXML2=1 -fpic -g -O2 -c EventParse.c -o EventParse.o
In file included from EventParse.c:15:
Utils.h:230:2: warning: #warning "Redefining COPY_TO_USER_STRING to
use encoding from XML parser"
EventParse.c: In function 'RS_XML_textHandler':
EventParse.c:41
Re: [R] glmnet: obtain predictions using predict and also by extracting coefficients
Hi Juliet,

First of all, cv.glmnet is used to estimate lambda based on cross-validation.
To get a glmnet prediction, you should use the glmnet function, which uses
all data in the training set. Second, you constructed testX using a different
data set (data.test.std) from the one used for the glmnet predict
(data.test). It's no surprise the predictions are different.

Weidong Gu

On Wed, Mar 21, 2012 at 2:35 PM, Juliet Hannah wrote:
> All,
>
> For my understanding, I wanted to see if I can get glmnet predictions
> using both the predict function and also by multiplying coefficients
> by the variable matrix. This has not worked out. Could anyone suggest
> where I am going wrong?
> I understand that I may not have the mean/intercept correct, but the
> scaling is also off, which suggests a bigger mistake.
>
> Thanks for your help.
>
> Juliet Hannah
>
> library(ElemStatLearn)
> library(glmnet)
>
> data(prostate)
>
> # training data
> data.train <- prostate[prostate$train,]
> y <- data.train$lpsa
>
> # isolate predictors
> data.train <- as.matrix(data.train[,-c(9,10)])
>
> # test data
> data.test <- prostate[!prostate$train,]
> data.test <- as.matrix(data.test[,-c(9,10)])
>
> # scale test data by using means and sd from training data
> trainMeans <- apply(data.train,2,mean)
> trainSDs <- apply(data.train,2,sd)
>
> # create standardized test data
> data.test.std <- sweep(data.test, 2, trainMeans)
> data.test.std <- sweep(data.test.std, 2, trainSDs, "/")
>
> # fit training model
> myglmnet = cv.glmnet(data.train,y)
>
> # predictions by using predict function
> yhat_enet <- predict(myglmnet, newx=data.test, s="lambda.min")
>
> # attempting to get predictions by using coefficients
> beta <- as.vector(t(coef(myglmnet, s="lambda.min")))
> testX <- cbind(1, data.test.std)
> yhat2 <- testX %*% beta
>
> # does not match
> plot(yhat2, yhat_enet)
[R] multivariate ordinal probit regression vglm()
Hello, all. I'm investigating the rate at which skeletal joint surfaces pass
through a series of ordered stages (changes in morphology). Current
statistical methods in this type of research use various logit or probit
regression techniques (e.g., proportional odds logit/probit, forward/backward
continuation ratio, or restricted/unrestricted cumulative probit). Data
typically include the predictor (age) and one or more response variables (the
stages of skeletal morphology). For example, the pubic symphysis and
auricular surface (two joint surfaces of the pelvis) may be observed in three
and four stages, respectively (see sample dataframe "refdata" below).

  age pube3 auric4
1  32     3      2
2  42     3      2
3  27     2      1
4  39     2      1
5  85     3      4

I've had some success in fitting the ordinal probit model using both
polr(method="probit") in the MASS library and vglm() in the VGAM library, but
I've hit a wall when it comes to fitting a model that includes both response
variables ("pube3" and "auric4" in the sample dataframe above). I've included
the two univariate models below, but I'd like to know how to model the two
response variables on age---returning the coefficients for each response AND
the correlation parameter, since the two responses are not independent.

If it would help, please feel free to access the dataframe
(https://docs.google.com/open?id=0B5zZGW2utJN0TEctcW1oblFRcTJrNDVLOVBmRWRaQQ).

Thanks in advance.

--Trey

***
Trey Batey---Anthropology Instructor
Mt. Hood Community College
26000 SE Stark St.
Gresham, OR 97030
trey.ba...@mhcc.edu or ekt.ba...@gmail.com

## unrestricted cumulative probit model for pubic scores
> mod.pube3
Call:
vglm(formula = refdata$pube3 ~ refdata$age, family = cumulative(link =
"probit", parallel = FALSE, reverse = TRUE))

Coefficients:
(Intercept):1 (Intercept):2     ref$age:1     ref$age:2
  -1.65895567   -2.14755951    0.06688242    0.04055919

Degrees of Freedom: 1492 Total; 1488 Residual
Residual Deviance: 1188.909
Log-likelihood: -594.4543

## unrestricted cumulative probit model for auricular scores
> mod.auric4
Call:
vglm(formula = refdata$auric4 ~ refdata$age, family = cumulative(link =
"probit", parallel = FALSE, reverse = TRUE))

Coefficients:
(Intercept):1 (Intercept):2 (Intercept):3     ref$age:1     ref$age:2
  -2.07719235   -2.43422370   -2.99123098    0.07319632    0.05133132
    ref$age:3
   0.03797696

Degrees of Freedom: 2238 Total; 2232 Residual
Residual Deviance: 1583.47
Log-likelihood: -791.7348
Re: [R] Doubts about mixed effect models
Post this on the r-sig-mixed-models list rather than here. However, fwiw, it
is nonsense to estimate a random effect with a sample size of 3. That's
trying to estimate a variance with a sample size of 3. You can't do it with
any meaningful precision. Whether or not the effect really **is**
conceptually random is beside the point. I suggest you cross off your list of
statistical advisers anyone who says otherwise. Entropy cannot be denied!

-- Bert

On Wed, Mar 21, 2012 at 11:01 AM, Lívia Dorneles Audino wrote:
> Hi everyone!
>
> I have some doubts about mixed effect models and I hope someone could help
> me. [original question quoted in full; see the post below]

--
Bert Gunter
Genentech Nonclinical Biostatistics
[R] runif error - maximum limit?
Dear all,

I am receiving the error below, I think because my n is exceeding the
allowable limit for a vector. Can anyone confirm, and help with the following
questions? The function and error:

> stPte8 <- rtrunc(rnorm, nx * ny * nsimu, linf=0, mean=as.vector(PTmn),
+                  sd=as.vector(PTsd))
Error in runif(n, min = pinf, max = psup) : invalid arguments

My total n is calculated by nx=4053, ny=4848, nsimu=1000, so
n = nx*ny*nsimu = 19,648,944,000. I believe the maximum size of a vector is
2^31-1 (2,147,483,647) - clearly smaller than my n.

I have done some initial reading about how to deal with large datasets in R,
with reference to bigmemory and ff. For anyone who has experience in this,
would you have advice regarding using these packages as opposed to creating a
work-around such as reducing the size of the matrix (the ny by nx), batching
the simulations, then piecing them back together outside of R (ArcGIS)?

Thanks for any insights you might have.

> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit) -- 88GB RAM

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] MASS_7.3-17 rgdal_0.7-8 raster_1.9-70 sp_0.9-97 mc2d_0.1-12
[6] mvtnorm_0.9-9992

loaded via a namespace (and not attached):
[1] grid_2.14.2 lattice_0.20-0
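A hedged sketch of the batching idea, reusing the objects from the session
above: simulate in chunks that stay well under the 2^31-1 vector limit and
write each chunk out before generating the next.

nx <- 4053; ny <- 4848; nsimu <- 1000
chunk <- 10                                  # simulations per batch
for (b in seq(1, nsimu, by = chunk)) {
  ## ~196 million values per batch, well under the vector limit
  sim <- rtrunc(rnorm, nx * ny * chunk, linf = 0,
                mean = as.vector(PTmn), sd = as.vector(PTsd))
  save(sim, file = sprintf("stPte8_batch_%04d.RData", b))
}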
Re: [R] Graphic legend with mathematical symbol, numeric variable and character variable
Hi David,

It's best to keep the mailing list in the loop, so I'm cc-ing to R-help. I've
rethought your problem and propose a different solution. The main problem is
to create a vector of expressions to be used in the legend (see
https://stat.ethz.ch/pipermail/r-help/2011-February/270106.html).

tau.vals <- c(51,88,200)
types <- c("B&K UA 1404","01dB BAP 21","RION WS3")
leg <- vector("expression", 3)
for(i in 1:3)
  leg[i] <- substitute(expression(
      TYPE ~ group("(", tau == tauval, ")") ),
      list(TYPE=types[i], tauval=tau.vals[i]) )[2]
plot(0); legend(x="topright", legend=leg)

The idea is to create an expression vector and then fill its components
appropriately. We need the second component of the substitute call.

Peter Ehlers

On 2012-03-21 06:49, David Ecotière (CETE Est/LRPC de Strasbourg/6
Acoustique) wrote:

Well, I have some problems with this solution. Here is my (real) code:

tau<-c(51,88,200)
types<-c("B&K UA 1404","01dB BAP 21","RION WS3")
types<-gsub(pattern=" ", replacement="~", x=types)
leg<-parse(text=paste(types,"(","tau==",tau,"min)", sep='~'))
plot(0)
legend(x="topright", legend=leg)

I get an error after the 'parse' command:

Error in parse(text = paste(types, "(", "tau==", tau, "min)", sep = "~")) :
  2:3: unexpected symbol
1: B&K~UA~1404~(~tau==~51~min)
2: 01dB
       ^

But no error if "01dB" is replaced by "dB01" (which is not what I want, of
course). It's as if it were impossible to have a letter follow a number. Any
idea?

Thank you
David

On 21/03/2012 12:57, Peter Ehlers wrote:

On 2012-03-20 06:09, David Ecotière wrote:

Hi,
I'd like to make a legend with a mix of a mathematical symbol (tau), numeric
variables and character variables. I have tried:

types<-c("Type 1","Type 2","Type 2")
tau<-c(1,3,2)
legend(x="topright", legend=paste(types,"tau=",expression(tau)))

but it doesn't work: the 'tau' symbol is not written in its 'symbol style'
but as 'tau'. Any (good) idea? Thank you in advance!

David

Does this do what you want:

types<- paste('Type', 1:3, sep='~')
mytau<- 4:6
leg<- parse(text=paste(types, "~~tau==", mytau, sep='~'))
plot(0)
legend('topright', legend=leg)

The '~' symbols generate spaces; use more if you want more spacing.

Peter Ehlers
[R] "cannot change working directory"
Hi, I need to write a program to run some results from an experiment. I have
the results saved in 2 different folders, named "tuesday" and "wednesday"
(both located within the same parent folder). When I wrote the program for
"tuesday" I had no problem running it, but when I changed the working
directory to wednesday I got the "cannot change working directory" message. I
don't understand what's different between the two. My first code was:

setwd("/Users/user/Desktop/Recorded\ results/tuesday")

and my second code is:

setwd("/Users/user/Desktop/Recorded\ results/wednesday")

and I copied the exact location from the terminal, so I can't have typos (I
am using a mac, if that makes any difference). Any suggestions? I will be
grateful for any insight.

thanks
mela
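A minimal diagnostic sketch. One thing to note: the backslash before the
space is shell quoting; inside an R string a plain space is what you want.
Before calling setwd(), it may help to check what R can actually see:

p <- "/Users/user/Desktop/Recorded results/wednesday"   # plain space
file.exists(p)                                       # does the path exist?
list.files("/Users/user/Desktop/Recorded results")   # what is really there?
file.info(p)$isdir                                   # TRUE if a directory
setwd(p)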
[R] How to create a function for a self created class.
Hi, I need to create a method for a class named 'simpleOLS', which I have
created, that will compute the coefficients of predictors. Here is my code:

#---
setClass("simpleOLS", representation(dataset="matrix", beta="matrix",
                                     x="matrix", y="matrix", var="character"))

computeBetas = function(dataset,str) class('simpleOLS')

setMethod("computeBetas","simpleOLS",betas)

betas = function(dataset, str)
{
  x <- cbind(int=1, dataset[-which(colnames(dataset) %in% str)])
  y <- dataset[,length(dataset)]
  names(x) <- NULL
  names(y) <- NULL
  x.numeric <- sapply(x, as.numeric)  # converting into numeric type
  y.numeric <- sapply(y, as.numeric)  # converting into numeric type
  transpose.x <- t(x.numeric)
  X <- transpose.x %*% x.numeric
  inv <- ginv(X, tol=sqrt(.Machine$double.eps))
  Y <- inv %*% transpose.x
  beta <- Y %*% y.numeric
  return(beta)
}

T2 <- computeBetas(data.setA1, str)

But when I call the function it returns me [1] "character", which is not the
intended output. I am new to object-oriented programming in R and any help
will be appreciated. I also want to know if there is any problem with the way
I have created the function.

Thanks & Regards,
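As written, computeBetas is an ordinary function whose body is
class('simpleOLS'), which simply returns the string "character" -- it never
became a generic, so the setMethod registration is beside the point. A hedged
sketch of the usual S4 pattern, delegating to the betas() helper above (the
slot handling is left aside):

library(MASS)  # for ginv()

## declare the generic first ...
setGeneric("computeBetas",
           function(object, str) standardGeneric("computeBetas"))

## ... then register a method for the class; betas() must already be
## defined at this point
setMethod("computeBetas", "simpleOLS",
          function(object, str) betas(object@dataset, str))

## usage sketch: wrap the data in the class, then dispatch on it
## obj <- new("simpleOLS", dataset = as.matrix(data.setA1))
## T2 <- computeBetas(obj, str)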
[R] Doubts about mixed effect models
Hi everyone!

I have some doubts about mixed effect models and I hope someone could help
me. I'm trying to analyze a dataset coming from samples of dung beetles in
the same forest fragments along 3 consecutive years (1994, 1995 and 1996) and
14 years after (2010). I sampled dung beetles in 18 different fragments with
different sizes and different degrees of isolation. My aim is to determine
whether total species richness changes over time in forest fragments and to
verify the influence of fragment size and isolation on species richness.
However, I'm trying to find a way to account for the temporal
pseudo-replication in the data. So, I decided to use mixed effects models to
analyze these data, but I still have doubts about how I should construct the
models. When I asked for some help I received three different answers.

The first suggestion was to treat year as a fixed rather than a random
effect, because it won't be practical to estimate the variance of a random
effect with only four levels. So, I constructed the model like this:

m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (1|fragmento), family=poisson)

The second suggestion proposed to treat year as a random effect:

m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (ano|fragmento), family=poisson)

And the third suggestion also proposed to treat year as a random effect, but
to consider it as a continuous variable rather than categorical (in the
models above I consider year a categorical variable):

m1 <- lmer(riqueza ~ área*ano + isolamento*ano + (ano|fragmento), family=poisson)

When I analyze my dataset using the second and the third model I always get a
singular convergence warning: In mer_finalize(ans): singular convergence (7).
What does that mean? Does anyone have some idea about how I can solve this
issue? I also need to know which one of these models is more appropriate to
the dataset available. Does anyone have suggestions?

Thanks in advance!
Lívia.
Re: [R] "cannot change working directory"
Thanks Michael and William. I have resigned myself to moving the wednesday
folder into the tuesday folder and keeping tuesday as the working directory.
This seemed to solve the problem, but I think that also means that I do have
permission to access the wednesday folder. I don't quite understand what went
wrong, but it's not the first time I've experienced problems with programs on
my mac. Could it be a bug or a compatibility issue?

thanks again
mela
[R] Assign names to the assets in portfolio frontier plot. Using frontierPlot fPortfolio.
Hi, I am having trouble assigning names to the assets in the portfolio
frontier plot made with frontierPlot() in the fPortfolio package. Can anyone
please help me? If I use tailoredFrontierPlot in fPortfolio the assets do get
labeled, but I don't want to plot the capital market line etc. Does someone
know how to remove pieces from the tailoredFrontierPlot function?

Best Regards
Paul
Re: [R] anova.lm F test confusion
Michelle:

You need to work with someone locally who understands basic statistics, as
you are clearly out of your depth. Posting to this list is highly unlikely to
meet your needs, nor is this an appropriate place or means to learn
statistics -- it's for help on R. (Yes, they do overlap, but it is generally
assumed that the poster knows the statistics but needs help with R. You
clearly do not fit that description).

Cheers,
Bert

On Wed, Mar 21, 2012 at 7:11 AM, msteane wrote:
> What if the model isn't nested, i.e. I want to test y=x+w vs y=x+z+v. Is
> there a valid test/method to compare these (other than comparing R squared
> values)? They are both multiple regression models.

--
Bert Gunter
Genentech Nonclinical Biostatistics
[R] Error in file(file, "rt") : cannot open the connection
Hi there, I migrated from Windows 7 to Ubuntu (11.10) and am trying to get my R back online. I've switched from Tinn-R to RKWard, which seems to work well. My problem at the moment is that I'm no longer able to read in the data file I was working on before.

> popafr <- read.table(file="~/Documents/My PhD/Modelling/SAS/SAS datasets/SAS export sets/Final reference curve sets/SAS_for_R AFRO 11jan11.txt", header=T, sep="\t")
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") : cannot open file '/home/me/Documents/My PhD/Modelling/SAS/SAS datasets/SAS export sets/Final reference curve sets/SAS_for_R AFRO 11jan11.txt': No such file or directory

I get the same results with:

> popafr <- read.table(file.choose())
> popafr <- read.table(file="SAS_for_R AFRO 11jan11.txt", header=T, sep="\t")

The file is physically present both in getwd() and in the directory listed above; even with the file.choose() option I receive the same error. Does this have to do with file permissions? I've tried running sudo su and then starting R in a terminal, with the same error. Any help is much appreciated. Hopefully yours, Daniel __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
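A small debugging sketch for narrowing down whether the path or the file name is at fault; only base R functions are used, and the path is copied from the error above:

p <- "~/Documents/My PhD/Modelling/SAS/SAS datasets/SAS export sets/Final reference curve sets"
file.exists(path.expand(p))          # does the directory exist exactly as typed?
list.files(dirname(path.expand(p)))  # what does R actually see in the parent folder?
list.files(path.expand(p))           # is the .txt file really in there?

If the directory check fails, some component of the path (case, spacing, or a character lost in the Windows-to-Ubuntu move) differs from what was typed.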
Re: [R] glmnet: obtain predictions using predict and also by extracting coefficients
Oops. Coefficients are returned on the scale of the original data. testX <- cbind(1,data.test) yhat2 <- testX %*% beta # works plot(yhat2,yhat_enet) On Wed, Mar 21, 2012 at 2:35 PM, Juliet Hannah wrote: > All, > > For my understanding, I wanted to see if I can get glmnet predictions > using both the predict function and also by multiplying coefficients > by the variable matrix. This is not worked out. Could anyone suggest > where I am going wrong? > I understand that I may not have the mean/intercept correct, but the > scaling is also off, which suggests a bigger mistake. > > Thanks for your help. > > Juliet Hannah > > > library(ElemStatLearn) > library(glmnet) > > data(prostate) > > # training data > data.train <- prostate[prostate$train,] > y <- data.train$lpsa > > # isolate predictors > data.train <- as.matrix(data.train[,-c(9,10)]) > > # test data > data.test <- prostate[!prostate$train,] > data.test <- as.matrix(data.test[,-c(9,10)]) > > # scale test data by using means and sd from training data > > trainMeans <- apply(data.train,2,mean) > trainSDs <- apply(data.train,2,sd) > > # create standardized test data > > data.test.std <- sweep(data.test, 2, trainMeans) > data.test.std <- sweep(data.test.std, 2, trainSDs, "/") > > # fit training model > > myglmnet =cv.glmnet(data.train,y) > > # predictions by using predict function > > yhat_enet <- predict(myglmnet,newx=data.test, s="lambda.min") > > # attempting to get predictions by using coefficients > > beta <- as.vector( t(coef(myglmnet,s="lambda.min"))) > > testX <- cbind(1,data.test.std) > > yhat2 <- testX %*% beta > > # does not match > > plot(yhat2,yhat_enet) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Automaticall adjust axis scales
Would it be possible to change the axis at the end? My data sets seem to be quite large, so I was thinking, for the plot and the subsequent lines, to always keep the current minimum and maximum, like

plot(x)
minimum=min(x)
lines(x1)
minimum=c(minimum,x1)
lines(x2)
minimum=c(minimum,x2)

and then, if there is a good way of altering the axis limits, I could set the limits I want. Regards Alex

From: Jim Lemon To: Sent: Tuesday, March 20, 2012 10:07 AM Subject: Re: [R] Automaticall adjust axis scales

Alaios wrote: > Dear all, > > I have made a function that given a number of list elements plot them to the > same window. > > The first element is plotted by using plot and all the rest are plotted under > the > > same window by using lines. > > I have below a small and simple reproducible example. > > > x1<-c(1:10) > plot(x1) > > x2<-c(11:20) > lines(x2) > > x3<-c(31:40) > lines(x3) > > > > > as you might notice > the two consecutive lines fail to be plotted as the axis were formed by the > first plot. > Would it be possible after the last lines to change the axis to the minimum > and the maximum of all data sets to be visible? > > Any idea how I can do that? > >

Hi Alaios, Try this:

ylim=range(c(x1,x2,x3))
plot(x1,ylim=ylim,type="l")
lines(x2)
lines(x3)

Jim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glmnet: obtain predictions using predict and also by extracting coefficients
All, For my understanding, I wanted to see if I can get glmnet predictions using both the predict function and also by multiplying the coefficients by the variable matrix. This has not worked out. Could anyone suggest where I am going wrong? I understand that I may not have the mean/intercept correct, but the scaling is also off, which suggests a bigger mistake. Thanks for your help. Juliet Hannah

library(ElemStatLearn)
library(glmnet)

data(prostate)

# training data
data.train <- prostate[prostate$train,]
y <- data.train$lpsa

# isolate predictors
data.train <- as.matrix(data.train[,-c(9,10)])

# test data
data.test <- prostate[!prostate$train,]
data.test <- as.matrix(data.test[,-c(9,10)])

# scale test data by using means and sd from training data
trainMeans <- apply(data.train,2,mean)
trainSDs <- apply(data.train,2,sd)

# create standardized test data
data.test.std <- sweep(data.test, 2, trainMeans)
data.test.std <- sweep(data.test.std, 2, trainSDs, "/")

# fit training model
myglmnet <- cv.glmnet(data.train,y)

# predictions by using predict function
yhat_enet <- predict(myglmnet, newx=data.test, s="lambda.min")

# attempting to get predictions by using coefficients
beta <- as.vector(t(coef(myglmnet, s="lambda.min")))

testX <- cbind(1, data.test.std)

yhat2 <- testX %*% beta

# does not match
plot(yhat2, yhat_enet)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Type II and III sum of squares (R and SPSS)
On Mar 21, 2012, at 11:27 , Marco Tommasi wrote:

> To whom it may concern
>
> I made some analyses with R using the command Anova. However, I found
> some problems with the output obtained by selecting type II or type III
> sums of squares.

Well, it would primarily concern the maintainer of the "car" package, which is the one containing the (capital-A) Anova() function.

The type III SS don't look right to me either. With aov() we get

> M3l <- reshape(M3, direction="long", varying=c("b1","b2","b3"), sep="")
> summary(aov(b ~ fattA*factor(time) + Error(factor(id)), M3l))

Error: factor(id)
          Df Sum Sq Mean Sq F value  Pr(>F)
fattA      1 26.042  26.042    16.3 0.00682 **
Residuals  6  9.583   1.597
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
                   Df Sum Sq Mean Sq F value   Pr(>F)
factor(time)        2  42.33  21.167   23.81 6.65e-05 ***
fattA:factor(time)  2  20.33  10.167   11.44  0.00166 **
Residuals          12  10.67   0.889
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> Briefly, I have to do a 2x3 mixed model anova, wherein the first factor
> is a between factor and the second factor is a within factor. I use the
> command Anova in the list below, because I also want to obtain the sums
> of squares of the linear and quadratic contrasts between the levels of
> the within factor.
>
> Below I report the list of commands used in R ("fattA" is the between
> factor and "fB" is the within factor):
>
>> a1b1<-c(10,9,8,7)
>> a1b2<-c(7,6,4,5)
>> a1b3<-c(3,2,3,4)
>> a2b1<-c(9,9,8,7)
>> a2b2<-c(8,7,9,7)
>> a2b3<-c(7,8,8,6)
>>
>> M3<-matrix(0,8,4)
>> M3[,1]<-cbind(a1b1,a2b1)
>> M3[,2]<-cbind(a1b2,a2b2)
>> M3[,3]<-cbind(a1b3,a2b3)
>> M3[,4]<-rep(c(1,2),each=4)
>>
>> colnames(M3)<-c("b1","b2","b3","fattA")
>>
>> M3<-as.data.frame(M3)
>>
>> require(car)
> Loading required package: car
> Loading required package: MASS
> Loading required package: nnet
>> f5<-lm(cbind(b1,b2,b3)~fattA,data=M3)
>> a5<-Anova(f5)
>
>> f6<-lm(c(b1,b2,b3)~rep(fattA,3),data=M3)
>>
>> fB<-factor(c(1:3))
>> d2<-data.frame(fB)
>> a6<-Anova(f5,idata=d2,idesign=~fB,type=2)
>
>> summary(a6,multivariate=F)
>
> Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
>
>                  SS num Df Error SS den Df       F    Pr(>F)
> (Intercept) 1080.04      1   9.5833      6 676.200 2.133e-07 ***
> fattA         26.04      1   9.5833      6  16.304  0.006819 **
> fB            42.33      2  10.6667     12  23.812 6.645e-05 ***
> fattA:fB      20.33      2  10.6667     12  11.438  0.001660 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Mauchly Tests for Sphericity
>
>          Test statistic p-value
> fB              0.87891  0.7242
> fattA:fB        0.87891  0.7242
>
> Greenhouse-Geisser and Huynh-Feldt Corrections
> for Departure from Sphericity
>
>           GG eps Pr(>F[GG])
> fB       0.89199  0.0001474 ***
> fattA:fB 0.89199  0.0026452 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>          HF eps Pr(>F[HF])
> fB       1.2438  6.645e-05 ***
> fattA:fB 1.2438    0.00166 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> Warning message:
> In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1
>
> I repeated the analysis by setting type III sums of squares and I obtained:
>
>> a6<-Anova(f5,idata=d2,idesign=~fB,type=3)
>> summary(a6,multivariate=F)
>
> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
>
>                 SS num Df Error SS den Df      F   Pr(>F)
> (Intercept) 30.817      1   9.5833      6 19.294 0.004606 **
> fattA       26.042      1   9.5833      6 16.304 0.006819 **
> fB          40.133      2  10.6667     12 22.575 8.57e-05 ***
> fattA:fB    20.333      2  10.6667     12 11.438 0.001660 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Mauchly Tests for Sphericity
>
>          Test statistic p-value
> fB              0.87891  0.7242
> fattA:fB        0.87891  0.7242
>
> Greenhouse-Geisser and Huynh-Feldt Corrections
> for Departure from Sphericity
>
>           GG eps Pr(>F[GG])
> fB       0.89199  0.0001851 ***
> fattA:fB 0.89199  0.0026452 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>          HF eps Pr(>F[HF])
> fB       1.2438   8.57e-05 ***
> fattA:fB 1.2438    0.00166 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> Warning message:
> In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1
>
> As you can see, the sums of squares of the within factor "fB" and of the
> intercept obtained by setting type II sums of squares are different from
> those obtained by setting type III sums of squares. I repeated the
> analysis using SPSS (type II and III) and I obtained the same sums of
> squares for both types.
Re: [R] Unable to specify order of a factor
On Mar 21, 2012, at 1:07 PM, Justin Montemarano wrote: Hi all: I've got it... it appears that total.density was also defined in two separate data frames (se.predict.data and dc.predict.data) with levels order 16, 32, 8. Using relevel(), I moved 8 to the first position and it's solved the plotting problem. Ista's 'minimal' reproducible code request prompted me to discover my error; thanks all. I've had the experience in the last few years that almost all of my questions to Rhelp have needed to be peacefully euthanized after being subjected to the rack of hammering into a "reproducible" condition. -- David. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com On Wed, Mar 21, 2012 at 12:42 PM, R. Michael Weylandt < michael.weyla...@gmail.com> wrote: You'll also want to use dput() to send us an exact encoding of your data when making that reproducible example: there might be something subtle at play here that print methods won't show. Michael On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn wrote: On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano > wrote: Ista: Your attached code did work for me; moreover, the facets were presented in the desired order with facet_wrap() and facet_grid(), which is what I'm using because I have a second factor used in facet_grid(). Still, my plots with total.density as a facet are coming out in 16, 32, 8, and I'm not seeing why. Below is my plot code - ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) + facet_grid(total.density ~ prop.ec) + #add point and error bar data theme_set(theme_bw()) + geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax = per.remain + se), width = 3) + #add predicted model data geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',], aes(x = x.values, y = predicted.values), colour = c('red')) + geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',], aes(x = x.values, y = predicted.values), colour = c('blue'), linetype = c('dashed')) + xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major = theme_blank(), panel.grid.minor = theme_blank()) Is there anything odd about it that might be producing the odd ordering problem? FYI, avoiding subsetting ag.tab doesn't do the trick. I don't know. Please create a minimal example that isolates the problem. You can start with levels(ag.tab$total.density) ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) + facet_grid(total.density ~ prop.ec) + geom_point() Best, Ista - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn wrote: Hi Justin, this gives the correct order (8, 16, 32) on my machine: total.density <- c (8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32 ) total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE) str(total.density) order(levels(total.density)) dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density))) ggplot(dat, aes(x = v1)) + geom_density() + facet_wrap(~td) Does it work for you? If yes, then you need to tell us what you're doing that is different from this example. If no, please give use the output of sessionInfo(). 
best, Ista On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano < jmont...@kent.edu> wrote: I think I understand, but I believe my original interest is in the order of levels(total.density), since ggplot appears to be using that to order the facets. Thus, I'm still getting three graphs, ordered (and displayed as) 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry if I wasn't clear and/or I've missed your message. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained,
Re: [R] Unable to specify order of a factor
Hi all: I've got it... it appears that total.density was also defined in two separate data frames (se.predict.data and dc.predict.data) with levels order 16, 32, 8. Using relevel(), I moved 8 to the first position and it's solved the plotting problem. Ista's 'minimal' reproducible code request prompted me to discover my error; thanks all. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com On Wed, Mar 21, 2012 at 12:42 PM, R. Michael Weylandt < michael.weyla...@gmail.com> wrote: > You'll also want to use dput() to send us an exact encoding of your > data when making that reproducible example: there might be something > subtle at play here that print methods won't show. > > Michael > > On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn wrote: > > On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano > wrote: > >> Ista: > >> > >> Your attached code did work for me; moreover, the facets were presented > in > >> the desired order with facet_wrap() and facet_grid(), which is what I'm > >> using because I have a second factor used in facet_grid(). > >> > >> Still, my plots with total.density as a facet are coming out in 16, 32, > 8, > >> and I'm not seeing why. Below is my plot code - > >> > >>> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = > >>> per.remain)) + facet_grid(total.density ~ prop.ec) + > >>> #add point and error bar data > >>> theme_set(theme_bw()) + > >>> geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax = > >>> per.remain + se), width = 3) + > >>> #add predicted model data > >>> geom_line(data = se.predict.data[se.predict.data$plant.sp == > 'EC',], > >>> aes(x = x.values, y = predicted.values), colour = c('red')) + > >>> geom_line(data = dc.predict.data[dc.predict.data$plant.sp == > 'EC',], > >>> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype = > >>> c('dashed')) + > >>> > >>> xlab('Day') + ylab('Percent Mass Remaining') + > opts(panel.grid.major = > >>> theme_blank(), panel.grid.minor = theme_blank()) > >> > >> Is there anything odd about it that might be producing the odd ordering > >> problem? FYI, avoiding subsetting ag.tab doesn't do the trick. > > > > I don't know. Please create a minimal example that isolates the > > problem. You can start with > > > > levels(ag.tab$total.density) > > > > ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = > per.remain)) + > >facet_grid(total.density ~ prop.ec) + > >geom_point() > > > > Best, > > Ista > > > >> - > >> Justin Montemarano > >> Graduate Student > >> Kent State University - Biological Sciences > >> > >> http://www.montegraphia.com > >> > >> > >> On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn wrote: > >>> > >>> Hi Justin, > >>> > >>> this gives the correct order (8, 16, 32) on my machine: > >>> > >>> total.density <- > >>> > >>> > c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) > >>> total.density <- factor(total.density, levels=c(8, 16, 32), > ordered=TRUE) > >>> str(total.density) > >>> > >>> order(levels(total.density)) > >>> > >>> dat <- data.frame(td = total.density, v1 = > rnorm(1:length(total.density))) > >>> > >>> ggplot(dat, aes(x = v1)) + > >>> geom_density() + > >>> facet_wrap(~td) > >>> > >>> Does it work for you? If yes, then you need to tell us what you're > >>> doing that is different from this example. If no, please give use the > >>> output of sessionInfo(). 
> >>> > >>> best, > >>> Ista > >>> > >>> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano < > jmont...@kent.edu> > >>> wrote: > >>> > I think I understand, but I believe my original interest is in the > order > >>> > of > >>> > levels(total.density), since ggplot appears to be using that to order > >>> > the > >>> > facets. Thus, I'm still getting three graphs, ordered (and displayed > >>> > as) > >>> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm > sorry > >>> > if > >>> > I wasn't clear and/or I've missed your message. > >>> > - > >>> > Justin Montemarano > >>> > Graduate Student > >>> > Kent State University - Biological Sciences > >>> > > >>> > http://www.montegraphia.com > >>> > > >>> >[[alternative HTML version deleted]] > >>> > > >>> > __ > >>> > R-help@r-project.org mailing list > >>> > https://stat.ethz.ch/mailman/listinfo/r-help > >>> > PLEASE do read the posting guide > >>> > http://www.R-project.org/posting-guide.html > >>> > and provide commented, minimal, self-contained, reproducible code. > >> > >> > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible
Re: [R] "cannot change working directory"
list.dirs() will list the directories in that folder exactly as R sees them. If you are getting "cannot change working directory" messages I can only think of two reasons: i) File permissions (though that's unlikely to get messed up unless there's someone around who knows how to do that in the first place) ii) You don't actually have the right wd name. So like I said, please give the output of

list.dirs("/Users/user/Desktop/Recorded results/", recursive = FALSE)

and then try something like

setwd(list.dirs("/Users/user/Desktop/Recorded results/", recursive = FALSE)[XXX])

where XXX is the number of the path that gives the Wednesday files. normalizePath() might also help. If you really want to check the file permissions route, this will be a start

system("ls -hFGlxo '/Users/user/Desktop/Recorded results'")

(note the quotes around the path, which contains a space) but you'll probably need some help to interpret the output. Michael On Wed, Mar 21, 2012 at 12:30 PM, Carmela Friedman wrote: > could you explain what that is please? i am new to R > On 21 Mar 2012, at 16:27, R. Michael Weylandt wrote: > >> Scratch that -- while true, it doesn't seem to cause that error message. >> >> Can you do this? >> >> list.dirs("/Users/user/Desktop/Recorded results/") >> >> Also, are you sure you have various access permissions to the Wednesday >> folder? >> >> Michael >> >> On Wed, Mar 21, 2012 at 12:25 PM, R. Michael Weylandt >> wrote: >>> If you are on a mac, you don't need to escape spaces within setwd(): >>> e.g., on my machine, setwd("~/Desktop/Current Semester") works just >>> fine. >>> >>> Michael >>> >>> On Wed, Mar 21, 2012 at 12:08 PM, mela wrote: Hi, I need to write program to run some results from an experiment. i have the results saved in 2 different files, named "tuesday" and "wednesday" (both located within the same file). when i wrote the program for "tuesday" i had no problem running it, but when i changed the work directory to wednesday i got the "cannot change working directory" message. I don't understand what's different between the two. my first code was: setwd("/Users/user/Desktop/Recorded\ results/tuesday") and my second code is: setwd("/Users/user/Desktop/Recorded\ results/wednesday") and i copied the exact location from the terminal, so i can't have typos (i am using a mac, if that makes any difference). any suggestions? i will be grateful for any insight. thanks mela -- View this message in context: http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Check results between two data.frame
On Wed, Mar 21, 2012 at 12:48 PM, HJ YAN wrote:
> Thanks a lot Sarah, for your nice example and code.
>
> I know '==' can do the work. Just as an R beginner, sometimes I really want to
> be more like a 'real programmer', if you know what I mean...

Being a "real programmer" means using one line of code instead of a dozen, when that one line is more elegant and understandable. Being a real programmer also means learning how to debug your code: a good first step is to set x and y to actual matrices and try each step in the console. (And don't give them default values that are strings if your function expects them to be matrices.) If you'd actually tried to run each line, you would quickly discover that as.matrix() and matrix() do not do the same thing, and you need the latter.

> NROW
[1] 5
> NCOL
[1] 4
> CHECK_XY <- as.matrix(NA, NROW, NCOL)
> CHECK_XY
     [,1]
[1,]   NA
>
> CHECK_XY <- matrix(NA, NROW, NCOL)
> CHECK_XY
     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA
[5,]   NA   NA   NA   NA

> If any R expert could help to check where my code went wrong here that would
> be very grateful!

Getting help from "any R expert" is far more likely if you copy your messages to the R-help list as well as just to me. I've taken the liberty of doing so with my reply. Sarah

> Many thanks! > HJ > On Wed, Mar 21, 2012 at 4:13 PM, Sarah Goslee > wrote: >> >> As long as == is an appropriate test for your data, why not just use >> R's innate ability to handle matrices/data frames? >> >> > x1 <- matrix(1:20, ncol=4) >> > x2 <- ifelse(x1 > 18, 22, x1) >> > x1 >> [,1] [,2] [,3] [,4] >> [1,] 1 6 11 16 >> [2,] 2 7 12 17 >> [3,] 3 8 13 18 >> [4,] 4 9 14 19 >> [5,] 5 10 15 20 >> > x2 >> [,1] [,2] [,3] [,4] >> [1,] 1 6 11 16 >> [2,] 2 7 12 17 >> [3,] 3 8 13 18 >> [4,] 4 9 14 22 >> [5,] 5 10 15 22 >> > x1 == x2 >> [,1] [,2] [,3] [,4] >> [1,] TRUE TRUE TRUE TRUE >> [2,] TRUE TRUE TRUE TRUE >> [3,] TRUE TRUE TRUE TRUE >> [4,] TRUE TRUE TRUE FALSE >> [5,] TRUE TRUE TRUE FALSE >> >> >> On Wed, Mar 21, 2012 at 8:48 AM, HJ YAN wrote: >> > Dear R-user, >> > >> > I'm trying to compare two sets of results and wanted to find out which >> > element in the two data frame/matrix are different. >> > >> > I wrote the following function and it works ok, and gives me a long list >> > of >> > "good" as outcomes. >> > >> > >> > CHECK<- >> > function (x = "file1", y = "file2") >> > { >> > for (i in 1:nrow(x)) { >> > for (j in 1:ncol(x)) { >> > if (x[i, j] == y[i, j]) { >> > print("good") >> > } >> > else { >> > print("check") >> > } >> > } >> > } >> > } >> > >> > >> > However, as the two datasets I was comparing are large (400*100 >> > roughly), >> > so I would like to create a matrix to identify which ones are not same >> > in >> > the two dataframes. >> > >> > So I added 'CHECK_XY' in my code but when I run it, I got 'Error in >> > CHECK_XY[i, j] = c("good") : subscript out of bounds'. >> > >> > Could anyone help please?? >> > >> > CHECK_1<- >> > function (x = "file1", y = "file2") >> > { >> > NROW <- nrow(x) >> > NCOL <- ncol(x) >> > CHECK_XY <- as.matrix(NA, NROW, NCOL) >> > for (i in 1:nrow(x)) { >> > for (j in 1:ncol(x)) { >> > if (x[i, j] == y[i, j]) { >> > CHECK_XY[i, j] = c("good") >> > } >> > else { >> > CHECK_XY[i, j] = c("check") >> > } >> > } >> > } >> > print(CHECK_XY) >> > } >> > >> > Thanks! 
>> > HJ >> -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
You'll also want to use dput() to send us an exact encoding of your data when making that reproducible example: there might be something subtle at play here that print methods won't show. Michael On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn wrote: > On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano > wrote: >> Ista: >> >> Your attached code did work for me; moreover, the facets were presented in >> the desired order with facet_wrap() and facet_grid(), which is what I'm >> using because I have a second factor used in facet_grid(). >> >> Still, my plots with total.density as a facet are coming out in 16, 32, 8, >> and I'm not seeing why. Below is my plot code - >> >>> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = >>> per.remain)) + facet_grid(total.density ~ prop.ec) + >>> #add point and error bar data >>> theme_set(theme_bw()) + >>> geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax = >>> per.remain + se), width = 3) + >>> #add predicted model data >>> geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',], >>> aes(x = x.values, y = predicted.values), colour = c('red')) + >>> geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',], >>> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype = >>> c('dashed')) + >>> >>> xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major = >>> theme_blank(), panel.grid.minor = theme_blank()) >> >> Is there anything odd about it that might be producing the odd ordering >> problem? FYI, avoiding subsetting ag.tab doesn't do the trick. > > I don't know. Please create a minimal example that isolates the > problem. You can start with > > levels(ag.tab$total.density) > > ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) + > facet_grid(total.density ~ prop.ec) + > geom_point() > > Best, > Ista > >> - >> Justin Montemarano >> Graduate Student >> Kent State University - Biological Sciences >> >> http://www.montegraphia.com >> >> >> On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn wrote: >>> >>> Hi Justin, >>> >>> this gives the correct order (8, 16, 32) on my machine: >>> >>> total.density <- >>> >>> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) >>> total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE) >>> str(total.density) >>> >>> order(levels(total.density)) >>> >>> dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density))) >>> >>> ggplot(dat, aes(x = v1)) + >>> geom_density() + >>> facet_wrap(~td) >>> >>> Does it work for you? If yes, then you need to tell us what you're >>> doing that is different from this example. If no, please give use the >>> output of sessionInfo(). >>> >>> best, >>> Ista >>> >>> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano >>> wrote: >>> > I think I understand, but I believe my original interest is in the order >>> > of >>> > levels(total.density), since ggplot appears to be using that to order >>> > the >>> > facets. Thus, I'm still getting three graphs, ordered (and displayed >>> > as) >>> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry >>> > if >>> > I wasn't clear and/or I've missed your message. 
>>> > - >>> > Justin Montemarano >>> > Graduate Student >>> > Kent State University - Biological Sciences >>> > >>> > http://www.montegraphia.com >>> > >>> > [[alternative HTML version deleted]] >>> > >>> > __ >>> > R-help@r-project.org mailing list >>> > https://stat.ethz.ch/mailman/listinfo/r-help >>> > PLEASE do read the posting guide >>> > http://www.R-project.org/posting-guide.html >>> > and provide commented, minimal, self-contained, reproducible code. >> >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] passing xlim to coord_map in ggplot2
Hi Zack, This works as expected on my machine:

tank_trunc <- read.csv("http://r.789695.n4.nabble.com/file/n4492267/upton_tank_trunc_nabble.csv")
michigan_map.df <- map_data('county', 'michigan')

ggplot() +
  geom_point(aes(lon, lat), data = tank_trunc, na.rm = T) +
  geom_path(aes(long, lat, group = group), data = michigan_map.df) +
  coord_map('gilbert', xlim = c(-88, -82))

> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] mapproj_1.1-8.3 maps_2.2-5      ggplot2_0.9.0

loaded via a namespace (and not attached):
 [1] MASS_7.3-17        RColorBrewer_1.0-5 colorspace_1.1-1   dichromat_1.2-4
 [5] digest_0.5.2       grid_2.14.2        memoise_0.1        munsell_0.3
 [9] plyr_1.7.1         proto_0.3-9.2      reshape2_1.2.1     scales_0.2.0
[13] stringr_0.6        tools_2.14.2

Maybe you have some out-of-date packages? Best, Ista On Wed, Mar 21, 2012 at 9:28 AM, z2.0 wrote: > The first block of code should be reproducible. > > For the second block, you need only a data.frame. I've included a few rows > from the one I'm working with. > > Two required libraries: maps, ggplot2. > http://r.789695.n4.nabble.com/file/n4492267/upton_tank_trunc_nabble.csv > upton_tank_trunc_nabble.csv > > -- > View this message in context: > http://r.789695.n4.nabble.com/passing-xlim-to-coord-map-in-ggplot2-tp4490005p4492267.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Forloop/ifelse program problem and list of dataframes
Hi Ian, I haven't read in detail because you don't provide a reproducible example (see ?dput), but you might want to take a look at the aggregate() and doBy::summaryBy() functions. HTH, Ivan -- Ivan CALANDRA Université de Bourgogne UMR CNRS/uB 6282 Biogéosciences 6 Boulevard Gabriel 21000 Dijon, FRANCE +33(0)3.80.39.63.06 ivan.calan...@u-bourgogne.fr http://biogeosciences.u-bourgogne.fr/calandra

On 21/03/12 16:44, Ian Craig wrote:

Hello R Community, I don't post to these things often, so excuse me if I stumble on my forum etiquette. This is a complex problem for me, which may require two forum entries, but I will try my best to be concise. Also, I am a self-taught coder, so if my code is not to convention, your constructive criticism is always welcome.

I need to split up a data frame by participant (gpsarc - factor), status (testpo - factor), and by date (LocalDate), then sum the distance (Dist_Bef_m - numeric) records for each date and average them across each status state for each participant. Each participant has several records for each date and for at least 1 of 3 different possible status types (max 3). In the end, I want a table with participant number and the status states as my column headings, with means for each status state under the appropriate heading (see example below). I am confident I made this way more complicated than it needs to be. I really appreciate any help you can offer. Here is my relevant coding so far:

library(plyr)  # for ldply()

s1 <- split(data[,c(4,10,20,42)], data$gpsarc)
for(i in 1:length(s1)) s1[[i]] <- split(s1[[i]], s1[[i]]$testpo)

s2 <- vector("list", length(s1))
for(i in 1:length(s2)) s2[[i]] <- ldply(s1[[i]], function(x) {
    # if one status state does not exist it is still accounted for in the
    # split sublist because it's a factor, and I would get an error, so I
    # added this if/else portion to remove those entries with no records
    if(nrow(x) == 0) {
        remove(x)
    } else {
        by(x, x[["LocalDate"]], function(x1) {
            sum(x1[["Dist_Bef_m"]])
        })
    }
})

s3 <- vector("list", length(s2))
for(i in 1:length(s3)) s3[[i]] <- data.frame(
    mean = apply(s2[[i]][,-1], 1, mean, na.rm=TRUE),
    row.names = as.character(s2[[i]][,1]))

Here is a sample of the s3 result:

[[1]]
     mean
2 12533.2

[[2]]
      mean
2 26300.96
3 25313.93

[[3]]
      mean
1 48489.15
3 27398.23

[[4]]
      mean
1 34783.97

[[5]]
      mean
1 21293.19
2 21962.41
3 18272.67

# I really want it to look like this:
ppt        1        2        3
1         NA  12533.2       NA
2         NA 26300.96 25313.93
3   48489.15       NA 27398.23
4   34783.97       NA       NA
5   21293.19 21962.41 18272.67

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
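A hedged sketch of Ivan's aggregate() suggestion, reusing Ian's column and object names (data, gpsarc, testpo, LocalDate, Dist_Bef_m) but untested against the real data:

# sum the distances per participant, status and date, then average the
# daily sums within each participant/status combination
daily <- aggregate(Dist_Bef_m ~ gpsarc + testpo + LocalDate,
                   data = data, FUN = sum)
avg <- aggregate(Dist_Bef_m ~ gpsarc + testpo, data = daily, FUN = mean)

# one row per participant, one column per status, NA where a status is absent
tapply(avg$Dist_Bef_m, list(avg$gpsarc, avg$testpo), mean)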
Re: [R] "cannot change working directory"
If you are on a mac, you don't need to escape spaces within setwd(): e.g., on my machine, setwd("~/Desktop/Current Semester") works just fine. Michael On Wed, Mar 21, 2012 at 12:08 PM, mela wrote: > Hi, > > I need to write program to run some results from an experiment. i have the > results saved in 2 different files, named "tuesday" and "wednesday" (both > located within the same file). when i wrote the program for "tuesday" i had > no problem running it, but when i changed the work directory to wednesday i > got the "cannot change working directory" message. I don't understand what's > different between the two. > > my first code was: > > setwd("/Users/user/Desktop/Recorded\ results/tuesday") > > and my second code is: > > setwd("/Users/user/Desktop/Recorded\ results/wednesday") > > and i copied the exact location from the terminal, so i can't have typos (i > am using a mac, if that makes any difference). > > any suggestions? i will be grateful for any insight. > > thanks > > mela > > > -- > View this message in context: > http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano wrote: > Ista: > > Your attached code did work for me; moreover, the facets were presented in > the desired order with facet_wrap() and facet_grid(), which is what I'm > using because I have a second factor used in facet_grid(). > > Still, my plots with total.density as a facet are coming out in 16, 32, 8, > and I'm not seeing why. Below is my plot code - > >> ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = >> per.remain)) + facet_grid(total.density ~ prop.ec) + >> #add point and error bar data >> theme_set(theme_bw()) + >> geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax = >> per.remain + se), width = 3) + >> #add predicted model data >> geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',], >> aes(x = x.values, y = predicted.values), colour = c('red')) + >> geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',], >> aes(x = x.values, y = predicted.values), colour = c('blue'), linetype = >> c('dashed')) + >> >> xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major = >> theme_blank(), panel.grid.minor = theme_blank()) > > Is there anything odd about it that might be producing the odd ordering > problem? FYI, avoiding subsetting ag.tab doesn't do the trick. I don't know. Please create a minimal example that isolates the problem. You can start with levels(ag.tab$total.density) ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) + facet_grid(total.density ~ prop.ec) + geom_point() Best, Ista > - > Justin Montemarano > Graduate Student > Kent State University - Biological Sciences > > http://www.montegraphia.com > > > On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn wrote: >> >> Hi Justin, >> >> this gives the correct order (8, 16, 32) on my machine: >> >> total.density <- >> >> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) >> total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE) >> str(total.density) >> >> order(levels(total.density)) >> >> dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density))) >> >> ggplot(dat, aes(x = v1)) + >> geom_density() + >> facet_wrap(~td) >> >> Does it work for you? If yes, then you need to tell us what you're >> doing that is different from this example. If no, please give use the >> output of sessionInfo(). >> >> best, >> Ista >> >> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano >> wrote: >> > I think I understand, but I believe my original interest is in the order >> > of >> > levels(total.density), since ggplot appears to be using that to order >> > the >> > facets. Thus, I'm still getting three graphs, ordered (and displayed >> > as) >> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry >> > if >> > I wasn't clear and/or I've missed your message. >> > - >> > Justin Montemarano >> > Graduate Student >> > Kent State University - Biological Sciences >> > >> > http://www.montegraphia.com >> > >> > [[alternative HTML version deleted]] >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. 
> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] list of matrices
You probably want mylist[[i]] = function(...) in your loop. Similarly, when you want them again later, you need to use double square brackets. I once heard a useful metaphor for understanding the difference between [ and [[ when it comes to lists. If x (a list) is a train, then x[2] is the second car of that train, while x[[2]] is the contents of that second car. That's why things like x[1:3] are well defined (the sublist containing elements 1 through 3) while x[[1:3]] is not. Hope this helps, Michael On Wed, Mar 21, 2012 at 11:08 AM, David Zastrau wrote: > Hello dear R-users, > > I am currently trying to fill some datasatructure (array, list, hash...) > with matrices that are calculated by a function and do vary in size: > > mylist = list() > for(i in 1:n) > mylist[i] = function(...) # returns a matrix > > print(mylist[1]) # prints only the first element of the matrix > > > How can I store and afterwards access the matrices that are calculated by my > function? > > Any help would be greatly appreciated! > > Best Wishes > David > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
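The train metaphor in a few lines of code, as a self-contained toy (the list contents are made up, not from the thread):

x <- list(first = matrix(1:4, 2, 2), second = "hello")
x[1]           # a list of length one -- the train car
x[[1]]         # the matrix itself -- the contents of the car
class(x[1])    # "list"
class(x[[1]])  # "matrix"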
Re: [R] "cannot change working directory"
Scratch that -- while true, it doesn't seem to cause that error message. Can you do this? list.dirs("/Users/user/Desktop/Recorded results/") Also, are you sure you have various access permissions to the Wednesday folder? Michael On Wed, Mar 21, 2012 at 12:25 PM, R. Michael Weylandt wrote: > If you are on a mac, you don't need to escape spaces within setwd(): > e.g., on my machine, setwd("~/Desktop/Current Semester") works just > fine. > > Michael > > On Wed, Mar 21, 2012 at 12:08 PM, mela wrote: >> Hi, >> >> I need to write program to run some results from an experiment. i have the >> results saved in 2 different files, named "tuesday" and "wednesday" (both >> located within the same file). when i wrote the program for "tuesday" i had >> no problem running it, but when i changed the work directory to wednesday i >> got the "cannot change working directory" message. I don't understand what's >> different between the two. >> >> my first code was: >> >> setwd("/Users/user/Desktop/Recorded\ results/tuesday") >> >> and my second code is: >> >> setwd("/Users/user/Desktop/Recorded\ results/wednesday") >> >> and i copied the exact location from the terminal, so i can't have typos (i >> am using a mac, if that makes any difference). >> >> any suggestions? i will be grateful for any insight. >> >> thanks >> >> mela >> >> >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html >> Sent from the R help mailing list archive at Nabble.com. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] list of matrices
Use mylist[[i]] = function(...) mylist[i] and mylist[[i]] are two different things. On Wed, Mar 21, 2012 at 11:08 AM, David Zastrau wrote: > Hello dear R-users, > > I am currently trying to fill some datasatructure (array, list, hash...) > with matrices that are calculated by a function and do vary in size: > > mylist = list() > for(i in 1:n) > mylist[i] = function(...) # returns a matrix > > print(mylist[1]) # prints only the first element of the matrix > > > How can I store and afterwards access the matrices that are calculated by my > function? > > Any help would be greatly appreciated! > > Best Wishes > David > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm: getting the confidence interval for an Odds Ratio, when using predict()
Hi again, As a follow-up question... If I need to calculate a risk difference with a CI, I imagine the solution would be similar? Here's how I would get the point estimate:

x1 <- factor(rbinom(100,1,.5),levels=c(0,1))
x2 <- factor(round(runif(100,1,2)),levels=c(1,2),labels=c("cat1","cat2"))
outcome <- rbinom(100,1,.2)

model <- glm(outcome~x1+x2,family=binomial(logit))
newd <- data.frame(x1=factor(c(0,0),levels=c(0,1)),
                   x2=factor(c(1,2),labels=c("cat1","cat2")),
                   outcome=c(0,0))

library(boot)
risks <- inv.logit(predict(model,newd))
risk.diff <- risks[2] - risks[1]

Many thanks, Dominic C. 2012/3/20 Dominic Comtois > Case solved. Thanks a lot Peter! > > Dominic C. > > > -Original Message- > From: peter dalgaard [mailto:pda...@gmail.com] > Sent: 20 March 2012 07:57 > To: Dominic Comtois > Cc: r-help@r-project.org help > Subject: Re: [R] glm: getting the confidence interval for an Odds Ratio, when > using predict() > > [Oops, forgot cc. to list] > > On Mar 20, 2012, at 04:40 , Dominic Comtois wrote: > > > I apologize for the errors in the previous code. Here is a reworked example. It works, but I suspect problems in the se calculation. I changed only one covariate from the 1st prediction to the 2nd, so that the OR's CI should be equal to the exponentiated variable's coefficient and CI. And we get something different: > > Yep. Classical rookie mistake: Forgot to take sqrt() in the se. I then get > > > se <- sqrt(contr %*% V %*% t(contr)) > > > > # display the CI > > exp(contr %*% coef(model) + qnorm(c(.025,.50,.975))*se) > [1] 0.655531 1.686115 4.336918 > > > > # the point estimate is ok, as verified with > > exp(model$coefficients[3]) > x2cat2 > 1.686115 > > > > # however we'd expect to find upper and lower bound equal # to the > > exponentiated x2cat coefficient CI exp(confint(model))[3,] > Waiting for profiling to be done... > 2.5 %    97.5 % > 0.6589485 4.4331058 > > which is as close as you can expect since the confint method is a bit more advanced than +/-2SE. > > -pd > > > > x1 <- factor(rbinom(100,1,.5),levels=c(0,1)) > > x2 <- > > factor(round(runif(100,1,2)),levels=c(1,2),labels=c("cat1","cat2")) > > outcome <- rbinom(100,1,.2) > > > > model <- glm(outcome~x1+x2,family=binomial(logit)) > > newd <- data.frame(x1=factor(c(0,0),levels=c(0,1)), > > x2=factor(c("cat1","cat2"),levels=c("cat1","cat2")), > > outcome=c(1,1)) > > > > M <- model.matrix(formula(model), data=newd) V <- vcov(model) contr <- > > c(-1,1) %*% M se <- sqrt(contr %*% V %*% t(contr)) > > > > # display the CI > > exp(contr %*% coef(model) + qnorm(c(.025,.50,.975))*se) > > > > # the point estimate is ok, as verified with > > exp(model$coefficients[3]) > > > > # however we'd expect to find upper and lower bound equal # to the > > exponentiated x2cat coefficient CI exp(confint(model))[3,] > > > > Many thanks, > > > > Dominic C. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 > Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd@cbs.dk Priv: pda...@gmail.com > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
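One hedged way to attach a CI to that risk difference is a nonparametric bootstrap, reusing the boot package already loaded in the point-estimate code above. The data frame dat and R = 999 replicates are arbitrary choices for this sketch, not anything from the thread:

# collect the simulated variables from the code above into a data frame
dat <- data.frame(outcome, x1, x2)

# statistic: refit the model on a resample and recompute the risk difference
rd.fun <- function(d, idx) {
  m <- glm(outcome ~ x1 + x2, family = binomial(logit), data = d[idx, ])
  r <- inv.logit(predict(m, newd))
  r[2] - r[1]
}

b <- boot(dat, rd.fun, R = 999)
boot.ci(b, type = "perc")  # percentile bootstrap CI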
[R] "cannot change working directory"
Hi, I need to write a program to run some results from an experiment. I have the results saved in 2 different folders, named "tuesday" and "wednesday" (both located within the same folder). When I wrote the program for "tuesday" I had no problem running it, but when I changed the working directory to wednesday I got the "cannot change working directory" message. I don't understand what's different between the two. My first code was: setwd("/Users/user/Desktop/Recorded\ results/tuesday") and my second code is: setwd("/Users/user/Desktop/Recorded\ results/wednesday") and I copied the exact location from the terminal, so I can't have typos (I am using a mac, if that makes any difference). Any suggestions? I will be grateful for any insight. Thanks mela -- View this message in context: http://r.789695.n4.nabble.com/cannot-change-working-directory-tp4492812p4492812.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] passing xlim to coord_map in ggplot2
The first block of code should be reproducible. For the second block, you need only a data.frame. I've included a few rows from the one I'm working with. Two required libraries: maps, ggplot2. http://r.789695.n4.nabble.com/file/n4492267/upton_tank_trunc_nabble.csv upton_tank_trunc_nabble.csv -- View this message in context: http://r.789695.n4.nabble.com/passing-xlim-to-coord-map-in-ggplot2-tp4490005p4492267.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Check results between two data.frame
As long as == is an appropriate test for your data, why not just use R's innate ability to handle matrices/data frames?

> x1 <- matrix(1:20, ncol=4)
> x2 <- ifelse(x1 > 18, 22, x1)
> x1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> x2
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   22
[5,]    5   10   15   22
> x1 == x2
     [,1] [,2] [,3] [,4]
[1,] TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE
[3,] TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE FALSE
[5,] TRUE TRUE TRUE FALSE

On Wed, Mar 21, 2012 at 8:48 AM, HJ YAN wrote: > Dear R-user, > > I'm trying to compare two sets of results and wanted to find out which > element in the two data frame/matrix are different. > > I wrote the following function and it works ok, and gives me a long list of > "good" as outcomes. > > > CHECK<- > function (x = "file1", y = "file2") > { > for (i in 1:nrow(x)) { > for (j in 1:ncol(x)) { > if (x[i, j] == y[i, j]) { > print("good") > } > else { > print("check") > } > } > } > } > > > However, as the two datasets I was comparing are large (400*100 roughly), > so I would like to create a matrix to identify which ones are not same in > the two dataframes. > > So I added 'CHECK_XY' in my code but when I run it, I got 'Error in > CHECK_XY[i, j] = c("good") : subscript out of bounds'. > > Could anyone help please?? > > CHECK_1<- > function (x = "file1", y = "file2") > { > NROW <- nrow(x) > NCOL <- ncol(x) > CHECK_XY <- as.matrix(NA, NROW, NCOL) > for (i in 1:nrow(x)) { > for (j in 1:ncol(x)) { > if (x[i, j] == y[i, j]) { > CHECK_XY[i, j] = c("good") > } > else { > CHECK_XY[i, j] = c("check") > } > } > } > print(CHECK_XY) > } > > Thanks! > HJ -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
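A compact follow-up to the example above: which() with arr.ind = TRUE reports exactly which cells differ, which is often all that is wanted from the comparison.

which(x1 != x2, arr.ind = TRUE)
#      row col
# [1,]   4   4
# [2,]   5   4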
[R] list of matrices
Hello dear R-users,

I am currently trying to fill some data structure (array, list, hash...) with matrices that are calculated by a function and vary in size:

mylist = list()
for(i in 1:n)
    mylist[i] = function(...) # returns a matrix

print(mylist[1]) # prints only the first element of the matrix

How can I store, and afterwards access, the matrices that are calculated by my function? Any help would be greatly appreciated! Best Wishes David __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred?
Dear Ellison, Many thanks for your reply. The information you typed is clear and now I know what to do. Your suggestion about finding some coffee while running simulation is so good =) Regards - Best regards Ufuk -- View this message in context: http://r.789695.n4.nabble.com/glm-fit-fitted-probabilities-numerically-0-or-1-occurred-tp4490722p4492436.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to decide best order for ma(x, order, centre=TRUE) in time series
When fitting an autoregressive time series model to the data, i.e. with the function ma(x, order, centre=TRUE): how do I decide the best order for a good time series model that would be valuable for an arma() process? I have attached the time series data: http://r.789695.n4.nabble.com/file/n4491858/tinku.txt tinku.txt -- View this message in context: http://r.789695.n4.nabble.com/how-to-decide-best-order-for-ma-x-order-centre-TRUE-in-time-series-tp4491858p4491858.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
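There is no single "best order", but one common, hedged approach is to fit candidate orders and compare them by AIC. The sketch below assumes x holds the numeric series from the attached file (a hypothetical name) and tries MA orders 1 to 5 with base R's arima():

aics <- sapply(1:5, function(q) AIC(arima(x, order = c(0, 0, q))))
which.min(aics)  # candidate MA order with the lowest AIC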
[R] Multinomial Logit data arrangement
Dear all, I am having a hard time attempting to do multinomial logit modeling in R. I am trying to analyze a dataset whereby there are 3 situations with 4 different choice parameters. In particular, I am having a hard time arranging this into a .csv format. What sorts of headings should I put in the column headers to reflect the variables and parameters for the total number of choices made by the respondents? An example can be seen below:

             Situation 1  Situation 2  Situation 3
Parameter 1           28           18            8
Parameter 2           30           20            0
Parameter 3           32           17            3
Parameter 4           31           12            2

Furthermore, I would like to analyse whether there are any relationships with age, education etc. How can I arrange it in a .csv format for R to perform the function? I know it seems a trivial problem, but I just can't seem to get my head around it, and it's the first step to start the analysis. Any guidance would be really helpful! Thanks. Oh yes - I am an MSc student in aquaculture investigating land use values in mangrove environments. Sakchai McDonough -- View this message in context: http://r.789695.n4.nabble.com/Multinomial-Logit-data-arrangement-tp4492173p4492173.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
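On the arrangement itself, a hedged sketch: multinomial logit routines such as nnet::multinom() expect long format, with one row per respondent per situation and the chosen parameter as a factor. All column names and the file name below are hypothetical:

# choices.csv columns: id, situation, choice, age, education
dat <- read.csv("choices.csv")
dat$choice    <- factor(dat$choice)     # e.g. "Parameter 1" .. "Parameter 4"
dat$situation <- factor(dat$situation)  # e.g. 1, 2, 3

library(nnet)
fit <- multinom(choice ~ situation + age + education, data = dat)
summary(fit)

With that layout, the count table above would just be table(dat$choice, dat$situation).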
Re: [R] rJava / RCMD javareconf fails
Solved, grrr. I had to set JAVA_HOME to point to the JDK directory, not its parent. This had to be done as root, not as a user, otherwise the rJava install fails. At the end of the day it's installed. -- View this message in context: http://r.789695.n4.nabble.com/rJava-RCMD-javareconf-fails-tp4488961p4492094.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova.lm F test confusion
What if the models aren't nested, i.e. I want to test y = x + w vs. y = x + z + v. Is there a valid test/method to compare these (other than comparing R-squared values)? They are both multiple regression models. -- View this message in context: http://r.789695.n4.nabble.com/anova-lm-F-test-confusion-tp4490211p4492402.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
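The thread leaves this open, but for non-nested linear models the standard options are an information criterion (AIC/BIC) or a formal non-nested test such as the Cox test or the Davidson-MacKinnon J test, both available in the lmtest package. A sketch on simulated data:

set.seed(1)
n <- 100
x <- rnorm(n); w <- rnorm(n); z <- rnorm(n); v <- rnorm(n)
y <- 1 + 0.5 * x + 0.3 * z + rnorm(n)

fit1 <- lm(y ~ x + w)
fit2 <- lm(y ~ x + z + v)

AIC(fit1, fit2)     # smaller is better; no p-value, but valid for non-nested fits
BIC(fit1, fit2)

library(lmtest)
jtest(fit1, fit2)   # Davidson-MacKinnon J test
coxtest(fit1, fit2) # Cox test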
[R] Check results between two data.frame
Dear R-user, I'm trying to compare two sets of results and want to find out which elements in the two data frames/matrices differ. I wrote the following function and it works OK, giving me a long list of "good" outcomes.

CHECK <- function(x = "file1", y = "file2") {
  for (i in 1:nrow(x)) {
    for (j in 1:ncol(x)) {
      if (x[i, j] == y[i, j]) {
        print("good")
      } else {
        print("check")
      }
    }
  }
}

However, as the two datasets I am comparing are large (roughly 400*100), I would like to create a matrix identifying which entries are not the same in the two data frames. So I added 'CHECK_XY' to my code, but when I run it, I get 'Error in CHECK_XY[i, j] = c("good") : subscript out of bounds'. Could anyone help please?

CHECK_1 <- function(x = "file1", y = "file2") {
  NROW <- nrow(x)
  NCOL <- ncol(x)
  CHECK_XY <- as.matrix(NA, NROW, NCOL)
  for (i in 1:nrow(x)) {
    for (j in 1:ncol(x)) {
      if (x[i, j] == y[i, j]) {
        CHECK_XY[i, j] = c("good")
      } else {
        CHECK_XY[i, j] = c("check")
      }
    }
  }
  print(CHECK_XY)
}

Thanks! HJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
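The 'subscript out of bounds' error comes from the pre-allocation: as.matrix(NA, NROW, NCOL) has no nrow/ncol arguments and returns a 1x1 matrix, so CHECK_XY[1, 2] already fails; matrix(NA, NROW, NCOL) is what was intended. In fact the loops are unnecessary, since comparison is vectorized. A sketch on toy matrices:

x <- matrix(1:6, 2, 3)
y <- x; y[2, 3] <- 99

CHECK_XY <- matrix(NA, nrow(x), ncol(x))    # correct pre-allocation (matrix, not as.matrix)
CHECK_XY <- ifelse(x == y, "good", "check") # vectorized; no loops needed
CHECK_XY
which(x != y, arr.ind = TRUE)               # row/col indices of the mismatches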
Re: [R] Not colour but symbols
Hi, Thank you Bert and Thomas for your help. I did what I wanted with this code:

test <- c(4,8,9,6,7)
barplot(test, density = 20, angle = 45)

But I want to cross the lines in each bar. Please, how do I do it? Thank you in advance. -- View this message in context: http://r.789695.n4.nabble.com/Not-colour-but-symbols-tp4490030p4491785.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
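A single barplot() call draws hatching at one angle only, but a second call with the opposite angle and add = TRUE overlays the crossing set of lines:

# Draw the bars twice with opposite hatch angles to get a crosshatch.
test <- c(4, 8, 9, 6, 7)
barplot(test, density = 20, angle = 45)
barplot(test, density = 20, angle = -45, add = TRUE)  # second pass crosses the lines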
[R] nlme error on dimensions in multiplication
Hello R users, When trying to fit a nonlinear mixed model to a respiration time series, I get the following error message:

Error in recalc.varFunc(object[[i]], conLin) : dims [product 30] do not match the length of object [34]
In addition: Warning message:
In conLin$Xy * varWeights(object) : longer object length is not a multiple of shorter object length

Below is an example that generates this message on Windows XP, R 2.14.0, nlme 3.1-103:

library(nlme)
#load(file="tmp/exampleNmleError.RData") # ds1, fm1, ctrl
con <- url("http://www.bgc-jena.mpg.de/~twutz/tmp/exampleNmleError.RData")
load(file=con) # ds1, fm1, ctrl
close(con)
ds1 <- subset(ds1, select=c("ColNoM","minT","resp"))
head(ds1)
plot( resp ~ minT, col=rainbow(length(unique(ColNoM)))[ColNoM], ds1)
lines( fitted.values(fm1) ~ minT, ds1 ) # starting values from gnls fit
ds2 <- ds1
# ds2 <- unique(ds1) # does not resolve the error
tmp.fit1 <- nlme( resp ~ exp(beta0l) + beta1 * exp(-exp(beta2l)*minT)
  ,fixed=list(beta0l+beta1+beta2l~1)
  ,random= beta0l ~ 1 | ColNoM
  ,weights =varPower(fixed=0.4, form=~resp)
  ,start=list(fixed = coef(fm1))
  #, method='REML'
  ,data=ds2
  #,control=ctrl
)

The exp() in the example is used to constrain estimates of the coefficients to positive values, by actually estimating the log of the coefficients with nlme. When I try to debug, I find a mismatch between the dims of the weights and the dims of the matrix conLin$Xy generated by the function MEdecomp, which reduced the original 34 rows to just 6 rows. I tried to remove duplicate entries in my example, but this did not help. How can I make proper use of nlme? Yours, Thomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] No its not working, how to calculate best p order value for arma process
> ts<-ts(bhavar$V1)
> decompose(ts)
Error in decompose(ts) : time series has no or less than 2 periods
> HoltWinters(ts)
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods

Is it true that the order selected by the ar(ts) function is the p value used for the arma(p,q) calculation?

> ar(ts)
Call: ar(x = ts)
Coefficients:
      1        2        3        4        5        6        7        8        9       10       11
-0.0979  -0.0380  -0.1823  -0.1096   0.1806   0.0414  -0.1439   0.1093   0.0348  -0.1580   0.2900
Order selected 11  sigma^2 estimated as 0.4339

-- View this message in context: http://r.789695.n4.nabble.com/how-calculate-seasonal-component-cyclic-component-of-time-series-tp4491470p4491673.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
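The decompose()/HoltWinters() errors occur because ts(bhavar$V1) has frequency 1, so the series has no seasonal periods to extract. And ar() selects its order by AIC among pure AR models, so it only suggests p, not q. One option (an addition here, not from the thread) is to let forecast::auto.arima() search over both orders:

library(forecast)
ts1 <- ts(rnorm(100))   # placeholder for ts(bhavar$V1)
fit <- auto.arima(ts1, d = 0, seasonal = FALSE)  # searches (p, q) by AICc
fit$arma[1:2]           # the selected p and q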
[R] Type II and III sum of squares (R and SPSS)
To whom it may concern, I made some analyses with R using the command Anova. However, I found some problems with the output obtained by selecting type II or type III sums of squares. Briefly, I have to do a 2x3 mixed-model ANOVA, wherein the first factor is a between factor and the second factor is a within factor. I use the command Anova as in the list below, because I also want to obtain the sums of squares of the linear and quadratic contrasts between the levels of the within factor. Below I report the list of commands used in R ("fattA" is the between factor and "fB" is the within factor):

> a1b1<-c(10,9,8,7)
> a1b2<-c(7,6,4,5)
> a1b3<-c(3,2,3,4)
> a2b1<-c(9,9,8,7)
> a2b2<-c(8,7,9,7)
> a2b3<-c(7,8,8,6)
>
> M3<-matrix(0,8,4)
> M3[,1]<-cbind(a1b1,a2b1)
> M3[,2]<-cbind(a1b2,a2b2)
> M3[,3]<-cbind(a1b3,a2b3)
> M3[,4]<-rep(c(1,2),each=4)
>
> colnames(M3)<-c("b1","b2","b3","fattA")
>
> M3<-as.data.frame(M3)
>
> require(car)
Loading required package: car
Loading required package: MASS
Loading required package: nnet
> f5<-lm(cbind(b1,b2,b3)~fattA,data=M3)
> a5<-Anova(f5)
> f6<-lm(c(b1,b2,b3)~rep(fattA,3),data=M3)
>
> fB<-factor(c(1:3))
> d2<-data.frame(fB)
> a6<-Anova(f5,idata=d2,idesign=~fB,type=2)
> summary(a6,multivariate=F)

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                 SS num Df Error SS den Df       F    Pr(>F)
(Intercept) 1080.04      1   9.5833      6 676.200 2.133e-07 ***
fattA         26.04      1   9.5833      6  16.304  0.006819 **
fB            42.33      2  10.6667     12  23.812 6.645e-05 ***
fattA:fB      20.33      2  10.6667     12  11.438  0.001660 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mauchly Tests for Sphericity
         Test statistic p-value
fB              0.87891  0.7242
fattA:fB        0.87891  0.7242

Greenhouse-Geisser and Huynh-Feldt Corrections for Departure from Sphericity
          GG eps Pr(>F[GG])
fB       0.89199  0.0001474 ***
fattA:fB 0.89199  0.0026452 **
---
          HF eps Pr(>F[HF])
fB        1.2438  6.645e-05 ***
fattA:fB  1.2438    0.00166 **
---
Warning message:
In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1

I repeated the analysis by setting type III sums of squares and I obtained:

> a6<-Anova(f5,idata=d2,idesign=~fB,type=3)
> summary(a6,multivariate=F)

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                SS num Df Error SS den Df      F   Pr(>F)
(Intercept) 30.817      1   9.5833      6 19.294 0.004606 **
fattA       26.042      1   9.5833      6 16.304 0.006819 **
fB          40.133      2  10.6667     12 22.575 8.57e-05 ***
fattA:fB    20.333      2  10.6667     12 11.438 0.001660 **
---
Mauchly Tests for Sphericity
         Test statistic p-value
fB              0.87891  0.7242
fattA:fB        0.87891  0.7242

Greenhouse-Geisser and Huynh-Feldt Corrections for Departure from Sphericity
          GG eps Pr(>F[GG])
fB       0.89199  0.0001851 ***
fattA:fB 0.89199  0.0026452 **
---
          HF eps Pr(>F[HF])
fB        1.2438   8.57e-05 ***
fattA:fB  1.2438    0.00166 **
---
Warning message:
In summary.Anova.mlm(a6, multivariate = F) : HF eps > 1 treated as 1

As you can see, the sums of squares of the within factor "fB" and of the intercept obtained with type II sums of squares differ from those obtained with type III sums of squares. I repeated the analysis using SPSS (type II and III) and I obtained the same sums of squares for both types, which I report below:

Within factor and interaction:
Source       Sum of squares (type II and III)
fB           42.3333
fB * fattA   20.3333
Error(fB)    10.6667

Between factor:
Source       Sum of squares (type II and III)
Intercept    1080.041667
fattA        26.0417
Error        9.58333

The strangest thing, for me, is not only that R gives different outputs for type II and type III sums of squares, but that the output obtained with type II sums of squares in R coincides with the output obtained with type III in SPSS. As I remember, with a balanced design, type II and III sums of squares should give the same output. Is there anybody who can help me on this point? Thank you for your kind attention.

Marco Tommasi, Ph.D.
Department of Neuroscience and Imaging
"G. D'Annunzio" University of Chieti-Pescara
Via dei Vestini 31
66100 Chieti
ITALY
e-mail: marco.tomm...@unich.it
tel.: +39 0871 3555890 / fax: +39 0871 3554163
Web site: www.dni.unich.it

[[alternative HTML version deleted]]
[R] Solving ODE via deSolve and optimizing best parameter values.
Hello Everyone, I have just started with the deSolve package. I have a set of ODEs with a few parameters, for example dY/dt = a1*Y*(1 - Y/Ymax) - a2*(Y + 0.001) and dZ/dt = a3*Y - a4*Z. I have the initial values of Y and Z, and a1, a2, a3, a4 are my parameters. My main objective is to solve the system and also to find the best optimized values for both Y and Z, as well as for the parameters a1, a2, a3. I am using the deSolve package and have solved the equations; now I am stuck on finding the best values for my parameters. Can anyone please help me with this? You can also write to me at bioinfo.himan...@gmail.com Thanks in Advance Himanshu -- View this message in context: http://r.789695.n4.nabble.com/Solving-ODE-via-deSolve-and-optimizing-best-parameter-values-tp4492505p4492505.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
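A common recipe (a sketch, not the poster's code) is to wrap the deSolve::ode() solution in a least-squares cost function and hand it to optim(); the FME package wraps the same pattern more formally. All numeric values below are made up for illustration:

library(deSolve)

# Right-hand side of the system from the post.
rhs <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dY <- a1 * Y * (1 - Y / Ymax) - a2 * (Y + 0.001)
    dZ <- a3 * Y - a4 * Z
    list(c(dY, dZ))
  })
}

times <- seq(0, 10, by = 0.5)
y0    <- c(Y = 0.1, Z = 0)   # assumed initial values

# Fake "observations": the true solution plus a little noise.
true <- c(a1 = 1, a2 = 0.2, a3 = 0.5, a4 = 0.3, Ymax = 5)
obs  <- ode(y0, times, rhs, true)
obs[, c("Y", "Z")] <- obs[, c("Y", "Z")] + rnorm(2 * length(times), sd = 0.01)

# Sum-of-squares cost over the parameters being fitted (Ymax held fixed here).
cost <- function(p) {
  sim <- ode(y0, times, rhs, c(p, Ymax = 5))
  sum((sim[, "Y"] - obs[, "Y"])^2 + (sim[, "Z"] - obs[, "Z"])^2)
}

fit <- optim(c(a1 = 0.5, a2 = 0.1, a3 = 0.2, a4 = 0.1), cost)
fit$par   # optimized parameter values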
Re: [R] Unable to specify order of a factor
Ista: Your attached code did work for me; moreover, the facets were presented in the desired order with facet_wrap() and facet_grid(), which is what I'm using because I have a second factor used in facet_grid(). Still, my plots with total.density as a facet are coming out in 16, 32, 8, and I'm not seeing why. Below is my plot code -

ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain)) +
  facet_grid(total.density ~ prop.ec) +
  # add point and error bar data
  theme_set(theme_bw()) +
  geom_point() +
  geom_errorbar(aes(ymin = per.remain - se, ymax = per.remain + se), width = 3) +
  # add predicted model data
  geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',],
            aes(x = x.values, y = predicted.values), colour = c('red')) +
  geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',],
            aes(x = x.values, y = predicted.values), colour = c('blue'),
            linetype = c('dashed')) +
  xlab('Day') + ylab('Percent Mass Remaining') +
  opts(panel.grid.major = theme_blank(), panel.grid.minor = theme_blank())

Is there anything odd about it that might be producing the odd ordering problem? FYI, avoiding subsetting ag.tab doesn't do the trick. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com

On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn wrote:
> Hi Justin,
>
> this gives the correct order (8, 16, 32) on my machine:
>
> total.density <-
> c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
> total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
> str(total.density)
>
> order(levels(total.density))
>
> dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density)))
>
> ggplot(dat, aes(x = v1)) +
>   geom_density() +
>   facet_wrap(~td)
>
> Does it work for you? If yes, then you need to tell us what you're
> doing that is different from this example. If no, please give us the
> output of sessionInfo().
>
> best,
> Ista
>
> On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano wrote:
> > I think I understand, but I believe my original interest is in the order of
> > levels(total.density), since ggplot appears to be using that to order the
> > facets. Thus, I'm still getting three graphs, ordered (and displayed as)
> > 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry if
> > I wasn't clear and/or I've missed your message.
> > -
> > Justin Montemarano
> > Graduate Student
> > Kent State University - Biological Sciences
> > http://www.montegraphia.com

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Forloop/ifelse program problem and list of dataframes
Hello R Community, I don't post to these things often, so excuse me if I stumble on my forum etiquette. This is a complex problem for me, which may require two forum entries, but I will try my best to be concise. Also, I am a self-taught coder, so if my code is not to convention, your constructive criticism is always welcome. I need to split up a data frame by participant (gpsarc - factor), status (testpo - factor), and by date (LocalDate), then sum the distance (Dist_Bef_m - numeric) records for each date and average them across each status state for each participant. Each participant has several records for each date and for at least 1 of 3 different possible status types (max 3). In the end, I want a table with participant number and the status state as my column headings, with means for each status state under their appropriate heading (see example below). I am confident I made this way more complicated than it needs to be. I really appreciate any help you can offer. Here is my relevant coding so far:

s1 <- split(data[,c(4,10,20,42)], data$gpsarc)
for(i in 1:length(s1)) s1[[i]] <- split(s1[[i]], s1[[i]]$testpo)
s2 <- vector("list", length(s1))
for(i in 1:length(s2)) s2[[i]] <- ldply(s1[[i]], function(x) {
  # if one status state does not exist, it is still counted in the split
  # sublist because it is a factor, and entries with no records gave an
  # error, so this if/else removes them
  if(nrow(x) == 0) {
    remove(x)
  } else {
    by(x, x[["LocalDate"]], function(x1) {
      sum(x1[["Dist_Bef_m"]])
    })
  }
})
s3 <- vector("list", length(s2))
for(i in 1:length(s3)) s3[[i]] <- data.frame(
  mean = apply(s2[[i]][,-1], 1, mean, na.rm=TRUE),
  row.names = as.character(s2[[i]][,1]))

Here is a sample of the s3 result:

[[1]]
      mean
2  12533.2

[[2]]
      mean
2 26300.96
3 25313.93

[[3]]
      mean
1 48489.15
3 27398.23

[[4]]
      mean
1 34783.97

[[5]]
      mean
1 21293.19
2 21962.41
3 18272.67

# I really want it to look like this:

ppt         1         2         3
1          NA   12533.2        NA
2          NA  26300.96  25313.93
3    48489.15        NA  27398.23
4    34783.97        NA        NA
5    21293.19  21962.41  18272.67

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
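A flatter route to the same table (a sketch, assuming columns gpsarc, testpo, LocalDate and Dist_Bef_m as described): sum within each date first with aggregate(), then average the daily sums into a participant-by-status matrix with tapply(), which fills empty combinations with NA automatically:

# Step 1: daily totals per participant x status x date.
daily <- aggregate(Dist_Bef_m ~ gpsarc + testpo + LocalDate,
                   data = data, FUN = sum)

# Step 2: mean daily total for each participant x status cell.
res <- with(daily, tapply(Dist_Bef_m, list(ppt = gpsarc, status = testpo), mean))
res   # rows = participants, columns = status states, NA where no records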
Re: [R] Unable to specify order of a factor
Hi Justin, this gives the correct order (8, 16, 32) on my machine:

total.density <-
  c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
str(total.density)

order(levels(total.density))

dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density)))

ggplot(dat, aes(x = v1)) +
  geom_density() +
  facet_wrap(~td)

Does it work for you? If yes, then you need to tell us what you're doing that is different from this example. If no, please give us the output of sessionInfo(). best, Ista

On Wed, Mar 21, 2012 at 11:16 AM, Justin Montemarano wrote:
> I think I understand, but I believe my original interest is in the order of
> levels(total.density), since ggplot appears to be using that to order the
> facets. Thus, I'm still getting three graphs, ordered (and displayed as)
> 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry if
> I wasn't clear and/or I've missed your message.
> -
> Justin Montemarano
> Graduate Student
> Kent State University - Biological Sciences
> http://www.montegraphia.com
>
> [[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred?
On 21-Mar-2012 S Ellison wrote:
>> I get the errors;
>> glm.fit: fitted probabilities numerically 0 or 1 occurred and
>> glm.fit: algorithm did not converge
>> .
>> Is there any way to fix this problem?
>
> There are two separate issues.
> One is the appearance of fitted values at 0 or 1.
> The other is the lack of convergence.
>
> The first is usually not fatal; it means that the probabilities
> are so close to 0 or 1 that a double precision value can't
> distinguish them from 0 or 1.
> Often that's a transient condition during iteration and the
> final fitted values are inside (0,1), but final fitted values
> can also be out there if you have continuous predictor values
> a long way out; by itself, that usually won't stop a glm.
>
> The second is a bit more problematic. Sometimes it's just that
> you need to increase the maximum number of iterations (see the
> control= argument and ?glm.control). That is always worth a try
> - use some absurdly high number like 1000 instead of the default
> 25 and go find some coffee while it thinks about it. If that
> solves your problem then you're OK, or at least as OK as you can
> be with a data set that hard to fit.
> But if you're bootstrapping with some anomalous values it is
> also possible that some of your bootstrapped sets have too high
> a proportion of anomalies, and under those conditions it's
> possible that there could be no sensible optimum within reach.
>
> One way of dealing with that in a bootstrap or other simulation
> context is to check the 'converged' value and if it's FALSE,
> return an NA for your statistic. But of course that is a form
> of censoring; if you have a high proportion of such instances
> you'd be on very thin ice drawing conclusions.
>
> S Ellison

There is a further general issue. The occurrence of

glm.fit: fitted probabilities numerically 0 or 1 occurred
and
glm.fit: algorithm did not converge

is usually a symptom of what is called "linear separation". The simplest instance of this is when there is a single predictor variable X whose values are, say, ordered as X[1] < X[2] < ... < X[n-1] < X[n]. If it should happen that the values Y[1] ... Y[n] of the 0/1 response variable Y are all equal to 0 for X[1],...,X[k] and all equal to 1 for X[k+1],...,X[n], and you are fitting a glm(Y~X,family=binomial) (which is what you use for logistic regression), then the formula being fitted is

log(P(Y=1|X)/P(Y=0|X)) = (X - m)/s

for some values of m and s. In the circumstances described, for any value of m with X[k] < m < X[k+1] the fit can force P[Y=1|X] --> 0 for X = X[1],...,X[k] and can force P[Y=1|X] --> 1 for X = X[k+1],...,X[n] by pushing s --> 0 (so then the first set of (X - m)/s --> -Inf and the second set of (X - m)/s --> +Inf), thus forcing the probabilities predicted by the model towards agreeing exactly with the frequencies observed in the data. In other words, any value of m satisfying X[k] < m < X[k+1] *separates* the Y=0 responses from the Y=1 responses: all the Y=0 responses are for X-values less than m, and all the Y=1 responses are for values of X greater than m. Since such a value of m is not unique (any value between X[k] and X[k+1] will do), failure of convergence will be observed. This can certainly arise from excessively "outlying" data, i.e. the difference X[k+1] - X[k] is much greater than the true value of the logistic scale parameter s, so that it is almost certain that all responses for X <= X[k] will be 0, and all responses for X >= X[k+1] will be 1.
But it can also readily arise when there are only a few data, i.e. n is small; and also, even with many data, if the scale parameter is not large compared with the spacing between X values. If you run the following code a few times, you will see that "separation" occurs with about 30% probability.

n <- 6
X <- (1:n) ; m <- (n+1)/2 ; s <- 2
count <- numeric(100)
for(i in (1:100)) {
  Y <- 1*(runif(n) <= exp((X-m)/s)/(1+exp((X-m)/s)))
  count[i] <- (max(X[Y==0]) < min(X[Y==1]))
}
sum(count)

Similarly, with n <- 10, it occurs about 20% of the time; for n <- 15 about 15%, and it never gets much below 15% no matter how large n is. The more general situation of "linear separation" is where you have many predictor variables, when there can be a high probability that the random outcomes Y=0 label one set of cases, the Y=1 label another set of cases, and the X-vectors for Y=0 can be separated from the X-vectors for Y=1 by a linear form, i.e. there is a vector of coefficients beta such that

X[for any Y==0] %*% beta < X[for any Y==1] %*% beta

This also can arise quite readily in practice. There is no definitive procedure for coping with this situation when it arises. What the fitting procedure is doing is just what it is supposed to do, namely predicting P(Y=1)=0 for every X such that the observed Y=0, and predicting P(Y=1)=1 for every X such that the observed Y=1, whenever it can (i.e. whenever there is separation). So indeed, the warnings signal separation in the data rather than a fault in glm.
[R] Using extract function for dates in sqldf
I'm trying to use sqldf to query for the earliest date of a blood test when patients have had multiple tests in a given year. My query looks like this: test11 <- sqldf("select CHILD_ID, min(SAMP_DATE) from lab group by CHILD_ID having extract (year from SAMP_DATE) = 2011") SAMP_DATE has class "date." I get the error message Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near "from": syntax error) The problem seems to be in the second "from" where the "extract" function is called. Does this need a fix or am I doing something wrong? Thanks in advance and apologies if it turns out a simple error. -M.L. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
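SQLite, the default sqldf backend, has no extract() function, which is what the parser is choking on; its strftime() covers the same ground, and the year filter belongs in WHERE rather than HAVING since it is not an aggregate. A sketch, assuming SAMP_DATE is of class Date, which sqldf passes to SQLite as days since 1970-01-01 (hence the *86400 / 'unixepoch' conversion):

library(sqldf)
test11 <- sqldf("select CHILD_ID, min(SAMP_DATE) as first_test
                 from lab
                 where strftime('%Y', SAMP_DATE * 86400, 'unixepoch') = '2011'
                 group by CHILD_ID")
# min() comes back numeric; convert back to Date on the R side:
test11$first_test <- as.Date(test11$first_test, origin = "1970-01-01")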
Re: [R] Unable to specify order of a factor
I think I understand, but I believe my original interest is in the order of levels(total.density), since ggplot appears to be using that to order the facets. Thus, I'm still getting three graphs, ordered (and displayed as) 16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32. I'm sorry if I wasn't clear and/or I've missed your message. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
Ah, you're missing something crucial: > levels(total.density) [1] "8" "16" "32" is giving you the *labels* of the factor, as *strings*, and what you get if you use order() on them has nothing to do with the order of the factor levels, and everything to do with the string sort order for your locale. > str(levels(total.density)) chr [1:3] "8" "16" "32" The factor levels themselves are in the order you specified. > str(total.density) Ord.factor w/ 3 levels "8"<"16"<"32": 1 1 1 1 1 1 1 1 1 1 ... On Wed, Mar 21, 2012 at 10:50 AM, Justin Montemarano wrote: > Actually I've try that too, Sarah > > The test is to run order(levels(total.density)), which I need to be 1 2 3, > not 2 3 1, and your solution still gives me 2 3 1. > > I also don't know how to reply to this thread with the previous message > below... > - > Justin Montemarano > Graduate Student > Kent State University - Biological Sciences > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC models are not all fitted to the same number of observation
On 21/03/2012 10:56, Patrick Giraudoux wrote:

Hi, Using lme from the package nlme 3.1-103, I meet a strange warning. I am trying to compare two models with:

library(nlme)
lmez6=lme(lepus~vulpes,random=~1|troncon/an,data=ika_z6_test)
lmez60=lme(lepus~1,random=~1|troncon/an,data=ika_z6_test)

Both have the same number of observations and groups:

lmez6
Linear mixed-effects model fit by REML
  Data: ika_z6_test
  Log-restricted-likelihood: -2267.756
  Fixed: lepus ~ vulpes
(Intercept)      vulpes
 1.35017117  0.04722338

Random effects:
 Formula: ~1 | troncon
        (Intercept)
StdDev:   0.8080261

 Formula: ~1 | an %in% troncon
        (Intercept)  Residual
StdDev:    1.086611 0.4440076

Number of Observations: 1350
Number of Groups:
        troncon an %in% troncon
            169            1350

> lmez60
Linear mixed-effects model fit by REML
  Data: ika_z6_test
  Log-restricted-likelihood: -2266.869
  Fixed: lepus ~ 1
(Intercept)
   1.435569

Random effects:
 Formula: ~1 | troncon
        (Intercept)
StdDev:   0.8139646

 Formula: ~1 | an %in% troncon
        (Intercept)  Residual
StdDev:    1.086843 0.4445815

Number of Observations: 1350
Number of Groups:
        troncon an %in% troncon
            169            1350

...but when I want to compare their AIC, I get:

AIC(lmez6,lmez60)
       df      AIC
lmez6   5 4545.511
lmez60  4 4541.737
Warning message:
In AIC.default(lmez6, lmez60) :
  models are not all fitted to the same number of observations

Does anybody have an explanation for this strange warning? To what extent may this warning limit the conclusions that could be drawn from an AIC comparison? Thanks in advance, Patrick

Sorry to go on with the thread I have created, but the trouble I meet is above my level in stats... Actually, not using AIC but an anova approach, I get a more informative message:

anova(lmez6, lmez60)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lmez6      1  5 4545.511 4571.543 -2267.756
lmez60     2  4 4541.737 4562.566 -2266.869 1 vs 2 1.774036  0.1829
Warning message:
In anova.lme(lmez6, lmez60) :
  Fitted objects with different fixed effects. REML comparisons are not meaningful.

And fumbling a bit more, I found that this was an effect of fitting the model using REML. If fitted using ML, things go (apparently) smoothly:

lmez6=lme(lepus~vulpes,random=~1|troncon/an,data=ika_z6_test,method="ML")
> lmez60=lme(lepus~1,random=~1|troncon/an,data=ika_z6_test,method="ML")
> anova(lmez6, lmez60)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lmez6      1  5 4536.406 4562.445 -2263.203
lmez60     2  4 4538.262 4559.093 -2265.131 1 vs 2 3.856102  0.0496
> AIC(lmez6,lmez60)
       df      AIC
lmez6   5 4536.406
lmez60  4 4538.262

Now I have the following problem. What I understood from Pinheiro and Bates's book and some forums is that ML estimates are biased to some extent, tending to underestimate variance parameters. So probably not to be recommended; however, the results look consistent here. Thus, I am lost. The two models look to me clearly nested (one is just a null model with only the intercept to estimate, the other has the intercept + one independent (numeric) variable; both have the same random effects, the same response variable and the same number of observations). The warnings, from this point of view, sound inconsistent. They are probably not, but this is beyond my understanding... Any idea? Patrick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
Actually I've tried that too, Sarah. The test is to run order(levels(total.density)), which I need to be 1 2 3, not 2 3 1, and your solution still gives me 2 3 1. I also don't know how to reply to this thread with the previous message below... - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to specify order of a factor
Is this what you need? > total.density <- + c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) > total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE) > str(total.density) Ord.factor w/ 3 levels "8"<"16"<"32": 1 1 1 1 1 1 1 1 1 1 ... On Wed, Mar 21, 2012 at 10:26 AM, Justin Montemarano wrote: > Hi all: > > I'm attempting to create a faceted plot with ggplot2 and I'm having issues > with a factor's order that is used to define the facet_grid(). > > The factor (named total.density) has three levels - 8, 16, and 32 - and I > would like them presented in that order. Running > order(levels(total.density)) yields the incorrect order of the facet grid - > 2 3 1, corresponding with 16, 32, and 8. > > I have attempted correcting the order with the following solutions (of > course, not run at once): > > #total.density <- relevel(total.density, '8') > #total.density <- as.numeric(levels(total.density)[total.density]) > #total.density <- factor(total.density, levels = c('8','16','32')) > #total.density <- factor(total.density, levels = > levels(total.density)[c(3,1,2)]) > #library(gregmisc) > #total.density <- reorder.factor(total.density, c('8', '16', '32'), > order = T) > > The data are as follows: > > total.density <- > c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) > > I'm running R 2.14.2 with all packages up-to-date as of 21.3.2012. > > Any help would be greatly appreciated. > > - -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Unable to specify order of a factor
Hi all: I'm attempting to create a faceted plot with ggplot2 and I'm having issues with a factor's order that is used to define the facet_grid(). The factor (named total.density) has three levels - 8, 16, and 32 - and I would like them presented in that order. Running order(levels(total.density)) yields the incorrect order of the facet grid - 2 3 1, corresponding with 16, 32, and 8. I have attempted correcting the order with the following solutions (of course, not run at once): #total.density <- relevel(total.density, '8') #total.density <- as.numeric(levels(total.density)[total.density]) #total.density <- factor(total.density, levels = c('8','16','32')) #total.density <- factor(total.density, levels = levels(total.density)[c(3,1,2)]) #library(gregmisc) #total.density <- reorder.factor(total.density, c('8', '16', '32'), order = T) The data are as follows: total.density <- c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32) I'm running R 2.14.2 with all packages up-to-date as of 21.3.2012. Any help would be greatly appreciated. - Justin Montemarano Graduate Student Kent State University - Biological Sciences http://www.montegraphia.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmnet() vs. lars()
On 03/21/2012 06:30 AM, Vito Muggeo (UniPa) wrote: It appears that glmnet(), when "selecting" the covariates entering the model, skips from K covariates, say, to K+2 or K+3. Thus 2 or 3 variables are "added" at the same time and it is not possible to obtain a ranking of the covariates according to their importance in the model. On the other hand lars() "adds" the covariates one at a time. My question is: is it possible to obtain a similar output of lars (in terms of order of the variables entering the model) using glmnet()? glmnet() is based on an iterative coordinate descent algorithm applied to a grid of lambda values; LARS is a more elegant algorithm and computes exact solutions. You can get your glmnet solutions to have higher resolution (more "exact") by using a finer grid. In your example: set.seed(123) x=matrix(rnorm(100*20),100,20) y=rnorm(100) fit1=glmnet(x,y) fit1$df [1] 0 2 4 4 ... The default is a grid of 100 lambda values. If we use 300 values, the resolution is finer and we can see the variables enter one at a time: > fit1=glmnet(x,y,nlambda=300) > fit1$df [1] 0 1 1 2 3 3 4 ... However, it is impossible to know in advance how fine the grid must be in order to ensure that only one variable enters the model between any two consecutive lambda values. -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Best way to compute the difference between two levels of a factor ?
> -Original Message- > Okay, try this: > > result <- with(data, > aggregate(data[,-(1:2)], by=list(ID), FUN=diff)) > > This assumes that the dataframe is sorted as in your example. > If that's not the case, then use order to arrange it first: A caveat: order and sort will return order of factor levels for factors and that is not necessarily alphanumeric order. The default order here is (T1, T2) but that need not be the case if the levels were allocated explicitly. So make sure that the factor _levels_ are in the order you expect! Steve E*** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] resetting console
Hello, I'm still hoping my issue is preventable and not worthy of a bug/crash report, hence my post is in 'help'. Anyway, I'd like to know how to reset the console so it is clear of all residual effects caused by previous scripts. Details: I run a script once and it runs successfully (but very slowly). The script uses fairly sizable data. No problem so far. Then I run the same exact script again and the console eventually crashes. Naturally, I thought it was the size of the data so I:

1) run script successfully (which includes a plot)
2) do this:
   dev.off()
   rm(list=ls(all=TRUE))
   gc()
3) run script again

...and it still crashes. There isn't an R error or anything, I just get a Microsoft error report request window and the console goes away. However, if I:

1) run script successfully
2) shut down, and reopen console
3) run script again

...everything runs as expected and the console does not crash. If the script runs successfully the first time and I'm clearing all available memory (I think), what is 'remaining' that I need to reset in the console (that restarting the console solves/clears out)? PS - Because the script runs successfully, I'm thinking the script itself is not all that important. I'd prefer an answer that indicates generally how to reset the console. Basically I'm loading some data, manipulating the data including a log transformation, doing a gam fit (family="binomial"), and finally I plot the data from the fit. Interestingly, if I set family to "gaussian" or if I do not log transform the data, the console does not crash. Or should I post a crash/bug? Regards, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to do 2SLS in R
Thanks Arne and Achim, will surely use the forum on systemfit at R-Forge for further clarification. Regards, Priya

From: Arne Henningsen
To: Achim Zeileis , priya.s...@zycus.com, r-help@r-project.org
Date: 21-03-2012 18:35
Subject: Re: [R] How to do 2SLS in R

On 21 March 2012 09:32, Achim Zeileis wrote: > On Wed, 21 Mar 2012, priya.s...@zycus.com wrote: >> Hi List >> I want to carry out structural mode. Following Example l have taken from >> Basic Econometrics- Damodar Gujarati : > > Look at the "systemfit" package. For an introduction see the accompanying > JSS paper: http://www.jstatsoft.org/v23/i04/ @Achim: Thanks for referring to the systemfit package and the corresponding JSS paper. @Priya: The command "tsls" (package "sem") can only estimate single-equation models, while systemfit can also estimate multiple-equation models. I hope that the documentation of systemfit as well as the JSS paper are sufficiently clear. If you still have questions regarding systemfit, please use a forum or "tracker" on systemfit's R-Forge site (see http://systemfit.org). Best wishes, Arne -- Arne Henningsen http://www.arne-henningsen.name

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reshaping data from long to wide without a "timevar"
Hello All, Tried some more Internet searches and came to the conclusion that one probably does need to create a "timevar" before reshaping from long to wide. Below is some code that creates the "timevar" and transposes the data.

connection <- textConnection("
005 1 Gemcitabine
005 2 Erlotinib
006 1 Gemcitabine
006 3 Erlotinib
006 2 Paclitaxel
009 1 Gemcitabine
009 2 Erlotinib
010 1 Gemcitabine
010 2 Erlotinib
010 3 Herceptin
")
TestData <- data.frame(scan(connection, list(Subject = 0, RowNo = 0, Drug = "")))
TestData$Drug <- as.character(TestData$Drug)
TestData$Drug <- with(TestData, ifelse(Drug == "Gemcitabine", "  Gemcitabine", Drug))
TestData$Drug <- with(TestData, ifelse(Drug == "Erlotinib", " Erlotinib", Drug))
TestData <- with(TestData, TestData[order(Subject,Drug), c("Subject", "Drug")])
require(reshape)
Qualifying_Regimen <- TestData
Qualifying_Regimen$Ordvar <- with(TestData,
  ave(1:nrow(Qualifying_Regimen), Subject, FUN = seq_along))
Qualifying_Regimen <- reshape(Qualifying_Regimen, direction="wide",
  idvar="Subject", timevar="Ordvar", v.names="Drug")
TestData
Qualifying_Regimen

The code for creating the timevar came from a response to an earlier post by Joshua Wiley, which can be found at: http://tolstoy.newcastle.edu.au/R/e15/help/11/07/0387.html All in all this works pretty well, and making the "timevar" is easy once you know how. If the gods are listening though, it would be nice if reshape could transpose from long to wide based solely on the order in which observations occur in the data. (Assuming of course that it can't do so already.) Time for a small confession. I've referred in my posts to alphabetical sorting of my Drug column and to wanting the transposed columns to be in alphabetical order. In my actual data, I added two spaces before the drug "Gemcitabine" and one space before the drug "Erlotinib" so that these drugs would always come first and second in the sort order. Unfortunately, I neglected to mention I had done this and as a result must have caused people some confusion. My apologies for this oversight. Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to do 2SLS in R
On 21 March 2012 09:32, Achim Zeileis wrote: > On Wed, 21 Mar 2012, priya.s...@zycus.com wrote: >> Hi List >> I want to carry out structural mode. Following Example l have taken from >> Basic Econometrics- Damodar Gujarati : > > Look at the "systemfit" package. For an introduction see the accompanying > JSS paper: http://www.jstatsoft.org/v23/i04/ @Achim: Thanks for referring to the systemfit package and the corresponding JSS paper. @Priya: The command "tsls" (package "sem") can only estimate single-equation models, while systemfit can also estimate multiple-equation models. I hope that the documentation of systemfit as well as the JSS paper are sufficiently clear. If you still have questions regarding systemfit, please use a forum or "tracker" on systemfit's R-Forge site (see http://systemfit.org). Best wishes, Arne -- Arne Henningsen http://www.arne-henningsen.name __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in fitdist- mle failed to estimate parameters
vinod1 gmail.com> writes:

> I am trying fit certain data into Beta distribution. I get the error saying
> "Error in fitdist(discrete_random_variable_c, "beta", start = NULL, fix.arg
> = NULL) : the function mle failed to estimate the parameters, with the
> error code 100"
>
> Below is the sorted data that I am trying to fit. Where am I going wrong.
>
> Thanks a lot for any help.
>
> Vinod

Works for me. In the future, please (1) put the data in a more convenient format (dput() works nicely) and (2) let us know what non-base packages you're using (I had to guess at fitdistrplus).

## x <- scan("betatmp.dat")
## dput(x,"")
x <- c(2.05495e-05, 3.68772e-05, 4.21994e-05, 4.38481e-05, 5.55001e-05,
  5.74267e-05, 6.27489e-05, 6.43976e-05, 6.64938e-05, 7.40247e-05,
  7.60495e-05, 7.90767e-05, 8.07253e-05, 8.60475e-05, 8.70433e-05,
  9.23773e-05, 9.45742e-05, 9.76995e-05, 9.93482e-05, 9.96262e-05,
  0.000101275, 0.000103371, 0.000106597, 0.000108693, 0.000110342,
  0.000110902, 0.000112927, 0.000116224, 0.000118249, 0.000119346,
  0.000119898, 0.000120773, 0.000121994, 0.000122925, 0.000123921,
  0.000131451, 0.000134577, 0.000136225, 0.000136774, 0.000138422,
  0.000139895, 0.000140519, 0.000141322, 0.000142543, 0.000150074,
  0.00015475, 0.000155126, 0.000156223, 0.000156775, 0.00015765,
  0.000161545, 0.000163194, 0.000164621, 0.000165842, 0.00016612,
  0.000166402, 0.000167769, 0.000173373, 0.000173651, 0.000174846,
  0.000176273, 0.000177396, 0.0001782, 0.000179421, 0.000183522,
  0.000183744, 0.000186952, 0.000187267, 0.000192274, 0.000193371,
  0.000197945, 0.000202719, 0.000203267, 0.000207816, 0.00021025,
  0.00021315, 0.00021392, 0.000218694, 0.00022162, 0.000230248,
  0.0002308, 0.000231675, 0.000240145, 0.000244693, 0.000252224,
  0.000266343, 0.000308765, 0.000422837, 0.000429537, 0.000443386)

library(fitdistrplus)
fitdist(x,"beta",start=NULL)

            estimate   Std. Error
shape1       4.28803    0.5101352
shape2   27534.67757 3373.9026660

hist(x,col="gray",freq=FALSE,breaks=20)
curve(dbeta(x,4.288,27534),add=TRUE,col=2)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Best way to compute the difference between two levels of a factor ?
Here's the plyr way I should have thought of earlier:

require(plyr)
ddply(data, "ID", numcolwise(diff))

Still requires your data to be ordered. Peter Ehlers

On 2012-03-21 04:51, Eik Vettorazzi wrote:
Hi Sylvain, assuming your data frame is ordered by ID and TIME, how about this

aggregate(cbind(X,Y)~ID,data, function(x)(x[2]-x[1]))
#or doing this for all but the first 2 columns of data:
aggregate(data[,-(1:2)],by=list(data$ID), function(x)(x[2]-x[1]))

cheers.

On 21.03.2012 09:48, wphantomfr wrote:
Dear R-help Members, I am wondering if anyone can think of the optimal way of computing, for several numeric variables, the difference between 2 levels of a factor. To be clear, let's generate a simple data frame with 2 numeric variables collected for different subjects (ID) and 2 levels of a TIME factor (time of evaluation):

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))

  ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the difference between T2 and T1. Until today I do it by reshaping my dataframe to the wide format (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then compute the difference between successive columns one by one:

data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this way is probably not optimal if the difference has to be computed for a large number of variables. How will you handle it?

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Graphic legend with mathematical symbol, numeric variable and character variable
On 2012-03-20 06:09, "ECOTIÈRE David (Responsable d'activité) - CETE Est/LRPC de Strasbourg/6 Acoustique" wrote:

Hi, I'd like to make a legend with a mix of a mathematical symbol (tau), a numeric variable and character variables. I have tried:

types<-c("Type 1","Type 2","Type 2")
tau<-c(1,3,2)
legend(x="topright",legend=paste(types,"tau=",expression(tau)))

but it doesn't work: the 'tau' symbol is not written in its 'symbol style' but as 'tau'. Any (good) idea? Thank you in advance! David

Does this do what you want:

types <- paste('Type', 1:3, sep='~')
mytau <- 4:6
leg <- parse(text=paste(types, "~~tau==", mytau, sep='~'))
plot(0)
legend('topright', legend=leg)

The '~' symbols generate spaces; use more if you want more spacing. Peter Ehlers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Best way to compute the difference between two levels of a factor ?
Hi Sylvain, assuming your data frame is ordered by ID and TIME, how about this

aggregate(cbind(X,Y)~ID,data, function(x)(x[2]-x[1]))
#or doing this for all but the first 2 columns of data:
aggregate(data[,-(1:2)],by=list(data$ID), function(x)(x[2]-x[1]))

cheers.

On 21.03.2012 09:48, wphantomfr wrote:
> Dear R-help Members,
>
> I am wondering if anyone can think of the optimal way of computing, for
> several numeric variables, the difference between 2 levels of a factor.
>
> To be clear, let's generate a simple data frame with 2 numeric variables
> collected for different subjects (ID) and 2 levels of a TIME factor
> (time of evaluation):
>
> data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))
>
>   ID TIME         X         Y
> 1 AA   T1  9.959540 11.140529
> 2 AA   T2 12.949522  9.896559
> 3 BB   T1  9.039486 13.469104
> 4 BB   T2 10.056392 14.632169
> 5 CC   T1  8.706590 14.939197
> 6 CC   T2 10.799296 10.747609
>
> I want to compute for each subject and each variable (X, Y, ...) the
> difference between T2 and T1.
>
> Until today I do it by reshaping my dataframe to the wide format (the
> columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then compute the
> difference between successive columns one by one:
> data$Xdiff=data$X.T2-data$X.T1
> data$Ydiff=data$Y.T2-data$Y.T1
> ...
>
> but this way is probably not optimal if the difference has to be
> computed for a large number of variables.
>
> How will you handle it?
>
> Thanks in advance
>
> Sylvain Clément

--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790

--
Mandatory disclosures under the German Act on Electronic Commercial and Cooperative Registers and the Company Register (EHUG): Universitätsklinikum Hamburg-Eppendorf; public-law corporation; place of jurisdiction: Hamburg. Board members: Prof. Dr. Guido Sauter (deputy of the chairman), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred?
> I get the errors;
> glm.fit: fitted probabilities numerically 0 or 1 occurred and
> glm.fit: algorithm did not converge
> .
> Is there any way to fix this problem?

There are two separate issues. One is the appearance of fitted values at 0 or 1. The other is the lack of convergence.

The first is usually not fatal; it means that the probabilities are so close to 0 or 1 that a double precision value can't distinguish them from 0 or 1. Often that's a transient condition during iteration and the final fitted values are inside (0,1), but final fitted values can also be out there if you have continuous predictor values a long way out; by itself, that usually won't stop a glm.

The second is a bit more problematic. Sometimes it's just that you need to increase the maximum number of iterations (see the control= argument and ?glm.control). That is always worth a try - use some absurdly high number like 1000 instead of the default 25 and go find some coffee while it thinks about it. If that solves your problem then you're OK, or at least as OK as you can be with a data set that hard to fit. But if you're bootstrapping with some anomalous values it is also possible that some of your bootstrapped sets have too high a proportion of anomalies, and under those conditions it's possible that there could be no sensible optimum within reach.

One way of dealing with that in a bootstrap or other simulation context is to check the 'converged' value and if it's FALSE, return an NA for your statistic. But of course that is a form of censoring; if you have a high proportion of such instances you'd be on very thin ice drawing conclusions.

S Ellison
*** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Best way to compute the difference between two levels of a factor ?
Okay, try this:

result <- with(data, aggregate(data[,-(1:2)], by=list(ID), FUN=diff))

That's it!! I didn't know the "diff" function. Your solution works perfectly. Thanks Peter for this! Sylvain

On 21/03/12 12:01, Peter Ehlers wrote:
On 2012-03-21 03:37, wphantomfr wrote:
Thanks Peter for your fast answer. Your solution is really nice, but if I have say 20 variables I have to write 20 statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]". Does someone have a trick to avoid this? It may not be easily possible.

Okay, try this:

result <- with(data, aggregate(data[,-(1:2)], by=list(ID), FUN=diff))

This assumes that the dataframe is sorted as in your example. If that's not the case, then use order to arrange it first:

data <- with(data, data[order(ID, TIME), ])

Peter Ehlers

On 21/03/12 11:03, Peter Ehlers wrote:
On 2012-03-21 01:48, wphantomfr wrote:
Dear R-help Members, I am wondering if anyone can think of the optimal way of computing, for several numeric variables, the difference between 2 levels of a factor. To be clear, let's generate a simple data frame with 2 numeric variables collected for different subjects (ID) and 2 levels of a TIME factor (time of evaluation):

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))

  ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the difference between T2 and T1. Until today I do it by reshaping my dataframe to the wide format (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then compute the difference between successive columns one by one:

data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this way is probably not optimal if the difference has to be computed for a large number of variables. How will you handle it?

One way is to use the plyr package:

library(plyr)
result <- ddply(data, "ID", summarize,
  DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
  DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

Peter Ehlers

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Best way to compute the difference between two levels of a factor ?
On 2012-03-21 03:37, wphantomfr wrote: Thanks Peter for your fast answer. Yours is really nice, but if I have, say, 20 variables, I have to write 20 statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]". Does someone have a trick to avoid this? It may not be easily possible. Okay, try this: result <- with(data, aggregate(data[,-(1:2)], by=list(ID), FUN=diff)) This assumes that the dataframe is sorted as in your example. If that's not the case, then use order() to arrange it first: data <- with(data, data[order(ID, TIME), ]) Peter Ehlers On 21/03/12 11:03, Peter Ehlers wrote: On 2012-03-21 01:48, wphantomfr wrote: Dear R-help members, I am wondering if anyone can think of the optimal way of computing, for several numeric variables, the difference between 2 levels of a factor. To be clear, let's generate a simple data frame with 2 numeric variables collected for different subjects (ID) and 2 levels of a TIME factor (time of evaluation): data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9)) ID TIME X Y 1 AA T1 9.959540 11.140529 2 AA T2 12.949522 9.896559 3 BB T1 9.039486 13.469104 4 BB T2 10.056392 14.632169 5 CC T1 8.706590 14.939197 6 CC T2 10.799296 10.747609 I want to compute, for each subject and each variable (X, Y, ...), the difference between T2 and T1. Until today I have done it by reshaping my dataframe to the wide format (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then computing the differences between successive columns one by one: data$Xdiff=data$X.T2-data$X.T1 data$Ydiff=data$Y.T2-data$Y.T1 ... but this way is probably not optimal if the difference has to be computed for a large number of variables. How would you handle it? One way is to use the plyr package: library(plyr) result <- ddply(data, "ID", summarize, DIF.X = X[TIME=="T2"] - X[TIME=="T1"], DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"]) Peter Ehlers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Best way to compute the difference between two levels of a factor ?
Thanks Peter for your fast answer. Yours is really nice, but if I have, say, 20 variables, I have to write 20 statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]". Does someone have a trick to avoid this? It may not be easily possible. Regards Sylvain Clément On 21/03/12 11:03, Peter Ehlers wrote: On 2012-03-21 01:48, wphantomfr wrote: Dear R-help members, I am wondering if anyone can think of the optimal way of computing, for several numeric variables, the difference between 2 levels of a factor. To be clear, let's generate a simple data frame with 2 numeric variables collected for different subjects (ID) and 2 levels of a TIME factor (time of evaluation): data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9)) ID TIME X Y 1 AA T1 9.959540 11.140529 2 AA T2 12.949522 9.896559 3 BB T1 9.039486 13.469104 4 BB T2 10.056392 14.632169 5 CC T1 8.706590 14.939197 6 CC T2 10.799296 10.747609 I want to compute, for each subject and each variable (X, Y, ...), the difference between T2 and T1. Until today I have done it by reshaping my dataframe to the wide format (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then computing the differences between successive columns one by one: data$Xdiff=data$X.T2-data$X.T1 data$Ydiff=data$Y.T2-data$Y.T1 ... but this way is probably not optimal if the difference has to be computed for a large number of variables. How would you handle it? One way is to use the plyr package: library(plyr) result <- ddply(data, "ID", summarize, DIF.X = X[TIME=="T2"] - X[TIME=="T1"], DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"]) Peter Ehlers Thanks in advance Sylvain Clément __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
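One hedged possibility for the many-variables case, assuming rows are ordered T1 then T2 within each ID: plyr's numcolwise() lifts a function over every numeric column, so the 20 statements collapse to one.

library(plyr)
result <- ddply(data, "ID", numcolwise(diff))
result  # T2 - T1 for every numeric column, without naming any of them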
[R] glmnet() vs. lars()
Dear all, It appears that glmnet(), when "selecting" the covariates entering the model, skips from K covariates, say, to K+2 or K+3. Thus 2 or 3 variables are "added" at the same time, and it is not possible to obtain a ranking of the covariates according to their importance in the model. On the other hand, lars() "adds" the covariates one at a time. My question is: is it possible to obtain output similar to that of lars() (in terms of the order in which the variables enter the model) using glmnet()? Many thanks, vito #Example (from ?glmnet) set.seed(123) x=matrix(rnorm(100*20),100,20) y=rnorm(100) fit1=glmnet(x,y) fit1$df #no. of covariates entering the model at different lambdas #Thus in the "first" model no covariate is included and in the second #one 2 covariates (V8 and V20) are included at the same time. Because #two variables are included at the same time I do not know which #variable (among the selected ones) is more important. #Everything is fine with lars o<-lars(x,y) o$df #the covariates enter one at a time.. V8 is "better" than V20 -- Vito M.R. Muggeo Dip.to Sc Statist e Matem `Vianelli' Università di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 23895240 fax: 091 485726 http://dssm.unipa.it/vmuggeo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
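Not a full equivalence, but one hedged workaround: glmnet() reports its path only at a discrete grid of lambda values, so two variables that enter between adjacent grid points appear to enter together; a denser grid often separates them.

fit2 <- glmnet(x, y, nlambda = 1000)  # default grid has 100 points
fit2$df  # with a fine grid, df more often increases by one at a time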
Re: [R] anova.lm F test confusion
On Mar 21, 2012, at 09:04, Rolf Turner wrote: > On 21/03/12 20:19, Gerrit Eichner wrote: >> Dear Ben, or anybody else, of course, >> >> I'd be grateful if you could point me to a reference (different from ch. 4 >> "Linear models" in "Statistical Models in S" (Chambers & Hastie (1992))) >> regarding the (asserted F-)distributional properties of the test statistic >> (used, e.g., by anova.lm()) to compare model 1 with model 2 using the MSE of >> model 3 in a sequence of three nested (linear) models? (A short >> RSiteSearch() and a google search didn't lead me far ...) >> >> Thx in advance! > > A good, if somewhat dry, reference on this is "Theory and Application of the > Linear Model" by Franklin A. Graybill, Duxbury, 1976. > > There are of course many, *many* other such books. The whole thing is of course a fairly straightforward consequence of the Fisher-Cochran theorem. This says that, in the absence of systematic effects, the sums of squares in the ANOVA table are proportional to independent chi-square variables. Hence, the ratio of any pair of mean squares or pooled mean squares has an F distribution. This holds for the sort of ANOVA tables that decompose the total sum of squares (i.e., not the drop1 or Type II-IV style of table). The convention of dividing by the "overall error" term rather than successively pooling terms according to model reductions is pervasive both in the statistical literature and in software. This stems from the emphasis on balanced designs in the days before electronic computers: when you know that the sums of squares are independent of the testing order, you rather like to have the same property for the F tests, so that all conclusions can be conveniently read from a single ANOVA table. By and large, it doesn't make a whole lot of difference whether you gain a few denominator DF. It does, however, imply that the F tests with a common denominator are not independent, as opposed to the "proper" successive F tests. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
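A minimal sketch of the setting under discussion, with simulated data (x1, x2, x3 are placeholder predictors): anova() on a sequence of nested lm fits tests each model reduction against the residual mean square of the largest model, the convention described above.

set.seed(42)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d$y <- 1 + 2 * d$x1 + rnorm(50)
fit1 <- lm(y ~ x1, data = d)
fit2 <- lm(y ~ x1 + x2, data = d)
fit3 <- lm(y ~ x1 + x2 + x3, data = d)
anova(fit1, fit2, fit3)  # all F tests share fit3's residual mean square as denominator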
Re: [R] how to calculate seasonal component & cyclic component of time series?
On 21-03-2012, at 08:10, sagarnikam123 wrote: > I am new to time series. From what I know so far, I have uploaded a time series file and want to build an ARMA model, but for that I need the p and q values (orders). > Tell me how to calculate the best p and q values, i.e. those giving the best AIC for the model. > > I am trying this but it gives an error: >> bhavar<-read.table(file.choose()) #taking time series file It's a file of numbers. >> decompose(bhavar$V1) > Error in decompose(bhavar$V1) : time series has no or less than 2 periods > > What is this error telling me? You asked a very similar question 8 days ago. You were given answers. If you want to use HoltWinters or decompose you will have to set the frequency of your time series to something > 1. You were told how to do that. You were also told to have a look at the forecast package. Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
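A hedged sketch of that advice; the file name and the frequency of 12 are assumptions (use the real number of observations per seasonal period):

bhavar <- read.table("bhavar.txt")   # hypothetical file of plain numbers
y <- ts(bhavar$V1, frequency = 12)   # e.g. monthly data: 12 observations per period
dec <- decompose(y)                  # works once the series spans at least 2 periods
plot(dec)

## and for choosing ARMA orders p and q, the forecast package automates the search:
library(forecast)
auto.arima(y)  # searches over (p, d, q) and returns the best fit by information criterion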
Re: [R] Best way to compute the difference between two levels of a factor ?
On 2012-03-21 01:48, wphantomfr wrote: Dear R-help members, I am wondering if anyone can think of the optimal way of computing, for several numeric variables, the difference between 2 levels of a factor. To be clear, let's generate a simple data frame with 2 numeric variables collected for different subjects (ID) and 2 levels of a TIME factor (time of evaluation): data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9)) ID TIME X Y 1 AA T1 9.959540 11.140529 2 AA T2 12.949522 9.896559 3 BB T1 9.039486 13.469104 4 BB T2 10.056392 14.632169 5 CC T1 8.706590 14.939197 6 CC T2 10.799296 10.747609 I want to compute, for each subject and each variable (X, Y, ...), the difference between T2 and T1. Until today I have done it by reshaping my dataframe to the wide format (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then computing the differences between successive columns one by one: data$Xdiff=data$X.T2-data$X.T1 data$Ydiff=data$Y.T2-data$Y.T1 ... but this way is probably not optimal if the difference has to be computed for a large number of variables. How would you handle it? One way is to use the plyr package: library(plyr) result <- ddply(data, "ID", summarize, DIF.X = X[TIME=="T2"] - X[TIME=="T1"], DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"]) Peter Ehlers Thanks in advance Sylvain Clément __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] AIC models are not all fitted to the same number of observations
Hi, Using lme from the package nlme 3.1-103, I am getting a strange warning. I am trying to compare two models with: library(nlme) lmez6=lme(lepus~vulpes,random=~1|troncon/an,data=ika_z6_test) lmez60=lme(lepus~1,random=~1|troncon/an,data=ika_z6_test) Both have the same number of observations and groups: lmez6 Linear mixed-effects model fit by REML Data: ika_z6_test Log-restricted-likelihood: -2267.756 Fixed: lepus ~ vulpes (Intercept) vulpes 1.35017117 0.04722338 Random effects: Formula: ~1 | troncon (Intercept) StdDev: 0.8080261 Formula: ~1 | an %in% troncon (Intercept) Residual StdDev: 1.086611 0.4440076 Number of Observations: 1350 Number of Groups: troncon 169, an %in% troncon 1350 > lmez60 Linear mixed-effects model fit by REML Data: ika_z6_test Log-restricted-likelihood: -2266.869 Fixed: lepus ~ 1 (Intercept) 1.435569 Random effects: Formula: ~1 | troncon (Intercept) StdDev: 0.8139646 Formula: ~1 | an %in% troncon (Intercept) Residual StdDev: 1.086843 0.4445815 Number of Observations: 1350 Number of Groups: troncon 169, an %in% troncon 1350 ...but when I want to compare their AICs, I get: AIC(lmez6,lmez60) df AIC lmez6 5 4545.511 lmez60 4 4541.737 Warning message: In AIC.default(lmez6, lmez60) : models are not all fitted to the same number of observations Does anybody have an explanation for this strange warning? To what extent does this warning limit the conclusions that can be drawn from the AIC comparison? Thanks in advance, Patrick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
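One likely explanation, offered as a sketch rather than a definitive diagnosis: REML log-likelihoods are not comparable between models with different fixed effects (the REML criterion depends on the fixed-effects design matrix), and the effective number of observations AIC() sees differs by the number of fixed-effect coefficients. Refitting with maximum likelihood puts both models on a common footing:

lmez6.ml  <- update(lmez6,  method = "ML")
lmez60.ml <- update(lmez60, method = "ML")
anova(lmez6.ml, lmez60.ml)  # AIC, BIC and a likelihood-ratio test on comparable fits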
Re: [R] Loading Dataset into R continual issue
On 03/21/2012 06:35 AM, bobo wrote: > Thank you. I was able to get it loaded, however when I tried to run > > mod1<-lm(Pat2006~FHouse) > I got > Error in eval(expr, envir, enclos) : object 'Pat2006' not found > > What exactly is occurring here? > > -- > View this message in context: > http://r.789695.n4.nabble.com/Loading-Dataset-into-R-continual-issue-tp4486619p4491424.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. What is probably going wrong is that Pat2006 and FHouse are part of a data.frame, as columns. If the columns are in the same data.frame, say one called df: mod1 <- lm(Pat2006 ~ FHouse, data = df) An alternative is to use attach() to dump the columns as variables in your workspace: > speed Error: object 'speed' not found > attach(cars) > speed [1] 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 [26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25 but I am very much in favor of the first solution using "data =". Using attach() fills up your workspace with a great many objects. Keeping the columns in a data.frame is also better from a design point of view: having them in one data.frame already groups together variables (columns) that share a common background. In addition: PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, *reproducible code*. Good luck, Paul -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
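A tiny self-contained illustration of the "data =" idiom with a built-in data set:

mod <- lm(dist ~ speed, data = cars)  # the columns stay inside the data.frame
summary(mod)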