[R] Uploading Google Spreadsheet data into R
Hello,

I am trying to load data I keep in a Google Spreadsheet into R to perform some analysis. I update the data regularly and need to analyse it as quickly as possible, i.e. without having to publish the data, so I was wondering how to make this piece of code (source: http://www.r-bloggers.com/datagrabbing-commonly-formatted-sheets-from-a-google-spreadsheet-guardian-2014-university-guide-data/) work with my dataset (see https://docs.google.com/spreadsheet/ccc?key=0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE#gid=0):

library(RCurl)

gsqAPI <- function(key, query, gid = 0) {
  tmp <- getURL(
    paste(sep = "", 'https://spreadsheets.google.com/tq?',
          'tqx=out:csv', '&tq=', curlEscape(query),
          '&key=', key, '&gid=', gid),
    ssl.verifypeer = FALSE)
  read.csv(textConnection(tmp), stringsAsFactors = FALSE)
}

handler <- function(key, i) {
  tmp <- gsqAPI(key, "select * where B!=''", i)
  subject <- sub(".Rank", '', colnames(tmp)[1])
  colnames(tmp)[1] <- "Subject.Rank"
  tmp$subject <- subject
  tmp
}

key <- '0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE'
gdata <- handler(key, 0)

The code currently returns the following:

Error in `$<-.data.frame`(`*tmp*`, "subject", value = "COL1") :
  replacement has 1 row, data has 0

Thank you in advance,
Luca

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
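The error message means the query came back with a data frame of zero rows, so assigning a length-one subject column fails. A minimal sketch of a guarded handler (add_subject and the stand-in data frame are hypothetical; a likely root cause is that the query "select * where B!=''" matches nothing for this particular sheet):

```r
## Guard against an empty query result before adding the subject column.
## `tmp` stands in for the data frame returned by gsqAPI(); the guard avoids
## the "replacement has 1 row, data has 0" error and points at the real problem.
add_subject <- function(tmp) {
  if (nrow(tmp) == 0) {
    warning("query returned no rows; check the query and the gid")
    return(tmp)
  }
  subject <- sub(".Rank", "", colnames(tmp)[1])
  colnames(tmp)[1] <- "Subject.Rank"
  tmp$subject <- subject
  tmp
}

## usage with a stand-in result
res <- add_subject(data.frame(Maths.Rank = 1:2, Score = c(90, 85)))
```

With a non-empty result the subject column is filled as in the original handler; with an empty result you get a warning instead of an error.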
Re: [R] deSolve, unresolved namespace error -- solved
We have been able to reproduce the reported issue on another Linux system (Fedora 19), and the solution was quite simple: the deSolve package must always be loaded *before* loading the shared library of the compiled model.

Thomas
Re: [R] Underdispersion and count data
On Thu, 7 Nov 2013, sv.j...@yahoo.ca wrote:

> Hello, I have count data for 4 groups, 2 of which have a large number of
> zeroes and are overdispersed, and the other 2 underdispersed with no zeroes.

Are you sure that it's really underdispersion in addition to the lack of zeros? It could also be that, due to the missing zeros, there is less dispersion.

> I have two questions about model fitting, which I am quite new to, and have
> been using mostly the pscl package.
> 1 - How do I deal with underdispersion? Almost all the published and online
> advice is regarding overdispersion, and neither the Poisson nor the negative
> binomial distribution seems appropriate. The COM-Poisson comes up sometimes
> as a suggestion, but it's not clear to me how I can use this, explain my
> choice of it, or what information I would report for publication purposes.

There are (at least) two packages on CRAN which support this: compoisson and ComPoissonReg. However, I would first check whether this is really needed, or whether a zero-truncated Poisson model is already sufficient. The package "countreg" on R-Forge (https://R-Forge.R-project.org/R/?group_id=522) has a function zerotrunc() which is essentially the same code that hurdle() in "pscl" uses, so it should be easy for you to use.

> 2 - For the overdispersed data with lots of zeroes, I've tried zero-inflated
> Poisson and NegBin and hurdle models, and used the Vuong test to compare.
> However, I get equal fit for two candidate models that produce quite
> different coefficient estimates for my predictor variables, and hence
> different p values. I am unsure how to proceed in choosing one of these
> models, and how I would justify one over the other given that the Vuong test
> seems not to discriminate.

Is it just zero-inflated vs. hurdle, or are there also differences in the regressors? If the former: zero-inflated and hurdle models are parametrized differently but often lead to similar fits. The former has a count part plus a zero-inflation part, whereas the latter has a zero-truncated count part and a zero hurdle. If the regressors are different, then it's probably a subject-matter decision.

hth,
Z

> Thank you and any advice would be much appreciated.
> Mo
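Z's suggestion of a zero-truncated Poisson can be illustrated directly from its definition: the pmf is a Poisson pmf conditioned on the count being positive. A minimal base-R sketch (dztpois is a hypothetical helper for illustration, not the countreg/pscl implementation):

```r
## Zero-truncated Poisson pmf:
##   P(X = k | X > 0) = dpois(k, lambda) / (1 - P(X = 0))
dztpois <- function(k, lambda) {
  stopifnot(all(k >= 1))
  dpois(k, lambda) / (1 - dpois(0, lambda))
}

## the truncated probabilities renormalise to 1 over the positive support
p <- dztpois(1:50, lambda = 2)
```

Fitting such a model to the zero-free groups (rather than forcing a distribution that expects zeros) is the check Z recommends before reaching for COM-Poisson.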
[R] help plotting two-dimensional ADMIXTURE data in R
Hello,

I am trying to plot ADMIXTURE output in R. My output is two-dimensional (two columns, 190 rows). The rows represent individuals, and groups of them (e.g., 1-12) represent populations. What I'm hoping for is a plot with eleven different colours, basically like a STRUCTURE plot.

# This code works, but plots with only two colours, not separating the populations.
# To show you how I'm reading in my data:
tbl <- read.table("filepath to file", sep = " ")

# plotting
barplot(t(as.matrix(tbl)), col = rainbow(4),
        xlab = "Individual #", ylab = "Ancestry", border = NA)

# I have tried to amend this with the following code:
barplot(t(as.matrix(tbl[c(0:12, 13:32, 33:51, 52:69, 70:87, 88:104, 105:118,
                          119:137, 138:157, 158:172, 173:190), ])),
        col = c("darkgreen", "darkblue", "orange", "cyan", "yellow",
                "chartreuse", "blue", "cyan4", "red", "violet", "blueviolet"),
        xlab = "Individual #", ylab = "Ancestry", border = NA)
# but this code really still says the same thing as the first, i.e., c(1:190)

# Alternatively, this code separates the colours:
barplot(t(as.matrix(tbl))[1],
        col = c("darkgreen", "darkblue", "orange", "cyan", "yellow",
                "chartreuse", "blue", "cyan4", "red", "violet", "blueviolet")[
          c(rep(1, length(1:12)), rep(2, length(13:32)), rep(3, length(33:51)),
            rep(4, length(52:69)), rep(5, length(70:87)), rep(6, length(88:104)),
            rep(7, length(105:118)), rep(8, length(119:137)),
            rep(9, length(138:157)), rep(10, length(158:172)),
            rep(11, length(173:190)))],
        xlab = "Individual #", ylab = "Ancestry", border = NA)
# But this only plots in one dimension.

I've tried so many permutations of the concepts in each of these, and I just can't get it. I've tried available code on the net, but I think my R skills aren't advanced enough to sort out their tricks. Any suggestions would be much appreciated.

Sincerely,
Ella

--
Ella Bowles
PhD Candidate
Biological Sciences
University of Calgary
e-mail: ebow...@ucalgary.ca, bowl...@gmail.com
website: http://ellabowlesphd.wordpress.com/
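One hedged sketch of the per-population colouring: with K = 2, the two ancestry columns sum to one per individual, so plotting just the first column, one bar per individual, and repeating each colour over its population's size gives a STRUCTURE-like plot with eleven colours. Here q1 is simulated stand-in data; the population sizes are taken from the index ranges in the post:

```r
set.seed(1)
q1 <- runif(190)                 # stand-in for tbl[, 1] (first ancestry column)

## population sizes implied by the ranges 1:12, 13:32, ..., 173:190
pop.sizes <- c(12, 20, 19, 18, 18, 17, 14, 19, 20, 15, 18)
pal <- c("darkgreen", "darkblue", "orange", "cyan", "yellow", "chartreuse",
         "blue", "cyan4", "red", "violet", "blueviolet")

## one colour per individual, repeated population by population
cols <- rep(pal, times = pop.sizes)

pdf(tempfile(fileext = ".pdf"))
barplot(q1, col = cols, border = NA, xlab = "Individual #", ylab = "Ancestry")
invisible(dev.off())
```

The key difference from the third attempt in the post is that the full height vector (all 190 individuals) is plotted, with `col` recycled per bar rather than per ancestry component.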
[R] Underdispersion and count data
Hello,

I have count data for 4 groups, 2 of which have a large number of zeroes and are overdispersed, and the other 2 of which are underdispersed with no zeroes. I have two questions about model fitting, which I am quite new to; I have been using mostly the pscl package.

1 - How do I deal with underdispersion? Almost all the published and online advice is about overdispersion, and neither the Poisson nor the negative binomial distribution seems appropriate. The COM-Poisson comes up sometimes as a suggestion, but it's not clear to me how I can use it, explain my choice of it, or what information I would report for publication purposes.

2 - For the overdispersed data with lots of zeroes, I've tried zero-inflated Poisson and NegBin and hurdle models, and used the Vuong test to compare them. However, I get equal fit for two candidate models that produce quite different coefficient estimates for my predictor variables, and hence different p values. I am unsure how to proceed in choosing one of these models, and how I would justify one over the other given that the Vuong test seems not to discriminate.

Thank you, and any advice would be much appreciated.

Mo
Re: [R] Finding absolute viewport location in grid / lattice
Hi

Take a look at current.transform() (the grid.locator() function shows an example use).

Paul

On 11/06/13 13:35, James Price wrote:

> I'm trying to do some post-plot manipulation of some lattice graphics, in
> which I need to get the absolute viewport locations on the plotting device.
> So for example:
>
> library(lattice)
> print(xyplot(Petal.Length ~ Sepal.Length | Species, iris, layout = c(2, 2)))
> trellis.focus('panel', 1, 1)
>
> This shows the item I want to identify. I need to know, for example, that
> the lower left corner of the highlighted panel is at (0.1, 0.1) of the
> plotting device. Hopefully there's some grid shenanigans I can use to do so.
>
> To put the question into context (in case there's an alternative way): I'm
> constructing PDF reports that are an amalgam of PDF figures programmatically
> constructed in R and assembled through LaTeX, and I'd like to be able to add
> tooltips to some of the data points on the graphs. To do so, I'm overlaying
> an (empty) tikz image that uses the pdfcomment package to add the tooltips.
> Hence I need to be able to calculate where some of the points are on the
> page so I can calculate where to put the tooltip 'hit-zones'. This would be
> fairly trivial using javascript / actionscript and mouseover hooks, but I'm
> constrained by PDF.
>
> Thanks,
> Jim Price.
> Strength in Numbers.

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
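A minimal sketch of Paul's suggestion (the device size and viewport here are illustrative assumptions): current.transform() returns a 3x3 matrix that maps a homogeneous location (x, y, 1), with x and y in inches within the current viewport, to inches on the device; dividing by the device dimensions then gives device fractions.

```r
library(grid)

## a 7 x 7 inch device with a centered half-size viewport, for illustration
pdf(tempfile(fileext = ".pdf"), width = 7, height = 7)
grid.newpage()
pushViewport(viewport(x = 0.5, y = 0.5, width = 0.5, height = 0.5))

## transform from inches-within-viewport to inches-on-device
tm <- current.transform()
corner <- c(0, 0, 1) %*% tm          # device location of the viewport's lower-left
corner.in <- corner[1:2] / corner[3] # drop the homogeneous coordinate
corner.frac <- corner.in / 7         # as a fraction of the 7-inch device
invisible(dev.off())
```

For the lattice case, the same call can be made after trellis.focus('panel', 1, 1), when the panel viewport is current.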
[R] Log-logistic 4-parameter model: fitting multiple curves gives NaN standard errors for specific estimates; when fit alone, all parameters are estimated
Hello,

I am fitting curves in drc to a log-logistic 4-parameter model. My experiment looks at fall and spring applications of a herbicide on wild oat at 5 different locations. A preliminary mixed-model ANOVA showed that timing was significant, so I am running a dose-response curve for each application timing at each location. However, when I run them all together, my one fall location gets estimates for b, c, d and e, but the standard errors for all of them are NaN. When I run only the fall location on its own, there is no problem getting standard errors for all my parameters. What is the difference between running it alone and running it with the other curves? How can I fix the NaN standard errors in the multiple-curve fit, as I need to be able to compare ED50s?

The data for the curve alone are below. When I run it with the rest of the curves, they are separated by biotype. The dependent variable is square-root biomass (srbiomass) and treatment (Trt) is the independent variable.

Location Timing biotypeno Trt Rep Trtno biomass srbiomass
Kinsella Fall 5 0 1 1 21.2 4.604346
Kinsella Fall 5 0 2 1 168.72 12.98923
Kinsella Fall 5 0 3 1 86.76 9.314505
Kinsella Fall 5 0 4 1 14.3 3.781534
Kinsella Fall 5 50 1 2 12.7 3.563706
Kinsella Fall 5 50 2 2 46.44 6.81469
Kinsella Fall 5 50 3 2 9.84 3.136877
Kinsella Fall 5 50 4 2 32.38 5.690343
Kinsella Fall 5 100 1 3 10.3 3.209361
Kinsella Fall 5 100 2 3 41.84 6.468385
Kinsella Fall 5 100 3 3 0 0
Kinsella Fall 5 100 4 3 4.32 2.078461
Kinsella Fall 5 150 1 4 3.84 1.959592
Kinsella Fall 5 150 2 4 9.34 3.056141
Kinsella Fall 5 150 3 4 25.9 5.089204
Kinsella Fall 5 150 4 4 1.46 1.208305
Kinsella Fall 5 200 1 5 6.6 2.569047
Kinsella Fall 5 200 2 5 0 0
Kinsella Fall 5 200 3 5 72.94 8.540492
Kinsella Fall 5 200 4 5 24.56 4.955805
Kinsella Fall 5 300 1 6 6.36 2.521904
Kinsella Fall 5 300 2 6 3.6 1.897367
Kinsella Fall 5 300 3 6 31.28 5.592853
Kinsella Fall 5 300 4 6 1.46 1.208305
Kinsella Fall 5 400 1 7 2.02 1.421267
Kinsella Fall 5 400 2 7 11.2 3.34664
Kinsella Fall 5 400 3 7 3.02 1.737815
Kinsella Fall 5 400 4 7 0 0

Thanks in advance for any help.

Breanne
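For reference, the four-parameter log-logistic mean function, as commonly written for drc's LL.4(), can be sketched in base R (parameter names b, c, d, e follow drc's convention; this is an illustrative evaluation, not the drc fitting code). When such a curve is fitted jointly with others, barely identifiable parameters for one curve can make the joint Hessian effectively singular, and that tends to show up as NaN standard errors even though point estimates are returned.

```r
## Four-parameter log-logistic mean function (drc LL.4 convention):
##   f(x) = c + (d - c) / (1 + exp(b * (log(x) - log(e))))
## b: slope, c: lower limit, d: upper limit, e: ED50
ll4 <- function(x, b, c, d, e) {
  c + (d - c) / (1 + exp(b * (log(x) - log(e))))
}

## at x = e (the ED50) the response is halfway between the limits
mid <- ll4(100, b = 1, c = 0, d = 10, e = 100)
## as x -> 0 with b > 0, the response approaches the upper limit d
low <- ll4(1e-12, b = 1, c = 0, d = 10, e = 100)
```

One thing worth checking in the joint fit is whether the lower limit c is really estimable for the problem curve; flat dose-response data (as at this location, where biomass barely declines with dose) is a common cause.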
Re: [R] Sorting Data Frames in R by multiple columns with a custom order
If you already have the order stored in a list or so, for example:

dat1 <- as.data.frame(mat, stringsAsFactors = FALSE)
lst1 <- list(c("OF", "ON"),
             c("US", "UK", "WW", "BO", "BR", "CA"),
             c("P2", "P3", "P1"),
             c("S2", "S1", "S3"))
dat1[] <- lapply(seq_along(lst1), function(i) factor(dat1[, i], levels = lst1[[i]]))
dat1[with(dat1, order(c4, c1, c2, c3)), ]

A.K.

On Thursday, November 7, 2013 5:47 PM, arun wrote:

Hi,

Not sure whether this helps:

dat1 <- as.data.frame(mat, stringsAsFactors = FALSE)
dat1$c4 <- factor(dat1$c4, levels = c("OF", "ON"))
dat1$c1 <- factor(dat1$c1, levels = c("US", "UK", "WW", "BO", "BR", "CA"))
dat1$c2 <- factor(dat1$c2, levels = c("P2", "P3", "P1"))
dat1$c3 <- factor(dat1$c3, levels = c("S2", "S1", "S3"))
dat1[with(dat1, order(c4, c1, c2, c3)), ]

A.K.

Thank you guys for the help here, and my apologies if this has been answered in the posts already somewhere which I just was not able to find. I am trying to sort a data frame by multiple columns. Each column has different values of interest by which I am trying to sort. I have been able to do this alphabetically, in both ascending and descending order, using the order function. However, for the document I am trying to create it is crucial that the order is not alphabetical; in fact it is a specific ordering, on each of the columns, which I need to specify. I could do this via a nasty convolution of creating sort-order variables for each of the columns, merging them onto the data frame by the cols, ordering on the new sort-order cols, and then removing them; however, I was hoping that there would be a nice and easy automated way to handle this in case the order changes at another time.

So, here's an example:

c1 = c('CA', 'CA', 'CA', 'BR', 'BR', 'UK', 'UK', 'BO', 'BO', 'BO', 'WW', 'WW', 'WW', 'US', 'US')
c2 = c('P3', 'P2', 'P1', 'P1', 'P2', 'P2', 'P1', 'P1', 'P3', 'P2', 'P1', 'P2', 'P3', 'P3', 'P2')
c3 = c('S1', 'S1', 'S2', 'S2', 'S2', 'S1', 'S2', 'S1', 'S1', 'S1', 'S1', 'S2', 'S3', 'S1', 'S1')
c4 = c('ON', 'ON', 'OF', 'ON', 'OF', 'OF', 'OF', 'ON', 'ON', 'ON', 'OF', 'ON', 'ON', 'ON', 'ON')
mat = cbind(c4, c1, c2, c3)

If we sort as usual we'd get:

"OF" "BR" "P2" "S2"
"OF" "CA" "P1" "S2"
"OF" "UK" "P1" "S2"
"OF" "UK" "P2" "S1"
"OF" "WW" "P1" "S1"
"ON" "BO" "P1" "S1"
"ON" "BO" "P2" "S1"
"ON" "BO" "P3" "S1"
"ON" "BR" "P1" "S2"
"ON" "CA" "P2" "S1"
"ON" "CA" "P3" "S1"
"ON" "US" "P2" "S1"
"ON" "US" "P3" "S1"
"ON" "WW" "P2" "S2"
"ON" "WW" "P3" "S3"

However, I want OF in col 1 to come first; then in col 2 I want US, UK, WW, BO, BR, and then CA; then in col 3 we need P2, then P3, then P1; and finally in col 4 I need S2, then S1, and then finally S3. As such:

"OF" "UK" "P2" "S1"
"OF" "UK" "P1" "S2"
"OF" "WW" "P1" "S1"
"OF" "BR" "P2" "S2"
"OF" "CA" "P1" "S2"
"ON" "US" "P2" "S1"
"ON" "US" "P3" "S1"
"ON" "WW" "P2" "S2"
"ON" "WW" "P3" "S3"
"ON" "BO" "P2" "S1"
"ON" "BO" "P3" "S1"
"ON" "BO" "P1" "S1"
"ON" "BR" "P1" "S2"
"ON" "CA" "P2" "S1"
"ON" "CA" "P3" "S1"

I've tried nesting orders in orders; the match function looks like it might work, if it weren't for the fact that in my actual data each col can have multiple records for each value, and match only looks for the first matching case. That said, the sort order is unique. Furthermore, these are not the real data, just an example. Anything that might get me further along the way than I am now would help.

Thanks
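The factor-levels approach can be packaged as a self-contained example with the data from the post; order() then sorts by factor level order rather than alphabetically:

```r
## custom multi-column sort via factor levels, using the example data
c1 <- c('CA','CA','CA','BR','BR','UK','UK','BO','BO','BO','WW','WW','WW','US','US')
c2 <- c('P3','P2','P1','P1','P2','P2','P1','P1','P3','P2','P1','P2','P3','P3','P2')
c3 <- c('S1','S1','S2','S2','S2','S1','S2','S1','S1','S1','S1','S2','S3','S1','S1')
c4 <- c('ON','ON','OF','ON','OF','OF','OF','ON','ON','ON','OF','ON','ON','ON','ON')
dat1 <- data.frame(c4, c1, c2, c3, stringsAsFactors = FALSE)

## one set of levels per column, in the desired sort order
lvls <- list(c4 = c("OF", "ON"),
             c1 = c("US", "UK", "WW", "BO", "BR", "CA"),
             c2 = c("P2", "P3", "P1"),
             c3 = c("S2", "S1", "S3"))
for (nm in names(lvls)) dat1[[nm]] <- factor(dat1[[nm]], levels = lvls[[nm]])

sorted <- dat1[with(dat1, order(c4, c1, c2, c3)), ]
```

If the order changes later, only the lvls list needs editing.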
[R] Earth (MARS) package with categorical predictors
It appears to be legitimate to include multi-level categorical and continuous variables in defining the model for earth (e.g. y ~ cat + cont1 + cont2), but is it also possible to use categoricals in the predict method with the earth result? I tried, but it returns an error which is not very informative.

Thanks
Chris
Re: [R] Sorting Data Frames in R by multiple columns with a custom order
Hi,

Not sure whether this helps:

dat1 <- as.data.frame(mat, stringsAsFactors = FALSE)
dat1$c4 <- factor(dat1$c4, levels = c("OF", "ON"))
dat1$c1 <- factor(dat1$c1, levels = c("US", "UK", "WW", "BO", "BR", "CA"))
dat1$c2 <- factor(dat1$c2, levels = c("P2", "P3", "P1"))
dat1$c3 <- factor(dat1$c3, levels = c("S2", "S1", "S3"))
dat1[with(dat1, order(c4, c1, c2, c3)), ]

A.K.

Thank you guys for the help here, and my apologies if this has been answered in the posts already somewhere which I just was not able to find. I am trying to sort a data frame by multiple columns. Each column has different values of interest by which I am trying to sort. I have been able to do this alphabetically, in both ascending and descending order, using the order function. However, for the document I am trying to create it is crucial that the order is not alphabetical; in fact it is a specific ordering, on each of the columns, which I need to specify. I could do this via a nasty convolution of creating sort-order variables for each of the columns, merging them onto the data frame by the cols, ordering on the new sort-order cols, and then removing them; however, I was hoping that there would be a nice and easy automated way to handle this in case the order changes at another time.

So, here's an example:

c1 = c('CA', 'CA', 'CA', 'BR', 'BR', 'UK', 'UK', 'BO', 'BO', 'BO', 'WW', 'WW', 'WW', 'US', 'US')
c2 = c('P3', 'P2', 'P1', 'P1', 'P2', 'P2', 'P1', 'P1', 'P3', 'P2', 'P1', 'P2', 'P3', 'P3', 'P2')
c3 = c('S1', 'S1', 'S2', 'S2', 'S2', 'S1', 'S2', 'S1', 'S1', 'S1', 'S1', 'S2', 'S3', 'S1', 'S1')
c4 = c('ON', 'ON', 'OF', 'ON', 'OF', 'OF', 'OF', 'ON', 'ON', 'ON', 'OF', 'ON', 'ON', 'ON', 'ON')
mat = cbind(c4, c1, c2, c3)

If we sort as usual we'd get:

"OF" "BR" "P2" "S2"
"OF" "CA" "P1" "S2"
"OF" "UK" "P1" "S2"
"OF" "UK" "P2" "S1"
"OF" "WW" "P1" "S1"
"ON" "BO" "P1" "S1"
"ON" "BO" "P2" "S1"
"ON" "BO" "P3" "S1"
"ON" "BR" "P1" "S2"
"ON" "CA" "P2" "S1"
"ON" "CA" "P3" "S1"
"ON" "US" "P2" "S1"
"ON" "US" "P3" "S1"
"ON" "WW" "P2" "S2"
"ON" "WW" "P3" "S3"

However, I want OF in col 1 to come first; then in col 2 I want US, UK, WW, BO, BR, and then CA; then in col 3 we need P2, then P3, then P1; and finally in col 4 I need S2, then S1, and then finally S3. As such:

"OF" "UK" "P2" "S1"
"OF" "UK" "P1" "S2"
"OF" "WW" "P1" "S1"
"OF" "BR" "P2" "S2"
"OF" "CA" "P1" "S2"
"ON" "US" "P2" "S1"
"ON" "US" "P3" "S1"
"ON" "WW" "P2" "S2"
"ON" "WW" "P3" "S3"
"ON" "BO" "P2" "S1"
"ON" "BO" "P3" "S1"
"ON" "BO" "P1" "S1"
"ON" "BR" "P1" "S2"
"ON" "CA" "P2" "S1"
"ON" "CA" "P3" "S1"

I've tried nesting orders in orders; the match function looks like it might work, if it weren't for the fact that in my actual data each col can have multiple records for each value, and match only looks for the first matching case. That said, the sort order is unique. Furthermore, these are not the real data, just an example. Anything that might get me further along the way than I am now would help.

Thanks
Re: [R] Merging two dataframes with a condition involving variables of both dataframes
You might need to implement it as a nested pair of for loops using rbind. In essence, iterate over the rows in df1 and each time look for the matching row in df2. If none is found, add the df1 row by itself to the result. If one is found, remove it from df2 and rbind the two rows together. Once done, just merge in all rows that remain in df2. This would likely be slower than an SQL-based method, but it is essentially the same algorithm. You can find advice on for loops in R here: http://paleocave.sciencesortof.com/2013/03/writing-a-for-loop-in-r/

Best,
Collin.
Re: [R] Problems loading xlsx
On Nov 7, 2013, at 12:55 PM, Barrett Gady wrote:

> does anyone know how to resolve Error: .onLoad failed in loadNamespace() for
> 'xlsxjars'
>
> I can't seem to get this package to load on my Windows 7 install of R 3.0.2

It appears that you did not install all the dependencies for package xlsx.

http://cran.r-project.org/web/packages/xlsx/index.html

--
David Winsemius
Alameda, CA, USA
Re: [R] prod and F90 product
On 13-11-07 6:39 PM, Filippo wrote:

> Hi,
> I'm having strange differences between the R function prod and the F90
> function product. Processing the same vector C (see attachment), I get 2
> different results:
>
> prod(C) = 1.069678e-307
> testProduct(C) = 0
>
> where testProd is the following wrapping function:
>
> testProd <- function(x) {
>   return(.Fortran('testProd', as.double(x), as.double(0), as.double(0),
>                   as.integer(length(x))))
> }
>
> subroutine testProd(x, p, q, n)
>   implicit none
>   integer, intent (in) :: n
>   double precision, intent (in) :: x(n)
>   double precision, intent (out) :: p
>   double precision, intent (out) :: q
>   integer :: i
>   p = product(x)
>   q = 1
>   do i = 1, n
>     q = q*x(i)
>   end do
> end subroutine testProd
>
> I checked the smallest representable number and it seems to be the same for
> both R and F90. Can anyone help me understand this behaviour?

Some intermediate results may be stored with 80-bit precision in R, 64-bit precision in Fortran.

Duncan Murdoch
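Duncan's point concerns where intermediate products live: x87 extended-precision registers hold 80-bit values, so a running product can briefly dip below the smallest normalised double (about 2.2e-308) and recover if it stays in a register, while a version that writes each intermediate back to a 64-bit double underflows to 0. A cruder base-R demonstration of the underflow itself:

```r
## the product here is 1e-400, far below the double range, so it
## underflows to exactly 0 once stored as a 64-bit double
x <- rep(1e-200, 2)
prod_underflow <- x[1] * x[2]

## the reported prod(C) of 1.069678e-307 sits just below the smallest
## normalised double, i.e. in the gradual-underflow (denormal) range
smallest <- .Machine$double.xmin
```

Whether the two code paths agree therefore depends on compiler flags and evaluation order, not on a bug in either prod() or product().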
Re: [R] prod and F90 product
Fortune!

Bert

Sent from my iPhone -- please excuse typos.

> On Nov 7, 2013, at 3:48 PM, Rolf Turner wrote:
>
>> On 11/08/13 12:39, Filippo wrote:
>> Hi,
>> I'm having strange differences between the R function prod and the F90
>> function product. Processing the same vector C (see attachment), I get 2
>> different results:
>> prod(C) = 1.069678e-307
>> testProduct(C) = 0
>
> If you are worried about the difference between 0 and 1.069678e-307 then
> you probably shouldn't be using computers.
>
> cheers,
>
> Rolf Turner
[R] Merging two dataframes with a condition involving variables of both dataframes
Dear all,

I would like to merge two dataframes using two conditions. For example, if I have the dataframes df1 and df2:

> (df1 <- data.frame(var1=c("a","b","d","e","g"), var2=c(25,14,53,26,84), infodf1=c(1,1,1,1,1)))
  var1 var2 infodf1
1    a   25       1
2    b   14       1
3    d   53       1
4    e   26       1
5    g   84       1

> (df2 <- data.frame(var1=c("a","a","c","d","e","h","i"), var3=c(10,32,14,55,2,53,6), var4=c(40,37,54,70,30,98,10), infodf2=c(2,2,2,2,2,2,2)))
  var1 var3 var4 infodf2
1    a   10   40       2
2    a   32   37       2
3    c   14   54       2
4    d   55   70       2
5    e    2   30       2
6    h   53   98       2
7    i    6   10       2

I would like to obtain a new dataframe df3 merging df1 and df2 if var1 (of df1) == var1 (of df2) and if var3 (of df2) <= var2 (of df1) <= var4 (of df2). Moreover, I would like the new data frame to contain all rows from df1 and df2 (i.e. a full outer join: http://www.w3schools.com/sql/sql_join_full.asp). df3 should be:

   var1 var2 infodf1 var1 var3 var4 infodf2
1     a   25       1    a   10   40       2
2     e   26       1    e    2   30       2
3     b   14       1   NA   NA   NA      NA
4     d   53       1   NA   NA   NA      NA
5     g   84       1   NA   NA   NA      NA
6    NA   NA      NA    a   32   37       2
7    NA   NA      NA    c   14   54       2
8    NA   NA      NA    d   55   70       2
9    NA   NA      NA    h   53   98       2
10   NA   NA      NA    i    6   10       2

I cannot use "merge" because this function doesn't allow conditions. On the other hand, I have tried SQL via the package sqldf with the following R syntax:

> (df3 <- sqldf("select a.*, b.* FROM df1 a, df2 b
                 WHERE a.var1 = b.var1 AND b.var3 <= a.var2 AND a.var2 <= b.var4"))
  var1 var2 infodf1 var1 var3 var4 infodf2
1    a   25       1    a   10   40       2
2    e   26       1    e    2   30       2

But sqldf doesn't support "full outer join":

> (df3 <- sqldf("select a.*, b.* FROM df1 a full outer join df2 b
                 WHERE a.var1 = b.var1 AND b.var3 <= a.var2 AND a.var2 <= b.var4"))
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: RIGHT and FULL OUTER JOINs are not currently supported)

Does anyone know how I can solve my problem?

Thank you very much!

Marc
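Since sqldf's SQLite backend lacks FULL OUTER JOIN, one workaround is to build the conditional inner join in base R and then append the unmatched rows of each side padded with NAs. A sketch using df1 and df2 from the post (the column layout differs slightly from the requested df3, which keeps two separate var1 columns; here merge() collapses them into one):

```r
df1 <- data.frame(var1 = c("a","b","d","e","g"),
                  var2 = c(25, 14, 53, 26, 84), infodf1 = 1,
                  stringsAsFactors = FALSE)
df2 <- data.frame(var1 = c("a","a","c","d","e","h","i"),
                  var3 = c(10, 32, 14, 55, 2, 53, 6),
                  var4 = c(40, 37, 54, 70, 30, 98, 10), infodf2 = 2,
                  stringsAsFactors = FALSE)

## row ids let us find unmatched rows afterwards
df1$id1 <- seq_len(nrow(df1))
df2$id2 <- seq_len(nrow(df2))

## equi-join on var1, then filter on the range condition
cross <- merge(df1, df2, by = "var1")
inner <- cross[cross$var3 <= cross$var2 & cross$var2 <= cross$var4, ]

## rows of each side with no partner, padded with NA columns
only1 <- df1[!df1$id1 %in% inner$id1, ]
only2 <- df2[!df2$id2 %in% inner$id2, ]
only1[setdiff(names(inner), names(only1))] <- NA
only2[setdiff(names(inner), names(only2))] <- NA

df3 <- rbind(inner, only1[names(inner)], only2[names(inner)])
```

With the example data this yields the 2 matched rows plus 3 df1-only and 5 df2-only rows, i.e. the 10-row full outer join requested.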
Re: [R] prod and F90 product
On 11/08/13 12:39, Filippo wrote:

> Hi,
> I'm having strange differences between the R function prod and the F90
> function product. Processing the same vector C (see attachment), I get 2
> different results:
> prod(C) = 1.069678e-307
> testProduct(C) = 0

If you are worried about the difference between 0 and 1.069678e-307 then you probably shouldn't be using computers.

cheers,

Rolf Turner
[R] prod and F90 product
Hi,

I'm having strange differences between the R function prod and the F90 function product. Processing the same vector C (see attachment), I get 2 different results:

prod(C) = 1.069678e-307
testProduct(C) = 0

where testProd is the following wrapping function:

testProd <- function(x) {
  return(.Fortran('testProd', as.double(x), as.double(0), as.double(0),
                  as.integer(length(x))))
}

subroutine testProd(x, p, q, n)
  implicit none
  integer, intent (in) :: n
  double precision, intent (in) :: x(n)
  double precision, intent (out) :: p
  double precision, intent (out) :: q
  integer :: i
  p = product(x)
  q = 1
  do i = 1, n
    q = q*x(i)
  end do
end subroutine testProd

I checked the smallest representable number and it seems to be the same for both R and F90. Can anyone help me understand this behaviour?

Thank you in advance

Regards,
Filippo

c(0.126667774727727, 0.126778508335206, 0.126889731437094, 0.127001445486345, 0.127113651944097, 0.127226352279715, 0.127339547970833, 0.127453240503398, 0.127567431371714, 0.127682122078482, 0.127797314134848, 0.127913009060448, 0.128029208383447, 0.128145913640589, 0.128263126377241, 0.128380848147438, 0.128499080513928, 0.128617825048222, 0.128737083330636, 0.128856856950339, 0.128977147505404, 0.129097956602849, 0.129219285858691, 0.129341136897992, 0.129463511354905, 0.129586410872726, 0.129709837103945, 0.12983379171029, 0.129958276362782, 0.130083292741782, 0.130208842537045, 0.130334927447768, 0.130461549182643, 0.130588709459909, 0.130716410007402, 0.130844652562612, 0.130973438872731, 0.13110277069471, 0.131232649795309, 0.131363077951156, 0.131494056948797, 0.131625588584752, 0.131757674665574, 0.131890317007896, 0.132023517438499, 0.132157277794358, 0.132291599922703, 0.132426485681077, 0.132561936937394, 0.132697955569994, 0.132834543467704, 0.132971702529897, 0.133109434666551, 0.133247741798307, 0.133386625856532, 0.13352608878338, 0.133666132531848, 0.133806759065846, 0.133947970360249, 0.134089768400969, 0.134232155185011, 0.134375132720541,
0.134518703026945, 0.1346628681349, 0.13480763008643, 0.134952990934981, 0.135098952745478, 0.135245517594397, 0.13539268756983, 0.13554046477155, 0.135688851311081, 0.135837849311768, 0.13598746090884, 0.136137688249485, 0.136288533492915, 0.13643999881044, 0.136592086385536, 0.136744798413917, 0.136898137103608, 0.137052104675016, 0.137206703361001, 0.137361935406953, 0.137517803070863, 0.137674308623401, 0.137831454347986, 0.137989242540865, 0.138147675511187, 0.138306755581082, 0.138466485085736, 0.13862686637347, 0.13878790180582, 0.138949593757609, 0.139111944617036, 0.13927495678575, 0.139438632678933, 0.139602974725378, 0.139767985367577, 0.139933667061797, 0.140100022278169, 0.140267053500766, 0.140434763227692, 0.140603153971164, 0.140772228257601, 0.140941988627706, 0.141112437636557, 0.14128357785369, 0.141455411863193, 0.141627942263791, 0.141801171668934, 0.141975102706893, 0.142149738020846, 0.142325080268972, 0.14250113212454, 0.142677896276008, 0.14285537542711, 0.143033572296957, 0.143212489620126, 0.143392130146759, 0.14357249664266, 0.143753591889391, 0.14393541868437, 0.144117979840971, 0.144301278188622, 0.144485316572906, 0.144670097855664, 0.144855624915093, 0.14504190064585, 0.145228927959158, 0.145416709782907, 0.145605249061759, 0.145794548757257, 0.145984611847927, 0.146175441329391, 0.146367040214468, 0.14655941153329, 0.146752558333409, 0.146946483679905, 0.147141190655504, 0.147336682360684, 0.147532961913792, 0.147730032451156, 0.147927897127204, 0.148126559114576, 0.148326021604241, 0.148526287805616, 0.148727360946687, 0.148929244274124, 0.149131941053405, 0.149335454568934, 0.149539788124169, 0.149744945041738, 0.149950928663571, 0.150157742351019, 0.150365389484985, 0.150573873466047, 0.150783197714589, 0.150993365670932, 0.151204380795458, 0.151416246568747, 0.15162896649171, 0.151842544085717, 0.152056982892734, 0.15227228647546, 0.152488458417464, 0.152705502323319, 0.152923421818742, 0.153142220550736, 0.153361902187731, 
0.15358247041972, 0.153803928958412, 0.154026281537366, 0.154249531912146, 0.15447368386046, 0.154698741182311, 0.154924707700148, 0.15515158725901, 0.155379383726686, 0.155608100993858, 0.155837742974264, 0.156068313604844, 0.156299816845904, 0.15653225668127, 0.156765637118446, 0.15662188775, 0.157235235947601, 0.15747146247443, 0.157708645873097, 0.157946790271927, 0.158185899823904, 0.158425978706841, 0.158667031123545, 0.158909061301989, 0.159152073495486, 0.159396071982862, 0.159641061068628, 0.15988704508316, 0.160134028382875, 0.160382015350409, 0.160631010394801, 0.16088101795167, 0.161132042483405, 0.161384088479345, 0.161637160455966, 0.161891262957074, 0.162146400553988, 0.162402577845737, 0.162659799459248, 0.162918070049544, 0.163177394299939, 0.163437776922235, 0.163699222656923, 0.16396173627338, 0.164225322570076, 0.164489986374778, 0.164755732544751, 0.1650
Re: [R] Error running MuMIn dredge function using glmer models
Martin Turcotte (mart.turcotte at gmail.com) writes:

> Dear list,
> I am trying to use MuMIn to compare all possible mixed models using the
> dredge function on binomial data, but I am getting an error message that I
> cannot decode. This error only occurs when I use glmer. When I use an lmer
> analysis on a different response variable everything works great.
>
> Example using a simplified glmer global model:
>
> mod <- glmer(cbind(st$X2.REP.LIVE, st$X2.REP.DEAD) ~ DOMESTICATION*GLUC +
>              (1|PAIR), data = st, na.action = na.omit, family = binomial)
>
> The response variables are the numbers of surviving and dead insects
> (successes and failures).
> DOMESTICATION is a 2-level factor.
> GLUC is a continuous variable.
> PAIR is coded as a factor or character (both ways fail).
>
> This model runs correctly, but when I try it with dredge() I get an error:
>
> g <- dredge(mod, beta = F, evaluate = F, rank = 'AIC')
> Error in sprintf(gettext(fmt, domain = domain), ...) :
>   invalid type of argument[1]: 'symbol'
>
> When I try another rank the same thing happens:
>
> chat <- deviance(mod)/58
> g <- dredge(mod, beta = F, evaluate = F, rank = 'QAIC', chat = chat)
> Error in sprintf(gettext(fmt, domain = domain), ...) :
>   invalid type of argument[1]: 'symbol'
>
> Any suggestions would be greatly appreciated.
>
> thanks
>
> Martin Turcotte, Ph. D.
> mart.turcotte gmail.com

* Please send follow-ups to r-sig-mixed-mod...@r-project.org

* I think we need a reproducible example: it works OK for the first two binomial GLMMs I tried:

library(lme4)
library(MuMIn)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
dredge(gm1)
dredge(gm1, beta = FALSE, evaluate = FALSE)

library(mlmRev)
gm2 <- glmer(use ~ urban + age + livch + (1|district), Contraception, binomial)
dredge(gm2)
dredge(gm2, beta = FALSE, evaluate = FALSE)

* It might be worth removing the 'st$' from your response variable, e.g.

mod <- glmer(cbind(X2.REP.LIVE, X2.REP.DEAD) ~ DOMESTICATION*GLUC + (1|PAIR),
             data = st, family = binomial)

* What are the results of traceback()? sessionInfo()?
[R] Problems loading xlsx
does anyone know how to resolve Error: .onLoad failed in loadNamespace() for 'xlsxjars' I can't seem to get this package to load on my Windows 7 install of R 3.0.2 Sent from Windows Mail [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R newbie - loop questions
Yep, that solves the problem and is much cleaner than what I was trying to do. KWS On Nov 7, 2013, at 10:01 AM, arun wrote: > Hi, > > Maybe this is what you wanted: > mat1 <- matrix(1:4,2,2) > results.norm <- mat1/rowSums(mat1) > > A.K. > > > > > On Thursday, November 7, 2013 12:27 PM, Karl Schelhammer > wrote: > Sorry for the confusion. results is a 2 x 2 matrix containing real positive > values. The goal is to divide each element of the matrix by the sum of the > elements in a row and store the result in results.norm. However, the loop > returns an error that I don't understand. Any thoughts? > > KWS > > On Nov 7, 2013, at 8:42 AM, smartpink...@yahoo.com wrote: >> >> This is not clear. >> For example, >> v1 <- 24 #value >> r1 <- c(2,3) # vector with 2 elements >> my.function(v1,r1) >> #[1] 4.8 >> >> There is no reference to "results". If it is the output of my.function(), >> then there is only one element. Anyway, please show a reproducible example. >> >> >> >> >> Greetings all, I am attempting to run the following code: >> >> >> I receive the following error: >> >> The function runs fine when I explicitly enter the subscripts in results and >> pass to my.function. Can someone else see what is wrong with this approach? >> >> Quoted from: >> http://r.789695.n4.nabble.com/R-newbie-loop-questions-tp4679975.html >> >> >> _ >> Sent from http://r.789695.n4.nabble.com >> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
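For reference, the same row-normalization can also be written with base R's prop.table(), which divides each element by its margin total; this is a minimal sketch equivalent to the mat1/rowSums(mat1) idiom suggested above:

```r
# Row-normalize a matrix: divide each element by its row sum
mat1 <- matrix(1:4, 2, 2)

results.norm <- mat1 / rowSums(mat1)          # idiom from the thread
results.alt  <- prop.table(mat1, margin = 1)  # equivalent base-R call

# Each row of either result sums to 1
rowSums(results.alt)
```

prop.table(x, margin = 1) is just sweep(x, 1, rowSums(x), "/") under the hood, so the two forms agree exactly.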
Re: [R] Checking datetime value
Hi, You can check the dataset for any patterns that are unique for the datetime values. For example: vec1 <- c(as.character(Sys.time()), format(Sys.time(), "%a %b %d %X %Y %Z"),4324343,format(Sys.time(),"%m-%d-%Y %H:%M:%S"),"aZZCRDDS") vec1[grepl(":",vec1)] #[1] "2013-11-07 15:04:27" "Thu Nov 07 03:04:27 PM 2013 EST" #[3] "11-07-2013 15:04:27" A.K. On Thursday, November 7, 2013 10:12 AM, vikrant wrote: Hi , I would like to check if a column contains datetime values. (column may contain any datetime format or just numbers or string). Request your help -- View this message in context: http://r.789695.n4.nabble.com/Checking-datetime-value-tp4679963.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
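A slightly more explicit alternative to pattern-matching on ":" is to try parsing each value against a set of candidate datetime formats and keep the ones that parse. A sketch, where the format list and the helper name is_datetime are illustrative assumptions — extend the list to match the formats actually present in your data:

```r
# TRUE for entries that parse as a datetime under at least one candidate
# format; the format list here is illustrative, not exhaustive.
is_datetime <- function(x, formats = c("%Y-%m-%d %H:%M:%S",
                                       "%m-%d-%Y %H:%M:%S")) {
  vapply(as.character(x), function(s) {
    any(vapply(formats,
               function(f) !is.na(as.POSIXct(s, format = f, tz = "UTC")),
               logical(1)))
  }, logical(1), USE.NAMES = FALSE)
}

flags <- is_datetime(c("2013-11-07 15:04:27", "4324343", "aZZCRDDS"))
```

Because an explicit format is supplied, as.POSIXct() returns NA on a mismatch rather than raising an error, so no tryCatch() is needed.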
Re: [R] Nonnormal Residuals and GAMs
If you use GCV smoothness selection then, in the Gaussian case, the key assumptions are constant variance and independence. As with linear modelling, the normality assumption only comes in when you want to find confidence intervals or p-values. (The GM Thm does not require normality btw. but I don't know if it has a penalized analogue). With REML smoothness selection it's less clear (at least to me). Beyond Gaussian the situation is much as it is with GLMs. The key assumptions are independence and that the mean variance relationship is correct. The theory of quasi-likelihood tells you that you can make valid inference based only on specifying the mean-variance relationship for the response, rather than the whole distribution, with the price being a small loss of efficiency. It follows that getting the distribution exactly right is of secondary importance. It's also quite easy to be misled by normal qq plots of the deviance residuals when you have low count data. For example, section 4 of http://opus.bath.ac.uk/27091/1/qq_gam_resub.pdf shows a real example where the usual qq plots look awful, suggesting massive zero inflation, but if you compute the correct reference quantiles for the qq plot you find that there is nothing wrong and no evidence of zero inflation. best, Simon ps. in response to the follow up discussion: The default link depends on the family, rather than being a gam (or glm) default. Eg the default is log for the Poisson, but identity for the Gaussian. On 06/11/13 21:46, Collin Lynch wrote: Greetings, My question is more algorithmic than practical. What I am trying to determine is, are the GAM algorithms used in the mgcv package affected by nonnormally-distributed residuals? As I understand the theory of linear models the Gauss-Markov theorem guarantees that least-squares regression is optimal over all unbiased estimators iff the data meet the conditions of linearity, homoscedasticity, independence, and normally-distributed residuals. 
Absent the last requirement it is optimal but only over unbiased linear estimators. What I am trying to determine is whether or not it is necessary to check for normally-distributed errors in a GAM from mgcv. I know that the unsmoothed terms, if any, will be fitted by ordinary least-squares but I am unsure whether the default Penalized Iteratively Reweighted Least Squares method used in the package is also based upon this assumption or falls under any analogue to the Gauss-Markov Theorem. Thank you in advance for any help. Sincerely, Collin Lynch. -- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
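The corrected reference quantiles Simon refers to are available in mgcv itself via qq.gam(), which can simulate residual quantiles from the fitted model instead of assuming normality. A minimal sketch on simulated low-count Poisson data (the gamSim() setup is illustrative, not from the thread):

```r
library(mgcv)

set.seed(1)
# Simulated Poisson example data (Gu & Wahba 4-term test function)
dat <- gamSim(1, n = 200, dist = "poisson", scale = 0.25)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
         family = poisson, data = dat)

# A plain normal QQ-plot of deviance residuals can mislead for low counts;
# rep > 0 simulates reference quantiles from the fitted model instead.
qq.gam(b, rep = 100)
```

With rep = 0 (the default) qq.gam() falls back to a direct quantile approximation; the simulation version is the one that resolves the spurious-zero-inflation pattern discussed above.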
Re: [R] Same code - error in one PC but not in other
In the absence of any description of the computers themselves, it's hard to say, but there's a good chance that there is some object in your working environment that's being accessed by your function -- and that object exists on only one of the machines. I would suggest you use debug() or browser() to see what the value of HP and HP.LUT are in each case. Tursiops wrote > Hello >I am running exactly the same code on two different computers. In PC1 >the code is working fine, whereas in PC2 I get this error >Error in HP.LUT[, 1] : incorrect number of dimensions >HP.LUT is an object within a function computed as HP.LUT <- which(HP > ==1, >arr.ind=TRUE), where HP is a binary matrix. >HP.LUT is not returned for further use. I have made sure that no object > with >the name HP.LUT is present in either PC1 and 2, and I have used both > RStudio >and R 3.0.1. >So I am puzzled and wonder what the reason for this might be. >Any hint will be much appreciated. >Thank you very much for your attention >Juan A. Balbuena > >-- > >Dr. Juan A. Balbuena >Marine Zoology Unit >Cavanilles Institute of Biodiversity and Evolutionary Biology >University of >Valencia >[1]http://www.uv.es/~balbuena >P.O. Box 22085 >[2]http://www.uv.es/cavanilles/zoomarin/index.htm >46071 Valencia, Spain >[3]http://cetus.uv.es/mullpardb/index.html >e-mail: [4] > j.a.balbuena@ > tel. +34 963 543 658 fax +34 963 543 733 > >NOTE! For shipments by EXPRESS COURIER use the following street > address: >C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain. > > > References > >1. http://www.uv.es/%7Ebalbuena >2. http://www.uv.es/cavanilles/zoomarin/index.htm >3. http://cetus.uv.es/mullpardb/index.html >4. mailto: > j.a.balbuena@ > __ > R-help@ > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
-- View this message in context: http://r.789695.n4.nabble.com/Same-code-error-in-one-PC-but-not-in-other-tp4679980p4679992.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
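One plausible mechanism for this exact error (an assumption on my part — the thread never pins it down): which(..., arr.ind = TRUE) only returns a two-column (row, col) matrix when its argument carries a dim attribute. If HP arrives as a plain vector on one machine, e.g. because its dimensions were dropped somewhere upstream, which() returns a bare integer vector and HP.LUT[, 1] fails with precisely "incorrect number of dimensions":

```r
HP <- matrix(c(0, 1, 1, 0), 2, 2)
lut_ok <- which(HP == 1, arr.ind = TRUE)    # 2-column (row, col) matrix

HPv <- HP
dim(HPv) <- NULL                            # simulate dims being dropped
lut_bad <- which(HPv == 1, arr.ind = TRUE)  # plain integer vector, no dims

is.matrix(lut_ok)   # lut_ok[, 1] works
is.matrix(lut_bad)  # lut_bad[, 1] errors: "incorrect number of dimensions"
```

A quick str(HP) inside the function on both machines, as suggested with debug()/browser() above, would confirm or rule this out.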
Re: [R] strange behaviour when subsetting a data.frame
So is there a way to make the subsetting behave as expected? -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Doran, Harold Sent: 07 November 2013 16:52 To: 'Eckstädt, Elisabeth'; r-help@R-project.org Subject: Re: [R] strange behaviour when subsetting a data.frame Yes, but notice that man2[3,] == .3 [1] FALSE This is because of issues of machine precision when dealing with floating points, not a problem in R. Comparisons for nearly equivalent numbers are done using all.equal() as shown below. > all.equal(man2[3,], .3) [1] TRUE -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eckstädt, Elisabeth Sent: Thursday, November 07, 2013 8:37 AM To: r-help@R-project.org Subject: [R] strange behaviour when subsetting a data.frame Hello everyone, I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is a minimal reproducible example. The data.frame consists only of one column, which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 is working (gives me one row), subsetting for 0.3 gives me zero rows? Doing the same with values from 1 to 10 is working out well (as expected). Beimischung=seq(0.1,1,0.1) man2=data.frame(Beimischung) subset(man2, Beimischung==0.3) #---> gives 0 rows subset(man2, Beimischung==0.1) ---> gives one row (as expected)# #also not working: man2$Beimischung3=man2$Beimischung*10 subset(man2, Beimischung3==3) --> gives 0 rows Does anybody have a clue for me? Thanks in advance Regards Elisabeth ___ Dipl.-Ing. 
Elisabeth Eckstädt Member of Academic Staff & PhD Student Technische Universität Dresden Faculty of Mechanical Engineering Institute of Power Engineering Chair of Building Energy Systems and Heat Supply Phone +49 351 463 34709 Fax +49 351 463 37076 Web http://tu-dresden.de/mw/iet/ew __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error running MuMIn dredge function using glmer models
Dear list, I am trying to use MuMIn to compare all possible mixed models using the dredge function on binomial data but I am getting an error message that I cannot decode. This error only occurs when I use glmer. When I use an lmer analysis on a different response variable everything works great. Example using a simplified glmer model global model: mod<- glmer(cbind(st$X2.REP.LIVE, st$X2.REP.DEAD) ~ DOMESTICATION*GLUC + (1|PAIR), data=st, na.action=na.omit, family=binomial) The response variables are the numbers of surviving and dead insects (successes and failures) DOMESTICATION is a 2 level factor. GLUC is a continuous variable. PAIR is coded as a factor or character (both ways fail). This model functions correctly but when I try it with dredge() I get an error. g<- dredge(mod, beta=F, evaluate=F, rank='AIC') Error in sprintf(gettext(fmt, domain = domain), ...) : invalid type of argument[1]: 'symbol' When I try with another rank the same thing happens: chat<- deviance(mod)/58 g<- dredge(mod, beta=F, evaluate=F, rank='QAIC', chat=chat) Error in sprintf(gettext(fmt, domain = domain), ...) : invalid type of argument[1]: 'symbol' Any suggestions would be greatly appreciated thanks Martin Turcotte, Ph. D. mart.turco...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] all combinations with replacement not ordered
... and actually, since u can be assumed to be of the form shown, v <-do.call(expand.grid, split(rep(u,len),rep(u,e=len))) should do. -- Bert On Thu, Nov 7, 2013 at 10:06 AM, Bert Gunter wrote: > Well, you can create the expand.grid data frame programmatically via: > > u <- 1:3 > len <- length(u) > v <-do.call(expand.grid, split(rep(u,len),rep(seq_len(len),e=len))) > > And then you can use unique.array to get the unique rows after the sort: > > unique(t(apply(v,1,sort))) > > However, I agree with your sentiments. Not only does this seem > inelegant, but it will not scale well. > > I would imagine a recursive approach would be more efficient -- as > then only the sets you need would be produced and there'd be no > sorting, etc. -- but I have neither the time nor interest to work it > out. > > ... and I bet someone already has done this in some R package anyway. > > Cheers, > Bert > > On Thu, Nov 7, 2013 at 9:04 AM, Ted Harding wrote: >> On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote: >>> Hello! >>> >>> I need to obtain all possible combinations with replacement when >>> order is not important. >>> E.g. I have a population x{1,2,3}. >>> So I can get (choose(3+3-1,3)=) 10 combinations from this population >>> with 'size=3'. >>> How can I get a list of all that combinations? >>> >>> I have tried 'expand.grid()' and managed to get only samples where >>> order is important. >>> 'combn()' gave me samples without replacement. >>> >>> Best regards, >>> Konstantin Tretyakov. >> >> >From your description I infer that, from {1,2,3}, you want the result: >> >> 1 1 1 >> 1 1 2 >> 1 1 3 >> 1 2 2 >> 1 2 3 >> 1 3 3 >> 2 2 2 >> 2 2 3 >> 2 3 3 >> 3 3 3 >> >> The following will do that: >> >> u <- c(1,2,3) >> unique(t(unique(apply(expand.grid(u,u,u),1,sort),margin=1))) >> >> # [,1] [,2] [,3] >> # [1,] 1 1 1 >> # [2,] 1 1 2 >> # [3,] 1 1 3 >> # [4,] 1 2 2 >> # [5,] 1 2 3 >> # [6,] 1 3 3 >> # [7,] 2 2 2 >> # [8,] 2 2 3 >> # [9,] 2 3 3 >> # [10,] 3 3 3 >> >> There may be a simpler way! >> Ted. 
>> >> - >> E-Mail: (Ted Harding) >> Date: 07-Nov-2013 Time: 17:04:50 >> This message was sent by XFMail >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > (650) 467-7374 -- Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AER ivreg diagnostics: question on DF of Sargan test
Hélène, thanks for spotting this! This is a bug in "AER". I had just tested the new diagnostics for regressions with 1 endogenous variable and hence never noticed the problem. But if there are > 1 endogenous variables, the df used in ivreg() (and hence the associated p-values) are too large. I've fixed the problem in AER's devel-version and will release it on CRAN in the next days. Thanks & best regards, Z On Thu, 7 Nov 2013, Hélène Huber-Yahi wrote: Hello, I'm new to R and I'm currently learning to use package AER, which is extremely comprehensive and useful. I have one question related to the diagnostics after ivreg: if I understood well, the Sargan test provided states that the statistic should follow a Chi squared of degrees of freedom equal to the number of excluded instruments minus one. But I read many times that the degrees of freedom of this statistic is supposed to equal the number of overidentifying restrictions, i.e. the number of excluded instruments minus the number of endogenous variables tested. When comparing with Stata results (estat overid after ivreg, same with ivreg2 output), the statistic is the same as the one provided by R, only the p-value changes because the distribution chosen is different. Is this command using a different flavor of the Sargan test ? I did not find the details in the AER pdf. I'm using Rstudio with R 3.0.2 (Windows 7) and AER is up to date. The output I get from R is the following, where the Sargan DF is equal to 5, while I thought it would be equal to 6-3=3. The data comes from Verbeek's econometrics textbook and the example replicates the one in the book. Dependent variable is log of wage, endogenous variables are education, experience and its square (3 of them), excluded instruments are parents' education etc (6 of them). 
ivmodel <- ivreg(lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 + south76, data = school) summary(ivmodel, diagnostics=TRUE) Call: ivreg(formula = lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 + south76, data = school) Residuals: Min 1Q Median 3Q Max -1.63375 -0.22253 0.02403 0.24350 1.32911 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4.6064811 0.1126195 40.903 < 2e-16 *** ed76 0.0848507 0.0066061 12.844 < 2e-16 *** exp76 0.0796432 0.0164406 4.844 1.34e-06 *** exp762 -0.0020376 0.0008257 -2.468 0.0136 * black -0.1726723 0.0195231 -8.845 < 2e-16 *** smsa76 0.1521693 0.0165207 9.211 < 2e-16 *** south76 -0.1204765 0.0154904 -7.778 1.01e-14 *** Diagnostic tests: df1 df2 statistic p-value Weak instruments 6 2987 965.450 <2e-16 *** Wu-Hausman 2 2988 1.949 0.143 Sargan 5 NA 3.868 0.569 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.3753 on 2990 degrees of freedom Multiple R-Squared: 0.2868, Adjusted R-squared: 0.2854 Wald test: 178.6 on 6 and 2990 DF, p-value: < 2.2e-16 Would this be caused by the fact that I'm using 2SLS and not GMM (at least I suppose) to estimate the IV model? I apologize if this comes from a misunderstanding from my part, and I thank you in advance for your help. Best, H. Huber [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
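The practical impact of the df bug is easy to see by recomputing the p-value for the reported Sargan statistic under both choices of degrees of freedom, using the numbers from the output above (6 excluded instruments, 3 endogenous regressors):

```r
sargan_stat <- 3.868

# df as reported by the buggy AER version: instruments - 1 = 5
p_df5 <- pchisq(sargan_stat, df = 5, lower.tail = FALSE)

# df as it should be: overidentifying restrictions = 6 - 3 = 3
p_df3 <- pchisq(sargan_stat, df = 3, lower.tail = FALSE)

round(c(p_df5 = p_df5, p_df3 = p_df3), 3)
```

The statistic itself is unchanged; only the reference distribution, and hence the p-value, moves, which matches the observed agreement with Stata's statistic but not its p-value.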
Re: [R] all combinations with replacement not ordered
Well, you can create the expand.grid data frame programmatically via: u <- 1:3 len <- length(u) v <-do.call(expand.grid, split(rep(u,len),rep(seq_len(len),e=len))) And then you can use unique.array to get the unique rows after the sort: unique(t(apply(v,1,sort))) However, I agree with your sentiments. Not only does this seem inelegant, but it will not scale well. I would imagine a recursive approach would be more efficient -- as then only the sets you need would be produced and there'd be no sorting, etc. -- but I have neither the time nor interest to work it out. ... and I bet someone already has done this in some R package anyway. Cheers, Bert On Thu, Nov 7, 2013 at 9:04 AM, Ted Harding wrote: > On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote: >> Hello! >> >> I need to obtain all possible combinations with replacement when >> order is not important. >> E.g. I have a population x{1,2,3}. >> So I can get (choose(3+3-1,3)=) 10 combinations from this population >> with 'size=3'. >> How can I get a list of all that combinations? >> >> I have tried 'expand.grid()' and managed to get only samples where >> order is important. >> 'combn()' gave me samples without replacement. >> >> Best regards, >> Konstantin Tretyakov. > > >From your description I infer that, from {1,2,3}, you want the result: > > 1 1 1 > 1 1 2 > 1 1 3 > 1 2 2 > 1 2 3 > 1 3 3 > 2 2 2 > 2 2 3 > 2 3 3 > 3 3 3 > > The following will do that: > > u <- c(1,2,3) > unique(t(unique(apply(expand.grid(u,u,u),1,sort),margin=1))) > > # [,1] [,2] [,3] > # [1,] 1 1 1 > # [2,] 1 1 2 > # [3,] 1 1 3 > # [4,] 1 2 2 > # [5,] 1 2 3 > # [6,] 1 3 3 > # [7,] 2 2 2 > # [8,] 2 2 3 > # [9,] 2 3 3 > # [10,] 3 3 3 > > There may be a simpler way! > Ted. 
> > - > E-Mail: (Ted Harding) > Date: 07-Nov-2013 Time: 17:04:50 > This message was sent by XFMail > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable standardization in manova() call
2013/11/6 Michael Friendly : > On 11/4/2013 10:45 AM, Sergio Fonda wrote: >> >> Hi, >> I'm not able to get information about the following question: >> >> is the variables standardization a default option in manova() (stats >> package)? >> Or if you want to compare variables with different units or scales and >> rather different variances, you have to previously standardize the >> variables ? >> > > If you mean the response variables, manova() does not require equal > variances and does not standardize. > > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele StreetWeb: http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA Hi Michael, thank you for your reply. However, your remark was not so clear to me so I attach a short script I tried to launch. The comparison between results got from MANOVA() call with the non-standardized and standardized version of the same data frame, convinced me that it is not necessary to standardize data before calling MANOVA. The only difference you get is in the residuals values, of course. Could you kindly confirm my conclusion? 
All the best, Sergio Fonda __ set.seed(1234) # non-standardized series x1 = rnorm(n=10, mean=50, sd=11) x2 = rnorm(n=10, mean=93, sd=23) x3 = rnorm(n=10, mean=217, sd=52) # corrected: was 'x1' again, but x3 is used below fact= rep(1:2,20) glob1=data.frame(cbind(x1,x2,x3,fact)) fitta1=manova(cbind(x1,x2,x3)~fact, data=glob1) fitta1.wilks=summary(fitta1, test="Wilks") summary.aov(fitta1) #after standardization x.stand=scale(glob1[,-4]) glob2=data.frame(x.stand,fact) fitta2=manova(cbind(x1,x2,x3)~fact, data=glob2) fitta2.wilks=summary(fitta2, test="Wilks") summary.aov(fitta2) -- Sergio Fonda Associate Professor of Bioengineering Department of Life Sciences University of Modena and Reggio Emilia Modena, Italy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding the complementary log-link to binomial() and make.link()
Ken Knoblauch inserm.fr> writes: > > Roland Deutsch tuwien.ac.at> writes: > > in my research I frequently work with binomial > response models, which > > are of course part of the generalized linear > models. While I do use > > common link functions such as the logit, probit > and cloglog, I often > > have the need of invoking the lesser-known > Complementary Log link > > (Walter W. Piegorsch, 1992, "Complementary Log > Regression for > > Generalized Linear Models ", The American > Statistician , Vol. 46, No. 2, > > pp. 94-99 ) which is based on the exponential > distribution. > > > > Before the release of R 3.0, I simply could > do so by adding the > > following lines to the long switch command > in make.link(...): > > > > clog = { > > linkfun <- function(mu) qexp(mu) > > linkinv <- function(eta) pmax(.Machine$double.eps,pexp(eta)) > > mu.eta <- function(eta) pmax(dexp(eta), .Machine$double.eps) > > valideta <- function(eta) all(eta > 0) > > }, > > > > and then add "clog" to the okLinks vector in > binomial(). However, I > > I wouldn't mess with make.link. Why don't you just > define your own custom link, for example following the > example under help(family)? This is what I did in > the psyphy package and I just checked and > they all still work under the current version of R. 
> > > Thanks a lot, > > Roland Deutsch > This seems to work: linkinfo <- list(link="clog", linkfun=function(mu) qexp(mu), linkinv=function(eta) pmax(.Machine$double.eps,pexp(eta)), mu.eta=function(eta) pmax(dexp(eta), .Machine$double.eps), valideta=function(eta) all(eta > 0)) binomClog <- binomial() for (i in names(linkinfo)) binomClog[[i]] <- linkinfo[[i]] set.seed(101) d <- data.frame(x=runif(1000)) d$y <- rbinom(1000,prob=binomClog$linkinv(1+2*d$x),size=1) library(lattice) xyplot(y~x,data=d,type=c("p","smooth")) g1 <- glm(y~x,data=d,family=binomClog) library(MASS) confint(g1) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] all combinations with replacement not ordered
On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote: > Hello! > > I need to obtain all possible combinations with replacement when > order is not important. > E.g. I have a population x{1,2,3}. > So I can get (choose(3+3-1,3)=) 10 combinations from this population > with 'size=3'. > How can I get a list of all that combinations? > > I have tried 'expand.grid()' and managed to get only samples where > order is important. > 'combn()' gave me samples without replacement. > > Best regards, > Konstantin Tretyakov. >From your description I infer that, from {1,2,3}, you want the result: 1 1 1 1 1 2 1 1 3 1 2 2 1 2 3 1 3 3 2 2 2 2 2 3 2 3 3 3 3 3 The following will do that: u <- c(1,2,3) unique(t(unique(apply(expand.grid(u,u,u),1,sort),margin=1))) # [,1] [,2] [,3] # [1,] 1 1 1 # [2,] 1 1 2 # [3,] 1 1 3 # [4,] 1 2 2 # [5,] 1 2 3 # [6,] 1 3 3 # [7,] 2 2 2 # [8,] 2 2 3 # [9,] 2 3 3 # [10,] 3 3 3 There may be a simpler way! Ted. - E-Mail: (Ted Harding) Date: 07-Nov-2013 Time: 17:04:50 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] strange behaviour when subsetting a data.frame
Yes, what I would do is write a function around the all.equal() comparisons on the data to see which values are "nearly" identical to the ones desired in the subset. If TRUE, I would retain those values and discard the others. -Original Message- From: Simon Hayward [mailto:simon.hayw...@infinitycloud.com] Sent: Thursday, November 07, 2013 11:59 AM To: Doran, Harold; 'Eckstädt, Elisabeth'; r-help@R-project.org Subject: RE: strange behaviour when subsetting a data.frame So is there a way to make the subsetting behave as expected? -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Doran, Harold Sent: 07 November 2013 16:52 To: 'Eckstädt, Elisabeth'; r-help@R-project.org Subject: Re: [R] strange behaviour when subsetting a data.frame Yes, but notice that man2[3,] == .3 [1] FALSE This is because of issues of machine precision when dealing with floating points, not a problem in R. Comparisons for nearly equivalent numbers are done using all.equal() as shown below. > all.equal(man2[3,], .3) [1] TRUE -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eckstädt, Elisabeth Sent: Thursday, November 07, 2013 8:37 AM To: r-help@R-project.org Subject: [R] strange behaviour when subsetting a data.frame Hello everyone, I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is a minimal reproducible example. The data.frame consists only of one column, which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 is working (gives me one row), subsetting for 0.3 gives me zero rows? Doing the same with values from 1 to 10 is working out well (as expected). 
Beimischung=seq(0.1,1,0.1) man2=data.frame(Beimischung) subset(man2, Beimischung==0.3) #---> gives 0 rows subset(man2, Beimischung==0.1) ---> gives one row (as expected)# #also not working: man2$Beimischung3=man2$Beimischung*10 subset(man2, Beimischung3==3) --> gives 0 rows Does anybody have a clue for me? Thanks in advance Regards Elisabeth ___ Dipl.-Ing. Elisabeth Eckstädt Member of Academic Staff & PhD Student Technische Universität Dresden Faculty of Mechanical Engineering Institute of Power Engineering Chair of Building Energy Systems and Heat Supply Phone +49 351 463 34709 Fax +49 351 463 37076 Web http://tu-dresden.de/mw/iet/ew __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
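If you need subset() itself to behave as expected, one common approach (a sketch, not the only option — the helper name near() is illustrative) is to compare with an explicit tolerance instead of exact ==:

```r
Beimischung <- seq(0.1, 1, 0.1)
man2 <- data.frame(Beimischung)

# Tolerance-based comparison instead of exact ==
near <- function(x, target, tol = 1e-8) abs(x - target) < tol

nrow(subset(man2, Beimischung == 0.3))      # 0 rows: floating-point mismatch
nrow(subset(man2, near(Beimischung, 0.3)))  # 1 row, as intended
```

The tolerance should be large relative to rounding error but small relative to the spacing of your data values; 1e-8 is comfortable for a grid spaced 0.1 apart.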
Re: [R] strange behaviour when subsetting a data.frame
Yes, but notice that man2[3,] == .3 [1] FALSE This is because of issues of machine precision when dealing with floating points, not a problem in R. Comparisons for nearly equivalent numbers are done using all.equal() as shown below. > all.equal(man2[3,], .3) [1] TRUE -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eckstädt, Elisabeth Sent: Thursday, November 07, 2013 8:37 AM To: r-help@R-project.org Subject: [R] strange behaviour when subsetting a data.frame Hello everyone, I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is a minimal reproducible example. The data.frame consists only of one column, which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 is working (gives me one row), subsetting for 0.3 gives me zero rows? Doing the same with values from 1 to 10 is working out well (as expected). Beimischung=seq(0.1,1,0.1) man2=data.frame(Beimischung) subset(man2, Beimischung==0.3) #---> gives 0 rows subset(man2, Beimischung==0.1) ---> gives one row (as expected)# #also not working: man2$Beimischung3=man2$Beimischung*10 subset(man2, Beimischung3==3) --> gives 0 rows Does anybody have a clue for me? Thanks in advance Regards Elisabeth ___ Dipl.-Ing. Elisabeth Eckstädt Member of Academic Staff & PhD Student Technische Universität Dresden Faculty of Mechanical Engineering Institute of Power Engineering Chair of Building Energy Systems and Heat Supply Phone +49 351 463 34709 Fax +49 351 463 37076 Web http://tu-dresden.de/mw/iet/ew __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] all combinations with replacement not ordered
Is this what you want?

f <- function (x, m) combn(x + m - 1, m) - seq_len(m) + 1

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of Konstantin Tretiakov
> Sent: Thursday, November 07, 2013 5:38 AM
> To: r-help@r-project.org
> Subject: [R] all combinations with replacement not ordered
>
> Hello!
>
> I need to obtain all possible combinations with replacement when order is
> not important.
> E.g. I have a population x = {1,2,3}.
> So I can get (choose(3+3-1,3) =) 10 combinations from this population with
> 'size=3'.
> How can I get a list of all those combinations?
>
> I have tried 'expand.grid()' and managed to get only samples where order is
> important.
> 'combn()' gave me samples without replacement.
>
> Best regards,
> Konstantin Tretyakov.
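Why the one-liner works: combn(x + m - 1, m) lists all strictly increasing m-tuples from 1..(x+m-1), and subtracting 0, 1, ..., m-1 from the rows maps them onto the non-decreasing m-tuples from 1..x. A quick check for the population {1,2,3} of the original question (note that x here is the *number* of distinct items, not the population vector):

```r
f <- function(x, m) combn(x + m - 1, m) - seq_len(m) + 1

# population of 3 items, draws of size 3:
# choose(3 + 3 - 1, 3) = 10 unordered combinations with replacement
res <- f(3, 3)
ncol(res)    # 10
res[, 1]     # 1 1 1  (first combination)
res[, 10]    # 3 3 3  (last combination)
```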
[R] Same code - error in one PC but not in other
Hello,
I am running exactly the same code on two different computers. On PC1 the code works fine, whereas on PC2 I get this error:

Error in HP.LUT[, 1] : incorrect number of dimensions

HP.LUT is an object within a function, computed as HP.LUT <- which(HP == 1, arr.ind = TRUE), where HP is a binary matrix. HP.LUT is not returned for further use. I have made sure that no object with the name HP.LUT is present on either PC1 or PC2, and I have used both RStudio and R 3.0.1. So I am puzzled and wonder what the reason for this might be. Any hint will be much appreciated.
Thank you very much for your attention.
Juan A. Balbuena

--
Dr. Juan A. Balbuena
Marine Zoology Unit
Cavanilles Institute of Biodiversity and Evolutionary Biology
University of Valencia
P.O. Box 22085
46071 Valencia, Spain
e-mail: j.a.balbu...@uv.es
tel. +34 963 543 658
fax +34 963 543 733
http://www.uv.es/~balbuena
http://www.uv.es/cavanilles/zoomarin/index.htm
http://cetus.uv.es/mullpardb/index.html
NOTE! For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
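One plausible cause (an assumption, since the function itself is not shown): which(..., arr.ind = TRUE) returns a row/column index matrix only when its argument really is an array. If HP loses its dim attribute on one machine — for example through single-row or single-column subsetting with the default drop = TRUE somewhere upstream — which() returns a plain vector, and HP.LUT[, 1] then fails exactly as reported:

```r
HP <- matrix(c(0, 1, 1, 0), nrow = 2)      # a small binary matrix
HP.LUT <- which(HP == 1, arr.ind = TRUE)
is.matrix(HP.LUT)                          # TRUE: row/col index matrix

HP.vec <- HP[, 1]                          # dims dropped: now a vector
HP.LUT2 <- which(HP.vec == 1, arr.ind = TRUE)
is.matrix(HP.LUT2)                         # FALSE: HP.LUT2[, 1] would
                                           # now give "incorrect number
                                           # of dimensions"
```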
Re: [R] Adding the complementary log-link to binomial() and make.link()
Roland Deutsch tuwien.ac.at> writes:

> in my research I frequently work with binomial response models, which
> are of course part of the generalized linear models. While I do use
> common link functions such as the logit, probit and cloglog, I often
> have the need of invoking the lesser-known Complementary Log link
> (Walter W. Piegorsch, 1992, "Complementary Log Regression for
> Generalized Linear Models", The American Statistician, Vol. 46, No. 2,
> pp. 94-99), which is based on the exponential distribution.
>
> Before the release of R 3.0, I simply could do so by adding the
> following lines to the long switch command in make.link(...):
>
> clog = {
>     linkfun <- function(mu) qexp(mu)
>     linkinv <- function(eta) pmax(.Machine$double.eps, pexp(eta))
>     mu.eta <- function(eta) pmax(dexp(eta), .Machine$double.eps)
>     valideta <- function(eta) all(eta > 0)
> },
>
> and then add "clog" to the okLinks vector in binomial(). However, I

I wouldn't mess with make.link. Why don't you just define your own custom link, for example following the example under help(family)? This is what I did in the psyphy package, and I just checked that they all still work under the current version of R.

> Thanks a lot,
> Roland Deutsch

--
Kenneth Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html
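Following that suggestion, here is a sketch (an assumption, not the poster's exact solution) of how the "clog" link could be supplied as a user-defined "link-glm" object; the component functions mirror those in the original make.link patch, and binomial() accepts such an object directly, so neither make.link() nor okLinks needs to be touched:

```r
# build the complementary log link as a standalone "link-glm" object
clog <- structure(list(
  linkfun  = function(mu)  qexp(mu),
  linkinv  = function(eta) pmax(.Machine$double.eps, pexp(eta)),
  mu.eta   = function(eta) pmax(dexp(eta), .Machine$double.eps),
  valideta = function(eta) all(eta > 0),
  name     = "clog"
), class = "link-glm")

# illustrative fit on simulated data
set.seed(1)
x <- runif(200)
y <- rbinom(200, 1, pexp(0.5 + x))
fit <- glm(y ~ x, family = binomial(link = clog))
coef(fit)
```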
Re: [R] reading in stata file with read.dta works in R x64 3.0.1 and crashes R x64 3.0.2
sorry i missed that. install.packages('foreign') solved the problem, thank you!! :)

On Thu, Nov 7, 2013 at 10:41 AM, Prof Brian Ripley wrote:
> See the posting guide: your version of 'foreign' is not current:
> http://cran.r-project.org/web/packages/foreign/index.html .
>
> Please update your packages and try again. (This looks very like a bug in
> recently contributed code that has already been fixed.)
Re: [R] reading in stata file with read.dta works in R x64 3.0.1 and crashes R x64 3.0.2
See the posting guide: your version of 'foreign' is not current:
http://cran.r-project.org/web/packages/foreign/index.html .

Please update your packages and try again. (This looks very like a bug in recently contributed code that has already been fixed.)

On 07/11/2013 13:40, Anthony Damico wrote:
> this file
>
> http://www.electionstudies.org/studypages/data/anes_mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip
>
> can be downloaded after free registration on this page
>
> http://electionstudies.org/studypages/download/registration_form.php
>
> imports properly in windows R x64 3.0.1 but causes R x64 3.0.2 to crash
>
> the crash occurs at the line
>
> rval <- .External(do_readStata, file)
>
> at the bottom of this e-mail, i have included code that will reproduce the
> problem exactly in 3.0.2 but not 3.0.1
>
> and if i don't load the memisc and Hmisc packages, i get an error instead
> of a crash..
>
> x <- read.dta( fp , convert.factors = FALSE )
> Error in read.dta(fp, convert.factors = FALSE) :
>   Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
>
> ..but if i load both (as shown in the script below), R just dies.
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] strange behaviour when subsetting a data.frame
R FAQ 7.31:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

On Thu, Nov 7, 2013 at 8:36 AM, Eckstädt, Elisabeth wrote:
> Hello everyone,
> I am experiencing an unfathomable behaviour of "subset" on a data.frame.
> This is a minimal reproducible example. The data.frame consists of only one
> column, which contains 10 ascending values (from 0.1 to 1). Subsetting for
> 0.1 works (gives me one row), but subsetting for 0.3 gives me zero rows.
> Doing the same with values from 1 to 10 works as expected.
>
> Beimischung=seq(0.1,1,0.1)
> man2=data.frame(Beimischung)
> subset(man2, Beimischung==0.3)
> # ---> gives 0 rows
> subset(man2, Beimischung==0.1)
> # ---> gives one row (as expected)
>
> # also not working:
> man2$Beimischung3=man2$Beimischung*10
> subset(man2, Beimischung3==3)
> # --> gives 0 rows
>
> Does anybody have a clue for me?
> Thanks in advance
> Regards
> Elisabeth
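If a tolerance-based comparison is acceptable, the subset from the quoted message can be written directly:

```r
# == fails because seq()'s third element and the literal 0.3 differ in
# the last bits of their binary representation (see R FAQ 7.31);
# compare within a tolerance instead
Beimischung <- seq(0.1, 1, 0.1)
man2 <- data.frame(Beimischung)
subset(man2, abs(Beimischung - 0.3) < 1e-8)   # returns the one row
```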
[R] Checking datetime value
Hi,
I would like to check whether a column contains datetime values (the column may contain any datetime format, or just numbers or strings). Requesting your help.
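No reply is quoted here, but one rough heuristic — a sketch, not a definitive test, and the candidate formats below are assumptions to be adapted to the data — is to try parsing the column with strptime-style formats and check whether any of them parses every value:

```r
# returns TRUE if every non-missing value parses under at least one of
# the candidate datetime formats; plain numbers and free text yield NA
# under every format and therefore give FALSE
looks_like_datetime <- function(x,
    formats = c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y")) {
  x <- as.character(x)
  parsed <- lapply(formats,
                   function(f) as.POSIXct(x, format = f, tz = "UTC"))
  any(vapply(parsed,
             function(p) all(!is.na(p[!is.na(x)])),
             logical(1)))
}

looks_like_datetime(c("2013-11-07 10:30:00", "2013-11-08"))  # TRUE
looks_like_datetime(c("123", "some text"))                   # FALSE
```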
[R] strange behaviour when subsetting a data.frame
Hello everyone,
I am experiencing an unfathomable behaviour of "subset" on a data.frame. This is a minimal reproducible example. The data.frame consists of only one column, which contains 10 ascending values (from 0.1 to 1). Subsetting for 0.1 works (gives me one row), but subsetting for 0.3 gives me zero rows. Doing the same with values from 1 to 10 works as expected.

Beimischung=seq(0.1,1,0.1)
man2=data.frame(Beimischung)
subset(man2, Beimischung==0.3)
# ---> gives 0 rows
subset(man2, Beimischung==0.1)
# ---> gives one row (as expected)

# also not working:
man2$Beimischung3=man2$Beimischung*10
subset(man2, Beimischung3==3)
# --> gives 0 rows

Does anybody have a clue for me?
Thanks in advance
Regards
Elisabeth

___
Dipl.-Ing. Elisabeth Eckstädt
Member of Academic Staff & PhD Student
Technische Universität Dresden
Faculty of Mechanical Engineering
Institute of Power Engineering
Chair of Building Energy Systems and Heat Supply
Phone +49 351 463 34709
Fax +49 351 463 37076
Web http://tu-dresden.de/mw/iet/ew
[R] all combinations with replacement not ordered
Hello!

I need to obtain all possible combinations with replacement when order is not important. E.g., I have a population x = {1,2,3}. So I can get (choose(3+3-1,3) =) 10 combinations from this population with 'size=3'. How can I get a list of all those combinations?

I have tried 'expand.grid()' and managed to get only samples where order is important. 'combn()' gave me samples without replacement.

Best regards,
Konstantin Tretyakov.
[R] AER ivreg diagnostics: question on DF of Sargan test
Hello,

I'm new to R and I'm currently learning to use package AER, which is extremely comprehensive and useful. I have one question related to the diagnostics after ivreg: if I understood well, the Sargan test provided states that the statistic should follow a Chi-squared distribution with degrees of freedom equal to the number of excluded instruments minus one. But I have read many times that the degrees of freedom of this statistic are supposed to equal the number of overidentifying restrictions, i.e. the number of excluded instruments minus the number of endogenous variables tested. When comparing with Stata results (estat overid after ivreg, same with ivreg2 output), the statistic is the same as the one provided by R; only the p-value changes because the distribution chosen is different. Is this command using a different flavor of the Sargan test? I did not find the details in the AER pdf. I'm using RStudio with R 3.0.2 (Windows 7) and AER is up to date.

The output I get from R is the following, where the Sargan DF is equal to 5, while I thought it would be equal to 6-3=3. The data comes from Verbeek's econometrics textbook and the example replicates the one in the book. The dependent variable is log of wage, endogenous variables are education, experience and its square (3 of them), excluded instruments are parents' education etc. (6 of them).

> ivmodel <- ivreg(lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 |
+   daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 + south76,
+   data = school)
> summary(ivmodel, diagnostics = TRUE)

Call:
ivreg(formula = lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 +
    south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 +
    black + smsa76 + south76, data = school)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63375 -0.22253  0.02403  0.24350  1.32911

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.6064811  0.1126195  40.903  < 2e-16 ***
ed76         0.0848507  0.0066061  12.844  < 2e-16 ***
exp76        0.0796432  0.0164406   4.844 1.34e-06 ***
exp762      -0.0020376  0.0008257  -2.468   0.0136 *
black       -0.1726723  0.0195231  -8.845  < 2e-16 ***
smsa76       0.1521693  0.0165207   9.211  < 2e-16 ***
south76     -0.1204765  0.0154904  -7.778 1.01e-14 ***

Diagnostic tests:
                 df1  df2 statistic p-value
Weak instruments   6 2987   965.450  <2e-16 ***
Wu-Hausman         2 2988     1.949   0.143
Sargan             5   NA     3.868   0.569
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 0.3753 on 2990 degrees of freedom
Multiple R-Squared: 0.2868, Adjusted R-squared: 0.2854
Wald test: 178.6 on 6 and 2990 DF, p-value: < 2.2e-16

Would this be caused by the fact that I'm using 2SLS and not GMM (at least I suppose) to estimate the IV model? I apologize if this comes from a misunderstanding on my part, and I thank you in advance for your help.

Best,
H. Huber
[R] Adding the complementary log-link to binomial() and make.link()
Dear R Help,

in my research I frequently work with binomial response models, which are of course part of the generalized linear models. While I do use common link functions such as the logit, probit and cloglog, I often have the need of invoking the lesser-known Complementary Log link (Walter W. Piegorsch, 1992, "Complementary Log Regression for Generalized Linear Models", The American Statistician, Vol. 46, No. 2, pp. 94-99), which is based on the exponential distribution.

Before the release of R 3.0, I simply could do so by adding the following lines to the long switch command in make.link(...):

clog = {
    linkfun <- function(mu) qexp(mu)
    linkinv <- function(eta) pmax(.Machine$double.eps, pexp(eta))
    mu.eta <- function(eta) pmax(dexp(eta), .Machine$double.eps)
    valideta <- function(eta) all(eta > 0)
},

and then add "clog" to the okLinks vector in binomial(). However, I recently installed R 3.0.2 and cannot get the clog link to work. I guess it has to do with the fact that both make.link and binomial are residing in the Global environment, and thus I do not know how to access the C routine C_binomial_dev_resids to compute the deviance residuals within binomial(). While I would appreciate any help on this issue, I also thought it may benefit other users in my research area to have the complementary log link included in make.link. Any chance of this happening?

Thanks a lot,
Roland Deutsch
[R] loop for backtesting
First of all, sorry for my bad English; it is not my native language.

I am working on a paper on portfolio optimization with Markowitz and Lower Partial Moments, and I want to compare the returns of the minimum-variance portfolios from both methods.

I have an in-sample multivariate time series of 4 stocks reaching from 1 October 2011 to 1 October 2012 (log-returns, daily data, 257 observations for each stock). I start to optimize my portfolio using the package tseries as follows:

portfolio.optim(x, pm=mean(x), riskless=F, rf=0, shorts=F, reslow=NULL, reshigh=NULL, covmat=cov(x))

With this I get the weights, the mean return over the whole period, the standard deviation, and the daily returns for my in-sample optimal Markowitz portfolio.

The out-of-sample data reach from 2 October 2012 to 1 October 2013 (log-returns, daily data, 253 observations for each stock, again a multivariate time series). Now I want to optimize the portfolio 253 times. Each time, the log-returns for one day should be added to the original in-sample time series (first optimization: the 257 in-sample observations plus the first day of the out-of-sample data, and so on). I should then get new weights for each of the 253 periods and therefore new portfolio returns in every period.

My advisor at the university told me I can't use backtest packages because they can't handle the Lower Partial Moments part of my analysis. The problem is just for the Markowitz portfolio optimization.

I hope you can help me with my problem.

greetings
wintwin111
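The expanding-window re-optimisation described above can be sketched as a plain loop. This is a sketch under assumptions, not a definitive implementation: insample and outsample are taken to be numeric return matrices (257 x 4 and 253 x 4), the optimiser settings mirror the portfolio.optim() call in the post, and each iteration records the realised return of the *next* out-of-sample day before appending it to the window:

```r
library(tseries)  # provides portfolio.optim()

backtest_markowitz <- function(insample, outsample) {
  n <- nrow(outsample)
  weights  <- matrix(NA_real_, nrow = n, ncol = ncol(insample))
  realised <- numeric(n)
  x <- insample
  for (i in seq_len(n)) {
    # optimise on all data seen so far (in-sample + first i-1 OOS days)
    opt <- portfolio.optim(x, pm = mean(x), riskless = FALSE,
                           shorts = FALSE, covmat = cov(x))
    weights[i, ] <- opt$pw
    # realised portfolio return on out-of-sample day i
    realised[i] <- sum(opt$pw * outsample[i, ])
    # expand the estimation window by that day
    x <- rbind(x, outsample[i, , drop = FALSE])
  }
  list(weights = weights, realised = realised)
}
```

The same skeleton should work for the lower-partial-moment variant by swapping the optimiser call inside the loop.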
Re: [R] FW: Nadaraya-Watson kernel
Use KernSmooth (one of the recommended packages that are included in the R distribution). E.g.,

> library(KernSmooth)
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
> x <- seq(0, 1, length=201)
> y <- 4 * cos(2*pi*x) + rnorm(x)
> f <- locpoly(x, y, degree=0, kernel="epan", bandwidth=.1)
> plot(x, y)
> lines(f, lwd=2)

Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ms khulood aljehani
Sent: Tuesday, November 05, 2013 9:49 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] FW: Nadaraya-Watson kernel

From: aljehan...@hotmail.com
To: r-help@r-project.org
Subject: Nadaraya-Watson kernel
Date: Tue, 5 Nov 2013 17:42:13 +0300

Hello,

I want to compute the Nadaraya-Watson kernel estimate when the kernel function is the Epanechnikov kernel. I use the command

ksmooth(x, y, kernel="normal", bandwidth)

The argument (kernel="normal") accepts the normal and box kernels; I want to compute the estimate when the kernel is Epanechnikov.

thank you
Re: [R] problem with interaction in lmer even after creating an "interaction variable"
a_lampei uni-tuebingen.de> writes:

> Dear all,
> I have a problem with interactions in lmer. I have 2 factors (garden and
> gebiet) which interact, plus one other variable (home), dataframe arr.
> When I put:
>
> lmer(biomass ~ home + garden:gebiet + (1|Block), data = arr)
>
> it writes:
>
> Error in lme4::lFormula(formula = biomass ~ home + garden:gebiet + (1 | :
>   rank of X = 28 < ncol(X) = 30
>
> In the lmer help I found out that if not all combinations of the
> interaction are realized, lmer has problems and one should make a new
> variable using "interaction", which I did:
>
> arr$agg <- interaction(arr$gebiet, arr$garden, drop = TRUE)
>
> When I fit the interaction term now:
>
> lmer(biomass ~ home + agg + (1|Block), data = arr)
>
> the error does not change:
>
> Error in lme4::lFormula(formula = biomass ~ home + agg + (1 | Block), :
>   rank of X = 28 < ncol(X) = 29
>
> No NAs are in the given variables in the dataset.
>
> Interestingly, it works when I put only the interaction, like
>
> lmer(biomass ~ agg + (1|Block), data = arr)
>
> Even the following models work:
>
> lmer(biomass ~ gebiet*garden + (1|Block), data = arr)
> lmer(biomass ~ garden + garden:gebiet + (1|Block), data = arr)
>
> But if I add the interaction term in the new format of the new variable,
> it reports the same error again:
>
> lmer(biomass ~ garden + agg + (1|Block), data = arr)
>
> If I put any other variable from the very same dataframe (not only the
> variable "home"), the error is reported again.
>
> I do not understand it; the new variable is just another factor now, or?
> And it is in the same dataframe, it has the same length.
>
> Does anyone have any idea?
>
> Thanks a lot, Anna

This probably belongs on r-sig-mixed-models. Presumably 'home' is still correlated with one of the columns of 'garden:gebiet'.

Here's an example of how you can use svd() to find out which of your columns are collinear:

set.seed(101)
d <- data.frame(x=runif(100), y=1:100, z=2:101)
m <- model.matrix(~x+y+z, data=d)
s <- svd(m)
zapsmall(s$d)
## [1] 828.8452   6.6989   2.6735   0.0000
## this tells us there is one collinear component
zapsmall(s$v)
##            [,1]       [,2]       [,3]       [,4]
## [1,] -0.0105005 -0.7187260  0.3872847  0.5773503
## [2,] -0.0054954 -0.4742947 -0.8803490  0.0000000
## [3,] -0.7017874  0.3692117 -0.1945349  0.5773503
## [4,] -0.7122879 -0.3495142  0.1927498 -0.5773503
## this tells us that the first (intercept), third (y),
## and fourth (z) columns of the model matrix are
## involved in the collinear term, i.e. 1 + y - z is zero
[R] reading in stata file with read.dta works in R x64 3.0.1 and crashes R x64 3.0.2
this file

http://www.electionstudies.org/studypages/data/anes_mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip

can be downloaded after free registration on this page

http://electionstudies.org/studypages/download/registration_form.php

imports properly in windows R x64 3.0.1 but causes R x64 3.0.2 to crash

the crash occurs at the line

rval <- .External(do_readStata, file)

at the bottom of this e-mail, i have included code that will reproduce the problem exactly in 3.0.2 but not 3.0.1

and if i don't load the memisc and Hmisc packages, i get an error instead of a crash..

> x <- read.dta( fp , convert.factors = FALSE )
Error in read.dta(fp, convert.factors = FALSE) :
  Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'

..but if i load both (as shown in the script below), R just dies.

=

sessionInfo for 3.0.1 (the version that works)

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] memisc_0.96-6   MASS_7.3-26     lattice_0.20-15 Hmisc_3.12-2    Formula_1.1-1   survival_2.37-4 httr_0.2        stringr_0.6.2   foreign_0.8-53

loaded via a namespace (and not attached):
[1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-1    tools_3.0.1

=

sessionInfo in 3.0.2 right before the crash --

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] memisc_0.96-6   MASS_7.3-29     lattice_0.20-23 Hmisc_3.12-2
[5] Formula_1.1-1   survival_2.37-4 httr_0.2        stringr_0.6.2
[9] foreign_0.8-55

loaded via a namespace (and not attached):
[1] cluster_1.14.4 digest_0.6.3   RCurl_1.95-4.1 rpart_4.1-3    tools_3.0.2

=

# code that works in 3.0.1 and crashes in 3.0.2 --

setwd( "C:/my directory/anes" )
your.username <- "em...@address.com"
your.password <- "your.password"

require(foreign)  # load foreign package (converts data files into R)
require(stringr)  # load stringr package (manipulates character strings easily)
require(httr)     # load httr package (downloads files from the web, with SSL and cookies)
require(Hmisc)    # load Hmisc package (loads spss.get function)
require(memisc)   # load memisc package (loads spss portable table import functions)

# construct a list containing the pre-specified login information
values <-
  list(
    "email" = your.username ,
    "pass" = your.password
  )

# contact the anes website to log in
POST( "http://www.electionstudies.org/studypages/download/login-process.php" , body = values )

# download the `all_datasets` page to figure out what needs to be downloaded
z <- GET( "http://www.electionstudies.org/studypages/download/datacenter_all_datasets.php" )

# create a temporary file and a temporary directory
tf <- tempfile() ; td <- tempdir()

# download the damn file
z <- GET( "http://www.electionstudies.org/studypages/data/anes_mergedfile_1992to1997/anes_mergedfile_1992to1997_dta.zip" )

# save the result to a temporary file on the local disk
writeBin( z$content , tf )

# unzip that temporary file to an equally-temporary directory
z <- unzip( tf , exdir = td )

# find which one it is from among everything zipped up together..
fp <- z[ grep( 'dta' , z ) ]

# ..import that puppy
x <- read.dta( fp , convert.factors = FALSE )
# crash.
Re: [R] Multiple String word replacements: Performance Issue
Here is a start. I was wondering how long it would take to substitute 800 different patterns into 4M vectors. Here is my test. It took longer (99 sec) to create the test data than to do the substitutions (52 sec). Some variations on this can provide the other information that you are probably after in well under a day (I would guess under an hour).

> n <- 1000
> x <- paste0("$"
+             , sample(LETTERS, n, TRUE)
+             , sample(LETTERS, n, TRUE)
+             , sample(LETTERS, n, TRUE)
+             , sample(LETTERS, n, TRUE)
+             )
> x <- x[!duplicated(x)][1:800]
>
> n <- 400
> system.time({
+     output <- replicate(n, paste(sample(x, 2), collapse = ' '))
+ })
   user  system elapsed
  99.85    0.22  100.37
>
> system.time({
+     pattern <- paste0("\\", x, collapse = "|")
+     z <- gsub(pattern, "[ticker]", output, perl = TRUE)
+ })
   user  system elapsed
  52.05    0.00   52.21
>
> str(output)
 chr [1:400] "$JHVN $VKOL" "$GTEU $CEGL" "$LOEY $ETQK" "$AFDO $SDLH" "$MOIN $WEVR" ...
> str(z)
 chr [1:400] "[ticker] [ticker]" "[ticker] [ticker]" "[ticker] [ticker]" ...
> str(pattern)
 chr "\\$MATF|\\$GFGC|\\$SRYC|\\$HLWS|\\$GHFB|\\$BGVU|\\$GFDW|\\$PSFN|\\$ONDY|\\$SXUH|\\$EBDJ|\\$YNQY|\\$NDBT|\\$TOQK|\\$IUBN|\\$VSMT"| __truncated__

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Nov 6, 2013 at 8:11 AM, Simon Pickert wrote:
> Dear experts,
>
> I've been on this for weeks now and couldn't find a solution. Sorry for the
> long description. I figured I'd post many details so you get the problem
> entirely, although it's not hard to grasp.
>
> **Situation:**
> Data frame consisting of 4 million entries (total size: 250 MB). Two columns:
> `ID` and `TEXT`. Text strings are each up to 200 characters.
>
> **Task:**
> Preprocessing the text strings.
>
> Example data:
>
> +-----+-----------------------------------------+
> | ID  | Text                                    |
> +-----+-----------------------------------------+
> | 123 | $AAPL is up +5%                         |
> | 456 | $MSFT , $EBAY doing great. www.url.com  |
> ..
> +-----+-----------------------------------------+
>
> Should become:
>
> +-----+----------------------------------------------------+--------------+-------------+--------------+
> | ID  | Text clean                                         | First Ticker | All Ticker  | Ticker Count |
> +-----+----------------------------------------------------+--------------+-------------+--------------+
> | 123 | [ticker] is up [positive_percentage]               | $aapl        | $aapl       | 1            |
> | 456 | [ticker] [ticker] doing great [url] [pos_emotion]  | $msft        | $msft,$ebay | 2            |
> ..
> +-----+----------------------------------------------------+--------------+-------------+--------------+
>
> **Problem:**
> It takes too long. On my 8GB RAM dual-core machine: cancelled after 1 day. On
> a 70GB 8-core Amazon EC2 instance: cancelled after 1 day.
>
> **Details:**
> I am basically
>
> - counting how often certain words appear in one string,
> - writing this number into a new column (COUNT),
> - replacing this (counted) word,
> - replacing other words (which I don't need to count before), and
> - replacing some regular expressions.
>
> The vectors which are used as patterns look like this:
>
>     "\\bWORD1\\b|\\bWORD2\\b|\\bWORD3\\b|\\bWORD4\\b..."
>
> Thus, those 'replacement vectors' are character vectors of length 1, each
> containing up to 800 words.
>
> **Main code:**
>
>     library("parallel")
>     library("stringr")
>
>     preprocessText <- function(x){
>
>       # Replace the 'html-and'
>       arguments <- list(pattern = "\\&\\;", replacement = "and", x = x,
>                         ignore.case = TRUE)
>       y <- do.call(gsub, arguments)
>
>       # Remove some special characters
>       arguments <- list(pattern = "[^-[:alnum:]\\'\\:\\/\\$\\%\\.\\,\\+\\-\\#\\@\\_\\!\\?+[:space:]]",
>                         replacement = "", x = y, ignore.case = TRUE)
>       y <- do.call(gsub, arguments)
>
>       # Lowercase
>       arguments <- list(string = y, pattern = tolower(rep_ticker))
>       first <- do.call(str_match, arguments)
>
>       # Identify signal words and count them
>       # (needs to be done in parts, because otherwise R can't handle
>       # this many at once)
>       arguments <- list(string = x, pattern = rep_words_part1)
>       t1 <- do.call(str_extract_all, arguments)
>
>       arguments <- list(string = x, pattern = rep_words_part2)
>       t2 <- do.call(str_extract_all, arguments)
>
>       arguments <- list(string = x, pattern = rep_words_part3)
>       t3 <- do.call(str_extract_all, arguments)
>
>       arguments <- list(string = x, pattern = rep_words_part4)
>       t4 <- do.call(str_extract_all, arguments)
>
>       count = length(t1[[1]]) + length(t2[[1]]) + length(t3[[1]]) + length(t4[[1]])
>       signal_words = c(t1[[1]], t2[[1]], t3[[1]], t4[[1]])
>
>       # Replacements
>       arguments <- list(pattern = rep_wordsA, re
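Jim's one-pass idea also covers the counting step Simon asked about: build a single alternation pattern, count matches per string with gregexpr(), then replace everything with one vectorized gsub(). Below is a minimal sketch of that approach; the data, sizes, and names are simulated for illustration and are not from the original posts.

```r
# Simulated tickers and texts (illustrative only)
set.seed(1)
tickers <- unique(paste0("$", replicate(50, paste(sample(LETTERS, 4), collapse = ""))))
text    <- replicate(1000, paste(sample(tickers, 2), collapse = " "))

# One alternation pattern instead of one gsub() call per ticker;
# the leading backslash escapes the literal "$"
pattern <- paste0("\\", tickers, collapse = "|")

# Count matches per string first (gregexpr() returns -1 when nothing matches)
m     <- gregexpr(pattern, text, perl = TRUE)
count <- vapply(m, function(p) sum(p > 0), integer(1))

# Then replace in a single vectorized pass over the whole vector
clean <- gsub(pattern, "[ticker]", text, perl = TRUE)
```

The point of the design is that each of gregexpr() and gsub() scans the 4M strings once with one compiled regex, instead of 800 separate passes.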
Re: [R] Fitting multiple horizontal lines to data
You already asked this on StackOverflow. The answer remains the same, pretty much what David W. wrote: this is not a question about fitting lines to data. You need to step back and think about what message you want to deliver to those who will view your graph, and what the meaning of your dataset is in the first place.
[R] problem with interaction in lmer even after creating an "interaction variable"
Dear all,

I have a problem with interactions in lmer. I have two factors (garden and gebiet) which interact, plus one other variable (home), in the data frame arr. When I fit:

    lmer(biomass ~ home + garden:gebiet + (1 | Block), data = arr)

it reports:

    Error in lme4::lFormula(formula = biomass ~ home + garden:gebiet + (1 | :
      rank of X = 28 < ncol(X) = 30

In the lmer help I found that if not all combinations of the interaction are realized, lmer has problems, and one should build a new variable using interaction(), which I did:

    arr$agg <- interaction(arr$gebiet, arr$garden, drop = TRUE)

When I fit the interaction term now:

    lmer(biomass ~ home + agg + (1 | Block), data = arr)

the error does not change:

    Error in lme4::lFormula(formula = biomass ~ home + agg + (1 | Block), :
      rank of X = 28 < ncol(X) = 29

There are no NAs in the given variables in the dataset. Interestingly, it works when I include only the interaction:

    lmer(biomass ~ agg + (1 | Block), data = arr)

Even the following models work:

    lmer(biomass ~ gebiet * garden + (1 | Block), data = arr)
    lmer(biomass ~ garden + garden:gebiet + (1 | Block), data = arr)

But if I add the interaction term in the new format of the new variable, it reports the same error again:

    lmer(biomass ~ garden + agg + (1 | Block), data = arr)

If I add any other variable from the very same data frame (not only the variable "home"), the error is reported again. I do not understand it: the new variable is just another factor now, isn't it? And it is in the same data frame and has the same length. Does anyone have any idea?

Thanks a lot,
Anna
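The "rank of X < ncol(X)" message can be reproduced and diagnosed outside lmer by checking the rank of the fixed-effects model matrix. A sketch with simulated data (the variable names follow the post, but the data, and in particular making home constant within the gebiet:garden cells, are assumptions for illustration): if home is constant within cells, it is a linear combination of the intercept and the interaction dummies, so X loses rank.

```r
# Simulated data shaped like the post's (values are made up):
# 'home' is constant within each gebiet:garden cell, so it lies in the
# column span of the interaction dummies and X becomes rank deficient.
d <- expand.grid(gebiet = factor(1:3), garden = factor(1:2), rep = 1:4)
d$home <- as.numeric(d$gebiet == "1" & d$garden == "1")

# the same "interaction variable" construction as in the post
d$agg <- interaction(d$gebiet, d$garden, drop = TRUE)

X <- model.matrix(~ home + agg, data = d)
ncol(X)       # 7 columns: intercept, home, 5 interaction dummies
qr(X)$rank    # 6: 'home' is redundant, hence "rank of X < ncol(X)"
```

Recoding the interaction as agg cannot help here, because agg spans exactly the same column space as garden:gebiet; the collinearity is between home and the cell structure itself.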
Re: [R] residuals of random model estimated by plm function
Dear Alfonso,

in a RE model you do not explicitly estimate every single individual effect, but only the variance of the distribution they have been "drawn from". Hence the only (pointwise) residual you can estimate ex post is the composite one, i.e., the sum.

Best,
Giovanni

Giovanni Millo, PhD
Research Dept., Assicurazioni Generali SpA
Via Machiavelli 3, 34132 Trieste (Italy)
tel. +39 040 671184
fax +39 040 671160

----- Original message -----
From: alfonso.carf...@uniparthenope.it [mailto:alfonso.carf...@uniparthenope.it]
Sent: Wednesday, 6 November 2013, 17:30
To: r-help@r-project.org
Cc: Millo Giovanni
Subject: residuals of random model estimated by plm function

Hi all,

I have estimated a random panel model using the plm function. I have a question about the vector of residuals obtained via the $residuals element.

Example:

    data("Produc", package = "plm")
    zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
              model = "random", data = Produc, index = c("state", "year"))
    res <- zz$residuals   # vector of the residuals

Is the vector res the sum of the idiosyncratic (eit) and individual (ui) components, or only the idiosyncratic (eit) component?

Thanks,
Alfonso
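One rough way to see the composite structure in the Produc example: average the composite residual within each state. This is only a crude ex-post look at the individual component, not a proper BLUP; treating the per-state mean as an estimate of u_i is an assumption of this sketch, not something from the original exchange.

```r
library(plm)

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          model = "random", data = Produc, index = c("state", "year"))

res <- residuals(zz)      # composite residual: u_i + e_it
id  <- index(zz, "id")    # the 'state' index of each observation

# Per-state means of the composite residual: a crude ex-post estimate
# of the individual component u_i (the e_it average out within states)
u_hat <- tapply(res, id, mean)
head(u_hat)
```

Because only var(u) is estimated by the RE (GLS) procedure, anything beyond this kind of ex-post summary requires a model that estimates the individual effects explicitly (e.g. the within/fixed-effects model and fixef()).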