[R] which() vs. just logical selection in df
Hi R-helpers,

Does anyone know why adding which() makes the selection more efficient than plain logical selection on a data frame? Doesn't which() technically add another conversion/function call on top of the logical selection? Here is a reproducible example with a slight difference in timing.

# Surrogate data - the timing here isn't interesting
urltext <- paste("https://drive.google.com/",
                 "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
                 "-h8&export=download", sep="")
download.file(url=urltext, destfile="tempfile.csv")  # download file first
dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE,
                header=TRUE, nrows=2.5e6)  # read the file; 'nrows' is a slight overestimate
dat <- dat[, 1:3]  # keep just the first 3 columns
head(dat, 10)      # print the first 10 rows

# Select using which() as the final step: ~90 ms total time on my MacBook Air
system.time(head(dat[which(dat$gender2 == "other"), ]), gcFirst=TRUE)

# Select skipping which(): ~130 ms total time
system.time(head(dat[dat$gender2 == "other", ]), gcFirst=TRUE)

I would have thought that the second version, without which(), would be more efficient. However, every time I run these, the first version, with which(), is faster by about 20 ms of system time and 20 ms of user time. Does anyone know why this is?

Cheers!
Keith

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
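A commonly cited reason for the timing difference is NA handling: a logical subscript containing NA makes `[.data.frame` materialise an all-NA row for each NA comparison, while which() drops the NAs and hands back a plain integer index. The sketch below uses synthetic data (the column name gender2 is taken from the example above; the sample size and values are my own stand-ins, since the Drive file may not be reachable):

```r
# Synthetic stand-in for the original CSV; only the 'gender2' column name
# comes from the post above, everything else is made up for illustration.
set.seed(1)
n <- 1e5
dat <- data.frame(id = seq_len(n),
                  gender2 = sample(c("male", "female", "other", NA),
                                   n, replace = TRUE))

# Logical indexing: NA comparisons stay NA, so R returns all-NA rows for them.
sel_logical <- dat[dat$gender2 == "other", ]

# which() drops the NAs and yields a small integer index instead.
sel_which <- dat[which(dat$gender2 == "other"), ]

nrow(sel_logical) - nrow(sel_which)  # the difference is exactly the NA count
```

So besides skipping the NA bookkeeping, the which() version subsets with a short integer vector rather than a full-length logical one, which is one plausible source of the ~20 ms gap; with no NAs in gender2 the gap should shrink considerably.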
Re: [R] [External] Re: unable to access index for repository...
On Fri, 9 Oct 2020 at 12:28, Martin Maechler wrote:
> > > Steven Yen on Fri, 9 Oct 2020 05:39:48 +0800 writes:
> > Oh Hi Arne, You may recall we visited this before. I do not believe the
> > problem is algorithm specific. The algorithms I use most often are BFGS
> > and BHHH (or maxBFGS and maxBHHH). For simple econometric models such as
> > probit, Tobit, and even sample selection models, old and new versions of
> > R work equally well (I write my own programs and do not use the ones
> > from AER or sampleSelection). For more complicated models the newer R
> > would converge with not-so-nice gradients while R-3.0.3 would still do
> > nicely (good gradient). I use numerical gradients, of course. I wonder
> > whether the numerical gradient routines were revised at the time of the
> > transition from R-3.0.3 to newer versions.
>
> As an R-core member particularly interested in numerical accuracy etc.,
> I'm also interested in learning what's going on here.
>
> I think we (R core) have never heard of anything numerically
> deteriorating going from R 3.0.x to R 4.0.x, and now that you are
> claiming that in public, you should really post *reproducible* code
> giving evidence for your claim.
>
> As was mentioned earlier, the difference may not be in R, but rather in
> the versions of the (non-base R, but "extension") R packages you use;
> and you were saying earlier you would check that (using the old version
> of the 'maxLik' package with a newer version of R and vice versa) and
> tell us about it.
>
> Thank you in advance for being careful and rational about such findings.

I totally agree with Martin: it would be good if Steven could run his code with different combinations of versions of R and maxLik, so that we know whether the problem is caused by newer versions of R or by newer versions of maxLik (and hopefully also which version introduced the problem). Steven wrote that the problem appeared both in maxBFGS() and in maxBHHH().
This is somewhat surprising, because maxBFGS() uses optim() for the optimisation, which is implemented in R's base package "stats", while maxBHHH() uses maxNR(), which is implemented in the maxLik package itself. Hence, it would be a very unlikely coincidence if the optimisation routines in optim() and maxNR() became worse at the same time. I remember that we slightly changed the user interface and the default values of some arguments of some functions in the maxLik package (e.g., maxBFGS, maxBHHH, maxNR, ...) a long time ago. I suggest that Steven checks whether he needs to update his code to reflect the changes in the user interface and/or in the default values of some arguments.

Best wishes,
Arne
--
Arne Henningsen
http://www.arne-henningsen.name
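One way to localise such a problem, independently of maxLik versions, is to compare an analytic gradient against a finite-difference one using base R only, so the identical script can be run under R 3.0.3 and R 4.0.x. The probit log-likelihood and the data below are my own hypothetical example, not Steven's code:

```r
# Hedged sketch (base R only, hypothetical data): compare the analytic
# probit score against a central finite-difference gradient. Running this
# unchanged on old and new R versions would show whether base-R numerics
# differ at all.
set.seed(42)
n <- 500
x <- cbind(1, rnorm(n))                       # design matrix with intercept
beta_true <- c(0.5, -1)
y <- as.integer(runif(n) < pnorm(x %*% beta_true))

loglik <- function(b) {
  p <- pnorm(x %*% b)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Analytic score: sum_i phi(x_i'b) * (y_i - Phi(x_i'b)) /
#                 (Phi(x_i'b) * (1 - Phi(x_i'b))) * x_i
score <- function(b) {
  xb  <- x %*% b
  lam <- dnorm(xb) * (y - pnorm(xb)) / (pnorm(xb) * (1 - pnorm(xb)))
  colSums(as.vector(lam) * x)
}

# Central finite differences with step h
numgrad <- function(f, b, h = 1e-6) {
  sapply(seq_along(b), function(i) {
    e <- replace(numeric(length(b)), i, h)
    (f(b + e) - f(b - e)) / (2 * h)
  })
}

b0 <- c(0, 0)
discrepancy <- max(abs(score(b0) - numgrad(loglik, b0)))
discrepancy   # should be tiny on any R version
```

If the discrepancy is equally small on both R versions, the base-R numerics are fine and the difference Steven sees would point at the package layer (maxLik's own gradient code or changed defaults), which is exactly the check Arne and Martin are asking for.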
[R] Please need help to finalize my code
Good evening dear administrators,

I am writing to ask for help finalizing my R programming algorithm. I attach to this note my code, which deals with an independence test statistic. My request is to introduce kernels on functional data into this same code. I have numbered the lines of the original code myself; below I list the lines where the kernel needs to be introduced:

1) centering of the feature map ("redescription function") phi(xi) (just use the kernel)
2) this function centers the functional data phi(xi) (just use the kernel)
3) phi(x1) (or kernel k(., .))
5) here too the kernel on the functional data is used
7) return the kernel
28) kernel dimension
29) kernel dimension
30) kernel dimension
73) feature map phi(xi) or the kernel k(., .)
74) feature map phi(yi) or the kernel k(., .)
76) feature map phi(xbar) or the kernel k(., .)
77) feature map phi(xjhbar) or the kernel k(., .)
82) feature map phi(xi) or the kernel k(., .) introduced instead of x
84) feature map phi(xjhbar) or the kernel k(., .)
89) w = feature map phi(xbar) or the kernel k(., .)
120) we center the feature map phi(xi) or the kernel k(., .)
121) Vn is a kernel function, so we use phi(xi) or the kernel k(., .)
126) centering of the feature map phi(xji) or the kernel k(., .)
127) Vij is computed using the feature map phi(xi) or the kernel k(., .) instead of X
130) centering of the feature map phi(Xkl) or the kernel k(., .)
131) Vijkl, always using the kernel function as above
132) Vkl, always using the kernel function as above

I look forward to your help.
Yours sincerely

library(MASS)
1  CenteringV <- function(X, Ms, n){
2  # this function centers the data of X
3    X1 = X*0
4    for (i in 1:n){
5      X1[i,] = X[i,] - Ms
6    }
7    return(X1)
8  }
9  # Function N°2
10 SqrtMatrix0 <- function(M) {
11 # This function allows us to compute the square root of a matrix
12 # using the decomposition M=PDQ where Q is the inverse of P;
13 # here negative eigenvalues are replaced by zero
14   a = eigen(M, TRUE)
15   b = a$values
16   b[b<0] = 0
17   c = a$vectors
18   d = diag(sqrt(b))
19   b = solve(c)
20   a = c %*% d %*% b
21   return(a)
22 }
23 # declaration of parameters
24 m1 = 0.01 # alpha value (1% risk)
25 m2 = 0.05 # alpha value (5% risk)
26 m3 = 0.1  # alpha value (10% risk)
27 nbrefoissim = 100 # number of times the program is run
28 p = 3 # dimension of variable X
29 q = 3 # dimension of variable Y
30 R = c(5,4,3) # number of partitions of each component of variable Y
31 if(length(R) != q) stop("The size of R must be equal to q")
32 n = 25 # sample size
33 N = c(25,50,100,200,300,400,500,1000) # different sample sizes
34 #N = c(25,50) # different sample sizes
35 pc1 = rep(0.100)
36 K = 0
37 MV = matrix(0, nr=8, nc=4)
38 dimnames(MV)[[2]] = c("n","r1%","r5%","r10%")
39 # Beginning of the program
40 for (n in N){
41 l1 = 0 # initialization of the value to calculate the test level at 1%
42 l2 = 0 # initialization of the value to calculate the test level at 5%
43 l3 = 0 # initialization of the value to calculate the test level at 10%
44 # Creation of an n11 list containing the sizes of the different groups
45 n11 = list()
46 for (i in 1:q){
47   n11[[i]] = rep(as.integer(n/R[i]), R[i])
48   n11[[i]][R[i]] = n - ((R[i]-1)*n11[[i]][1])
49 }
50 # Creation of lists P11 and P12 which contain the probabilities and
51 # the inverses of the empirical probabilities of the different groups respectively
52 P11 = list()
53 P12 = list()
54 for (i in 1:q){
55   P11[[i]] = n11[[i]]/n
56   P12[[i]] = n/n11[[i]]
57 }
58 # creation of a list containing the W matrices
59 W = list()
60 for (i in 1:q){
61   w = matrix(0, n, R[i])
62   w[1:n11[[i]][1], 1] = 1
63   if (R[i]>1){
64     for (j in 2:R[i]){
65       s1 = sum(n11[[i]][1:(j-1)])
66       w[(1+s1):(s1+n11[[i]][j]), j] = 1
67   }}
68   W[[i]] = w
69 }
70 for (i1 in 1:nbrefoissim){
71 # data generation
72 VA1 = mvrnorm(n, rep(0,(p+q)), diag((p+q)))
73 X = VA1[, 1:p]
74 Y = VA1[, (p+1):(p+q)]
75 # Xbar calculation
76 Xbar = colMeans(X)
77 # Calculation of Xjh bar
78 Xjhbar = list()
79 for (i in 1:q){
80   w = matrix(0, R[i], p)
81   for (j in 1:R[i]){
82     w[j,] = colSums(W[[i]][,j]*X)/n11[[i]][j]
83   }
84   Xjhbar[[i]] = w
85 }
86 # calculation of TO jh
87 TO.jh = list()
88 for (i in 1:q){
89   w = Xjhbar[[i]]
90   to = w*0
91   for (j in 1:R[i]){
92     to[j,] = w[j,] - Xbar
93   }
94   TO.jh[[i]] = to
95 }
96 # calculation of Lamda J
97 Lamda = matrix(0, p, p)
98 for (i in 1:q){
99   to = TO.jh[[i]]
100   w = matrix(0, p, p)
101   for (j in 1:R[i]){
102     w = w + (P11[[i]][j]*(to[j,] %*% t(to[j,])))
103   }
104   Lamda = Lamda + w
105 }
106 tr1 = n*sum(diag(Lamda))
107 # Gamma Calculation
108 GGamma = matrix(0, p*sum(R), p*sum(R))
109 PGamma = kronecker(diag(P12[[1]]), diag(p))
110 Ifin = p*R[1]
111 GGamma[1:Ifin, 1:Ifin] = PGamma
112 for (i in 2:q){
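On the question of "introducing the kernel": the usual approach is not to compute the feature map phi(x) at all, but to build the Gram matrix K[i, j] = k(x_i, x_j) and centre it in feature space, Kc = K - JK - KJ + JKJ with J = matrix(1/n, n, n). The sketch below is a hedged illustration only; the Gaussian kernel and the bandwidth sigma are my assumptions, not part of the original code:

```r
# Hedged sketch: kernelised centering via the Gram matrix, the kernel-space
# analogue of what CenteringV() (code lines 1-8 above) does to raw data.
# Gaussian kernel and bandwidth 'sigma' are assumptions for illustration.
gaussKernel <- function(X, sigma = 1) {
  D2 <- as.matrix(dist(X))^2           # squared Euclidean distances
  exp(-D2 / (2 * sigma^2))             # K[i, j] = k(x_i, x_j)
}

centerGram <- function(K) {
  n <- nrow(K)
  J <- matrix(1 / n, n, n)
  K - J %*% K - K %*% J + J %*% K %*% J   # centre phi(x_i) implicitly
}

set.seed(1)
X  <- matrix(rnorm(25 * 3), 25, 3)     # stand-in for the simulated X above
Kc <- centerGram(gaussKernel(X))
max(abs(rowSums(Kc)))                  # rows of a centred Gram matrix sum to ~0
```

Every place the listed lines use X or phi(xi) in an inner product, the corresponding entry of Kc can be substituted; that is the standard "kernel trick" and avoids ever forming phi explicitly.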
Re: [R] Question about the package "MatchIt"
Maria: What you are looking for (propensity score matching on survey data) is discussed in the lab 5 components of this series, using the MatchIt and Matching packages:
https://www.youtube.com/playlist?list=PL2yD6frXhFob_Mvfg21Y01t_yu1aC9NnP

Regards,
Ehsan
https://ehsank.com/

On Fri., Oct. 9, 2020, 4:02 p.m. Maria Cristina Maurizio wrote:
> Hi! I'm trying to perform propensity score matching on survey data and so
> for each individual observation I have a statistical weight attached. My
> question is: is there a way within the package to consider these weights
> in the matching procedure?
> Thank you very much.
>
> --
> Maria Cristina Maurizio
Re: [R] Question about the package "MatchIt"
Maria Cristina,

The MatchIt homepage at https://gking.harvard.edu/matchit has a link to a mailing list specific to the package, and it has searchable archives. You will probably have better luck there than on a general R programming list. Though a quick perusal of the user guide at that site makes me think the answer is probably "no."

Pat

On Fri, Oct 9, 2020 at 7:02 PM Maria Cristina Maurizio wrote:
> Hi! I'm trying to perform propensity score matching on survey data and so
> for each individual observation I have a statistical weight attached. My
> question is: is there a way within the package to consider these weights
> in the matching procedure?
> Thank you very much.
>
> --
> Maria Cristina Maurizio

--
Patrick S. Malone, Ph.D., Malone Quantitative
NEW Service Models: http://malonequantitative.com
He/Him/His
Re: [R] highfrequency package-jump test
Hello,

The error is not reproducible; the second code line gives:

data <- xts(x=data$PRICE, order.by=tm)
#Error in data$PRICE : object of type 'closure' is not subsettable

Please post a data example in dput format:

dput(head(data, 30))  # post the output of this or similar

Hope this helps,

Rui Barradas

Às 01:13 de 10/10/20, Zixuan Qi escreveu:

Hello,

My programming is as follows.

library(highfrequency)
library(data.table)
library(xts)
tm <- seq.POSIXt(from = as.POSIXct("2020-08-20 09:30:00"),
                 to = as.POSIXct("2020-08-20 15:59:00"), by = 'min')
data <- xts(x = data$PRICE, order.by = tm)
data <- as.data.table(data)
setnames(data, c('index'), c('DT'))
setnames(data, c('V1'), c('PRICE'))
LMtest <- intradayJumpTest(pData = data, volEstimator = "RM",
                           driftEstimator = "none", RM = "bipower",
                           lookBackPeriod = 20, on = 'minutes', k = 5,
                           marketOpen = "09:30:00", marketClose = "15:59:00")
plot(LMtest)

However, the error is ''Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ''. I don't know how to correct it.
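To make Rui's request concrete: dput()/deparse() print an object as R code that anyone can paste back into a session to reconstruct it exactly. The sketch below uses made-up PRICE values, since the original object was never shown in the post:

```r
# What "post a dput example" looks like in practice, on a small synthetic
# data frame (the PRICE values here are made up for illustration).
tm  <- seq.POSIXt(from = as.POSIXct("2020-08-20 09:30:00", tz = "UTC"),
                  to   = as.POSIXct("2020-08-20 09:34:00", tz = "UTC"),
                  by   = "min")
dat <- data.frame(DT = tm, PRICE = c(100.1, 100.3, 99.8, 100.0, 100.2))

txt <- paste(deparse(head(dat, 3)), collapse = "\n")
cat(txt, "\n")                     # this text is what goes in the e-mail

dat2 <- eval(parse(text = txt))    # a reader reconstructs the object from it
```

This is why the posting guide keeps asking for dput output: it removes any ambiguity about classes, time zones, and column names that a pasted print-out cannot capture.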
Re: [R] highfrequency package-jump test
Hi,

I tried running your code but it is not complete. When you create the xts you refer to data$PRICE, but this has never been defined. I generated synthetic data and used that to create the xts object as follows:

x <- runif(length(tm), min = 90, max = 110)
data <- xts(x = x, order.by = tm)

When I used this modified code the program executed with no problem.

I hope that helps,
Eric

On Sat, Oct 10, 2020 at 3:13 AM Zixuan Qi wrote:
> Hello,
>
> My programming is as follows.
>
> library(highfrequency)
> library(data.table)
> library(xts)
> tm <- seq.POSIXt(from = as.POSIXct("2020-08-20 09:30:00"),
>                  to = as.POSIXct("2020-08-20 15:59:00"), by = 'min')
> data <- xts(x = data$PRICE, order.by = tm)
> data <- as.data.table(data)
> setnames(data, c('index'), c('DT'))
> setnames(data, c('V1'), c('PRICE'))
> LMtest <- intradayJumpTest(pData = data, volEstimator = "RM",
>                            driftEstimator = "none", RM = "bipower",
>                            lookBackPeriod = 20, on = 'minutes', k = 5,
>                            marketOpen = "09:30:00", marketClose = "15:59:00")
> plot(LMtest)
>
> However, the error is ''Error in xy.coords(x, y, setLab = FALSE) :
> 'x' and 'y' lengths differ''. I don't know how to correct it.
[R] [Rd] R 4.0.3 is released
The build system rolled up R-4.0.3.tar.gz (codename "Bunny-Wunnies Freak Out") this morning. The list below details the changes in this release. You can get the source code from

https://cran.r-project.org/src/base/R-4/R-4.0.3.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.

For the R Core Team,
Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = b9c44f9f78cab3184ad9898bebc854b4
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 5be656453b81e9393e2a0c42688b04b8
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = 71728ef32a09c5b9df36b496b52d5c8e
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = 496062c138e2def06cebccddfb814ac6
MD5 (NEWS.3) = 012e7f4a80cc8ec947bf3f0ff6117ec8
MD5 (R-latest.tar.gz) = 8ecf46afa36c9aff9904aad5ca422c6d
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = 529223fd3ffef95731d0a87353108435
MD5 (THANKS) = 251d20510bfc3cc93b82c5a99f7efcc6
MD5 (VERSION-INFO.dcf) = bfbcfb2f4ef4416b635087160f965904
MD5 (R-4/R-4.0.3.tar.gz) = 8ecf46afa36c9aff9904aad5ca422c6d

2cde824a7b18958e5f06b391c801c8288be0f84fa8934b7ddefef23c67e60c09  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
4390543804392e072378b2d75cd3fb0d57e9885f9dc38ddd5a2ff780487b9d97  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
895ba3f810fd33d3be63deb6c4588be1f7ba3f5bf80e10ffeafc194221dd5ba9  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
e80de410c77f05ff2012fa70051b89119845f734a7fa5c55857e61e4ed7d5f6e  NEWS.2
7201d139947afa52b5e09d26dc01445edf444506264355b2185122bc1ed3dce0  NEWS.3
09983a8a78d5fb6bc45d27b1c55f9ba5265f78fa54a55c13ae691f87c5bb9e0d  R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
408737572ecc6e1135fdb2cf7a9dbb1a6cb27967c757f1771b8c39d1fd2f1ab9  RESOURCES
c9c7cb32308b4e560a22c858819ade9de524a602abd4e92d1c328c89f8037d73  THANKS
cd9666c09064b120655598c1ac792266ad20adb57c36aab4d094dedc7e480fa6  VERSION-INFO.dcf
09983a8a78d5fb6bc45d27b1c55f9ba5265f78fa54a55c13ae691f87c5bb9e0d  R-4/R-4.0.3.tar.gz

This is the relevant part of the NEWS file

CHANGES IN R 4.0.3:

NEW FEATURES:

* On platforms using configure option --with-internal-tzcode, additional values "internal" and (on macOS only) "macOS" are accepted for the environment variable TZDIR. (See ?TZDIR.) On macOS, "macOS" is used by default if the system timezone database is a newer version than that in the R installation.

* When install.packages(type = "source") fails to find a package in a repository it mentions package versions which are excluded by their R version requirement and links to hints on why a package might not be found.

* The default value for options("timeout") can be set from environment variable R_DEFAULT_INTERNET_TIMEOUT, still defaulting to 60 (seconds) if that is not set or invalid. This may be needed when child R processes are doing downloads, for example during the installation of source packages which download jars or other forms of data.

LINK-TIME OPTIMIZATION on a UNIX-ALIKE:

* There is now support for parallelized Link-Time Optimization (LTO) with GCC and for 'thin' LTO with clang _via_ setting the LTO macro.

* There is support for setting a different LTO flag for the Fortran compiler, including to empty when mixing clang and gfortran (as on macOS). See file config.site.

* There is a new LTO_LD macro to set linker options for LTO compilation, for example to select an alternative linker or to parallelize thin LTO.
DEPRECATED AND DEFUNCT:

* The LINPACK argument to chol.default(), chol2inv(), solve.default() and svd() has been defunct since R 3.1.0. Using it now gives a warning which will become an error in R 4.1.0.

BUG FIXES:

* The code mitigating stack overflow with PCRE regexps on very long strings is enabled for PCRE2 < 10.30 also when JIT is enabled, since stack overflows have been seen in that case.

* Fix to correctly show the group labels in dotchart() (which were lost in the ylab improvement for R 4.0.0).

* addmargins(*, ..) now also works when fn() is a local function, thanks to bug report and patch PR#17124 from Alex Bertram.

* rank(x) and hence sort(x) now work when x is an object (as per is.object(x)) of type "raw" _and_ provides a valid `[` method, e.g., for gmp::as.bigz(.) numbers.

* chisq.test
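The options("timeout") item in the NEWS above affects downloads such as download.file() and install.packages(). A minimal sketch of inspecting and temporarily raising the option at run time (the value 300 is just an illustrative choice); R_DEFAULT_INTERNET_TIMEOUT only changes the *default* at startup:

```r
# Inspect and temporarily raise R's download timeout; restore it afterwards.
old <- getOption("timeout")          # 60 seconds unless already changed
options(timeout = max(300, old))     # e.g. before installing large packages
raised <- getOption("timeout")       # now at least 300 seconds
options(timeout = old)               # restore the previous value
```

Raising the option in the parent session does not help child R processes, which is exactly the case the new environment variable is meant to cover.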