[R] which() vs. just logical selection in df
Hi R-helpers,

Does anyone know why adding which() makes the selection more efficient than plain logical selection on a data frame? Doesn't which() technically add another conversion/function call on top of the logical selection? Here is a reproducible example with a slight difference in timing.

# Surrogate data - the timing here isn't interesting
urltext <- paste("https://drive.google.com/",
                 "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
                 "-h8&export=download", sep="")
download.file(url=urltext, destfile="tempfile.csv")  # download file first
dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE,
                header=TRUE, nrows=2.5e6)  # read the file; 'nrows' is a slight overestimate
dat <- dat[, 1:3]  # keep just the first 3 columns
head(dat, 10)      # print the first 10 rows

# Select using which() as the final step: ~90 ms total time on my MacBook Air
system.time(head(dat[which(dat$gender2 == "other"), ]), gcFirst=TRUE)

# Select skipping which(): ~130 ms total time
system.time(head(dat[dat$gender2 == "other", ]), gcFirst=TRUE)

I would have thought that the second version, without which(), would be more efficient. However, every time I run these, the first version, with which(), is faster by about 20 ms of system time and 20 ms of user time. Does anyone know why this is?

Cheers!
Keith

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
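A commonly cited reason for the timing difference is NA handling: a logical subscript containing NA makes `[.data.frame` materialise an all-NA row for each NA comparison, while which() drops the NAs and hands back a plain integer index. The sketch below uses synthetic data (the column name gender2 is taken from the example above; the sample size and values are my own stand-ins, since the Drive file may not be reachable):

```r
# Synthetic stand-in for the original CSV; only the 'gender2' column name
# comes from the post above, everything else is made up for illustration.
set.seed(1)
n <- 1e5
dat <- data.frame(id = seq_len(n),
                  gender2 = sample(c("male", "female", "other", NA),
                                   n, replace = TRUE))

# Logical indexing: NA comparisons stay NA, so R returns all-NA rows for them.
sel_logical <- dat[dat$gender2 == "other", ]

# which() drops the NAs and yields a small integer index instead.
sel_which <- dat[which(dat$gender2 == "other"), ]

nrow(sel_logical) - nrow(sel_which)  # the difference is exactly the NA count
```

So besides skipping the NA bookkeeping, the which() version subsets with a short integer vector rather than a full-length logical one, which is one plausible source of the ~20 ms gap; with no NAs in gender2 the gap should shrink considerably.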
Re: [R] [External] Re: unable to access index for repository...
On Fri, 9 Oct 2020 at 12:28, Martin Maechler wrote:
> > > Steven Yen on Fri, 9 Oct 2020 05:39:48 +0800 writes:
> > Oh Hi Arne, You may recall we visited this before. I do not believe the
> > problem is algorithm specific. The algorithms I use most often are BFGS
> > and BHHH (or maxBFGS and maxBHHH). For simple econometric models such as
> > probit, Tobit, and even sample selection models, old and new versions of
> > R work equally well (I write my own programs and do not use the ones
> > from AER or sampleSelection). For more complicated models the newer R
> > would converge with not-so-nice gradients while R-3.0.3 would still do
> > nicely (good gradient). I use numerical gradients, of course. I wonder
> > whether the numerical gradient routines were revised at the time of the
> > transition from R-3.0.3 to newer versions.
>
> As an R-core member particularly interested in numerical accuracy etc.,
> I'm also interested in learning what's going on here.
>
> I think we (R core) have never heard of anything numerically
> deteriorating going from R 3.0.x to R 4.0.x, and now that you are
> claiming that in public, you should really post *reproducible* code
> giving evidence for your claim.
>
> As was mentioned earlier, the difference may not be in R, but rather in
> the versions of the (non-base R, but "extension") R packages you use;
> and you were saying earlier you would check that (using the old version
> of the 'maxLik' package with a newer version of R and vice versa) and
> tell us about it.
>
> Thank you in advance for being careful and rational about such findings.

I totally agree with Martin: it would be good if Steven could run his code with different combinations of versions of R and maxLik, so that we know whether the problem is caused by newer versions of R or by newer versions of maxLik (and hopefully also which version introduced the problem). Steven wrote that the problem appeared both in maxBFGS() and in maxBHHH().
This is somewhat surprising, because maxBFGS() uses optim() for the optimisation, which is implemented in R's base package "stats", while maxBHHH() uses maxNR(), which is implemented in the maxLik package itself. Hence, it would be a very unlikely coincidence if the optimisation routines in optim() and maxNR() became worse at the same time. I remember that we slightly changed the user interface and the default values of some arguments of some functions in the maxLik package (e.g., maxBFGS, maxBHHH, maxNR, ...) a long time ago. I suggest that Steven checks whether he needs to update his code to reflect the changes in the user interface and/or in the default values of some arguments.

Best wishes,
Arne
--
Arne Henningsen
http://www.arne-henningsen.name
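One way to localise such a problem, independently of maxLik versions, is to compare an analytic gradient against a finite-difference one using base R only, so the identical script can be run under R 3.0.3 and R 4.0.x. The probit log-likelihood and the data below are my own hypothetical example, not Steven's code:

```r
# Hedged sketch (base R only, hypothetical data): compare the analytic
# probit score against a central finite-difference gradient. Running this
# unchanged on old and new R versions would show whether base-R numerics
# differ at all.
set.seed(42)
n <- 500
x <- cbind(1, rnorm(n))                       # design matrix with intercept
beta_true <- c(0.5, -1)
y <- as.integer(runif(n) < pnorm(x %*% beta_true))

loglik <- function(b) {
  p <- pnorm(x %*% b)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Analytic score: sum_i phi(x_i'b) * (y_i - Phi(x_i'b)) /
#                 (Phi(x_i'b) * (1 - Phi(x_i'b))) * x_i
score <- function(b) {
  xb  <- x %*% b
  lam <- dnorm(xb) * (y - pnorm(xb)) / (pnorm(xb) * (1 - pnorm(xb)))
  colSums(as.vector(lam) * x)
}

# Central finite differences with step h
numgrad <- function(f, b, h = 1e-6) {
  sapply(seq_along(b), function(i) {
    e <- replace(numeric(length(b)), i, h)
    (f(b + e) - f(b - e)) / (2 * h)
  })
}

b0 <- c(0, 0)
discrepancy <- max(abs(score(b0) - numgrad(loglik, b0)))
discrepancy   # should be tiny on any R version
```

If the discrepancy is equally small on both R versions, the base-R numerics are fine and the difference Steven sees would point at the package layer (maxLik's own gradient code or changed defaults), which is exactly the check Arne and Martin are asking for.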
[R] Please need help to finalize my code
Good evening dear administrators,

I am writing to ask for help finalizing my R programming algorithm. I attach to this note my code, which deals with an independence test statistic. My request is to introduce kernels on functional data into this same code. I have numbered the lines of the original code myself; below I list the lines where the kernel needs to be introduced:

1) centering of the feature map ("redescription function") phi(xi) (just use the kernel)
2) this function centers the functional data phi(xi) (just use the kernel)
3) phi(x1) (or kernel k(., .))
5) here too the kernel on the functional data is used
7) return the kernel
28) kernel dimension
29) kernel dimension
30) kernel dimension
73) feature map phi(xi) or the kernel k(., .)
74) feature map phi(yi) or the kernel k(., .)
76) feature map phi(xbar) or the kernel k(., .)
77) feature map phi(xjhbar) or the kernel k(., .)
82) feature map phi(xi) or the kernel k(., .) introduced instead of x
84) feature map phi(xjhbar) or the kernel k(., .)
89) w = feature map phi(xbar) or the kernel k(., .)
120) we center the feature map phi(xi) or the kernel k(., .)
121) Vn is a kernel function, so we use phi(xi) or the kernel k(., .)
126) centering of the feature map phi(xji) or the kernel k(., .)
127) Vij is computed using the feature map phi(xi) or the kernel k(., .) instead of X
130) centering of the feature map phi(Xkl) or the kernel k(., .)
131) Vijkl, always using the kernel function as above
132) Vkl, always using the kernel function as above

I look forward to your help.
Yours sincerely

library(MASS)
1  CenteringV <- function(X, Ms, n){
2  # this function centers the data of X
3    X1 = X*0
4    for (i in 1:n){
5      X1[i,] = X[i,] - Ms
6    }
7    return(X1)
8  }
9  # Function N°2
10 SqrtMatrix0 <- function(M) {
11 # This function allows us to compute the square root of a matrix
12 # using the decomposition M=PDQ where Q is the inverse of P;
13 # here negative eigenvalues are replaced by zero
14   a = eigen(M, TRUE)
15   b = a$values
16   b[b<0] = 0
17   c = a$vectors
18   d = diag(sqrt(b))
19   b = solve(c)
20   a = c %*% d %*% b
21   return(a)
22 }
23 # declaration of parameters
24 m1 = 0.01 # alpha value (1% risk)
25 m2 = 0.05 # alpha value (5% risk)
26 m3 = 0.1  # alpha value (10% risk)
27 nbrefoissim = 100 # number of times the program is run
28 p = 3 # dimension of variable X
29 q = 3 # dimension of variable Y
30 R = c(5,4,3) # number of partitions of each component of variable Y
31 if(length(R) != q) stop("The size of R must be equal to q")
32 n = 25 # sample size
33 N = c(25,50,100,200,300,400,500,1000) # different sample sizes
34 #N = c(25,50) # different sample sizes
35 pc1 = rep(0.100)
36 K = 0
37 MV = matrix(0, nr=8, nc=4)
38 dimnames(MV)[[2]] = c("n","r1%","r5%","r10%")
39 # Beginning of the program
40 for (n in N){
41 l1 = 0 # initialization of the value to calculate the test level at 1%
42 l2 = 0 # initialization of the value to calculate the test level at 5%
43 l3 = 0 # initialization of the value to calculate the test level at 10%
44 # Creation of an n11 list containing the sizes of the different groups
45 n11 = list()
46 for (i in 1:q){
47   n11[[i]] = rep(as.integer(n/R[i]), R[i])
48   n11[[i]][R[i]] = n - ((R[i]-1)*n11[[i]][1])
49 }
50 # Creation of lists P11 and P12 which contain the probabilities and
51 # the inverses of the empirical probabilities of the different groups respectively
52 P11 = list()
53 P12 = list()
54 for (i in 1:q){
55   P11[[i]] = n11[[i]]/n
56   P12[[i]] = n/n11[[i]]
57 }
58 # creation of a list containing the W matrices
59 W = list()
60 for (i in 1:q){
61   w = matrix(0, n, R[i])
62   w[1:n11[[i]][1], 1] = 1
63   if (R[i]>1){
64     for (j in 2:R[i]){
65       s1 = sum(n11[[i]][1:(j-1)])
66       w[(1+s1):(s1+n11[[i]][j]), j] = 1
67   }}
68   W[[i]] = w
69 }
70 for (i1 in 1:nbrefoissim){
71 # data generation
72 VA1 = mvrnorm(n, rep(0,(p+q)), diag((p+q)))
73 X = VA1[, 1:p]
74 Y = VA1[, (p+1):(p+q)]
75 # Xbar calculation
76 Xbar = colMeans(X)
77 # Calculation of Xjh bar
78 Xjhbar = list()
79 for (i in 1:q){
80   w = matrix(0, R[i], p)
81   for (j in 1:R[i]){
82     w[j,] = colSums(W[[i]][,j]*X)/n11[[i]][j]
83   }
84   Xjhbar[[i]] = w
85 }
86 # calculation of TO jh
87 TO.jh = list()
88 for (i in 1:q){
89   w = Xjhbar[[i]]
90   to = w*0
91   for (j in 1:R[i]){
92     to[j,] = w[j,] - Xbar
93   }
94   TO.jh[[i]] = to
95 }
96 # calculation of Lamda J
97 Lamda = matrix(0, p, p)
98 for (i in 1:q){
99   to = TO.jh[[i]]
100   w = matrix(0, p, p)
101   for (j in 1:R[i]){
102     w = w + (P11[[i]][j]*(to[j,] %*% t(to[j,])))
103   }
104   Lamda = Lamda + w
105 }
106 tr1 = n*sum(diag(Lamda))
107 # Gamma Calculation
108 GGamma = matrix(0, p*sum(R), p*sum(R))
109 PGamma = kronecker(diag(P12[[1]]), diag(p))
110 Ifin = p*R[1]
111 GGamma[1:Ifin, 1:Ifin] = PGamma
112 for (i in 2:q){
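On the question of "introducing the kernel": the usual approach is not to compute the feature map phi(x) at all, but to build the Gram matrix K[i, j] = k(x_i, x_j) and centre it in feature space, Kc = K - JK - KJ + JKJ with J = matrix(1/n, n, n). The sketch below is a hedged illustration only; the Gaussian kernel and the bandwidth sigma are my assumptions, not part of the original code:

```r
# Hedged sketch: kernelised centering via the Gram matrix, the kernel-space
# analogue of what CenteringV() (code lines 1-8 above) does to raw data.
# Gaussian kernel and bandwidth 'sigma' are assumptions for illustration.
gaussKernel <- function(X, sigma = 1) {
  D2 <- as.matrix(dist(X))^2           # squared Euclidean distances
  exp(-D2 / (2 * sigma^2))             # K[i, j] = k(x_i, x_j)
}

centerGram <- function(K) {
  n <- nrow(K)
  J <- matrix(1 / n, n, n)
  K - J %*% K - K %*% J + J %*% K %*% J   # centre phi(x_i) implicitly
}

set.seed(1)
X  <- matrix(rnorm(25 * 3), 25, 3)     # stand-in for the simulated X above
Kc <- centerGram(gaussKernel(X))
max(abs(rowSums(Kc)))                  # rows of a centred Gram matrix sum to ~0
```

Every place the listed lines use X or phi(xi) in an inner product, the corresponding entry of Kc can be substituted; that is the standard "kernel trick" and avoids ever forming phi explicitly.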
Re: [R] Question about the package "MatchIt"
Maria: What you are looking for (propensity score matching on survey data) is discussed in the lab 5 components of this series, using the MatchIt and Matching packages:
https://www.youtube.com/playlist?list=PL2yD6frXhFob_Mvfg21Y01t_yu1aC9NnP

Regards,
Ehsan
https://ehsank.com/

On Fri., Oct. 9, 2020, 4:02 p.m. Maria Cristina Maurizio wrote:
> Hi! I'm trying to perform propensity score matching on survey data and so
> for each individual observation I have a statistical weight attached. My
> question is: is there a way within the package to consider these weights
> in the matching procedure?
> Thank you very much.
>
> --
> Maria Cristina Maurizio
Re: [R] Question about the package "MatchIt"
Maria Cristina,

The MatchIt homepage at https://gking.harvard.edu/matchit has a link to a mailing list specific to the package, and it has searchable archives. You will probably have better luck there than on a general R programming list. Though a quick perusal of the user guide at that site makes me think the answer is probably "no."

Pat

On Fri, Oct 9, 2020 at 7:02 PM Maria Cristina Maurizio wrote:
> Hi! I'm trying to perform propensity score matching on survey data and so
> for each individual observation I have a statistical weight attached. My
> question is: is there a way within the package to consider these weights
> in the matching procedure?
> Thank you very much.
>
> --
> Maria Cristina Maurizio

--
Patrick S. Malone, Ph.D., Malone Quantitative
NEW Service Models: http://malonequantitative.com
He/Him/His
Re: [R] highfrequency package-jump test
Hello,

The error is not reproducible; the second code line gives:

data <- xts(x=data$PRICE, order.by=tm)
#Error in data$PRICE : object of type 'closure' is not subsettable

Please post a data example in dput format:

dput(head(data, 30))  # post the output of this or similar

Hope this helps,

Rui Barradas

Às 01:13 de 10/10/20, Zixuan Qi escreveu:

Hello,

My programming is as follows.

library(highfrequency)
library(data.table)
library(xts)
tm <- seq.POSIXt(from = as.POSIXct("2020-08-20 09:30:00"),
                 to = as.POSIXct("2020-08-20 15:59:00"), by = 'min')
data <- xts(x = data$PRICE, order.by = tm)
data <- as.data.table(data)
setnames(data, c('index'), c('DT'))
setnames(data, c('V1'), c('PRICE'))
LMtest <- intradayJumpTest(pData = data, volEstimator = "RM",
                           driftEstimator = "none", RM = "bipower",
                           lookBackPeriod = 20, on = 'minutes', k = 5,
                           marketOpen = "09:30:00", marketClose = "15:59:00")
plot(LMtest)

However, the error is ''Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ''. I don't know how to correct it.
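To make Rui's request concrete: dput()/deparse() print an object as R code that anyone can paste back into a session to reconstruct it exactly. The sketch below uses made-up PRICE values, since the original object was never shown in the post:

```r
# What "post a dput example" looks like in practice, on a small synthetic
# data frame (the PRICE values here are made up for illustration).
tm  <- seq.POSIXt(from = as.POSIXct("2020-08-20 09:30:00", tz = "UTC"),
                  to   = as.POSIXct("2020-08-20 09:34:00", tz = "UTC"),
                  by   = "min")
dat <- data.frame(DT = tm, PRICE = c(100.1, 100.3, 99.8, 100.0, 100.2))

txt <- paste(deparse(head(dat, 3)), collapse = "\n")
cat(txt, "\n")                     # this text is what goes in the e-mail

dat2 <- eval(parse(text = txt))    # a reader reconstructs the object from it
```

This is why the posting guide keeps asking for dput output: it removes any ambiguity about classes, time zones, and column names that a pasted print-out cannot capture.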
Re: [R] highfrequency package-jump test
Hi,

I tried running your code but it is not complete. When you create the xts you refer to data$PRICE, but this has never been defined. I generated synthetic data and used that to create the xts object as follows:

x <- runif(length(tm), min = 90, max = 110)
data <- xts(x = x, order.by = tm)

When I used this modified code the program executed with no problem.

I hope that helps,
Eric

On Sat, Oct 10, 2020 at 3:13 AM Zixuan Qi wrote:
> Hello,
>
> My programming is as follows.
>
> library(highfrequency)
> library(data.table)
> library(xts)
> tm <- seq.POSIXt(from = as.POSIXct("2020-08-20 09:30:00"),
>                  to = as.POSIXct("2020-08-20 15:59:00"), by = 'min')
> data <- xts(x = data$PRICE, order.by = tm)
> data <- as.data.table(data)
> setnames(data, c('index'), c('DT'))
> setnames(data, c('V1'), c('PRICE'))
> LMtest <- intradayJumpTest(pData = data, volEstimator = "RM",
>                            driftEstimator = "none", RM = "bipower",
>                            lookBackPeriod = 20, on = 'minutes', k = 5,
>                            marketOpen = "09:30:00", marketClose = "15:59:00")
> plot(LMtest)
>
> However, the error is ''Error in xy.coords(x, y, setLab = FALSE) :
> 'x' and 'y' lengths differ''. I don't know how to correct it.
[R] [Rd] R 4.0.3 is released
The build system rolled up R-4.0.3.tar.gz (codename "Bunny-Wunnies Freak Out") this morning. The list below details the changes in this release. You can get the source code from

https://cran.r-project.org/src/base/R-4/R-4.0.3.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.

For the R Core Team,
Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = b9c44f9f78cab3184ad9898bebc854b4
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 5be656453b81e9393e2a0c42688b04b8
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = 71728ef32a09c5b9df36b496b52d5c8e
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = 496062c138e2def06cebccddfb814ac6
MD5 (NEWS.3) = 012e7f4a80cc8ec947bf3f0ff6117ec8
MD5 (R-latest.tar.gz) = 8ecf46afa36c9aff9904aad5ca422c6d
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = 529223fd3ffef95731d0a87353108435
MD5 (THANKS) = 251d20510bfc3cc93b82c5a99f7efcc6
MD5 (VERSION-INFO.dcf) = bfbcfb2f4ef4416b635087160f965904
MD5 (R-4/R-4.0.3.tar.gz) = 8ecf46afa36c9aff9904aad5ca422c6d

2cde824a7b18958e5f06b391c801c8288be0f84fa8934b7ddefef23c67e60c09  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
4390543804392e072378b2d75cd3fb0d57e9885f9dc38ddd5a2ff780487b9d97  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
895ba3f810fd33d3be63deb6c4588be1f7ba3f5bf80e10ffeafc194221dd5ba9  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
e80de410c77f05ff2012fa70051b89119845f734a7fa5c55857e61e4ed7d5f6e  NEWS.2
7201d139947afa52b5e09d26dc01445edf444506264355b2185122bc1ed3dce0  NEWS.3
09983a8a78d5fb6bc45d27b1c55f9ba5265f78fa54a55c13ae691f87c5bb9e0d  R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
408737572ecc6e1135fdb2cf7a9dbb1a6cb27967c757f1771b8c39d1fd2f1ab9  RESOURCES
c9c7cb32308b4e560a22c858819ade9de524a602abd4e92d1c328c89f8037d73  THANKS
cd9666c09064b120655598c1ac792266ad20adb57c36aab4d094dedc7e480fa6  VERSION-INFO.dcf
09983a8a78d5fb6bc45d27b1c55f9ba5265f78fa54a55c13ae691f87c5bb9e0d  R-4/R-4.0.3.tar.gz

This is the relevant part of the NEWS file

CHANGES IN R 4.0.3:

NEW FEATURES:

* On platforms using configure option --with-internal-tzcode, additional values "internal" and (on macOS only) "macOS" are accepted for the environment variable TZDIR. (See ?TZDIR.) On macOS, "macOS" is used by default if the system timezone database is a newer version than that in the R installation.

* When install.packages(type = "source") fails to find a package in a repository it mentions package versions which are excluded by their R version requirement and links to hints on why a package might not be found.

* The default value for options("timeout") can be set from environment variable R_DEFAULT_INTERNET_TIMEOUT, still defaulting to 60 (seconds) if that is not set or invalid. This may be needed when child R processes are doing downloads, for example during the installation of source packages which download jars or other forms of data.

LINK-TIME OPTIMIZATION on a UNIX-ALIKE:

* There is now support for parallelized Link-Time Optimization (LTO) with GCC and for 'thin' LTO with clang _via_ setting the LTO macro.

* There is support for setting a different LTO flag for the Fortran compiler, including to empty when mixing clang and gfortran (as on macOS). See file config.site.

* There is a new LTO_LD macro to set linker options for LTO compilation, for example to select an alternative linker or to parallelize thin LTO.
DEPRECATED AND DEFUNCT:

* The LINPACK argument to chol.default(), chol2inv(), solve.default() and svd() has been defunct since R 3.1.0. Using it now gives a warning which will become an error in R 4.1.0.

BUG FIXES:

* The code mitigating stack overflow with PCRE regexps on very long strings is enabled for PCRE2 < 10.30 also when JIT is enabled, since stack overflows have been seen in that case.

* Fix to correctly show the group labels in dotchart() (which were lost in the ylab improvement for R 4.0.0).

* addmargins(*, ..) now also works when fn() is a local function, thanks to bug report and patch PR#17124 from Alex Bertram.

* rank(x) and hence sort(x) now work when x is an object (as per is.object(x)) of type "raw" _and_ provides a valid `[` method, e.g., for gmp::as.bigz(.) numbers.

* chisq.test
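The options("timeout") item in the NEWS above affects downloads such as download.file() and install.packages(). A minimal sketch of inspecting and temporarily raising the option at run time (the value 300 is just an illustrative choice); R_DEFAULT_INTERNET_TIMEOUT only changes the *default* at startup:

```r
# Inspect and temporarily raise R's download timeout; restore it afterwards.
old <- getOption("timeout")          # 60 seconds unless already changed
options(timeout = max(300, old))     # e.g. before installing large packages
raised <- getOption("timeout")       # now at least 300 seconds
options(timeout = old)               # restore the previous value
```

Raising the option in the parent session does not help child R processes, which is exactly the case the new environment variable is meant to cover.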