[R] Help with efficient double sum of max (X_i, Y_i) (X & Y vectors)

2007-02-01 Thread Jeffrey Racine
Greetings.

For R gurus this may be a no-brainer, but I could not find pointers to
an efficient computation of this beast in past help files.

Background - I wish to implement a Cramer-von Mises type test statistic
which involves double sums of max(X_i,Y_j) where X and Y are vectors of
differing length.

I am currently using ifelse pointwise in a vector, but have a nagging
suspicion that there is a more efficient way to do this. Basically, I
require three sums:

sum1: \sum_i\sum_j max(X_i,X_j)
sum2: \sum_i\sum_j max(Y_i,Y_j)
sum3: \sum_i\sum_j max(X_i,Y_j)

Here is my current implementation - any pointers to more efficient
computation greatly appreciated.

  nx <- length(x)
  ny <- length(y)

  sum1 <- 0
  sum3 <- 0

  for(i in 1:nx) {
    sum1 <- sum1 + sum(ifelse(x[i]>x,x[i],x))
    sum3 <- sum3 + sum(ifelse(x[i]>y,x[i],y))
  }

  sum2 <- 0
  sum4 <- sum3 # symmetric and identical

  for(i in 1:ny) {
    sum2 <- sum2 + sum(ifelse(y[i]>y,y[i],y))
  }
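
One possible way to avoid the explicit loops is sketched here, assuming x
and y are numeric vectors without missing values. The function names are
hypothetical and the simulated data are for illustration only. The first
helper reproduces the loops directly via outer() and pmax(); the second
sorts one vector and uses findInterval() with cumulative sums, so it
never forms the nx-by-ny matrix.

```r
## Hypothetical helpers; a and b are assumed numeric with no NAs.

## Direct version: O(na * nb) time and memory, fine for moderate lengths.
sum.max.outer <- function(a, b) sum(outer(a, b, pmax))

## Sort-based version: roughly O((na + nb) log nb) time, O(na + nb) memory.
sum.max.sorted <- function(a, b) {
  bs <- sort(b)
  cs <- c(0, cumsum(bs))      # cs[k + 1] = sum of the k smallest b values
  k  <- findInterval(a, bs)   # number of b values <= each element of a
  ## b values <= a[i] each contribute a[i]; the larger ones contribute themselves
  sum(k * a + (cs[length(cs)] - cs[k + 1]))
}

## Simulated data, for illustration only
set.seed(42)
x <- rnorm(100)
y <- rnorm(150)

sum1 <- sum.max.sorted(x, x)
sum2 <- sum.max.sorted(y, y)
sum3 <- sum.max.sorted(x, y)
```

Both helpers should agree with the loop-based sums up to floating-point
rounding; the sort-based one is the more attractive choice when nx and ny
are large.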

Thanks in advance for your help.

-- Jeff

-- 
Professor J. S. Racine         Phone:  (905) 525 9140 x 23825
Department of Economics        FAX:    (905) 521-8232
McMaster University            e-mail: [EMAIL PROTECTED]
1280 Main St. W., Hamilton,    URL:    http://www.economics.mcmaster.ca/racine/
Ontario, Canada. L8S 4M4

`The generation of random numbers is too important to be left to chance'

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] New package `np' - nonparametric kernel smoothing methods for mixed datatypes

2006-11-24 Thread Jeffrey Racine
Dear R users,

A new package titled `np' is now available from CRAN.

The package implements recently developed kernel methods that seamlessly
handle the mix of continuous, unordered, and ordered factor datatypes
often found in applied settings. 

The package also allows users to create their own
nonparametric/semiparametric routines using high-level function calls
(via the function npksum()) rather than writing their own C or Fortran
code. Much of the code underlying the package is written in C including
the function npksum().

Currently, a range of methods can be found in the package including

- multivariate nonparametric unconditional and conditional density
estimation

- multivariate nonparametric conditional mean and gradient estimation
(local constant and local linear)

- multivariate nonparametric conditional distribution, quantile and
gradient estimation 

- nonparametric model specification tests for testing correctness of
parametric regression models

- nonparametric significance tests for nonparametric regression

- semiparametric multivariate regression methods (partially linear,
index models, average derivative estimation, varying/smooth coefficient
models)

A function npplot() is implemented that allows users to visualize the
resulting objects.

A variety of methods for computing standard errors and error bounds are
implemented including asymptotic and resampling-based methods (the
latter employing the `boot' package which is required).

A range of examples can be found in the examples section of the
accompanying help files.

We caution potential users that multivariate data-driven bandwidth
selection methods can be numerically intensive. For this reason we are
now using some of the functionality contained in the Rmpi package to
develop an MPI-aware version of the np package that we have tentatively
titled the `npRmpi' package.

The np package implements a number of methods found in the newly
released publication by Li and Racine (2007) titled Nonparametric
Econometrics: Theory and Practice, Princeton University Press.

See press.princeton.edu/titles/8355.html for further details.

Best regards, Jeff.

___
R-packages mailing list
R-packages@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-packages


Re: [R] cat(), Rgui, and support for carriage return \r...

2006-09-28 Thread Jeffrey Racine
Dear Brian, Duncan, et al.

A follow-up and quick question, if I may, regarding support for \r. First,
the R terminal and the Windows console now support \r properly
(thanks!). However, I just got my hands on a MacBook, installed the
latest R and R.app GUI, and it appears that \r is not supported in the
OS X GUI build on the R site. Would it be possible to add this support?

Thank you for your time and contributions to the R community.

-- Jeff

On Tue, 2006-03-28 at 17:00 +0100, Prof Brian Ripley wrote:
> Rgui now supports \r in the same way as rterm.


[R] Guidance on step() with large dataset (750K) solicited...

2006-04-13 Thread Jeffrey Racine
Hi.

Background - I am working with a dataset involving around 750K
observations, where many of the variables (8/11) are unordered factors. 

The model typically used for this relationship in the literature has
been a simple linear additive model, but this is rejected out of hand by
the data. I was asked to model this via kernel methods, but first wanted
to play with the parametric specification out of curiosity.

I thought it would be interesting to see what type of model stepwise BIC
would yield, and have been playing with the step() function (on R-beta
due to the factor.scope() problem that has been fixed in the patched and
beta version).

I am running this on a 64-bit box with 32GB of RAM and tons of swap, but
am hitting the memory wall as occasionally memory needs grow to ungodly
proportions (in the early iterations the program starts out around 8GB
but quickly grows to 15GB, then grows from there). This is not due to my
using the beta version, as this also arises under R-2.2.1 for what that
is worth.

My question is whether or not there is some simple way to substantially
reduce the memory footprint for this procedure. I took a look at
previous posts for step() and memory issues, but still wonder whether
there might be a switch or possibly better way of constructing my model
that would overcome the memory issues.

I include the code below, and any comments or suggestions would be most
welcome (besides `what type of idiot lets information criteria determine
their model ;-)')

Thanks ever so much in advance.

-- Jeff

 Begin 

## Read in the full data set (n=745466 observations)

data <- read.table("../data_header.dat",header=TRUE)

## Create a data frame with all categorical variables declared as
## unordered factors

data <- data.frame(logrprice=data$logrprice,
                   cgt=factor(data$cgt),
                   cag=factor(data$cag),
                   gstann=factor(data$gstann),
                   fhogann=factor(data$fhogann),
                   gstfhog=factor(data$gstfhog),
                   luc=factor(data$luc),
                   municipality=factor(data$municipality),
                   time=factor(data$time),
                   distance=data$distance,
                   logr=data$logr,
                   loginc=data$loginc)

## Estimate a simple linear model (used repeatedly in the literature,
## fails the most simple of model specification tests e.g.,
## resettest())

model.linear <- lm(logrprice~.,data=data)

## Now conduct stepwise (BIC) regression using the step() function in
## the stats library. The lower model is the unconditional mean of y,
## the upper having polynomials of up to order 6 in the three
## continuous covariates, with interaction among all variables of
## order 2.

n <- nrow(data)

model.bic <- step(model.linear,
                  scope=list(
                    lower=~ 1,
                    upper=~ (.
                             +I(logr^2)
                             +I(logr^3)
                             +I(logr^4)
                             +I(logr^5)
                             +I(logr^6)
                             +I(distance^2)
                             +I(distance^3)
                             +I(distance^4)
                             +I(distance^5)
                             +I(distance^6)
                             +I(loginc^2)
                             +I(loginc^3)
                             +I(loginc^4)
                             +I(loginc^5)
                             +I(loginc^6))
                            ^2),
                  trace=TRUE,
                  k=log(n)
                  )

summary(model.bic)

 End 
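
Since the data file above is not available to readers, here is a
scaled-down, self-contained sketch of the same step() pattern. It uses
the built-in mtcars data purely as an assumption for reproducibility; the
variable choices and object names are hypothetical, and it says nothing
about the memory behaviour seen with the full data set.

```r
## Scaled-down illustration on the built-in mtcars data (an assumption made
## for reproducibility; this is not the data set described above).
d <- data.frame(mpg = mtcars$mpg,
                cyl = factor(mtcars$cyl),   # unordered factor
                wt  = mtcars$wt,
                hp  = mtcars$hp)

model.small <- lm(mpg ~ ., data = d)
n.small <- nrow(d)

## Same pattern as above: the lower bound is the intercept-only model, the
## upper bound adds quadratic terms and all second-order interactions.
## Setting k = log(n) makes step() use the BIC penalty instead of AIC.
model.small.bic <- step(model.small,
                        scope = list(lower = ~ 1,
                                     upper = ~ (. + I(wt^2) + I(hp^2))^2),
                        trace = FALSE,
                        k = log(n.small))

formula(model.small.bic)
```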


[R] cat(), Rgui, and support for carriage return \r...

2006-03-17 Thread Jeffrey Racine
Hi, and thanks in advance for your time.

Background - I am working on a package and wish to have a routine's
progress reported. The routine can take some time, and I would like to
inform the user about the routine's progress. I have scoured the
archives but to no avail, so would like to solicit input from this list.

I am successfully using

cat("\rBootstrap replication ", i, " of ", boot.num, sep="")
flush.console() # To flush stdout on Windows systems

which works as expected on *NIX systems and using Rterm under Windows.
However, under Rgui the carriage return \r is ignored, and I certainly
don't want to use the newline escape sequence \n. Under Rgui it appears
as

Bootstrap replication 1 of 399Bootstrap replication 2 of 399Bootstrap...

but I want it to function properly if at all possible.

My question is simply whether there is a portable way to implement this
so that it works regardless of the R platform the user may be working
on?
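
For reference, a minimal self-contained sketch of the reporting loop
described above. The value boot.num = 399 is assumed here only to match
the output shown, and the replication body is a placeholder.

```r
boot.num <- 399  # assumed number of bootstrap replications

for (i in seq_len(boot.num)) {
  ## ... one bootstrap replication would be computed here ...

  ## \r returns the cursor to the start of the line so that each message
  ## overwrites the previous one on consoles that honour carriage returns
  cat("\rBootstrap replication ", i, " of ", boot.num, sep = "")
  flush.console()  # force the output to appear immediately
}
cat("\n")          # finish the progress line
```

On a console that ignores \r, the messages simply run together on one
line, as in the Rgui output quoted above.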

Many thanks for any/all suggestions.

-- Jeff
