On Mar 23, 2012, at 3:55 PM, heyi xiao wrote:
David,
Thanks a lot for the specific suggestions. That’s very helpful. My
question 1 is fully answered now. I guess I am not clear enough for
my question 2. I would like to generate a random sample using the
estimated probability density (as a result of my question 1) as the
reference distribution.
Say, I get a matrix of the estimated density (at some grid points)
using MASS::kde2d. How can I use that result as a reference
distribution to sample data from? I know it is a trivial issue for
parametric distributions like bivariate normal, but what about such
a nonparametric bivariate reference distribution? Any particular
procedures or functions I can use?
See if this works:
data(geyser, package="MASS")
x <- cbind(geyser$duration, geyser$waiting)
est <- bkde2D(x, bandwidth=c(0.7, 7))
# Heh, I realized after I did this that I started with
KernSmooth::bkde2D
# and checked the results with MASS::kde2d
# only difference appears to be name of density matrix
# Construct a dataframe with X.Y information and the data from the
bivariate density.
# The output of bkde2D with n=50 is:
#List of 3
#$ x1 : num [1:51] -0.2167 -0.0823 0.052 0.1863 0.3207 ...
# $ x2 : num [1:51] 32.5 34.2 35.9 37.7 39.4 ...
# $ fhat: num [1:51, 1:51] 3.05e-19 2.17e-19 3.25e-19 2.17e-19 0.00 ...
# The index X.Y could be X + 51*Y and there would be a 1:1 mapping
from (X,Y) to X.Y
# and the fhat values would be properly arranged
dfrm <- expand.grid(X=1:51, Y=1:51)
dfrm$fhat <- c(est$fhat)
#Sample randomly from X.Y with length=51*51 using the fhat values for
prob.
# The X.Y "index" never actually gets computed
# but is implicit in the order of the data.frame
sampfrm <- dfrm[sample(51*51, 300, prob=est$fhat) , ]
f2 <- with(sampfrm, MASS::kde2d(X, Y, n = 50, lims = c(0, 51, 0,
51)) )
persp(f2)
# Looks reasonable to my eye anyway.
--
David.
The reason I don’t want to use sampling (with replacement, I can
sample more data than I have without replacement), as this will
generate lots of duplicate data points, if I want to generated
bigger dataset yet my raw data do not have a big sample size. The
scatter plot of the sampled data doesn’t look good this way.
Heyi
--- On Fri, 3/23/12, David Winsemius <dwinsem...@comcast.net> wrote:
From: David Winsemius <dwinsem...@comcast.net>
Subject: Re: [R] Nonparametric bivariate distribution estimation
and sampling
To: "heyi xiao" <xiaohey...@yahoo.com>
Cc: "Sarah Goslee" <sarah.gos...@gmail.com>, r-help@r-project.org
Date: Friday, March 23, 2012, 2:20 PM
On Mar 23, 2012, at 1:53 PM, heyi xiao wrote:
Sarah,
Thanks for the response. I actually have several years
of working experience with R and statistics, although may
not be as good as you. that’s why I am here ;) I dug
deeper into R documentations and previous R-help posts, and
couldn’t found anything particular help. Again, I want to
do two things: (1) estimate the probability density of this
bivariate distribution using some nonparametric method
(kernel, spline etc);
?MASS::kde2d
?KernSmooth::bkde2D
?ade4::s.kde2d
help(package=locfit)
(2) sample a big dataset from this bivariate
distribution for a simulation study.
What is wrong with `sample`?
# to get sample of size n without replacement
set.seed(42)
dfrm[ sample(1:NROW(dfrm), n) , ]
--David.
If my questions are not clear enough show my how I can
improve, or which part is not clear enough. If you have any
particular suggestions/comments, you are more than welcome.
Thanks!
Heyi
--- On Fri, 3/23/12, Sarah Goslee <sarah.gos...@gmail.com>
wrote:
From: Sarah Goslee <sarah.gos...@gmail.com>
Subject: Re: [R] Nonparametric bivariate
distribution estimation and sampling
To: "heyi xiao" <xiaohey...@yahoo.com>
Cc: r-help@r-project.org
Date: Friday, March 23, 2012, 12:26 PM
R can do all of that and more.
But you'll need to put some work in reading about
how to use
R, about
the statistical methods involved, and about how to
use them
to best
effect. You might want, for instance, generalized
additive
models. Or
not. If your question isn't more fully-formed than
this,
your best bet
is almost certainly to talk to a local
statistician, spend
some time
working with R, and then come back to the list
with
specific
questions.
Sarah
On Fri, Mar 23, 2012 at 12:17 PM, heyi xiao <xiaohey...@yahoo.com>
wrote:
Dear all,
I have a bivariate dataset from a preliminary
study. I
want to do two things: (1) estimate the probability
density
of this bivariate distribution using some
nonparametric
method (kernel, spline etc); (2) sample a big
dataset from
this bivariate distribution for a simulation
study.
Is there any good method or package I can use
in R for
my work? I don’t want parametric models like
bivariate
normal distribution etc, as I would like to
accurate model
my data. I don’t want to use the bootstrapping
approach,
i.e. sampling with replacement, as this will
generate lots
of duplicate data points. Any thoughts or input
will be
highly appreciated!
Heyi
--Sarah Goslee
http://www.functionaldiversity.org
______________________________________________
R-help@r-project.org
mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,
reproducible code.
David Winsemius, MD
West Hartford, CT
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.