Re: [R] Sampling the Distance Matrix

2015-09-25 Thread David Winsemius

On Sep 25, 2015, at 12:54 PM, Lorenzo Isella wrote:

> Apologies for not letting this thread rest in peace.
> The small script
> 
> #
> set.seed(1234)
> 
> x <- rnorm(20)
> y <- rnorm(20)
> 
> 
> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx],
> y[idx]) ) > 0.9))
> 
> mycomb <- mtxcomb [ , goodcls]
> #
> 
> 
> is perfect for detecting groups of 5 points whose distances to each other
> are always above 0.9.
> However, in my practical case I have about 500 points and I am looking
> for a subset of several tens of points whose mutual distances are above a given
> threshold.
> Unfortunately, the approach above does not scale, so I wonder if
> anybody is aware of an alternative approach.

Find the center of the distribution, eliminate all the points within some 
reasonable radius, perhaps sqrt( sd(x)^2 + sd(y)^2 ), and then work on the reduced 
set. If you needed to reduce it even further, I could imagine sampling in 
sectors defined by the angle of each point about the center, e.g. atan2(y, x).
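
A rough sketch of that reduction in R, under stated assumptions: the center is taken to be the coordinate means, the radius is the one suggested above, and the final greedy pass (the hypothetical helper `greedy_far_apart`) is an added assumption, not something proposed in this thread. A greedy pass returns one valid subset, not necessarily the largest one.

```r
set.seed(1234)
n <- 500
x <- rnorm(n)
y <- rnorm(n)
threshold <- 0.9

# Step 1: drop the crowded core around the center of the cloud.
cx <- mean(x)
cy <- mean(y)
radius <- sqrt(sd(x)^2 + sd(y)^2)
keep <- which(sqrt((x - cx)^2 + (y - cy)^2) > radius)

# Step 2 (assumption, not from the thread): greedy pass over the reduced set.
# A point is accepted only if it is farther than `threshold` from every
# point accepted so far.
greedy_far_apart <- function(px, py, threshold) {
  chosen <- integer(0)
  for (i in seq_along(px)) {
    d <- sqrt((px[chosen] - px[i])^2 + (py[chosen] - py[i])^2)
    if (all(d > threshold)) chosen <- c(chosen, i)
  }
  chosen
}

sel <- keep[greedy_far_apart(x[keep], y[keep], threshold)]
length(sel)                        # size of the subset found
min(dist(cbind(x[sel], y[sel])))   # smallest mutual distance in the subset
```

The greedy pass costs on the order of n times the subset size, so it stays cheap for 500 points, whereas enumerating all subsets does not.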

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sampling the Distance Matrix

2015-09-25 Thread Lorenzo Isella

Absolutely right!
Thanks to both Davids for their help.
Cheers

Lorenzo

On Fri, Sep 25, 2015 at 01:54:54PM +, David L Carlson wrote:

You defined x and y in your original email as:


x<-rnorm(20)
y<-rnorm(20)

mm<-as.matrix(cbind(x,y))

dst<-(dist(mm))


-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




Re: [R] Sampling the Distance Matrix

2015-09-25 Thread Lorenzo Isella

Apologies for not letting this thread rest in peace.
The small script

#
set.seed(1234)

x <- rnorm(20)
y <- rnorm(20)


goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx],
y[idx]) ) > 0.9))

mycomb <- mtxcomb [ , goodcls]
#


is perfect for detecting groups of 5 points whose distances to each other
are always above 0.9.
However, in my practical case I have about 500 points and I am looking
for a subset of several tens of points whose mutual distances are above a given
threshold.
Unfortunately, the approach above does not scale, so I wonder if
anybody is aware of an alternative approach.
Many thanks

Lorenzo



Re: [R] Sampling the Distance Matrix

2015-09-25 Thread David L Carlson
You defined x and y in your original email as:

> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Thursday, September 24, 2015 6:30 PM
To: Lorenzo Isella
Cc: David L Carlson; r-help@r-project.org
Subject: Re: [R] Sampling the Distance Matrix


On Sep 24, 2015, at 1:54 PM, Lorenzo Isella wrote:

> On Thu, Sep 24, 2015 at 01:30:02PM -0700, David Winsemius wrote:
>> 
>> On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:
>> 
>>> Hi,
>>> And thanks for your reply.
>>> Essentially, your script gets the job done.
>>> For instance, if I run
>>> 
>>> mm <- cbind(5/(1:5), -2*sqrt(1:5))
>>> dst <- dist(mm)
>>> dst2 <- as.matrix(dst)
>>> diag(dst2) <- NA
>>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
>>> 
>>> then it correctly detects the first two rows, where all the values are
>>> larger than 0.9.
>>> In other words, it detects the points that are at least 0.9 units away
>>> from *all* the other points.
>>> My other question (I did not realize this until I got your answer) is
>>> the following: I have the distance matrix of a set of N points.
>>> You gave me an algorithm two find all the points that are at least 0.9
>>> units away from any other points.
>>> However, in some cases, for me it is OK even a weaker condition: find
>>> a subset of k points (with k tunable) whose distance *from each other*
>>> is greater than 0.9 units (even if their distance from some other
>>> points may be smaller than 0.9).
>> 
>> If I understand . Make a matrix of unique combinations, then apply by 
>> rows to get the qualifying columns that satisfy the distance criterion:
>> 
>> mtxcomb <- combn(1:20, 5)
>> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx], 
>> y[idx]) ) > 0.9))
>> mtxcomb [ , goodcls]
>> 
>> In my sample it was around 9% of the total 5 item combinations.
>> 
>> snipped a lot of output:
>> .
>>   [,1440] [,1441]
>> [1,]  12  13
>> [2,]  13  16
>> [3,]  16  17
>> [4,]  19  19
>> [5,]  20  20
>>> dim( mtxcomb)
>> [1] 5 15504
>> 
> 
> Hi,
> Thanks for your reply.
> I think I am getting there, but when I run your commands, I get this
> error message
> 
> Error in cbind(x[idx], y[idx]) : object 'x' not found
> 
> Any idea why? Should I combine those 3 lines with something else?

No idea. I was running the setup that you asked for in your original message 
which you have now omitted from the mail chain.



> Cheers
> 
> Lorenzo

David Winsemius
Alameda, CA, USA



Re: [R] Sampling the Distance Matrix

2015-09-24 Thread Lorenzo Isella

Hi,
And thanks for your reply.
Essentially, your script gets the job done.
For instance, if I run

mm <- cbind(5/(1:5), -2*sqrt(1:5))
dst <- dist(mm)
dst2 <- as.matrix(dst)
diag(dst2) <- NA
idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))

then it correctly detects the first two rows, where all the values are
larger than 0.9.
In other words, it detects the points that are at least 0.9 units away
from *all* the other points.
My other question (I did not realize this until I got your answer) is
the following: I have the distance matrix of a set of N points.
You gave me an algorithm to find all the points that are at least 0.9
units away from any other points.
However, in some cases a weaker condition is enough for me: find
a subset of k points (with k tunable) whose distances *from each other*
are greater than 0.9 units (even if their distance from some other
points may be smaller than 0.9).
Any idea about how to tackle that? Is it simply a matter of detecting
the row and column numbers of all the entries of the distance matrix
larger than 0.9?
Many thanks

Lorenzo





Re: [R] Sampling the Distance Matrix

2015-09-24 Thread David Winsemius

On Sep 24, 2015, at 1:54 PM, Lorenzo Isella wrote:

> On Thu, Sep 24, 2015 at 01:30:02PM -0700, David Winsemius wrote:
>> 
>> On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:
>> 
>>> Hi,
>>> And thanks for your reply.
>>> Essentially, your script gets the job done.
>>> For instance, if I run
>>> 
>>> mm <- cbind(5/(1:5), -2*sqrt(1:5))
>>> dst <- dist(mm)
>>> dst2 <- as.matrix(dst)
>>> diag(dst2) <- NA
>>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
>>> 
>>> then it correctly detects the first two rows, where all the values are
>>> larger than 0.9.
>>> In other words, it detects the points that are at least 0.9 units away
>>> from *all* the other points.
>>> My other question (I did not realize this until I got your answer) is
>>> the following: I have the distance matrix of a set of N points.
>>> You gave me an algorithm two find all the points that are at least 0.9
>>> units away from any other points.
>>> However, in some cases, for me it is OK even a weaker condition: find
>>> a subset of k points (with k tunable) whose distance *from each other*
>>> is greater than 0.9 units (even if their distance from some other
>>> points may be smaller than 0.9).
>> 
>> If I understand . Make a matrix of unique combinations, then apply by 
>> rows to get the qualifying columns that satisfy the distance criterion:
>> 
>> mtxcomb <- combn(1:20, 5)
>> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx], 
>> y[idx]) ) > 0.9))
>> mtxcomb [ , goodcls]
>> 
>> In my sample it was around 9% of the total 5 item combinations.
>> 
>> snipped a lot of output:
>> .
>>   [,1440] [,1441]
>> [1,]  12  13
>> [2,]  13  16
>> [3,]  16  17
>> [4,]  19  19
>> [5,]  20  20
>>> dim( mtxcomb)
>> [1] 5 15504
>> 
> 
> Hi,
> Thanks for your reply.
> I think I am getting there, but when I run your commands, I get this
> error message
> 
> Error in cbind(x[idx], y[idx]) : object 'x' not found
> 
> Any idea why? Should I combine those 3 lines with something else?

No idea. I was running the setup that you asked for in your original message 
which you have now omitted from the mail chain.



> Cheers
> 
> Lorenzo

David Winsemius
Alameda, CA, USA



Re: [R] Sampling the Distance Matrix

2015-09-24 Thread David Winsemius

On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:

> Hi,
> And thanks for your reply.
> Essentially, your script gets the job done.
> For instance, if I run
> 
> mm <- cbind(5/(1:5), -2*sqrt(1:5))
> dst <- dist(mm)
> dst2 <- as.matrix(dst)
> diag(dst2) <- NA
> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
> 
> then it correctly detects the first two rows, where all the values are
> larger than 0.9.
> In other words, it detects the points that are at least 0.9 units away
> from *all* the other points.
> My other question (I did not realize this until I got your answer) is
> the following: I have the distance matrix of a set of N points.
> You gave me an algorithm two find all the points that are at least 0.9
> units away from any other points.
> However, in some cases, for me it is OK even a weaker condition: find
> a subset of k points (with k tunable) whose distance *from each other*
> is greater than 0.9 units (even if their distance from some other
> points may be smaller than 0.9).

If I understand correctly: make a matrix of unique combinations, then apply over 
its columns to find the ones that satisfy the distance criterion:

mtxcomb <- combn(1:20, 5)
goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
mtxcomb[, goodcls]

In my sample it was around 9% of the total 5 item combinations.

snipped a lot of output:
.
     [,1440] [,1441]
[1,]      12      13
[2,]      13      16
[3,]      16      17
[4,]      19      19
[5,]      20      20
> dim( mtxcomb)
[1] 5 15504
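
A self-contained version of the steps above, as a minimal sketch: it assumes x and y are 20 standard-normal draws, as in the original question, and the seed is an arbitrary choice for reproducibility. The commands fail with "object 'x' not found" unless x and y exist in the workspace first. The last two lines illustrate why the enumeration cannot be carried over to 500 points.

```r
set.seed(1234)   # arbitrary seed, an assumption for reproducibility
x <- rnorm(20)
y <- rnorm(20)

# One candidate subset of 5 point indices per column.
mtxcomb <- combn(1:20, 5)

# TRUE for each column whose 10 pairwise distances all exceed 0.9.
goodcls <- apply(mtxcomb, 2, function(idx) {
  all(dist(cbind(x[idx], y[idx])) > 0.9)
})
mycomb <- mtxcomb[, goodcls, drop = FALSE]

dim(mtxcomb)      # 5 rows, choose(20, 5) columns
mean(goodcls)     # fraction of qualifying subsets (depends on the seed)

# The number of candidate subsets grows combinatorially.
choose(20, 5)
choose(500, 30)
```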


-- 
David

> Any idea about how to tackle that? Is it simply a matter of detecting
> the row and column numbers of all the entries of the distance matrix
> larger than 0.9?
> Many thanks
> 
> Lorenzo
> 
> 
> 
> On Wed, Sep 23, 2015 at 09:23:04PM +, David L Carlson wrote:
>> I think the OP wanted rows where all values were greater than .9.
>> If so, this works:
>> 
>>> set.seed(42)
>>> dst <- dist(cbind(rnorm(20), rnorm(20)))
>>> dst2 <- as.matrix(dst)
>>> diag(dst2) <- NA
>>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
>>> idx
>> 13 18 19
>> 13 18 19
>>> dst2[idx, idx]
>>          13       18       19
>> 13       NA 2.272407 3.606054
>> 18 2.272407       NA 1.578150
>> 19 3.606054 1.578150       NA
>> 
>> -
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>> 
>> 
>> 
>> -Original Message-
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of William 
>> Dunlap
>> Sent: Wednesday, September 23, 2015 3:23 PM
>> To: Lorenzo Isella
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Sampling the Distance Matrix
>> 
>>> mm <- cbind(1/(1:5), sqrt(1:5))
>>> d <- dist(mm)
>>> d
>>           1         2         3         4
>> 2 0.6492864
>> 3 0.9901226 0.3588848
>> 4 1.2500000 0.6369033 0.2806086
>> 5 1.4723668 0.8748970 0.5213550 0.2413050
>>> which(as.matrix(d)>0.9, arr.ind=TRUE)
>> row col
>> 3   3   1
>> 4   4   1
>> 5   5   1
>> 1   1   3
>> 1   1   4
>> 1   1   5
>> I.e., the distances between mm's rows 3 & 1, 4 & 1, and 5,1 are more than 0.9
>> 
>> The as.matrix(d) is needed because dist returns the lower triangle of
>> the distance
>> matrix and an object of class "dist" and as.matrix.dist converts that
>> into a matrix.
>> 
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>> 
>> 
>> On Wed, Sep 23, 2015 at 12:15 PM, Lorenzo Isella
>> <lorenzo.ise...@gmail.com> wrote:
>>> Dear All,
>>> Suppose you have a distance matrix stored like a dist object, for
>>> instance
>>> 
>>> x<-rnorm(20)
>>> y<-rnorm(20)
>>> 
>>> mm<-as.matrix(cbind(x,y))
>>> 
>>> dst<-(dist(mm))
>>> 
>>> Now, my problem is the following: I would like to get the rows of mm
>>> corresponding to points whose distance is always larger of, let's say,
>>> 0.9.
>>> In other words, if I were to compute the distance matrix on those
>>> selected rows of mm, apart from the diagonal, I would get all entries
>>> larger than 0.9.
>>> Any idea about how I can efficiently code that?
>>> Regards
>>> 
>>> Lorenzo
>>> 

Re: [R] Sampling the Distance Matrix

2015-09-24 Thread Lorenzo Isella


Hi,
Thanks for your reply.
I think I am getting there, but when I run your commands, I get this
error message

Error in cbind(x[idx], y[idx]) : object 'x' not found

Any idea why? Should I combine those 3 lines with something else?
Cheers

Lorenzo



Re: [R] Sampling the Distance Matrix

2015-09-23 Thread David L Carlson
I think the OP wanted rows where all values were greater than .9.
If so, this works:

> set.seed(42)
> dst <- dist(cbind(rnorm(20), rnorm(20)))
> dst2 <- as.matrix(dst)
> diag(dst2) <- NA
> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
> idx
13 18 19 
13 18 19 
> dst2[idx, idx]
         13       18       19
13       NA 2.272407 3.606054
18 2.272407       NA 1.578150
19 3.606054 1.578150       NA
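
An equivalent way to express the same test, as a minimal sketch: a point is more than 0.9 away from all others exactly when its nearest-neighbour distance exceeds 0.9. The names `nn_dist`, `idx_all` and `idx_min` are introduced here for illustration only.

```r
set.seed(42)
dst2 <- as.matrix(dist(cbind(rnorm(20), rnorm(20))))
diag(dst2) <- NA    # ignore the zero distance of each point to itself

# Row-wise test as in the message above.
idx_all <- which(apply(dst2, 1, function(x) all(na.omit(x) > 0.9)))

# Same condition via the distance to the nearest neighbour.
nn_dist <- apply(dst2, 1, min, na.rm = TRUE)
idx_min <- which(nn_dist > 0.9)

identical(unname(idx_all), unname(idx_min))
```

Note that this selects points isolated from every other point, which is a stricter condition than asking for a subset of points that are merely far from each other.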

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
Sent: Wednesday, September 23, 2015 3:23 PM
To: Lorenzo Isella
Cc: r-help@r-project.org
Subject: Re: [R] Sampling the Distance Matrix

> mm <- cbind(1/(1:5), sqrt(1:5))
> d <- dist(mm)
> d
          1         2         3         4
2 0.6492864
3 0.9901226 0.3588848
4 1.2500000 0.6369033 0.2806086
5 1.4723668 0.8748970 0.5213550 0.2413050
> which(as.matrix(d)>0.9, arr.ind=TRUE)
  row col
3   3   1
4   4   1
5   5   1
1   1   3
1   1   4
1   1   5
I.e., the distances between mm's rows 3 & 1, 4 & 1, and 5 & 1 are more than 0.9

The as.matrix(d) is needed because dist returns only the lower triangle of
the distance matrix, as an object of class "dist"; as.matrix.dist converts
that into a full matrix.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Sep 23, 2015 at 12:15 PM, Lorenzo Isella
<lorenzo.ise...@gmail.com> wrote:
> Dear All,
> Suppose you have a distance matrix stored like a dist object, for
> instance
>
> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))
>
> Now, my problem is the following: I would like to get the rows of mm
> corresponding to points whose distance is always larger than, let's say,
> 0.9.
> In other words, if I were to compute the distance matrix on those
> selected rows of mm, apart from the diagonal, I would get all entries
> larger than 0.9.
> Any idea about how I can efficiently code that?
> Regards
>
> Lorenzo
>


Re: [R] Sampling the Distance Matrix

2015-09-23 Thread William Dunlap
> mm <- cbind(1/(1:5), sqrt(1:5))
> d <- dist(mm)
> d
          1         2         3         4
2 0.6492864
3 0.9901226 0.3588848
4 1.2500000 0.6369033 0.2806086
5 1.4723668 0.8748970 0.5213550 0.2413050
> which(as.matrix(d)>0.9, arr.ind=TRUE)
  row col
3   3   1
4   4   1
5   5   1
1   1   3
1   1   4
1   1   5
I.e., the distances between mm's rows 3 & 1, 4 & 1, and 5 & 1 are more than 0.9

The as.matrix(d) is needed because dist returns only the lower triangle of
the distance matrix, as an object of class "dist"; as.matrix.dist converts
that into a full matrix.
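
Because the full matrix is symmetric, each qualifying pair is reported twice above. A minimal sketch of one way to list each pair once, by restricting the test to the upper triangle (the variable names `m` and `pairs` are introduced here for illustration):

```r
mm <- cbind(1/(1:5), sqrt(1:5))
m <- as.matrix(dist(mm))

# upper.tri(m) is TRUE only where row < col, so each pair appears once
# and the zero diagonal is excluded.
pairs <- which(m > 0.9 & upper.tri(m), arr.ind = TRUE)
pairs
```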

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Sep 23, 2015 at 12:15 PM, Lorenzo Isella
 wrote:
> Dear All,
> Suppose you have a distance matrix stored like a dist object, for
> instance
>
> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))
>
> Now, my problem is the following: I would like to get the rows of mm
> corresponding to points whose distance is always larger than, let's say,
> 0.9.
> In other words, if I were to compute the distance matrix on those
> selected rows of mm, apart from the diagonal, I would get all entries
> larger than 0.9.
> Any idea about how I can efficiently code that?
> Regards
>
> Lorenzo
>


Re: [R] sampling rows with values never sampled before

2015-06-23 Thread Jon Skoien
If df is the data.frame with values and you want nn samples, then this 
is a slightly different approach:


# example data.frame:
df = data.frame(a1 = sample(1:20, 50, replace = TRUE),
                a2 = sample(seq(0.1, 10, length.out = 30), 50, replace = TRUE),
                a3 = sample(seq(0.3, 20, length.out = 20), 50, replace = TRUE))

nrow = dim(df)[1]  # 50
ncol = dim(df)[2]  # 3

# start by randomizing the order in your data.frame
randomOrder = sample(1:nrow, nrow, replace = FALSE)
dff = df[randomOrder,]

# find and remove all duplicates from all columns. With this you will
# only keep the first instance of any unique value:

rem = NULL
for (ic in 1:ncol) rem = c(rem, which(duplicated(dff[, ic])))
if (length(rem) > 0) dff = dff[-unique(rem),]

# Reduce to the length you need
if (dim(dff)[1] > nn) res = dff[1:nn,] else res = dff

I am not sure how this scales if you have a really big data, and whether 
you could get some FAQ 7.31 problems depending on how you fill your 
data.frame.
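
A minimal sketch of the FAQ 7.31 issue in this context: duplicated() compares doubles exactly, so two values that are equal on paper but were computed differently may not be recognised as duplicates. Rounding to 8 decimals before the comparison is an assumption made here for illustration; the right tolerance depends on the data.

```r
a <- 0.3
b <- 0.1 + 0.2    # mathematically 0.3, but stored as a slightly different double

a == b                       # exact comparison fails
isTRUE(all.equal(a, b))      # comparison with a tolerance succeeds

duplicated(c(a, b))              # neither value is flagged as a duplicate
duplicated(round(c(a, b), 8))    # after rounding, the second value is flagged
```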


Cheers,
Jon

On 6/23/2015 12:13 AM, C W wrote:

Hi Jean,

Thanks!

Daniel,
Yes, you are absolutely right.  I want sampled vectors to be as different
as possible.

I added a little more to the earlier data set.
      x1  x2  x3
 [1,]  1 3.7 2.1
 [2,]  2 3.7 5.3
 [3,]  3 3.7 6.2
 [4,]  4 3.7 8.9
 [5,]  5 3.7 4.1
 [6,]  1 2.9 2.1
 [7,]  2 2.9 5.3
 [8,]  3 2.9 6.2
 [9,]  4 2.9 8.9
[10,]  5 2.9 4.1
[11,]  1 5.2 2.1
[12,]  2 5.2 5.3
[13,]  3 5.2 6.2
[14,]  4 5.2 8.9
[15,]  5 5.2 4.1

If I sampled rows 1, 6, and 11, solving the system of equations would not be
possible.  So, I am avoiding similar vectors.

Thanks,

Mike


On Mon, Jun 22, 2015 at 2:19 PM, Daniel Nordlund djnordl...@frontier.com
wrote:


On 6/22/2015 9:42 AM, C W wrote:


Hello R list,

I have a question about sampling unique coordinate values.

Here's what my data looks like:

dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))

dat


x1  x2
   [1,]  1 3.7
   [2,]  2 3.7
   [3,]  3 3.7
   [4,]  4 3.7
   [5,]  5 3.7
   [6,]  1 2.9
   [7,]  2 2.9
   [8,]  3 2.9
   [9,]  4 2.9
[10,]  5 2.9
[11,]  1 5.2
[12,]  2 5.2
[13,]  3 5.2
[14,]  4 5.2
[15,]  5 5.2


If I sampled (1, 3.7), then, I don't want (1, 2.9) or (2, 3.7).

I want to avoid either the first or second coordinate repeated.  It leads
to undefined matrix inversion.

I thought of using sampling(), but not sure about applying it to a data
frame.

Thanks in advance,

Mike



I am not sure you gave us enough information to solve your real world
problem.  But I have a few comments and a potential solution.

1. In your example the unique values in x1 are completely crossed with
the unique values in x2.
2. Since you don't want duplicates of either number, the maximum
number of samples that you can take is the minimum number of unique values
in either vector, x1 or x2 (in this case x2 with 3 unique values).
3. Sample without replacement from the smallest set of unique values first.
4. Sample without replacement from the larger set second.


x <- 1:5
xx <- c(3.7, 2.9, 5.2)
s2 <- sample(xx,2, replace=FALSE)
s1 <- sample(x,2, replace=FALSE)
samp <- cbind(s1,s2)

samp

     s1  s2
[1,]  5 3.7
[2,]  1 5.2

Your actual data is probably larger, and the unique values in each vector
may not be completely crossed, in which case the task is a little harder.
In that case, you could remove values from your data as you sample.  This
may not be efficient, but it will work.

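smpl <- function(dat, size){
   mysamp <- numeric(0)
   for(i in 1:size) {
      # draw one row at random from the rows still available
      s <- dat[sample(nrow(dat), 1), ]
      mysamp <- rbind(mysamp, s, deparse.level=0)
      # drop every row that shares either coordinate with the sampled row;
      # drop=FALSE keeps dat a matrix even when a single row remains
      dat <- dat[!(dat[,1]==s[1] | dat[,2]==s[2]), , drop=FALSE]
   }
   mysamp
}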


This is just an example of how you might approach your real world
problem.  There is no error checking, and for large samples it may not
scale well.
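
A usage sketch on the example data, assuming a hypothetical variant `smpl_safe` that adds one guard not present in the function above: it stops early when no compatible rows are left, instead of failing when more samples are requested than the data can supply.

```r
# Hypothetical variant of the sampler above with an early-exit guard.
smpl_safe <- function(dat, size) {
  picked <- dat[0, , drop = FALSE]        # empty matrix with the same columns
  for (i in seq_len(size)) {
    if (nrow(dat) == 0) break             # nothing compatible left to draw
    s <- dat[sample(nrow(dat), 1), ]
    picked <- rbind(picked, s, deparse.level = 0)
    # remove all rows sharing either coordinate with the sampled row
    dat <- dat[!(dat[, 1] == s[1] | dat[, 2] == s[2]), , drop = FALSE]
  }
  picked
}

set.seed(1)
dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))
res <- smpl_safe(dat, 4)   # asks for 4 rows, but x2 has only 3 unique values
res
```

With completely crossed data such as this, the result always has as many rows as the smaller set of unique values allows; with sparser data the number of rows can depend on the order of the draws.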


Hope this is helpful,

Dan

--
Daniel Nordlund
Bothell, WA USA




--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit

Via Fermi 2749, TP 100-01,  I-21027 Ispra (VA), ITALY

jon.sko...@jrc.ec.europa.eu
Tel:  +39 0332 789205

Disclaimer: Views expressed in this email are those of the individual and do 
not necessarily represent official views of the European Commission.


Re: [R] sampling rows with values never sampled before

2015-06-22 Thread Adams, Jean
Mike,

There may be a more efficient way to do this, but this works on your
example.

# mix up the order of the rows
mix <- dat[order(runif(dim(dat)[1])), ]

# get rid of duplicate x1s and x2s
sub <- mix[!duplicated(mix[, "x1"]) & !duplicated(mix[, "x2"]), ]
sub
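
A self-contained check of this approach on the example data, as a minimal sketch (the seed and the use of sample() for shuffling are assumptions made here). A row is dropped if either of its values has appeared in any earlier row of the shuffled matrix, including rows that were themselves dropped, so the result can have fewer rows than the maximum possible.

```r
set.seed(2015)   # arbitrary seed, an assumption for reproducibility
dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))

mix <- dat[sample(nrow(dat)), ]    # shuffle the rows
keep <- !duplicated(mix[, "x1"]) & !duplicated(mix[, "x2"])
sub <- mix[keep, , drop = FALSE]
sub

# Neither coordinate repeats among the rows that were kept.
anyDuplicated(sub[, "x1"]) == 0
anyDuplicated(sub[, "x2"]) == 0
```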

Jean

On Mon, Jun 22, 2015 at 11:42 AM, C W tmrs...@gmail.com wrote:

 Hello R list,

 I have a question about sampling unique coordinate values.

 Here's what my data looks like:

  dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))
  dat
   x1  x2
  [1,]  1 3.7
  [2,]  2 3.7
  [3,]  3 3.7
  [4,]  4 3.7
  [5,]  5 3.7
  [6,]  1 2.9
  [7,]  2 2.9
  [8,]  3 2.9
  [9,]  4 2.9
 [10,]  5 2.9
 [11,]  1 5.2
 [12,]  2 5.2
 [13,]  3 5.2
 [14,]  4 5.2
 [15,]  5 5.2


 If I sampled (1, 3.7), then, I don't want (1, 2.9) or (2, 3.7).

 I want to avoid either the first or second coordinate repeated.  It leads
 to undefined matrix inversion.

 I thought of using sampling(), but not sure about applying it to a data
 frame.

 Thanks in advance,

 Mike



Re: [R] sampling rows with values never sampled before

2015-06-22 Thread C W
Hi Jean,

Thanks!

Daniel,
Yes, you are absolutely right.  I want sampled vectors to be as different
as possible.

I added a little more to the earlier data set.
      x1  x2  x3
 [1,]  1 3.7 2.1
 [2,]  2 3.7 5.3
 [3,]  3 3.7 6.2
 [4,]  4 3.7 8.9
 [5,]  5 3.7 4.1
 [6,]  1 2.9 2.1
 [7,]  2 2.9 5.3
 [8,]  3 2.9 6.2
 [9,]  4 2.9 8.9
[10,]  5 2.9 4.1
[11,]  1 5.2 2.1
[12,]  2 5.2 5.3
[13,]  3 5.2 6.2
[14,]  4 5.2 8.9
[15,]  5 5.2 4.1

If I sampled rows 1, 6, and 11, solving the system of equations would not be
possible.  So, I want to avoid similar vectors.
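
To make the problem concrete, here is a small check of why rows 1, 6 and 11 cannot be used together. It rebuilds the three-column data from the table above (the name dat3 is made up here) and looks at the rank of the selected rows; using qr() for this is just one possible way to test it.

```r
dat3 <- cbind(x1 = rep(1:5, 3),
              x2 = rep(c(3.7, 2.9, 5.2), each = 5),
              x3 = rep(c(2.1, 5.3, 6.2, 8.9, 4.1), 3))

bad  <- dat3[c(1, 6, 11), ]   # x1 and x3 are constant across these rows
good <- dat3[c(1, 7, 13), ]   # no value repeated in any column

qr(bad)$rank    # less than 3, so solve(bad) would fail
qr(good)$rank   # full rank, so the system can be solved
```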

Thanks,

Mike




Re: [R] sampling rows with values never sampled before

2015-06-22 Thread Daniel Nordlund

On 6/22/2015 9:42 AM, C W wrote:

Hello R list,

I have a question about sampling unique coordinate values.

Here's what my data looks like:


> dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))
> dat

      x1  x2
 [1,]  1 3.7
 [2,]  2 3.7
 [3,]  3 3.7
 [4,]  4 3.7
 [5,]  5 3.7
 [6,]  1 2.9
 [7,]  2 2.9
 [8,]  3 2.9
 [9,]  4 2.9
[10,]  5 2.9
[11,]  1 5.2
[12,]  2 5.2
[13,]  3 5.2
[14,]  4 5.2
[15,]  5 5.2


If I sampled (1, 3.7), then, I don't want (1, 2.9) or (2, 3.7).

I want to avoid either the first or second coordinate repeated.  It leads
to undefined matrix inversion.

I thought of using sampling(), but not sure about applying it to a data
frame.

Thanks in advance,

Mike



I am not sure you gave us enough information to solve your real world 
problem.  But I have a few comments and a potential solution.


1. In your example the unique values in x1 are completely crossed 
with the unique values in x2.
2. Since you don't want duplicates of either number, the maximum 
number of samples that you can take is the minimum number of unique 
values in either vector, x1 or x2 (in this case x2 with 3 unique values).

3. Sample without replacement from the smaller set of unique values first.
4. Sample without replacement from the larger set second.

> x <- 1:5
> xx <- c(3.7, 2.9, 5.2)
> s2 <- sample(xx,2, replace=FALSE)
> s1 <- sample(x,2, replace=FALSE)
> samp <- cbind(s1,s2)
>
> samp
     s1  s2
[1,]  5 3.7
[2,]  1 5.2


Your actual data is probably larger, and the unique values in each 
vector may not be completely crossed, in which case the task is a little 
harder.  In that case, you could remove values from your data as you 
sample.  This may not be efficient, but it will work.


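smpl <- function(dat, size){
  mysamp <- numeric(0)
  for(i in 1:size) {
    # draw one row at random from the rows still available
    s <- dat[sample(nrow(dat),1),]
    mysamp <- rbind(mysamp,s, deparse.level=0)
    # drop every row sharing a first or second coordinate with s;
    # drop=FALSE keeps dat two-dimensional when one row remains
    dat <- dat[!(dat[,1]==s[1] | dat[,2]==s[2]), , drop=FALSE]
  }
  mysamp
}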


This is just an example of how you might approach your real world 
problem.  There is no error checking, and for large samples it may not 
scale well.
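
As a rough sketch of what the missing error checking could look like, the variant below stops with a warning once no eligible rows are left. The name smpl_checked, the warning text, and the seed are invented for this illustration, and the input is assumed to be a numeric matrix as in the example data.

```r
smpl_checked <- function(dat, size) {
  mysamp <- NULL
  for (i in seq_len(size)) {
    # stop early when every remaining row would repeat a coordinate
    if (nrow(dat) == 0) {
      warning("ran out of eligible rows after ", i - 1, " draws")
      break
    }
    s <- dat[sample(nrow(dat), 1), ]
    mysamp <- rbind(mysamp, s, deparse.level = 0)
    # remove rows that share either coordinate with the row just drawn
    dat <- dat[!(dat[, 1] == s[1] | dat[, 2] == s[2]), , drop = FALSE]
  }
  mysamp
}

dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))
set.seed(2)  # assumption: seed chosen only for repeatability
out <- smpl_checked(dat, 3)
out
```

Asking for more rows than the data can supply, for example smpl_checked(dat, 4) here, returns the rows found so far and issues the warning instead of failing.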



Hope this is helpful,

Dan

--
Daniel Nordlund
Bothell, WA USA



Re: [R] Sampling

2015-03-30 Thread Daniel Nordlund

On 3/29/2015 11:10 PM, Partha Sinha wrote:

I have 1000 data points.  I want to take 30 samples and find the mean. I
also want to repeat this process 100 times. How do I go about it?
Regards
Parth




See ?replicate and ?sample.  Here is a simple example where yourdata is a 
vector of values, assuming you want to sample without replacement. 
Generalizing it to other data structures is left as an exercise for the 
reader.


replicate(100,mean(sample(yourdata,30, replace=FALSE)))
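
A runnable version of the one-liner, with made-up data standing in for yourdata. The seed and the rnorm() parameters are assumptions chosen only so the example can be run as is.

```r
set.seed(123)                                  # assumption: for repeatability only
yourdata <- rnorm(1000, mean = 50, sd = 10)    # stand-in for the 1000 data points

# 100 repetitions: draw 30 values without replacement and take their mean
means <- replicate(100, mean(sample(yourdata, 30, replace = FALSE)))
length(means)
summary(means)
```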

hope this is helpful,

Dan

--
Daniel Nordlund
Bothell, WA USA



Re: [R] sampling dataframe based upon number of record occurrences

2015-03-04 Thread David L Carlson
I'm not sure I understand, but I think you have a large data frame with records 
and you want to construct a sample of that data frame that includes no more 
than 3 records for each IDbyYear combination? You say there are 5589 unique 
combinations and your code uses a data frame called fitting_set. Assuming this 
is the data frame you are describing, your code will select all of the lines 
since fitting_set$IDbyYear[i] is always a vector of length 1.

We need a reproducible example. The best way for you to give us that would be 
to copy the result of dput(head(fitting_set, 10)). It would look something like 
this plus the 6 other columns you mention except that I've added dta <- in 
front of structure() to create a data frame:

dta <- structure(list(IDbyYear = c(42.24, 42.24, 42.24, 42.24, 42.24, 
42.24, 45.32, 45.32, 45.36, 45.4, 45.4), SiteID = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("A-Airport", 
"A-Bark Corral East"), class = "factor"), Year = c(2006L, 2006L, 
2006L, 2006L, 2006L, 2006L, 2008L, 2008L, 2009L, 2010L, 2010L
)), .Names = c("IDbyYear", "SiteID", "Year"), class = "data.frame", 
row.names = c(NA, -11L))

Now create a list of data frames, one for each IDbyYear:

dta.list <- split(dta, dta$IDbyYear)

Now a function that will select 3 rows or all of them if there are fewer:

smp <- function(dframe) {
  ind <- seq_len(nrow(dframe))
  dframe[sample(ind, ifelse(length(ind)>2, 3, length(ind))),]
}

Now take the samples and combine them into a single data frame:

sample <- do.call(rbind, lapply(dta.list, smp))
sample
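
For anyone who wants to check the result, here is a compact variant of the same split-and-sample idea on a small data frame typed in by hand. The names smp3 and out are invented here, and sample.int() with min(n, 3) is a substitution for the ifelse() call above, not the code from this post.

```r
dta <- data.frame(
  IDbyYear = c(rep(42.24, 6), 45.32, 45.32, 45.36, 45.4, 45.4),
  SiteID   = c(rep("A-Airport", 6), rep("A-Bark Corral East", 5)),
  Year     = c(rep(2006L, 6), 2008L, 2008L, 2009L, 2010L, 2010L)
)

# take 3 rows at random, or every row when the group has 3 or fewer
smp3 <- function(dframe) {
  n <- nrow(dframe)
  dframe[sample.int(n, min(n, 3)), , drop = FALSE]
}

set.seed(10)  # assumption: seed chosen only for repeatability
out <- do.call(rbind, lapply(split(dta, dta$IDbyYear), smp3))
table(out$IDbyYear)   # no group should have more than 3 rows
```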

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Curtis 
Burkhalter
Sent: Tuesday, March 3, 2015 3:23 PM
To: r-help@r-project.org
Subject: [R] sampling dataframe based upon number of record occurrences

Hello everyone,

I'm having trouble performing a task that is probably very simple, but I
can't seem to figure out how to get my code to work. What I want to do is
use the sample function to pick records within a dataframe, but only if
a column attribute value is repeated more than 3 times. So if you look at
the data below I have created a unique attribute value that corresponds to
every site by year combination (i.e. IDxYear). So you can see that the
site called A-Airport was sampled 6 times in 2006, while A-Bark Corral
East was sampled twice in 2008. So what I want to do is randomly select 3
records for A-Airport in 2006 from the existing 6 records, but for A-Bark
Corral East in 2008 I just want to leave these records as they currently
are.

I've used the following code to try and  accomplish this, but like I said I
can't get it to work so I'm clearly doing something wrong. If you could
check out the code and provide any suggestions that would be great. It
should be noted that there are 5589 unique IDxYear combinations so that's
why that number is in the code. If any further clarification is needed also
let me know.

boom=data.frame()
for (i in 1:5589){

boom[i,]=ifelse(length(fitting_set$IDbyYear[i]>3),fitting_set[sample(nrow(fitting_set),3),],fitting_set)

}
boom


 IDbyYear   SiteID               Year   (6 other column attributes)
 42.24      A-Airport            2006
 42.24      A-Airport            2006
 42.24      A-Airport            2006
 42.24      A-Airport            2006
 42.24      A-Airport            2006
 42.24      A-Airport            2006
 45.32      A-Bark Corral East   2008
 45.32      A-Bark Corral East   2008
 45.36      A-Bark Corral East   2009
 45.40      A-Bark Corral East   2010
 45.40      A-Bark Corral East   2010

 Thanks


-- 
Curtis Burkhalter

https://sites.google.com/site/curtisburkhalter/



Re: [R] sampling dataframe based upon number of record occurrences

2015-03-04 Thread JS Huang
Here is an implementation with function named getSample. Some modification to
the data was made so that it can be read as a table.

> fitting.set
   IDbyYear             SiteID Year
1     42.24          A-Airport 2006
2     42.24          A-Airport 2006
3     42.24          A-Airport 2006
4     42.24          A-Airport 2006
5     42.24          A-Airport 2006
6     42.24          A-Airport 2006
7     45.32 A-Bark.Corral.East 2008
8     45.32 A-Bark.Corral.East 2008
9     45.36 A-Bark.Corral.East 2009
10    45.40 A-Bark.Corral.East 2010
11    45.40 A-Bark.Corral.East 2010
> getSample
function(x)
{
  sites <- unique(x$SiteID)
  years <- unique(x$Year)
  result <- data.frame()
  x$ID <- seq(1,nrow(x))
  for (i in 1:length(sites))
  {
    for (j in 1:length(years))
    {
      if (nrow(x[as.character(x$SiteID)==as.character(sites[i]) &
                 x$Year==years[j],]) > 3)
      {
        sampledID <- sample(x[as.character(x$SiteID)==as.character(sites[i]) &
                              x$Year==years[j],]$ID,3,replace=FALSE)
        for (k in 1:length(sampledID))
        {
          result <- rbind(result,x[x$ID==sampledID[k],-4])
        }
      }
    }
  }
  names(result) <- c("IDbyYear","SiteID","Year")
  rownames(result) <- NULL
  return(result)
}
> getSample(fitting.set)
  IDbyYear    SiteID Year
1    42.24 A-Airport 2006
2    42.24 A-Airport 2006
3    42.24 A-Airport 2006



--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-dataframe-based-upon-number-of-record-occurrences-tp4704144p4704154.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] sampling dataframe based upon number of record occurrences

2015-03-04 Thread Curtis Burkhalter
That worked great, thanks so much David!


Re: [R] sampling dataframe based upon number of record occurrences

2015-03-04 Thread JS Huang
Since you indicated there are six more columns in the data.frame, getSample
is modified below to handle them.

> getSample
function(x)
{
  sites <- unique(x$SiteID)
  years <- unique(x$Year)
  result <- data.frame()
  x$ID <- seq(1,nrow(x))
  for (i in 1:length(sites))
  {
    for (j in 1:length(years))
    {
      if (nrow(x[as.character(x$SiteID)==as.character(sites[i]) &
                 x$Year==years[j],]) > 3)
      {
        sampledID <- sample(x[as.character(x$SiteID)==as.character(sites[i]) &
                              x$Year==years[j],]$ID,3,replace=FALSE)
        for (k in 1:length(sampledID))
        {
          result <- rbind(result,x[x$ID==sampledID[k],-ncol(x)])
        }
      }
    }
  }
  names(result) <- names(x)[-ncol(x)]
  rownames(result) <- NULL
  return(result)
}
> getSample(fitting.set)
  IDbyYear    SiteID Year
1    42.24 A-Airport 2006
2    42.24 A-Airport 2006
3    42.24 A-Airport 2006




--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-dataframe-based-upon-number-of-record-occurrences-tp4704144p4704155.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling according to type

2014-03-05 Thread Suzen, Mehmet
If I understood correctly, you need weighted sampling. Try the 'prob'
argument of 'sample'.  For your example:

n <- 10
ntype <- rbinom(n, 1, 0.5)
myProbs <- rep(1/10, 10) # equally likely
myProbs[ which(ntype == 0)] <- 0.75/7 # Divide so the sum will be 1.0
myProbs[ which(ntype == 1)] <- 0.25/3
sample(ntype,3, prob=myProbs)




On 5 March 2014 15:20, Thomas thomas.ches...@nottingham.ac.uk wrote:
 I have a matrix where each entry represents a data subject's type, 1 or 0:

 n <- 10
 ntype <- rbinom(n, 1, 0.5)

 and I'd like to sample say 3 subjects from ntype where those subjects who
 are Type 1 are selected with probability say 0.75, and Type 0 with (1-0.75).
 (So the sample would produce a list with three indices each referring to a
 position within ntype.)

 Can anyone suggest a way to do this please?

 Thank you,

 Thomas Chesney


Re: [R] Sampling according to type

2014-03-05 Thread Suzen, Mehmet
 myProbs[ which(ntype == 0)] <- 0.75/7 # Divide so the sum will be 1.0
 myProbs[ which(ntype == 1)] <- 0.25/3

Here of course you need to divide by the number of 0s and 1s; 7 and 3
were just an example.
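
A possible way to write that with the counts computed from the data, returning positions within ntype as the original question asked. This is only a sketch: the fixed ntype vector, the names n1, n0, w and idx, and the seed are invented here, and the 0.75 share is given to Type 1 as in the question rather than to Type 0 as in the illustration above.

```r
ntype <- c(1, 0, 0, 1, 0, 1, 0, 0, 0, 0)   # assumption: fixed example, 3 ones and 7 zeros

n1 <- sum(ntype == 1)
n0 <- sum(ntype == 0)
stopifnot(n1 > 0, n0 > 0)                  # the split below needs both types present

# Type 1 subjects share a total weight of 0.75, Type 0 subjects share 0.25
w <- ifelse(ntype == 1, 0.75 / n1, 0.25 / n0)
sum(w)

set.seed(7)                                # assumption: for repeatability only
idx <- sample(seq_along(ntype), 3, prob = w)   # positions within ntype
idx
ntype[idx]
```

Note that sample() draws without replacement by default, so the weights describe the first draw exactly; for later draws they are renormalised over the subjects not yet chosen.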



Re: [R] Sampling question

2013-11-05 Thread arun
Hi,

You may try:
dat1 <- structure(list(SubID = 1:8, CSE1 = c(6L, 6L, 5L, 5L, 5L, 5L, 
3L, 3L), CSE2 = c(5L, 4L, 5L, 4L, 6L, 4L, 6L, 6L), CSE3 = c(6L, 
7L, 5L, 3L, 7L, 3L, 6L, 6L), CSE4 = c(2L, 2L, 5L, 4L, 5L, 6L, 
3L, 3L), WSE1 = c(6L, 6L, 5L, 4L, 6L, 4L, 6L, 6L), WSE2 = c(2L, 
6L, 5L, 4L, 4L, 3L, 5L, 5L), WSE3 = c(2L, 2L, 4L, 5L, 4L, 7L, 
2L, 4L), WSE4 = c(4L, 3L, 5L, 2L, 1L, 3L, 1L, 7L)), .Names = c("SubID", 
"CSE1", "CSE2", "CSE3", "CSE4", "WSE1", "WSE2", "WSE3", "WSE4"
), class = "data.frame", row.names = c(NA, -8L))


fun1 <- function(dat, rep){
  res <- replicate(rep, {
    # shuffle the subjects, then shuffle each subject's four CSE items
    lst1 <- lapply(sample(nrow(dat), nrow(dat)), function(x) sample(dat[x, 2:5], 4))
    names(lst1) <- sapply(lst1, row.names)

    # rows 3 onward: one CSE item plus the three WSE items whose
    # item number differs from that of the CSE item
    lst1[-c(1:2)] <- lapply(names(lst1)[-c(1:2)], function(i) {
      x1 <- dat[i, 6:9][is.na(match(gsub("^.", "", names(dat[i, 6:9])),
                                    gsub("^.", "", names(lst1[[i]][1]))))]
      cbind(lst1[[i]][1], sample(x1, 3))
    })

    do.call(rbind, lapply(lst1, function(x) {
      datNew <- cbind(SubID = as.numeric(row.names(x)), x)
      names(datNew)[-1] <- "var"
      datNew
    }))
  })
  res
}

res1 <- fun1(dat1, 5)
lst2 <- lapply(split(res1, col(res1)), function(x) {
  dat <- do.call(cbind, x)
  colnames(dat) <- c("SubID", rep("var", 4))
  dat
})

do.call(cbind, res1[, 1])
do.call(cbind, res1[, 2])
A.K.




I have a question about drawing samples from a data frame. This might 
sound really tricky. Let me use a data frame I have posted earlier as an 
example: 

    SubID    CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4 
      1          6      5       6       2      6      2        2       4 
      2          6      4       7       2      6      6        2       3 
      3          5      5       5       5      5      5        4       5 
      4          5      4       3       4      4      4        5       2 
      5          5      6       7       5      6      4        4       1 
      6          5      4       3       6      4      3        7       3 
      7          3      6       6       3      6      5        2       1 
      8          3      6       6       3      6      5        4       7 

this data frame have two sets of variables. each set simply 
represent one scale. as shown above, the first scale, say CSE, consists 
of four items: CSE1, CSE2, CSE3, and CSE4, whereas the second scale, say
 WSE, also has four items: WSE1, WSE2, WSE3, WSE4. 
the leftmost column lists the subjects' ID. 

I wanna create a new data frame through sampling random numbers 
from the data frame above. Below is the structure of the new data frame. 

    SubID    var    var   var     var 
      s          c      c      c       c       
      s          c      c      c       c       
      s          c      w     w       w       
      s          c      w     w       w           
      s          c      w     w       w         
      s          c      w     w       w         
      s          c      w     w       w         
      s          c      w     w       w 

in the new data frame: 
  
s= SubID range from 1 to 8 
var= variables 
c=CSE numbers 
w=WSE numbers 

some rules to construct the new data frame: 

1. the top two rows have to be filled with CSE numbers; the 
numbers in the cells of each row should be randomized. for example, if 
the first row is an array of numbers from subject 4, they can follow the
 order: 4(CSE2), 5(CSE1), 3(CSE3), and 4(CSE4). Also, the numbers in the
 second row does not have to follow the order of the first row. for 
example, similarly, if the first row is an array of numbers from subject
 4 in the order: 4(CSE2), 5(CSE1), 3(CSE3), and 4(CSE4), numbers in the 
second row (assuming it is from subject 8) does not have to be 6(CSE2), 
3(CSE1), 6(CSE3), and 3(CSE4). numbers in these two rows should be drawn
 without replacement. 

2. each of the rest of the rows should include a CSE number in 
the leftmost cell and three WSE numbers on the right. At the same time, 
in each row, the three WSE numbers on the right have to be only those 
numbers that are not corresponding to the CSE number in the leftmost 
cell. For example, if the CSE number in the leftmost cell is 4, a CSE2 
number from subject 6, the three WSE numbers on the right side can only 
be 4(WSE1), 7(WSE3), and 3(WSE4) from subject 6. 

3. the numbers in each row can only be drawn from the same 
subject. Also, subjects should be randomized. Specifically, they do 
not have to be in the following order: 

 SubID     
      1         
      2           
      3         
      4           
      5           
      6           
      7           
      8     
      
they can be: 

 SubID     
      2         
      8           
      5         
      4           
      1           
      6           
      7           
      3 
4. repeat the whole process 1000 times to draw 1000 random samples 

Any ideas?  Thanks in advance!! :)


Re: [R] Sampling Weights and lmer() update?

2013-05-14 Thread Thomas Lumley
Arguably you are looking in the wrong place (there's a special mixed-models
mailing list for R), but I can answer the question.

No.

At least, there's nothing in lme4, and I haven't done anything (since I
want a more general solution than Stata and MLWiN implement) and I'd be
surprised if someone else had done it.

   -thomas


On Tue, May 14, 2013 at 3:35 PM, Richard Blissett rsl.bl...@gmail.com wrote:

 Perhaps I am not looking in the right place, but I am looking for a way to
 use lmer() to run a multilevel model that incorporates sampling weights. I
 have used the Lumley survey package to use sampling weights in the past,
 but according to a post I found online from Thomas Lumley in mid-2012, R is
 currently not equipped to be able to do this.

 His post is here:

 http://r.789695.n4.nabble.com/sampling-weights-for-multilevel-models-tp4632947p4632955.html

 Does anyone know if there has been an update since then to be able to do
 this, or if there's another way to go about doing this in R? Otherwise, I
 am thinking that I will have to move my data over to Stata and try to run
 the multilevel models there.

 Richard

 --
 Richard Blissett

 Eco-Tip: Before printing, please consider whether you really need to have
 this email on paper.





-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] Sampling data without having infinite numbers after diong a transformation

2012-12-25 Thread Jeff Newmiller
Perhaps you should read the help file for rnorm more carefully.

?rnorm

Keep in mind that the normal probability distribution is a density function, so 
the smaller the standard deviation is, the greater the magnitude of the density 
function is. 
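
A quick illustration of both points, offered as a sketch rather than a diagnosis of the original script. The sd values and the sample size are arbitrary choices made here; the facts relied on are that the third positional argument of rnorm() is the standard deviation (see ?rnorm) and that the normal density at its mean is 1/(sd*sqrt(2*pi)).

```r
sds  <- c(36, 6, 1, 0.1)
peak <- dnorm(0, mean = 0, sd = sds)   # density at the mean for each sd
# the peak grows as sd shrinks and can exceed 1
cbind(sd = sds, peak = peak, formula = 1 / (sds * sqrt(2 * pi)))

set.seed(1234)
d <- rnorm(1e5, 0, 36)   # third argument is sd, so the spread is about 36
sd(d)
```

If the 36 in the question was meant as a variance, which is only a guess, the call would need sd = 6 instead.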
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Agnes Ayang agnes.ay...@yahoo.com wrote:

Hello R-helpers..

I want to ask about how I can sample data sets without having the
infinite numbers coming out. For example,

set.seed(1234)

a<-rnorm(15,0,1)
b<-rnorm(15,0,1)
c<-rnorm(15,0,1)
d<-rnorm(15,0,36)

After come out with the sample, I need to do a transformation  (by
Hoaglin, 1985) for each data set. Actually I need to measure the
skewness and kurtosis, that's why I need to do the transformation.
After transformation, there will be 'Inf' value in my data sets and I
cannot proceed with the next step where I need to compute the trimmed
mean and sum square of deviation.

Can anyone help me obtain better data sets so that my
programme will work? Thank you.

Best regards,
Hyo Min
UPM Malaysia



Re: [R] Sampling from a Population

2012-12-08 Thread R. Michael Weylandt
Hi Lorenzo,

This has the feel of a homework problem, but I will suggest to you
that this is sampling without replacement and there exist easy
mathematical formulas (no need to resort to R) to calculate your
desired probability.
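
For readers who still want to see it in R, here is a short sketch. Under simple random sampling without replacement, any one individual is included with probability n_s/N, whatever subgroup they belong to. The numeric values and the names p_one, p_count and p_any are made up for the illustration; the last quantity, the chance that at least one subgroup member is drawn, goes beyond what was asked.

```r
N   <- 100   # population size (illustrative value)
n_s <- 20    # sample size (illustrative value)
n_i <- 10    # subgroup size (illustrative value)

# chance that one particular individual is in the sample
p_one <- n_s / N
# same value by counting: samples containing that individual / all samples
p_count <- choose(N - 1, n_s - 1) / choose(N, n_s)

# chance that at least one subgroup member is sampled (hypergeometric)
p_any <- 1 - dhyper(0, n_i, N - n_i, n_s)

c(p_one = p_one, p_count = p_count, p_any = p_any)
```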

Michael

On Sat, Dec 8, 2012 at 11:54 AM, Lorenzo Isella
lorenzo.ise...@gmail.com wrote:
 Dear All,
 I hope this is not too off topic, but I am sure it has to be a one-liner in
 R.
 Suppose you have a population of size N and that you take a random sample of
 n_s individuals out of this population.
 This population includes a subgroup of n_i individuals.
 For any individual in n_i, what is the probability of being included in the
 sample n_s?
 Many thanks.

 Lorenzo



Re: [R] sampling weights for multilevel models

2012-06-10 Thread Rui Barradas

Hello,

The link you've posted is to a page that does NOT have a dataset, it has 
links to other pages. The proper way of posting a data example would be


# paste the output of this in a post
dput(head(yourdata, 20))  # or 30
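
To show why dput() output is convenient for readers, the sketch below writes a small made-up data frame as text and rebuilds it from that text. The contents of yourdata and the names txt and rebuilt are invented for the example.

```r
yourdata <- data.frame(id = 1:30,
                       score = round(seq(10, 39, length.out = 30), 1))

# text that can be pasted into a post
txt <- capture.output(dput(head(yourdata, 5)))
cat(txt, sep = "\n")

# a reader rebuilds the same object from that text
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, head(yourdata, 5))
```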


Now, if I understand your question, function sample() does have a 
weights argument, 'prob'. (Package base.) See


help(sample)

Hope this helps,

Rui Barradas

On 10-06-2012 20:00, Tamara wrote:

Dear all,

I am struggling with a problem which I have been reading on the forums about
and it did not seem to me that there is a precise answer to my question.
However, I still hope there is one.

I am working with  http://timss.bc.edu/ PIRLS   data and trying to conduct
multilevel analysis. There are different weights for each level of analysis
in the PIRLS dataset (e.g.  there is a school weight, class weight, student
weight).
Is there a function in R which would let me use different weights for
different levels of my model?
If yes, which package contains it?

I would be very grateful for any help!






--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-weights-for-multilevel-models-tp4632947.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] sampling weights for multilevel models

2012-06-10 Thread Thomas Lumley
On Mon, Jun 11, 2012 at 7:00 AM, Tamara petrova.t...@gmail.com wrote:
 Dear all,

 I am struggling with a problem which I have been reading on the forums about
 and it did not seem to me that there is a precise answer to my question.
 However, I still hope there is one.

 I am working with  http://timss.bc.edu/ PIRLS   data and trying to conduct
 multilevel analysis. There are different weights for each level of analysis
 in the PIRLS dataset (e.g.  there is a school weight, class weight, student
 weight).
 Is there a function in R which would let me use different weights for
 different levels of my model?
 If yes, which package contains it?


As far as I know there is no function that does what you want.   In
particular, lme() and lmer() don't work correctly with sampling
weights.

It does depend on why you want a multilevel model.  If you are
primarily interested in the mean model and the variance components are
just needed to get appropriate standard errors, then you can use the
svyglm() function in the survey package to fit a linear regression
with appropriate standard errors.   On the other hand, if you are
interested in estimating the variance components for their own sake,
you need some other software.
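
A rough sketch of the svyglm() route mentioned above, for the case where only the mean model matters. The data, the column names (school, ses, wt, score) and the use of a single overall weight are all assumptions made for illustration; they are not PIRLS variables. The block only fits the model if the survey package is installed.

```r
set.seed(1)
# Hypothetical data: 20 schools with 10 students each; all names are made up
dat <- data.frame(
  school = rep(1:20, each = 10),
  ses    = rnorm(200),
  wt     = runif(200, 1, 5)   # assumed overall sampling weight per student
)
school_effect <- rnorm(20, sd = 15)
dat$score <- 500 + 20 * dat$ses + school_effect[dat$school] + rnorm(200, sd = 30)

fit <- NULL
if (requireNamespace("survey", quietly = TRUE)) {
  # Schools as primary sampling units, one overall weight per student
  des <- survey::svydesign(ids = ~school, weights = ~wt, data = dat)
  # Design-based linear regression; standard errors reflect the clustering
  fit <- survey::svyglm(score ~ ses, design = des)
  print(summary(fit))
}
```

This gives regression coefficients with design-based standard errors, but no variance components.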

I do have longer-term plans to add multilevel modelling capabilities
to the survey package, but it's harder than it may appear.


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] sampling weights for multilevel models

2012-06-10 Thread Tamara
Thank you very much, Rui!
But I am afraid that I won't be able to use this function for multilevel
analysis, as unfortunately I don't see how exactly I will combine it with
functions in the R packages for multilevel analysis.

--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-weights-for-multilevel-models-tp4632947p4632957.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] sampling weights for multilevel models

2012-06-10 Thread Tamara
Thank you very much, Thomas!

As I need to estimate the variance components, I will most probably have to
switch from R to HLM or Mplus to apply different weights to different
levels.
Although I prefer R in general.


--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-weights-for-multilevel-models-tp4632947p4632962.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] sampling rows from a list

2012-04-02 Thread Bert Gunter
??
Something like:

lapply(mydata, function(x){
 nr <- nrow(x)
 ## resample the row numbers with replacement and index the matrix by them
 x[sample(seq_len(nr),nr,replace=TRUE),]
})

maybe. The idea is to use the sampled rows as your row index.

-- Bert


On Mon, Apr 2, 2012 at 11:24 AM, Bcampbell99 briand.campb...@ec.gc.ca wrote:
 Hi:

 I'm sure this seems like a rudimentary question, but I am not well versed
 with R syntax for lists.  I have a ragged array from which I've removed
 records (entire rows) with missing data.  The functions I used to remove the
 missing cases resulted in the generation of an R list class object, that
 looks something like this;

 mydata
 [[1]]
     [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 [3,]    7    8    9

 [[2]]
     [,1] [,2] [,3]
 [1,]   10   11   12
 [2,]   13   14   15

 [[3]]
     [,1] [,2] [,3]
 [1,]   16   17   18
 [2,]   19   20   21
 [3,]   22   23   24
 [4,]   25   26   27
 [5,]   28   29   30

 Part1
 What I would like to do is draw an equal number of random row samples
 from [[1]], [[2]] and [[3]] (to preserve the structure of [,1], [,2], [,3]).

 Part2
 Then I would like to coerce the list object into something like an array.

 Help scripting out part 1 or 2 would be much appreciated.

 Brian Campbell




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/sampling-rows-from-a-list-tp4526831p4526831.html
 Sent from the R help mailing list archive at Nabble.com.




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] sampling rows from a list

2012-04-02 Thread Justin Haynes
## recreating your data
mydata <- list(matrix(1:9, nrow=3, byrow=TRUE),
  matrix(10:15, nrow=2, byrow=TRUE),
  matrix(16:30, nrow=5, byrow=TRUE))

## number of rows in the shortest matrix in your list
n <- min(unlist(lapply(mydata, nrow)))

## draw n rows at random (without replacement) from each matrix
out <- lapply(mydata, function(x, n) x[sample(1:nrow(x), n),], n=n)
## this structure is still a list though...

## converting directly to an array:
out.array <- array(unlist(out), dim=c(dim(out[[1]]), length(out)))

I'm not totally sure what structure you want in the last step,
so I apologize if I missed it...
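
As a possible variation on the last step, here is a sketch that uses base R's simplify2array() to stack the equally sized matrices into a three-dimensional array. The seed and the use of drop = FALSE are my own additions, so treat it as one option rather than the intended answer.

```r
set.seed(123)
mydata <- list(matrix(1:9,   nrow = 3, byrow = TRUE),
               matrix(10:15, nrow = 2, byrow = TRUE),
               matrix(16:30, nrow = 5, byrow = TRUE))

# Row count of the shortest matrix
n <- min(vapply(mydata, nrow, integer(1)))

# Draw n distinct rows from each matrix; drop = FALSE keeps a matrix even if n is 1
out <- lapply(mydata, function(x) x[sample.int(nrow(x), n), , drop = FALSE])

# Stack the n x 3 matrices along a third dimension (one slice per list element)
arr <- simplify2array(out)
dim(arr)
```

Each slice arr[, , k] then holds the rows sampled from the k-th list element.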

Hope that helps,

Justin




Re: [R] Sampling problems

2012-03-08 Thread Oritteropus
Hi, thank you, but it works for vectors and matrices and not for data frames;
it gives me this error message:

MeanA <- read.csv("MeanAmf.csv",header=T)
mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]
remainder<-MeanA[-mysample]
Error in `[.default`(MeanA, -mysample) : invalid subscript type 'list'
In Ops.factor(left) : - not meaningful for factors

Any other way?

--
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-problems-tp4453752p4455912.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling problems

2012-03-08 Thread Oritteropus
Hi Sarah, it is not clear to me how to do that; can you show me, please?

Imagine I have a situation like this:

MeanA <- read.csv("MeanAmf.csv",header=T)
mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Then?


--
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-problems-tp4453752p4455921.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling problems

2012-03-08 Thread Petr PIKAL
Hi

I have only a faint idea what your problem was, as there is no context in your 
message, but maybe

remainder<-MeanA[-mysample, ]

could work.

Regards
Petr

 


Re: [R] Sampling problems

2012-03-08 Thread Petr PIKAL
 
 Hi, thank you but it does work for vectors and matrix but not dataframes, it
 gives me this message error:
 
 MeanA <- read.csv("MeanAmf.csv",header=T)
 mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Well, maybe a slight correction: store the sampled row numbers rather than
the sampled rows.

mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)
chosen.one<-MeanA[mysample,]
remainder<-MeanA[-mysample,]

Regards
Petr



Re: [R] Sampling problems

2012-03-08 Thread Oritteropus
Thanks, but it doesn't work either; it gives me the same error message. 
It works only if my first sample is taken this way:

mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)

However, this way it samples only the row numbers:
 [1] 71 24 12 36  2 39 69 62 43 38  9 44 13 54 50 63 67 66 37 28

but not the data inside.  I need to sample in this way:

mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),] 

to get a sample like this

HRkm  Mean.mf  Mean.mfm  Loc  Diet  Terr
Soc  Type  Soc.Ter  W.cat.0.25  W.cat.0.5
-2.49  -0.43  2.57  A
O  T  S  D
TS  b
23  -2.05  0.67  T
C  N  S  D
NS  A

This is an example of my dataframe

--
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-problems-tp4453752p4456048.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling problems

2012-03-08 Thread R. Michael Weylandt
Please use dput() to give a reproducible example: I can make this work
on a data frame quite easily --

x <- data.frame(1:10, letters[1:10], rnorm(10))
str(x)
print(x)
x[sample(nrow(x), 5), ]

So it's not a problem with something being a data frame or having factors.

Michael



Re: [R] Sampling problems

2012-03-07 Thread Sarah Goslee
You could make a vector containing the number of TRUE values that
makes up 80% of your data, and the number of FALSE values that makes
up 20% of your data. Use sample() to reorder it, then use it to divide
your dataset.
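
A minimal sketch of that idea, using a made-up data frame in place of the real data (the names dat, keep, part80 and part20 are placeholders):

```r
set.seed(2012)
# Stand-in data frame; replace with your own data
dat <- data.frame(id = 1:50, value = rnorm(50))

n_keep <- round(0.8 * nrow(dat))

# TRUE for the 80% part, FALSE for the 20% part, then shuffled with sample()
keep <- sample(rep(c(TRUE, FALSE), times = c(n_keep, nrow(dat) - n_keep)))

part80 <- dat[keep, ]
part20 <- dat[!keep, ]
c(nrow(part80), nrow(part20))
```

Because the split is a logical vector with one entry per row, the two parts are complementary by construction.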

If you had provided a reproducible example, I could write you code.

Sarah

On Wed, Mar 7, 2012 at 11:41 AM, Oritteropus lucasantin...@hotmail.com wrote:
 Hi,
 I need to sample randomly my dataset for 1000 times. The sample need to be
 the 80%. I know how to do that, my problem is that not only I need the 80%,
 but I also need the corresponding 20% each time. Is there any way to do
 that?
 Alternatively, I was thinking to something like setdiff () function to
 compare my 80% sample to the original dataset and obtain the corresponding
 20%, unfortunately setdiff works just for vectors, do you know a similar
 function for dataframes?
 Thanks


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Sampling problems

2012-03-07 Thread Petr Savicky
On Wed, Mar 07, 2012 at 08:41:35AM -0800, Oritteropus wrote:
 Hi,
 I need to sample randomly my dataset for 1000 times. The sample need to be
 the 80%. I know how to do that, my problem is that not only I need the 80%,
 but I also need the corresponding 20% each time. Is there any way to do
 that?

Hi.

If you use sample() to get the 80% and store the indices, you
can also get the remaining cases

  a <- matrix(1:30, ncol=3)
  i <- sample(10, 8)
  a[sort(i), ]

       [,1] [,2] [,3]
  [1,]    1   11   21
  [2,]    2   12   22
  [3,]    3   13   23
  [4,]    4   14   24
  [5,]    6   16   26
  [6,]    7   17   27
  [7,]    8   18   28
  [8,]   10   20   30

  a[-i, ]

       [,1] [,2] [,3]
  [1,]    5   15   25
  [2,]    9   19   29

Hope this helps.

Petr Savicky.



Re: [R] Sampling problems

2012-03-07 Thread David Winsemius


On Mar 7, 2012, at 11:41 AM, Oritteropus wrote:


 Hi,
 I need to sample randomly my dataset for 1000 times. The sample need to be
 the 80%. I know how to do that, my problem is that not only I need the 80%,
 but I also need the corresponding 20% each time. Is there any way to do
 that?
 Alternatively, I was thinking to something like setdiff () function to
 compare my 80% sample to the original dataset and obtain the corresponding
 20%, unfortunately setdiff works just for vectors, do you know a similar
 function for dataframes?


Create an index vector with runif or sample, use it to get  
your sample, and use negative indexing to get the remainder.


idx <- sample(1:1000, 800)
x[ idx, ]  # 80%
x[ -idx, ] # the other 20%

(I think this does presume you have not mucked with the default  
rownames.)
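
A sketch of how the index approach might be repeated 1000 times on a data frame. The data frame x, its columns and the helper one_split() are invented for illustration. The row names are deliberately non-default here: numeric subscripts such as idx and -idx select rows by position, so the split should not depend on what the row names are.

```r
set.seed(7)
# Stand-in data frame with non-default row names (an assumption for illustration)
x <- data.frame(v = rnorm(100), g = gl(4, 25),
                row.names = paste0("obs", 1:100))

# Hypothetical helper: one random 80/20 split of the rows of d
one_split <- function(d, frac = 0.8) {
  idx <- sample(nrow(d), round(frac * nrow(d)))   # row positions, not names
  list(train = d[idx, , drop = FALSE],
       test  = d[-idx, , drop = FALSE])
}

# 1000 independent splits, kept as a list of train/test pairs
splits <- replicate(1000, one_split(x), simplify = FALSE)
length(splits)
sapply(splits[[1]], nrow)
```

For large data it may be cheaper to store only the index vectors and subset on demand.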



--

David Winsemius, MD
West Hartford, CT



Re: [R] Sampling with Constraints for testing and training data

2012-01-25 Thread Eliano
Hi People, 

Does anyone have a good solution for this problem: 

I have a data frame called DB. 


index <- sample(1:nrow(DB), size=0.2*nrow(DB)) 
test <- DB[index,] 
train <- DB[-index,] 

One of the variables in this data frame is a target variable with two
values, 0 and 1. 

Imagine now that I want to constrain the test data frame so that it is 20% of
the size of DB and half of its rows have DB$target equal to 1. 

Imagine: n=100 
DB$target = { 0=80 
              1=20} 

test=20 and contains 10 random rows with DB$target=1 and 10 random rows with
DB$target=0. 



Many Thanks, 
Eliano 



--
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-with-Constraints-for-testing-and-training-data-tp4325530p4327028.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling with Constraints for testing and training data

2012-01-25 Thread Petr Savicky
On Wed, Jan 25, 2012 at 04:00:27AM -0800, Eliano wrote:
 Hi People, 
 
 Does anyone have a good solution for this problem: 
 
 I have a data frame called DB. 
 
 
 index <- sample(1:nrow(DB), size=0.2*nrow(DB)) 
 test <- DB[index,] 
 train <- DB[-index,] 
 
 One of the variables in this data frame is a target variable with two
 values, 0 and 1. 
 
 Imagine now that I want to constrain the test data frame so that it is 20% of
 the size of DB and half of its rows have DB$target equal to 1. 
 
 Imagine: n=100 
 DB$target = { 0=80 
               1=20} 
 
 test=20 and contains 10 random rows with DB$target=1 and 10 random rows with
 DB$target=0. 

Hi.

One way is as follows.

  t0 <- which(DB$target==0)
  t1 <- which(DB$target==1)
  m <- round(0.1*nrow(DB))
  stopifnot(length(t0) >= m && length(t1) >= m)
  index <- c(sample(t0, size=m), sample(t1, size=m))
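
To see the effect end to end, here is a sketch that applies the same steps to a made-up DB matching the numbers in the question (100 rows, 80 with target 0 and 20 with target 1) and then builds test and train from the index. The column x is a placeholder.

```r
set.seed(99)
# Stand-in for DB: 100 rows, 80 with target 0 and 20 with target 1
DB <- data.frame(x = rnorm(100), target = rep(c(0, 1), times = c(80, 20)))

t0 <- which(DB$target == 0)
t1 <- which(DB$target == 1)
m  <- round(0.1 * nrow(DB))            # rows to draw from each class
stopifnot(length(t0) >= m, length(t1) >= m)
index <- c(sample(t0, size = m), sample(t1, size = m))

test  <- DB[index, ]
train <- DB[-index, ]

# Class counts in each part
table(test$target)
table(train$target)
```

Note that the training set then has a different class balance from DB, which may or may not matter for the model being fitted.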

Hope this helps.

Petr Savicky.



Re: [R] sampling weights in package lme4

2012-01-24 Thread Thomas Lumley
On Tue, Jan 24, 2012 at 6:19 PM, Mohd masood masood0...@rediffmail.com wrote:

 Dear All
 I am trying to include sampling weights in a multilevel regression analysis 
 using package lme4 with the following code:

 print(fm1 <- lmer(DC~sex+age+smoker+alcohol+fruits+(1|setting), 
 dataset,REML = FALSE), corr = FALSE)
 print(fm2 <- lmer(DC~sex+age+smoker+alcohol+fruits+(1|setting), 
 dataset,REML = FALSE), corr = FALSE,weights=sweight)
 The problem is that both calls give me exactly the same results. Is 
 this weights argument not meant for sampling weights? If not, how can I include 
 sampling weights in lme4?

It's not meant for sampling weights.  It's meant for precision
weights.  How best to include sampling weights in mixed models is a
research problem at the moment, but you can rely on getting the wrong
answer if you just use the weights= argument.

  -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] sampling weights in package lme4

2012-01-24 Thread peter dalgaard

On Jan 24, 2012, at 20:41 , Thomas Lumley wrote:

 It's not meant for sampling weights.  It's meant for precision
 weights.  How best to include sampling weights in mixed models is a
 research problem at the moment, but you can rely on getting the wrong
 answer if you just use the weights= argument.
 
  -thomas

Fortune nomination!

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Sampling data every third hour

2011-12-14 Thread R. Michael Weylandt
1) Use dput() to submit data.

2) Would this work? (It requires your data are evenly spaced, but I
think that's it) d[seq(1, nrow(d), by = 3), ]
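
A sketch of the positional approach next to a clock-based alternative that does not rely on even spacing. The data frame d, its column names and the UTC time zone are assumptions for illustration; the values are random.

```r
set.seed(3)
# Stand-in hourly series starting at midnight (values are made up)
d <- data.frame(
  time  = seq(as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
              by = "hour", length.out = 28),
  value = round(runif(28, 0, 100), 1)
)

# Positional approach: rows 1, 4, 7, ... (needs evenly spaced, complete data)
by_pos <- d[seq(1, nrow(d), by = 3), ]

# Clock-based alternative: keep rows whose hour is a multiple of 3
hr <- as.integer(format(d$time, "%H", tz = "UTC"))
by_clock <- d[hr %% 3 == 0, ]

# Both select the same timestamps for this complete hourly series
identical(by_pos$time, by_clock$time)
```

If some hours are missing from the series, only the clock-based version would still return 00:00, 03:00, 06:00 and so on.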

Michael

On Wed, Dec 14, 2011 at 7:17 AM, abcdef ghijk lineh...@yahoo.com wrote:
  Good Morning ,

 I want to sample the following time series for every third hour. For example 
 at 00:00,03:00,06:00,09:00 etc.


 2011-01-01 00:00:00 0.00e+00
 2011-01-01 01:00:00 1.471667e+01
 2011-01-01 02:00:00 1.576667e+01
 2011-01-01 03:00:00 0.00e+00
 2011-01-01 04:00:00 0.00e+00
 2011-01-01 05:00:00 0.00e+00
 2011-01-01 06:00:00 0.00e+00
 2011-01-01 07:00:00 0.00e+00
 2011-01-01 08:00:00 0.00e+00
 2011-01-01 09:00:00 1.826667e+01
 2011-01-01 10:00:00 0.00e+00
 2011-01-01 11:00:00 0.00e+00
 2011-01-01 12:00:00 0.00e+00
 2011-01-01 13:00:00 0.00e+00
 2011-01-01 14:00:00 0.00e+00
 2011-01-01 15:00:00 0.00e+00
 2011-01-01 16:00:00 0.00e+00
 2011-01-01 17:00:00 0.00e+00
 2011-01-01 18:00:00 0.00e+00
 2011-01-01 19:00:00 0.00e+00
 2011-01-01 20:00:00 0.00e+00
 2011-01-01 21:00:00 7.01e+01
 2011-01-01 22:00:00 7.154167e+02
 2011-01-01 23:00:00 2.039167e+02
 2011-01-02 00:00:00 3.703000e+02
 2011-01-02 01:00:00 9.130167e+02
 2011-01-02 02:00:00 0.00e+00
 2011-01-02 03:00:00 0.00e+00
 Thanks in advance.
 Regards,
 Shan



Re: [R] Sampling with conditions

2011-11-08 Thread Ted Harding
[Yet another correction -- this one is important.
 I start from scratch this time]

On 07-Nov-11 22:22:54, SarahJoyes wrote:
 Hey everyone, 
 I am at best, an amateur user of R, but I am stuck on how
 to set-up the following situation. 
 I am trying to select a random sample of numbers from 0 to 10
 and insert them into the first column of a matrix (which will
 be used later in a loop).
 However, I need to have those numbers add up to 10. How can
 I set those conditions?
 So far I have:
 n<-matrix(0,nr=5,ncol=10)
 for(i in 1:10){n[i,1]<-sample(0:10,1)}
 How do I set-up the BUT sum(n[i,1])=10?
 Thanks
 SarahJ

Sarah, your example is confusing because you have set up a
matrix 'n' with 5 rows and 10 columns. But your loop cycles
through 10 rows!

However, assuming that your basic requirement is to sample
10 integers which add up to 10, consider rmultinom():

### Instead of: rmultinom(n=1,size=10,prob=(1:10)/10) ###
  rmultinom(n=1,size=10,prob=rep(1,10)/10)
  #       [,1]
  #  [1,]    1
  #  [2,]    0
  #  [3,]    2
  #  [4,]    3
  #  [5,]    1
  #  [6,]    1
  #  [7,]    0
  #  [8,]    0
  #  [9,]    1
  # [10,]    1
  rmultinom(n=1,size=10,prob=rep(1,10)/10)
  #       [,1]
  #  [1,]    2
  #  [2,]    0
  #  [3,]    1
  #  [4,]    1
  #  [5,]    2
  #  [6,]    2
  #  [7,]    1
  #  [8,]    0
  #  [9,]    1
  # [10,]    0

This gives a uniform distribution over the positions in
the sample vector for the sampled integers, so that all
permutations are equally likely. For a non-uniform
distribution, vary 'prob'.
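
A sketch of how this might be used to fill the matrix from the question, assuming the column has 5 entries (the matrix in the question has 5 rows) and that each entry should be equally likely to receive a unit:

```r
set.seed(2011)
# One draw: 5 non-negative integers summing to 10, equal cell probabilities
n <- matrix(0, nrow = 5, ncol = 10)
n[, 1] <- rmultinom(n = 1, size = 10, prob = rep(1, 5) / 5)
sum(n[, 1])

# rmultinom() can also return several draws at once, one per column
n_all <- rmultinom(n = 10, size = 10, prob = rep(1, 5) / 5)
colSums(n_all)
```

Every column of n_all sums to 10 by construction, so no rejection loop is needed.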

Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-Nov-11   Time: 08:13:36
-- XFMail --



Re: [R] Sampling with conditions

2011-11-08 Thread SarahJoyes
Sorry for the confusion. I have so many loops within loops and ifelses that I
get mixed up sometimes. It was just a typo: it was supposed to be for(i in 1:5).
Thanks for your help!
SJ

--
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-with-conditions-tp4014036p4016058.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling with conditions

2011-11-08 Thread SarahJoyes
That is exactly what I want, and it's so simple! 
Thanks so much!

--
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-with-conditions-tp4014036p4016050.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling with conditions

2011-11-08 Thread Nordlund, Dan (DSHS/RDA)
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of SarahJoyes
 Sent: Tuesday, November 08, 2011 5:57 AM
 To: r-help@r-project.org
 Subject: Re: [R] Sampling with conditions
 
 That is exactly what I want, and it's so simple!
 Thanks so much!
 

Sarah,

I want to point out that my post was qualified by something like "I am not 
sure it is exactly what you want."  Since you didn't quote my post, let me show 
my suggestion and then express my concern.

n <- matrix(0,nrow=5, ncol=10)
repeat{
  c1 <- sample(0:10, 4, replace=TRUE)
  if(sum(c1) <= 10) break
}
n[,1] <- c(c1,10-sum(c1))
n

This nominally meets your criteria, but it will tend to result in larger digits 
being under-represented.  For example, you are unlikely to get a result like 
c(0,8,0,0,2) or c(9,0,0,1,0).

That may be OK for your purposes, but I wanted to point it out.

You could use something like 

n <- matrix(0,nrow=5, ncol=10)
c1 <- rep(0,4)
for(i in 1:4){
  upper <- 10-sum(c1)
  c1[i] <- sample(0:upper, 1, replace=TRUE)
  if(sum(c1) == 10) break
}
n[,1] <- c(c1,10-sum(c1))
n

if that would suit your purposes better.
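
One way to compare the two suggestions is to wrap each in a function and tabulate the largest value per draw over many repetitions. The function names and the number of repetitions below are arbitrary choices for this sketch.

```r
set.seed(8)
# First approach: redraw four values until they leave room for a fifth
draw_reject <- function() {
  repeat {
    c1 <- sample(0:10, 4, replace = TRUE)
    if (sum(c1) <= 10) break
  }
  c(c1, 10 - sum(c1))
}

# Second approach: each draw is capped by what is still available
draw_sequential <- function() {
  c1 <- rep(0, 4)
  for (i in 1:4) {
    upper <- 10 - sum(c1)
    c1[i] <- sample(0:upper, 1)
    if (sum(c1) == 10) break
  }
  c(c1, 10 - sum(c1))
}

a <- replicate(2000, draw_reject())       # 5 x 2000 matrix, one draw per column
b <- replicate(2000, draw_sequential())

# Distribution of the largest element in each draw
table(apply(a, 2, max))
table(apply(b, 2, max))
```

Both functions always return five non-negative integers that sum to 10; they differ only in how the mass is spread across the positions.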


Good luck,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204




Re: [R] Sampling with conditions

2011-11-08 Thread Dennis Murphy
In addition to Dan's quite valid concern,  the final sample is not
truly 'random' - the first k - 1 elements are randomly chosen, but the
last is determined so that the constraint is met.

Dennis

On Tue, Nov 8, 2011 at 9:59 AM, Nordlund, Dan (DSHS/RDA)
nord...@dshs.wa.gov wrote:
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of SarahJoyes
 Sent: Tuesday, November 08, 2011 5:57 AM
 To: r-help@r-project.org
 Subject: Re: [R] Sampling with conditions

 That is exactly what I want, and it's so simple!
 Thanks so much!


 Sarah,

 I want to point out that my post was qualified by something like.  I am not 
 sure it is exactly what you want.  Since you didn't quote my post, let me 
 show my suggestion and then express my concern.

 n - matrix(0,nrow=5, ncol=10)
 repeat{
  c1 - sample(0:10, 4, replace=TRUE)
  if(sum(c1) = 10) break
 }
 n[,1] - c(c1,10-sum(c1))
 n

 This nominally meets your criteria, but it will tend to result in larger 
 digits being under-represented.  For example, you unlikely to get a result 
 like c(0,8,0,0,2) or (9,0,0,1,0).

 That may be OK for your purposes, but I wanted to point it out.

 You could use something like

 n - matrix(0,nrow=5, ncol=10)
 c1 - rep(0,4)
 for(i in 1:4){
  upper - 10-sum(c1)
  c1[i] - sample(0:upper, 1, replace=TRUE)
  if(sum(c1) == 10) break
 }
 n[,1] - c(c1,10-sum(c1))
 n

 if that would suit your purposes better.


 Good luck,

 Dan

 Daniel J. Nordlund
 Washington State Department of Social and Health Services
 Planning, Performance, and Accountability
 Research and Data Analysis Division
 Olympia, WA 98504-5204






Re: [R] Sampling with conditions

2011-11-08 Thread SarahJoyes
Dan 
Nordlund, Dan (DSHS/RDA) wrote:
 
 -Original Message-
 From: r-help-bounces@ [mailto:r-help-bounces@r-
 project.org] On Behalf Of SarahJoyes
 Sent: Tuesday, November 08, 2011 5:57 AM
 To: r-help@
 Subject: Re: [R] Sampling with conditions
 
 That is exactly what I want, and it's so simple!
 Thanks so much!
 
 
 Sarah,
 
 I want to point out that my post was qualified by something like "I am
 not sure it is exactly what you want."  Since you didn't quote my post, let
 me show my suggestion and then express my concern.
 
 n <- matrix(0,nrow=5, ncol=10)
 repeat{
   c1 <- sample(0:10, 4, replace=TRUE)
   if(sum(c1) <= 10) break
 }
 n[,1] <- c(c1,10-sum(c1))
 n
 
 This nominally meets your criteria, but it will tend to result in larger
 digits being under-represented.  For example, you are unlikely to get a result
 like c(0,8,0,0,2) or c(9,0,0,1,0).
 
 That may be OK for your purposes, but I wanted to point it out.
 
 You could use something like 
 
 n <- matrix(0,nrow=5, ncol=10)
 c1 <- rep(0,4)
 for(i in 1:4){
   upper <- 10-sum(c1)
   c1[i] <- sample(0:upper, 1, replace=TRUE)
   if(sum(c1) == 10) break
 }
 n[,1] <- c(c1,10-sum(c1))
 n
 
 if that would suit your purposes better.
 
 
 Good luck,
 
 Dan
 
 Daniel J. Nordlund
 Washington State Department of Social and Health Services
 Planning, Performance, and Accountability
 Research and Data Analysis Division
 Olympia, WA 98504-5204
 
 
 


Perhaps a little bit of context may be helpful, 
I am trying to figure out the ideal age structure for a population of ten
individuals that would yield the best overall survival rate given that each
age group has different survivability and different reproductive rates. 
So yes, having a bias for smaller numbers would be a problem. The only other
problem that I see with your revised code is that there will be a bias
towards having higher numbers in the first age group or first row of the
column...
The other idea I was playing with was to create a series of ifelse
statements for each row of the column...
Something like:
n<-matrix(0,nr=5,ncol=10)
n[1,1]<-sample(0:10,1)
n[2,1]<-ifelse(n[1,1]>=10,0,sample(0:10,1))
n[3,1]<-ifelse(sum(n[i,1])>10,0,sample(0:10,1))
etc...
I still think that might be biased towards high numbers in the first rows
though...
hmmm
SJ
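
One way to avoid both biases discussed in this thread can be sketched with
the stars-and-bars construction. This assumes that "unbiased" means every
vector of 5 non-negative integers summing to 10 is equally likely, which may
or may not be what the population model needs. The function name
rcomposition is made up for this illustration and does not come from the
thread.

```r
# Draw k non-negative integers that sum to 'total', uniformly over all such
# vectors: place k - 1 "bars" among total + k - 1 slots and count the gaps.
# (rcomposition is a hypothetical helper name.)
rcomposition <- function(total = 10, k = 5) {
  bars <- sort(sample.int(total + k - 1, k - 1))
  diff(c(0, bars, total + k)) - 1
}

set.seed(1)
n <- matrix(0, nrow = 5, ncol = 10)
n[, 1] <- rcomposition(10, 5)
n[, 1]
```

Because the bar positions are drawn without regard to order, no row of the
column is favoured over another.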



--
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-with-conditions-tp4014036p4017351.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling with conditions

2011-11-07 Thread Weidong Gu
I'm not sure this is valid: you can draw 9 of the 10 values at random,
but the last one then has to be fixed to meet the constraint, sum=10.

Weidong

On Mon, Nov 7, 2011 at 5:22 PM, SarahJoyes sjo...@uoguelph.ca wrote:
 Hey everyone,
 I am at best, an amateur user of R, but I am stuck on how to set-up the
 following situation.
 I am trying to select a random sample of numbers from 0 to 10 and insert
 them into the first column of a matrix (which will used later in a loop).
 However, I need to have those numbers add up to 10. How can I set those
 conditions?
 So far I have:
 n<-matrix(0,nr=5,ncol=10)
 for(i in 1:10){n[i,1]<-sample(0:10,1)}
 How do I set-up the BUT sum(n[i,1])=10?
 Thanks
 SarahJ

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Sampling-with-conditions-tp4014036p4014036.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] Sampling with conditions

2011-11-07 Thread Daniel Nordlund
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of SarahJoyes
 Sent: Monday, November 07, 2011 2:23 PM
 To: r-help@r-project.org
 Subject: [R] Sampling with conditions
 
 Hey everyone,
 I am at best, an amateur user of R, but I am stuck on how to set-up the
 following situation.
 I am trying to select a random sample of numbers from 0 to 10 and insert
 them into the first column of a matrix (which will used later in a loop).
 However, I need to have those numbers add up to 10. How can I set those
 conditions?
 So far I have:
 n<-matrix(0,nr=5,ncol=10)
 for(i in 1:10){n[i,1]<-sample(0:10,1)}
 How do I set-up the BUT sum(n[i,1])=10?
 Thanks
 SarahJ
 

Sarah,

Does something like this do what you want?

n <- matrix(0,nrow=5, ncol=10)
repeat{
  c1 <- sample(0:10, 4, replace=TRUE)
  if(sum(c1) <= 10) break
}
n[,1] <- c(c1,10-sum(c1))
n


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA



Re: [R] Sampling with conditions

2011-11-07 Thread Ted Harding
On 07-Nov-11 22:22:54, SarahJoyes wrote:
 Hey everyone, 
 I am at best, an amateur user of R, but I am stuck on how
 to set-up the following situation. 
 I am trying to select a random sample of numbers from 0 to 10
 and insert them into the first column of a matrix (which will
 used later in a loop).
 However, I need to have those numbers add up to 10. How can
 I set those conditions?
 So far I have:
 n<-matrix(0,nr=5,ncol=10)
 for(i in 1:10){n[i,1]<-sample(0:10,1)}
 How do I set-up the BUT sum(n[i,1])=10?
 Thanks
 SarahJ

Sarah, your example is confusing because you have set up a
matrix 'n' with 5 rows and 10 columns. But your loop cycles
through 10 rows!

However, assuming that your basic requirement is to sample
10 integers which add up to 10, consider rmultinom():

  rmultinom(n=1,size=10,prob=(1:10)/10)
  #      [,1]
  # [1,]    1
  # [2,]    0
  # [3,]    2
  # [4,]    0
  # [5,]    1
  # [6,]    1
  # [7,]    2
  # [8,]    0
  # [9,]    1
  #[10,]    2
  rmultinom(n=1,size=10,prob=(1:10)/10)
  #      [,1]
  # [1,]    0
  # [2,]    0
  # [3,]    0
  # [4,]    0
  # [5,]    1
  # [6,]    1
  # [7,]    2
  # [8,]    1
  # [9,]    2
  #[10,]    3

This gives each integer in (0:10) equal chances of being
in the sample. For unequal chances, vary 'prob'.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-Nov-11   Time: 00:25:54
-- XFMail --



Re: [R] Sampling with conditions

2011-11-07 Thread Ted Harding
[Correction below (I was writing too late at night ... ]

On 08-Nov-11 00:25:57, Ted Harding wrote:
 On 07-Nov-11 22:22:54, SarahJoyes wrote:
 Hey everyone, 
 I am at best, an amateur user of R, but I am stuck on how
 to set-up the following situation. 
 I am trying to select a random sample of numbers from 0 to 10
 and insert them into the first column of a matrix (which will
 used later in a loop).
 However, I need to have those numbers add up to 10. How can
 I set those conditions?
 So far I have:
 n<-matrix(0,nr=5,ncol=10)
 for(i in 1:10){n[i,1]<-sample(0:10,1)}
 How do I set-up the BUT sum(n[i,1])=10?
 Thanks
 SarahJ
 
 Sarah, your example is confusing because you have set up a
 matrix 'n' with 5 rows and 10 columns. But your loop cycles
 through 10 rows!
 
 However, assuming that your basic requirement is to sample
 10 integers which add up to 10, consider rmultinom():
 
   rmultinom(n=1,size=10,prob=(1:10)/10)
   #      [,1]
   # [1,]    1
   # [2,]    0
   # [3,]    2
   # [4,]    0
   # [5,]    1
   # [6,]    1
   # [7,]    2
   # [8,]    0
   # [9,]    1
   #[10,]    2
   rmultinom(n=1,size=10,prob=(1:10)/10)
   #      [,1]
   # [1,]    0
   # [2,]    0
   # [3,]    0
   # [4,]    0
   # [5,]    1
   # [6,]    1
   # [7,]    2
   # [8,]    1
   # [9,]    2
   #[10,]    3
 
 This gives each integer in (0:10) equal chances of being
 in the sample. For unequal chances, vary 'prob'.
 
 Hoping this helps,
 Ted.

That should have read:

  This gives a uniform distribution over the positions in
  the sample vector for the sampled integers, so that all
  permutations are equally likely. For a non-uniform
  distribution, vary 'prob'.
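
A quick check on the role of 'prob', offered as a sketch rather than as part
of the original reply: rmultinom() rescales prob internally to sum to 1, so
prob=(1:10)/10 weights position i in proportion to i, while equal weights
such as rep(1, 10) give every position the same expected count. The number
of simulated draws (2000) and the seed are arbitrary choices for
illustration.

```r
set.seed(2011)

# Each column is one draw of 10 counts that sum to size = 10
equal  <- rmultinom(n = 2000, size = 10, prob = rep(1, 10))
skewed <- rmultinom(n = 2000, size = 10, prob = (1:10) / 10)

# Average count per position across the simulated draws
round(rowMeans(equal), 2)    # all close to 1
round(rowMeans(skewed), 2)   # increasing with position, about 10 * i / 55
```

With equal weights the positions are exchangeable, which is the uniform
behaviour described above.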

Sorry,
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-Nov-11   Time: 07:40:51
-- XFMail --



Re: [R] sampling from the multivariate truncated normal

2011-07-05 Thread statfan

Well, for 0.828324 < x[2] < Inf the probability is roughly 0, hence it is not 
easy to draw random numbers out there  

Uwe Ligges

How is this probability roughly 0?

--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-from-the-multivariate-truncated-normal-tp3626438p3647039.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] sampling from the multivariate truncated normal

2011-06-27 Thread Uwe Ligges



On 26.06.2011 21:26, statfan wrote:

I am trying to generate a sample from a truncated multivariate normal
distribution via the rtmvnorm function in the {tmvtnorm} package.

Why does the following produce NaNs?


rtmvnorm(1, mean = rep(0, 2), matrix(c(0.06906084, -0.07463565, -0.07463565,
0.08078086),2),c(-0.4316738,  0.8283240),  c(Inf,Inf), algorithm="gibbsR",
burn.in.samples=100)


Well, for 0.828324 < x[2] < Inf the probability is roughly 0, hence it is not 
easy to draw random numbers out there  
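
A rough base-R check of why the truncation region is nearly empty, using
only the covariance matrix and lower bounds from the question. The bound
below is an illustration under the stated zero mean; it is not what
rtmvnorm computes internally, and the variable names are chosen here.

```r
sigma <- matrix(c(0.06906084, -0.07463565, -0.07463565, 0.08078086), 2)
lower <- c(-0.4316738, 0.8283240)

# The two components are almost perfectly negatively correlated
rho <- sigma[1, 2] / sqrt(sigma[1, 1] * sigma[2, 2])

# Marginal probability that x[2] exceeds its lower bound
p2 <- pnorm(lower[2], mean = 0, sd = sqrt(sigma[2, 2]), lower.tail = FALSE)

# Conditional distribution of x[1] given x[2] = lower[2]. The regression
# slope is negative, so this is the most favourable x[2] in the region
# for the event x[1] > lower[1].
slope     <- sigma[1, 2] / sigma[2, 2]
cond_mean <- slope * lower[2]
cond_sd   <- sqrt(sigma[1, 1] - sigma[1, 2]^2 / sigma[2, 2])
p1_given_2 <- pnorm(lower[1], mean = cond_mean, sd = cond_sd,
                    lower.tail = FALSE)

# Upper bound on the probability of the whole truncation region
upper_bound <- p2 * p1_given_2
c(rho = rho, p2 = p2, upper_bound = upper_bound)
```

The marginal tail alone is small, and the strong negative correlation makes
a large x[2] together with x[1] above its bound far less likely still.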


Uwe Ligges





Thanks

--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-from-the-multivariate-truncated-normal-tp3626438p3626438.html
Sent from the R help mailing list archive at Nabble.com.





Re: [R] sampling design runs with no errors but returns empty data set

2011-03-30 Thread Thomas Lumley
On Thu, Mar 31, 2011 at 4:01 AM, Simon Kiss sjk...@gmail.com wrote:
 Dear colleagues,
 I'm working with the 2008 Canada Election Studies 
 (http://www.queensu.ca/cora/_files/_CES/CES2008.sav.zip), trying to construct 
 a weighted national sample using the survey package.
 Three weights are included in the national survey (a household weight,  a 
 provincial weight and a national weight which is a product of the first two).
 In the following code I removed variables with missing national weights and 
 tried to construct the sample from advice I've gleaned from the documentation 
 for the survey package and other help requests.
 There are no errors, but the data frame (weight_test) contains no
 What am I missing?
 Yours, Simon Kiss
 P.S. The code is only reproducible if the data set is downloadable.  I'm nt 
 sure

 ces<-read.spss(file.choose(), to.data.frame=TRUE, use.value.labels=FALSE)
 missing_data<-subset(ces1, !is.na(ces08_NATWGT))
 weight_test<-svydesign(id=~0, weights=~ces08_NATWGT, data=missing_data)


The code isn't reproducible even with the data.  The code refers to a
data frame ces1, which isn't defined, and to a variable ces08_NATWGT
that isn't in the data set.

However, a bit of Googling suggests that the variable CES08_NA is
probably the one you mean, giving the following code

library(survey)
library(foreign)
ces<-read.spss("CES2008.sav", to.data.frame=TRUE, use.value.labels=FALSE)

missing_data<-subset(ces, !is.na(CES08_NA))
weight_test<-svydesign(id=~0, weights=~CES08_NA, data=missing_data)

which seems to produce a perfectly reasonable survey design object.

> weight_test
Independent Sampling design (with replacement)
svydesign(id = ~0, weights = ~CES08_NA, data = missing_data)
> dim(weight_test)
[1] 3257  531
> svymean(~factor(GENDER),weight_test)
                   mean   SE
factor(GENDER)1 0.47362 0.01
factor(GENDER)5 0.52638 0.01


Since you don't say how you concluded the object contained no, I don't
know what you were seeing.  Note that weight_test is not supposed to
be a data frame. It's a survey design object.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] sampling

2011-02-17 Thread Mohamed Lajnef
Hi ,

What about the split function?
See ?split; this divides x into 2 data frames:

a<-split(x,1:2)

a[[1]] # first data frame
a[[2]] # second data frame

Regards
M




Le 17/02/11 05:35, yf a écrit :
 I want to sample from the ID. For each ID, i want to have 2 set of data. I
 try the sample() function but it didn't work.

 x<-data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
 x
 id v1 V2
 1   1  1 12
 2   1  2 13
 3   1  3 14
 4   2  4 15
 5   2  5 16
 6   2  6 17
 7   2  7 18
 8   3  8 19
 9   3  9 20
 10  3 10 21
 11  4 11 22
 12  4 12 23


-- 

Mohamed Lajnef,IE INSERM U955 eq 15#
Pôle de Psychiatrie#
Hôpital CHENEVIER  #
40, rue Mesly  #
94010 CRETEIL Cedex FRANCE #
mohamed.laj...@inserm.fr   #
tel : 01 49 81 31 31 (poste 18467) #
Sec : 01 49 81 32 90   #
fax : 01 49 81 30 99   #




[[alternative HTML version deleted]]



Re: [R] sampling

2011-02-17 Thread David Winsemius


On Feb 16, 2011, at 11:35 PM, yf wrote:



I want to sample from the ID. For each ID, i want to have 2 set of  
data. I

try the sample() function but it didn't work.


You don't say _how_ you used the sample function. You should show what  
code you used when stating that _something_ doesn't work.


Sample returns a vector of items from objects where length()  
represents some sensible notion. It does not sample a complex object  
such as a dataframe. For dataframes, length is the number of columns,  
which doesn't agree very well with most people's notion of cases from  
which to sample.  For selection of rows of a dataframe you need to  
first create a vector of numeric indices and then use that with "[":


idx <- sample(nrow(x), nrow(x)/2)
# A random split
x[  idx, ]
x[ -idx, ]
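
The point about length() can be checked directly on the example data from
the question. This is only a small sketch; the seed and the names half_a
and half_b are arbitrary choices made here.

```r
x <- data.frame(id = c(1,1,1,2,2,2,2,3,3,3,4,4), v1 = 1:12, V2 = 12:23)

length(x)   # number of columns, not number of cases
nrow(x)     # number of cases

# Sample row indices, then use them to split the rows into two halves
set.seed(1)
idx    <- sample(nrow(x), nrow(x) / 2)
half_a <- x[idx, ]
half_b <- x[-idx, ]
```

Every row of x ends up in exactly one of the two halves.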




x<-data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
x

  id v1 V2
1   1  1 12
2   1  2 13
3   1  3 14
4   2  4 15
5   2  5 16
6   2  6 17
7   2  7 18
8   3  8 19
9   3  9 20
10  3 10 21
11  4 11 22
12  4 12 23
--
View this message in context: 
http://r.789695.n4.nabble.com/sampling-tp3310184p3310184.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



Re: [R] sampling

2011-02-17 Thread yf

But I need two rows of data for each id. 
Like...
 x
   id v1 V2
1   1  1 12
2   1  2 13

4   2  4 15
5   2  5 16


8   3  8 19
9   3  9 20

11  4 11 22
12  4 12 23

So should write sample( if sample id 2  ,2). I don't know how to write  (if
sample id 2). Thanks.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sampling-tp3310184p3311253.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] sampling

2011-02-17 Thread andrija djurovic
This is, maybe, not the best solution but I hope it will help you:

x<-data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))

do.call(rbind,by(x,x$id,function(x) x[c(sample(nrow(x),2)),]))

Andrija

On Thu, Feb 17, 2011 at 6:39 PM, yf chang...@umn.edu wrote:


 But i need for each id have two data.
 Like...
  x
   id v1 V2
 1   1  1 12
 2   1  2 13

 4   2  4 15
 5   2  5 16


 8   3  8 19
 9   3  9 20

 11  4 11 22
 12  4 12 23

 So should write sample( if sample id 2  ,2). I don't know how to write
  (if
 sample id 2). Thanks.
 --
 View this message in context:
 http://r.789695.n4.nabble.com/sampling-tp3310184p3311253.html
 Sent from the R help mailing list archive at Nabble.com.



[[alternative HTML version deleted]]



Re: [R] sampling

2011-02-17 Thread David Winsemius


On Feb 17, 2011, at 1:33 PM, andrija djurovic wrote:


This is, maybe, not the best solution but I hope it will help you:

x<-data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))

do.call(rbind,by(x,x$id,function(x) x[c(sample(nrow(x),2)),]))

Andrija



Another way (and note that by is just a wrapper for tapply):

> tapply(1:nrow(x), x$id, sample, 2)
$`1`
[1] 2 3

$`2`
[1] 5 4

$`3`
[1] 10  8

$`4`
[1] 11 12

> x[unlist( tapply(1:nrow(x), x$id, sample, 2) ), ]
   id v1 V2
2   1  2 13
3   1  3 14
5   2  5 16
6   2  6 17
9   3  9 20
8   3  8 19
12  4 12 23
11  4 11 22
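
One caveat worth a sketch: if an id had only a single row, sample() would
receive a single number and, by its documented behaviour, sample from
1:that_number instead. The helper below (pick is a name invented here, not
from the thread) indexes into the group's row numbers so that case is safe,
and it returns fewer than k rows when a group is smaller than k.

```r
x <- data.frame(id = c(1,1,1,2,2,2,2,3,3,3,4,4), v1 = 1:12, V2 = 12:23)

# Choose up to k of the supplied row numbers; indexing with sample.int()
# avoids sample()'s special handling of a length-one numeric vector.
pick <- function(rows, k = 2) {
  rows[sample.int(length(rows), min(k, length(rows)))]
}

set.seed(1)
idx <- unlist(tapply(seq_len(nrow(x)), x$id, pick))
sub <- x[idx, ]
sub
```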



On Thu, Feb 17, 2011 at 6:39 PM, yf chang...@umn.edu wrote:



But i need for each id have two data.
Like...

x

 id v1 V2
1   1  1 12
2   1  2 13

4   2  4 15
5   2  5 16


8   3  8 19
9   3  9 20

11  4 11 22
12  4 12 23

So should write sample( if sample id 2  ,2). I don't know how to  
write

(if
sample id 2). Thanks.
--
View this message in context:
http://r.789695.n4.nabble.com/sampling-tp3310184p3311253.html
Sent from the R help mailing list archive at Nabble.com.




[[alternative HTML version deleted]]



David Winsemius, MD
West Hartford, CT



Re: [R] sampling

2011-02-17 Thread Dennis Murphy
Hi:

A couple more approaches to consider:

# Utility function to extract two rows from a data frame
# Meant to be applied to each data subset
sampler <- function(d) if(nrow(d) > 2) d[sample(1:nrow(d), 2, replace =
FALSE), ] else d

library(plyr)
> ddply(x, 'id', sampler)
  id v1 V2
1  1  2 13
2  1  1 12
3  2  4 15
4  2  6 17
5  3  8 19
6  3 10 21
7  4 11 22
8  4 12 23

library(data.table)
dtx <- data.table(x, key = 'id')
> dtx[, sampler(.SD), by = 'id']
     id v1 V2
[1,]  1  1 12
[2,]  1  3 14
[3,]  2  5 16
[4,]  2  7 18
[5,]  3  9 20
[6,]  3 10 21
[7,]  4 11 22
[8,]  4 12 23


HTH,
Dennis


On Wed, Feb 16, 2011 at 8:35 PM, yf chang...@umn.edu wrote:


 I want to sample from the ID. For each ID, i want to have 2 set of data. I
 try the sample() function but it didn't work.

  x<-data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
  x
   id v1 V2
 1   1  1 12
 2   1  2 13
 3   1  3 14
 4   2  4 15
 5   2  5 16
 6   2  6 17
 7   2  7 18
 8   3  8 19
 9   3  9 20
 10  3 10 21
 11  4 11 22
 12  4 12 23
 --
 View this message in context:
 http://r.789695.n4.nabble.com/sampling-tp3310184p3310184.html
 Sent from the R help mailing list archive at Nabble.com.



[[alternative HTML version deleted]]



Re: [R] Sampling from multi-dimensional kernel density estimation

2010-11-23 Thread Greg Snow
Generating new data from a kernel density estimate is equivalent to choosing a 
point from your data at random, then generating a point from your kernel 
centered at the chosen point.
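
The recipe above can be sketched in a few lines. The sketch assumes a
Gaussian product kernel with one bandwidth per dimension; the bandwidths
here come from the rule of thumb bw.nrd0 rather than from an npudens fit,
the data are simulated stand-ins for x, y and z, and rkde is a name invented
for the illustration.

```r
set.seed(42)

# Simulated stand-in for the three-dimensional data set
dat <- cbind(x = rnorm(200), y = rnorm(200, 5, 2), z = runif(200))

# Assumed per-dimension bandwidths (rule of thumb, not npudens output)
h <- apply(dat, 2, bw.nrd0)

# Draw n points: pick data rows at random, then add kernel noise whose
# standard deviation in each dimension is that dimension's bandwidth.
rkde <- function(n, data, h) {
  centres <- data[sample.int(nrow(data), n, replace = TRUE), , drop = FALSE]
  noise   <- matrix(rnorm(n * ncol(data)), nrow = n)
  centres + sweep(noise, 2, h, "*")
}

draws <- rkde(1000, dat, h)
head(draws)
```

A different kernel would only change how the noise term is generated.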

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Christoph Goebel
 Sent: Friday, November 19, 2010 1:56 PM
 To: r-help@r-project.org
 Subject: [R] Sampling from multi-dimensional kernel density estimation
 
 Hi,
 
 
 
 I'd like to use a three-dimensional dataset to build a kernel density
 and
 then sample from the distribution.
 
 
 
 I already used the npudens function in the np package to estimate the
 density and plot it:
 
 
 
 fit<-npudens(~x+y+z)
 
 plot(fit)
 
 
 
 It takes some time but appears to work well.
 
 
 
 How can I use this to evaluate the fitted function at a certain point,
 e.g.
 (x=1, y=1, z=1)? Does R provide methods for sampling from the fitted
 function?
 
 
 
 Thanks,
 
 
 
 Christoph
 
 
 
 
   [[alternative HTML version deleted]]
 



Re: [R] Sampling problem

2010-11-16 Thread wangwallace

Michael, I really appreciate your help.

but I got the following error message when I was trying to run the function
written by you:

Error in out[i, ] <- apply(help[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) : 
  number of items to replace is not a multiple of replacement length

I am not quite sure why would this happen.

As a novice of R, these functions are kinda complex for me. I am wondering
if it is doable without using loops like that.

Again, thank you so much!!!  
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3044249.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling problem

2010-11-16 Thread Michael Bedward
On 16 November 2010 16:10, wangwallace talentt...@gmail.com wrote:

 Michael, I really appreciate your help.

 but I got the following error message when I was trying to run the function
 written by you:

 Error in out[i, ] <- apply(help[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) :
  number of items to replace is not a multiple of replacement length

Did the data.frame or matrix you were sampling have the same general
form as the example you posted previously ?  Can you give me a small
example that causes the error ?

 I am not quite sure why would this happen.

 As a novice of R, these functions are kinda complex for me. I am wondering
 if it is doable without using loops like that.

I wasn't sure exactly what you wanted so the function was meant to be
general and easy to modify. It is often possible to use constructs
other than loops in R, though that doesn't mean the code will always
be either faster or clearer. But you'll need to describe your
requirements in more precise terms (short, clear examples are good)
for folks here to suggest methods.


 Again, thank you so much!!!

No worries. If you can provide an example that generates the error we
should be able to get further.

Michael



Re: [R] Sampling problem

2010-11-16 Thread wangwallace

yes, the data.frame is exactly the same as the one I posted earlier. 

I was trying to see if the loop function works. And I got that message. here
below is the syntax I was trying to run, followed by the error message at
the end:
> sampleX<-function(X,nGrp1,nsamples){if(nGrp1>=4)stop("can't sample all group 1 variables")
+ out<-matrix(0,nsamples,nGrp1+1)
+ for(i in 1:nsamples){
+ grp1<-sample(4,nGrp1)
+ grp2<-sample((1:4)[-grp1],)
+ out[i,]<-apply(X[,c(grp1+1,grp2+5)],2,sample,1)
+ }
+ out}
> sampleX(help,1,10)
Error in out[i, ] <- apply(X[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) : 
  number of items to replace is not a multiple of replacement length
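
One possible cause, judging only from the transcript above: the call
sample((1:4)[-grp1],) has no size argument. When size is omitted, sample()
returns a permutation of everything it is given, so more columns get
selected than out has room for. A small sketch of the difference, with
variable names chosen here for illustration:

```r
grp1 <- 2                # one sampled group-1 index, as with nGrp1 = 1
rest <- (1:4)[-grp1]     # the three remaining group-2 candidates

no_size <- sample(rest)      # size omitted: a permutation of all of 'rest'
one     <- sample(rest, 1)   # size given: a single index

length(c(grp1, no_size))   # 4 columns selected for a 2-column row of 'out'
length(c(grp1, one))       # 2 columns selected, matching nGrp1 + 1
```

If that is what happened, restoring the 1 as the second argument should make
the lengths agree again.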

By the way, it is only a small piece of my data set, which has 12 variables
(or columns) for each group (grp1: CSE1, CSE2, CSE3, CSE4, CSE5, CSE6, CSE7,
CSE8, CSE9, CSE10, CSE11, CSE12; grp2: WSE1, WSE2, WSE3, WSE4, WSE5, WSE6,
WSE7, WSE8, WSE9, WSE10, WSE11, WSE12). I will draw 1000 random samples for
each of the 11 different combinations below: 

combination 1: 1 variable from grp1 + 11 variables from grp2 = 12 variables
combination 2: 2 variables from grp1 + 10 variables from grp2 = 12 variables
combination 3: 3 variables from grp1 + 9 variables from grp2 = 12 variables
combination 4: 4 variables from grp1 + 8 variables from grp2 = 12 variables
combination 5: 5 variables from grp1 + 7 variables from grp2 = 12 variables
combination 6: 6 variables from grp1 + 6 variables from grp2 = 12 variables
combination 7: 7 variables from grp1 + 5 variables from grp2 = 12 variables
combination 8: 8 variables from grp1 + 4 variables from grp2 = 12 variables
combination 9: 9 variables from grp1 + 3 variables from grp2 = 12 variables
combination 10: 10 variables from grp1 + 2 variables from grp2 = 12 variables
combination 11: 11 variables from grp1 + 1 variable from grp2 = 12 variables

As shown above, the sum of the variables in each combination will have to be
12. Also, I want to restrict the vector I am going to sample from to only
those columns that do not correspond to the grp1 variables I have sampled. For
example, if I sampled 1 variable, say CSE1, from grp1, the other 11
variables from grp2 should not include WSE1; if I sampled 2 variables, say
CSE1 and CSE2, from grp1, the other 10 variables from grp2 should not
include WSE1 and WSE2.
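
Under the rule just described, choosing k variables from grp1 leaves exactly
12 - k admissible grp2 variables, so the grp2 part is simply the complement
of the grp1 choice. A minimal sketch of the column selection only, assuming
the columns are named CSE1..CSE12 and WSE1..WSE12 as in this post;
pick_columns is a hypothetical helper name.

```r
# Choose k CSE columns at random; the WSE columns are those whose index
# was not chosen, giving n_vars column names in total.
pick_columns <- function(k, n_vars = 12) {
  cse <- sort(sample.int(n_vars, k))
  wse <- setdiff(seq_len(n_vars), cse)
  c(paste0("CSE", cse), paste0("WSE", wse))
}

set.seed(1)
cols <- pick_columns(3)
cols
```

Calling pick_columns(k) repeatedly, for example inside replicate(), would
give the 1000 random selections for each of the 11 combinations.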

Anyway, this is a lot more complicated example than the one I described in
my first post. But I think I can modify your function if I wanna apply it to
the large data set with 12 variables for each group, since they basically
share the same method. Now I am wondering where the error message is from.

Again, thanks!! :)


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3045095.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling problem

2010-11-15 Thread Michael Bedward
Hello,

Is this what you want ?

sampleX <- function(X, nGrp1, nsamples) {
  # X is matrix or data.frame with cols for two groups of variables
  # with grp1 in cols 2:5 and grp2 in cols 6:9
  #
  # nGrp1 - number of variables to sample from group 1
  #
  # nsamples - number of rows in output matrix

  if (nGrp1 >= 4) stop("can't sample all group 1 variables")

  out <- matrix(0, nsamples, nGrp1+1)
  for (i in 1:nsamples) {
    # choose grp1 vars to sample
    grp1 <- sample(4, nGrp1)

    # choose complementary grp2 var to sample; index into 'rest' so that a
    # single remaining candidate is not treated by sample() as 1:candidate
    rest <- (1:4)[-grp1]
    grp2 <- rest[sample.int(length(rest), 1)]

    # sample 1 value from each var
    out[i, ] <- apply(X[,c(grp1+1, grp2+5)], 2, sample, 1)
  }

  out
}

Michael


On 16 November 2010 07:59, wangwallace talentt...@gmail.com wrote:

 Hey,

 I am hoping someone can help me with a sampling question.

 I have a data frame of 8 variables (the first column is the subjects' id):

    SubID    CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
      1          6      5       6       2      6      2        2       4
      2          6      4       7       2      6      6        2       3
      3          5      5       5       5      5      5        4       5
      4          5      4       3       4      4      4        5       2
      5          5      6       7       5      6      4        4       1
      6          5      4       3       6      4      3        7       3
      7          3      6       6       3      6      5        2       1
      8          3      6       6       3      6      5        4       7

 the 6 variables are categorized into two groups with CSE1, CSE2, CSE3, and
 CSE4 in one group and the rest in another group.

sample(data[,2:4],2,replace=FALSE)

   CSE1 CSE2
 1      6    5
 2      6    4
 3      5    5
 4      5    4
 5      5    6
 6      5    4
 7      3    6
 8      3    6

 Now I want to sample 1 column from another group of variables (i.e., WSE1,
 WSE2, WSE3, WSE4), but I want to restrict a vector I am going to sample from
 to only those columns that are not correspond to GROUP 1 variables I have
 sampled. That is, I want to sample a column from WSE3, WSE4  Columns
 corresponding to CSE1 and CSE2 (i.e., WSE1, WSE2) need to be dropped.

 How can I do this? what if I want to repeat this whole process (drawing 2
 random columns from CSE1, CSE2, CSE3, and CSE4 first, AND then another
 random column from WSE1, WSE2, WSE3, and WSE4) for 1000 times. any ideas?

 Many thanks in advance!!

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3043804.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sampling from normal

2010-10-19 Thread Wu Gong

Hi Solafah,

You are right that the two commands are equivalent when p = pnorm(a). You can
check the results with the following code.
n <- 5
a <- -1
set.seed(123456)
qnorm(runif(n,0,pnorm(a)))
p <- pnorm(a)
set.seed(123456)
qnorm(p*runif(n))

Anyway, the elements of the lower tail are not chosen equally by this
method. You may try another method, such as:
s1 <- rnorm(10000)
n <- 5
a <- -1
sample(s1[s1 < a], n)


-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sampling-from-normal-tp3003016p3003164.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling from data set

2010-10-05 Thread Jeffrey Spies
We'll probably need much more info, but this should get you started:

nameOfDataSet[sample(1:10000, 100),]

You can replace the 10000 with dim(nameOfDataSet)[1] to make it more dynamic.
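
A small self-contained sketch of the same idea, using a made-up data frame as a stand-in for the poster's data (the column contents are invented; only the row-sampling step matters). It lets nrow() supply the number of rows instead of hard-coding it.

```r
set.seed(1)
# Hypothetical stand-in for the poster's data: two variables, many rows.
nameOfDataSet <- data.frame(x = rnorm(10000), y = runif(10000))

# Draw 100 distinct row indices, then keep those rows and all columns.
rows <- sample(nrow(nameOfDataSet), 100)
newsample <- nameOfDataSet[rows, ]
dim(newsample)
```

Because sample() draws without replacement by default, no row appears twice in newsample.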

Jeff.

On Tue, Oct 5, 2010 at 3:07 AM, Jumlong Vongprasert
jumlong.u...@gmail.com wrote:
 Dear all.
 I have data with 2 variables, x and y, of size 10000.
 I want to draw a sample of size 100 from this data.
 How can I do it?
 THANKS.

 --
 Jumlong Vongprasert
 Institute of Research and Development
 Ubon Ratchathani Rajabhat University
 Ubon Ratchathani
 THAILAND
 34000



Re: [R] Sampling from data set

2010-10-05 Thread Jeffrey Spies
If poprho.3 is your dataset as a data.frame (an object with row and
column dimensions), you need a comma following the row selection
(sample(...)) to indicate that you want to select those rows and all
columns:

newsample <- poprho.3[sample(1:10000,100),] # note the last comma in the brackets

General use is:

my.data.frame[rows,columns]

Where either rows or columns (or both) can be left blank to indicate
that you want all of them.  Similarly, a selection of the first column
would have been (comma followed by column number):

newsample <- poprho.3[sample(1:10000,100),1]

That's why your:

newsample <- as.matrix(nameofdataset[sample(1:10000,100),])

worked; the as.matrix wasn't necessary to simply sample the data.
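
To make the [rows, columns] forms concrete, here is a tiny sketch on an invented data frame (the object names both, vec and kept are only for illustration). One detail worth knowing: selecting a single column this way drops the result to a plain vector unless drop = FALSE is given.

```r
my.data.frame <- data.frame(x = 1:5, y = c(2.5, 3.5, 4.5, 5.5, 6.5))

both <- my.data.frame[c(1, 3), ]                   # rows 1 and 3, all columns
vec  <- my.data.frame[, "x"]                       # all rows, one column: a plain vector
kept <- my.data.frame[c(1, 3), "x", drop = FALSE]  # same selection, still a data.frame

str(both)
str(vec)
str(kept)
```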

Cheers,

Jeff.

On Tue, Oct 5, 2010 at 3:54 AM, Jumlong Vongprasert
jumlong.u...@gmail.com wrote:
 Dear Jeffrey.
 I used newsample <- as.matrix(nameofdataset[sample(1:10000,100),]).
 Now it includes both variables.
 Thank you for your answer to inspire.
 Jumlong



Re: [R] sampling from normal distribution

2010-10-03 Thread Daniel Nordlund


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of solafah bh
 Sent: Sunday, October 03, 2010 3:39 PM
 To: R help mailing list
 Subject: [R] sampling from normal distribution
 
 Hello
 If I want to resample from the tails of a normal distribution, are these
 commands equivalent?
   upper tail:qnorm(runif(n,pnorm(b),1))  if b is an upper tail boundary
   or
   upper tail:qnorm((1-p)+p(runif(n))  if p is the probability of each
 interval (the observations are divided into intervals)
 
 Regards
 
 
 

Yes, they are equivalent, although the second formula is missing a closing 
parenthesis and a multiplication operator.  You could also simplify the second 
formula to 

 qnorm(1-p*runif(n))
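
A quick way to check this numerically, as a sketch: it assumes p is the upper-tail probability 1 - pnorm(b), uses an arbitrary boundary b = 1.5, and writes the second formula with the missing operator and parenthesis restored. The names x1, x2 and x3 are illustrative.

```r
n <- 1000
b <- 1.5                  # illustrative upper-tail boundary
p <- 1 - pnorm(b)         # probability mass of the upper tail

set.seed(2010)
x1 <- qnorm(runif(n, pnorm(b), 1))
set.seed(2010)
x2 <- qnorm((1 - p) + p * runif(n))   # second formula, corrected

all.equal(x1, x2)         # same draws up to floating-point error

# The simplified form has the same distribution but gives different draws,
# because it maps u to 1 - u.
set.seed(2010)
x3 <- qnorm(1 - p * runif(n))
range(x3)
```

All three vectors lie entirely above b; only x1 and x2 match draw for draw.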

Hope this is helpful,

Dan

Daniel Nordlund
Bothell



Re: [R] sampling from normal distribution

2010-10-03 Thread Duncan Murdoch

On 03/10/2010 6:38 PM, solafah bh wrote:

Hello
If I want to resample from the tails of a normal distribution, are these commands
equivalent?
  upper tail:qnorm(runif(n,pnorm(b),1))  if b is an upper tail boundary
  or
  upper tail:qnorm((1-p)+p(runif(n))  if p is the probability of each interval
(the observations are divided into intervals)



You don't say how far up in the tail you are going, but if b is very 
large, you have to watch out for rounding error.  For example, with 
b=10, pnorm(b) will be exactly equal to 1, and both versions will fail. 
 In general for b > 0 you'll get a bit more accuracy by sampling from 
the lower tail using -b.  For really extreme cases you will probably 
need to switch to a log scale.  For example, to get a random sample from 
a normal, conditional on being larger than 20, you'd want something like


n <- 10
logp1 <- pnorm(-20, log=TRUE)
logprobs <- log(runif(n)) + logp1
-qnorm(logprobs, log=TRUE)
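
The b = 10 case mentioned above can be seen directly with a short sketch (the variable names naive and flipped are made up here). It contrasts the upper-tail formula, which breaks down, with the lower-tail version using -b.

```r
b <- 10
n <- 5
set.seed(1)

pnorm(b) == 1                           # TRUE: the tail mass is lost to rounding

# Upper-tail formula: runif() is asked for values between 1 and 1.
naive <- qnorm(runif(n, pnorm(b), 1))
naive                                   # all Inf

# Sample the mirror-image lower tail instead, then flip the sign.
flipped <- -qnorm(runif(n, 0, pnorm(-b)))
flipped
```

pnorm(-10) is tiny but still representable, which is why the flipped version keeps working here; for far more extreme boundaries the log-scale code above is the safer route.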

Duncan Murdoch



Re: [R] sampling one random frame from each unique trial?

2010-06-28 Thread Dennis Murphy
Hi:

Try this:

do.call(rbind, lapply(split(h, h$file), function(x) x[sample(1:nrow(x), 1),
]))

My test returns
                                   file time_pred distance_1 distance_2 distance_3
12.03.08_ins_odo_01 12.03.08_ins_odo_01       210     19.003     18.023     14.666
12.03.08_ins_odo_02 12.03.08_ins_odo_02        90     13.668     12.950     13.506
12.03.08_ins_odo_03 12.03.08_ins_odo_03       120     21.220     26.370     23.962
12.03.08_ins_odo_07 12.03.08_ins_odo_07       180     16.301     19.976     25.309

The function does the following:
(1) Splits the data frame into a list, where each component of the list is a
sub-data frame.
(2) Applies the (anonymous) sampling function to each list component
(lapply)
(3) Combines the individual outputs together using the rbind function
(do.call)
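
The three steps can be followed one at a time on a toy data frame. This is only a sketch: the trial names and times below are invented, and the intermediate objects pieces, sampled and result exist just to show each stage.

```r
# Toy data: three "files" with 3, 2 and 4 frames each (made-up values).
h <- data.frame(
  file      = rep(c("trial_a", "trial_b", "trial_c"), times = c(3, 2, 4)),
  time_pred = c(0, 30, 60, 0, 30, 0, 30, 60, 90)
)

set.seed(7)
pieces  <- split(h, h$file)                                     # step 1: list of sub-data frames
sampled <- lapply(pieces, function(x) x[sample(nrow(x), 1), ])  # step 2: one random row from each
result  <- do.call(rbind, sampled)                              # step 3: stack the rows again
result
```

The result has one row per file, with the file names reused as row names.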

Since this is the raison d'etre of the plyr package, one can also use

library(plyr)
 ddply(h, 'file', function(x) x[sample(1:nrow(x), 1), ])
                 file time_pred distance_1 distance_2 distance_3
1 12.03.08_ins_odo_01       270     15.694      9.285      4.135
2 12.03.08_ins_odo_02       270     17.252     18.235     18.661
3 12.03.08_ins_odo_03       240     18.117     19.111     19.870
4 12.03.08_ins_odo_07        90     19.790     23.276     18.678

(Your results may vary, but you do get one row per file as output.)

HTH,
Dennis

On Sun, Jun 27, 2010 at 6:16 PM, Kristiina Hurme
kristiina.hu...@uconn.edu wrote:


 hello everyone. please bear with me if this is very easy...

 I have a data set with many trials, and frames within each trial. I would
 like to pull out one random frame from each trial.
 here is an example. i have 4 unique trials (file), and various frames
 within
 each (time_pred). I would like to randomly sample 4 rows, but 1 from each
 trial (file).

 this sample data is called h
                   file time_pred distance_1 distance_2 distance_3
 1  12.03.08_ins_odo_01       210     19.003     18.023     14.666
 2  12.03.08_ins_odo_01       240     23.905     20.087     17.266
 3  12.03.08_ins_odo_01       270     15.694      9.285      4.135
 4  12.03.08_ins_odo_02         0     22.142     16.061     14.776
 5  12.03.08_ins_odo_02        30      2.968     12.533     19.696
 6  12.03.08_ins_odo_02        60      6.175     17.701     20.198
 7  12.03.08_ins_odo_02        90     13.668     12.950     13.506
 8  12.03.08_ins_odo_02       120      7.098     17.817     22.878
 9  12.03.08_ins_odo_02       270     17.252     18.235     18.661
 10 12.03.08_ins_odo_02       300      7.967     15.944      8.130
 11 12.03.08_ins_odo_03        90     18.724     17.931     21.148
 12 12.03.08_ins_odo_03       120     21.220     26.370     23.962
 13 12.03.08_ins_odo_03       150     21.225     24.376     20.194
 14 12.03.08_ins_odo_03       180     22.298     24.119     24.606
 15 12.03.08_ins_odo_03       210      8.413     14.464     15.219
 16 12.03.08_ins_odo_03       240     18.117     19.111     19.870
 17 12.03.08_ins_odo_07        60     24.063     25.779     24.800
 18 12.03.08_ins_odo_07        90     19.790     23.276     18.678
 19 12.03.08_ins_odo_07       120     15.617     23.707     19.545
 20 12.03.08_ins_odo_07       150     24.818     22.373     24.515
 21 12.03.08_ins_odo_07       180     16.301     19.976     25.309
 22 12.03.08_ins_odo_07       210     23.843     24.772     26.025
 23 12.03.08_ins_odo_07       240      9.029     15.125     20.139
 24 12.03.08_ins_odo_07       270      6.533     22.833     23.618

 here is my code so far...

  random <- for(i in unique(file)){h[sample(1:24,1),]}
  random

 but this only gives me one sample... and if I try to exclude naming it as
 random, then nothing comes up. i'm confused and very new to R. please help!
 many many thanks!
 kristiina


 --
 View this message in context:
 http://r.789695.n4.nabble.com/sampling-one-random-frame-from-each-unique-trial-tp2270396p2270396.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] sampling one random frame from each unique trial?

2010-06-27 Thread Daniel Malter

Hi, take the following example and proceed accordingly.

Name=c("Miller","Miller","Miller","Miller","Smith","Smith","Smith","Smith")
X=rnorm(8)
Year=rep(2000:2003,2)

d=data.frame(Name,X,Year)

#Row indices
rows=1:dim(d)[1]

#Which Name occupies which rows?
#Name would be your file
w=function(x){which(Name%in%unique(x))}
samplefrom=tapply(Name,Name,w)

#Sample one row index for each Name and
#give the data frame d for these row indices
f=function(x){sample(x,1)}
d[unlist(lapply(samplefrom,f)),]
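
One caveat with sample(x, 1): when x is a single number, sample() treats it as 1:x, so a group with only one row can return the wrong row index. The following is a sketch of a variant that avoids this; pick_one, rows_by_name and chosen are made-up names, and the small data frame (with a one-row group, Jones) is invented to show the case.

```r
# Pick one element of x; safe even when x has length 1.
pick_one <- function(x) x[sample.int(length(x), 1)]

d <- data.frame(Name = c("Miller", "Miller", "Smith", "Jones"),
                X    = c(0.1, 0.2, 0.3, 0.4))

set.seed(3)
rows_by_name <- split(seq_len(nrow(d)), d$Name)        # row indices per Name
chosen <- vapply(rows_by_name, pick_one, integer(1))   # one row index per Name
d[chosen, ]
```

Here Jones can only ever yield row 4, whereas sample(4, 1) could have returned any of rows 1 to 4.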

HTH,
Daniel

-- 
View this message in context: 
http://r.789695.n4.nabble.com/sampling-one-random-frame-from-each-unique-trial-tp2270396p2270465.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling with replacement

2010-06-17 Thread Jim Lemon

On 06/17/2010 03:27 AM, Somnath Somnath wrote:

Thanks for all those reply. Is there any general rule to determine how many
samples I would get from a population of size n, I draw a sample of size
m (m may be greater than n) if sample is drawn with replacement?


Hi Somnath,
If you mean how many unique values, I think this is the occupancy 
problem that is discussed in:


Feller, W. (1950) An introduction to probability theory and its 
applications (Vol 1). New York: Wiley.


and probably other places. You can calculate the probability of 
obtaining each possible number of outcomes using the Maxwell-Boltzmann 
distribution.
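
As a rough illustration of the occupancy idea: the expected number of distinct values in m draws with replacement from n items is n * (1 - (1 - 1/n)^m). That formula is a standard result and is not taken from the message above. The sketch below compares it with a simulation; expected_unique and sim are made-up names.

```r
# Expected number of distinct values when m draws are made with
# replacement from a population of size n.
expected_unique <- function(n, m) n * (1 - (1 - 1 / n)^m)

n <- 20; m <- 4
expected_unique(n, m)

# Monte Carlo check of the same quantity.
set.seed(99)
sim <- replicate(10000, length(unique(sample(n, m, replace = TRUE))))
mean(sim)
table(sim) / length(sim)   # estimated distribution of the number of distinct values
```

The last line gives an empirical version of the distribution referred to above.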


Jim



Re: [R] Sampling with replacement

2010-06-16 Thread Jun Shen
sample(1:20,4,replace=TRUE) should do it.

Jun

On Wed, Jun 16, 2010 at 9:20 AM, Somnath Somnath somnath700...@gmail.com wrote:

 Dear all, good morning,

 I have a population, let say members are tagged with some simple number
 like
 1,2,3,...20. I want to draw a sample with replacement of size 4 (say, can
 be
 more than 20 also). Is there any R function which will show me all such
 possible samples?

 Thanks



Re: [R] Sampling with replacement

2010-06-16 Thread Jorge Ivan Velez
Try

sample(20, 4, replace = TRUE)

HTH,
Jorge




Re: [R] Sampling with replacement

2010-06-16 Thread Rafael Björk
If you for some reason want to be shown all the possible combinations, try
expand.grid(1:20,1:20,1:20,1:20) (ugly code). Don't use this for sampling.

hth Rafael



Re: [R] Sampling with replacement

2010-06-16 Thread David Winsemius


On Jun 16, 2010, at 10:20 AM, Somnath Somnath wrote:


Dear all, good morning,

I have a population, let say members are tagged with some simple  
number like
1,2,3,...20. I want to draw a sample with replacement of size 4  
(say, can be

more than 20 also).


Already answered on the list.


Is there any R function which will show me all such
possible samples?


?expand.grid

 nrow(expand.grid(1:20, 1:20, 1:20, 1:20))
[1] 160000

--
David Winsemius, MD
West Hartford, CT



Re: [R] Sampling with replacement

2010-06-16 Thread Jorge Ivan Velez
Hi Rafael,

You might try:

 r <- expand.grid(rep(list(1:20), 4))
 dim(r)
[1] 160000      4
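
The same call on a population small enough to print in full may help to see what the rows are. This is just a sketch; small and big are illustrative names.

```r
# All ordered samples of size 2, with replacement, from the values 1:3.
small <- expand.grid(rep(list(1:3), 2))
dim(small)      # 9 rows, 2 columns
small

# The full-size case from the thread has 20^4 rows.
big <- expand.grid(rep(list(1:20), 4))
nrow(big) == 20^4
```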

HTH,
Jorge




Re: [R] Sampling with replacement

2010-06-16 Thread Tom La Bone

How about


library(TeachingSampling)
SupportWR(20,4)


Tom
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Sampling-with-replacement-tp2257450p2257644.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling with replacement

2010-06-16 Thread Somnath Somnath
Thanks for all those reply. Is there any general rule to determine how many
samples I would get from a population of size n, I draw a sample of size
m (m may be greater than n) if sample is drawn with replacement?

Thanks,



Re: [R] Sampling with replacement

2010-06-16 Thread William Dunlap
 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Somnath Somnath
 Sent: Wednesday, June 16, 2010 10:28 AM
 To: r-help@r-project.org
 Subject: Re: [R] Sampling with replacement
 
 Thanks for all those reply. Is there any general rule to 
 determine how many
 samples I would get from a population of size n, I draw a 
 sample of size
 m (m may be greater than n) if sample is drawn with replacement?

If you consider two samples equivalent if they
differ only in their ordering (e.g., c(1,2,2)
is equivalent to c(2,1,2) and c(2,2,1)) then
the answer is
choose(n+m-1, m)
If order matters then it is
n^m
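
Both counts can be verified by brute force for small n and m. This is a sketch only: the enumeration grows quickly, it assumes m >= 2, and the names ordered and unordered are made up.

```r
n <- 3; m <- 2

# Every ordered sample of size m with replacement from 1:n.
ordered <- expand.grid(rep(list(seq_len(n)), m))
nrow(ordered) == n^m

# Sort within each sample so that reorderings collapse to one row,
# then keep the distinct rows.
unordered <- unique(t(apply(ordered, 1, sort)))
nrow(unordered) == choose(n + m - 1, m)
```

For n = 3 and m = 2 this gives 9 ordered and 6 unordered samples.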

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 



Re: [R] Sampling with replacement

2010-06-16 Thread Johannes Huesing
Somnath Somnath somnath700...@gmail.com [Wed, Jun 16, 2010 at 07:27:32PM 
CEST]:
 Thanks for all those reply. Is there any general rule to determine how many
 samples I would get from a population of size n, I draw a sample of size
 m (m may be greater than n) if sample is drawn with replacement?

n^m

-- 
Johannes Hüsing   There is something fascinating about science. 
  One gets such wholesale returns of conjecture 
mailto:johan...@huesing.name  from such a trifling investment of fact.  
  
http://derwisch.wikidot.com (Mark Twain, Life on the Mississippi)



Re: [R] Sampling from Bivariate Uniform Distribution

2010-02-16 Thread Greg Snow
The correlation will not be exactly 0, but will represent a draw from an 
independent population.

There may be something in the copula package to allow for more independence 
(but that about exhausts my knowledge of that package).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Haneef_An
 Sent: Monday, February 15, 2010 11:53 AM
 To: r-help@r-project.org
 Subject: Re: [R] Sampling from Bivariate Uniform Distribution
 
 
 When I wrap those values in to a matrix will it be still independent ?
 ( non
 zero correlation).
 
 Can I do this for any multivariate distribution which has the
 univariate
 form?
 
 Thank you for the response.
 
 Haneef
 --
 View this message in context: http://n4.nabble.com/Sampling-from-
 Bivariate-Uniform-Distribution-tp1476485p1556481.html
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] Sampling from Bivariate Uniform Distribution

2010-02-15 Thread Haneef_An

When I wrap those values in to a matrix will it be still independent ? ( non
zero correlation). 

Can I do this for any multivariate distribution which has the univariate
form?

Thank you for the response.

Haneef
-- 
View this message in context: 
http://n4.nabble.com/Sampling-from-Bivariate-Uniform-Distribution-tp1476485p1556481.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling from Bivariate Uniform Distribution

2010-02-10 Thread Greg Snow
The runif function generates random numbers from a uniform distribution; wrap 
those values into a matrix and you have a sample from a multidimensional uniform 
distribution.
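
In code this might look like the following sketch, which assumes independent Uniform(0, 1) margins (the sample size and the column names u1 and u2 are arbitrary choices).

```r
set.seed(10)
n <- 10000

# n draws from a bivariate uniform on the unit square, one point per row.
u <- matrix(runif(2 * n), ncol = 2, dimnames = list(NULL, c("u1", "u2")))

head(u, 3)
cor(u[, "u1"], u[, "u2"])   # close to, but not exactly, zero
```

For other ranges, runif(n, min, max) can be used column by column and the columns combined with cbind().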

If you want more than this, give us more detail.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Haneef Anver
 Sent: Wednesday, February 10, 2010 1:29 PM
 To: r-help@r-project.org
 Subject: [R] Sampling from Bivariate Uniform Distribution
 
 Hello all!!!
 
 
 1) I am wondering is there a way to generate random numbers in R for
 Bivariate Uniform distribution?
 
 
 2) Does R have  built-in function for generating random numbers for any
 given bivariate distribution.
 
 Any help would be greatly appreciated !!
 
 
 Good day!
 
 
 Haneef Anver
 
 
 
 
 
 


Re: [R] Sampling theory

2010-01-19 Thread Thomas Lumley

On Tue, 19 Jan 2010, Christian Hennig wrote:

Are there any R packages for computations required in sampling theory (such 
as confidence intervals under random, stratified, cluster sampling)? I'd be 
particularly interested in confidence intervals for the population variance, 
which are difficult enough to find even in books.




Yes, these are in the survey package, for fairly general designs, using 
linearization or replicate weights.

 I don't know how good the confidence intervals for the variance are. One of 
the disadvantages of implementing survey estimators in a general way is that 
you lose the opportunity to use bias corrections that are only available for 
simple cases.

The forthcoming version 3.19 (later this week) has nicer output for the 
population variance, but the computations are still the same.

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] Sampling from a Postgres database

2010-01-15 Thread Bart Joosen

One way could be to first select only the unique ID's, sample this and then
select only the relevant records:

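strQuery <- "SELECT ID FROM tblFoo;"
IDs <- sqlQuery(channel, strQuery)
# sqlQuery returns a data frame, so sample from its first column
sample.IDs <- sample(IDs[[1]], 10)
# fetch the full records for the sampled IDs only
strQuery <- paste("SELECT * FROM tblFoo WHERE ID IN (", paste(sample.IDs, collapse = ", "), ");")
records <- sqlQuery(channel, strQuery)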

Bart



christiaan pauw-2 wrote:
 
 Hi Everybody
 
 Is there a way in which one can use the RPostgreSQL package to take a
 sample
 from a table in Postgres database without having to read the whole table
 into R
 
 regards
 Christiaan
 
 
 

-- 
View this message in context: 
http://n4.nabble.com/Sampling-from-a-Postgres-database-tp1014506p1014638.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sampling from a Postgres database

2010-01-15 Thread Joe Conway
On 01/15/2010 01:49 AM, Bart Joosen wrote:
 
 One way could be to first select only the unique ID's, sample this and then
 select only the relevant records:
 
 strQuery <- "SELECT ID FROM tblFoo;"
 IDs <- sqlQuery(channel, strQuery)
 sample.IDs <- sample(IDs[[1]], 10)
 strQuery <- paste("SELECT * FROM tblFoo WHERE ID IN (", paste(sample.IDs, collapse = ", "), ");")
 records <- sqlQuery(channel, strQuery)

Better is to use the built-in random() function in Postgres:

# select count(*) from visits;
  count
---------
 4846604
(1 row)

# select count(*) from visits where random() < 0.005;
 count
-------
 24391
(1 row)
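
From R, the same filter could be sent as an ordinary query. The sketch below only builds the SQL string; sample_query is a made-up helper, and the commented-out line assumes an already open DBI connection called con, which is not shown here.

```r
# Build a query that keeps each row with probability `fraction`.
# `table` and `fraction` are illustrative parameter names.
sample_query <- function(table, fraction) {
  stopifnot(is.character(table), length(table) == 1,
            is.numeric(fraction), fraction > 0, fraction <= 1)
  sprintf("SELECT * FROM %s WHERE random() < %s;",
          table, format(fraction, scientific = FALSE))
}

q <- sample_query("visits", 0.005)
q
# With an open DBI connection `con` (an assumption, not shown here):
# visits_sample <- DBI::dbGetQuery(con, q)
```

The sample size is only approximately fraction times the table size, as the counts above show, and the table name is pasted in as-is, so it should not come from untrusted input.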

HTH,

Joe





Re: [R] Sampling dataframe

2009-11-28 Thread Juliet Hannah
Here are some options that may help you out. First,
let's put the data in a format that can be cut-and-pasted
into R.

myData <- read.table(textConnection("var1 var2 var3
1   1 1 1
2   3 1 2
3   8 1 3
4   6 1 4
5  10 1 5
6   2 2 1
7   4 2 2
8   6 2 3
9   8 2 4
10 10 2 5"),header=TRUE,row.names=1)
closeAllConnections()

or

use dput

myData <- structure(list(var1 = c(1L, 3L, 8L, 6L, 10L, 2L, 4L, 6L, 8L,
10L), var2 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), var3 = c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("var1", "var2",
"var3"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10"))


# Select data where var2 == 1

select_v2 <- myData[myData$var2==1,]

# sample two rows of select_v2

sampled_v2 <- select_v2[sample(1:nrow(select_v2),2),]

# select rows of var3 not equal to 1

select_v3 <- myData[myData$var3 !=1,]

# ?rbind may also come in useful.
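
Putting these pieces together, one possible way to build the requested table is sketched below. It assumes the rule is: sample 2 rows with var2 == 1, then keep the var2 == 2 rows whose var3 was not drawn. The names grp1, grp2, sampled, rest and result are illustrative only.

```r
myData <- data.frame(
  var1 = c(1, 3, 8, 6, 10, 2, 4, 6, 8, 10),
  var2 = rep(1:2, each = 5),
  var3 = rep(1:5, times = 2)
)

set.seed(1)  # only to make the sketch reproducible

# random sample of two rows from the var2 == 1 block
grp1    <- myData[myData$var2 == 1, ]
sampled <- grp1[sample(nrow(grp1), 2), ]

# var2 == 2 rows whose var3 value was NOT drawn above
grp2 <- myData[myData$var2 == 2, ]
rest <- grp2[!(grp2$var3 %in% sampled$var3), ]

# stack the two parts
result <- rbind(sampled, rest)
result
```

The %in% test is what excludes the already selected var3 values; negating it with ! keeps the complement.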

2009/11/25 Ronaldo Reis Júnior chrys...@gmail.com:
 Hi,

 I have a table like that:

 datatest
   var1 var2 var3
 1     1    1    1
 2     3    1    2
 3     8    1    3
 4     6    1    4
 5    10    1    5
 6     2    2    1
 7     4    2    2
 8     6    2    3
 9     8    2    4
 10   10    2    5

 I need to create another table based on that with the rules:

 take a random sample by var2==1 (2 sample rows for example):

   var1 var2 var3
 1     1    1    1
 4     6    1    4

 In this random sample I get the values 1 and 4 for var3. Now I need to
 complete the table with the var2==2 rows whose var3 values were not selected
 for var2==1.

 The resulting table is:
   var1 var2 var3
 1     1    1    1
 4     6    1    4
 7     4    2    2
 8     6    2    3
 10   10    2    5

 The values 1 and 4 of var3 are not present in the var2==2 rows.

 I tried several options but without success. Taking a random sample is easy, but I
 can't select the other values while excluding the randomly selected ones.

 Any help?

 Thanks
 Ronaldo




Re: [R] Sampling procedure

2009-10-15 Thread David Winsemius


On Oct 15, 2009, at 10:19 AM, Marcio Resende wrote:



> I would like to divide a vector in 9 groups in a way that each number is
> present in only one group.
> In a vector of 783 I would like to divide in 9 different groups of 87
>
> Example <- matrix(c(1:783),ncol = 1)



 Example <- matrix(c(1:783),ncol = 1)
 Grp1 <- sample(Example, 87, replace=FALSE)
 Grp2 <- sample(Example[-Grp1], 87, replace=FALSE)
 Grp3 <- sample(Example[-c(Grp1, Grp2)], 87, replace=FALSE)
# lather, rinse , repeat



> s1 <- as.matrix(sample(Example,87, re = FALSE))
> Example <- Example[-s1]
> s2 <- as.matrix(sample(Example,87, re = FALSE))
> #however I don't know how to remove the second group from the Example to
> continue sampling.



#Don't mess up the original
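
The "repeat" step can be written as a loop that works on a copy, so the original vector is left alone. This is a sketch, not part of the original reply; the names remaining, groups, size and pick are illustrative.

```r
Example  <- 1:783
n.groups <- 9
size     <- 87

remaining <- Example                 # working copy; Example stays intact
groups    <- vector("list", n.groups)

for (i in seq_len(n.groups)) {
  # sample positions rather than values, so this also works
  # when the values are not simply 1..n
  pick        <- sample(length(remaining), size)
  groups[[i]] <- remaining[pick]
  remaining   <- remaining[-pick]
}

length(remaining)   # nothing is left over, since 9 * 87 = 783
```

Sampling positions avoids relying on the coincidence that in 1:783 every value equals its own index.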


> There is probably an easy and faster way to do this.
> Could anybody help me?
> Thanks

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Sampling procedure

2009-10-15 Thread Bert Gunter
If I understand what is wanted correctly, this can be a one-liner! -- think
whole objects:

splitup <- function(x,n.groups)
#split x into n.groups mutually exclusive sets
{
  lx <- length(x)
  if(n.groups >= lx) stop("Number of groups greater than vector length")
  x <- x[sample(lx,lx)]
  split(x,seq_len(n.groups))
}

## testit

> splitup(1:71,9)

$`1`
[1] 22 26 38 50 65 60  9 27

$`2`
[1] 24  2 69 28 71 31 41 13

$`3`
[1] 16 47 63 45 23  1  8 32

$`4`
[1] 34 39 64 35  7 19  4 55

$`5`
[1] 54 10 37 68  6 17 70 18

$`6`
[1] 61 11  5 46 33 43 14 56

$`7`
[1] 42 44 12 62 66 48 57 58

$`8`
[1] 21 40 30 29 20 49 52 67

$`9`
[1] 59 15 25 51  3 36 53
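
When the vector length is not a multiple of the number of groups (as with 71 and 9 here), split() recycles the grouping values and may emit a warning about the data length. A small variant that builds the grouping vector explicitly is sketched below; the name splitup2 is made up, and rep_len() assumes a reasonably recent R.

```r
splitup2 <- function(x, n.groups) {
  lx <- length(x)
  if (n.groups > lx) stop("Number of groups greater than vector length")
  x <- x[sample.int(lx)]                    # shuffle once
  # explicit group labels 1..n.groups, repeated to length lx
  split(x, rep_len(seq_len(n.groups), lx))
}

g <- splitup2(1:71, 9)
sapply(g, length)   # eight groups of 8 and one group of 7
```

The result is the same kind of list as before; only the recycling is made explicit.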


Cheers,

Bert Gunter
Genentech Nonclinical Statistics
 



Re: [R] Sampling procedure

2009-10-15 Thread David Winsemius
If parsimony is needed, then define a 9-row matrix and send a  
randomized indexed version of Example to it:


s <- matrix(NA, nrow=9, ncol=length(Example)/9)
s[,] <- Example[sample(Example, length(Example) )]

> str(s)
 int [1:9, 1:87] 503 731 708 23 255 675 163 381 361 412 ...

Or even:

s <- matrix(Example[ sample(Example, length(Example) )], nrow=9,
ncol=length(Example)/9)


--
David



Re: [R] Sampling procedure

2009-10-15 Thread Bert Gunter
 
... except the matrix approach doesn't work if the length of the vector is
not exactly divisible by the number of groups. That's why I used split.

Cheers,

Bert Gunter
Genentech Nonclinical Biostatistics 





Re: [R] Sampling procedure

2009-10-15 Thread David Winsemius
OK, you're right. I thought it might be a simple fix to increase the
number of columns to accommodate, but the recycling conventions trip
up that strategy.
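
For completeness, one way to keep a matrix layout when the length is not divisible by the number of groups is to pad explicitly with NA instead of letting matrix() recycle values. This is only a sketch under that assumption; the names shuffled, n.col and pad are illustrative.

```r
x        <- 1:71
n.groups <- 9

shuffled <- x[sample(length(x))]
n.col    <- ceiling(length(x) / n.groups)   # columns needed per group
pad      <- n.groups * n.col - length(x)    # empty cells to fill with NA

# each row is one group; the padding lands at the end of the last column
s <- matrix(c(shuffled, rep(NA, pad)), nrow = n.groups, ncol = n.col)

sum(is.na(s))   # number of padded cells
```

Without the padding, matrix() would recycle the start of the shuffled vector into the leftover cells, so some values would appear in two groups. The split() approach avoids the issue altogether.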


Thanks;
David.



Re: [R] Sampling of non-overlapping intervals of variable length

2009-07-19 Thread David Winsemius


On Jul 19, 2009, at 1:05 PM, Hadassa Brunschwig wrote:


> Hi,
>
> I hope I am not repeating a question which has been posed already.
> I am trying to do the following in the most efficient way:
> I would like to sample from a finite (large) set of integers n
> non-overlapping intervals, where each interval i has a different, set
> length L_i (which is the number of integers in the interval).
> I had the idea to sample recursively on a vector with the already
> chosen intervals discarded but that seems to be too complicated.


It might be ridiculously easy if you sampled on an index of a group of  
intervals.
Why not pose the question in the form of example data.frames or other  
classes of R objects? Specification of the desired output would be  
essential. I think further specification of the sampling strategy  
would also help because I am unable to understand what sort of  
probability model you are hoping to apply.



> Any suggestions on that?
>
> Thanks a lot.
>
> Hadassa
>
> --
> Hadassa Brunschwig
> PhD Student
> Department of Statistics


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Sampling of non-overlapping intervals of variable length

2009-07-19 Thread Hadassa Brunschwig
Hi

I am not sure what you mean by sampling an index of a group of
intervals. I will try to give an example:
Let's assume I have a vector 1:100. Let's say I have 10 intervals
of different but known length, say,
c(4,6,11,2,8,14,7,2,18,32). For simulation purposes I have to sample
those 10 intervals 1000 times.
The requirement is, however, that they should be of those lengths and
should not be overlapping.
In short, I would like to obtain a 10x1000 matrix with sampled intervals.
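
One possible approach is sketched below. It is not from the thread and it assumes that every valid placement should be equally likely, which the question does not state. Note also that the ten lengths above sum to 104, so they cannot fit into 1:100; the sketch therefore uses 1:1000. The function name sample_intervals and its arguments are made up for illustration, and the matrix it produces holds the start position of each interval.

```r
sample_intervals <- function(N, lengths) {
  n     <- length(lengths)
  slack <- N - sum(lengths)              # integers left uncovered
  if (slack < 0) stop("intervals do not fit into 1:N")

  ord <- sample.int(n)                   # random left-to-right order
  len <- lengths[ord]

  # shrink every interval to a single cell and choose which n of the
  # slack + n cells are intervals
  pos <- sort(sample.int(slack + n, n))

  # expand again: each earlier interval shifts later ones by len - 1
  start <- pos + c(0, cumsum(len - 1)[-n])

  out <- numeric(n)
  out[ord] <- start                      # starts in the original order
  out
}

L      <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
starts <- replicate(1000, sample_intervals(1000, L))
dim(starts)   # one column per simulation, one row per interval
```

Interval i in simulation j then covers starts[i, j] up to starts[i, j] + L[i] - 1. No rejection step is needed, because the construction cannot produce overlaps.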

Thanks
Hadassa


-- 
Hadassa Brunschwig
PhD Student
Department of Statistics
The Hebrew University of Jerusalem
http://www.stat.huji.ac.il
