Hi,
Matthias's solution works perfectly for random sampling of the dataframe
without replacement, but it does not work for sampling with replacement. E.g.,
suppose the original dataframe has the form matrix(seq(1,100,1), 100, 1) and I
want to randomly select 20 rows. With Matthias's example we can draw such a
sample, and the new matrix might look like this:
matrix("1 2 3 21 29 36 37 40 45 53 55 56 71 72 79 82 90 96 97 99", 20,1).
But I want a matrix of the following form, with repeated rows (as random
sampling with replacement can produce):
matrix("1 2 3 21 21 21 37 40 45 53 53 56 71 79 79 82 90 96 97 99", 20,1).
I can't get that result.
I tried the following code:
data_ind = matrix(seq(1,nrow(actual_data), 1), nrow(actual_data), 1)
data_sample = sample(nrow(data_ind), 100, TRUE)
data_sample_matrix= matrix(data_sample, 100, 1)
a = matrix(0, (nrow(data_ind)- nrow(data_sample_matrix)), 1)
data_sample1 = rbind(data_sample, a)
b = removeEmpty(target=actual_data, margin="rows", select = data_sample1);
But this does not give me the repeated rows, even though I can see repeated
positions in "data_sample_matrix".
I also tried "sample.dml" from the "utils" folder, but that does not give me
the result I'm looking for either.
We could use a for-loop over the "data_sample_matrix" matrix in this case, but
I want to avoid looping.
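For reference, the loop-based fallback I have in mind would look roughly like
this. It is only a sketch: it assumes actual_data is a numeric matrix and that
data_sample_matrix holds 1-based row positions; n, idx and the loop itself are
just names I made up for illustration.

```
# Sketch only: assumes actual_data is a numeric matrix and that
# data_sample_matrix holds 1-based row positions (repeats allowed).
n = nrow(data_sample_matrix);
b = matrix(0, rows=n, cols=ncol(actual_data));
for (i in 1:n) {
  # read the i-th sampled position as an integer scalar
  idx = as.integer(as.scalar(data_sample_matrix[i, 1]));
  # copy that row of the original data; a repeated idx yields a repeated row
  b[i, ] = actual_data[idx, ];
}
```

This copies one row per iteration, which is exactly the row-by-row work I would
like to replace with a single matrix operation.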
Can anyone please help?
Thank you!
Arijit
________________________________
From: arijit chakraborty <[email protected]>
Sent: Saturday, April 22, 2017 12:45 PM
To: [email protected]
Subject: Re: Randomly Selecting rows from a dataframe
Thank you Matthias! You are most helpful!
Thanks again!
Arijit
________________________________
From: Matthias Boehm <[email protected]>
Sent: Saturday, April 22, 2017 2:20:48 AM
To: [email protected]
Subject: Re: Randomly Selecting rows from a dataframe
You can take, for example, a 1% sample of rows via a permutation matrix
(specifically, a selection matrix) as follows:
I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
P = removeEmpty(target=diag(I), margin="rows");
Xsample = P %*% X;
or via removeEmpty and a selection vector:
I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
Xsample = removeEmpty(target=X, margin="rows", select=I);
Both should be compiled internally to very similar plans.
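As a small illustration of the first variant, here is a sketch on made-up
data: the 5x2 matrix X and the fixed 0/1 indicator I below are hypothetical
stand-ins for your data and for the random draw, so that the result is easy to
check by hand.

```
# Hypothetical 5x2 input; rows are (1,10), (2,20), ..., (5,50)
X = matrix("1 10 2 20 3 30 4 40 5 50", rows=5, cols=2);
# Fixed indicator in place of rand(...) <= 0.01: keep rows 2 and 5
I = matrix("0 1 0 0 1", rows=5, cols=1);
# diag(I) is 5x5 with I on the diagonal; dropping its empty rows
# leaves a 2x5 selection matrix with a single 1 per row
P = removeEmpty(target=diag(I), margin="rows");
# each row of P picks out one row of X, here rows 2 and 5
Xsample = P %*% X;
print(toString(Xsample));
```

Because I holds a single 0/1 flag per row of X, each row is either kept once
or dropped, so this construction samples without replacement.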
Regards,
Matthias
On Fri, Apr 21, 2017 at 1:42 PM, arijit chakraborty <[email protected]>
wrote:
> Hi,
>
>
> Suppose I have a dataframe with 10 variables (X1-X10) and 1000 rows. Now
> I want to randomly select rows so that I get a subset of the dataset.
>
>
> Can anyone please help me to solve this problem?
>
>
> I tried the following code:
>
>
> randSample = sample(nrow(dataframe), 200);
>
>
> This gives me a column matrix with the positions of the randomly selected
> rows. But I could not work out how to use this matrix to subset the
> original dataframe.
>
>
> Thank you!
>
>
> Arijit
>