On Aug 1, 2014, at 1:58 PM, Stephen HK Wong <hon...@stanford.edu> wrote:

> Dear ALL,
> 
> I have a data frame that contains 4 columns and several tens of millions of
> rows, like the one below. I want to extract, say, 1 million rows "randomly".
> Can you tell me how to do that in R using base packages? Many thanks!
> 
> Col_1 Col_2   Col_3   Col_4
> chr1  3000215 3000250 -
> chr1  3000909 3000944 +
> chr1  3001025 3001060 +
> chr1  3001547 3001582 +
> chr1  3002254 3002289 +
> chr1  3002324 3002359 -
> chr1  3002833 3002868 -
> chr1  3004565 3004600 -
> chr1  3004945 3004980 +
> chr1  3004974 3005009 -
> chr1  3005115 3005150 +
> chr1  3005124 3005159 +
> chr1  3005240 3005275 -
> chr1  3005558 3005593 -
> chr1  3005890 3005925 +
> chr1  3005929 3005964 +
> chr1  3005913 3005948 -
> chr1  3005913 3005948 -
> 
> Stephen HK Wong


If your data frame is called 'DF':

  DF.Rand <- DF[sample(nrow(DF), 1000000), ]

See ?sample, which generates a random sample in which every element is equally
likely to be chosen (a discrete uniform distribution).

In the above, nrow(DF) returns the number of rows in DF and defines the sample
space 1:nrow(DF). From it, 1000000 distinct random integers are selected
(sample() draws without replacement by default) and used as row indices.

Using the built-in 'iris' dataset, select 20 random rows from the 150 total:

> iris[sample(nrow(iris), 20), ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
122          5.6         2.8          4.9         2.0  virginica
79           6.0         2.9          4.5         1.5 versicolor
109          6.7         2.5          5.8         1.8  virginica
106          7.6         3.0          6.6         2.1  virginica
49           5.3         3.7          1.5         0.2     setosa
125          6.7         3.3          5.7         2.1  virginica
1            5.1         3.5          1.4         0.2     setosa
68           5.8         2.7          4.1         1.0 versicolor
84           6.0         2.7          5.1         1.6 versicolor
110          7.2         3.6          6.1         2.5  virginica
113          6.8         3.0          5.5         2.1  virginica
64           6.1         2.9          4.7         1.4 versicolor
102          5.8         2.7          5.1         1.9  virginica
71           5.9         3.2          4.8         1.8 versicolor
69           6.2         2.2          4.5         1.5 versicolor
65           5.6         2.9          3.6         1.3 versicolor
74           6.1         2.8          4.7         1.2 versicolor
99           5.1         2.5          3.0         1.1 versicolor
135          6.1         2.6          5.6         1.4  virginica
41           5.0         3.5          1.3         0.3     setosa
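The row names in the output above are the original row numbers, so each
printed row can be traced back to iris. The draw also changes every time it is
run. As a minimal sketch (assuming a small made-up data frame standing in for
the real one, and with set.seed() added here for repeatability, not part of the
original answer), the same idiom returns distinct rows and can be made
reproducible:

```r
# Hypothetical stand-in for the 4-column data frame in the question
DF <- data.frame(
  Col_1 = "chr1",                                   # recycled to every row
  Col_2 = seq(3000000L, by = 100L, length.out = 1000L),
  stringsAsFactors = FALSE
)
DF$Col_3 <- DF$Col_2 + 35L
DF$Col_4 <- rep(c("+", "-"), length.out = nrow(DF))

set.seed(42)                    # fix the RNG state (added for illustration)
idx <- sample(nrow(DF), 100)    # 100 row indices drawn from 1:nrow(DF)
DF.Rand <- DF[idx, ]            # keep all columns of the selected rows

nrow(DF.Rand)                   # 100
anyDuplicated(idx)              # 0, i.e. no index was drawn twice

# Re-seeding with the same value reproduces exactly the same indices
set.seed(42); a <- sample(nrow(DF), 100)
set.seed(42); b <- sample(nrow(DF), 100)
identical(a, b)                 # TRUE
```

If sampling with replacement were wanted instead (e.g. for a bootstrap),
sample(nrow(DF), n, replace = TRUE) would allow repeated rows.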



Regards,

Marc Schwartz
 
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
