Re: [R] generate ordered categorical variable in R

Marc Schwartz Wed, 16 Sep 2015 14:11:11 -0700

> On Sep 16, 2015, at 3:40 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> 
> Nope. Take it back. I stand uncorrected.
> 
>> system.time(z <-sample(1:10,1e6, rep=TRUE))
>   user  system elapsed
>  0.045   0.001   0.047
> 
>> system.time(z <-sample.int(10,1e6,rep=TRUE))
>   user  system elapsed
>  0.012   0.000   0.013
> 
> 
> sample() has to do subscripting in the general case; sample.int doesn't.
> 
> But I would agree that the difference is likely almost always unnoticeable.



Well, in your defense Bert, your example does show a real effect, and it 
gets worse as the initial sample space grows, provided the space is passed 
as a vector rather than a scalar.
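
A minimal sketch of the two call forms, using a small sample space so that it 
runs instantly. The seed and sizes are arbitrary choices for illustration and 
are not taken from the timings in this thread:

```r
# Scalar form: sample() hands n straight to sample.int().
set.seed(42)
a <- sample(10, 5, replace = TRUE)

set.seed(42)
b <- sample.int(10, 5, replace = TRUE)

# Vector form: sample() draws indices, then subscripts x with them.
x <- 1:10
set.seed(42)
c1 <- sample(x, 5, replace = TRUE)

set.seed(42)
c2 <- x[sample.int(length(x), 5, replace = TRUE)]

identical(a, b)    # scalar form vs. sample.int()
identical(c1, c2)  # vector form vs. explicit subscripting
```

Given the definition of sample(), both comparisons should agree. The vector 
form pays extra for building x and for the subscripting step.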

On my MacBook Pro, with 16 GB of RAM and a 2.5 GHz i7, running R version 3.2.2 
(2015-08-14):

> system.time(x1 <- sample(1:1e10, 1e8, replace = TRUE))
Killed: 9

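That ran for a couple of minutes and eventually crashed R, presumably because 
1:1e10 has to be built as a vector of 1e10 doubles (about 80 GB) before any 
sampling can happen.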

However, as below:

> system.time(x1 <- sample(1e10, 1e8, replace = TRUE))
   user  system elapsed 
  2.943   0.238   3.191 

> system.time(x1 <- sample.int(1e10, 1e8, replace = TRUE))
   user  system elapsed 
  3.135   0.198   3.336 


Here is another example that works, showing a larger time difference with the 
sample space as a vector:

> system.time(x1 <- sample(1:1e9, 1e8, replace = TRUE))
   user  system elapsed 
  7.069   1.317   8.399 

> system.time(x1 <- sample(1e9, 1e8, replace = TRUE))
   user  system elapsed 
  1.324   0.111   1.438 

> system.time(x1 <- sample.int(1e9, 1e8, replace = TRUE))
   user  system elapsed 
  1.328   0.116   1.450 


If one is running Monte Carlo simulations, repeating the above a very large 
number of times, it can become a meaningful difference.

Thus, there is an incentive for one to specify the sample space as a scalar and 
perhaps consider the resultant vector, if needed, as indices (1:x) into the 
actual sample space desired.
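
As a rough sketch of that idea, applied to the original question of 200 
ordered categories from 1 to 4. The labels and the names `categories`, `idx` 
and `q1` are made up for illustration:

```r
# Hypothetical sample space; any vector of values would do.
categories <- c("1", "2", "3", "4")

set.seed(1)

# Draw indices using a scalar sample space (no large vector is built) ...
idx <- sample.int(length(categories), size = 200, replace = TRUE)

# ... then use them as indices into the actual sample space.
q1 <- factor(categories[idx], levels = categories, ordered = TRUE)

table(q1)
```

For a small space like this one the saving is negligible; the approach only 
starts to matter when the sample space is large or the draw is repeated many 
times.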

Interesting...

Regards,

Marc


> 
> 
> -- Bert
> Bert Gunter
> 
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>   -- Clifford Stoll
> 
> 
> On Wed, Sep 16, 2015 at 1:34 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>> Yes. Thanks Marc. I stand corrected.
>> 
>> -- Bert
>> Bert Gunter
>> 
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>   -- Clifford Stoll
>> 
>> 
>> On Wed, Sep 16, 2015 at 1:28 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
>>> 
>>>> On Sep 16, 2015, at 1:06 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>> 
>>>> Yikes! The uniform distribution is a **continuous** distribution over
>>>> an interval. You seem to want to sample over a discrete distribution.
>>>> See ?sample for that, as in:
>>>> 
>>>> sample(1:4,100,rep=TRUE)
>>>> 
>>>> ## or for this special case and faster
>>>> 
>>>> sample.int(4,size=100,rep=TRUE)
>>> 
>>> 
>>> Bert,
>>> 
>>> I am not sure that it is really faster, since internally, sample() calls 
>>> sample.int():
>>> 
>>>> sample
>>> function (x, size, replace = FALSE, prob = NULL)
>>> {
>>>    if (length(x) == 1L && is.numeric(x) && x >= 1) {
>>>        if (missing(size))
>>>            size <- x
>>>        sample.int(x, size, replace, prob)
>>>    }
>>>    else {
>>>        if (missing(size))
>>>            size <- length(x)
>>>        x[sample.int(length(x), size, replace, prob)]
>>>    }
>>> }
>>> 
>>> 
>>> set.seed(1)
>>> 
>>>> system.time(x1 <- sample(1e10, 1e8, replace = TRUE))
>>>   user  system elapsed
>>>  2.755   0.170   2.925
>>> 
>>> 
>>> set.seed(1)
>>>> system.time(x2 <- sample.int(1e10, 1e8, replace = TRUE))
>>>   user  system elapsed
>>>  2.767   0.183   2.951
>>> 
>>> 
>>>> all(x1 == x2)
>>> [1] TRUE
>>> 
>>> 
>>> Regards,
>>> 
>>> Marc
>>> 
>>> 
>>>> 
>>>> Cheers,
>>>> Bert
>>>> 
>>>> Bert Gunter
>>>> 
>>>> "Data is not information. Information is not knowledge. And knowledge
>>>> is certainly not wisdom."
>>>>  -- Clifford Stoll
>>>> 
>>>> 
>>>> On Wed, Sep 16, 2015 at 10:11 AM, thanoon younis
>>>> <thanoon.youni...@gmail.com> wrote:
>>>>> Dear R- users
>>>>> 
>>>>> I want to generate an ordered categorical variable as a 200x1 vector
>>>>> with categories 1 to 4, and I tried this code:
>>>>> 
>>>>> Q1=runif(200,1,4) The results are not just 1, 2, 3, 4; they have
>>>>> decimals, like 1.244, 2.342, 4,321 and so on ... My question: how can I
>>>>> generate a vector, and also a matrix, of ordered categorical variables
>>>>> without decimals, just 1, 2, 3, 4, 1, 2, 3, 4, ...?
>>>>> 
>>>>> Many thanks in advance
>>> 

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
