Re: [R] R 3.1.2 : arima.sim(model=list(ma=0.5), n=250, innov=rnorm(250, mean=0, sd=0.1)) versus arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) => only the first element is not identical !

Fabien Tarrade Mon, 13 Jul 2015 07:21:05 -0700

Hi Mark,

Thanks for your message and sorry for the delay but it took me some timeto understand your message and to try few things.

I think one would say that that is not a bug. I looked at the detailsof arima.sim ( using debug(arima.sim) )and there are two different series that are created inside thefunction. one is called innov and the other is start.innov. start.innov is
used to create a burn in period for the constructed series.

I am not sure I fully understand how the 2 time series are created :oneis called innov and the other is start.innov but what I am not sure tounderstand is the following :


nobs = 5000000

set.seed(183);

test1 <- arima.sim(model=list(ma=0.5), n =nobs,innov=rnorm(nobs,mean=0,sd=0.1))


set.seed(183);
test2 <- arima.sim(model=list(ma=0.5), n=nobs, mean=0, sd=0.1)

sum(abs(test1-test2)>0)
test1[1]
test2[1]
test1[2]
test2[2]

Browse[4]> sum(abs(test1-test2)>0)
[1] 1
Browse[4]> test1[1]
[1] 0.09417
Browse[4]> test2[1]
[1] -0.02648
Browse[4]> test1[2]
[1] -0.1053
Browse[4]> test2[2]
[1] -0.1053

which indicate that on 5 millions of generated event with the 2 methodsonly the first element is different ! (I try to change the seed and Isaw the same). I could understand that most of the element are differentbut the fact that only the first one is different is something that Idon't understand, sorry.



I made a copy of the function to play a bit and print some value :

=> first line :

test1 <- test(model=list(ma=0.5), n =nobs,innov=rnorm(nobs,mean=0,sd=0.1))


=> second line :
   test2 <- test(model=list(ma=0.5), n=nobs, mean=0, sd=0.1)


after :
        x <- ts(c(start.innov[seq_len(n.start)], innov[1L:n]), start = 1 -
                        n.start)
I see :
[1]  0.2681255 -0.0398882 -0.0853729 -0.0707863 -0.0262917 -0.0001408
[1]  0.0268126 -0.0398882 -0.0853729 -0.0707863 -0.0262917 -0.0001408

after :
        if (length(model$ma)) {
                x <- filter(x, c(1, model$ma), sides = 1L)
                x[seq_along(model$ma)] <- 0
        }
I see :
[1]  0.00000  0.09417 -0.10532 -0.11347 -0.06168 -0.01329
[1]  0.00000 -0.02648 -0.10532 -0.11347 -0.06168 -0.01329

after :
        if (length(model$ar))
                x <- filter(x, model$ar, method = "recursive")
I see :
[1]  0.00000  0.09417 -0.10532 -0.11347 -0.06168 -0.01329
[1]  0.00000 -0.02648 -0.10532 -0.11347 -0.06168 -0.01329

after :
        if (n.start > 0)
                x <- x[-(seq_len(n.start))]
I see :
[1]  0.09417 -0.10532 -0.11347 -0.06168 -0.01329
[1] -0.02648 -0.10532 -0.11347 -0.06168 -0.01329

after :
        if (d > 0)
                x <- diffinv(x, differences = d)
I see :
[1]  0.09417 -0.10532 -0.11347 -0.06168 -0.01329
[1] -0.02648 -0.10532 -0.11347 -0.06168 -0.01329

so the difference appear since the beginning with the with element andwhich seems to be divided by 10!


after :
        x <- ts(c(start.innov[seq_len(n.start)], innov[1L:n]), start = 1 -
                        n.start)
I see :
[1]  0.2681255 ...
[1]  0.0268126 ...

in your test1 call, you are not supplying arguments for what shouldbe used for the innovations associated with start.innov which is usedfor the burn in period. So, arima.sim uses the defaults of mean = 0and sd = 1.0.

> set.seed(123);

> test1 <- arima.sim(model=list(ma=0.5), n =250,innov=rnorm(250,mean=0,sd=0.1))

> print(head(test1))
[1] -0.30326  0.14436  0.08499  0.01645  0.17797  0.13184

> set.seed(123);

> test3 <- arima.sim(model=list(ma=0.5), innov = rnorm(250,mean=0,sd=1.0), n = 250, mean=0, sd=0.1)

> print(head(test3))
[1] -0.2582  1.4436  0.8499  0.1645  1.7797  1.3184

this doesn't seems to be true or I misinterpret you explanation

For your test2 call, you do provide them ( so they are used for bothinnov and start.innov ) and you use sd = 0.1 So, for test2, thevalues for the burn in period end up being different from the ones intest1.

I am not sure I understand the meaning of "burn in period". Do we expectthat to change only the first element of 2 time series of 5M of events >

Below, I made a test3 that can be used to get the same values astest2. In short, by specifiying the innov call EXACTLY in test1,you're lettingarima.sim use the default arguments for the start.innov call so that'swhy they're different.

I did the test and I agree

Thanks a lot
Cheers
Fabien



#====================================================================================================================
 ##undebug(arima.sim)

set.seed(123);

test1 <- arima.sim(model=list(ma=0.5), n =250,innov=rnorm(250,mean=0,sd=0.1))

print(head(test1))

set.seed(123);
test2 <- arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1)
print(head(test2))

set.seed(123);

test3 <- arima.sim(model=list(ma=0.5), innov = rnorm(250,mean=0,sd=0.1), n = 250, mean=0, sd=0.1)

print(head(test3))

On Sat, Jul 11, 2015 at 3:46 AM, Fabien Tarrade<fabien.tarr...@gmail.com <mailto:fabien.tarr...@gmail.com>> wrote:


    Dear all,

    When doing a DataCamp tutorial with R I find the following
    observation that using 2 different syntax for "arima.sim" give
    different answer for the first element

    If I use the the function using the list of argument describe in
    the help manual :
    arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1))

    or if I use the following syntax use in a DataCamp example :

    arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) it is
    accepted by DataCamp

    I don't find exactly the same results. The reason is that even if
    the seed is the same in both cases the first element is not
    identical while it should be (it doesn't mean that the results is
    wrong, maybe for the first element the seed is not propagated
    correctly)

    here the results of the difference using the same seed (only the
    first element is different using the 2 different syntaxes) :

      [1] -0.252214  0.000000  0.000000  0.000000  0.000000 0.000000
    0.000000  0.000000  0.000000  0.000000

[11] 0.000000 0.000000 0.000000 0.000000 0.000000 0.0000000.000000 0.000000 0.000000 0.000000



    here the code to reproduce this feature :

    set.seed(123);
    test1 <- 0.05 +
    arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1))
    set.seed(123);
    test2 <- 0.05 + arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1)

    test1-test2

    I am using R 3.1.2 GUI 1.65 Mavericks build (6833) on Mac (I guess
    arima come with stats which is included in R (?))
    The DataCamp team ask me to report to you about this observation
    on this mailing list. If you want me to fill a bug report some R
    bug tracking system, let me know

    Please tell me if this is the wrong list and which other
    information do you need from R and how to get then (compiler,
    version of some R packages ...)


    Hope this help
    Thanks
    Cheers
    Fabien

--Dr Fabien Tarrade


    Quantitative Analyst - Data Scientist - Researcher

    Senior data analyst specialised in the modelling, processing and
    statistical treatment of data.
    PhD in Physics, 10 years of experience as researcher at the
    forefront of international scientific research.
    Fascinated by finance and data modelling.

    Geneva, Switzerland

    Email : cont...@fabien-tarrade.eu
    <mailto:cont...@fabien-tarrade.eu>
    <mailto:cont...@fabien-tarrade.eu <mailto:cont...@fabien-tarrade.eu>>
    Phone : www.fabien-tarrade.eu <http://www.fabien-tarrade.eu>
    <http://www.fabien-tarrade.eu>
    Phone : +33 (0)6 14 78 70 90
    <tel:%2B33%20%280%296%2014%2078%2070%2090>

    LinkedIn <http://ch.linkedin.com/in/fabientarrade/> Twitter
    <https://twitter.com/fabtar> Google
    <https://plus.google.com/+FabienTarradeProfile/posts> Facebook
    <https://www.facebook.com/fabien.tarrade.eu> Google
    <skype:fabtarhiggs?call> Xing
    <https://www.xing.com/profile/Fabien_Tarrade>
    ______________________________________________
    R-help@r-project.org <mailto:R-help@r-project.org> mailing list --
    To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide
    http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.


--
Dr Fabien Tarrade

Quantitative Analyst - Data Scientist - Researcher

Senior data analyst specialised in the modelling, processing andstatistical treatment of data.PhD in Physics, 10 years of experience as researcher at the forefront ofinternational scientific research.

Fascinated by finance and data modelling.

Geneva, Switzerland

Email : cont...@fabien-tarrade.eu <mailto:cont...@fabien-tarrade.eu>
Phone : www.fabien-tarrade.eu <http://www.fabien-tarrade.eu>
Phone : +33 (0)6 14 78 70 90

LinkedIn <http://ch.linkedin.com/in/fabientarrade/> Twitter<https://twitter.com/fabtar> Google<https://plus.google.com/+FabienTarradeProfile/posts> Facebook<https://www.facebook.com/fabien.tarrade.eu> Google<skype:fabtarhiggs?call> Xing <https://www.xing.com/profile/Fabien_Tarrade>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R 3.1.2 : arima.sim(model=list(ma=0.5), n=250, innov=rnorm(250, mean=0, sd=0.1)) versus arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) => only the first element is not identical !

Reply via email to