Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-08-02 Thread Paul Menzel
Dear Dennis and Steve,


Am Sonntag, den 31.07.2011, 23:32 -0400 schrieb Steve Lianoglou:

[…]

 How about trying to write the of this `f4` function below using the
 rcpp/inline combo. The C/C++ you will need to write looks to be quite
 trivial, let's change f4 to accept an x argument as a vector:
 
 I've defined f4 in the same way as Dennis did:
 
  f4 - function()
   {
  x - sample(c(-1L,1L), 1)
 
   if (x = 0 ) {return(1)} else {
csum - x
len - 1
while(csum  0) {
csum - csum + sample(c(-1, 1), 1)
len - len + 1
   } }
   len
   }
 
 Now, let's do some inline/c++ mojo:
 
 library(inline)
 inc - 
 #include stdio.h
 #include stdlib.h
 #include time.h
 
 
 fxx -cxxfunction(includes=inc, plugin=Rcpp, body=
   int len = 1;
   int x = ((rand() % 2 ) == 0) ? 1 : -1;
   int csum = x;
 
   while (csum  0) {
 x = ((rand() % 2 ) == 0) ? 1 : -1;
 len++;
 csum = csum + x;
   }
 
   return wrap(len);
 )
 
 Assuming I've faithfully translated this into c++, the timings aren't
 all that comparable.
 
 Doing 500 replicates with the pure R version:
 
 set.seed(123)
 system.time(out - replicate(500, f4()))
user  system elapsed
  31.525   0.120  32.510
 
 Doing 10,000 replicates using the fxx function doesn't even break a sweat:
 
 system.time(outxx - replicate(1, fxx()))
user  system elapsed
   0.371   0.001   0.373
 
 range(out)
 [1]   1 1994308
 
 range(outxx)
 [1]1 11909394

thank you very much for your suggestions.

This is indeed a nice speed.

1. I first had that implemented in FORTRAN (and Python) too, but turned
to R for two reasons. First I wanted to use also other distributions
later on and thought that it would be easier with R and that R would
have that implemented as fast as possible. Secondly I thought that R
would also operate faster having the right vectorization and using
`csum()`. But I guess it is difficult to find a good model to use the
advantages of R.

Especially looking at `top` when running this example CPU is used 100 %
but memory only 40 MB from 2 GB. So if one could use another data
structure maybe the calculations could be done on more walks at once.

2. It is indeed possible that the walk never returns to zero, so I
should make sure, that I abort the while loop after a certain length.

3. Looking at the data types I am wondering if some integer overflow(?)
could happen. I could make the length variable unsigned I suppose [1].
But still `csum` could go from `-len` to 0 and for the normal random
walk unsigned should not be a problem too besides that the logic/checks
have to be adapted.

For integrated random walks, `ccsum += csum`, `ccsum` would go from
-(ccsum**2)/2 up to 0. So later on I should use probably the 64 bit data
type (unsigned) `long` for `ccsum`, `csum` and `length` to avoid those
problems. Memory does not seem to be a problem. Also I need to add an
additional check for the height and length in the while loop like the
following.

(csum  0)  (csum  -ULONG_MAX)  (len = ULONG_MAX)

So I came up with the following and to use unsigned I only consider that
the random walk stays positive instead of negative.

 8  code  8 
library(inline)
inc - 
#include climits
#include stdio.h
#include stdlib.h
#include time.h


f9 -cxxfunction(includes=inc, plugin=Rcpp, body=
  unsigned long len = 1;

  if ((rand() % 2 ) == 0) {
return wrap(len);
  }

  unsigned long x = 1;

  for (unsigned long csum = x; csum  0; csum = ((rand() % 2 ) == 0) ? csum + 
1: csum - 1) {
len++;
if ((csum == ULONG_MAX)  (len == ULONG_MAX)) {
  return wrap(len);
}
  }

  return wrap(len);
)
 8  code  8 

I do not know if the compiler would have optimized it that way anyway
and if there is any difference (besides the overflow checks).

 set.seed(1); system.time( z9_1 - replicate(1000, f9()) )
   User  System verstrichen 
  0.076   0.004   0.084 
 range(z9_1)
[1]   1 1449034
 length(z9_1)
[1] 1000


Thanks,

Paul


[1] 
https://secure.wikimedia.org/wikipedia/en/wiki/Integer_(computer_science)#Common_integral_data_types


signature.asc
Description: This is a digitally signed message part
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-08-01 Thread Paul Menzel
Am Sonntag, den 31.07.2011, 23:32 -0500 schrieb R. Michael Weylandt :
 Glad to help -- I haven't taken a look at Dennis' solution (which may be far
 better than mine), but if you do want to keep going down the path outlined
 below you might consider the following:

I will try Dennis’ solution right away but looked at your suggestions
first. Thank you very much.

 Instead of throwing away a simulation if something starts negative, why not
 just multiply the entire sample by -1: that lets you still use the sample
 and saves you some computations: of course you'll have to remember to adjust
 your final results accordingly.

That is a nice suggestion. For a symmetric random walk this is indeed
possible and equivalent to looking when the walk first hits zero.

 This might avoid the loop:
 
 x = ## Whatever x is.
 xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
 which.max((x=0)  (xLag 0)) + 1 # Depending on how you've decided to count
 things, this +1 may be extraneous.

 The inner expression sets a 0 except where there is a switch from negative
 to positive and a one there: the which.max function returns the location of
 the first maximum, which is the first 1, in the vector. If you are
 guaranteed the run starts negative, then the location of the first positive
 should give you the length of the negative run.

That is the same idea as from Bill [1]. The problem is, when the walk
never returns to zero in a sample, `which.max(»everything FALSE)`
returns 1 [2]. That is no problem though, when we do not have to worry
about a walk starting with a positive value and adding 1 (+1) can be
omitted when we count the epochs of first hitting 0 instead of the time
of how long the walk stayed negative, which is always one less.

Additionally my check `(x=0)  (xLag 0)` is redundant when we know we
start with a negative value. `(x=0)` should be good enough in this
case.

 This all gives you,
 
 f4 - function(n = 10, # number of simulations
length = 10) # length of iterated sum
 {
R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
 
 R = apply(R,1,cumsum)
 
   R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first 
 element in the row is positive, flip the entire row

The line above seems to look the columns instead of rows. I think the
following is correct since after the `apply()` above the random walks
are in the columns.

R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]

 fTemp - function(x) {
 
 xLag = c(0,x[-length(x)])
 return(which.max((x=0)  (xLag 0))+1)
 
 countNegative = apply(R,2,fTemp)
 tabulate(as.vector(countNegative), length)
  }
 
 That just crashed my computer though, so I wouldn't recommend it for large
 n,length.

Welcome to my world. I would have never thought that simulating random
walks with a length of say a million would create that much data and
push common desktop systems with let us say 4 GB of RAM to their limits.

 Instead, you can help a little by combining the lagging and the 
 all in one.
 
 f4 - function(n = 10, llength = 10)
 {
 R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
 R = apply(R,1,cumsum)
 R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row 
 is positive, flip the entire row
 R = (cbind(rep(0,NROW(R)),R)0)(cbind(R,rep(0,NROW(R)))=0)
 countNegative = apply(R,1,which.max) + 1
 return (tabulate(as.vector(countNegative), length) )
 
  }

I left that one out, because as written above the check can be
shortened.

 Of course, this is all starting to approach a very specific question that
 could actually be approached much more efficiently if it's your end goal
 (though I think I remember from your first email a different end goal):

That is true. But to learn some optimization techniques on a simple
example is much appreciated and will hopefully help me later on for the
iterated random walk cases.

 We can use the symmetry and restartability of RW to do the following:
 
 x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T)
 D  = diff(which(x == 0))

Nice!

 This will give you a vector of how long x stays positive or negative at a
 time. Thinking through some simple translations lets you see that this set
 has the same distribution as how long a RW that starts negative stays
 negative.

I have to write those translations down. On first sight though we need
again to handle the case where it stays negative the whole time. `D`
then has length 0 and we have to count that for a walk longer than
`BIGNUMBER`.

 Again, this is only good for answering a very specific question
 about random walks and may not be useful if you have other more complicated
 questions in sight.

Just testing for 0 for the iterated cases will not be enough for
iterated random walks since an iterated random walk can go from negative
to non-negative without being zero at this time/epoch.

I implemented all your suggestions and got 

Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-08-01 Thread R. Michael Weylandt michael.weyla...@gmail.com
I've only got a 20 minute layover, but three quick remarks:

1) Do a sanity check on your data size: if you want a million walks of a 
thousand steps, that already gets you to a billion integers to store--even at a 
very low bound of one byte each, thats already 1GB for the data and you still 
have to process it all and run the OS. If you bump this to walks of length 10k, 
you are in big trouble. 

Considered like that, it shouldn't surprise you that you are getting near 
memory limits. 

If you really do need such a large simulation and are willing to make the 
time/space tradeoff, it may be worth doing simulations in smaller batches (say 
50-100) and aggregating the needed stats for analysis. Also, consider direct 
use of the rm() function for memory management. 

2) If you know that which.max()==1 can't happen for your data, might this trick 
be easier than forcing it through some tricky logic inside the which.max()

X=which.max(...)
if(X[1]==1) X=Inf # or whatever value

3) I dont have any texts at hand to confirm this but isn't the expected value 
of the first hit time of a RW infinite? I think a  handwaving proof can be 
squeezed out of the optional stopping theorem with T=min(T_a,T_b) for a0b and 
let a - -Inf. 

If I remember right, this suggests you are trying to calculate a CI for a 
distribution with no finite moments, a difficult task to say the least. 

Hope these help and I'll write a more detailed reply to your notes below later,

Michael Weylandt

PS - what's an iterated RW? This is all outside my field (hence my spitball on 
#2 above)

PS2 - sorry about the row/column mix-up: I usually think of sample paths as 
rows...

On Aug 1, 2011, at 8:49 AM, Paul Menzel paulepan...@users.sourceforge.net 
wrote:

 Am Sonntag, den 31.07.2011, 23:32 -0500 schrieb R. Michael Weylandt :
 Glad to help -- I haven't taken a look at Dennis' solution (which may be far
 better than mine), but if you do want to keep going down the path outlined
 below you might consider the following:
 
 I will try Dennis’ solution right away but looked at your suggestions
 first. Thank you very much.
 
 Instead of throwing away a simulation if something starts negative, why not
 just multiply the entire sample by -1: that lets you still use the sample
 and saves you some computations: of course you'll have to remember to adjust
 your final results accordingly.
 
 That is a nice suggestion. For a symmetric random walk this is indeed
 possible and equivalent to looking when the walk first hits zero.
 
 This might avoid the loop:
 
 x = ## Whatever x is.
 xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
 which.max((x=0)  (xLag 0)) + 1 # Depending on how you've decided to count
 things, this +1 may be extraneous.
 
 The inner expression sets a 0 except where there is a switch from negative
 to positive and a one there: the which.max function returns the location of
 the first maximum, which is the first 1, in the vector. If you are
 guaranteed the run starts negative, then the location of the first positive
 should give you the length of the negative run.
 
 That is the same idea as from Bill [1]. The problem is, when the walk
 never returns to zero in a sample, `which.max(»everything FALSE)`
 returns 1 [2]. That is no problem though, when we do not have to worry
 about a walk starting with a positive value and adding 1 (+1) can be
 omitted when we count the epochs of first hitting 0 instead of the time
 of how long the walk stayed negative, which is always one less.
 
 Additionally my check `(x=0)  (xLag 0)` is redundant when we know we
 start with a negative value. `(x=0)` should be good enough in this
 case.
 
 This all gives you,
 
 f4 - function(n = 10, # number of simulations
   length = 10) # length of iterated sum
 {
   R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
 
   R = apply(R,1,cumsum)
 
  R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first 
 element in the row is positive, flip the entire row
 
 The line above seems to look the columns instead of rows. I think the
 following is correct since after the `apply()` above the random walks
 are in the columns.
 
R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]
 
   fTemp - function(x) {
 
xLag = c(0,x[-length(x)])
return(which.max((x=0)  (xLag 0))+1)
 
   countNegative = apply(R,2,fTemp)
   tabulate(as.vector(countNegative), length)
 }
 
 That just crashed my computer though, so I wouldn't recommend it for large
 n,length.
 
 Welcome to my world. I would have never thought that simulating random
 walks with a length of say a million would create that much data and
 push common desktop systems with let us say 4 GB of RAM to their limits.
 
 Instead, you can help a little by combining the lagging and the 
 all in one.
 
 f4 - function(n = 10, llength = 10)
 {
R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
R = apply(R,1,cumsum)

Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-08-01 Thread Paul Menzel
Am Montag, den 01.08.2011, 12:43 -0400 schrieb R. Michael Weylandt :
 I've only got a 20 minute layover, but three quick remarks:
 
 1) Do a sanity check on your data size: if you want a million walks of
 a thousand steps, that already gets you to a billion integers to
 store--even at a very low bound of one byte each, thats already 1GB
 for the data and you still have to process it all and run the OS. If
 you bump this to walks of length 10k, you are in big trouble. 
 
 Considered like that, it shouldn't surprise you that you are getting
 near memory limits. 
 
 If you really do need such a large simulation and are willing to make
 the time/space tradeoff, it may be worth doing simulations in smaller
 batches (say 50-100) and aggregating the needed stats for analysis.

I already did that, saved the result and ran it again. I also found [1]
and will look to do these things in parallel, since the simulations do
not depend on each other. I hope I can avoid the matrix then and use
just `replicate()`.

 Also, consider direct use of the rm() function for memory management.

I was looking for such a feature the last days and read to set the
variables to `NULL` to delete them somewhere. Now I found the correct
command. Thank you!

 2) If you know that which.max()==1 can't happen for your data, might
 this trick be easier than forcing it through some tricky logic inside
 the which.max()
 
 X=which.max(...)
 if(X[1]==1) X=Inf # or whatever value

Noted for when I need that again.

 3) I dont have any texts at hand to confirm this but isn't the
 expected value of the first hit time of a RW infinite?

That is indeed correct. The generating function of the first hitting
time of zero T₀ is (g_T₀)(s) ≔ 1/s (1 - \sqrt(1 - s). Therefore

(g_T₀)ʹ(s) ≔ 1/s² (1 - \sqrt(1 - s) + 1/s (2(1 - s))^(-½) → ∞ for s → 1.

 I think a  handwaving proof can be squeezed out of the optional
 stopping theorem with T=min(T_a,T_b) for a0b and let a - -Inf.

Apparently there are several ways to prove that.

 If I remember right, this suggests you are trying to calculate a CI
 for a distribution with no finite moments, a difficult task to say the
 least.

I do not know. It scares me. ;-) For the symmetric simple random walk
S_n ≔ ∑_{i=0}^n X_i I want to verify (1).

(1) n^(-½) ~ p_n ≔ P(max_{1 ≤ k ≤ n} S_n  0)

a(x) ~ b(x) means that the quotient converges to 1 for x → ∞.

[…]

 PS - what's an iterated RW? This is all outside my field (hence my
 spitball on #2 above)

I am sorry, I meant *integrated* random walk [3][4]. Basically that is
the integral (“area” – can be negative).

A_n ≔ ∑_{i=0}^n S_i = ∑_{i=0}^n (n - i + 1) X_i

Being 0, S₀ and X₀ can always be omitted. So I basically just need to do
one `cumsum()` more over the walks.

 PS2 - sorry about the row/column mix-up: I usually think of sample
 paths as rows...

No problem at all. I already was confused that it looked differently
(transposed) after the first `apply()`. But it made sense.


Thanks,

Paul


[1] 
http://www.bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf
[2] http://www.steinsaltz.me.uk/probA/ProbALN13.pdf
[3] http://www-stat.stanford.edu/~amir/preprints/irw.ps
[4] http://arxiv.org/abs/0911.5456


signature.asc
Description: This is a digitally signed message part
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-31 Thread Paul Menzel
Am Mittwoch, den 27.07.2011, 19:59 -0400 schrieb R. Michael Weylandt :
 Some more skilled folks can help with the curve fitting, but the general
 answer is yes -- R will handle this quite ably.

Great to read that.

 Consider the following code:
 
 --
 n = 1e5
 length = 1e5
 
 R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
 R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
 your life INFINITELY better
 R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
 
 
 
 There are actually even faster ways to do what you are asking for, but this
 introduces you to some useful R architecture, above all the apply function.

Thank you very much. I realized the the 0 column is not need when
summing this up. Additionally I posted the wrong example code and I
actually am only interested how long it stays negative from the
beginning.

 To see how long the longest stretch of negatives in each row is, the
 following is a little sneaky but works pretty well:
 
 countNegative = apply(R,1,function(x){which.max(table(cumsum(x=0))})
 
 then you can study these random numbers to do whatever with them.

 The gist of this code is that it counts how many positive number have been
 seen up to each point: for any stretch this doesn't increase, you must be
 negative, so this identifies the longest such stretch on each row and
 records the length. (It may be off by one so check it on a smaller R matrix.

That is a great example. It took me a while what `table()` does here but
thanks to your explanation I finally understood it.

[…]

 So all together:
 
 --
 n = 1e3
 length = 1e3
 
 R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
 R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
 your life INFINITELY better
 R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
 fTemp - function(x) {
 return(max(table(cumsum(x=0
 }
 countNegative = apply(R,1,fTemp)
 mu = mean(countNegative)
 sig = sd(countNegative)/sqrt(length(countNegative))
 
 
 
 This runs pretty fast on my laptop, but you'll need to look into the
 memory.limit() function if you want to increase the simulation parameters.
 There are much faster ways to handle the simulation as well, but this should
 get you off to a nice start with R.
 
 Hope this helps,

It did. Thank you again for the kind and elaborate introduction.

Trying to run your example right away froze my system using `n = 1000`
and `length = 1e5` [1]. So I really need to be careful how big such a
matrix can get. One thing is to use integers as suggested in [2].

My current code looks like the following.

 8  code  8 
f4 - function(n = 10, # number of simulations
   length = 10) # length of iterated sum
{
R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n)
R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will 
make your life INFINITELY better
fTemp - function(x) {
if (x[1] = 0 ) {
return(1)
}

for (i in 1:length-1) {
if (x[i]  0  x[i + 1] = 0) {
return(as.integer(i/2 + 2)) # simple random 
walks only hit 0 on even “times”
}
}
}
countNegative = apply(R,2,fTemp)
tabulate(as.vector(countNegative), length)
}
 8  code  8 

1.I could actually avoid `cumsum()` half the time, when the first entry
is already positive. So I am still looking for a way to speed that up in
comparison to a simple two loops scenario.
2. The counting of the length how long the walk stayed negative is
probably also inefficient and I should find a better way on how to
return the values.

I am still thinking about both cases, but to come up with
vectoriazations of the problem is quite hard.

So I welcome any suggestions. ;-)


Thanks,

Paul


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832#10


signature.asc
Description: This is a digitally signed message part
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-31 Thread Dennis Murphy
Hi:

See if this works for you:

f4 - function()
  {
 x - sample(c(-1L,1L), 1)

  if (x = 0 ) {return(1)} else {
   csum - x
   len - 1
   while(csum  0) {
   csum - csum + sample(c(-1, 1), 1)
   len - len + 1
  } }
  len
  }

# In one batch of repetitions of this function,
system.time(out - replicate(1000, f4()))
   user  system elapsed
   0.510.000.52
 range(out)
[1] 1 17372

but in another (untimed), this took a significantly longer amount of
time to run [for obvious reasons]:
 range(out)
[1]  1 987752

For 10 repetitions, I'd guess this could run anywhere from one to
several minutes, depending on the lengths of the sojourns encountered.
This looks like a reasonable way to visualize the output for 1000
replications:

hist(log(out), nclass = 20)

Notice that the function takes no arguments, returns the length of the
random walk while its cumulative sum is negative [or 1 if it starts
out positive], and then uses the replicate() function to iterate the
function f4() N times.

HTH,
Dennis


On Sun, Jul 31, 2011 at 4:36 PM, Paul Menzel
paulepan...@users.sourceforge.net wrote:
 Am Mittwoch, den 27.07.2011, 19:59 -0400 schrieb R. Michael Weylandt :
 Some more skilled folks can help with the curve fitting, but the general
 answer is yes -- R will handle this quite ably.

 Great to read that.

 Consider the following code:

 --
 n = 1e5
 length = 1e5

 R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
 R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
 your life INFINITELY better
 R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.

 

 There are actually even faster ways to do what you are asking for, but this
 introduces you to some useful R architecture, above all the apply function.

 Thank you very much. I realized the the 0 column is not need when
 summing this up. Additionally I posted the wrong example code and I
 actually am only interested how long it stays negative from the
 beginning.

 To see how long the longest stretch of negatives in each row is, the
 following is a little sneaky but works pretty well:

 countNegative = apply(R,1,function(x){which.max(table(cumsum(x=0))})

 then you can study these random numbers to do whatever with them.

 The gist of this code is that it counts how many positive number have been
 seen up to each point: for any stretch this doesn't increase, you must be
 negative, so this identifies the longest such stretch on each row and
 records the length. (It may be off by one so check it on a smaller R matrix.

 That is a great example. It took me a while what `table()` does here but
 thanks to your explanation I finally understood it.

 […]

 So all together:

 --
 n = 1e3
 length = 1e3

 R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
 R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
 your life INFINITELY better
 R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
 fTemp - function(x) {
     return(max(table(cumsum(x=0
 }
 countNegative = apply(R,1,fTemp)
 mu = mean(countNegative)
 sig = sd(countNegative)/sqrt(length(countNegative))

 

 This runs pretty fast on my laptop, but you'll need to look into the
 memory.limit() function if you want to increase the simulation parameters.
 There are much faster ways to handle the simulation as well, but this should
 get you off to a nice start with R.

 Hope this helps,

 It did. Thank you again for the kind and elaborate introduction.

 Trying to run your example right away froze my system using `n = 1000`
 and `length = 1e5` [1]. So I really need to be careful how big such a
 matrix can get. One thing is to use integers as suggested in [2].

 My current code looks like the following.

  8  code  8 
 f4 - function(n = 10, # number of simulations
               length = 10) # length of iterated sum
 {
        R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n)
        R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will 
 make your life INFINITELY better
        fTemp - function(x) {
                if (x[1] = 0 ) {
                        return(1)
                }

                for (i in 1:length-1) {
                        if (x[i]  0  x[i + 1] = 0) {
                                return(as.integer(i/2 + 2)) # simple random 
 walks only hit 0 on even “times”
                        }
                }
        }
        countNegative = apply(R,2,fTemp)
        tabulate(as.vector(countNegative), length)
 }
  8  code  8 

 1.I could actually avoid `cumsum()` half the time, when the first entry
 is already positive. So I am still looking for a way 

Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-31 Thread Steve Lianoglou
Hi,

I haven't been following this thread very closely, but I'm getting the
impression that the inner loop that's killing you folks here looks
quite simple (assuming it is the one provided below).

How about trying to write the of this `f4` function below using the
rcpp/inline combo. The C/C++ you will need to write looks to be quite
trivial, let's change f4 to accept an x argument as a vector:

I've defined f4 in the same way as Dennis did:

 f4 - function()
  {
     x - sample(c(-1L,1L), 1)

      if (x = 0 ) {return(1)} else {
           csum - x
           len - 1
               while(csum  0) {
                   csum - csum + sample(c(-1, 1), 1)
                   len - len + 1
                                          }     }
      len
  }

Now, let's do some inline/c++ mojo:

library(inline)
inc - 
#include stdio.h
#include stdlib.h
#include time.h


fxx -cxxfunction(includes=inc, plugin=Rcpp, body=
  int len = 1;
  int x = ((rand() % 2 ) == 0) ? 1 : -1;
  int csum = x;

  while (csum  0) {
x = ((rand() % 2 ) == 0) ? 1 : -1;
len++;
csum = csum + x;
  }

  return wrap(len);
)

Assuming I've faithfully translated this into c++, the timings aren't
all that comparable.

Doing 500 replicates with the pure R version:

set.seed(123)
system.time(out - replicate(500, f4()))
   user  system elapsed
 31.525   0.120  32.510

Doing 10,000 replicates using the fxx function doesn't even break a sweat:

system.time(outxx - replicate(1, fxx()))
   user  system elapsed
  0.371   0.001   0.373

range(out)
[1]   1 1994308

range(outxx)
[1]1 11909394

Hope I'm not too off of the mark, here.
-steve
-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-31 Thread Dennis Murphy
Hi Steve:

Very, very nice. Thanks for the useful Rcpp script. I'm not surprised
that a C++ version blows my humble little R function out of the water
:) I noticed that the R function ran a lot more slowly when the
sojourns were very long. It suggests that algorithms that entail
conditional iteration are quite likely to be better off written in a
compiled programming language that can communicate with R. It also
shows off the capabilities of the Rcpp package.

Best regards,
Dennis

On Sun, Jul 31, 2011 at 8:32 PM, Steve Lianoglou
mailinglist.honey...@gmail.com wrote:
 Hi,

 I haven't been following this thread very closely, but I'm getting the
 impression that the inner loop that's killing you folks here looks
 quite simple (assuming it is the one provided below).

 How about trying to write the of this `f4` function below using the
 rcpp/inline combo. The C/C++ you will need to write looks to be quite
 trivial, let's change f4 to accept an x argument as a vector:

 I've defined f4 in the same way as Dennis did:

 f4 - function()
  {
     x - sample(c(-1L,1L), 1)

      if (x = 0 ) {return(1)} else {
           csum - x
           len - 1
               while(csum  0) {
                   csum - csum + sample(c(-1, 1), 1)
                   len - len + 1
                                          }     }
      len
  }

 Now, let's do some inline/c++ mojo:

 library(inline)
 inc - 
 #include stdio.h
 #include stdlib.h
 #include time.h
 

 fxx -cxxfunction(includes=inc, plugin=Rcpp, body=
  int len = 1;
  int x = ((rand() % 2 ) == 0) ? 1 : -1;
  int csum = x;

  while (csum  0) {
    x = ((rand() % 2 ) == 0) ? 1 : -1;
    len++;
    csum = csum + x;
  }

  return wrap(len);
 )

 Assuming I've faithfully translated this into c++, the timings aren't
 all that comparable.

 Doing 500 replicates with the pure R version:

 set.seed(123)
 system.time(out - replicate(500, f4()))
   user  system elapsed
  31.525   0.120  32.510

 Doing 10,000 replicates using the fxx function doesn't even break a sweat:

 system.time(outxx - replicate(1, fxx()))
   user  system elapsed
  0.371   0.001   0.373

 range(out)
 [1]       1 1994308

 range(outxx)
 [1]        1 11909394

 Hope I'm not too off of the mark, here.
 -steve
 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-31 Thread R. Michael Weylandt michael.weyla...@gmail.com
Glad to help -- I haven't taken a look at Dennis' solution (which may be far
better than mine), but if you do want to keep going down the path outlined
below you might consider the following:

Instead of throwing away a simulation if something starts negative, why not
just multiply the entire sample by -1: that lets you still use the sample
and saves you some computations: of course you'll have to remember to adjust
your final results accordingly.

This might avoid the loop:

x = ## Whatever x is.
xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
which.max((x=0)  (xLag 0)) + 1 # Depending on how you've decided to count
things, this +1 may be extraneous.

The inner expression sets a 0 except where there is a switch from negative
to positive and a one there: the which.max function returns the location of
the first maximum, which is the first 1, in the vector. If you are
guaranteed the run starts negative, then the location of the first positive
should give you the length of the negative run. This all gives you,

f4 - function(n = 10, # number of simulations
   length = 10) # length of iterated sum
{
   R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)

R = apply(R,1,cumsum)

  R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first
element in the row is positive, flip the entire row

fTemp - function(x) {

xLag = c(0,x[-length(x)])
return(which.max((x=0)  (xLag 0))+1)

countNegative = apply(R,2,fTemp)
tabulate(as.vector(countNegative), length)
 }


That just crashed my computer though, so I wouldn't recommend it for large
n,length. Instead, you can help a little by combining the lagging and the 
all in one.

f4 - function(n = 10, llength = 10)
{
R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
R = apply(R,1,cumsum)
R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row
is positive, flip the entire row
R = (cbind(rep(0,NROW(R)),R)0)(cbind(R,rep(0,NROW(R)))=0)
countNegative = apply(R,1,which.max) + 1
return (tabulate(as.vector(countNegative), length) )

 }

Of course, this is all starting to approach a very specific question that
could actually be approached much more efficiently if it's your end goal
(though I think I remember from your first email a different end goal):

We can use the symmetry and restartability of RW to do the following:

x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T)
D  = diff(which(x == 0))

This will give you a vector of how long x stays positive or negative at a
time. Thinking through some simple translations lets you see that this set
has the same distribution as how long a RW that starts negative stays
negative. Again, this is only good for answering a very specific question
about random walks and may not be useful if you have other more complicated
questions in sight.

Hope this helps,

Michael Weylandt


On Sun, Jul 31, 2011 at 6:36 PM, Paul Menzel 
paulepan...@users.sourceforge.net wrote:

 Am Mittwoch, den 27.07.2011, 19:59 -0400 schrieb R. Michael Weylandt :
  Some more skilled folks can help with the curve fitting, but the general
  answer is yes -- R will handle this quite ably.

 Great to read that.

  Consider the following code:
 
  --
  n = 1e5
  length = 1e5
 
  R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
  R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will
 make
  your life INFINITELY better
  R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
 
  
 
  There are actually even faster ways to do what you are asking for, but
 this
  introduces you to some useful R architecture, above all the apply
 function.

 Thank you very much. I realized the the 0 column is not need when
 summing this up. Additionally I posted the wrong example code and I
 actually am only interested how long it stays negative from the
 beginning.

  To see how long the longest stretch of negatives in each row is, the
  following is a little sneaky but works pretty well:
 
  countNegative = apply(R,1,function(x){which.max(table(cumsum(x=0))})
 
  then you can study these random numbers to do whatever with them.
 
  The gist of this code is that it counts how many positive number have
 been
  seen up to each point: for any stretch this doesn't increase, you must be
  negative, so this identifies the longest such stretch on each row and
  records the length. (It may be off by one so check it on a smaller R
 matrix.

 That is a great example. It took me a while what `table()` does here but
 thanks to your explanation I finally understood it.

 […]

  So all together:
 
  --
  n = 1e3
  length = 1e3
 
  R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
  R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will
 make
  your life INFINITELY 

[R] Is R the right choice for simulating first passage times of random walks?

2011-07-27 Thread Paul Menzel
Dear R folks,


I need to simulate first passage times for iterated partial sums. The
related papers are for example [1][2].

As a start I want to simulate how long a simple random walk stays
negative, which should result that it behaves like n^(-½). My code looks
like this.

 8  code  8 
n = 10 # number of simulations

length = 10 # length of iterated sum
z = rep(0, times=length + 1)

for (i in 1:n) {
x = c(0, sign(rnorm(length)))
s = cumsum(x)
for (i in 1:length) {
if (s[i]  0  s[i + 1] = 0) {
z[i] = z[i] + 1
}
}
}
plot(1:length(z), z/n)
curve(x**(-0.5), add = TRUE)
 8  code  8 

This code already runs for over half an hour on my system¹.

Reading about the for loop [3] it says to try to avoid loops and I
probably should use a matrix where every row is a sample.

Now my first problem is that there is no matrix equivalent for
`cumsum()`. Can I use matrices to avoid the for loop?

My second question is, is R the right choice for such simulations? It
would be great when R can also give me a confidence interval(?) and also
try to fit a curve through the result and give me the rule of
correspondence(?) [4]. Do you have pointers for those?

I glanced at simFrame [5] and read `?simulate` but could not understand
it right away and think that this might be overkill.

Do you have any suggestions?


Thanks,

Paul


¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz


[1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
[2] http://arxiv.org/abs/0911.5456
[3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
[4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
[5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html


signature.asc
Description: This is a digitally signed message part
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-27 Thread Paul Menzel
Dear R folks,


Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel:

 I need to simulate first passage times for iterated partial sums. The
 related papers are for example [1][2].
 
 As a start I want to simulate how long a simple random walk stays
 negative, which should result that it behaves like n^(-½). My code looks
 like this.
 
  8  code  8 
 n = 10 # number of simulations
 
 length = 10 # length of iterated sum
 z = rep(0, times=length + 1)
 
 for (i in 1:n) {
   x = c(0, sign(rnorm(length)))
   s = cumsum(x)
   for (i in 1:length) {
   if (s[i]  0  s[i + 1] = 0) {
   z[i] = z[i] + 1
   }
   }
 }
 plot(1:length(z), z/n)
 curve(x**(-0.5), add = TRUE)
  8  code  8 

Of course the program above is not complete, because it only checks for
the first passage from negativ to positive. `if (s[2]  0) {}` should be
added before the for loop.

 This code already runs for over half an hour on my system¹.
 
 Reading about the for loop [3] it says to try to avoid loops and I
 probably should use a matrix where every row is a sample.
 
 Now my first problem is that there is no matrix equivalent for
 `cumsum()`. Can I use matrices to avoid the for loop?

I mean the inner for loop.

Additionally I wonder if `cumsum` is really faster or if I should sum
the elements by myself and check after every step if the walk gets
non-negative/0. With a length of 100 this should save some cycles.
On the other hand adding numbers should be really fast and adding checks
in between could potentially be slower.

 My second question is, is R the right choice for such simulations? It
 would be great when R can also give me a confidence interval(?) and also
 try to fit a curve through the result and give me the rule of
 correspondence(?) [4]. Do you have pointers for those?
 
 I glanced at simFrame [5] and read `?simulate` but could not understand
 it right away and think that this might be overkill.
 
 Do you have any suggestions?


Thanks,

Paul


 ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
 
 
 [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
 [2] http://arxiv.org/abs/0911.5456
 [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
 [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
 [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html


signature.asc
Description: This is a digitally signed message part
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-27 Thread Mike Marchywka




Top posting cuz hotmail decided not to highlight...

Personally I would tend to use java or c++ for the inner loops
but you could of course later make an R package out of that.
This is especially true if your code will be used elsewhere
in a performance critical system. For example, I wrote some
c++ code for dealing with graphs nothing fancy but it
let me play with some data structure ideas and I could
then build it into standalone programs or perhaps an R
package. 

Many of these things slow down due to memory incoherence
or IO long before you use up the processor. With c++
in principle anyway you have a lot of control over these things.

Once you have your results and want to anlyze them, that's
when I would use R. Dumping simulation samples to a text file
is easy to also lets you use other things like sed/grep or
vi to explore as needed. 








From: paulepan...@users.sourceforge.net
To: r-help@r-project.org
Date: Thu, 28 Jul 2011 02:00:13 +0200
Subject: Re: [R] Is R the right choice for simulating first passage times of 
random walks?


Dear R folks,


Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel:

 I need to simulate first passage times for iterated partial sums. The
 related papers are for example [1][2].

 As a start I want to simulate how long a simple random walk stays
 negative, which should result that it behaves like n^(-½). My code looks
 like this.

  8  code  8 
 n = 10 # number of simulations

 length = 10 # length of iterated sum
 z = rep(0, times=length + 1)

 for (i in 1:n) {
   x = c(0, sign(rnorm(length)))
   s = cumsum(x)
   for (i in 1:length) {
   if (s[i]  0  s[i + 1] = 0) {
   z[i] = z[i] + 1
   }
   }
 }
 plot(1:length(z), z/n)
 curve(x**(-0.5), add = TRUE)
  8  code  8 

Of course the program above is not complete, because it only checks for
the first passage from negativ to positive. `if (s[2]  0) {}` should be
added before the for loop.

 This code already runs for over half an hour on my system¹.

 Reading about the for loop [3] it says to try to avoid loops and I
 probably should use a matrix where every row is a sample.

 Now my first problem is that there is no matrix equivalent for
 `cumsum()`. Can I use matrices to avoid the for loop?

I mean the inner for loop.

Additionally I wonder if `cumsum` is really faster or if I should sum
the elements by myself and check after every step if the walk gets
non-negative/0. With a length of 100 this should save some cycles.
On the other hand adding numbers should be really fast and adding checks
in between could potentially be slower.

 My second question is, is R the right choice for such simulations? It
 would be great when R can also give me a confidence interval(?) and also
 try to fit a curve through the result and give me the rule of
 correspondence(?) [4]. Do you have pointers for those?

 I glanced at simFrame [5] and read `?simulate` but could not understand
 it right away and think that this might be overkill.

 Do you have any suggestions?


Thanks,

Paul


 ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz


 [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
 [2] http://arxiv.org/abs/0911.5456
 [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
 [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
 [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-27 Thread William Dunlap
You can replace the inner loop
  for (i in 1:length) {
if (s[i]  0  s[i + 1] = 0) {
  z[i] = z[i] + 1
}
  }
with the faster
  z - z + (diff(s=0)==1)
(Using the latter forces you fix up the length of
z to be one less than you declared -- your loop never
touched the last entry in it.)

My code relies on the factor that the logicals FALSE
and TRUE are converted to integer 0 and 1, respectively,
when you do arithmetic on them.

E.g., here is your version written as a function, f1, and
2 equivalent ones, f2 and f3, without the inner loop:
f1 - function(n = 10, # number of simulations
   length = 10) # length of iterated sum
{
   z = rep(0, times=length) # orig had length+1 but last entry was never used

   for (i in 1:n) {
   x = c(0, sign(rnorm(length)))
   s = cumsum(x)
   for (i in 1:length) {
   if (s[i]  0  s[i + 1] = 0) {
   z[i] = z[i] + 1
   }
   }
   }
   z
}

f2 - function(n = 10, # number of simulations
   length = 10) # length of iterated sum
{
   z = rep(0, times=length)
   l1 - length+1
   for (i in 1:n) {
   x = c(0, sign(rnorm(length)))
   s = cumsum(x)
   z - z + ( s[-l1]0  s[-1]=0 )
   }
   z
}

f3 - function(n = 10, # number of simulations
   length = 10) # length of iterated sum
{
   z = rep(0, times=length)

   for (i in 1:n) {
   x = c(0, sign(rnorm(length)))
   s = cumsum(x)
   z - z + (diff(s=0)==1)
   }
   z
}

and here are some timing and correctness tests:
   set.seed(1) ; system.time( z1 - f1(1000, 1e5) )
 user  system elapsed
   278.100.25  271.44
   set.seed(1) ; system.time( z2 - f2(1000, 1e5) )
 user  system elapsed
37.290.84   38.02
   set.seed(1) ; system.time( z3 - f3(1000, 1e5) )
 user  system elapsed
40.110.80   40.17
   identical(z1, z2)  identical(z1, z3)
  [1] TRUE

You might try using sample(c(-1,1), size=length, replace=TRUE)
instead of sign(rnorm(length)).  I don't know if there would
be any speed difference, but it frees you from the worry that
rnorm() might return an exact 0.0.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Paul Menzel
 Sent: Wednesday, July 27, 2011 4:36 PM
 To: r-help@r-project.org
 Subject: [R] Is R the right choice for simulating first passage times of 
 random walks?
 
 Dear R folks,
 
 
 I need to simulate first passage times for iterated partial sums. The
 related papers are for example [1][2].
 
 As a start I want to simulate how long a simple random walk stays
 negative, which should result that it behaves like n^(-½). My code looks
 like this.
 
  8  code  8 
 n = 10 # number of simulations
 
 length = 10 # length of iterated sum
 z = rep(0, times=length + 1)
 
 for (i in 1:n) {
   x = c(0, sign(rnorm(length)))
   s = cumsum(x)
   for (i in 1:length) {
   if (s[i]  0  s[i + 1] = 0) {
   z[i] = z[i] + 1
   }
   }
 }
 plot(1:length(z), z/n)
 curve(x**(-0.5), add = TRUE)
  8  code  8 
 
 This code already runs for over half an hour on my system¹.
 
 Reading about the for loop [3] it says to try to avoid loops and I
 probably should use a matrix where every row is a sample.
 
 Now my first problem is that there is no matrix equivalent for
 `cumsum()`. Can I use matrices to avoid the for loop?
 
 My second question is, is R the right choice for such simulations? It
 would be great when R can also give me a confidence interval(?) and also
 try to fit a curve through the result and give me the rule of
 correspondence(?) [4]. Do you have pointers for those?
 
 I glanced at simFrame [5] and read `?simulate` but could not understand
 it right away and think that this might be overkill.
 
 Do you have any suggestions?
 
 
 Thanks,
 
 Paul
 
 
 ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
 
 
 [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
 [2] http://arxiv.org/abs/0911.5456
 [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
 [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
 [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-07-27 Thread R. Michael Weylandt michael.weyla...@gmail.com
Some more skilled folks can help with the curve fitting, but the general
answer is yes -- R will handle this quite ably.

Consider the following code:

--
n = 1e5
length = 1e5

R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
your life INFINITELY better
R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.



There are actually even faster ways to do what you are asking for, but this
introduces you to some useful R architecture, above all the apply function.

To see how long the longest stretch of negatives in each row is, the
following is a little sneaky but works pretty well:

countNegative = apply(R,1,function(x){which.max(table(cumsum(x=0))})

then you can study these random numbers to do whatever with them.

The gist of this code is that it counts how many positive number have been
seen up to each point: for any stretch this doesn't increase, you must be
negative, so this identifies the longest such stretch on each row and
records the length. (It may be off by one so check it on a smaller R matrix.


An empirical confidence interval might be given by this

mu = mean(countNegative)
sig = sd(countNegative)/sqrt(length(countNegative))

So all together:

--
n = 1e3
length = 1e3

R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
your life INFINITELY better
R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
fTemp - function(x) {
return(max(table(cumsum(x=0
}
countNegative = apply(R,1,fTemp)
mu = mean(countNegative)
sig = sd(countNegative)/sqrt(length(countNegative))




This runs pretty fast on my laptop, but you'll need to look into the
memory.limit() function if you want to increase the simulation parameters.
There are much faster ways to handle the simulation as well, but this should
get you off to a nice start with R.

Hope this helps,

Michael


On Wed, Jul 27, 2011 at 7:36 PM, Paul Menzel 
paulepan...@users.sourceforge.net wrote:

 Dear R folks,


 I need to simulate first passage times for iterated partial sums. The
 related papers are for example [1][2].

 As a start I want to simulate how long a simple random walk stays
 negative, which should result that it behaves like n^(-½). My code looks
 like this.

  8  code  8 
 n = 10 # number of simulations

 length = 10 # length of iterated sum
 z = rep(0, times=length + 1)

 for (i in 1:n) {
x = c(0, sign(rnorm(length)))
s = cumsum(x)
for (i in 1:length) {
if (s[i]  0  s[i + 1] = 0) {
z[i] = z[i] + 1
}
}
 }
 plot(1:length(z), z/n)
 curve(x**(-0.5), add = TRUE)
  8  code  8 

 This code already runs for over half an hour on my system¹.

 Reading about the for loop [3] it says to try to avoid loops and I
 probably should use a matrix where every row is a sample.

 Now my first problem is that there is no matrix equivalent for
 `cumsum()`. Can I use matrices to avoid the for loop?

 My second question is, is R the right choice for such simulations? It
 would be great when R can also give me a confidence interval(?) and also
 try to fit a curve through the result and give me the rule of
 correspondence(?) [4]. Do you have pointers for those?

 I glanced at simFrame [5] and read `?simulate` but could not understand
 it right away and think that this might be overkill.

 Do you have any suggestions?


 Thanks,

 Paul


 ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz


 [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
 [2] http://arxiv.org/abs/0911.5456
 [3]
 http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
 [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
 [5]
 http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.