Many thanks Simon and Taylor - your advice works a treat. Steve
On Saturday, February 21, 2015 at 11:37:27 AM UTC, Steve Kay wrote: > > I'm trying to program a simple Monte Carlo experiment that after > generating sample data within each MC rep then runs a GLM within it, > accumulating the coefficients from each rep in a matrix. As the GLM command > needs a dataframe I convert all my matrix held simulated data into this > form and then try and run a for loop (see end of included code below). It > does not like the $i notation. Can anyone please come up with a workaround? > I had a quick look in the glm code to see if I could figure out how to > create a modelmatrix myself and call the glm directly on that - it defeated > me (as I'm not the greatest of programmers). If anyone could offer advice > on that too I'd be grateful - would save the glm command having to convert > the dataframe on each call - which on a large number of MC reps might be > worth while. > > Any help much appreciated (and apologies for all the code comments - my > memories terrible so need to include). > > Best, > > Steve > > using DataFrames, GLM, Distributions > reps=100 # number of MC sims > n=50 # sample size in each sim > X0 = rep(1,n) # the constant > nLevelsX1 = 5 # distinct values in continuous X1 - make it a factor of n > X1=gl(nLevelsX1,int(n/nLevelsX1)) # gl() is in DataFrames > X2 = gl(2,1,n) - 1 # the treatmnt dummy > X = [X0 X1 X2] # design matrix > T=1.0 # the treatment effect > B = [10, 1, T] # Beta vector > mu = float(X*B) # varying mu parameter in lognormal distribution > sigma = rep(1.0,n) # constant sigma parameter in lognormal distribution > > Obs = Array(Float64, reps,n) # holds data for analysis - each row new > dataset > # loop below creates the Y (dependent) data - one row for each MC sim > for i = 1:n > Obs[:,i]= rand(LogNormal(mu[i],sigma[i]),reps) > end > > > # create dataframe where first two cols hold the unvarying X > # and the remaining cols hold each Y sim > df= convert(DataFrame, [X[:,2:3] Obs']) > rename!(df.colindex, [(symbol("x$i")=>symbol("Y$(i-2)")) for i in 3:(n+2)]) > names(df) # x1 x2 Y1 Y2 ...Yreps > > GLMresults = Array(Float64, reps,3) # will hold all coeff > #next loop fails - please help - won't accept $i > for i = 1:reps > GLMresults[i,:] = coef(glm(Y$i ~ x1 + x2,df,Normal(),LogLink())) > end >