Re: [R] Difference between cor.test() and formula everyone says to use

2014-10-16 Thread Joshua Wiley
Hi Jeremy,

I don't know about references, but this is well known.  See for example:
http://afni.nimh.nih.gov/sscc/gangc/tr.html

The relevant line in cor.test() is:

STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))

You can convert *t*s to *r*s and vice versa.
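A quick sketch of that line in action (using the same first 10 rows of the built-in cars data as in your example, so the numbers can be checked against the quoted output):

```r
# Reproduce cor.test()'s t-based p-value by hand.
x <- cars$speed[1:10]
y <- cars$dist[1:10]
r <- cor(x, y)                            # 0.6453079, matching the quoted output
df <- length(x) - 2
tstat <- sqrt(df) * r / sqrt(1 - r^2)     # the line from cor.test's source
p <- 2 * pt(abs(tstat), df, lower.tail = FALSE)
round(c(t = tstat, p = p), 5)             # t = 2.3893, p = 0.04391
all.equal(p, cor.test(x, y)$p.value)      # TRUE
```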

Best,

Josh



On Fri, Oct 17, 2014 at 10:32 AM, Jeremy Miles 
wrote:

> I'm trying to understand how cor.test() is calculating the p-value of
> a correlation. It gives a p-value based on t, but every text I've ever
> seen gives the calculation based on z.
>
> For example:
> > data(cars)
> > with(cars[1:10, ], cor.test(speed, dist))
>
> Pearson's product-moment correlation
>
> data:  speed and dist
> t = 2.3893, df = 8, p-value = 0.04391
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
>  0.02641348 0.90658582
> sample estimates:
>   cor
> 0.6453079
>
> But when I use the regular formula:
> > r <- cor(cars[1:10, ])[1, 2]
> > r.z <- fisherz(r)
> > se <- 1/sqrt(10 - 3)
> > z <- r.z / se
> > (1 - pnorm(z))*2
> [1] 0.04237039
>
> My p-value is different.  The help file for cor.test doesn't (seem to)
> have any reference to this, and I can see in the source code that it
> is doing something different. I'm just not sure what.
>
> Thanks,
>
> Jeremy
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] evaluate NA to FALSE instead of NA?

2014-10-14 Thread Joshua Wiley
Hi,

Perhaps still not as short as you want, but I normally use which():

p <- c(1:10/100, NA, NaN)
p[which(p <= .05)]
[1] 0.01 0.02 0.03 0.04 0.05
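An equivalent spelling that makes the NA handling explicit (same result; which() works because NA comparisons are neither TRUE nor FALSE, so those positions are simply dropped):

```r
p <- c(1:10/100, NA, NaN)
# p <= .05 yields NA for the NA/NaN elements; which() keeps only the
# TRUE positions, and so does masking the NAs out directly:
p[!is.na(p) & p <= .05]
# [1] 0.01 0.02 0.03 0.04 0.05
```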

Cheers,

Josh



On Tue, Oct 14, 2014 at 8:51 PM, Rainer M Krug  wrote:

> Hi
>
> I want to evaluate NA and NaN to FALSE (for indexing) so I would like to
> have the result as indicated here:
>
> ,
> | > p <- c(1:10/100, NA, NaN)
> | > p
> |  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10   NA  NaN
> | > p[p<=0.05]
> | [1] 0.01 0.02 0.03 0.04 0.05   NA   NA
> | > p[sapply(p<=0.05, isTRUE)]
> | [1] 0.01 0.02 0.03 0.04 0.05  <<<=== I want this
> `
>
> Is there a way that I can do this more easily than in my example above?
> It works, but it strikes me that there should be a better way of doing
> this - am I missing a command or option?
>
> Thanks,
>
> Rainer
>
> --
> Rainer M. Krug
> email: Rainerkrugsde
> PGP: 0x0F52F982
>
>
>




Re: [R] ANOVA and permutation tests : beware of traps

2014-09-23 Thread Joshua Wiley
Hi Stephane,

This is the well-known result of limited floating-point precision (e.g.,
http://www.validlab.com/goldberg/addendum.html).  Using a test of
approximate rather than exact equality shows R yields the correct answer:

nperm <- 100000
Fperm <- replicate(n = nperm, anova(lm(sample(Y) ~ F, dat1))$F[1])  # calculates nperm F values
(prob <- (sum(anova1$F[1] <= (Fperm + .Machine$double.eps^0.5)) + 1)/(nperm + 1))  # risk calculation

yields

0.10009

I picked .Machine$double.eps^0.5 somewhat arbitrarily; it is the default
tolerance for equality testing in ?all.equal.
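A miniature illustration of the same floating-point point (a standard example, not from this thread):

```r
# Exact equality fails on values that differ only in the last bits:
(0.1 + 0.2) == 0.3                 # FALSE: the two sides differ by about 5.6e-17
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: all.equal() compares up to a tolerance
```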

Cheers,

Josh



On Tue, Sep 23, 2014 at 5:14 PM, Stéphane Adamowicz <
stephane.adamow...@avignon.inra.fr> wrote:

> Recently, I came across a strange and potentially troublesome behaviour of
> the lm and aov functions that ask questions about calculation accuracy. Let
> us consider the 2 following datasets dat1 & dat2 :
>
> (dat1 <- data.frame(Y=c(1:3, 10+1:3), F=c(rep("A",3), rep("B",3))))
>Y F
> 1  1 A
> 2  2 A
> 3  3 A
> 4 11 B
> 5 12 B
> 6 13 B
> (dat2 <- data.frame(Y=c(10+1:3, 1:3), F=c(rep("A",3), rep("B",3))))
>Y F
> 1 11 A
> 2 12 A
> 3 13 A
> 4  1 B
> 5  2 B
> 6  3 B
>
> They only differ in the order of values, which were exchanged between
> samples A and B. Thus the sd is 1 for each sample in either data set, and
> the absolute mean difference |A-B| is 10 in both datasets.
> Now, let us perform an anova to compare samples A and B in both datasets
> (of course, in such a simple case, a two-sided t test would do the job, but
> an anova is nevertheless allowed and should give the same probability as
> Student's test):
>
> > (anova1 <- anova(lm(Y~F, dat1)))
> Analysis of Variance Table
>
> Response: Y
>           Df Sum Sq Mean Sq F value    Pr(>F)
> F          1    150     150     150 0.0002552 ***
> Residuals  4      4       1
>
> > (anova2 <- anova(lm(Y~F, dat2)))
> Analysis of Variance Table
>
> Response: Y
>           Df Sum Sq Mean Sq F value    Pr(>F)
> F          1    150     150     150 0.0002552 ***
> Residuals  4      4       1
>
> As expected, both datasets give the same anova table, but this is only
> apparent. Indeed:
>
> > anova1$F[1] == anova2$F[1]
> [1] FALSE
> > anova1$F[1] - anova2$F[1]
> [1] 5.684342e-14
>
> In fact the F values differ slightly, and this holds also for the aov
> function. I checked also (not shown) that both the residual and factorial
> sums of squares differ between dat1 and dat2. Thus, for some undocumented
> reason (at least for the end user), the F values depend on the order of
> data!
> While such tiny differences (e-14 in this example) have no consequence for
> the risk evaluation by Fisher's distribution, they may have huge
> consequences for the risk evaluation by the permutation method. Indeed,
> the shift from continuous to discrete distributions is far from
> insignificant.
>
> For instance, the following code in R is at the basis of many permutation
> algorithms found on the internet and in teaching because it seems quite
> straightforward (see for example
> http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Permutation%20Anova/PermTestsAnova.html
> http://www.usna.edu/Users/math/jager/courses/sm439/labs/Lab7.pdf
>
> http://biostatistics1014.blogspot.fr/2013/04/one-way-anova-permutation-test-and.html
> http://adn.biol.umontreal.ca/~numericalecology/Rcode/):
>
> > nperm <- 1000 # number of permutations
> > Fperm <- replicate(n=nperm, anova(lm(sample(Y) ~ F, dat1))$F[1]) #
> calculates nperm F values
> > (prob <- (sum(anova1$F[1] <= Fperm) + 1)/(nperm +1))  # risk calculation
> [1] 0.04695305
>
> Of course, because of the sample function, repeating this code gives
> different prob values, but they remain always close to 5% instead of the
> exact probability of 10%. Indeed, there are only choose(6,3) = 20 possible
> permutations, but because they are symmetric, they give only 10 absolute
> mean differences. Thus, the only exact probabilities are 10%, 20% ... 100%.
> In our example where samples do not overlap, 10% is obviously the right
> answer.
>
> Thus, the use of lm and aov functions in permutation methods does not seem
> a good idea as it results in biases that underestimate the exact risk. In
> the simple case of one-way anova, it is quite simple to remedy this
> problem. As the total Sums of squares and the degrees of freedom do not
> change with permutations, it is easier and much faster to compare the
> residual sums of squares. For instance, the exact probabilities can be
> calculated that way :
>
> > combi <- combn(6, 3, FUN=function(i) {append(dat1$Y[i], dat1$Y[-i])}) #
> all permutations
> > SCEResid <- apply(combi, 2, FUN=function(x) sum(tapply(x, dat1$F,
> function(x) sum((x - mean(x))^2)))) # all residual SS
> > (prob <- mean(SCEResid <= SCEResid[1])) # risk calculation
> [1] 0.1
>
> 10% is indeed the exact risk.
>
> Finally, my problem is: How can we know if R libraries that use
> randomization procedures are not biased? In the basic case of one-way
> anova, it seem

Re: [R] Simulating from a Weibull distribution

2014-09-02 Thread Joshua Wiley
Hi Lucy,

Try the gamlss.dist package, specifically the rWEI2() function.
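Alternatively, base R suffices if you map between the parameterisations. Equating f(x) = a1*b1*x^(a1-1)*exp(-b1*x^a1) with rweibull's density gives shape = a1 and scale = b1^(-1/a1) (my derivation, so worth double-checking):

```r
# Draws from the alternative Weibull parameterisation via rweibull():
# f(x) = a1*b1*x^(a1-1)*exp(-b1*x^a1)  <=>  shape = a1, scale = b1^(-1/a1)
rweibull2 <- function(n, a1, b1) {
  rweibull(n, shape = a1, scale = b1^(-1/a1))
}
# e.g. rweibull2(1000, a1 = 2, b1 = 0.5)
```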

Cheers,

Josh


On Wed, Sep 3, 2014 at 11:52 AM, Lucy Leigh 
wrote:

> Hi,
> I wish to simulate some data from a Weibull distribution. The rweibull
> function in R uses the parameterisation
>
> 'with shape parameter a and scale parameter b has density given by f(x) =
> (a/b) (x/b)^(a-1) exp(- (x/b)^a)'.
>
> However, it would be much more useful for me to simulate data using a
> different parameterisation of the
>
> Weibull, with shape a1 and scale b1,  namely f(x) =
> a1*b1*x^(a1-1)exp(-b1*x^a1).
>
> However I'm not sure how to do this, as I have only ever used the random
> generators that come pre-programmed in R.
>
> Is there a package or example someone can point me to that demonstrates
> how to simulate data from any given distribution?
>
> Cheers,
>
> Lucy
>
>
>
>
>





Re: [R] A basic statistics question

2014-08-15 Thread Joshua Wiley
On Wed, Aug 13, 2014 at 7:41 AM, Rolf Turner 
wrote:

> On 13/08/14 07:57, Ron Michael wrote:
>
>> Hi,
>>
>> I would need to get a clarification on a quite fundamental statistics
>> property, hope expeRts here would not mind if I post that here.
>>
>> I leant that variance-covariance matrix of the standardized data is equal
>> to the correlation matrix for the unstandardized data. So I used following
>> data.
>>
>
> 
>
>
>  (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>>
>>
>>
>> Point is that I am not getting exact CORR matrix. Can somebody point me
>> what I am missing here?
>>
>
> You are using a denominator of "n" in calculating your "covariance" matrix
> for your normalized data.  But these data were normalized using the sd()
> function which (correctly) uses a denominator of n-1 so as to obtain an
> unbiased estimator of the population standard deviation.
>

As a small point, dividing by n - 1 does not give _quite_ an unbiased
estimator of the population SD; see Cureton (1968), "Unbiased Estimation
of the Standard Deviation", The American Statistician, 22(1).

To see this in action:

library(parallel)
cl <- makeCluster(detectCores())  # cluster setup assumed; not shown in the original post
res <- unlist(parLapply(cl, 1:1e7, function(i) sd(rnorm(10, mean = 0, sd = 1))))
stopCluster(cl)
correction <- function(n) {
  gamma((n - 1)/2) * sqrt((n - 1)/2) / gamma(n/2)
}
mean(res)
# 0.972583
mean(res * correction(10))
# approximately 1 (the correction removes the downward bias)

The usual calculation for the sample variance is an unbiased estimate of
the population variance, but the square root is a nonlinear function, and
the square root of an unbiased estimator is not itself necessarily unbiased.
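A serial version of the same check, if you do not want to set up a cluster (a sketch; the seed and number of replicates are arbitrary):

```r
# Simulate the sampling distribution of sd() for samples of size 10
# from N(0, 1); its mean falls below the true sd of 1.
set.seed(1)
res <- replicate(1e5, sd(rnorm(10, mean = 0, sd = 1)))
mean(res)  # about 0.9727, the theoretical c4(10) factor, rather than 1
```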




>
> If you calculated
>
>
>(t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
>
> then you would get the same result as you get from cor(Data) (to within
> about 1e-15).
>
> cheers,
>
> Rolf Turner
>
> --
> Rolf Turner
> Technical Editor ANZJS
>
>
>





Re: [R] Logical operators and named arguments

2014-08-08 Thread Joshua Wiley
On Sat, Aug 9, 2014 at 9:56 AM, Patrick Burns 
wrote:

> On 07/08/2014 07:21, Joshua Wiley wrote:
>
>> Hi Ryan,
>>
>> It does work, but the *apply family of functions always pass to the first
>> argument, so you can specify e2 = , but not e1 =.  For example:
>>
>>  sapply(1:3, `>`, e2 = 2)
>>>
>> [1] FALSE FALSE  TRUE
>>
>
> That is not true:
>

But it is passed as the first argument, positionally rather than by name.
The reason it works with your gt() is that R matches arguments flexibly
for regular functions:

> f <- function(x, y) x > y
> f(1:3, x = 2)
[1]  TRUE FALSE FALSE

but primitives ARE positionally matched

> `>`(1:3, 2)
[1] FALSE FALSE  TRUE
> `>`(1:3, e1 = 2)
[1] FALSE FALSE  TRUE



>
> gt <- function(x, y) x > y
>
> > sapply(1:3, gt, y=2)
> [1] FALSE FALSE  TRUE
> > sapply(1:3, gt, x=2)
> [1]  TRUE FALSE FALSE
>
> Specifying the first argument(s) in an apply
> call is a standard way of getting flexibility.
>
> I'd hazard to guess that the reason the original
> version doesn't work is because `>` is Primitive.
> There's speed at the expense of not behaving quite
> the same as typical functions.
>
> Pat
>
>
>>  From ?sapply
>>>
>>
>>   'lapply' returns a list of the same length as 'X', each element of
>>   which is the result of applying 'FUN' to the corresponding element
>>   of 'X'.
>>
>> so `>` is applied to each element of 1:3
>>
>> `>`(1, ...)
>> `>`(2, ...)
>> `>`(3, ...)
>>
>> and if e2 is specified then that is passed
>>
>> `>`(1, 2)
>> `>`(2, 2)
>> `>`(3, 2)
>>
>> Further, see ?Ops
>>
>> If the members of this group are called as functions, any
>>argument names are removed to ensure that positional matching
>>is always used.
>>
>> and you can see this at work:
>>
>>  `>`(e1 = 1, e2 = 2)
>>>
>> [1] FALSE
>>
>>> `>`(e2 = 1, e1 = 2)
>>>
>> [1] FALSE
>>
>> If you want the flexibility to specify which argument the elements of X
>> should be *applied to, use a wrapper:
>>
>>  sapply(1:3, function(x) `>`(x, 2))
>>>
>> [1] FALSE FALSE  TRUE
>>
>>> sapply(1:3, function(x) `>`(2, x))
>>>
>> [1]  TRUE FALSE FALSE
>>
>>
>> HTH,
>>
>> Josh
>>
>>
>>
>> On Thu, Aug 7, 2014 at 2:20 PM, Ryan  wrote:
>>
>>  Hi,
>>>
>>> I'm wondering why calling ">" with named arguments doesn't work as
>>> expected:
>>>
>>>  args(">")
>>>>
>>> function (e1, e2)
>>> NULL
>>>
>>>  sapply(c(1,2,3), `>`, e2=0)
>>>>
>>> [1] TRUE TRUE TRUE
>>>
>>>  sapply(c(1,2,3), `>`, e1=0)
>>>>
>>> [1] TRUE TRUE TRUE
>>>
>>> Shouldn't the latter be FALSE?
>>>
>>> Thanks for any help,
>>> Ryan
>>>
>>>
>>>
>>
>>
>>
> --
> Patrick Burns
> pbu...@pburns.seanet.com
> twitter: @burnsstat @portfolioprobe
> http://www.portfolioprobe.com/blog
> http://www.burns-stat.com
> (home of:
>  'Impatient R'
>  'The R Inferno'
>  'Tao Te Programming')
>





Re: [R] Logical operators and named arguments

2014-08-06 Thread Joshua Wiley
Hi Ryan,

It does work, but the *apply family of functions always passes the elements
to the first argument, so you can specify e2 =, but not e1 =.  For example:

> sapply(1:3, `>`, e2 = 2)
[1] FALSE FALSE  TRUE

From ?sapply

 'lapply' returns a list of the same length as 'X', each element of
 which is the result of applying 'FUN' to the corresponding element
 of 'X'.

so `>` is applied to each element of 1:3

`>`(1, ...)
`>`(2, ...)
`>`(3, ...)

and if e2 is specified then that is passed

`>`(1, 2)
`>`(2, 2)
`>`(3, 2)

Further, see ?Ops

    If the members of this group are called as functions, any
    argument names are removed to ensure that positional matching
    is always used.

and you can see this at work:

> `>`(e1 = 1, e2 = 2)
[1] FALSE
> `>`(e2 = 1, e1 = 2)
[1] FALSE

If you want the flexibility to specify which argument the elements of X
should be *applied to, use a wrapper:

> sapply(1:3, function(x) `>`(x, 2))
[1] FALSE FALSE  TRUE
> sapply(1:3, function(x) `>`(2, x))
[1]  TRUE FALSE FALSE


HTH,

Josh



On Thu, Aug 7, 2014 at 2:20 PM, Ryan  wrote:

> Hi,
>
> I'm wondering why calling ">" with named arguments doesn't work as
> expected:
>
> > args(">")
> function (e1, e2)
> NULL
>
> > sapply(c(1,2,3), `>`, e2=0)
> [1] TRUE TRUE TRUE
>
> > sapply(c(1,2,3), `>`, e1=0)
> [1] TRUE TRUE TRUE
>
> Shouldn't the latter be FALSE?
>
> Thanks for any help,
> Ryan
>
>



Re: [R] a knitr question

2014-07-30 Thread Joshua Wiley
On Thu, Jul 31, 2014 at 9:47 AM, Duncan Murdoch 
wrote:

> On 30/07/2014, 2:20 PM, Yihui Xie wrote:
> > As a reader, I often want to run the code by myself _while_ I'm
> > reading a particular part of an article/report. I find it convenient
> > to be able to copy the code as I'm reading it, instead of minimizing
> > my current window, opening an R script, and running the part that I'm
> > interested in. Of course, this may not work if the code I copy is not
> > self-contained; your purl() approach certainly has an advantage
> > sometimes.
> >
> > I do not see a whole lot of value in maintaining the same appearance
> > of the R code in the R console and a report. You can teach your
> > students what the prompt characters mean, and I think that is enough.
> > Journal of Statistical Software requires "R> " as the prompt character
> > (which is worse), and your students will probably be confused when
> > reading JSS papers if they have been seeing the default prompts all
> > the time. I see the point of keeping prompts (i.e. I do not completely
> > disagree), but I do not think it is an essential or important thing to
> > do. Personally I prefer reading "vanilla" code, and >/+ may confuse my
> > eyes occasionally, e.g.
> >
> >> z > 5
> >> x +
> > + y
> >
> > (More on prompts:
> > http://yihui.name/en/2013/01/code-pollution-with-command-prompts/)
> >
> > Re Rich: yes, I'm aware of approaches of post-processing the prompts,
> > but this problem would not have existed in the first place if we do
> > not include prompts at all. I'm not sure if it makes much sense to
> > create some mess and clean it afterwards.
> >
>
> So your suggestion is that the R console should not prompt for input?
> Do you know of *any* interactive system which doesn't prompt for input?
>  How would users be able to tell the difference between R waiting for
> input, and R busy on the last calculation?
>
>
I don't think that this is about prompts in interactive R; the question is
whether, when a document is knit, the echoed code in the report should have
prompts or not.
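For what it is worth, knitr controls this via a chunk option (a pointer for readers, not part of the original exchange):

```r
# Echo code with R-console-style prompts ("> " and "+ "):
knitr::opts_chunk$set(prompt = TRUE)
# The default, prompt = FALSE, echoes vanilla code; output lines are
# prefixed using the comment option (default "##"):
knitr::opts_chunk$set(prompt = FALSE, comment = "##")
```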



> Duncan Murdoch
>
>
> > Regards,
> > Yihui
> > --
> > Yihui Xie 
> > Web: http://yihui.name
> >
> >
> > On Wed, Jul 30, 2014 at 12:50 PM, Greg Snow <538...@gmail.com> wrote:
> >> My preference when teaching is to have the code and results look the
> >> same as it appears in the R console window, so with the prompts and
> >> without the output commented.  But then I also `purl` my knitr file to
> >> create a script file to give to the students that they can copy and
> >> paste from easily.
> >>
> >
> >
>
>





Re: [R] Is there a package for EFA with multiple groups?

2014-07-28 Thread Joshua Wiley
Hi Elizabeth,

In confirmatory factor analysis with multiple groups, the reason one needs
to estimate the models simultaneously is that, typically, one is interested
in applying constraints (e.g., forcing all or some of the factor loadings
to be equal across groups).  In exploratory factor analysis, constraints
are uncommon (they are somewhat un-exploratory).

I would suggest simply using the fa() function in the psych package and
subsetting your data to the particular group, as in:

fa( subset(data, Group == "Group1") )

fa( subset(data, Group == "Group2") )

etc.

As you noted, lavaan will allow you to test multiple group CFAs, so if/when
you are ready to see whether the same configural factor structure or any
other level of invariance holds across your groups, you can use it.
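When you get to that stage, a minimal multiple-group CFA in lavaan might look like this (the model syntax and variable names x1-x6 and "Group" are placeholders, not from your data):

```r
library(lavaan)
model <- ' f1 =~ x1 + x2 + x3
           f2 =~ x4 + x5 + x6 '
# Configural model: same structure, all parameters free across groups
fit.config <- cfa(model, data = data, group = "Group")
# Metric invariance: factor loadings constrained equal across groups
fit.metric <- cfa(model, data = data, group = "Group",
                  group.equal = "loadings")
anova(fit.config, fit.metric)  # compare the nested models
```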

Sincerely,

Josh




On Mon, Jul 28, 2014 at 2:46 PM, Elizabeth Barrett-Cheetham <
ebarrettcheet...@gmail.com> wrote:

> Hello R users,
>
> I’m hoping to run an exploratory and confirmatory factor analysis on a
> psychology survey instrument. The data has been collected from
> multiple groups, and it’s likely that the data is hierarchical/has 2nd
> order factors.
>
> It appears that the lavaan package allows me to run a multiple group
> hierarchical confirmatory factor analysis. Yet, I can’t locate a
> package that can run the equivalent exploratory analysis.
>
> Could anyone please direct me to an appropriate package?
>
> Many thanks,
>
> Elizabeth
>
>





Re: [R] Box-cox transformation

2014-07-07 Thread Joshua Wiley
Dear Ravi,

In my previous example, I used the residuals, so:

sum [ (r_i / scaling)^2 ]

If you want to use the deviance from glm, that gives you:

sum [ r_i^2 ]

and since the scaling factor is just a constant for any given lambda,
then the modification would be:

sum [ r_i^2 ] / ( scaling^2 )

and is given in the modified code below (posted back to R-help in case
anyone else has this question).

Hope this helps,

Josh


##
require(MASS)

myp <- function(y, lambda) (y^lambda-1)/lambda


lambda <- seq(-0.05, 0.45, len = 20)
N <- nrow(quine)
res <- matrix(numeric(0), nrow = length(lambda), 2, dimnames =
list(NULL, c("Lambda", "LL")))

# scaling constant
C <- exp(mean(log(quine$Days+1)))

for(i in seq_along(lambda)) {
  SS <- deviance(glm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
  LL <- (- (N/2) * log(SS/((C^lambda[i])^2)))
  res[i, ] <- c(lambda[i], LL)
}

# box cox
boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
# add our points on top to verify match
points(res[, 1], res[,2], pch = 16)

##

On Mon, Jul 7, 2014 at 11:57 PM, Ravi Varadhan  wrote:
> Dear Josh,
> Thank you very much.  I knew that the scaling had to be adjusted, but was not 
> sure on how to do this.
>
> Can you please show me how to do this scaling with `glm'? In other words, how 
> would I scale the deviance from glm?
>
> Thanks,
> Ravi
>
> -Original Message-
> From: Joshua Wiley [mailto:jwiley.ps...@gmail.com]
> Sent: Sunday, July 06, 2014 11:34 PM
> To: Ravi Varadhan
> Cc: r-help@r-project.org
> Subject: Re: [R] Box-cox transformation
>
> Hi Ravi,
>
> Deviance is the SS in this case, but you need a normalizing constant adjusted 
> by the lambda to put them on the same scale.  I modified your example below 
> to simplify slightly and use the normalization (see the LL line).
>
> Cheers,
>
> Josh
>
> ##
>
> require(MASS)
>
> myp <- function(y, lambda) (y^lambda-1)/lambda
>
>
> lambda <- seq(-0.05, 0.45, len = 20)
> N <- nrow(quine)
> res <- matrix(numeric(0), nrow = length(lambda), 2, dimnames = list(NULL, 
> c("Lambda", "LL")))
>
> # scaling constant
> C <- exp(mean(log(quine$Days+1)))
>
> for(i in seq_along(lambda)) {
>   r <- resid(lm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
>   LL <- (- (N/2) * log(sum((r/(C^lambda[i]))^2)))
>   res[i, ] <- c(lambda[i], LL)
> }
>
> # box cox
> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
> # add our points on top to verify match
> points(res[, 1], res[,2], pch = 16)
>
> ##
>
>
>
> On Mon, Jul 7, 2014 at 11:33 AM, Ravi Varadhan  wrote:
>> Hi,
>>
>> I am trying to do Box-Cox transformation, but I am not sure how to do it 
>> correctly.  Here is an example showing what I am trying:
>>
>>
>>
>> # example from MASS
>>
>> require(MASS)
>> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine,
>>lambda = seq(-0.05, 0.45, len = 20))
>>
>> # Here is my attempt at getting the profile likelihood for the Box-Cox parameter
>> lam <- seq(-0.05, 0.45, len = 20)
>> dev <- rep(NA, length=20)
>>
>> for (i in 1:20) {
>> a <- lam[i]
>> ans <- glm(((Days+1)^a-1)/a ~ Eth*Sex*Age*Lrn, family=gaussian, data = quine)
>> dev[i] <- ans$deviance
>> }
>>
>> plot(lam, dev, type="b", xlab="lambda", ylab="deviance")
>>
>> I am trying to create the profile likelihood for the Box-Cox parameter, but 
>> obviously I am not getting it right.  I am not sure that ans$deviance is the 
>> right thing to do.
>>
>> I would appreciate any guidance.
>>
>> Thanks & Best,
>> Ravi
>>
>>
>>
>>
>>
>
>
>





Re: [R] Box-cox transformation

2014-07-06 Thread Joshua Wiley
Hi Ravi,

Deviance is the SS in this case, but you need a normalizing constant
adjusted by the lambda to put them on the same scale.  I modified your
example below to simplify slightly and use the normalization (see the
LL line).

Cheers,

Josh

##

require(MASS)

myp <- function(y, lambda) (y^lambda-1)/lambda


lambda <- seq(-0.05, 0.45, len = 20)
N <- nrow(quine)
res <- matrix(numeric(0), nrow = length(lambda), 2, dimnames =
list(NULL, c("Lambda", "LL")))

# scaling constant
C <- exp(mean(log(quine$Days+1)))

for(i in seq_along(lambda)) {
  r <- resid(lm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
  LL <- (- (N/2) * log(sum((r/(C^lambda[i]))^2)))
  res[i, ] <- c(lambda[i], LL)
}

# box cox
boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
# add our points on top to verify match
points(res[, 1], res[,2], pch = 16)

##



On Mon, Jul 7, 2014 at 11:33 AM, Ravi Varadhan  wrote:
> Hi,
>
> I am trying to do Box-Cox transformation, but I am not sure how to do it 
> correctly.  Here is an example showing what I am trying:
>
>
>
> # example from MASS
>
> require(MASS)
> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine,
>lambda = seq(-0.05, 0.45, len = 20))
>
> # Here is my attempt at getting the profile likelihood for the Box-Cox parameter
> lam <- seq(-0.05, 0.45, len = 20)
> dev <- rep(NA, length=20)
>
> for (i in 1:20) {
> a <- lam[i]
> ans <- glm(((Days+1)^a-1)/a ~ Eth*Sex*Age*Lrn, family=gaussian, data = quine)
> dev[i] <- ans$deviance
> }
>
> plot(lam, dev, type="b", xlab="lambda", ylab="deviance")
>
> I am trying to create the profile likelihood for the Box-Cox parameter, but 
> obviously I am not getting it right.  I am not sure that ans$deviance is the 
> right thing to do.
>
> I would appreciate any guidance.
>
> Thanks & Best,
> Ravi
>
>
>
>
>





Re: [R] data.frame(1)*1:4 = 1?

2014-04-03 Thread Joshua Wiley
Hi Spencer,

One piece is that a data frame of the same dimensions as went in comes out.
 The second piece is that the vector is recycled.

So in your first example:

data.frame(1) * 1:4

you only end up with the first element:

data.frame(1) * 1

If you try:

data.frame(1) * 4:1

you get a data frame with a value of 4.

Now for:

data.frame(1:2, 3:4) * 5:7

recycling kicks in again, and you get:

1 * 5, 2 * 6, 3 * 7, and 4 * 5

When working with vectors, you get recycling and it expands to the greater
length vector:

1:3 * 1:6

has length 6.  But data frames are sort of a 'higher' class and the
dimensions of the data frame trump the vector.
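Putting the pieces above into one runnable snippet (results as described in the surrounding text):

```r
data.frame(1) * 1:4          # 1 x 1 data frame containing 1 (only the first element is used)
data.frame(1) * 4:1          # 1 x 1 data frame containing 4
data.frame(1:2, 3:4) * 5:7   # cells 1*5, 2*6, 3*7, 4*5 -> 5, 12, 21, 20
length(1:3 * 1:6)            # 6: plain vectors recycle to the longer length
```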

A slightly different behavior is observed for matrices:

matrix(1:6, ncol=2) * 1:3

Gives recycling as expected to the longer of the vectors, but

matrix(1:6, ncol=2) * 1:9

gives an error.  The error is _not_ directly in the multiplication, as it
were, but rather the result (which, because matrices are stored as vectors,
has expanded to the length of the longer vector, here 1:9) does not match
the input dimensions of the matrix.  In particular, this is the same as
trying to do:

x <- 1:9
attributes(x)$dim <- c(3, 2)
Error in attributes(x)$dim <- c(3, 2) :
  dims [product 6] do not match the length of object [9]

Basically, R gets the result of 1:6 * 1:9, but then cannot format it back
as a matrix, because the saved dimensions do not fit the new resulting
data.  You can verify that R does indeed do the calculations if you go
under the hood --- the multiplication is done, and then it tries to apply
the dims and errors out.

Cheers,

Josh






On Wed, Apr 2, 2014 at 11:42 PM, Spencer Graves <
spencer.gra...@structuremonitoring.com> wrote:

> Hello, All:
>
>
>   What's the logic behind "data.frame(1)*1:4" producing a scalar 1?
>  Or the following:
>
>
>  data.frame(1:2, 3:4)*5:7
>   X1.2 X3.4
> 1    5   21
> 2   12   20
>
>
>   I stumbled over this, because I thought I was multiplying a scalar
> times a vector, and obtaining a scalar rather than the anticipated vector.
>  I learned that my "scalar" was in fact a data.frame with one row and one
> column.
>
>
>   What am I missing?
>
>
>   Thanks,
>   Spencer
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com
260.673.5518



Re: [R] predicted values

2014-02-01 Thread Joshua Wiley
Dear Felipe,

That is normal behavior --- the prediction from that simple model
decreases over time and ends up negative.  If the outcome cannot take
on negative values, treating it as continuous Gaussian may not be
optimal --- perhaps some transformation, like using a log link so that
the exponentiated values are always positive, would be better.
Alternately, the predictions may go negative not because the data do
overall, but because, say, there is a quick decrease in values early in
time that later slows; with an overly simplistic model of time, the
fit may just keep decreasing.  Using a smoother with a higher basis
dimension may help model the function more accurately over the span of
time in your dataset and avoid negative predicted values.

I do not think there is any straightforward way to 'force' the
model to be positive only.
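
As a sketch of the log-link idea on simulated data (using glm() for brevity; gam() in mgcv takes a family argument the same way):

```r
# hypothetical, strictly positive outcome that decays over time
set.seed(1)
d <- data.frame(t = 1:30)
d$y <- exp(2 - 0.1 * d$t) * exp(rnorm(30, sd = 0.1))

# identity-link fit: a straight line can dip below zero at later times
m1 <- lm(y ~ t, data = d)

# log link: fitted values are exp(linear predictor), so always positive
m2 <- glm(y ~ t, family = gaussian(link = "log"), data = d)
stopifnot(all(fitted(m2) > 0))
```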

Best,

Joshua


On Sat, Feb 1, 2014 at 5:05 PM, Felipe Carrillo
 wrote:
> Consider this dummy dataset.
> My real dataset with over 1000 records has
> scatter large and small values.
> I want to predict for values with NA but I
> get negative predictions. Is this a normal
> behaviour or I am missing a gam argument
> to force the model to predict positive values.
> library(mgcv)
> test <- data.frame(iddate=seq(as.Date("2014-01-01"),
> as.Date("2014-01-12"), by="days"),
> value=c(300,29,22,NA,128,24,15,1,3,30,NA,2))
> test
> str(test)
> mod <- gam(value ~ s(as.numeric(iddate)),data=test)
> # Predict for values with NA's
> test$pred <- with(test,ifelse(is.na(value),predict(mod,test),value))
> test
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



[R] useR! 2014 call for tutorials

2013-12-11 Thread Joshua Wiley
The R User Conference, useR! 2014 is scheduled for July 1-3, 2014 at
the University of California, Los Angeles.  Before the official
program, half-day tutorials will be offered on Monday, June 30.

We invite R users to submit proposals for three hour tutorials on
special topics regarding R. The proposals should include:

1) A brief description (abstract) of the tutorial
2) Goals
3) Detailed outline
4) Justification of why the tutorial is important
5) Background knowledge required and potential attendees

The deadline for tutorial submission is January 5, 2014.

Please email all tutorial proposals to useR-2014_at_R-project.org.

A web page offering more information on the useR! conference is
available at http://www.R-project.org/useR-2014

We look forward to your submissions,

The organizing committee:

   Yunyun Dai, Phil Ender, Jan de Leeuw, David McArthur,
   Amelia McNamara, Sanjog Misra, Katharine Mullen, Jeroen Ooms,
   Szilard Pafka, Tim Triche, Joshua Wiley



Re: [R] Setting contrasts

2013-12-02 Thread Joshua Wiley
1  0.1558 0.6940313
>   #Condition  351.52  4 10.9500  2.80e-07 ***
>   #Age:Condition 190.30  4  5.9279 0.0002793 ***
>   #Residuals     722.30 90
>
> ##Damn! We are still wrong even though the above shows sum contrasts
>
> ## So we do it another way
> options(contrasts = c("contr.sum","contr.poly"))
> getOption("contrasts")
>  #[1] "contr.sum" "contr.poly"
> resultsCar4 <- lm(Recall~Age*Condition, data = Eysenck )
> type4 <- Anova(resultsCar4, type = "III")
>   #print(type4)  # Now we're back where we should be
>   #Anova Table (Type III tests)
> #
>   #Response: Recall
>   #                Sum Sq Df   F value    Pr(>F)
>   #(Intercept)   13479.2  1 1679.5361 < 2.2e-16 ***
>   #Age             240.2  1   29.9356 3.981e-07 ***
>   #Condition      1514.9  4   47.1911 < 2.2e-16 ***
>   #Age:Condition   190.3  4    5.9279 0.0002793 ***
>   #Residuals       722.3 90
> ## Now it works just fine.
> ##So what is the difference between setting contrasts individually
> ##and setting them through the options?
>
> I get similar results if I use drop1, but perhaps that is what Fox did
> also.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] How would i sum the number of NA's in multiple vectors

2013-10-17 Thread Joshua Wiley
Or faster (both computational speed and amount of code):

colSums(is.na(rbind(c1, c2, c3)))
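
Using the example vectors from the question, a quick check (sketch) that the one-liner agrees with the apply() approach from the quoted reply:

```r
c1 <- c(1, 2, NA, 3, 4)
c2 <- c(NA, 1, 2, 3, 4)
c3 <- c(NA, 1, 2, 3, 4)
cmat <- rbind(c1, c2, c3)

# count NAs per position (column of the stacked matrix), two ways
fast <- colSums(is.na(cmat))
slow <- apply(cmat, 2, function(k) sum(is.na(k)))
stopifnot(all(fast == c(2, 0, 1, 0, 0)), all(fast == slow))
```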


On Thu, Oct 17, 2013 at 4:34 AM, Carl Witthoft  wrote:

> mattbju2013 wrote
> > Hi guys this is my first post, i need help summing the number of NA's in
> a
> > few vectors
> >
> > for example..
> >
> > c1<-c(1,2,NA,3,4)
> > c2<-c(NA,1,2,3,4)
> > c3<-c(NA,1,2,3,4)
> >
> > how would i get a result that only sums the number of NA's in the vector?
> > the.result.i.want<-c(2,0,1,0,0)
>
> See ?is.na .
> Now, if I can interpret your question correctly, you're actually looking
> for
> the number of NA per *position* in the vectors, so let's make them into a
> matrix first.
>
> cmat<-rbind(c1,c2,c3)
> then use apply over columns
> apply(cmat,2,function(k)sum(is.na(k)))
>
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/How-would-i-sum-the-number-of-NA-s-in-multiple-vectors-tp4678411p4678432.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] trying to compile R in win 7 (with Rtools) ... tcl.h

2013-10-04 Thread Joshua Wiley
Hi Cleber,

When you install Rtools, it asks you for the home directory of R, and there
it puts directories called src and Tcl.  You need to copy those over to
wherever you are building R.

So for example, I have:

C:\usr\R\R-devel\Tcl

where I untarred (tar -xf) the R-devel sources into C:\usr\R\

and then copied the src and Tcl dirs from Rtools over into C:\usr\R\R-devel\

Also, when you finish make all recommended, don't forget to

cd bitmap
make all

so you can save graphs as PNG, etc.

Cheers,

Josh



On Fri, Oct 4, 2013 at 7:29 PM, Cleber N.Borges  wrote:

> stop because
> *had a stone in the middle of the way*
> *in the middle of the way had a stone*
> (by vinicius de moraes)
> #
>
> so, one more help? somebody? :-)
>
> thanks...
> cleber
>
> building package 'tcltk'
> making init.d from init.c
> making tcltk.d from tcltk.c
> making tcltk_win.d from tcltk_win.c
> gcc  -I"../../../../include" -DNDEBUG -I "../../../../Tcl"/include -DWin32
> -O3 -Wall  -std=gnu99 -mtune=core2 -c init.c -o init.o
> In file included from init.c:22:0:
> tcltk.h:23:17: fatal error: tcl.h: No such file or directory
> compilation terminated.
> make[4]: *** [init.o] Error 1
> make[3]: *** [mksrc-win2] Error 1
> make[2]: *** [all] Error 2
> make[1]: *** [R] Error 1
> make: *** [all] Error 2
>
>
>
>
> Em 04/10/2013 22:46, Cleber N.Borges escreveu:
>
>>
>> bingo!  :-)
>> I got one pass to advanced!
>>
>> my TMP environment variable is:
>> %SystemRoot%\TEMP
>>
>>
>> thanks
>> cleber
>>
>>
>> Em 04/10/2013 22:02, Joshua Wiley escreveu:
>>
>>> Hi Cleber,
>>>
>>> You need to set TMPDIR to a valid directory, the default /tmp/ does not
>>> work on Windows.
>>>
>>> From the cmd shell:
>>>
>>> set TMPDIR=C:/TMP
>>>
>>> for example
>>>
>>> and then run make all recommended
>>>
>>> Cheers,
>>>
>>> Josh
>>>
>>>
>>>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] trying to compile R in win 7 (with Rtools)

2013-10-04 Thread Joshua Wiley
not create /tmp/R4428: directory nonexistent
> mv: cannot stat `/tmp/R4428': No such file or directory
> make[3]: *** [mkR1] Error 1
> make[2]: *** [all] Error 2
> make[1]: *** [R] Error 1
> make: *** [all] Error 2
>
> C:\Rsrc\R-3.0.2\src\gnuwin32>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] (sans objet)

2013-10-03 Thread Joshua Wiley
Hi Jesse,

Here is one approach:

score <- function(dat, minimumN) {
  # get the number of columns (variables)
  k <- ncol(dat)
  # take the row means, excluding missing
  mean <- rowMeans(dat, na.rm=TRUE)
  # get the number missing for each row
  nmiss <- rowSums(is.na(dat))

  # if nmiss is greater than threshold, set to missing
  # (k = number of columns, so at most k - minimumN values may be missing)
  mean[nmiss > (k - minimumN)] <- NA

  # calculate the sum (reweighted by missing)
  sum <- mean * k

  # put results in a dataframe and return
  data.frame(Mean = mean, Sum = sum, Nmissing = nmiss)
}

# score 4 variables, requiring at least 3
score(mtcars[, c("mpg", "disp", "hp", "wt")], 3)

# or put back into dataset, just the mean

mtcars$ScaleMean <- score(mtcars[, c("mpg", "disp", "hp", "wt")], 3)$Mean

It will be somewhat faster than using apply() because all you need are
rowMeans, which uses more optimized code.
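
A self-contained sanity check of that logic on a toy data frame (score() condensed from above; the toy values are made up):

```r
# condensed copy of score() so this snippet runs on its own
score <- function(dat, minimumN) {
  k <- ncol(dat)
  m <- rowMeans(dat, na.rm = TRUE)   # mean over observed items
  nmiss <- rowSums(is.na(dat))       # missing count per row
  m[nmiss > (k - minimumN)] <- NA    # too few items answered -> NA
  data.frame(Mean = m, Sum = m * k, Nmissing = nmiss)
}

d <- data.frame(v1 = c(1, NA, 3), v2 = c(2, NA, NA), v3 = c(3, 4, 5))
out <- score(d, minimumN = 2)
stopifnot(is.na(out$Mean[2]))   # row 2 answered only 1 of 3 items
stopifnot(out$Mean[3] == 4)     # mean of 3 and 5
stopifnot(out$Sum[1] == 6)      # 1 + 2 + 3
```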

Cheers,

Josh




On Thu, Oct 3, 2013 at 3:42 PM, Jesse Gervais  wrote:

> Hello there,
>
> I try to construct a variable with R, but I have some difficulties.
>
> Assume that I use a data set named = mydata. I want to create a variable
> that is the mean (totmean) or the sum (totsum) of 6 variables (var1, var2,
> var3, var4, var5, var6). However, I want only participants who have
> responded to at least 4 items to be include. Those who have one or two
> missing for var1-var6 should be coded NA for totmean and totsum.
>
> How I do that?
>
> Thank you!
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Translating recoding syntax from SPSS to R

2013-09-22 Thread Joshua Wiley
Just a slight addition to Arun's answer: within() saves typing and
makes life a bit easier

#
set.seed(429)
dat1 <- data.frame(
  race=sample(1:3,20,replace=TRUE),
  usborn=sample(0:2,20,replace=TRUE))

# within is a nice way to add variables
# and make modifications within a dataset
# saves typing lots of dat1$

dat1 <- within(dat1, {
 confused <- 1 * ((race==1 | race==2) & usborn==0)
})

head(dat1)

# only confused 0
dat1 <- subset(dat1, confused == 0)
#


On Sat, Sep 21, 2013 at 11:47 PM, arun  wrote:
> Hi,
> Try:
> set.seed(429)
> dat1<- 
> data.frame(race=sample(1:3,20,replace=TRUE),usborn=sample(0:2,20,replace=TRUE))
>  dat1$confused<- 1*((dat1$race==1|dat1$race==2) & dat1$usborn==0)
> head(dat1)
> #  race usborn confused
> # 1    3      2        0
> # 2    1      0        1
> # 3    2      1        0
> # 4    3      2        0
> # 5    1      2        0
> # 6    1      1        0
> A.K.
>
>
>
> - Original Message -
> From: Mosi Ifatunji 
> To: "r-help@r-project.org" 
> Cc:
> Sent: Saturday, September 21, 2013 6:03 PM
> Subject: [R] Translating recoding syntax from SPSS to R
>
> Colleagues,
>
> I am in the process of learning R. I've been able to import my dataset (from 
> Stata) and do some simple coding. I have now come to coding situation that 
> requires some assistance. This is some code in SPSS that I would like to be 
> able to execute in R:
>
> if (race eq 1 and usborn=0) confused=1 .
> if (race eq 2 and usborn=0) confused=1 .
> if (race eq 1 and usborn=1) confused=0 .
> if (race eq 2 and usborn=1) confused=0 .
> if (race eq 3 and usborn=1) confused=0 .
> if (race eq 3 and cohort=1) confused=0 .
> if (race eq 3 and cohort=2) confused=0 .
> variable labels confused "R claims to be both an African American and foriegn 
> born" .
> value labels confused
> 1 "Both AfAm and Foreign"
> 2 "Not" .
> select if (confused eq 0) .
>
> Any assistance would be greatly appreciated.
>
> -- Mosi
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ______
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Problem with converting F to FALSE

2013-09-05 Thread Joshua Wiley
Hi,

You can either manually specify colClasses or use the as.is argument.  See
?read.csv for more details.

If you just had those two columns, something like:

 read.table(header = TRUE, text = "
 sex group
 F 1
 T 2
 ", colClasses = c("character", "integer"))

Cheers,

Josh


read.csv("file.csv", colClasses = c("character", "integer"))
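
A self-contained illustration of both the default behavior and the colClasses fix (using the text= argument of read.csv() in place of a file):

```r
txt <- "sex,group\nF,1\nF,2\nF,3"

# default: type conversion turns the all-"F" column into logical FALSE
bad <- read.csv(text = txt)
stopifnot(is.logical(bad$sex), all(bad$sex == FALSE))

# colClasses keeps the values as the character string "F"
good <- read.csv(text = txt, colClasses = c("character", "integer"))
stopifnot(is.character(good$sex), all(good$sex == "F"))
```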




On Thu, Sep 5, 2013 at 5:44 AM, Venkata Kirankumar
 wrote:
> Hi,
> I have a peculier problem in R-Project that is when my CSV file have one
> column with all values as 'F' the R-Project converting this 'F' to FALSE.
> Can some one please suggest how to stop this convertion. Because I want to
> use 'F' in my calculations and show it in screen. for example my data is
> like
>
> sex  group
> F   1
> F   2
> F   3
>
> but when I use read.csv and load the csv file data is converting it to
>
> sex  group
> FALSE   1
> FALSE   2
> FALSE   3
> but i want it as source data like
>
> sex group
> F  1
> F  2
> F  3
>
>
> Thanks in advance,
> D V Kiran Kumar
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Read a Google Spreadsheet?

2013-09-04 Thread Joshua Wiley
Hi Spencer,

It really is not very hard, and I have never had an issue with it:

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

Just download the x86 and x64 versions for your OS and install.  Worst
case, you need to add the directory to the PATH variable in Windows.

I do this regularly so I can use/test either version of R.

Cheers,

Josh

P.S. Emacs + ESS allows for different versions of R and it is not too
difficult to use the 64 or 32 bit version... M-x
R-version-architecture


On Wed, Sep 4, 2013 at 6:36 PM, Spencer Graves
 wrote:
> On 9/4/2013 6:09 PM, Ista Zahn wrote:
>>
>> Hi Spencer,
>>
>> Why don't you want to install 64bit Java?
>
>
>
>   That may be a reasonable approach.
>
>
>   I may have Java confused with something else, but I remember hearing
> that it was difficult or unwise to try to install both 32- and 64-bit
> versions of something like Java or Java Script on the same Windows operating
> system.  If I need to uninstall 32-bit Java to install 64-bit, who knows
> what else I could break.  I'm a statistician, not an information
> technologist:  If I spend more time playing with Java, I'll have less time
> for other things I want to do.
>
>
>   Thanks for the reply.
>   Spencer
>>
>>
>> On Wed, Sep 4, 2013 at 6:12 PM, Spencer Graves
>>  wrote:
>>>
>>> Hello, All:
>>>
>>>
>>> What do you recommend for reading a Google Spreadsheet into R? I didn't
>>> find
>>> anything useful using library(sos); findFn('google spreadsheet').
>>>
>>>
>>> I can solve the problem by downloading the file either as *.ods or *.xlsx
>>> format, then opening it and saving it as *.xls, then using
>>> read.xls{gdata}.
>>>
>>>
>>> Alternatives I haven't tried use read.xlsx{xlsx} and
>>> readWorksheetFromFile{XLConnect} with 32-bit R. Neither of these work for
>>> me
>>> with 64-bit R, because they can't find an appropriate rJava on my
>>> computer;
>>> see below. (I've been using 64-bit R with Emacs, so switching to 32-bit R
>>> is
>>> not completely trivial.) Similarly, read.gnumeric.sheet{gnumeric}
>>> requires
>>> the "external program, ssconvert", which seems not to be available on my
>>> computer or installed for 64-bit R.
>>>
>>>
>>> What do you suggest? Avoid 64-bit R unless I really need it? That seems
>>> to
>>> be the message I'm getting from this. (The writeFindFn2xls{sos} also
>>> works
>>> in 32-bit R but fails in 64-bit apparently for the same reason.)
>>>
>>>
>>> Thanks,
>>> Spencer
>>>
>>>
>>>> library(xlsx)
>>>
>>> Loading required package: xlsxjars
>>> Loading required package: rJava
>>> Error : .onLoad failed in loadNamespace() for 'rJava', details:
>>> call: fun(libname, pkgname)
>>> error: No CurrentVersion entry in Software/JavaSoft registry! Try
>>> re-installing Java and make sure R and Java have matching architectures.
>>> Error: package ‘rJava’ could not be loaded
>>>>
>>>> library(XLConnect)
>>>
>>> Loading required package: rJava
>>> Error : .onLoad failed in loadNamespace() for 'rJava', details:
>>> call: fun(libname, pkgname)
>>> error: No CurrentVersion entry in Software/JavaSoft registry! Try
>>> re-installing Java and make sure R and Java have matching architectures.
>>> Error: package ‘rJava’ could not be loaded
>>>>
>>>> sessionInfo()
>>>
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Multiple regression (with interactions) by hand

2013-09-04 Thread Joshua Wiley
Hi Christoph,

ginv() computes the Moore-Penrose generalized inverse by way of a
singular value decomposition.  Part of the calculation involves taking
the reciprocal of the non zero values.  In practice, non zero is
really "within some precision tolerance of zero".  Numerical precision
can bite you in scientific computing.

There are many examples where the most conceptually straightforward
approach is not the best one: although the equation may be easy to
write symbolically, the direct calculation is more vulnerable to
rounding or truncation errors in floating point representation.

Aside from working through some matrix algebra for understanding,
using established code (like lm) for models where the authors will
have taken issues like numerical precision and stability into
consideration is generally safest.
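
A small illustration of the agreement on well-conditioned simulated data (not the airquality example): lm() uses a QR decomposition internally, and solve() with a right-hand side avoids explicitly forming the inverse:

```r
set.seed(42)
n <- 100
x <- rnorm(n)
X <- cbind(1, x, x^2)                 # design matrix with intercept
y <- drop(X %*% c(1, 2, 3)) + rnorm(n)

b_qr <- unname(coef(lm(y ~ x + I(x^2))))            # QR-based, as in lm()
b_ne <- drop(solve(crossprod(X), crossprod(X, y)))  # normal equations

# on well-conditioned data the two agree closely
stopifnot(isTRUE(all.equal(b_qr, unname(b_ne), tolerance = 1e-6)))
```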

Cheers,

Josh



On Tue, Sep 3, 2013 at 6:22 AM, Christoph Scherber
 wrote:
> Dear all,
>
> But why are there such huge differences between solve() and ginv()? (see code 
> below)?
>
> ##
> m1=lm(Ozone~Solar.R*Wind,airquality)
>
> # remove NA´s:
> airquality2=airquality[complete.cases(airquality$Ozone)&
> complete.cases(airquality$Solar.R)&
> complete.cases(airquality$Wind),]
>
> # create the model matrix by hand:
> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
> # is the same as:
> model.matrix(m1)
> # create the response vector by hand:
> Y=airquality2$Ozone
> # is the same as:
> m1$model$Ozone
> # Now solve for the parameter estimates:
>
> solve(crossprod(X)) %*% crossprod(X,Y) #gives the correct answer
>
> library(MASS)
> ginv(t(X)%*%X)%*%t(X)%*%Y #gives a wrong answer
>
>
>
>
>
> Am 03/09/2013 12:29, schrieb Joshua Wiley:
>> Hi Christoph,
>>
>> Use this matrix expression instead:
>>
>> solve(crossprod(X)) %*% t(X) %*% Y
>>
>> Note that:
>>
>> all.equal(crossprod(X), t(X) %*% X)
>>
>> Cheers,
>>
>> Joshua
>>
>>
>>
>> On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber
>>  wrote:
>>> Dear all,
>>>
>>> I´ve played around with the "airquality" dataset, trying to solve the 
>>> matrix equations of a simple
>>> multiple regression by hand; however, my matrix multiplications don´t lead 
>>> to the estimates returned
>>> by coef(). What have I done wrong here?
>>>
>>> ##
>>> m1=lm(Ozone~Solar.R*Wind,airquality)
>>>
>>> # remove NA´s:
>>> airquality2=airquality[complete.cases(airquality$Ozone)&
>>> complete.cases(airquality$Solar.R)&
>>> complete.cases(airquality$Wind),]
>>>
>>> # create the model matrix by hand:
>>> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
>>> # is the same as:
>>> model.matrix(m1)
>>>
>>> # create the response vector by hand:
>>> Y=airquality2$Ozone
>>> # is the same as:
>>> m1$model$Ozone
>>>
>>> # Now solve for the parameter estimates:
>>> library(MASS)
>>> ginv(t(X)%*%X)%*%t(X)%*%Y
>>>
>>> # is not the same as:
>>> coef(m1)
>>>
>>> ##
>>> Now why is my result (line ginv(...)) not the same as the one returned by 
>>> coef(m1)?
>>>
>>> Thanks very much for your help!
>>>
>>> Best regards,
>>> Christoph
>>>
>>> [using R 3.0.1 on Windows 7 32-Bit]
>>>
>>>
>>>
>>>
>>>
>>> --
>>> PD Dr Christoph Scherber
>>> Georg-August University Goettingen
>>> Department of Crop Science
>>> Agroecology
>>> Grisebachstrasse 6
>>> D-37077 Goettingen
>>> Germany
>>> phone 0049 (0)551 39 8807
>>> fax 0049 (0)551 39 8806
>>> http://www.gwdg.de/~cscherb1
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] optim evils

2013-09-04 Thread Joshua Wiley
Hi Michael,

You do not need to create a self-contained example from the mass of
code where it is embedded, but given that optim() works in many cases,
to file a bug report, you do need to give _an_ example where it is
failing.

Here is an example where it works great:

> optim(1, fn = function(x) x - 5, method = "CG", lower = 3)
$par
[1] 3

$value
[1] -2

$counts
function gradient
       1        1

$convergence
[1] 0

$message
[1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"

Warning message:
In optim(1, fn = function(x) x - 5, method = "CG", lower = 3) :
  bounds can only be used with method L-BFGS-B (or Brent)

and it gives a warning at the end regarding L-BFGS-B.


On Wed, Sep 4, 2013 at 1:34 AM, Michael Meyer  wrote:
> It would take some effort to extract self-contained code from the mass of code 
> wherein this optimization is embedded. Moreover I would have to obtain 
> permission from my employer to do so.
>
> This is not efficient.
> However some things are evident from the trace log which I have submitted:
> (a) L-BFGS-B does not identify itself even though it was called overriding 
> the method
> parameter in optim.
> (b) Optim  reports as final converged minimum value a function value that is 
> much larger than
> others computed during the optimization.
>
> I think we can agree on calling this a bug.
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Multiple regression (with interactions) by hand

2013-09-03 Thread Joshua Wiley
Hi Christoph,

Use this matrix expression instead:

solve(crossprod(X)) %*% t(X) %*% Y

Note that:

all.equal(crossprod(X), t(X) %*% X)

Cheers,

Joshua



On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber
 wrote:
> Dear all,
>
> I´ve played around with the "airquality" dataset, trying to solve the matrix 
> equations of a simple
> multiple regression by hand; however, my matrix multiplications don´t lead to 
> the estimates returned
> by coef(). What have I done wrong here?
>
> ##
> m1=lm(Ozone~Solar.R*Wind,airquality)
>
> # remove NA´s:
> airquality2=airquality[complete.cases(airquality$Ozone)&
> complete.cases(airquality$Solar.R)&
> complete.cases(airquality$Wind),]
>
> # create the model matrix by hand:
> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
> # is the same as:
> model.matrix(m1)
>
> # create the response vector by hand:
> Y=airquality2$Ozone
> # is the same as:
> m1$model$Ozone
>
> # Now solve for the parameter estimates:
> library(MASS)
> ginv(t(X)%*%X)%*%t(X)%*%Y
>
> # is not the same as:
> coef(m1)
>
> ##
> Now why is my result (line ginv(...)) not the same as the one returned by 
> coef(m1)?
>
> Thanks very much for your help!
>
> Best regards,
> Christoph
>
> [using R 3.0.1 on Windows 7 32-Bit]
>
>
>
>
>
> --
> PD Dr Christoph Scherber
> Georg-August University Goettingen
> Department of Crop Science
> Agroecology
> Grisebachstrasse 6
> D-37077 Goettingen
> Germany
> phone 0049 (0)551 39 8807
> fax 0049 (0)551 39 8806
> http://www.gwdg.de/~cscherb1
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Trouble with Slidify and Latex

2013-09-01 Thread Joshua Wiley
Hi Noah,

Looks like MathJax --- \( is the inline escape for MathJax
(http://www.mathjax.org/download/)

As an example, check out slide 20
(http://elkhartgroup.com/tutorials/psychometrics.html)

Cheers,

Josh


On Sun, Sep 1, 2013 at 1:40 PM, Noah Silverman  wrote:
> Hi,
>
> (Re-submitting as the original doesn't look like it made it to the list.)
>
> Just starting to play around with the awesome Slidify package.
>
> For some reason, I can't get it to render any Latex in the presentation.  
> Have reviewed all the docs and think I'm doing things correctly.  Is there 
> something possibly broken with my installation, or am I misunderstanding the 
> markdown syntax?
>
> I have, on a single slide:
>
> Test $A = 1+2$ and some text after
>
> What I see in the Presentation:
>
>
> Test \(A = 1+2\) and some text after
>
>
> Note:  This is literally a "slash" followed by a "parenthesis" in the final 
> HTML slide.
>
>
> Any ideas on what's wrong here?
>
> Thanks.
>
> --
> Noah Silverman, C.Phil
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] bootstrapping respecting subject level information

2013-07-04 Thread Joshua Wiley
Hi,

It is not the easiest code to follow, but when I was working at UCLA
I wrote a page demonstrating a multilevel bootstrap that uses a two-stage
sampler, (re)sampling at each level.  In your case, that could mean first
drawing subjects, then drawing observations within subjects.  A strata-only
option does not resample all sources of variability, which are:

1) which subjects you get and
2) which observations within those

The page is here: http://www.ats.ucla.edu/stat/r/dae/melogit.htm

As a side note, it demonstrates a mixed effects model in R, although
as I mentioned it is not geared for beginners.
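
If you want something more minimal than that page, here is a sketch of the two-stage idea on made-up data (not your data; relabeling repeated subjects is one common convention):

## Two-stage (cluster) bootstrap sketch: resample subjects with
## replacement, then resample observations within each drawn subject.
set.seed(1)
dat <- data.frame(subject = rep(1:4, each = 5), y = rnorm(20))

two.stage.sample <- function(d) {
  subs <- sample(unique(d$subject), replace = TRUE)
  do.call(rbind, lapply(seq_along(subs), function(i) {
    rows <- which(d$subject == subs[i])
    out <- d[sample(rows, replace = TRUE), ]
    out$subject <- i   # relabel so repeated subjects count as distinct clusters
    out
  }))
}

boot.means <- replicate(500, mean(two.stage.sample(dat)$y))
quantile(boot.means, c(0.025, 0.975))  # percentile interval for the mean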

Cheers,

Josh



On Wed, Jul 3, 2013 at 7:19 PM, Sol Lago  wrote:
> Hi there,
>
> This is the first time I use this forum, and I want to say from the start I 
> am not a skilled programmer. So please let me know if the question or code 
> were unclear!
>
> I am trying to bootstrap an interaction (that is my test statistic) using the 
> package "boot". My problem is that for every resample, I would like the 
> randomization to be done within subjects, so that observations from different 
> subjects are not mixed. Here is the code to generate a dataframe similar to 
> mine:
>
> Subject = rep(c("S1","S2","S3","S4"),4)
> Num = rep(c("singular","plural"),8)
> Gram= rep(c("gram","gram","ungram","ungram"),4)
> RT  = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
> data= data.frame(Subject,Num,Gram,RT)
>
> This is the code I used to get the empirical interaction value:
>
> summary(lm(RT ~ Num*Gram, data=data))
>
> As you can see, the interaction between my two factors is -348. I want to get 
> a bootstrap confidence interval for this statistic, which I can generate 
> using the "boot" package:
>
> #Function to create the statistic to be boostrapped
> boot.huber <- function(data, indices) {
> data <- data[indices, ] #select obs. in bootstrap sample
> mod <- lm(RT ~ Num*Gram, data=data)
> coefficients(mod)   #return coefficient vector
> }
>
> #Generate bootstrap estimate
> data.boot <- boot(data, boot.huber, 1999)
>
> #Get confidence interval
> boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets 
> the CI for the interaction
>
> My problem is that I think the resamples should be generated without mixing 
> the individual subjects observations: that is, to generate the new resamples, 
> the observations from subject 1 (S1) should be shuffled within subject 1, not 
> mixing them with the observations from subjects 2, etc... I don't know how 
> "boot" is doing the resampling (I read the documentation but don't understand 
> how the function is doing it)
>
> Does anyone know how I could make sure that the resampling procedure used by 
> "boot" respects the subject level information?
>
> Thanks a lot for your help/advice!



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Help

2013-07-04 Thread Joshua Wiley
Well, it implies either 2help or help^2, right?  In this case, it
seems oddly appropriate, as really needing help getting packages is
sort of greater in a "need hierarchy" than coding help, as the latter
is largely useless without the former.

At one point, I seem to recall discussion of whether there was a use for
a sort of "beginners" R help and a more "advanced" one (part of the
discussion being that R-devel is kind of advanced).  But perhaps there is
implicit sorting by virtue of subject lines: even one as simple as "Help"
allows seasoned veterans to distinguish a new asker from someone who has
been through the ropes here many times.

Cheers,

Josh


On Thu, Jul 4, 2013 at 2:59 PM, Rolf Turner  wrote:
>
> Please (for crying out loud!) use a meaningful subject line!!! This
> list is called "r-help"!!!  Using a subject line of "Help" is completely
> vacuous.
>
> cheers,
>
> Rolf Turner
>
>
> On 04/07/13 20:33, Aboagye-sarfo, Patrick wrote:
>>
>> I am trying to download packages and the message I get is as follows
>>
>
> 
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] recode: how to avoid nested ifelse

2013-06-07 Thread Joshua Wiley
I still argue for na.rm=FALSE, but that is a cute approach, and it is also substantially faster:

f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3))
f2 <- function(x1, x2, x3) pmax(3*x1, 2*x2, x3, 0, na.rm=FALSE)
f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))

es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)

system.time(replicate(1000, f1(cg, hs, es)))
system.time(replicate(1000, f2(cg, hs, es)))
system.time(replicate(1000, f3(cg, hs, es)))
system.time(replicate(1000, f4(cg, hs, es)))

> system.time(replicate(1000, f1(cg, hs, es)))
   user  system elapsed
  22.73    0.03   22.76
> system.time(replicate(1000, f2(cg, hs, es)))
   user  system elapsed
   0.92    0.04    0.95
> system.time(replicate(1000, f3(cg, hs, es)))
   user  system elapsed
   0.19    0.02    0.20
> system.time(replicate(1000, f4(cg, hs, es)))
   user  system elapsed
   0.95    0.03    0.98


R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
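
For completeness, here is how the pmax() recode from Neal's post behaves on the example vectors with na.rm=FALSE (any NA in a row propagates):

## Highest completed level wins; an NA anywhere in a row yields NA.
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
ed <- pmax(3*cg, 2*hs, es, 0, na.rm = FALSE)
ed  # 0 0 2 0 3 0 3 1 NA NA
factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg"))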




On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz  wrote:
> I would do this to get the highest non-missing level:
>
> x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)
>
> rock chalk...
>
> -nfultz
>
> On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
>> Hi Paul,
>>
>> Unless you have truly offended the data generating oracle*, the
>> pattern: NA, 1, NA, should be a data entry error --- graduating HS
>> implies graduating ES, no?  I would argue fringe cases like that
>> should be corrected in the data, not through coding work arounds.
>> Then you can just do:
>>
>> x <- do.call(paste0, list(es, hs, cg))
>>
>> > table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", 
>> > "es","hs", "cg")))
>> none   es   hs   cg
>>    4    1    1    2
>>
>> Cheers,
>>
>> Josh
>>
>> *Drawn from comments by Judea Pearl during one lively session.
>>
>>
>> On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson  wrote:
>> > In our Summer Stats Institute, I was asked a question that amounts to
>> > reversing the effect of the contrasts function (reconstruct an ordinal
>> > predictor from a set of binary columns). The best I could think of was to
>> > link together several ifelse functions, and I don't think I want to do this
>> > if the example became any more complicated.
>> >
>> > I'm unable to remember a less error prone method :). But I expect you 
>> > might.
>> >
>> > Here's my working example code
>> >
>> > ## Paul Johnson 
>> > ## 2013-06-07
>> >
>> > ## We need to create an ordinal factor from these indicators
>> > ## completed elementary school
>> > es <- c(0, 0, 1, 0, 1, 0, 1, 1)
>> > ## completed high school
>> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
>> > ## completed college graduate
>> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
>> >
>> > ed <- ifelse(cg == 1, 3,
>> >  ifelse(hs == 1, 2,
>> > ifelse(es == 1, 1, 0)))
>> >
>> > edf <- factor(ed, levels = 0:3,  labels = c("none", "es", "hs", "cg"))
>> > data.frame(es, hs, cg, ed, edf)
>> >
>> > ## Looks OK, but what if there are missings?
>> > es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
>> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
>> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
>> > ed <- ifelse(cg == 1, 3,
>> >  ifelse(hs == 1, 2,
>> > ifelse(es == 1, 1, 0)))
>> > cbind(es, hs, cg, ed)
>> >
>> > ## That's bad, ifelse returns NA too frequently.
>> > ## Revise (becoming tedious!)
>> >
>> > ed <- ifelse(!is.na(cg) & cg == 1, 3,
>> >  ifelse(!is.na(hs) & hs == 1, 2,
>> > ifelse(!is.na(es) & es == 1, 1,
>> >ifelse(is.na(es), NA, 0))))
>> > cbind(es, hs, cg, ed)
>> >
>> >
>> > ## Does the project director want us to worry about
>> > ## logical inconsistencies, such as es = 0 but cg = 1?
>> > ## I hope not.
>> >
>> > Thanks in advance, I hope you are having a nice summer.
>> >
>> > pj
>> >
>> > --
>> > Paul E. Johnson
>> > Professor, Political Science  Assoc. Director
>> 

Re: [R] recode: how to avoid nested ifelse

2013-06-07 Thread Joshua Wiley
Hi Paul,

Unless you have truly offended the data generating oracle*, the
pattern: NA, 1, NA, should be a data entry error --- graduating HS
implies graduating ES, no?  I would argue fringe cases like that
should be corrected in the data, not through coding work arounds.
Then you can just do:

x <- do.call(paste0, list(es, hs, cg))

> table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", 
> "es","hs", "cg")))
none   es   hs   cg
   4    1    1    2

Cheers,

Josh

*Drawn from comments by Judea Pearl during one lively session.


On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson  wrote:
> In our Summer Stats Institute, I was asked a question that amounts to
> reversing the effect of the contrasts function (reconstruct an ordinal
> predictor from a set of binary columns). The best I could think of was to
> link together several ifelse functions, and I don't think I want to do this
> if the example became any more complicated.
>
> I'm unable to remember a less error prone method :). But I expect you might.
>
> Here's my working example code
>
> ## Paul Johnson 
> ## 2013-06-07
>
> ## We need to create an ordinal factor from these indicators
> ## completed elementary school
> es <- c(0, 0, 1, 0, 1, 0, 1, 1)
> ## completed high school
> hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
> ## completed college graduate
> cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
>
> ed <- ifelse(cg == 1, 3,
>  ifelse(hs == 1, 2,
> ifelse(es == 1, 1, 0)))
>
> edf <- factor(ed, levels = 0:3,  labels = c("none", "es", "hs", "cg"))
> data.frame(es, hs, cg, ed, edf)
>
> ## Looks OK, but what if there are missings?
> es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
> hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
> cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
> ed <- ifelse(cg == 1, 3,
>  ifelse(hs == 1, 2,
> ifelse(es == 1, 1, 0)))
> cbind(es, hs, cg, ed)
>
> ## That's bad, ifelse returns NA too frequently.
> ## Revise (becoming tedious!)
>
> ed <- ifelse(!is.na(cg) & cg == 1, 3,
>  ifelse(!is.na(hs) & hs == 1, 2,
> ifelse(!is.na(es) & es == 1, 1,
>ifelse(is.na(es), NA, 0))))
> cbind(es, hs, cg, ed)
>
>
> ## Does the project director want us to worry about
> ## logical inconsistencies, such as es = 0 but cg = 1?
> ## I hope not.
>
> Thanks in advance, I hope you are having a nice summer.
>
> pj
>
> --
> Paul E. Johnson
> Professor, Political Science  Assoc. Director
> 1541 Lilac Lane, Room 504  Center for Research Methods
> University of Kansas University of Kansas
> http://pj.freefaculty.org   http://quant.ku.edu
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] generate simple function with pre-defined constants

2013-06-06 Thread Joshua Wiley
On Thu, Jun 6, 2013 at 9:05 AM, William Dunlap  wrote:
>
> I said the force was 'required' in the sense that without it
> the function will fail to do what you want in some situations.

With the Force on your side, functions always do what you want.
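
In case a concrete illustration helps later readers, here is the classic lazy-evaluation pitfall that force() guards against (toy example, not from the original thread):

## Without force(), the argument is a promise evaluated at first use,
## by which time the loop variable has its final value.
make_adder <- function(k) function(x) x + k
adders <- list()
for (k in 1:3) adders[[k]] <- make_adder(k)
adders[[1]](10)  # 13, not 11: the promise saw the final value of k

make_adder2 <- function(k) { force(k); function(x) x + k }
adders2 <- list()
for (k in 1:3) adders2[[k]] <- make_adder2(k)
adders2[[1]](10)  # 11, as intended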

>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com



Re: [R] Bootstrapped 1-sided confidence intervals

2013-05-07 Thread Joshua Wiley
Hi Janh,

I do not believe that a "one sided" or "two sided" bootstrap makes any
sense.  It is just a resampling procedure that constructs an empirical
distribution.  If you wish to examine the point where 5% of the
distribution falls in one tail instead of the ends of both tails being 5%,
you could simply look at the 90% CI, which will have 5% below and 5% above.
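
For example, a sketch with the boot package on simulated data:

## A one-sided 95% lower confidence bound for the mean is the lower
## endpoint of the two-sided 90% percentile interval.
library(boot)
set.seed(42)
x <- rnorm(100, mean = 5)
b <- boot(x, function(d, i) mean(d[i]), R = 1999)
ci <- boot.ci(b, conf = 0.90, type = "perc")
ci$percent[4]  # lower endpoint = one-sided 95% lower bound
ci$percent[5]  # upper endpoint = one-sided 95% upper bound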

Cheers,

Josh



On Tue, May 7, 2013 at 8:21 PM, Janh Anni  wrote:

> Hello All,
>
> Does anyone know if there’s a function for computing 1-sided confidence
> intervals for bootstrapped statistics (mean, median, percentiles,
> etc.)?  Thanks
> in advance
>
> Janh
>
>
>


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Mixed Modeling in lme4

2013-04-30 Thread Joshua Wiley
Hi Indrajit,

In your first SAS code, change to type=un.  type=cs imposes the (somewhat
dubious) assumption that the variances of the intercept and slope are
equal.  If you are using lme4, all random effects in a single block (e.g.,
in (1 + month | batch), the 1 is the random intercept and month the random
slope) will have an unstructured (freely estimated) variance-covariance
matrix.
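
As an illustration (using the built-in sleepstudy data rather than your rc data), the two random-effects structures in lme4 look like:

## (1 + Days | Subject): unstructured -- intercept and slope variances
## plus their covariance.
## (1 | Subject) + (0 + Days | Subject): independent intercept and
## slope, no covariance.
library(lme4)
m_un  <- lmer(Reaction ~ Days + (1 + Days | Subject),
              data = sleepstudy, REML = FALSE)
m_ind <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
              data = sleepstudy, REML = FALSE)
VarCorr(m_un)   # shows the intercept-slope correlation
VarCorr(m_ind)  # no correlation estimated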

Cheers,

Josh



On Mon, Apr 29, 2013 at 11:26 PM, Indrajit Sengupta <
indrajitsg2...@gmail.com> wrote:

> Hi All,
>
> I am trying to shift from running mixed models in SAS using PROC MIXED
> to using lme4 package in R. In trying to match the coefficients of R
> output to that of SAS output, I came across this problem.
>
> The dataset I am using is this one:
>
>
> http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect034.htm
>
> If I run the following code:
>
> proc mixed data=rc method=ML covtest;
> class Batch;
> model Y = Month / s;
> random Int Month / type=cs sub=Batch s;
> run;
>
> The Fixed effect coefficients match with that of R. But the random
> effect does not. Here is the R code:
>
> rc <- read.table('rc.csv', sep = ',', header=T, na.strings=".")
>
> m1 <- lmer(formula = Y ~ Month + (Month|Batch), data = rc, REML = F)
>
> summary(m1)
>
> fixef(m1)
>
> ranef(m1)
>
> But if I change the SAS code as follows:
>
> proc mixed data=rc method=ML covtest;
> class Batch;
> model Y = Month / s;
> random Int / type=cs sub=Batch s;
> run;
>
> and the R code as follows:
>
> m2 <- lmer(formula = Y ~ Month + (1|Batch), data = rc, REML = F)
>
> summary(m2)
>
> fixef(m2)
>
> ranef(m2)
>
> both fixed and random effect coefficients match. I am unable to
> understand this discrepancy. Am I wrongly specifying the model in the
> first case?
>
> It would be helpful if someone can throw some light on this.
>
> Regards,
> Indrajit
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Polynomial Regression and NA coefficients in R

2013-04-27 Thread Joshua Wiley
Hi Lucas,

You may find some of these examples useful (towards the end):

http://elkhartgroup.com/rmodels.php

For example, in your case you could use B-splines instead of an 11th
order polynomial, or thin plate regression splines from the mgcv
package.  I will also humbly suggest that ggplot2, overlaying observed
values with predicted lines, is a more elegant way to visualize the data
and the results.
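
A quick sketch of both alternatives on simulated data (since I do not have your x and y):

## B-splines avoid the rank deficiency (NA coefficients) of a raw
## high-order polynomial; mgcv picks the smoothness automatically.
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

library(splines)
fit_bs <- lm(y ~ bs(x, df = 6))     # B-spline basis, full rank

library(mgcv)
fit_tp <- gam(y ~ s(x, bs = "tp"))  # thin plate regression spline
summary(fit_tp)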

Cheers,

Josh



On Sat, Apr 27, 2013 at 8:48 AM, Lucas Holland wrote:

> Hey all,
>
> I'm performing polynomial regression. I'm simulating x values using
> runif() and y values using a deterministic function of x and rnorm().
>
> When I perform polynomial regression like this:
>
> fit_poly <- lm(y ~ poly(x,11,raw = TRUE))
>
> I get some NA coefficients. I think this is due to the high correlation
> between say x and x^2 if x is distributed uniformly on the unit interval
> (as is the case in my example). However, I'm still able to plot a
> polynomial fit like this:
>
> points(x, predict(fit_poly), type="l", col="green", lwd=2)
>
> What I'm interested in finding out is, how R handles the NA values I get
> for some coefficients (and how that affects the polynomial I see plotted).
>
> Thanks!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] double exponential regression R

2013-04-21 Thread Joshua Wiley
Hi CR,

Assuming your data are stored in "d", perhaps something like this:

m <- nls(log(proc) ~ k + a*(b^cls), start = list(a = 2, b = .25, k =
0), data = d)
summary(m)
# gives
## Formula: log(proc) ~ k + a * (b^cls)
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)
## a  5.24617    0.50905  10.306 1.90e-09 ***
## b  0.69640    0.07718   9.023 1.73e-08 ***
## k  1.93055    0.42331   4.561  0.00019 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.6967 on 20 degrees of freedom
## Number of iterations to convergence: 8
## Achieved convergence tolerance: 4.268e-06

and you can examine the fit:

plot(fitted(m), log(d$proc))

which looks not too bad (see attached)

Cheers,

Joshua


On Sun, Apr 21, 2013 at 12:34 PM, catalin roibu  wrote:
> Hello all!
> I have a problem with a double exponential equation.
> These are my data:
> structure(list(proc = c(1870.52067384719,
> 766.789388745793, 358.701545859122,  237.113777545511, 43.2726259059654,
> 148.985133316262, 92.6242882655781,  88.4521557193262, 56.6404686159112,
> 27.0374477259404, 34.3347291080268,  18.3226992991316, 15.2196612445747,
> 5.31600719692165, 16.7015717397302,  16.3923389973684, 24.2702542054496,
> 21.247247993673, 18.3070717608672,  2.8811892177331, 3.18018869564679,
> 8.74204132937479, 7.11596966047229 ), cls = c(0.25, 0.5, 0.75, 1, 1, 1.5,
> 2, 2.5, 3, 3.5, 4, 4.5,  5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10)),
> .Names = c("proc",  "cls"), row.names = c("0.25", "0.5", "0.75", "1", "11",
> "1.5",  "2", "2.5", "3", "3.5", "4", "4.5", "5", "5.5", "6", "6.5", "7",
>  "7.5", "8", "8.5", "9", "9.5", "10"), class = "data.frame")
> I want to compute a double exponential equation like this:
> proc=a*exp(b*class)+c*exp(d*class)
>
> Thank you very much for your help!
>
> Best regards!
>
> CR
> --
> ---
> Catalin-Constantin ROIBU
> Lecturer PhD, Forestry engineer
> Forestry Faculty of Suceava
> Str. Universitatii no. 13, Suceava, 720229, Romania
> office phone +4 0230 52 29 78, ext. 531
> mobile phone   +4 0745 53 18 01
>        +4 0766 71 76 58
> FAX:+4 0230 52 16 64
> silvic.usv.ro
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com


Re: [R] Pls help to prevent my post from being indexed on google

2013-04-21 Thread Joshua Wiley
ec>0])
> DiffVec <- w5[,p] - w1[,p]
> FiveMinusOne[p] <- length(DiffVec[DiffVec > 0])
> DiffVec <- w10[,p] - w5[,p]
> TenMinusFive[p] <- length(DiffVec[DiffVec > 0])}
> diff <- cbind(FiveMinusOne, TenMinusOne)
> diff <- cbind(diff, TenMinusFive)
> sn <- seq(1, length(lamdaseq))
> f2 <- cbind(sn, diff)
> f2
> ##END
>



--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] Full Documentation of R analysis

2013-04-04 Thread Joshua Wiley
Hi Will,

I think that save.image() is the way to go.  Regarding the directory,
it's possible to pass arguments to an R script from the command line,
or, as long as the R script is in the directory you want the image
saved in, just save it to the current working directory.
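
For example, something like this at the end of myscript.R (the name is hard-coded here, since to my knowledge R CMD BATCH does not expose the script file name to commandArgs()):

## Save the workspace next to myscript.Rout under a matching name.
script.name <- "myscript"   # edit to match the script file
save.image(file = paste0(script.name, ".RData"))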

Cheers,

Josh


On Thu, Apr 4, 2013 at 7:18 AM,   wrote:
>
>Dear all,
>
>I would like to fully document an analysis in R including script, output 
> and
>workspace data. To do this I currently run under Linux
>
>$ R CMD BATCH myscript.R
>
>This command combines and saves only the script commands and the output
>./myscript.Rout. To save the workspace, I have to explicitly specify the
>path in the R file and apply save.image(...).
>
>How can I save the full workspace with the same naming scheme under the 
> same
>path, e.g. ./myscript.RData similar to ./myscript.Rout?
>
>Note: R CMD BATCH myscript.R --save does not work.
>
>Best regards and thanks in advance,
>
>Will



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] GAM model with interactions between continuous variables and factors

2013-03-25 Thread Joshua Wiley
Yep that's exactly right! :)

On Mon, Mar 25, 2013 at 6:22 PM, Antonio P. Ramos
 wrote:
> Just to clarify: I should include wealth - the categorical variable - as a
> fixed effects *and* within the smooth using the argument "by". It that
> correct? thanks a bunch
>
>
> On Mon, Mar 25, 2013 at 6:18 PM, Joshua Wiley 
> wrote:
>>
>> Hi Antonio,
>>
>> If wealth is a factor variable, you should include the main effect in
>> the model, as the smooths will be centered.
>>
>> Cheers,
>>
>> Josh
>>
>>
>>
>> On Mon, Mar 25, 2013 at 6:09 PM, Antonio P. Ramos
>>  wrote:
>> > Hi all,
>> >
>> > I am not sure how to handle interactions with categorical predictors in
>> > the
>> > GAM models. For example what is the different between these bellow two
>> > models. Tests are indicating that they are different but their
>> > predictions
>> > are essentially the same.
>> >
>> > Thanks a bunch,
>> >
>> >> gam.1 <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+
>> > +s(birth_year,by=wealth) +
>> > ++ wealth + sex +
>> > +residence+ maternal_educ + birth_order,
>> > +  ,data=rwanda2,family="binomial")
>> >>
>> >> gam.2 <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+
>> > +s(birth_year,by=wealth) +
>> > + + sex +
>> > +residence+ maternal_educ + birth_order,
>> > +  ,data=rwanda2,family="binomial")
>> >>
>> >> anova(gam.1,gam.2,test="Chi")
>> > Analysis of Deviance Table
>> >
>> > Model 1: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) +
>> > s(birth_year,
>> > by = wealth) + +wealth + sex + residence + maternal_educ +
>> > birth_order
>> > Model 2: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) +
>> > s(birth_year,
>> > by = wealth) + +sex + residence + maternal_educ + birth_order
>> >   Resid. Df Resid. Dev  Df Deviance  Pr(>Chi)
>> > 1 28986  24175
>> > 2 28989  24196 -3.6952  -21.378 0.0001938 ***
>> > ---
>> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >> str(rwanda2)
>> > 'data.frame': 29027 obs. of  18 variables:
>> >  $ CASEID: Factor w/ 10718 levels "1  5  2",..: 289
>> > 2243 7475 9982 6689 10137 7426 428 8415 10426 ...
>> >  $ mortality.under.2 : int  0 1 0 0 0 0 0 0 1 0 ...
>> >  $ maternal_age_disct: Factor w/ 3 levels "-25","+35","25-35": 1 1 1 1 1
>> > 1
>> > 3 1 3 1 ...
>> >  $ maternal_age  : int  18 21 21 23 21 22 26 18 27 21 ...
>> >  $ time  : int  3 3 3 3 3 3 3 3 3 3 ...
>> >  $ child_mortality   : num  0.232 0.232 0.232 0.232 0.232 ...
>> >  $ democracy : Factor w/ 1 level "dictatorship": 1 1 1 1 1 1 1 1
>> > 1
>> > 1 ...
>> >  $ wealth: Factor w/ 5 levels "Lowest quintile",..: 2 4 1 4
>> > 5 1
>> > 4 1 4 5 ...
>> >  $ birth_year: int  1970 1970 1970 1970 1970 1970 1970 1970 1970
>> > 1970 ...
>> >  $ residence : Factor w/ 2 levels "Rural","Urban": 1 1 1 1 2 1 1
>> > 1
>> > 1 2 ...
>> >  $ birth_order   : int  1 2 2 5 1 1 3 1 2 2 ...
>> >  $ maternal_educ : Factor w/ 4 levels "Higher","No education",..: 3
>> > 2 2
>> > 3 4 2 3 2 2 2 ...
>> >  $ sex   : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 1 2
>> > 2
>> > 2 2 ...
>> >  $ quinquennium  : Factor w/ 7 levels "00-5's","70-4",..: 2 2 2 2 2
>> > 2 2
>> > 2 2 2 ...
>> >  $ time.1: int  3 3 3 3 3 3 3 3 3 3 ...
>> >  $ new_time  : int  0 0 0 0 0 0 0 0 0 0 ...
>> >  $ maternal_age_c: num  -6.12 -3.12 -3.12 -1.12 -3.12 ...
>> >  $ birth_year_c  : num  -14.8 -14.8 -14.8 -14.8 -14.8 ...
>> >
>> >
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://joshuawiley.com/
>> Senior Analyst - Elkhart Group Ltd.
>> http://elkhartgroup.com
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



Re: [R] GAM model with interactions between continuous variables and factors

2013-03-25 Thread Joshua Wiley
Hi Antonio,

If wealth is a factor variable, you should include its main effect in
the model, because the by-variable smooths are centered and therefore
cannot carry the mean differences between the factor levels.
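
A minimal sketch of this point, using hypothetical simulated data (the
variable names and data are made up; only the mgcv package is assumed):

```r
library(mgcv)

set.seed(1)
d <- data.frame(x = runif(200), g = factor(rep(c("a", "b"), each = 100)))
## the two groups differ both in level and in the shape of f(x)
d$y <- ifelse(d$g == "a", 0, 2) +
  sin(2 * pi * d$x * as.numeric(d$g)) +
  rnorm(200, sd = 0.3)

## each s(x, by = g) smooth is centered around zero, so the group
## means must be carried by the factor main effect 'g'
m <- gam(y ~ g + s(x, by = g), data = d)
summary(m)
```

Dropping `g` from the formula leaves the level difference between the
groups with nowhere to go, which is why the two models in the question
below test as different.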

Cheers,

Josh



On Mon, Mar 25, 2013 at 6:09 PM, Antonio P. Ramos
 wrote:
> Hi all,
>
> I am not sure how to handle interactions with categorical predictors in
> GAM models. For example, what is the difference between the two models
> below? Tests indicate that they are different, but their predictions
> are essentially the same.
>
> Thanks a bunch,
>
>> gam.1 <- gam(mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) +
> + s(birth_year, by = wealth) + wealth + sex +
> + residence + maternal_educ + birth_order,
> + data = rwanda2, family = "binomial")
>>
>> gam.2 <- gam(mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) +
> + s(birth_year, by = wealth) + sex +
> + residence + maternal_educ + birth_order,
> + data = rwanda2, family = "binomial")
>>
>> anova(gam.1,gam.2,test="Chi")
> Analysis of Deviance Table
>
> Model 1: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) +
> s(birth_year,
> by = wealth) + +wealth + sex + residence + maternal_educ +
> birth_order
> Model 2: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) +
> s(birth_year,
> by = wealth) + +sex + residence + maternal_educ + birth_order
>   Resid. Df Resid. Dev  Df Deviance  Pr(>Chi)
> 1 28986  24175
> 2 28989  24196 -3.6952  -21.378 0.0001938 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> str(rwanda2)
> 'data.frame': 29027 obs. of  18 variables:
>  $ CASEID: Factor w/ 10718 levels "1  5  2",..: 289 2243 7475 9982 6689 10137 7426 428 8415 10426 ...
>  $ mortality.under.2 : int  0 1 0 0 0 0 0 0 1 0 ...
>  $ maternal_age_disct: Factor w/ 3 levels "-25","+35","25-35": 1 1 1 1 1 1 3 1 3 1 ...
>  $ maternal_age  : int  18 21 21 23 21 22 26 18 27 21 ...
>  $ time  : int  3 3 3 3 3 3 3 3 3 3 ...
>  $ child_mortality   : num  0.232 0.232 0.232 0.232 0.232 ...
>  $ democracy : Factor w/ 1 level "dictatorship": 1 1 1 1 1 1 1 1 1 1 ...
>  $ wealth: Factor w/ 5 levels "Lowest quintile",..: 2 4 1 4 5 1 4 1 4 5 ...
>  $ birth_year: int  1970 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
>  $ residence : Factor w/ 2 levels "Rural","Urban": 1 1 1 1 2 1 1 1 1 2 ...
>  $ birth_order   : int  1 2 2 5 1 1 3 1 2 2 ...
>  $ maternal_educ : Factor w/ 4 levels "Higher","No education",..: 3 2 2 3 4 2 3 2 2 2 ...
>  $ sex   : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 1 2 2 2 2 ...
>  $ quinquennium  : Factor w/ 7 levels "00-5's","70-4",..: 2 2 2 2 2 2 2 2 2 2 ...
>  $ time.1: int  3 3 3 3 3 3 3 3 3 3 ...
>  $ new_time  : int  0 0 0 0 0 0 0 0 0 0 ...
>  $ maternal_age_c: num  -6.12 -3.12 -3.12 -1.12 -3.12 ...
>  $ birth_year_c  : num  -14.8 -14.8 -14.8 -14.8 -14.8 ...
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About name of list elements

2013-03-25 Thread Joshua Wiley
Hi Max,

This is known as partial matching.  When using `$`, if R can uniquely
match the element name based on what is typed, it returns that element.
Thus, in your example, foo uniquely matches foobar; but if the list also
contained foobox, $foo would no longer be a unique match and would
return NULL.
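
The behaviour can be seen directly; note that `[[` (with its default
`exact = TRUE`) does not partial-match:

```r
l <- list()
l$foobar <- 1

l$foo          # partial match against "foobar": returns 1
l[["foo"]]     # exact matching by default: returns NULL

l$foobox <- 2
l$foo          # "foo" now matches two names, so NULL is returned
```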

Cheers,

Josh


On Mon, Mar 25, 2013 at 12:21 PM, Andrew Lin  wrote:
> Hi folks,
>
> I am a beginner with R. While trying lists, I did the following:
>
>> l <- list()
>> l$foo
> NULL
>> l$foobar <- 1
>> l$foo
> [1] 1
>
> Apparently, foo and foobar are different names for elements in the list
> (actually foo does not exist). But why are they sharing the same value?
>
> Thanks a lot!
>
> Max
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] identifying and drawing from T distribution

2013-03-15 Thread Joshua Wiley
Hi Ivo,

Try something like this:

rt(1e5, df = 2.6, ncp = (1 - 0) * sqrt(2.6 + 1)/2)

The NCP comes from the mean, N, and SD.  See ?rt

Cheers,

Josh



On Fri, Mar 15, 2013 at 6:58 PM, ivo welch  wrote:
> dear R experts:
>
> fitdistr suggests that a t with a mean of 1, an sd of 2, and 2.6
> degrees of freedom is a good fit for my data.
>
> Now I want to draw random samples from this distribution. Should I
> draw from a uniform distribution and use the distribution function
> itself for the transform, or is there a better way to do this? There
> is a non-centrality parameter ncp in rt, but one parameter ncp cannot
> subsume two (m and s), of course. My first attempt was to draw
> rt(..., df=2.63)*s+m, but this was obviously not it.
>
> advice appreciated.
>
> /iaw
>
> 
> Ivo Welch (ivo.we...@gmail.com)
> http://www.ivo-welch.info/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] filling an array

2013-02-23 Thread Joshua Wiley
Hi Jannetta,

Try this:

rep(c(0, v), each = 10)

See ?rep for details
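
For instance, if the vector must have some fixed total length n (n = 50
here is purely illustrative), the 10-zeros/10-values pattern can be
recycled with the `length.out` argument:

```r
v <- 14
n <- 50  # desired total length (illustrative)

## inner rep() builds c(0 x 10, 14 x 10); outer rep() recycles it to length n
I <- rep(rep(c(0, v), each = 10), length.out = n)

I[1:10]   # ten zeros
I[11:20]  # ten fourteens
```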

Cheers,

Josh


On Sat, Feb 23, 2013 at 7:56 PM, Jannetta Steyn  wrote:
> Hi All
>
> I'm just wondering if there is a quick way of filling an array with the
> following.
>
> I want to declare an array I with a specific length and then alternately
> fill it with 10 zeros and 10 specified values:
>
> v<- 14
> I <- c(0,length(t))
>
> But instead of just filling I with 0, I want 10 zeros and then 10 fourteens
> (i.e. the value in v).
>
> Hope that makes sense.
>
> Regards
> Jannetta
>
> --
>
> ===
> Web site: http://www.jannetta.com
> Email: janne...@henning.org
> ===
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with R CMD check --as-cran

2013-02-22 Thread Joshua Wiley
Hi Tao,

I wonder if you are on Windows?  If so, make sure you have qpdf
installed and that its location is on your PATH.
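
A quick way to check from within R whether qpdf is visible on the PATH:

```r
## returns the full path to the qpdf executable,
## or an empty string if qpdf is not found on the PATH
Sys.which("qpdf")
```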

Cheers,

Josh

On Fri, Feb 22, 2013 at 3:32 PM, Tao Wang  wrote:
> Hi Everyone!
>
> This is my first time using R-help. I am trying to do R CMD check before 
> uploading my package to CRAN.
>
> R CMD check --as-cran "my package folder".
>
> However, it spits out this warning:
>
> "pdf is needed for checks on size reduction of PDFs"
>
> I searched online but found no clue to solve this problem. Can someone tell 
> me what could be wrong with my package?
>
> Thanks a lot!
>
> Tao
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] foreach loop, stata equivalent

2013-02-18 Thread Joshua Wiley
; >
>> >   reg `var' `y'
>> >
>> > }
>> >
>> > }
>> >
>> >
>> >
>> > It's looping p1-p15, p1-p16, p1-p269, p2-p15, p2-p16,... p2-p269,...
>> > variable pairs.
>> >
>> >
>> >
>> > How can I write something similar in R?
>> >
>> > I 'tried' understanding the package.foreach but can't get it to work.
>> You do not need package foreach, which is aimed at a completely different
>> problem.
>>
>> R does not really have the syntactic equivalent of "varlist", but you can
>> easily do something like:
>> for(var in paste0("p", 1:14)) {
>> for(y in paste0("p", 15:269))
>> lm(yourData[[var]] ~ yourData[[y]]) }
>>
>> provided that yourData is the data frame in which the p* variables are
>> stored.
>>
>> There are probably more direct ways of doing the same thing and storing the
>> resulting lm objects in a list, but you did not state what you intend to do
>> with this enormous set of regressions...
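
One way to keep the fits is to store them in a named list, sketched here
with a small simulated data frame (the p* columns and loop ranges are
illustrative stand-ins for the 269-variable case):

```r
set.seed(1)
yourData <- as.data.frame(matrix(rnorm(100 * 6), nrow = 100,
                                 dimnames = list(NULL, paste0("p", 1:6))))

results <- list()
for (x in paste0("p", 1:2)) {        # outcome variables
  for (y in paste0("p", 3:6)) {      # predictor variables
    ## reformulate() builds e.g. p1 ~ p3 from the two names
    results[[paste(x, y, sep = " ~ ")]] <-
      lm(reformulate(y, response = x), data = yourData)
  }
}

length(results)  # 8 models, one per outcome/predictor pair
```

Using `reformulate()` with `data =` keeps the stored models tied to
proper formulas, which makes later calls to `summary()` or `coef()` on
each list element straightforward.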
>>
>>
>> Regards
>>
>> > Thanks for any help
>> >
>> > Nelissa
>> >
>> >
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] improving/speeding up a very large, slow simulation

2013-02-11 Thread Joshua Wiley
n, project.sd.max,
> length.out=length.out) # assume verification sd is the same as the project
> sd to simplify - seems a reasonable simpification
>
> number.verification.plots<- seq(number.verification.plots.min,
> number.verification.plots.max, length.out=length.out)
>
> verification.range <- seq(verification.mean.min, verification.mean.max,
> length.out=length.out)
>
> permut.grid<-expand.grid(number.strata.range, project.n.range,
> project.acreage.range, project.mean, project.sd.range,
> number.verification.plots, verification.range, allowed.deviation) # create
> a matrix with all combinations of the supplied vectors
>
> #assign names to the colums of the grid of combinations
> names.permut<-c("number.strata", "project.n.plots", "project.acreage",
> "project.mean", "project.sd", "number.verification.plots",
> "verification.mean", "allowed.deviation")
>
> names(permut.grid)<-names.permut # done
>
> combinations<-length(permut.grid[,1])
>
> size <-reps*combinations #need to know the size of the master matrix, which
> is the the number of replications * each combination of the supplied factors
>
> # we want a df from which to read all the data into the simulation, and
> record the results
> permut.col<-ncol(permut.grid)
> col.base<-ncol(permut.grid)+2
> base <- matrix(nrow=size, ncol=col.base)
> base <-data.frame(base)
>
> # supply the names
> names.base<-c("number.strata", "project.n.plots", "project.acreage",
> "project.mean", "project.sd", "number.verification.plots",
> "verification.mean", "allowed.deviation","success.fail", "plots.to.success")
>
> names(base)<-names.base
>
> # need to create index vectors for the base, master df
> ends <- seq(reps+1, size+1, by=reps)
> begins <- ends-reps
> index <- cbind(begins, ends-1)
> #done
>
> # next, need to assign the first 6 columns and number of rows = to the
> number of reps in the simulation to be the given row in the permut.grid
> matrix
>
> pb <- winProgressBar(title="Create base, big loop 1 of 2", label="0% done",
> min=0, max=100, initial=0)
>
> for (i in 1:combinations) {
>
> base[index[i,1]:index[i,2],1:permut.col] <- permut.grid[i,]
> #progress bar
>  info <- sprintf("%d%% done", round((i/combinations)*100))
> setWinProgressBar(pb, (i/combinations)*100, label=info)
> }
>
> close(pb)
>
> # now, simply feed the values replicated the number of times we want to run
> the simulation into the sequential.unpaired function, and assign the values
> to the appropriate columns
>
> out.index1<-ncol(permut.grid)+1
> out.index2<-ncol(permut.grid)+2
>
> #progress bar
> pb <- winProgressBar(title="fill values, big loop 2 of 2", label="0% done",
> min=0, max=100, initial=0)
>
> for (i in 1:size){
>
> scalar.base <- base[i,]
>  verification.plots <- rnorm(scalar.base$number.verification.plots,
> scalar.base$verification.mean, scalar.base$project.sd)
>  result<- sequential.unpaired(scalar.base$number.strata,
> scalar.base$project.n.plots, scalar.base$project.mean, scalar.base$
> project.sd, verification.plots, scalar.base$allowed.deviation,
> scalar.base$project.acreage, min.plots='default', alpha)
>
> base[i,out.index1] <- result[[6]][1]
> base[i,out.index2] <- result[[7]][1]
>  info <- sprintf("%d%% done", round((i/size)*100))
> setWinProgressBar(pb, (i/size)*100, label=info)
> }
>
> close(pb)
> #return the results
> return(base)
>
> }
>
> # I would like reps to = 1000, but that takes a really long time right now
> test5 <- simulation.unpaired(reps=5, project.acreage.range = c(99,
> 110,510,5100,11000), project.mean=100, project.n.min=10, project.n.max=100,
> project.sd.min=.01,  project.sd.max=.2, verification.mean.min=100,
> verification.mean.max=130, number.verification.plots.min=10,
> number.verification.plots.max=50, length.out = 5)
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Migrating R packages from svn/R-Forge to git/Github

2013-02-10 Thread Joshua Wiley
Hi Michael,

Just to add to what Yihui suggested, I have used Cygwin on all my
Windows machines for years to provide a Linux-like environment.  It is
stable and feature-rich.  You can easily get git, svn, and
ruby --- I'm not sure about rubygems.  Anyway, with a bit of work, I
suspect you could use Cygwin to get an environment on Windows that
would allow you to run svn2git to handle the conversions.  If you're
willing to tolerate a bit of a mess or drop some history, git svn will
work nicely too.

http://www.cygwin.com/

Cheers,

Josh


On Sun, Feb 10, 2013 at 10:22 AM, Yihui Xie  wrote:
> In the past, GitHub allowed one to import from an existing SVN
> repository automatically via its web interface, but I just checked
> and that seems to have gone. What is left is:
> https://help.github.com/articles/importing-from-subversion Perhaps it
> is best for you to do the conversion under Linux and then work under
> Windows.
>
> Regards,
> Yihui
> --
> Yihui Xie 
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
>
>
> On Sun, Feb 10, 2013 at 12:02 PM, Michael Friendly  wrote:
>> [I'm not sure if this post should go to R-devel, but thought I'd start here
>> for wider readership.]
>>
>> I have a  collection of R packages that I maintain using the eclipse/StatET
>> IDE
>> (mainly on Windows) with SVN on the R-Forge system,
>> https://r-forge.r-project.org/users/friendly/
>>
>> I'd like to consider moving these to git projects on GitHub, but I'm not
>> sure how best to get started.  I'm not sure if I can clone/copy
>> a project from the R-Forge svn directly to github, or if I have to first
>> create a local git repo from the R-Forge svn and then push to github
>> from there.
>>
>> However, there are linux tools for this (git-svn and svn2git), but I haven't
>> found equivalents
>> for Windows.  I suppose I could do this initial part on my linux server
>> where I now use git and
>> github for other projects.
>>
>> I'm also contemplating moving from eclipse/StatET to RStudio for
>> development, but that would be
>> down the road a bit.
>>
>> Has anyone made this transition or have suggestions for how to ease it?
>> What are the
>> gotchas to watch out for?
>>
>> -Michael
>>
>>
>>
>> -- Michael Friendly
>> Email: friendly AT yorku DOT ca
>> Professor, Psychology Dept. & Chair, Quantitative Methods York University
>> Voice: 416 736-2100 x66249
>> Web: http://www.datavis.ca
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] decimal places in R2HTML

2013-01-26 Thread Joshua Wiley
 Realized I did not reply to this list.

On Sat, Jan 26, 2013 at 7:54 PM, Joshua Wiley  wrote:
>  Hi Erin,
>
> Most packages creating output for inclusion in pages, reports, books,
> etc. do some rounding as R's default level of printed precision tends
> to be overkill.  R2HTML is no different, but you can control this.  To
> see what it currently is:
>
> getOption("R2HTML.format.digits")
>
> and you can set it via:
>
> options("R2HTML.format.digits" = 4)
>
>
> after which, I get:
>
>> HTML(confint(aov(Sepal.Length ~ Species, iris)), file=stdout())
>
>                      2.5 %   97.5 %
> (Intercept)         4.8621   5.1499
> Speciesversicolor   0.7265   1.1335
> Speciesvirginica    1.3785   1.7855
>
> Cheers,
>
> Josh
>
>
> On Sat, Jan 26, 2013 at 7:36 PM, Erin Hodgess  wrote:
>> Dear R People:
>>
>> I have an AOV model that I get confidence intervals from via
>>
>>> confint(chick1.aov1)
>> 2.5 %   97.5 %
>> trtA 1.472085 1.607915
>> trtB 1.512085 1.647915
>> trtC 1.328751 1.464582
>>>
>>
>> I am using R2HTML to produce HTML output.  However, the HTML code
>> itself just has rounded values, i.e., 1.5 and 1.6.
>>
>> Has anyone run across this, please?
>> Any suggestions would be much appreciated.
>> Sincerely,
>> Erin
>>
>>
>> --
>> Erin Hodgess
>> Associate Professor
>> Department of Computer and Mathematical Sciences
>> University of Houston - Downtown
>> mailto: erinm.hodg...@gmail.com
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/



--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SEM validation: Cross-Validation vs. Bootstrapping

2012-11-01 Thread Joshua Wiley
r life may become quite
painful.  You will then need to combine the results from multiply
imputed datasets as well as from the bootstrap.  I know of no canned
tools for this so you'll be doing a lot on your own.  I would suggest
instead that you use maximum likelihood estimation where the
likelihood function allows missingness (sometimes called full
information maximum likelihood).  Both the lavaan and OpenMx package
can handle this.  If you have additional variables you think may
predict missingness, you should condition on those by adding them into
the model as auxiliary variables.  This complicates the assessment of
fit somewhat, as they contribute but you do not care about their
contribution, so you have to parcel it out; still, it is a good way to
deal with missingness.  Craig Enders has a relatively nice book on
missing data approaches that includes SEM.

If you stay in Mplus, it can natively bootstrap most things, if it
cannot I have written R functions to help it out.  I have been writing
a mediation tutorial that is available here:
http://joshuawiley.com/statistics/medanal.html

You can find the source code for that page here:
https://github.com/jwiley as well as the source for the semutils
package which contains many of the interface functions to link R to
Mplus.  I should point out that semutils is not on CRAN and will not
be, as I have teamed up with Michael Hallquist who wrote the
MplusAutomation package and will be folding the additional
functionality of semutils into MplusAutomation.  However, that may
take some time, so in the meantime, source is available from github.

Cheers,

Josh


>
> Thanks,
>
> Paul
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] change lm log(x) to glm poisson

2012-10-28 Thread Joshua Wiley
Hi Elaine,

If you want identical models, you need to use the same family and then
the formula is the same.  Here is an example with a built in dataset:


## these two are identical
> coef(lm(mpg ~ hp + log(wt), data = mtcars))
 (Intercept)   hp  log(wt)
 38.86095585  -0.02808968 -13.06001270
> coef(glm(mpg ~ hp + log(wt), data = mtcars, family = gaussian))
 (Intercept)   hp  log(wt)
 38.86095585  -0.02808968 -13.06001270

## not identical
> coef(glm(mpg ~ hp + wt, data = mtcars, family = gaussian(link = "log")))
(Intercept)  hp  wt
 3.88335638 -0.00173717 -0.20851238

I show the log link because the poisson family defaults to a log link,
but that is equivalent to:
log(E(y)) = Xb

where X is your design matrix (intercept, A, B, log(C), log(D) for
you).  In short, the link function operates on the outcome, not the
predictors, so even though the poisson family uses a log link, it will
not yield the same results as log-transforming two of your
predictors.
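
A small simulated contrast of the two (the data and coefficient values
are purely illustrative): with family = poisson the log link always acts
on E(y), while wrapping a predictor in log() is a separate choice.

```r
set.seed(1)
d <- data.frame(x = runif(200, 1, 10))
d$y <- rpois(200, lambda = exp(0.2 + 0.3 * log(d$x)))

## log link on the outcome, predictor entered raw:
##   log(E(y)) = b0 + b1 * x
coef(glm(y ~ x, data = d, family = poisson))

## log link on the outcome AND a log-transformed predictor:
##   log(E(y)) = b0 + b1 * log(x)
coef(glm(y ~ log(x), data = d, family = poisson))
```

The two fits have different design matrices and will generally give
different coefficients and fitted values, which is the point being made
above.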

I do not have any online references off the top of my head, but it
seems like you may be well served by reading some about generalized
linear models and the concept of link functions.

Cheers,

Josh


On Sun, Oct 28, 2012 at 8:01 PM, Elaine Kuo  wrote:
>
> Hello list,
>
> I am running a regression using
>
> lm(Y~A+B+log(C)+log(D))
>
>
> Now, I would like to test if glm can produce similar results.
> So the code was revised as
>
> glm(Y~A+B+C+D, family=poisson) (code 1)
>
>
> However, I found some examples using glm in place of lm.
> They suggest that the code should be revised to
> glm(Y~A+B+log(C)+log(D), family=poisson) (code 2)
>
> Please kindly advise which  code is correct.
> Thank you.
>
> Elaine
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hausman test in R

2012-10-28 Thread Joshua Wiley
Hi,

I can think of no reason a Hausman test could not be used with OLS---it
is a comparison of vectors of coefficients from different models that
are assumed, under certain conditions, to produce similar estimates.
Dissimilarity is taken as evidence that some or all of the conditions
required for the two models to yield similar parameters do not hold.
I suggest you look at the plm and systemfit packages.  They have many
functions for OLS, 2SLS, tests of endogeneity, etc.  The plm (and maybe
systemfit?) package also has a vignette, which is well worth reading.  It
has a lot of useful information on the code, and examples of comparing
different types of models that you may find instructive.
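
For instance, plm's phtest() runs the Hausman comparison of fixed-
versus random-effects coefficient vectors, shown here with the Grunfeld
data bundled with the package (plm is assumed to be installed):

```r
library(plm)
data("Grunfeld", package = "plm")

fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")  # fixed effects
re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")  # random effects

## Hausman test: under H0 both estimators are consistent and
## their coefficient vectors should be similar
phtest(fe, re)
```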

Hope this helps,

Josh


On Sun, Oct 28, 2012 at 1:33 PM, fxen3k  wrote:

> Hi there,
>
> I am really new to statistics in R and statistics itself as well.
> My situation: I ran a lot of OLS regressions with different independent
> variables. (using the lm() function).
> After having done that, I know there is endogeneity due to omitted
> variables. (or perhaps due to any other reasons).
> And here comes the Hausman test. I know this test is used to identify
> endogeneity.
> But what I am not sure about is: "Can I use the Hausman test in a simple
> OLS
> regression or is it only possible in a 2SLS regression model?" "And if it
> is
> possible to use it, how can I do it?"
>
> Info about the data:
>
> data = lots of data :)
>
> x1 <- data$x1
> x2 <- data$x2
> x3 <- data$x3
> x4 <- data$x4
> y1 <- data$y1
>
> reg1 <- summary(lm(y1 ~ x1 + x2 + x3 + x4))
>
> Thanks in advance for any support!
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Poisson Regression: questions about tests of assumptions

2012-10-14 Thread Joshua Wiley
Hi Eiko,

This is not a direct response to your question, but I thought you
might find these pages helpful:

On negative binomial:

http://www.ats.ucla.edu/stat/R/dae/nbreg.htm

and zero inflated poisson:

http://www.ats.ucla.edu/stat/R/dae/zipoisson.htm

In general this page lists a variety of different models with examples:

http://www.ats.ucla.edu/stat/dae/

I wrote the code for most of those pages.  Some of your questions seem a
little more specific than would be addressed on those general
introductory pages, but if there are particular things you would like
to see done, I am open to hearing about it.

One thing I do want to mention is that for publication or presentation
purposes, graphs of predicted counts can be quite nice for readers---I
tried to show how to do that in all of those pages, and you can even
use bootstrapping (which I gave examples for at least in the zero
inflated models) to get robust confidence intervals.

Cheers,

Josh


On Sun, Oct 14, 2012 at 3:14 PM, Eiko Fried  wrote:
> Thank you for the detailed answer, that was really helpful.
> I did some extensive reading and calculating in the hours since your
> reply, and have a few (hopefully much better informed) follow-up questions.
>
>
> 1) In the vignette("countreg", package = "pscl"), LLH, AIC and BIC values
> are listed for the models negative binomial (NB), zero-inflated (ZI), ZI
> NB, hurdle NB, and Poisson (standard). And although I found a way to
> determine the LLH and df for all models, AIC and BIC values are not
> displayed by default, nor when using the code given in the vignette. How do
> I calculate these? (AIC is given by default in only 2 models, BIC in none.)
>
>
> 2) For the zero-inflated models, the first block of count model
> coefficients is only in the output in order to compare the changes,
> correct? That is, in my results in a paper I would only report the second
> block of (zero-inflation) model coefficients? Or do I misunderstand
> something? I am confused because in their large table to compare
> coefficients, they primarily compare the first block of coefficients (p. 18)
> R> fm <- list("ML-Pois" = fm_pois, "Quasi-Pois" = fm_qpois, "NB" = fm_nbin,
> + "Hurdle-NB" = fm_hurdle, "ZINB" = fm_zinb)
> R> sapply(fm, function(x) coef(x)[1:8])
>
>
> 3)
>
>> There are various formal tests for this, e.g., dispersiontest() in package
>> "AER".
>>
>
> I have to run 9 models - I am testing the influence of several predictors
> on different individual symptoms of a mental disorder, as "counted" in the
> last week (0=never in the last week, to 3 = on all day within the last
> week).
> So I'm regressing the same predictors onto 9 different symptoms in 9
> models.
>
> Dispersiontest() for these 9 models result in 3-4 overdispersed models
> (depending if testing one- or two-sided on p=.05 level), 2 underdispersed
> models, and 4 non-significant models. The by far largest dispersion value
> is 2.1 in a model is not overdispersed according to the test, but that's
> the symptom with 80% zeros, maybe that plays a role here.
>
> Does NB also make sense in underdispersed models?
>
> However, overdispersion can already matter before this is detected by a
>> significance test. Hence, if in doubt, I would simply use an NB model and
>> you're on the safe side. And if the NB's estimated theta parameter turns
>> out to be extremely large (say beyond 20 or 30), then you can still switch
>> back to Poisson if you want.
>
>
> Out of the 9 models, the 3 overdispersed NB models "worked" with theta
> estimates between 0.4 and 8.6. The results look just fine, so I guess NB is
> appropriate here.
>
> 4 other models (including the 2 underdispersed ones) gave warnings (theta
> iteration / alternation limit reached), and although the other values look
> ok (estimates, LLH, AIC), theta estimates are between 1.000 and 11.000.
> Could you recommend me what to do here?
>
> Thanks
> T
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] practical to loop over 2million rows?

2012-10-10 Thread Joshua Wiley
Hi Jay,

A few comments.

1) As you know, vectorize when possible.  Even if you must have a
loop, perhaps you can avoid nested loops or at least speed each
iteration.
2) Write your loop in a function and then byte compile it using the
cmpfun() function from the compiler package.  This can help
dramatically (though still not to the extent of vectorization).
3) If you really need to speed up some aspect and are stuck with a
loop, check out the R + Rcpp + inline + C++ tool chain, which allows
you to write inline C++ code, compile it fairly easily, and move data
to and from it.
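As a rough sketch of point 2), here is a toy comparison on a made-up task (a hand-rolled cumulative sum; the task and numbers are hypothetical, and on recent R versions the JIT byte-compiles functions automatically, so the gap is larger on older R):

```r
library(compiler)  # part of base R

# A deliberately loop-heavy implementation of cumsum()
slow_cumsum <- function(x) {
  out <- numeric(length(x))   # preallocate, as noted above
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total
  }
  out
}

fast_cumsum <- cmpfun(slow_cumsum)  # byte-compiled copy

x <- rnorm(1e5)
stopifnot(all.equal(fast_cumsum(x), cumsum(x)))

system.time(slow_cumsum(x))  # plain loop
system.time(fast_cumsum(x))  # typically faster
system.time(cumsum(x))       # vectorized: fastest by far
```

The same pattern applies to any loop body: wrap it in a function, compile with cmpfun(), and only reach for Rcpp when that is still too slow.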

Here is an example of a question I answered on SO where the OP had an
algorithm to implement in R and I ran through the R implementation,
the compiled R implementation, and one using Rcpp, and compared timings.
 It should give you a bit of a sense for what you are dealing with at
least.

You are correct that some things can help speed in R loops, such as
preallocation, and also, depending on what you are doing, some classes
are faster than others.  If you are working with a vector of integers,
don't store them as doubles in a data frame (that is a silly extreme,
but hopefully you get the point).

Good luck,

Josh

On Wed, Oct 10, 2012 at 1:31 PM, Jay Rice  wrote:
> New to R and having issues with loops. I am aware that I should use
> vectorization whenever possible and use the apply functions, however,
> sometimes a loop seems necessary.
>
> I have a data set of 2 million rows and have tried running a couple of loops
> of varying complexity to test efficiency. If I do a very simple loop, such
> as adding every item in a column, I get an answer quickly.
>
> If I use a nested ifelse statement in a loop it takes me 13 minutes to get
> an answer on just 50,000 rows. I am aware of a few methods to speed up
> loops. Preallocating memory space and compute as much outside of the loop
> as possible (or use create functions and just loop over the function) but
> it seems that even with these speed ups I might have too much data to run
> loops.  Here is the loop I ran that took 13 minutes. I realize I can
> accomplish the same goal using vectorization (and in fact did so).
>
> y<-numeric(length(x))
>
> for(i in 1:length(x))
>
> ifelse(!is.na(x[i]), y[i]<-x[i],
>
> ifelse(strataID[i+1]==strataID[i], y<-x[i+1], y<-x[i-1]))
>
> Presumably, complicated loops would be more intensive than the nested if
> statement above. If I write more efficient loops time will come down but I
> wonder if I will ever be able to write efficient enough code to perform a
> complicated loop over 2 million rows in a reasonable time.
>
> Is it useless for me to try to do any complicated loops on 2 million rows,
> or if I get much better at programming in R will it be manageable even for
> complicated situations?
>
>
> Jay
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any R function for data normalization?

2012-10-02 Thread Joshua Wiley
Hi Rui,

It doesn't really need one...

doit <- function(x) {(x - min(x, na.rm=TRUE))/(max(x,na.rm=TRUE) -
min(x, na.rm=TRUE))}

# use lapply to apply doit() to every column in a data frame
# mtcars is built into R
normed <- as.data.frame(lapply(mtcars, doit))
# verify that the range of every column is [0, 1]
lapply(normed, range)
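Since Rui asked about a matrix specifically, the same helper works column-wise via apply() (a small sketch with a made-up matrix; apply() returns a matrix here, unlike the lapply() route above):

```r
# same rescaling helper as above
doit <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

m <- matrix(rnorm(20), nrow = 5)   # toy matrix with columns c1..c4
normed_m <- apply(m, 2, doit)      # rescale each column to [0, 1]
apply(normed_m, 2, range)          # every column now spans 0 to 1
```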

Cheers,

Josh

On Tue, Oct 2, 2012 at 2:51 AM, Rui Esteves  wrote:
> Hello,
>
> I have a matrix with values, with columns c1..cn.
> I need the values to be normalized between 0 and 1 by column.
> Therefor, the 0 should correspond to the minimum value in the column c1 and
> 1 should correspond to the maximum value in the column c1.
> The remaining columns should be organized in the same way.
>
> Does a function in R exists for this purpose?
>
> Thanks,
> Rui
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behavior with weights in binomial glm()

2012-09-30 Thread Joshua Wiley
the same coefficients (maybe
>>> this is my incorrect assumption, but I can't see why it would be).
>>> Anyways, here's a minimum working example below:
>>>
>>>> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) )
>>>> glm( RESP ~ INDEP, family="binomial", data=d )
>>>
>>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = d)
>>>
>>> Coefficients:
>>> (Intercept)        INDEP
>>> -1.609   21.176
>>>
>>> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
>>> Null Deviance:  13.86
>>> Residual Deviance: 5.407    AIC: 9.407
>>>> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length )
>>>> colnames(dAgg) = c("RESP","INDEP","WT")
>>>> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT )
>>>
>>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg,
>>>weights = WT)
>>>
>>> Coefficients:
>>> (Intercept)        INDEP
>>> -1.609   20.975
>>>
>>> Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
>>> Null Deviance:  13.86
>>> Residual Deviance: 5.407    AIC: 9.407
>>
>> Those two results look very similar and it is with a data situation that 
>> seems somewhat extreme. The concern is for the 1% numerical  difference in 
>> the regression coefficient? Am I reading you correctly?
>>
>> --
>> David Winsemius, MD
>> Alameda, CA, USA
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bonferroni correction for multiple correlation tests

2012-08-29 Thread Joshua Wiley
On Wed, Aug 29, 2012 at 6:48 PM, R. Michael Weylandt
 wrote:
> On Wed, Aug 29, 2012 at 6:23 PM, Louise Cowpertwait
>  wrote:
>> Please can someone advise me how I can adjust correlations using 
>> bonferroni's correction? I am doing manny correlation tests as part of an 
>> investigation of the validity/reliability of a psychometric measure.
>> Help would be so appreciated!
>> Cheers,
>> Louise
>>
>
> The observed correlation is an immutable property of the observed data
> and the Bonferroni correction does not change it. Rather, it should be
> applied to the p-values of the observed correlations, much as it would
> be for any test. Those more statistically savvy than I might jump in,
> but I don't see why the p-values of, e.g., cor.test() would be
> adjusted in a different way than those of t.test().

I am happy to be corrected, but in specific situations I can see an
alternative correction method being appropriate.  For p variables, the
p x p correlation matrix has p * (p - 1) / 2 unique correlations;
however, once you know some of the correlations, you actually have
some information about the remaining ones.

Imagine the situation where p = 3 and cor(p1, p2) = .9, cor(p2, p3) =
0.  Is cor(p1, p3) free to be any possible correlation?  The answer of
course is no.  I am not sure what the exact rule would be, but this
constraint would hold, and grow, for larger matrices.
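The constraint is easy to check numerically: a candidate third correlation is admissible only if the full 3 x 3 matrix stays positive semi-definite (the candidate values 0.3 and 0.8 below are hypothetical):

```r
r12 <- 0.9; r23 <- 0
make_R <- function(r13) matrix(c(1,   r12, r13,
                                 r12, 1,   r23,
                                 r13, r23, 1), 3, 3)

# smallest eigenvalue > 0 means a valid correlation matrix
min(eigen(make_R(0.3), only.values = TRUE)$values)  # positive: admissible
min(eigen(make_R(0.8), only.values = TRUE)$values)  # negative: impossible

# the admissible range for cor(p1, p3) given the other two correlations
r12 * r23 + c(-1, 1) * sqrt((1 - r12^2) * (1 - r23^2))
```

With r12 = .9 and r23 = 0, cor(p1, p3) is confined to roughly (-0.44, 0.44), so a blanket Bonferroni over all p * (p - 1) / 2 tests is conservative.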

> Consider a similar case for a set of t-tests: you see some data and do
> the tests based on the sample means. It doesn't make any sense to
> "adjust the mean" of your data, rather you might wish to adjust your
> _interpretation_ of calculated p-values to account for multiple
> comparisons.
>
> Cheers,
> Michael
>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] define subset argument for function lm as variable?

2012-08-29 Thread Joshua Wiley
On Wed, Aug 29, 2012 at 3:56 AM, Milan Bouchet-Valat  wrote:
> Le mardi 21 août 2012 à 07:51 -0700, Joshua Wiley a écrit :
>> Hi Rainer,
>>
>> You could try:
>>
>> subs <- expression(dead==FALSE & recTreat==FALSE)
>>
>> lme(formula, subset = eval(subs))
>>
>> Not tested, but something along those lines should work.
> Out of curiosity, why isn't "subset" (and "weights", which is very
> similar in that regard) evaluated in the "data" environment, just like
> the formula? Is this for historical reasons, or are there drawbacks to
> such a feature?

I am not sure about weights offhand, but subset is evaluated in the
data environment; that is why that solution works.  The original
question was how to set up the expression as an object that was passed
to subset.  The trick is to avoid having the logical expression
evaluated when the object is created, which I avoided by using
expression(), and then in lme() forcing the evaluation of the object.
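A self-contained sketch of that pattern, using lm() and the built-in mtcars data in place of Rainer's model (lme() accepts subset the same way):

```r
subs <- expression(cyl > 4)   # stored, not yet evaluated

fit <- lm(mpg ~ wt, data = mtcars, subset = eval(subs))

# eval(subs) was resolved inside the data, so only matching rows were used
stopifnot(nobs(fit) == sum(mtcars$cyl > 4))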

>
> It seems very common to pass a data frame via the "data" argument, and
> use variables from it for subsetting and/or weighting.
>
>
> Regards
>
>
>> Josh
>>
>> On Tue, Aug 21, 2012 at 7:44 AM, Rainer M Krug  wrote:
>> > Hi
>> >
>> > I want to do a series of linear models, and would like to define the input
>> > arguments for lm() as variables. I managed easily to define the formula
>> > arguments in a variable, but I also would like to have the "subset" in a
>> > variable. My reasoning is, that I have the subset in the results object.
>> >
>> > So I would like to add a line like:
>> >
>> > subs <- dead==FALSE & recTreat==FALSE
>> >
>> > which obviously does not work as the expression is evaluated immediately. 
>> > Is it possible to do what I want to do here, or do I have to go back to using
>> >
>> > dat <- subset(dat, dead==FALSE & recTreat==FALSE)
>> >
>> > ?
>> >
>> >
>> >
>> > dat <- loadSPECIES(SPECIES)
>> > feff <- height~pHarv*year   # fixed effect in the model
>> > reff <- ~year|plant # random effect in the model, where
>> > year is the
>> > dat.lme <- lme(
>> >  fixed = feff,   # fixed effect in the
>> > model
>> >  data  = dat,
>> >  random = reff,  # random effect in the
>> > model
>> >  correlation = corAR1(form=~year|plant), #
>> >  subset = dead==FALSE & recTreat==FALSE, #
>> >  na.action = na.omit
>> >  )
>> > dat.lm <- lm(
>> > formula =  feff,  # fixed effect in the model
>> > data = dat,
>> > subset = dead==FALSE & recTreat==FALSE,
>> > na.action = na.omit
>> > )
>> >
>> > Thanks,
>> >
>> > Rainer
>> >
>> > --
>> > Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
>> > UCT), Dipl. Phys. (Germany)
>> >
>> > Centre of Excellence for Invasion Biology
>> > Stellenbosch University
>> > South Africa
>> >
>> > Tel :   +33 - (0)9 53 10 27 44
>> > Cell:   +33 - (0)6 85 62 59 98
>> > Fax :   +33 - (0)9 58 10 27 44
>> >
>> > Fax (D):+49 - (0)3 21 21 25 22 44
>> >
>> > email:  rai...@krugs.de
>> >
>> > Skype:  RMkrug
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide 
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] define subset argument for function lm as variable?

2012-08-21 Thread Joshua Wiley
  data = dat,
>>>>>   subset = dead==FALSE & recTreat==FALSE,
>>>>>   na.action = na.omit
>>>>>   )
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rainer
>>>>>
>>>>> --
>>>>> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
>>>>> Biology,
>>>>> UCT), Dipl. Phys. (Germany)
>>>>>
>>>>> Centre of Excellence for Invasion Biology
>>>>> Stellenbosch University
>>>>> South Africa
>>>>>
>>>>> Tel :   +33 - (0)9 53 10 27 44
>>>>> Cell:   +33 - (0)6 85 62 59 98
>>>>> Fax :   +33 - (0)9 58 10 27 44
>>>>>
>>>>> Fax (D):+49 - (0)3 21 21 25 22 44
>>>>>
>>>>> email:  rai...@krugs.de
>>>>>
>>>>> Skype:  RMkrug
>>>>>
>>>>> __
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
> --
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
> UCT), Dipl. Phys. (Germany)
>
> Centre of Excellence for Invasion Biology
> Stellenbosch University
> South Africa
>
> Tel :   +33 - (0)9 53 10 27 44
> Cell:   +33 - (0)6 85 62 59 98
> Fax :   +33 - (0)9 58 10 27 44
>
> Fax (D):+49 - (0)3 21 21 25 22 44
>
> email:  rai...@krugs.de
>
> Skype:  RMkrug
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] define subset argument for function lm as variable?

2012-08-21 Thread Joshua Wiley
Hi Rainer,

You could try:

subs <- expression(dead==FALSE & recTreat==FALSE)

lme(formula, subset = eval(subs))

Not tested, but something along those lines should work.

Cheers,

Josh

On Tue, Aug 21, 2012 at 7:44 AM, Rainer M Krug  wrote:
> Hi
>
> I want to do a series of linear models, and would like to define the input
> arguments for lm() as variables. I managed easily to define the formula
> arguments in a variable, but I also would like to have the "subset" in a
> variable. My reasoning is, that I have the subset in the results object.
>
> So I would like to add a line like:
>
> subs <- dead==FALSE & recTreat==FALSE
>
> which obviously does not work, as the expression is evaluated immediately. Is
> it possible to do what I want to do here, or do I have to go back to using
>
> dat <- subset(dat, dead==FALSE & recTreat==FALSE)
>
> ?
>
>
>
> dat <- loadSPECIES(SPECIES)
> feff <- height~pHarv*year   # fixed effect in the model
> reff <- ~year|plant # random effect in the model, where
> year is the
> dat.lme <- lme(
>  fixed = feff,   # fixed effect in the
> model
>  data  = dat,
>  random = reff,  # random effect in the
> model
>  correlation = corAR1(form=~year|plant), #
>  subset = dead==FALSE & recTreat==FALSE, #
>  na.action = na.omit
>  )
> dat.lm <- lm(
> formula =  feff,  # fixed effect in the model
> data = dat,
> subset = dead==FALSE & recTreat==FALSE,
> na.action = na.omit
> )
>
> Thanks,
>
> Rainer
>
> --
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
> UCT), Dipl. Phys. (Germany)
>
> Centre of Excellence for Invasion Biology
> Stellenbosch University
> South Africa
>
> Tel :   +33 - (0)9 53 10 27 44
> Cell:   +33 - (0)6 85 62 59 98
> Fax :   +33 - (0)9 58 10 27 44
>
> Fax (D):+49 - (0)3 21 21 25 22 44
>
> email:  rai...@krugs.de
>
> Skype:  RMkrug
>
> ______
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R utilizing 25% of CPU for Dual core i3 370M processor

2012-08-17 Thread Joshua Wiley
You probably have hyperthreading, so even though you only have two
physical cores, there are 4 logical cores, hence 25%.  Google can lead
you to how to adjust this, if your system allows it.
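Alternatively, if the MCMC can be split into independent pieces (e.g., separate chains), the parallel package (in base R since 2.14.0) can spread them over both physical cores. A sketch, with a toy workload standing in for the actual sampler:

```r
library(parallel)

detectCores()            # reports logical cores (4 on this machine)

cl <- makeCluster(2)     # one worker per physical core
chains <- parLapply(cl, 1:2, function(chain) {
  # placeholder for one independent MCMC chain
  mean(rnorm(1e6))
})
stopCluster(cl)

length(chains)           # one result per chain
```

Note this only helps when the iterations are independent; a single serial chain cannot be parallelized this way.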

Cheers,

Josh

On Fri, Aug 17, 2012 at 2:41 PM, Shantanu MULLICK  wrote:
> Hello Everyone
>
> I have a dual-core Intel i3-370M processor and Windows 64 operating system.
> I have a 3GB RAM and my processor can support a maximum of 8GB RAM.
>
> I have virtual memory enabled on my computer.
>
> I am running a program in the 64 bit R which implements a MCMC on a large
> dataset and involves around 8 iterations.
>
> The processor estimates that it will take around 1000 minutes to complete
> running the program.
>
> However when I check the CPU usage it shows only 25% usage.
>
> I have read on threads that R runs on only 1 core - and thus was expecting
> a CPU usage of around 50%.
>
> It would be great if anyone can point out a way by which I can make the
> processor use 50% of the CPU which I believe would allow the program to be
> run quicker.
>
> Best
> Shantanu
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] named character question

2012-08-12 Thread Joshua Wiley
Hi Erin,

The first element of the character vector is a single string, and
indexing the vector with [ selects whole elements, not characters
within a string.  Try something like
?substr
or ?nchar,

or perhaps better, use regular expressions to extract the text between
commas after two characters (or whatever logical rule accurately gets
the zip code).
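For example, both routes on Erin's address (the regular expression is the more robust of the two; the pattern assumes a US-style five-digit zip):

```r
add1 <- c(results.formatted_address =
            "200 W Rosamond St, Houston, TX 77076, USA")

# fixed positions work here, but break if the address length changes
substr(add1, 32, 36)                           # "77076"

# first run of five consecutive digits ("200" is too short to match)
regmatches(add1, regexpr("[0-9]{5}", add1))    # "77076"
```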

Cheers,

Josh

On Sun, Aug 12, 2012 at 8:33 PM, Erin Hodgess  wrote:
> Dear R People:
>
> Here is a goofy question:
>
> I want to extract the zip code from an address and here is my work so far:
>
>> add1
>   results.formatted_address
> "200 W Rosamond St, Houston, TX 77076, USA"
>> add1[1][32:36]
> 
>   NA   NA   NA   NA   NA
>> str(add1)
>  Named chr "200 W Rosamond St, Houston, TX 77076, USA"
>  - attr(*, "names")= chr "results.formatted_address"
>>
>
> What am I not seeing, please?
>
> Thanks,
> Erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodg...@gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Align columns in data frame write.table

2012-08-10 Thread Joshua Wiley
I do not know of any option in write.table() that would allow a
variable spacer, such as padding with spaces to make columns centered
or right aligned; every field is just separated by the sep string.
You could look at ?format or ?sprintf, which have padding/alignment
options.  Once you have properly padded character data, you can use
writeLines() to push it to a file.
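A minimal sketch of that approach (the column widths here are hypothetical; in practice pick them from max(nchar(...)) over each column):

```r
df <- data.frame(name  = c("alpha", "b"),
                 value = c(3.1, 12.25),
                 stringsAsFactors = FALSE)   # keep names as character for %s

header <- sprintf("%-8s %8s",   "name", "value")     # left- / right-aligned
body   <- sprintf("%-8s %8.2f", df$name, df$value)

writeLines(c(header, body))    # add con = "out.txt" to write to a file
```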

Cheers,

Josh

On Fri, Aug 10, 2012 at 6:39 PM, sharx  wrote:
> Does anyone know of a way to specify the alignment of individual columns in a
> data frame so that after using write.table the columns are aligned in the
> file?
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Align-columns-in-data-frame-write-table-tp4640007.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lavaan: Immediate non-positive definite matrix

2012-08-10 Thread Joshua Wiley
Dear Andrew,

Maximum likelihood estimation with missing data typically makes some
rather strong assumptions.  If I am not mistaken, the default
covariance coverage in Mplus is .05, the fact that you need to set it
lower suggests you have some combinations of variables with less than
5% jointly present?

If that is true, I am not surprised you are getting errors.
Estimation is an iterative process utilizing the expectation
maximization algorithm with the E step finding the expected value
given the available data and then maximizing the parameters based on
that, which becomes the basis for another E step and so on.  In the
case of no missing data, the expectations do not need to be updated at
all (easy); in the case of a lot of missing data, more steps are
required.  With < 5% (if the assumptions that conclusion was
predicated on are indeed true) pairwise present data for some
combinations, estimates could be quite unstable.

If you are dealing with a typical longitudinal study, drop out
probably increases with each additional time point.  The easiest case
is where dropout is monotonic, in which case perhaps it still makes
sense to try to include all time points.  If the missingness is not
monotonic and the < 5% is with some combination of the final time
point, I would suggest you simply look at an unconditional growth
model of say the first 5 or 6 time points.

Actually, even if you keep all 7, if there is increasing missingness
at later time points, I would suggest that you try just 5 and 6 time
points as a sort of sensitivity analysis to see how your results are
affected.

I realize that none of my comments really directly pertain to R so
this email is rather off topic for R help.  Here is my saving bit ;)
For data, try sharing it by uploading to some website such as google
drive, dropbox, skydrive, etc. and then just sharing a link with your
emails.  Also, while you're at it, perhaps include the R script to
read the data in and load lavaan etc.

From my brief read of the lavaan error message, it suggests that the
sample covariance matrix is not positive definite and not invertible
(well, I am assuming that S stands for the sample covariance
matrix).

Can you try fitting the model with listwise deletion and with direct
ML?  Have you looked at the (listwise) present sample covariance matrix?
 Finally, you could try fitting the model in OpenMx, which also runs
in R.
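Two quick diagnostics on the raw data can narrow this down. A sketch, with simulated data standing in for Andrew's read_g0..read_g6 columns (the data frame and missingness pattern below are made up):

```r
set.seed(1)
m <- matrix(rnorm(700), ncol = 7)     # stand-in for the real repeated measures
m[sample(length(m), 300)] <- NA       # inject missingness
dat <- as.data.frame(m)
names(dat) <- paste0("read_g", 0:6)

# pairwise coverage: proportion of cases jointly present for each pair
coverage <- crossprod(!is.na(as.matrix(dat))) / nrow(dat)
min(coverage)

# is the pairwise covariance matrix positive definite / invertible?
S <- cov(dat, use = "pairwise.complete.obs")
min(eigen(S, only.values = TRUE)$values)  # <= 0 reproduces lavaan's complaint
```

If the smallest eigenvalue is at or below zero on the real data, the error comes from the data, not from lavaan.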

Cheers,

Josh


On Fri, Aug 10, 2012 at 12:27 PM, Andrew Miles  wrote:
> Hi,
>
> I recently tried to estimate a linear unconditional latent growth curve on
> 7 repeated measures using lavaan (most recent version):
>
> modspec='
>
> alpha =~ 1*read_g0 + 1*read_g1 + 1*read_g2 + 1*read_g3 + 1*read_g4 +
> 1*read_g5 + 1*read_g6
>
> beta =~ 0*read_g0 + 1*read_g1 + 2*read_g2 + 3*read_g3 + 4*read_g4 +
> 5*read_g5 + 6*read_g6
>
> '
>
> gmod=lavaan(modspec, data=math, meanstructure=T, int.ov.free=F, int.lv.free=
> T, auto.var=T, auto.cov.lv.x=T, auto.cov.y=T, missing="direct", verbose=T)
> The model immediately returned the following error message:
>
> Error in chol.default(S) :
>
>   the leading minor of order 5 is not positive definite
>
> Error in lavSampleStatsFromData(Data = lavaanData, rescale =
> (lavaanOptions$estimator ==  :
>
>   lavaan ERROR: sample covariance can not be inverted
> I ran the same model in Mplus and found that it has low covariance coverage
> for the 7 repeated measures, but all coverage is greater than 0.  The model
> ran fine in Mplus once I set the covariance coverage limit to .001.  That
> provided me starting values to try in lavaan, but again it immediately
> returned the same error message.  In fact, nothing I could do allowed me to
> fit the model without it immediately returning the same error message
> (e.g., I tried changing the optimizer, the type of likelihood being used).
>
> Because the error message pops up immediately (i.e., not after lavaan tries
> to estimate the model for a few seconds), it makes me think that my data is
> failing some sort of initial data check.
>
> Any ideas what is happening, or what to do about it?  Thanks.
>
> P.S.  I haven't had success in attaching data files when I submit to the R
> help list, so if you would like the data please email me at
> andrewami...@hotmail.com.
>
> Andrew Miles
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


Re: [R] Simple question about formulae in R!?

2012-08-10 Thread Joshua Wiley
On Fri, Aug 10, 2012 at 9:16 AM, S Ellison  wrote:
>> > R in general tries hard to prohibit this behavior (i.e.,  including an
>> > interaction but not the main effect). When removing a main effect and
>> > leaving the interaction, the number of parameters is not reduced by
>> > one (as would be expected) but stays the same, at least
>> > when using model.matrix:
>
> Surely this behaviour is less to do with a dislike of interactions without 
> both main effects (which we will necessarily use if we fit a simple 
> two-factor nested model) than the need to avoid non-uniqueness of a model 
> fitted with too many coefficients?
> In a simple case, an intercept plus n coefficients for n factor levels gives 
> us n+1 coefficients to find, and we only have n independent groups to 
> estimate them from. In model matrix terms we would have one column that is a 
> linear combination of others. For OLS normal equations that generates a zero 
> determinant and for the numerical methods R uses the effect is the same; no 
> useful fit. To avoid that and allow least squares fitting, R sets up the 
> model matrix with only n-1 coefficients in addition to the intercept. As a 
> result we end up with fewer model coefficients than we might have expected 
> (and that annoyingly missing first level that always puzzles newcomers the 
> first time we look at a linear model summary), but we have exactly the number 
> of coefficients that we can estimate uniquely from the groups we have 
> specified.

N.B. Off topic.

This is an incredibly nice feature of R.  SAS overparameterizes the
design matrix and employs the sweep algorithm to zero out redundant
parameters.  One result is that if you want to specify your own
data to post-multiply by the coefficient vector, you need to realize
that the vector is longer than the number of non-redundant parameters.
I struggle to imagine this is more computationally efficient than
simply creating an appropriately parameterized design matrix, although
I suppose in either case you need to check for less-than-full-rank
design matrices.
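A quick illustration of the R behaviour described above, using the default treatment contrasts:

```r
f <- factor(rep(c("a", "b", "c"), each = 2))

# with an intercept, R keeps n - 1 dummy columns
colnames(model.matrix(~ f))      # "(Intercept)" "fb" "fc"

# without an intercept, all n level indicators appear
colnames(model.matrix(~ f - 1))  # "fa" "fb" "fc"
```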

>
> S
>
> ***
> This email and any attachments are confidential. Any u...{{dropped:17}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple question about formulae in R!?

2012-08-10 Thread Joshua Wiley
factor(vs)0:factor(am)0 factor(vs)1:factor(am)0 factor(vs)0:factor(am)1
               15.05000                20.74286                19.75000
factor(vs)1:factor(am)1
               28.37143

which you can see directly gets all the group expectations.
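
(The top of this message is truncated in the archive; a call along these lines reproduces the cell-means output shown, assuming the mtcars data — a sketch, not the original code.)

```r
# Suppressing the intercept with 0 + gives one coefficient per vs/am cell,
# each equal to that cell's mean mpg.
coef(lm(mpg ~ 0 + factor(vs):factor(am), data = mtcars))
```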

Cheers,

Josh

> I have asked a question on how to get around this "limitation" on
> stackoverflow with helpful answers by Ben Bolker and Joshua Wiley:
> http://stackoverflow.com/q/11335923/289572
> (this functionality is now used in function mixed() in my new package afex
> for obtaining "type 3" p-values for mixed models)

With few exceptions, I'm in the habit of not giving people type 3
p-values.  People like them, but generally do not actually understand
them.  I would also point out that I am in psychology, so yes, I know
psychology likes them.  Further, I am a doctoral student, so it is not
as though I am so established that people bow to my every whim; it is
sometimes work to explain why I am not providing them, and I have had
discussions in various lab meetings and with advisors about this.  But
my point is that it is possible.  I would urge you to do the same and
take heart that there are others working for change---you are not the
only one, even if it feels like it at your university.
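
For readers who do want them anyway, the usual route in R is car::Anova() with sum-to-zero contrasts (an illustrative sketch, not an endorsement):

```r
# "Type 3" tests are only sensible with sum-to-zero contrasts.
library(car)
options(contrasts = c("contr.sum", "contr.poly"))
m <- lm(mpg ~ factor(vs) * factor(am), data = mtcars)
Anova(m, type = "III")
```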

>
> Cheers,
> Henrik
>
> On 10.08.2012 15:48, R. Michael Weylandt wrote:
>
>> On Fri, Aug 10, 2012 at 7:36 AM, ONKELINX, Thierry
>>  wrote:
>>>
>>> Dear Johan,
>>>
>>> Why should it be complicated? You have a very simple model, thus a very
>>> simple formula. Isn't that great?
>>>
>>> Your formula matches the model. Though Trust~Culture + Structure *
>>> Speed_of_Integration is another option. The model fit is the same, the only
>>> difference is the parameterization of the model.
>>
>>
>> Not quite, I don't think: A*B is shorthand for A + B + A:B, where A:B
>> is the 2nd order (interaction) term. The OP only had the 2nd order
>> term without the main effects.
>>
>> OP: You almost certainly want A*B -- it's strange (though certainly
>> not impossible) to have good models where you only have interactions
>> but not main effects.
>>
>> Cheers,
>> Michael
>>
>>
>>>
>>> Best regards,
>>>
>>> Thierry
>>>
>>> ir. Thierry Onkelinx
>>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>>> and Forest
>>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>>> Kliniekstraat 25
>>> 1070 Anderlecht
>>> Belgium
>>> + 32 2 525 02 51
>>> + 32 54 43 61 85
>>> thierry.onkel...@inbo.be
>>> www.inbo.be
>>>
>>> To call in the statistician after the experiment is done may be no more
>>> than asking him to perform a post-mortem examination: he may be able to say
>>> what the experiment died of.
>>> ~ Sir Ronald Aylmer Fisher
>>>
>>> The plural of anecdote is not data.
>>> ~ Roger Brinner
>>>
>>> The combination of some data and an aching desire for an answer does not
>>> ensure that a reasonable answer can be extracted from a given body of data.
>>> ~ John Tukey
>>>
>>>
>>> -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On behalf of Johan Haasnoot
>>> Sent: Friday, August 10, 2012 9:15 AM
>>> To: r-help@r-project.org
>>> Subject: [R] Simple question about formulae in R!?
>>>
>>> Good morning reader,
>>>
>>>
>>>
>>> I have encountered a, probably, simple issue with respect to the
>>> *formulae* of a *regression model* I want to use in my research. I'm
>>> researching alliances as part of my study Business Economics (focus
>>> Strategy) at the Vrije Universiteit in Amsterdam. In the research model I
>>> use a moderating variable, I'm looking for confirmation or help on the
>>> formulation of the model.
>>>
>>>
>>>
>>> The research model consists of 2 explanatory variables, a moderating
>>> variable and 1 response variable. The first explanatory variable is Culture,
>>> measured on a nominal scale and the second is Structure of the alliance,
>>> also measured on a nominal scale. The moderating variable in the relation
>>> towards Trust is Speed of Integration, measured as an interval.
>>> The response variable is Trust, measured on a nominal scale (highly
>>> likely a 5-point Likert scale). Given the research model and the measurement
>>> scales, I intend to use an ordered prob

Re: [R] Questionnaire Analysis virtually without continuous Variables

2012-08-04 Thread Joshua Wiley
You may be able to get around zero cells using an MCMC approach, such as with 
MCMCglmm.
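
A minimal sketch of that idea, assuming the Y/dog/gun variables and nesting from the earlier glmer() example (untested, and the priors and nesting specification would need care with real data):

```r
library(MCMCglmm)
# "categorical" fits a binary logit model; the random terms give the
# household-within-village-within-district structure.
m <- MCMCglmm(Y ~ dog + gun,
              random = ~ district + village + household,
              family = "categorical", data = yourdata)
summary(m)
```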

On Aug 4, 2012, at 15:30, Sacha Viquerat  wrote:

> On 08/04/2012 07:57 PM, Joshua Wiley wrote:
>>  Hi Sacha,
>> 
>> You're right that this is not an R related question really (would be better 
>> somewhere like crossvalidated.com).
>> 
>> If basically everyone catches 0/1 birds, then I would consider dichotomizing:
>> 
>> Y <- as.integer(caught >= 1)
>> 
>> then check cross tabs to make sure there are no zero cells between 
>> predictors and outcome:
>> 
>> xtabs(~Y + dogs + guns, data=yourdata)
>> 
>> then use the glmer() function to model the nested random effects.
>> 
>> m <- glmer(Y ~ dog + gun + (1 | household) + (1 | village) + (1 | district), 
>> data = yourdata, family=binomial)
>> 
>> summary(m)
>> 
>> Cheers,
>> 
>> Josh
>> 
>> On Aug 4, 2012, at 7:12, Sacha Viquerat  wrote:
>> 
>>> Hello!
>>> I am doing an analysis on a questionnaire of hunters taken in 4 different 
>>> districts of some mysterious foreign country. The aim of the study was to 
>>> gather info on the factors that determine the hunting success of a 
>>> peculiarly beautiful bird in that area. All variables are factors, i.e. 
>>> they are variables such as "Use of Guns - yes / no", "Use of Dogs - yes / 
>>> no" and the likes. The response is supposed to be "number of Birds caught", 
>>> which was designed to be the only continuous variable. However, in reality 
>>> the number of caught birds is between 0 and 1, sometimes hunters answered 
>>> with 2. Unfortunately, it is not the questioner who is burdened with the 
>>> analysis, but me. I am struggling to find an appropriate approach to the 
>>> analysis. I don't really consider this as count data, since it would be 
>>> very vulnerable to overinflation (and a steep decline for counts above 0). 
>>> I can't really suggest binomial models either, since the lack of 
>>> explanatory, continuous data renders such an approach quite vague. I also 
>>> struggle with the random design of the survey (households nested within 
>>> villages nested within districts). Adding to that, hunters don't even 
>>> target the bird as their prime objective. The bird is essentially a 
>>> by-catch, most often used for instant consumption on the hunting trip. I 
>>> therefore doubt that any analysis makes more than a little sense, but I 
>>> will not yet succumb to failure. Any ideas?
>>> 
>>> Thanks in advance!
>>> 
>>> PS: I just realized that this is not a question related to R but to 
>>> statistics in general. Apologies for that!
>>> 
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> I did exactly what you proposed already (since the binomial model seemed 
> obvious to me); however, of course there are zero cells. I was thinking 
> someone more accustomed to doing questionnaire analysis could unveil some 
> mysterious approach common to sociologists but occluded from the naturalist's 
> eyes (hardened after years of dealing with exact science ;)
> I think I will expand the binomial approach and just try to find fancy 
> graphics that make up for the low value of the actual results (maybe with 
> colours). :D
> Thank you for the reply (do they really give such tasks for homework these 
> days? These kids must be awesome statisticians!)
> 



Re: [R] Questionnaire Analysis virtually without continuous Variables

2012-08-04 Thread Joshua Wiley
 Hi Sacha,

You're right that this is not an R related question really (would be better 
somewhere like crossvalidated.com).

If basically everyone catches 0/1 birds, then I would consider dichotomizing:

Y <- as.integer(caught >= 1)

then check cross tabs to make sure there are no zero cells between predictors 
and outcome:

xtabs(~Y + dogs + guns, data=yourdata)

then use the glmer() function from the lme4 package to model the nested random 
effects.

library(lme4)
m <- glmer(Y ~ dog + gun + (1 | household) + (1 | village) + (1 | district), 
data = yourdata, family=binomial)

summary(m)

Cheers,

Josh

On Aug 4, 2012, at 7:12, Sacha Viquerat  wrote:

> Hello!
> I am doing an analysis on a questionnaire of hunters taken in 4 different 
> districts of some mysterious foreign country. The aim of the study was to 
> gather info on the factors that determine the hunting success of a peculiarly 
> beautiful bird in that area. All variables are factors, i.e. they are 
> variables such as "Use of Guns - yes / no", "Use of Dogs - yes / no" and the 
> likes. The response is supposed to be "number of Birds caught", which was 
> designed to be the only continuous variable. However, in reality the number 
> of caught birds is between 0 and 1, sometimes hunters answered with 2. 
> Unfortunately, it is not the questioner who is burdened with the analysis, 
> but me. I am struggling to find an appropriate approach to the analysis. I 
> don't really consider this as count data, since it would be very vulnerable 
> to overinflation (and a steep decline for counts above 0). I can't really 
> suggest binomial models either, since the lack of explanatory, continuous 
> data renders such an approach quite vague. I also struggle with the random 
> design of the survey (households nested within villages nested within 
> districts). Adding to that, hunters don't even target the bird as their prime 
> objective. The bird is essentially a by-catch, most often used for instant 
> consumption on the hunting trip. I therefore doubt that any analysis makes 
> more than a little sense, but I will not yet succumb to failure. Any ideas?
> 
> Thanks in advance!
> 
> PS: I just realized that this is not a question related to R but to 
> statistics in general. Apologies for that!
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Partial Likelihood

2012-08-04 Thread Joshua Wiley
In addition to Bert's suggestion of r sig mixed models (which I second), I 
would encourage you to create a more detailed example and explanation of what 
you hope to accomplish. Sounds a bit like an auto regressive structure, but 
more details would be good.
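
As a starting point for such an example: the partial-likelihood fit for a binary series reduces to an ordinary glm() once lagged responses are included as covariates. A sketch, assuming a hypothetical data frame dat with a 0/1 column Day:

```r
# Build lagged copies of the response; the leading NA rows are dropped
# automatically by glm().
dat$Day1 <- c(NA, head(dat$Day, -1))       # lag 1
dat$Day2 <- c(NA, NA, head(dat$Day, -2))   # lag 2
fit <- glm(Day ~ Day1 + Day2, family = binomial, data = dat)
summary(fit)
```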

Cheers,

Josh

On Aug 4, 2012, at 9:34, Bert Gunter  wrote:

> Sounds like generalized linear mixed modeling (glmm) to me. Try
> posting to the r-sig-mixed-models list rather than here to increase
> the likelihood of a useful response.
> 
> -- Bert
> 
> On Sat, Aug 4, 2012 at 3:55 AM, doctoratza  wrote:
>> Hello everyone,
>> 
>> i would like to ask if anyone knows how to perform a glm partial likelihood
>> estimation in a time series where dependence exists.
>> 
>> lets say that i want to perform a logistic regression for binary data (0, 1)
>> with binary covariates which are the previous days' responses.
>> 
>> for example:
>> 
>> 
>> logistic<-glm(dat$Day~dat$Day1+dat$Day2, family=binomial(link="logit"))
>> 
>> where dat$Day (0 or 1) is the current day  and dat$Day1 is one day before (0
>> or 1).
>> 
>> is it possible that R performs partial likelihood estimation automatically?
>> 
>> 
>> thank you in advance
>> 
>> Konstantinos Mammas
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/Partial-Likelihood-tp4639159.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] How to link two R packages together

2012-08-02 Thread Joshua Wiley
Hi Xuan,

I would expect ~/R/ to be the R_HOME directory, though perhaps I am
wrong.  Also, you only need a depends or an imports, not both.  At a
simplified level, the difference is that importing PKG2 in PKG1 makes
the (exported) functions in PKG2 available to PKG1. Depends also makes
the (exported) functions in PKG2 available to PKG1, but it _also_
makes them available to the user.  This can be good or bad, with the
recommendation now being not to. The reason it can be good is if you
want your users to have access to functions in both packages without
explicitly loading both. However, let's say that function foo() is
defined in PKG2 and in PKG3. PKG1 does not know about PKG3, and the
user had PKG3 loaded prior to loading PKG1. Then when PKG1 loads, PKG2
also loads, and because all the exported functions from PKG2 are
included, PKG2::foo() masks PKG3::foo(), which can be annoying for
users if your package adds unnecessary functions to the search path
potentially masking the versions they wanted to use.

That was a rather convoluted explanation, but hopefully it is somewhat clear.
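
In file terms, the two options look like this (a hypothetical PKG1 fragment; the # lines are annotations, not valid DESCRIPTION syntax):

```
# PKG1/DESCRIPTION -- either
Imports: PKG2      # PKG2 available to PKG1's code only
# or
Depends: PKG2      # PKG2 also attached to the user's search path

# PKG1/NAMESPACE (needed with Imports)
import(PKG2)       # or importFrom(PKG2, foo) for selected functions
```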

Sincerely,

Josh

On Thu, Aug 2, 2012 at 7:02 AM, Xuan Zhao  wrote:
> Hi All,
> Thank you so much for all the help you have provided, I really appreciate it!
> The problem is solved, I just added PKG2 to the dependencies of PKG1 in the 
> DESCRIPTION file; also (I don't know if it's necessary or not) I added 
> "import(PKG2)" to the NAMESPACE file. In addition to that, I installed PKG2 
> under a path R can recognize, namely one belonging to ".libPaths()".
> I have tried other ways besides installing it under a path R can recognize, 
> like adding a file ~/.R/check.Environ, whose contents are: 
> 'R_LIBS_SITE=${R_LIBS_SITE-'directoryunderwhichPKG2isinstalled'}', but it 
> still doesn't work.
> I looked through the manual; it just says the file 'check.Environ' should be 
> put under '~/.R', but I am not sure what the "~/" should be. Is that my 
> home directory or what? Should that be the host? I am not the host of the 
> server, does that matter?
> Thank you so much for the help!
> Yours,
> Xuan
>
> -Original Message-
> From: Bert Gunter [mailto:gunter.ber...@gene.com]
> Sent: Thursday, August 02, 2012 9:42 AM
> To: Joshua Wiley
> Cc: Michael Weylandt; r-help@r-project.org; Xuan Zhao
> Subject: Re: [R] How to link two R packages together
>
> Josh:
>
> You may be right ... but I still do not think so. Note that the post begins 
> with:
>
>> One of them (PKG1) needs to use the functions
>> of the other package (PKG2).
>
> This is exactly what imports are for. I believe that, because he/she failed 
> to RTFM, he/she is not being accurate in his/her use of the relevant terms. 
> Certainly, to me at least, it is unclear. Moreover, as I understand it (see 
> the manual) imports are preferred over dependencies, as I indicated in my 
> reply.
>
> Anyway, in either case, the central advice is that which both Michael and I 
> gave: RTFM. Perhaps it's a generation gap (I am old), but I believe there has 
> been a trend for new R users to post here without making any significant 
> effort to consult the docs. This strikes me as intellectually lazy and, 
> frankly, inconsiderate to those like yourself
> -- or BDR -- who are willing to give up their time and effort to help those 
> with legitimate questions. You and others may consider this an overreaction 
> of course.
>
> Sorry for the rant, but it seems relevant to your close parsing of the thread.
>
> -- Cheers,
> Bert
>
> On Wed, Aug 1, 2012 at 10:35 PM, Joshua Wiley  wrote:
>> On Wed, Aug 1, 2012 at 10:28 PM, Bert Gunter  wrote:
>>> On Wed, Aug 1, 2012 at 6:45 PM, Michael Weylandt
>>>  wrote:
>>>> Isn't this what package dependencies are for?
>>>
>>> No. It's what package imports are for (preferably).
>>> As always, the OP should RTFM -- in this case the one to which you
>>> refer on the next line, especially the NAMESPACES section.
>>
>> But note that the original question included, "when I load one package
>> using 'library("PKG1")', PKG2 can be loaded at the same." which
>> imports does not exactly do. They become available to the package, but
>> not to the user, so if you really need both packages loaded to the
>> search path, then you need a dependency, not imports.
>>
>> Cheers,
>>
>> Josh
>>
>>>
>>> -- Bert
>>>
>>>>
>>>> See the description of the DESCRIPTION file in Writing R Extensions
>>>>
>>>> Michael
>>>>
>>>> On Aug 1, 2012, at 5:27 PM, xuan 

Re: [R] How to link two R packages together

2012-08-02 Thread Joshua Wiley
Hi Bert,

Likewise, you may well be right.  In fact, if I were going to assign
probabilities of being correct given the original post, I would give
you about .7 and me about .3.  I commented for completeness of
discussion.  I definitely see your point, and thank you for a
thoughtful reply to my post. I agree that many questions seem to me
that a quick read of the docs would solve them. That brings up an
interesting point, though.  In a population of 100, if 1-2 people have
a preventable illness, we might be inclined to blame them; what if
40-50 people have the preventable illness?  It seems to me lack of
reading the docs and following the posting guide are at epidemic
proportions. I do not know if it is a generation gap, true laziness,
lack of knowledge that the docs and posting guide _should_ be read
first, or ...

This is slightly OT, but I remember one time when I was working with
someone in person (not in R), and there was a question about how to
customize some graphic.  I immediately went to open up the help files
but the person I was working with said, "I don't want look at the
documentation" and instead starting searching through a cookbook of
different graphs to try to find the answer.  I was shocked because it
seemed like an innate visceral response to the official docs. Perhaps
our useRs are afflicted with this same condition.

Cheers,

Josh

On Thu, Aug 2, 2012 at 6:42 AM, Bert Gunter  wrote:
> Josh:
>
> You may be right ... but I still do not think so. Note that the post
> begins with:
>
>> One of them (PKG1) needs to use the functions
>> of the other package (PKG2).
>
> This is exactly what imports are for. I believe that, because he/she
> failed to RTFM, he/she is not being accurate in his/her use of the
> relevant terms. Certainly, to me at least, it is unclear. Moreover, as
> I understand it (see the manual) imports are preferred over
> dependencies, as I indicated in my reply.
>
> Anyway, in either case, the central advice is that which both Michael
> and I gave: RTFM. Perhaps it's a generation gap (I am old), but I
> believe there has been a trend for new R users to post here without
> making any significant effort to consult the docs. This strikes me as
> intellectually lazy and, frankly, inconsiderate to those like yourself
> -- or BDR -- who are willing to give up their time and effort to help
> those with legitimate questions. You and others may consider this an
> overreaction of course.
>
> Sorry for the rant, but it seems relevant to your close parsing of the thread.
>
> -- Cheers,
> Bert
>
> On Wed, Aug 1, 2012 at 10:35 PM, Joshua Wiley  wrote:
>> On Wed, Aug 1, 2012 at 10:28 PM, Bert Gunter  wrote:
>>> On Wed, Aug 1, 2012 at 6:45 PM, Michael Weylandt
>>>  wrote:
>>>> Isn't this what package dependencies are for?
>>>
>>> No. It's what package imports are for (preferably).
>>> As always, the OP should RTFM -- in this case the one to which you
>>> refer on the next line, especially the NAMESPACES section.
>>
>> But note that the original question included, "when I load one package
>> using 'library("PKG1")', PKG2 can be loaded at the same." which
>> imports does not exactly do. They become available to the package, but
>> not to the user, so if you really need both packages loaded to the
>> search path, then you need a dependency, not imports.
>>
>> Cheers,
>>
>> Josh
>>
>>>
>>> -- Bert
>>>
>>>>
>>>> See the description of the DESCRIPTION file in Writing R Extensions
>>>>
>>>> Michael
>>>>
>>>> On Aug 1, 2012, at 5:27 PM, xuan zhao  wrote:
>>>>
>>>>> Hi,
>>>>> I have built two R packages. One of them (PKG1) needs to use the functions
>>>>> of the other package (PKG2).
>>>>> So I need to link these two packages together, so that the functions of 
>>>>> PKG2
>>>>> can be available to PKG1. And when I load one package using
>>>>> 'library("PKG1")', PKG2 can be loaded at the same.
>>>>> Any ideas welcome.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context: 
>>>>> http://r.789695.n4.nabble.com/How-to-link-two-R-packages-together-tp4638765.html
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>
>>>>> __
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-hel

Re: [R] How to link two R packages together

2012-08-01 Thread Joshua Wiley
On Wed, Aug 1, 2012 at 10:28 PM, Bert Gunter  wrote:
> On Wed, Aug 1, 2012 at 6:45 PM, Michael Weylandt
>  wrote:
>> Isn't this what package dependencies are for?
>
> No. It's what package imports are for (preferably).
> As always, the OP should RTFM -- in this case the one to which you
> refer on the next line, especially the NAMESPACES section.

But note that the original question included, "when I load one package
using 'library("PKG1")', PKG2 can be loaded at the same." which
imports does not exactly do. They become available to the package, but
not to the user, so if you really need both packages loaded to the
search path, then you need a dependency, not imports.

Cheers,

Josh

>
> -- Bert
>
>>
>> See the description of the DESCRIPTION file in Writing R Extensions
>>
>> Michael
>>
>> On Aug 1, 2012, at 5:27 PM, xuan zhao  wrote:
>>
>>> Hi,
>>> I have built two R packages. One of them (PKG1) needs to use the functions
>>> of the other package (PKG2).
>>> So I need to link these two packages together, so that the functions of PKG2
>>> can be available to PKG1. And when I load one package using
>>> 'library("PKG1")', PKG2 can be loaded at the same.
>>> Any ideas welcome.
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context: 
>>> http://r.789695.n4.nabble.com/How-to-link-two-R-packages-together-tp4638765.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] lm without intercept

2012-07-28 Thread Joshua Wiley
Hi,

R actually uses a different formula for calculating the R square
depending on whether the intercept is in the model or not.

You may also find this discussion helpful:
http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-lm/

If you conceptualize R^2 as the squared correlation between the
oberserved and fitted values, it is easy to get:

summary(m0 <- lm(mpg ~ 0 + disp, data = mtcars))
summary(m1 <- lm(mpg ~ disp, data = mtcars))
cor(mtcars$mpg, fitted(m0))^2
cor(mtcars$mpg, fitted(m1))^2

but that is not how R calculates R^2.
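
Concretely, summary.lm() drops the mean-centering from the total sum of squares when there is no intercept, which is why the zero-intercept R^2 looks inflated (continuing with m0 and m1 from above):

```r
y <- mtcars$mpg
# no intercept: R^2 = 1 - RSS / sum(y^2)
1 - sum(resid(m0)^2) / sum(y^2)
# with intercept: R^2 = 1 - RSS / sum((y - mean(y))^2)
1 - sum(resid(m1)^2) / sum((y - mean(y))^2)
```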

Cheers,

Josh

On Sat, Jul 28, 2012 at 10:40 AM, citynorman  wrote:
> I've just picked up R (been using Matlab, Eviews etc) and I'm having the same
> issue. Running reg=lm(ticker1~ticker2)  gives R^2=50% while running
> reg=lm(ticker1~0+ticker2) gives R^2=99%!! The charts suggest the fit is
> worse not better and indeed Eviews/Excel/Matlab all say R^2=15% with
> intercept=0. How come R calculates a totally different value?!
>
> Call:
> lm(formula = ticker1 ~ ticker2)
>
> Residuals:
>  Min   1Q   Median   3Q  Max
> -0.22441 -0.03380  0.01099  0.04891  0.16688
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.57062    0.08187   19.18   <2e-16 ***
> ticker2      0.61722    0.02699   22.87   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.07754 on 530 degrees of freedom
> Multiple R-squared: 0.4967, Adjusted R-squared: 0.4958
> F-statistic: 523.1 on 1 and 530 DF,  p-value: < 2.2e-16
>
> Call:
> lm(formula = ticker1 ~ 0 + ticker2)
>
> Residuals:
>   Min1QMedian3Q   Max
> -0.270785 -0.069280 -0.007945  0.087340  0.268786
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> ticker2 1.134508   0.001441   787.2   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1008 on 531 degrees of freedom
> Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991
> F-statistic: 6.197e+05 on 1 and 531 DF,  p-value: < 2.2e-16
>
>
> Jan private wrote
>>
>> Hi,
>>
>> thanks for your help. I'm beginning to understand things better.
>>
>>> If you plotted your data, you would realize that whether you fit the
>>> 'best' least squares model or one with a zero intercept, the fit is
>>> not going to be very good
>>> Do the data cluster tightly around the dashed line?
>> No, and that is why I asked the question. The plotted fit doesn't look
>> any better with or without intercept, so I was surprised that the
>> R-value etc. indicated an excellent regression (which I now understood
>> is the wrong interpretation).
>>
>> One of the references you googled suggests that intercepts should never
>> be omitted. Is this true even if I know that the physical reality behind
>> the numbers suggests an intercept of zero?
>>
>> Thanks,
>>   Jan
>>
>> __
>> R-help@ mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/lm-without-intercept-tp3312429p4638204.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] selecting a subset of files to be processed

2012-07-28 Thread Joshua Wiley
Hi Erin,

It is not difficult to imagine doing it either in R or via the shell.
If they are all in the same directory, I would tend towards R, just
because you can easily set the seed and keep that information so you
can reproduce your random selection.

If the wd is in the directory with the files, from R just:

set.seed(10)
toprocess <- sample(list.files(), size = 45)
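
If only some of the 3000 files qualify, filter before sampling (a sketch with a hypothetical .csv pattern):

```r
set.seed(10)                               # keep this to reproduce the draw
csvs <- list.files(pattern = "\\.csv$")    # restrict to matching files
toprocess <- sample(csvs, size = 45)
```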

Cheers,

Josh

On Sat, Jul 28, 2012 at 10:49 AM, Erin Hodgess  wrote:
> Dear R People:
>
> I am using a Linux system in which I have about 3000 files.
>
> I would like to randomly select about 45 of those files to be processed in R.
>
> Could I make the selection in R or should I do it in Linux, please?
>
> This is with R-2.15.1.
>
> Thanks,
> erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodg...@gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] zeroinfl problem: cannot get standard errors, hessian has NaN

2012-07-25 Thread Joshua Wiley
Hi,

This is not nearly enough information.  Please follow the posting
guide and provide us with a reproducible example, or at the bare
minimum, the code for your models.  Without more details we can only
wildly guess, but here are a few:

---You have more parameters than your data can support (the Hessian is
the matrix of partial second derivatives, and SEs are typically based
closely on the Hessian, or rather on the information matrix, the
negative of the Hessian).
---You have some redundant effects.

Try a simpler model and see if you can find some particular parameter
that leads to the problem.  You could also try tweaking optimization
options, but I suspect that is usually not the source of the problem.

We really need more information, with the ideal being putting data
that reproduces your problem (or your actual data) online somewhere
where we can easily download it, and providing us with all your model
code, so on our machines, we can exactly reproduce what is happening
for you and try to troubleshoot.
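
To help localize the offending parameter, pscl's two-part formula lets you simplify the zero-inflation part independently of the count part (a sketch with hypothetical variables y, x1, x2):

```r
library(pscl)
# count model: x1 + x2; zero-inflation model: intercept only
m <- zeroinfl(y ~ x1 + x2 | 1, data = dat)
summary(m)  # check whether the SEs come back once the zero part is simpler
```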

Cheers,

Josh

On Tue, Jul 24, 2012 at 10:50 PM, nikkks  wrote:
> Hi!
>
> I have three models.
>
>
>
> In the first model, everything is fine.
>
>
>
>
> However, in the second and third models, I have NA's for standard errors:
>
>
>
>
>
> The hessians also have NaN's (same for m2 and m3).
>
>
>
> What should I do about it? It there a way to obtain the hessian without
> transforming my variables? I will greatly appreciate your help!
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/zeroinfl-problem-cannot-get-standard-errors-hessian-has-NaN-tp4637715.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FIML using lavaan returns zeroes for coefficients

2012-07-23 Thread Joshua Wiley
Hi Andrew,

I do not think there is a reason to avoid it for univariate regression
other than:

1) as was stated the predictors must be continuous
2) it will be slower (a non-issue for a handful of regressions on a few
thousand cases, but a big concern for people doing thousands of
regressions on millions of observations)

In the next month or so, I may have a beta version of a package
primarily providing helper functions for fitting SEM models, but could
also include an lm()ish wrapper to lavaan.  It would use the
traditional formula interface, but issue a warning if factor variables
or variables with insufficient unique values were used (as either
predictors or outcomes).  If anyone would be interested in beta
testing, feel free to email me.  Once I have a basic package working,
it will go up on github.
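
In the meantime, a hedged sketch of the workaround Yves describes in the
quoted message (an illustrative model on mtcars; assumes lavaan is installed
and the predictors are continuous):

```r
library(lavaan)
## univariate regression with FIML; fixed.x = FALSE brings the
## predictors into the likelihood so their missing values are handled
fit <- sem('mpg ~ hp + wt', data = mtcars, missing = "ml", fixed.x = FALSE)
coef(fit)
```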

Cheers,

Josh

On Mon, Jul 23, 2012 at 6:07 AM, Andrew Miles  wrote:
> Thanks for the helpful explanation.
>
> As to your question, I sometimes use lavaan to fit univariate regressions 
> simply because it can handle missing data using FIML rather than listwise 
> deletion.  Are there reasons to avoid this?
>
> BTW, thanks for the update in the development version.
>
> Andrew Miles
>
>
> On Jul 21, 2012, at 12:59 PM, yrosseel wrote:
>
>> On 07/20/2012 10:35 PM, Andrew Miles wrote:
>>> Hello!
>>>
>>> I am trying to reproduce (for a publication) analyses that I ran
>>> several months ago using lavaan, I'm not sure which version, probably
>>> 0.4-12. A sample model is given below:
>>>
>>> pathmod='mh30days.log.w2 ~ mh30days.log + joingroup + leavegroup +
>>> alwaysgroup + grp.partic.w2 + black + age + bivoc + moved.conf +
>>> local.noretired + retired + ds + ministrytime + hrswork +
>>> nomoralescore.c + negint.c + cong.conflict.c + nomoraleXjoin +
>>> nomoraleXleave + nomoraleXalways + negintXjoin + negintXleave +
>>> negintXalways + conflictXjoin + conflictXleave + conflictXalways '
>>> mod1 = sem(pathmod, data=sampledat, missing="fiml", se="robust")
>>>
>>> At the time, the model ran fine.  Now, using version 0.4-14, the
>>> model returns all 0's for coefficients.
>>
>> What happened is that since 0.4-14, lavaan tries to 'detect' models that are 
>> just univariate regression, and internally calls lm.fit, instead of the 
>> lavaan estimation engine, at least when the missing="ml" argument is NOT 
>> used. (BTW, I fail to understand why you would use lavaan if you just want 
>> to fit a univariate regression).
>>
>> When missing="ml" is used, lavaan normally checks if you have fixed x 
>> covariates (which you do), and if fixed.x=TRUE (which is the default). In 
>> 0.4, lavaan internally switches to fixed.x=FALSE (which implicitly assumes 
>> that all your predictors are continuous, but I assume you would not using 
>> missing="ml" otherwise). Unfortunately, for the 'special' case of univariate 
>> regression, it fails to do this. This behavior will likely change in 0.5, 
>> where, by default, only endogenous/dependent variables will be handled by 
>> missing="ml", not exogenous 'x' covariates.
>>
>> To fix it: simply add the fixed.x=FALSE argument, or revert to 0.4-12 to get 
>> the old behavior.
>>
>> Hope this helps,
>>
>> Yves.
>> http://lavaan.org
>>
>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Last answer

2012-07-19 Thread Joshua Wiley
My apologies.  I stand corrected.  Thanks Michael.
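
For the archive, the feature Michael's link points to is `.Last.value`, which
R assigns the value of the last evaluated top-level expression:

```r
2 + 3
.Last.value  # 5
```

Note that inspecting it is itself a top-level expression, so it updates
`.Last.value` in turn.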

Josh

On Thu, Jul 19, 2012 at 10:29 PM, R. Michael Weylandt
 wrote:
> See: https://stat.ethz.ch/pipermail/r-help/2012-February/303110.html
>
> Michael
>
> On Fri, Jul 20, 2012 at 12:19 AM, Joshua Wiley  wrote:
>> Hi David,
>>
>> No, but you can store the results and access that.
>>
>> ## parentheses to force printing
>> (x <- 2 + 3)
>>
>> x
>>
>> Cheers,
>>
>> Josh
>>
>>
>> On Thu, Jul 19, 2012 at 10:12 PM, darnold  wrote:
>>> Hi,
>>>
>>> In Matlab, I can access the last computation as follows:
>>>
>>>>> 2+3
>>>
>>> ans =
>>>
>>>  5
>>>
>>>>> ans
>>>
>>> ans =
>>>
>>>  5
>>>
>>> Anything similar in R?
>>>
>>> David
>>>
>>>
>>>
>>> --
>>> View this message in context: 
>>> http://r.789695.n4.nabble.com/Last-answer-tp4637151.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Last answer

2012-07-19 Thread Joshua Wiley
Hi David,

No, but you can store the results and access that.

## parentheses to force printing
(x <- 2 + 3)

x

Cheers,

Josh


On Thu, Jul 19, 2012 at 10:12 PM, darnold  wrote:
> Hi,
>
> In Matlab, I can access the last computation as follows:
>
>>> 2+3
>
> ans =
>
>  5
>
>>> ans
>
> ans =
>
>  5
>
> Anything similar in R?
>
> David
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Last-answer-tp4637151.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] finding the values to minimize sum of functions

2012-07-19 Thread Joshua Wiley
Hi Linh,

Here is an approach:

f <- function(v) {
  v <- v/sum(v)
  (v[1]^2) + (2 * v[2]^2) + (3*v[3]^2)
}

(res <- optim(c(.6, .3, .1), f))

res$par/sum(res$par)

This is a downright lazy way to implement the constraint.  The main
idea is to combine all three functions into one function that takes a
vector of parameters (v, in this case).
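
A less lazy sketch enforces x + y + z = 1 exactly by substituting
z = 1 - x - y and optimizing over just two free parameters (for this
particular problem the closed-form answer is x = 6/11, y = 3/11, z = 2/11,
which is handy as a check):

```r
## enforce the constraint exactly by eliminating z = 1 - x - y
f2 <- function(p) {
  x <- p[1]; y <- p[2]; z <- 1 - x - y
  x^2 + 2 * y^2 + 3 * z^2
}
res <- optim(c(1/3, 1/3), f2)
sol <- c(res$par, 1 - sum(res$par))
sol  # should be close to c(6, 3, 2) / 11
```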

Cheers,

Josh

On Thu, Jul 19, 2012 at 10:24 AM, Linh Tran  wrote:
> Hi fellow R users,
>
> I am desperately hoping there is an easy way to do this in R.
>
> Say I have three functions:
>
> f(x) = x^2
> f(y) = 2y^2
> f(z) = 3z^2
>
> constrained such that x+y+z=c (let c=1 for simplicity).
>
> I want to find the values of x,y,z that will minimize f(x) + f(y) + f(z).
>
> I know I can use the optim function when there is only one function, but
> don't know how to set it up when there are three.
>
> I would also like to apply this to higher dimensions (i.e. for more than
> three functions) if possible.
>
> Thank you for all your help!
>
> --
> Kind regards,
>
> Linh Tran, MPH
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] complexity of operations in R

2012-07-19 Thread Joshua Wiley
also rep.int()

> system.time(for (i in 1:1000) x <- rep.int(FALSE, 10))
   user  system elapsed
   0.29    0.02    0.29
> system.time(for (i in 1:1000) x <- rep(FALSE, 10))
   user  system elapsed
   1.96    0.08    2.05
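
Along the same lines, when preallocating a list, vector() builds it directly
without an intermediate atomic vector (a minor variation on Bert's
suggestion in the quoted message):

```r
## preallocate a list of length 10 directly; elements start as NULL
v <- vector("list", 10)
length(v)  # 10
v[[1]]     # NULL until filled
```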

On Thu, Jul 19, 2012 at 9:11 AM, Bert Gunter  wrote:
> Hadley et. al:
>
> Indeed. And using a loop is a poor way to do it anyway.
>
> v <- as.list(rep(FALSE,dotot))
>
> is way faster.
>
> -- Bert
>
> On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham  wrote:
>> On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan  wrote:
>>> Johan,
>>>
>>> Your 'list' and 'array doubling' code can be written much more efficient.
>>>
>>> The following function is faster than your g and easier to read:
>>>
>>> g2 <- function(dotot) {
>>>   v <- list()
>>>   for (i in seq_len(dotot)) {
>>> v[[i]] <- FALSE
>>>   }
>>> }
>>
>> Except that you don't need to pre-allocate lists...
>>
>> Hadley
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] easy way to fit saturated model in sem package?

2012-07-13 Thread Joshua Wiley
Dear John,

Thanks very much for the reply.  Looking at the optimizers, I had
thought that the objectiveML did what I wanted.  I appreciate the
clarification.

I think that multiple imputation is more flexible in some ways because
you can easy create different models for every variable.  At the same
time, if the assumptions hold, FIML is equivalent to multiple
imputation, and considerably more convenient.  Further, I suspect that
in many circumstances, either option is equal to or better than
listwise deletion.

In my case, I am working on some tools primarily for data exploration,
in a SEM context (some characteristics of individual variables and
then covariance/correlation matrices, clustering, etc.) and hoped to
include listwise/pairwise/FIML as options.

I will check out the lavaan package.
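
For anyone finding this thread later, a hedged sketch of what the saturated
model from my original post might look like in lavaan syntax (assuming
lavaan is installed; variable names from the mtcars example):

```r
library(lavaan)
sat <- '
  ## variances and covariances
  mpg ~~ mpg + hp + wt
  hp  ~~ hp + wt
  wt  ~~ wt
  ## means (intercepts)
  mpg ~ 1
  hp  ~ 1
  wt  ~ 1
'
fit <- sem(sat, data = mtcars[, c("mpg", "hp", "wt")], missing = "fiml")
fitted(fit)  ## implied mean vector and covariance matrix
```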

Thanks again for your time,

Josh

On Thu, Jul 12, 2012 at 8:20 AM, John Fox  wrote:
> Dear Joshua,
>
> If I understand correctly what you want to do, the sem package won't do it.
> That is, the sem() function won't do what often is called FIML estimation
> for models with missing data. I've been thinking about implementing this
> feature, and don't think that it would be too difficult, but I can't promise
> when and if I'll get to it. You might also take a look at the lavaan
> package.
>
> As well, I must admit to some skepticism about the FIML estimator, as
> opposed to approaches such as multiple imputation of missing data. I suspect
> that the former is more sensitive than the latter to the assumption of
> multinormality.
>
> Best,
>  John
>
> 
> John Fox
> Senator William McMaster
>   Professor of Social Statistics
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
> http://socserv.mcmaster.ca/jfox
>
>
>
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
>> project.org] On Behalf Of Joshua Wiley
>> Sent: July-12-12 2:53 AM
>> To: r-help@r-project.org
>> Cc: John Fox
>> Subject: [R] easy way to fit saturated model in sem package?
>>
>> Hi,
>>
>> I am wondering if anyone knows of an easy way to fit a saturated model
>> using the sem package on raw data?  Say the data were:
>>
>> mtcars[, c("mpg", "hp", "wt")]
>>
>> The model would estimate the three means (intercepts) of c("mpg", "hp",
>> "wt").  The variances of c("mpg", "hp", "wt").  The covariance of mpg
>> with hp and wt and the covariance of hp with wt.
>>
>> I am interested in this because I want to obtain the MLE mean vector
>> and covariance matrix when there is missing data (i.e., the sum of the
>> case wise likelihoods or so-called full information maximum
>> likelihood).  Here is exemplary missing data:
>>
>> dat <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
>> dat[sample(length(dat), length(dat) * .25)] <- NA dat <-
>> as.data.frame(dat)
>>
>> It is not too difficult to write a wrapper that does this in the OpenMx
>> package because you can easily define paths using vectors and get all
>> pairwise combinations using:
>>
>> combn(c("mpg", "hp", "wt"), 2)
>>
>> but I would prefer to use the sem package, because OpenMx does not work
>> on 64 bit versions of R for Windows x64 and is not available from CRAN
>> presently.  Obviously it is not difficult to write out the model, but I
>> am hoping to bundle this in a function that for some arbitrary data,
>> will return the FIML estimated covariance (and correlation matrix).
>> Alternately, if there are any functions/packages that just return FIML
>> estimates of a covariance matrix from raw data, that would be great
>> (but googling and using findFn() from the sos package did not turn up
>> good results).
>>
>> Thanks!
>>
>> Josh
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group University of
>> California, Los Angeles https://joshuawiley.com/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] easy way to fit saturated model in sem package?

2012-07-11 Thread Joshua Wiley
Hi,

I am wondering if anyone knows of an easy way to fit a saturated model
using the sem package on raw data?  Say the data were:

mtcars[, c("mpg", "hp", "wt")]

The model would estimate the three means (intercepts) of c("mpg",
"hp", "wt").  The variances of c("mpg", "hp", "wt").  The covariance
of mpg with hp and wt and the covariance of hp with wt.

I am interested in this because I want to obtain the MLE mean vector
and covariance matrix when there is missing data (i.e., the sum of the
case wise likelihoods or so-called full information maximum
likelihood).  Here is exemplary missing data:

dat <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
dat[sample(length(dat), length(dat) * .25)] <- NA
dat <- as.data.frame(dat)

It is not too difficult to write a wrapper that does this in the
OpenMx package because you can easily define paths using vectors and
get all pairwise combinations using:

combn(c("mpg", "hp", "wt"), 2)

but I would prefer to use the sem package, because OpenMx does not
work on 64 bit versions of R for Windows x64 and is not available from
CRAN presently.  Obviously it is not difficult to write out the model,
but I am hoping to bundle this in a function that for some arbitrary
data, will return the FIML estimated covariance (and correlation
matrix).  Alternately, if there are any functions/packages that just
return FIML estimates of a covariance matrix from raw data, that would
be great (but googling and using findFn() from the sos package did not
turn up good results).

Thanks!

Josh


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to compare stacked histograms/datasets

2012-07-11 Thread Joshua Wiley
Hi,

Sure, you could do a qqplot for each variable between two datasets.
In a 2d graph, it will be hard to reasonably compare more than 2
datasets (you can put many such graphs on a single page, but it would
be pairwise sets of comparisons, I think).  Perhaps you could plot
multiple qqplots on top of each other, varying the point colour by
dataset?

I have not seen anything like this before, so I suppose it depends
what helps you understand your data.
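
As a concrete (made-up) illustration of the pairwise qqplot idea, with two
simulated stand-ins for the same variable from two datasets:

```r
set.seed(1)
d1 <- rnorm(100)              # stand-in for a column of dataset 1
d2 <- rnorm(100, mean = 0.2)  # stand-in for the same column of dataset 2
qqplot(d1, d2, main = "Dataset 1 vs dataset 2")
abline(0, 1, col = "red")  # points near this line suggest similar distributions
```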

Cheers,

Josh

On Sat, Jul 7, 2012 at 3:25 PM, Atulkakrana  wrote:
> Hello Joshua,
>
> Thanks for taking time out to help me with problem. Actually the comparison
> is to be done among two (if possible, more than two) datasets and not within
> the dataset. Each dataset hold 5 variables (i.e Red, Purple, Blue, Grey and
> Yellow) for 21 different positions i.e 1-21n. So, we have 5 values for each
> position (total 21) that make a single dataset or stacked histogram (Plot in
> original post).
>
> Initially I was comparing datasets by plotting stacked histograms for each
> and analyzing them visually. But that doesn't give a statistical idea of how
> similar or different the datasets are. Therefore, I want to evaluate the
> datasets in order to quantify their difference/similarity. So, end result
> would be a plot showing similarity/difference among two or more datasets.
>
> Example datasets: http://pastebin.com/iYj1RNvt
>
> Does the method you explained can be applied to multiple datasets? Can a
> qqplot be obtained in such a case?
>
> Awaiting your reply
>
> Thanks
>
> Atul
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/How-to-compare-stacked-histograms-datasets-tp4635668p4635744.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Notation for previous observation in a data frame

2012-07-07 Thread Joshua Wiley
Hi Ernie,

I'll use the built in mtcars dataset for demonstrative purposes.

You have some condition, which can be used to create an index

## show data frame
mtcars
## create index based on your condition
i <- which(mtcars$carb == 1)
## set those rows of mtcars in your index
## to the index - 1
mtcars[i, ] <- mtcars[i - 1, ]
## show resulting data frame
mtcars

Note that if your condition grabs the first row, this will wreak havoc
(the index i - 1 would include 0, which selects no rows).
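
A vectorized alternative, if the goal is simply "the previous row's value"
for one variable (a common lag idiom; the first element becomes NA rather
than wrapping around):

```r
## previous-observation (lag) version of mpg
mpg_prev <- c(NA, head(mtcars$mpg, -1))
i <- which(mtcars$carb == 1)
mtcars$mpg[i] <- mpg_prev[i]  # rows meeting the condition get the prior row's mpg
```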

Cheers,

Josh


On Sat, Jul 7, 2012 at 8:20 PM, Ernie Tedeschi  wrote:
> I've created a data frame in R, but in order to clean up some of the data,
> I need to set certain variable observations equal to the value of their
> previous observation (it would be conditional, but that part's less
> important right now). In Stata, I would simply set var = var[_n-1] in those
> cases. What is the R equivalent?
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to compare stacked histograms/datasets

2012-07-07 Thread Joshua Wiley
Hi,

Probably easier to work with the raw data, but whatever.  If your data
is in a data frame, dat,

## create row index
dat$x <- 1:21

## load packages
require(ggplot2)
require(reshape2)

## melt the data frame to be long, long dat, ldat for short
ldat <- melt(dat, id.vars="x")

## plot the distributions
ggplot(ldat, aes(x, value, colour = variable)) + geom_line()

## they don't really look on the same scale
## we could scale the data first to have equal mean and variance
dat2 <- as.data.frame(scale(dat))
## remake index so it is not scaled
dat2$x <- 1:21

ldat2 <- melt(dat2, id.vars="x")
ggplot(ldat2, aes(x, value, colour = variable)) + geom_line()

which yields the attached PDF (maybe scrubbed on the official list as
most file extensions are, but should go through to you personally via
gmail).  I'm not sure it's the greatest approach ever, but it gives
you a sense of whether the series go up and down together or diverge
at particular points.

Cheers,

Josh

On Fri, Jul 6, 2012 at 1:55 PM, Atulkakrana  wrote:
> Hello All,
>
> I have a couple of stacked histograms which I need to compare/evaluate for
> similarity or difference.
> http://r.789695.n4.nabble.com/file/n4635668/Selection_011.png
>
> I believe rather than evaluating histograms is will be east to work with
> dataset used to plot these stacked histograms, which is in format:
>
> RED  PURPLE  BLUE  GREY  YELLOW
> 22.0640569395   16.9483985765   0   60.9875444840
> 8.18505338088.85231316730   82.9626334520
> 6.85053380786.89501779360.756227758 85.4982206406   0.5338078292
> 6.76156583635.24911032031.645907473386.3434163701   0.6672597865
> 5.82740213527.384341637 2.135231316784.6530249111.1565836299
> 7.87366548046.628113879 1.556939501883.9412811388   1.2010676157
> 7.16192170828.18505338081.245551601483.4074733096   1.3790035587
> 5.560498220610.2758007117   1.067615658483.0960854093   1.0231316726
> 7.11743772247.60676156580.711743772284.5640569395   0.756227758
> 7.87366548043.95907473310.667259786587.50.3113879004
> 7.65124555167.87366548040.533807829283.9412811388   0.5338078292
> 7.60676156588.98576512461.467971530281.9395017794   0.3558718861
> 8.94128113888.00711743771.379003558781.6725978648   0.5782918149
> 19.0836298932   9.20818505342.135231316769.5729537367   1.3790035587
> 14.9911032028   11.0765124555   3.202846975170.7295373665   1.0676156584
> 15.3914590747   10.8985765125   3.024911032 70.6850533808   1.2900355872
> 17.4822064057   12.5444839858   2.491103202867.4822064057   1.334519573
> 15.8362989324   13.0338078292   2.001779359469.1281138791.334519573
> 17.03736654810.4537366548   2.402135231370.1067615658   1.2010676157
> 20.2846975089   10.0088967972   0   69.7064056941.0676156584
> 28.7366548043   12.6334519573   0   58.6298932384   0
>
> Is there any possible way I can compare such dataset from multiple
> experiments (n=8) and visually show (plot) that these datasets are in
> consensus or differ from each other?
>
> Awaiting reply,
>
> Atul
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/How-to-compare-stacked-histograms-datasets-tp4635668.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


plots.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] number of decimal places in a number?

2012-07-07 Thread Joshua Wiley
Hi Martin,

Ted is spot on about the binary representation.  A very different
approach from his would be to convert to character and use regular
expressions:

## the example numbers in a vector
x <- c(3.14, 3.142, 3.1400, 123456.123456789, 123456789.123456789, pi, sqrt(2))

nchar(gsub("(.*\\.)|([0]*$)", "", as.character(x)))

which for me returns:
[1]  2  3  2  9  6 14 13

an advantage of this approach is that for numbers like
123456789.123456789, although R cannot represent it properly as a
binary number, the character string is totally fine.

nchar(gsub("(.*\\.)|([0]*$)", "", "123456789.123456789"))

returns 9

Essentially the expression looks for anything (the period) zero or
more times (the *) followed by an actual period (the \\.) OR 0
repeated zero or more times at the end of the string, and replaces all
of those with nothing (the "") and then returns the result, the number
of characters of which is counted by nchar()

See ?regex for details

Cheers,

Josh

On Sat, Jul 7, 2012 at 3:04 AM, Ted Harding  wrote:
> On 07-Jul-2012 08:52:35 Martin Ivanov wrote:
>>  Dear R users,
>>
>> I need a function that gets a number and returns its number of
>> actual decimal places.
>> For example f(3.14) should return 2, f(3.142) should return 3,
>> f(3.1400) should also return 2 and so on. Is such function already
>> available in R? If not, could you give me a hint how to achieve that?
>>
>> Many thanks in advance.
>
> I'm not aware of such a function in R. In any case, it will be
> a tricky question to solve in full generality, since R stores
> numbers internally in a binary representation and the exact
> conversion of this representation to a decimal number may not
> match the exact value of the decimal representation of the
> original number.
>
> In particular, a number entered as a decimal representation
> from the keyboard, or read as such from a text file, may not
> be exactly matched by the internal representation in R.
>
> However, that said, the following function definition seems to
> do what you are asking for, for cases such as you list:
>
> f<-function(x) {min(which( x*10^(0:20)==floor(x*10^(0:20)) )) - 1}
>
>   f(3.14)
>   # [1] 2
>   f(3.142)
>   # [1] 3
>   f(3.1400)
>   # [1] 2
>
>
>
> Note, however:
>
>   f(123456.123456789)
>   # [1] 9
>
>   f(123456789.123456789)
>   # [1] 7
>
> (a consequence of the fact that R does not have enough binary
> digits in its binary representation to accommodate the precision
> in all the decimal digits of 123456789.123456789 -- not that it
> can do that exactly anyway in binary, no matter how many binary
> digits it had available).
>
> Similarly:
>
>   f(pi)
>   # [1] 15
>   f(sqrt(2))
>   # [1] 16
>
> which is a consequence of the fact that 2 < pi < 4, while
> 1 < sqrt(2) < 2, so the binary representation of pi needs
> 1 more binary digit for its integer part than sqrt(2) does,
> which it therefore has to "steal" from the fractional part.
>
> Hoping this helps,
> Ted.
>
> -
> E-Mail: (Ted Harding) 
> Date: 07-Jul-2012  Time: 11:04:26
> This message was sent by XFMail
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] differences between survival models between STATA and R

2012-07-06 Thread Joshua Wiley
Hi J,

You have not provided nearly enough information for us to evaluate
whether the results should be similar.  You are talking about two
completely different packages, with no information on your data, only
a small amount of information about your model ("the same analysis")
but clearly the analysis is *not* the same or you would get the same
results, so really you have "I think I am running the same analysis
but either the underlying code/model, the data, or my syntax is
actually running a different model and what I really want is someone
to help me synchronize the results from R and Stata so I feel more
confident, but please do this without data, code, or output".

Please do not take this the wrong way, I am not trying to be harsh,
just point out the difficulty of the question you have asked us.  I
primarily use R, but I work at a consulting center and have access to
Stata and some interest in survival models, so if you sent us your
data (or found/created some example data that replicated the
discrepancy you note), I would try to check out both your R and Stata
code, but I can only do this with some help from you.  Otherwise, I
have to find my own data, examine the functions in R and Stata to
compare what they are doing, and try many tests on my own to hopefully
try to replicate what you might be seeing and then see if I can
actually get the same output.

Perhaps a more saintly version of myself would do that, but unlike my
officemate, there is no special place waiting for me in heaven, or
perhaps I have yet to find my wings.  More detailed help provided when
you provide a reproducible example as the posting guide requests.

Cheers,

Josh


On Fri, Jul 6, 2012 at 2:12 PM, JPF  wrote:
> Dear Community,
>
> I have been using two types of survival programs to analyse a data set.
>
> The first one is an R function called aftreg. The second one is a STATA
> function called streg.
>
> Both of them run the same analysis with a Weibull distribution. Yet,
> results are very different.
>
> Shouldn't the results be the same?
>
> Kind regards,
> J
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/differences-between-survival-models-between-STATA-and-R-tp4635670.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

It's this bit right here I am referring to:

> and provide commented, minimal, self-contained, reproducible code.
 -->  ^


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove loop which compares row i to row i-1

2012-07-05 Thread Joshua Wiley
Hi,

Great chance to practice debugging.  Rather than trying one
complicated statement, break it into pieces.  The basic structure is:

results <- ifelse(condition, value if true, value if false)

Each argument needs to be a vector of the same length.  In your case,
condition itself consists of two vectors:

v1 >= v2

so try creating all four vectors and making sure they are what you
want and are the appropriate length.  Then:

results <- ifelse(v1 >= v2, VIT, VIF)

will work.
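For instance, a minimal sketch with toy vectors (all names here are made up to mirror the description above):

```r
## Toy vectors illustrating vectorized ifelse()
v1  <- c(3, 1, 5)
v2  <- c(2, 2, 2)
VIT <- c("a", "b", "c")  # element-wise values if TRUE
VIF <- c("x", "y", "z")  # element-wise values if FALSE
results <- ifelse(v1 >= v2, VIT, VIF)
results  # "a" "y" "c"
```

Each element of the result is drawn from VIT or VIF depending on the corresponding element of the condition.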

Cheers,

Josh

On Thu, Jul 5, 2012 at 3:52 PM, jcrosbie  wrote:
> Thank you,
>
> I tried
>
>  ifelse(tUnitsort[length(tUnitsort$Hour),4]>=tUnitsort[-1,4],(tempMC
> =tUnitsort[length(tUnitsort$Hour),7]),tempMC )
>
> But this doesn't seem to work.
>
> Where am I going wrong?
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/remove-loop-which-compares-row-i-to-row-i-1-tp4635327p4635566.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vector entry in matix

2012-07-05 Thread Joshua Wiley
Hi Thomas,

This is non-trivial to do, but if you will be working with this sort
of data and are inclined to do some programming, you might consider
creating a new class.  S4 classes and methods are quite flexible, and
you can allow them to lean on or inherit from existing classes such as
matrices.  For example, your class might consist of three lists and a
data frame.  The length of each list would be forced to be equal to
the number of rows in the data frame.  Each element of each list would
support a length-2 numeric vector (or integer vector or whatever it is
you need).  The lists would then hold nodes 1-3, and the data frame
would hold other information.  You can similarly write methods so that
for your special class,

d[1, 1] would return c(x1, y1); essentially you create a mix of lists
and a matrix with all the necessary constraints, and you develop
methods that allow you to operate on this in a way that is sensible
for your data.
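A minimal sketch of such a class, assuming the lists hold (x, y) coordinate pairs; every name here is illustrative only:

```r
library(methods)  # attached by default in most R sessions

## One list per node, one data frame row per triangle; validity
## forces the lengths to stay in sync with the data frame.
setClass("TriangleSet",
         representation(node1 = "list", node2 = "list", node3 = "list",
                        info = "data.frame"),
         validity = function(object) {
           n <- nrow(object@info)
           if (length(object@node1) == n && length(object@node2) == n &&
               length(object@node3) == n) TRUE
           else "each node list needs one element per row of 'info'"
         })

tri <- new("TriangleSet",
           node1 = list(c(1, 2)), node2 = list(c(3, 4)),
           node3 = list(c(5, 6)),
           info  = data.frame(triangle = 1))
tri@node1[[1]]  # c(1, 2), i.e. (x1, y1) of triangle 1
```

Methods (e.g., a `[` method) can then be layered on top so indexing behaves the way the table sketch suggests.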

This is a stab in the dark, but if you happen to be dealing with social
network style data (triangles and neighbors make me think of that),
you should investigate existing packages for that as they may already
have their own classes/have a way they expect data to be structured.
Off the top of my head, the package "sna" comes to mind as a starting
place to look (though I am not personally very familiar with it).

Cheers,

Josh

On Thu, Jul 5, 2012 at 9:37 AM, Thomas C.  wrote:
> well thank you, i think we get closer to my problem. but i meant it in a
> different way. maybe i describe what i intended to do:
>
> i have number of triangles which i'd like to store in a list, matrix,...etc.
> i thought it could look sth. like that:
>
> triangle   node1      node2      node3   .
>    1       c(x1,y1)   c(x2,y2)   c(x3,y3)
>    2       c(x4,y4)   c(x5,y5)   c(x6,y6)
> .
> .
> .
>
> i know that i could just make more rows for the x/y values but the problem
> is, that i need to store more than these properties (say the neighbours of
> these nodes and their coordinates) and i thought it would be more convenient
> to save these as vectors. it was just a thought... is this possible?
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/vector-entry-in-matix-tp4635478p4635509.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GEE with Inverse Probability Weights

2012-07-05 Thread Joshua Wiley
Hi Frank,

It clusters by twin pair; that is why, in Dr. Lumley's example, the "id" was
the twin pair, not the individual, and the SE is adjusted accordingly.

Cheers,

Josh

On Thu, Jul 5, 2012 at 12:10 PM, RFrank  wrote:
> Thanks -- extremely helpful.  But what is the mechanism by which this
> analysis corrects for the fact that my subjects are clustered (twins)?
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/GEE-with-Inverse-Probability-Weights-tp4633172p4635533.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop for regression

2012-07-04 Thread Joshua Wiley
Apologies to Bert, I see now that "market" is a variable in the
dataset, not a generic name for market indicators.  lm() with a matrix
as the outcome seems the best approach here.

Josh

On Wed, Jul 4, 2012 at 12:12 PM, Joshua Wiley  wrote:
> On Wed, Jul 4, 2012 at 11:48 AM, Bert Gunter  wrote:
>> Please carefully read ?lm. As I previously told the OP, no looping/apply is
>> necessary. The left hand side of the lm formula can be a matrix for which
>> separate fits will be done on each column automatically.
>
> Which is a great option if the design matrix is constant, but suppose
> you want to predict each stock from every other stock in a dataset?
> This is a one liner with lapply:
>
> lapply(colnames(mtcars), function(n) lm(substitute(y ~ ., list(y =
> as.name(n))), data = mtcars))
>
> granted, not the prettiest or most efficient thing on the planet (and
> falls apart for some reason with fastLm(), which I am still
> investigating workarounds for).  The matrix-outcome-to-lm() approach:
>
> lm(as.matrix(mtcars) ~ ., data = mtcars)
>
> yields perfect explanation of each variable by itself, as expected,
> which is not really useful.  The OP did not give many details other
> than "write a for loop".  It is not clear what should be varying.  If
> it is *just* the outcome, you are absolutely right, giving lm a matrix
> seems the most sensible route.
>
> Cheers,
>
> Josh
>
>>
>> -- Bert
>>
>> On Wed, Jul 4, 2012 at 9:44 AM, arun  wrote:
>>
>>>
>>>
>>> Hi,
>>>
>>> You could also use:
>>> dat1 <- read.table(text="
>>>
>>> Date        Stock1  Stock2  Stock3  Market
>>> 01/01/2000  1       2       3       4
>>> 01/02/2000  5       6       7       8
>>> 01/03/2000  1       2       3       4
>>> 01/04/2000  5       6       7       8
>>> ", header=TRUE, stringsAsFactors=FALSE)
>>>
>>> Stocks<-dat1[,2:4]
>>> apply(Stocks,2,function(x) lm(x~Market,data=dat1))
>>> $Stock1
>>>
>>> Call:
>>> lm(formula = x ~ Market, data = dat1)
>>>
>>> Coefficients:
>>> (Intercept)   Market
>>>          -3        1
>>>
>>>
>>> $Stock2
>>>
>>> Call:
>>> lm(formula = x ~ Market, data = dat1)
>>>
>>> Coefficients:
>>> (Intercept)   Market
>>>          -2        1
>>>
>>>
>>> $Stock3
>>>
>>> Call:
>>> lm(formula = x ~ Market, data = dat1)
>>>
>>> Coefficients:
>>> (Intercept)   Market
>>>          -1        1
>>>
>>> A.K.
>>>
>>>
>>>
>>>
>>> - Original Message -
>>> From: Akhil dua 
>>> To: r-help@r-project.org
>>> Cc:
>>> Sent: Wednesday, July 4, 2012 1:08 AM
>>> Subject: [R] loop for regression
>>>
>>> -- Forwarded message --
>>> From: Akhil dua 
>>> Date: Wed, Jul 4, 2012 at 10:33 AM
>>> Subject:
>>> To: r-help@r-project.org
>>>
>>>
>>> Hi everyone, I
>>> have data on stock prices and market indices,
>>>
>>> and I need to run a separate regression of every stock on the market,
>>> so I want to write a "for loop" so that I won't have to write the code
>>> again and again to run the regression...
>>> my data is in the format given below
>>>
>>>
>>>
>>> Date        Stock1  Stock2  Stock3  Market
>>> 01/01/2000  1       2       3       4
>>> 01/02/2000  5       6       7       8
>>> 01/03/2000  1       2       3       4
>>> 01/04/2000  5       6       7       8
>>>
>>>
>>> So can any one help me how to write this loop
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read

Re: [R] loop for regression

2012-07-04 Thread Joshua Wiley
On Wed, Jul 4, 2012 at 11:48 AM, Bert Gunter  wrote:
> Please carefully read ?lm. As I previously told the OP, no looping/apply is
> necessary. The left hand side of the lm formula can be a matrix for which
> separate fits will be done on each column automatically.

Which is a great option if the design matrix is constant, but suppose
you want to predict each stock from every other stock in a dataset?
This is a one liner with lapply:

lapply(colnames(mtcars), function(n) lm(substitute(y ~ ., list(y =
as.name(n))), data = mtcars))

granted, not the prettiest or most efficient thing on the planet (and
falls apart for some reason with fastLm(), which I am still
investigating workarounds for).  The matrix-outcome-to-lm() approach:

lm(as.matrix(mtcars) ~ ., data = mtcars)

yields perfect explanation of each variable by itself, as expected,
which is not really useful.  The OP did not give many details other
than "write a for loop".  It is not clear what should be varying.  If
it is *just* the outcome, you are absolutely right, giving lm a matrix
seems the most sensible route.

Cheers,

Josh

>
> -- Bert
>
> On Wed, Jul 4, 2012 at 9:44 AM, arun  wrote:
>
>>
>>
>> Hi,
>>
>> You could also use:
>> dat1 <- read.table(text="
>>
>> Date        Stock1  Stock2  Stock3  Market
>> 01/01/2000  1       2       3       4
>> 01/02/2000  5       6       7       8
>> 01/03/2000  1       2       3       4
>> 01/04/2000  5       6       7       8
>> ", header=TRUE, stringsAsFactors=FALSE)
>>
>> Stocks<-dat1[,2:4]
>> apply(Stocks,2,function(x) lm(x~Market,data=dat1))
>> $Stock1
>>
>> Call:
>> lm(formula = x ~ Market, data = dat1)
>>
>> Coefficients:
>> (Intercept)   Market
>>          -3        1
>>
>>
>> $Stock2
>>
>> Call:
>> lm(formula = x ~ Market, data = dat1)
>>
>> Coefficients:
>> (Intercept)   Market
>>          -2        1
>>
>>
>> $Stock3
>>
>> Call:
>> lm(formula = x ~ Market, data = dat1)
>>
>> Coefficients:
>> (Intercept)   Market
>>          -1        1
>>
>> A.K.
>>
>>
>>
>>
>> - Original Message -
>> From: Akhil dua 
>> To: r-help@r-project.org
>> Cc:
>> Sent: Wednesday, July 4, 2012 1:08 AM
>> Subject: [R] loop for regression
>>
>> -- Forwarded message --
>> From: Akhil dua 
>> Date: Wed, Jul 4, 2012 at 10:33 AM
>> Subject:
>> To: r-help@r-project.org
>>
>>
>> Hi everyone, I
>> have data on stock prices and market indices,
>>
>> and I need to run a separate regression of every stock on the market,
>> so I want to write a "for loop" so that I won't have to write the code
>> again and again to run the regression...
>> my data is in the format given below
>>
>>
>>
>> Date        Stock1  Stock2  Stock3  Market
>> 01/01/2000  1       2       3       4
>> 01/02/2000  5       6       7       8
>> 01/03/2000  1       2       3       4
>> 01/04/2000  5       6       7       8
>>
>>
>> So can any one help me how to write this loop
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop for regression

2012-07-03 Thread Joshua Wiley
Hi,

A few comments.  First, a for loop is probably not optimally efficient.
 Consider instead (using a built-in example dataset):

lm(cbind(mpg, hp) ~ cyl + vs, data = mtcars)

which gives:

Call:
lm(formula = cbind(mpg, hp) ~ cyl + vs, data = mtcars)

Coefficients:
                  mpg        hp
(Intercept)   39.6250  -15.6279
cyl           -3.0907   27.5843
vs            -0.9391  -19.1148

i.e., same predictors used on both outcomes.  Note that this is
substantially faster than running separately.  See ?lm for details.
If you need to run separate models (e.g., predictors are changing),
and you have many models and a lot of data (which would not be
surprising when working with stock data), consider using the RcppEigen
package.  You can get it by:

install.packages("RcppEigen")
require(RcppEigen) # load package

it has a function called fastLm which is orders of magnitude faster
than lm() and works almost identically.

lapply(mtcars[, c("mpg", "hp")], function(x) fastLm(X = cbind(Int = 1,
mtcars[, c("cyl", "vs")]), y = x))

you just give it the design matrix (X) and response vector (y); see ?fastLm.
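As a quick sanity check of the cbind() form above (base lm() only, nothing extra needed), each column of the multi-response fit matches the corresponding single-response fit:

```r
## Multi-response fit vs. a separate fit for one outcome
fit_both <- lm(cbind(mpg, hp) ~ cyl + vs, data = mtcars)
fit_mpg  <- lm(mpg ~ cyl + vs, data = mtcars)
stopifnot(isTRUE(all.equal(coef(fit_both)[, "mpg"], coef(fit_mpg))))
```

The same holds for the "hp" column, so the matrix form really is just the separate regressions done at once.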

Cheers,

Josh

On Tue, Jul 3, 2012 at 10:08 PM, Akhil dua  wrote:
> -- Forwarded message --
> From: Akhil dua 
> Date: Wed, Jul 4, 2012 at 10:33 AM
> Subject:
> To: r-help@r-project.org
>
>
> Hi everyone, I
> have data on stock prices and market indices,
>
> and I need to run a separate regression of every stock on the market,
> so I want to write a "for loop" so that I won't have to write the code
> again and again to run the regression...
> my data is in the format given below
>
>
>
> Date        Stock1  Stock2  Stock3  Market
> 01/01/2000  1       2       3       4
> 01/02/2000  5       6       7       8
> 01/03/2000  1       2       3       4
> 01/04/2000  5       6       7       8
>
>
> So can any one help me how to write this loop
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with lmer formula

2012-07-02 Thread Joshua Wiley
Hi Camila,

In mixed equation form instead of multilevel, it would be:

Y_it = gamma_00 + gamma_10*X_it + gamma_11*W_it*X_it + (e_it + u_0t + u_1j*X_it)
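Spelling out the substitution step by step (same symbols as the quoted model below):

```
Y_it = alpha_0t + alpha_1t*X_it + e_it
     = (gamma_00 + u_0t) + (gamma_10 + gamma_11*W_it + u_1j)*X_it + e_it
     = gamma_00 + gamma_10*X_it + gamma_11*W_it*X_it + (e_it + u_0t + u_1j*X_it)
```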

Your code seems reasonable.  Note that the random intercept and slope
will be correlated in your specification (an unstructured covariance; it
is possible to force them to be uncorrelated, but this is a sensible
starting place)

model <- lmer(Y ~ X + X:W + (X | ID), data = data)

which gives:
residual variance: e_it
variance of the intercept (constant term, gamma_00): u_0t
variance of the slope (gamma_10): u_1j

as well as overall estimates for the intercept, slope of X, and the
interaction of X and W.

Bert is correct that R sig mixed models is the more appropriate list,
but many people read both and there is no reason it cannot be answered
here.

Cheers,

Josh


On Mon, Jul 2, 2012 at 6:47 PM, Camila Mendes  wrote:
> Hey all -
>
> I am a newbie on mixed-effects models. I want to estimate the following
> model:
>
> Y_it = alpha_0t + alpha_1t*X_it + e_it
> alpha_0t = gamma_00 + u_0t
> alpha_1t = gamma_10 + gamma_11*W_it + u_1j
>
> Where Y is my outcome, X is my level-1 predictor, and W is my level 2
> predictor.
>
> I am not sure if I am doing it right. Is this the correct specification of
> the formula?
>
> model = lmer(Y ~ X + X:Y + ( X | ID), data = data)
>
> Also, can you help me to write down the combined model formula? I tried
> by substituting on the first equation, but I got some weird interactions
> with the residual (?)
>
> Thanks a lot!
>
> - Camila
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot.prcomp() call/eval

2012-06-29 Thread Joshua Wiley
On Fri, Jun 29, 2012 at 1:20 AM, Jessica Streicher
 wrote:
> Hm.. i attached a file with the code, but it doesn't show up somehow..

non-text files are scrubbed, and only certain file extensions are
allowed (I forget which; I know that .R is *not* allowed, although I
think .txt and maybe .log are).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot.prcomp() call/eval

2012-06-28 Thread Joshua Wiley
On Thu, Jun 28, 2012 at 8:27 AM, Jessica Streicher
 wrote:
> Thanks Josh,
>
> that soggy sandwhich saves me a LOT of code by the way,
> I'll keep it  for the time being  ;)

There may be other ways.  With no knowledge of your context, obviously
I cannot say absolutely that there are better ways to save yourself
code, but I can say that there often are.

>
> greetings
> Jessica
>
> On 28.06.2012, at 17:15, Joshua Wiley wrote:
>
>> Hi Jessica,
>>
>> x <- call("plot", quote(pcaI))
>> eval(x)
>>
>> that said, I suspect you would be better off avoiding this idiom
>> altogether.  Storing unevaluated calls is akin to putting tomatoes on
>> your sandwich before packing it for work---you can do it but you end
>> up with a soggy sandwich by the time you are eating it.  Better to
>> store the bread and tomato separately and put them together when you
>> want a delicious sandwich.
>>
>> Cheers,
>>
>> Josh
>>
>> On Thu, Jun 28, 2012 at 8:03 AM, Jessica Streicher
>>  wrote:
>>> Hi!
>>>
>>> I am getting a lot of numbers in the background of the pca screeplots if i 
>>> use call("plot") and eval(somecall).
>>> Til now, creating the calls and plotting later on this way worked fine. 
>>> Example:
>>>
>>>  pcaI<-prcomp(iris[,1:4])
>>>  plot(pcaI)
>>>
>>>  x<-call("plot",pcaI)
>>>  eval(x)
>>>
>>> Anyone got an idea how i can avoid that? (also it might take a second or so 
>>> for the numbers to appear, so wait a bit before you tell me everythings 
>>> fine ^^)
>>>        [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot.prcomp() call/eval

2012-06-28 Thread Joshua Wiley
Hi Jessica,

x <- call("plot", quote(pcaI))
eval(x)

that said, I suspect you would be better off avoiding this idiom
altogether.  Storing unevaluated calls is akin to putting tomatoes on
your sandwich before packing it for work---you can do it but you end
up with a soggy sandwich by the time you are eating it.  Better to
store the bread and tomato separately and put them together when you
want a delicious sandwich.
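To see the difference, compare what the two idioms actually store in the call; the stray numbers in the screeplot presumably come from the full object being deparsed into the plot's main title:

```r
pcaI <- prcomp(iris[, 1:4])
bad  <- call("plot", pcaI)         # embeds the entire prcomp object
good <- call("plot", quote(pcaI))  # embeds only the symbol `pcaI`
stopifnot(identical(good[[2]], as.name("pcaI")))
```

With quote(), evaluation looks up pcaI at eval() time, so the call stays small and the plot labels stay clean.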

Cheers,

Josh

On Thu, Jun 28, 2012 at 8:03 AM, Jessica Streicher
 wrote:
> Hi!
>
> I am getting a lot of numbers in the background of the pca screeplots if i 
> use call("plot") and eval(somecall).
> Til now, creating the calls and plotting later on this way worked fine. 
> Example:
>
>  pcaI<-prcomp(iris[,1:4])
>  plot(pcaI)
>
>  x<-call("plot",pcaI)
>  eval(x)
>
> Anyone got an idea how i can avoid that? (also it might take a second or so 
> for the numbers to appear, so wait a bit before you tell me everythings fine 
> ^^)
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Large Test Datasets in R

2012-06-24 Thread Joshua Wiley
Hi Ravi,

My hunch would be "no" because it seems awfully inefficient.  Packages
are mirrored all over the world, and it seems rather silly to be
mirroring, updating, etc. large datasets.

The good news is that if you just want a 10,000 x 100,000 matrix of
0/1s, it is trivial to generate:

X <- matrix(sample(0L:1L, 10^9, TRUE), nrow = 10^4)

Even stored as integers, this is probably going to be around 4GB.  If
you want arbitrary values to later cut:

X <- matrix(rnorm(10^9), nrow = 10^4)
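Since the stated goal is 0/1 after arbitrary thresholds, the cut can be folded in directly; a small stand-in for the full-size matrix is shown here (the threshold 0 is chosen arbitrarily):

```r
set.seed(42)
X <- matrix(rnorm(20), nrow = 4)  # small stand-in for the 10^4 x 10^5 case
B <- (X > 0) + 0L                 # logical -> 0/1 integer matrix
stopifnot(all(B %in% c(0L, 1L)), all(dim(B) == dim(X)))
```

The same two lines scale directly to the full-size case, memory permitting.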

Cheers,

Josh


On Sun, Jun 24, 2012 at 7:08 AM, vioravis  wrote:
> I am looking for some large datasets (10,000 rows & 100,000 columns or vice
> versa) to create some test sets.  I am not concerned about the individual
> elements since I will be converting them to binary (0/1) by using arbitrary
> thresholds.
>
> Does any R package provide such big datasets?
>
> Also, what is the biggest text document collection available in R? tm
> package seems to provide only 20 records from the Reuters dataset. Is there
> any package that has 10,000+ documents??
>
> Would appreciate any help on these.
>
> Thank you.
>
> Ravi
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Large-Test-Datasets-in-R-tp4634330.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Defining multiple variables in a loop

2012-06-24 Thread Joshua Wiley
>>>> tax1 <- as.matrix(sample(country1$lagtaxVSgdp1, size =
>>>> (nrow(country1)), replace = T))
>>>> tax2 <- as.matrix(sample(country2$lagtaxVSgdp1, size =
>>>> (nrow(country2)), replace = T))
>>>> tax3 <- as.matrix(sample(country3$lagtaxVSgdp1, size =
>>>> (nrow(country3)), replace = T))
>>>>
>>>> gdp1 <- as.matrix(sample(country1$yoygdpcapita, size =
>>>> (nrow(country1)), replace = T))
>>>> gdp2 <- as.matrix(sample(country2$yoygdpcapita, size =
>>>> (nrow(country2)), replace = T))
>>>> gdp3 <- as.matrix(sample(country3$yoygdpcapita, size =
>>>> (nrow(country3)), replace = T))
>>>>
>>>> unemployment1 <- as.matrix(sample(country1$lagunemployment, size =
>>>> (nrow(country1)), replace = T))
>>>> unemployment2 <- as.matrix(sample(country2$lagunemployment, size =
>>>> (nrow(country2)), replace = T))
>>>> unemployment3 <- as.matrix(sample(country3$lagunemployment, size =
>>>> (nrow(country3)), replace = T))
>>>>
>>>> country.year1 <- as.matrix(cbind(country1$Country, country1$Year2))
>>>> country.year2 <- as.matrix(cbind(country2$Country, country2$Year2))
>>>> country.year3 <- as.matrix(cbind(country3$Country, country3$Year2))
>>>>
>>>> country1.2 <- as.data.frame(cbind(country.year1, exp1, tax1, gdp1,
>>>> unemployment1))
>>>> country2.2 <- as.data.frame(cbind(country.year2, exp2, tax2, gdp2,
>>>> unemployment2))
>>>> country3.2 <- as.data.frame(cbind(country.year3, exp3, tax3, gdp3,
>>>> unemployment3))
>>>>
>>>> data <- as.data.frame(rbind(country1.2, country2.2, country3.2))
>>>>
>>>> OECDsamplepanel <- pdata.frame(data, index = NULL, drop = F)
>>>>
>>>> plm <- plm(V5 ~ lag(V6, 1) + V3 + V4 + V5, data = OECDSamplepanel,
>>>> model = "within")
>>>>
>>>> coefficients <- t(as.matrix(plm$coefficients))
>>>> fixef <- t(as.matrix(fixef(plm)))
>>>>
>>>> plmcoef[i, 1:4] = coefficients
>>>> plmfixef[i, 1:3] = fixef
>>>>
>>>> }
>>>>
>>>> __
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eval parse question

2012-06-20 Thread Joshua Wiley
Hi Sebastian,

Probably, but I suspect the "correct" answer is simply do not do it
that way.  Instead of:

test1 <- 1:10
test2 <- 11:20
...
test5 <- 41:50
test5[7] <- .435

do

test <- list(1:10, 11:20, 21:30, 31:40, 41:50)

then it is as easy as

test[[5]][7] <- .435
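Putting the two pieces together (note the integer vector is silently promoted to numeric on assignment):

```r
test <- list(1:10, 11:20, 21:30, 31:40, 41:50)
test[[5]][7] <- 0.435
test[[5]][7]  # 0.435
```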

Cheers,

Josh

On Wed, Jun 20, 2012 at 12:59 AM, Leuzinger  Sebastian
 wrote:
> Dear all
>
> Is there a more efficient way of achieving the following (assigning to an 
> indexed vector in a loop):
>
> test5 <- 1:10
> eval(parse(text=paste("test",5,"[",7,"]<- ",0.435,sep="")))
>
> this works, but it is probably very slow.
>
> Thanks
> Sebastian Leuzinger
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] populating a large matrix

2012-06-19 Thread Joshua Wiley
Hi Nick,

Off the cuff:

doit2 <- function(s, d, e) {
  matrix(c(s, d, d*d, e, s, d*e, e*e, e*d, s), 3, byrow=FALSE)
}

doit2(s, d, e)

seems like it should offer a substantial speed up.  Unless the matrix
can be populated with arbitrary formulae, I do not see why it should
be a text matrix and then evaluated.  If you have not already, I would
also encourage you to check out OpenMx.  It is a rather flexible
matrix optimizer, and you may be able to get it to do what you want.

http://openmx.psyc.virginia.edu/docs/OpenMx/latest/Likelihood_Matrix.html
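A quick sanity check that doit2() reproduces what the original sapply()/eval(parse()) pipeline computes (sapply() walks the matrix in column-major order, which is why byrow = FALSE lines up; the diagonal entry is taken as "s", per the table in the original post):

```r
doit2 <- function(s, d, e) {
  matrix(c(s, d, d*d, e, s, d*e, e*e, e*d, s), 3, byrow = FALSE)
}
txtmat <- matrix(c("s", "d", "d*d", "e", "s", "d*e", "e*e", "e*d", "s"),
                 nrow = 3, byrow = TRUE)
s <- 0.7; d <- 0.2; e <- 0.1
vals   <- sapply(txtmat, function(celltxt) eval(parse(text = celltxt)))
valmat <- matrix(vals, nrow = 3, byrow = TRUE)
stopifnot(isTRUE(all.equal(doit2(s, d, e), valmat)))
```

Inside the optimizer loop, only doit2(s, d, e) needs to be called each iteration; no parsing is involved.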

Cheers,

Josh

On Tue, Jun 19, 2012 at 8:10 PM, Nick Matzke  wrote:
> Hi all,
>
> This question is slightly weird.  I am trying to populate a matrix with
> equations.  The matrix represents transition probabilities between states.
>  A simple example is:
>
>        state1  state2  state3
> state1  s       d       d*d
> state2  e       s       d*e
> state3  e*e     e*d     s
>
> The parameters s, d, and e need to be optimized with an iterative algorithm.
>  This means I have to modify, say, d, and then recalculate the transition
> probabilities for each cell.
>
> Currently, I do this by making a matrix with the equations in character
> format, setting s, e, and d to values, and then running each cell through
> eval(parse(text=celltxt)). As follows:
>
> #
> # Test code:
> # Make the text matrix
> txtmat = matrix(c("s", "d", "d*d", "e", "s", "d*e", "e*e", "e*d", "s"),
> nrow=3, byrow=TRUE)
>
> s=0.7
> d=0.2
> e=0.1
>
> doit <- function(celltxt)
>        {
>        cellval = eval(parse(text=celltxt))
>        return(cellval)
>        }
>
> # Calculate the matrix with numerical values
> matrix_vals = sapply(X=txtmat, FUN=doit)
> valmat = matrix(matrix_vals, nrow=3)  # sapply flattens column-major, so byrow=FALSE
> valmat
> # End test code
> #
>
>
> ...however, this seems to get slow for large matrices. Since I have to
> optimize all the parameters I need something that updates the matrix quickly
> when s, d, or e is changed.  Perhaps this is a job for pointers in C++ or
> something, but I figure there must be a better way in R.
>
> Can anyone think of something more sophisticated than my current method?
>
> Thanks in advance for any and all help!!
>
> Cheers,
> Nick
>
>
>
>
> --
> 
> Nicholas J. Matzke
> Ph.D. Candidate, Graduate Student Researcher
>
> Huelsenbeck Lab
> Center for Theoretical Evolutionary Genomics
> 4151 VLSB (Valley Life Sciences Building)
> Department of Integrative Biology
> University of California, Berkeley
>
> Graduate Student Instructor, IB200B
> Principles of Phylogenetics: Ecology and Evolution
> http://ib.berkeley.edu/courses/ib200b/
> http://phylo.wikidot.com/
>
>
> Lab websites:
> http://ib.berkeley.edu/people/lab_detail.php?lab=54
> http://fisher.berkeley.edu/cteg/hlab.html
> Dept. personal page:
> http://ib.berkeley.edu/people/students/person_detail.php?person=370
> Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html
> Lab phone: 510-643-6299
> Dept. fax: 510-643-6264
>
> Cell phone: 510-301-0179
> Email: mat...@berkeley.edu
>
> Mailing address:
> Department of Integrative Biology
> 1005 Valley Life Sciences Building #3140
> Berkeley, CA 94720-3140
>
> -
> "[W]hen people thought the earth was flat, they were wrong. When people
> thought the earth was spherical, they were wrong. But if you think that
> thinking the earth is spherical is just as wrong as thinking the earth is
> flat, then your view is wronger than both of them put together."
>
> Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer,
> 14(1), 35-44. Fall 1989.
> http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Storing datasets

2012-06-12 Thread Joshua Wiley
On Tue, Jun 12, 2012 at 12:24 AM, Rody  wrote:
> I found a solution myself, but thanks for the answers. I solved it like this:
> D <- matrix(0, nrow=100, ncol=225)  # preallocate
> for(i in 1:225)
>     D[,i] <- rt(100, df=225)

But as Don said, you can do this in one step (and it is both faster
and more elegant):

D <- matrix(rt(100 * 225, df = 225), ncol = 225)
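To see that the one-step version draws exactly the same values as the loop (a quick check, assuming the same seed for both):

```r
## loop version: fill one column of t-distributed draws at a time
set.seed(1)
D1 <- matrix(0, nrow = 100, ncol = 225)
for (i in 1:225) D1[, i] <- rt(100, df = 225)

## one-step version: draw everything at once, fill column-major
set.seed(1)
D2 <- matrix(rt(100 * 225, df = 225), ncol = 225)

identical(D1, D2)   # TRUE: same draws, consumed in the same column order
```

The two agree because matrix() fills column by column by default, which matches the order in which the loop consumes the random number stream.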

Cheers,

Josh

>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Storing-datasets-tp4632874p4633071.html
> Sent from the R help mailing list archive at Nabble.com.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

