Re: [R] Randomization Test

2019-02-28 Thread Meyners, Michael
Ogbos,

To share data (in particular lengthy data like yours), check out ?dput

To replicate sampling, look at ?replicate (by default it returns a matrix, one 
column per replication, which you can use further) - that should answer your Q1.
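
For instance (a minimal sketch, with bounds and length taken from your message 
below):

xs <- replicate(5000, sample(8890:9500, 1136, replace = TRUE))
dim(xs)   # 1136 x 5000: one simulated sample per column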

Apart from that (and regarding Q2), the task you are after is getting more and 
more obscure to me. I don't see what you really want to test - between levels 
of n (or A in oodf)? If so, you would need to permute levels within each set 
(are they sets of -5:10, or are all observations completely independent? In the 
latter case, across all sets). What is the null hypothesis you want to test? 
And what is the data exactly? I don't understand what the two columns indicate. 
Then you need to decide on what the test statistic is you want to use. The 
average? The difference between each pair of averages? Anything like that? Do 
you want to test all levels simultaneously, or by pairs? 

Clearly, your way of sampling is inappropriate for a randomization test. You 
are rather simulating data that can take any value between min and max with 
equal probability. That does not seem to be the right null model to me (it 
might be, though, depending on your null hypothesis). It would not make a 
randomization test either way, but rather a Monte Carlo simulation under the 
null hypothesis. Then you would determine your test statistic(s) (whichever 
they are) and subsequently repeat that often enough and check whether your 
observed value is higher than 90 or 99% of the simulated ones. Again, that 
would be a simulation, not a randomization test.
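
To illustrate that Monte Carlo logic (a sketch only; the uniform null model and 
the observed value are made-up assumptions, not a recommendation):

set.seed(1)
obs <- 8950    # observed test statistic (made up for illustration)
sim <- replicate(5000, mean(sample(8890:9500, 1136, replace = TRUE)))
quantile(sim, c(0.90, 0.99))   # 90 and 99% reference points
mean(sim >= obs)               # Monte Carlo tail proportion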

For the latter, you'd need to permute the data in an appropriate way (again 
depending on the null hypothesis, and on the structure, i.e. are they 
coming in blocks of any kind, or all independent), and then recalculate the 
test statistic every time, and then proceed as before. I'm not sure whether the 
data was the result of a randomized experiment - if not, you can still use the 
idea, but fall into something that some refer to as "permutation testing" - the 
difference being that you need to make much stronger assumptions on 
independence etc of the data. Many use the terms equivalently, but just so you 
are aware. 
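
As a sketch of the permutation variant (assuming completely independent 
observations in two groups; data and statistic are made up):

set.seed(2)
y <- rnorm(40)                                   # observations
g <- rep(c("A", "B"), each = 20)                 # group labels
stat <- function(y, g) diff(tapply(y, g, mean))  # test statistic: mean difference
perm <- replicate(4999, stat(y, sample(g)))      # shuffle labels, recompute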

I really think you need to look into a textbook (e.g. Edgington & Onghena) or 
some papers to understand the concept of randomization tests, or consult with a 
statistician with good background in that field. What you are suggesting is not 
near it, and unless you have a clear hypothesis and a good understanding of how 
the data was generated, it is impossible for you (and anyone else) to say how 
such a test might be designed.

Michael 


> -Original Message-
> From: Ogbos Okike 
> Sent: Mittwoch, 27. Februar 2019 22:53
> To: Meyners, Michael 
> Cc: r-help 
> Subject: Re: [R] Randomization Test
> 
> Dear Kind List,
> 
> I am still battling with this. I have, however, made some progress with the
> suggestions of Micheal and others. At least, I have a better picture of what I
> want to do now as I will attempt a detailed description here.
> 
> I am aware I should show you just a small part of my code and data.
> But when I copied out a small portion and ran it to see what you would get when
> I sent that, I was not satisfied with the signal displayed. The epoch analysis
> averages data and is quite sensitive to leveraging, especially if a small 
> sample
> is used.
> 
> So please exercise patience and permit me to display the series of epochs that
> give the averaged values used. You can just run the code and see the signal
> of interest. Here is the code and the data:
> 
> dta <- read.table( text ="n CR
>  -5 8969
.
SNIP...
.
> 10 9566
> ",header=TRUE)
> 
>  data<-matrix(c(dta$CR),ncol=71)
> A<-matrix(rep(-5:10,71))
> B<-matrix(data)
> 
>  oodf <- data.frame(A, B)
>  a <- -5:10
> library(plotrix)
> std.error <- function(x) return(sd(x)/sqrt(sum(!is.na(x))))
> oomean <- as.vector(by(oodf$B, oodf$A, mean))
> oose <- as.vector(by(oodf$B, oodf$A, std.error))
> plot(-5:10, oomean, type="l", ylim=c(8890, 9100))
> A <- oomean - 1.96*oose
>  B <- oomean + 1.96*oose
> lines(a, A, col="red")
>  lines(a, B, col="red")
> 
>  My Question:
> I wish to conduct a randomization test of significance (90 and 99
> percentile) of the reductions/decreases as displayed by the signal.
> 
> I am attempting using:
> x <- sample(8890:9500, 1136, replace=TRUE)
> 
> to generate the random numbers, where 8890 and 9500 are the
> minimum and maximum of the signal and 1136 the length of the sample data.
> Q1: Please how do I generate many samples as x above, say up to 5000 or
> 10,000? I manually generated and stored as x1,x2, x3 up to x100.
> 
> Q2: Please how do I use these randomly generated numbers to test the
> statistical significance

Re: [R] Randomization Test

2019-02-08 Thread Meyners, Michael
Ogbos,

You do not seem to have received a reply over the list yet, which might be due 
to the fact that this seems rather a stats than an R question. Neither did your 
attachment (Figure) get through - see the posting guide. 

I'm not familiar with epoch analysis, so not sure what exactly you are doing / 
trying to achieve, but some general thoughts: 

* You do NOT want to restrict your re-randomizations in a way that "none of the 
dates corresponds with the ones in the real event" - actually, as a general 
principle, the true data must be an admissible re-randomization as well. You 
seem to have excluded that (and a lot of other randomizations at the same time 
which might have occurred, i.e. dates 1 and 2 reversed but all others the 
same), thereby rendering the test invalid. Any restrictions you have on your 
re-randomizations must've applied to the original randomization as well.
* If you have rather observational data (which I suspect, but not sure), 
Edgington & Onghena (2007) would rather refer to this as a permutation test - 
the difference being that you have to make strong assumptions (similar to 
parametric tests) on the nature of the data, which are designed-in to be true 
for randomization tests. It might be a merely linguistic discrimination, but it 
is important to note which assumptions have to be (implicitly) made.
* I'm not sure what you mean by "mean differences" of the events - is that two 
groups you are comparing? If so, that seems reasonable, but just make sure the 
test statistic you use is reasonable and sensitive against the alternatives you 
are mostly interested in. The randomization/permutation test will never prove 
that, e.g., means are significantly different, but only that there is SOME 
difference. By selecting the appropriate test statistic, you can influence what 
will pop up more easily and what not, but you can never be sure (unless you 
make strong assumptions about everything else, like in many parametric tests).
* For any test statistic, you would then determine the proportion of its values 
among the 5000 samples where it is as large or larger than the one observed (or 
as small or smaller, or either, depending on the nature of the test statistic 
and whether you aim for a one- or a two-sided test). That is your p value. If 
small enough, conclude significance. At least conceptually important: the 
observed test statistic is always part of the re-randomization set (i.e. your 5000) 
- so you truly only generate 4999 new ones plus the one you observed. Otherwise the test may 
be more or less liberal. Your p value is hence no smaller than 1/n, where n is 
the total number of samples you looked at (including the observed one), a p 
value of 0 is not possible in randomization tests (nor in other tests, of 
course).
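
In code, the counting step might look like this (a sketch; perm stands in for 
your 4999 re-randomization statistics, obs for the observed one):

set.seed(3)
perm <- rnorm(4999)   # placeholder for the re-randomization statistics
obs <- 2.1            # placeholder for the observed statistic
(sum(perm >= obs) + 1) / (length(perm) + 1)   # one-sided p value, observed included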

I hope this is helpful, but you will need to go through these and refer to your 
own setup to check whether you adhered to the principles or not, which is 
impossible for me to judge based on the information provided (and I won't be 
able to look at excessive code to check either).

Michael

> -Original Message-
> From: R-help  On Behalf Of Ogbos Okike
> Sent: Montag, 28. Januar 2019 19:42
> To: r-help 
> Subject: [R] Randomization Test
> 
> Dear Contributors,
> 
> I am conducting an epoch analysis. I tried to test the significance of my result 
> using
> a randomization test.
> 
> Since I have 71 events, I randomly selected another 71 events, making sure
> that none of the dates in the random events corresponds with the ones in
> the real event.
> 
> Following the code I found here
> (https://www.uvm.edu/~dhowell/StatPages/R/RandomizationTestsWithR/Random2Sample/TwoIndependentSamplesR.html),
> I combined these two data set and used them to generate another 5000
> events. I then plotted the graph of the mean differences for the 5000
> randomly generated events. On the graph, I indicated the region of the
> mean difference between the real 71 epoch and the randomly selected 71
> epoch.
> 
> Since the two tail test shows that the mean difference falls at the extreme of
> the randomly selected events, I concluded that my result is statistically
> significant.
> 
> 
> 
> I am attaching the graph to assist you with your suggestions.
> 
> I can attach both my code and the real and randomly generated events if you
> ask for it.
> 
> My request is that you help me to understand if I am on the right track or not.
> This is the first time I am doing this and unless the experts decide, I am not
> quite sure whether I am right or not.
> 
> Many thanks for your kind concern.
> 
> Best
> Ogbos
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ANOVA Permutation Test

2018-09-03 Thread Meyners, Michael
Juan, 

Your question might be borderline for this list, as it ultimately rather seems 
a stats question coming in R disguise.

Anyway, the short answer is that you *expect* to get a different p value from a 
permutation test unless you are able to do all possible permutations and 
therefore use the so-called systematic reference set. That is rarely the case, 
and only for relatively small problems. 
The permutation test uses a random subset of all possible permutations. Given 
this randomness, you'll get a different p value. In order to get reproducible 
results, you may specify a seed (?set.seed), yet that is only reproducible within 
this environment. Someone else with different software and/or code might come 
out with a different p. The higher the number of permutations used, the smaller 
the variation around the p values, however. For most applications, 1000 seem 
good enough to me, but sometimes I go higher (in particular if the p value is 
borderline and I really need a strict above/below alpha decision).
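
A minimal sketch with aovp and the npk data that ship with R (argument names 
per ?aovp; behavior may differ across lmPerm versions):

library(lmPerm)
set.seed(123)
summary(aovp(yield ~ block + N, data = npk, perm = "Prob"))   # permutation ANOVA
set.seed(123)
summary(aovp(yield ~ block + N, data = npk, perm = "Prob"))   # same seed, same p values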

The permutations do not create an implicit normal distribution, but rather a 
null distribution that can be (and, depending on the non-normality of your data, 
likely is) non-normal. So your respective proposal does not appeal.

I don't think you need an alternative - the permutation test is just fine, and 
recognizing the randomness in the execution does not render the (relatively 
small) variability in p values a major issue.

You may want to have a look at the text book by Edgington & Onghena for details 
on permutation tests, and there are plenty of papers out there addressing them 
in various contexts, which will help to understand *why* you observe what you 
observe here. 

HTH, Michael



> -Original Message-
> From: R-help  On Behalf Of Juan Telleria Ruiz
> de Aguirre
> Sent: Montag, 3. September 2018 17:18
> To: R help Mailing list 
> Subject: [R] ANOVA Permutation Test
> 
> Dear R users,
> 
> I have the following Question related to Package lmPerm:
> 
> This package uses a modified version of aov() function, which uses
> Permutation Tests instead of Normal Theory Tests for fitting an Analysis of
> Variance (ANOVA) Model.
> 
> However, when I run the following code for a simple linear model:
> 
> library(lmPerm)
> library(magrittr)  # for the %>% pipe used below
> 
> e$t_Downtime_per_Intervention_Successful %>%
>   aovp(
> formula = `Downtime per Intervention[h]` ~ `Working Hours`,
> data = .
>   ) %>%
>   summary()
> 
> I obtain different p-values for each run!
> 
> With a regular ANOVA Test, I obtain instead a constant F-statistic, but I do 
> not
> fulfill the required Normality Assumptions.
> 
> So my questions are:
> 
> Would it still be possible to use the regular aov() by generating permutations in
> advance (obtaining therefore a Normal Distribution thanks to the Central
> Limit Theorem)? And applying the aov() function afterwards? Does it make
> sense?
> 
> 
> Or maybe this issue could be due to unbalanced classes? I also tried to weight
> observations based on proportions, but the function failed.
> 
> 
> Any alternative solution for performing a One-Way ANOVA Test over Non-
> Normal Data?
> 
> 
> Thank you.
> 
> Juan
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Letters group Games-Howell post hoc in R

2018-01-18 Thread Meyners, Michael
Apologies if I missed any earlier replies - did you check
multcompLetters in package {multcompView}?
It allows you to get connecting letters reports (if that's what you are after, 
I didn't check what exactly agricolae is providing here). May have to add some 
manual steps to combine this with any data (means or whatever) you want to 
report.
multcompLetters allows you to use p values or a logical (significant or not).
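
A minimal sketch with made-up p values (the names encode the comparisons):

library(multcompView)
pvals <- c("A-B" = 0.001, "A-C" = 0.150, "B-C" = 0.049)
multcompLetters(pvals, threshold = 0.05)   # connecting letters from p values
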
HTH, Michael

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Bars
> Cortina
> Sent: Dienstag, 16. Januar 2018 13:51
> To: R-help@r-project.org
> Subject: [R] Letters group Games-Howell post hoc in R
> 
> Hello everybody,
> 
> I use the sweetpotato database included in R package:
> 
> data(sweetpotato) This dataset contains two variables: yield (continuous
> variable) and virus (factor variable).
> 
> Because the Levene test is significant, I cannot assume homogeneity of variances
> and I apply the Welch test in R instead of one-way ANOVA followed by Tukey
> posthoc.
> 
> Nevertheless, the problems come when I apply the posthoc test. For the Tukey
> posthoc test I use library(agricolae) and it displays the superscript letters
> between virus groups. Therefore there are no problems.
> 
> Nevertheless, to perform Games-Howell posthoc, I use
> library(userfriendlyscience) and I obtain Games-Howell output but it's
> impossible for me to obtain a letter superscript comparison between virus
> groups as it is obtained through library(agricolae).
> 
> The code used it was the following:
> 
> library(userfriendlyscience)
> 
> data(sweetpotato)
> 
> oneway<-oneway(sweetpotato$virus, y=sweetpotato$yield, posthoc =
> 'games-howell')
> 
> oneway
> 
> I tried cld(), importing library(multcompView) beforehand, but it doesn't work.
> 
> Can somebody help me?
> 
> Thanks in advance,
> 
> David Bars.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Jaccard index

2015-07-15 Thread Meyners, Michael
Did you search the internet? At first attempt, 

vegdist {vegan} (worked well for me in the past) and 

dist.binary {ade4}

seem to offer what you need. 
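
For instance (a sketch on made-up presence/absence data; binary = TRUE gives 
the classical Jaccard):

library(vegan)
m <- rbind(g1 = c(1, 0, 1, 1, 0),
           g2 = c(1, 1, 0, 1, 0))
vegdist(m, method = "jaccard", binary = TRUE)   # distance; similarity = 1 - distance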

HTH, Michael

 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of sreenath
 Sent: Mittwoch, 15. Juli 2015 09:35
 To: r-help@r-project.org
 Subject: [R] Jaccard index
 
 How can I find the Jaccard index of two groups, and which package should be used?
 please help
 
 
 
 --
 View this message in context: http://r.789695.n4.nabble.com/Jaccard-index-tp4709883.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R: RE: Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design

2015-07-07 Thread Meyners, Michael
Angelo,

the conservatism of Bonferroni aside for a moment, I don't really see what you 
are after. 

mcp(Emotion = "Tukey")

just defines the contrasts to test; in this case, all pairwise comparisons. 

The test=adjusted("bonferroni") argument then says that these contrasts should 
not be corrected for according to Tukey's procedure, but using Bonferroni 
adjustment. 

So this should give you tests for all pairwise comparisons with multiplicity 
correction according to Bonferroni. I wonder how this differs from the results 
you get from SPSS. 

Seemingly, this is not what you want, so the question is what you really want. 
Bonferroni method does not indicate which comparisons / contrasts to look at, 
but just takes all those that you are interested in and multiply the 
corresponding p values with the number of comparisons (and making sure it does 
not exceed 1). As John indicated, that can easily be handled manually. Yet, you 
need to create the tests for all comparisons that you are interested in - if 
not via linfct=mcp(Emotion = "Tukey"), you need to specify them otherwise (see 
the three options indicated on ?glht). The code I suggested offers a convenient 
shortcut in case you are interested in all pairwise comparisons and want them 
to be corrected according to Bonferroni, but if something else is of interest, 
you'd need to specify this (and let us know, as the Bonferroni method does not 
give a clue about which comparisons to test).
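
The manual route is a one-liner (a sketch with made-up unadjusted p values):

p_raw <- c(0.012, 0.030, 0.200)
pmin(1, p_raw * length(p_raw))           # classical Bonferroni by hand
p.adjust(p_raw, method = "bonferroni")   # the same via base R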

NB: You may want to pay attention to the warning halfway down the help page for 
?mcp; it may not be clear exactly which effects you want to compare; mcp uses 
just the main effects w/o interactions etc.

Michael


 -Original Message-
 From: John Fox [mailto:j...@mcmaster.ca]
 Sent: Dienstag, 7. Juli 2015 03:24
 To: angelo.arc...@virgilio.it
 Cc: Meyners, Michael; r-help@r-project.org
 Subject: Re: [R] R: RE: Bonferroni post hoc test in R for repeated measure
 ANOVA with mixed within and between subjects design
 
 Dear Angelo,
 
 The Bonferroni p-value is just the ordinary p-value times the number of
 tests, so, since R supports multiplication, you can apply the Bonferroni
 adjustment in R. Because Bonferroni tests for multiple comparisons can be
 very conservative, asking why you want to use them is a fair question.
 
 Best,
  John
 
 
 John Fox, Professor
 McMaster University
 Hamilton, Ontario, Canada
 http://socserv.mcmaster.ca/jfox/
 
 
 
 On Tue, 7 Jul 2015 00:50:49 +0200 (CEST)  angelo.arc...@virgilio.it
 angelo.arc...@virgilio.it wrote:
  Dear Michael,
  thank you for your answer, however, I am not asking for the tukey with
  the bonferroni adjustment, but doing the post hoc with the bonferroni
 method.
  Apparently this is done easily in SPSS, I am wondering whether it is 
  possible
 with R.
 
  Can anyone help me?
 
  Thanks in advance
 
 
  Angelo
 
 
 
  Messaggio originale
  Da: meyner...@pg.com
  Data: 6-lug-2015 17.52
  A: angelo.arc...@virgilio.itangelo.arc...@virgilio.it,
  r-help@r-project.orgr-help@r-project.org
  Ogg: RE: [R] Bonferroni post hoc test in R for repeated measure ANOVA
  with mixed within and between subjects design
 
  Untested, but if anything, your best bet is likely something like
 
  summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")),
  test=adjusted("bonferroni"))
 
  should work (despite the question why you'd want to use Bonferroni
  rather than Tukey). For a reference, see the book on the topic by the
  package authors. Might be in the paper, too, which is given by
 
  citation("multcomp")
 
  HTH, Michael
 
 
   -Original Message-
   From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
   angelo.arc...@virgilio.it
   Sent: Montag, 6. Juli 2015 16:01
   To: r-help@r-project.org
   Subject: [R] Bonferroni post hoc test in R for repeated measure
   ANOVA with mixed within and between subjects design
  
   Dear List Members,
  
  
  
   I need to perform a Bonferroni post hoc test in R on a table with
   three within subjects factors (Emotion, having 5 levels, Material,
   having 4 levels, Shoes, having 2 levels) and one between subject factor
 (Musician, having 2 levels).
  
  
   I normally use the Tukey method with the following formula
  
   require(nlme)
   lme_H2H = lme(H2H ~ Emotion*Material*Shoes*Musician, data=scrd,
   random = ~1|Subject)
   require(multcomp)
   summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")))
  
  
  
   I am not able to find any reference that explains with an example of
   R  code how to perform a post hoc test with the Bonferroni procedure.
   Can anyone provide an example to perform the same post hoc test in
   the code above but with Bonferroni instead of Tukey?
  
  
   Thank you in advance
  
  
  
   Angelo
  
  
 [[alternative HTML version deleted]]
  
   __
   R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
   https://stat.ethz.ch/mailman/listinfo/r-help

Re: [R] Bonferroni post hoc test in R for repeated measure ANOVA with mixed within and between subjects design

2015-07-06 Thread Meyners, Michael
Untested, but if anything, your best bet is likely something like 

summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")), 
test=adjusted("bonferroni"))

should work (despite the question why you'd want to use Bonferroni rather than 
Tukey).
For a reference, see the book on the topic by the package authors. Might be in 
the paper, too, which is given by

citation("multcomp")

HTH, Michael


 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
 angelo.arc...@virgilio.it
 Sent: Montag, 6. Juli 2015 16:01
 To: r-help@r-project.org
 Subject: [R] Bonferroni post hoc test in R for repeated measure ANOVA with
 mixed within and between subjects design
 
 Dear List Members,
 
 
 
 I need to perform a Bonferroni post hoc test in R on a table with three within
 subjects factors (Emotion, having 5 levels, Material, having 4 levels, Shoes,
 having 2 levels) and one between subject factor (Musician, having 2 levels).
 
 
 I normally use the Tukey method with the following formula
 
 require(nlme)
 lme_H2H = lme(H2H ~ Emotion*Material*Shoes*Musician, data=scrd,
 random = ~1|Subject)
 require(multcomp)
 summary(glht(lme_H2H, linfct=mcp(Emotion = "Tukey")))
 
 
 
 I am not able to find any reference that explains with an example of R  code
 how to perform a post hoc test with the Bonferroni procedure.
 Can anyone provide an example to perform the same post hoc test in the
 code above but with Bonferroni instead of Tukey?
 
 
 Thank you in advance
 
 
 
 Angelo
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mismatch between match and unique causing ecdf (well, approxfun) to fail

2015-06-09 Thread Meyners, Michael
Thanks Martin. 
Yep, I understand it is documented and my code wasn't as it should've been -- 
the confusion comes from the fact that it worked ok for hundreds of situations 
that seem very much alike, but one situation breaks. I agree that you typically 
can't be sure about having only numerical data in the data frame, but I was 
sure I had by design (numeric results of simulations, so no factors or anything 
else) and was then sloppy in passing the rows of the data frame to ecdf. So 
wondering what makes this situation different from all the others I had... 
Anyway, point taken and working solution found, so all fine :-)
Cheers, Michael

 -Original Message-
 From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
 Sent: Montag, 8. Juni 2015 16:43
 To: Meyners, Michael
 Cc: r-help@r-project.org
 Subject: Re: [R] mismatch between match and unique causing ecdf (well,
 approxfun) to fail
 
 
  Aehm, adding on this: I incorrectly *assumed* without testing that
 rounding would help; it doesn't:
  ecdf(round(test2,0))# a rounding that is way too rough for my
 application...
  #Error in xy.coords(x, y) : 'x' and 'y' lengths differ
 
  Digging deeper: The initially mentioned call to unique() is not very 
  helpful,
 as test2 is a data frame, so I get what I deserve, an unchanged data frame
 with 1 row. Still, the issue remains and can even be simplified further:
 
   ecdf(data.frame(a=3, b=4))
  Empirical CDF
  Call: ecdf(data.frame(a = 3, b = 4))
   x[1:2] =  3,  4
 
  works ok, but
 
   ecdf(data.frame(a=3, b=3))
  Error in xy.coords(x, y) : 'x' and 'y' lengths differ
 
  doesn't (same for a=b=1 or 2, so likely the same for any a=b).
  Instead,
 
   ecdf(c(a=3, b=3))
  Empirical CDF
  Call: ecdf(c(a = 3, b = 3))
   x[1:1] =  3
 
  does the trick. From ?ecdf, I get that x should be a numeric vector -
 apparently, my misuse of the function by applying it to a row of a data frame
 (i.e. a data frame with one row). In all my other (dozens of) cases that
 worked ok, though but not for this particular one. A simple unlist() helps:
 
 You were lucky.   To use a one-row data frame instead of a
 numerical vector will typically *not* work unless ... well, you are lucky.
 
 No, do *not*  pass data frame rows instead of numeric vectors.
 
 
   ecdf(unlist(data.frame(a=3, b=3)))
  Empirical CDF
  Call: ecdf(unlist(data.frame(a = 3, b = 3)))
   x[1:1] =  3
 
  Yet, I'm even more confused than before: in my other data, there were
 also duplicated values in the vector (1-row-data frame), and it never caused
 any issue. For this particular example, it does. I must be missing something
 fundamental...
 
 
 well.. I'm confused about why you are confused, but if you are thinking
 about passing rows of data frames as numeric vectors, this means you are
 sure that your data frame only contains classical numbers (no factors, no
 'Date's, no...).
 
 In such a case, transform your data frame to a numerical matrix
 *once* preferably using  data.matrix(d.fr) instead of just
 as.matrix(d.fr) but in this case it should not matter.
 Then *check* the result and then work with that matrix from then on.
 
 All other things probably will continue to leave you confused ..
 ;-)
 
 Martin Maechler,
 ETH Zurich
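
A minimal sketch of that advice, assuming the data frame really holds only 
numbers:

df <- data.frame(a = c(3, 5), b = c(3, 7))
m <- data.matrix(df)       # coerce once to a numeric matrix
stopifnot(is.numeric(m))   # check the result before relying on it
ecdf(m[1, ])               # a matrix row is a plain numeric vector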

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mismatch between match and unique causing ecdf (well, approxfun) to fail

2015-06-08 Thread Meyners, Michael
Aehm, adding on this: I incorrectly *assumed* without testing that rounding 
would help; it doesn't:

ecdf(round(test2,0))# a rounding that is way too rough for my application...
#Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Digging deeper: The initially mentioned call to unique() is not very helpful, 
as test2 is a data frame, so I get what I deserve, an unchanged data frame with 
1 row. Still, the issue remains and can even be simplified further:

 ecdf(data.frame(a=3, b=4))
Empirical CDF 
Call: ecdf(data.frame(a = 3, b = 4))
 x[1:2] =  3,  4

works ok, but

 ecdf(data.frame(a=3, b=3))
Error in xy.coords(x, y) : 'x' and 'y' lengths differ

doesn't (same for a=b=1 or 2, so likely the same for any a=b). Instead, 

 ecdf(c(a=3, b=3))
Empirical CDF 
Call: ecdf(c(a = 3, b = 3))
 x[1:1] =  3

does the trick. From ?ecdf, I get that x should be a numeric vector - 
apparently, my misuse of the function by applying it to a row of a data frame 
(i.e. a data frame with one row). In all my other (dozens of) cases that worked 
ok, though, but not for this particular one. A simple unlist() helps:

 ecdf(unlist(data.frame(a=3, b=3)))
Empirical CDF 
Call: ecdf(unlist(data.frame(a = 3, b = 3)))
 x[1:1] =  3

Yet, I'm even more confused than before: in my other data, there were also 
duplicated values in the vector (1-row-data frame), and it never caused any 
issue. For this particular example, it does. I must be missing something 
fundamental...
 
Michael

 -Original Message-
 From: Meyners, Michael
 Sent: Montag, 8. Juni 2015 12:02
 To: 'r-help@r-project.org'
 Subject: mismatch between match and unique causing ecdf (well,
 approxfun) to fail
 
 All,
 
 I encountered the following issue with ecdf which was originally on a vector
 of length 10,000, but I have been able to reduce it to a minimal reproducible
 example (just to avoid questions why I'd want to do this for a vector of
 length 2...):
 
 test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
 .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
 ecdf(test2)
 
 # Error in xy.coords(x, y) : 'x' and 'y' lengths differ
 
 In an attempt to track this down, it occurs that
 
 unique(test2)
 #   X817X4789
 #74 3.398247 3.398247
 
 while
 
 match(test2, unique(test2))
 #[1] 1 1
 
 matches both values to the first one. This causes a hiccup in the call to 
 ecdf,
 as this uses (an equivalent to) a call to approxfun with x = test2 and y =
 cumsum(tabulate(match(test2, unique(test2)))), the latter now containing
 one entry less than the former, so xy.coords fails.
 
 I understand that the issue should be somehow related  to FAQ 7.31, but I
 would have hoped that unique and match would be using the same precision
 and hence both or neither would consider the two values identical, but not
 one match while unique doesn't.
 
 Last but not least, it doesn't really cause an issue on my end (other than
 breaking my code and hence out of a loop at first place...); rounding will 
 help
 w/o noteworthy changes to the outcome, so no need to propose a
 workaround :-) I'd rather like to raise the issue and learn whether there is a
 purpose for this behavior, and/or whether there is a generic fix to this, or
 whether I am completely missing something.
 
 Version info (under Windows 7):
 R version 3.2.0 (2015-04-16) -- Full of Ingredients
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 
 Cheers, Michael

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mismatch between match and unique causing ecdf (well, approxfun) to fail

2015-06-08 Thread Meyners, Michael
All,

I encountered the following issue with ecdf which was originally on a vector of 
length 10,000, but I have been able to reduce it to a minimal reproducible 
example (just to avoid questions why I'd want to do this for a vector of length 
2...):

test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344), 
.Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
ecdf(test2) 

# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

In an attempt to track this down, it occurs that 

unique(test2)
#   X817X4789
#74 3.398247 3.398247

while 

match(test2, unique(test2))
#[1] 1 1

matches both values to the first one. This causes a hiccup in the call to ecdf, 
as this uses (an equivalent to) a call to approxfun with x = test2 and y = 
cumsum(tabulate(match(test2, unique(test2)))), the latter now containing one 
entry less than the former, so xy.coords fails.

I understand that the issue should be somehow related  to FAQ 7.31, but I would 
have hoped that unique and match would be using the same precision and hence 
both or neither would consider the two values identical, but not one match 
while unique doesn't. 

Last but not least, it doesn't really cause an issue on my end (other than 
breaking my code and hence out of a loop at first place...); rounding will help 
w/o noteworthy changes to the outcome, so no need to propose a workaround :-) 
I'd rather like to raise the issue and learn whether there is a purpose for 
this behavior, and/or whether there is a generic fix to this, or whether I am 
completely missing something.

Version info (under Windows 7): 
R version 3.2.0 (2015-04-16) -- Full of Ingredients
Platform: x86_64-w64-mingw32/x64 (64-bit)

Cheers, Michael 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R / JMP interface

2014-05-08 Thread Meyners, Michael
Not sure about JMP 11, but remember that JMP 10 did not run with R version >= 
3.0.0. 
It depends a bit on the changes that come with new R versions; with JMP 10, 
several versions of the 2.x series were compatible even though JMP officially 
only supported earlier versions. I had hoped that with JMP 11 supporting R 
3.0.1, this would work for a while for the following R versions, but that's 
apparently not the case; I haven't checked recently. You may have to downgrade 
(well, have a parallel install) of R 3.0.1 or earlier to get this working, that 
worked for me in the past, though admittedly not optimal. JMP will always be 
behind the latest R versions due to the extended testing phase before the 
release their new version -- R typically releases 2 versions or so during this 
testing phase alone... Alternatively, you might want to find out what changes 
in R render it incompatible with JMP and reverse these changes for your own R 
installation if possible -- that's certainly beyond my capabilities...
My understanding is that it is not to  be expected that R 3.1 will ever be 
supported by JMP 11; you'd likely have to wait for JMP 12; I was once told that 
they do not make updates in this regard with their within-major-release 
patches, though certainly desirable from my pov...
HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Samuel J Gardner
 Sent: Mittwoch, 7. Mai 2014 18:20
 To: Robert Douglas Kinley; r-help@r-project.org
 Subject: Re: [R] R / JMP interface
 
 https://communities.sas.com/message/199936
 
 see link.  Looks like R 3.1 may not yet be supported with JMP 11.
 
 _
 From: Robert Douglas Kinley
 Sent: Wednesday, May 07, 2014 12:18 PM
 To: r-help@r-project.org
 Cc: Samuel J Gardner
 Subject: R / JMP interface
 
 
 hi useRs
 
 I am trying-out the facility to call R code from JMP.
 
 details:  R 3.1.0 , JMP 11.1.1 , Windows 7 enterprise , all 64 bit.
 
 The test-script from the JMP help pages falls over at the first line :- R 
 Init();
 
 giving the error-message :-
 The installed version of R cannot be used.  The entry point ConsoleIob
 could not be located.
 
 Has anyone else come across this - and ideally has anyone got a solution   ?
 
 (I have submitted a request to JMP/SAS help too).
 
 cheers  Bob
 
 
 
 
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Add constraints to rbinom

2013-07-01 Thread Meyners, Michael
You don't need a constraint (rbinom won't give x>n), but you need to make sure 
you are using the n you want to use: try
 x <- cbind(x, rbinom(300, n[i], p[i]))    # mind the [i] after the n
at the respective line. Furthermore, you need to remove one transformation of x 
to make sure you divide by the right n, try
rate <- t(x/n)

I think that should do the trick. I didn't check the remainder of the code, 
though.
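
A quick check of the first claim (parameters made up):

all(rbinom(1e5, size = 17, prob = 0.9) <= 17)   # TRUE: draws never exceed size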

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Anamika Chaudhuri
 Sent: Freitag, 28. Juni 2013 04:06
 To: r-help@r-project.org
 Subject: [R] Add constraints to rbinom
 
 Hi:
 
 I am trying to generate Beta-Binomial random variables and then calculate
 Binomial exact confidence intervals for the rate. I was wondering, if I needed
 to add a constraint such that x<=n to it, how do I go about it. The reason is I
 am getting data where x>n which is giving a rate>1. Here's my code:
 
 set.seed(111)
 k <- 63
 x <- NULL
 p <- rbeta(k,3,3)   # so that the mean nausea rate is alpha/(alpha+beta)
 min <- 10
 max <- 60
 n <- as.integer(runif(k,min,max))
 for(i in 1:k)
 x <- cbind(x,rbinom(300,n,p[i]))
 x <- t(x)
 rate <- t(t(x)/n)
 se_rate <- sqrt(rate*(1-rate)/n)
 
 
 
 # Exact Confidence Interval
 
 l_cl_exact <- qbeta(.025,x,n-x+1)
 u_cl_exact <- qbeta(.975,x+1,n-x)
 
 for (i in 1:63){
 
 for (j in 1:300)
 {
 
 if (x[i,j]==0)
 {
 l_cl_exact[i,j] <- 0
 u_cl_exact[i,j] <- u_cl_exact[i,j]
 }
 else if (x[i,j]==n[i])
 {
 l_cl_exact[i,j] <- l_cl_exact[i,j]
 u_cl_exact[i,j] <- 1
 }
 else
 l_cl_exact[i,j] <- l_cl_exact[i,j]
 u_cl_exact[i,j] <- u_cl_exact[i,j]
 
 #print(c(i,j))
 
 }
 }
 
 Really appreciate any help.
 Thanks
 Anamika
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova und Tukey HSD

2013-05-13 Thread Meyners, Michael
Jochen,

a) this is an English-speaking mailing list; other languages are not encouraged 
nor will they typically generate a lot of replies...
b) your code is fine, so this is not an R-issue; you are rather stuck with some 
of the stats background -- you might want to see a friendly local statistician 
for advice :-) or try a stats mailing list (see e.g. the posting guide)
c) your interpretation is ok -- if the CI does not contain 0, you'd have 
significance at the respective level. Alternatively, you might also use the p 
value (column "p adj") and compare that to your significance level (and it 
seems odd that any search engine might fail to give you some links on the 
relationship of p values and CIs -- I got plenty at first try).
d) In brief: you are doing ~100 (rough estimate) paired comparisons. With that, 
you can expect TukeyHSD to be pretty conservative, and it does not come as a 
surprise that you are unable to identify pairs of different groups, even less 
as the p value from the F test is not extremely small (~3%; whatever extremely 
small may mean in this case). Google also easily gets me to this site (and 
many other useful ones): http://www.pmean.com/05/TukeyTest.html which might 
provide some further guidance.

Net, tough luck to encounter such a situation, but apparently none of the group 
differences is strong enough to still show up as significant after correcting 
for multiplicity (which is certainly called for) in this case where you have an 
inflated number of significance tests. 
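
For reference, the standard workflow looks like this (a sketch on toy data, 
not on your Value vector):

d <- data.frame(y = rnorm(60), g = gl(20, 3))   # 20 groups, 3 observations each
fit <- aov(y ~ g, data = d)
summary(fit)    # overall F test
TukeyHSD(fit)   # all pairwise CIs and adjusted p values ("p adj" column)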

HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Jochen Schreiber
 Sent: Montag, 13. Mai 2013 14:25
 To: r-help@r-project.org
 Subject: [R] Anova und Tukey HSD
 
 Hello everyone,
 
 I first run an ANOVA to check whether a significant difference exists in the
 data. Then I want to use TukeyHSD to find out between which groups the
 difference lies. However, I do not know how to interpret the data. First,
 here is my R code:
 
 [code]
Value<-c(-0.9944999814033508,-0.3585381469727,0.7063000202178955,-
 1.77435803833,-
 1.080299973487854,0.3055071525574,1.849046325684,-
 0.412440395355,0.5827999711036682,1.750669482422,-
 6.693999767303467,-0.877943869019,-
 1.3408000469207764,1.2560999393463135,-
 0.1004081062317,1.849046325684,-
 0.331905722046,0.4957999885082245,0.877943869019,0.73879998922
 34802,0.877943869019,0.9154000282287598,0.877943869019,0.70630
 00202178955,-1.3408000469207764,0.7063000202178955,-
 0.331905722046,-1.6448999643325806,0.412440395355,-
 1.6448999643325806,-
 0.877943869019,0.7487000226974487,0.4399000108242035,1.8490463
 25684,-1.6448999643325806,-2.4323999881744385,1.2265000343322754,-
 0.4957999885082245,-9.999899864196777,-1.750669482422,-
 1.6448999643325806,-9.999899864196777,0.877943869019,-
 5.06279993057251,0.877943869019,-2.967745776367,-
 5.06279993057251,-6.693999767303467,-
 1.0990500450134277,0.9944999814033508,-0.467745776367,-
 0.3585381469727,-
 9.999899864196777,0.5827999711036682,0.7487000226974487,0.738799989223
 4802,-0.2533000111579895,-9.999899864196777,-
 1.0363999605178833,0.3055071525574,-1.174523162842,-
 0.806410490417,-9.999899864196777,-0.9944999814033508,-
 2.478300094604492,-0.150930858612,0.4957999885082245,-
 4.571800231933594,-6.324900150299072,-0.38530001044273376,-
 1.3408000469207764,-5.93179988861084,-6.693999767303467,-
 2.967745776367,0.877943869019,-0.05020405311584,-
 1.77435803833,-0.150930858612,-0.23725003004074097,-
 0.643268528748,1.2560999393463135,-
 0.1004081062317,0.4399000108242035,-
 0.7063000202178955,0.9154000282287598,-
 0.2181814033508,1.2265000343322754,-
 0.412440395355,0.1764581741333,-1.4758000373840332,-
 0.9944999814033508,-1.080299973487854,-0.643268528748,-
 9.999899864196777,-2.0536999702453613,-
 0.2181814033508,0.7487000226974487,0.02510202655792,-
 1.0363999605178833,-0.05020405311584,-
 0.7387999892234802,0.4957999885082245,-1.4758000373840332,-
 0.7063000202178955,0.1764581741333,-5.06279993057251,-
 0.643268528748,-1.4758000373840332,-0.9944999814033508,-
 0.2533000111579895,0.1764581741333,-
 0.331905722046,0.6776500344276428,0.3055071525574,-
 0.05020405311584,0.5827999711036682,1.2560999393463135,-
 0.4957999885082245,-0.38530001044273376,0.9944999814033508,-
 2.4323999881744385,1.126338964844,-
 0.9944999814033508,1.750669482422,1.080299973487854,-
 0.7387999892234802,-1.3408000469207764,0.612820980835,-
 2.0536999702453613,0.7063000202178955,-0.806410490417,-
 0.877943869019,-0.05020405311584,-2.967745776367,-
 0.877943869019,-2.0536999702453613,-1.3408000469207764,-
 1.3408000469207764,-0.38530001044273376,0.7063000202178955,-
 9.999899864196777,-
 

Re: [R] ks.test and wilcoxon.test results differ from other stat.packages

2013-02-01 Thread Meyners, Michael
Impossible to say w/o a reproducible example, but to start with let me suggest 
looking at the exact= (both functions) and correct= (wilcox.test) arguments. 
Experience shows that some change of the default settings allows you to 
reproduce results from other software (and the help pages will explain you what 
the settings do; by that, you might also learn what your other tools actually 
calculate if you have no documentation at hand).

HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Alexander Favorov
 Sent: Freitag, 1. Februar 2013 09:00
 To: r-help@r-project.org
 Subject: [R] ks.test and wilcoxon.test results differ from other
 stat.packages
 
 Probably, it's an obvious info, but I have not found anything in R FAQ
 concerning this feature/bug.
 
 The results of ks.test and wilcoxon.test (in the Mann-Whitney version,
 paired = 'FALSE') don't coincide with the results from the other
 statistical packages, e.g. Statistica, Medcalc, and (as for MW test)
 from the numerous online MW tests.
 
 E.g.
 Statistica p-value=0.0435353
 Medcalc p-value=0.0435354
 R p-value=0.04635
 
 If I want to obtain the result of a test only once, it does not matter.
 
 But what should I do if I want to perform Monte-Carlo simulations and I
 need 10 000 or even 100 000 p-values, then build some
 distribution, and then use the results of Statistica? Whatever, the
 discrepancy bothers me.
 
 Are there some alternative packages for non-parametric statistics that
 produce results comparable with the other program packages? Or,
 probably, there exists some environment variable to perform
 calculations in a more common way?
 
 Thank you!
 
 Sasha.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] unexpected (?) behavior of sort=TRUE in merge function

2012-09-04 Thread Meyners, Michael
All,

I realize from the archive that the sort argument in merge has been subject to 
discussion before, though I couldn't find an explanation for this behavior. I 
tried to simplify this to (kind of) minimal code from a real example to the 
following (and I have no doubts that there are smart people around achieving 
the same with smarter code :-)). I'm running R 2.15.1 64bit under MS Windows 7, 
full session info below.

I do have a list with two dataframes:

test <- list(structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
4L, 5L, 6L), .Label = c("Y1", "Y2", "G", "F", "L", "K"), class = "factor"), 
cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1, 
1, 1, 1, 1, 11, 11, 11, 11, 11, 11), x = c(5.85714285714286, 
5.9, 7.3, 5.85714285714286, 7.27272727272727, 4.375, 3.875, 
2.5, 4.8, 3.625, 6.25, 4.71428571428571, 3.53571428571429, 
4.638889, 4.42424242424242, 4.78260869565217, 4.875, 
3.80434782608696, 5.73170731707317, 5.41935483870968, 5.78125, 
6.30188679245283, 6.87755102040816, 5.56603773584906)), .Names = 
c("product", 
"cong", "x"), row.names = c(NA, -24L), class = "data.frame"), 
structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L, 
6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Y1", "Y2", "G", 
"F", "L", "K"), class = "factor"), cong = c(-1, -1, -1, -1, 
-1, -1, 0, 0, 0, 0, 0, 0), x = c(3.04347826086957, 4.18181818181818, 
3.75, 4.31578947368421, 4.5, 3.73913043478261, 4.8876404494382, 
5.20792079207921, 5.68, 5.70526315789474, 6.38636363636364, 
4.96703296703297)), .Names = c("product", "cong", "x"), row.names = c(NA, 
-12L), class = "data.frame"))


The dataframes are pretty much the same but for the values in the x-column and 
the fact that the second one has only half as many observations, missing the 
second half of the expand.grid if you like. Now if I run

lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F", 
"L", "K"), cong=c(-1,0,1,11)), all=T, sort=TRUE))   # sort=TRUE is the default, 
# so could be omitted

sorts the first dataframe according to the labels of factor "product", while 
for the second one the order is maintained from the first dataframe (x) to 
merge (which is the difference that I could not find being documented). Now I 
run the same code with sort=FALSE instead:

lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F", 
"L", "K"), cong=c(-1,0,1,11)), all=T, sort=FALSE))

The results are at least consistent and fulfill my needs (this is, btw, not 
unexpected from the documentation). Note that I get exactly the same behavior 
if I apply merge subsequently to test[[1]] and test[[2]], so it is not an issue 
from lapply. (I realize that my dataframes are ordered by levels of product, 
but using test[[2]] <- test[[2]][sample(12),] and applying the same code as 
above reveals that indeed no sorting is done but the order is maintained from 
the first dataframe.)

I have a working solution for myself, so I'm not after any advice on how to 
achieve the sorting -- I'd just like to better understand what's going on here 
and/or what I might have missed in the documentation or in the list archives. 

Thanks in advance, 
Michael



Session info:
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_2.15.1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unexpected (?) behavior of sort=TRUE in merge function

2012-09-04 Thread Meyners, Michael
Rui, 

Thanks for looking into this. I apologize, I should've added my output, maybe 
it looks different on my machine than on others. I also should have made my 
question more explicit: I'm not looking for a solution to get the sorting one 
way or another, I have that already. I rather want to understand why the same 
code behaves differently on two very similar datasets (one just having less 
rows, see below).

The first call gives the following for me:

 lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F", 
 "L", "K"), cong=c(-1,0,1,11)), all=T, sort=TRUE))
[[1]]
   product congx
1F   -1 5.857143
2F0 3.625000
3F1 4.782609
4F   11 6.301887
5G   -1 7.30
6G0 4.80
7G1 4.424242
8G   11 5.781250
9K   -1 4.375000
10   K0 4.714286
11   K1 3.804348
12   K   11 5.566038
13   L   -1 7.272727
14   L0 6.25
15   L1 4.875000
16   L   11 6.877551
17  Y1   -1 5.857143
18  Y10 3.875000
19  Y11 3.535714
20  Y1   11 5.731707
21  Y2   -1 5.90
22  Y20 2.50
23  Y21 4.638889
24  Y2   11 5.419355

[[2]]
   product congx
1   Y1   -1 3.043478
2   Y10 4.887640
3   Y11   NA
4   Y1   11   NA
5   Y2   -1 4.181818
6   Y20 5.207921
7   Y21   NA
8   Y2   11   NA
9G   -1 3.75
10   G0 5.68
11   G1   NA
12   G   11   NA
13   F   -1 4.315789
14   F0 5.705263
15   F1   NA
16   F   11   NA
17   L   -1 4.50
18   L0 6.386364
19   L1   NA
20   L   11   NA
21   K   -1 3.739130
22   K0 4.967033
23   K1   NA
24   K   11   NA
 

So different from what you may have observed, here the first data set [[1]] is 
sorted by label of "product", not by value. As you correctly stated, "Y1" is 
coded as 1, "Y2" as 2, etc., but the first rows are for "F", followed by "G" etc. 
The second [[2]] is sorted by level (value). So I have different behavior on 
very similar looking data sets, and hence to me at least one of those cannot be 
right according to documentation (but I agree with you that the second is 
correct according to the help). In my larger example, it seems as if data sets 
which do not originally have all combinations of product and cong anyway are 
sorted like [[2]], and those that are complete (all 24 combinations occur) are 
sorted like [[1]] is, which to me is still unexpected.

Hope this clarifies my question.

Any thoughts appreciated.
Michael

 -Original Message-
 From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
 Sent: Dienstag, 4. September 2012 14:01
 To: Meyners, Michael
 Cc: r-help
 Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge function
 
 Hello,
 
 Inline.
 Em 04-09-2012 12:24, Meyners, Michael escreveu:
 
 All,
 
 I realize from the archive that the sort argument in merge has been
 subject to discussion before, though I couldn't find an explanation for
 this behavior. I tried to simplify this to (kind of) minimal code from
 a real example to the following (and I have no doubts that there are
 smart people around achieving the same with smarter code :-)). I'm
 running R 2.15.1 64bit under MS Windows 7, full session info below.
 
 I do have a list with two dataframes:
 
 test <- list(structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L,
 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
 4L, 5L, 6L), .Label = c("Y1", "Y2", "G", "F", "L", "K"), class =
 "factor"),
 cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1,
 1, 1, 1, 1, 11, 11, 11, 11, 11, 11), x = c(5.85714285714286,
 5.9, 7.3, 5.85714285714286, 7.27272727272727, 4.375, 3.875,
 2.5, 4.8, 3.625, 6.25, 4.71428571428571, 3.53571428571429,
 4.638889, 4.42424242424242, 4.78260869565217, 4.875,
 3.80434782608696, 5.73170731707317, 5.41935483870968, 5.78125,
 6.30188679245283, 6.87755102040816, 5.56603773584906)), .Names =
 c("product",
 "cong", "x"), row.names = c(NA, -24L), class = "data.frame"),
 structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L,
 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Y1", "Y2", "G",
 "F", "L", "K"), class = "factor"), cong = c(-1, -1, -1, -1,
 -1, -1, 0, 0, 0, 0, 0, 0), x = c(3.04347826086957,
 4.18181818181818,
 3.75, 4.31578947368421, 4.5, 3.73913043478261, 4.8876404494382,
 5.20792079207921, 5.68, 5.70526315789474, 6.38636363636364,
 4.96703296703297)), .Names = c("product", "cong", "x"), row.names =
 c(NA,
 -12L), class = "data.frame"))
 
 
 The dataframes are pretty much the same but for the values in the x-
 column and the fact that the second one has only half as many
 observations, missing the second half of the expand.grid if you like.
 Now if I run
 
  lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2",
  "G", "F", "L", "K"), cong=c(-1,0,1,11)), all=T

Re: [R] unexpected (?) behavior of sort=TRUE in merge function

2012-09-04 Thread Meyners, Michael
Rui,

yes, without all=T it works fine, but of course there is no point in the whole 
exercise if I'd drop that, as print(test) would do the same, unless I have 
other values of product or cong in any dataset, which I haven't. :-) 

The purpose of the merge is to have all combinations of the levels of product 
and cong in each dataframe in my list -- there might be smarter ways, but it 
does the trick, and I need a unified setup (layout, size and sorting of the 
data) to ease the following steps in my code. Of course, I could achieve that 
easily by sorting the data subsequently, so there are multiple ways to get what 
I want to have. However, the purpose being a bit beyond this post, it's really 
about the behavior here on data sets that look so similar, and about the fact 
that one of those is not treated like it should be according to documentation). 

Thanks again for taking the time to reply.

Cheers, Michael



 -Original Message-
 From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
 Sent: Dienstag, 4. September 2012 16:58
 To: Meyners, Michael
 Cc: r-help
 Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge function
 
 Hello,
 
 You're right I had missed the point, sorry.
 I can't see a reason why that behavior, but it seems to have to do with
 all = T, remove it and the problem is gone. But that's probably not
 what you want.
 NA's issue?
 
 Rui Barradas
 
 Em 04-09-2012 15:17, Meyners, Michael escreveu:
  Rui,
 
  Thanks for looking into this. I apologize, I should've added my
 output, maybe it looks differently on my machine than on others. I also
 should have made my question more explicit: I'm not looking for a
 solution to get the sorting one way or another, I have that already. I
 rather want to understand why the same code behaves differently on two
 very similar datasets (one just having less rows, see below).
 
  The first call gives the following for me:
 
  lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2",
  "G", "F", "L", "K"), cong=c(-1,0,1,11)), all=T, sort=TRUE))
  [[1]]
    product cong        x
 1        F   -1 5.857143
 2        F    0 3.625000
 3        F    1 4.782609
 4        F   11 6.301887
 5        G   -1 7.300000
 6        G    0 4.800000
 7        G    1 4.424242
 8        G   11 5.781250
 9        K   -1 4.375000
 10       K    0 4.714286
 11       K    1 3.804348
 12       K   11 5.566038
 13       L   -1 7.272727
 14       L    0 6.250000
 15       L    1 4.875000
 16       L   11 6.877551
 17      Y1   -1 5.857143
 18      Y1    0 3.875000
 19      Y1    1 3.535714
 20      Y1   11 5.731707
 21      Y2   -1 5.900000
 22      Y2    0 2.500000
 23      Y2    1 4.638889
 24      Y2   11 5.419355
 
  [[2]]
    product cong        x
 1       Y1   -1 3.043478
 2       Y1    0 4.887640
 3       Y1    1       NA
 4       Y1   11       NA
 5       Y2   -1 4.181818
 6       Y2    0 5.207921
 7       Y2    1       NA
 8       Y2   11       NA
 9        G   -1 3.750000
 10       G    0 5.680000
 11       G    1       NA
 12       G   11       NA
 13       F   -1 4.315789
 14       F    0 5.705263
 15       F    1       NA
 16       F   11       NA
 17       L   -1 4.500000
 18       L    0 6.386364
 19       L    1       NA
 20       L   11       NA
 21       K   -1 3.739130
 22       K    0 4.967033
 23       K    1       NA
 24       K   11       NA
 
 
  So different from what you may have observed, here the first data set
 [[1]] is sorted by label of product, not by value. As you correctly
 stated, Y1 is coded as 1, Y2 as 2, etc., but the first rows are for
 F, followed by G etc. The second [[2]] is sorted by level (value). So I
 have different behavior on very similar looking data sets, and hence to
 me at least one of those cannot be right according to documentation
 (but I agree with you that the second is correct according to the
 help). In my larger example, it seems as if data sets which do not
 originally have all combinations of product and cong anyway are sorted
 like [[2]], and those that are complete (all 24 combinations occur) are
 sorted like [[1]] is, which to me is still unexpected.
 
  Hope this clarifies my question.
 
  Any thoughts appreciated.
  Michael
 
  -Original Message-
  From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
  Sent: Dienstag, 4. September 2012 14:01
  To: Meyners, Michael
  Cc: r-help
  Subject: Re: [R] unexpected (?) behavior of sort=TRUE in merge
  function
 
  Hello,
 
  Inline.
  Em 04-09-2012 12:24, Meyners, Michael escreveu:
 
  All,
 
 I realize from the archive that the sort argument in merge has been
 subject to discussion before, though I couldn't find an explanation
 for this behavior. I tried to simplify this to (kind of) minimal code
 from a real example to the following (and I have no doubts that there
 are smart people around achieving the same with smarter code :-)).
 I'm running R 2.15.1 64bit under MS Windows 7, full session info

Re: [R] How to measure level of similarity of two data frames

2012-05-28 Thread Meyners, Michael
Kel,
in addition, and depending on how you define similarity, you might want to 
look into the RV coefficient as a measure of it (it is actually related to a 
correlation, so similarity would rather mean similar information though not 
necessarily small Euclidean distance); coeffRV in FactoMineR would be one 
option to determine it.
HTH, Michael
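
For illustration, a minimal sketch of the coeffRV route (toy matrices, not 
Kel's data; the rv and p.value components shown are what I recall coeffRV 
returning):

library(FactoMineR)
set.seed(1)
X <- scale(matrix(rnorm(50), 10, 5))   # two standardized data sets with the
Y <- scale(matrix(rnorm(50), 10, 5))   # same observations in the rows
res <- coeffRV(X, Y)
res$rv        # RV coefficient, between 0 and 1
res$p.value   # p value of the associated test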

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Lamke
 Sent: Samstag, 26. Mai 2012 20:05
 To: r-help@r-project.org
 Subject: [R] How to measure level of similarity of two data frames
 
 Hi group,
 
 I've been thinking of calculating euclidean distance between each
 column of a data frames that each consists of standardized numerical
 columns.
 However, I don't know if there's a way of summarizing the overall
 distance by some kind of metrics.  If anyone know a proper way of doing
 so and/or a package I would greatly appreciate your suggestions.
 Thanks very much!
 
 Kel
 
 --
 View this message in context: http://r.789695.n4.nabble.com/How-to-
 measure-level-of-similarity-of-two-data-frames-tp4631466.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Non-parametric test for repeated measures and post-hoc single comparisons in R?

2012-02-20 Thread Meyners, Michael
No, the authors are correct: the individuals (i.e. the 17 individuals) you have 
need to be independent (i.e. no correlation between them, let alone any 
individual running through your temporal experiment more than once, as 
indicated in the citation), while the *observations* are of course dependent as 
they are within the same subject (individual -- they have the same subject 
effect). Think of Friedman as a non-parametric 2-way ANOVA with one of the 
factors being subject; observations of the same subject are dependent, but once 
you include the subject effect, the errors are assumed to be independent (which 
implies that subjects need to be independent and should, e.g., not work on the 
assessment together).
The imprecision is in your interpretation of individuals vs. observations. 
HTH, Michael
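
For illustration, a minimal sketch (not from this thread) of the overall test 
followed by one possible post-hoc, paired Wilcoxon tests with Holm correction; 
with Likert data expect ties, so R falls back to a normal approximation and 
warns:

set.seed(123)
x <- matrix(sample(1:7, 17 * 6, replace = TRUE), nrow = 17,
            dimnames = list(1:17, paste0("T", 1:6)))
friedman.test(x)                       # overall test, subjects in rows
d <- data.frame(y    = as.vector(x),   # long format for the post-hoc
                time = factor(rep(colnames(x), each = nrow(x))),
                subj = factor(rep(rownames(x), ncol(x))))
pairwise.wilcox.test(d$y, d$time, paired = TRUE, p.adjust.method = "holm")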

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of saschav...@gmail.com
 Sent: Montag, 20. Februar 2012 09:59
 To: peter dalgaard
 Cc: r-help@r-project.org
 Subject: Re: [R] Non-parametric test for repeated measures and post-hoc
 single comparisons in R?
 
 Thanks, I got it! (And I think I should have googled what "replicated"
 means!) However, then Bortz, Lienert, Boehnke are imprecise, if not
 wrong: "Der Friedman-Test setzt voraus, dass die N Individuen
 wechselseitig unabhängig sind, dass also nicht etwa ein und dasselbe
 Individuum zweimal oder mehrmals im Untersuchungsplan auftritt" (p.
 271). Which I (hope to) translate: "The Friedman test requires the N
 individuals to be reciprocally independent, which means that one
 individual cannot occur twice or more times in the research design."
 
 *S*
 
 On 19.02.12 22:04, peter dalgaard wrote:
 
  Repeated measures means that you have multiple measurements on the
 same individual. Usually, the same person measured at different time
 points. So if you have N individuals and T times, then you can place
 your observations in an N*T layout.
 
  In this layout, you can have 1 observation per cell or R > 1
 observations. In the former case, the design is referred to as
 "unreplicated".  Got it?
 
  -pd
 
 
  On Feb 19, 2012, at 19:25 , saschav...@gmail.com wrote:
 
  Some attribute x from 17 individuals was recorded repeatedly on 6
 time points using a Likert scale with 7 distractors. Which statistical
 test(s) can I apply to check whether the changes along the 6 time
 points were significant?
 
  set.seed( 123 )
  x <- matrix( sample( 1:7, 17*6, repl=T ),
nrow = 17, byrow = TRUE,
dimnames = list(1:17, paste( 'T', 1:6, sep='' ))
  )
 
  I found the Friedman test and the Quade test for testing the overall
 hypothesis.
 
  friedman.test( x )
  quade.test( x )
 
  However, the R help files, my text books (Bortz, Lienert and
 Boehnke, 2008; Köhler, Schachtel and Voleske, 2007; both German), and
 the Wikipedia texts differ in what they propose as requirements for the
 tests. R says that data need to be unreplicated. I read 'unreplicated'
 as 'not-repeated', but is that right? If so, the example, in contrast,
in friedman.test() appears to use indeed repeated measures. Yet,
Wikipedia says the contrary, that is to say, the test is good especially
if the data represent repeated measures. The text books say either (in the
same paragraph, which is very confusing). What is right?
 
  In addition, what would be an appropriate test for post-hoc single
 comparisons for the indication which column differs from others
 significantly?
 
  Bortz, Lienert, Boehnke (2008). Verteilungsfreie Methoden in der
  Biostatistik. Berlin: Springer Köhler, Schachtel, Voleske (2007).
  Biostatistik: Eine Einführung für Biologen und Agrarwissenschaftler.
  Berlin: Springer
 
  --
  Sascha Vieweg, saschav...@gmail.com
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 --
 Sascha Vieweg, saschav...@gmail.com
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 2 sample wilcox.test != kruskal.test

2012-01-11 Thread Meyners, Michael
The devil is in the details (and in the arguments in Lukasz code). The defaults 
for the two functions are different: wilcox.test uses an exact test (which is 
not available in kruskal.test afaik) for your data, and uses the continuity 
correction if the normal approximation is requested (neither available in 
kruskal.test). See the manual (in particular ?wilcox.test) for details, or the 
pertinent literature for the theoretical background.

HTH, Michael
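
To make the equivalence explicit, a small sketch (data taken from the quoted 
message below):

x <- c(10, 11, 15, 8, 16, 12, 20)
y <- c(10, 14, 18, 25, 28, 30, 35)
g <- factor(rep(c("a", "b"), each = 7))
# force wilcox.test onto the same normal approximation, without the
# continuity correction, that kruskal.test uses:
wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value
kruskal.test(c(x, y) ~ g)$p.value   # identical up to numerical noise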


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Lukasz Reclawowicz
 Sent: Wednesday, January 11, 2012 12:00
 To: syrvn
 Cc: r-help@r-project.org
 Subject: Re: [R] 2 sample wilcox.test != kruskal.test
 
 2012/1/11 syrvn ment...@gmx.net
 
  Hi,
 
  thanks for your answer. Unfortunately I cannot reproduce your
 results.
 
  In my example the results still differ when I use your approach:
 
   x <- c(10,11,15,8,16,12,20)
   y <- c(10,14,18,25,28,30,35)
   f <- as.factor(c(rep("a",7), rep("b",7))); d <- c(x,y)
   kruskal.test(x,y)
 
 
 Try comparing wilcox.test with the right formula in kruskal.test; I got:
  all.equal(wilcox.test(x,y, correct =
 F,exact=F)$p.value,kruskal.test(d~f)$p.value)
 [1] TRUE
 
 --
 Miłego dnia (Have a nice day)
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using a mathematical expression in sapply() XXXX

2012-01-04 Thread Meyners, Michael
Dan,

It depends on what you want to achieve. I suspect you just want to remove 
missing values before summing; if so, consider

sapply(x, sum, na.rm=TRUE)

instead. To make your code running, try

sapply(x, function(x) sum(!is.na(x)))

However, this would just count the number of non-missing values per subgroup, 
not the sum of the values. 

Similarly, 

sapply(x, function(x) sum(x, na.rm=TRUE))

would follow the same rationale but give the sums (with missing values 
removed), i.e. the same results as the first line of code. I'd rather choose 
the first option than this one, though.

Of course, you could also define a new function 

sum1 <- function(x) sum(!is.na(x)) # or probably rather sum(x, na.rm=TRUE), 
depending on your needs

and then use 

sapply(x, sum1)

but that seems a bit overkill, I'd say...

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Dan Abner
 Sent: Wednesday, January 04, 2012 14:41
 To: r-help@r-project.org
 Subject: [R] Using a mathematical expression in sapply() 
 
 Hello everyone,
 
 I have the following call to sapply() and error message. Is the most
 efficient way to deal with this to make sum(!is.na(x)) a function in a
 separate line prior to this call? If not, please advise.
 
 N.Valid=sapply(x,sum(!is.na(x)))
 Error in match.fun(FUN) :
   'sum(!is.na(x))' is not a function, character or symbol
 
 
 Thanks!
 
 Dan
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generalized procrustes analysis without translation

2011-12-09 Thread Meyners, Michael
I’m not aware of any, but if you really want this, it should be possible to 
modify the code of any of the functions you propose and delete the part doing 
the translation. I’m not sure that this is a good idea, though; either your 
matrices are *truly* centered, then it doesn’t make a difference, or they are 
only centered in theory but not in practice due to random error, then the 
rotation and scaling will depend heavily on the deviation of the center of the 
matrix from the origin. Optimal scaling and rotation in GPA are independent 
from each other so might be (de-)selected depending on needs, but they depend 
on the translation, so it seems to make sense that you cannot deselect that 
part. You wouldn’t do scaling of two variables without centering them around 0 
either, would you? You may transform them back if required, but scaling without 
centering first does not appeal to me, same for rotations.
HTH, Michael
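
To illustrate the point about truly centred matrices, a sketch (using procGPA 
from the shapes package and its example data gorf.dat, which are my 
assumptions, not Mark's setting): once each configuration is centred 
explicitly, the translation step of GPA is a no-op, so rotation and scaling 
are unaffected by it.

library(shapes)
data(gorf.dat)   # k x 2 x n array of landmark configurations
ctr <- sweep(gorf.dat, c(2, 3), apply(gorf.dat, c(2, 3), mean))  # centre each
res <- procGPA(ctr, scale = TRUE, reflect = TRUE)                # ordinary GPA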

From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mark Heckmann
Sent: Wednesday, December 07, 2011 19:09
To: r-help@r-project.org
Subject: [R] generalized procrustes analysis without translation

I am searching for a GPA function/algorithm that allows to omit translation of 
matrices.
procGPA in the shapes package does not have this option.
I need scaling, rotation and reflection but no translation as my matrices are 
naturally centered.

I am not sure if GPA from FactoMineR might be used for this.
There are no arguments allowing to switch on the above mentioned arguments 
except for scaling.
Frankly, I also do not understand how to set up the data for GPA properly, 
especially how to use the group argument.

Does someone know another GPA function or has some code available?

Thanks in advance
Mark


Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com











    [[alternative HTML version deleted]]
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pearson chi-square test

2011-09-27 Thread Meyners, Michael
Not sure what you want to test here with two matrices, but reading the manual 
helps here as well: 

y   a vector; ignored if x is a matrix.

x and y are matrices in your example, so it comes as no surprise that you get 
different results. On top of that, your manual calculation is not correct if 
you want to test whether two samples come from the same distribution (so don't 
be surprised if R still gives a different value...).

HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Michael Haenlein
 Sent: Tuesday, September 27, 2011 12:45
 To: r-help@r-project.org
 Subject: [R] Pearson chi-square test
 
 Dear all,
 
 I have some trouble understanding the chisq.test function.
 Take the following example:
 
 set.seed(1)
 A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
 B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
 C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
 x <- table(A,B)
 y <- table(A,C)
 
 When I calculate the test statistic by hand I get a value of
 approximately
 75.9:
 http://en.wikipedia.org/wiki/Pearson's_chi-
 square_test#Calculating_the_test-statistic
 sum((x-y)^2/y)
 
 But when I do chisq.test(x,y) I get a value of 12.2 while
 chisq.test(y,x)
 gives a value of 10.3.
 
 I understand that I must be doing something wrong here, but I'm not
 sure
 what.
 
 Thanks,
 
 Michael
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pearson chi-square test

2011-09-27 Thread Meyners, Michael
Just for completeness: the manual calculation you'd want is most likely

sum((x-y)^2  / (x+y))

(that's one you can find on the Wikipedia link you provided). To get the same 
from chisq.test, try something like 

chisq.test(data.frame(x,y)[,c(3,6)])

(there are surely smarter ways, but at least it works here). Note that 
something like 

chisq.test(as.vector(x), as.vector(y)) 

will give a different test, i.e. one based on the contingency table of x cross y.
M. 
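
A quick numerical check of that equivalence (a sketch; A, B and C as in the 
quoted code below -- it relies on both tables having the same total, 100 here):

set.seed(1)
A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels = FALSE)
B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels = FALSE)
C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels = FALSE)
x <- table(A, B); y <- table(A, C)
sum((x - y)^2 / (x + y))                           # hand calculation
chisq.test(data.frame(x, y)[, c(3, 6)])$statistic  # same value (R may warn
                                                   # about small counts)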

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Meyners, Michael
 Sent: Tuesday, September 27, 2011 13:28
 To: Michael Haenlein; r-help@r-project.org
 Subject: Re: [R] Pearson chi-square test
 
 Not sure what you want to test here with two matrices, but reading the
 manual helps here as well:
 
 y a vector; ignored if x is a matrix.
 
 x and y are matrices in your example, so it comes as no surprise that
 you get different results. On top of that, your manual calculation is
 not correct if you want to test whether two samples come from the same
 distribution (so don't be surprised if R still gives a different
 value...).
 
 HTH, Michael
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Michael Haenlein
  Sent: Tuesday, September 27, 2011 12:45
  To: r-help@r-project.org
  Subject: [R] Pearson chi-square test
 
  Dear all,
 
  I have some trouble understanding the chisq.test function.
  Take the following example:
 
  set.seed(1)
  A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
  B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
  C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
  x <- table(A,B)
  y <- table(A,C)
 
  When I calculate the test statistic by hand I get a value of
  approximately
  75.9:
  http://en.wikipedia.org/wiki/Pearson's_chi-
  square_test#Calculating_the_test-statistic
  sum((x-y)^2/y)
 
  But when I do chisq.test(x,y) I get a value of 12.2 while
  chisq.test(y,x)
  gives a value of 10.3.
 
  I understand that I must be doing something wrong here, but I'm not
  sure
  what.
 
  Thanks,
 
  Michael
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-
  guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pearson chi-square test

2011-09-27 Thread Meyners, Michael
I suspect that the chisquare-test might not be appropriate, as you have 
constraints (same number of observations for A in both contingency tables). I 
further suspect that there is no test readily available for that, but I might 
be wrong. Maybe randomization tests could help here, but it would require a bit 
of thinking AND programming to accomplish that. chisq.test might give you an 
approximate solution, but I can't say how good this will be (and it might also 
depend on the data, btw).
Best, Michael


From: Michael Haenlein
Sent: Tuesday, September 27, 2011 17:05
To: r-help@r-project.org
Cc: Meyners, Michael
Subject: RE: [R] Pearson chi-square test

Dear Michael,
 
Thanks very much for your answers!
 
The purpose of my analysis is to test whether the contingency table x is 
different from the contingency table y.
Or, to put it differently, whether there is a significant difference between 
the joint distributions AB and AC.
 
Based on your answer I'm wondering whether the best way to do this is really a 
chisq.test?
Or is there probably a different function or package I should use altogether?
 
Thanks,
 
Michael
 
 
 
-Original Message-
From: Meyners, Michael 
Sent: Dienstag, 27. September 2011 17:00
To: Michael Haenlein; r-help@r-project.org
Subject: RE: [R] Pearson chi-square test
 
Just for completeness: the manual calculation you'd want is most likely
 
sum((x-y)^2  / (x+y))
 
(that's one you can find on the Wikipedia link you provided). To get the same 
from chisq.test, try something like 
 
chisq.test(data.frame(x,y)[,c(3,6)])
 
(there are surely smarter ways, but at least it works here). Note that 
something like 
 
chisq.test(as.vector(x), as.vector(y)) 
 
will give a different test, i.e. based on a contingency table of x cross y).
M. 
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Meyners, Michael
 Sent: Tuesday, September 27, 2011 13:28
 To: Michael Haenlein; r-help@r-project.org
 Subject: Re: [R] Pearson chi-square test
 
 Not sure what you want to test here with two matrices, but reading the
 manual helps here as well:
 
 y   a vector; ignored if x is a matrix.
 
 x and y are matrices in your example, so it comes as no surprise that
 you get different results. On top of that, your manual calculation is
 not correct if you want to test whether two samples come from the same
 distribution (so don't be surprised if R still gives a different
 value...).
 
 HTH, Michael
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Michael Haenlein
  Sent: Tuesday, September 27, 2011 12:45
  To: r-help@r-project.org
  Subject: [R] Pearson chi-square test
 
  Dear all,
 
  I have some trouble understanding the chisq.test function.
  Take the following example:
 
  set.seed(1)
  A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
  B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
  C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
  x <- table(A,B)
  y <- table(A,C)
 
  When I calculate the test statistic by hand I get a value of
  approximately
  75.9:
  http://en.wikipedia.org/wiki/Pearson's_chi-
  square_test#Calculating_the_test-statistic
  sum((x-y)^2/y)
 
  But when I do chisq.test(x,y) I get a value of 12.2 while
  chisq.test(y,x)
  gives a value of 10.3.
 
  I understand that I must be doing something wrong here, but I'm not
  sure
  what.
 
  Thanks,
 
  Michael

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] library(SenoMineR)- Triangle Test Query

2011-06-08 Thread Meyners, Michael
Vijayan,

I cannot find an error in your code, but I had a look at the code of 
triangle.test -- unless I'm missing something, it contains a bug. If you study 
the way in which the matrix pref is updated, you find that the vector 
preference is compared to 1, 2 and 3 instead of X, Y and Z as it should 
be. That way, some of the non-diagonal entries of pref will always be zero, 
irrespective of the data, which does not make sense. I think it should work if 
you modify the code accordingly. Alternatively, a quick patch (untested!) might 
be to code preferences as 1, 2 and 3 instead of the letters (but I'm not sure 
whether this has any other implications).
I CC the author of the function and maintainer of the package; he should 
correct me if needed or could otherwise update the code for the next release (I 
worked on SensoMineR 1.11).

Hope this helps, 
Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Vijayan Padmanabhan
 Sent: Saturday, June 04, 2011 9:21
 To: r-help@r-project.org
 Subject: [R] library(SenoMineR)- Triangle Test Query
 
 Dear R Group
 I was trying to use the triangle.test function in SensoMineR and
 strangely i
 encounter a error in the output of preference matrix from the analysis.
 To illustrate, pl see the following dataframe of a design with the
 response
 and preference collected as shown below:
 
  design <- structure(list(`Product X` = c(3, 1, 4, 2, 4, 2, 1, 3, 4, 2,
  4, 2, 1, 3, 4, 2, 4, 2, 3, 1), `Product Y` = c(1, 1, 4, 4, 4,
  3, 1, 1, 4, 4, 4, 3, 1, 1, 4, 4, 4, 3, 1, 1), `Product Z` = c(3,
  2, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 3, 2), Response =
  structure(c(1L,
  2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
  1L, 1L, 2L), .Label = c("X", "Z"), class = "factor"), Preference =
  structure(c(1L,
  3L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L,
  2L, 1L, 2L), .Label = c("X", "Y", "Z"), class = "factor")), .Names =
  c("Product X",
  "Product Y", "Product Z", "Response", "Preference"), class =
  "data.frame",
  row.names = c("Panelist1.Test1",
  "Panelist1.Test2", "Panelist2.Test1", "Panelist2.Test2",
  "Panelist3.Test1",
  "Panelist3.Test2", "Panelist4.Test1", "Panelist4.Test2",
  "Panelist5.Test1",
  "Panelist5.Test2", "Panelist6.Test1", "Panelist6.Test2",
  "Panelist7.Test1",
  "Panelist7.Test2", "Panelist8.Test1", "Panelist8.Test2",
  "Panelist9.Test1",
  "Panelist9.Test2", "Panelist10.Test1", "Panelist10.Test2"))
 
  If you were to investigate the above dataframe, you would find that for
  the comparison of Product 2 vs Product 3, the preference indicates product
  3 is preferred over product 2 all the time.
 
 ## Read output from the following script to see what i mean above:
  subset(design, `Product X`==2 & `Product Y`==3 & `Product Z`==3)
 
 ##Output of above would be:
  .               Product X Product Y Product Z Response Preference
  Panelist3.Test2         2         3         3        X          Y
  Panelist6.Test2         2         3         3        X          Y
  Panelist9.Test2         2         3         3        X          Y
 
 However when I analyse the design with the answers and preferences
 using the
 following script, I get the $pref output which shows that product 2 is
 preferred over 3 all the time. Can somebody explain what is wrong in my
 script?
 
  answer <- as.vector(design$Response)
  preference <- as.vector(design$Preference)
  triangle.test(design[,1:3], answer, preference)
 
 ##$pref output from the triangle.test function shows as follows:
 
 $pref
   1 2 3 4
 1 0 0 0 0
 2 4 0 3 0
 3 0 0 0 0
 4 0 0 0 0
 
 
 Any help in helping me identify what is going wrong here would be
 highly
 appreciated.
 Regards
 Vijayan Padmanabhan
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] library(SenoMineR)- Triangle Test Query

2011-06-08 Thread Meyners, Michael
Unless I missed it, neither the OP nor the list was CC'd on this, so for anyone 
interested, I forward this solution (untested from my side) from the package 
maintainer. Not sure whether the file comes through, so I include the updated 
code in the message's body below. 
Cheers, Michael


** updated code for triangle.test 
triangle.test <- function (design,answer,preference=NULL){

answer = gsub("(\\w)", "\\U\\1", as.character(answer), perl=TRUE)
labprod = 
levels(as.factor(c(as.character(design[,1]),as.character(design[,2]),as.character(design[,3]))))
nbprod = length(labprod)
nb.answer = nb.good = pref = matrix(0,nbprod,nbprod)
for (i in 1:nrow(design)){
  for (j in 1:nbprod){
     if (labprod[j] == design[i,1]) i1 = j
     if (labprod[j] == design[i,2]) i2 = j
     if (labprod[j] == design[i,3]) i3 = j
  }
  if (i1!=i2) nb.answer[i1,i2] = nb.answer[i1,i2]+1
  if (i1==i2) nb.answer[i1,i3] = nb.answer[i1,i3]+1
  if ((i1==i2) & (answer[i]=="Z")){
    nb.good[i1,i3]=nb.good[i1,i3]+1
    if (length(preference)>0){
      if (preference[i]!="Z") pref[i3,i1] = pref[i3,i1] +1
      if (preference[i]=="Z") pref[i1,i3] = pref[i1,i3] +1
    }
  }
  if ((i1==i3) & (answer[i]=="Y")){
    nb.good[i1,i2]=nb.good[i1,i2]+1
    if (length(preference)>0){
      if (preference[i]!="Y") pref[i2,i1] = pref[i2,i1] +1
      if (preference[i]=="Y") pref[i1,i2] = pref[i1,i2] +1
    }
  }
  if ((i2==i3) & (answer[i]=="X")){
    nb.good[i1,i2]=nb.good[i1,i2]+1
    if (length(preference)>0){
      if (preference[i]!="X") pref[i1,i2] = pref[i1,i2] +1
      if (preference[i]=="X") pref[i2,i1] = pref[i2,i1] +1
    }
  }
}
nb.good = nb.good + t(nb.good)
nb.answer = nb.answer + t(nb.answer)

diag(nb.answer)=1
prob = pbinom(nb.good-1,nb.answer,1/3,lower.tail=FALSE)
maxML = recognize = minimum = matrix(NA,nbprod,nbprod)
for (i in 1:(nbprod-1)){
  for (j in (i+1):nbprod){
    aux = matrix(0,nb.good[i,j]+1,1)
    for (k in 0:nb.good[i,j]) aux[k] = dbinom(nb.good[i,j]-k,nb.answer[i,j]-k,1/3)
    maxML[i,j] = maxML[j,i] = max(aux)
    recognize[i,j] = recognize[j,i] = rev(order(aux))[1]-1
    mini = 0
    for (k in round(nb.answer[i,j]/3):nb.answer[i,j]) if ((mini==0) & (dbinom(k,nb.answer[i,j],1/3)<0.05)) mini=k
    minimum[i,j]=minimum[j,i]=mini
  }
}

confusion = (nb.answer-recognize) / nb.answer
diag(nb.answer)=diag(recognize)=0
diag(maxML)=diag(confusion)=1

rownames(nb.answer) = colnames(nb.answer) = rownames(nb.good) = colnames(nb.good) = labprod
rownames(prob) = colnames(prob) = rownames(confusion) = colnames(confusion) = labprod
rownames(maxML) = colnames(maxML) = rownames(minimum) = colnames(minimum) = rownames(recognize) = colnames(recognize) = labprod
if (length(preference)>0) rownames(pref) = colnames(pref) = labprod

res = list()
res$nb.comp = nb.answer
res$nb.ident = nb.good
res$p.value = prob
res$nb.recognition = recognize
res$maxML = maxML
res$confusion = confusion
res$minimum = minimum
if (length(preference)>0) res$pref = pref
##res$complete = result
return(res)
}

** end updated code for triangle.test 
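
For reference, a usage sketch with the objects from the original post (design, 
answer and preference as defined there); with the corrected function, the 
preference matrix should now match the raw data:

res <- triangle.test(design[, 1:3], answer, preference)
res$pref      # preference counts by product pair
res$p.value   # per-pair p values of the triangle tests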




-Original Message-
From: Francois Husson 
Sent: Wednesday, June 08, 2011 14:43
To: Meyners, Michael
Subject: Re: [R] library(SenoMineR)- Triangle Test Query

  Dear Vijayan, dear Michael,

Indeed there was an error in the function triangle.test. I attach the new 
function.
Thanks Michael for your answer.
Best
Francois


Le 08/06/2011 12:32, Meyners, Michael a écrit :
 Vijayan,

 I cannot find an error in your code, but I had a look at the code of 
 triangle.test -- unless I'm missing something, it contains a bug. If you 
 study the way in which the matrix pref is updated, you find that the vector 
 "preference" is compared to "1", "2" and "3" instead of "X", "Y" and "Z" as it should 
 be. That way, some of the non-diagonal entries of pref will always be zero, 
 irrespective of the data, which does not make sense. I think it should work 
 if you modify the code accordingly. Alternatively, a quick patch (untested!) 
 might be to code preferences as 1, 2 and 3 instead of the letters (but I'm 
 not sure whether this has any other implications).
 I CC the author of the function and maintainer of the package; he should 
 correct me if needed or could otherwise update the code for the next release 
 (I worked on SensoMineR 1.11).

 Hope this helps,
 Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Vijayan Padmanabhan
 Sent: Saturday, June 04, 2011 9:21
 To: r-help@r-project.org
 Subject: [R] library(SenoMineR)- Triangle Test Query

 Dear R Group
 I was trying to use the triangle.test function in SensoMineR and
 strangely i
 encounter a error in the output of preference matrix from the analysis.
 To illustrate, pl see the following dataframe of a design with the
 response
 and preference collected as shown below:

  design <- structure(list(`Product X` = c(3, 1, 4, 2, 4, 2, 1, 3, 4, 2,
 4, 2, 1, 3, 4, 2, 4, 2, 3, 1), `Product Y` = c(1, 1, 4, 4, 4,
 3, 1, 1, 4

Re: [R] help on permutation/randomization test

2011-05-24 Thread Meyners, Michael
I suspect you need to give more information/background on the data (though this 
is not primarily an R-related question; you might want to try other resources 
instead). Unless I'm missing something here, I cannot think of ANY reasonable 
test: A permutation (using permtest or anything else) would destroy the 
correlation structure and hence give invalid results, and the assumptions of 
parametric tests are violated as well. Basically, you only have two 
observations, one for each group; with some good will you might consider these 
as repeated measurements, but still on the same subject or whatsoever. Hence no 
way to discriminate the subject from a treatment effect. There is not enough 
data to permute or to rely a statistical test on. So unless you can get rid of 
the dependency within groups (or at least reasonably assume observations to be 
independent), I'm not very optimistic...
HTH, Michael

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Wenjin Mao
 Sent: Monday, May 23, 2011 20:56
 To: r-help@r-project.org
 Subject: [R] help on permutation/randomization test
 
 Hi,
 
 I have two groups of data of different size:
group A: x1, x2, ..., x_n;
group B: y1, y2, ..., y_m; (m is not equal to n)
 
 The two groups are independent but observations within each group are
 not independent,
  i.e., x1, x2, ..., x_n are not independent; but x's are independent
 from y's
 
 I wonder if randomization test is still applicable to this case. Does
 R have any function that can do this test for large m and n? I notice
 that permtest can only handle small (m+n<22) samples.
 
 Thank you very much,
 Wenjin
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer() with no intercept

2010-06-14 Thread Meyners, Michael
John, 
Why would you want to fit the model without intercept if you seemingly need it? 
Anyway, I assume that the intercept from your first model just moves into the 
random effects -- you have intercepts there for worker and day, so any of these 
(or both) will absorb it. No surprise that the estimates for the covariates 
only differ slightly, it should be that way. 
What you plot (your second call to panel.lines) is not the correct model, as 
you omit the intercept from the work and day (which is 0 or at least pretty 
close to it if you include the overall intercept in your model). That's why 
your red line is on the top edge (note that the intercept is negative). 
I'm therefore not sure that the model without intercept makes a lot of sense, 
but you consider posting related questions rather to the mixed-models SIG, 
where you might get more erudite comments than from me.
HTH, Michael
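
To see where the intercept goes, a quick sketch (assuming lme4 is loaded and 
"test" is the data from the post; the ranef call is not in the original 
thread):

library(lme4)
m1 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day), data = test)
m0 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day) - 1, data = test)
fixef(m1)   # includes the (negative) overall intercept
fixef(m0)   # slopes barely change...
ranef(m0)   # ...because the random intercepts absorb the overall level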


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of array chip
Sent: Saturday, June 12, 2010 1:07
To: r-help@r-project.org
Subject: [R] lmer() with no intercept

Hi, I asked this before, but haven't got any response. So would like to have 
another try. thanks for help. Also tried twice to join the  model mailing list 
so that I can ask question there, but still haven't got permission to join that 
list yet.



===


Hi, I am wondering how I can specify no intercept in a mixed model using 
lmer(). 

Here is an example dataset attached (test.txt). There are 3 workers who, on 5 
days, measured a response variable y on an independent variable x. I want to 
use a quadratic term (x2 in the dataset) to model the relationship between y 
and x.

test <- read.table("test.txt", sep='\t', header=T)

If I just simply use lm() and ignore worker and day, so that I can try both a 
linear regression with and without an intercept, here is what I get:

lm(y~x+x2, data=test)
Coefficients:
(Intercept)            x           x2  
 -1.7749104    0.1099160   -0.0006152

lm(y~x+x2-1, data=test)
Coefficients:
         x          x2  
 0.0490097  -0.0001962

Now, I want to try mixed model considering worker and day as random effect. 

With an intercept:

lmer(y~x+x2+(1|worker)+(1|day), data=test)
Fixed effects:
  Estimate Std. Error t value
(Intercept) -1.324e+00  4.490e-01  -2.948
x            1.117e-01  8.563e-03  13.041
x2  -6.357e-04  7.822e-05  -8.127

Without an intercept:

lmer(y~x+x2+(1|worker)+(1|day)-1, data=test)
Fixed effects:
 Estimate Std. Error t value
x   1.107e-01  8.528e-03  12.981
x2 -6.304e-04  7.805e-05  -8.077

It seems working fine. But if you look at the fixed effect coefficients of both 
mixed models, the coefficients for x and x2 are not much different, regardless 
of whether an intercept is included or not. This is not the case for simple 
linear regression using lm() on the top.

If I plot all 4 models in the following plot:

xyplot(y~x, groups=worker, test, col.line = "grey", lwd = 2,
    panel = function(x,y) { 
        panel.xyplot(x,y, type='p')
        x <- sort(x)
        panel.lines(x, -1.324+0.1117*x-0.0006357*x*x) 
        panel.lines(x, 0.1107*x-0.0006304*x*x, col='red') 
        panel.lines(x, 0.04901*x-0.0001962*x*x, col='blue') 
        panel.lines(x, -1.7749+0.10992*x-0.0006152*x*x, col='green') 
    }) 

As you can see, the mixed model without intercept (red line) does not fit the 
data very well (it's at the top edge of the data, instead of in the middle of 
the data), so I guess I did something wrong here.

Can anyone make any suggestions?

Thanks

John


  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RWinEdt

2010-05-27 Thread Meyners, Michael
I assume you installed WinEdt 6. See 
https://stat.ethz.ch/pipermail/r-help/2010-May/238540.html
and related messages. RWinEdt does not work (yet) with WinEdt 6, so you'll have 
to downgrade back to WinEdt 5.x (or use another editor for the time being).
HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Yves Aragon
Sent: Saturday, May 22, 2010 18:25
To: r-help@r-project.org
Subject: [R] RWinEdt

I have installed RWinEdt but when I try to load it,  I get the message :

Error in normalizePath(path) :
  path[1]=C:\Users\yves\AppData\Roaming\WinEdt/R.ini: The specified file 
cannot be found.

I have the problem with R 2.11.0 and R 2.10.1,  since I have re-installed 
WinEdt.

Yves


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Suppressing scientific notation on plot axis tick labels

2010-02-02 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Strange, the following works reproducibly on my machine (Windows 2000
Pro):

options(scipen = 50, digits = 5)
x  = c(1e7, 2e7)
?barplot
barplot(x) 

while I also get scientific with your code. After I called ?barplot
once, I'm incapable of getting the scientific notation again, though...
But I admit that I am outdated, I still run 2.8.1 (but not for long...)

Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Shvorob
 Sent: Dienstag, 2. Februar 2010 12:59
 To: r-help@r-project.org
 Subject: Re: [R] Suppressing scientific notation on plot axis 
 tick labels
 
 
 options(scipen = 50, digits = 5)
 x  = c(1e7, 2e7)
 barplot(x) 
 
 Still scientific...
 --
 View this message in context: 
 http://n4.nabble.com/Suppressing-scientific-notation-on-plot-a
xis-tick-labels-tp1459697p1459828.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Explanation w.r.t. rbind, please!

2010-01-29 Thread Meyners, Michael, LAUSANNE, AppliedMathematics

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Uwe Dippel
 Sent: Freitag, 29. Januar 2010 11:57
 To: r-help@r-project.org
 Subject: [R] Explanation w.r.t. rbind, please!
 
 This is what I tried:
 
   num.vec <- c(12.34,56.78,90.12,34.56)   
 names(num.vec) <- c("first","second","third","fourth")
   num.vec
  first second  third fourth
  12.34  56.78  90.12  34.56
   seq <- (1:4)  
   num.mat <- rbind(num.vec,seq)
   num.mat   
         first second third fourth
 num.vec 12.34  56.78 90.12  34.56
 seq      1.00   2.00  3.00   4.00
   num.vec[3:4]
  third fourth
  90.12  34.56
 
 (until here I'm fine)
 
   num.mat[seq]
 [1] 12.34  1.00 56.78  2.00

What you (probably) want here is
num.mat["seq", ]

   num.mat[num.vec]
 [1] NA NA NA NA   

num.mat["num.vec", ]

and so on. You have to tell R that you want the ROW (that's why the
comma is needed) defined by the NAME "seq" or "num.vec" (that's why you
need the quotes). Otherwise, R replaces seq by its value 1:4, so 
num.mat[seq] 
is identical to 
num.mat[1:4] 
(and num.mat[12.34] is NA of course, hence the NAs in num.mat[num.vec])

You should have a closer look at ?Extract
HTH, Michael
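
In code (a quick sketch with the same objects as above):

num.mat["seq", ]                       # the row named "seq"
num.mat["num.vec", ]                   # the row named "num.vec"
identical(num.mat[seq], num.mat[1:4])  # TRUE: unquoted seq is just 1:4 here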

   num.vec [seq]
  first second  third fourth
  12.34  56.78  90.12  34.56
   num.mat [num.vec]
 [1] NA NA NA NA
   num.mat [-seq]  
 [1] 90.12  3.00 34.56  4.00
 
 (and here I'm lost!)
 
 How could I display a row, instead of always seemingly 
 falling back to columns?
 
 Uwe
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exact wilcox test and Bonferroni correction

2010-01-22 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
?p.adjust
Apply that to a vector containing all p values you get from
wilcox.exact. Alternatively, multiply each p value by the number of
comparisons you perform, see any textbook for that. You might want to
consider a less conservative correction, though.
HTH, Michael
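
A minimal sketch of the mechanics (made-up p values, not netrunner's data):

pvals <- c(0.012, 0.034, 0.151)          # e.g. collected from wilcox.exact calls
p.adjust(pvals, method = "bonferroni")   # same as pmin(pvals * length(pvals), 1)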

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of netrunner
 Sent: Freitag, 22. Januar 2010 10:41
 To: r-help@r-project.org
 Subject: [R] exact wilcox test and Bonferroni correction
 
 
 Hi!
 I need to perform a Mann-Whitney test using Bonferroni 
 correction. I found the function pairwise.wilcox.test. 
 However, my data have ties. I know that when there are ties 
 it is better to perform the Mann-Whitney test using the function 
 wilcox.exact. How can I include the Bonferroni correction in that?
 
 thank you for your help!
 
 netrunner
 --
 View this message in context: 
 http://n4.nabble.com/exact-wilcox-test-and-Bonferroni-correcti
on-tp1099893p1099893.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exact wilcox test and Bonferroni correction

2010-01-22 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
(i) you EITHER correct the p value (by multiplying by 8 in your case) OR
you use the Bonferroni-threshold of 0.05/8, not both. If you correct the
p values, your threshold remains 0.05. If you use 0.05/8, you use the
original p values.
(ii) Yes, if the p value is 0.15, then the corrected one for 8 tests is
min(0.15*8, 1)=1 (i.e. all p values of 1/8 and higher will be 1 after
correction)
(iii) No, you can NOT AT ALL conclude that your samples are from the
same distribution. You have no evidence that they are from different
ones, but that does not imply that they come from the same (or a
similar) one. In particular as the power of the test can be assumed to
be pretty low.

Regarding (i), it might be helpful to search the web for some
information on Bonferroni correction, you could start with wikipedia,
but there are other resources around. Regarding (iii), I would strongly
recommend (re-)reading a good introduction into stats or consulting an
expert statistician. I'm afraid you are missing some of the basics of
statistical hypothesis testing here. To conclude similarity, you'd
have to define first what exactly that means (you can NEVER show exact
equality), and then construct some appropriate test for that. 

NB: I am sure that you won't succeed to show similarity with a
reasonable definition if you have just 10 observations, though. And for
completeness: If you were aiming at similarity, you should rather NOT
correct for multiplicity (while that depends again on what exactly you
want to show). But I think this goes far beyond the scope of this post
now. 

HTH, Michael
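
Point (i) in numbers, a small sketch using the values from the post below:

p <- 0.15; m <- 8
min(p * m, 1)   # corrected p value, to be compared against 0.05: gives 1
p < 0.05 / m    # or the raw p against the Bonferroni threshold: FALSE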
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of netrunner
 Sent: Freitag, 22. Januar 2010 15:25
 To: r-help@r-project.org
 Subject: Re: [R] exact wilcox test and Bonferroni correction
 
 
 Dear Michael,
 thank you very much for your help.
 
 I perfomed the wilcox.exact function on each of the 8 items 
 for the two groups that I am analysing (that is, I performed 
 8 times the wilcox test).
 Here an example for the values (ratings from a questionnaire) 
 of one of the
 8 items:
 
  a1 <- c(5, 9, 10, 10, 10, 10, 10, 10, 10, 10)
  
  b1 <- c(7, 5, 10, NaN, 10, 10, 8, 10, 10, 8, 9, 9)
  
  wilcox.exact(a1, b1, alternative="two.sided", mu=0, 
  paired=FALSE, exact=TRUE,
  conf.level=0.95)
 
 I obtained:
 
 data:  a1 and b1
 W = 73.5, p-value = 0.1514
 alternative hypothesis: true mu is not equal to 0 
 
 Then I adjusted p-values using p.adjust. 
 
  For the example above the p-bonferroni value was 1. The 
  threshold p-value is
  0.00625 (that is, 0.05/8).
  Finally, because p-bonferroni > 0.00625, can I conclude that 
  for each item my samples are from the same distribution?
 
 I am a little bit confused
 
 thank you!
 
 netrunner
 
 
 --
 View this message in context: 
 http://n4.nabble.com/exact-wilcox-test-and-Bonferroni-correcti
on-tp1099893p1100077.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutations from vectors out of a matrix

2010-01-20 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Etienne,
I don't see the point in avoiding some 'special' packages. If you are
willing to change your mind in this regard, try something like

library()
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Etienne Stockhausen
 Sent: Montag, 18. Januar 2010 19:20
 To: r-help@r-project.org
 Subject: [R] permutations from vectors out of a matrix
 
 Dear R-Users,
  I'm relatively new to R and have the following problem. I 
  need all permutations of the vectors created by the columns 
  of a matrix. I will give a small example:

 p=3
 
 n=2^p-1   
   
 #number of obtainable vectors
 mat=matrix(0,p,p)
 for(i in 1:p)
  {
mat[i:1,i]=1/i
  }
 
 mat is now a quadratic matrix and n is the number of the 
 vectors I try to get when I compute all permutations of the 
 vectors built by the individual columns. It should work for 
 all quadratic matrix and I want to avoid using some 'special' 
 packages.
 In the example I need the following vectors at the end:
 (1,0,0); (0,1,0); (0,0,1); (0.5,0.5,0); (0.5,0,0.5); 
 (0,0.5,0.5); (1/3,1/3,1/3).
 I hope my intention becomes clear.
 
 I'm looking foward to any ideas and clues that might help me.
 Thanks in advance and best regards.
 
 Etienne
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutations from vectors out of a matrix

2010-01-20 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Sorry, wrong button. Below a hopefully more helpful solution...

Etienne,
I don't see the point in avoiding some 'special' packages. If you are
willing to change your mind in this regard, try one of the following
solutions that work for me:

library(combinat)
apply(mat, 2, function(x) unique(permn(x)))

# each object in the list contains the permutations from one column of mat:
apply(mat, 2, function(x) do.call(rbind, unique(permn(x))))

# all vectors you wanted in one matrix; note that they are in the rows,
# so you might want to transpose this:
do.call(rbind, apply(mat, 2, function(x) do.call(rbind, unique(permn(x)))))

Not sure about the size of your original problem, though, it might take
a while. If you still want to avoid the (small!) package, you might
consider copying the code for permn from combinat to define the function
within your file. I guess it works (but didn't check) as it does not
seem to require any of the other functions of the package.

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Etienne Stockhausen
 Sent: Montag, 18. Januar 2010 19:20
 To: r-help@r-project.org
 Subject: [R] permutations from vectors out of a matrix
 
 Dear R-Users,
  I'm relatively new to R and have the following problem. I 
  need all permutations of the vectors created by the columns 
  of a matrix. I will give a small example:

 p=3
 
 n=2^p-1   
   
 #number of obtainable vectors
 mat=matrix(0,p,p)
 for(i in 1:p)
  {
mat[i:1,i]=1/i
  }
 
 mat is now a quadratic matrix and n is the number of the 
 vectors I try to get when I compute all permutations of the 
 vectors built by the individual columns. It should work for 
 all quadratic matrix and I want to avoid using some 'special' 
 packages.
 In the example I need the following vectors at the end:
 (1,0,0); (0,1,0); (0,0,1); (0.5,0.5,0); (0.5,0,0.5); 
 (0,0.5,0.5); (1/3,1/3,1/3).
 I hope my intention becomes clear.
 
 I'm looking foward to any ideas and clues that might help me.
 Thanks in advance and best regards.
 
 Etienne
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutations from vectors out of a matrix

2010-01-20 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Well, seems it's an assignment, so you should REALLY get them on your
own and not enquire the list. Foolish me...
M.

 -Original Message-
 From: einohr2...@web.de [mailto:einohr2...@web.de] 
 Sent: Mittwoch, 20. Januar 2010 19:32
 To: Meyners,Michael,LAUSANNE,AppliedMathematics; r-help@r-project.org
 Subject: Re: [R] permutations from vectors out of a matrix
 
 Meyners,Michael,LAUSANNE,AppliedMathematics schrieb:
  Sorry, wrong button. Below a hopefully more helpful solution...
 
  Etienne,
  I don't see the point in avoiding some 'special' packages. 
 If you are 
  willing to change your mind in this regard, try one of the 
 following 
  solutions that work for me:
 
   library(combinat)
   apply(mat, 2, function(x) unique(permn(x)))
  
   # each object in the list contains the permutations from one column of mat:
   apply(mat, 2, function(x) do.call(rbind, unique(permn(x))))
  
   # all vectors you wanted in one matrix; note that they are in the 
   # rows, so you might want to transpose this:
   do.call(rbind, apply(mat, 2, function(x) do.call(rbind, unique(permn(x)))))
 
  Not sure about the size of your original problem, though, it might 
  take a while. If you still want to avoid the (small!) package, you 
  might consider copying the code for permn from combinat to 
 define the 
  function within your file. I guess it works (but didn't 
 check) as it 
  does not seem to require any of the other functions of the package.
 
  HTH, Michael
 
 

  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Etienne 
  Stockhausen
  Sent: Montag, 18. Januar 2010 19:20
  To: r-help@r-project.org
  Subject: [R] permutations from vectors out of a matrix
 
  Dear R-Users,
   I'm relatively new to R and have the following problem. I need all 
   permutations of the vectors created by the columns of a matrix. I 
   will give a small example:
 
  p=3
  
  n=2^p-1   

  #number of obtainable vectors
  mat=matrix(0,p,p)
  for(i in 1:p)
   {
 mat[i:1,i]=1/i
   }
 
  mat is now a quadratic matrix and n is the number of the vectors I 
  try to get when I compute all permutations of the vectors built by 
  the individual columns. It should work for all quadratic 
 matrix and I 
  want to avoid using some 'special'
  packages.
  In the example I need the following vectors at the end:
  (1,0,0); (0,1,0); (0,0,1); (0.5,0.5,0); (0.5,0,0.5); (0,0.5,0.5); 
  (1/3,1/3,1/3).
  I hope my intention becomes clear.
 
  I'm looking foward to any ideas and clues that might help me.
  Thanks in advance and best regards.
 
  Etienne
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  
  
 
  
 Hey Michael,
 
 thanks a lot for your answer. I hope I can manage it now. The 
  problem with the 'small' package is that I have to solve the 
 task without using any package as a 'black box'. Therefore I 
 need to get the permutations on my own.
 
 Best regards and thanks again
 
 Etienne
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantreg - 'could not find functionrq'

2010-01-07 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Are you sure you called 
library(Quantreg) 
before calling any function?
M.

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Gough Lauren
 Sent: Donnerstag, 7. Januar 2010 13:44
 To: r-help@r-project.org
 Subject: [R] Quantreg - 'could not find function rq'
 
 Hi all,
 
  
 
 I'm having some troubles with the Quantreg package.  I am 
 using R version 2.10.0, and have downloaded the most recent 
 version of Quantreg
 (4.44) and SparseM (0.83 - required package).  However, when 
 I try to run an analysis (e.g. fit1 <- rq(y~x, tau=0.5)) I get 
 an error message saying that the function rq could not be 
 found.  I get the same message when I try to search for any 
 of the other functions that should be in the package (e.g. 
 help(anova.rq)).
 
  
 
 I used this package last year and still have the old versions 
 of both quantreg and SparseM saved on my machine.  When I 
 load these older packages and attempt an analysis (which, 
 incidentally, worked fine last
 February) I get the same error messages as I do with the new 
 versions.  
 
  
 
 The only thing I have changed since last February is the 
 version of R I am using - is there any reason why quantreg 
 wouldn't work in version 2.10.0?  I'm not very experienced 
 with R so I'm struggling to work out what may be going wrong 
 - does anyone have any suggestions!?
 
  
 
 Many thanks
 
  
 
 Lauren
 
  
 
 --
 
  
 
 Lauren Gough - Postgraduate Research Student University of 
 Nottingham, School of Geography,  Nottingham,  NG7 2RD  
 
 Tel: +44 (0)115 8467052
 Email: lgx...@nottingham.ac.uk
 
 Consider the environment. Please don't print this e-mail 
 unless you really need to. 
 
  
 
 This message has been checked for viruses but the contents of 
 an attachment may still contain software viruses which could 
 damage your computer system:
 you are advised to perform your own checks. Email 
 communications with the University of Nottingham may be 
 monitored as permitted by UK legislation.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantreg - 'could not find function rq'

2010-01-07 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Frankly, I have never even heard about this package before and have no
clue what it does, but the error messages nevertheless suggested a
missing call of library. And I didn't bother to check the spelling, my
fault. So I guess you are right... I still didn't check :-) So to the
OP: try 
library(asdf)
with "asdf" replaced by the correct package name, and mind the spelling!

Michael
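
An untested sketch of the fix (package names in R are case-sensitive;
the toy x and y only stand in for the OP's variables):

install.packages("quantreg")       # once, if not yet installed
library(quantreg)                  # note the lower-case 'q'
x <- 1:20; y <- 2 * x + rnorm(20)  # toy data in place of the OP's data
fit1 <- rq(y ~ x, tau = 0.5)       # rq() should now be found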

 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net] 
 Sent: Donnerstag, 7. Januar 2010 14:22
 To: Meyners,Michael,LAUSANNE,AppliedMathematics
 Cc: Gough Lauren; r-help@r-project.org
 Subject: Re: [R] Quantreg - 'could not find function rq'
 
 
 On Jan 7, 2010, at 8:07 AM, Meyners, Michael, LAUSANNE, 
 AppliedMathematics wrote:
 
  Are you sure you called
  library(Quantreg)
  before calling any function?
  M.
 
 Are you both using a package with the exact spelling 
 "Quantreg"? I would have expected it to be:
 
 library(quantreg)
 
 --
 David.
 
 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] updating arguments of formulae

2009-12-10 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Moreno, 

I leave the discussion on the mixed models to others (you might consider
the SIG group on mixed models as well for this), but try a few hints to
make your code more accessible:

* The . in updating a formula is substituted with the respective old
formula (depending on the side), but is not mandatory. You could give
the new formula explicitly, i.e. consider something like 
model1 = update(model, . ~ (1 | Sb2) + OS)
if you lose track of your models. See ?update.formula

* I don't see the need for using your construct with
as.formula(paste()), this makes things unnecessarily complicated. See my
above example, which should work as well on your data (and see ?update)

* There is also the - operator available in update.formula to remove
terms (because it uses formula, see ?formula). As to your question on
how to move from 
 depM ~ OS + (1 + OS | Sb2)
 to
 depM ~ OS + VR + (1 + OS + VR | Sb2)
try something like
update(model1, .~. - (1 + OS|Sb2) + VR + (1 + OS + VR | Sb2))
while it goes without saying that in this case, it would be easier to
drop the . and use something like 
update(model1, .~ OS + VR + (1 + OS + VR | Sb2)) 
directly.

* paste accepts more than just two arguments to be pasted: Try something
like
model2 = update(model1, as.formula(paste(". ~ . + (1 + ", "OS", " | ",
"Sb2", ")")))
instead of your construct with several nested calls to paste, and see
?paste. (Note that I added quotes to "OS" and "Sb2", it didn't work for
me otherwise as I have no object OS; not sure what happens if you have
such an object on your search path, but I would suspect you encounter
problems as well.)

If you work yourself through these and thereby simplify your code, you
are more likely to get responses to your questions on which model to use
(which is actually independent from the use of update). As far as I see
it, it doesn't make sense to use a formula like in your model4, but the
mixed model experts might tell me wrong (and I got a bit lost in your
code as well). Please also try to provide commented, minimal,
self-contained, reproducible code for further enquiries (use e.g. one of
the examples on ?lmer to create appropriate examples for your
questions).

HTH, Michael 
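
A minimal sketch of the update() mechanics on a plain formula (no model
fit needed; the variable names are taken from the thread):

f  <- depM ~ (1 | Sb2)
f1 <- update(f, . ~ . + OS)    # the dots stand for the old formula
f1                             # depM ~ (1 | Sb2) + OS
f2 <- update(f1, . ~ OS + VR + (1 + OS + VR | Sb2))  # explicit new RHS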


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Moreno Ignazio Coco
 Sent: Donnerstag, 10. Dezember 2009 13:35
 To: R-help@r-project.org
 Subject: [R] updating arguments of formulae
 
 Dear R-Community,
 
 I am relatively new with R, so sorry for things which for you 
 might be obvious...
 I am trying to automatically update lmer formulae.
 
 the variables of the model are:
 
 depM= my dependent measure
 Sb2= a random factor
 OS = a predictor
 VR= another predictor
 
 So, I am building the first model with random intercept only:
 
 model = lmer(depM ~ (1 |Sb2))
 
 then I update the formula adding the first predictor
 
 model1 = update(model, as.formula(paste(". ~ . + ", OS)))
 
 the resulting formula will be:
 
 depM ~ (1 |Sb2) + OS
 
 let's suppose now I want to update the model to have OS both as 
 a fixed effect and in the random term, something like:
 
 depM ~ (1 + OS |Sb2) + OS
 
 I can do something very ugly (please tell me if there is a  
 more elegant way to do it) that looks like:
 
 model2 = update(model1, as.formula(paste(paste(paste(paste(". ~ . + (1 + ",
 OS), " | "), Sb2), ")")))
 
 the resulting model2 formula will be:
 
 depM ~ (1 |Sb2) + OS + (1 + OS | Sb2)
 
 one first thing I am wondering at this point is whether having
 (1 |Sb2) and (1 + OS | Sb2) in the same expression is redundant.
 in the output it will obviously tell me that group Sb2 is 
 considered twice:
 
 number of obs: 6514, groups:  Sb2, 23; Sb2, 23
 
 and i am not sure if am doing it correctly...any advice?
 
 So let's suppose now I want to add the new predictor VR again 
 both in the fixed and in the random part of the formula.
 If i just repeat the two steps above:
 
 model3 = update(model2, as.formula(paste(". ~ . + ", VR)))
 
 and then:
 
 model4 = update(model3, as.formula(paste(paste(paste(paste(". ~ . + (1 + ",
 VR), " | "), Sb2), ")")))
 
 the formula I get is:
 
 depM ~ (1 |Sb2) + OS + (1 + OS | Sb2) + VR + (1 + VR | Sb2)
 
 so, basically I am adding new stuff on the right side of the 
 formula...
 
 My first question at this point is whether the above formula 
 is equivalent to:
 
 depM ~ OS + VR + (1 + OS + VR | Sb2)
 
 if is not equivalent, which one of the two is correct?
 
 obviously in the second case, group Sb2, is considered only once.
 
 If the second version of the formula is the correct one, I 
 don't understand how I can update arguments inside the 
 formula rather than adding things on its right side...
 
 thus, in the ideal case,  how do I go from something like this:
 
 depM ~ OS + (1 + OS | Sb2)
 
 to something like this:
 
 depM ~ OS + VR + (1 + OS + VR | Sb2)
 
 Thanks a lot for your help,
 Best,
 
 Moreno

Re: [R] by function ??

2009-12-09 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Or, more general (if you need to include more than just one variable from 
TestData), something like

by(TestData, LEAID, function(x) median(x$RATIO))

Agreed, this is less appealing for the given example than Ista's code, but 
might help to better understand by and to generalize its use to other 
situations.
Michael 
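
A self-contained analogue with a built-in data set (warpbreaks stands
in for TestData here):

by(warpbreaks, warpbreaks$wool, function(x) median(x$breaks))
# compare with the single-variable form:
by(warpbreaks[, "breaks"], warpbreaks$wool, median)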

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Ista Zahn
 Sent: Mittwoch, 9. Dezember 2009 02:54
 To: L.A.
 Cc: r-help@r-project.org
 Subject: Re: [R] by function ??
 
 Hi,
 I think you want
 
 by(TestData[ , "RATIO"], LEAID, median)
 
 -Ista
 
 On Tue, Dec 8, 2009 at 8:36 PM, L.A. ro...@millect.com wrote:
 
  I'm just learning and this is probably very simple, but I'm stuck.
    I'm trying to understand the by().
  This works.
  by(TestData, LEAID, summary)
 
  But, This doesn't.
 
  by(TestData, LEAID, median(RATIO))
 
 
 ERROR: could not find function "FUN"
 
  HELP!
  Thanks,
  LA
  --
  View this message in context: 
  http://n4.nabble.com/by-function-tp955789p955789.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 --
 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology http://yourpsyche.org
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] csimtest function in multcomp package

2009-12-04 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Daniel,
quick guess: There has been a major change in the package some time ago,
with quite a few functions vanishing and others added. So I guess your
recycled code was based on an older version of multcomp. My current
version (1.1-0) does not have csimtest anymore either. I guess you want
to look into ?glht in the same package, and check the archives to find
out more about the changes (and yes, you will have to modify your code
to some extent, at least I had to; it's not just replacing the function
name).
HTH, Michael
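
A minimal glht() sketch on a built-in data set (an aov fit rather than
the OP's mixed-effects model, but the pattern carries over):

library(multcomp)
amod <- aov(breaks ~ tension, data = warpbreaks)
summary(glht(amod, linfct = mcp(tension = "Tukey")))  # all pairwise contrasts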

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of 
 d.m.perk...@qmul.ac.uk
 Sent: Freitag, 4. Dezember 2009 16:47
 To: r-help@r-project.org
 Subject: [R] csimtest function in multcomp package
 
 Hello all,
 
 Quick question: I want to do posthoc contrasts for a linear 
 mixed effects model. However, when trying to use the csimtest 
 function in multcomp package I receive an error message 
 saying it cannot find the function, even after installing and 
 loading package multcomp.
 
 Any pointers would be greatly appreciated
 
 Daniel
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Learning R

2009-11-29 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Julia, see 

http://www.r-project.org/ -> Documentation -> Manuals (-> An Introduction to R) 
(or use: http://cran.r-project.org/manuals.html)

for a starting point. In addition, you might want to check the annotated list 
of books to see which one might best fit your needs:

http://www.r-project.org/doc/bib/R-books.html

You might also want to browse through the FAQs at some stage, you can learn a 
lot of useful things there. Or look additionally into the Wiki, if that's more 
your style of learning. You see, plenty of options from which you'd need to 
select the one that best meets your requirements. 

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Julia Cains
 Sent: Montag, 30. November 2009 08:23
 To: r-help@r-project.org
 Subject: [R] Learning R
 
 Dear R helpers,
 
 Almost 15 days back I became a member of this very active 
 and wonderful group. So far I have only been raising queries 
 and in turn got them solved too, and I am really thankful for the 
 spirit the group members show when it comes to guidance.
 
 I wish to learn the R language and I have given myself 2 months for 
 this. Can anyone please guide me on how to begin, i.e. from 
 basics to advanced.
 
 R is such a vast thing to learn, so I wish to learn it step 
 by step without getting lost at any stage.
 
 Please guide me where do I start and upgrade myself to higher 
 level step by step.
 
 Regards
 
 Julia
 
 
 
 
 
 
 
 Only a man of Worth sees Worth in other men
 
 
 
 
 
 
   
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unexpected results in comparison (x == y)

2009-11-04 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
FAQ 7.31:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f 
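
In short (the standard illustration):

x <- 0.1 + 0.2
x == 0.3                   # FALSE: binary floating point representation
isTRUE(all.equal(x, 0.3))  # TRUE: comparison with numerical tolerance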

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Tillmann
 Sent: Mittwoch, 4. November 2009 13:50
 To: r-help@r-project.org
 Subject: [R] unexpected results in comparison (x == y)
 
 
 Dear readers of the list,
 
 I have a problem a comparison of two data from a vector. The 
 comparison yields FALSE but should be TRUE. I have checked 
 for mode(), length() and attributes(). See the following code 
 (R2.10.0):
 ---
 # data vector of 66 double data
 X = matrix(c(41.41,38.1,39.22,38.1,47.29,46.82,82.46,90.11,45.24,45.74,
              49.96,53.40,38.20,42.65,45.41,47.92,39.82,42.02,48.17,49.47,
              39.67,43.89,47.55,50.05,35.75,37.41,46.13,53.64,52.18,56.30,
              45.15,47.13,41.57,39.08,43.39,44.73,49.38,47.00,45.67,50.53,
              41.08,44.22,49.28,47.83,49.48,46.04,48.37,47.00,33.96,36.30,
              49.40,46.44,24.40,24.79,41.55,46.26,37.43,39.88,40.63,38.64,
              49.92,50.19,47.88,48.61,43.73,44.18), ncol=1)
 i = dim(X)
 
 # calculating pairwise differences for each entry in X
 Y = matrix(rep(0.0,i[1]*i[1]),ncol=i[1])
 for (j in 1:i[1]) {
   Y[j,] = X - X[j,]
 }
 
 # getting pairwise absolute differences to vector Z and 
 # selecting only (xj - xk), omitting (xk - xj)
 # and (xj - xj), i.e. the diagonal
 Z = rep(0, ((i[1]*i[1])-i[1])/2)
 nn <- 1
 for (j in 1:i[1]) { for (k in 1:i[1]) {
   if (j < k)  {
   Z[nn] <- abs(Y[j,k])
   nn <- nn + 1
   }
 }
 }
 nn <- nn - 1
 
 # sorting Z ascending to ZZ, to determine double entries with 
 # same difference
 ii <- 0
 ZZ <- sort(Z)
 
 # here the problem occurs:
 ZZ[1:10]
 ZZ[4]
 ZZ[5]
 ZZ[4] == ZZ[5]
 mode(ZZ[4])
 mode(ZZ[5])
 length(ZZ[4])
 length(ZZ[5])
 attributes(ZZ[4])
 attributes(ZZ[5])
 ---
 
 I get:
  
 > ZZ[1:10]
  [1] 0.00 0.00 0.01 0.02 0.02 0.02 0.04 0.04 0.04 0.05
 > ZZ[4]
 [1] 0.02
 > ZZ[5]
 [1] 0.02
 > ZZ[4] == ZZ[5]
 [1] FALSE
 > mode(ZZ[4])
 [1] "numeric"
 > mode(ZZ[5])
 [1] "numeric"
 > length(ZZ[4])
 [1] 1
 > length(ZZ[5])
 [1] 1
 > attributes(ZZ[4])
 NULL
 > attributes(ZZ[5])
 NULL
 
 Which is ok, except for ZZ[4] == ZZ[5].
 
 Can someone please give me an advice where to look?
 In real-world situations the original vector (X) will contain 
 up to 100 entries. 
 
 
 
 --
 View this message in context: 
 http://old.nabble.com/unexpected-results-in-comparison-%28x-%3D%3D-y%29-tp26195749p26195749.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fixed color scale for a series of contour plots

2009-11-03 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
You don't tell us which function you use, but fixing the zlim argument
in image or related functions should do the trick.
M.
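
A minimal sketch with simulated data (z1 and z2 are made up for
illustration):

z1 <- matrix(rnorm(100), 10)
z2 <- matrix(rnorm(100, mean = 2), 10)
rng <- range(z1, z2)     # common range across all subsets
par(mfrow = c(1, 2))
image(z1, zlim = rng)    # same value-to-colour mapping
image(z2, zlim = rng)    # in both panels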

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Marion Dumas
 Sent: Dienstag, 3. November 2009 14:38
 To: r-help@r-project.org
 Subject: [R] fixed color scale for a series of contour plots
 
 Hello,
 I am having trouble figuring out how to create a color scale 
 that will be consistent across multiple contour plots. What I 
 would like is to plot a series of contour or image plots for 
 different subsets of a data set but make sure that the color 
 key stays fixed so that the plots are easily comparable. That 
 is I need a map from the range of values of the entire data 
 set to the color scale and keep then use this color scale for 
 all plots + be able to display the color bar as a legend next 
 to the plot. This sounds trivial, but I can't find useful 
 hints in the documentation of the color arguments of the 
 different plots.
 Thanks so much for any help
 Best
 Marion
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop and plot

2009-10-19 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Or you open a new graphics window / device as part of the loop, see, e.g., 
?windows. Alternatively, you may write the content of the graphics device to a 
file for each iteration, see, e.g., ?savePlot (but you'd want to make sure that 
you have a different filename in each iteration, otherwise, you'll again have 
only one figure...).
Michael
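
A sketch combining the suggestions in this thread; note also that the
original 'for (i in length(f))' loops over the single value length(f),
which by itself explains why only the last pie chart appears:

dataset.table <- table(data.frame(var1 = c(1,2,3,1,2,3,1),
                                  colour = c("a","b","c","c","a","b","b")))
kk <- function(f) {
  ls <- as.character(f)
  par(ask = TRUE)             # wait between plots (see ?par)
  for (i in seq_along(ls)) {  # seq_along, not length(f)
    pie(dataset.table[ls[i], ], main = ls[i])
    box()
  }
}
# kk(1:3)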

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of joris meys
 Sent: Montag, 19. Oktober 2009 13:12
 To: Rene
 Cc: r-help@r-project.org
 Subject: Re: [R] loop and plot
 
 Hi Rene,
 
 the problem is probably due to the fact that R will send all 
 plots to the same graphical output window. Each next plot 
 just replaces the previous one.
 
 if it's only a few plots, you can divide the graphical window 
 with the commands par(mfrow=...) (see ?par) or 
 layout(matrix(...)) (see ?layout). Otherwise you have to ask 
 the window to wait before refreshing. par(ask=TRUE) (see ?par as well)
 
 Hope this helps
 Cheers
 Joris
 
 On Mon, Oct 19, 2009 at 12:14 PM, Rene kaixinma...@gmail.com wrote:
  Dear all,
 
  I am stuck at applying loop function for creating separated plots.
 
  I have coding like below:
 
  dataset.table <-
   table(data.frame(var1=c(1,2,3,1,2,3,1),
                    colour=c("a","b","c","c","a","b","b")))
  kk = function(f)
              {
               ls=as.character(f)
               pie(dataset.table[ls,],main=ls)
               box()
              }
 
  kk(1)
  kk(2)
  kk(3)
 
  By using above code, I can create 3 single plot 
 respectively, but when 
  I type kk(1:3), obviously it will not work.
 
  I know I have to vectorise the code, so that I can use the 
  command kk(1:3). 
  I tried to use a loop:
 
  kk = function(f)
              {
               ls=as.character(f)
               for (i in length(f))
               {
               pie(dataset.table[ls[i],],main=ls[i])
               box()
               }
               }
  kk(1:3)
 
  the above code only gives me the last pie plot (ie. kk(3) plot) 
  instead of 3 plots respectively.
 
  Can someone please guide me how to revise the loop coding, 
 and produce 
  3 separated plots one after another on the screen by typing kk(1:3)?
 
  Thanks a lot.
 
  Rene.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculating p-values by row for data frames

2009-10-16 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Christoph Heuck
 Sent: Donnerstag, 15. Oktober 2009 17:51
 To: r-help@r-project.org
 Subject: [R] calculating p-values by row for data frames
 
 Hello R-users,
 I am looking for an elegant way to calculate p-values for 
 each row of a data frame.
 My situation is as follows:
 I have a gene expression results from a microarray with 64 
 samples looking at 25626 genes. The results are in a data 
 frame with the dimensions 64 by 25626 I want to create a 
 volcano plot of difference of means vs. -log(10) of the 
 p-values, comparing normal samples to abnormal samples. The 
 results of both type of samples are all in my data frame.
 Now, I have found a way to calculate the p-value using a for 
 (i in 1:25626) loop (see below):
 
 df.normal  #dataframe, which only contains the normal samples 
 df.samples  #dataframe, which only contains abnormal samples
 
 DM=rowMeans(df.normal)-rowMeans(df.samples) #gives me a 
 dataframe with the difference of means
 
 PV=array(1,c(25626,1))
 for (i in 1:25626){
 VL=t.test(matrix.b[i,],matrix.a[i,])
 V=as.numeric(VL[3])
 V=-log10(V)
 PV[i,1]=V}
 
 plot(DM, PV, main=title, xlab=x.lab, ylab="-log(10) P-Values", pch=20)}
 
 It takes around 3-5 minutes to generate the volcano plot this 
 way. I will be running arrays which will look at 2.2 million 
 sites, and this approach will then take way too long.
 I was wondering if there is a more elegant way to calculate 
 the p-values for an array/fataframe/matrix in a row-by row 
 fashion, which is similar to rowMeans.
 
 I thought writing a function to get the p-value and then using
 apply(x,1,function) would be the best.
 
 I have the function which will give me the p-value
 
 p.value = function (x,y){
 PV=as.numeric(t.test(x,y)[3])
 }
 
 and I can get a result if I test it only on one row (below is 
 a 10 by 6 data frame example of my original data)
 
 RRR
                  X259863    X267862     X267906    X300875    X300877     X300878
 MSPI0406S0183 -3.2257205 -3.2248899  2.85590082 -2.6293602 -3.5054348 -2.62817269
 MSPI0406S0238 -2.6661903 -3.1135020  2.17073881 -3.2357307 -2.3309775 -1.76078452
 MSPI0406S0239 -1.7636439 -0.6702877  0.19471126 -0.7397132 -1.4332662 -0.24822470
 MSPI0406S0300  0.6471381 -0.2638928 -0.61876054 -0.9180127  0.2539848 -0.63122203
 MSPI0406S0301  0.9207208  0.2164267 -0.33238846 -1.1450717 -0.2935584 -1.01659802
 MSPI0406S0321 -0.4073272 -0.2852402 -0.08085746 -0.4109428 -0.2185432 -0.39736137
 MSPI0406S0352 -0.7074175 -0.6987548 -1.22004647 -0.8570551 -0.5083861 -0.09267928
 MSPI0406S0353 -0.2745682  0.3012990 -0.64787221 -0.5654195  0.4265007 -0.65963404
 MSPI0406S0354 -1.1858394 -1.4388609 -0.07329722 -2.0010785 -1.3245696 -1.43216984
 MSPI0406S0360 -1.4599809 -1.4929059  0.63453235 -1.1476760 -1.5849922 -1.03187399
 
 > zz=p.value(RRR[1,1:3],RRR[1,4:6])
 > zz
 $p.value
 [1] 0.485727
 
 but I cannot do this row by row using apply
 
 > xxx=apply(RRR,1,p.value(RRR[,1:3],RRR[,4:6]))

xxx <- apply(RRR, 1, function(x) p.value(x[1:3],x[4:6]))
works for me. Check the examples in ?apply.
HTH, Michael
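
For the speed concern, a base-R sketch of a row-wise Welch t-test that
avoids the loop entirely (standard formulas, but not code from this
thread -- verify against t.test on a few rows before trusting it):

row.t.pvalues <- function(a, b) {       # a, b: numeric matrices, tests by row
  na <- ncol(a); nb <- ncol(b)
  ma <- rowMeans(a); mb <- rowMeans(b)
  va <- rowSums((a - ma)^2) / (na - 1)  # row-wise variances
  vb <- rowSums((b - mb)^2) / (nb - 1)
  s2a <- va / na; s2b <- vb / nb
  tstat <- (ma - mb) / sqrt(s2a + s2b)
  df <- (s2a + s2b)^2 / (s2a^2 / (na - 1) + s2b^2 / (nb - 1))  # Welch df
  2 * pt(-abs(tstat), df)               # two-sided p-values
}
# e.g., with the OP's data frames:
# PV <- -log10(row.t.pvalues(as.matrix(df.samples), as.matrix(df.normal)))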

 
 Error in match.fun(FUN) :
   'p.value(RRR[, 1:3], RRR[, 4:6])' is not a function, 
 character or symbol
 
 Does anyone have any suggestions?
 Thanks in advance
 
 Christoph Heuck
 Albert Einstein College of Medicine
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reference on permutation test

2009-10-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
It's always worthwhile to look at the articles by Pitman (and maybe the
textbook by Fisher, if you have access to it); Welch is a nice paper,
too, but might be pretty technical to learn about the area. I don't
know any of the textbooks except Edgington (which is in its 4th edition
now with co-author P Onghena), a book I can wholeheartedly recommend.
The authors explain the basic concepts in a way that should even be
accessible to non-statisticians, I believe. They also cover a lot of
special cases, and give exhaustive theoretical background for those
interested (you can easily skip these parts if you are not). It goes
along with a CD with some programs to run these test, though I did not
use it so far -- you can do it yourself in R, of course :-) I am not
sure whether they cover any other resampling methods at all, but if so,
only very briefly, so you'd need another reference for that. Efron &
Tibshirani is a classic, while I have heard some people recommend
Davison & Hinkley -- I didn't find the time yet to look more closely
into the latter.
Just my 2 cents,
Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Peng Yu
 Sent: Mittwoch, 14. Oktober 2009 04:58
 To: r-h...@stat.math.ethz.ch
 Subject: [R] reference on permutation test
 
 I want learn permutation test and resampleing, etc. There are 
 a few references listed below. I'm wondering what is the best 
 book on this topic. Can somebody give me some advice. Thank you!
 

http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Permutation_test
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] post-hoc test with kruskal.test()

2009-10-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Robert,

you can do the corresponding paired comparisons using wilcox.test. As far as I 
know, there is no such general correction as Tukey's HSD for the 
Kruskal-Wallis-Test. However, if you have indeed only 3 groups (resulting in 3 
paired comparisons), the intersection-union principle and the theory of closed 
test procedures should allow you to do these tests without further correction, 
given the global test was statistically significant.

HTH, Michael
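
stats also ships pairwise.wilcox.test, which wraps exactly these paired
comparisons; a sketch on a built-in data set:

with(airquality,
     pairwise.wilcox.test(Ozone, Month, p.adjust.method = "none"))
# p.adjust.method = "none" matches the unadjusted closed-test argument above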



 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki
 Sent: Mittwoch, 14. Oktober 2009 09:17
 To: r-help@r-project.org
 Subject: [R] post-hoc test with kruskal.test()
 
 Dear R users,
 
 I would like to know if there is a way in R to execute a 
 post-hoc test (factor levels comparison, like Tukey for 
 ANOVA) of a non-parametric analysis of variance with 
 kruskal.test() function. I am comparing three different 
 groups. The preliminary analysis using the 
 kruskal-wallis-test show significance, but I still don't know 
 the relationship and the significance level between each group?
 
  
 
 Do you have any suggestion?
 
  
 
 Many thanks in advance!
 
  
 
 Robert
 
  
 
  
 
 ___
 Robert M. Kalicki, MD
 
 Postdoctoral Fellow
 
 Department of Nephrology and Hypertension
 
 Inselspital
 
 University of Bern
 
 Switzerland
 
 
 
 Address:
 
 Klinik und Poliklinik für Nephrologie und Hypertonie
 
 KiKl G6
 
 Freiburgstrasse 15
 
 CH-3010 Inselspital Bern
 
 
 
 Tel +41(0)31 632 96 63
 
 Fax+41(0)31 632 14 58
 
 
 
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] post-hoc test with kruskal.test()

2009-10-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Robert,

What do you mean by not symmetric? If you mean unbalanced in terms of sample 
size, that's not a problem if you choose the right specifications for 
wilcox.test. The Kruskal-Wallis-Test is a generalization of the unpaired 
Wilcoxon test for more than two groups. Not sure whether kruskal.test works 
with just two groups, but if so, it should give the same results as wilcox.test 
if you set the arguments accordingly. 

Having said that, I should mention that unlike some normality-based post-hoc 
tests, the proposed approach is not based on a common error term. The paired 
comparisons will ignore the fact that you had a third group, and this will in 
particular result in (possibly quite) different power of the three comparisons, 
depending on the sample sizes and the noise given in just these two groups. I 
wouldn't know what to do about that, though.

Michael

 -Original Message-
 From: Robert Kalicki 
 Sent: Mittwoch, 14. Oktober 2009 14:11
 To: Meyners,Michael,LAUSANNE,AppliedMathematics
 Subject: RE: [R] post-hoc test with kruskal.test()
 
 Hi Michael,
 Thank you very much for your clear and prompt answer.
 Is it still valid if I use an unpaired comparison with 
 wilcox.test(), since my groups are not symmetric?
 Many thanks
 
 Robert
 
 -----Original Message-----
 From: Meyners,Michael,LAUSANNE,AppliedMathematics
 Sent: Wednesday, 14 October 2009 10:30
 To: Robert Kalicki; r-help@r-project.org
 Subject: RE: [R] post-hoc test with kruskal.test()
 
 Robert,
 
 you can do the corresponding paired comparisons using 
 wilcox.test. As far as I know, there is no such general 
 correction as Tukey's HSD for the Kruskal-Wallis-Test. 
 However, if you have indeed only 3 groups (resulting in
 3 paired comparisons), the intersection-union principle and 
 the theory of closed test procedures should allow you to do 
 these test without further correction, given the global test 
 was statistically significant.
 
 HTH, Michael
 
 
 
  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki
  Sent: Mittwoch, 14. Oktober 2009 09:17
  To: r-help@r-project.org
  Subject: [R] post-hoc test with kruskal.test()
  
  Dear R users,
  
  I would like to know if there is a way in R to execute a 
 post-hoc test 
  (factor levels comparison, like Tukey for
  ANOVA) of a non-parametric analysis of variance with
  kruskal.test() function. I am comparing three different groups. The 
  preliminary analysis using the kruskal-wallis-test show 
 significance, 
  but I still don't know the relationship and the significance level 
  between each group?
  
   
  
  Do you have any suggestion?
  
   
  
  Many thanks in advance!
  
   
  
  Robert
  
   
  
   
  
  ___
  Robert M. Kalicki, MD
  
  Postdoctoral Fellow
  
  Department of Nephrology and Hypertension
  
  Inselspital
  
  University of Bern
  
  Switzerland
  
  
  
  Address:
  
  Klinik und Poliklinik für Nephrologie und Hypertonie
  
  KiKl G6
  
  Freiburgstrasse 15
  
  CH-3010 Inselspital Bern
  
  
  
  Tel +41(0)31 632 96 63
  
  Fax+41(0)31 632 14 58
  
  
  
  
  
  
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Probability of data values form DENSITY function

2009-09-30 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Lina, 

check ?density (and do so carefully). R uses a kernel estimate, by default 
Gaussian if I remember correctly. Values in a certain grid can be found from 
the code I sent earlier. I didn't check, but these are most likely just 
linearly interpolated by plot.density, and as the grid is sufficiently tight, 
it visually looks smooth. You could mimick this by linearly interpolating from 
the two closest neighbours for any new x values (or, as a rough 
approximation, choosing the density value of the nearest neighbour). I am not 
sure whether there are some specificities for time series in general or GARCH 
models in particular, that's not my area of expertise. Maybe there is some 
method for GARCH objects in the library you are using (you don't tell me which 
one). Try 
str(density(x)) 
from your original post to see whether you might have to adapt the proposed way 
to get these values. I copy the list in case someone can give more details on 
that.

Two side remarks: 
* You should note that the density function in R takes a lot of parameters 
that might heavily influence the outcome, and even though R will propose some 
defaults that will work frequently, they might not be optimal or even 
appropriate for your particular situation, model, and data. 
* From your messages I fear that you have a limited understanding of the notion 
of density functions, at least for continuous distributions; they do NOT give 
the probability that a certain value is taken. Try 
plot(x <- seq(-1,1,0.01), dnorm(x, sd=.25), type="l") 
to see that the density can easily exceed 1 and can hence not represent a 
probability. In fact, the theoretical probability that a specific value is 
actually taken from a continuous distribution is always equal to 0. 

I would strongly suggest to contact a local statistician to clarify these 
issues.

HTH, Michael
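
A sketch of the interpolation idea with placeholder data (rnorm merely
stands in for the simulated GARCH series):

set.seed(1)
yt  <- rnorm(4000)               # placeholder for the simulated series
d   <- density(yt)               # kernel density estimate on a grid
yt1 <- rnorm(10)                 # new points at which values are wanted
approx(d$x, d$y, xout = yt1)$y   # interpolated density values -- NOT probabilities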
 



From: Lina Rusyte [mailto:liner...@yahoo.co.uk] 
Sent: Dienstag, 29. September 2009 18:03
To: Meyners,Michael,LAUSANNE,AppliedMathematics
Subject: RE: [R] Probability of data values form DENSITY function


Dear Mr. Michael Meyners, 

  

Assume, that I have simulated data from GARCH  process: 

  

spec = garchSpec(model = list(omega = 0.01, alpha = 0.13, beta = 0.86, 
shape = 6), cond.dist = "std") 

yt <- garchSim(spec, n = 4000) 

  

The empirical distribution of this data (yt) is assumed to follow no 
parametric distribution. This data is my in-sample data. 

  

I have another set of the data (yt1), generated from the same process 
(but it is not the same as yt). I need to find the probabilities of these data 
(yt1) points in the previously mentioned empirical distribution (of yt). 

  

Simply put, having a continuous EMPIRICAL density function, I need, for 
some points on the x axis (values), to find the corresponding values on the 
y axis (their "probability"). 

  

R plots the empirical density function, so I assume it somehow 
approximates it. 

  

Thanks a lot for the help in advance. 

  

Best regards, 

Lina 


--- On Tue, 29/9/09, Meyners,Michael, wrote:



From: Meyners,Michael,LAUSANNE,AppliedMathematics
Subject: RE: [R] Probability of data values form DENSITY 
function
To: Lina Rusyte liner...@yahoo.co.uk
Cc: R help r-help@r-project.org
Date: Tuesday, 29 September, 2009, 4:59 PM


Lina, check whether something like

data.frame(density(rnorm(10))[1:2]) 

contains the information you want. Otherwise, try to be (much) 
more specific in what you want so that we do not need to guess (and of course 
provide minimal, self-contained, reproducible code). That has a higher chance 
to trigger some responses than posting the same message again. And if you even 
explain what you want to do with these values, you might get even better 
responses.

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Lina Rusyte
 Sent: Dienstag, 29. September 2009 16:45
 To: R help
 Cc: R help
 Subject: [R] Probability of data values form DENSITY function
 
 Hello,
 
 Could someone help me please and tell me how to get the 
 probability from an empirical DENSITY (not parametric

Re: [R] How do I do simple string concatenation in R?

2009-09-30 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
?paste 
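
For example:

A <- "klm"
B <- "jjj"
C <- paste(A, B, sep = "")   # "klmjjj"
C == "klmjjj"                # TRUE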

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Dr. Arie Ben David
 Sent: Mittwoch, 30. September 2009 10:22
 To: r-help@r-project.org
 Subject: [R] How do I do simple string concatenation in R?
 
 Dear R gurus
 How do I do simple string concatenation in R? 
 For example:
 A = "klm"
 B = "jjj"
 How can I assign a value to C such that C == "klmjjj" is TRUE?
 Thank you
 Arie
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Probability of data values form DENSITY function

2009-09-29 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Lina, check whether something like

data.frame(density(rnorm(10))[1:2]) 

contains the information you want. Otherwise, try to be (much) more specific in 
what you want so that we do not need to guess (and of course provide minimal, 
self-contained, reproducible code). That has a higher chance to trigger some 
responses than posting the same message again. And if you even explain what you 
want to do with these values, you might get even better responses.

HTH, Michael


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Lina Rusyte
 Sent: Dienstag, 29. September 2009 16:45
 To: R help
 Cc: R help
 Subject: [R] Probability of data values form DENSITY function
 
 Hello,
 
 Could someone help me please and tell me how to get the 
 probability from an empirical DENSITY (not parametric) for each 
 data value (R function). 
 For example, for the normal distribution there is such a function like: 
 
 "dnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)"
 I need the same function, only for the empirical 
 DENSITY function (which does not correspond to any typical distribution). 
 R plots the density function of any data: 
 
 "plot(density(x))"
 
 I need to find out the probability for each data value from 
 this plot-line. 
 
 Best regards,
 Lina
 
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A stat related question

2009-09-18 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Let's assume you have just three observations, and x = 1:3 for your
observations. 

Predictor 1:  y = x^2
Predictor 2:  y = 1 if x=1
              y = 4 if x=2
              y = 9 if x=3
              y = 0 elsewhere

These predictors are obviously not the same, but will give the same Mean
Squared Error for your data (whatever your observed y-values are). This
should suffice as a counter example. Or did I misunderstand your
question?

HTH, Michael
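
A quick numerical check of the counterexample (the observed y-values
are arbitrary):

x <- 1:3
y <- c(2, 5, 7)               # any observations will do
pred1 <- x^2
pred2 <- c(1, 4, 9)           # predictor 2 evaluated at x = 1:3
mean((y - pred1)^2) == mean((y - pred2)^2)  # TRUE: same MSE, different predictors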


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of RON70
 Sent: Freitag, 18. September 2009 11:23
 To: r-help@r-project.org
 Subject: [R] A stat related question
 
 
 Can I ask a small stat. related question here?
 
 Suppose I have two predictors for a time series process and the 
 accuracy of each predictor is measured by its MSE. My question is: 
 if two predictors give the same MSE, do they necessarily have 
 to be identical? Can anyone provide me any counterexample?
 
 Thanks.
 --
 View this message in context: 
 http://www.nabble.com/A-stat-related-question-tp25505618p25505618.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Equivalence of Mann-Whitney test and Kruskal-Wallis test with k=2

2009-09-08 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
See the respective help files. The continuity correction only affects
the normal approximation in wilcox.test. With such small sample sizes,
the default evaluation is exact, so it doesn't change anything. In
contrast, kruskal.test is incapable of computing exact values but always
uses the chi-square approximation. So the discrepancy is between an exact
test and an approximation (guess you'd be better off with the former).

If you get the urge to reproduce the p value from kruskal.test using
wilcox.test, and maybe to better understand what's happening, try

a <- wilcox.test(x1, x2, paired=FALSE, exact=FALSE, correct=FALSE)

(and yes, now with exact=FALSE, the continuity correction makes a
difference).

HTH, Michael
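
The reproduction in full, using the OP's data (the two p-values should
now coincide):

x1 <- 1:5
x2 <- c(6, 8, 9, 11)
a <- wilcox.test(x1, x2, paired = FALSE, exact = FALSE, correct = FALSE)
b <- kruskal.test(list(x1, x2))
c(a$p.value, b$p.value)   # both should equal ~0.0143 (cf. below)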

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of David Scott
 Sent: Dienstag, 8. September 2009 07:02
 To: Thomas Farrar
 Cc: r-help@r-project.org
 Subject: Re: [R] Equivalence of Mann-Whitney test and 
 Kruskal-Wallis test with k=2
 
 Thomas Farrar wrote:
  Hi all,
  
  The Kruskal-Wallis test is a generalization of the two-sample 
  Mann-Whitney test to *k* samples.  That being the case, the 
  Kruskal-Wallis test with *k*=2 should give an identical 
 p-value to the Mann-Whitney test, should it not?
  
 > x1 <- c(1:5)
 > x2 <- c(6,8,9,11)
 > a <- wilcox.test(x1,x2,paired=FALSE)
 > b <- kruskal.test(list(x1,x2),paired=FALSE)
 > a$p.value
 [1] 0.01587302
 > b$p.value
 [1] 0.01430588
  
  The p-values are slightly different (note that there are no ties in 
  the data, so computed p-values should be exact).
  
  Can anyone explain the discrepancy?  It's been awhile since 
 I studied 
  nonparametric stats and this one has me scratching my head.
  
  Many thanks!
  Tom
  
 
 The continuity correction? It is true by default for 
 wilcox.test and is not apparent in the help for kruskal.test.
 
 David Scott
 
 --
 _
 David Scott   Department of Statistics
   The University of Auckland, PB 92019
   Auckland 1142,NEW ZEALAND
 Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
 Email:d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018
 
 Director of Consulting, Department of Statistics
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behavior in friedman.test and ks.test

2009-09-08 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Alex,

It's mainly speculation, as I cannot check the Excel add-in nor Vassar, but 
I'll give it a try.

For the Friedman test: Results of R coincide with those reported by Hollander & 
Wolfe, which I'd take as a point in favor of R. In any case, my guess is that 
ties are handled differently (average ranks in R), but you'd have to check with 
the documentation of WinSTAT and Vassar. If it is not documented, see what test 
statistic you'd get manually under each handling of ties.

For the ks.test: See ?ks.test for the meaning of the exact argument of this 
function. I'd assume that Excel gives you the asymptotic p value only, while R 
will by default return an exact one for 32 samples. From the same help page: 
"Otherwise, asymptotic distributions are used whose approximations may be 
inaccurate in small samples." You could check using something like 
ks.test(myData$f1_A, pnorm, exact=FALSE). If that doesn't resolve the issue: do 
the KS test (semi-)manually, which should not be that difficult (even in Excel, 
if need be), and compare the D value with the one obtained from R and 
Excel, respectively.

HTH, Michael
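
A sketch of the exact-vs-asymptotic check with simulated data (n = 32,
as in the OP's setting):

set.seed(42)
x <- rnorm(32)
ks.test(x, "pnorm")$p.value                 # exact by default (n < 100, no ties)
ks.test(x, "pnorm", exact = FALSE)$p.value  # asymptotic approximation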



 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of 
 atconsta-rh...@yahoo.com
 Sent: Dienstag, 8. September 2009 15:33
 To: R-help@r-project.org
 Subject: [R] Unexpected behavior in friedman.test and ks.test
 
 I have to start by saying that I am new to R, so I might miss 
 something crucial here. It seems to me that the results of 
 friedman.test and ks.test are wrong. Now, obviously, the 
 first thing which crossed my mind was "it can't be, this is a 
 package used by so many, someone should have noticed", but I 
 can't figure out what it might be.
 
 Problem: let's start with friedman.test. I have a lot of data 
 to analyze and I grew sick and tired of clicking and 
 selecting in Excel (for which we have a statistics Add-In 
 purchased; don't start to flame me on using Excel for stats, 
 please!); so I wanted to automate the analysis in R and 
 figured out the results differ from Excel. Example: take the 
 data from example(friedman.test) (Hollander & Wolfe (1973), 
 p. 140ff.). I ran the example in R and got:
 
 Friedman rank sum test
 data:  RoundingTimes
 Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805
 
 Same data, in Excel, using the WinSTAT for Excel (Fitch 
 software), gives: Friedman chi-squared = 10.6364, df = 2, 
 p-value =0.004902
 
 Puzzled, I entered the data in the calculator from Vassar 
 (http://faculty.vassar.edu/lowry/fried3.html ) and got 
 exactly the same values as in Excel (and, again, different 
 from R). Admittedly, the differences are not large, and both 
 fall below the 0.05 threshold, but, still.
 
 So, question 1 would be why is R different from both Excel 
 and Vassar?
 
 
 Now to the Kolmogorov-Smirnov test, from which my ordeal 
 actually started: the results from ks.test are wildly 
 different from the ones I have got with the Excel add-in. 
 Basically, I have 32 sets of observations (patients) for 100 
 independent variables (different blood analyses). Question 
 was whether the data is normally distributed for each of the 
 analyses and, hence, whether I can apply a parametric test or not.
 Once I had loaded the data in a dataframe (and it looks as 
 expected), I ran:
 ks.test(myData$f1_A, pnorm)
 ks.test(myData$f8_A, pnorm)
 
 They give p-values of < 2.2e-16 (with ties) and 8.882e-16. 
 The Excel Add-In gives p-values of 
  
 0.0074491 and, respectively, 0.2730477
 
 Here the difference is serious, like between "highly 
 significant non-normal" for both f1 and f8 (R), and "one 
 non-normal and one normal" (the Add-in). I first thought that 
 the difference might arise from different probability 
 distributions (but what else, if not pnorm). Then I ran the 
 friedman test, to find out similar discrepancies.
 
 I'd really appreciate some input on this: what's wrong and 
 how should I decide whom to trust?
 
 Many thanks in advance,
 
 Alex
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] biplot graphical options?

2009-09-02 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Taking the example from ?biplot.princomp with reproducible code (hint,
hint):

biplot(princomp(USArrests))
biplot(princomp(USArrests), col=c(2,3), cex=c(1/2, 2)) 

clearly changes the color and font size on my system. For changing the
point symbols (which are "text" by default), try setting "xlabs" and
"ylabs". But of course, an argument "pch" will not have any effect, as
biplot has no call to points or similar, but to text only (and
hence mainly parameters of text will have an effect on what you aim
at). So for documentation, I'd propose ?biplot.princomp and
?biplot.default.

For changes that are truly impossible (there are indeed some), take the
code for biplot.default (or biplot.princomp) and amend where necessary
to fulfill your needs. 

HTH, Michael
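
For instance, replacing the default text labels by a plotting symbol
via xlabs:

biplot(princomp(USArrests), xlabs = rep("*", nrow(USArrests)), col = c(4, 2))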

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Manca Marco (PATH)
 Sent: Mittwoch, 2. September 2009 13:02
 To: r-help@r-project.org
 Subject: [R] biplot graphical options?
 Importance: High
 
 
 Dear R-help fellows
 
 good afternoon.
 
 I am struggling in the attempt to impose some graphical 
 conditions (changing point symbols, colors, etc.) on the biplot 
 function (I am using it to visualize the results of princomp), 
 but I can't apparently manage to change anything but the 
 axes... and I have been browsing manuals and vignettes 
 without finding any explicit suggestions on how to operate...
 
 Can anyone, please, point my attention to the relevant documentation?
 
 Thank you in advance and best regards,
 Marco
 
 --
 Marco Manca, MD
 University of Maastricht
 Faculty of Health, Medicine and Life Sciences (FHML) 
 Cardiovascular Research Institute (CARIM) PO Box 616 6200 MD 
 Maastricht
 
 E-mail: m.ma...@path.unimaas.nl
 Office telephone: +31(0)433874633
 Personal mobile: +31(0)626441205
 Twitter: @markomanka
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Permutation test and R2 problem

2009-08-14 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
 

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Alex Roy
 Sent: Freitag, 14. August 2009 12:05
 To: r-help@r-project.org
 Subject: [R] Permutation test and R2 problem
 
 Hi,
 
 
 I have optimized the shrinkage parameter (GCV) for ridge regression and 
 got an r2 value of 70%. To check the sensitivity of the 
 result, I did a permutation test. I permuted the response 
 vector, ran it 1000 times and drew a distribution. But 
 now I get r2 values as high as 98%, and some of them more than 
 70%. Is this expected from such a type of test?

Depends on what exactly you are doing and on your data, but surely this
is not unexpected (even less given the information we have).
 
 *I was under the impression that the r2 with the real data set will 
 always be the maximum! And permutation will not affect this, i.e. the 
 permuted r2 will always be less than the real one!*

Why would that be? And even more, why would you do a permutation test
then if you knew in advance that all permuted values are below your
observed one? You optimize the shrinkage parameter for your data, not
your data for your shrinkage parameter. In the latter case you would
have been right. Given any fixed shrinkage parameter, you can always
find some data (e.g. the predicted values) that fit better than the
original. I guess that in most non-artificial cases with a reasonable
amount of data, there are at least some permutations that give a higher
r2.

Not sure what kind of sensitivity you want to check, but probably you'd
have to optimize the shrinkage parameter as well. And of course to make
sure that your permutations correspond to the original constraints. But
this opens a wide field; refer to any textbook on this matter for
further detail (e.g. Edgington & Onghena, Randomization Tests).
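
Purely as an illustration of that last point, a sketch (simulated data;
MASS::lm.ridge and its GCV criterion are chosen here for convenience,
not because they are Alex's actual setup):

library(MASS)                      # for lm.ridge

set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p, sd = 0.3) + rnorm(n))

r2.ridge <- function(y, X, lambdas = seq(0, 10, by = 0.25)) {
  fit  <- lm.ridge(y ~ X, lambda = lambdas)
  b    <- coef(fit)[which.min(fit$GCV), ]   # GCV-optimal coefficients
  yhat <- b[1] + X %*% b[-1]
  drop(cor(y, yhat))^2                      # r2 of the re-tuned fit
}

r2.obs  <- r2.ridge(y, X)
r2.perm <- replicate(1000, r2.ridge(sample(y), X))  # permute the response
mean(r2.perm >= r2.obs)           # proportion of permuted r2 >= observed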

HTH, Michael



Re: [R] Chi-squared approximation may be incorrect

2009-08-11 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Nancy (?),
see ?chisq.test (in particular the examples and the comment on expected
frequency there). 
A rule of thumb (see any basic textbook) for the chi-squared
approximation being okay is that the expected value in each cell is at
least 5. The warning tells you that this does not hold true for at least
one of your cells. 
You might want to try 
chisq.test(t, simulate.p.value = TRUE) 
instead.
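For instance, with a made-up 2x2 table t:

t <- matrix(c(8, 2, 3, 1), 2, 2)   # made-up counts, deliberately sparse
chisq.test(t)$expected             # some expected counts fall below 5
chisq.test(t, simulate.p.value = TRUE, B = 10000)  # Monte Carlo p-value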
Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of SNN
Sent: Mittwoch, 12. August 2009 00:34
To: r-help@r-project.org
Subject: [R] Chi-squared approximation may be incorrect



Hi,


I am trying to run a chi-square test but I am getting the following
message:

Warning messages:
1: In chisq.test(t) : Chi-squared approximation may be incorrect

Does anyone know what it means?

Your help is appreciated.





Re: [R] Conditional randomization of groups

2009-08-07 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Hi unknown,

As a quick patch, try something like

mydata$Set <- 1
mydata$Set[c(sample(mydata$ID[mydata$Type == "A"], ifelse(runif(1) > 1/2, 2, 3)),
             sample(mydata$ID[mydata$Type == "B"], 3))[1:5]] <- 2
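
A quick check of the constraint afterwards (assuming mydata holds the
data from the post below):

table(mydata$Set)                # two Sets of 5 IDs each
table(mydata$Type, mydata$Set)   # 2 or 3 of each Type per Set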
 
HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of yabado
Sent: Donnerstag, 6. August 2009 23:29
To: r-help@r-project.org
Subject: [R] Conditional randomization of groups


Hi

I have a data set which is like this:

Data

ID   Type
1   A
2   B 
3   A
4   A
5   A
6   B
7   B
8   A
9   B
10 B

As you can see, there are 5 A and 5 B. Now, I want to do a
randomization of the data set which will assign 5 IDs to a new group
called Set1 and the other 5 to Set2.

The Set1 and Set2 will be either: 
Set1 (A A A B B ) and Set2 (A A B B B)
or
Set1(A A B B B) and Set2 (A A A B B).

Each new group needs to have at least two A and two B, plus one more B
or A. Except for this condition, everything else (ID, permutation,
selection to Set1 and Set2) needs to be random.

Can anyone show me how to do the randomization?





Re: [R] rnorm() converted to daily

2009-04-17 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
The SD in y is more than 15 times (!!!) larger than in x and z,
respectively, and hence the SD of the mean of y is also larger. 100,000
values are not enough to stabilize this. You could have obtained 0.09 or
even larger as well. Try y with different seeds; y varies much more than
x and z do. Or check the variances:

# rnorm sample size restored as 1e5 (the "100,000 values" mentioned above)
var.x <- var(replicate(100, mean( rnorm( 1e5, mean=0.08, sd=0.25 ))))
var.y <- var(replicate(100, mean( rnorm( 1e5, mean=(0.08/252),
                                         sd=(0.25/sqrt(252)) )) * 252))
var.z <- var(replicate(100, mean( rnorm( 1e5, mean=(0.08/252),
                                         sd=0.001 )) * 252))

I guess what you are doing wrong is assuming that 0.25/sqrt(252) is
similar to 0.25/252, the latter being close to 0.001, while the former
is obviously not. Try replacing 0.25 in y by 0.0158; then you should be
close enough.
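
The constants behind this, for the record:

sqrt(252)               # ~ 15.87: factor by which y's SD exceeds x's
252 * (0.25/sqrt(252))  # ~ 3.97: effective SD in y after the *252 scaling
252 * 0.001             # ~ 0.252: effective SD in z, in line with x's 0.25
0.25 / sqrt(252)        # ~ 0.0157: hence the suggested 0.0158 replacement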

HTH, Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Ken-JP
Sent: Freitag, 17. April 2009 09:26
To: r-help@r-project.org
Subject: [R] rnorm() converted to daily



yearly 8% drift, 25% sd

why are x and y so different (x and z look ok)?

Does this have something to do with biases due to the relationship
between population and sample-estimates as a function of n samples and
sd?  Or am I doing something wrong?

---

# sample size reconstructed as 1e5 ("100,000 values" per the reply above)
set.seed( 1 )
x <- mean( rnorm( 1e5, mean=0.08, sd=0.25 ))
set.seed( 1 )
y <- mean( rnorm( 1e5, mean=(0.08/252), sd=(0.25/sqrt(252)) )) * 252
set.seed( 1 )
z <- mean( rnorm( 1e5, mean=(0.08/252), sd=0.001 )) * 252
# ---
x # 0.07943898
y # 0.07109407
z # 0.07943449






Re: [R] Genstat into R - Randomisation test

2009-04-16 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Robert, Tom, Peter and all, 

If I remember correctly (don't have my copy at hand right now),
Edgington and Onghena differentiate between randomization tests and
permutation tests along the following lines:

Randomization test: Apply only to randomized experiments, for which we
consider the theoretically possible re-randomizations given the
constraints for the original randomization. In this setting, inference
is valid for the observed population, and does not require random
sampling (while, of course, this assumption is needed if we want to
generalize the findings). I think they also call it re-randomization
tests at least once, while the prefix is usually omitted for simplicity,
but this makes it clearer that an initial randomization is required.

Permutation test: Can be applied even to data from non-randomized
experiments, requires, however, random sampling. Hence, one additional
assumption is needed (one that is common to inference using classical
tests). This is hardly addressed in Edgington & Onghena. 

In both cases, the answer can be approximated by random draws if full
randomizations/permutations are difficult or impossible to perform, but
this applies to both methods. The distinction in Edgington (1980) as
mentioned by Tom is not used in the latest edition. 
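
To illustrate the computational side only (toy numbers, not from this
thread): for a two-sample mean difference, the two routes look like

z    <- c(12.1, 9.8, 11.4, 10.2, 9.5, 10.9)  # units 1-3 treated, 4-6 control
stat <- function(idx) mean(z[idx]) - mean(z[-idx])
obs  <- stat(1:3)

# exhaustive: all choose(6, 3) = 20 possible treatment assignments
p.exact <- mean(apply(combn(6, 3), 2, stat) >= obs)

# random draws: approximates p.exact when enumeration is infeasible
p.mc <- mean(replicate(10000, stat(sample(6, 3))) >= obs)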

In this regard, the comparison with exact vs. approximate integral
seems inappropriate to me, as the distinction is conceptual, not
computational. Neither is one the (non-)exhaustive variant of the other.
I doubt that the "randomization test" is named for the _random_ choice
of permutations, as implied by Peter; it is rather based on the
randomization (of an experiment) in its best statistical meaning. 

Not sure whether the distinction is needed, but it might be helpful in
some instances, at least if used consistently. I have never seen another
distinction between these two, and the literature is inconsistent in its
use as well.

Just my two cents worth.
Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Robert A LaBudde
Sent: Donnerstag, 9. April 2009 17:38
To: Tom Backer Johnsen
Cc: r-help@r-project.org
Subject: Re: [R] Genstat into R - Randomisation test

At 04:43 AM 4/9/2009, Tom Backer Johnsen wrote:
> Peter Dalgaard wrote:
>> Mike Lawrence wrote:
>>> Looks like that code implements a non-exhaustive variant of the
>>> randomization test, sometimes called a permutation test.
>>
>> Isn't it the other way around? (Permutation tests can be exhaustive
>> by looking at all permutations; if a randomization test did that,
>> then it wouldn't be random.)
>
> Eugene Edgington wrote an early book (1980) on this subject called
> "Randomization Tests", published by Dekker. As far as I remember, he
> differentiates between "Systematic" permutation tests, where one looks
> at all possible permutations. This is of course prohibitive for
> anything beyond trivially small samples. For larger samples he uses
> what he calls "Random" permutations, where a random sample of the
> possible permutations is used.
>
> Tom


Edgington and Onghena make a similar distinction in their book, but I
think such a distinction is without merit.

Do we distinguish between exact definite integrals and approximate
ones obtained by numerical integration, of which Monte Carlo sampling is
just one class of algorithms? Don't we just say: 
"The integral was evaluated numerically by the [whatever] method to an
accuracy of [whatever], and the value was found to be [whatever]." 
Ditto for optimization problems.

A randomization test has one correct answer based upon theory. We are
simply trying to calculate that answer's value when it is difficult to
do so. Any approximate method that is used must be performed such that
the error of approximation is trivial with respect to the contemplated
use.

Doing Monte Carlo sampling to find an approximate answer to a
randomization test, or to an optimization problem, or to a bootstrap
distribution should be carried out with enough realizations to make sure
the residual error is trivial.

As Monte Carlo sampling is a random sampling-based approximate method,
the name does create some confusion in terminology, both for
randomization tests and for bootstrapping.


Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239Fax: 757-467-2947

Vere scire est per causas scire


Re: [R] OT: A test with dependent samples.

2009-02-11 Thread Meyners, Michael, LAUSANNE, AppliedMathematics
Rolf,

as you explicitly asked for a comment on your proposal: It is generally
equivalent to McNemar's test and maybe even more appropriate because of
the asymptotics involved in the chi-squared distribution, which might
not be too good with n=12. In some more detail:

McNemar's test basically considers the difference between the
off-diagonal elements, normalized by their sum. You do the same,
ignoring all (0,0) and (1,1) pairs (the latter do not occur in your
example anyway). binom.test(12, 12, alternative="greater") gives you
the one-sided p-value.

If, however, you do not use the exact binomial distribution (with n=12)
but the normal approximation, you find E(X)=6 and VAR(X)=3 with X the
number of sequences (0,1), and you observed x=12. This gives you a
z-score of (12-6)/sqrt(3) and a corresponding one-sided p-value:

> pnorm(6/sqrt(3), lower.tail=F)
[1] 0.0002660028

Further, McNemar's test WITHOUT continuity correction gives 

> mcnemar.test(matrix(c(61,0,12,0),2,2), correct=F)

McNemar's Chi-squared test

data:  matrix(c(61, 0, 12, 0), 2, 2) 
McNemar's chi-squared = 12, df = 1, p-value = 0.000532

It comes as no surprise that the reported (two-sided!) p-value is
exactly twice the (one-sided!) p-value from the binomial test with
normal approximation:

> pnorm(6/sqrt(3), lower.tail=F)*2
[1] 0.0005320055

I do not want to stress the pros and cons of continuity corrections, but
if neglected, you see that you get the same results (except that McNemar
is generally two-sided), due to the relation of normal and chi-squared
distribution. 

If you use the binomial test, you can forget about asymptotics and
continuity correction; that's why I indeed consider your approach
superior to the chi-squared approximation used in McNemar's test. But
you might fail to find a reference for the exact approach...


You briefly asked in a later mail about testing for p=0. Indeed, _any_
incident will disprove this hypothesis, and the p-value reported by 

> binom.test(1,73,p=0)

Exact binomial test

data:  1 and 73 
number of successes = 1, number of trials = 73, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0 
95 percent confidence interval:
 0.0003467592 0.0739763232 
sample estimates:
probability of success 
0.01369863 

is not wrong (as R just tells us it is BELOW a certain value), but could
be refined by saying "p-value = 0". BTW, I am not sure what
"p-value = TRUE" tells me, as derived from binom.test(0,73,p=0). I
personally don't care about either, as to test H0: p=0, I would not use
any software but rather stress some basic statistical theory.


It remains the question whether McNemar (or your proposal) is generally
appropriate here. Like others, I have doubts, except maybe if
observations were made on two subsequent days, on the second of which
the drug was administered. How could you otherwise be sure that the
increase in incidences is not due to the progression of cancer, say,
which would be difficult to rule out? Also, from my experience and in
contrast to your point of view, I'd rather assume it likely that a vet
is much more reluctant to apply the new treatment to an already
vomiting cat; your baseline with 0 out of 73 at least seems suspicious,
and until proven wrong, I'd take the assumption that this is not by
chance only.
I understand that this is an observational study with no
control/randomization, but concluding a side-effect from a statistically
significant change in incidence rates from this study seems questionable
to me. What I'd propose (and have used in comparable situations) is to
confine yourself to the confidence interval and let the investigator
decide / discuss whether the lower bound of this is higher than what she
would expect under control or alternative treatment. She needs to have
some experience, if not historical data; using a test or just this
confidence interval, you will only get a hint at a possible side-effect,
no formal proof (which is always difficult from observational studies).
Discussing the CI would seem fairer and even stronger to me then. In
other words: Mathematically, you'll get an appropriate test, while
conceptually, I'm far from being convinced.
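
For completeness, that interval is a one-liner in R (12 of 73 as in
Rolf's description below):

binom.test(12, 73)$conf.int   # exact 95% CI for the post-treatment rate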

Hope this makes sense, 
Michael




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Rolf Turner
Sent: Dienstag, 10. Februar 2009 22:33
To: R-help Forum
Subject: [R] OT: A test with dependent samples.


I am appealing to the general collective wisdom of this list in respect
of a statistics (rather than R) question.  This question comes to me
from a friend who is a veterinary oncologist.  In a study that she is
writing up there were 73 cats who were treated with a drug called
piroxicam.  None of the cats were observed to be subject to vomiting
prior to treatment; 12 of the cats were subject to vomiting after
treatment commenced.  She wants to be able to say that the treatment had
a ``significant''
impact with respect to