Re: [R] different results between cor and ccf

2024-01-16 Thread Berwin A Turlach
G'day Patrick,

On Tue, 16 Jan 2024 09:19:40 +0100
Patrick Giraudoux  wrote:

[...]
> So far so good, but when I lag one of the series, I cannot find the
> same correlation as with ccf
> 
> > cor(x[1:(length(x)-1)], y[2:length(y)])
> [1] -0.7903428
> 
> ... where I expect -0.668 based on ccf
> 
> Can anyone explain why ?

The difference is explained by ccf() seeing the complete data on x and
y and calculating the sample means only once; those means are then used
in the calculations for every lag.  cor() sees only the data you pass
down, so it calculates different estimates for the means of the two
sequences.

To illustrate:

[...first execute your code...]
R> xx <- x-mean(x)
R> yy <- y-mean(y)
R> n <- length(x)
R> vx <- sum(xx^2)/n
R> vy <- sum(yy^2)/n
R> (c0 <- sum(xx*yy)/n/sqrt(vx*vy))
[1] -0.5948694
R> xx <- x[1:(length(x)-1)] - mean(x)
R> yy <- y[2:length(y)] - mean(y)
R> (c1 <- sum(xx*yy)/n/sqrt(vx*vy))
[1] -0.6676418
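
To tie the two together, the same lag -1 value can be read straight out of
the object that ccf() returns (x and y are rounded here to four decimals for
brevity; the full-precision values are in the original post):

```r
# Data from the original post (rounded to 4 decimals for brevity)
x <- c(0.8547, 1.6009, 2.5020, 2.5147, 3.3359, 3.5401, 2.6304, 3.6694,
       3.9125, 4.4007, 3.0209, 2.9591, 3.8435, 2.1684, 2.3061, 1.4680,
       2.0347, 2.3675)
y <- c(2.3086, 2.0809, 1.6249, 1.5133, 0.6675, 0.3081, 0.5265, 0.8907,
       0.7160, 0.8215, 0.2220, 0.6608, 0.9072, 0.4562, 0.3507, 1.1682,
       1.6976, 0.8895)
n <- length(x)

# Lag -1 cross-correlation computed by hand, with full-series means and a
# common divisor n, as in the calculation above:
c1 <- sum((x[-n] - mean(x)) * (y[-1] - mean(y))) / n /
  sqrt(sum((x - mean(x))^2) / n * sum((y - mean(y))^2) / n)

# ccf() returns an "acf" object whose $acf and $lag arrays line up
# elementwise, so the same value can be extracted directly:
cc <- ccf(x, y, plot = FALSE)
all.equal(c1, cc$acf[cc$lag == -1][1])  # TRUE; both are about -0.668
```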


The help page of ccf() points to MASS, 4th ed.; the more specific
reference is p. 389ff. :)

Cheers,

Berwin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] different results between cor and ccf

2024-01-16 Thread Patrick Giraudoux
Dear listers,

I am working on a time series, but I find that for a given non-zero time
lag the correlations obtained by ccf and cor differ.

x <- c(0.85472102802704641, 1.6008990694641689, 2.5019632258894835, 
2.514654801253164, 3.3359198688206368, 3.5401357138398208, 
2.6304117871193538, 3.6694074965420009, 3.9125153101706776, 
4.4006592535478566, 3.0208991912866829, 2.959090589344433, 
3.8434635568566056, 2.1683644330520457, 2.3060571563512973, 
1.4680350663043942, 2.0346918622459054, 2.3674524446877538)

y <- c(2.3085729270534765, 2.0809088217491416, 1.6249456563631131, 
1.513338933177, 0.66754156827555422, 0.3080839731181978, 
0.5265304299394, 0.89070463020837132, 0.71600791432232669, 
0.82152341002975027, 0.22200290782700527, 0.6608410635137173, 
0.90715232876618945, 0.45624062770725898, 0.35074487486980244, 
1.1681750562971052, 1.6976462236079737, 0.88950230250556417)

cc<-ccf(x,y)

> cc
Autocorrelations of series ‘X’, by lag

    -9     -8     -7     -6     -5     -4     -3     -2     -1      0 
 0.098  0.139  0.127 -0.043 -0.049  0.069 -0.237 -0.471 -0.668 -0.595 
     1      2      3      4      5      6      7      8      9 
-0.269 -0.076 -0.004  0.123  0.272  0.283  0.401  0.435  0.454 

cor(x,y)
[1] -0.5948694

So far so good, but when I lag one of the series, I cannot find the same 
correlation as with ccf

> cor(x[1:(length(x)-1)], y[2:length(y)])
[1] -0.7903428

... where I expect -0.668 based on ccf

Can anyone explain why?

Best,

Patrick



Re: [R] Strange results: bootstrap CIs

2024-01-13 Thread Rolf Turner


On Sat, 13 Jan 2024 16:54:01 -0800
Bert Gunter  wrote:

> Well, this would seem to work:
> 
> e <- data.frame(Score = Score
>  , Country = factor(Country)
>  , Time = Time)
> 
> ncountry <- nlevels(e$Country)
> func= function(dat,idx) {
>if(length(unique(dat[idx,'Country'])) < ncountry) NA
>else coef(lm(Score~ Time + Country,data = dat[idx,]))
> }
> B <-  boot(e, func, R=1000)
> 
> boot.ci(B, index=2, type="perc")
> 
> Caveats:
> 1) boot.ci handles the NA's by omitting them, which of course gives a
> smaller resample and longer CI's than the value of R specified in the
> call to boot().
> 
> 2) I do not know if the *nice* statistical properties of the
> nonparametric bootstrap, e.g. asymptotic correctness, hold when
> bootstrap samples are produced in this way.  I leave that to wiser
> heads than me.



It seems to me that my shaganappi idea causes func() to return a vector
of coefficients with NAs corresponding to any missing levels of the
"Country" factor, whereas your idea causes it to return a scalar NA
whenever one or more of the levels of the "Country" factor is missing.

I have no idea what the implications of this are.  As I said before, I
have no idea what I am doing!

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
 +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619



Re: [R] Strange results: bootstrap CIs

2024-01-13 Thread Bert Gunter
Well, this would seem to work:

e <- data.frame(Score = Score
 , Country = factor(Country)
 , Time = Time)

ncountry <- nlevels(e$Country)
func <- function(dat, idx) {
  if (length(unique(dat[idx, "Country"])) < ncountry) NA
  else coef(lm(Score ~ Time + Country, data = dat[idx, ]))
}
B <- boot(e, func, R = 1000)

boot.ci(B, index = 2, type = "perc")

Caveats:
1) boot.ci handles the NA's by omitting them, which of course gives a
smaller resample and longer CI's than the value of R specified in the call
to boot().

2) I do not know if the *nice* statistical properties of the nonparametric
bootstrap, e.g. asymptotic correctness, hold when bootstrap samples are
produced in this way.  I leave that to wiser heads than me.

Cheers,
Bert

On Sat, Jan 13, 2024 at 2:51 PM Ben Bolker  wrote:

>It took me a little while to figure this out, but: the problem is
> that if your resampling leaves out any countries (which is very likely),
> your model applied to the bootstrapped data will have fewer coefficients
> than your original model.
>
> I tried this:
>
> cc <- unique(e$Country)
> func <- function(data, idx) {
> coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,]))
> }
>
> but lm() automatically drops coefficients for missing countries (I
> didn't think about it too hard, but I thought they might get returned as
> NA and that boot() might be able to handle that ...)
>
>If you want to do this I think you'll have to find a way to do a
> *stratified* bootstrap, restricting the bootstrap samples so that they
> always contain at least one sample from each country ... (I would have
> expected "strata = as.numeric(e$Country)" to do this, but it doesn't
> work the way I thought ... it tries to compute the statistics for *each*
> stratum ...)
>
>
>
> 
>
>   Debugging attempts:
>
> set.seed(101)
> options(error=recover)
> B= boot(e, func, R=1000)
>
>
> Error in t.star[r, ] <- res[[r]] :
>number of items to replace is not a multiple of replacement length
>
> Enter a frame number, or 0 to exit
>
> 1: boot(e, func, R = 1000)
>
> 
>
> Selection: 1
> Called from: top level
> Browse[1]> print(r)
> [1] 2
> Browse[1]> t.star[r,]
> [1] NA NA NA NA NA NA NA NA NA
>
> i[2,]
>   [1] 14 15 22 22 21 14  8  2 12 22 10 15  9  7  9 13 12 23  1 20 15  7
> 5 10
>
>
>
>
> On 2024-01-13 5:22 p.m., varin sacha via R-help wrote:
> > Dear Duncan,
> > Dear Ivan,
> >
> > I really thank you a lot for your response.
> > So, if I correctly understand your answers the problem is coming from
> this line:
> >
> > coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> >
> > This line should be:
> > coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> >
> > If yes, now I get an error message (code here below)! So, it still does
> not work.
> >
> > Error in t.star[r, ] <- res[[r]] :
> >number of items to replace is not a multiple of replacement length
> >
> >
> > ##
> >
> Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
> 345,456,543,501)
> >
> > Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey",
> "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal",
> "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria",
> "Austria", "Austria")
> >
> > Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
> >
> > e=data.frame(Score, Country, Time)
> >
> >
> > library(boot)
> > func= function(data, idx) {
> > coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> > }
> > B= boot(e, func, R=1000)
> >
> > boot.ci(B, index=2, type="perc")
> > #
> >
> >
> >
> >
> >
> >
> >
> >
> > On Saturday 13 January 2024 at 21:56:58 UTC+1, Ivan Krylov <
> ikry...@disroot.org> wrote:
> >
> >
> >
> >
> >
> > On Sat, 13 Jan 2024 20:33:47 +0000 (UTC),
> >
> > varin sacha via R-help  wrote:
> >
> >> coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> >
> >
> > Wrong place for the data=... argument. You meant to give it to lm(...),
> > but in the end it went to coef(...). Without the data=... argument, the
> > formula passed to lm() picks up the global variables inherited by the
> > func() closure.
> >
> > Unfortunately, S3 methods really do have to ignore extra arguments they
> > don't understand if the class is to be extended, so coef.lm isn't
> > allowed to complain to you about it.
> >
>


Re: [R] Strange results: bootstrap CIs

2024-01-13 Thread Ben Bolker
  It took me a little while to figure this out, but: the problem is 
that if your resampling leaves out any countries (which is very likely), 
your model applied to the bootstrapped data will have fewer coefficients 
than your original model.


I tried this:

cc <- unique(e$Country)
func <- function(data, idx) {
  coef(lm(Score ~ Time + factor(Country, levels = cc), data = data[idx, ]))
}

but lm() automatically drops coefficients for missing countries (I 
didn't think about it too hard, but I thought they might get returned as 
NA and that boot() might be able to handle that ...)


  If you want to do this I think you'll have to find a way to do a 
*stratified* bootstrap, restricting the bootstrap samples so that they 
always contain at least one sample from each country ... (I would have 
expected "strata = as.numeric(e$Country)" to do this, but it doesn't 
work the way I thought ... it tries to compute the statistics for *each* 
stratum ...)
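
If boot()'s strata argument doesn't do what is wanted here, the stratified
resampling can be hand-rolled (a sketch outside the boot() interface; it
assumes a data frame with a Country column, like e in the original post):

```r
# A hand-rolled stratified bootstrap: resample row indices within each
# country, so every replicate keeps all levels of the Country factor.
strat_boot <- function(dat, stat, R) {
  idx_by_stratum <- split(seq_len(nrow(dat)), dat$Country)
  replicate(R, {
    idx <- unlist(lapply(idx_by_stratum,
                         function(i) sample(i, length(i), replace = TRUE)),
                  use.names = FALSE)
    stat(dat, idx)
  })
}
```

Applied with stat = func from above, each replicate is one coefficient
vector, and percentile intervals can then be taken with quantile().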






 Debugging attempts:

set.seed(101)
options(error=recover)
B= boot(e, func, R=1000)


Error in t.star[r, ] <- res[[r]] :
  number of items to replace is not a multiple of replacement length

Enter a frame number, or 0 to exit

1: boot(e, func, R = 1000)



Selection: 1
Called from: top level
Browse[1]> print(r)
[1] 2
Browse[1]> t.star[r,]
[1] NA NA NA NA NA NA NA NA NA

i[2,]
 [1] 14 15 22 22 21 14  8  2 12 22 10 15  9  7  9 13 12 23  1 20 15  7 
5 10





On 2024-01-13 5:22 p.m., varin sacha via R-help wrote:

Dear Duncan,
Dear Ivan,

I really thank you a lot for your response.
So, if I correctly understand your answers the problem is coming from this line:

coef(lm(Score~ Time + factor(Country)),data=data[idx,])

This line should be:
coef(lm(Score~ Time + factor(Country),data=data[idx,]))

If yes, now I get an error message (code here below)! So, it still does not 
work.

Error in t.star[r, ] <- res[[r]] :
   number of items to replace is not a multiple of replacement length


##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
  
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
  
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
  
e=data.frame(Score, Country, Time)
  
  
library(boot)

func= function(data, idx) {
coef(lm(Score~ Time + factor(Country),data=data[idx,]))
}
B= boot(e, func, R=1000)
  
boot.ci(B, index=2, type="perc")

#








On Saturday 13 January 2024 at 21:56:58 UTC+1, Ivan Krylov wrote:





> On Sat, 13 Jan 2024 20:33:47 +0000 (UTC),
>
> varin sacha via R-help  wrote:


coef(lm(Score~ Time + factor(Country)),data=data[idx,])



Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.





Re: [R] Strange results: bootstrap CIs

2024-01-13 Thread varin sacha via R-help
Dear Duncan,
Dear Ivan,

I really thank you a lot for your response.
So, if I correctly understand your answers the problem is coming from this line:

coef(lm(Score~ Time + factor(Country)),data=data[idx,])

This line should be:
coef(lm(Score~ Time + factor(Country),data=data[idx,]))

If yes, now I get an error message (code here below)! So, it still does not 
work.

Error in t.star[r, ] <- res[[r]] :
  number of items to replace is not a multiple of replacement length


##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
 
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", 
"USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", 
"UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
 
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
 
e=data.frame(Score, Country, Time)
 
 
library(boot)
func= function(data, idx) {
coef(lm(Score~ Time + factor(Country),data=data[idx,]))
}
B= boot(e, func, R=1000)
 
boot.ci(B, index=2, type="perc")
#








On Saturday 13 January 2024 at 21:56:58 UTC+1, Ivan Krylov wrote:





On Sat, 13 Jan 2024 20:33:47 +0000 (UTC),

varin sacha via R-help  wrote:

> coef(lm(Score~ Time + factor(Country)),data=data[idx,])


Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.

-- 
Best regards,
Ivan



Re: [R] Strange results: bootstrap CIs

2024-01-13 Thread Ivan Krylov via R-help
On Sat, 13 Jan 2024 20:33:47 +0000 (UTC),
varin sacha via R-help  wrote:

> coef(lm(Score~ Time + factor(Country)),data=data[idx,])

Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.
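
The silent swallowing is easy to demonstrate (a toy example using the
built-in cars dataset, not the original data):

```r
# coef() quietly accepts and drops arguments it doesn't understand,
# which is why the misplaced data= argument went unnoticed:
fit <- lm(dist ~ speed, data = cars)
identical(coef(fit), coef(fit, data = cars[1:5, ]))  # TRUE: data= is ignored
```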

-- 
Best regards,
Ivan



Re: [R] Strange results: bootstrap CIs

2024-01-13 Thread Duncan Murdoch

On 13/01/2024 3:33 p.m., varin sacha via R-help wrote:

Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
  345,456,543,501)
  
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey",

"USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal",
"Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria",
"Austria", "Austria")
  
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
  
e=data.frame(Score, Country, Time)
  
  
library(boot)

func= function(data, idx) {
coef(lm(Score~ Time + factor(Country)),data=data[idx,])
}
B= boot(e, func, R=1000)
  
boot.ci(B, index=2, type="perc")


Your function ignores the data, because it passes data[idx,] to coef(), 
not to lm().  coef() ignores it.  So the function is using the global 
variables you created earlier, not the ones in e.


Duncan Murdoch



[R] Strange results: bootstrap CIs

2024-01-13 Thread varin sacha via R-help
Dear R-experts,

Here below is my R code. It runs, BUT I get a strange result I was not
expecting! Indeed, the 95% percentile bootstrap CI is (-54.81, -54.81).
Is anything going wrong?

Best,

##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
 
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", 
"USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", 
"UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
 
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
 
e=data.frame(Score, Country, Time)
 
 
library(boot)
func= function(data, idx) {
coef(lm(Score~ Time + factor(Country)),data=data[idx,])
}
B= boot(e, func, R=1000)
 
boot.ci(B, index=2, type="perc")
#



Re: [R] Interpreting Results from LOF.test() from qpcR package

2023-08-20 Thread Bert Gunter
I would suggest that a simple plot of residuals vs. fitted values, and
perhaps plots of residuals vs. the independent variables, are almost always
more useful than omnibus LOF tests (many would disagree!). However, as Ben
noted, this is wandering outside R-help's strict remit, and you would be
better served by statistics discussion/help sites than by R-help.
Though with this small a data set and this complex a model, I would be
surprised if there could be LOF unless it were glaringly obvious from
simple plots.
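
Those plots take only a couple of lines once the model is refitted. This
sketch uses base nls() rather than nlsr, only because nls objects are
guaranteed to have the usual fitted()/residuals() methods; the data and
starting values are taken from Paul's post:

```r
# Data from the original post
mod14data2_random <- data.frame(
  y = c(0.44, 0.4, 0.4, 0.4, 0.4, 0.43, 0.46, 0.49, 0.41, 0.41, 0.38,
        0.42, 0.41, 0.4, 0.4),
  x = c(16, 24, 32, 30, 30, 16, 12, 8, 36, 32, 36, 20, 26, 34, 28))

# Same model and starting values, fitted with base nls()
fit <- nls(y ~ theta1 - theta2 * exp(-theta3 * x),
           data = mod14data2_random,
           start = list(theta1 = 0.37, theta2 = -exp(-1.8),
                        theta3 = 0.05538))

# Residuals vs. fitted values, and residuals vs. the predictor
plot(fitted(fit), residuals(fit)); abline(h = 0, lty = 2)
plot(mod14data2_random$x, residuals(fit)); abline(h = 0, lty = 2)
```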

Cheers,
Bert



-- Bert

On Sun, Aug 20, 2023 at 6:02 PM Paul Bernal  wrote:

> I am using LOF.test() function from the qpcR package and got the following
> result:
>
> > LOF.test(nlregmod3)
> $pF
> [1] 0.97686
>
> $pLR
> [1] 0.77025
>
> Can I conclude from the LOF.test() results that my nonlinear regression
> model is significant/statistically significant?
>
> Where my nonlinear model was fitted as follows:
> nlregmod3 <- nlsr(formula=y ~ theta1 - theta2*exp(-theta3*x), data =
> mod14data2_random,
>   start = list(theta1 = 0.37,
>theta2 = -exp(-1.8),
>theta3 = 0.05538))
> And the data used to fit this model is the following:
> dput(mod14data2_random)
> structure(list(index = c(14L, 27L, 37L, 33L, 34L, 16L, 7L, 1L,
> 39L, 36L, 40L, 19L, 28L, 38L, 32L), y = c(0.44, 0.4, 0.4, 0.4,
> 0.4, 0.43, 0.46, 0.49, 0.41, 0.41, 0.38, 0.42, 0.41, 0.4, 0.4
> ), x = c(16, 24, 32, 30, 30, 16, 12, 8, 36, 32, 36, 20, 26, 34,
> 28)), row.names = c(NA, -15L), class = "data.frame")
>
> Cheers,
> Paul
>



Re: [R] Interpreting Results from LOF.test() from qpcR package

2023-08-20 Thread Ben Bolker
  The p-values are non-significant by any standard cutoff (e.g. 
p<=0.05, p<=0.1) but note that this is a *lack-of-fit* test -- i.e., 
"does my function fit the data well enough?", **not** a "significant 
pattern" test (e.g., "does my function fit the data better than a 
reasonable null model?").  In other words, this test tells you that you 
*can't* reject the null hypothesis that the model is "good enough" in 
some sense.


  To test against a constant null model, you could do

nullmod <- nlsr(y ~ const,
data = mod14data2_random,
start = list(const = 0.45))
anova(nlregmod3, nullmod)


  (This question seems to be verging on "general question about 
statistics" rather than "question about R", so maybe better for a venue 
like https://stats.stackexchange.com ??)


On 2023-08-20 9:01 p.m., Paul Bernal wrote:

I am using LOF.test() function from the qpcR package and got the following
result:


LOF.test(nlregmod3)

$pF
[1] 0.97686

$pLR
[1] 0.77025

Can I conclude from the LOF.test() results that my nonlinear regression
model is significant/statistically significant?

Where my nonlinear model was fitted as follows:
nlregmod3 <- nlsr(formula=y ~ theta1 - theta2*exp(-theta3*x), data =
mod14data2_random,
   start = list(theta1 = 0.37,
theta2 = -exp(-1.8),
theta3 = 0.05538))
And the data used to fit this model is the following:
dput(mod14data2_random)
structure(list(index = c(14L, 27L, 37L, 33L, 34L, 16L, 7L, 1L,
39L, 36L, 40L, 19L, 28L, 38L, 32L), y = c(0.44, 0.4, 0.4, 0.4,
0.4, 0.43, 0.46, 0.49, 0.41, 0.41, 0.38, 0.42, 0.41, 0.4, 0.4
), x = c(16, 24, 32, 30, 30, 16, 12, 8, 36, 32, 36, 20, 26, 34,
28)), row.names = c(NA, -15L), class = "data.frame")

Cheers,
Paul





[R] Interpreting Results from LOF.test() from qpcR package

2023-08-20 Thread Paul Bernal
I am using LOF.test() function from the qpcR package and got the following
result:

> LOF.test(nlregmod3)
$pF
[1] 0.97686

$pLR
[1] 0.77025

Can I conclude from the LOF.test() results that my nonlinear regression
model is significant/statistically significant?

Where my nonlinear model was fitted as follows:
nlregmod3 <- nlsr(formula=y ~ theta1 - theta2*exp(-theta3*x), data =
mod14data2_random,
  start = list(theta1 = 0.37,
   theta2 = -exp(-1.8),
   theta3 = 0.05538))
And the data used to fit this model is the following:
dput(mod14data2_random)
structure(list(index = c(14L, 27L, 37L, 33L, 34L, 16L, 7L, 1L,
39L, 36L, 40L, 19L, 28L, 38L, 32L), y = c(0.44, 0.4, 0.4, 0.4,
0.4, 0.43, 0.46, 0.49, 0.41, 0.41, 0.38, 0.42, 0.41, 0.4, 0.4
), x = c(16, 24, 32, 30, 30, 16, 12, 8, 36, 32, 36, 20, 26, 34,
28)), row.names = c(NA, -15L), class = "data.frame")

Cheers,
Paul



[R] replicate results of tree package

2022-10-13 Thread Naresh Gurbuxani


I am trying to understand "deviance" in the classification tree output
from the tree package.

library(tree)

set.seed(911)
mydf <- data.frame(
name = as.factor(rep(c("A", "B"), c(10, 10))),
x = c(rnorm(10, -1), rnorm(10, 1)),
y = c(rnorm(10, 1), rnorm(10, -1)))

mytree <- tree(name ~ ., data = mydf)

mytree
# node), split, n, deviance, yval, (yprob)
#   * denotes terminal node

# 1) root 20 27.730 A ( 0.5 0.5 )  
#   2) y < -0.00467067 10  6.502 B ( 0.1 0.9 )  
# 4) x < 1.50596 5  5.004 B ( 0.2 0.8 ) *
# 5) x > 1.50596 5  0.000 B ( 0.0 1.0 ) *
#   3) y > -0.00467067 10  6.502 A ( 0.9 0.1 )  
# 6) x < -0.578851 5  0.000 A ( 1.0 0.0 ) *
# 7) x > -0.578851 5  5.004 A ( 0.8 0.2 ) *

# Replicate results for node 2
# Probabilities tie out
with(subset(mydf, y < -0.00457), table(name))
# name
# A B 
# 1 9

# Cannot replicate deviance = -1 * sum(p_mk * log(p_mk))
0.1 * log(0.1) + 0.9 * log(0.9)
# [1] 0.325083
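
# A formula that does reproduce the printed numbers (a reconstruction from
# the output above, not taken from the documentation) is minus twice the
# multinomial log-likelihood of the node, using counts rather than
# proportions:

```r
# Node deviance reconstructed as -2 * sum(n_k * log(p_k)), summing over
# the classes actually present in the node (0 * log(0) taken as 0).
dev_node <- function(counts) {
  p <- counts / sum(counts)
  keep <- counts > 0
  -2 * sum(counts[keep] * log(p[keep]))
}
dev_node(c(A = 1,  B = 9))   # 6.5017 -> printed as 6.502 for node 2
dev_node(c(A = 10, B = 10))  # 27.726 -> the printed 27.730, up to rounding
```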

1.  In the documentation, is it possible to find the definition of
deviance?
2.  Is it possible to see the code where it calculates deviance?

Thanks,
Naresh



Re: [R] adding results to plot

2021-10-08 Thread Rui Barradas

Hello,

Thanks for the compliment.

The R Core team, to whom we must be very grateful for their great work
over so many years, is, for good and obvious reasons, known to be
change-resistant, and a new method would overload them even more with
maintenance worries, so I guess text.htest won't make it to core R.


(And that would set a precedent. Is, for instance, text.lm next?)

If it does make it to base R, which I doubt, then the base R package 
should be package graphics, right?


In the meantime I have found a small bug: near the end of print.htest
there's



print(x$estimate, digits = digits, ...)


corresponding to my


paste("sample estimates:", round(ht$estimate, digits = digits), sep =  "\n")


The bug is that x$estimate/ht$estimate is a named vector and with paste 
the names attribute is lost. It doesn't plot "mean of x". I will 
probably try to sort this out but make no promises.
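
One way to keep the names is to splice them in explicitly before pasting
(a sketch against a made-up stand-in for ht$estimate):

```r
# paste() drops the names attribute, so build "name = value" pairs first:
est <- c("mean of x" = 0.09556)            # stand-in for ht$estimate
msg <- paste("sample estimates:",
             paste(names(est), signif(est, 4), sep = " = ", collapse = ", "),
             sep = "\n")
msg
# "sample estimates:\nmean of x = 0.09556"
```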


Thanks once again,

Rui Barradas

At 14:47 on 07/10/21, PIKAL Petr wrote:

Hallo Rui.

I finally tested your function and it seems to me that it should propagate
to the core R or at least to the stats package.

Although it is a bit overkill for my purpose, its use is straightforward and
simple. I checked it for several *test functions and did not find any
problem.

Thanks and best regards.

Petr


-Original Message-
From: Rui Barradas 
Sent: Friday, September 17, 2021 9:56 PM
To: PIKAL Petr ; r-help 
Subject: Re: [R] adding results to plot

Hello,

*.test functions in base R return a list of class "htest", with its own
print method.
The method text.htest for objects of class "htest" below is a hack. I
adapted the formating part of the code of print.htest to plot text().
I find it maybe too complicated but it seems to work.

Warning: Not debugged at all.



text.htest <- function (ht, x, y = NULL, digits = getOption("digits"),
  prefix = "", adj = NULL, ...) {
out <- list()
i_out <- 1L
out[[i_out]] <- paste(strwrap(ht$method, prefix = prefix), sep = "\n")
i_out <- i_out + 1L
out[[i_out]] <- paste0("data:  ", ht$data.name)

stat_line <- NULL
i_stat_line <- 0L
if (!is.null(ht$statistic)) {
  i_stat_line <- i_stat_line + 1L
  stat_line[[i_stat_line]] <- paste(names(ht$statistic), "=",
format(ht$statistic, digits =
max(1L, digits - 2L)))
}
if (!is.null(ht$parameter)) {
  i_stat_line <- i_stat_line + 1L
  stat_line[[i_stat_line]] <- paste(names(ht$parameter), "=",
format(ht$parameter, digits =
max(1L, digits - 2L)))
}
if (!is.null(ht$p.value)) {
  fp <- format.pval(ht$p.value, digits = max(1L, digits - 3L))
  i_stat_line <- i_stat_line + 1L
  stat_line[[i_stat_line]] <- paste("p-value",
if (startsWith(fp, "<")) fp else
paste("=", fp))
}
if(!is.null(stat_line)){
  i_out <- i_out + 1L
  #out[[i_out]] <- strwrap(paste(stat_line, collapse = ", "))
  out[[i_out]] <- paste(stat_line, collapse = ", ")
}
if (!is.null(ht$alternative)) {
  alt <- NULL
  i_alt <- 1L
  alt[[i_alt]] <- "alternative hypothesis: "
  if (!is.null(ht$null.value)) {
if (length(ht$null.value) == 1L) {
  alt.char <- switch(ht$alternative, two.sided = "not equal to",
 less = "less than", greater = "greater than")
  i_alt <- i_alt + 1L
  alt[[i_alt]] <- paste0("true ", names(ht$null.value), " is ",
alt.char,
 " ", ht$null.value)
}
else {
  i_alt <- i_alt + 1L
  alt[[i_alt]] <- paste0(ht$alternative, "\nnull values:\n")
}
  }
  else {
i_alt <- i_alt + 1L
alt[[i_alt]] <- ht$alternative
  }
  i_out <- i_out + 1L
  out[[i_out]] <- paste(alt, collapse = " ")
}
if (!is.null(ht$conf.int)) {
  i_out <- i_out + 1L
  out[[i_out]] <- paste0(format(100 * attr(ht$conf.int, "conf.level")),
 " percent confidence interval:\n", " ",
 paste(format(ht$conf.int[1:2], digits =
digits), collapse = " "))
}
if (!is.null(ht$estimate)) {
  i_out <- i_out + 1L
  out[[i_out]] <- paste("sample estimates:", round(ht$estimate,
digits = digits), sep = "\n")
}
i_out <- i_out + 1L
out[[i_out]] <- "\n"
names(out)[i_out] <- "sep"
out <- do.call(paste, out)
if(is.null(adj)) adj <- 0L
    text(x, y, labels = out, adj = adj, ...)
    invisible(out)
}

Re: [R] adding results to plot

2021-10-07 Thread PIKAL Petr
Hallo Rui.

I finally tested your function and it seems to me that it should propagate
to the core R or at least to the stats package.

Although it is a bit overkill for my purpose, its use is straightforward and
simple. I checked it for several *test functions and did not find any
problem.

Thanks and best regards.

Petr

> -Original Message-
> From: Rui Barradas 
> Sent: Friday, September 17, 2021 9:56 PM
> To: PIKAL Petr ; r-help 
> Subject: Re: [R] adding results to plot
> 
> Hello,
> 
> *.test functions in base R return a list of class "htest", with its own
> print method.
> The method text.htest for objects of class "htest" below is a hack. I
> adapted the formating part of the code of print.htest to plot text().
> I find it maybe too complicated but it seems to work.
> 
> Warning: Not debugged at all.
> 
> 
> 
> text.htest <- function (ht, x, y = NULL, digits = getOption("digits"),
>  prefix = "", adj = NULL, ...) {
>out <- list()
>i_out <- 1L
>out[[i_out]] <- paste(strwrap(ht$method, prefix = prefix), sep = "\n")
>i_out <- i_out + 1L
>out[[i_out]] <- paste0("data:  ", ht$data.name)
> 
>stat_line <- NULL
>i_stat_line <- 0L
>if (!is.null(ht$statistic)) {
>  i_stat_line <- i_stat_line + 1L
>  stat_line[[i_stat_line]] <- paste(names(ht$statistic), "=",
>format(ht$statistic, digits =
> max(1L, digits - 2L)))
>}
>if (!is.null(ht$parameter)) {
>  i_stat_line <- i_stat_line + 1L
>  stat_line[[i_stat_line]] <- paste(names(ht$parameter), "=",
>format(ht$parameter, digits =
> max(1L, digits - 2L)))
>}
>if (!is.null(ht$p.value)) {
>  fp <- format.pval(ht$p.value, digits = max(1L, digits - 3L))
>  i_stat_line <- i_stat_line + 1L
>  stat_line[[i_stat_line]] <- paste("p-value",
>if (startsWith(fp, "<")) fp else
> paste("=", fp))
>}
>if(!is.null(stat_line)){
>  i_out <- i_out + 1L
>  #out[[i_out]] <- strwrap(paste(stat_line, collapse = ", "))
>  out[[i_out]] <- paste(stat_line, collapse = ", ")
>}
>if (!is.null(ht$alternative)) {
>  alt <- NULL
>  i_alt <- 1L
>  alt[[i_alt]] <- "alternative hypothesis: "
>  if (!is.null(ht$null.value)) {
>if (length(ht$null.value) == 1L) {
>  alt.char <- switch(ht$alternative, two.sided = "not equal to",
> less = "less than", greater = "greater than")
>  i_alt <- i_alt + 1L
>  alt[[i_alt]] <- paste0("true ", names(ht$null.value), " is ",
> alt.char,
> " ", ht$null.value)
>}
>else {
>  i_alt <- i_alt + 1L
>  alt[[i_alt]] <- paste0(ht$alternative, "\nnull values:\n")
>}
>  }
>  else {
>i_alt <- i_alt + 1L
>alt[[i_alt]] <- ht$alternative
>  }
>  i_out <- i_out + 1L
>  out[[i_out]] <- paste(alt, collapse = " ")
>}
>if (!is.null(ht$conf.int)) {
>  i_out <- i_out + 1L
>  out[[i_out]] <- paste0(format(100 * attr(ht$conf.int, "conf.level")),
> " percent confidence interval:\n", " ",
> paste(format(ht$conf.int[1:2], digits =
> digits), collapse = " "))
>}
>if (!is.null(ht$estimate)) {
>  i_out <- i_out + 1L
>  out[[i_out]] <- paste("sample estimates:", round(ht$estimate,
> digits = digits), sep = "\n")
>}
>i_out <- i_out + 1L
>out[[i_out]] <- "\n"
>names(out)[i_out] <- "sep"
>out <- do.call(paste, out)
>if(is.null(adj)) adj <- 0L
>text(x, y, labels = out, adj = adj, ...)
>invisible(out)
> }
> 
> 
> res <- shapiro.test(rnorm(100))
> plot(1,1, ylim = c(0, length(res) + 1L))
> text(res, 0.6, length(res) - 1)
> res
> 
> res2 <- t.test(rnorm(100))
> plot(1,1, ylim = c(0, length(res2) + 1L))
> text(res2, 0.6, length(res2) - 1L)
> res2
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> 
> 
> At 15:12 on 16/09/21, PIKAL Petr wrote:
> > Dear all
> >
> > I know I have seen the answer somewhere but I am not able to find it.
> Please
> > help
> >
> 

Re: [R] adding results to plot

2021-09-17 Thread Rui Barradas

Hello,

*.test functions in base R return a list of class "htest", with its own 
print method.
The method text.htest for objects of class "htest" below is a hack. I 
adapted the formatting part of the code of print.htest to plot text().

I find it maybe too complicated but it seems to work.

Warning: Not debugged at all.



text.htest <- function(ht, x, y = NULL, digits = getOption("digits"),
                       prefix = "", adj = NULL, ...) {
  out <- list()
  i_out <- 1L
  out[[i_out]] <- paste(strwrap(ht$method, prefix = prefix), sep = "\n")
  i_out <- i_out + 1L
  out[[i_out]] <- paste0("data:  ", ht$data.name)

  stat_line <- NULL
  i_stat_line <- 0L
  if (!is.null(ht$statistic)) {
    i_stat_line <- i_stat_line + 1L
    stat_line[[i_stat_line]] <- paste(names(ht$statistic), "=",
                                      format(ht$statistic,
                                             digits = max(1L, digits - 2L)))
  }
  if (!is.null(ht$parameter)) {
    i_stat_line <- i_stat_line + 1L
    stat_line[[i_stat_line]] <- paste(names(ht$parameter), "=",
                                      format(ht$parameter,
                                             digits = max(1L, digits - 2L)))
  }
  if (!is.null(ht$p.value)) {
    fp <- format.pval(ht$p.value, digits = max(1L, digits - 3L))
    i_stat_line <- i_stat_line + 1L
    stat_line[[i_stat_line]] <- paste("p-value",
                                      if (startsWith(fp, "<")) fp
                                      else paste("=", fp))
  }
  if (!is.null(stat_line)) {
    i_out <- i_out + 1L
    #out[[i_out]] <- strwrap(paste(stat_line, collapse = ", "))
    out[[i_out]] <- paste(stat_line, collapse = ", ")
  }
  if (!is.null(ht$alternative)) {
    alt <- NULL
    i_alt <- 1L
    alt[[i_alt]] <- "alternative hypothesis: "
    if (!is.null(ht$null.value)) {
      if (length(ht$null.value) == 1L) {
        alt.char <- switch(ht$alternative, two.sided = "not equal to",
                           less = "less than", greater = "greater than")
        i_alt <- i_alt + 1L
        alt[[i_alt]] <- paste0("true ", names(ht$null.value), " is ", alt.char,
                               " ", ht$null.value)
      }
      else {
        i_alt <- i_alt + 1L
        alt[[i_alt]] <- paste0(ht$alternative, "\nnull values:\n")
      }
    }
    else {
      i_alt <- i_alt + 1L
      alt[[i_alt]] <- ht$alternative
    }
    i_out <- i_out + 1L
    out[[i_out]] <- paste(alt, collapse = " ")
  }
  if (!is.null(ht$conf.int)) {
    i_out <- i_out + 1L
    out[[i_out]] <- paste0(format(100 * attr(ht$conf.int, "conf.level")),
                           " percent confidence interval:\n", " ",
                           paste(format(ht$conf.int[1:2], digits = digits),
                                 collapse = " "))
  }
  if (!is.null(ht$estimate)) {
    i_out <- i_out + 1L
    out[[i_out]] <- paste("sample estimates:",
                          round(ht$estimate, digits = digits), sep = "\n")
  }
  i_out <- i_out + 1L
  out[[i_out]] <- "\n"
  names(out)[i_out] <- "sep"
  out <- do.call(paste, out)
  if (is.null(adj)) adj <- 0L
  text(x, y, labels = out, adj = adj, ...)
  invisible(out)
}


res <- shapiro.test(rnorm(100))
plot(1,1, ylim = c(0, length(res) + 1L))
text(res, 0.6, length(res) - 1)
res

res2 <- t.test(rnorm(100))
plot(1,1, ylim = c(0, length(res2) + 1L))
text(res2, 0.6, length(res2) - 1L)
res2


Hope this helps,

Rui Barradas



At 15:12 on 16/09/21, PIKAL Petr wrote:

Dear all

I know I have seen the answer somewhere but I am not able to find it. Please
help


plot(1,1)
res <- shapiro.test(rnorm(100))
res


 Shapiro-Wilk normality test

data:  rnorm(100)
W = 0.98861, p-value = 0.5544

I would like to add whole res object to the plot.

I can do it one by one

text(locator(1), res$method)
text(locator(1), as.character(res$p.value))

...
But it is quite inconvenient

I could find some way in ggplot world but not in plain plot world.

Best regards
Petr


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] adding results to plot

2021-09-17 Thread PIKAL Petr
Thanks Jim

This seems to be straightforward and quite simple. I considered addtable2plot 
but was not sure how to make a proper data frame from the result.

Regards
Petr

> -Original Message-
> From: Jim Lemon 
> Sent: Friday, September 17, 2021 2:31 AM
> To: PIKAL Petr ; r-help mailing list 
> Subject: Re: [R] adding results to plot
>
> Hi Petr,
> The hard part is the names for the data frame that addtable2plot requires:
>
> set.seed(753)
> res <- shapiro.test(rnorm(100))
> library(plotrix)
> plot(0,0,type="n",axes=FALSE)
> addtable2plot(0,0,data.frame(element=names(res)[1:2],
>   value=round(as.numeric(res[1:2]),3)),xjust=0.5,
>   title=res$method)
>
> There is probably a way to get blank names with data.frame(), but I gave up.
>
> Jim
>
> On Fri, Sep 17, 2021 at 12:22 AM PIKAL Petr  wrote:
> >
> > Dear all
> >
> > I know I have seen the answer somewhere but I am not able to find it.
> > Please help
> >
> > > plot(1,1)
> > > res <- shapiro.test(rnorm(100))
> > > res
> >
> > Shapiro-Wilk normality test
> >
> > data:  rnorm(100)
> > W = 0.98861, p-value = 0.5544
> >
> > I would like to add whole res object to the plot.
> >
> > I can do it one by one
> > > text(locator(1), res$method)
> > > text(locator(1), as.character(res$p.value))
> > ...
> > But it is quite inconvenient
> >
> > I could find some way in ggplot world but not in plain plot world.
> >
> > Best regards
> > Petr


Re: [R] adding results to plot

2021-09-16 Thread Jim Lemon
Hi Petr,
The hard part is the names for the data frame that addtable2plot requires:

set.seed(753)
res <- shapiro.test(rnorm(100))
library(plotrix)
plot(0,0,type="n",axes=FALSE)
addtable2plot(0,0,data.frame(element=names(res)[1:2],
  value=round(as.numeric(res[1:2]),3)),xjust=0.5,
  title=res$method)

There is probably a way to get blank names with data.frame(), but I gave up.

Jim

On Fri, Sep 17, 2021 at 12:22 AM PIKAL Petr  wrote:
>
> Dear all
>
> I know I have seen the answer somewhere but I am not able to find it. Please
> help
>
> > plot(1,1)
> > res <- shapiro.test(rnorm(100))
> > res
>
> Shapiro-Wilk normality test
>
> data:  rnorm(100)
> W = 0.98861, p-value = 0.5544
>
> I would like to add whole res object to the plot.
>
> I can do it one by one
> > text(locator(1), res$method)
> > text(locator(1), as.character(res$p.value))
> ...
> But it is quite inconvenient
>
> I could find some way in ggplot world but not in plain plot world.
>
> Best regards
> Petr

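[Editor's note] One way to build the small two-column data frame Jim describes — a sketch, not code from the thread; the column names `quantity` and `value` are illustrative — is to pull the statistic and p-value out of the "htest" list directly:

```r
# Sketch: extract the statistic and p-value from an "htest" object
# into a two-column data frame suitable for plotrix::addtable2plot().
res <- shapiro.test(rnorm(100))
tab <- data.frame(
  quantity = c(names(res$statistic), "p-value"),
  value    = round(c(unname(res$statistic), res$p.value), 4)
)
tab
```

The rounding keeps the table compact on the plot; drop it if full precision is wanted.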

Re: [R] adding results to plot

2021-09-16 Thread PIKAL Petr
Thanks, 
I will try to elaborate on it.

Best regards.
Petr

> -Original Message-
> From: R-help  On Behalf Of Kimmo Elo
> Sent: Thursday, September 16, 2021 4:45 PM
> To: r-help@r-project.org
> Subject: Re: [R] adding results to plot
> 
> Hi!
> 
> Maybe with this:
> 
> text(x=0.6, y=1.2, paste0(capture.output(res), collapse="\n"), adj=0)
> 
> HTH,
> 
> Kimmo
> 
> On Thu, 2021-09-16 at 14:12 +, PIKAL Petr wrote:
> > Dear all
> >
> > I know I have seen the answer somewhere but I am not able to find it.
> > Please
> > help
> >
> > > plot(1,1)
> > > res <- shapiro.test(rnorm(100))
> > > res
> >
> > Shapiro-Wilk normality test
> >
> > data:  rnorm(100)
> > W = 0.98861, p-value = 0.5544
> >
> > I would like to add whole res object to the plot.
> >
> > I can do it one by one
> > > text(locator(1), res$method)
> > > text(locator(1), as.character(res$p.value))
> > ...
> > But it is quite inconvenient
> >
> > I could find some way in ggplot world but not in plain plot world.
> >
> > Best regards
> > Petr
> >


Re: [R] adding results to plot

2021-09-16 Thread PIKAL Petr
Hallo

Thanks, I will try which option is better, yours or Kimmo's.

Best regards
Petr

> -Original Message-
> From: Bert Gunter 
> Sent: Thursday, September 16, 2021 5:00 PM
> To: PIKAL Petr 
> Cc: r-help 
> Subject: Re: [R] adding results to plot
> 
> I was wrong. text() will attempt to coerce to character. This may be
> informative:
> 
> > as.character(res)
> [1] "c(W = 0.992709285275917)"    "0.869917232073854"
> [3] "Shapiro-Wilk normality test" "rnorm(100)"
> 
> plot(0:1, 0:1); text(0,seq(.1,.9,.2), labels = res, pos = 4)
> 
> Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> On Thu, Sep 16, 2021 at 7:44 AM Bert Gunter 
> wrote:
> >
> > res is a list of class "htest" . You can only add text strings  to a
> > plot via text(). I don't know what ggplot does.
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Thu, Sep 16, 2021 at 7:22 AM PIKAL Petr 
> wrote:
> > >
> > > Dear all
> > >
> > > I know I have seen the answer somewhere but I am not able to find
> > > it. Please help
> > >
> > > > plot(1,1)
> > > > res <- shapiro.test(rnorm(100))
> > > > res
> > >
> > > Shapiro-Wilk normality test
> > >
> > > data:  rnorm(100)
> > > W = 0.98861, p-value = 0.5544
> > >
> > > I would like to add whole res object to the plot.
> > >
> > > I can do it one by one
> > > > text(locator(1), res$method)
> > > > text(locator(1), as.character(res$p.value))
> > > ...
> > > But it is quite inconvenient
> > >
> > > I could find some way in ggplot world but not in plain plot world.
> > >
> > > Best regards
> > > Petr


Re: [R] adding results to plot

2021-09-16 Thread Bert Gunter
I was wrong. text() will attempt to coerce to character. This may be
informative:

> as.character(res)
[1] "c(W = 0.992709285275917)"    "0.869917232073854"
[3] "Shapiro-Wilk normality test" "rnorm(100)"

plot(0:1, 0:1); text(0,seq(.1,.9,.2), labels = res, pos = 4)

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Sep 16, 2021 at 7:44 AM Bert Gunter  wrote:
>
> res is a list of class "htest" . You can only add text strings  to a
> plot via text(). I don't know what ggplot does.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, Sep 16, 2021 at 7:22 AM PIKAL Petr  wrote:
> >
> > Dear all
> >
> > I know I have seen the answer somewhere but I am not able to find it. Please
> > help
> >
> > > plot(1,1)
> > > res <- shapiro.test(rnorm(100))
> > > res
> >
> > Shapiro-Wilk normality test
> >
> > data:  rnorm(100)
> > W = 0.98861, p-value = 0.5544
> >
> > I would like to add whole res object to the plot.
> >
> > I can do it one by one
> > > text(locator(1), res$method)
> > > text(locator(1), as.character(res$p.value))
> > ...
> > But it is quite inconvenient
> >
> > I could find some way in ggplot world but not in plain plot world.
> >
> > Best regards
> > Petr


Re: [R] adding results to plot

2021-09-16 Thread Kimmo Elo
Hi!

Maybe with this:

text(x=0.6, y=1.2, paste0(capture.output(res), collapse="\n"), adj=0)

HTH,

Kimmo

On Thu, 2021-09-16 at 14:12 +, PIKAL Petr wrote:
> Dear all
> 
> I know I have seen the answer somewhere but I am not able to find it.
> Please
> help
> 
> > plot(1,1)
> > res <- shapiro.test(rnorm(100))
> > res
> 
> Shapiro-Wilk normality test
> 
> data:  rnorm(100)
> W = 0.98861, p-value = 0.5544
> 
> I would like to add whole res object to the plot.
> 
> I can do it one by one
> > text(locator(1), res$method)
> > text(locator(1), as.character(res$p.value))
> ...
> But it is quite inconvenient
> 
> I could find some way in ggplot world but not in plain plot world.
> 
> Best regards
> Petr
> 

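[Editor's note] A sketch building on Kimmo's capture.output() idea: because print.htest lays the result out line by line, a monospaced font family and a top-left adjustment keep the layout readable on the plot (the coordinates and sizes below are illustrative):

```r
# Sketch: place the full printed "htest" output on a base-graphics plot,
# preserving print.htest's multi-line layout.
res <- shapiro.test(rnorm(100))
plot(1, 1)
text(x = 0.6, y = 1.2, labels = paste(capture.output(res), collapse = "\n"),
     adj = c(0, 1), family = "mono", cex = 0.8)
```

adj = c(0, 1) anchors the block at its top-left corner, so the text grows downward from the given point instead of being centred on it.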

Re: [R] adding results to plot

2021-09-16 Thread Bert Gunter
res is a list of class "htest" . You can only add text strings  to a
plot via text(). I don't know what ggplot does.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Sep 16, 2021 at 7:22 AM PIKAL Petr  wrote:
>
> Dear all
>
> I know I have seen the answer somewhere but I am not able to find it. Please
> help
>
> > plot(1,1)
> > res <- shapiro.test(rnorm(100))
> > res
>
> Shapiro-Wilk normality test
>
> data:  rnorm(100)
> W = 0.98861, p-value = 0.5544
>
> I would like to add whole res object to the plot.
>
> I can do it one by one
> > text(locator(1), res$method)
> > text(locator(1), as.character(res$p.value))
> ...
> But it is quite inconvenient
>
> I could find some way in ggplot world but not in plain plot world.
>
> Best regards
> Petr


[R] adding results to plot

2021-09-16 Thread PIKAL Petr
Dear all

I know I have seen the answer somewhere but I am not able to find it. Please
help

> plot(1,1)
> res <- shapiro.test(rnorm(100))
> res

Shapiro-Wilk normality test

data:  rnorm(100)
W = 0.98861, p-value = 0.5544

I would like to add whole res object to the plot.

I can do it one by one
> text(locator(1), res$method)
> text(locator(1), as.character(res$p.value))
...
But it is quite inconvenient

I could find some way in ggplot world but not in plain plot world.

Best regards
Petr


Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-20 Thread bharat rawlley via R-help
Thank you for your time, Professor John! Much appreciated! 
Yours sincerely, Bharat Rawlley



On Thu, 21 Jan 2021 at 4:40 AM, John Fox wrote:

Dear Bharat Rawlley,

On 2021-01-20 1:45 p.m., bharat rawlley via R-help wrote:
>  Dear Professor John,
> Thank you very much for your reply!
> I agree with you that the non-parametric tests I mentioned in my previous 
> email (Moods median test and Median test) do not make sense in this situation 
> as they treat PFD_n and drug_code as different groups. As you correctly said, 
> I want to use PFD_n as a vector of scores and drug_code to make two groups 
> out of it. This is exactly what the Independent samples median test does in 
> SPSS. I wish to perform the same test in R and am unable to do so.
> Simply put, I am asking how to perform the Independent samples median test in 
> R just like it is performed in SPSS?

I'm afraid that I'm the wrong person to ask, since I haven't used SPSS 
in perhaps 30 years and have no idea what it does to test for 
differences in medians. A Google search for "independent samples median 
test in R" turns up a number of hits.

> 
> Secondly, for the question you are asking about the test statistic, I have 
> not performed the Wilcoxon Rank sum test in SPSS for the PFD_n and drug_code 
> data. I have said something to the contrary in my first email, I apologize 
> for that.

For continuous data, the Wilcoxon test is, I believe, a reasonable 
choice, but not when there are so many ties. If SPSS doesn't perform a 
Wilcoxon test for a difference in medians, then there's of course no 
reason to expect that the p-values would be the same.

Best,
  John

> Thank you very much for your time!
> Yours sincerely, Bharat Rawlley
>
> On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox wrote:
>  
>  Dear Bharat Rawlley,
> 
> What you tried to do appears to be nonsense. That is, you're treating
> PFD_n and drug_code as if they were scores for two different groups.
> 
> I assume that what you really want to do is to treat PFD_n as a vector
> of scores and drug_code as defining two groups. If that's correct, and
> with your data into Data, you can try the following:
> 
> --snip --
> 
>  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)
> 
>      Wilcoxon rank sum test with continuity correction
> 
> data:  PFD_n by drug_code
> W = 197, p-value = 0.05563
> alternative hypothesis: true location shift is not equal to 0
> 95 percent confidence interval:
>    -2.14e+00  5.037654e-05
> sample estimates:
> difference in location
>                -1.19
> 
> Warning messages:
> 1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
>    cannot compute exact p-value with ties
> 2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
>    cannot compute exact confidence intervals with ties
> 
> --snip --
> 
> You can get an approximate confidence interval by specifying exact=FALSE:
> 
> --snip --
> 
>  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)
> 
>      Wilcoxon rank sum test with continuity correction
> 
> data:  PFD_n by drug_code
> W = 197, p-value = 0.05563
> alternative hypothesis: true location shift is not equal to 0
> 95 percent confidence interval:
>    -2.14e+00  5.037654e-05
> sample estimates:
> difference in location
>                -1.19
> 
> --snip --
> 
> As it turns out, your data are highly discrete and have a lot of ties
> (see in particular PFD_n = 28):
> 
> --snip --
> 
>  > xtabs(~ PFD_n + drug_code, data=Data)
> 
>        drug_code
> PFD_n  0  1
>      0  2  0
>      16  1  1
>      18  0  1
>      19  0  1
>      20  2  0
>      22  0  1
>      24  2  0
>      25  1  2
>      26  5  2
>      27  4  2
>      28  5 13
>      30  1  2
> 
> --snip --
> 
> I'm no expert in nonparametric inference, but I doubt whether the
> approximate p-value will be very accurate for data like these.
> 
> I don't know why wilcox.test() (correctly used) and SPSS are giving you
> slightly different results -- assuming that you're actually doing the
> same thing in both cases. I couldn't help but notice that most of your
> data are missing. Are you getting the same value of the test statistic
> and different p-values, or is the test statistic different as well?
> 
> I hope this helps,
>    John
> 
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
> 
> On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:
>>    Thank you for the reply and suggestion, Michael!
>> I used dput() and this is the output I can share with you. Simply explained, 
>> I have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
>> values (including NA). The problem with the Wilcoxon Rank Sum test has been 
>> described in my first email.
>> Please do let me know if you need any further clarification from my 

Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-20 Thread John Fox

Dear Bharat Rawlley,

On 2021-01-20 1:45 p.m., bharat rawlley via R-help wrote:

  Dear Professor John,
Thank you very much for your reply!
I agree with you that the non-parametric tests I mentioned in my previous email 
(Moods median test and Median test) do not make sense in this situation as they 
treat PFD_n and drug_code as different groups. As you correctly said, I want to 
use PFD_n as a vector of scores and drug_code to make two groups out of it. 
This is exactly what the Independent samples median test does in SPSS. I wish 
to perform the same test in R and am unable to do so.
Simply put, I am asking how to perform the Independent samples median test in R 
just like it is performed in SPSS?


I'm afraid that I'm the wrong person to ask, since I haven't used SPSS 
in perhaps 30 years and have no idea what it does to test for 
differences in medians. A Google search for "independent samples median 
test in R" turns up a number of hits.




Secondly, for the question you are asking about the test statistic, I have not 
performed the Wilcoxon Rank sum test in SPSS for the PFD_n and drug_code data. 
I have said something to the contrary in my first email, I apologize for that.


For continuous data, the Wilcoxon test is, I believe, a reasonable 
choice, but not when there are so many ties. If SPSS doesn't perform a 
Wilcoxon test for a difference in medians, then there's of course no 
reason to expect that the p-values would be the same.


Best,
 John


Thank you very much for your time!
Yours sincerely, Bharat Rawlley

On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox wrote:
  
  Dear Bharat Rawlley,


What you tried to do appears to be nonsense. That is, you're treating
PFD_n and drug_code as if they were scores for two different groups.

I assume that what you really want to do is to treat PFD_n as a vector
of scores and drug_code as defining two groups. If that's correct, and
with your data into Data, you can try the following:

--snip --

  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

     Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
   -2.14e+00  5.037654e-05
sample estimates:
difference in location
               -1.19

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact confidence intervals with ties

--snip --

You can get an approximate confidence interval by specifying exact=FALSE:

--snip --

  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

     Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
   -2.14e+00  5.037654e-05
sample estimates:
difference in location
               -1.19

--snip --

As it turns out, your data are highly discrete and have a lot of ties
(see in particular PFD_n = 28):

--snip --

  > xtabs(~ PFD_n + drug_code, data=Data)

       drug_code
PFD_n  0  1
     0  2  0
     16  1  1
     18  0  1
     19  0  1
     20  2  0
     22  0  1
     24  2  0
     25  1  2
     26  5  2
     27  4  2
     28  5 13
     30  1  2

--snip --

I'm no expert in nonparametric inference, but I doubt whether the
approximate p-value will be very accurate for data like these.

I don't know why wilcox.test() (correctly used) and SPSS are giving you
slightly different results -- assuming that you're actually doing the
same thing in both cases. I couldn't help but notice that most of your
data are missing. Are you getting the same value of the test statistic
and different p-values, or is the test statistic different as well?

I hope this helps,
   John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:

   Thank you for the reply and suggestion, Michael!
I used dput() and this is the output I can share with you. Simply explained, I 
have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
values (including NA). The problem with the Wilcoxon Rank Sum test has been 
described in my first email.
Please do let me know if you need any further clarification from my side! 
Thanks a lot for your time!
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 
1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 

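[Editor's note] For readers looking for SPSS's independent-samples median test in base R, a common construction — a sketch, not code from the thread; `scores` and `groups` are illustrative stand-ins for PFD_n and drug_code — dichotomizes all scores at the grand median and tests the resulting 2 x 2 table:

```r
# Sketch of an independent-samples (Mood's) median test in base R.
set.seed(1)
scores <- c(rnorm(20, mean = 0), rnorm(20, mean = 0.5))
groups <- rep(c(0, 1), each = 20)

med   <- median(scores)          # grand median over both groups combined
above <- scores > med            # dichotomize each score at the grand median
tab   <- table(groups, above)    # 2 x 2 contingency table
fisher.test(tab)                 # exact test; chisq.test(tab) is the classic form
```

With heavily tied data like PFD_n, Fisher's exact test on the table avoids the chi-squared approximation, though ties sitting exactly at the median still make the split somewhat arbitrary.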
Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-20 Thread bharat rawlley via R-help
 Dear Professor John, 
Thank you very much for your reply! 
I agree with you that the non-parametric tests I mentioned in my previous email 
(Moods median test and Median test) do not make sense in this situation as they 
treat PFD_n and drug_code as different groups. As you correctly said, I want to 
use PFD_n as a vector of scores and drug_code to make two groups out of it. 
This is exactly what the Independent samples median test does in SPSS. I wish 
to perform the same test in R and am unable to do so.
Simply put, I am asking how to perform the Independent samples median test in R 
just like it is performed in SPSS? 

Secondly, for the question you are asking about the test statistic, I have not 
performed the Wilcoxon Rank sum test in SPSS for the PFD_n and drug_code data. 
I have said something to the contrary in my first email, I apologize for that. 
Thank you very much for your time! 
Yours sincerely, Bharat Rawlley

On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox wrote:
 
 Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating 
PFD_n and drug_code as if they were scores for two different groups.

I assume that what you really want to do is to treat PFD_n as a vector 
of scores and drug_code as defining two groups. If that's correct, and 
with your data into Data, you can try the following:

--snip --

 > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

    Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
  -2.14e+00  5.037654e-05
sample estimates:
difference in location
              -1.19

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact confidence intervals with ties

--snip --

You can get an approximate confidence interval by specifying exact=FALSE:

--snip --

 > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

    Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
  -2.14e+00  5.037654e-05
sample estimates:
difference in location
              -1.19

--snip --

As it turns out, your data are highly discrete and have a lot of ties 
(see in particular PFD_n = 28):

--snip --

 > xtabs(~ PFD_n + drug_code, data=Data)

      drug_code
PFD_n  0  1
    0  2  0
    16  1  1
    18  0  1
    19  0  1
    20  2  0
    22  0  1
    24  2  0
    25  1  2
    26  5  2
    27  4  2
    28  5 13
    30  1  2

--snip --

I'm no expert in nonparametric inference, but I doubt whether the 
approximate p-value will be very accurate for data like these.
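[An editorial aside, not part of Fox's reply: one way around the normal approximation with this many ties is an exact permutation version of the rank-sum test, e.g. `wilcox_test()` in the `coin` package, which computes the exact conditional distribution even when ties are present. A sketch on toy data standing in for PFD_n/drug_code; `coin` must be installed, and the grouping variable must be a factor:]

```r
# Exact (permutation) Wilcoxon rank-sum test that tolerates ties,
# via the 'coin' package (install.packages("coin") if needed).
library(coin)

d <- data.frame(
  PFD_n     = c(28, 28, 27, 26, 28, 20, 25, 28, 28, 27),  # toy scores
  drug_code = factor(rep(0:1, each = 5))                  # two groups
)
wilcox_test(PFD_n ~ drug_code, data = d, distribution = "exact")
```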

I don't know why wilcox.test() (correctly used) and SPSS are giving you 
slightly different results -- assuming that you're actually doing the 
same thing in both cases. I couldn't help but notice that most of your 
data are missing. Are you getting the same value of the test statistic 
and different p-values, or is the test statistic different as well?

I hope this helps,
  John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:
>  Thank you for the reply and suggestion, Michael!
> I used dput() and this is the output I can share with you. Simply explained, 
> I have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
> values (including NA). The problem with the Wilcoxon Rank Sum test has been 
> described in my first email.
> Please do let me know if you need any further clarification from my side! 
> Thanks a lot for your time!
> structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 
> 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 
> 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 
> 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 
> 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 
> 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n = c(1, NA, NA, 0, NA, 4, NA, 
> 10, NA, 0, 6, NA, NA, NA, NA, NA, 10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, 
> NA, NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 
> 2, 2, NA, 28, 0, NA, 4, NA, 1, NA, NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 
> 4, 28, NA, NA, 0, 2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, 3, NA, NA, NA, NA, NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 
> 0, NA, 2, 8, 3, NA, NA, NA, 0, NA, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, 
> NA), PFD_n = c(27, NA, NA, 28, NA, 

Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-19 Thread John Fox

Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating 
PFD_n and drug_code as if they were scores for two different groups.


I assume that what you really want to do is to treat PFD_n as a vector 
of scores and drug_code as defining two groups. If that's correct, and 
with your data read into Data, you can try the following:


--snip --

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.14e+00  5.037654e-05
sample estimates:
difference in location
 -1.19

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact confidence intervals with ties

--snip --

You can get an approximate confidence interval by specifying exact=FALSE:

--snip --

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.14e+00  5.037654e-05
sample estimates:
difference in location
 -1.19

--snip --

As it turns out, your data are highly discrete and have a lot of ties 
(see in particular PFD_n = 28):


--snip --

> xtabs(~ PFD_n + drug_code, data=Data)

 drug_code
PFD_n  0  1
   0   2  0
   16  1  1
   18  0  1
   19  0  1
   20  2  0
   22  0  1
   24  2  0
   25  1  2
   26  5  2
   27  4  2
   28  5 13
   30  1  2

--snip --

I'm no expert in nonparametric inference, but I doubt whether the 
approximate p-value will be very accurate for data like these.


I don't know why wilcox.test() (correctly used) and SPSS are giving you 
slightly different results -- assuming that you're actually doing the 
same thing in both cases. I couldn't help but notice that most of your 
data are missing. Are you getting the same value of the test statistic 
and different p-values, or is the test statistic different as well?


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:

  Thank you for the reply and suggestion, Michael!
I used dput() and this is the output I can share with you. Simply explained, I 
have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
values (including NA). The problem with the Wilcoxon Rank Sum test has been 
described in my first email.
Please do let me know if you need any further clarification from my side! 
Thanks a lot for your time!
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 
1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n 
= c(1, NA, NA, 0, NA, 4, NA, 10, NA, 0, 6, NA, NA, NA, NA, NA, 10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, 
NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 0, NA, 4, NA, 1, NA, 
NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4, 28, NA, NA, 0, 2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, 
NA, NA, 0, NA, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA, NA, 28, NA, 26, NA, 20, NA, 30, 
24, NA, NA, NA, NA, NA, 18, NA, 28, NA, NA, NA, NA, 28, NA, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, 
NA, NA, NA, 28, 28, 16, 28, NA, 27, 26, 27, 26, 26, NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25, 
27, NA, NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 25, NA, NA, NA, NA, NA, NA, 22, 27, NA, NA, NA, 28, NA, NA, NA, 28, 28, NA, 28, NA, 26, 20, 25, NA, NA, 
NA, 30, NA, NA, NA, 19, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -132L), class = 
c("tbl_df", "tbl", "data.frame"))

Yours sincerely,
Bharat Rawlley

On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey wrote:
  
Unfortunately your data did not come through. Try using dput() and then
pasting that into the body of your e-mail message.

On 18/01/2021 17:26, bharat rawlley via R-help wrote:

Hello,
On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following 
discrepancies which I am unable to explain.
Q1 In the attached data set, I was trying to compare 

Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-19 Thread bharat rawlley via R-help
 Thank you for the reply and suggestion, Michael! 
I used dput() and this is the output I can share with you. Simply explained, I 
have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
values (including NA). The problem with the Wilcoxon Rank Sum test has been 
described in my first email. 
Please do let me know if you need any further clarification from my side! 
Thanks a lot for your time!  
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 
0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 
1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 
0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 
1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 
1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n = c(1, NA, NA, 0, NA, 4, NA, 10, NA, 
0, 6, NA, NA, NA, NA, NA, 10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, NA, 0, 
NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 
0, NA, 4, NA, 1, NA, NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4, 28, NA, NA, 0, 
2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, 
NA, NA, NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, 
NA, NA, 0, NA, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA, 
NA, 28, NA, 26, NA, 20, NA, 30, 24, NA, NA, NA, NA, NA, 18, NA, 28, NA, NA, NA, 
NA, 28, NA, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, NA, NA, NA, 28, 28, 
16, 28, NA, 27, 26, 27, 26, 26, NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 
28, 25, 27, NA, NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA, 
28, NA, NA, NA, NA, NA, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, 22, 27, NA, 
NA, NA, 28, NA, NA, NA, 28, 28, NA, 28, NA, 26, 20, 25, NA, NA, NA, 30, NA, NA, 
NA, 19, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -132L), class = 
c("tbl_df", "tbl", "data.frame"))

Yours sincerely,
Bharat Rawlley

On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey wrote:
 
 Unfortunately your data did not come through. Try using dput() and then 
pasting that into the body of your e-mail message.

On 18/01/2021 17:26, bharat rawlley via R-help wrote:
> Hello,
> On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the 
> following discrepancies which I am unable to explain.
> Q1 In the attached data set, I was trying to compare freq4w_n in those with 
> drug_code 0 vs 1. SPSS gives a P value 0.031 vs R gives a P value 0.001779.
> The code I used in R is as follows -
> wilcox.test(freq4w_n, drug_code, conf.int = T)
> 
> 
> Q2 Similarly, in the same data set, when trying to compare PFD_n in those 
> with drug_code 0 vs 1, SPSS gives a P value 0.038 vs R gives a P value < 
> 2.2e-16.
> The code I used in R is as follows -
> wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = 
> TRUE, paired = FALSE, conf.int = TRUE)
> 
> 
> I have tried searching on Google and watching some Youtube tutorials, I 
> cannot find an answer. Any help will be really appreciated. Thank you!
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Michael
http://www.dewey.myzen.co.uk/home.html
  



Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-19 Thread Michael Dewey
Unfortunately your data did not come through. Try using dput() and then 
pasting that into the body of your e-mail message.


On 18/01/2021 17:26, bharat rawlley via R-help wrote:

Hello,
On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following 
discrepancies which I am unable to explain.
Q1 In the attached data set, I was trying to compare freq4w_n in those with 
drug_code 0 vs 1. SPSS gives a P value 0.031 vs R gives a P value 0.001779.
The code I used in R is as follows -
wilcox.test(freq4w_n, drug_code, conf.int = T)


Q2 Similarly, in the same data set, when trying to compare PFD_n in those with 
drug_code 0 vs 1, SPSS gives a P value 0.038 vs R gives a P value < 2.2e-16.
The code I used in R is as follows -
wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = 
TRUE, paired = FALSE, conf.int = TRUE)


I have tried searching on Google and watching some Youtube tutorials, I cannot 
find an answer. Any help will be really appreciated. Thank you!



--
Michael
http://www.dewey.myzen.co.uk/home.html



[R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-18 Thread bharat rawlley via R-help
Hello, 
On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following 
discrepancies which I am unable to explain.
Q1 In the attached data set, I was trying to compare freq4w_n in those with 
drug_code 0 vs 1. SPSS gives a P value 0.031 vs R gives a P value 0.001779. 
The code I used in R is as follows - 
wilcox.test(freq4w_n, drug_code, conf.int = T)


Q2 Similarly, in the same data set, when trying to compare PFD_n in those with 
drug_code 0 vs 1, SPSS gives a P value 0.038 vs R gives a P value < 2.2e-16. 
The code I used in R is as follows - 
wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = 
TRUE, paired = FALSE, conf.int = TRUE)


I have tried searching on Google and watching some Youtube tutorials, I cannot 
find an answer. Any help will be really appreciated. Thank you!


Re: [R] Save Results in svg format

2020-12-03 Thread David Winsemius



On 12/3/20 7:12 PM, Anas Jamshed wrote:

#Loading the required libraries
library(ape)
library(phangorn)
library(seqinr)
#Importing the required file
align_5 <- read.alignment("C:/Users/VAMSI/align 5.fasta", format = "fasta")
align_119 <- read.alignment("C:/Users/VAMSI/align 119.fasta", format = "fasta")
# Computing the distance matrix for both UPGMA and NJ algorithms implementation
matrix_5x5 <- dist.alignment(align_5, matrix = "identity")
summary(matrix_5x5)

matrix_119x119 <- dist.alignment(align_119, matrix = "identity")
summary(matrix_119x119)
#Implementation of UPGMA algorithm for a small matrix (5x5) and entire matrix (119x119)
UPGMA_5x5 <- upgma(matrix_5x5)
UPGMA_119x119 <- upgma(matrix_119x119)
summary(UPGMA_5x5)

summary(UPGMA_119x119)
#Implementation of NJ algorithm for a small matrix (5x5) and entire matrix (119x119)
NJ_5x5 <- NJ(matrix_5x5)
NJ_119x119 <- NJ(matrix_119x119)
summary(NJ_5x5)

summary(NJ_119x119)


I have done this whole analysis but don't know how I can save my
tree file in SVG or some other image format.



SVG format is for graphics. I don't see any R graphics calls or anything 
I recognize as a "tree". (Perhaps the summary function for objects 
returned from `upgma` includes graphics? I surely do not know.)


Cairo graphics is supported in the grDevices package. It should be 
loaded by default. Have your tried this at your console:



?svg


--

David.
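[Following David's pointer to ?svg, here is a minimal sketch of the grDevices svg() device. The filename and the plotted object are placeholders: with ape loaded, calling plot() on the upgma() result from the thread would dispatch to plot.phylo().]

```r
# Minimal use of the grDevices svg() device (requires cairo support;
# check capabilities("cairo")).  Replace the placeholder plot with
# e.g. plot(UPGMA_5x5) once the tree objects exist.
svg("tree.svg", width = 7, height = 7)   # open the SVG device
plot(1:10, main = "placeholder for plot(UPGMA_5x5)")
dev.off()                                # write and close tree.svg
```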








Re: [R] Save Results in svg format

2020-12-03 Thread Bert Gunter
Warning: I have basically no idea what you are doing. I presume that you
have consulted ?svg, however. If not, you should probably do so.

Also, a search on "save as svg" on rseek.org  brought up the svglite
package, among other resources. You might want to see what that offers.

Cheers,
Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Dec 3, 2020 at 7:12 PM Anas Jamshed 
wrote:

> #Loading the required libraries
> library(ape)
> library(phangorn)
> library(seqinr)
> #Importing the required file
> align_5 <- read.alignment("C:/Users/VAMSI/align 5.fasta", format = "fasta")
> align_119 <- read.alignment("C:/Users/VAMSI/align 119.fasta", format =
> "fasta")
> # Computing the distance matrix for both UPGMA and NJ algorithms implementation
> matrix_5x5 <- dist.alignment(align_5, matrix = "identity")
> summary(matrix_5x5)
>
> matrix_119x119 <- dist.alignment(align_119, matrix = "identity")
> summary(matrix_119x119)
> #Implementation of UPGMA algorithm for a small matrix (5x5) and entire matrix (119x119)
> UPGMA_5x5 <- upgma(matrix_5x5)
> UPGMA_119x119 <- upgma(matrix_119x119)
> summary(UPGMA_5x5)
>
> summary(UPGMA_119x119)
> #Implementation of NJ algorithm for a small matrix (5x5) and entire matrix (119x119)
> NJ_5x5 <- NJ(matrix_5x5)
> NJ_119x119 <- NJ(matrix_119x119)
> summary(NJ_5x5)
>
> summary(NJ_119x119)
>
>
> I have done this whole analysis but don't know how I can save my
> tree file in SVG or some other image format.
>

[[alternative HTML version deleted]]



[R] Save Results in svg format

2020-12-03 Thread Anas Jamshed
#Loading the required libraries
library(ape)
library(phangorn)
library(seqinr)
#Importing the required file
align_5 <- read.alignment("C:/Users/VAMSI/align 5.fasta", format = "fasta")
align_119 <- read.alignment("C:/Users/VAMSI/align 119.fasta", format = "fasta")
# Computing the distance matrix for both UPGMA and NJ algorithms implementation
matrix_5x5 <- dist.alignment(align_5, matrix = "identity")
summary(matrix_5x5)

matrix_119x119 <- dist.alignment(align_119, matrix = "identity")
summary(matrix_119x119)
#Implementation of UPGMA algorithm for a small matrix (5x5) and entire matrix (119x119)
UPGMA_5x5 <- upgma(matrix_5x5)
UPGMA_119x119 <- upgma(matrix_119x119)
summary(UPGMA_5x5)

summary(UPGMA_119x119)
#Implementation of NJ algorithm for a small matrix (5x5) and entire matrix (119x119)
NJ_5x5 <- NJ(matrix_5x5)
NJ_119x119 <- NJ(matrix_119x119)
summary(NJ_5x5)

summary(NJ_119x119)


I have done this whole analysis but don't know how I can save my
tree file in SVG or some other image format.




Re: [R] analyzing results from Tuesday's US elections

2020-11-18 Thread Marc Roos
 
Maybe this could be interesting to check against the anomalies that have been found?

"A second memory card with uncounted votes was found during an audit in 
Fayette County, Georgia, containing 2,755 votes"
https://www.zerohedge.com/political/second-memory-card-2755-votes-found-during-georgia-election-audit-decreasing-biden-lead



Re: [R] analyzing results from Tuesday's US elections

2020-11-16 Thread Matthew McCormack
By the way, I thought I had checked my e-mail before sending it, but my 
last e-mail had an unfortunate typo with an 'I' that originally belonged 
to the beginning of a deleted sentence.


Matthew

On 11/17/20 1:54 AM, Matthew McCormack wrote:

External Email - Use Caution
 No reason to apologize. It's a timely and very interesting topic 
that provides a glimpse into the application of statistics in 
forensics. I had never heard of Benford's Law before and I think it is 
really fascinating. One of those very counter intuitive rules that 
show up in statistics and probability; like the Monty Hall problem. 
Why in the world does Benford's Law work ?  I have been wondering if 
it could in any way be applied to biological data analysis. (Also, I 
discovered Stand-up-maths !).


   Often things are not as easy to figure out as we may first 
estimate. I think you would have to start with how you would envision 
a fraud to be committed and then figure out if there is a statistical 
analysis that could detect it, or develop an analysis. For example, 
if a voting machine were weighting votes and giving 8/10ths of a vote 
to 'yes' and 10/10ths vote to a 'no'. Is there some statistical 
analysis that could detect this ?  I, Or if someone dumped a couple of 
thousand fraudulent ballots in a vote counting center, is there some 
statistical analysis that could detect this ?  Who knows, maybe a 
whole new field waiting to be explored. A once-in-a-while dive into a 
practical application of statistics that has current interest can be 
fun and enlightening for those interested.


Matthew
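[An editorial illustration: Benford's law, which Matthew finds counter-intuitive, predicts that leading digit d occurs with probability log10(1 + 1/d). The sketch below uses simulated log-normal data (not election data) spanning several orders of magnitude, which is roughly the condition under which the law holds; `first_digit` is a helper name introduced here.]

```r
# Benford's law: expected relative frequency of leading digit d
# is log10(1 + 1/d).  Compare simulated "natural" data against
# that expectation.
benford_expected <- log10(1 + 1 / (1:9))

first_digit <- function(x) {
  # leading digit of each positive number, via scientific notation
  as.integer(substr(formatC(abs(x), format = "e"), 1, 1))
}

set.seed(42)
x <- rlnorm(10000, meanlog = 0, sdlog = 3)   # spans many magnitudes
observed <- as.numeric(table(factor(first_digit(x), levels = 1:9))) /
  length(x)
round(rbind(benford = benford_expected, observed = observed), 3)
```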

On 11/16/20 9:01 PM, Abby Spurdle wrote:

 External Email - Use Caution

I've come to the conclusion this whole thing was a waste of time.
This is after evaluating much of the relevant information.

The main problem is a large number of red herrings (some in the data,
some in the context), leading to pointless data analysis and pointless
data collection.
It's unlikely that sophisticated software, or sophisticated
statistical modelling tools will make any difference.
Although pretty plots, and pretty web-graphics are achievable.

Sorry list, for encouraging this discussion...






Re: [R] analyzing results from Tuesday's US elections

2020-11-16 Thread Matthew McCormack
 No reason to apologize. It's a timely and very interesting topic 
that provides a glimpse into the application of statistics in forensics. 
I had never heard of Benford's Law before and I think it is really 
fascinating. One of those very counter intuitive rules that show up in 
statistics and probability; like the Monty Hall problem. Why in the 
world does Benford's Law work ?  I have been wondering if it could in 
any way be applied to biological data analysis. (Also, I discovered 
Stand-up-maths !).


   Often things are not as easy to figure out as we may first estimate. 
I think you would have to start with how you would envision a fraud to 
be committed and then figure out if there is a statistical analysis that 
could detect it, or develop an analysis. For example, if a voting 
machine were weighting votes and giving 8/10ths of a vote to 'yes' and 
10/10ths vote to a 'no'. Is there some statistical analysis that could 
detect this ?  I, Or if someone dumped a couple of thousand fraudulent 
ballots in a vote counting center, is there some statistical analysis 
that could detect this ?  Who knows, maybe a whole new field waiting to 
be explored. A once-in-a-while dive into a practical application of 
statistics that has current interest can be fun and enlightening for 
those interested.


Matthew

On 11/16/20 9:01 PM, Abby Spurdle wrote:

 External Email - Use Caution

I've come to the conclusion this whole thing was a waste of time.
This is after evaluating much of the relevant information.

The main problem is a large number of red herrings (some in the data,
some in the context), leading to pointless data analysis and pointless
data collection.
It's unlikely that sophisticated software, or sophisticated
statistical modelling tools will make any difference.
Although pretty plots, and pretty web-graphics are achievable.

Sorry list, for encouraging this discussion...




Re: [R] analyzing results from Tuesday's US elections

2020-11-16 Thread Abby Spurdle
I've come to the conclusion this whole thing was a waste of time.
This is after evaluating much of the relevant information.

The main problem is a large number of red herrings (some in the data,
some in the context), leading to pointless data analysis and pointless
data collection.
It's unlikely that sophisticated software, or sophisticated
statistical modelling tools will make any difference.
Although pretty plots, and pretty web-graphics are achievable.

Sorry list, for encouraging this discussion...



Re: [R] analyzing results from Tuesday's US elections

2020-11-15 Thread Matthew McCormack
  I really like this guy's video as well. (He also has another nice 
video critiquing a statistical analysis of vote results from Kent 
county, Michigan that was presented by a Massachusetts Senate candidate, 
who has some impressive academic credentials. )


  And continuing in this same vein of the complexities of statistical 
analysis by intelligent people here is a video by Mark Nigrini using 
Benfords analysis on Maricopa County vote results.


https://www.youtube.com/watch?v=FrJui5d7BrI

    If you search for Mark Nigrini on Amazon you will see that he has 
written a major text on Forensic Analysis, specifically forensic 
accounting investigations, that is now in its second edition as well as 
an additional two books on analysis with Benford's Law for accounting, 
auditing, and fraud detection (He plugs the text in the last part of the 
video). All four books have 4-5 star reviews with 2-48 reviewers. From 
the tiny amount of reading I have done on Benford's Law, it seems that 
Nigrini is a leading figure in the use of Benford's Law. In the video 
he shows that voting results for both Trump and Biden from Maricopa 
county AZ agree with Benford's Law. However, he uses the last digit 
and not the first. A word of caution before you click on that link: he 
uses Excel !


Matthew

On 11/13/20 9:59 PM, Rolf Turner wrote:

 External Email - Use Caution

On Thu, 12 Nov 2020 01:23:06 +0100
Martin Møller Skarbiniks Pedersen  wrote:


Please watch this video if you wrongly believe that Benford's law
easily can be applied to elections results.

https://youtu.be/etx0k1nLn78

Just watched this video and found it to be delightfully enlightening
and entertaining.  (Thank you Martin for posting the link.)

However a question springs to mind:  why is it the case that Trump's
vote counts in Chicago *do* seem to follow Benford's law (at least
roughly) when, as is apparently to be expected, Biden's don't?

Has anyone any explanation for this?  Any ideas?

cheers,

Rolf Turner





Re: [R] analyzing results from Tuesday's US elections

2020-11-15 Thread Abby Spurdle
I've updated the dataset.
(Which now includes turnout and population estimates).

Also, I've found some anomalous features in the data.
(Namely, more "straight lines" than what I would intuitively expect).

The dataset/description are on my website.
(Links at bottom).


#set PATH as required

data <- read.csv (PATH, header=TRUE)
head (data, 3)

I took a subset, where the Dem/Rep margins have reversed between the
2016 and 2020 elections.

rev.results <- (sign (data$RMARGIN_2016) + sign (data$RMARGIN_2020) == 0)
data2 <- data [data$SUBSV1 != 1 & rev.results,]
sc <- paste (data2$STATE, data2$EQCOUNTY, sep=": ")
head (data2, 3)

Then created two plots, attached.
(1) Republican margin vs voter turnout.
(2) Republican margin vs log (number of votes).

In both cases, there are near-straight lines.
Re-iterating, more than what I would intuitively expect.

library (probhat)

plot1 <- function ()
{   x <- with (data2, cbind (x1=RMARGIN_2020, x2=TURNOUT_2020) )
plot (pdfmv.cks (x, smoothness = c (1, 1) ), contours=FALSE,
hcv=TRUE, n=80,
xlim = c (-2.5, 10), ylim = c (40, 52.5),
main="US Counties\n(with reversed results, over 2016/2020
elections)",
xlab="Republican Margin, 2020", ylab="Voter Turnout, 2020")
points (x, pch=16, col="#00")
abline (v=0, h=50, lty=2)

I1 <- (sc == "Colorado: Alamosa" | sc == "Georgia: Burke" | sc
== "Ohio: Lorain")
I2 <- (sc == "South Carolina: Clarendon" | sc == "Ohio: Mahoning")
sc [! (I1 | I2)] <- ""

k <- lm (TURNOUT_2020 ~ RMARGIN_2020, data = data2 [I1,])$coef
abline (a = k [1], b = k [2])

points (x [I1 | I2,], col="white")
text (x [,1] + 0.2, x [,2], sc, adj = c (0, 0.5) )
}

plot2 <- function ()
{   x <- with (data2, cbind (x1=RMARGIN_2020, x2 = log (NVOTES_2020) ) )
plot (pdfmv.cks (x, smoothness = c (1, 1) ), contours=FALSE,
hcv=TRUE, n=80,
xlim = c (-2.5, 35),
main="US Counties\n(with reversed results, over 2016/2020
elections)",
xlab="Republican Margin, 2020", ylab="log (Number of Votes), 2020")
points (x, pch=16, col="#00")
abline (v=0, lty=2)

sc <- paste (data2$STATE, data2$EQCOUNTY, sep=": ")
I1 <- (sc == "Texas: Kenedy")
I2 <- (sc == "Texas: Reeves" | sc == "New York: Rockland")

k <- lm (log (NVOTES_2020) ~ RMARGIN_2020, data = data2 [I1 | I2,])$coef
abline (a = k [1], b = k [2])

points (x [I1 | I2,], col="white")
text (x [I1, 1] - 0.5, x [I1, 2], sc [I1], adj = c (1, 0.5) )
text (x [I2, 1] + 0.5, x [I2, 2], sc [I2], adj = c (0, 0.5) )
}

plot1 ()
plot2 ()

https://sites.google.com/site/spurdlea/us_election_2020
https://sites.google.com/site/spurdlea/exts/election_results_2.txt


On Sun, Nov 15, 2020 at 8:51 AM Rolf Turner  wrote:
>
>
> On Fri, 13 Nov 2020 19:02:19 -0800
> Jeff Newmiller  wrote:
>
> > It was explained in the video... his counts were so small that they
> > spanned the 1-9 and 10-99 ranges.
>
> Sorry, missed that.  I'll have to watch the video again.
>
> Thanks.
>
> cheers,
>
> Rolf
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] analyzing results from Tuesday's US elections

2020-11-14 Thread Rolf Turner


On Fri, 13 Nov 2020 19:02:19 -0800
Jeff Newmiller  wrote:

> It was explained in the video... his counts were so small that they
> spanned the 1-9 and 10-99 ranges.

Sorry, missed that.  I'll have to watch the video again.

Thanks.

cheers,

Rolf

> 
> On November 13, 2020 6:59:49 PM PST, Rolf Turner
>  wrote:
> >
> >On Thu, 12 Nov 2020 01:23:06 +0100
> >Martin Møller Skarbiniks Pedersen  wrote:
> >
> >> Please watch this video if you wrongly believe that Benford's law
> >> easily can be applied to elections results.
> >> 
> >> https://youtu.be/etx0k1nLn78
> >
> >Just watched this video and found it to be delightfully enlightening
> >and entertaining.  (Thank you Martin for posting the link.)
> >
> >However a question springs to mind:  why is it the case that Trump's
> >vote counts in Chicago *do* seem to follow Benford's law (at least
> >roughly) when, as is apparently to be expected, Biden's don't?
> >
> >Has anyone any explanation for this?  Any ideas?



Re: [R] analyzing results from Tuesday's US elections

2020-11-13 Thread Jeff Newmiller
It was explained in the video... his counts were so small that they spanned the 
1-9 and 10-99 ranges.
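(For readers who want to see this effect numerically, here is a small base-R sketch, not from the original thread: counts confined to 1-99 span only two leading-digit ranges, so their leading digits deviate sharply from Benford's expected proportions.)

```r
# Benford's expected proportion for leading digit d is log10(1 + 1/d)
benford <- log10(1 + 1/(1:9))

# Hypothetical small counts, uniform over 1-99 (only the 1-9 and 10-99 ranges)
set.seed(1)
counts <- sample(1:99, 500, replace = TRUE)

# Leading digit of each count, then observed relative frequencies
lead <- as.integer(substr(as.character(counts), 1, 1))
observed <- tabulate(lead, nbins = 9) / length(counts)

# Compare: observed frequencies are far flatter than Benford predicts
round(rbind(benford, observed), 3)
```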

On November 13, 2020 6:59:49 PM PST, Rolf Turner  
wrote:
>
>On Thu, 12 Nov 2020 01:23:06 +0100
>Martin Møller Skarbiniks Pedersen  wrote:
>
>> Please watch this video if you wrongly believe that Benford's law
>> easily can be applied to elections results.
>> 
>> https://youtu.be/etx0k1nLn78
>
>Just watched this video and found it to be delightfully enlightening
>and entertaining.  (Thank you Martin for posting the link.)
>
>However a question springs to mind:  why is it the case that Trump's
>vote counts in Chicago *do* seem to follow Benford's law (at least
>roughly) when, as is apparently to be expected, Biden's don't?
>
>Has anyone any explanation for this?  Any ideas?
>
>cheers,
>
>Rolf Turner

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] analyzing results from Tuesday's US elections

2020-11-13 Thread Rolf Turner


On Thu, 12 Nov 2020 01:23:06 +0100
Martin Møller Skarbiniks Pedersen  wrote:

> Please watch this video if you wrongly believe that Benford's law
> easily can be applied to elections results.
> 
> https://youtu.be/etx0k1nLn78

Just watched this video and found it to be delightfully enlightening
and entertaining.  (Thank you Martin for posting the link.)

However a question springs to mind:  why is it the case that Trump's
vote counts in Chicago *do* seem to follow Benford's law (at least
roughly) when, as is apparently to be expected, Biden's don't?

Has anyone any explanation for this?  Any ideas?

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] analyzing results from Tuesday's US elections

2020-11-11 Thread Martin Møller Skarbiniks Pedersen
Please watch this video if you wrongly believe that Benford's law easily
can be applied to elections results.

https://youtu.be/etx0k1nLn78



On Sun, Nov 1, 2020, 21:17 Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

> Hello:
>
>
>What can you tell me about plans to analyze data from this year's
> general election, especially to detect possible fraud?
>
>
>I might be able to help with such an effort.  I have NOT done
> much with election data, but I have developed tools for data analysis,
> including web scraping, and included them in R packages available on the
> Comprehensive R Archive Network (CRAN) and GitHub.[1]
>
>
>Penny Abernathy, who holds the Knight Chair in Journalism and
> Digital Media Economics at UNC-Chapel Hill, told me that the electoral
> fraud that disqualified the official winner from NC-09 to the US House
> in 2018 was detected by a college prof, who accessed the data two weeks
> after the election.[2]
>
>
>Spencer Graves
>
>
> [1]
> https://github.com/sbgraves237
>
>
> [2]
> https://en.wikiversity.org/wiki/Local_Journalism_Sustainability_Act
>
>

[[alternative HTML version deleted]]



Re: [R] analyzing results from Tuesday's US elections

2020-11-09 Thread Bert Gunter
For those who are interested:

Very nice examples of (static) statistical graphics on election results can
be found here:
https://www.nytimes.com/interactive/2020/11/09/us/arizona-election-battleground-state-counties.html?action=click&module=Spotlight&pgtype=Homepage

Takes multidisciplinary teams and lots of hard work to produce, I would
guess.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Nov 9, 2020 at 4:46 PM Abby Spurdle  wrote:

> RESENT
> INITIAL EMAIL, TOO BIG
> ATTACHMENTS REPLACED WITH LINKS
>
> I created a dataset, linked.
> Had to manually copy and paste from the NY Times website.
>
> > head (data, 3)
>     STATE   EQCOUNTY RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016
> 1 Alabama     Mobile         13.3           12       181783                 0
> 2 Alabama     Dallas        -37.5          -38        17861                 0
> 3 Alabama Tuscaloosa         19.3           15        89760                 0
>
> > tail (data, 3)
>        STATE EQCOUNTY RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016
> 4248 Wyoming    Uinta         58.5           63         9400                 0
> 4249 Wyoming Sublette         63.0           62         4970                 0
> 4250 Wyoming  Johnson         64.3           61         4914                 0
>
> > head (data [data [,1] == "Alaska",], 3)
>     STATE EQCOUNTY RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016
> 68 Alaska    ED 40         14.7        -24.0           82                 1
> 69 Alaska    ED 37         14.7         -1.7          173                 1
> 70 Alaska    ED 38         14.7         -0.4          249                 1
>
> EQCounty, is the County or Equivalent.
> Several states, D.C., Alaska, Connecticut, Maine, Massachusetts, Rhode
> Island and Vermont are different.
> RMargin(s) are the Republican percentages minus the Democrat
> percentages.
> The last column is 0s or 1s, with 1s for Alaska, Connecticut, Maine,
> Massachusetts, Rhode Island and Vermont, where I didn't have the 2016
> margins, so the 2016 margins have been replaced with state-levels
> values.
>
> Then I scaled the margins, based on the number of voters.
> i.e.
> wx2016 <- 1000 * x2016 * nv / max.nv
> (Where x2016 is equal to RMARGIN_2016, and nv is equal to NVOTERS_2020).
>
> There may be a much better way.
>
> And came up with the following plots (linked) and output (follows):
>
> ---INPUT---
> PATH = ""
> data = read.csv (PATH, header=TRUE)
>
> #raw data
> x2016 <- as.numeric (data$RMARGIN_2016)
> x2020 <- as.numeric (data$RMARGIN_2020)
> nv <- as.numeric (data$NVOTERS_2020)
> subs <- as.logical (data$SUB_STATEVAL)
>
> #computed data
> max.nv <- max (nv)
> wx2016 <- 1000 * x2016 * nv / max.nv
> wx2020 <- 1000 * x2020 * nv / max.nv
> diffs <- wx2020 - wx2016
>
> OFFSET <- 500
> p0 <- par (mfrow = c (2, 2) )
>
> #plot 1
> plot (wx2016, wx2020,
> main="All Votes\n(By County, or Equivalent)",
> xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin,
> 2020")
> abline (h=0, v=0, lty=2)
>
> #plot 2
> OFFSET <- 200
> plot (wx2016, wx2020,
> xlim = c (-OFFSET, OFFSET), ylim = c (-OFFSET, OFFSET),
> main="All Votes\n(Zoomed In)",
> xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin,
> 2020")
> abline (h=0, v=0, lty=2)
>
> OFFSET <- 1000
>
> #plot 3
> J1 <- order (diffs, decreasing=TRUE)[1:400]
> plot (wx2016 [J1], wx2020 [J1],
> xlim = c (-OFFSET, OFFSET), ylim = c (-OFFSET, OFFSET),
> main="400 Biggest Shifts Towards Republican",
> xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin,
> 2020")
> abline (h=0, v=0, lty=2)
> abline (a=0, b=1, lty=2)
>
> #plot 4
> J2 <- order (diffs)[1:400]
> plot (wx2016 [J2], wx2020 [J2],
> xlim = c (-OFFSET, OFFSET), ylim = c (-OFFSET, OFFSET),
> main="400 Biggest Shifts Towards Democrat",
> xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin,
> 2020")
> abline (h=0, v=0, lty=2)
> abline (a=0, b=1, lty=2)
>
> par (p0)
>
> #most democrat
> I = order (wx2020)[1:30]
> cbind (data [I,], scaled.dem.vote = -1 * wx2020 [I])
>
> #biggest move toward democrat
> head (cbind (data [J2,], diffs = diffs [J2]), 30)
>
> ---OUTPUT---
> #most democrat
> > cbind (data [I,], scaled.dem.vote = -1 * wx2020 [I])
>       STATE        EQCOUNTY     RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016 scaled.dem.vote
> 229   California   Los Angeles         -49.3          -44      3674850                 0       44000.000
> 769   Illinois     Cook                -53.1          -47      1897721                 0       24271.164
> 4073  Washington   King                -48.8          -53      1188152                 0       17135.953
> 3092  Pennsylvania Philadelphia        -67.0          -63       701647                 0       12028.725
> 215   California   Alameda             -63.5          -64       625710                 0       10897.163
> 227  

Re: [R] analyzing results from Tuesday's US elections

2020-11-09 Thread Abby Spurdle
RESENT
INITIAL EMAIL, TOO BIG
ATTACHMENTS REPLACED WITH LINKS

I created a dataset, linked.
Had to manually copy and paste from the NY Times website.

> head (data, 3)
    STATE   EQCOUNTY RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016
1 Alabama     Mobile         13.3           12       181783                 0
2 Alabama     Dallas        -37.5          -38        17861                 0
3 Alabama Tuscaloosa         19.3           15        89760                 0

> tail (data, 3)
       STATE EQCOUNTY RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016
4248 Wyoming    Uinta         58.5           63         9400                 0
4249 Wyoming Sublette         63.0           62         4970                 0
4250 Wyoming  Johnson         64.3           61         4914                 0

> head (data [data [,1] == "Alaska",], 3)
    STATE EQCOUNTY RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016
68 Alaska    ED 40         14.7        -24.0           82                 1
69 Alaska    ED 37         14.7         -1.7          173                 1
70 Alaska    ED 38         14.7         -0.4          249                 1

EQCounty, is the County or Equivalent.
Several states, D.C., Alaska, Connecticut, Maine, Massachusetts, Rhode
Island and Vermont are different.
RMargin(s) are the Republican percentages minus the Democrat
percentages.
The last column is 0s or 1s, with 1s for Alaska, Connecticut, Maine,
Massachusetts, Rhode Island and Vermont, where I didn't have the 2016
margins, so the 2016 margins have been replaced with state-levels
values.

Then I scaled the margins, based on the number of voters.
i.e.
wx2016 <- 1000 * x2016 * nv / max.nv
(Where x2016 is equal to RMARGIN_2016, and nv is equal to NVOTERS_2020).

There may be a much better way.
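(Editorial sketch: the scaling above can be wrapped as a small function. The name `scale.margin` and the margin/count values here are made up for illustration, not from the original post.)

```r
# Voter-weighted margin, as described above: scale each county's margin
# by its vote count relative to the largest county, times 1000
scale.margin <- function (margin, nv) 1000 * margin * nv / max (nv)

margin <- c (13.3, -37.5, 19.3)   # hypothetical RMARGIN values
nv <- c (181783, 17861, 89760)    # hypothetical NVOTERS values
round (scale.margin (margin, nv), 1)
```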

And came up with the following plots (linked) and output (follows):

---INPUT---
PATH = ""
data = read.csv (PATH, header=TRUE)

#raw data
x2016 <- as.numeric (data$RMARGIN_2016)
x2020 <- as.numeric (data$RMARGIN_2020)
nv <- as.numeric (data$NVOTERS_2020)
subs <- as.logical (data$SUB_STATEVAL)

#computed data
max.nv <- max (nv)
wx2016 <- 1000 * x2016 * nv / max.nv
wx2020 <- 1000 * x2020 * nv / max.nv
diffs <- wx2020 - wx2016

OFFSET <- 500
p0 <- par (mfrow = c (2, 2) )

#plot 1
plot (wx2016, wx2020,
main="All Votes\n(By County, or Equivalent)",
xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin, 2020")
abline (h=0, v=0, lty=2)

#plot 2
OFFSET <- 200
plot (wx2016, wx2020,
xlim = c (-OFFSET, OFFSET), ylim = c (-OFFSET, OFFSET),
main="All Votes\n(Zoomed In)",
xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin, 2020")
abline (h=0, v=0, lty=2)

OFFSET <- 1000

#plot 3
J1 <- order (diffs, decreasing=TRUE)[1:400]
plot (wx2016 [J1], wx2020 [J1],
xlim = c (-OFFSET, OFFSET), ylim = c (-OFFSET, OFFSET),
main="400 Biggest Shifts Towards Republican",
xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin, 2020")
abline (h=0, v=0, lty=2)
abline (a=0, b=1, lty=2)

#plot 4
J2 <- order (diffs)[1:400]
plot (wx2016 [J2], wx2020 [J2],
xlim = c (-OFFSET, OFFSET), ylim = c (-OFFSET, OFFSET),
main="400 Biggest Shifts Towards Democrat",
xlab="Scaled Republican Margin, 2016", ylab="Scaled Republican Margin, 2020")
abline (h=0, v=0, lty=2)
abline (a=0, b=1, lty=2)

par (p0)

#most democrat
I = order (wx2020)[1:30]
cbind (data [I,], scaled.dem.vote = -1 * wx2020 [I])

#biggest move toward democrat
head (cbind (data [J2,], diffs = diffs [J2]), 30)

---OUTPUT---
#most democrat
> cbind (data [I,], scaled.dem.vote = -1 * wx2020 [I])
      STATE        EQCOUNTY      RMARGIN_2016 RMARGIN_2020 NVOTERS_2020 SUB_STATEVAL_2016 scaled.dem.vote
229   California   Los Angeles          -49.3          -44      3674850                 0       44000.000
769   Illinois     Cook                 -53.1          -47      1897721                 0       24271.164
4073  Washington   King                 -48.8          -53      1188152                 0       17135.953
3092  Pennsylvania Philadelphia         -67.0          -63       701647                 0       12028.725
215   California   Alameda              -63.5          -64       625710                 0       10897.163
227   California   Santa Clara          -52.1          -49       726186                 0        9682.875
238   California   San Diego            -19.7          -23      1546144                 0        9676.942
2683  New York     Brooklyn             -62.0          -49       693937                 0        9252.871
2162  Minnesota    Hennepin             -34.9          -43       753716                 0        8819.350
2074  Michigan     Wayne                -37.1          -37       863382                 0        8692.908
2673  New York     Manhattan            -76.9          -70       446861                 0        8511.986
221   California   San Francisco        -75.2          -73       413642                 0        8216.898
3495  Texas   

Re: [R] analyzing results from Tuesday's US elections

2020-11-09 Thread Marc Roos
 
Publish the results/graphs please; I'd like to see what you are doing.



-Original Message-
From: Matthew McCormack [mailto:mccorm...@molbio.mgh.harvard.edu] 
Sent: Monday, November 09, 2020 6:14 PM
To: r-help@r-project.org
Subject: Re: [R] analyzing results from Tuesday's US elections


Benford Analysis for Data Validation and Forensic Analytics

Provides tools that make it easier to validate data using Benford's Law.

https://www.rdocumentation.org/packages/benford.analysis/versions/0.1.5


Matthew

On 11/9/20 9:23 AM, Alexandra Thorn wrote:
>
> This thread strikes me as pretty far off-topic for a forum dedicated 
> to software support on R.
>
> https://www.r-project.org/mail.html#instructions
> "The main R mailing list, for discussion 
> about problems and solutions using R, announcements (not covered by 
> R-announce or R-packages, see above), about the availability 
of 
> new functionality for R and documentation of R, comparison and 
> compatibility with S-plus, and for the posting of nice examples and 
> benchmarks. Do read the posting guide before sending anything!"
>
> https://www.r-project.org/posting-guide.html
> "The R mailing lists are primarily intended for 
> questions and discussion about the R software. However, questions 
> about statistical methodology are sometimes posted. If the question is 

> well-asked and of interest to someone on the list, it may elicit an 
> informative up-to-date answer. See also the Usenet groups 
> sci.stat.consult (applied statistics and consulting) and sci.stat.math 

> (mathematical stat and probability)."
>
> On Mon, 9 Nov 2020 00:53:46 -0500
> Matthew McCormack  wrote:
>
>> You can try here: 
>> https://decisiondeskhq.com/
>>
>> I think they have what you are looking for. From their website:
>>
>> "Create a FREE account to access up to the minute election results 
>> and insights on all U.S. Federal elections. Decision Desk HQ & 
>> Øptimus provide live election night coverage, race-specific results 
>> including county-level returns, and exclusive race probabilities for 
>> key battleground races."
>>
>>      Also, this article provides a little, emphasis on little, of 
>> statistical analysis of election results, but it may be a place to 
>> start.
>>
>> https://www.theepochtimes.com/statistical-anomalies-in-biden-votes-analyses-indicate_3570518.html?utm_source=newsnoe&utm_medium=email&utm_campaign=breaking-2020-11-08-5
>>
>> Matthew
>>
>> On 11/8/20 11:25 PM, Bert Gunter wrote:
>>>
>>> NYT  had interactive maps that reported  votes by county. So try 
>>> contacting them.
>>>
>>>
>>> Bert
>>>
>>> On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle 
>>> wrote:
>>>>> such a repository already exists -- the NY Times, AP, CNN, etc.
>>>>> etc.
>>>> already have interactive web pages that did this
>>>>
>>>> I've been looking for presidential election results, by 
>>>> ***county***. I've found historic results, including results for 
>>>> 2016.
>>>>
>>>> However, I can't find such a dataset, for 2020.
>>>> (Even though this seems like an obvious thing to publish).
>>>>
>>>> I suspect that the NY Times has the data, but I haven't been able 
>>>> to work where the data is on their website, or how to access it.
>>>>
>>>> More ***specifi

Re: [R] analyzing results from Tuesday's US elections

2020-11-09 Thread Matthew McCormack


Benford Analysis for Data Validation and Forensic Analytics

Provides tools that make it easier to validate data using Benford's Law.

https://www.rdocumentation.org/packages/benford.analysis/versions/0.1.5


Matthew

On 11/9/20 9:23 AM, Alexandra Thorn wrote:
>
> This thread strikes me as pretty far off-topic for a forum dedicated to
> software support on R.
>
> https://www.r-project.org/mail.html#instructions
> "The ‘main’ R mailing list, for discussion about problems and solutions
> using R, announcements (not covered by ‘R-announce’ or ‘R-packages’,
> see above), about the availability of new functionality for R and
> documentation of R, comparison and compatibility with S-plus, and for
> the posting of nice examples and benchmarks. Do read the posting guide
> before sending anything!"
>
> https://www.r-project.org/posting-guide.html
> "The R mailing lists are primarily intended for questions and
> discussion about the R software. However, questions about statistical
> methodology are sometimes posted. If the question is well-asked and of
> interest to someone on the list, it may elicit an informative
> up-to-date answer. See also the Usenet groups sci.stat.consult (applied
> statistics and consulting) and sci.stat.math (mathematical stat and
> probability)."
>
> On Mon, 9 Nov 2020 00:53:46 -0500
> Matthew McCormack  wrote:
>
>> You can try here: 
>> https://decisiondeskhq.com/
>>
>> I think they have what you are looking for. From their website:
>>
>> "Create a FREE account to access up to the minute election results
>> and insights on all U.S. Federal elections. Decision Desk HQ &
>> Øptimus provide live election night coverage, race-specific results
>> including county-level returns, and exclusive race probabilities for
>> key battleground races."
>>
>>      Also, this article provides a little, emphasis on little, of
>> statistical analysis of election results, but it may be a place to
>> start.
>>
>> https://www.theepochtimes.com/statistical-anomalies-in-biden-votes-analyses-indicate_3570518.html?utm_source=newsnoe&utm_medium=email&utm_campaign=breaking-2020-11-08-5
>>
>> Matthew
>>
>> On 11/8/20 11:25 PM, Bert Gunter wrote:
>>>
>>> NYT  had interactive maps that reported  votes by county. So try
>>> contacting them.
>>>
>>>
>>> Bert
>>>
>>> On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle 
>>> wrote:
> such a repository already exists -- the NY Times, AP, CNN, etc.
> etc.
 already have interactive web pages that did this

 I've been looking for presidential election results, by
 ***county***. I've found historic results, including results for
 2016.

 However, I can't find such a dataset, for 2020.
 (Even though this seems like an obvious thing to publish).

 I suspect that the NY Times has the data, but I haven't been able
 to work where the data is on their website, or how to access it.

 More ***specific*** suggestions would be appreciated...?
   
>>> [[alternative HTML version deleted]]
>>>
>>  [[alternative HTML 

Re: [R] analyzing results from Tuesday's US elections

2020-11-09 Thread Alexandra Thorn
This thread strikes me as pretty far off-topic for a forum dedicated to
software support on R.

https://www.r-project.org/mail.html#instructions
"The ‘main’ R mailing list, for discussion about problems and solutions
using R, announcements (not covered by ‘R-announce’ or ‘R-packages’,
see above), about the availability of new functionality for R and
documentation of R, comparison and compatibility with S-plus, and for
the posting of nice examples and benchmarks. Do read the posting guide
before sending anything!"

https://www.r-project.org/posting-guide.html
"The R mailing lists are primarily intended for questions and
discussion about the R software. However, questions about statistical
methodology are sometimes posted. If the question is well-asked and of
interest to someone on the list, it may elicit an informative
up-to-date answer. See also the Usenet groups sci.stat.consult (applied
statistics and consulting) and sci.stat.math (mathematical stat and
probability)."

On Mon, 9 Nov 2020 00:53:46 -0500
Matthew McCormack  wrote:

> You can try here: https://decisiondeskhq.com/
> 
> I think they have what you are looking for. From their website:
> 
> "Create a FREE account to access up to the minute election results
> and insights on all U.S. Federal elections. Decision Desk HQ &
> Øptimus provide live election night coverage, race-specific results
> including county-level returns, and exclusive race probabilities for
> key battleground races."
> 
>     Also, this article provides a little, emphasis on little, of 
> statistical analysis of election results, but it may be a place to
> start.
> 
> https://www.theepochtimes.com/statistical-anomalies-in-biden-votes-analyses-indicate_3570518.html?utm_source=newsnoe&utm_medium=email&utm_campaign=breaking-2020-11-08-5
> 
> Matthew
> 
> On 11/8/20 11:25 PM, Bert Gunter wrote:
> >
> > NYT  had interactive maps that reported  votes by county. So try
> > contacting them.
> >
> >
> > Bert
> >
> > On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle 
> > wrote: 
> >>> such a repository already exists -- the NY Times, AP, CNN, etc.
> >>> etc.  
> >> already have interactive web pages that did this
> >>
> >> I've been looking for presidential election results, by
> >> ***county***. I've found historic results, including results for
> >> 2016.
> >>
> >> However, I can't find such a dataset, for 2020.
> >> (Even though this seems like an obvious thing to publish).
> >>
> >> I suspect that the NY Times has the data, but I haven't been able
> >> to work where the data is on their website, or how to access it.
> >>
> >> More ***specific*** suggestions would be appreciated...?
> >>  
> > [[alternative HTML version deleted]]
> >
> 
>   [[alternative HTML version deleted]]
> 



Re: [R] analyzing results from Tuesday's US elections

2020-11-08 Thread Matthew McCormack
You can try here: https://decisiondeskhq.com/

I think they have what you are looking for. From their website:

"Create a FREE account to access up to the minute election results and 
insights on all U.S. Federal elections. Decision Desk HQ & Øptimus 
provide live election night coverage, race-specific results including 
county-level returns, and exclusive race probabilities for key 
battleground races."

    Also, this article provides a little, emphasis on little, of 
statistical analysis of election results, but it may be a place to start.

https://www.theepochtimes.com/statistical-anomalies-in-biden-votes-analyses-indicate_3570518.html?utm_source=newsnoe&utm_medium=email&utm_campaign=breaking-2020-11-08-5

Matthew

On 11/8/20 11:25 PM, Bert Gunter wrote:
>
> NYT  had interactive maps that reported  votes by county. So try contacting
> them.
>
>
> Bert
>
> On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle  wrote:
>
>>> such a repository already exists -- the NY Times, AP, CNN, etc. etc.
>> already have interactive web pages that did this
>>
>> I've been looking for presidential election results, by ***county***.
>> I've found historic results, including results for 2016.
>>
>> However, I can't find such a dataset, for 2020.
>> (Even though this seems like an obvious thing to publish).
>>
>> I suspect that the NY Times has the data, but I haven't been able to
>> work where the data is on their website, or how to access it.
>>
>> More ***specific*** suggestions would be appreciated...?
>>
>   [[alternative HTML version deleted]]
>
>

[[alternative HTML version deleted]]



Re: [R] analyzing results from Tuesday's US elections

2020-11-08 Thread Bert Gunter
NYT  had interactive maps that reported  votes by county. So try contacting
them.


Bert

On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle  wrote:

> > such a repository already exists -- the NY Times, AP, CNN, etc. etc.
> already have interactive web pages that did this
>
> I've been looking for presidential election results, by ***county***.
> I've found historic results, including results for 2016.
>
> However, I can't find such a dataset, for 2020.
> (Even though this seems like an obvious thing to publish).
>
> I suspect that the NY Times has the data, but I haven't been able to
> work where the data is on their website, or how to access it.
>
> More ***specific*** suggestions would be appreciated...?
>



Re: [R] analyzing results from Tuesday's US elections

2020-11-08 Thread Abby Spurdle
> such a repository already exists -- the NY Times, AP, CNN, etc. etc. already 
> have interactive web pages that did this

I've been looking for presidential election results, by ***county***.
I've found historic results, including results for 2016.

However, I can't find such a dataset, for 2020.
(Even though this seems like an obvious thing to publish).

I suspect that the NY Times has the data, but I haven't been able to
work out where the data is on their website, or how to access it.

More ***specific*** suggestions would be appreciated...?



Re: [R] analyzing results from Tuesday's US elections

2020-11-08 Thread Bert Gunter
Unless I misunderstand, clearly such a repository already exists -- the NY
Times, AP, CNN, etc. etc. already have interactive web pages that did
this! It doesn't seem to make any difference to Trump conspiracy theorists
and partisans, though.

Also, as usual, a web search (on "central repository of US election
results") brought up what seemed like many relevant hits of historical
data. You may wish to contact one of these sources for further info.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Nov 8, 2020 at 12:25 AM Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

>
>
> On 2020-11-07 23:39, Abby Spurdle wrote:
> >> What can you tell me about plans to analyze data from this year's
> >> general election, especially to detect possible fraud?
> >
> > I was wondering if there's any R packages with out-of-the-box
> > functions for this sort of thing.
> > Can you please let us know, if you find any.
> >
> >> I might be able to help with such an effort.  I have NOT done
> >> much with election data, but I have developed tools for data analysis,
> >> including web scraping, and included them in R packages available on the
> >> Comprehensive R Archive Network (CRAN) and GitHub.[1]
> >
> > Do you have a URL for detailed election results?
> > Or even better, a nice R-friendly CSV file...
> >
> > I recognize that the results aren't complete.
> > And that such a file may need to be updated later.
> > But that doesn't necessarily prevent modelling now.
>
>
>   I asked, because I don't know of any such.  With the
> increasingly
> vicious, widespread and systematic attacks on the integrity of elections
> in the US, I think it would be good to have a central database of
> election results with tools regularly scraping websites of local and
> state election authorities.  Whenever new data were posted, the software
> would update the central repository and send emails to anyone
> interested.  That could simplify data acquisition, because historical
> data could already be available there.  And it would be one standard
> format for the entire US and maybe the world.
>
>
>   This could be extremely valuable in exposing electoral fraud,
> thereby
> reducing its magnitude and effectiveness.  This is a global problem, but
> it seems to have gotten dramatically worse in the US in recent years.[2]
>
>
>   I'd like to join -- or organize -- a team of people working on
> this.
> If we can create the database and data analysis tools in a package like
> Ecfun on CRAN, I think we can interest college profs, especially those
> teaching statistics to political science students, who would love to
> involve their students in something like this.  They could access data
> real time in classes, analyze it using standard tools that we could
> develop, and involve their students in discussing what it means and what
> it doesn't.  They could discuss Bayesian sequential updating and quality
> control concepts using data that are real and relevant to the lives of
> their students.  It could help get students excited about both
> statistics and elections.
>
>
>   Such a project may already exist.  I know there are projects at
> some
> major universities that sound like they might support this.  However
> with the limited time I've invested in this so far, I didn't find any
> that seemed to provide easy access to such data and an easy way to join
> such a project.  Ballotpedia has such data but doesn't want help in
> analyzing it and asked for a few hundred dollars for data for one
> election cycle in Missouri, which is what I requested.  I can get that
> for free from the web site of the Missouri Secretary of State.
>
>
>   I thought I might next ask the Carter Center about this.
> However,
> I'm totally consumed with other priorities right now.  I don't plan
> to do anything on this in the short term -- unless I can find
> collaborators.
>
>
>   If such a central database doesn't exist -- and maybe even if it
> does
> -- I thought it might be good to make all the data available in a
> standard format in Wikidata, which is a project of the Wikimedia
> Foundation, which is also the parent organization of Wikipedia.  Then I
> could help create software and documentation on how to scrape data from
> the web sites of different election organizations that have it and
> automatically update Wikidata while also sending emails to people who
> express interest in those election results.  Then we could create
> software for analyzing such data and make that available, e.g., on
> Wikiversity, which is another project of the Wikimedia Foundation --
> with the R code in Ecfun or some other CRAN package.
>
>
>   If we start now, I think we could have something mediocre in
> time for
> various local elections that occur next year with improvements 

Re: [R] analyzing results from Tuesday's US elections

2020-11-08 Thread Spencer Graves




On 2020-11-07 23:39, Abby Spurdle wrote:

What can you tell me about plans to analyze data from this year's
general election, especially to detect possible fraud?


I was wondering if there's any R packages with out-of-the-box
functions for this sort of thing.
Can you please let us know, if you find any.


I might be able to help with such an effort.  I have NOT done
much with election data, but I have developed tools for data analysis,
including web scraping, and included them in R packages available on the
Comprehensive R Archive Network (CRAN) and GitHub.[1]


Do you have a URL for detailed election results?
Or even better, a nice R-friendly CSV file...

I recognize that the results aren't complete.
And that such a file may need to be updated later.
But that doesn't necessarily prevent modelling now.



	  I asked, because I don't know of any such.  With the increasingly 
vicious, widespread and systematic attacks on the integrity of elections 
in the US, I think it would be good to have a central database of 
election results with tools regularly scraping websites of local and 
state election authorities.  Whenever new data were posted, the software 
would update the central repository and send emails to anyone 
interested.  That could simplify data acquisition, because historical 
data could already be available there.  And it would be one standard 
format for the entire US and maybe the world.



	  This could be extremely valuable in exposing electoral fraud, thereby 
reducing its magnitude and effectiveness.  This is a global problem, but 
it seems to have gotten dramatically worse in the US in recent years.[2]



	  I'd like to join -- or organize -- a team of people working on this. 
If we can create the database and data analysis tools in a package like 
Ecfun on CRAN, I think we can interest college profs, especially those 
teaching statistics to political science students, who would love to 
involve their students in something like this.  They could access data 
real time in classes, analyze it using standard tools that we could 
develop, and involve their students in discussing what it means and what 
it doesn't.  They could discuss Bayesian sequential updating and quality 
control concepts using data that are real and relevant to the lives of 
their students.  It could help get students excited about both 
statistics and elections.



	  Such a project may already exist.  I know there are projects at some 
major universities that sound like they might support this.  However 
with the limited time I've invested in this so far, I didn't find any 
that seemed to provide easy access to such data and an easy way to join 
such a project.  Ballotpedia has such data but doesn't want help in 
analyzing it and asked for a few hundred dollars for data for one 
election cycle in Missouri, which is what I requested.  I can get that 
for free from the web site of the Missouri Secretary of State.



	  I thought I might next ask the Carter Center about this.  However, 
I'm totally consumed with other priorities right now.  I don't plan 
to do anything on this in the short term -- unless I can find 
collaborators.



	  If such a central database doesn't exist -- and maybe even if it does 
-- I thought it might be good to make all the data available in a 
standard format in Wikidata, which is a project of the Wikimedia 
Foundation, which is also the parent organization of Wikipedia.  Then I 
could help create software and documentation on how to scrape data from 
the web sites of different election organizations that have it and 
automatically update Wikidata while also sending emails to people who 
express interest in those election results.  Then we could create 
software for analyzing such data and make that available, e.g., on 
Wikiversity, which is another project of the Wikimedia Foundation -- 
with the R code in Ecfun or some other CRAN package.



	  If we start now, I think we could have something mediocre in time for 
various local elections that occur next year with improvements for the 
2022 US Congressional elections and something even better for the 2024 
US presidential elections.



  Thanks for asking.
  Spencer Graves


[1]
https://github.com/sbgraves237


[2]
https://en.wikiversity.org/wiki/Electoral_integrity_in_the_United_States



Re: [R] analyzing results from Tuesday's US elections

2020-11-07 Thread Abby Spurdle
> What can you tell me about plans to analyze data from this year's
> general election, especially to detect possible fraud?

I was wondering if there's any R packages with out-of-the-box
functions for this sort of thing.
Can you please let us know, if you find any.

> I might be able to help with such an effort.  I have NOT done
> much with election data, but I have developed tools for data analysis,
> including web scraping, and included them in R packages available on the
> Comprehensive R Archive Network (CRAN) and GitHub.[1]

Do you have a URL for detailed election results?
Or even better, a nice R-friendly CSV file...

I recognize that the results aren't complete.
And that such a file may need to be updated later.
But that doesn't necessarily prevent modelling now.



[R] analyzing results from Tuesday's US elections

2020-11-01 Thread Spencer Graves

Hello:


  What can you tell me about plans to analyze data from this year's 
general election, especially to detect possible fraud?



  I might be able to help with such an effort.  I have NOT done 
much with election data, but I have developed tools for data analysis, 
including web scraping, and included them in R packages available on the 
Comprehensive R Archive Network (CRAN) and GitHub.[1]



  Penny Abernathy, who holds the Knight Chair in Journalism and 
Digital Media Economics at UNC-Chapel Hill, told me that the electoral 
fraud that disqualified the official winner from NC-09 to the US House 
in 2018 was detected by a college prof, who accessed the data two weeks 
after the election.[2]



  Spencer Graves


[1]
https://github.com/sbgraves237


[2]
https://en.wikiversity.org/wiki/Local_Journalism_Sustainability_Act



Re: [R] open file on R GUI results in spinning wheel and frozen R - Mac OS

2020-09-23 Thread Peter Dalgaard
...or try R-patched, which I'm told has the newer GUI.

-pd

> On 21 Sep 2020, at 21:43 , Berend Hasselman  wrote:
> 
> 
> 
>> On 21 Sep 2020, at 20:24, Gonçalo Ferraz  wrote:
>> 
>> Hello,
>> 
>> I’ve been using R-studio for a while and today I needed to try something 
>> directly on the R-GUI.
>> 
>> But when I try to open any *.R file I get a spinning wheel and R freezes. I 
>> can only shut it down with ‘force quit’.
>> 
>> I have deleted and re-installed R three times, each time trying to run a 
>> more thorough uninstall, but the problem persists.
>> 
>> I am using Mac OS Catalina 10.15.6 and the latest version of R ->  R 4.0.2 
>> GUI 1.72 Catalina build (7847)
>> 
>> Strangely, as this problem was happening on the R GUI, I was still able to 
>> open R scripts on RStudio. But now I uninstalled RStudio as well, in the 
>> latest attempt to start from scratch.
>> 
>> Is this problem familiar to anyone?
>> 
> 
> See this thread on the R-SIG-Mac list: 
> https://stat.ethz.ch/pipermail/r-sig-mac/2020-June/013575.html
> and here for a solution (sequel of the above): 
> https://stat.ethz.ch/pipermail/r-sig-mac/2020-July/013641.html
> 
> Go to https://mac.r-project.org/ and get the latest revision of the R GUI 
> which is now 
> https://mac.r-project.org/high-sierra/R-4.0-branch/R-GUI-7884-4.0-high-sierra-Release.dmg
> 
> I have revision 7849; if the above does not work I can mail you the dmg of 
> revision 7849.
> 
> Berend
> 
> 
> 
>> Thanks for any help,
>> 
>> Gonçalo
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] open file on R GUI results in spinning wheel and frozen R - Mac OS

2020-09-21 Thread Berend Hasselman



> On 21 Sep 2020, at 20:24, Gonçalo Ferraz  wrote:
> 
> Hello,
> 
> I’ve been using R-studio for a while and today I needed to try something 
> directly on the R-GUI.
> 
> But when I try to open any *.R file I get a spinning wheel and R freezes. I 
> can only shut it down with ‘force quit’.
> 
> I have deleted and re-installed R three times, each time trying to run a more 
> thorough uninstall, but the problem persists.
> 
> I am using Mac OS Catalina 10.15.6 and the latest version of R ->  R 4.0.2 
> GUI 1.72 Catalina build (7847)
> 
> Strangely, as this problem was happening on the R GUI, I was still able to 
> open R scripts on RStudio. But now I uninstalled RStudio as well, in the 
> latest attempt to start from scratch.
> 
> Is this problem familiar to anyone?
> 

See this thread on the R-SIG-Mac list: 
https://stat.ethz.ch/pipermail/r-sig-mac/2020-June/013575.html
and here for a solution (sequel of the above): 
https://stat.ethz.ch/pipermail/r-sig-mac/2020-July/013641.html

Go to https://mac.r-project.org/ and get the latest revision of the R GUI which 
is now 
https://mac.r-project.org/high-sierra/R-4.0-branch/R-GUI-7884-4.0-high-sierra-Release.dmg

I have revision 7849; if the above does not work I can mail you the dmg of 
revision 7849.

Berend



> Thanks for any help,
> 
> Gonçalo
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] open file on R GUI results in spinning wheel and frozen R - Mac OS

2020-09-21 Thread Bert Gunter
You might do better on r-sig-mac  for this.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Sep 21, 2020 at 11:24 AM Gonçalo Ferraz  wrote:

> Hello,
>
> I’ve been using R-studio for a while and today I needed to try something
> directly on the R-GUI.
>
> But when I try to open any *.R file I get a spinning wheel and R freezes.
> I can only shut it down with ‘force quit’.
>
> I have deleted and re-installed R three times, each time trying to run a
> more thorough uninstall, but the problem persists.
>
> I am using Mac OS Catalina 10.15.6 and the latest version of R ->  R 4.0.2
> GUI 1.72 Catalina build (7847)
>
> Strangely, as this problem was happening on the R GUI, I was still able to
> open R scripts on RStudio. But now I uninstalled RStudio as well, in the
> latest attempt to start from scratch.
>
> Is this problem familiar to anyone?
>
> Thanks for any help,
>
> Gonçalo
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] open file on R GUI results in spinning wheel and frozen R - Mac OS

2020-09-21 Thread Gonçalo Ferraz
Hello,

I’ve been using R-studio for a while and today I needed to try something 
directly on the R-GUI.

But when I try to open any *.R file I get a spinning wheel and R freezes. I can 
only shut it down with ‘force quit’.

I have deleted and re-installed R three times, each time trying to run a more 
thorough uninstall, but the problem persists.

I am using Mac OS Catalina 10.15.6 and the latest version of R ->  R 4.0.2 GUI 
1.72 Catalina build (7847)

Strangely, as this problem was happening on the R GUI, I was still able to open 
R scripts on RStudio. But now I uninstalled RStudio as well, in the latest 
attempt to start from scratch.

Is this problem familiar to anyone?

Thanks for any help,

Gonçalo


Re: [R] Odd Results when generating predictions with nnet function

2020-09-02 Thread Paul Bernal
You are right Jeff, that was a mistake, I was focusing on the square root
and made the mistake of talking about taking the square root instead of
raising to the 2nd power.

This is the example I was following (
https://www.youtube.com/watch?v=SaQgA6V8UA4). Of course, I tried fitting
the nnet model to my own data, to see what kind of results I'd get (the
data that I used, I provided in the very first e-mail).

The question I was asking is why do I get a bunch of 1s for the
predictions, given that the expected results would have to be somewhere
close to the latest observations.

The code and the data from the example I was following is provided in the
youtube link above.

Paul




El mié., 2 sept. 2020 a las 10:01, Jeff Newmiller ()
escribió:

> Why would you expect raising y_pred to the 0.5 to "backtransform" a model
> sqrt(y)~x? Wouldn't you raise to the 2?
>
> Why would you "backtransform" x in such a model if it were never
> transformed in the first place? Dr Maechler did not suggest that.
>
> And why are you mentioning some random unspecified video on Youtube? That
> does not enlighten anyone here, apparently including you. Please reference
> package documentation, and/or reproduce the analysis discussed in that
> video to provide a contrasting (or supporting) point with the example you
> gave.
>
>
> On September 2, 2020 7:21:58 AM PDT, Paul Bernal 
> wrote:
> >Dear Dr. Martin and Dr. Peter,
> >
> >Hope you are doing well. Thank you for your kind feedback. I also tried
> >fitting the nnet using y ~ x, but the model kept on generating odd
> >predictions. If I understand correctly, from what Dr. Martin said, it
> >would
> >be a good idea to try modeling sqrt(y) ~ x and then backtransform
> >raising
> >both y and x to 0.5?
> >
> >I was looking at a video where the guy modeled count data without doing
> >any
> >kind of transformation and didn't get odd results, which is rather
>strange.
> >
> >Cheers,
> >
> >Paul
> >
> >
> >
> >El mié., 2 sept. 2020 a las 2:37, Martin Maechler (<
> >maech...@stat.math.ethz.ch>) escribió:
> >
> >> > peter dalgaard
> >> > on Wed, 2 Sep 2020 08:41:09 +0200 writes:
> >>
> >> > Generically, nnet(a$y ~ a$x, a ...) should be nnet(y ~ x,
> >> > data=a, ...) otherwise predict will go looking for a$x, no
> >> > matter what is in xnew.
> >>
> >> > But more importantly, nnet() is a _classifier_,
> >> > so the LHS should be a class, not a numeric variable.
> >>
> >> > -pd
> >>
> >> Well, nnet() can be used for both classification *and* regression,
> >> which is quite clear from the MASS book, but indeed, not from
> >> its help page, which indeed mentions one formula  'class ~ ...'
> >> and then only has classification examples.
> >>
>> So, indeed, the  ?nnet  help page could be improved.
> >>
> >> In his case, y are counts,  so  John Tukey's good old
> >> "first aid transformation" principle would suggest to model
> >>
> >> sqrt(y) ~ ..   in a *regression* model which nnet() can do.
> >>
> >> Martin Maechler
> >> ETH Zurich  and  R Core team
> >>
> >>
> >>
> >> >> On 1 Sep 2020, at 22:19 , Paul Bernal
> >> >>  wrote:
> >> >>
> >> >> Dear friends,
> >> >>
> >> >> Hope you are all doing well. I am currently using R
> >> >> version 4.0.2 and working with the nnet package.
> >> >>
> >> >> My dataframe consists of three columns, FECHA which is
> >> >> the date, x, which is a sequence from 1 to 159, and y,
> >> >> which is the number of covid cases (I am also providing
> >> >> the dput for this data frame below).
> >> >>
> >> >> I tried fitting a neural net model using the following
> >> >> code:
> >> >>
> >> >> xnew = 1:159 Fit <- nnet(a$y ~ a$x, a, size = 5, maxit =
> >> >> 1000, lineout = T, decay = 0.001)
> >> >>
> >> >> Finally, I attempted to generate predictions with the
> >> >> following code:
> >> >>
> >> >> predictions <- predict(Fit, newdata = list(x = xnew),
> >> >> type = "raw")
> >> >>
> >> >> But obtained extremely odd results: As you can see,
> >> >> instead of obtaining numbers, more or less in the range
> >> >> of the last observations of a$y, I end up getting a bunch
> >> >> of 1s, which doesn´t make any sense (if anyone could help
> >> >> me understand what could be causing this):
> >> >> dput(predictions) structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

Re: [R] Odd Results when generating predictions with nnet function

2020-09-02 Thread peter dalgaard
The problem seems to be the fit rather than the predictions. Looks like nnet is 
happier with data between 0 and 1, witness

Fit <- nnet(y/max(y) ~ x, a, size = 5, maxit = 1000, lineout = T, decay = 0.001)
plot(y/max(y)~x,a)
lines(fitted(Fit)~x,a)
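A fuller sketch of this rescaling approach, using hypothetical toy counts in place of the poster's data (the `nnet` package is a recommended package bundled with R; the data and tuning values here are illustrative assumptions, not the poster's):

```r
library(nnet)  # recommended package, ships with R

## Hypothetical toy counts standing in for the poster's covid series
set.seed(1)
a <- data.frame(x = 1:159)
a$y <- rpois(159, lambda = 20 + 0.5 * a$x)

## Scale the response to [0, 1]: nnet's default sigmoid output units
## lie in (0, 1), so an unscaled count response saturates at 1.
## (Note the regression switch is spelled 'linout', not 'lineout';
## the misspelling in the original code appears to be silently
## absorbed by '...', leaving the sigmoid default in place.)
ymax <- max(a$y)
fit <- nnet(y / ymax ~ x, data = a, size = 5, maxit = 1000,
            decay = 0.001, trace = FALSE)

## Rescale predictions back to the original count scale
pred <- predict(fit, newdata = data.frame(x = 1:159)) * ymax
```

Because the scaled response lives in the sigmoid's range, the fit no longer pins at 1, and multiplying by `ymax` returns predictions on the original count scale.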


> On 2 Sep 2020, at 16:21 , Paul Bernal  wrote:
> 
> Dear Dr. Martin and Dr. Peter, 
> 
> Hope you are doing well. Thank you for your kind feedback. I also tried 
> fitting the nnet using y ~ x, but the model kept on generating odd 
> predictions. If I understand correctly, from what Dr. Martin said, it would 
> be a good idea to try modeling sqrt(y) ~ x and then backtransform raising 
> both y and x to 0.5?
> 
> I was looking at a video where the guy modeled count data without doing any 
> kind of transformation and didn't get odd results, which is rather strange.
> 
> Cheers,
> 
> Paul
> 
> 
> 
> El mié., 2 sept. 2020 a las 2:37, Martin Maechler 
> () escribió:
> > peter dalgaard 
> > on Wed, 2 Sep 2020 08:41:09 +0200 writes:
> 
> > Generically, nnet(a$y ~ a$x, a ...) should be nnet(y ~ x,
> > data=a, ...) otherwise predict will go looking for a$x, no
> > matter what is in xnew.  
> 
> > But more importantly, nnet() is a _classifier_, 
> > so the LHS should be a class, not a numeric variable.
> 
> > -pd
> 
> Well, nnet() can be used for both classification *and* regression,
> which is quite clear from the MASS book, but indeed, not from
> its help page, which indeed mentions one formula  'class ~ ...'
> and then only has classification examples.
> 
> So, indeed, the  ?nnet  help page could be improved.
> 
> In his case, y are counts,  so  John Tukey's good old
> "first aid transformation" principle would suggest to model
> 
> sqrt(y) ~ ..   in a *regression* model which nnet() can do.
> 
> Martin Maechler
> ETH Zurich  and  R Core team
> 
> 
> 
> >> On 1 Sep 2020, at 22:19 , Paul Bernal
> >>  wrote:
> >> 
> >> Dear friends,
> >> 
> >> Hope you are all doing well. I am currently using R
> >> version 4.0.2 and working with the nnet package.
> >> 
> >> My dataframe consists of three columns, FECHA which is
> >> the date, x, which is a sequence from 1 to 159, and y,
> >> which is the number of covid cases (I am also providing
> >> the dput for this data frame below).
> >> 
> >> I tried fitting a neural net model using the following
> >> code:
> >> 
> >> xnew = 1:159 Fit <- nnet(a$y ~ a$x, a, size = 5, maxit =
> >> 1000, lineout = T, decay = 0.001)
> >> 
> >> Finally, I attempted to generate predictions with the
> >> following code:
> >> 
> >> predictions <- predict(Fit, newdata = list(x = xnew),
> >> type = "raw")
> >> 
> >> But obtained extremely odd results: As you can see,
> >> instead of obtaining numbers, more or less in the range
> >> of the last observations of a$y, I end up getting a bunch
> >> of 1s, which doesn´t make any sense (if anyone could help
> >> me understand what could be causing this):
> >> dput(predictions) structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim
> >> = c(159L, 1L), .Dimnames = list(c("1", "2", "3", "4",
> >> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
> >> "15", "16", "17", "18", "19", "20", "21", "22", "23",
> >> "24", "25", "26", "27", "28", "29", "30", "31", "32",
> >> "33", "34", "35", "36", "37", "38", "39", "40", "41",
> >> "42", "43", "44", "45", "46", "47", "48", "49", "50",
> >> "51", "52", "53", "54", "55", "56", "57", "58", "59",
> >> "60", "61", "62", "63", "64", "65", "66", "67", "68",
> >> "69", "70", "71", "72", "73", "74", "75", "76", "77",
> >> "78", "79", "80", "81", "82", "83", "84", "85", "86",
> >> "87", "88", "89", "90", "91", "92", "93", "94", "95",
> >> "96", "97", "98", "99", "100", "101", "102", "103",
> >> "104", "105", "106", "107", "108", "109", "110", "111",
> >> "112", "113", "114", "115", "116", "117", "118", "119",
> >> "120", "121", "122", "123", "124", "125", "126", "127",
> >> "128", "129", "130", "131", "132", "133", "134", "135",
> >> "136", "137", "138", "139", "140", "141", "142", "143",
> >> "144", "145", "146", "147", "148", "149", "150", "151",
> >> "152", "153", "154", "155", "156", "157", "158", "159"),
> >> NULL))
> >> 
> >> head(a) FECHA x y 1 2020-03-09 1 1 2 2020-03-10 2 8 3
> >> 2020-03-11 3 14 4 2020-03-12 4 27 5 

Re: [R] Odd Results when generating predictions with nnet function

2020-09-02 Thread Jeff Newmiller
Why would you expect raising y_pred to the 0.5 to "backtransform" a model 
sqrt(y)~x? Wouldn't you raise to the 2?

Why would you "backtransform" x in such a model if it were never transformed in 
the first place? Dr Maechler did not suggest that.

And why are you mentioning some random unspecified video on Youtube? That does 
not enlighten anyone here, apparently including you. Please reference package 
documentation, and/or reproduce the analysis discussed in that video to provide 
a contrasting (or supporting) point with the example you gave.
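The direction of the backtransform can be sketched as follows (a minimal sketch on hypothetical toy counts, not the poster's data; only the response was transformed, so only the predictions are squared back):

```r
library(nnet)  # recommended package, ships with R

## Toy count data (hypothetical, for illustration only)
set.seed(42)
d <- data.frame(x = 1:100)
d$y <- rpois(100, lambda = 2 + 0.1 * d$x)

## Fit on the square-root scale (Tukey's "first aid" transform for
## counts), with linear output units for regression
fit <- nnet(sqrt(y) ~ x, data = d, size = 3, maxit = 500,
            decay = 0.001, linout = TRUE, trace = FALSE)

## x was never transformed, so it is left alone; the predictions come
## back on the sqrt scale and are squared -- not raised to 0.5
pred <- predict(fit, newdata = data.frame(x = d$x))^2
```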


On September 2, 2020 7:21:58 AM PDT, Paul Bernal  wrote:
>Dear Dr. Martin and Dr. Peter,
>
>Hope you are doing well. Thank you for your kind feedback. I also tried
>fitting the nnet using y ~ x, but the model kept on generating odd
>predictions. If I understand correctly, from what Dr. Martin said, it
>would
>be a good idea to try modeling sqrt(y) ~ x and then backtransform
>raising
>both y and x to 0.5?
>
>I was looking at a video where the guy modeled count data without doing
>any
>kind of transformation and didn't get odd results, which is rather
>strange.
>
>Cheers,
>
>Paul
>
>
>
>El mié., 2 sept. 2020 a las 2:37, Martin Maechler (<
>maech...@stat.math.ethz.ch>) escribió:
>
>> > peter dalgaard
>> > on Wed, 2 Sep 2020 08:41:09 +0200 writes:
>>
>> > Generically, nnet(a$y ~ a$x, a ...) should be nnet(y ~ x,
>> > data=a, ...) otherwise predict will go looking for a$x, no
>> > matter what is in xnew.
>>
>> > But more importantly, nnet() is a _classifier_,
>> > so the LHS should be a class, not a numeric variable.
>>
>> > -pd
>>
>> Well, nnet() can be used for both classification *and* regression,
>> which is quite clear from the MASS book, but indeed, not from
>> its help page, which indeed mentions one formula  'class ~ ...'
>> and then only has classification examples.
>>
>> So, indeed, the  ?nnet  help page could be improved.
>>
>> In his case, y are counts,  so  John Tukey's good old
>> "first aid transformation" principle would suggest to model
>>
>> sqrt(y) ~ ..   in a *regression* model which nnet() can do.
>>
>> Martin Maechler
>> ETH Zurich  and  R Core team
>>
>>
>>
>> >> On 1 Sep 2020, at 22:19 , Paul Bernal
>> >>  wrote:
>> >>
>> >> Dear friends,
>> >>
>> >> Hope you are all doing well. I am currently using R
>> >> version 4.0.2 and working with the nnet package.
>> >>
>> >> My dataframe consists of three columns, FECHA which is
>> >> the date, x, which is a sequence from 1 to 159, and y,
>> >> which is the number of covid cases (I am also providing
>> >> the dput for this data frame below).
>> >>
>> >> I tried fitting a neural net model using the following
>> >> code:
>> >>
>> >> xnew = 1:159 Fit <- nnet(a$y ~ a$x, a, size = 5, maxit =
>> >> 1000, lineout = T, decay = 0.001)
>> >>
>> >> Finally, I attempted to generate predictions with the
>> >> following code:
>> >>
>> >> predictions <- predict(Fit, newdata = list(x = xnew),
>> >> type = "raw")
>> >>
>> >> But obtained extremely odd results: As you can see,
>> >> instead of obtaining numbers, more or less in the range
>> >> of the last observations of a$y, I end up getting a bunch
>> >> of 1s, which doesn't make any sense (if anyone could help
>> >> me understand what could be causing this):
>> >> dput(predictions) structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim
>> >> = c(159L, 1L), .Dimnames = list(c("1", "2", "3", "4",
>> >> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
>> >> "15", "16", "17", "18", "19", "20", "21", "22", "23",
>> >> "24", "25", "26", "27", "28", "29", "30", "31", "32",
>> >> "33", "34", "35", "36", "37", "38", "39", "40", "41",
>> >> "42", "43", "44", "45", "46", "47", "48", "49", "50",
>> >> "51", "52", "53", "54", "55", "56", "57", "58", "59",
>> >> "60", "61", "62", "63", "64", "65", "66", "67", "68",
>> >> "69", "70", "71", "72", "73", "74", "75", "76", "77",
>> >> "78", "79", "80", "81", "82", "83", "84", "85", "86",
>> >> "87", "88", "89", "90", "91", "92", "93", "94", "95",
>> >> "96", "97", "98", "99", "100", "101", "102", "103",
>> >> "104", "105", "106", "107", "108", "109", "110", "111",
>> >> "112", "113", "114", "115", "116", "117", "118", "119",
>> >> "120", "121", "122", "123", "124", "125", "126", "127",
>> >> "128", 

Re: [R] Odd Results when generating predictions with nnet function

2020-09-02 Thread Paul Bernal
Dear Dr. Martin and Dr. Peter,

Hope you are doing well. Thank you for your kind feedback. I also tried
fitting the nnet using y ~ x, but the model kept on generating odd
predictions. If I understand correctly, from what Dr. Martin said, it would
be a good idea to try modeling sqrt(y) ~ x and then backtransform raising
both y and x to 0.5?

I was looking at a video where the guy modeled count data without doing any
kind of transformation and didn't get odd results, which is rather strange.

Cheers,

Paul



El mié., 2 sept. 2020 a las 2:37, Martin Maechler (<
maech...@stat.math.ethz.ch>) escribió:

> > peter dalgaard
> > on Wed, 2 Sep 2020 08:41:09 +0200 writes:
>
> > Generically, nnet(a$y ~ a$x, a ...) should be nnet(y ~ x,
> > data=a, ...) otherwise predict will go looking for a$x, no
> > matter what is in xnew.
>
> > But more importantly, nnet() is a _classifier_,
> > so the LHS should be a class, not a numeric variable.
>
> > -pd
>
> Well, nnet() can be used for both classification *and* regression,
> which is quite clear from the MASS book, but indeed, not from
> its help page, which indeed mentions one formula  'class ~ ...'
> and then only has classification examples.
>
> So, indeed, the  ?nnet  help page could be improved.
>
> In his case, y are counts,  so  John Tukey's good old
> "first aid transformation" principle would suggest to model
>
> sqrt(y) ~ ..   in a *regression* model which nnet() can do.
>
> Martin Maechler
> ETH Zurich  and  R Core team
>
>
>
> >> On 1 Sep 2020, at 22:19 , Paul Bernal
> >>  wrote:
> >>
> >> Dear friends,
> >>
> >> Hope you are all doing well. I am currently using R
> >> version 4.0.2 and working with the nnet package.
> >>
> >> My dataframe consists of three columns, FECHA which is
> >> the date, x, which is a sequence from 1 to 159, and y,
> >> which is the number of covid cases (I am also providing
> >> the dput for this data frame below).
> >>
> >> I tried fitting a neural net model using the following
> >> code:
> >>
> >> xnew = 1:159 Fit <- nnet(a$y ~ a$x, a, size = 5, maxit =
> >> 1000, lineout = T, decay = 0.001)
> >>
> >> Finally, I attempted to generate predictions with the
> >> following code:
> >>
> >> predictions <- predict(Fit, newdata = list(x = xnew),
> >> type = "raw")
> >>
> >> But obtained extremely odd results: As you can see,
> >> instead of obtaining numbers, more or less in the range
> >> of the last observations of a$y, I end up getting a bunch
> >> of 1s, which doesn't make any sense (if anyone could help
> >> me understand what could be causing this):
> >> dput(predictions) structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim
> >> = c(159L, 1L), .Dimnames = list(c("1", "2", "3", "4",
> >> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
> >> "15", "16", "17", "18", "19", "20", "21", "22", "23",
> >> "24", "25", "26", "27", "28", "29", "30", "31", "32",
> >> "33", "34", "35", "36", "37", "38", "39", "40", "41",
> >> "42", "43", "44", "45", "46", "47", "48", "49", "50",
> >> "51", "52", "53", "54", "55", "56", "57", "58", "59",
> >> "60", "61", "62", "63", "64", "65", "66", "67", "68",
> >> "69", "70", "71", "72", "73", "74", "75", "76", "77",
> >> "78", "79", "80", "81", "82", "83", "84", "85", "86",
> >> "87", "88", "89", "90", "91", "92", "93", "94", "95",
> >> "96", "97", "98", "99", "100", "101", "102", "103",
> >> "104", "105", "106", "107", "108", "109", "110", "111",
> >> "112", "113", "114", "115", "116", "117", "118", "119",
> >> "120", "121", "122", "123", "124", "125", "126", "127",
> >> "128", "129", "130", "131", "132", "133", "134", "135",
> >> "136", "137", "138", "139", "140", "141", "142", "143",
> >> "144", "145", "146", "147", "148", "149", "150", "151",
> >> "152", "153", "154", "155", "156", "157", "158", "159"),
> >> NULL))
> >>
> >> head(a) FECHA x y 1 2020-03-09 1 1 2 2020-03-10 2 8 3
> >> 2020-03-11 3 14 4 2020-03-12 4 27 5 2020-03-13 5 36 6
> >> 2020-03-14 6 43
> >>
> >> dput(a) structure(list(FECHA = structure(c(18330, 18331,
> >> 18332, 18333, 18334, 18335, 18336, 18337, 18338, 18339,
> >> 18340, 18341, 18342, 18343, 18344, 18345, 18346, 18347,
> >> 18348, 18349, 18350, 18351, 18352, 18353, 18354, 18355,
> >> 18356, 18357, 18358, 

Re: [R] Odd Results when generating predictions with nnet function

2020-09-02 Thread Martin Maechler
> peter dalgaard 
> on Wed, 2 Sep 2020 08:41:09 +0200 writes:

> Generically, nnet(a$y ~ a$x, a ...) should be nnet(y ~ x,
> data=a, ...) otherwise predict will go looking for a$x, no
> matter what is in xnew.  

> But more importantly, nnet() is a _classifier_, 
> so the LHS should be a class, not a numeric variable.

> -pd

Well, nnet() can be used for both classification *and* regression,
which is quite clear from the MASS book, but indeed, not from
its help page, which indeed mentions one formula  'class ~ ...'
and then only has classification examples.

So, indeed, the  ?nnet  help page could be improved.

In his case, y are counts,  so  John Tukey's good old
"first aid transformation" principle would suggest to model

sqrt(y) ~ ..   in a *regression* model which nnet() can do.

Martin Maechler
ETH Zurich  and  R Core team



>> On 1 Sep 2020, at 22:19 , Paul Bernal
>>  wrote:
>> 
>> Dear friends,
>> 
>> Hope you are all doing well. I am currently using R
>> version 4.0.2 and working with the nnet package.
>> 
>> My dataframe consists of three columns, FECHA which is
>> the date, x, which is a sequence from 1 to 159, and y,
>> which is the number of covid cases (I am also providing
>> the dput for this data frame below).
>> 
>> I tried fitting a neural net model using the following
>> code:
>> 
>> xnew = 1:159 Fit <- nnet(a$y ~ a$x, a, size = 5, maxit =
>> 1000, lineout = T, decay = 0.001)
>> 
>> Finally, I attempted to generate predictions with the
>> following code:
>> 
>> predictions <- predict(Fit, newdata = list(x = xnew),
>> type = "raw")
>> 
>> But obtained extremely odd results: As you can see,
>> instead of obtaining numbers, more or less in the range
>> of the last observations of a$y, I end up getting a bunch
>> of 1s, which doesn't make any sense (if anyone could help
>> me understand what could be causing this):
>> dput(predictions) structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim
>> = c(159L, 1L), .Dimnames = list(c("1", "2", "3", "4",
>> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
>> "15", "16", "17", "18", "19", "20", "21", "22", "23",
>> "24", "25", "26", "27", "28", "29", "30", "31", "32",
>> "33", "34", "35", "36", "37", "38", "39", "40", "41",
>> "42", "43", "44", "45", "46", "47", "48", "49", "50",
>> "51", "52", "53", "54", "55", "56", "57", "58", "59",
>> "60", "61", "62", "63", "64", "65", "66", "67", "68",
>> "69", "70", "71", "72", "73", "74", "75", "76", "77",
>> "78", "79", "80", "81", "82", "83", "84", "85", "86",
>> "87", "88", "89", "90", "91", "92", "93", "94", "95",
>> "96", "97", "98", "99", "100", "101", "102", "103",
>> "104", "105", "106", "107", "108", "109", "110", "111",
>> "112", "113", "114", "115", "116", "117", "118", "119",
>> "120", "121", "122", "123", "124", "125", "126", "127",
>> "128", "129", "130", "131", "132", "133", "134", "135",
>> "136", "137", "138", "139", "140", "141", "142", "143",
>> "144", "145", "146", "147", "148", "149", "150", "151",
>> "152", "153", "154", "155", "156", "157", "158", "159"),
>> NULL))
>> 
>> head(a) FECHA x y 1 2020-03-09 1 1 2 2020-03-10 2 8 3
>> 2020-03-11 3 14 4 2020-03-12 4 27 5 2020-03-13 5 36 6
>> 2020-03-14 6 43
>> 
>> dput(a) structure(list(FECHA = structure(c(18330, 18331,
>> 18332, 18333, 18334, 18335, 18336, 18337, 18338, 18339,
>> 18340, 18341, 18342, 18343, 18344, 18345, 18346, 18347,
>> 18348, 18349, 18350, 18351, 18352, 18353, 18354, 18355,
>> 18356, 18357, 18358, 18359, 18360, 18361, 18362, 18363,
>> 18364, 18365, 18366, 18367, 18368, 18369, 18370, 18371,
>> 18372, 18373, 18374, 18375, 18376, 18377, 18378, 18379,
>> 18380, 18381, 18382, 18383, 18384, 18385, 18386, 18387,
>> 18388, 18389, 18390, 18391, 18392, 18393, 18394, 18395,
>> 18396, 18397, 18398, 18399, 18400, 18401, 18402, 18403,
>> 18404, 18405, 18406, 18407, 18408, 18409, 18410, 18411,
>> 18412, 18413, 18414, 18415, 18416, 18417, 18418, 18419,
>> 18420, 18421, 18422, 18423, 18424, 18425, 18426, 18427,
>> 18428, 18429, 18430, 18431, 18432, 18433, 18434, 18435,
>> 18436, 18437, 18438, 18439, 18440, 18441, 18442, 18443,
>> 18444, 18445, 18446, 18447, 18448, 18449, 18450, 18451,
>> 18452, 18453, 18454, 18455, 18456, 18457, 

Re: [R] Odd Results when generating predictions with nnet function

2020-09-02 Thread peter dalgaard
Generically, nnet(a$y ~ a$x, a ...) should be nnet(y ~ x, data=a, ...) 
otherwise predict will go looking for a$x, no matter what is in xnew. 

But more importantly, nnet() is a _classifier_, so the LHS should be a class, 
not a numeric variable.

-pd
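Taken together, a hedged sketch of what a corrected call might look like (untested here, and assuming the poster's data frame `a` with columns x and y). One further observation: the argument controlling regression output in nnet() is spelled `linout`, not `lineout`; a misspelled argument is silently absorbed by `...`, leaving the default logistic output units bounded in (0,1), which could by itself explain the all-1 predictions.

```r
## Sketch only -- assumes the data frame `a` from the original post.
library(nnet)

set.seed(42)
Fit <- nnet(y ~ x, data = a, size = 5, maxit = 1000,
            linout = TRUE,        # note: `linout`, not `lineout`
            decay = 0.001)

## With the formula interface, predict() can now resolve newdata:
predictions <- predict(Fit, newdata = data.frame(x = 1:159))
```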

> On 1 Sep 2020, at 22:19 , Paul Bernal  wrote:
> 
> Dear friends,
> 
> Hope you are all doing well. I am currently using R version 4.0.2 and
> working with the nnet package.
> 
> My dataframe consists of three columns, FECHA which is the date, x, which
> is a sequence from 1 to 159, and y, which is the number of covid cases (I
> am also providing the dput for this data frame below).
> 
> I tried fitting a neural net model using the following code:
> 
> xnew = 1:159
> Fit <- nnet(a$y ~ a$x, a, size = 5, maxit = 1000, lineout = T, decay =
> 0.001)
> 
> Finally, I attempted to generate predictions with the following code:
> 
> predictions <- predict(Fit, newdata = list(x = xnew), type = "raw")
> 
> But obtained extremely odd results:
> As you can see, instead of obtaining numbers, more or less in the range of
> the last observations  of a$y, I end up getting a bunch of 1s, which
> doesn't make any sense (if anyone could help me understand what could be
> causing this):
> dput(predictions)
> structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim = c(159L,
> 1L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8",
> "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
> "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
> "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
> "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52",
> "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
> "64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74",
> "75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85",
> "86", "87", "88", "89", "90", "91", "92", "93", "94", "95", "96",
> "97", "98", "99", "100", "101", "102", "103", "104", "105", "106",
> "107", "108", "109", "110", "111", "112", "113", "114", "115",
> "116", "117", "118", "119", "120", "121", "122", "123", "124",
> "125", "126", "127", "128", "129", "130", "131", "132", "133",
> "134", "135", "136", "137", "138", "139", "140", "141", "142",
> "143", "144", "145", "146", "147", "148", "149", "150", "151",
> "152", "153", "154", "155", "156", "157", "158", "159"), NULL))
> 
> head(a)
>        FECHA x  y
> 1 2020-03-09 1  1
> 2 2020-03-10 2  8
> 3 2020-03-11 3 14
> 4 2020-03-12 4 27
> 5 2020-03-13 5 36
> 6 2020-03-14 6 43
> 
> dput(a)
> structure(list(FECHA = structure(c(18330, 18331, 18332, 18333,
> 18334, 18335, 18336, 18337, 18338, 18339, 18340, 18341, 18342,
> 18343, 18344, 18345, 18346, 18347, 18348, 18349, 18350, 18351,
> 18352, 18353, 18354, 18355, 18356, 18357, 18358, 18359, 18360,
> 18361, 18362, 18363, 18364, 18365, 18366, 18367, 18368, 18369,
> 18370, 18371, 18372, 18373, 18374, 18375, 18376, 18377, 18378,
> 18379, 18380, 18381, 18382, 18383, 18384, 18385, 18386, 18387,
> 18388, 18389, 18390, 18391, 18392, 18393, 18394, 18395, 18396,
> 18397, 18398, 18399, 18400, 18401, 18402, 18403, 18404, 18405,
> 18406, 18407, 18408, 18409, 18410, 18411, 18412, 18413, 18414,
> 18415, 18416, 18417, 18418, 18419, 18420, 18421, 18422, 18423,
> 18424, 18425, 18426, 18427, 18428, 18429, 18430, 18431, 18432,
> 18433, 18434, 18435, 18436, 18437, 18438, 18439, 18440, 18441,
> 18442, 18443, 18444, 18445, 18446, 18447, 18448, 18449, 18450,
> 18451, 18452, 18453, 18454, 18455, 18456, 18457, 18458, 18459,
> 18460, 18461, 18462, 18463, 18464, 18465, 18466, 18467, 18468,
> 18469, 18470, 18471, 18472, 18473, 18474, 18475, 18476, 18477,
> 18478, 18479, 18480, 18481, 18482, 18483, 18484, 18485, 18486,
> 18487, 18488), class = "Date"), x = 1:159, y = c(1, 8, 14, 27,
> 36, 43, 55, 69, 86, 109, 137, 200, 245, 313, 345, 443, 558, 674,
> 786, 901, 989, 1075, 1181, 1317, 1475, 1673, 1801, 1988, 2100,
> 2249, 2528, 2752, 2974, 3234, 3400, 3472, 3574, 3751, 4016, 4210,
> 4273, 4467, 4658, 4821, 4992, 5166, 5338, 5538, 5779, 6021, 6200,
> 6378, 6532, 6720, 7090, 7197, 7387, 7523, 7731, 7868, 8070, 8282,
> 8448, 8616, 8783, 8944, 9118, 9268, 9449, 9606, 9726, 9867, 9977,
> 10116, 10267, 10577, 10926, 11183, 11447, 11728, 12131, 12531,
> 13015, 13463, 13837, 14095, 14609, 15044, 15463, 16004, 16425,
> 16854, 17233, 17889, 18586, 19211, 20059, 20686, 21422, 21962,
> 22597, 23351, 24274, 25222, 26030, 26752, 27314, 28030, 29037,
> 29905, 30658, 31686, 32785, 33550, 34463, 35237, 35995, 36983,
> 38149, 39334, 

[R] Odd Results when generating predictions with nnet function

2020-09-01 Thread Paul Bernal
Dear friends,

Hope you are all doing well. I am currently using R version 4.0.2 and
working with the nnet package.

My dataframe consists of three columns, FECHA which is the date, x, which
is a sequence from 1 to 159, and y, which is the number of covid cases (I
am also providing the dput for this data frame below).

I tried fitting a neural net model using the following code:

xnew = 1:159
Fit <- nnet(a$y ~ a$x, a, size = 5, maxit = 1000, lineout = T, decay =
0.001)

Finally, I attempted to generate predictions with the following code:

predictions <- predict(Fit, newdata = list(x = xnew), type = "raw")

But obtained extremely odd results:
As you can see, instead of obtaining numbers, more or less in the range of
the last observations  of a$y, I end up getting a bunch of 1s, which
doesn't make any sense (if anyone could help me understand what could be
causing this):
dput(predictions)
structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim = c(159L,
1L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52",
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
"64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74",
"75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85",
"86", "87", "88", "89", "90", "91", "92", "93", "94", "95", "96",
"97", "98", "99", "100", "101", "102", "103", "104", "105", "106",
"107", "108", "109", "110", "111", "112", "113", "114", "115",
"116", "117", "118", "119", "120", "121", "122", "123", "124",
"125", "126", "127", "128", "129", "130", "131", "132", "133",
"134", "135", "136", "137", "138", "139", "140", "141", "142",
"143", "144", "145", "146", "147", "148", "149", "150", "151",
"152", "153", "154", "155", "156", "157", "158", "159"), NULL))

head(a)
       FECHA x  y
1 2020-03-09 1  1
2 2020-03-10 2  8
3 2020-03-11 3 14
4 2020-03-12 4 27
5 2020-03-13 5 36
6 2020-03-14 6 43

dput(a)
structure(list(FECHA = structure(c(18330, 18331, 18332, 18333,
18334, 18335, 18336, 18337, 18338, 18339, 18340, 18341, 18342,
18343, 18344, 18345, 18346, 18347, 18348, 18349, 18350, 18351,
18352, 18353, 18354, 18355, 18356, 18357, 18358, 18359, 18360,
18361, 18362, 18363, 18364, 18365, 18366, 18367, 18368, 18369,
18370, 18371, 18372, 18373, 18374, 18375, 18376, 18377, 18378,
18379, 18380, 18381, 18382, 18383, 18384, 18385, 18386, 18387,
18388, 18389, 18390, 18391, 18392, 18393, 18394, 18395, 18396,
18397, 18398, 18399, 18400, 18401, 18402, 18403, 18404, 18405,
18406, 18407, 18408, 18409, 18410, 18411, 18412, 18413, 18414,
18415, 18416, 18417, 18418, 18419, 18420, 18421, 18422, 18423,
18424, 18425, 18426, 18427, 18428, 18429, 18430, 18431, 18432,
18433, 18434, 18435, 18436, 18437, 18438, 18439, 18440, 18441,
18442, 18443, 18444, 18445, 18446, 18447, 18448, 18449, 18450,
18451, 18452, 18453, 18454, 18455, 18456, 18457, 18458, 18459,
18460, 18461, 18462, 18463, 18464, 18465, 18466, 18467, 18468,
18469, 18470, 18471, 18472, 18473, 18474, 18475, 18476, 18477,
18478, 18479, 18480, 18481, 18482, 18483, 18484, 18485, 18486,
18487, 18488), class = "Date"), x = 1:159, y = c(1, 8, 14, 27,
36, 43, 55, 69, 86, 109, 137, 200, 245, 313, 345, 443, 558, 674,
786, 901, 989, 1075, 1181, 1317, 1475, 1673, 1801, 1988, 2100,
2249, 2528, 2752, 2974, 3234, 3400, 3472, 3574, 3751, 4016, 4210,
4273, 4467, 4658, 4821, 4992, 5166, 5338, 5538, 5779, 6021, 6200,
6378, 6532, 6720, 7090, 7197, 7387, 7523, 7731, 7868, 8070, 8282,
8448, 8616, 8783, 8944, 9118, 9268, 9449, 9606, 9726, 9867, 9977,
10116, 10267, 10577, 10926, 11183, 11447, 11728, 12131, 12531,
13015, 13463, 13837, 14095, 14609, 15044, 15463, 16004, 16425,
16854, 17233, 17889, 18586, 19211, 20059, 20686, 21422, 21962,
22597, 23351, 24274, 25222, 26030, 26752, 27314, 28030, 29037,
29905, 30658, 31686, 32785, 33550, 34463, 35237, 35995, 36983,
38149, 39334, 40291, 41251, 42216, 43257, 44352, 45633, 47177,
48096, 49243, 50373, 51408, 52261, 53468, 54426, 55153, 55906,
56817, 57993, 58864, 60296, 61442, 62223, 63269, 64191, 65256,
66383, 67453, 68456, 69424, 70231, 71418, 72560, 73651, 74492,
75394, 76464, 77377, 78446, 79402)), row.names = c(NA, 159L), class =
"data.frame")
Any help and/or guidance will be greatly appreciated,

Cheers,

Paul


Re: [R] unstable results of nlxb fit

2020-05-11 Thread PIKAL Petr
Dear all.

Thank you for your answers. I will try Duncan's approach (if I can manage it).

The issue is that the first part of my data (actually temperature) up to a 
certain time approximately follows one exponential. After that, another process 
prevails and the temperature increase becomes "explosive". That is why I used 
these two exponentials. As I have many experiments, I wanted to perform the 
fit programmatically.

This leads me to the approach of producing a plot in each cycle, which I 
visually inspect. If I consider the fit satisfactory, I keep the results. If 
not, I repeat the fit with different starting values until it is OK. I am aware 
that this is not optimal, but it should be the easiest approach.

Thank you again.

Best regards
Petr

> -Original Message-
> From: J C Nash 
> Sent: Friday, May 8, 2020 12:00 AM
> To: Bernard McGarvey ; PIKAL Petr
> ; r-help 
> Subject: Re: [R] unstable results of nlxb fit
>
> These results reflect my experience with this sort of problem.
>
> A couple of comments:
>
> 1) optimx package has a multistart wrapper. I probably should have written
> one for nlsr. Maybe Bernard and I should work on that. The issues are 
> largely
> to make things resistant to silly inputs, which even the smart users (you 
> know,
> the ones looking back from the mirror) introduce.
>
> 2) Sometimes using the bounds constraint capability in nlsr can be helpful,
> e.g., to ensure the exponent parameters are kept apart, can be useful.
>
> 3) Combining with Duncan's suggestion of solving for the linear parameters
> also helps.
>
> All of the above can be sensitive to particular data.
>
> Best, JN
>
> On 2020-05-07 5:41 p.m., Bernard McGarvey wrote:
> > John/Petr, I think there is an issue between a global optimum and local
> optima. I added a multistart loop around the code to see if I could find
> different solutions. Here is the code I added (I am not a great coder so 
> please
> excuse any inefficiencies in this code segment):
> >
> > # Multistart approach
> > NT <- 100
> > Results <- matrix(data=NA, nrow = NT, ncol=5,
> > dimnames=list(NULL,c("SS", "A", "B", "a", "b")))
> > A1 <- runif(NT,0,100)
> > B1 <- runif(NT,0,100)
> > a1 <- runif(NT,0.0,0.1)
> > b1 <- runif(NT,0.0,0.1)
> > for (I in 1:NT) {
> >   if (A1[I] > B1[I]) { # Ensure that the A's are always the lower so that
> > nlxb() always converges to the same values
> > A0 <- A1[I]
> > a0 <- a1[I]
> > A1[I] <- B1[I]
> > a1[I] <- b1[I]
> > B1[I] <- A0
> > b1[I] <- a0
> >   }
> >   fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
> >   start=list(A=A1[I], B=B1[I], a=a1[I], b=b1[I]))
> >   ccc <- coef(fit)
> >   Results[I,1] <- fit$ssquares
> >   Results[I,2] <- ccc[1]
> >   Results[I,3] <- ccc[2]
> >   Results[I,4] <- ccc[3]
> >   Results[I,5] <- ccc[4]
> > }
> > Results
> >
> > What I found is that the minimum SS generated at each trial had two
> distinct values, 417.8 and 3359.2. The A,B,a, and b values when the SS was
> 417.8 were all the same but I got different values for the case where the
> minimal SS was 3359.2. This indicates that the SS=417.8 may be the global
> minimum solution whereas the others are local optima. Here are the iteration
> results for a 100 trial multistart:
> >
> > Results
> >            SS           A           B          a           b
> >   [1,] 3359.2  8.3546e+03  6.8321e+00  -1.988226  2.6139e-02
> >   [2,] 3359.2  8.2865e+03  6.8321e+00  -5.201735  2.6139e-02
> >   [3,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
> >   [4,] 3359.2  6.8321e+00  7.7888e+02   0.026139 -7.2812e-01
> >   [5,] 3359.2 -3.9020e+01  4.5852e+01   0.026139  2.6139e-02
> >   [6,] 3359.2  6.8321e+00  2.6310e+02   0.026139 -1.8116e+00
> >   [7,] 3359.2 -2.1509e+01  2.8341e+01   0.026139  2.6139e-02
> >   [8,] 3359.2 -3.8075e+01  4.4908e+01   0.026139  2.6139e-02
> >   [9,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
> >  [10,] 3359.2  1.2466e+04  6.8321e+00  -4.196000  2.6139e-02
> >  [11,]  417.8  9.7727e+00  3.9452e-13   0.021798  2.8023e-01
> >  [12,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
> >  [13,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
> >  [14,] 3359.2  3.8018e+02  6.8321e+00  -0.806414  2.6139e-02
> >  [15,] 3359.2 -3.1921e+00  1.0024e+01   0.026139  2.6139e-02
> >  [16,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
> >  [17,] 3359.2 -1.5938e+01  2.2770e+01   

Re: [R] unstable results of nlxb fit

2020-05-07 Thread J C Nash
These results reflect my experience with this sort of problem.

A couple of comments:

1) optimx package has a multistart wrapper. I probably should have written one 
for
nlsr. Maybe Bernard and I should work on that. The issues are largely to make 
things
resistant to silly inputs, which even the smart users (you know, the ones 
looking
back from the mirror) introduce.

2) Sometimes using the bounds constraint capability in nlsr can be helpful, 
e.g.,
to ensure the exponent parameters are kept apart, can be useful.

3) Combining with Duncan's suggestion of solving for the linear parameters also
helps.

All of the above can be sensitive to particular data.

Best, JN

On 2020-05-07 5:41 p.m., Bernard McGarvey wrote:
> John/Petr, I think there is an issue between a global optimum and local 
> optima. I added a multistart loop around the code to see if I could find 
> different solutions. Here is the code I added (I am not a great coder so 
> please excuse any inefficiencies in this code segment):
> 
> # Multistart approach
> NT <- 100
> Results <- matrix(data=NA, nrow = NT, ncol=5, dimnames=list(NULL,c("SS", "A", 
> "B", "a", "b")))
> A1 <- runif(NT,0,100)
> B1 <- runif(NT,0,100)
> a1 <- runif(NT,0.0,0.1)
> b1 <- runif(NT,0.0,0.1)
> for (I in 1:NT) {
>   if (A1[I] > B1[I]) { # Ensure that the A's are always the lower so that
> nlxb() always converges to the same values
> A0 <- A1[I]
> a0 <- a1[I]
> A1[I] <- B1[I]
> a1[I] <- b1[I]
> B1[I] <- A0
> b1[I] <- a0
>   }
>   fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
>   start=list(A=A1[I], B=B1[I], a=a1[I], b=b1[I]))
>   ccc <- coef(fit)
>   Results[I,1] <- fit$ssquares
>   Results[I,2] <- ccc[1]
>   Results[I,3] <- ccc[2]
>   Results[I,4] <- ccc[3]
>   Results[I,5] <- ccc[4]
> }
> Results
> 
> What I found is that the minimum SS generated at each trial had two distinct 
> values, 417.8 and 3359.2. The A,B,a, and b values when the SS was 417.8 were 
> all the same but I got different values for the case where the minimal SS was 
> 3359.2. This indicates that the SS=417.8 may be the global minimum solution 
> whereas the others are local optima. Here are the iteration results for a 100 
> trial multistart:
> 
> Results
>            SS           A           B          a           b
>   [1,] 3359.2  8.3546e+03  6.8321e+00  -1.988226  2.6139e-02
>   [2,] 3359.2  8.2865e+03  6.8321e+00  -5.201735  2.6139e-02
>   [3,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>   [4,] 3359.2  6.8321e+00  7.7888e+02   0.026139 -7.2812e-01
>   [5,] 3359.2 -3.9020e+01  4.5852e+01   0.026139  2.6139e-02
>   [6,] 3359.2  6.8321e+00  2.6310e+02   0.026139 -1.8116e+00
>   [7,] 3359.2 -2.1509e+01  2.8341e+01   0.026139  2.6139e-02
>   [8,] 3359.2 -3.8075e+01  4.4908e+01   0.026139  2.6139e-02
>   [9,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [10,] 3359.2  1.2466e+04  6.8321e+00  -4.196000  2.6139e-02
>  [11,]  417.8  9.7727e+00  3.9452e-13   0.021798  2.8023e-01
>  [12,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [13,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [14,] 3359.2  3.8018e+02  6.8321e+00  -0.806414  2.6139e-02
>  [15,] 3359.2 -3.1921e+00  1.0024e+01   0.026139  2.6139e-02
>  [16,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [17,] 3359.2 -1.5938e+01  2.2770e+01   0.026139  2.6139e-02
>  [18,] 3359.2 -3.1205e+01  3.8037e+01   0.026139  2.6139e-02
>  [19,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [20,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [21,] 3359.2  8.6627e+03  6.8321e+00  -3.319778  2.6139e-02
>  [22,] 3359.2  6.8321e+00  1.9318e+01   0.026139 -6.5773e-01
>  [23,] 3359.2  6.2991e+01 -5.6159e+01   0.026139  2.6139e-02
>  [24,] 3359.2  2.8865e-03  6.8321e+00  -1.576307  2.6139e-02
>  [25,] 3359.2 -1.2496e+01  1.9328e+01   0.026139  2.6139e-02
>  [26,] 3359.2 -5.9432e+00  1.2775e+01   0.026139  2.6139e-02
>  [27,] 3359.2  1.6884e+02  6.8321e+00 -211.866423  2.6139e-02
>  [28,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [29,] 3359.2  5.4972e+03  6.8321e+00  -3.432094  2.6139e-02
>  [30,] 3359.2  6.8321e+00  1.4427e+03   0.026139 -4.2771e+02
>  [31,]  417.8  9.7727e+00  3.9452e-13   0.021798  2.8023e-01
>  [32,] 3359.2  3.5760e+01 -2.8928e+01   0.026139  2.6139e-02
>  [33,] 3359.2  6.8321e+00 -4.0737e+02   0.026139 -6.7152e-01
>  [34,] 3359.2  6.8321e+00  1.2638e+04   0.026139 -2.8070e+00
>  [35,] 3359.2  1.1813e+01 -4.9807e+00   0.026139  2.6139e-02
>  [36,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [37,] 3359.2  6.8321e+00  1.2281e+03   0.026139 -3.0702e+02
>  [38,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [39,] 3359.2 -2.6850e+01  3.3682e+01   0.026139  2.6139e-02
>  [40,]  417.8  3.9452e-13  9.7727e+00   0.280227  2.1798e-02
>  [41,]  417.8  9.7727e+00  3.9452e-13   0.021798  2.8023e-01
>  [42,] 3359.2 -2.3279e+01  3.0111e+01   0.026139  

Re: [R] unstable results of nlxb fit

2020-05-07 Thread Bernard McGarvey
John/Petr, I think the issue here is one of a global optimum versus local
optima. I added a multistart loop around the code to see whether I could find
different solutions. Here is the code I added (I am not a great coder, so
please excuse any inefficiencies in this code segment):

# Multistart approach
NT <- 100
Results <- matrix(data=NA, nrow = NT, ncol=5, dimnames=list(NULL,c("SS", "A", 
"B", "a", "b")))
A1 <- runif(NT,0,100)
B1 <- runif(NT,0,100)
a1 <- runif(NT,0.0,0.1)
b1 <- runif(NT,0.0,0.1)
for (I in 1:NT) {
  if (A1[I] > B1[I]) { # ensure A is always the smaller starting value so
    # that nlxb() always converges to the same labelled solution
A0 <- A1[I]
a0 <- a1[I]
A1[I] <- B1[I]
a1[I] <- b1[I]
B1[I] <- A0
b1[I] <- a0
  }
  fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
  start=list(A=A1[I], B=B1[I], a=a1[I], b=b1[I]))
  ccc <- coef(fit)
  Results[I,1] <- fit$ssquares
  Results[I,2] <- ccc[1]
  Results[I,3] <- ccc[2]
  Results[I,4] <- ccc[3]
  Results[I,5] <- ccc[4]
}
Results

What I found is that the minimal SS reached in each trial took one of two
distinct values, 417.8 and 3359.2. The A, B, a, and b values were identical
whenever the SS was 417.8, but differed between trials whenever the minimal
SS was 3359.2. This suggests that SS = 417.8 is the global minimum, whereas
the others are local optima. Here are the iteration results for a 100-trial
multistart:

Results
            SS           A           B            a           b
  [1,] 3359.2  8.3546e+03  6.8321e+00   -1.988226  2.6139e-02
  [2,] 3359.2  8.2865e+03  6.8321e+00   -5.201735  2.6139e-02
  [3,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
  [4,] 3359.2  6.8321e+00  7.7888e+02    0.026139 -7.2812e-01
  [5,] 3359.2 -3.9020e+01  4.5852e+01    0.026139  2.6139e-02
  [6,] 3359.2  6.8321e+00  2.6310e+02    0.026139 -1.8116e+00
  [7,] 3359.2 -2.1509e+01  2.8341e+01    0.026139  2.6139e-02
  [8,] 3359.2 -3.8075e+01  4.4908e+01    0.026139  2.6139e-02
  [9,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [10,] 3359.2  1.2466e+04  6.8321e+00   -4.196000  2.6139e-02
 [11,]  417.8  9.7727e+00  3.9452e-13    0.021798  2.8023e-01
 [12,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [13,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [14,] 3359.2  3.8018e+02  6.8321e+00   -0.806414  2.6139e-02
 [15,] 3359.2 -3.1921e+00  1.0024e+01    0.026139  2.6139e-02
 [16,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [17,] 3359.2 -1.5938e+01  2.2770e+01    0.026139  2.6139e-02
 [18,] 3359.2 -3.1205e+01  3.8037e+01    0.026139  2.6139e-02
 [19,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [20,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [21,] 3359.2  8.6627e+03  6.8321e+00   -3.319778  2.6139e-02
 [22,] 3359.2  6.8321e+00  1.9318e+01    0.026139 -6.5773e-01
 [23,] 3359.2  6.2991e+01 -5.6159e+01    0.026139  2.6139e-02
 [24,] 3359.2  2.8865e-03  6.8321e+00   -1.576307  2.6139e-02
 [25,] 3359.2 -1.2496e+01  1.9328e+01    0.026139  2.6139e-02
 [26,] 3359.2 -5.9432e+00  1.2775e+01    0.026139  2.6139e-02
 [27,] 3359.2  1.6884e+02  6.8321e+00 -211.866423  2.6139e-02
 [28,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [29,] 3359.2  5.4972e+03  6.8321e+00   -3.432094  2.6139e-02
 [30,] 3359.2  6.8321e+00  1.4427e+03    0.026139 -4.2771e+02
 [31,]  417.8  9.7727e+00  3.9452e-13    0.021798  2.8023e-01
 [32,] 3359.2  3.5760e+01 -2.8928e+01    0.026139  2.6139e-02
 [33,] 3359.2  6.8321e+00 -4.0737e+02    0.026139 -6.7152e-01
 [34,] 3359.2  6.8321e+00  1.2638e+04    0.026139 -2.8070e+00
 [35,] 3359.2  1.1813e+01 -4.9807e+00    0.026139  2.6139e-02
 [36,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [37,] 3359.2  6.8321e+00  1.2281e+03    0.026139 -3.0702e+02
 [38,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [39,] 3359.2 -2.6850e+01  3.3682e+01    0.026139  2.6139e-02
 [40,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [41,]  417.8  9.7727e+00  3.9452e-13    0.021798  2.8023e-01
 [42,] 3359.2 -2.3279e+01  3.0111e+01    0.026139  2.6139e-02
 [43,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [44,] 3359.2  6.8321e+00  1.4550e+03    0.026139 -4.0303e+00
 [45,] 3359.2 -1.1386e+01  1.8218e+01    0.026139  2.6139e-02
 [46,] 3359.2  8.8026e+02  6.8321e+00  -65.430608  2.6139e-02
 [47,] 3359.2 -8.1985e+00  1.5031e+01    0.026139  2.6139e-02
 [48,] 3359.2 -6.7767e+00  1.3609e+01    0.026139  2.6139e-02
 [49,] 3359.2 -1.1436e+01  1.8268e+01    0.026139  2.6139e-02
 [50,] 3359.2  1.0710e+04  6.8321e+00   -2.349659  2.6139e-02
 [51,]  417.8  9.7727e+00  3.9452e-13    0.021798  2.8023e-01
 [52,] 3359.2  6.8321e+00  7.1837e+02    0.026139 -7.4681e-01
 [53,]  417.8  3.9452e-13  9.7727e+00    0.280227  2.1798e-02
 [54,]  417.8  9.7727e+00  3.9452e-13    0.021798  2.8023e-01
 [55,] 3359.2 -4.8774e+00  6.8321e+00  -16.405584  2.6139e-02
 [56,] 3359.2  1.2687e+03  6.8321e+00   -3.775998  2.6139e-02
 [57,] 3359.2  

Re: [R] unstable results of nlxb fit

2020-05-07 Thread Duncan Murdoch
As John said, sums of exponentials are hard.  One thing that often helps 
a lot is to use the partially linear structure:  given a and b, you've 
got a linear model to compute A and B.  Now that you're down to two 
nonlinear parameters, you can draw a contour plot of nearby values to 
see how much of a mess you're dealing with.
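
To make the suggestion concrete, here is a minimal sketch (my own
illustration, not Duncan's code; the helper rss_ab and the synthetic data
are assumptions for the demo). For fixed (a, b) the model
A*exp(a*x) + B*exp(b*x) is linear in A and B, so lm.fit() can profile them
out, leaving a two-parameter sum-of-squares surface that can be contoured:

```r
# For a fixed (a, b), solve the linear least-squares problem in (A, B)
# and return the residual sum of squares.
rss_ab <- function(a, b, x, y) {
  X <- cbind(exp(a * x), exp(b * x))
  fit <- lm.fit(X, y)          # profiles out A and B for this (a, b)
  sum(fit$residuals^2)
}

# Demo on synthetic data with known a = 0.05, b = 0.02 (hypothetical values)
x <- seq(50, 116, length.out = 90)
y <- 2 * exp(0.05 * x) + 10 * exp(0.02 * x)

a.grid <- seq(0.03, 0.07, length.out = 41)
b.grid <- seq(0.005, 0.035, length.out = 41)
z <- outer(a.grid, b.grid,
           Vectorize(function(a, b) rss_ab(a, b, x, y)))
contour(a.grid, b.grid, log10(z + 1e-12), xlab = "a", ylab = "b",
        main = "log10 RSS over (a, b), with A and B profiled out")
```

With real data such surfaces typically show a long, curved valley, which is
one way to see why optimizers started in different places stop at different
parameter sets.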


Duncan Murdoch

On 07/05/2020 9:12 a.m., PIKAL Petr wrote:

Dear all

I started to use nlxb instead of nls to get rid of singular gradient error.
I try to fit double exponential function to my data, but results I obtain
are strongly dependent on starting values.

tsmes ~ A*exp(a*plast) + B* exp(b*plast)

Changing b from 0.1 to 0.01 gives me completely different results. I usually
check result by a plot but could the result be inspected if it achieved good
result without plotting?

Or is there any way how to perform such task?

Cheers
Petr

Below is working example.


dput(temp)

temp <- structure(list(tsmes = c(31, 32, 32, 32, 32, 32, 32, 32, 33,
34, 35, 35, 36, 36, 36, 37, 38, 39, 40, 40, 40, 40, 40, 41, 43,
44, 44, 44, 46, 47, 47, 47, 47, 48, 49, 51, 51, 51, 52, 53, 54,
54, 55, 57, 57, 57, 59, 59, 60, 62, 63, 64, 65, 66, 66, 67, 67,
68, 70, 72, 74, 76, 78, 81, 84, 85, 86, 88, 90, 91, 92, 94, 96,
97, 99, 100, 102, 104, 106, 109, 112, 115, 119, 123, 127, 133,
141, 153, 163, 171), plast = c(50, 51, 52, 52, 53, 53, 53, 54,
55, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 65, 65, 66,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 75, 76, 76, 77, 77, 78,
78, 79, 80, 81, 82, 83, 84, 85, 85, 86, 86, 87, 88, 88, 89, 90,
91, 91, 93, 93, 94, 95, 96, 96, 97, 98, 98, 99, 100, 100, 101,
102, 103, 103, 104, 105, 106, 107, 107, 108, 109, 110, 111, 112,
112, 113, 113, 114, 115, 116)), row.names = 2411:2500, class = "data.frame")

library(nlsr)

fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
start=list(A=1, B=15, a=0.025, b=0.01))
coef(fit)
           A            B            a            b
3.945167e-13 9.772749e+00 2.802274e-01 2.179781e-02

plot(temp$plast, temp$tsmes, ylim=c(0,200))
lines(temp$plast, predict(fit, newdata=temp), col="pink", lwd=3)
ccc <- coef(fit)
lines(0:120,ccc[1]*exp(ccc[3]*(0:120)))
lines(0:120,ccc[2]*exp(ccc[4]*(0:120)), lty=3, lwd=2)

# wrong fit with slightly different b
fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
start=list(A=1, B=15, a=0.025, b=0.1))
coef(fit)
           A            B            a            b
2911.6448377    6.8320597  -49.1373979    0.0261391
lines(temp$plast, predict(fit, newdata=temp), col="red", lwd=3)
ccc <- coef(fit)
lines(0:120,ccc[1]*exp(ccc[3]*(0:120)), col="red")
lines(0:120,ccc[2]*exp(ccc[4]*(0:120)), lty=3, lwd=2, col="red")


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] unstable results of nlxb fit

2020-05-07 Thread J C Nash
The double exponential is well-known as a disaster to fit. Lanczos in his
1956 book Applied Analysis, p. 276 gives a good example which is worked through.
I've included it with scripts using nlxb in my 2014 book on Nonlinear Parameter
Optimization Using R Tools (Wiley). The scripts were on Wiley's site for the 
book,
but I've had difficulty getting Wiley to fix things and not checked lately if it
is still accessible. Ask off-list if you want the script and I'll dig into my
archives.

nlxb (preferably from nlsr, which you used, rather than nlmrt, which is no
longer maintained) will likely do as well as any general-purpose code. There may be
special approaches that do a bit better, but I suspect the reality is that
the underlying problem is such that there are many sets of parameters with
widely different values that will get quite similar sums of squares.
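
As a plot-free check for Petr's question upthread, one can compare the
achieved sum of squares against the spread of the response (a sketch;
check_fit is my hypothetical helper, and it assumes the nlxb result carries
the sum of squares in $ssquares, as used elsewhere in this thread):

```r
# Hypothetical helper: flag a fit as "good" when the residual standard
# deviation is small relative to the spread of the response.
check_fit <- function(fit, y, tol = 0.1) {
  rsd <- sqrt(fit$ssquares / length(y))  # residual SD from the achieved SS
  rsd / sd(y) < tol                      # TRUE if residuals are small vs. sd(y)
}
# e.g. check_fit(fit, temp$tsmes) after the nlxb() call
```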

Best, JN


On 2020-05-07 9:12 a.m., PIKAL Petr wrote:
> Dear all
> 
> I started to use nlxb instead of nls to get rid of singular gradient error.
> I try to fit double exponential function to my data, but results I obtain
> are strongly dependent on starting values. 
> 
> tsmes ~ A*exp(a*plast) + B* exp(b*plast)
> 
> Changing b from 0.1 to 0.01 gives me completely different results. I usually
> check result by a plot but could the result be inspected if it achieved good
> result without plotting?
> 
> Or is there any way how to perform such task?
> 
> Cheers
> Petr
> 
> Below is working example.
> 
>> dput(temp)
> temp <- structure(list(tsmes = c(31, 32, 32, 32, 32, 32, 32, 32, 33, 
> 34, 35, 35, 36, 36, 36, 37, 38, 39, 40, 40, 40, 40, 40, 41, 43, 
> 44, 44, 44, 46, 47, 47, 47, 47, 48, 49, 51, 51, 51, 52, 53, 54, 
> 54, 55, 57, 57, 57, 59, 59, 60, 62, 63, 64, 65, 66, 66, 67, 67, 
> 68, 70, 72, 74, 76, 78, 81, 84, 85, 86, 88, 90, 91, 92, 94, 96, 
> 97, 99, 100, 102, 104, 106, 109, 112, 115, 119, 123, 127, 133, 
> 141, 153, 163, 171), plast = c(50, 51, 52, 52, 53, 53, 53, 54, 
> 55, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 65, 65, 66, 
> 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 75, 76, 76, 77, 77, 78, 
> 78, 79, 80, 81, 82, 83, 84, 85, 85, 86, 86, 87, 88, 88, 89, 90, 
> 91, 91, 93, 93, 94, 95, 96, 96, 97, 98, 98, 99, 100, 100, 101, 
> 102, 103, 103, 104, 105, 106, 107, 107, 108, 109, 110, 111, 112, 
> 112, 113, 113, 114, 115, 116)), row.names = 2411:2500, class = "data.frame")
> 
> library(nlsr)
> 
> fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
> start=list(A=1, B=15, a=0.025, b=0.01))
> coef(fit)
>            A            B            a            b
> 3.945167e-13 9.772749e+00 2.802274e-01 2.179781e-02 
> 
> plot(temp$plast, temp$tsmes, ylim=c(0,200))
> lines(temp$plast, predict(fit, newdata=temp), col="pink", lwd=3)
> ccc <- coef(fit)
> lines(0:120,ccc[1]*exp(ccc[3]*(0:120)))
> lines(0:120,ccc[2]*exp(ccc[4]*(0:120)), lty=3, lwd=2)
> 
> # wrong fit with slightly different b
> fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
> start=list(A=1, B=15, a=0.025, b=0.1))
> coef(fit)
>            A            B            a            b
> 2911.6448377    6.8320597  -49.1373979    0.0261391
> lines(temp$plast, predict(fit, newdata=temp), col="red", lwd=3)
> ccc <- coef(fit)
> lines(0:120,ccc[1]*exp(ccc[3]*(0:120)), col="red")
> lines(0:120,ccc[2]*exp(ccc[4]*(0:120)), lty=3, lwd=2, col="red")
> 
> 



[R] unstable results of nlxb fit

2020-05-07 Thread PIKAL Petr
Dear all

I started to use nlxb instead of nls to get rid of singular gradient error.
I try to fit double exponential function to my data, but results I obtain
are strongly dependent on starting values. 

tsmes ~ A*exp(a*plast) + B* exp(b*plast)

Changing b from 0.1 to 0.01 gives me completely different results. I usually
check result by a plot but could the result be inspected if it achieved good
result without plotting?

Or is there any way how to perform such task?

Cheers
Petr

Below is working example.

> dput(temp)
temp <- structure(list(tsmes = c(31, 32, 32, 32, 32, 32, 32, 32, 33, 
34, 35, 35, 36, 36, 36, 37, 38, 39, 40, 40, 40, 40, 40, 41, 43, 
44, 44, 44, 46, 47, 47, 47, 47, 48, 49, 51, 51, 51, 52, 53, 54, 
54, 55, 57, 57, 57, 59, 59, 60, 62, 63, 64, 65, 66, 66, 67, 67, 
68, 70, 72, 74, 76, 78, 81, 84, 85, 86, 88, 90, 91, 92, 94, 96, 
97, 99, 100, 102, 104, 106, 109, 112, 115, 119, 123, 127, 133, 
141, 153, 163, 171), plast = c(50, 51, 52, 52, 53, 53, 53, 54, 
55, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 65, 65, 66, 
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 75, 76, 76, 77, 77, 78, 
78, 79, 80, 81, 82, 83, 84, 85, 85, 86, 86, 87, 88, 88, 89, 90, 
91, 91, 93, 93, 94, 95, 96, 96, 97, 98, 98, 99, 100, 100, 101, 
102, 103, 103, 104, 105, 106, 107, 107, 108, 109, 110, 111, 112, 
112, 113, 113, 114, 115, 116)), row.names = 2411:2500, class = "data.frame")

library(nlsr)

fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
start=list(A=1, B=15, a=0.025, b=0.01))
coef(fit)
           A            B            a            b
3.945167e-13 9.772749e+00 2.802274e-01 2.179781e-02 

plot(temp$plast, temp$tsmes, ylim=c(0,200))
lines(temp$plast, predict(fit, newdata=temp), col="pink", lwd=3)
ccc <- coef(fit)
lines(0:120,ccc[1]*exp(ccc[3]*(0:120)))
lines(0:120,ccc[2]*exp(ccc[4]*(0:120)), lty=3, lwd=2)

# wrong fit with slightly different b
fit <- nlxb(tsmes ~ A*exp(a*plast) + B* exp(b*plast), data=temp,
start=list(A=1, B=15, a=0.025, b=0.1))
coef(fit)
           A            B            a            b
2911.6448377    6.8320597  -49.1373979    0.0261391
lines(temp$plast, predict(fit, newdata=temp), col="red", lwd=3)
ccc <- coef(fit)
lines(0:120,ccc[1]*exp(ccc[3]*(0:120)), col="red")
lines(0:120,ccc[2]*exp(ccc[4]*(0:120)), lty=3, lwd=2, col="red")



Re: [R] Same results but different functions ?

2020-03-26 Thread varin sacha via R-help
Dear Michael, 
Dear Martin,

Many thanks for your suggestions.

Best,







On Monday 23 March 2020 at 22:34:41 UTC+1, Martin Maechler
 wrote: 





>>>>> Michael Dewey 
>>>>>    on Mon, 23 Mar 2020 13:45:44 + writes:

    > The documentation suggests that the rlm method for a formula does not 
    > have psi as a parameter. Perhaps try using the method for a matrix x and 
    > a vector y.

    > Michael

or use lmrob() from pkg robustbase  which is at least one
generation more recent and also with many more options than
rlm().

rlm() has been fantastic when it was introduced (into S /
S-plus, before R existed [in a publicly visible way]) but it had
been based on what was available back then, at the end of the 80's, beginning of the 90's.

Martin

    > On 23/03/2020 12:39, varin sacha via R-help wrote:
    >> Dear R-experts,
    >> 
    >> The rlm command in the MASS package command implements several versions 
of robust regression, for example the Huber and the Tukey (bisquare weighting 
function) estimators.
    >> In my R code here below I try to get the Tukey (bisquare weighting 
function) estimation, R gives me an error message : Error in statistic(data, 
original, ...) : unused argument (psi = psi.bisquare)
    >> If I cancel psi=psi.bisquare my code is working but IMHO I will get the 
Huber estimation and not the Tukey. So how can I get the Tukey ? Many thanks 
for your help.
    >> 
    >> 
    >> # # # # # # # # # # # # # # # # # # # # # # # #
    >> install.packages( "boot",dependencies=TRUE )
    >> install.packages( "MASS",dependencies=TRUE  )
    >> library(boot)
    >> library(MASS)
    >> 
    >> n<-50
    >> b<-runif(n, 0, 5)
    >> z <- rnorm(n, 2, 3)
    >> a <- runif(n, 0, 5)
    >> 
    >> y_model<- 0.1*b - 0.5 * z - a + 10
    >> y_obs <- y_model +c( rnorm(n*0.9, 0, 0.1), rnorm(n*0.1, 0, 0.5) )
    >> df<-data.frame(b,z,a,y_obs)
    >> 
    >>  # function to obtain MSE
    >>  MSE <- function(data, indices, formula) {
    >>     d <- data[indices, ] # allows boot to select sample
    >>     fit <- rlm(formula, data = d)
    >>     ypred <- predict(fit)
    >>     d[["y_obs "]] <-y_obs
    >>     mean((d[["y_obs"]]-ypred)^2)
    >>  }
    >> 
    >>  # Make the results reproducible
    >>  set.seed(1234)
    >> 
    >>  # bootstrapping with 600 replications
    >>  results <- boot(data = df, statistic = MSE,
    >>       R = 600, formula = y_obs ~ b+z+a, psi = psi.bisquare)
    >> 
    >> str(results)
    >> 
    >> boot.ci(results, type="bca" )
    >> # # # # # # # # # # # # # # # # # # # # # # # # #
    >> 

    > -- 
    > Michael
    > http://www.dewey.myzen.co.uk/home.html





Re: [R] Same results but different functions ?

2020-03-23 Thread Martin Maechler
>>>>> Michael Dewey 
>>>>> on Mon, 23 Mar 2020 13:45:44 + writes:

> The documentation suggests that the rlm method for a formula does not 
> have psi as a parameter. Perhaps try using the method for a matrix x and 
> a vector y.

> Michael

or use lmrob() from pkg robustbase  which is at least one
generation more recent and also with many more options than
rlm().

rlm() has been fantastic when it was introduced (into S /
S-plus, before R existed [in a publicly visible way]) but it had
been based on what was available back then, at the end of the 80's, beginning of the 90's.

Martin

> On 23/03/2020 12:39, varin sacha via R-help wrote:
>> Dear R-experts,
>> 
>> The rlm command in the MASS package command implements several versions 
of robust regression, for example the Huber and the Tukey (bisquare weighting 
function) estimators.
>> In my R code here below I try to get the Tukey (bisquare weighting 
function) estimation, R gives me an error message : Error in statistic(data, 
original, ...) : unused argument (psi = psi.bisquare)
>> If I cancel psi=psi.bisquare my code is working but IMHO I will get the 
Huber estimation and not the Tukey. So how can I get the Tukey ? Many thanks 
for your help.
>> 
>> 
>> # # # # # # # # # # # # # # # # # # # # # # # #
>> install.packages( "boot",dependencies=TRUE )
>> install.packages( "MASS",dependencies=TRUE  )
>> library(boot)
>> library(MASS)
>> 
>> n<-50
>> b<-runif(n, 0, 5)
>> z <- rnorm(n, 2, 3)
>> a <- runif(n, 0, 5)
>> 
>> y_model<- 0.1*b - 0.5 * z - a + 10
>> y_obs <- y_model +c( rnorm(n*0.9, 0, 0.1), rnorm(n*0.1, 0, 0.5) )
>> df<-data.frame(b,z,a,y_obs)
>> 
>>  # function to obtain MSE
>>  MSE <- function(data, indices, formula) {
>>     d <- data[indices, ] # allows boot to select sample
>>     fit <- rlm(formula, data = d)
>>     ypred <- predict(fit)
>>     d[["y_obs "]] <-y_obs
>>     mean((d[["y_obs"]]-ypred)^2)
>>  }
>> 
>>  # Make the results reproducible
>>  set.seed(1234)
>> 
>>  # bootstrapping with 600 replications
>>  results <- boot(data = df, statistic = MSE,
>>       R = 600, formula = y_obs ~ b+z+a, psi = psi.bisquare)
>> 
>> str(results)
>> 
>> boot.ci(results, type="bca" )
>> # # # # # # # # # # # # # # # # # # # # # # # # #
>> 

> -- 
> Michael
> http://www.dewey.myzen.co.uk/home.html




Re: [R] Same results but different functions ?

2020-03-23 Thread Michael Dewey
The documentation suggests that the rlm method for a formula does not 
have psi as a parameter. Perhaps try using the method for a matrix x and 
a vector y.
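
For completeness, a minimal sketch of one way through (my illustration, not
code from this thread): the "unused argument" error in the original post
arises because boot() forwards extra named arguments to the statistic
function, so the statistic itself can accept psi and hand it on to rlm().
The data generation below just recreates the original example:

```r
library(boot)
library(MASS)

set.seed(1234)
n <- 50
b <- runif(n, 0, 5); z <- rnorm(n, 2, 3); a <- runif(n, 0, 5)
y_obs <- 0.1 * b - 0.5 * z - a + 10 + rnorm(n, 0, 0.2)
df <- data.frame(b, z, a, y_obs)

# The statistic now takes psi and passes it to rlm(), giving the Tukey
# bisquare fit when psi = psi.bisquare.
MSE <- function(data, indices, formula, psi) {
  d <- data[indices, ]                       # boot-selected resample
  fit <- rlm(formula, data = d, psi = psi)
  mean((d$y_obs - predict(fit))^2)
}

results <- boot(data = df, statistic = MSE, R = 100,
                formula = y_obs ~ b + z + a, psi = psi.bisquare)
```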


Michael

On 23/03/2020 12:39, varin sacha via R-help wrote:

Dear R-experts,

The rlm command in the MASS package command implements several versions of 
robust regression, for example the Huber and the Tukey (bisquare weighting 
function) estimators.
In my R code here below I try to get the Tukey (bisquare weighting function) 
estimation, R gives me an error message : Error in statistic(data, original, 
...) : unused argument (psi = psi.bisquare)
If I cancel psi=psi.bisquare my code is working but IMHO I will get the Huber 
estimation and not the Tukey. So how can I get the Tukey ? Many thanks for your 
help.


# # # # # # # # # # # # # # # # # # # # # # # #
install.packages( "boot",dependencies=TRUE )
install.packages( "MASS",dependencies=TRUE  )
library(boot)
library(MASS)

n<-50
b<-runif(n, 0, 5)
z <- rnorm(n, 2, 3)
a <- runif(n, 0, 5)

y_model<- 0.1*b - 0.5 * z - a + 10
y_obs <- y_model +c( rnorm(n*0.9, 0, 0.1), rnorm(n*0.1, 0, 0.5) )
df<-data.frame(b,z,a,y_obs)

  # function to obtain MSE
  MSE <- function(data, indices, formula) {
     d <- data[indices, ] # allows boot to select sample
     fit <- rlm(formula, data = d)
     ypred <- predict(fit)
     d[["y_obs "]] <-y_obs
     mean((d[["y_obs"]]-ypred)^2)
  }

  # Make the results reproducible
  set.seed(1234)
  
  # bootstrapping with 600 replications

  results <- boot(data = df, statistic = MSE,
       R = 600, formula = y_obs ~ b+z+a, psi = psi.bisquare)

str(results)

boot.ci(results, type="bca" )
# # # # # # # # # # # # # # # # # # # # # # # # #




--
Michael
http://www.dewey.myzen.co.uk/home.html



[R] Same results but different functions ?

2020-03-23 Thread varin sacha via R-help
Dear R-experts,

The rlm command in the MASS package command implements several versions of 
robust regression, for example the Huber and the Tukey (bisquare weighting 
function) estimators.
In my R code here below I try to get the Tukey (bisquare weighting function) 
estimation, R gives me an error message : Error in statistic(data, original, 
...) : unused argument (psi = psi.bisquare)
If I cancel psi=psi.bisquare my code is working but IMHO I will get the Huber 
estimation and not the Tukey. So how can I get the Tukey ? Many thanks for your 
help.


# # # # # # # # # # # # # # # # # # # # # # # #
install.packages( "boot",dependencies=TRUE )
install.packages( "MASS",dependencies=TRUE  )
library(boot)
library(MASS)

n<-50
b<-runif(n, 0, 5)
z <- rnorm(n, 2, 3)
a <- runif(n, 0, 5)

y_model<- 0.1*b - 0.5 * z - a + 10
y_obs <- y_model +c( rnorm(n*0.9, 0, 0.1), rnorm(n*0.1, 0, 0.5) )
df<-data.frame(b,z,a,y_obs)

 # function to obtain MSE
 MSE <- function(data, indices, formula) {
    d <- data[indices, ] # allows boot to select sample
    fit <- rlm(formula, data = d)
    ypred <- predict(fit)
    d[["y_obs "]] <-y_obs
    mean((d[["y_obs"]]-ypred)^2)
 }

 # Make the results reproducible
 set.seed(1234)
 
 # bootstrapping with 600 replications
 results <- boot(data = df, statistic = MSE,
      R = 600, formula = y_obs ~ b+z+a, psi = psi.bisquare)

str(results)

boot.ci(results, type="bca" )
# # # # # # # # # # # # # # # # # # # # # # # # #



Re: [R] conflicting results for a time-varying coefficient in a Cox model

2019-08-18 Thread Ferenci Tamas
Dear Dr. Therneau,

Thank you very much for your exhaustive answer!

I now see the issue. Perhaps even more important was your confirmation that my 
approach with karno:ns(time, df=4) is theoretically correct. (I knew that 
plot.cox.zph is sound, so I was afraid that the difference could be attributed to 
some fundamental mistake in my approach.)

Thank you again,
Tamas


On 8 August 2019, at 18:17:38, you wrote:


This is an excellent question.  
The answer, in this particular case, mostly has to do with the outlier time
values.  (I've never been convinced that the death at time 999 isn't really a
misplaced code for "missing", actually.)  If you change the knots used by the
spline you can get quite different values.
For instance, using a smaller data set:

fit1 <-  coxph(Surv(tstart, time, status) ~ trt + prior + karno, veteran)
zph1 <- cox.zph(fit1, transform='identity')
plot(zph1[3])

dtime <- unique(veteran$time[veteran$status ==1])# all of the death times
veteran2 <- survSplit( Surv(time, status) ~ ., data=veteran, cut=dtime)
fit2 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
karno:ns(time, df=4),  data=veteran2)
tx <- 0:100 * 10# x positions for plot
ncall <- attr(terms(fit2), "predvars")[[6]]
ty <- eval(ncall, data.frame(time = tx)) %*% coef(fit2)[4:7] + coef(fit2)[3]
lines(tx, ty, col=2)

-

Now it looks even worse!  The only difference is that the ns() function has 
chosen a different set of knots.   

The test used by the cox.zph function is based on a score test and is solid.  
The plot that it produces uses a smoothed approximation to the variance matrix 
and is approximate.  So the diagnostic plot will never exactly match an actual 
fit.   In this data set the outliers exacerbate the issue.  To see this try a 
different time scale.


zph2 <- cox.zph(fit1, transform= sqrt)
plot(zph2[3])
veteran2$stime <- sqrt(veteran2$time)
fit3 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
   karno:ns(stime, df=4),  data=veteran2)

ncall3 <- attr(terms(fit3), "predvars")[[6]] 
ty3 <- eval(ncall3, data.frame(stime= sqrt(tx))) %*% coef(fit3)[4:7] + 
coef(fit3)[3]
lines(sqrt(tx), ty3, col=2)



The right tail is now better behaved.   Eliminating the points >900 makes 
things even better behaved.

Terry T.




On 8/8/19 9:07 AM, Ferenci Tamas wrote:

I was thinking of two possible ways to
plot a time-varying coefficient in a Cox model.

One is simply to use survival::plot.cox.zph which directly produces a
beta(t) vs t diagram.

The other is to transform the dataset to counting process format and
manually include an interaction with time, expanded with spline (to be
similar to plot.cox.zph). Plotting the coefficient produces the needed
beta(t) vs t diagram.

I understand that they're slightly different approaches, so I don't
expect totally identical results, but nevertheless, they approximate
the very same thing, so I do expect that the results are more or less
similar.

However:

library( survival )
library( splines )

data( veteran )

zp <- cox.zph( coxph(Surv(time, status) ~ trt + prior + karno,
 data = veteran ), transform = "identity" )[ 3 ]

veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
   data = veteran, cut = 1:max(veteran$time) )

fit <- coxph(Surv(tstart,time, status) ~ trt + prior + karno +
   karno:ns( time, df = 4 ), data = veteran3 )
cf <- coef( fit )
nsvet <- ns( veteran3$time, df = 4 )

plot( zp )
lines( 0:1000, ns( 0:1000, df = 4, knots = attr( nsvet, "knots" ),
   Boundary.knots = attr( nsvet, "Boundary.knots" ) )%*%cf[
 grep( "karno:ns", names( cf ) ) ] + cf["karno"],
   type = "l", col = "red" )

Where is the mistake? Something must be going on here, because the
plots are vastly different...

Thank you very much in advance,
Tamas



Re: [R] conflicting results for a time-varying coefficient in a Cox model

2019-08-08 Thread Therneau, Terry M., Ph.D. via R-help
This is an excellent question.
The answer, in this particular case, mostly has to do with the outlier time
values.  (I've never been convinced that the death at time 999 isn't really a
misplaced code for "missing", actually.)  If you change the knots used by the
spline you can get quite different values.
For instance, using a smaller data set:

fit1 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno, veteran)
zph1 <- cox.zph(fit1, transform='identity')
plot(zph1[3])

dtime <- unique(veteran$time[veteran$status ==1])    # all of the death times
veteran2 <- survSplit( Surv(time, status) ~ ., data=veteran, cut=dtime)
fit2 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
     karno:ns(time, df=4),  data=veteran2)
tx <- 0:100 * 10    # x positions for plot
ncall <- attr(terms(fit2), "predvars")[[6]]
ty <- eval(ncall, data.frame(time = tx)) %*% coef(fit2)[4:7] + coef(fit2)[3]
lines(tx, ty, col=2)

-

Now it looks even worse!  The only difference is that the ns() function has
chosen a different set of knots.

The test used by the cox.zph function is based on a score test and is solid.
The plot that it produces uses a smoothed approximation to the variance matrix
and is approximate.  So the diagnostic plot will never exactly match an actual
fit.  In this data set the outliers exacerbate the issue.  To see this try a
different time scale.


zph2 <- cox.zph(fit1, transform= sqrt)
plot(zph2[3])
veteran2$stime <- sqrt(veteran2$time)
fit3 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
    karno:ns(stime, df=4),  data=veteran2)

ncall3 <- attr(terms(fit3), "predvars")[[6]]
ty3 <- eval(ncall3, data.frame(stime= sqrt(tx))) %*% coef(fit3)[4:7] + 
coef(fit3)[3]
lines(sqrt(tx), ty3, col=2)



The right tail is now better behaved.  Eliminating the points >900 makes
things even better behaved.

Terry T.




On 8/8/19 9:07 AM, Ferenci Tamas wrote:
> I was thinking of two possible ways to
> plot a time-varying coefficient in a Cox model.
>
> One is simply to use survival::plot.cox.zph which directly produces a
> beta(t) vs t diagram.
>
> The other is to transform the dataset to counting process format and
> manually include an interaction with time, expanded with spline (to be
> similar to plot.cox.zph). Plotting the coefficient produces the needed
> beta(t) vs t diagram.
>
> I understand that they're slightly different approaches, so I don't
> expect totally identical results, but nevertheless, they approximate
> the very same thing, so I do expect that the results are more or less
> similar.
>
> However:
>
> library( survival )
> library( splines )
>
> data( veteran )
>
> zp <- cox.zph( coxph(Surv(time, status) ~ trt + prior + karno,
>   data = veteran ), transform = "identity" )[ 3 ]
>
> veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
> data = veteran, cut = 1:max(veteran$time) )
>
> fit <- coxph(Surv(tstart,time, status) ~ trt + prior + karno +
> karno:ns( time, df = 4 ), data = veteran3 )
> cf <- coef( fit )
> nsvet <- ns( veteran3$time, df = 4 )
>
> plot( zp )
> lines( 0:1000, ns( 0:1000, df = 4, knots = attr( nsvet, "knots" ),
> Boundary.knots = attr( nsvet, "Boundary.knots" ) )%*%cf[
>   grep( "karno:ns", names( cf ) ) ] + cf["karno"],
> type = "l", col = "red" )
>
> Where is the mistake? Something must be going on here, because the
> plots are vastly different...
>
> Thank you very much in advance,
> Tamas



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] conflicting results for a time-varying coefficient in a Cox model

2019-08-06 Thread Ferenci Tamas
Dear All,

I was thinking of two possible ways to plot a time-varying coefficient
in a Cox model.

One is simply to use survival::plot.cox.zph which directly produces a
beta(t) vs t diagram.

The other is to transform the dataset to counting process format and
manually include an interaction with time, expanded with spline (to be
similar to plot.cox.zph). Plotting the coefficient produces the needed
beta(t) vs t diagram.

I understand that they're slightly different approaches, so I don't
expect totally identical results, but nevertheless, they approximate
the very same thing, so I do expect that the results are more or less
similar.


However:

library( survival )
library( splines )

data( veteran )

zp <- cox.zph( coxph(Surv(time, status) ~ trt + prior + karno,
 data = veteran ), transform = "identity" )[ 3 ]

veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
   data = veteran, cut = 1:max(veteran$time) )

fit <- coxph(Surv(tstart,time, status) ~ trt + prior + karno +
   karno:ns( time, df = 4 ), data = veteran3 )
cf <- coef( fit )
nsvet <- ns( veteran3$time, df = 4 )

plot( zp )
lines( 0:1000, ns( 0:1000, df = 4, knots = attr( nsvet, "knots" ),
   Boundary.knots = attr( nsvet, "Boundary.knots" ) )%*%cf[
 grep( "karno:ns", names( cf ) ) ] + cf["karno"],
   type = "l", col = "red" )

Where is the mistake? Something must be going on here, because the
plots are vastly different...

Thank you in advance,
Tamas



Re: [R] Retrievable results in a procedure

2018-12-25 Thread Steven Yen
Thanks Sarah. Below, replacing "structure" with "invisible" does 
wonders--that serves my need. What I want is quite simple - I call a 
procedure and it does two things: (1) display results for all; (2) save 
retrievable results for use in further analysis, e.g., in knitr. 
Earlier, with "structure" (or with results<-list(...)) it spits out the 
main results, with components repeated (printed) in a painfully long 
list. Yet, as I said, calling with foo<-try(...) prints the main results 
with the list suppressed. I am just looking for option to NOT have to 
call with foo<- always. There must be more ways to do this, but I am 
happy with invisible. Thanks again.


On 12/25/2018 11:10 PM, Sarah Goslee wrote:
> I'm a bit confused about what you actually want, but I think 
> invisible() might be the answer.
>
> Note that there's already a base function try() so that's not a great 
> name for test functions.
>
> Sarah
>
> On Tue, Dec 25, 2018 at 8:47 AM Steven Yen  > wrote:
>
> I would like to suppressed printing of retrievable results in a
> procedure and to print only when retrieved.
>
> In line 10 below I call procedure "try" and get matrices A,B,C all
> printed upon a call to the procedure. I get around this unwanted
> printing by calling with v<-try(A,B) as in line 11.
>
> Any way to suppress printing of the retrievable results listed in the
> structure command? Thank you, and Merry Christmas to all.
>
>
> A<-matrix(rpois(16,lambda=5),nrow=4,byrow=T)
> B<-diag(4)
>
> try<-function(A,B){
>   C<-A+B
>   cat("\nC:\n"); print(C)
> structure(list(A=A,B=B,C=C))
> }
>
> try(A,B)# line 10
> v<-try(A,B) # line 11
>
> -- 
> st...@ntu.edu.tw  (S.T. Yen)
>
>
> -- 
> Sarah Goslee (she/her)
> http://www.sarahgoslee.com

-- 
st...@ntu.edu.tw (S.T. Yen)





Re: [R] Retrievable results in a procedure

2018-12-25 Thread Jeff Newmiller
You can use `capture.output`, but a far, far better solution is to remove the 
output statements from your computation functions entirely and let the caller 
decide whether to print the results.

You can, for example, add a `debug` parameter to the function, and if true it 
can return a list of as many intermediate results as you like that you can 
examine as you wish.

Of course, if debugging is your goal then learning to use the debug function to 
mark functions for single-stepping as needed is even better.

But no matter what, making functions that do both computation and output is 
really poor practice... do one or the other.
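Jeff's separation of computation from output can be sketched in base R; the function and class names below are invented for illustration, not from the thread:

```r
# computation only: build the result, attach a class, return it invisibly
addmats <- function(A, B) {
  res <- list(A = A, B = B, C = A + B)
  class(res) <- "addmats"
  invisible(res)
}

# output only: a print method that the *caller* invokes when wanted
print.addmats <- function(x, ...) {
  cat("C:\n")
  print(x$C)
  invisible(x)
}

v <- addmats(diag(2), diag(2))   # silent
addmats(diag(2), diag(2))        # also silent (invisible return)
print(v)                         # prints C only on request
```

The caller decides whether anything appears on screen, and the returned list stays fully retrievable for knitr or further analysis.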

On December 25, 2018 5:42:13 AM PST, Steven Yen  wrote:
>I would like to suppressed printing of retrievable results in a 
>procedure and to print only when retrieved.
>
>In line 10 below I call procedure "try" and get matrices A,B,C all 
>printed upon a call to the procedure. I get around this unwanted 
>printing by calling with v<-try(A,B) as in line 11.
>
>Any way to suppress printing of the retrievable results listed in the 
>structure command? Thank you, and Merry Christmas to all.
>
>
>A<-matrix(rpois(16,lambda=5),nrow=4,byrow=T)
>B<-diag(4)
>
>try<-function(A,B){
>  C<-A+B
>  cat("\nC:\n"); print(C)
>structure(list(A=A,B=B,C=C))
>}
>
>try(A,B)# line 10
>v<-try(A,B) # line 11

-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Retrievable results in a procedure

2018-12-25 Thread Sarah Goslee
I'm a bit confused about what you actually want, but I think invisible()
might be the answer.

Note that there's already a base function try() so that's not a great name
for test functions.

Sarah

On Tue, Dec 25, 2018 at 8:47 AM Steven Yen  wrote:

> I would like to suppressed printing of retrievable results in a
> procedure and to print only when retrieved.
>
> In line 10 below I call procedure "try" and get matrices A,B,C all
> printed upon a call to the procedure. I get around this unwanted
> printing by calling with v<-try(A,B) as in line 11.
>
> Any way to suppress printing of the retrievable results listed in the
> structure command? Thank you, and Merry Christmas to all.
>
>
> A<-matrix(rpois(16,lambda=5),nrow=4,byrow=T)
> B<-diag(4)
>
> try<-function(A,B){
>   C<-A+B
>   cat("\nC:\n"); print(C)
> structure(list(A=A,B=B,C=C))
> }
>
> try(A,B)# line 10
> v<-try(A,B) # line 11
>
> --
> st...@ntu.edu.tw (S.T. Yen)
>
>
-- 
Sarah Goslee (she/her)
http://www.sarahgoslee.com




[R] Retrievable results in a procedure

2018-12-25 Thread Steven Yen
I would like to suppress printing of retrievable results in a 
procedure and to print them only when retrieved.

In line 10 below I call procedure "try" and get matrices A,B,C all 
printed upon a call to the procedure. I get around this unwanted 
printing by calling with v<-try(A,B) as in line 11.

Any way to suppress printing of the retrievable results listed in the 
structure command? Thank you, and Merry Christmas to all.


A<-matrix(rpois(16,lambda=5),nrow=4,byrow=T)
B<-diag(4)

try<-function(A,B){
  C<-A+B
  cat("\nC:\n"); print(C)
structure(list(A=A,B=B,C=C))
}

try(A,B)# line 10
v<-try(A,B) # line 11

-- 
st...@ntu.edu.tw (S.T. Yen)





[R] bootstrapping results in table format

2017-10-07 Thread Peter Wagey
Hi R users,
I was struggling to put the results into table format. Would you mind to
show using following data and code how we can put the results into table? I
further would like to have a  confidence interval for each group.


set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group =
runif(200)>0.5))
data.frame(data)
head(data)
stat <- function(x, i) {x[i, c(m1 = mean(x1))]}
A<-data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

Thanks
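One way to get the per-group table with a confidence interval for each group, sketched with the bundled boot package and base R (the object and column names are invented, and data.table is not required for this part):

```r
library(boot)

set.seed(1000)
d <- data.frame(x1 = runif(200), group = runif(200) > 0.5)

mean_stat <- function(v, i) mean(v[i])   # bootstrap statistic: the group mean

# one row per group: point estimate plus a 95% percentile bootstrap interval
tab <- do.call(rbind, lapply(split(d$x1, d$group), function(v) {
  b  <- boot(v, mean_stat, R = 999)
  ci <- boot.ci(b, type = "perc")$percent[4:5]
  data.frame(mean = b$t0, lower = ci[1], upper = ci[2])
}))
tab
```

split() keys the rows by group, and do.call(rbind, ...) stacks the one-row data frames into a single table with group names as row names.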




Re: [R] Apriori Results- Same number to support an confidence

2017-05-23 Thread Bert Gunter
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, May 23, 2017 at 3:52 PM, Raquel D.  wrote:
> Hi!
>
>
> Does anybody knows why it can be happening? Lift = 1

Doubt it. Pretty incoherent, to me anyway.

But this is probably the wrong place to post. This list is about R
programming; statistics is generally OT. Try posting on
stats.stackexchange.com for statistics issues. Being more coherent
might help, too.

-- Bert


>
>
>
>
> rule length distribution (lhs + rhs):sizes
>  1
> 68
>
>Min. 1st Qu.  MedianMean 3rd Qu.Max.
>   1   1   1   1   1   1
>
> summary of quality measures:
> support   confidencelift
>  Min.   :0.002050   Min.   :0.002050   Min.   :1
>  1st Qu.:0.002899   1st Qu.:0.002899   1st Qu.:1
>  Median :0.004465   Median :0.004465   Median :1
>  Mean   :0.008703   Mean   :0.008703   Mean   :1
>  3rd Qu.:0.007605   3rd Qu.:0.007605   3rd Qu.:1
>  Max.   :0.059593   Max.   :0.059593   Max.   :1
>
> mining info:
>  data ntransactions support confidence
>   txn438612   0.001  0.002
>
>
>



Re: [R] Odd results from rpart classification tree

2017-05-15 Thread Marshall, Jonathan
Thanks Terry!

I managed to figure that out shortly after posting (as is the way!). Adding an 
additional covariate that splits below one of the x branches but not the other, 
pushing the class proportion over 0.5, means the x split is retained.

However, I now have another conundrum, this time with rpart in anova mode...

library(rpart)
test_split <- function(offset) {
  y <- c(rep(0,10),rep(0.5,2)) + offset
  x <- c(rep(0,10),rep(1,2))
  if (is.null(rpart(y ~ x, minsplit=1, cp=0, xval=0)$splits)) 0 else 1
}

sum(replicate(1000, test_split(0))) # 1000, i.e. always splits
sum(replicate(1000, test_split(0.5))) # 2-12, i.e. splits only sometimes...

Adding a constant to y and getting different trees is a bit strange, 
particularly stochastically.

Will see if I can track down a copy of the CART book.

Jonathan


From: Therneau, Terry M., Ph.D. [thern...@mayo.edu]
Sent: 16 May 2017 00:43
To: r-help@r-project.org; Marshall, Jonathan
Subject: Re: Odd results from rpart classification tree

You are mixing up two of the steps in rpart.  1: how to find the best candidate 
split and
2: evaluation of that split.

With the "class" method we use the information or Gini criteria for step 1.  
The code
finds a worthwhile candidate split at 0.5 using exactly the calculations you 
outline.  For
step 2 the criteria is the "decision theory" loss.  In your data the estimated 
rate is 0
for the left node and 15/45 = .333 for the right node.  As a decision rule both 
predict
y=0 (since both are < 1/2).  The split predicts 0 on the left and 0 on the 
right, so does
nothing.

The CART book (Breiman, Friedman, Olshen and Stone) on which rpart is based
highlights the difference between odds-regression (for which the final
prediction is a percent, and error is Gini) and classification.  For the
former, treat y as continuous.

Terry T.


On 05/15/2017 05:00 AM, r-help-requ...@r-project.org wrote:
> The following code produces a tree with only a root. However, clearly the 
> tree with a split at x=0.5 is better. rpart doesn't seem to want to produce 
> it.
>
> Running the following produces a tree with only root.
>
> y <- c(rep(0,65),rep(1,15),rep(0,20))
> x <- c(rep(0,70),rep(1,30))
> f <- rpart(y ~ x, method='class', minsplit=1, cp=0.0001, 
> parms=list(split='gini'))
>
> Computing the improvement for a split at x=0.5 manually:
>
> obs_L <- y[x<.5]
> obs_R <- y[x>.5]
> n_L <- sum(x<.5)
> n_R <- sum(x>.5)
> gini <- function(p) {sum(p*(1-p))}
> impurity_root <- gini(prop.table(table(y)))
> impurity_L <- gini(prop.table(table(obs_L)))
> impurity_R <- gini(prop.table(table(obs_R)))
> impurity <- impurity_root * n - (n_L*impurity_L + n_R*impurity_R) # 2.880952
>
> Thus, an improvement of 2.88 should result in a split. It does not.
>
> Why?
>
> Jonathan
>
>



Re: [R] Odd results from rpart classification tree

2017-05-15 Thread Therneau, Terry M., Ph.D.
You are mixing up two of the steps in rpart.  1: how to find the best candidate split and 
2: evaluation of that split.


With the "class" method we use the information or Gini criteria for step 1.  The code 
finds a worthwhile candidate split at 0.5 using exactly the calculations you outline.  For 
step 2 the criteria is the "decision theory" loss.  In your data the estimated rate is 0 
for the left node and 15/45 = .333 for the right node.  As a decision rule both predict 
y=0 (since both are < 1/2).  The split predicts 0 on the left and 0 on the right, so does 
nothing.


The CART book (Breiman, Friedman, Olshen and Stone) on which rpart is based highlights the 
difference between odds-regression (for which the final prediction is a percent, and error 
is Gini) and classification.  For the former, treat y as continuous.
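This two-step behaviour can be checked directly on the data from the original post (a sketch; the object names are mine):

```r
library(rpart)

y <- c(rep(0, 65), rep(1, 15), rep(0, 20))
x <- c(rep(0, 70), rep(1, 30))

# method = "class": Gini finds the candidate split at x = 0.5, but both
# resulting nodes still predict y = 0, so the split is dropped (root only)
f_class <- rpart(y ~ x, method = "class", minsplit = 1, cp = 1e-4)

# treating y as continuous scores the split by sums of squares and keeps it
f_anova <- rpart(y ~ x, method = "anova", minsplit = 1, cp = 1e-4)

is.null(f_class$splits)   # TRUE: no split retained
is.null(f_anova$splits)   # FALSE: split at x = 0.5 retained
```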


Terry T.


On 05/15/2017 05:00 AM, r-help-requ...@r-project.org wrote:

The following code produces a tree with only a root. However, clearly the tree 
with a split at x=0.5 is better. rpart doesn't seem to want to produce it.

Running the following produces a tree with only root.

y <- c(rep(0,65),rep(1,15),rep(0,20))
x <- c(rep(0,70),rep(1,30))
f <- rpart(y ~ x, method='class', minsplit=1, cp=0.0001, 
parms=list(split='gini'))

Computing the improvement for a split at x=0.5 manually:

obs_L <- y[x<.5]
obs_R <- y[x>.5]
n <- length(y)   # total number of observations (needed below)
n_L <- sum(x<.5)
n_R <- sum(x>.5)
gini <- function(p) {sum(p*(1-p))}
impurity_root <- gini(prop.table(table(y)))
impurity_L <- gini(prop.table(table(obs_L)))
impurity_R <- gini(prop.table(table(obs_R)))
impurity <- impurity_root * n - (n_L*impurity_L + n_R*impurity_R) # 2.880952

Thus, an improvement of 2.88 should result in a split. It does not.

Why?

Jonathan






[R] Odd results from rpart classification tree

2017-05-15 Thread Marshall, Jonathan
The following code produces a tree with only a root. However, clearly the tree 
with a split at x=0.5 is better. rpart doesn't seem to want to produce it.

Running the following produces a tree with only root.

y <- c(rep(0,65),rep(1,15),rep(0,20))
x <- c(rep(0,70),rep(1,30))
f <- rpart(y ~ x, method='class', minsplit=1, cp=0.0001, 
parms=list(split='gini'))

Computing the improvement for a split at x=0.5 manually:

obs_L <- y[x<.5]
obs_R <- y[x>.5]
n <- length(y)   # total number of observations (needed below)
n_L <- sum(x<.5)
n_R <- sum(x>.5)
gini <- function(p) {sum(p*(1-p))}
impurity_root <- gini(prop.table(table(y)))
impurity_L <- gini(prop.table(table(obs_L)))
impurity_R <- gini(prop.table(table(obs_R)))
impurity <- impurity_root * n - (n_L*impurity_L + n_R*impurity_R) # 2.880952

Thus, an improvement of 2.88 should result in a split. It does not.

Why?

Jonathan



[R] Odd results when computing confidence intervals

2016-12-02 Thread Abraham Mathew
I have a vector of values, and have written a function that takes each
value in that vector, generates a normal distribution with that value as
the mean, and then finds the interval at different levels. However, these
intervals don't seem to be right (too narrow).

### CREATE PREDICTION INTERVALS

ensemble_forecast = c(200,600,400,500,200,100,200,600,400,500,200,100)
forecast_for=12

lo_intervals = c()
hi_intervals = c()

create_prediction_intervals <- function(use_forecast = ensemble_forecast,
                                        conf_level = 0.90,
                                        do_jitter = FALSE){

   conf.levels1 = paste(round(rep(conf_level, forecast_for/2), 2), "0", sep="")
   conf.levels2 = seq((conf_level+0.02), 0.95, length=forecast_for/2)
   all.conf.levels = c(conf.levels1, conf.levels2)
   all.conf.levels = as.numeric(as.character(all.conf.levels))
   all.conf.levels

   # forc_num=1
   for(forc_num in 1:length(use_forecast)){
      message("Executing forecast number: ", forc_num,
              " at confidence level: ", all.conf.levels[forc_num])

      value = rnorm(5000, mean=use_forecast[forc_num], sd=sd(use_forecast))

      #t.test(value)$conf.int
      #Rmisc::CI(value, ci=0.99)

      low  = Rmisc::CI(value, ci=all.conf.levels[forc_num])[[3]]
      high = Rmisc::CI(value, ci=all.conf.levels[forc_num])[[1]]

      #low  = t.test(value, conf.level=all.conf.levels[forc_num])$conf.int[[1]]
      #high = t.test(value, conf.level=all.conf.levels[forc_num])$conf.int[[2]]

      lo_intervals.tmp <- c(low)
      hi_intervals.tmp <- c(high)

      if(do_jitter){
         if(length(unique(lo_intervals)) <= 3) lo_intervals.tmp <- round(jitter(lo_intervals.tmp), 0)
         if(length(unique(hi_intervals)) <= 3) hi_intervals.tmp <- round(jitter(hi_intervals.tmp), 0)
         lo_intervals <<- c(lo_intervals, lo_intervals.tmp)
         hi_intervals <<- c(hi_intervals, hi_intervals.tmp)
      } else {
         lo_intervals <<- c(lo_intervals, lo_intervals.tmp)
         hi_intervals <<- c(hi_intervals, hi_intervals.tmp)
      }
   }
}

summary(value)
hist(value)
create_prediction_intervals(ensemble_forecast)


Any ideas on what I'm doing wrong?
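One thing worth checking: an interval for the mean of 5000 simulated draws is necessarily far narrower than an interval covering the draws themselves. A minimal sketch with made-up numbers (not the poster's data):

```r
set.seed(1)
value <- rnorm(5000, mean = 200, sd = 180)

# a confidence interval for the MEAN of 5000 draws: width ~ sd/sqrt(n),
# so it is always very narrow
ci_mean <- t.test(value, conf.level = 0.90)$conf.int

# an interval meant to cover 90% of the VALUES uses quantiles and stays wide
ci_vals <- quantile(value, c(0.05, 0.95))

diff(ci_mean)           # only a few units wide
diff(unname(ci_vals))   # hundreds of units wide
```

Both Rmisc::CI and t.test return the first kind, which would explain "too narrow" if the second kind is what a prediction interval needs.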


-- 
Abraham Mathew
Data Ninja and Statistical Modeler
Minneapolis, MN
720-648-0108
@abmathewks | Analytics_Blog




Re: [R] Different results when converting a matrix to a data.frame

2016-11-16 Thread David Winsemius

> On Nov 16, 2016, at 8:43 AM, Jeff Newmiller  wrote:
> 
> I will start by admitting I don't know the answer to your question.
> 
> However, I am responding because I think this should not be an issue in real 
> life use of R. Data frames are lists of distinct vectors, each of which has 
> its own reason for being present in the data, and normally each has its own 
> storage mode. Your use of a matrix as a short cut way to create many columns 
> at once does not change this fundamental difference between data frames and 
> matrices. You should not be surprised that putting the finishing touches on 
> this transformation takes some personal attention. 
> 
> Normally you should give explicit names to each column using the argument 
> names in the data.frame function. When using a matrix as a shortcut, you 
> should either immediately follow the creation of the data frame with a 
> names(DF)<- assignment, or wrap it in a setNames function call. 
> 
> setNames( data.frame(matrix(NA, 2, 2)), c( "ColA", "ColB" ) )
> 
> Note that using a matrix to create many columns is memory inefficient, 
> because you start by setting aside a single block of memory (the matrix) and 
> then you move that data column at a time to separate vectors for use in the 
> data frame. If working with large data you might want to consider allocating 
> each column separately from the beginning. 
> 
> N <- 2
> nms <- c( "A", "B" )
> as.data.frame( setNames( lapply( nms, function(n){ rep( NA, 2 ) } ), nms ) )
> 
> which is not as convenient, but illustrates that data frames are truly 
> different than matrices.
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On November 16, 2016 7:20:38 AM PST, g.maub...@weinwolf.de wrote:
>> Hi All,
>> 
>> I build an empty dataframe to fill it will values later. I did the 
>> following:
>> 
>> -- cut --
>> matrix(NA, 2, 2)
>>[,1] [,2]
>> [1,]   NA   NA
>> [2,]   NA   NA
>>> data.frame(matrix(NA, 2, 2))
>> X1 X2
>> 1 NA NA
>> 2 NA NA
>>> as.data.frame(matrix(NA, 2, 2))
>> V1 V2
>> 1 NA NA
>> 2 NA NA
>> -- cut --
>> 
>> Why does data.frame deliver different results than as.data.frame with 
>> regard to the variable names (V instead of X)?

They are two different functions:

It's fairly easy to see by looking at the code:

as.data.frame.matrix uses: names(value) <- paste0("V", ic)  when there are no 
column names and data.frame calls make.names which prepends an "X" as the first 
letter of invalid or missing names.


As to why the authors did it this way, I'm unable to comment.
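The two naming paths are easy to confirm in a couple of lines of base R:

```r
m <- matrix(NA, 2, 2)

# data.frame() repairs the missing names via make.names(), which prepends "X"
names(data.frame(m))     # "X1" "X2"

# as.data.frame.matrix() builds its own names as paste0("V", column index)
names(as.data.frame(m))  # "V1" "V2"
```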

>> 
>> Kind regards
>> 
>> Georg
>> 


David Winsemius
Alameda, CA, USA



Re: [R] Different results when converting a matrix to a data.frame

2016-11-16 Thread Jeff Newmiller
I will start by admitting I don't know the answer to your question.

However, I am responding because I think this should not be an issue in real 
life use of R. Data frames are lists of distinct vectors, each of which has its 
own reason for being present in the data, and normally each has its own storage 
mode. Your use of a matrix as a short cut way to create many columns at once 
does not change this fundamental difference between data frames and matrices. 
You should not be surprised that putting the finishing touches on this 
transformation takes some personal attention. 

Normally you should give explicit names to each column using the argument names 
in the data.frame function. When using a matrix as a shortcut, you should 
either immediately follow the creation of the data frame with a names(DF)<- 
assignment, or wrap it in a setNames function call. 

setNames( data.frame(matrix(NA, 2, 2)), c( "ColA", "ColB" ) )

Note that using a matrix to create many columns is memory inefficient, because 
you start by setting aside a single block of memory (the matrix) and then you 
move that data column at a time to separate vectors for use in the data frame. 
If working with large data you might want to consider allocating each column 
separately from the beginning. 

N <- 2
nms <- c( "A", "B" )
as.data.frame( setNames( lapply( nms, function(n){ rep( NA, 2 ) } ), nms ) )

which is not as convenient, but illustrates that data frames are truly 
different than matrices.
-- 
Sent from my phone. Please excuse my brevity.

On November 16, 2016 7:20:38 AM PST, g.maub...@weinwolf.de wrote:
>Hi All,
>
>I build an empty dataframe to fill it will values later. I did the 
>following:
>
>-- cut --
>matrix(NA, 2, 2)
> [,1] [,2]
>[1,]   NA   NA
>[2,]   NA   NA
>> data.frame(matrix(NA, 2, 2))
>  X1 X2
>1 NA NA
>2 NA NA
>> as.data.frame(matrix(NA, 2, 2))
>  V1 V2
>1 NA NA
>2 NA NA
>-- cut --
>
>Why does data.frame deliver different results than as.data.frame with 
>regard to the variable names (V instead of X)?
>
>Kind regards
>
>Georg
>



[R] Different results when converting a matrix to a data.frame

2016-11-16 Thread G . Maubach
Hi All,

I build an empty dataframe to fill it will values later. I did the 
following:

-- cut --
matrix(NA, 2, 2)
 [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> data.frame(matrix(NA, 2, 2))
  X1 X2
1 NA NA
2 NA NA
> as.data.frame(matrix(NA, 2, 2))
  V1 V2
1 NA NA
2 NA NA
-- cut --

Why does data.frame deliver different results than as.data.frame with 
regard to the variable names (V instead of X)?

Kind regards

Georg




[R] Reporting results from phia

2016-08-11 Thread Walker Pedersen
Hi R community,

I submitted this question to Cross Validated, but didn't get any
answers.  Hoping someone here can give me some clarification.

If I understand the phia documentation correctly, when using
testInteractions for factors, the value it returns is a contrast of
adjusted means, but if the "slope" argument is included, it is then
returning a contrast between two slopes, which I interpret to mean a
parameter coefficient. Is this correct?

So if I run:

testInteractions(model, pairwise="Factor")

and get:


P-value adjustment method: holm
   Value Df Chisq Pr(>Chisq)
ConditionA-ConditionB 0.059987  1 1.453 0.2281

The correct way to report this would be: Condition A was not
significantly higher than condition B, m = .06, X^2(1) = 1.45, p =
.23.


But if I run:


testInteractions(model, slope="Covariate", pairwise="Factor")


and get:


Adjusted slope for Covariate
Chisq Test:
P-value adjustment method: holm
  Value Df  Chisq Pr(>Chisq)
ConditionA-ConditionB -0.0094811  1 1.3427 0.2466


Then the correct way to report this would be: The slope for Covariate
was not significantly different for condition A and condition B, b =
-.009, X^2(1) = 1.34, p = .25.

Is this correct? The fact that they both are simply labelled "Value"
makes me second guess this interpretation as it seems to me to be
implying that they represent the same type of statistic...

I am using the lmer function in lme4 to fit this model, if that makes
any difference.

Thanks!



[R] store results from loop into a dataframe

2016-01-05 Thread DIGHE, NILESH [AG/2362]
Dear R users:

I am trying to create a function that will loop over three dependent variables 
in my aov model, and then get the HSD.test for each variable.  I'd like to store 
the results from each loop in a data frame.

When I run my function (funx) on my data (dat), only the results for yield get 
populated in all three columns of the data frame.  I am not able to store the 
results for each variable separately. Any help will be highly appreciated.







function (x)
{
    trait_names <- c("yield", "lp", "lnth")
    d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, 6))
    for (i in trait_names) {
        mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")),
                   data = x)
        out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE)
        d[, i] <- out$means[, 1]
    }
    d
}
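A likely culprit, demonstrable with base R alone: `paste()` vectorises over `trait_names`, producing three formula strings at once, and `formula()` on a character vector of length > 1 has historically used only the first element (newer R versions warn about or deprecate this usage), so every iteration silently fits the `yield` model. A sketch (the corrected line is my suggestion, untested against the full data):

```r
trait_names <- c("yield", "lp", "lnth")

# paste() vectorises: this is THREE formula strings, not one
fstr <- paste(trait_names, "~ PEDIGREE + FIELD")
length(fstr)  # 3

# the fix: build the formula from the loop variable, one trait at a time, e.g.
#   mod <- aov(formula(paste(i, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD %in% REP")),
#              data = x)
f_i <- formula(paste(trait_names[2], "~ PEDIGREE + FIELD"))
f_i  # lp ~ PEDIGREE + FIELD
```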


structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L), .Label = c("FYLS", "HKI1", "KIS1", "LMLS",
"SELS", "SGL1"), class = "factor"), REP = structure(c(1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2",
"3"), class = "factor"), PEDIGREE = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L,
6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L,
5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), .Label = c("A", "B", "C", "D",
"E", "F"), class = "factor"), yield = c(1003L, 923L, 1268L, 1226L,
1059L, 1150L, 900L, 816L, 1072L, 1158L, 1026L, 1299L, 1083L,
1038L, 1236L, 1287L, 1270L, 1612L, 1513L, 1676L, 1504L, 1417L,
1932L, 1644L, 1293L, 1542L, 1452L, 1180L, 1248L, 1764L, 1326L,
1877L, 1788L, 1606L, 1809L, 1791L, 2294L, 2315L, 2320L, 2083L,
1895L, 2284L, 2000L, 2380L, 1952L, 2414L, 2354L, 2095L, 2227L,
2093L, 2019L, 2505L, 2410L, 2287L, 2507L, 2507L, 2349L, 2162L,
2108L, 2319L, 2028L, 1947L, 2352L, 2698L, 2369L, 1798L, 2422L,
2509L, 2234L, 2451L, 2139L, 1957L, 799L, 787L, 701L, 781L, 808L,
582L, 770L, 752L, 801L, 865L, 608L, 620L, 677L, 775L, 722L, 1030L,
606L, 729L, 1638L, 1408L, 1045L, 1685L, 1109L, 1210L, 1419L,
1048L, 1129L, 1549L, 1325L, 1315L, 1838L, 1066L, 1295L, 1499L,
1472L, 1139L), lp = c(NA, NA, 46.31, NA, NA, 43.8, NA, NA, 43.91,
NA, NA, 44.47, NA, NA, 45.16, NA, NA, 43.57, 40.65, NA, NA, 40.04,
NA, NA, 41.33, NA, NA, 40.75, NA, NA, 42.04, NA, NA, 40.35, NA,
NA, 43.682, NA, NA, 41.712, NA, NA, 42.566, NA, NA, 43.228, NA,
NA, 43.63, NA, NA, 42.058, NA, NA, NA, 45.19, NA, NA, 41.91,
NA, NA, 43.86, NA, NA, 44.48, NA, NA, 44.34, NA, NA, 43.03, NA,
NA, NA, 44.08, NA, NA, 41.39, NA, NA, 42.48, NA, NA, 44.13, NA,
NA, 43.39, NA, NA, 42.82, 42.18, NA, NA, 41.42, NA, NA, 41.25,
NA, NA, 42.31, NA, NA, 43.22, NA, NA, 40.52, NA, NA), lnth = c(NA,
NA, 1.151, NA, NA, 1.135, NA, NA, 1.109, NA, NA, 1.117, NA, NA,
1.107, NA, NA, 1.196, 1.255, NA, NA, 1.229, NA, NA, 1.158, NA,
NA, 1.214, NA, NA, 1.152, NA, NA, 1.194, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
1.2, NA, NA, 1.219, NA, NA, 1.115, NA, NA, 1.205, NA, NA, 1.238,
NA, NA, 1.244, NA, NA, NA, 1.096, NA, NA, 1.021, NA, NA, 1.055,
NA, NA, 1.058, NA, NA, 1.026, NA, NA, 1.115, 1.202, NA, NA, 1.161,
NA, NA, 1.168, NA, NA, 1.189, NA, NA, 1.204, NA, NA, 1.277, NA,
NA)), .Names = c("FIELD", "REP", "PEDIGREE", "yield", "lp", "lnth"
), row.names = c(NA, -108L), class = "data.frame")






R version 3.2.1 (2015-06-18)

Platform: i386-w64-mingw32/i386 (32-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1



locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252

[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252



attached base packages:

[1] stats graphics  grDevices utils datasets  methods   base



other attached packages:

[1] agricolae_1.2-1 asreml_3.0  lattice_0.20-31 ggplot2_1.0.1   dplyr_0.4.2 
plyr_1.8.3



loaded via a namespace (and not attached):

 [1] 

Re: [R] store results from loop into a dataframe

2016-01-05 Thread DIGHE, NILESH [AG/2362]
Sarah: Thanks for pointing out the errors in my function.

Below are the errors I am getting after I run the corrected code:
Error in if (s) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In qtukey(1 - alpha, ntr, DFerror) : NaNs produced

You are right, I have no idea how to handle these errors.

Do you recommend any other approach to solve my problem? 

Thanks for your time.
Nilesh 



-Original Message-
From: Sarah Goslee [mailto:sarah.gos...@gmail.com] 
Sent: Tuesday, January 05, 2016 11:20 AM
To: DIGHE, NILESH [AG/2362]
Cc: r-help@r-project.org
Subject: Re: [R] store results from loop into a dataframe

Leaving aside the question of whether this is the best way to approach your 
problem (unlikely), there's a couple of errors in your code involving indexing. 
Once fixed, the code demonstrates some errors in your use of HSD.test that will 
be harder for you to deal with.

Thanks for the complete reproducible example.

fun2 <- function(x)
{
    trait_names <- c("yield", "lp", "lnth")
    d <- data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, 6))
    for (i in trait_names) {
        # your formula has all the trait names, not the selected one
        # mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")), data = x)
        mod <- aov(formula(paste(i,
            "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")),
            data = x)
        out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE)
        # you're indexing by the trait name, instead of its position
        # d[, i] <- out$means[, 1]
        d[, which(trait_names == i)] <- out$means[, 1]
    }
    d
}

Sarah
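
The mechanism behind the first bug is worth isolating. `paste()` is vectorized, so handing it the whole `trait_names` vector builds three model-formula strings at once, and `formula()` applied to a multi-element character vector keeps only the first parsed expression, which here is the `yield` model; every pass of the loop therefore fits the same model, and yield's results land in all three columns. A minimal sketch of the difference (the right-hand side is abbreviated for illustration; no modelling packages needed):

```r
trait_names <- c("yield", "lp", "lnth")

## vectorized paste(): three strings, not one --
## "yield ~ PEDIGREE + FIELD", "lp ~ PEDIGREE + FIELD", "lnth ~ PEDIGREE + FIELD"
paste(trait_names, "~ PEDIGREE + FIELD")

## formula() on that vector keeps only the first parsed expression,
## so every iteration of the original loop fit the yield model
formula(paste(trait_names, "~ PEDIGREE + FIELD"))

## pasting the loop variable i builds a different model each time
for (i in trait_names) {
    print(formula(paste(i, "~ PEDIGREE + FIELD")))
}
```
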

On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362] 
<nilesh.di...@monsanto.com> wrote:
> Dear R users:
>
> I am trying to create a function that will loop over three dependent 
> variables in my aov model, and then get the HSD.test for each variable.  I 
> like to store the results from each loop in a data frame.
>
>
>
> When I run my function (funx) on my data (dat), results from only yield gets 
> populated in all three columns of the dataframe.  I am not able to store the 
> results for each variable in a dataframe. Any help will be highly appreciated.
>
>
>
>
>
>
>
> function (x)
>
> {
>
> trait_names <- c("yield", "lp", "lnth")
>
> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>
> 6))
>
> for (i in trait_names) {
>
> mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")),
>
> data = x)
>
> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = 
> FALSE)
>
> d[, i] <- out$means[, 1]
>
> }
>
> d
>
> }
>
>
> structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 
> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = 
> c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"), 
> REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2", "3"), 
> class = "factor"), PEDIGREE = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
> 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 
> 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 
> 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 
> 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 
> 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 
> 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), 
> .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), yield = 
> c(1003L, 923L, 1268L, 1226L, 1059L, 1150L, 900L, 816L, 1072L, 1158L, 
> 1026L, 1299

Re: [R] store results from loop into a dataframe

2016-01-05 Thread Sarah Goslee
Leaving aside the question of whether this is the best way to approach
your problem (unlikely), there's a couple of errors in your code
involving indexing. Once fixed, the code demonstrates some errors in
your use of HSD.test that will be harder for you to deal with.

Thanks for the complete reproducible example.

fun2 <- function(x)
{
    trait_names <- c("yield", "lp", "lnth")
    d <- data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, 6))
    for (i in trait_names) {
        # your formula has all the trait names, not the selected one
        # mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")), data = x)
        mod <- aov(formula(paste(i,
            "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")),
            data = x)
        out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE)
        # you're indexing by the trait name, instead of its position
        # d[, i] <- out$means[, 1]
        d[, which(trait_names == i)] <- out$means[, 1]
    }
    d
}

Sarah

On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362]
 wrote:
> Dear R users:
>
> I am trying to create a function that will loop over three dependent 
> variables in my aov model, and then get the HSD.test for each variable.  I 
> like to store the results from each loop in a data frame.
>
>
>
> When I run my function (funx) on my data (dat), results from only yield gets 
> populated in all three columns of the dataframe.  I am not able to store the 
> results for each variable in a dataframe. Any help will be highly appreciated.
>
>
>
>
>
>
>
> function (x)
>
> {
>
> trait_names <- c("yield", "lp", "lnth")
>
> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>
> 6))
>
> for (i in trait_names) {
>
> mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")),
>
> data = x)
>
> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE)
>
> d[, i] <- out$means[, 1]
>
> }
>
> d
>
> }
>
>
> structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
> 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
> 6L, 6L, 6L, 6L, 6L), .Label = c("FYLS", "HKI1", "KIS1", "LMLS",
> "SELS", "SGL1"), class = "factor"), REP = structure(c(1L, 2L,
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2",
> "3"), class = "factor"), PEDIGREE = structure(c(1L, 1L, 1L, 2L,
> 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L,
> 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
> 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L,
> 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L,
> 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
> 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
> 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), .Label = c("A", "B", "C", "D",
> "E", "F"), class = "factor"), yield = c(1003L, 923L, 1268L, 1226L,
> 1059L, 1150L, 900L, 816L, 1072L, 1158L, 1026L, 1299L, 1083L,
> 1038L, 1236L, 1287L, 1270L, 1612L, 1513L, 1676L, 1504L, 1417L,
> 1932L, 1644L, 1293L, 1542L, 1452L, 1180L, 1248L, 1764L, 1326L,
> 1877L, 1788L, 1606L, 1809L, 1791L, 2294L, 2315L, 2320L, 2083L,
> 1895L, 2284L, 2000L, 2380L, 1952L, 2414L, 2354L, 2095L, 2227L,
> 2093L, 2019L, 2505L, 2410L, 2287L, 2507L, 2507L, 2349L, 2162L,
> 2108L, 2319L, 2028L, 1947L, 2352L, 2698L, 2369L, 1798L, 2422L,
> 2509L, 2234L, 2451L, 2139L, 1957L, 799L, 787L, 701L, 781L, 808L,
> 582L, 770L, 752L, 801L, 865L, 608L, 620L, 677L, 775L, 722L, 1030L,
> 606L, 729L, 1638L, 1408L, 1045L, 1685L, 1109L, 1210L, 1419L,
> 1048L, 1129L, 1549L, 1325L, 1315L, 1838L, 1066L, 1295L, 1499L,
> 1472L, 1139L), lp = c(NA, NA, 46.31, NA, NA, 43.8, NA, NA, 43.91,
> NA, NA, 44.47, NA, NA, 45.16, NA, NA, 43.57, 40.65, NA, NA, 40.04,
> NA, NA, 41.33, NA, NA, 40.75, NA, NA, 42.04, NA, NA, 40.35, NA,
> NA, 43.682, NA, NA, 41.712, NA, NA, 42.566, NA, NA, 43.228, NA,
> NA, 43.63, NA, NA, 42.058, NA, NA, NA, 45.19, NA, NA, 41.91,
> NA, NA, 43.86, NA, NA, 44.48, NA, NA, 44.34, NA, NA, 43.03, NA,
> NA, NA, 44.08, NA, NA, 41.39, NA, NA, 42.48, NA, NA, 44.13, NA,
> NA, 43.39, NA, NA, 42.82, 42.18, NA, NA, 41.42, NA, NA, 41.25,
> NA, NA, 42.31, NA, NA, 43.22, 

Re: [R] store results from loop into a dataframe

2016-01-05 Thread Sarah Goslee
If you run each variable individually, you'll discover that the NAs in
your data are causing problems. It's up to you to figure out what the
best way to handle those missing values for your research is.

Sarah
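
A quick way to see what Sarah means is to count the non-missing observations per trait before modelling. A sketch, assuming `dat` holds the `dput()` data frame from the original post; the reduced model at the end is only an illustration of one possible direction, not a recommendation:

```r
## how much data does each trait actually have?
sapply(dat[c("yield", "lp", "lnth")], function(v) sum(!is.na(v)))
## yield has a value in every row; lp and lnth have far fewer

## aov() silently drops NA rows (na.action = na.omit), so for lp and
## lnth the full PEDIGREE*FIELD + FIELD%in%REP model can be left with
## too few residual degrees of freedom -- HSD.test() then hands
## qtukey() an unusable DFerror, which is consistent with the NaN
## warning reported in this thread.

## one possible direction: a reduced model fit on complete cases only
dat_lp <- dat[complete.cases(dat[, c("lp", "PEDIGREE", "FIELD")]), ]
mod_lp <- aov(lp ~ PEDIGREE + FIELD, data = dat_lp)
```

Whether a reduced model (or imputation, or analysing only the measured rep) is appropriate is the research judgement Sarah is pointing at.
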

On Tue, Jan 5, 2016 at 12:39 PM, DIGHE, NILESH [AG/2362]
<nilesh.di...@monsanto.com> wrote:
> Sarah: Thanks for pointing out the errors in my function.
>
> Below are the errors I am getting after I run the corrected quote:
> Error in if (s) { : missing value where TRUE/FALSE needed
> In addition: Warning message:
> In qtukey(1 - alpha, ntr, DFerror) : NaNs produced
>
> You are right, I have no idea to handle these errors.
>
> Do you recommend any other approach to solve my problem?
>
> Thanks for your time.
> Nilesh
>
>
>
> -Original Message-
> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
> Sent: Tuesday, January 05, 2016 11:20 AM
> To: DIGHE, NILESH [AG/2362]
> Cc: r-help@r-project.org
> Subject: Re: [R] store results from loop into a dataframe
>
> Leaving aside the question of whether this is the best way to approach your 
> problem (unlikely), there's a couple of errors in your code involving 
> indexing. Once fixed, the code demonstrates some errors in your use of 
> HSD.test that will be harder for you to deal with.
>
> Thanks for the complete reproducible example.
>
> fun2 <- function (x)
>
> {
>
> trait_names <- c("yield", "lp", "lnth")
>
> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>
> 6))
>
> for (i in trait_names) {
> # your formula has all the trait names, not the selected one
> # mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")), data = x)
> mod <- aov(formula(paste(i, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + 
> FIELD%in%REP")), data = x)
>
> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE)
>
> # you're indexing by the trait name, instead of its position
> # d[, i] <- out$means[, 1]
> d[, which(trait_names == i)] <- out$means[, 1]
>
> }
>
> d
>
> }
>
> Sarah
>
> On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362] 
> <nilesh.di...@monsanto.com> wrote:
>> Dear R users:
>>
>> I am trying to create a function that will loop over three dependent 
>> variables in my aov model, and then get the HSD.test for each variable.  I 
>> like to store the results from each loop in a data frame.
>>
>>
>>
>> When I run my function (funx) on my data (dat), results from only yield gets 
>> populated in all three columns of the dataframe.  I am not able to store the 
>> results for each variable in a dataframe. Any help will be highly 
>> appreciated.
>>
>>
>>
>>
>>
>>
>>
>> function (x)
>>
>> {
>>
>> trait_names <- c("yield", "lp", "lnth")
>>
>> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>>
>> 6))
>>
>> for (i in trait_names) {
>>
>> mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD +
>> PEDIGREE*FIELD + FIELD%in%REP")),
>>
>> data = x)
>>
>> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console =
>> FALSE)
>>
>> d[, i] <- out$means[, 1]
>>
>> }
>>
>> d
>>
>> }
>>
>>
>> structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
>> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
>> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
>> c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"),
>> REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
>> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
>> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
>> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
>> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
>> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
>> 3L, 1L, 2L, 3L, 1L

Re: [R] store results from loop into a dataframe

2016-01-05 Thread DIGHE, NILESH [AG/2362]
Sarah:  Thanks a lot for taking time to guide me to the right direction.  I now 
see how the missing data is causing the problem.
Thanks again!
Nilesh

-Original Message-
From: Sarah Goslee [mailto:sarah.gos...@gmail.com] 
Sent: Tuesday, January 05, 2016 12:13 PM
To: DIGHE, NILESH [AG/2362]
Cc: r-help@r-project.org
Subject: Re: [R] store results from loop into a dataframe

If you run each variable individually, you'll discover that the NAs in your 
data are causing problems. It's up to you to figure out what the best way to 
handle those missing values for your research is.

Sarah

On Tue, Jan 5, 2016 at 12:39 PM, DIGHE, NILESH [AG/2362] 
<nilesh.di...@monsanto.com> wrote:
> Sarah: Thanks for pointing out the errors in my function.
>
> Below are the errors I am getting after I run the corrected quote:
> Error in if (s) { : missing value where TRUE/FALSE needed In addition: 
> Warning message:
> In qtukey(1 - alpha, ntr, DFerror) : NaNs produced
>
> You are right, I have no idea to handle these errors.
>
> Do you recommend any other approach to solve my problem?
>
> Thanks for your time.
> Nilesh
>
>
>
> -Original Message-
> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
> Sent: Tuesday, January 05, 2016 11:20 AM
> To: DIGHE, NILESH [AG/2362]
> Cc: r-help@r-project.org
> Subject: Re: [R] store results from loop into a dataframe
>
> Leaving aside the question of whether this is the best way to approach your 
> problem (unlikely), there's a couple of errors in your code involving 
> indexing. Once fixed, the code demonstrates some errors in your use of 
> HSD.test that will be harder for you to deal with.
>
> Thanks for the complete reproducible example.
>
> fun2 <- function (x)
>
> {
>
> trait_names <- c("yield", "lp", "lnth")
>
> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>
> 6))
>
> for (i in trait_names) {
> # your formula has all the trait names, not the selected one
> # mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")), data = x)
> mod <- aov(formula(paste(i, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")), data = x)
>
> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = 
> FALSE)
>
> # you're indexing by the trait name, instead of its position
> # d[, i] <- out$means[, 1]
> d[, which(trait_names == i)] <- out$means[, 1]
>
> }
>
> d
>
> }
>
> Sarah
>
> On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362] 
> <nilesh.di...@monsanto.com> wrote:
>> Dear R users:
>>
>> I am trying to create a function that will loop over three dependent 
>> variables in my aov model, and then get the HSD.test for each variable.  I 
>> like to store the results from each loop in a data frame.
>>
>>
>>
>> When I run my function (funx) on my data (dat), results from only yield gets 
>> populated in all three columns of the dataframe.  I am not able to store the 
>> results for each variable in a dataframe. Any help will be highly 
>> appreciated.
>>
>>
>>
>>
>>
>>
>>
>> function (x)
>>
>> {
>>
>> trait_names <- c("yield", "lp", "lnth")
>>
>> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>>
>> 6))
>>
>> for (i in trait_names) {
>>
>> mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
>> PEDIGREE*FIELD + FIELD%in%REP")),
>>
>> data = x)
>>
>> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console =
>> FALSE)
>>
>> d[, i] <- out$means[, 1]
>>
>> }
>>
>> d
>>
>> }
>>
>>
>> structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
>> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
>> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = 
>> c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"), 
>> REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
>> 2L,

Re: [R] no results

2015-11-11 Thread William Dunlap
If you are running these commands from a file using source() then
replacing 'summary(sem)' with 'print(summary(sem))' would help,
as would adding echo=TRUE or print.eval=TRUE to the source()
command.


Bill Dunlap
TIBCO Software
wdunlap tibco.com
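
The visibility rule Bill describes is easy to reproduce with a two-line script; `demo.R` is a hypothetical file name:

```r
## contents of demo.R
x <- rnorm(20)
summary(x)           # auto-prints at the interactive console,
                     # but is invisible when the file is source()d
print(summary(x))    # printed in both cases

## at the console, either option makes the results visible
## (print.eval defaults to the value of echo):
source("demo.R", echo = TRUE)
source("demo.R", print.eval = TRUE)
```
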

On Tue, Nov 10, 2015 at 11:47 AM, Alaa Sindi  wrote:

> Hi All,
>
> I am not getting any summary results and I do not have any error. what
> would be the problem?
>
>
>
> sem=mlogit.optim ( LL  , Start, method = 'nr', iterlim = 2000, tol =
> 1E-05, ftol = 1e-08, steptol = 1e-10, print.level = 0)
> summary(sem)
>
> thanks
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

