Re: [R] ks.test ; impossible to calculate exact exact value with ex-aequos

2018-12-10 Thread Ted Harding
On Mon, 2018-12-10 at 22:17 +0100, Fatma Ell wrote:
> Dear all,
> I'm trying to use ks.test in order to compare two curves. I have 0 values; I
> think this is why I have the following warning: "impossible to calculate
> exact exact value with ex-aequos"
> 
> a=c(3.02040816326531, 7.95918367346939, 10.6162790697674, 4.64150943396226,
> 1.86538461538462, 1.125, 1.01020408163265, 1.2093023255814,
> 0.292452830188679,
> 0, 0, 0)
> b=c(2.30769230769231, 4.19252873563218, 5.81924882629108, 6.2248243559719,
> 5.02682926829268, 4.50728862973761, 3.61741424802111, 5.05479452054795,
> 3.68095238095238, 1.875, 5.25, 0)
> 
> ks.test(a,b)
> 
> data:  a and b
> D = 0.58333, p-value = 0.0337
> alternative hypothesis: two-sided
> 
> Warning message:
> In ks.test(a, b) :
> impossible to calculate exact exact value with ex-aequos
> 
> Is the p-value correct? Otherwise, how can I solve this issue?
> Thanks a lot.

The warning arises, not because you have "0" values as such,
but because there are repeated values (which happen to be 0).

The K-S test is designed for continuous random variables, for
which the probability of repeated values is zero: theoretically,
repeated values cannot happen.

From the help page ?ks.test:

"The presence of ties always generates a warning, since continuous
distributions do not generate them. If the ties arose from
rounding the tests may be approximately valid, but even modest
amounts of rounding can have a significant effect on the
calculated statistic."
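
For illustration, here is a minimal sketch using the vectors 'a' and
'b' defined above (the seed and the jitter size are arbitrary):

  ks.test(a, b)
  # D = 0.58333, p-value = 0.0337, plus the ties warning

  # Breaking the ties with a negligible jitter lets ks.test() compute
  # an exact p-value, but it does not make the test assumptions valid:
  set.seed(1)
  ks.test(a + runif(length(a), 0, 1e-9), b)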



All this, in view of the fact that your sample 'a' has three zeros
along with nine other values which are all different from 0 (and all
*very* different from 0 except for 0.292452830188679), along with the
fact that your sample 'b' has 11 values all *very* different from 0
and one final value equal to 0, together also with the fact that
in each sample the '0' values occur at the end, strongly suggests
that the data source is not such that a K-S test is suitable.

The K-S test is a non-parametric test for whether
  a) a given sample comes from a given kind of distribution;
or
  b) two samples are drawn from the same distribution.
In either case, it is assumed that the sample values are drawn
independently of each other; if there is some reason why they
may not be independent of each other, the test is not valid.

You say "I'm trying to use ks.test in order to compare two curve".
When I ezecute
  plot(a)
  plot(b)
on your data, I see (approximately) in each case a rise from a
medium vale (~2 or ~3) to a higher vale {~6 or ~10) followed
by a decline down to an exact 0.

This is not the sort of situation that the K-S test is for!
Hoping this helps,
Ted.



Re: [R] Ignoring the domain of RV in punif()

2018-10-23 Thread Ted Harding
Well, as a final (I hope!) clarification: It is not the case that
"the bigger cube does not exists (because the Q is the reference
space)". It does exist! Simply, the probability of the random point
being in the bigger cube, and NOT in the cube Q, is 0.

Hence "the cumulative probability of reaching a point outside the
cube (u or v or w  > A) is 1" is badly phrased. The "cumulative
probability" is not the probability of *reaching* a point, but of
being (in the case of a real random variable) less than or equal
to the given value. If Prob[X <= x1] = 1, then Prob[X > x1] = 0.
Hence if x0 is the minimum value such that Prob[X <= x0] = 1,
then X "can reach" x0. But for any x1 > x0, Prob[x0 < X <= x1] = 0.
Therefore, since X cannot be greater than x0, X *cannot reach* x1!
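
In punif() terms, a minimal illustration of the same point -- the CDF
is defined for every real argument, not just on the support:

  punif(-5)      # 0: the random point is never below the support [0,1]
  punif(0.25)    # 0.25
  punif(10^10)   # 1: X <= 10^10 is certain, although X never reaches it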

Best wishes,
Ted.

On Tue, 2018-10-23 at 12:06 +0100, Hamed Ha wrote:
> Hi Ted,
> 
> Thanks for the explanation.
> 
> I am convinced at least more than average by Eric and your answer. But
> still have some shadows of confusion that is definitely because I have
> forgotten some fundamentals in probabilities.
> 
> In your cube example, the cumulative probability of reaching a point
> outside the cube (u or v or w  > A) is 1; however, the bigger cube does not
> exist (because Q is the reference space). In other words, I feel that we
> extend the space to accommodate any cube of any size! Looks a bit weird to
> me!
> 
> 
> Hamed.
> 
> On Tue, 23 Oct 2018 at 11:52, Ted Harding  wrote:
> 
> > Sorry -- stupid typos in my definition below!
> > See at ===*** below.
> >
> > On Tue, 2018-10-23 at 11:41 +0100, Ted Harding wrote:
> > Before the ticket finally enters the waste bin, I think it is
> > necessary to explicitly explain what is meant by the "domain"
> > of a random variable. This is not (though in special cases
> > could be) the space of possible values of the random variable.
> >
> > Definition of (real-valued) Random Variable (RV):
> > Let Z be a probability space, i.e. a set {z} of entities z
> > on which a probability distribution is defined. The entities z
> > do not need to be numeric. A real-valued RV X is a function
> > X:Z --> R defined on Z such that, for any z in Z, X(z) is a
> > real number. The set Z, in this context, is (by definition)
> > the *domain* of X, i.e. the space on which X is defined.
> > It may or may not be (and usually is not) the same as the set
> > of possible values of X.
> >
> > Then, given any real value x0, the CDF of X at x0 is Prob[X <= x0].
> > The distribution function of X does not define the domain of X.
> >
> > As a simple example: Suppose Q is a cube of side A, consisting of
> > points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space
> > of points z with a uniform distribution of position within Q.
> > Define the random variable X:Q --> [0,1] as
> > ===***
> >   X(u,v,w) = x/A
> >
> > Wrong! That should  have been:
> >
> >   X(u,v,w) = w/A
> > ===***
> > Then X is uniformly distributed on [0,1], the domain of X is Q.
> > Then for x <= 0, Prob[X <= x] = 0; for 0 <= x <= 1, Prob[X <= x] = x;
> > for x >= 1, Prob[X <= x] = 1. These define the CDF. The set of possible
> > values of X is 1-dimensional, and is not the same as the domain of X,
> > which is 3-dimensional.
> >
> > Hoping this helps!
> > Ted.
> >
> > On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote:
> > > > Yes, now it makes more sense.
> > > >
> > > > Okay, I think that I am convinced and we can close this ticket.
> > > >
> > > > Thanks Eric.
> > > > Regards,
> > > > Hamed.
> > > >
> > > > On Tue, 23 Oct 2018 at 10:42, Eric Berger 
> > wrote:
> > > >
> > > > > Hi Hamed,
> > > > > That reference is sloppy. Try looking at
> > > > > https://en.wikipedia.org/wiki/Cumulative_distribution_function
> > > > > and in particular the first example which deals with a Unif[0,1] r.v.
> > > > >
> > > > > Best,
> > > > > Eric
> > > > >
> > > > >
> > > > > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha 
> > wrote:
> > > > >
> > > > >> Hi Eric,
> > > > >>
> > > > >> Thank you for your reply.
> > > > >>
> > > > >> I should say that your justification makes sense to me.  However, I
> > am in
> > > > >> doubt that CDF defines

Re: [R] Ignoring the domain of RV in punif()

2018-10-23 Thread Ted Harding
Sorry -- stupid typos in my definition below!
See at ===*** below.

On Tue, 2018-10-23 at 11:41 +0100, Ted Harding wrote:
Before the ticket finally enters the waste bin, I think it is
necessary to explicitly explain what is meant by the "domain"
of a random variable. This is not (though in special cases
could be) the space of possible values of the random variable.

Definition of (real-valued) Random Variable (RV):
Let Z be a probability space, i.e. a set {z} of entities z
on which a probability distribution is defined. The entities z
do not need to be numeric. A real-valued RV X is a function
X:Z --> R defined on Z such that, for any z in Z, X(z) is a
real number. The set Z, in this context, is (by definition)
the *domain* of X, i.e. the space on which X is defined.
It may or may not be (and usually is not) the same as the set
of possible values of X.

Then, given any real value x0, the CDF of X at x0 is Prob[X <= x0].
The distribution function of X does not define the domain of X.

As a simple example: Suppose Q is a cube of side A, consisting of
points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space
of points z with a uniform distribution of position within Q.
Define the random variable X:Q --> [0,1] as
===***
  X(u,v,w) = x/A

Wrong! That should  have been:

  X(u,v,w) = w/A
===***
Then X is uniformly distributed on [0,1], the domain of X is Q.
Then for x <= 0, Prob[X <= x] = 0; for 0 <= x <= 1, Prob[X <= x] = x;
for x >= 1, Prob[X <= x] = 1. These define the CDF. The set of possible
values of X is 1-dimensional, and is not the same as the domain of X,
which is 3-dimensional.

Hoping this helps!
Ted.

On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote:
> > Yes, now it makes more sense.
> > 
> > Okay, I think that I am convinced and we can close this ticket.
> > 
> > Thanks Eric.
> > Regards,
> > Hamed.
> > 
> > On Tue, 23 Oct 2018 at 10:42, Eric Berger  wrote:
> > 
> > > Hi Hamed,
> > > That reference is sloppy. Try looking at
> > > https://en.wikipedia.org/wiki/Cumulative_distribution_function
> > > and in particular the first example which deals with a Unif[0,1] r.v.
> > >
> > > Best,
> > > Eric
> > >
> > >
> > > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha  wrote:
> > >
> > >> Hi Eric,
> > >>
> > >> Thank you for your reply.
> > >>
> > >> I should say that your justification makes sense to me.  However, I am in
> > >> doubt that the CDF is defined by Pr(X <= x) for all x? That is, the
> > >> domain of the RV is totally ignored in the definition.
> > >>
> > >> It makes a conflict between the formula and the theoretical definition.
> > >>
> > >> Please see page 115 in
> > >>
> > >> https://books.google.co.uk/books?id=FEE8D1tRl30C&printsec=frontcover&dq=statistical+distribution&hl=en&sa=X&ved=0ahUKEwjp3PGZmJzeAhUQqxoKHV7OBJgQ6AEIKTAA#v=onepage&q=uniform&f=false
> > >> The
> > >>
> > >>
> > >> Thanks.
> > >> Hamed.
> > >>
> > >>
> > >>
> > >> On Tue, 23 Oct 2018 at 10:21, Eric Berger  wrote:
> > >>
> > >>> Hi Hamed,
> > >>> I disagree with your criticism.
> > >>> For a random variable X
> > >>> X: D - - - > R
> > >>> its CDF F is defined by
> > >>> F: R - - - > [0,1]
> > >>> F(z) = Prob(X <= z)
> > >>>
> > >>> The fact that you wrote a convenient formula for the CDF
> > >>> F(z) = (z-a)/(b-a)  a <= z <= b
> > >>> in a particular range for z is your decision, and as you noted this
> > >>> formula will give the wrong value for z outside the interval [a,b].
> > >>> But the problem lies in your formula, not the definition of the CDF
> > >>> which would be, in your case:
> > >>>
> > >>> F(z) = 0             if z <= a
> > >>>      = (z-a)/(b-a)   if a <= z <= b
> > >>>      = 1             if b <= z
> > >>>
> > >>> HTH,
> > >>> Eric
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Oct 23, 2018 at 12:05 PM Hamed Ha  wrote:
> > >>>
> > >>>> Hi All,
> > >>>>
> > >>>> I recently discovered an interesting issue with the punif() function.
> > >>>> Let
> > >>>> X~Uniform[a,b]; then the CDF is defined by F(x)=(x-a)/(b-a) for

Re: [R] Ignoring the domain of RV in punif()

2018-10-23 Thread Ted Harding
Before the ticket finally enters the waste bin, I think it is
necessary to explicitly explain what is meant by the "domain"
of a random variable. This is not (though in special cases
could be) the space of possible values of the random variable.

Definition of (real-valued) Random Variable (RV):
Let Z be a probability space, i.e. a set {z} of entities z
on which a probability distribution is defined. The entities z
do not need to be numeric. A real-valued RV X is a function
X:Z --> R defined on Z such that, for any z in Z, X(z) is a
real number. The set Z, in this context, is (by definition)
the *domain* of X, i.e. the space on which X is defined.
It may or may not be (and usually is not) the same as the set
of possible values of X.

Then, given any real value x0, the CDF of X at x0 is Prob[X <= x0].
The distribution function of X does not define the domain of X.

As a simple example: Suppose Q is a cube of side A, consisting of
points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space
of points z with a uniform distribution of position within Q.
Define the random variable X:Q --> [0,1] as
  X(u,v,w) = x/A
Then X is uniformly distributed on [0,1], the domain of X is Q.
Then for x <= 0, Prob[X <= x] = 0; for 0 <= x <= 1, Prob[X <= x] = x;
for x >= 1, Prob[X <= x] = 1. These define the CDF. The set of possible
values of X is 1-dimensional, and is not the same as the domain of X,
which is 3-dimensional.
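
A small simulation sketch of this example (using the corrected
definition X = w/A from the follow-up above; the values of A, n and
the seed are arbitrary):

  set.seed(42)
  A <- 2; n <- 1000
  # a uniform random point (u,v,w) in the cube Q:
  u <- runif(n, 0, A); v <- runif(n, 0, A); w <- runif(n, 0, A)
  x <- w/A               # the random variable X
  ks.test(x, "punif")    # consistent with Uniform[0,1]
  mean(x <= 0.25)        # approximately 0.25, i.e. Prob[X <= 0.25]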

Hoping this helps!
Ted.

On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote:
> Yes, now it makes more sense.
> 
> Okay, I think that I am convinced and we can close this ticket.
> 
> Thanks Eric.
> Regards,
> Hamed.
> 
> On Tue, 23 Oct 2018 at 10:42, Eric Berger  wrote:
> 
> > Hi Hamed,
> > That reference is sloppy. Try looking at
> > https://en.wikipedia.org/wiki/Cumulative_distribution_function
> > and in particular the first example which deals with a Unif[0,1] r.v.
> >
> > Best,
> > Eric
> >
> >
> > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha  wrote:
> >
> >> Hi Eric,
> >>
> >> Thank you for your reply.
> >>
> >> I should say that your justification makes sense to me.  However, I am in
> >> doubt that the CDF is defined by Pr(X <= x) for all x? That is, the
> >> domain of the RV is totally ignored in the definition.
> >>
> >> It makes a conflict between the formula and the theoretical definition.
> >>
> >> Please see page 115 in
> >>
> >> https://books.google.co.uk/books?id=FEE8D1tRl30C&printsec=frontcover&dq=statistical+distribution&hl=en&sa=X&ved=0ahUKEwjp3PGZmJzeAhUQqxoKHV7OBJgQ6AEIKTAA#v=onepage&q=uniform&f=false
> >> The
> >>
> >>
> >> Thanks.
> >> Hamed.
> >>
> >>
> >>
> >> On Tue, 23 Oct 2018 at 10:21, Eric Berger  wrote:
> >>
> >>> Hi Hamed,
> >>> I disagree with your criticism.
> >>> For a random variable X
> >>> X: D - - - > R
> >>> its CDF F is defined by
> >>> F: R - - - > [0,1]
> >>> F(z) = Prob(X <= z)
> >>>
> >>> The fact that you wrote a convenient formula for the CDF
> >>> F(z) = (z-a)/(b-a)  a <= z <= b
> >>> in a particular range for z is your decision, and as you noted this
> >>> formula will give the wrong value for z outside the interval [a,b].
> >>> But the problem lies in your formula, not the definition of the CDF
> >>> which would be, in your case:
> >>>
> >>> F(z) = 0             if z <= a
> >>>      = (z-a)/(b-a)   if a <= z <= b
> >>>      = 1             if b <= z
> >>>
> >>> HTH,
> >>> Eric
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Oct 23, 2018 at 12:05 PM Hamed Ha  wrote:
> >>>
>  Hi All,
> 
>  I recently discovered an interesting issue with the punif() function.
>  Let
>  X~Uniform[a,b]; then the CDF is defined by F(x)=(x-a)/(b-a) for
>  (a <= x <= b).
>  The important fact here is the domain of the random variable X. Having
>  said
>  that, R returns CDF for any value in the real domain.
> 
>  I understand that one can justify this by extending the domain of X and
>  assigning zero probabilities to the values outside the domain. However,
>  theoretically, it is not true to return a value for the CDF outside the
>  domain. So I propose a patch to the R function punif() to return an
>  error in such situations.
> 
>  Example:
>  > punif(10^10)
>  [1] 1
> 
> 
>  Regards,
>  Hamed.
> 
> 
> >>>
> 

Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Ted Harding
I think that one can usefully look at this question from the
point of view of what "NaN" and "NA" are abbreviations for
(at any rate, according to the understanding I have adopted
since many years -- maybe over-simplified).

NaN: Not a Number
NA: Not Available

So NA is typically used for missing values, whereas NaN
represents the results of numerical calculations which
cannot give a definite number as their result.

Hence 0/0 is not a number, so NaN; similarly Inf/Inf.

Thus, with your x <- c(NA, NA, NA), consider mean(x, na.rm=TRUE):
sum(x, na.rm=TRUE) = 0, and since the set of values of x
with na.rm=TRUE is empty, the number of elements in x is 0;
hence mean = 0/0 = NaN.

But median(x, na.rm=TRUE) is different: the median is found by
searching among the available elements for the value which
divides the set of values into two halves. Since there are no
available elements in x with na.rm=TRUE, the median is not
available, hence NA.

Best wishes to all,
Ted.

On Wed, 2018-08-22 at 11:24 -0400, Marc Schwartz via R-help wrote:
> Hi,
> 
> It might even be worthwhile to review this recent thread on R-Devel:
> 
>   https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html
> 
> which touches upon a subtly related topic vis-a-vis NaN handling.
> 
> Regards,
> 
> Marc Schwartz
> 
> 
> > On Aug 22, 2018, at 10:55 AM, Bert Gunter  wrote:
> > 
> > ... And FWIW (not much, I agree), note that if z = numeric(0) and sum(z) =
> > 0, then mean(z) = NaN makes sense, as length(z) = 0, so dividing by 0 gives
> > NaN. So you can see the sorts of issues you may need to consider.
> > 
> > Bert Gunter
> > 
> > "The trouble with having an open mind is that people keep coming along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > 
> > 
> > On Wed, Aug 22, 2018 at 7:47 AM Bert Gunter  wrote:
> > 
> >> Actually, the dissonance is a bit more basic.
> >> 
> >> After xxx(..., na.rm=TRUE) with all NA's in the input you have numeric(0). So
> >> what you see is actually:
> >> 
> >>> z <- numeric(0)
> >>> mean(z)
> >> [1] NaN
> >>> median(z)
> >> [1] NA
> >>> sd(z)
> >> [1] NA
> >>> sum(z)
> >> [1] 0
> >> etc.
> >> 
> >> I imagine that there may be more of these little inconsistencies due to
> >> the organic way R evolved over time. What the conventions should be  can be
> >> purely a matter of personal opinion in the absence of accepted standards.
> >> But I would look to see what accepted standards were, if any, first.
> >> 
> >> -- Bert
> >> 
> >> 
> >> On Wed, Aug 22, 2018 at 7:34 AM Ivan Calandra  wrote:
> >> 
> >>> Dear useRs,
> >>> 
> >>> I have just noticed that when input is only NA with na.rm=TRUE, mean()
> >>> results in NaN, whereas median() and sd() produce NA. Shouldn't it all
> >>> be the same? I think NA makes more sense than NaN in that case.
> >>> 
> >>> x <- c(NA, NA, NA)
> >>> mean(x, na.rm=TRUE)
> >>> [1] NaN
> >>> median(x, na.rm=TRUE)
> >>> [1] NA
> >>> sd(x, na.rm=TRUE)
> >>> [1] NA
> >>> 
> >>> Thanks for any feedback.
> >>> 
> >>> Best,
> >>> Ivan
> >>> 
> >>> --
> >>> Dr. Ivan Calandra
> >>> TraCEr, laboratory for Traceology and Controlled Experiments
> >>> MONREPOS Archaeological Research Centre and
> >>> Museum for Human Behavioural Evolution
> >>> Schloss Monrepos
> >>> 56567 Neuwied, Germany
> >>> +49 (0) 2631 9772-243
> >>> https://www.researchgate.net/profile/Ivan_Calandra
> >>> 
> >>> 
> >> 
> > 
> 



Re: [R] Error custom indicator Quantstrat colnames

2018-07-14 Thread Ted Harding
Pietro,
Please post this to r-help@r-project.org
not to r-help-ow...@r-project.org
which is a mailing list concerned with list management, and
does not deal with questions regarding the use of R.
Best wishes,
Ted.

On Sat, 2018-07-14 at 13:04 +, Pietro Fabbro via R-help wrote:
> I will try to be as clear as possible as I have been rebuked by some users. I 
> deleted the last questions and I will try to be sufficiently explicative in 
> this one. I apologize if the data I will insert will not be enough.
> 
> So, I am trying to run a strategy through the package Quantstrat.
> 
> install.packages("quantstrat")
> My problem is that I get the following error
> 
> Error in `colnames<-`(`*tmp*`, value = seq(ncol(tmp_val))) :
> attempt to set 'colnames' on an object with less than two dimensions
> 
> when I try to run the following command:
> 
> > out <- applyStrategy(strategy=strategy.st,portfolios=portfolio.st)
> I do not have this problem if I use, as indicator, one or more indicators, 
> which are already defined by the package TTR.
> 
> I have this error only when I try to use a custom indicator. Here is the code 
> for the custom indicator that I use:
> 
> wma <-  WMA(Cl(mktdata), 4, wts=c(1:4)) 
> wmamaxt <- rollmaxr(wma, 30, fill = NA)
> wmamint <- - rollmaxr(- wma, 30, fill = NA)
> CNOwma <- function (mktdata=quote(mktdata),x) {(wma - wmamint) / (wmamaxt - 
> wmamint)}
> Please refer to the following code:
> 
> library(devtools)
> library(quantmod)
> library(quantstrat)
> library(TTR)
> library(png)
> library(IKTrading)
> 
> wma <-  WMA(Cl(mktdata), 4, wts=c(1:4)) 
> wmamaxt <- rollmaxr(wma, 30, fill = NA)
> wmamint <- - rollmaxr(- wma, 30, fill = NA)
> CNOwma <- function (mktdata=quote(mktdata),x) {(wma - wmamint) / (wmamaxt - 
> wmamint)}
> initdate <- "2010-01-01"
> from <- "2012-01-01" #start of backtest
> to <- "2017-31-12" #end of backtest
> 
> Sys.setenv(TZ= "EST") #Set up environment for timestamps
> 
> currency("USD") #Set up environment for currency to be used
> 
> symbols <- c("RUT", "IXIC") #symbols used in our backtest
> getSymbols(Symbols = symbols, src = "google", from=from, to=to, adjust = 
> TRUE) #receive data from google finance,  adjusted for splits/dividends
> 
> stock(symbols, currency = "USD", multiplier = 1) #tells quanstrat what 
> instruments present and what currency to use
> 
> tradesize <-1 #default trade size
> initeq <- 10 #default initial equity in our portfolio
> 
> strategy.st <- portfolio.st <- account.st <- "firststrat" #naming strategy, 
> portfolio and account
> 
> #removes old portfolio and strategy from environment
> rm.strat(portfolio.st)
> rm.strat(strategy.st) 
> 
> #initialize portfolio, account, orders and strategy objects
> initPortf(portfolio.st, symbols = symbols, initDate = initdate, currency = 
> "USD")
> 
> initAcct(account.st, portfolios = portfolio.st, initDate = initdate, currency 
> = "USD", initEq = initeq)
> 
> initOrders(portfolio.st, initDate = initdate)
> strategy(strategy.st, store=TRUE)
> 
> add.indicator(strategy = strategy.st,
> name = 'CNOwma',
> arguments = list(x = quote(Cl(mktdata)), n=4),
> label = 'CNOwma4')
> 
> 
> 
> 
> 
> add.signal(strategy.st, name = "sigThreshold",
> arguments = list(column = "CNOwma4", threshold = 0.6,
> relationship = "gt", cross = TRUE),
> label = "longthreshold")
> 
> 
> add.signal(strategy.st, name = "sigThreshold",
> arguments = list(column = "CNOwma4", threshold = 0.6,
> relationship = "lt", cross = TRUE),
> label = "shortthreshold")
> 
> 
> 
> 
> add.rule(strategy.st, name = "ruleSignal",
> arguments = list(sigcol = "longthreshold", sigval = TRUE,
> orderqty = "all", ordertype = "market",
> orderside = "long", replace = FALSE,
> prefer = "Open"),
> type = "enter")
> 
> 
> add.rule(strategy.st, name = "ruleSignal",
> arguments = list(sigcol = "shortthreshold", sigval = TRUE,
> orderqty = "all", ordertype = "market",
> orderside = "long", replace = FALSE,
> prefer = "Open"),
> type = "exit")
> 
> add.rule(strategy.st, name = "ruleSignal",
> arguments = list(sigcol = "shortthreshold", sigval = TRUE,
> orderqty = "all", ordertype = "market",
> orderside = "short", replace = FALSE,
> prefer = "Open"),
> type = "enter")
> 
> add.rule(strategy.st, name = "ruleSignal",
> arguments = list(sigcol = "longthreshold", sigval = TRUE,
> orderqty = "all", ordertype = "market",
> orderside = "short", replace = FALSE,
> prefer = "Open"),
> type = "exit")
> 
> 
> 
> out <- applyStrategy(strategy = strategy.st, portfolios = portfolio.st)
> When I run the traceback() of the error, this is what I get:
> 
> > traceback()
> 4: stop("attempt to set 'colnames' on an object with less than two 
> dimensions")
> 3: `colnames<-`(`*tmp*`, value = seq(ncol(tmp_val)))
> 2: applyIndicators(strategy = strategy, mktdata = mktdata, parameters = 
> parameters, 
> ...)
> 1: applyStrategy(strategy = strategy.st, portfolios = portfolio.st)
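
As an aside on the error itself: the traceback shows applyIndicators()
trying to set colnames() on the indicator's return value, so a custom
indicator needs to return a matrix-like (xts) object, not a bare
vector. An untested sketch of that shape (argument names illustrative):

  CNOwma <- function(x, n = 4, lookback = 30) {
    wma    <- WMA(x, n, wts = 1:n)
    wmamax <- rollmaxr(wma, lookback, fill = NA)
    wmamin <- -rollmaxr(-wma, lookback, fill = NA)
    out <- (wma - wmamin)/(wmamax - wmamin)
    colnames(out) <- "CNOwma"   # one named column, so `colnames<-`
    out                         # has two dimensions to work on
  }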
> 

Re: [R] prod(NaN, NA) vs. prod(NA, NaN)

2018-07-04 Thread Ted Harding
I've been following this thread, and wondering where it might lead.
My (possibly naive) view of these matters is basically logical,
relying on (possibly over-simplified) interpretations of "NA" and "NaN".

These are that:
  "NaN" means "Not a Number", though it can result from a
numerical calculation, e.g. '0/0' or 'Inf/Inf', while:
  "NA" means "Not Available" (e.g. "Missing Value"), but
   should be interpreted as in the appropriate class of its
   context -- so '2 + NA' interprets "NA" as numeric,
   while 'c("abc", NA)' interprets "NA" as character.

On that basis, the result of 'sum(NaN, ...)' should be
"NaN", since 'sum' presumes that its arguments are numeric,
and the sum of anything with NaN is not a number.
Likewise 'sum(..., NaN)' should be NaN.

And in both of 'sum(NA, NaN)' and 'sum(NaN, NA)', the "NA" should
be interpreted as a "not-available number", and because
of the "NaN" the result cannot be a number, hence is "NaN".

So it would seem that Martin Møller Skarbiniks Pedersen's
inconsistency:
  sum(c(NaN,NA))
  [1] NaN
  sum(NaN,NA)
  [1] NA
is not consistent with the above reasoning.

However, in my R version 2.14.0 (2011-10-31):
  sum(NaN,NA)
  [1] NA
  sum(NA,NaN)
  [1] NA
which **is** consistent! Hmmm...
Best wishes to all,
Ted.

On Wed, 2018-07-04 at 12:06 +0100, Barry Rowlingson wrote: 
> I'm having deja-vu of a similar discussion on R-devel:
> 
> https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html
> 
> This was the funniest inconsistency I could find:
> 
>  > sum(c(NaN,NA))
>  [1] NaN
>  > sum(NaN,NA)
>  [1] NA
> 
> THEY'RE IN THE SAME ORDER!!!
> 
> The doc in ?NaN has this clause:
> 
>  In R, basically all mathematical functions (including basic
>  ‘Arithmetic’), are supposed to work properly with ‘+/- Inf’ and
>  ‘NaN’ as input or output.
> 
> which doesn't define "properly", but you'd think commutativity was a
> "proper" property of addition. So although they "are supposed to" they
> don't. Naughty mathematical functions!
> 
> And then there's...
> 
>  Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
>  which of those two is not guaranteed and may depend on the R
>  platform (since compilers may re-order computations).
> 
> Which is at least telling you there is vagueness in the system. But
> hey, mathematics is not a precise science... oh wait...
> 
> Barry
> 
> 
> 
> 
> 
> On Tue, Jul 3, 2018 at 10:09 PM, Rolf Turner  wrote:
> >
> > On 04/07/18 00:24, Martin Møller Skarbiniks Pedersen wrote:
> >
> >> Hi,
> >>I am currently using R v3.4.4 and I just discovered this:
> >>
> >>> prod(NA, NaN) ; prod(NaN, NA)
> >> [1] NA
> >> [1] NaN
> >>
> >> ?prod says:
> >>  If ‘na.rm’ is ‘FALSE’ an ‘NA’ value in any of the arguments will
> >>   cause a value of ‘NA’ to be returned, otherwise ‘NA’ values are
> >>   ignored.
> >>
> >> So according to the manual-page for prod() NA should be returned in both
> >> cases?
> >>
> >>
> >> However for sum() is opposite is true:
> >>> sum(NA, NaN) ; sum(NaN, NA)
> >> [1] NA
> >> [1] NA
> >>
> >> ?sum says:
> >>  If ‘na.rm’ is ‘FALSE’ an ‘NA’ or ‘NaN’ value in any of the
> >>   arguments will cause a value of ‘NA’ or ‘NaN’ to be returned,
> >>   otherwise ‘NA’ and ‘NaN’ values are ignored.
> >>
> >>
> >> Maybe the manual for prod() should say the same as sum() that
> >> both NA and NaN can be returned?
> >
> > But:
> >
> >  > sum(NA,NaN)
> > [1] NA
> >  > sum(NaN,NA)
> > [1] NA
> >
> > so sum gives NA "both ways around".  Perhaps a slight inconsistency
> > here?  I doubt that it's worth losing any sleep over, however.
> >
> > Interestingly (???):
> >
> >  > NaN*NA
> > [1] NaN
> >  > NA*NaN
> > [1] NA
> >  > NaN+NA
> > [1] NaN
> >  > NA+NaN
> > [1] NA
> >
> > So we have an instance of non-commutative arithmetic operations.  And
> > sum() is a wee bit inconsistent with "+".
> >
> > Again I doubt that the implications are all that serious.
> >
> > cheers,
> >
> > Rolf Turner
> >
> > --
> > Technical Editor ANZJS
> > Department of Statistics
> > University of Auckland
> > Phone: +64-9-373-7599 ext. 88276
> >
> 


Re: [R] R maintains old values

2018-07-03 Thread Ted Harding
On Tue, Jul 3, 2018 at 9:25 AM, J C Nash  wrote:
> 
> > . . . Now, to add to the controversy, how do you set a computer on fire?
> >
> > JN

Perhaps by exploring the context of this thread,
where new values strike a match with old values???

Ted



Re: [R] OT --- grammar.

2018-06-24 Thread Ted Harding
On Mon, 2018-06-25 at 09:46 +1200, Rolf Turner wrote:
> Does/should one say "the degrees of freedom is defined to be" or "the 
> degrees of freedom are defined to be"?
> 
> Although the value of "degrees of freedom" is a single number, the first
> formulation sounds very odd to my ear.
> 
> I would like to call upon the collective wisdom of the R community to 
> help me decide.
> 
> Thanks, and my apologies for the off-topic post.
> 
> cheers,
> Rolf Turner

Interesting question, Rolf!
From my point of view, I see "degrees of freedom" as a plural noun,
because of "degrees". But in some cases, we have only 1 degree of
freedom. Then the degrees of freedom is 1.

But we do not say, in that case, "the degree of freedom is defined
to be", or "the degree of freedom are 1".

Nor would we say "The degrees of freedom are 19"!

So I think that the solution is to encapsulate the term within
single quotes, so that it becomes a singular entity. Thus:

The 'degrees of freedom' is defined to be ... ; and
The 'degrees of freedom' is 1.
Or
The 'degrees of freedom' is 19.

This is not the same issue as (one of my prime hates) saying
"the data is stored in the dataframe ... ". "Data" is a
plural noun (singular "datum"), and I would insist on
"the data are stored ... ". The French use "une donnée" and
"les données"; the Germans use "ein Datum", "die Daten";
so they know what they're doing! English-speakers mostly do not!
Best wishes to all,
Ted.



Re: [R] mysterious rounding digits output

2018-05-31 Thread Ted Harding
Thanks Martin! Good to know that you have made this important
change. And, regarding
  Maybe we should additionally say that this is *not* round()ing,
  and give a link to the help for signif() ?
I think that also would be most useful. In fact, ?signif
leads to a whole survey of "Rounding of Numbers", covering the
functions ceiling(), floor(), trunc(), round(), signif().
Well worth reading!
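
For a compact illustration of significant digits versus decimal
places (a sketch, with numbers in the spirit of Joshua's data):

  options(digits = 3)   # significant digits, not decimal places
  0.5165911             # prints as 0.517
  5775.384              # prints as 5775 (integer digits are never cut)
  signif(5775.384, 3)   # 5780      -- actually changes the value
  round(0.5165911, 3)   # 0.517     -- 3 decimal places
  round(5775.384, 3)    # 5775.384  -- though it still prints as 5775
                        #              until options(digits) is raised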
Best wishes,
Ted.

On Thu, 2018-05-31 at 08:58 +0200, Martin Maechler wrote:
> >>>>> Ted Harding 
> >>>>> on Thu, 31 May 2018 07:10:32 +0100 writes:
> 
> > Well pointed out, Jim!
> 
> > It is unfortunate that the documentation for options(digits=...)
> > does not mention that these are *significant digits* 
> > and not *decimal places* (which is what Joshua seems to want):
> 
> Since R 3.4.0  the help on  ?options  *does* say significant!
> 
> The change in R's source code was this one,  14 months ago :
> 
> r72380 | maechler | 2017-03-21 11:28:13 +0100 (Tue, 21 Mar 2017) | 2 lines
> Changed paths:
>M /trunk/src/library/base/man/options.Rd
> 
> digits: + "signficant"
> 
> 
> and since then, the text has been
> 
>  ‘digits’: controls the number of significant digits to print when
>   printing numeric values.  It is a suggestion only.
> ...
> 
> whereas what you (Ted) cite is from R 3.3.x and earlier
> 
>  > "‘digits’: controls the number of digits to print when
>  > printing numeric values."
> 
> Maybe we should additionally say that this is *not* round()ing,
> and give a link to the help for signif() ?
> 
> 
> > On the face of it, printing the value "0.517" of 'ccc' looks
> > like printing 4 digits! Joshua's output could look even worse if 'ddd'
> > had values in the 1000s!
> 
> > To achieve exactly what Joshua seems to want, use the round()
> > function. Starting with his original assignment of values to
> > the variable itemInfo, the result of round(itemInfo,digits=3) is:
> 
> > aaa   bbb   ccc   dddeee
> > skill 1.396 6.225 0.517 5.775  2.497
> > predict   1.326 5.230 0.462 5.116 -2.673
> > waiting   1.117 4.948NANA NA
> > complex   1.237 4.170 0.220 4.713  5.642
> > novelty   1.054 4.005 0.442 4.260  2.076
> > creative  1.031 3.561 0.362 3.689  2.549
> > evaluated 0.963 3.013NANA NA
> > body  0.748 2.238 0.596 2.019  0.466
> > control   0.620 2.149NANA NA
> > stakes0.541 1.905 0.227 2.020  2.343
> > spont 0.496 1.620NANA NA
> > chatter   0.460 1.563 0.361 1.382  0.540
> > present   0.428 1.236 0.215 1.804  2.194
> > reward0.402 1.101 0.288 1.208  0.890
> > feedback  0.283 0.662NANA NA
> > goal  0.237 0.474NANA NA
> 
> > Best wishes to all,
> > Ted.
> 
> 
> > On Thu, 2018-05-31 at 15:30 +1000, Jim Lemon wrote:
> >> Hi Joshua,
> >> Because there are no values in column ddd less than 1.
> >> 
> >> itemInfo[3,"ddd"]<-0.3645372
> >> itemInfo
> >> aaa   bbb   ccc   dddeee
> >> skill 1.396 6.225 0.517 5.775  2.497
> >> predict   1.326 5.230 0.462 5.116 -2.673
> >> waiting   1.117 4.948NA 0.365 NA
> >> complex   1.237 4.170 0.220 4.713  5.642
> >> novelty   1.054 4.005 0.442 4.260  2.076
> >> creative  1.031 3.561 0.362 3.689  2.549
> >> evaluated 0.963 3.013NANA NA
> >> body  0.748 2.238 0.596 2.019  0.466
> >> control   0.620 2.149NANA NA
> >> stakes0.541 1.905 0.227 2.020  2.343
> >> spont 0.496 1.620NANA NA
> >> chatter   0.460 1.563 0.361 1.382  0.540
> >> present   0.428 1.236 0.215 1.804  2.194
> >> reward0.402 1.101 0.288 1.208  0.890
> >> feedback  0.283 0.662NANA NA
> >> goal  0.237 0.474NANA NA
> >> 
> >> digits specifies the number of significant digits ("It is a suggestion
> >> only"), so when at least one number is padded with a leading zero it
> >> affects the formatting of the other numbers in that column. I suspect
> >> that this is an esthetic consideration to line up the decimal points.

Re: [R] mysterious rounding digits output

2018-05-30 Thread Ted Harding
Well pointed out, Jim!
It is unfortunate that the documentation for options(digits=...)
does not mention that these are *significant digits* and not
*decimal places* (which is what Joshua seems to want):
  "‘digits’: controls the number of digits to print when
  printing numeric values."

On the face of it, printing the value "0.517" of 'ccc' looks
like printing 4 digits! Joshua's output could look even worse if 'ddd'
had values in the 1000s!

To achieve exactly what Joshua seems to want, use the round()
function. Starting with his original assignment of values to
the variable itemInfo, the result of round(itemInfo,digits=3) is:

aaa   bbb   ccc   dddeee
skill 1.396 6.225 0.517 5.775  2.497
predict   1.326 5.230 0.462 5.116 -2.673
waiting   1.117 4.948NANA NA
complex   1.237 4.170 0.220 4.713  5.642
novelty   1.054 4.005 0.442 4.260  2.076
creative  1.031 3.561 0.362 3.689  2.549
evaluated 0.963 3.013NANA NA
body  0.748 2.238 0.596 2.019  0.466
control   0.620 2.149NANA NA
stakes0.541 1.905 0.227 2.020  2.343
spont 0.496 1.620NANA NA
chatter   0.460 1.563 0.361 1.382  0.540
present   0.428 1.236 0.215 1.804  2.194
reward0.402 1.101 0.288 1.208  0.890
feedback  0.283 0.662NANA NA
goal  0.237 0.474NANA NA

Best wishes to all,
Ted.


On Thu, 2018-05-31 at 15:30 +1000, Jim Lemon wrote:
> Hi Joshua,
> Because there are no values in column ddd less than 1.
> 
> itemInfo[3,"ddd"]<-0.3645372
> itemInfo
> aaa   bbb   ccc   dddeee
> skill 1.396 6.225 0.517 5.775  2.497
> predict   1.326 5.230 0.462 5.116 -2.673
> waiting   1.117 4.948NA 0.365 NA
> complex   1.237 4.170 0.220 4.713  5.642
> novelty   1.054 4.005 0.442 4.260  2.076
> creative  1.031 3.561 0.362 3.689  2.549
> evaluated 0.963 3.013NANA NA
> body  0.748 2.238 0.596 2.019  0.466
> control   0.620 2.149NANA NA
> stakes0.541 1.905 0.227 2.020  2.343
> spont 0.496 1.620NANA NA
> chatter   0.460 1.563 0.361 1.382  0.540
> present   0.428 1.236 0.215 1.804  2.194
> reward0.402 1.101 0.288 1.208  0.890
> feedback  0.283 0.662NANA NA
> goal  0.237 0.474NANA NA
> 
> digits specifies the number of significant digits ("It is a suggestion
> only"), so when at least one number is padded with a leading zero it
> affects the formatting of the other numbers in that column. I suspect
> that this is an esthetic consideration to line up the decimal points.
> 
> Jim
> 
> 
> On Thu, May 31, 2018 at 11:18 AM, Joshua N Pritikin  
> wrote:
> > R version 3.5.0 (2018-04-23) -- "Joy in Playing"
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > options(digits=3)
> >
> > itemInfo <- structure(list(
> >   "aaa" = c(1.39633732316667, 1.32598263816667, 1.1165832407,
> >     1.23651072616667, 1.0536867998, 1.0310073738, 0.9630728395,
> >     0.7483865045, 0.62008664617, 0.5411017985, 0.49639760783,
> >     0.45952804467, 0.42787704783, 0.40208597967, 0.28316118123,
> >     0.23689627723),
> >   "bbb" = c(6.22533860696667, 5.229736804, 4.94816041772833,
> >     4.17020503255333, 4.00453781427167, 3.56058007398333,
> >     3.0125202404, 2.2378235873, 2.14863910661167, 1.90460903044777,
> >     1.62001089796667, 1.56341257968151, 1.23618558850667,
> >     1.10086688908262, 0.661981500639833, 0.47397754310745),
> >   "ccc" = c(0.5165911355, 0.46220470767, NA, 0.21963592433,
> >     0.44186378083, 0.36150286583, NA, 0.59613794667, NA,
> >     0.22698477157, NA, 0.36092266158, 0.2145347068, 0.28775624948,
> >     NA, NA),
> >   "ddd" = c(5.77538400186667, 5.115877113, NA, 4.71294520316667,
> >     4.25952652129833, 3.68879921863167, NA, 2.01942456211145, NA,
> >     2.02032557108, NA, 1.3818108759571, 1.80436759778167,
> >     1.20789851993367, NA, NA),
> >   "eee" = c(2.4972534717, -2.67340172316667, NA, 5.6419520633,
> >     2.0763355523, 2.548949539, NA, 0.465537272243167, NA,
> >     2.34255027516667, NA, 0.5400824922975, 2.1935000655,
> >     0.89000797687, NA, NA)),
> >   row.names = c("skill", "predict", "waiting", "complex", "novelty",
> >     "creative", "evaluated", "body", "control", "stakes", "spont",
> >     "chatter", "present", "reward", "feedback", "goal"),
> >   class = "data.frame")
> >
> > itemInfo  # examine column ddd
> >
> > When I try this, column ddd has 1 fewer digits than expected. See the
> > attached screenshot.
> >
> > Why don't all the columns have the same number of digits displayed?
> >
> > --
> > Joshua N. Pritikin, Ph.D.
> > Virginia Institute for Psychiatric and Behavioral Genetics
> > Virginia Commonwealth University
> > PO Box 980126
> > 800 E Leigh St, Biotech One, Suite 1-133
> > Richmond, VA 23219
> > http://exuberant-island.surge.sh
> >

[R] TEST message

2018-04-24 Thread Ted Harding
Apologies for disturbance! Just checking that I can
get through to r-help.
Ted.



Re: [R] Hacked

2018-04-18 Thread Ted Harding
Not necessarily. The R-help archives are publicly accessible,
with a "sort by date" option. So if someone sets up a web=page
monitor which reports back when new messages appear there
(at the bpottom end), then their email addresses are readily
copied (subject to " at " --> "@".

Once they have the address then anything can happen!

Best wishes,
Ted (eagerly awaiting attempted seduction ... ).

On Wed, 2018-04-18 at 10:36 +, Fowler, Mark wrote:
> Seems it must be the R-list. A horde of ‘solicitation’ emails began arriving 
> about 27 minutes after I posted about not seeing any! Had left work by that 
> time, so did not encounter them until now.
> 
> From: Mark Leeds [mailto:marklee...@gmail.com]
> Sent: April 18, 2018 12:33 AM
> To: Rui Barradas
> Cc: Ding, Yuan Chun; Fowler, Mark; Luis Puerto; Peter Langfelder; R-Help ML 
> R-Project; Neotropical bat risk assessments
> Subject: Re: [R] Hacked
> 
> Hi All: I lately get a lot more spam-porn type emails also, but I don't
> know if they are due to me being on
> the R-list.
> 
> 
> On Tue, Apr 17, 2018 at 5:09 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello,
> 
> Nor do I, no gmail, also got spam.
> 
> Rui Barradas
> 
> On 4/17/2018 8:34 PM, Ding, Yuan Chun wrote:
> No, I do not use gmail, still got dirty spam email twice.
> 
> -Original Message-
> From: R-help 
> [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Fowler, Mark
> Sent: Tuesday, April 17, 2018 12:32 PM
> To: Luis Puerto; Peter Langfelder
> Cc: R-Help ML R-Project; Neotropical bat risk assessments
> Subject: Re: [R] Hacked
> 
> [Attention: This email came from an external source. Do not open attachments 
> or click on links from unknown senders or unexpected emails.]
> 
> 
> 
> 
> 
> Just an observation. I have not seen the spam you are discussing. Possibly it 
> is specific to gmail addresses?
> 
> -Original Message-
> From: R-help 
> [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Luis Puerto
> Sent: April 17, 2018 4:11 PM
> To: Peter Langfelder
> Cc: R-Help ML R-Project; Neotropical bat risk assessments
> Subject: Re: [R] Hacked
> 
> Hi!
> 
> This happened to me also! I just got a spam email just after posting and then 
> in following days I got obnoxious spam emails in my spam filter. As the 
> others, I think that there is some kind of bot subscribed to the list, but 
> also perhaps a spider or crawler monitoring the R-Help archive and getting 
> email addresses there. Nabble is a possibility too.
> On 17 Apr 2018, at 21:50, Peter Langfelder <peter.langfel...@gmail.com> wrote:
> 
> I got some spam emails after my last post to the list, and the emails
> did not seem to go through r-help. The spammers may be subscribed to
> the r-help, or they get the poster emails from some of the web copies
> of this list (nabble or similar).
> 
> Peter
> 
> On Tue, Apr 17, 2018 at 11:37 AM, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:
> I asked the moderators about it. This is the reply
> 
> "Other moderators have looked into this a bit and may be able to shed
> more light on it. This is a "new" tactic where the spammers appear to
> reply to the r-help post. They are not, however, going through the r-help 
> server.
> 
> It also seems that this does not happen to everyone.
> 
> I am not sure how you can automatically block the spammers.
> 
> Sorry I cannot be of more help."
> 
> --Ulrik
> 
> Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote on Tue., 17 Apr 2018,
> 14:59:
> Likely a spammer has joined the mailing list and is auto-replying to
> posts made to the list. Unlikely that the list itself has been
> "hacked". Agree that it is obnoxious.
> 
> On April 17, 2018 5:01:10 AM PDT, Neotropical bat risk assessments <
> neotropical.b...@gmail.com> wrote:
> Hi all,
> 
> Site has been hacked?
> Bad SPAM arriving
> 
> 
> --
> Sent from my phone. Please excuse my brevity.
> 
> 
> 

Re: [R] R help

2018-03-31 Thread Ted Harding
On Sat, 2018-03-31 at 15:45 +0200, Henri Moolman wrote:
> Could you please provide help with something from R that I find rather
> puzzling? In the small program below x[1]=1, .  .  .  , x[5]=5. R also
> finds that x[1]<=5 is TRUE. Yet when you attempt to execute while, R does
> not seem to recognize the condition. Any thoughts on why this happens?
> 
> Regards
> 
> Henri Moolman
> 
> > x=c(1,2,3,4,5)
> > x[1]
> [1] 1
> > i=1
> > x[1]<=5
> [1] TRUE
> > while(x[i]<=5){
> + i=i+1
> + }
> Error in while (x[i] <= 5) { : missing value where TRUE/FALSE needed

If you run the following you should understand why (the only
change is to include "print(i)" in the loop, so you can see
what is happening):

  x=c(1,2,3,4,5)
  x[1]
# [1] 1
  i=1
  x[1]<=5
# [1] TRUE
  while(x[i]<=5){
  i = i+1 ; print(i)
  }
# [1] 2
# [1] 3
# [1] 4
# [1] 5
# [1] 6
# Error in while (x[i] <= 5) { : missing value where TRUE/FALSE needed

So everything is fine so long as i <= 5 (i.e. x[i] <= 5),
but then the loop sets i = 6, and then:

  i
# [1] 6
  x[i]
# [1] NA
  x[i] <= 5
# [1] NA
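
One way to make the loop itself safe is to bound the index as well
(a sketch):

  i <- 1
  while (i <= length(x) && x[i] <= 5) {
    i = i+1
  }
  i
# [1] 6  -- the loop now stops cleanly after the last element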

Helpful?
Best wishes,
Ted.



Re: [R] substr gives empty output

2018-01-21 Thread Ted Harding
On Sun, 2018-01-21 at 09:59 +0100, Luigi Marongiu wrote:
> Dear all,
> I have a string, let's say "testing", and I would like to extract in
> sequence each letter (character) from it. But when I use substr() I only
> properly get the first character, the rest is empty (""). What am I getting
> wrong?
> For example, I have this code:
> 
> >>>
> x <- "testing"
> k <- nchar(x)
> for (i in 1:k) {
>   y <- substr(x, i, 1)
>   print(y)
> }

From the help page:
  substr(x, start, stop)
where 'start' is the position in the character vector x at which the
substring starts, and 'stop' is the position at which it stops.

Hence 'stop' must be >= 'start'; and if they are equal then you get
just the single character. That is the case in your code, when i=1;
when i > 1 then stop < start, so you get nothing. Compare with:

  x <- "testing"
  k <- nchar(x)
  for (i in 1:k) { 
y <- substr(x, i, i)  ### was: substr(x, i, 1)
print(y)
  }

[1] "t"
[1] "e"
[1] "s"
[1] "t"
[1] "i"
[1] "n"
[1] "g"

Hoping this helps,
Ted.



Re: [R] Precision of values > 53 bits

2017-07-20 Thread Ted Harding
On Thu, 2017-07-20 at 14:33 +0200, peter dalgaard wrote:
> > On 10 Jan 2013, at 15:56 , S Ellison  wrote:
> > 
> > 
> > 
> >> I am working with large numbers and identified that R loses
> >> precision for such high numbers.
> > Yes. R uses standard 32-bit double precision.
> 
> 
> Well, for large values of 32... such as 64.

Hmmm ... Peter, as one of your compatriots (guess who) once solemnly
said to me:

  2 plus 2 is never equal to 5 -- not even for large values of 2.
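
More prosaically, the 53-bit limit in question is easy to exhibit:

  .Machine$double.digits  # 53 significant bits in a double
  2^53                    # 9007199254740992
  (2^53 + 1) == 2^53      # TRUE : 2^53 + 1 rounds back down to 2^53
  (2^53 + 2) == 2^53      # FALSE: even integers just above 2^53 are exact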

Best wishes,
Ted.



Re: [R] [FORGED] Logical Operators' inconsistent Behavior

2017-05-19 Thread Ted Harding
[I inadvertently sent my reply below to Jérémie, instead of R-help.
Also, I have had an additional thought which may clarify things
for R users].
[Original reply]:
The point about this is that (as Rolf wrote) FALSE & (anything)
is FALSE, provided logical NA is either TRUE or FALSE; but,
because the "NA" says that it is not known which it is,
it could be "anything". And, indeed, if "NA" is given the
"missing" meaning and if we assume that a missing logical value
did indeed have a value (necessarily either TRUE or FALSE),
then it follows logically that FALSE & NA = FALSE.

On the other hand, if with the "missing" interpretation of "NA"
we don't even know that it is a logical, then it might be fair
enough to say FALSE & NA = NA.
Ted.
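
In code, the behaviour under discussion (a quick check):

  TRUE  & NA   # NA    -- the answer depends on the unknown value
  FALSE & NA   # FALSE -- FALSE whatever NA turns out to be
  TRUE  | NA   # TRUE  -- TRUE wins regardless
  FALSE | NA   # NA    -- again depends on the unknown value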

[Additional thought]:
Testing to see what would happen if the NA were not logical,
I put myself (not being logical ... ) on the line, facing up to R:
   X <- "Ted"
   FALSE & X
   Error in FALSE & X : 
   operations are possible only for numeric, logical or complex types
So R will refuse to deal with any variable which cannot partake in
a logical expression.

Ted.

On Fri, 2017-05-19 at 14:24 +0200, Jérémie Juste wrote:
> My apologies if I was not clear enough,
> 
> TRUE & NA could be either TRUE or FALSE and consequently is NA.
> why is   FALSE & NA = FALSE?  NA could be TRUE or FALSE, so FALSE & NA
> should be NA?
> 
> 
> On Fri, May 19, 2017 at 2:13 PM, Rolf Turner 
> wrote:
> 
> > On 20/05/17 00:01, Jérémie Juste wrote:
> >
> >> Hello,
> >>
> >> Rolf said,
> >>
> >> TRUE & FALSE is FALSE but TRUE & TRUE is TRUE, so TRUE & NA could be
> >> either TRUE or FALSE and consequently is NA.
> >>
> >> OTOH FALSE & (anything) is FALSE so FALSE & NA is FALSE.
> >>
> >>
> >> According to this logic why is
> >>
> >> FALSE & NA
> >>
> >> [1] FALSE
> >>
> >
> > Huh
> >
> >
> > cheers,
> >
> > Rolf Turner
> >
> > --
> > Technical Editor ANZJS
> > Department of Statistics
> > University of Auckland
> > Phone: +64-9-373-7599 ext. 88276
> >
> 
> 
> 
> -- 
> Jérémie Juste
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interesting quirk with fractions and rounding

2017-04-21 Thread Ted Harding
I've been following this thread with interest. A nice
collection of things to watch out for, if you don't
want the small arithmetic errors due to finite-length
digital representations of fractions to cause trouble!

However, as well as these small discrepancies, major
malfunctions can also result.

Back on Dec 22, 2013, I posted a Christmas Greetings
message to R-help:

  Season's Greetings (and great news ... )!

which starts:

  Greetings All!
  With the Festive Season fast approaching, I bring you joy
  with the news (which you will surely wish to celebrate)
  that R cannot do arithmetic!

  Usually, this is manifest in a trivial way when users report
  puzzlement that, for instance,

sqrt(pi)^2 == pi
# [1] FALSE

  which is the result of a (generally trivial) rounding or
  truncation error:

 sqrt(pi)^2 - pi
# [1] -4.440892e-16

  But for some very simple calculations R goes off its head.

And the example given is:

  Consider a sequence generated by the recurrence relation

x[n+1] = 2*x[n] if 0 <= x[n] <= 1/2
x[n+1] = 2*(1 - x[n]) if 1/2 < x[n] <= 1

  (for 0 <= x[n] <= 1).

  This has equilibrium points (x[n+1] = x[n]) at x[n] = 0
  and at x[n] = 2/3:

2/3 -> 2*(1 - 2/3) = 2/3

  It also has periodic points, e.g.

2/5 -> 4/5 -> 2/5 (period 2)
2/9 -> 4/9 -> 8/9 -> 2/9 (period 3)

  The recurrence relation can be implemented as the R function

nextx <- function(x){
  if( (0<=x)&(x<=1/2) ) {x <- 2*x} else {x <- 2*(1 - x)}
}

  Now have a look at what happens when we start at the equilibrium
  point x = 2/3:

N <- 1 ; x <- 2/3
while(x > 0){
  cat(sprintf("%i: %.9f\n",N,x))
  x <- nextx(x) ; N <- N+1
}
cat(sprintf("%i: %.9f\n",N,x))

For a while [run it and see!], this looks as though it's doing what
the arithmetic would lead us to expect: the first 24 results will all
be printed as 0.666666667, which looks fine as 2/3 to 9 places.

But then the "little errors" start to creep in:

  N=25: 0.6
  N=28: 0.66672 
  N=46: 0.667968750
  N=47: 0.664062500
  N=48: 0.671875000
  N=49: 0.65625
  N=50: 0.68750
  N=51: 0.62500
  N=52: 0.75000
  N=53: 0.5
  N=54: 1.0
  N=55: 0.0

  What is happening is that, each time R multiplies by 2, the binary
  representation is shifted up by one and a zero bit is introduced
  at the bottom end.

At N=53, the first binary bit of 'x' is 1, and all the rest are 0,
so now 'x' is exactly 0.5 = 1/2, hence the final two are also exact
results; 53 is the .Machine$double.digits = 53 binary places.

So this normally "almost" trivial feature can, for such a simple
calculation, lead to chaos or catastrophe (in the literal technical
sense).
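
One way to sidestep it for this particular map, sketched in base R,
is to keep the state as an exact integer numerator over the fixed
denominator q, so no binary rounding ever occurs:

  q <- 3L; p <- 2L   # x = p/q = 2/3
  nextp <- function(p, q) if (2L*p <= q) 2L*p else 2L*(q - p)
  for (N in 1:60) p <- nextp(p, q)
  p/q
# [1] 0.6666667  -- still exactly 2/3 after 60 iterations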

For more detail, including an extension of the above, look at the
original posting in the R-help archives for Dec 22, 2013:

  From: (Ted Harding) 
  Subject: [R] Season's Greetings (and great news ... )!
  Date: Sun, 22 Dec 2013 09:59:16 - (GMT)

(Apologies, but I couldn't track down the URL for this posting
in the R-help archives; there were a few follow-ups).

I gave this as an example to show that the results of the "little"
arithmetic errors (such as have recently been discussed from many
aspects) can, in certain contexts, destroy a computation.

So be careful to consider what can happen in the particular
context you are working with.

There are ways to dodge the issue -- such as using the R interface
to the 'bc' calculator, which computes arithmetic expressions in
a way which is quite different from the fixed-finite-length binary
representation and algorithms used, not only by R, but also by many
other numerical computation software suites.

Best wishes to all,
Ted.

-
E-Mail: (Ted Harding) 
Date: 21-Apr-2017  Time: 22:03:15
This message was sent by XFMail



Re: [R] Beginner needs help with R

2017-02-07 Thread Ted Harding
Bert, your solution seems to presuppose that the programmer
knows beforehand that the leading digit in the number is "0"
(which in fact is clearly the case in Nabila Arbi's original
query). However, the sequence might arise from some process
outside of the programmer's control, and may then either have
a leading 0 or not. In that case, I think Jim's solution is safer!
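
A minimal illustration of the danger (with a hypothetical accession
range spanning a width boundary):

  paste0("DQ0", 99999:100001)      # forces a "0" in even when wrong
  # [1] "DQ099999"  "DQ0100000" "DQ0100001"
  sprintf("DQ%06d", 99999:100001)  # pads to width 6 only when needed
  # [1] "DQ099999"  "DQ100000"  "DQ100001"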
Best wishes,
Ted.


On 07-Feb-2017 16:02:18 Bert Gunter wrote:
> No need for sprintf(). Simply:
> 
>> paste0("DQ0",seq.int(60054,60060))
> 
> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
> [7] "DQ060060"
> 
> 
> Cheers,
> Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Mon, Feb 6, 2017 at 5:45 AM, jim holtman  wrote:
>> You need the leading zeros, and 'numerics' just give the number without
>> leading zeros.  You can use 'sprintf' to create a character string with
>> the leading zeros:
>>
>>> # this is using 'numeric' and drops leading zeros
>>>
>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>> seq1
>> [1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>>>
>>> # use 'sprintf' to create leading zeros
>>> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
>>> seq2
>> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
>> "DQ060060"
>>>
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi 
>> wrote:
>>
>>> Dear R-Help Team!
>>>
>>> I have some trouble with R. It's probably nothing big, but I can't find a
>>> solution.
>>> My problem is the following:
>>> I am trying to download some sequences from ncbi using the ape package.
>>>
>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>>
>>> sequences <- read.GenBank(seq1,
>>> seq.names = seq1,
>>> species.names = TRUE,
>>> gene.names = FALSE,
>>> as.character = TRUE)
>>>
>>> write.dna(sequences, "mysequences.fas", format = "fasta")
>>>
>>> My problem is, that R doesn't take the whole sequence number as "060054"
>>> but it puts it as DQ60054 (missing the zero in the beginning, which is
>>> essential).
>>>
>>> Could please tell me, how I can get R to accepting the zero in the
>>> beginning of the accession number?
>>>
>>> Thank you very much in advance and all the best!
>>>
>>> Nabila
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 07-Feb-2017  Time: 16:48:41
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] histogram first bar wrong position

2016-12-22 Thread Ted Harding
William has lifted the lid on the essence of the problem, which is
that in R the way that breaks (and therefore counts) in a histogram
are evaluated is an area of long grass with lurking snakes!

To get a glimpse of this, have a look at
  ?hist
and in the section "Arguments", look at "breaks", "freq", "right".
Also see under "Details".

and, as suggested under "See also", look at
  ?nclass.Sturges

As William suggests, if you know what class intervals you want,
create them yourself! For your example (with N=100), look at:

   hist(y,freq=TRUE, col='red', breaks=0.5+(0:6))

or

   hist(y,freq=TRUE, col='red', breaks=0.25+(0:12)/2)
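
For data like these, which are genuinely discrete (the integers 1 to 6),
another option is to side-step hist() and its binning altogether:

   barplot(table(y), col='red')  # one bar per distinct value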

Hoping this helps!
Best wishes,
Ted.


On 22-Dec-2016 16:36:34 William Dunlap via R-help wrote:
> Looking at the return value of hist will show you what is happening:
> 
>> x <- rep(1:6,10*(6:1))
>> z <- hist(x, freq=TRUE)
>> z
> $breaks
>  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
> 
> $counts
>  [1] 60 50  0 40  0 30  0 20  0 10
> ...
> 
> The first bin is [1, 1.5], including both endpoints, while the other
> bins include only the upper endpoint.  I recommend defining your
> own breakpoints, ones that don't include possible data points, as in
> 
>> print(hist(x, breaks=seq(min(x)-0.5, max(x)+0.5, by=1), freq=TRUE))
> $breaks
> [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5
> 
> $counts
> [1] 60 50 40 30 20 10
> ...
> 
> S+ had a 'factor' method for hist() that did this sort of thing, but R does
> not.
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> On Thu, Dec 22, 2016 at 5:17 AM, itpro  wrote:
> 
>> Hi, everyone.
>>
>>
>> I stumbled upon weird histogram behaviour.
>>
>> Consider this "dice emulator":
>> Step 1: Generate uniform random array x of size N.
>> Step 2: Multiply each item by six and round to next bigger integer to get
>> numbers 1 to 6.
>> Step 3: Plot histogram.
>>
>> > x<-runif(N)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='orange')
>>
>>
>> Now what I get with N=10
>>
>> > x<-runif(10)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='green')
>>
>> At first glance looks OK.
>>
>> Now try N=100
>>
>> > x<-runif(100)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='red')
>>
>> Now first bar is not where it should be.
>> Hmm. Look again to 10 histogram... First bar is not where I want it,
>> it's only less striking due to narrow bars.
>>
>> So, first bar is always in wrong position. How do I fix it to make
>> perfectly spaced bars?
>>
>>
>>
>>
>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 22-Dec-2016  Time: 17:23:26
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] detecting if a variable has changed

2016-06-05 Thread Ted Harding
Ah, perhaps I'm beginning to understand the question!
Presumably the issue is that evaluating X[X<=y] takes O(n) time,
where n = length(X), and similarly X[X>y]. 

So I suppose that one needs to be looking at some procedure for
a "bisecting" search for the largest r such that X[r] <= y, which
would then be O(log2(n)).

Perhaps not altogether straightforward to program, but straightforward
in concept!
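
A sketch of one way to do it, using findInterval() -- which performs a
binary search on a sorted vector, so the *search* is O(log2(n)), though
inserting into an R vector still copies it:

  insert.sorted <- function(X, y){
    r <- findInterval(y, X)   # largest r such that X[r] <= y (0 if none)
    append(X, y, after=r)
  }
  insert.sorted(c(1,3,5), 4)
  # [1] 1 3 4 5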

Apologies for misunderstanding.
Ted.

On 05-Jun-2016 18:13:15 Bert Gunter wrote:
> Nope, Ted. I asked for  a O(log(n)) solution, not an O(n) one.
> 
> I will check out the data.table package, as suggested.
> 
> -- Bert
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Sun, Jun 5, 2016 at 11:01 AM, Ted Harding 
> wrote:
>> Surely it is straightforward, since the vector (say 'X') is already sorted?
>>
>> Example (raw code with explicit example):
>>
>> set.seed(54321)
>> X <- sort(runif(10))
>> # X ## The initial sorted vector
>> # [1] 0.04941009 0.17669234 0.20913493 0.21651016 0.27439354
>> # [6] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110
>>
>> y <- runif(1)
>> # y ## The new value to be inserted
>> # [1] 0.1366424
>>
>> Y <- c(X[X<=y],y,X[X>y]) ## Now insert y into X:
>> Y
>> [1] 0.04941009 0.13664239 0.17669234 0.20913493 0.21651016 0.27439354
>> [7] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110
>>
>> ## And there it is at Y[2]
>>
>> Easy to make such a function!
>> Best wishes to all,
>> Ted.
>>
>> On 05-Jun-2016 17:44:29 Neal H. Walfield wrote:
>>> On Sun, 05 Jun 2016 19:34:38 +0200,
>>> Bert Gunter wrote:
>>>> This help thread suggested a question to me:
>>>>
>>>> Is there a function in some package that efficiently (I.e. O(log(n)) )
>>>> inserts a single new element into the correct location in an
>>>> already-sorted vector? My assumption here is that doing it via sort()
>>>> is inefficient, but maybe that is incorrect. Please correct me if so.
>>>
>>> I think data.table will do this if the the column is marked
>>> appropriately.
>>>
>>>> I realize that it would be straightforward to write such a function,
>>>> but I just wondered if it already exists. My google & rseek searches
>>>> did not succeed, but maybe I used the wrong keywords.
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>>
>>>> On Sun, Jun 5, 2016 at 9:47 AM, William Dunlap via R-help
>>>>  wrote:
>>>> > I don't know what you mean by "without having to use any special
>>>> > interfaces", but "reference classes" will do what I think you want. 
>>>> > E.g.,
>>>> > the following makes a class called 'SortedNumeric' that only sorts the
>>>> > vector when you want to get its value, not when you append values.  It
>>>> > stores the sorted vector so it does not get resorted each time you ask
>>>> > for
>>>> > it.
>>>> >
>>>> > SortedNumeric <- setRefClass("sortedNumeric",
>>>> > fields = list(
>>>> > fData = "numeric",
>>>> > fIsUnsorted = "logical"),
>>>> > methods = list(
>>>> > initialize = function(Data = numeric(), isUnsorted =
>>>> > TRUE)
>>>> > {
>>>> > fData <<- Data
>>>> > stopifnot(is.logical(isUnsorted),
>>>> >   length(isUnsorted)==1,
>>>> >   !is.na(isUnsorted))
>>>> > fIsUnsorted <<- isUnsorted
>>>> > },
>>>> > getData = function() {
>>>> > if (isUnsorted) {
>>>> > fData <<- sort(

Re: [R] detecting if a variable has changed

2016-06-05 Thread Ted Harding
s and
>> >> then sort it on demand.  My idea is to use something like weak
>> >> references combined with attributes.  Consider:
>> >>
>> >>   # Initialization.
>> >>   l = as.list(1:10)
>> >>   # Note that it is sorted.
>> >>   attr(l, 'sorted') = weakref(l)
>> >>
>> >>   # Modify the list.
>> >>   l = append(l, 1:3)
>> >>
>> >>   # Check if the list is still sorted.  (I use identical here, but it
>> >>   # probably too heavy weight: I just need to compare the addresses.)
>> >>   if (! identical(l, attr(l, 'sorted'))) {
>> >> l = sort(unlist(l))
>> >> attr(l, 'sorted') = weakref(l)
>> >>   }
>> >>   # Do operation that requires sorted list.
>> >>   ...
>> >>
>> >> This is obviously a toy example.  I'm not actually sorting integers
>> >> and I may use a matrix instead of a list.
>> >>
>> >> I've read:
>> >>
>> >>   http://www.hep.by/gnu/r-patched/r-exts/R-exts_122.html
>> >>   http://homepage.stat.uiowa.edu/~luke/R/references/weakfinex.html
>> >>
>> >> As far as I can tell, weakrefs are only available via the C API.  Is
>> >> there a way to do what I want in R without resorting to C code?  Is
>> >> what I want to do better achieved using something other than weakrefs?
>> >>
>> >> Thanks!
>> >>
>> >> :) Neal
>> >>
>> >> __
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 05-Jun-2016  Time: 19:01:10
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] LaplacesDemon package installation

2016-02-04 Thread Ted Harding
See at [***] below.

On 04-Feb-2016 21:23:05 Rolf Turner wrote:
> 
> (1) You might get better mileage asking this on the r-sig-mac list.
> 
> (2) The phenomena you describe are puzzling and are beyond my capacity 
> to explain.  Perhaps someone else will be able to enlighten you.
> 
> (3) Out of idle curiosity I went to the github site and downloaded the
> zip file of the package.  (I could not see a *.tar.gz file, but perhaps 
> I just don't understand how github works.  Actually, there's no 
> "perhaps" about it!)
> 
> I unzipped the download and then did
> 
>  R CMD build LaplacesDemon-master
> 
> in the appropriate directory.  This created a file
> 
>  LaplacesDemon_15.03.19.tar.gz
> 
> Note that the version number seems to be as you require.
> 
> I then used your install.packages syntax, and the package installed 
> seamlessly.  It also loaded seamlessly.
> 
> So I don't know why the computer gods are picking on you.
>
[***] 
> Note that I am not working on a Mac, but rather running Linux (as do all 
> civilized human beings! :-) )

Might this be yet another candidate for a Fortune today?

Ted.

> cheers,
> 
> Rolf Turner
> 
> -- 
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 
> On 05/02/16 06:42, lluis.hurt...@uv.es wrote:
>> Dear all,
>>
>> I've recently changed my Mac and I am trying to reinstall my commonly
>> used R-packages. I'm having troubles with a package called
>> LaplacesDemon.
>>
>> This package is no more in the CRAN list and the developers web page
>> (http://www.bayesian-inference.com/software) is out for more than
>> half a year. Old versions of the package can still be found in tar.gz
>> in
>>
>> https://cran.r-project.org/src/contrib/Archive/LaplacesDemon/
>>
>> and in github
>>
>> https://github.com/ecbrown/LaplacesDemon
>>
>> Last version is LaplacesDemon_13.03.04.tar.gz, but I was able to get
>> version LaplacesDemon_15.03.19.tar.gz time ago (can't find it
>> anymore).
>>
>> I have tried to install this packages from source in my computer
>> using
>>
>>> install.packages("/Users/.../LaplacesDemon_15.03.19.tar.gz", repos
>>> = NULL, type="source")
>>
>> answer:
>>
>>> Warning: invalid package 'Users/.../LaplacesDemon_15.03.19.tar.gz’
>>> Error: ERROR: no packages specified Warning message: In
>>> install.packages("Users/.../LaplacesDemon_15.03.19.tar.gz",  :
>> installation of package ‘Users/.../LaplacesDemon_15.03.19.tar.gz’ had
>> non-zero exit status
>>
>> I also tried the 'Packages & Data' menu in R, selecting the source
>> file or the directory from Finder and I got this message:
>>
>>> * installing *source* package ‘LaplacesDemon’ ... ** R ** data **
>>> inst ** byte-compile and prepare package for lazy loading ** help
>>> *** installing help indices ** building package indices **
>>> installing vignettes ** testing if installed package can be loaded
>>> * DONE (LaplacesDemon)
>>
>> but
>>
>>> library(LaplacesDemon)
>> Error in library(LaplacesDemon) : there is no package called
>> ‘LaplacesDemon’
>>
>> Finally I tried
>>
>>> install.packages("devtools") library(devtools)
>>> install_github("ecbrown/LaplacesDemon")
>>
>> but I am not able to install devtools (for similar reasons). So my
>> questions are:
>>
>> -Do anyone knows how to install this packages in my Mac? -Do anyone
>> knows were can I find the information previously content in
>> http://www.bayesian-inference.com/software?
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 04-Feb-2016  Time: 22:03:31
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R project and the TPP

2016-02-04 Thread Ted Harding
Saludos José!
Could you please give a summary of the relevant parts of TPP
that might affect the use of R? I have looked up TPP on Wikipedia
without beginning to understand what it might imply for the use of R.
Best wishes,
Ted.

On 04-Feb-2016 14:43:29 José Bustos wrote:
> Hi everyone,
> 
> I have a question regarding the use R software under the new TPP laws
> adopted by some governments in the region. Who know how this new agreements
> will affect researchers and the R community?
> 
> Hope some of you knows better and can give ideas about it.
> 
> saludos,
> José

---------
E-Mail: (Ted Harding) 
Date: 04-Feb-2016  Time: 22:01:42
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Has R-help changed reply-to policy?

2016-02-04 Thread Ted Harding
Steve, I'm inclined to suspect that something *has* changed at your end
(or along the line between you and R-help). In replying to your message,
selecting "include all recipients" (i.e. reply to all), the result was:
  To: S Ellison 
  Cc: r-help@r-project.org
just as it always has been! So no change that *I* can perceive at the
R-help end.

Hoping this is useful,
Ted.

On 04-Feb-2016 16:33:29 S Ellison wrote:
> Apologies if I've missed a post, but have the default treatment of posts and
> reply-to changed on R-Help of late?
> 
> I ask because as of today, my email client now only lists the OP email when
> replying to an R-help message, even with a reply-all, so the default reply is
> not to the list.
> I also noticed a few months back that I no longer see my own posts, other
> than as a receipt notification, and tinkering with my account defaults
> doesn't change that.
> 
> I don't _think_ things have changed at my end ...
> 
> Steve Ellison

-
E-Mail: (Ted Harding) 
Date: 04-Feb-2016  Time: 17:03:37
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help mailing list activity / R-not-help?

2016-01-25 Thread Ted Harding
My feelings exactly! (And since quite some time ago).
Ted.

On 25-Jan-2016 12:23:16 Fowler, Mark wrote:
> I'm glad to see the issue of negative feedback addressed. I can especially
> relate to the 'cringe' feeling when reading some authoritarian backhand to a
> new user. We do see a number of obviously inappropriate or overly lazy
> postings, but I encounter far more postings where I don't feel competent to
> judge their merit. It might be better to simply disregard a posting one does
> not like for some reason. It might also be worthwhile to actively counter
> negative feedback when we experience that 'cringing' moment. I'm not thinking
> to foster contention, but simply to provide some tangible reassurance to new
> users, and not just the ones invoking the negative feedback, that a
> particular respondent may not represent the perspective of the list.
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael
> Friendly
> Sent: January 24, 2016 5:43 PM
> To: Jean-Luc Dupouey; r-help@r-project.org
> Subject: Re: [R] R-help mailing list activity / R-not-help?
> 
> 
> On 1/23/2016 7:28 AM, Jean-Luc Dupouey wrote:
>> Dear members,
>>
>> Not a technical question:
> But one worth raising...
>>
>> The number of threads in this mailing list, following a long period of 
>> increase, has been regularly and strongly decreasing since 2010, 
>> passing from more than 40K threads to less than 11K threads last year. 
>> The trend is similar for most of the "ancient" mailing lists of the
>> R-project.
> [snip ...]
>>
>> I hope it is the wright place to ask this question. Thanks in advance,
>>
> 
> In addition to the other replies, there is another trend I've seen that has
> actively worked to suppress discussion on R-help and move it elsewhere. The
> general things:
> - R-help was too unwieldy and so it was a good idea to hive-off specialized
> topics to various sub lists, R-SIG-Mac, R-SIG-Geo, etc.
> - Many people posted badly-formed questions to R-help, and so it was a good
> idea to develop and refer to the posting guide to mitigate the number of
> purely junk postings.
> 
> 
> Yet, the trend I've seen is one of increasing **R-not-help**, in that there
> are many posts, often by new R users who get replies that not infrequently
> range from just mildly off-putting to actively hostile:
> 
> - Is this homework? We don't do homework (sometimes false alarms, where the
> OP has to reply to say it is not)
> - Didn't you bother to do your homework, RTFM, or Google?
> - This is off-topic because XXX (e.g., it is not strictly an R programming
> question).
> - You asked about doing XXX, but this is a stupid thing to want to do.
> - Don't ask here; you need to talk to a statistical consultant.
> 
> I find this sad in a public mailing list sent to all R-help subscribers and I
> sometimes cringe when I read replies to people who were actually trying to
> get help with some R-related problem, but expressed it badly, didn't know
> exactly what to ask for, or how to format it, or somehow motivated a
> frequent-replier to publicly dis the OP.
> 
> On the other hand, I still see a spirit of great generosity among some people
> who frequently reply to R-help, taking a possibly badly posed or
> ill-formatted question, and going to some lengths to provide a a helpful
> answer of some sort.  I applaud those who take the time and effort to do
> this.
> 
> I use R in a number of my courses, and used to advise students to post to
> R-help for general programming questions (not just homework) they couldn't
> solve. I don't do this any more, because several of them reported a negative
> experience.
> 
> In contrast, in the Stackexchange model, there are numerous sublists
> cross-classified by their tags.  If I have a specific knitr, ggplot2, LaTeX,
> or statistical modeling question, I'm now more likely to post it there, and
> the worst that can happen is that no one "upvotes" it or someone (helpfully)
> marks it as a duplicate of a similar question.
> But comments there are not propagated to all subscribers, and those who reply
> helpfully, can see their solutions accepted or not, or commented on in that
> specific topic.
> 
> Perhaps one solution would be to create a new "R-not-help" list where, as in
> a Monty Python skit, people could be directed there to be insulted and all
> these unhelpful replies could be sent.
> 
> A milder alternative is to encourage some R-help subscribers to click the
> "Don't send" or "Save" button and think bett

Re: [R] If else

2015-10-31 Thread Ted Harding
>>> >numeric (in this case, sex) will be a response variable.  Some records
>>> >have a missing observation on sex, which is blank.
>>> > id  sex
>>> >  1
>>> >  2
>>> >  3  M
>>> >  4  F
>>> >  5  M
>>> >  6  F
>>> >  7  F
>>> >
>>> >I am reading the data like this
>>> >
>>> >mydata <- read.csv(header=TRUE, text=', sep=", ")
>>> > id  sex
>>> >  1   NA
>>> >  2  NA
>>> >  3  M
>>> >  4  F
>>> >  5  M
>>> >  6  F
>>> >  7  F
>>> >
>>> >The  data set is huge   (>250,000)
>>> >
>>> >
>>> >I want the output like this
>>> >
>>> > id  sex  sex1
>>> >  1  NA   0
>>> >  2  NA   0
>>> >  3  M    1
>>> >  4  F    2
>>> >  5  M    1
>>> >  6  F    2
>>> >  7  F    2
>>> >
>>> >Thank you in advance
>>> >
>>> >
>>> >On Sat, Oct 31, 2015 at 5:59 AM, John Kane 
>>wrote:
>>> >
>>> >> In line.
>>> >>
>>> >> John Kane
>>> >> Kingston ON Canada
>>> >>
>>> >>
>>> >> > -Original Message-
>>> >> > From: valkr...@gmail.com
>>> >> > Sent: Fri, 30 Oct 2015 20:40:03 -0500
>>> >> > To: istaz...@gmail.com
>>> >> > Subject: Re: [R] If else
>>> >> >
>>> >> > I am trying to change the mydata$sex  from character to numeric
>>> >>
>>> >> Why?
>>> >>  As Ista (mydata$confusingWillCauseProblemsLater) has pointed out
>>> >this is
>>> >> a very unusual thing to do in R.
>>> >>
>>> >> Is there a very specific reason for doing this in your analysis?
>>> >> Otherwise it may be better to leave the coding as NA. Some of the
>>> >> data mungers here may be able to suggest which is the best strategy in R.
>>> >>
>>> >> R is 'weird' compared to more mundane stats packages such as SAS
>>or
>>> >SPSS
>>> >> and common techniques that one would use with them often are not
>>> >> appropriate in R.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> > I want the output like
>>> >> >  id  sex  sex1
>>> >> >   1  NA   0
>>> >> >   2  NA   0
>>> >> >   3  M    1
>>> >> >   4  F    2
>>> >> >   5  M    1
>>> >> >   6  F    2
>>> >> >   7  F    2
>>> >> >
>>> >> > mydata$sex1 <- 0
>>> >> > if(mydata$sex =="M " ){
>>> >> >   mydata$sex1<-1
>>> >> > } else {
>>> >> >   mydata$sex1<-2
>>> >> > }
>>> >> >
>>> >> > mydata$sex1
>>> >> >
>>> >> > Warning message:
>>> >> > In if (mydata$sex == "M ") { :
>>> >> >   the condition has length > 1 and only the first element will be used
>>> >> > > mydata$sex1
>>> >> > [1] 2 2 2 2 2 2 2 2
>>> >> >
>>> >> >>
>>> >> >
>>> >> >
>>> >> > On Fri, Oct 30, 2015 at 8:28 PM, Ista Zahn 
>>> >wrote:
>>> >> >
>>> >> >> Using numeric for missing sounds like asking for trouble. But if
>>> >> >> you must, something like
>>> >> >>
>>> >> >> mydata$confusingWillCauseProblemsLater <-
>>> >> >>   ifelse(
>>> >> >> is.na(mydata$sex),
>>> >> >> 0,
>>> >> >> as.numeric(factor(mydata$sex,
>>> >> >>   levels = c("M", "F"))))
>>> >> >>
>>> >> >> should do it.
>>> >> >>
>>> >> >> Best,
>>> >> >> Ista
>>> >> >>
>>> >> >> 

Re: [R] Subscription request

2015-10-14 Thread Ted Harding
On 14-Oct-2015 15:19:06 Manish Sindagi wrote:
> Hi,
> 
> I have a few R programming related questions that i wanted to ask.
> Can you please accept my subscription request.
> 
> Regards,
> Manish.

Visit the R-help info web page:

  https://stat.ethz.ch/mailman/listinfo/r-help

Towards the bottom of this page is a section "Subscribing to R-help".
Follow the instructions in this section, and it should work!

Best wishes,
Ted.

---------
E-Mail: (Ted Harding) 
Date: 14-Oct-2015  Time: 19:34:55
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beta distribution approximate to Normal distribution

2015-09-15 Thread Ted Harding
Using non-central chi-squared (especially with df=1) is unlikely
to generate random numbers anywhere near a Normal distribution
(see below).

And "rchisq(100, df=1, ncp=u/a)" won't work anyway with u<0,
since ncp must be >= 0 (if < 0 then all are NA).

Better to shoot straight for the target (truncated Normal), though
several shots are likely to be required! For example (code which
spells it out), taking u=3 and a=2:

  n <- 100
  u <- 3 ; a <- 2
  x <- NULL
  N <- length(x)
  while(N < n){
    x <- c(x,rnorm(n,mean=u,sd=a))  # draw a further batch from N(u,a^2)
    x <- x[x>0]                     # keep only the positive values
    N <- length(x)
  }
  x <- x[1:n]                       # trim to exactly n values

Comparison with non-central chi-squared:

  y <- rchisq(100, df=1, ncp=u/a)
  hist(x)
  hist(y)
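
For completeness: the rejection loop can be avoided altogether by
inverting the Normal CDF over the untruncated upper tail (a sketch,
using the same u, a, n as above):

  p0 <- pnorm(0, mean=u, sd=a)               # probability mass below 0
  z  <- qnorm(p0 + (1-p0)*runif(n), mean=u, sd=a)
  min(z) > 0                                 # TRUE: all values positive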



On 15-Sep-2015 13:26:44 jlu...@ria.buffalo.edu wrote:
> Your question makes no sense as stated.  However, guessing at what you 
> want, you should  perhaps consider the non-central chi-square density with 
> 1 df and ncp = u/a, i.e,
> 
> rchisq(100, df=1, ncp=u/a)
> 
> Joe
> Joseph F. Lucke, PhD
> Senior Statistician
> Research Institute on Addictions
> University at Buffalo
> State University of New York
> 1021 Main Street
> Buffalo, NY  14203-1016
> 
> Chien-Pang Chin  
> Sent by: "R-help" 
> 09/15/2015 06:58 AM
> 
> To
> "r-help@r-project.org" , 
> 
> Subject
> [R] Beta distribution approximate to Normal distribution
> 
> Hi,
> I need to generate 1000 numbers from N(u, a^2), however I don't
> want to include 0 and negative values. How can I use beta distribution
> approximate to N(u, a^2) in R.
> 
> Thx for help

-
E-Mail: (Ted Harding) 
Date: 15-Sep-2015  Time: 16:12:35
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variance is different in R vs. Excel?

2015-02-09 Thread Ted Harding
[See at end]

On 09-Feb-2015 21:45:11 David L Carlson wrote:
> Time for a new version of Excel? I cannot duplicate your results in Excel
> 2013.
> 
> R:
>> apply(dat, 2, var)
> [1] 21290.80 24748.75
> 
> Excel 2013:
> =VAR.S(A2:A21)   =VAR.S(B2:B21)
> 21290.8  24748.74737
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Fetter
> Sent: Monday, February 9, 2015 3:33 PM
> To: r-help@r-project.org
> Subject: [R] Variance is different in R vs. Excel?
> 
> Hello everyone, I have a simple question. when I use the var() function in
> R to find a variance, it differs greatly from the variance found in excel
> using the =VAR.S function. Any explanations on what those two functions are
> actually doing?
> 
> Here is the data and the results:
> 
> dat<-matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,597,872
> ,476,1026,423,476,419,591,376,640,550,601,588,499,646,693,351,730,632,707,779,
> 838,814,771,533,818),
> nrow=20, ncol=2, byrow=T)
> 
> var(dat[,1])
>#21290.8
> 
> var(dat[,2])
>#24748.75
> 
>#in Excel, the variance of dat[,1] = 44763.91; for dat[,2] = 52034.2
> 
> Thanks,
> Karl

I suspect that something has happened to the reading-in of the
data into Excel. (I don't know much about Excel, and that's because
I don't want to ... ).

The ratio of the variances of the two datasets in R is:

  var(dat[,2])/var(dat[,1])
  # [1] 1.162415

while the ratio of the results from Excel is:

  52034.2/44763.91
  # [1] 1.162414

so they are almost identical. 

So it is as if Excel was evaluating the variances for data which
are

  sqrt(44763.91/var(dat[,1]))
  # [1] 1.45
  sqrt(52034.2/var(dat[,2]))
  # [1] 1.449999

times the data used by R. So maybe there's a "nasty" lurking somewhere
in the spreadsheet? (Excel is notorious for planting things invisibly
in its spreadsheets which lead to messed-up results for no apparent
reason ... ).
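
A quick check of this hypothesis (the factor 1.45 is of course only
approximate):

  var(1.45*dat[,1])
  # [1] 44763.91
  var(1.45*dat[,2])
  # [1] 52034.24

which reproduces Excel's figures almost exactly.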

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 09-Feb-2015  Time: 22:15:44
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the less-than-minus gotcha

2015-02-02 Thread Ted Harding
On 02-Feb-2015 11:58:10 Rolf Turner wrote:
> 
> On 02/02/15 14:26, Steve Taylor wrote:
> 
>> All the more reason to use = instead of <-
> 
> Couldn't agree less, Steve. The "<-" should be used for assignment. The 
> "=" sign should be reserved for handling function arguments in the 
> "name=value" form.  Doing anything else invites confusion and 
> occasionally chaos.
> 
> Lots of examples have been given in the past involving syntax of the 
> form foo(x = y) and foo(x <- y).
> 
> cheers,
> Rolf
> -- 
> Rolf Turner

As well as agreeing with Rolf, it should also be said that Steve's
advice "All the more reason to use = instead of <-" is not applicable
in this context, which was:

  which( frame$var>4 )   # no problem
  which( frame$var<-4 )  # huge problem: frame$var is destroyed

There is no place for an "=" here!

What does not seem to have been mentioned so far is that this
kind of thing can be safely wrapped in parentheses:

  which( frame$var>4 )    # no problem
  which( frame$var<(-4) ) # [again no problem]

Personally, I'm prone to using parentheses if I have any feeling
that a "gotcha" may be lurking -- not only in the distinction
between "var<-4" and "var< -4", but also to be confident that,
for instance, my intentions are not being over-ridden by operator
precedence rules.

Some people object to code "clutter" from parentheses that could
be more simply replaced (e.g. "var< -4" instead of "var<(-4)"),
but parentheses ensure that it's right and also make it clear
when one reads it.
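
A small demonstration (with a hypothetical data frame):

  frame <- data.frame(var=c(-6, 2, 5))
  which( frame$var < -4 )   # the space keeps "<" and "-" apart: gives 1
  which( frame$var<(-4) )   # the parentheses do the same job: gives 1
  ## whereas frame$var<-4 would silently overwrite frame$var with 4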

Best wishes to all,
Ted.

-
E-Mail: (Ted Harding) 
Date: 02-Feb-2015  Time: 12:43:51
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate the median age interval

2015-01-12 Thread Ted Harding
Sorry, a typo in my reply below. See at "###".

On 12-Jan-2015 11:12:43 Ted Harding wrote:
> On 12-Jan-2015 10:32:41 Erik B Svensson wrote:
>> Hello
>> I've got a problem I don't know how to solve. I have got a dataset that
>> contains age intervals (age groups) of people and the number of persons in
>> each age group each year (y1994-y1996). The number of persons varies each
>> year. I only have access to the age intervals, not the age of each person,
>> which would make things easier.
>> 
>> I want to know the median age interval (not the median number) for each
>> year. Let's say that in y1994 23 corresponds to the median age interval
>> "45-54", I want to "45-54" as a result. How is that done?
>> 
>> This is the sample dataset:
>> 
> agegrp <-
> c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74",
>   "75-84","84-")
>   y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
>   y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
>   y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)
> 
>> I look forward to your response
>> 
>> Best regards,
>> Erik Svensson
> 
> In principle, this is straightforward. But in practice you may
> need to be careful about how to deal with borderline cases -- and
> about what you mean by "median age interval".
> The underlying idea is based on:
> 
>  cumsum(y1994)/sum(y1994)
>  # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336
>  # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000
> 
> Thus age intervals 1-7 ("<1" - "45-54") contain less than 50%
> (0.49344978...), though "45-54" almost gets there. However,
> age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
> the median age is within "49-64".
### Should be:
  age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
  the median age is within "55-64".

> Implementing the above as a procedure:
> 
>   agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)]
>   # [1] "55-64"
> 
> Note that the "obvious solution":
> 
>   agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))]
>   # [1] "45-54"
> 
> gives an incorrect answer, since with these data it returns a group
> whose maximum age is below the median. This is because the "<=" is
> satisfied by "<" also.
> 
> Hoping this helps!
> Ted.
> 
> -
> E-Mail: (Ted Harding) 
> Date: 12-Jan-2015  Time: 11:12:39
> This message was sent by XFMail
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 12-Jan-2015  Time: 11:21:11
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate the median age interval

2015-01-12 Thread Ted Harding
On 12-Jan-2015 10:32:41 Erik B Svensson wrote:
> Hello
> I've got a problem I don't know how to solve. I have got a dataset that
> contains age intervals (age groups) of people and the number of persons in
> each age group each year (y1994-y1996). The number of persons varies each
> year. I only have access to the age intervals, not the age of each person,
> which would make things easier.
> 
> I want to know the median age interval (not the median number) for each
> year. Let's say that in y1994 23 corresponds to the median age interval
> "45-54", I want to "45-54" as a result. How is that done?
> 
> This is the sample dataset:
> 
agegrp <-
c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74",
  "75-84","84-")
  y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
  y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
  y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)

> I look forward to your response
> 
> Best regards,
> Erik Svensson

In principle, this is straightforward. But in practice you may
need to be careful about how to deal with borderline cases -- and
about what you mean by "median age interval".
The underlying idea is based on:

 cumsum(y1994)/sum(y1994)
 # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336
 # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000

Thus age intervals 1-7 ("<1" - "45-54") contain less than 50%
(0.49344978...), though "45-54" almost gets there. However,
age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
the median age is within "49-64".

Implementing the above as a procedure:

  agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)]
  # [1] "55-64"

Note that the "obvious solution":

  agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))]
  # [1] "45-54"

gives an incorrect answer, since with these data it returns a group
whose maximum age is below the median. This is because the "<=" is
satisfied by "<" also.
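
The same rule can be packaged as a small helper (a sketch) and applied
to each year -- take the first group at which the cumulative proportion
reaches 0.5:

  median.group <- function(counts, groups){
    groups[ which( cumsum(counts)/sum(counts) >= 0.5 )[1] ]
  }
  median.group(y1994, agegrp)  # [1] "55-64"
  median.group(y1995, agegrp)  # [1] "45-54"
  median.group(y1996, agegrp)  # [1] "45-54"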

Hoping this helps!
Ted.

-
E-Mail: (Ted Harding) 
Date: 12-Jan-2015  Time: 11:12:39
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] diff question

2015-01-11 Thread Ted Harding
I should have added an extra line to the code below, to complete
the picture. Here it is (see below the line marked "##").
Ted.

On 11-Jan-2015 08:48:06 Ted Harding wrote:
> Troels, this is due to the usual tiny difference between numbers
> as computed by R and the numbers that you think they are!

  tt  <- seq(0,20,by=0.02)
  dtt <- diff(tt)
  length(dtt)
  # [1] 1000
  r02 <- rep(0.02,1000)
  unique(r02 - dtt)
  # [1]  0.000000e+00  3.469447e-18 -3.469447e-18  1.040834e-17
  # [5] -1.734723e-17  3.816392e-17  9.367507e-17  2.046974e-16
  # [9]  4.267420e-16 -4.614364e-16 -1.349615e-15 -3.125972e-15
  ##
  sum(dtt != 0.02)
  # [1] 998

So only 2 values among the 1000 in diff(tt) are exactly equal to
[R's representation of] 0.02!
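
The standard advice (see R FAQ 7.31) is therefore to compare with a
tolerance rather than test for exact equality, e.g.:

  all(abs(dtt - 0.02) < 1e-9)
  # [1] TRUE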

> Hoping this helps!
> Ted.
> 
> On 11-Jan-2015 08:29:26 Troels Ring wrote:
>> R version 3.1.1 (2014-07-10) -- "Sock it to Me"
>> Copyright (C) 2014 The R Foundation for Statistical Computing
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> 
>> Dear friends - I have a small problem with diff (I guess)
>> I made a sequence with fixed interval between consecutive elements - and 
>> hence thought the  diff would be as specified
>> but had a vector with apparently identical 12 elements returned from diff
>> tt <- seq(0,20,by=0.02)
>> unique(diff(tt)) #[1] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 
>> 0.02 0.02
>> Trying to see if these elements in diff were duplicated
>> duplicated(diff(tt))
>> #[1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE ... and from
>> sum(duplicated(diff(tt)))
>> [1] 988
>> saw that 12 of the elements in duplicated(diff(tt)) were FALSE. Would it 
>> be expected that the first was FALSE and the rest TRUE?
>> duplicated() determines which elements of a vector or data frame are 
>> duplicates of elements with smaller subscripts, and returns a logical 
>> vector indicating which elements (rows) are duplicates.
>> 
>> All best wishes
>> Troels
>> Aalborg, Denmark
>> 
>>   [[alternative HTML version deleted]]
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> -
> E-Mail: (Ted Harding) 
> Date: 11-Jan-2015  Time: 08:48:03
> This message was sent by XFMail
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 11-Jan-2015  Time: 11:41:27
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] diff question

2015-01-11 Thread Ted Harding
Troels, this is due to the usual tiny difference between numbers
as computed by R and the numbers that you think they are!

  tt  <- seq(0,20,by=0.02)
  dtt <- diff(tt)
  length(dtt)
  # [1] 1000
  r02 <- rep(0.02,1000)
  unique(r02 - dtt)
  # [1]  0.000000e+00  3.469447e-18 -3.469447e-18  1.040834e-17
  # [5] -1.734723e-17  3.816392e-17  9.367507e-17  2.046974e-16 
  # [9]  4.267420e-16 -4.614364e-16 -1.349615e-15 -3.125972e-15

Hoping this helps!
Ted.

On 11-Jan-2015 08:29:26 Troels Ring wrote:
> R version 3.1.1 (2014-07-10) -- "Sock it to Me"
> Copyright (C) 2014 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> Dear friends - I have a small problem with diff (I guess)
> I made a sequence with fixed interval between consecutive elements - and 
> hence thought the  diff would be as specified
> but had a vector with apparently identical 12 elements returned from diff
> tt <- seq(0,20,by=0.02)
> unique(diff(tt)) #[1] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 
> 0.02 0.02
> Trying to see if these elements in diff were duplicated
> duplicated(diff(tt))
> #[1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE ... and from
> sum(duplicated(diff(tt)))
> [1] 988
> saw that 12 of the elements in duplicated(diff(tt)) were FALSE. Would it 
> be expected that the first was FALSE and the rest TRUE?
> duplicated() determines which elements of a vector or data frame are 
> duplicates of elements with smaller subscripts, and returns a logical 
> vector indicating which elements (rows) are duplicates.
> 
> All best wishes
> Troels
> Aalborg, Denmark
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 11-Jan-2015  Time: 08:48:03
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rounding down with as.integer

2015-01-01 Thread Ted Harding
;> to the nearest integer, why not use round() (without the as.integer
>>>> afterwards)?  Or if you really do want an integer, why add 0.1 or 0.0001,
>>>> why not add 0.5 before calling as.integer()?  This is the classical way to
>>>> implement round().
>>>>
>>>> To state the problem clearly, I'd like to know what result is expected for
>>>> any real number x.  Since R's numeric type only approximates the real
>>>> numbers we might not be able to get a perfect match, but at least we could
>>>> quantify how close we get.  Or is the input really character data?  The
>>>> original post mentioned reading numbers from a text file.
>>>
>>>
>>>
>>> Maybe you'd like to know what I'm really doing.  I have 1600 text files
>>> each
>>> with up to 16,000 lines with 3100 numbers per line, delimited by a single
>>> space.  The numbers are between 0 and 2, inclusive, and they have up to
>>> three digits to the right of the decimal.  Every possible value in that
>>> range will occur in the data.  Some examples numbers: 0 1 2 0.325 1.12 1.9.
>>> I want to multiply by 1000 and store them as 16-bit integers (uint16).
>>>
>>> I've been reading in the data like so:
>>>
>>>> data <- scan( file=FILE, what=double(), nmax=3100*16000)
>>>
>>>
>>> At first I tried making the integers like so:
>>>
>>>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>>>
>>>user  system elapsed
>>>   0.187   0.387   0.574
>>>
>>> I decided I should compare with the result I got using round():
>>>
>>>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ;
>>>> proc.time()-ptm
>>>
>>>user  system elapsed
>>>   1.595   0.757   2.352
>>>
>>> It is a curious fact that only a few of the values from 0 to 2000 disagree
>>> between the two methods:
>>>
>>>> table( ints2[ ints2 != ints ] )
>>>
>>>
>>>  1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
>>> 35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112
>>>
>>> I understand that it's all about the problem of representing digital
>>> numbers
>>> in binary, but I still find some of the results a little surprising, like
>>> that list of numbers from the table() output.  For another example:
>>>
>>>> 1000+3 - 1000*(1+3/1000)
>>>
>>> [1] 1.136868e-13
>>>
>>>> 3 - 1000*(0+3/1000)
>>>
>>> [1] 0
>>>
>>>> 2000+3 - 1000*(2+3/1000)
>>>
>>> [1] 0
>>>
>>> See what I mean?  So there is something special about the numbers around
>>> 1000.
>>>
>>> Back to the quesion at hand:  I can avoid use of round() and speed things
>>> up
>>> a little bit by just adding a small number after multiplying by 1000:
>>>
>>>> ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ;
>>>> proc.time()-ptm
>>>
>>>user  system elapsed
>>>   0.224   0.594   0.818
>>>
>>> You point out that adding .5 makes sense.  That is probably a better idea
>>> and I should take that approach under most conditions, but in this case we
>>> can add anything between 2e-13 and about 0.999 and always get the
>>> same answer.  We also have to remember that if a number might be negative
>>> (not a problem for me in this application), we need to subtract 0.5 instead
>>> of adding it.
>>>
>>> Anyway, right now this is what I'm actually doing:
>>>
>>>> con <- file( paste0(FILE, ".uint16"), "wb" )
>>>> ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE,
>>>> what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm
>>>
>>> Read 48013406 items
>>>user  system elapsed
>>>  10.263   0.733  10.991
>>>>
>>>> close(con)
>>>
>>>
>>> By the way, writeBin() is something that I learned about here, from you,
>>> Duncan.  Thanks for that, too.
>>>
>>> Mike
>>>
>>> --
>>> Michael B. Miller, Ph.D.
>>> University of Minnesota
>>> http://scholar.google.com/citations?user=EV_phq4J
>>>
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 01-Jan-2015  Time: 21:28:22
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Yes, your basic reasoning is correct. In general, the observed variables
carry information about the variables with missing values, so (in some
way) the missing values can be replaced with estimates ("imputations")
and the standard regression method will then work as though the
replacements were there is the first place. To incorporate the inevitable
uncertainty about what the missing values really were, one approach
("multiple imputation") is to do the replacement many times over,
sampling the replacement values from a posterior distribution estimated
from the non-missing data. There are other approaches.

This is where the "many questions" kick in! I don't have time at the
moment, to go into further detail (there's a lot of it, and several
R packages which deal with missing data in different ways), but I hope
that someone can meanwhile point you in the right direction.
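
As a pointer, here is a minimal sketch of the multiple-imputation route
using the 'mice' package (one option among several; the variable names
follow your data, and the model is simplified to a plain coxph):

  library(mice)
  library(survival)
  imp  <- mice(death.dat, m=5)    # 5 imputed copies of the data
  fits <- with(imp, coxph(Surv(Survival, Event) ~ V2 + factor(V3)))
  summary(pool(fits))             # estimates pooled across imputations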

With best wishes,
Ted.

On 19-Dec-2014 11:17:27 aoife doherty wrote:
> Many thanks, I appreciate the response.
> 
> When I convert the missing values to NA and run the cox model as described
> in previous post,  the cox model seems to remove all of the rows with a
> missing value (as the number of rows "n" in the cox output after I
> completely remove any row with missing data is the same as the number of
> rows "n" in the cox output after I change the missing values to NA).
> 
> What I had been hoping to do is not completely remove a row with missing
> data for a co-variable, but rather somehow censor or estimate a value for
> the missing value?
> 
> In reality, I have ~600 people with survival data and say 6 variables
> attached to them. After I incorporate a 7th variable (for which the
> information isn't available for every individual), I have 400 people left.
> Since I still have survival data and almost all of the information for the
> other 200 people (the only thing missing is information about that 7th
> variable), it seems a waste to remove all of the survival data for 200
> people over one co-variate. So I was hoping instead of completely removing
> the rows, to just somehow acknowledge that the data for this particular
> co-variate is missing in the model but not completely remove the row? This
> is more what I was hoping someone would know if it's possible to
> incorporate into the model I described above?
> 
> Thanks
> 
> 
> 
> On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding 
> wrote:
>>
>> Hi Aoife,
>> I think that if you simply replace each "*" in the data file
>> with "NA", then it should work ("NA" is usually interpreted
>> as "missing" for those functions for which missingness is
>> relevant). How you subsequently deal with records which have
>> missing values is another question (or many questions ... ).
>>
>> So your data should look like:
>>
>> V1   V2  V3       Survival  Event
>> ann  13  WTHomo   4         1
>> ben  20  NA       5         1
>> tom  40  Variant  6         1
>>
>> Hoping this helps,
>> Ted.
>>
>> On 19-Dec-2014 10:12:00 aoife doherty wrote:
>> > Hi all,
>> >
>> > I have a data set like this:
>> >
>> > Test.cox file:
>> >
>> > V1   V2  V3       Survival  Event
>> > ann  13  WTHomo   4         1
>> > ben  20  *        5         1
>> > tom  40  Variant  6         1
>> >
>> >
>> > where "*" indicates that I don't know what the value is for V3 for Ben.
>> >
>> > I've set up a Cox model to run like this:
>> >
>> >#!/usr/bin/Rscript
>> > library(bdsmatrix)
>> > library(kinship2)
>> > library(survival)
>> > library(coxme)
>> > death.dat <- read.table("Test.cox",header=T)
>> > deathdat.kmat <-2*with(death.dat,makekinship(famid,ID,faid,moid))
>> > sink("Test.cox.R.Output")
>> > Model <- coxme(Surv(Survival,Event)~ strata(factor(V1)) +
>> > strata(factor(V2)) + factor(V3) +
>> > (1|ID),data=death.dat,varlist=deathdat.kmat)
>> > Model
>> > sink()
>> >
>> >
>> >
>> > As you can see from the Test.cox file, I have a missing value "*". How
>> and
>> > where do I tell the R script "treat * as a missing variable". If I can't
>> > incorporate missing values into the model, I assume the alternativ

Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Hi Aoife,
I think that if you simply replace each "*" in the data file
with "NA", then it should work ("NA" is usually interpreted
as "missing" for those functions for which missingness is
relevant). How you subsequently deal with records which have
missing values is another question (or many questions ... ).

So your data should look like:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  NA       5         1
tom  40  Variant  6         1
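
Alternatively, read.table() can make the substitution for you at read
time, via its na.strings argument:

  death.dat <- read.table("Test.cox", header=TRUE, na.strings="*")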

Hoping this helps,
Ted.

On 19-Dec-2014 10:12:00 aoife doherty wrote:
> Hi all,
> 
> I have a data set like this:
> 
> Test.cox file:
> 
> V1   V2  V3       Survival  Event
> ann  13  WTHomo   4         1
> ben  20  *        5         1
> tom  40  Variant  6         1
> 
> 
> where "*" indicates that I don't know what the value is for V3 for Ben.
> 
> I've set up a Cox model to run like this:
> 
>#!/usr/bin/Rscript
> library(bdsmatrix)
> library(kinship2)
> library(survival)
> library(coxme)
> death.dat <- read.table("Test.cox",header=T)
> deathdat.kmat <-2*with(death.dat,makekinship(famid,ID,faid,moid))
> sink("Test.cox.R.Output")
> Model <- coxme(Surv(Survival,Event)~ strata(factor(V1)) +
> strata(factor(V2)) + factor(V3) +
> (1|ID),data=death.dat,varlist=deathdat.kmat)
> Model
> sink()
> 
> 
> 
> As you can see from the Test.cox file, I have a missing value "*". How and
> where do I tell the R script "treat * as a missing variable". If I can't
> incorporate missing values into the model, I assume the alternative is to
> remove all of the rows with missing data, which will greatly reduce my data
> set, as most rows have at least one missing variable.
> 
> Thanks
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 19-Dec-2014  Time: 10:21:23
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Printing/Generating/Outputting a Table (Not Latex)

2014-12-09 Thread Ted Harding
;>>>>
>>>>>>Sys.which('texi2dvi')
>>>>>>
>>>>>>   texi2dvi
>>>>>>
>>>>>>"/usr/bin/texi2dvi"
>>>>>>
>>>>>>> file.exists(Sys.which('texi2dvi'))
>>>>>>
>>>>>>[1] TRUE
>>>>>>
>>>>>>> file.exists(Sys.which('pdflatex'))
>>>>>>
>>>>>>[1] TRUE
>>>>>>
>>>>>>Is there a specific path I should be giving with pdflatex and/or
>>>>>>'texi2dvi to make this work?
>>>>>>
>>>>>>Thanks!
>>>>>>
>>>>>>On Mon, Dec 8, 2014 at 11:13 PM, Richard M. Heiberger 
>>>>>>wrote:
>>>>>>> yes of course, and the answer is latex() in the Hmisc package.
>>>>>>> Why were you excluding it?
>>>>>>> Details follow
>>>>>>>
>>>>>>> Rich
>>>>>>>
>>>>>>>
>>>>>>> The current release of the Hmisc package has this capability on
>>>>>>> Macintosh and Linux.
>>>>>>> For Windows, you need the next release 3.14-7 which is available now
>>>>>>at github.
>>>>>>>
>>>>>>> ## windows needs these lines until the new Hmisc version is on CRAN
>>>>>>> install.packages("devtools")
>>>>>>> devtools::install_github("Hmisc", "harrelfe")
>>>>>>>
>>>>>>> ## All operating systems
>>>>>>> options(latexcmd='pdflatex')
>>>>>>> options(dviExtension='pdf')
>>>>>>>
>>>>>>> ## Macintosh
>>>>>>> options(xdvicmd='open')
>>>>>>>
>>>>>>> ## Windows, one of the following
>>>>>>>
>>>>>>options(xdvicmd='c:\\progra~1\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe')
>>>>>>> ## 32-bit windows
>>>>>>>
>>>>>>options(xdvicmd='c:\\progra~2\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe')
>>>>>>> ## 64 bit windows
>>>>>>>
>>>>>>> ## Linux
>>>>>>> ## I don't know the xdvicmd value
>>>>>>>
>>>>>>>
>>>>>>> ## this works on all R systems
>>>>>>> library(Hmisc)
>>>>>>> tmp <- matrix(1:9,3,3)
>>>>>>> tmp.dvi <- dvi(latex(tmp))
>>>>>>> print.default(tmp.dvi) ## prints filepath of the pdf file
>>>>>>> tmp.dvi  ## displays the pdf file on your screen
>>>>>>>
>>>>>>> On Mon, Dec 8, 2014 at 9:31 PM, Kate Ignatius
>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a simple question.  I know there are plenty of packages out
>>>>>>>> there that can provide code to generate a table in latex.  But I was
>>>>>>>> wondering whether there was one out there where I can generate a
>>>>>>table
>>>>>>>> from my data (which ever way I please) then allow me to save it as a
>>>>>>>> pdf?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> K.
>>>>>>>>
>>>>>>>> __
>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide
>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>>__
>>>>>>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>PLEASE do read the posting guide
>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>>
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 09-Dec-2014  Time: 22:07:23
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inverse Student t-value

2014-09-30 Thread Ted Harding
And, with 1221 degrees of freedom, one cannot be far off a Normal
distribution. So:

  abs(qnorm(4.08831e-05/2))
  [1] 4.102431

which is nearer to Duncan's answer (4.117, and a bit smaller, as it
should be) than to yours (4.0891672, which, if accurate, should
be greater than the Normal value 4.102431).

Ted.

On 30-Sep-2014 18:20:39 Duncan Murdoch wrote:
> On 30/09/2014 2:11 PM, Andre wrote:
>> Hi Duncan,
>>
>> No, that's correct. Actually, I have data set below;
> 
> Then it seems Excel is worse than I would have expected.  I confirmed 
> R's value in two other pieces of software,
> OpenOffice and some software I wrote a long time ago based on an 
> algorithm published in 1977 in Applied Statistics.  (They are probably 
> all using the same algorithm.  I wonder what Excel is doing?)
> 
>> N= 1223
>> alpha= 0.05
>>
>> Then
>> probability = 0.05/1223 = 4.08831e-05
>> degree of freedom= 1223-2= 1221
>>
>> So, TINV(4.08831E-05,1221) returns 4.0891672
>>
>>
>> Could you show me a manual calculation in more detail? I would really
>> appreciate it if you could give more detail.
> 
> I already gave you the expression:  abs(qt(4.08831e-05/2, df=1221)). 
> For more detail, I suppose you could look at the help page for the qt 
> function, using help("qt").
> 
> Duncan Murdoch
> 
>>
>> Cheers!
>>
>>
>> On Wed, Oct 1, 2014 at 1:01 AM, Duncan Murdoch 
>> mailto:murdoch.dun...@gmail.com>> wrote:
>>
>> On 30/09/2014 1:31 PM, Andre wrote:
>>
>> Dear Sir/Madam,
>>
>> I am trying to use calculation for two-tailed inverse of the
>> student`s
>> t-distribution function presented by Excel functions like
>> =TINV(probability, deg_freedom).
>>
>> For instance: The Excel function =TINV(4.08831E-05,1221) returns
>>   4.0891672.
>>
>> Would you like to show me a manual calculation for this?
>>
>> Appreciate your helps in advance.
>>
>>
>> That number looks pretty far off the true value.  Have you got a
>> typo in your example?
>>
>> You can compute the answer to your question as
>> abs(qt(4.08831e-05/2, df=1221)), but you'll get 4.117.
>>
>> Duncan Murdoch
>>
>>
>>
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 30-Sep-2014  Time: 19:35:45
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data.table/ifelse conditional new variable question

2014-08-17 Thread Ted Harding
On 17-Aug-2014 03:50:33 John McKown wrote:
> On Sat, Aug 16, 2014 at 9:02 PM, Kate Ignatius 
> wrote:
> 
>> Actually - your code is not wrong... because this is a large file I
>> went through the file to see if there was anything wrong with it -
>> looks like there are two fathers or three mothers in some families.
>> Taking these duplicates out fixed the problem.
>>
>> Sorry about the confusion!  And thanks so much for your help!
>>
>>
> Kate,
> I hope you don't mind, but I have a curiosity question on my part.
> Were the families with multiple fathers or mothers a mistake, just
> duplicates (same Family.ID & Sample.ID), or more like an "intermixed"
> family due to divorce and remarriage. Or even, like in some countries,
> a case of polygamy? Sorry, I just get curious about the strangest
> things sometimes.
> -- 
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
> 
> Maranatha! <><
> John McKown

When Kate first posted her query, similar thoughts to John's occurred
to me. The potential for convoluted ancestry and kinship is enormous!

For perhaps (or perhaps not) ultimate convolution, try reconstructing
a canine pedigree from a breeding register of thoroughbreds, where
again the primary data for each individual is
  * ID of individual
  * ID of litter the individual was born in ("family")
  * ID of male parent
  * ID of female parent
(as, for instance, registered with the UK Kennel Club).

Similar convolutions can be found with race-horses.

But even humans can compete. Here is a little challenge for anyone
who has an R program that will work out a pedigree from data such as
described above. I have used Kate's notation. Individuals are numbered
from 1 up (with a gap): Sample.ID; Families from 101 up: Family.ID.
Relationships are "sibling", "father", "mother".

ID for father/mother may be "NA" (data not given).

Family.ID Sample.ID Relationship
101   01sibling
101   02father
101   03mother

102   02sibling
102   04father
102   05mother

103   03sibling
103   06father
103   07mother

104   04sibling
104   08father
104   09mother

104   05sibling
104   08father
104   09mother

104   06sibling
104   08father
104   09mother

104   15sibling
104   08father
104   09mother

105   07sibling
105   04father
105   15mother

106   08sibling
106   16father
106   17mother

106   18sibling
106   16father
106   17mother

106   19sibling
106   16father
106   17mother

107   09sibling
107   18father
107   19mother

108   16sibling
108   NAfather
108   NAmother

109   17sibling
109   NAfather
109   NAmother

That's the data. Now a little quiz question: Can you guess the
identity of the person with sample.ID = 01 ?
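
For anyone wanting to attack this in R, a minimal sketch (assuming
the triples above are in a data frame 'ped' with columns Family.ID,
Sample.ID and Relationship; 'ped' and 'trios' are illustrative names)
that reshapes them into one row per individual with that individual's
father and mother:

  sibs <- ped[ped$Relationship == "sibling", c("Family.ID", "Sample.ID")]
  dads <- unique(ped[ped$Relationship == "father", c("Family.ID", "Sample.ID")])
  moms <- unique(ped[ped$Relationship == "mother", c("Family.ID", "Sample.ID")])
  names(dads)[2] <- "father"
  names(moms)[2] <- "mother"
  # merge on the common column Family.ID, keeping siblings with unknown parents
  trios <- merge(merge(sibs, dads, all.x = TRUE), moms, all.x = TRUE)
  trios[order(trios$Sample.ID), ]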

Best wishes to all,
Ted.

-
E-Mail: (Ted Harding) 
Date: 17-Aug-2014  Time: 19:41:38
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A basic statistics question

2014-08-13 Thread Ted Harding
On 12-Aug-2014 22:22:13 Ted Harding wrote:
> On 12-Aug-2014 21:41:52 Rolf Turner wrote:
>> On 13/08/14 07:57, Ron Michael wrote:
>>> Hi,
>>>
>>> I would need to get a clarification on a quite fundamental statistics
>>> property, hope expeRts here would not mind if I post that here.
>>>
>>> I learnt that the variance-covariance matrix of the standardized data is
>>> equal to the correlation matrix for the unstandardized data. So I
>>> used following data.
>> 
>> 
>> 
>>> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>>>
>>> Point is that I am not getting exact CORR matrix. Can somebody point
>>> me what I am missing here?
>> 
>> You are using a denominator of "n" in calculating your "covariance" 
>> matrix for your normalized data.  But these data were normalized using 
>> the sd() function which (correctly) uses a denominator of n-1 so as to 
>> obtain an unbiased estimator of the population standard deviation.
>> 
>> If you calculated
>> 
>> (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
>> 
>> then you would get the same result as you get from cor(Data) (to within 
>> about 1e-15).
>> 
>> cheers,
>> Rolf Turner
> 
> One could argue about "(correctly)"!
> 
> From the "descriptive statistics" point of view, if one is given a single
> number x, then this dataset has no variation, so one could say that
> sd(x) = 0. And this is what one would get with a denominator of "n".
> 
> But if the single value x is viewed as sampled from a distribution
> (with positive dispersion), then the value of x gives no information
> about the SD of the distribution. If you use denominator (n-1) then
> sd(x) = NA, i.e. is indeterminate (as it should be in this application).
> 
> The important thing when using pre-programmed functions is to know
> which is being used. R uses (n-1), and this can be found from
> looking at
> 
>   ?sd
> 
> or (with more detail) at
> 
>   ?cor
> 
> Ron had assumed that the denominator was n, apparently not being aware
> that R uses (n-1).
> 
> Just a few thoughts ...
> Ted.
> -
> E-Mail: (Ted Harding) 
> Date: 12-Aug-2014  Time: 23:22:09

After some hesitation (not wanting to prolong a thread whose original
query has been effectively settled), nevertheless I think it may be
worth stating the general picture, for the sake of users who might
be confused or misled regarding the use of  the functions var(), cov(),
cor(), sd() etc.

The distinction to become aware of is between "summary" statistics
and statistics which will be used for estimation/inference.

Given a set of numbers, x[1], x[2], ... , x[N], they have a mean

  MEAN = sum(x)/N

and their variance VAR is the mean of the squared deviations between the x[i]
and MEAN:

  VAR = sum((x - MEAN)^2)/N

If a random value X is drawn from {x[1], x[2], ... , x[N]}, with
uniform probability, then the expectation of X is again

  E(X) = MEAN

and the variance of X is again

  Var(X) = E((X - MEAN)^2) = VAR

with MEAN and VAR as given above. And the R function mean(x) will
return MEAN is its value. However, the R function var(x) will not
return VAR -- it will return (N/(N-1))*VAR

The above definitions of MEAN and VAR use division by N, i.e.
"denominator = N". But the R functions var(x), sd(x) etc. divide by
(N-1), i.e. use "denominator = N-1", as explained in
  ?var
  ?sd
etc.

The basic reason for this is that, given a random sample X[1], ... ,X[n]
of size n from {x[1], x[2], ... , x[N]}, the expectation of

  sum((X - mean(X))^2)

is (n-1)*VAR, so to obtain an unbiased estimate of VAR from the X
sample one must use the "bias-corrected" sample variance

  var(X) = sum((X - mean(X))^2)/(n-1)

i.e. "denominator = (n-1)" as described in the help pages.

So the function var(), with denominator = (n-1), is "correct" for
obtaining an unbiased estimator. But it will not be correct for the
variance sum((X - mean(X))^2)/n of the numbers X[1], ... ,X[n].

Since sd() also uses denominator = (n-1), sd(X) is the square root
of var(X). But, while var(X) is unbiased for VAR, sd(X) is not
unbiased for SD = sqrt(VAR), since, for a positive random variable Y,
in general

  E(sqrt(Y)) < sqrt(E(Y))

i.e. E(sd(X)) = E(sqrt(var(X))) < sqrt(E(var(X))) = sqrt(VAR) = SD.

The R functions var() etc., which use denominator = (n-1), do not
have an option which allows the user to choose denominator = n.

Therefore, in particular, a user who has a set of numbers
  {x[1], x[2], ... , x[N]}
(e.g. a record of a population) and wants the SD of these (e.g. for
use in summary statistics) cannot get it directly from sd(x), but can
rescale: SD = sd(x)*sqrt((N-1)/N).
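
For instance, a minimal sketch (var.n and sd.n are illustrative
names) of denominator-N versions built on top of var() and sd():

  var.n <- function(x) var(x) * (length(x) - 1) / length(x)
  sd.n  <- function(x) sqrt(var.n(x))
  x <- c(2, 4, 4, 4, 5, 5, 7, 9)
  var.n(x)   # 4         (denominator N)
  var(x)     # 4.571429  (denominator N-1)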

Re: [R] A basic statistics question

2014-08-12 Thread Ted Harding
On 12-Aug-2014 21:41:52 Rolf Turner wrote:
> On 13/08/14 07:57, Ron Michael wrote:
>> Hi,
>>
>> I would need to get a clarification on a quite fundamental statistics
>> property, hope expeRts here would not mind if I post that here.
>>
>> I learnt that the variance-covariance matrix of the standardized data is equal to
>> the correlation matrix for the unstandardized data. So I used following
>> data.
> 
> 
> 
>> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>>
>> Point is that I am not getting exact CORR matrix. Can somebody point
>> me what I am missing here?
> 
> You are using a denominator of "n" in calculating your "covariance" 
> matrix for your normalized data.  But these data were normalized using 
> the sd() function which (correctly) uses a denominator of n-1 so as to 
> obtain an unbiased estimator of the population standard deviation.
> 
> If you calculated
> 
> (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
> 
> then you would get the same result as you get from cor(Data) (to within 
> about 1e-15).
> 
> cheers,
> Rolf Turner

One could argue about "(correctly)"!

From the "descriptive statistics" point of view, if one is given a single
number x, then this dataset has no variation, so one could say that
sd(x) = 0. And this is what one would get with a denominator of "n".

But if the single value x is viewed as sampled from a distribution
(with positive dispersion), then the value of x gives no information
about the SD of the distribution. If you use denominator (n-1) then
sd(x) = NA, i.e. is indeterminate (as it should be in this application).

The important thing when using pre-programmed functions is to know
which is being used. R uses (n-1), and this can be found from
looking at

  ?sd

or (with more detail) at

  ?cor

Ron had assumed that the denominator was n, apparently not being aware
that R uses (n-1).

Just a few thoughts ...
Ted.

-
E-Mail: (Ted Harding) 
Date: 12-Aug-2014  Time: 23:22:09
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A basic statistics question

2014-08-12 Thread Ted Harding
On 12-Aug-2014 19:57:29 Ron Michael wrote:
> Hi,
> 
> I would need to get a clarification on a quite fundamental statistics
> property, hope expeRts here would not mind if I post that here.
> 
> I learnt that the variance-covariance matrix of the standardized data is equal to
> the correlation matrix for the unstandardized data. So I used following data.
> 
> Data <- structure(c(7L, 5L, 9L, 7L, 8L, 7L, 6L, 6L, 5L, 7L, 8L, 6L, 7L,  7L,
> 6L, 7L, 7L, 6L, 8L, 6L, 7L, 7L, 7L, 8L, 7L, 9L, 8L, 7L, 7L,  0L, 10L, 10L,
> 10L, 7L, 6L, 8L, 5L, 5L, 6L, 6L, 7L, 11L, 9L, 10L,  0L, 13L, 13L, 10L, 7L,
> 7L, 7L, 10L, 7L, 5L, 8L, 7L, 10L, 10L,  10L, 6L, 7L, 6L, 6L, 8L, 8L, 7L, 7L,
> 7L, 7L, 8L, 7L, 8L, 6L,  6L, 8L, 7L, 4L, 7L, 7L, 10L, 10L, 6L, 7L, 7L, 12L,
> 12L, 8L, 5L,  5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 5L, 4L, 5L, 5L, 5L, 6L,
> 7L, 5L, 7L, 5L, 7L, 7L, 7L, 7L, 8L, 7L, 6L, 7L, 7L, 6L, 7L, 7L,  6L, 4L, 4L,
> 6L, 6L, 7L, 8L, 7L, 11L, 10L, 8L, 7L, 6L, 6L, 11L,  5L, 4L, 6L, 6L, 6L, 7L,
> 8L, 7L, 12L, 4L, 4L, 2L, 5L, 6L, 7L,  6L, 6L, 5L, 6L, 5L, 7L, 7L, 7L, 6L, 5L,
> 6L, 6L, 5L, 5L, 6L, 6L,  4L, 4L, 5L, 10L, 10L, 7L, 7L, 6L, 4L, 6L, 10L, 7L,
> 4L, 6L, 6L,  6L, 8L, 8L, 8L, 7L, 8L, 9L, 10L, 7L, 6L, 6L, 8L, 6L, 8L, 3L, 
> 3L, 4L, 5L, 5L, 6L, 5L, 5L, 6L, 4L, 8L, 7L, 3L, 5L, 6L, 9L, 8L,  9L, 10L, 8L,
> 9L, 8L, 9L, 8L, 8L, 9L, 11L, 10L, 9L, 9L, 13L,
>  13L,  10L, 7L, 7L, 7L, 9L, 8L, 7L, 6L, 10L, 8L, 7L, 8L, 8L, 3L, 4L,  3L, 7L,
> 6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 2L, 5L, 7L, 9L, 8L, 9L,  10L, 8L, 8L, 9L, 9L,
> 11L, 11L, 11L, 10L, 9L, 9L, 11L, 2L, 3L,  2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 1L,
> 1L, 1L, 3L, 3L, 4L, 6L, 4L,  5L, 2L, 3L, 5L, 4L, 4L, 2L, 4L, 4L, 5L, 4L, 2L,
> 7L, 3L, 3L, 10L,  13L, 11L, 9L, 9L, 7L, 8L, 9L, 6L, 7L, 6L, 5L, 3L, 13L, 3L,
> 3L,  0L, 1L, 4L, 5L, 3L, 3L, 0L, 2L, 20L, 3L, 2L, 6L, 5L, 5L, 5L,  2L, 2L,
> 5L, 5L, 5L, 4L, 3L, 4L, 4L, 3L, 4L, 10L, 10L, 9L, 8L,  4L, 4L, 8L, 7L, 10L,
> 3L, 1L, 9L, 5L, 11L, 9L), .Dim = c(45L,  8L), .Dimnames = list(NULL, c("V1",
> "V7", "V13", "V19", "V25",  "V31", "V37", "V43"))) 
> 
>   
> Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x))) 
> 
> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
> 
> 
> 
> Point is that I am not getting exact CORR matrix. Can somebody point me
> what I am missing here?
> 
> Thanks for your pointer.

Try:
  Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
  (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

and compare the result with

  cor(Data)

And why? Look at

  ?sd

and note that:

  Details:
 Like 'var' this uses denominator n - 1.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 12-Aug-2014  Time: 22:32:26
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generate quasi-random positive numbers

2014-08-05 Thread Ted Harding
On 05-Aug-2014 10:27:54 Frederico Mestre wrote:
> Hello all:
> 
> Is it possible to generate quasi-random positive numbers, given a standard
> deviation and mean? I need all positive values to have the same probability
> of selection (uniform distribution). Something like:
> 
> runif(10, min = 0, max = 100)
> 
> This way I'm generating random positive numbers from a uniform
> distribution. However, using runif I can't previously select SD and mean
> (as in rnorm).
> 
> Alternatively, I'm able to generate a list of quasi-random numbers given a
> SD and a mean.
> 
> b <- (sqrt(SD^2*12)+(MEAN*2))/2
> a <- (MEAN*2) - b
> x1 <- runif(N,a,b)
> 
> However, negative values might be included, since "a" can assume a negative
> value.
> 
> Any help?
> 
> Thanks,
> Frederico

There is an inevitable constraint on MEAN and SD for a uniform
ditribution of positive numbers. Say the parent distribution is
uniform on (a,b) with a >= 0 and b > a.

Then MEAN = (a+b)/2, SD^2 = ((b-a)^2)/12, so

  12*SD^2  = b^2 - 2*a*b + a^2
  4*MEAN^2 = b^2 + 2*a*b + a^2

  4*MEAN^2 - 12*SD^2 = 4*a*b

  MEAN^2 - 3*SD^2 = a*b

Hence for a >= 0 and b > a you must have MEAN^2 >= 3*SD^2.

Once you have MEAN and SD satisfying this constraint, you should
be able to solve the equations for a and b.
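
For instance, a minimal sketch (runif.meansd is an illustrative name;
it checks that the constraint holds, then solves for a and b):

  runif.meansd <- function(n, MEAN, SD) {
    stopifnot(MEAN^2 >= 3*SD^2)   # otherwise a would be negative
    a <- MEAN - sqrt(3)*SD
    b <- MEAN + sqrt(3)*SD
    runif(n, min = a, max = b)
  }
  x <- runif.meansd(10000, MEAN = 50, SD = 20)
  c(mean(x), sd(x))   # close to 50 and 20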

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 05-Aug-2014  Time: 11:46:52
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple permutation question

2014-06-25 Thread Ted Harding
I think Robert wants deterministic permutations. In the e1071
package -- load with library(e1071) -- there is a function
permutations():

  Description:
  Returns a matrix containing all permutations of the integers
  '1:n' (one permutation per row).

  Usage:
  permutations(n)

  Arguments:
  n: Number of element to permute.

so, starting with
  x <- c("A","B","C","D","E")
  library(e1071)
  P <- permutations(length(x))

then, for say the 27th of these 120 permutations of x,

  x[P[27,]]

will return it.
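
As to why the recursive attempt fails: sapply() flattens each
sub-matrix column-wise into a single column of its result (and the
function calls perm() rather than itself). One way to repair it -- a
sketch, with perm.fixed as an illustrative name -- is to bind the
pieces row-wise with do.call(rbind, ...):

  perm.fixed <- function(x) {
    if (length(x) == 1) return(matrix(x, 1, 1))
    do.call(rbind, lapply(seq_along(x), function(i) {
      cbind(x[i], perm.fixed(x[-i]))
    }))
  }
  perm.fixed(c("A", "B", "C"))
  #      [,1] [,2] [,3]
  # [1,] "A"  "B"  "C"
  # [2,] "A"  "C"  "B"
  # [3,] "B"  "A"  "C"
  # [4,] "B"  "C"  "A"
  # [5,] "C"  "A"  "B"
  # [6,] "C"  "B"  "A"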

Ted.

On 25-Jun-2014 20:38:45 Cade, Brian wrote:
> It is called sample(,replace=F), where the default argument is sampling
> without replacement.
> Try
> x <- c("A","B","C","D","E")
> sample(x)
> 
> Brian
> 
> Brian S. Cade, PhD
> 
> U. S. Geological Survey
> Fort Collins Science Center
> 2150 Centre Ave., Bldg. C
> Fort Collins, CO  80526-8818
> 
> email:  ca...@usgs.gov 
> tel:  970 226-9326
> 
> 
> 
> On Wed, Jun 25, 2014 at 2:22 PM, Robert Latest  wrote:
> 
>> So my company has hired a few young McKinsey guys from overseas for a
>> couple of weeks to help us with a production line optimization. They
>> probably charge what I make in a year, but that's OK because I just
>> never have the time to really dive into one particular time, and I have
>> to hand it to the consultants that they came up with one or two really
>> clever ideas to model the production line. Of course it's up to me to
>> feed them the real data which they then churn through their Excel
>> models that they cook up during the nights in their hotel rooms, and
>> which I then implement back into my experimental system using live data.
>>
>> Anyway, whenever they need something or come up with something I skip
>> out of the room, hack it into R, export the CSV and come back in about
>> half the time it takes Excel to even read in the data, let alone
>> process it. Of course that got them curious, and I showed off a couple
>> of scripts that condense their abysmal Excel convolutions in a few
>> lean and mean lines of R code.
>>
>> Anyway, I'm in my office with this really attractive, clever young
>> McKinsey girl (I'm in my mid-forties, married with kids and all, but I
>> still enjoyed impressing a woman with computer stuff, of all things!),
>> and one of her models involves a simple permutation of five letters --
>> "A" through "E".
>>
>> And that's when I find out that R doesn't have a permutation function.
>> How is that possible? R has EVERYTHING, but not that? I'm
>> flabbergasted. Stumped. And now it's up to me to spend the evening at
>> home coding that model, and the only thing I really need is that
>> permutation.
>>
>> So this is my first attempt:
>>
>> perm.broken <- function(x) {
>> if (length(x) == 1) return(x)
>> sapply(1:length(x), function(i) {
>> cbind(x[i], perm(x[-i]))
>> })
>> }
>>
>> But it doesn't work:
>> > perm.broken(c("A", "B", "C"))
>>  [,1] [,2] [,3]
>> [1,] "A"  "B"  "C"
>> [2,] "A"  "B"  "C"
>> [3,] "B"  "A"  "A"
>> [4,] "C"  "C"  "B"
>> [5,] "C"  "C"  "B"
>> [6,] "B"  "A"  "A"
>> >
>>
>> And I can't figure out for the life of me why. It should work because I
>> go through the elements of x in order, use that in the leftmost column,
>> and slap the permutation of the remaining elements to the right. What
>> strikes me as particularly odd is that there doesn't even seem to be a
>> systematic sequence of letters in any of the columns. OK, since I
>> really need that function I wrote this piece of crap:
>>
>> perm.stupid <- function(x) {
>> b <- as.matrix(expand.grid(rep(list(x), length(x))))
>> b[!sapply(1:nrow(b), function(r) any(duplicated(b[r,]))),]
>> }
>>
>> It works, but words cannot describe its ugliness. And it gets really
>> slow really fast with growing x.
>>
>> So, anyway. My two questions are:
>> 1. Does R really, really, seriously lack a permutation function?
>> 2. OK, stop kidding me. So what's it called?
>> 3. Why doesn't my recursive function work, and what would a
>>working version look like?
>>
>> Thanks,
>> robert
>>
>> 

Re: [R] C: drive memory full

2014-06-17 Thread Ted Harding
Don't forget the possibility of "hidden files", which will not be
shown using normal file-listing procedures. It may be that R has
stored those files as hidden files.

See, for example:

http://windows.microsoft.com/en-gb/windows/show-hidden-files

http://www.sevenforums.com/tutorials/394-hidden-files-folders-show-hide.html

[NB: These are the results of a google search. I am no expert on
Windows myself ... ]

Hoping this helps,
Ted.

On 17-Jun-2014 12:48:54 Hiyoshi, Ayako wrote:
> Dear Martyn and Professor Ripley,
> 
> Thank you so much for your help. I used Windows' large file search (it was
> useful! thank you), but no big files were detected in the C: drive.
> Perhaps I will have to reinstall Windows..
> 
> Thank you so much for your replies.
> 
> Best wishes,
> Ayako
> 
> From: Martyn Byng 
> Sent: 17 June 2014 12:10
> To: Hiyoshi, Ayako; Prof Brian Ripley; r-help@R-project.org
> Subject: RE: [R] C: drive memory full
> 
> Hi,
> 
> Try
> 
> http://social.technet.microsoft.com/wiki/contents/articles/19295.windows-8-how
> -to-search-for-large-files.aspx
> 
> Martyn
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Hiyoshi, Ayako
> Sent: 17 June 2014 11:40
> To: Prof Brian Ripley; r-help@R-project.org
> Subject: Re: [R] C: drive memory full
> 
> Dear Professor Ripley,
> 
> Thank you so much for your quick reply.
> 
> I tried 'dirname(tempdir())' and removed several 'Rtmp' and all other files.
> Unfortunately it did not change C: drive space much.
> 
> I am really sorry, but could there be other things stored somewhere in the C:
> drive?
> 
> I called IT support, but they could not spot either.
> 
> Kind regards,
> Ayako
> 
> 
> From: Prof Brian Ripley 
> Sent: 17 June 2014 10:37
> To: Hiyoshi, Ayako; r-help@R-project.org
> Subject: Re: [R] C: drive memory full
> 
> tempdir() is specific to an R session.
> 
> Start up R and run
> 
> dirname(tempdir())
> 
> and look at that directory.  Shut down R, then remove all old files/folders
> in that directory, especially those beginning with 'Rtmp'.
> 
> An R process tries to clean up after itself but
> 
> - it cannot do so if it segfaults 
> - Windows has rules which no other OS has which can result in files being
> locked and hence not deletable.
> 
> 
> On 17/06/2014 08:49, Hiyoshi, Ayako wrote:
>> Dear R users,
>>
>> Hello, I am new to R and confused about my PC's memory space after
>> using R a while.
>>
>> My PC is Windows 8, RAM 8GB. I have run some analyses using commands
>> like "vglm", "aalen(Sruv())", lm()... some regressions.
>>
>> I also use Stata, and when I tried to run Stata (big file), Stata
>> could not do something which used to do because of the lack of memory
>> space. I suspect R's something because R is the only new activity I
>> did recently.
>>
>> I tried to google and found 'tempdir()'.
>>
>> I checked my temporary file but it was empty.
>>
>> Just in case, after running 'unlink(tempdir(),recursive=TRUE)', I
>> restarted my computer, but memory space did not change. But there
>> still seems to be something big in my C: drive storage, and nearly 12GB
>> is eaten.
>>
>> Could it be possible that R saved something somewhere?
>>
>> As I finished analyses, all I need is to erase everything stored
>> so that I can get my memory space back.
>> 
>> Ayako
>>   [[alternative HTML version deleted]]

-
E-Mail: (Ted Harding) 
Date: 17-Jun-2014  Time: 14:14:40
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix column division by vector

2014-05-14 Thread Ted Harding
Maybe I am missing the point -- but what is wrong with line 3 of:

  m=rbind(c(6,4,2),c(3,2,1))
  v= c(3,2,1)
  m%*%diag(1/v)
  #      [,1] [,2] [,3]
  # [1,]    2    2    2
  # [2,]    1    1    1
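
For reference, the same result can be had with sweep() (as suggested
below) or by plain recycling over the transpose:

  sweep(m, 2, v, "/")
  t(t(m)/v)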

Ted.

On 14-May-2014 15:03:36 Frede Aakmann Tøgersen wrote:
> Have a look at ?sweep
> 
> Br. Frede
> 
> 
> Sendt fra Samsung mobil
>  Oprindelig meddelelse 
> Fra: carol white
> Dato:14/05/2014 16.53 (GMT+01:00)
> Til: r-h...@stat.math.ethz.ch
> Emne: [R] matrix column division by vector
> 
> Hi,
> What is the elegant script to divide the columns of a matrix by the
> respective position of a vector elements?
> 
> m=rbind(c(6,4,2),c(3,2,1))
> 
> v= c(3,2,1)
> 
> res= 6/3   4/2  2/1
> 3/3   2/2   1/1
> 
> 
> this is correct
> mat2 = NULL
> 
> for (i in 1: ncol(m))
> 
> mat2 = cbind(mat2, m[,i]/ v[i])
> 
> 
> but how to do more compact and elegant with for ex do.call?
> 
> Many thanks
> 
> Carol
> [[alternative HTML version deleted]]
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 14-May-2014  Time: 18:16:12
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Precedence and parentheses

2014-05-06 Thread Ted Harding
On 06-May-2014 18:09:12 Göran Broström wrote:
> A thread on r-devel ("Historical NA question") went (finally) off-topic, 
> heading towards "Precedence". This triggered a question that I think is 
> better put on this list:
> 
> I have been more or less regularly been writing programs since the 
> seventies (Fortran, later C) and I early got the habit of using 
> parentheses almost everywhere, for two reasons. The first is the 
> obvious, to avoid mistakes with precedences, but the second is almost
> as important: Readability.
> 
> Now, I think I have seen somewhere that unnecessary parentheses in  R 
> functions may slow down execution time considerably. Is this really 
> true, ant should I consequently get rid of my faiblesse for parentheses?
> Or are there rules for when it matters and doesn't matter?
> 
> Thanks, Göran

I have much sympathy with your motives for using parentheses!
(And I have a similar computing history).

My general encouragement would be: continue using them when you
feel that each usage brings a benefit.

Of course, you would avoid silliness like
  x <- (1*(2*(3*(4*(5)))))

and indeed, that does slow it down somewhat (it takes about twice
as long as a <- 1*2*3*4*5):

  system.time(for(i in (1:1)) 1*2*3*4*5)
#  user  system elapsed 
# 0.032   0.000   0.032 

  system.time(for(i in (1:1)) (1*(2*(3*(4*(5))))))
#  user  system elapsed 
# 0.056   0.000   0.054 

The main reason, I suppose, is that a "(" forces a new level
on a stack, which is not popped until the matching ")" arrives.

Interestingly, the above silliness takes a little longer when
the nesting is the other way round:

  system.time(for(i in (1:1)) 1*2*3*4*5)
#  user  system elapsed 
# 0.028   0.000   0.029 

  system.time(for(i in (1:1)) ((((1)*2)*3)*4)*5 )
#  user  system elapsed 
#  0.052   0.000   0.081 

(though in fact the times are somwhat variable in both cases,
so I'm not sure of the value of the relationship).

Best wishes,
Ted.

-
E-Mail: (Ted Harding) 
Date: 06-May-2014  Time: 19:41:13
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with products in R ?

2014-05-04 Thread Ted Harding
Thanks for this link, Gabor (and especially for the link
therein to your posting on Thu May 7 14:10:53 CEST 2009).
This confirms that the R 'bc' package is not on CRAN and
points to where it can be sourced from. I used to have bc.R
installed on an old machine, which has gone into terminal
coma now.

Best wishes,
Ted.


On 04-May-2014 17:10:00 Gabor Grothendieck wrote:
> Checking this with the bc R package (https://code.google.com/p/r-bc/),
> the Ryacas package (CRAN), the gmp package (CRAN) and the Windows 8.1
> calculator all four give the same result:
> 
>> library(bc)
>> bc("168988580159 * 36662978")
> [1] "6195624596620653502"
> 
>> library(Ryacas)
>> yacas("168988580159 * 36662978", retclass = "character")
> 6195624596620653502
> 
>> library(gmp)
>> as.bigz("168988580159") * as.bigz("36662978")
> Big Integer ('bigz') :
> [1] 6195624596620653502
> 
> 
> On Sun, May 4, 2014 at 12:50 PM, Ted Harding 
> wrote:
>> On 04-May-2014 14:13:27 Jorge I Velez wrote:
>>> Try
>>>
>>> options(digits = 22)
>>> 168988580159 * 36662978
>>># [1] 6195624596620653568
>>>
>>> HTH,
>>> Jorge.-
>>
>> Err, not quite ... !
>> I hitch my horses to my plough (with help from R):
>>
>> options(digits=22)
>> 168988580159*8 = 1351908641272 (copy down)
>> 168988580159*7 = 1182920061113  ( " " )
>> 168988580159*9 = 1520897221431  ( " " )
>> 168988580159*2 =  337977160318  ( " " )
>> 168988580159*6 = 1013931480954  ( " " )^3
>> 168988580159*3 =  506965740477  ( " " )
>>
>>  1351908641272
>> 11829200611130
>>152089722143100
>>337977160318000
>>  1013931480954
>> 10139314809540
>>101393148095400
>>506965740477000
>> ==
>>6195624596620653502
>> [after adding up mentally]
>>
>> compared with Jorge's:
>>6195624596620653568
>>
>> ("02" vs "68" in the final two digits).
>>
>> Alternatively, if using a unixoid system with 'bc' present,
>> one can try interfacing R with 'bc'. 'bc' is a calculating
>> engine which works to arbitrary precision.
>>
>> There certainly used to be a utility with which R can invoke 'bc',
>> into which one can enter a 'bc' command and get the result
>> returned as a string, but I can't seem to find it on CRAN now.
>> In any case, the raw UNIX command line for this calculation
>> with 'bc' (with result) is:
>>
>> $ bc -l
>> [...]
>> 168988580159 * 36662978
>> 6195624596620653502
>> quit
>>
>> which agrees with my horse-drawn working.
>>
>> Best wishes to all,
>> Ted.
>>
>>> On Sun, May 4, 2014 at 10:44 PM, ARTENTOR Diego Tentor <
>>> diegotento...@gmail.com> wrote:
>>>
>>>> Trying algorithm for products with large numbers i encountered a
>>>> difference
>>>> between result of 168988580159 * 36662978 in my algorithm and r product.
>>>> The Microsoft calculator confirm my number.
>>>>
>>>> Thanks.
>>>> --
>>>> *Gráfica ARTENTOR  *
>>>>
>>>> de Diego L. Tentor
>>>> Echagüe 558
>>>> Tel.:0343 4310119
>>>> Paraná - Entre Ríos
>>
>> -
>> E-Mail: (Ted Harding) 
>> Date: 04-May-2014  Time: 17:50:54
>> This message was sent by XFMail
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

-
E-Mail: (Ted Harding) 
Date: 04-May-2014  Time: 18:28:23
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with products in R ?

2014-05-04 Thread Ted Harding
On 04-May-2014 14:13:27 Jorge I Velez wrote:
> Try
> 
> options(digits = 22)
> 168988580159 * 36662978
># [1] 6195624596620653568
> 
> HTH,
> Jorge.-

Err, not quite ... !
I hitch my horses to my plough (with help from R):

options(digits=22)
168988580159*8 = 1351908641272 (copy down)
168988580159*7 = 1182920061113  ( " " )
168988580159*9 = 1520897221431  ( " " )
168988580159*2 =  337977160318  ( " " )
168988580159*6 = 1013931480954  ( " " )^3
168988580159*3 =  506965740477  ( " " )

 1351908641272
11829200611130
   152089722143100
   337977160318000
 1013931480954
10139314809540
   101393148095400
   506965740477000
==
   6195624596620653502
[after adding up mentally]

compared with Jorge's:
   6195624596620653568

("02" vs "68" in the final two digits).

Alternatively, if using a unixoid system with 'bc' present,
one can try interfacing R with 'bc'. 'bc' is a calculating
engine which works to arbitrary precision.

There certainly used to be a utility with which R can invoke 'bc',
into which one can enter a 'bc' command and get the result
returned as a string, but I can't seem to find it on CRAN now.
In any case, the raw UNIX command line for this calculation
with 'bc' (with result) is:

$ bc -l
[...]
168988580159 * 36662978
6195624596620653502
quit

which agrees with my horse-drawn working.
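
A minimal sketch of driving 'bc' from within R (assuming a Unix-alike
with 'bc' on the PATH), feeding the expression to bc's stdin via
system2():

  bigprod <- system2("bc", input = "168988580159 * 36662978", stdout = TRUE)
  bigprod
  # [1] "6195624596620653502"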

Best wishes to all,
Ted.

> On Sun, May 4, 2014 at 10:44 PM, ARTENTOR Diego Tentor <
> diegotento...@gmail.com> wrote:
> 
>> Trying algorithm for products with large numbers i encountered a difference
>> between result of 168988580159 * 36662978 in my algorithm and r product.
>> The Microsoft calculator confirm my number.
>>
>> Thanks.
>> --
>> *Gráfica ARTENTOR  *
>>
>> de Diego L. Tentor
>> Echagüe 558
>> Tel.:0343 4310119
>> Paraná - Entre Ríos

-
E-Mail: (Ted Harding) 
Date: 04-May-2014  Time: 17:50:54
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Getting a particular weekday for a given month

2014-04-07 Thread Ted Harding
On 07-Apr-2014 17:49:09 Christofer Bogaso wrote:
> Hi,
> Given a month name, I am looking for some script to figure out,
> what is the date for 3rd Wednesday. For example let say I have
> following month:
> 
> library(zoo)
> Month <- as.yearmon(as.Date(Sys.time()))
> 
> I need to answer: What is the date for 3rd Wednesday of 'Month'?
> 
> Really appreciate for any pointer.
> 
> Thanks for your time.

The following may not suit you, but it is the sort of approach I tend
to adopt myself, using things I know about rather than getting lost
in R documentation! (Outline of general method, not details). And it
also assumes you are using a Unixoid system (e.g. Linux or Mac OS X).

Your two commands currently give:

  library(zoo)
  Month <- as.yearmon(as.Date(Sys.time()))
  Month
  # [1] "Apr 2014"

and it is straightforward to extract "Apr" and "2014" from Month.

This is the point at which I attach my horses to my wooden plough ...

In Unixoid systems there is a command 'cal' which, for "2014",
yields output:

$ cal 2014
                             2014

      January               February               March
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
       1  2  3  4  5                  1  2                  1  2
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   3  4  5  6  7  8  9
13 14 15 16 17 18 19  10 11 12 13 14 15 16  10 11 12 13 14 15 16
20 21 22 23 24 25 26  17 18 19 20 21 22 23  17 18 19 20 21 22 23
27 28 29 30 31        24 25 26 27 28        24 25 26 27 28 29 30
                                            31

       April                  May                   June
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6            1  2  3  4                     1
 7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                            30

        July                 August              September
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6               1  2  3   1  2  3  4  5  6  7
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   8  9 10 11 12 13 14
14 15 16 17 18 19 20  11 12 13 14 15 16 17  15 16 17 18 19 20 21
21 22 23 24 25 26 27  18 19 20 21 22 23 24  22 23 24 25 26 27 28
28 29 30 31           25 26 27 28 29 30 31  29 30

      October               November              December
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
       1  2  3  4  5                  1  2   1  2  3  4  5  6  7
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   8  9 10 11 12 13 14
13 14 15 16 17 18 19  10 11 12 13 14 15 16  15 16 17 18 19 20 21
20 21 22 23 24 25 26  17 18 19 20 21 22 23  22 23 24 25 26 27 28
27 28 29 30 31        24 25 26 27 28 29 30  29 30 31

After the first two lines, this consists of 4 blocks, each with 8 rows,
each covering 3 months where each month consists of 7 columns, one
for each day of the week (Mo - Su). Each column occupies 3 character
spaces (except for the last -- only 2).

>From "April" you can readily identify that this is the 4th month,
so you need to go to Month 1 of the 2nd block of rows. The "We"
column is the 3rd in that month, and you are looking for the
date of the 3rd Wednesday. So count down to the 3rd non-blank
entry[*] in this 3rd column, and you will find "16". Done.

[*] Some months, e.g. November above, have an initial blank
entry because this day belongs to the previous month.

Quite how you would program this efficiently in R is another matter!
But the principle is simple. To give R a text file to work on, at the
shell prompt use a command like:

$ cal 2014 > cal2014.txt

and then "cal2014.txt" is accessible as a plain text file.

Even simpler (if it is only one particular month you want,
as in your example) is:

$ cal April 2014

which yields:

     April 2014
Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30

and now just count down the 3rd column (as before).
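
For completeness, a small sketch of getting the answer directly in R,
with no call to 'cal' (third.wednesday is an illustrative name; note
that weekdays() is locale-dependent, so "Wednesday" assumes an
English locale):

  third.wednesday <- function(year, month) {
    # the 3rd Wednesday always falls within the first 28 days
    days <- seq(as.Date(sprintf("%d-%02d-01", year, month)),
                by = "day", length.out = 28)
    days[weekdays(days) == "Wednesday"][3]
  }
  third.wednesday(2014, 4)
  # [1] "2014-04-16"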

Maybe this helps ...
Ted.

-
E-Mail: (Ted Harding) 
Date: 07-Apr-2014  Time: 19:29:41
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] about lm()

2014-03-30 Thread Ted Harding
I suspect the problem may be with the structure of the data.

Si Qi L wrote:
  [...]
  acc1<- lm(data$acc ~ dummy + data$Reqloanamount + data$Code +
  data$Code.1 + data$EmpNetMonthlyPay + data$Customer_Age + data$RTI)
  [...]
  These attributes are all numerical except the "acc"(factors)
  [...]

If, as he implies, the "acc" variable in "data" is a factor,
then lm() will not enjoy fitting an lm where the dependent
variable (response) is a factor!

Just a shot in the dark ...
Ted.

On 30-Mar-2014 18:46:27 Bert Gunter wrote:
> 1. Post in plain text, not HTML.
> 
> 2. Read ?lm and note the data argument. Use it in your lm call instead
> of all the $data extractions.
> 
> 3. Your problem is with the summary() call, so read ?summary.lm. Learn
> about S3 methods if you do not know where the ".lm" part is coming
> from by reading the "Introduction to R " tutorial that ships with R,
> which you should have read already anyway.
> 
> Cheers,
> Bert
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
> 
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> H. Gilbert Welch
> 
> 
> 
> 
> On Sun, Mar 30, 2014 at 9:43 AM, Frank Schwidom  wrote:
>> Please provide some data from your variable data.
>>
>> Show the output of dput( data) or an subset
>> of data which leads toe the specific error.
>>
>> Regards
>>
>>
>> On Sun, Mar 30, 2014 at 02:23:09PM +0100, Si Qi L. wrote:
>>> Hi
>>>
>>> I have a problem with linear regression. This is my codes:
>>>
>>> acc1<- lm(data$acc ~ dummy + data$Reqloanamount + data$Code + data$Code.1 +
>>> data$EmpNetMonthlyPay + data$Customer_Age + data$RTI)
>>> summary(acc1)
>>>
>>> These attributes are all numerical except the "acc"(factors), so how can I
>>> fix the problem R showed me? can anyone please help me out? Many thanks!
>>>
>>> But the R shows:
>>>
>>> Residuals:Error in quantile.default(resid) : factors are not allowedIn
>>> addition: Warning message:In Ops.factor(r, 2) : ^ not meaningful for
>>> factors
>>>
>>>   [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 30-Mar-2014  Time: 21:12:16
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] points with-in boundaries of a map

2014-03-23 Thread Ted Harding
On 23-Mar-2014 22:50:50 Jim Lemon wrote:
> On 03/23/2014 10:29 PM, eliza botto wrote:
>> Thankyou very much jim. it worked! but regarding second part of my
>> question, isn't there a way to read the coordinates of intersecting
>> lines with the premises of the map?
> 
> Hi Eliza,
> I think you want the "locator" function, which will return the 
> coordinates to the accuracy that you can click on a point in the plot.
> 
> Jim

There is a possible analytical approach to this. If the boundary of
a simple closed region is given as a series of straight lines which
join successive points as one proceeds continuously round the boundary
of the region, and one has the data of the (x,y) coordinates of the points
in that order (with the last point the same as the first), then one
can proceed as follows:

Let the points be P1=(x1,y1), P2=(x2,y2), ... , PN=(xN,yN)=P1
in the order specified above.

Since the grid lines have been drawn at specified intervals, you
can work through every point (x0,y0) which is an intersection.

Let (x0,y0) be a point of intersection of the grid of lines. Now you
know the coordinates of (x0,y0) already, so the question is whether
this point is inside or outside the region.

Consider a line joining (x0,y0) to (x1,y1). Now work out the angle
through which the line rotates when it is moved so that it joins
(x0,y0) to (x2,y2); say this is phi2 (clockwise positive, anticlockwise
negative). Similarly work out phi3, the angle through which the line
next rotates when it is further moved so that it joins (x0,y0) to (x3,y3);
and so on.

Now add up phi2 + phi3 + ... + phiN

If the point (x0,y0) is inside the region, and you have proceeded
consecutively along the boundary, then this sum of angles will be
2*pi or -2*pi. If (x0,y0) is outside the region, then this sum will be zero.

So you can work through all the intersections of the grid lines,
and for each intersection you can determine whether it is inside
or outside; and, if it is inside, you already know its coordinates.
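
A minimal sketch of this test for a single grid point (x0,y0), given
boundary vertices bx, by in consecutive order with the first point
repeated at the end ('inside' is an illustrative name; in practice,
point.in.polygon() in the 'sp' package does this kind of job):

  inside <- function(x0, y0, bx, by) {
    theta <- atan2(by - y0, bx - x0)      # direction to each vertex
    dphi  <- diff(theta)                  # rotation between successive vertices
    dphi  <- (dphi + pi) %% (2*pi) - pi   # wrap each step into [-pi, pi)
    abs(sum(dphi)) > pi                   # ~ 2*pi inside, ~ 0 outside
  }
  bx <- c(0, 1, 1, 0, 0); by <- c(0, 0, 1, 1, 0)   # unit square
  inside(0.5, 0.5, bx, by)   # TRUE
  inside(1.5, 0.5, bx, by)   # FALSE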

It can get more complicated if the region is not simply-connected.
E.g. if you do not want to count points (x0,y0) which are within
the boundary coastline but also are within the boundary of an
inland lake; or, further, you do want to count points which are
on an island in the lake; and so on.

The essential for following a procedure of this kind is that you
can extract from the data from which the map is drawn the coordinates
of a series of points which are in consecutive order as one proceeds
along the boundar of a region. Not all geographic data have the
coordinates in such an order -- one can find datasets such that the
boundary is drawn as a set of separate partial boundaries which
are in no particular order as a whole; and in some datasets the
different separate parts of the boundary do not exactly match up
at the points where they should exactly join.

Hoping this helps,
Ted.

---------
E-Mail: (Ted Harding) 
Date: 23-Mar-2014  Time: 23:47:15
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rounding to whole number

2014-03-20 Thread Ted Harding
On 20-Mar-2014 19:42:35 Kristi Glover wrote:
> Hi R User, 
> I was trying to convert a decimal value into an integer (whole number). I
> used the round function, but some of the cells have values less than 0.5 and
> it converted them into 0. But I wanted these cells to have 1 instead of 0.
> Another way, I could multiply by 10, but I did not want that because it
> affected my results later.
> I was wondering about how I can convert 0.2 to 1. For example 
> data<-structure(list(site1 = structure(1:4, .Label = c("s1", "s2", 
> "s3", "s4"), class = "factor"), A = c(0, 2.3, 2.6, 1.3), B = c(0.5, 
> 0.17, 2.9, 3)), .Names = c("site1", "A", "B"), class = "data.frame",
> row.names = c(NA, 
> -4L))
> output<-structure(list(site1 = structure(1:4, .Label = c("s1", "s2", 
> "s3", "s4"), class = "factor"), A = c(0L, 3L, 3L, 2L), B = c(1L, 
> 1L, 3L, 3L)), .Names = c("site1", "A", "B"), class = "data.frame", row.names
> = c(NA, 
> -4L))
> 
> Thanks for your suggestions in advance.
> cheers,
> KG

Hi Kristi,
the function round() has peculiarities (because of the IEEE standard).
E.g.

  round(1+1/2)
  # [1] 2
  round(0+1/2)
  # [1] 0

i.e. in "equipoised" cases like these, it rounds to the nearest *even"
number; otherwise, if it is nearer to one integer value (above or below)
than to the other (below or above), then it rounds to the nearest integer
value. There are two unambiguous functions you could use for "equipoised"
cases, depending on which way you want to go:

  floor(1+1/2)
  # [1] 1
  floor(0+1/2)
  # [1] 0

  ceiling(1+1/2)
  # [1] 2
  ceiling(0+1/2)
  # [1] 1

The latter would certainly meet your request for "how I can convert
0.2 to 1", since

  ceiling(0.2)
  # [1] 1

But it will not round, say, (1+1/4) to the nearest integer.

On the other hand, if you want a function which rounds to the
nearest integer when not exactly halfway between, and rounds
either always up or always down when the fractional part is exactly 1/2,
then I think (but others will probably correct me) that you may have
to write your own -- say roundup() or rounddown():

  roundup <- function(x){
if((x-floor(x))==1/2){
  ceiling(x)
} else {round(x)}
  }

  rounddown <- function(x){
if((x-floor(x))==1/2){
  floor(x)
} else {round(x)}
  }
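
(An aside: the if() versions above handle a single value at a time;
ifelse() gives vectorized equivalents, sketched here with illustrative
names:)

  roundup.v   <- function(x) ifelse(x - floor(x) == 1/2, ceiling(x), round(x))
  rounddown.v <- function(x) ifelse(x - floor(x) == 1/2, floor(x),   round(x))
  roundup.v(c(0.5, 1.5, 2.4, 2.6))
  # [1] 1 2 2 3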

Also have a look at the help page

  ?round

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 20-Mar-2014  Time: 20:04:20
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pattern Matching

2014-03-02 Thread Ted Harding
On 02-Mar-2014 20:12:57 Benno Pütz wrote:
> Try
> 
> as.numeric(sub(".*\\(","", sub('\\)','',aa)))
> 
> You may also want to look at regexec/regmatches for a more general approach
> ...
> 
> On 02 Mar 2014, at 20:55, Doran, Harold  wrote:
> 
>> "1 (472)" "2 (445)" "3 (431)" "3 (431)" "5 (415)" "6 (405)" "7 (1)”
> 
> Benno Pütz

Another formulation, which breaks it into steps and may therefore
be easier to adopt to similar but different cases, is

  aa0<-gsub("^[0-9]+ ","",aa)
  aa0
  # [1] "(472)" "(445)" "(431)" "(431)" "(415)" "(405)" "(1)"

  as.numeric(gsub("[()]","",aa0))
  # [1] 472 445 431 431 415 405   1
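
And a sketch of the regexec/regmatches route mentioned above, which
captures the digits inside the parentheses in one pass (assuming the
same vector aa):

  m <- regexec("\\(([0-9]+)\\)", aa)
  as.numeric(sapply(regmatches(aa, m), `[`, 2))
  # [1] 472 445 431 431 415 405   1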

Ted.

-
E-Mail: (Ted Harding) 
Date: 02-Mar-2014  Time: 20:57:07
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] geneation

2014-02-20 Thread Ted Harding
[see at end]
On 20-Feb-2014 10:47:50 Rui Barradas wrote:
> Hello,
> 
> I'm not sure I understand the question. When you use set.seed, it will 
> have effect in all calls to the random number generator following it. So 
> the value for y is also fixed.
> As for your code, you don't need the second set.seed. And though it is 
> not syntatically incorrect, the way you are coding it is not very usual. 
> Try instead
> 
> set.seed(100)
> x <- 10*runif(10)
> x
> 
> y <- rnorm(10)
> y
> 
> y <- rnorm(10)  #different y
> y
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
On 20-02-2014 07:05, IZHAK shabsogh wrote:
>> How do I use set.seed? For example, I want to generate a fixed x with a
>> different value of y each time, i.e.
>>
>> generate x<-rnorm(10)
>> generate y<-rnorm(10)
>> I want to have x fixed but y changing at each iteration. This is what I
>> tried, but it is not working:
>>
>>
>> {
>> set.seed(100)
>>
>> x<-10*runif(10)
>> }
>> x
>> set.seed(y<-rnorm(10))
>> y

It seems clear that Izhak seeks to detach the random generation of y
from the random generation of x after using set.seed(). On my reading of
  ?RNG
once set.seed has been used, as Rui says, it affects all subsequent
calls to the generator. Initially, however:

  Note:
  Initially, there is no seed; a new one is created from the current
  time when one is required.  Hence, different sessions started at
  (sufficiently) different times will give different simulation
  results, by default.

But, even so, it still seems (perhaps) that using an RNG without the
call to set.seed() will still establish a seed (and its consequences)
for that session (in effect there is an implicit call to set.seed()).

This leads me to suggest that a useful innovation could be to add a
feature to set.seed() so that

  set.seed(NULL)

(which currently generates an error) would undo the effect of any
previous (explicit or implicit) call to set.seed() so that, for instance,

  set.seed(100)
  x<-10*runif(10)

  set.seed(NULL)
  y <- rnorm(10)

would result in y being generated from a seed which was set from the
system clock. There is no form of argument to set.seed() which instructs
it to take its value from the system clock (or other sourc of external
random events); and indeed it seems that only the system clock is available.

On Linux/UNIX systems, at least, there is a possible accessible source
of external randomness, namely '/dev/random', and its companion
'/dev/urandom'. This accumulates random noise from high-resolution timings
of system events like key-presses, mouse-clicks, disk accesses, etc.,
which take place under external influences and are by nature irregular in
timings.

The difference between /dev/random and /dev/urandom is that one read
from /dev/random effectively resets it, and further reads may be blocked
until sufficient new noise has accumulated; while repeated reads from
/dev/urandom are always possible (though with short time-lapses there
may not be much difference between successive reads).

The basic mechanism for this is via the command 'dd', on the lines of

  dd if=/dev/urandom of=newseed count=1 bs=32

which makes one ("count=1") read from input file ("if") /dev/urandom
of 32 bytes ("bs=32") into the output file ("of") newseed.

When I ran the above command on my machine just now, and inspected the
results in hex notation ('od -x newseed') I got (on-screen):

od -x newseed
000 4be9 7634 41cf 5e17 b068 7898 879e 8b5f
020 fb4f 52e6 59ef 0b58 5258 4a3a df04 c18d
040

where the initial "000" etc. denote byte-counts to the beginning
of the current line (expressed also in hex); so the actual byte content
of newseed is:

  4b e9 76 34 41 cf 5e 17 b0 68 78 98 87 9e 8b 5f
  fb 4f 52 e6 59 ef 0b 58 52 58 4a 3a df 04 c1 8d

This could be achieved via a system() call from R; and the contents
of newseed would then need to be converted into a format suitable
for use as argument to set.seed().

For the time being (not having time just now) I leave the details
to others ...
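
One minimal sketch of those details (assuming a Unix-alike;
seed.from.urandom is an illustrative name; readBin() reads 4 bytes
from /dev/urandom as the single integer that set.seed() expects):

  seed.from.urandom <- function() {
    s <- readBin("/dev/urandom", what = "integer", n = 1, size = 4)
    set.seed(s)
    s    # return the seed so the run can be reproduced later
  }
  seed.from.urandom()
  rnorm(10)
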
Ted.

-
E-Mail: (Ted Harding) 
Date: 20-Feb-2014  Time: 12:00:07
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate probability of union of independent events

2014-02-18 Thread Ted Harding
On 18-Feb-2014 22:08:38 Meinfelder, Florian wrote:
> Dear all,
> 
> I am looking for a way to calculate the probability of the union of k
> independent events in R. Is there a function that takes a k-dimensional
> probability vector as input and returns p(A_1 U A_2 U...U A_k)?
> 
> Thanks,
> Florian

I don't know (off-hand); but it is very easy to write one's own!

Since

  P(A1 U A2 U ... U Ak )

= 1 - P(not(A1 U A2 U ... u Ak))

= 1 - P((not A1) & (not A2) & ... & (not Ak))

= 1 - P(not A1)*P(not A2)* ... *P(not Ak)  [by independence]

= 1 - (1-p1)*(1-p2)* ... *(1-pk)

where pj is P(Aj). Hence

  punion <- function(p){1 - prod(1-p)}

should do it!
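
For instance, as a quick check with two independent events each of
probability 1/2:

  punion(c(0.5, 0.5))
  # [1] 0.75    (= 1 - 0.5*0.5)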

Ted.

---------
E-Mail: (Ted Harding) 
Date: 18-Feb-2014  Time: 23:51:31
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to plot a shifted Gamma distribution

2014-02-13 Thread Ted Harding
On 13-Feb-2014 15:30:43 Rodrigo Cesar Silva wrote:
> I have the parameters of a gamma distribution that I am trying to plot.
> The parameters are shape = 2, scale = 5.390275 and the minimum value
> x0 is 65.44945.
> 
> Since the gamma is shifted by x0, we have
> 
> Mean = shape*scale + x0 = 76.23
> 
> My question is, how can I do it in r?
> 
> I was trying to use dgamma function, but I saw no options to offset
> the gamma distribution. I can only use the shape and scale parameters,
> like this:
> 
>> x <- seq(from=0, to=100, length.out=100)
> 
>> plot(x, dgamma(x, shape=2, scale=5.390275),
>>      main="Gamma", type='l')

If all that is happening is that the distribution is the same as that
of a Gamma distribution with origin at 0, but simply shifted to the
right by an amount x0 (though I am wondering what context this could
arise in), then the commands

  x <- seq(from=0, to=100, length.out=100)
  x0 <- 65.44945
  plot(x+x0, dgamma(x, shape=2, scale=5.390275),
       main="Gamma", type='l')

will produce such a plot. However, I wonder if you have correctly
expressed the problem!
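
An equivalent formulation, if a self-contained shifted density is
preferred (a sketch; the wrapper name is my own):

  dgamma.shifted <- function(x, shape, scale, x0){
    dgamma(x - x0, shape=shape, scale=scale)
  }
  curve(dgamma.shifted(x, shape=2, scale=5.390275, x0=65.44945),
        from=65, to=165, main="Gamma")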

Ted.

> This generates a distribution with origin equal zero, but I want the
> origin to be x0
> 
> How can I handle shifted gamma distribution in r?
> 
> Thanks a lot!
> Rodrigo.

-
E-Mail: (Ted Harding) 
Date: 13-Feb-2014  Time: 17:27:29
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating an equivalent of r-help on r.stackexchange.com ?

2014-02-03 Thread Ted Harding
>>> --
>>> Don MacQueen
>>>
>>> Lawrence Livermore National Laboratory
>>> 7000 East Ave., L-627
>>> Livermore, CA 94550
>>> 925-423-1062
>>>
>>>
>>>
>>>
>>>
>>> On 2/2/14 1:49 PM, "Liviu Andronic"  wrote:
>>>
>>>> Dear Duncan,
>>>> I discovered something interesting wrt to the licensing and mirroring
>>>> of user-contributed material on StackExchange.  Please read below.
>>>>
>>>>
>>>> On Sun, Nov 24, 2013 at 9:00 PM, Duncan Murdoch
>>>>  wrote:
>>>>>> I'm not aware of a discussion on this, but I would say no.
>>>>>> Fragmentation is bad. Further fragmentation is worse.
>>>>>>
>>>>>> TL;DR
>>>>>> =
>>>>>>
>>>>>> Actually I'd say all mailing lists except r-devel should be moving to
>>>>>> StackOverflow in the future (disclaimer: I'm not affiliated with it).
>>>>>
>>>>>
>>>>> I would generally agree with you, except for a few points.
>>>>>
>>>>> 1.  I avoid StackOverflow, because they claim copyright on the
>>>>> compilation.
>>>>> As I read their terms of service, it would be illegal for anyone to
>>>>> download
>>>>> and duplicate all postings about R.  So a posting there is only
>>>>> available as
>>>>> long as they choose to make it available. Postings to the mailing list
>>>>> are
>>>>> archived in several places.
>>>>>
>>>> It seems that StackOverflow is officially proposing user-generated
>>>> content for download/mirroring:
>>>> http://blog.stackoverflow.com/2014/01/stack-exchange-cc-data-now-hosted-by-the-internet-archive/?cb=1
>>>>
>>>> "All community-contributed content on Stack Exchange is licensed under
>>>> the Creative Commons BY-SA 3.0 license. " And it is currently being
>>>> mirrored at least at the Internet Archive:
>>>> https://archive.org/details/stackexchange
>>>>
>>>> So, in principle, it would be possible/desirable to:
>>>> - spin the 'r' tag from StackOverflow and propose an r.stackexchange.com
>>>> at
>>>> http://area51.stackexchange.com/categories/8/technology . Such a SE
>>>> site would be similar to http://mathematica.stackexchange.com/
>>>> - involve R Core to give blessing for using the R logo, if necessary.
>>>> This would be similar to what Ubuntu does with AskUbuntu:
>>>> http://meta.askubuntu.com/questions/5444/is-ask-ubuntu-official-ubuntu
>>>> - set a mirror on r-project.org for all the user content that is
>>>> produced by r.stackexchange.com , and thus allow R Core to keep the
>>>> info publicly available at all times. The mirroring on Internet
>>>> Archive would still hold.
>>>>
>>>>
>>>>> 2.  I think an interface like StackOverflow is better than the mailing
>>>>> list
>>>>> interface, and will eventually win out.  R-help needs to do nothing,
>>>>> once
>>>>> someone puts together something like StackOverflow that attracts most
>>>>> of the
>>>>> people who give good answers, R-help will just fade away.
>>>>>
>>>> The advantages for such a move are countless (especially wrt to
>>>> efficiently organizing R-related knowledge and directing users to
>>>> appropriate sources of info), so I won't go into that. I would only
>>>> note that most 'r-sig-*' MLs would become obsolete in such a setup,
>>>> and would be replaced by the much more efficient tagging system of the
>>>> SE Q&A web interface (for example, all posts appropriate for r-sig-gui
>>>> would simply be tagged with 'gui'; no need for duplicated efforts of
>>>> monitoring multiple mailing lists).
>>>>
>>>> Opinions?
>>>>
>>>> Liviu
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 03-Feb-2014  Time: 21:49:47
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to subscribe this mailing list

2014-01-12 Thread Ted Harding
[Also inline]
On 12-Jan-2014 17:45:03 Rui Barradas wrote:
> Hello,
> 
> Inline.
> 
> On 12-01-2014 10:48, gj1989lh wrote:
>> Hi,
>>
>>
>> How can I subscribe this mailing list?
> 
> Apparently, you already have.
> Welcome.

Well, apparently not. "gj1989lh" is not listed in the R-help
subscriber list, and the original posting was moderator approved,
so had been held for moderation (presumably because the address
was not subscribed).

The way to subscribe is to visit the web-page:

  https://stat.ethz.ch/mailman/listinfo/r-help

and follow the instructions which are set out in the section
"Subscribing to R-help"
(including the instructions relating to the follow-up confirmation
email which the subscriber will subsequently receive).

Hoping this helps (and replying to the R-help list so that
everyone else can see that it has been dealt with).

The best address for enquiries about subscribing to/using/posting
to R-help is

  r-help-ow...@r-project.org

Ted.

>> thx
>>  [[alternative HTML version deleted]]
> 
> Don't post in html, please.
> 
> Rui Barradas
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 12-Jan-2014  Time: 18:01:25
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tracking what R actually executes

2014-01-02 Thread Ted Harding
On 02-Jan-2014 23:55:28 Duncan Murdoch wrote:
> On 14-01-02 6:05 PM, Fisher Dennis wrote:
>> R 3.0.2
>> All platforms
>>
>> Colleagues
>>
>> This question is probably conceptual rather than technical and I have not
>> thought out all of the issues yet.  Let’s say I have an extensive list of
>> functions and some lines of code that call the functions.  I would like to
>> have a record of all the commands that were actually executed.  I realize
> that this could be voluminous but it might be necessary.
>> For example, the code and function might be:
>>
>> ###
>> INPUT:
>> COUNTER  <- function(N)
>>  for (i in 1:N)  cat("count", i, "\n")
>> COUNTER(10)
>>
>> ###
>> OUTPUT:
>> cat("count", 1, "\n")
>> cat("count", 2, "\n")
>> cat("count", 3, "\n")
>> cat("count", 4, "\n")
>> cat("count", 5, "\n")
>> cat("count", 6, "\n")
>> cat("count", 7, "\n")
>> cat("count", 8, "\n")
>> cat("count", 9, "\n")
>> cat("count", 10, "\n")
>>
>> #
>> If I have formulated the question poorly, please do you best to understand
>> the intent.
>>
>> Dennis
> 
> As far as I know, R doesn't have exactly this built in, but the Rprof() 
> function gives an approximation.  It will interrupt the execution at a 
> regular time interval (1/50 sec is the default, I think), and record all 
> functions that are currently active on the execution stack.  So tiny 
> little functions could be missed, but bigger ones probably won't be.
> 
> There are also options to Rprof to give other profiling information, 
> including more detail on execution position (down to the line number), 
> and various measures of memory use.
> 
> Duncan Murdoch

Also have a look at

  ?trace

which you may be able to use for what you want.
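
A minimal sketch, using Dennis's COUNTER example (note that the trace
output shows each call as written, with unevaluated arguments):

  COUNTER <- function(N) for (i in 1:N) cat("count", i, "\n")
  trace("cat")     # report every call to cat() as it happens
  COUNTER(3)
  # trace: cat("count", i, "\n")   -- printed before each of the 3 calls
  untrace("cat")   # restore normal behaviour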
Ted.

-
E-Mail: (Ted Harding) 
Date: 03-Jan-2014  Time: 00:14:51
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Season's Greetings (and great news ... )!

2013-12-22 Thread Ted Harding
Thanks for the comments, Bert and Mehmet! It is of course a serious
and interesting area to explore (and I'm aware of the chaos context;
I initially got into this areas year ago when I was exploring the
possibilities for chaos in fish population dynamics -- and they're
certainly there)!

But, before anyone takes my posting *too* seriously, let me say that
it was written tongue-in-cheek (or whatever the keyboard analogue of
that may be). I'm certainly not "blaming R".

Have fun anyway!
Ted.

On 22-Dec-2013 17:35:56 Bert Gunter wrote:
> Yes.
> 
> See also Feigenbaum's constant and chaos theory for the general context.
> 
> Cheers,
> Bert
> 
> On Sun, Dec 22, 2013 at 8:54 AM, Suzen, Mehmet  wrote:
>> I wouldn't blame R for floating-point arithmetic and our personal
>> feeling of what 'zero' should be.
>>
>>> options(digits=20)
>>> pi
>> [1] 3.141592653589793116
>>> sqrt(pi)^2
>> [1] 3.1415926535897926719
>>> (pi - sqrt(pi)^2) < 1e-15
>> [1] TRUE
>>
>> There was a similar post before, for example see:
>> http://r.789695.n4.nabble.com/Why-does-sin-pi-not-return-0-td4676963.html
>>
>> There is an example by Martin Maechler (author of Rmpfr) on how to use
>> arbitrary precision
>> with your arithmetic.
>>
>> On 22 December 2013 10:59, Ted Harding  wrote:
>>> Greetings All!
>>> With the Festive Season fast approaching, I bring you joy
>>> with the news (which you will surely wish to celebrate)
>>> that R cannot do arithmetic!
>>>
>>> Usually, this is manifest in a trivial way when users report
>>> puzzlement that, for instance,
>>>
>>>   sqrt(pi)^2 == pi
>>>   # [1] FALSE
>>>
>>> which is the result of a (generally trivial) rounding or
>>> truncation error:
>>>
>>>   sqrt(pi)^2 - pi
>>>   [1] -4.440892e-16
>>>
>>> But for some very simple calculations R goes off its head.
>>>
>>> I had originally posted this example some years ago, but I
>>> have since generalised it, and the generalisation is even
>>> more entertaining than the original.
>>>
>>> The Original:
>>> Consider a sequence generated by the recurrence relation
>>>
>>>   x[n+1] = 2*x[n] if 0 <= x[n] <= 1/2
>>>   x[n+1] = 2*(1 - x[n]) if 1/2 < x[n] <= 1
>>>
>>> (for 0 <= x[n] <= 1).
>>>
>>> This has equilibrium points (x[n+1] = x[n]) at x[n] = 0
>>> and at x[n] = 2/3:
>>>
>>>   2/3 -> 2*(1 - 2/3) = 2/3
>>>
>>> It also has periodic points, e.g.
>>>
>>>   2/5 -> 4/5 -> 2/5 (period 2)
>>>   2/9 -> 4/9 -> 8/9 -> 2/9 (period 3)
>>>
>>> The recurrence relation can be implemented as the R function
>>>
>>>   nextx <- function(x){
>>> if( (0<=x)&(x<=1/2) ) {x <- 2*x} else {x <- 2*(1 - x)}
>>>   }
>>>
>>> Now have a look at what happens when we start at the equilibrium
>>> point x = 2/3:
>>>
>>>   N <- 1 ; x <- 2/3
>>>   while(x > 0){
>>> cat(sprintf("%i: %.9f\n",N,x))
>>> x <- nextx(x) ; N <- N+1
>>>   }
>>>   cat(sprintf("%i: %.9f\n",N,x))
>>>
>>> Run that, and you will see that successive values of x collapse
>>> towards zero. Things look fine to start with:
>>>
>>>   1: 0.666666667
>>>   2: 0.666666667
>>>   3: 0.666666667
>>>   4: 0.666666667
>>>   5: 0.666666667
>>>   ...
>>>
>>> but, later on,
>>>
>>>   24: 0.666666667
>>>   25: 0.666666666
>>>   26: 0.666666668
>>>   27: 0.666666664
>>>   28: 0.666666672
>>>   ...
>>>
>>>   46: 0.667968750
>>>   47: 0.664062500
>>>   48: 0.671875000
>>>   49: 0.656250000
>>>   50: 0.687500000
>>>   51: 0.625000000
>>>   52: 0.750000000
>>>   53: 0.500000000
>>>   54: 1.000000000
>>>   55: 0.000000000
>>>
>>> What is happening is that, each time R multiplies by 2, the binary
>>> representation is shifted up by one and a zero bit is introduced
>>> at the bottom end. To illustrate this, do the calculation in
>>> 7-bit arithmetic where 2/3 = 0.1010101, so:
>>>
>>> 0.1010101  x[1], >1/2 so subtract from 1 = 1.0000000 -> 0.0101011,
>>> and then multiply by 2 to get x[2] = 0.1010110. Hence
>>

[R] Season's Greetings (and great news ... )!

2013-12-22 Thread Ted Harding
2 4 8 16 6 12 ...
so period = 9.

And so on ...

So, one sniff of something like S<-19, and R is off its head!

All it has to do is multiply by 2 -- and it gets it cumulatively wrong!
R just doesn't add up ...

Season's Greetings to all -- and may your calculations always
be accurate -- to within machine precision ...

Ted.

-
E-Mail: (Ted Harding) 
Date: 22-Dec-2013  Time: 09:59:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] iterated sum

2013-12-14 Thread Ted Harding
On 14-Dec-2013 10:46:10 水静流深 wrote:
> x<-c(1,4,9,20,3,7)
> i want to get the series c(5,13,29,23,10).
>  y <- c()
>  for (i in 2:length(x)){
>  y[i-1] <- x[i-1]+x[i]}
> 
> is there a simpler way to get it?

  x <- c(1,4,9,20,3,7)
  N <- length(x)

  x[1:(N-1)] + x[2:N]
  # [1]  5 13 29 23 10
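
Equivalently (a one-line alternative using head() and tail()):

  head(x, -1) + tail(x, -1)
  # [1]  5 13 29 23 10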

Best wishes,
Ted.

-------------
E-Mail: (Ted Harding) 
Date: 14-Dec-2013  Time: 10:54:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Converting decimal to binary in R

2013-12-14 Thread Ted Harding
> On Fri, Dec 13, 2013 at 10:11 PM, 水静流深 <1248283...@qq.com> wrote:
>> i have written a function to convert a decimal number into a binary number in R.
>>
>> dectobin<-function(x){
>>   as.numeric(intToBits(x))->x1
>>   paste(x1,collapse="")->x2
>>   as.numeric(gsub("0+$","",x2))->x3
>>   return(as.character(x3))}
>>
>> dectobin gets the right result, but it is so long; is there a built-in function
>> to do this?

On 14-Dec-2013 06:17:30 Richard M. Heiberger wrote:
> I recommend
> ?sprintf
> 
> (4^(1/3))^3 != 4
> (4^(1/3))^3 == 4
> (4^(1/3))^3 - 4
> format(c((4^(1/3))^3 , 4), digits=17)
> sprintf("%+13.13a", c((4^(1/3))^3 , 4))

The above generates a hexadecimal representation, not binary!
So it needs further substitutions to get the binary representation.

Can I add a tip which I have very often found useful for this kind
of global substitution? Not just binary -- I first cooked it up
in text-editing when faced with replacing "European" numbers by
"Anglo-Saxon" numbers -- e.g. "1.234.567,89" needs to be converted
into "1,234,567.89", therefore swapping "." and ",". But you don't
want to do "." --> "," and then "," --> "." since you would then
end up with  "1.234.567,89" --> "1,234,567,89" --> "1.234.567.89"

There, the trick was to use a character such as "#", which does
not appear in the text, as a marker for the first substitution while
the second is being made. Then substitute the desired character for "#":
"1.234.567,89" --> "1#234#567,89" --> "1#234#567.89" --> "1,234,567.89"
(first replacing "." by "#", then finally "#" by ",").
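
In R, that three-step swap is a chain of gsub() calls (a small
illustration; fixed=TRUE makes "." a literal character, not a regular
expression):

  x <- "1.234.567,89"
  x <- gsub(".", "#", x, fixed=TRUE)   # "." -> "#" (temporary marker)
  x <- gsub(",", ".", x, fixed=TRUE)   # "," -> "."
  x <- gsub("#", ",", x, fixed=TRUE)   # "#" -> ","
  x
  # [1] "1,234,567.89"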

You need to replace, for instance, "0" in hex by "0000" in binary,
"1" by "0001", "2" by "0010", ... , "A" by "1010", and so on.
However, you need to avoid replacing already-replaced symbols.

So I suggest using, in a first round, "U" for "1" and "Z" for "0"
(or whatever you prefer, provided it avoids "0" and "1").
So 0 -> ZZZZ, 1 -> ZZZU, ... , A -> UZUZ, etc.
Then, finally, replace each "Z" by "0" and each "U" by "1".

Hence (using a truncated representation), sqrt(pi) in hex is:

  sprintf("%+8.8A", sqrt(pi))
  # [1] "+0X1.C5BF891BP+0"

Then the successive substitutions (which can of course be programmed)
would be:

"+0X1.C5BF891BP+0"

0: "+X1.C5BF891BP+"
1: "+XZZZU.C5BF89ZZZUBP+"
2: "+XZZZU.C5BF89ZZZUBP+"
3: "+XZZZU.C5BF89ZZZUBP+"
4: "+XZZZU.C5BF89ZZZUBP+"
5: "+XZZZU.CZUZUBF89ZZZUBP+"
6: "+XZZZU.CZUZUBF89ZZZUBP+"
7: "+XZZZU.CZUZUBF89ZZZUBP+"
8: "+XZZZU.CZUZUBFUZZZ9ZZZUBP+"
9: "+XZZZU.CZUZUBFUZZZUZZUZZZUBP+"
A: "+XZZZU.CZUZUBFUZZZUZZUZZZUBP+"
B: "+XZZZU.CZUZUUZUUFUZZZUZZUZZZUUZUUP+"
C: "+XZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+"
D: "+XZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+"
E: "+XZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+"
F: "+XZZZU.UUZZZUZUUZUUUZZZUZZUZZZUUZUUP+"

Z: "+X000U.UU000U0UU0UUU000U00U000UU0UUP+"
U: "+X0001.1100010110111000100100011011P+"

The final result probably needs tidying up in accordance with
the needs of subsequent uses!

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 14-Dec-2013  Time: 10:50:03
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fortune? [was: Re: quotation marks and scan]

2013-11-17 Thread Ted Harding
[See in-line below]
On 17-Nov-2013 22:38:30 Rolf Turner wrote:
> 
> (1) The backslashes are not really there; they are an artefact of the R 
> print() function.
> Try cat(u,"\n").  I think this might be an FAQ.
> 
> (2) Is not your problem the fact that your are setting "replacement" 
> equal to the
> thing you are trying to get rid of?  I.e. don't you want
> 
>  v <- gsub(pattern='\"',replacement='',x=u) ???
> 


> Either I am misunderstanding your intent or you need another cup of coffee.

Is the above line a Fortune?


>  cheers,
> 
>  Rolf
> 
> On 11/18/13 11:07, Erin Hodgess wrote:
>> Dear R People:
>>
>> I'm sure that this is a very simple problem, but I have been wrestling with
>> it for some time.
>>
>> I have the following file that has the following one line:
>>
>>   CRS("+init=epsg:28992")
>>
>> Fair enough.  I scan it into R and get the following:
>>
>>> u
>> [1] "CRS(\"+init=epsg:28992\")"
>>> gsub(pattern='\"',replacement='"',x=u)
>> [1] "CRS(\"+init=epsg:28992\")"
>>
>> I need to get rid of the extra quotation marks and slashes.  I've tried all
>> sorts of things, including gsub, as you see,  but no good.
> 
> __________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 17-Nov-2013  Time: 23:55:48
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] all combinations with replacement not ordered

2013-11-07 Thread Ted Harding
On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote:
> Hello!
> 
> I need to obtain all possible combinations with replacement when
> order is not important.
> E.g. I have a population x{1,2,3}.
> So I can get (choose(3+3-1,3)=) 10 combinations from this population
> with 'size=3'.
> How can I get a list of all that combinations?
> 
> I have tried 'expand.grid()' and managed to get only samples where
> order is important.
> 'combn()' gave me samples without replacement.
> 
> Best regards,
> Konstantin Tretyakov.

From your description I infer that, from {1,2,3}, you want the result:

  1 1 1
  1 1 2
  1 1 3
  1 2 2
  1 2 3
  1 3 3
  2 2 2
  2 2 3
  2 3 3
  3 3 3

The following will do that:

u <- c(1,2,3)
unique(t(apply(expand.grid(u,u,u), 1, sort)))

#       [,1] [,2] [,3]
#  [1,]    1    1    1
#  [2,]    1    1    2
#  [3,]    1    1    3
#  [4,]    1    2    2
#  [5,]    1    2    3
#  [6,]    1    3    3
#  [7,]    2    2    2
#  [8,]    2    2    3
#  [9,]    2    3    3
# [10,]    3    3    3

There may be a simpler way!
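
(One such simpler way, for the record: keep only the non-decreasing
rows of the grid. Var1, Var2, Var3 are the default names supplied by
expand.grid().)

  g <- expand.grid(u, u, u)
  g[g$Var1 <= g$Var2 & g$Var2 <= g$Var3, ]
  # the same 10 combinations, as rows of a data frame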
Ted.

-
E-Mail: (Ted Harding) 
Date: 07-Nov-2013  Time: 17:04:50
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there something wrong with R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"?

2013-10-16 Thread Ted Harding
On 16-Oct-2013 14:54:00 tom soyer wrote:
> Hi,
> 
> pnorm(-1.53,0,1) under version 3.0.2 gives 0.05155075. I am pretty sure it
> should be 0.063. Is there something wrong with this version of R?
> 
> I am using:
> R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
> Copyright (C) 2013 The R Foundation for Statistical Computing
> Platform: i686-pc-linux-gnu (32-bit)
> -- 
> Tom

If you did exactly as you describe, then something is indeed wrong:

  pnorm(-1.53,0,1)
  # [1] 0.06300836
  [R version 2.11.0 (2010-04-22)]

as you expected.

However:

  qnorm(0.05155075)
  [1] -1.63

so maybe you mistyped "1.63" instead of "1.53"?

-----
E-Mail: (Ted Harding) 
Date: 16-Oct-2013  Time: 16:12:56
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help: concurrent R sessions for different settings of simulations

2013-09-29 Thread Ted Harding
[See at end]
On 29-Sep-2013 17:22:24 Chee Chen wrote:
> Dear All, 
> I have spent almost 2 days but did not succeed yet.
> 
> Problem description:  I have 3 parameters, p1, p2 and p3, for which
> p1 take 1 of 5 possible distributions (e.g., normal, laplace),
> p2 takes 1 of 3 possible distributions, and p3 takes 1 of 5 possible
> distribution. These 3 parameters create 75 settings, and these 3
> parameters are arguments of a function F; and F is part of simulation
> codes. To summarize: different value of the ordered triple (p1,p2,p3)
> means different setting and this is the only difference in the
> simulation codes. 
> 
> Target to achieve: instead of loop through each of the 75 settings
> one after another, I would like to concurrently run all 75 settings
> on the cluster.
> 
> My attempts: via loops, I used Perl to create 75 files, each for a
> different triple (p1,p2,p3), and Perl uses "system(R ..)" to execute
> this setting once it is created. The Perl codes are submitted to
> cluster correctly. But when I looked into the log file, the cluster
> still executes it one setting after another setting. 
> 
> Request: any help is appreciated!  It is because of the loops of Perl
> that executes a setting once it is created?
> 
> Have a nice day!
> Chee

Just a simple comment (which does not consider the technicalities
of using Perl, using a cluster, etc.).

From your description, it looks as though the system waits for one
item in the loop to finish before it starts the next one.

If that is the case, and *if* you are using UNIX/Linux (or other
UNIX-like OS), then you could try appending " &" to each submitted
command. An outline exemplar:

  for( s in settings ){
system("R  &")
  }

The " &" has the effect, in a UNIX command line, of detaching the
command from the executing program. So the program can continue to
run (and take as long as it likes) while the system command-shell
is immediately freed up for the next command.
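
(Much the same effect is available from within R itself via
system(command, wait=FALSE), which returns immediately instead of
waiting for the command to finish.)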

Therefore, with the above exemplar, is there were say 75 settings,
then that loop would complete in a very short time, after which
you would have 75 copies of R executing simulations, and your
original R command-line would be available.

Just a suggestion (which may have missed the essential point of
your query, but worth a try ... ).

I have no idea how to achieve a similar effect in Windows ...

Ted.

-
E-Mail: (Ted Harding) 
Date: 29-Sep-2013  Time: 19:31:29
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why does sin(pi) not return 0?

2013-09-26 Thread Ted Harding
On 26-Sep-2013 07:55:38 Rolf Turner wrote:
> On 09/26/13 19:31, Babak Bastan wrote:
>> Hi experts,
>>
>> If I test sin(pi) in r, it returns me 1.224606e-16
>>
>> Why doesn't return me 0?
> 
> If you think that 1.224606e-16 is different from 0, you should probably not
> be using computers.

Is that a Fortune? And, if so, should R be using computers?

  sin(pi)
  # [1] 1.224606e-16
  sin(pi)==0
  # [1] FALSE

> See FAQ 7.31 (which is in a way about the inverse of
> your question, but it should provide the necessary insight).
> 
>  cheers,
>  Rolf Turner

Though, mind you, FAQ 7.31 does also offer some consolation to R:

  all.equal(0,sin(pi))
  # [1] TRUE

So it depends on what you mean by "different from". Computers
have their own fuzzy concept of this ... Babak has too fussy
a concept.

Ted.

-
E-Mail: (Ted Harding) 
Date: 26-Sep-2013  Time: 09:13:33
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coordinate scales for pairs plot

2013-08-21 Thread Ted Harding
On 21-Aug-2013 19:08:29 David Winsemius wrote:
> 
> On Aug 21, 2013, at 10:30 AM, (Ted Harding) wrote:
> 
>> Greetings all.
>> 
>> I suspect this question has already been asked. Apologies
>> for not having traced it ...
>> 
>> In the default pairs plot produced by the function pairs(),
>> the coordinate scales alternate between top and bottom and
>> right and left sides.
>> 
>> For example, in a 5x5 plot for variables X1, X2, X3, X4, X5
>> the coordinate scales for X2, X4 appear beside rows 2 and 4
>> on the left, and the scales for X1, X3, X5 appear beside rows
>> 1, 3, 5 on the right.
>> 
>> Similarly, the scales for X2 and X4 appear above columns 2 and 4,
>> and the scales for X1, X3, X5 appear below columns 1, 3, 5.
>> 
>> Is there a parameter lurking somewhere in the depths of this
>> function which can be set so that the scales for all the variables
>> X1,X2,X3,X4,X5 appear both above and below  columns 1,2,3,4,5;
>> and both to the left and to the right of rows 1,2,3,4,5?
> 
> I've searched for a parameter and come up empty; Hacking the code for
> pairs.default is not that difficult. I stripped out the conditionals that
> were driving the Axis calls to alternating "sides": 
> Search for `box()` to start this surgery and replace everything to the 'mfg'
> assignment to get uniform axis locations on sides 1 and 2.
> 
> pairs.12 <- function(x, ... arglist same as pairs.default)
>{upper portion of code
> box()
> if (i == nc ) 
> localAxis(1L , x[, j], x[, i], 
>   ...)
> if (j == 1 ) 
> localAxis(2L, x[, j], x[, i], ...)
> 
> mfg <- par("mfg")
>lower portion of code }
> 
> Oooops,  that wasn't what you asked for ... Use this instead:
> 
> 
> box()  # begin surgery
> if (i == 1 ) 
> localAxis(3, x[, j], x[, i],  ...)
> if (i == nc ) 
> localAxis(1, x[, j], x[, i],  ...)
> if (j == 1 ) 
> localAxis(2L, x[, j], x[, i], ...)
> if (j == nc ) 
> localAxis(4L, x[, j], x[, i], ...)
> # end anastomosis
>  mfg <- par("mfg")
> ..
> --
> David Winsemius
> Alameda, CA, USA

Thanks very much, David! I'll give it a try. It looks promising.

Good surgery, steady hand!
Ted.

-
E-Mail: (Ted Harding) 
Date: 21-Aug-2013  Time: 21:23:39
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Coordinate scales for pairs plot

2013-08-21 Thread Ted Harding
Greetings all.

I suspect this question has already been asked. Apologies
for not having traced it ...

In the default pairs plot produced by the function pairs(),
the coordinate scales alternate between top and bottom and
right and left sides.

For example, in a 5x5 plot for variables X1, X2, X3, X4, X5
the coordinate scales for X2, X4 appear beside rows 2 and 4
on the left, and the scales for X1, X3, X5 appear beside rows
1, 3, 5 on the right.

Similarly, the scales for X2 and X4 appear above columns 2 and 4,
and the scales for X1, X3, X5 appear below columns 1, 3, 5.

Is there a parameter lurking somewhere in the depths of this
function which can be set so that the scales for all the variables
X1,X2,X3,X4,X5 appear both above and below  columns 1,2,3,4,5;
and both to the left and to the right of rows 1,2,3,4,5?

With thanks,
Ted.

-
E-Mail: (Ted Harding) 
Date: 21-Aug-2013  Time: 18:30:44
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odds ratio per standard deviation

2013-06-12 Thread Ted Harding
[Replies transposed so as to achieve bottom-posting ... ]

On 12-Jun-2013 14:53:02 Greg Snow wrote:
> 
> On Tue, Jun 11, 2013 at 7:38 PM, vinhnguyen04x  wrote:
> 
>> Hi all
>> i have a question:
>>
>> why and when do we use odds ratio per standard deviation instead of odds
>> ratio?
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/odds-ratio-per-standard-deviation-tp4669315.html
>> Sent from the R help mailing list archive at Nabble.com.
>> __
> 
> Without context this is a shot in the dark, but my guess is this is
> referring to something like a logistic regression where the odds ratio
> (exponential of the coefficient) refers to the change in odds for the
> outcome for a 1 unit change in x.  Now often a 1 unit change in x is very
> meaningful, but other times it is not that meaningful, e.g. if x is
> measuring the size of diamonds in carats and the data does not span an
> entire carat, or x is measured in days and our x variable spans years.  In
> these cases it can make more sense to talk about the change in the odds
> relative to a different step size than just a 1 unit change in x, a
> reasonable thing to use is the change in odds for a one standard deviation
> change in x (much smaller than 1 for the diamonds, much larger for the
> days), this may be the odds ratio per standard deviation that you mention.
> -- 
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com

I think there is one comment that needs to be made about this!

The odds ratio "per unit change in x" means exactly what it says,
and can be converted into the odds ratio per any other amount of
change in x very easily. With x originally in (say) days, and
with estimated logistic regression logodds = a + b*x, if you
change your unit of x to, say weeks, so that x' = x/7, then this
is equivalent to changing b to b' = 7*b. Now just take your sliderule;
no need for R (oops, now off-topic ... ).

Another comment: I do not favour blindly "standardising" a variable
relative to its standard deviation in the data. The SD may be what
it is for any number of reasons, ranging from x being randomly sampled
fron a population to x being assigned specific values in a designed
experiment.

Since, for exactly the same meanings of the variables in the regression,
the standard deviation may change from one set of data to another of
exactly the same kind, the "odds per standard deviation" could vary
from dataset to dataset of the same investigation, and even vary
dramatically. That looks like a situation to avoid, unless it is very
carefully discussed!

The one argument that might give some sense to "odds ratio per standard
deviation" could apply when x has been sampled from a population in
which the variation of x can be approximately described by a Normal
distribution. Then "odds ratio per standard deviation" refers to
a change from, say, the mean/median of the population to the 84th
percentile, or from the 31st percentile to the 69th percentile,
or from the 69th percentile to the 93rd percentile, etc.
But these steps cover somewhat different proportions of the population:
50th to 84th = 34%; 31st to 69th = 38%; 69th to 93rd = 24%. So you are
still facing issues of what you mean, or what you want to mean.
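
(Those proportions are easily checked in R:

  pnorm(1.0) - pnorm(0.0)    # 50th to 84th percentile:  0.3413447
  pnorm(0.5) - pnorm(-0.5)   # 31st to 69th percentile:  0.3829249
  pnorm(1.5) - pnorm(0.5)    # 69th to 93rd percentile:  0.2417303
)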

Simpler to stick to the original "odds per unit of x" and then apply
it to whatever multiple of the unit you happen to be interested in
as a change (along with the reasons for that interest).

Ted.

-
E-Mail: (Ted Harding) 
Date: 12-Jun-2013  Time: 17:14:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: Your message to R-help awaits moderator approval

2013-06-01 Thread Ted Harding
[See at end]

On 01-Jun-2013 17:52:01 Janh Anni wrote:
> Hello,
> 
> I don't understand why my mails are being held up.  What could be the
> problem?
> 
> Thanks
> Janh
> -- Forwarded message --
> From: 
> Date: Sat, Jun 1, 2013 at 1:48 PM
> Subject: Your message to R-help awaits moderator approval
> To: annij...@gmail.com
> 
> 
> Your mail to 'R-help' with the subject
> 
> Re: [R] wilcox_test function in coin package
> 
> Is being held until the list moderator can review it for approval.
> 
> The reason it is being held:
> 
> The message headers matched a filter rule
> 
> Either the message will get posted to the list, or you will receive
> notification of the moderator's decision.  If you would like to cancel
> this posting, please visit the following URL:
> 
> https://stat.ethz.ch/mailman/confirm/r-help/067afe28f7ead30dfea844b8a34449526cd665d8

This can happen to anyone, depending on the current sensitivity
of the mail-server's spam-detection filter to potential
"triggers" in the message.

It is particularly likely to arise with mails posted from a gmail
account, as your was. This is because gmail is a major source of
spam emails, and the spam filter is alert to these.

I have had a look at your message (which was duly approved), and
I see that it is in reply to a message which itself is in a thread
that includes several messages sent via gmail. From the headers
of your message:

In-Reply-To: <51a9291c.70...@ucalgary.ca>
References:


 
 
 
 <51a9291c.70...@ucalgary.ca>

so that's a total of 6 references to gmail (including your own message)
which is probably why the spam filter felt a bit twitchy!

Don't worry about it. As I say, it can happen to anyone (though more
often to some than to others). If it is a proper message to R-help,
one of the moderators will approve it (though quite possible not
immediately).

Hoping this helps,
Ted (one of the moderators)

-
E-Mail: (Ted Harding) 
Date: 01-Jun-2013  Time: 20:11:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behavior of "apply" when FUN=sample

2013-05-14 Thread Ted Harding
On 14-May-2013 09:46:32 Duncan Murdoch wrote:
> On 13-05-14 4:52 AM, Luca Nanetti wrote:
>> Dear experts,
>>
>> I wanted to signal a peculiar, unexpected behaviour of 'apply'.
>> It is not a bug, it is per spec, but it is so counterintuitive
>> that I thought it could be interesting.
>>
>> I have an array, let's say "test", dim=c(7,5).
>>
>>> test <- array(1:35, dim=c(7, 5))
>>> test
>>
>>      [,1] [,2] [,3] [,4] [,5]
>> [1,]    1    8   15   22   29
>> [2,]    2    9   16   23   30
>> [3,]    3   10   17   24   31
>> [4,]    4   11   18   25   32
>> [5,]    5   12   19   26   33
>> [6,]    6   13   20   27   34
>> [7,]    7   14   21   28   35
>>
>> I want a new array where the content of the rows (columns) are
>> permuted, differently per row (per column)
>>
>> Let's start with the columns, i.e. the second MARGIN of the array:
>>> test.m2 <- apply(test, 2, sample)
>>> test.m2
>>
>>      [,1] [,2] [,3] [,4] [,5]
>> [1,]    1   10   18   23   32
>> [2,]    7    9   16   25   30
>> [3,]    6   14   17   22   33
>> [4,]    4   11   15   24   34
>> [5,]    2   12   21   28   31
>> [6,]    5    8   20   26   29
>> [7,]    3   13   19   27   35
>>
>> perfect. That was exactly what I wanted: the content of each column is
>> shuffled, and differently for each column.
>> However, if I use the same with the rows (MARGIIN = 1), the output is
>> transposed!
>>
>>> test.m1 <- apply(test, 1, sample)
>>> test.m1
>>
>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
>> [1,]    1    2    3    4    5   13   21
>> [2,]   22   30   17   18   19   20   35
>> [3,]   15   23   24   32   26   27   14
>> [4,]   29   16   31   25   33   34   28
>> [5,]    8    9   10   11   12    6    7
>>
>> In other words, I wanted to permute the content of the rows of "test", and
>> I expected to see in the output, well, the shuffled rows as rows, not as
>> column!
>>
>> I would respectfully suggest to make this behavior more explicit in the
>> documentation.
> 
> It's is already very explicit:  "If each call to FUN returns a vector of 
> length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) 
> if n > 1."  In your first case, sample is applied to columns, and 
> returns length 7 results, so the shape of the final result is c(7, 5). 
> In the second case it is applied to rows, and returns length 5 results, 
> so the shape is c(5, 7).
> 
> Duncan Murdoch

And the (quite simple) practical implication of what Duncan points out is:

  test <- array(1:35, dim=c(7, 5))
  test
  #      [,1] [,2] [,3] [,4] [,5]
  # [1,]    1    8   15   22   29
  # [2,]    2    9   16   23   30
  # [3,]    3   10   17   24   31
  # [4,]    4   11   18   25   32
  # [5,]    5   12   19   26   33
  # [6,]    6   13   20   27   34
  # [7,]    7   14   21   28   35

# To permute the rows:
  t(apply(t(test), 2, sample))
  #      [,1] [,2] [,3] [,4] [,5]
  # [1,]   22   29    8   15    1
  # [2,]   30   16   23    2    9
  # [3,]   10   31   24    3   17
  # [4,]   11    4   25   32   18
  # [5,]   26    5   12   33   19
  # [6,]   27   34   20   13    6
  # [7,]   35   28   14    7   21

which looks right!
Ted.

-
E-Mail: (Ted Harding) 
Date: 14-May-2013  Time: 11:07:46
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Choice of statistical test (in R) of two apparently different distributions

2013-05-09 Thread Ted Harding
On 09-May-2013 01:42:07 Pascal Oettli wrote:
> On 05/09/2013 10:29 AM, Gundala Viswanath wrote:
>> I have the following list of data each has 10 samples.
>> The values indicate binding strength of a particular molecule.
>>
>> What I want to show is that 'x' is statistically different from
>> 'y', 'z' and 'w', which it does if you look at x: it has
>> more values greater than zero (2.8, 1.00, 5.4, etc.) than the others.
>>
>> I tried t-tests, but all of them show an insignificant difference
>> with a high P-value.
>>
>> What's the appropriate test for that?
>>
>> Below is my code:
>>
>> x <- c(2.852672123, 0.076840264, 1.009542943, 0.430716968, 5.4016,
>>        0.084281843, 0.065654548, 0.971907344, 3.325405405, 0.606504718)
>> y <- c(0.122615039, 0.844203734, 0.002128992, 0.628740077, 0.87752229,
>>        0.888600425, 0.728667099, 0.000375047, 0.911153571, 0.553786408)
>> z <- c(0.766445916, 0.726801899, 0.389718652, 0.978733927, 0.405585807,
>>        0.408554832, 0.799010791, 0.737676439, 0.433279599, 0.947906524)
>> w <- c(0.000124984, 1.486637663, 0.979713013, 0.917105894, 0.660855127,
>>        0.338574774, 0.211689885, 0.434050179, 0.955522972, 0.014195184)
>>
>> t.test(x,y)
>> t.test(x,z)
>>
>> --END--
>>
>> G.V.
> 
> Hello,
> 
> 1) Why 'x' should be statistically different from others?
> 2) 'y' looks to be bimodal. The mean is not an appropriate measurement 
> for this kind of distribution.
> 
> Regards,
> Pascal

Running the commands:

  plot(x,pch="+",col="red",ylim=c(0,6))
  points(y,pch="+",col="green")
  points(z,pch="+",col="blue")
  points(w,pch="+",col="black")
  lines(x,col="red")
  lines(y,col="green")
  lines(z,col="blue")
  lines(w,col="black")

indicates that y, z and w are similar to each other (with some
suggestion of a serial structure).

However, while part of x is also similar to y, z and w, it is
clear that 3 values of x are "outliers" (well above the range
of all other values, including those of x). [And I think Pascal
meant "x" when he wrote "'y' looks to be bimodal."]

And it may be of interest that these exceptional values of x
occur at x[1], x[5], x[9] (i.e. every 4th observation).

Taken together, these facts suggest that an examination of the
procedure giving rise to the data may be relevant. As one
example of the sort of thing to look for: were the 3 outlying
observations obtained by the same worker/laboratory/apparatus
as the others (or a similar question for x as opposed to y, z, w,
raising issues of reliability). There are many similar questions
one could think of raising, but knowledge of the background
is essential for appropriate choice!

I would agree with Pascal that a "routine" t-test is not
appropriate.

One thing that can be directly looked at statistically
is, taking as given that there are 3 outliers somewhere
in all 40 data, what is the probability that all three
occur in one of the 4 groups (x,y,z,w) of data?

This is 4 times the probability that they occur in a specific
group (say x). The chance of all 3 being in x is the number
of ways of choosing the remaining 7 out of the remaining 37,
divided by the number of ways of choosing any 10 out of 40,
i.e. (in R-speak)

  choose(37,7)/choose(40,10)
  # [1] 0.01214575

so the chance of all 3 being in some one of the 4 groups is

  4*choose(37,7)/choose(40,10)
  # [1] 0.048583

which, if you are addicted to P-values, is just significant
at the 5% (P <= 0.05) level. So this gives some indication
that the "x" group of data is not on the same footing as the
other ("y", "z", "w") groups. However, such a test does not
address any question of why such outliers should be there
in the first place; this needs to be addressed differently
(see above).

And one must not forget that the above "P-value" has been
obtained by a method which was prompted by looking at the data
in the first place.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 09-May-2013  Time: 09:35:05
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Decomposing a List

2013-04-25 Thread Ted Harding
Thanks, Jorge, that seems to work beautifully!
(Now to try to understand why ... but that's for later).
Ted.
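
[In brief, the "why": "[" is itself a function in R, so sapply(L, "[", 1)
simply calls x[1] for each element x of L. A tiny self-contained check:

  L <- list(c("A1","B1"), c("A2","B2"), c("A3","B3"))
  sapply(L, "[", 1)
  # [1] "A1" "A2" "A3"
  sapply(L, "[", 2)
  # [1] "B1" "B2" "B3"
]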

On 25-Apr-2013 10:21:29 Jorge I Velez wrote:
> Dear Dr. Harding,
> 
> Try
> 
> sapply(L, "[", 1)
> sapply(L, "[", 2)
> 
> HTH,
> Jorge.-
> 
> 
> 
> On Thu, Apr 25, 2013 at 8:16 PM, Ted Harding wrote:
> 
>> Greetings!
>> For some reason I am not managing to work out how to do this
>> (in principle) simple task!
>>
>> As a result of applying strsplit() to a vector of character strings,
>> I have a long list L (N elements), where each element is a vector
>> of two character strings, like:
>>
>>   L[1] = c("A1","B1")
>>   L[2] = c("A2","B2")
>>   L[3] = c("A3","B3")
>>   [etc.]
>>
>> From L, I wish to obtain (as directly as possible, e.g. avoiding
>> a loop) two vectors each of length N where one contains the strings
>> that are first in the pair, and the other contains the strings
>> which are second, i.e. from L (as above) I would want to extract:
>>
>>   V1 = c("A1","A2","A3",...)
>>   V2 = c("B1","B2","B3",...)
>>
>> Suggestions?
>>
>> With thanks,
>> Ted.
>>
>> -
>> E-Mail: (Ted Harding) 
>> Date: 25-Apr-2013  Time: 11:16:46
>> This message was sent by XFMail
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

-
E-Mail: (Ted Harding) 
Date: 25-Apr-2013  Time: 11:31:57
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Decomposing a List

2013-04-25 Thread Ted Harding
Greetings!
For some reason I am not managing to work out how to do this
(in principle) simple task!

As a result of applying strsplit() to a vector of character strings,
I have a long list L (N elements), where each element is a vector
of two character strings, like:

  L[1] = c("A1","B1")
  L[2] = c("A2","B2")
  L[3] = c("A3","B3")
  [etc.]

From L, I wish to obtain (as directly as possible, e.g. avoiding
a loop) two vectors each of length N where one contains the strings
that are first in the pair, and the other contains the strings
which are second, i.e. from L (as above) I would want to extract:

  V1 = c("A1","A2","A3",...)
  V2 = c("B1","B2","B3",...)

Suggestions?

With thanks,
Ted.

-
E-Mail: (Ted Harding) 
Date: 25-Apr-2013  Time: 11:16:46
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting a large number into smaller numbers and find the largest product

2013-04-18 Thread Ted Harding

On 18-Apr-2013 08:47:18 Janesh Devkota wrote:
> Hello,
> 
> I have a big number, let's say of around a hundred digits. I want to split
> that big number into consecutive blocks of 5 digits and find the product of
> the digits in each block. For example, my first 5-digit block would be 73167.
> I need to compute the product of the individual digits in 73167, and so on.
> 
> The sample number is as follows:
> 
> 
> 731671765313306249192251196744265747423553491949349698352031277450632623957831
> 801698480186947885184385861560789112949495459501737958331952853208805511125406
> 98747158523863050715693290963295227443043557
> 
> I have a problem subsetting the small numbers out of the big number.
> 
> Any help is highly appreciated.
> 
> Best Regards,
> Janesh Devkota

Since the number is too large to be represented exactly in any of R's
numeric formats, it needs to be stored as a character string:

X <- "73167176531330624963295227443043557"

Then you can easily access successive 5-digit blocks as, e.g. for
block i (i = 1:N, where 5*N is the length of the number):

  block <- substr(X, 1+5*(i-1), 5*i)

You now have a 5-character string from which you can extract the
individual digits. And then on to whatever you want to do ...
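
A small worked continuation (one way among several; strsplit() splits
the block into single characters):

  i <- 1
  block  <- substr(X, 1 + 5*(i-1), 5*i)           # "73167"
  digits <- as.numeric(strsplit(block, "")[[1]])
  prod(digits)
  # [1] 882    (= 7*3*1*6*7)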

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 18-Apr-2013  Time: 10:06:43
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I don't understand the 'order' function

2013-04-16 Thread Ted Harding
[See in-line below[

On 16-Apr-2013 17:51:41 Julio Sergio wrote:
> I thought I've understood the 'order' function, using simple examples like:
> 
>order(c(5,4,-2))
>[1] 3 2 1
> 
> However, I arrived to the following example:
> 
>order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) 
>[1]  8  9 10  7 11  6  5  4  3  2  1
> 
> and I was completely perplexed!
> Shouldn't the output vector be  11 10 9 8 7 6 4 1 2 3 5 ?
> Do I have a damaged version of R?

I think the simplest explanation can be given as:

S <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)
cbind(Index=1:length(S), S, Order=order(S), Sort=sort(S))
      Index    S Order Sort
 [1,]     1 2465     8  210
 [2,]     2 2255     9  210
 [3,]     3 2085    10  505
 [4,]     4 1545     7  920
 [5,]     5 1335    11 1045
 [6,]     6 1210     6 1210
 [7,]     7  920     5 1335
 [8,]     8  210     4 1545
 [9,]     9  210     3 2085
[10,]    10  505     2 2255
[11,]    11 1045     1 2465

showing that the value of 'order' for any one of the numbers
is the Index (position) of that number in the original series,
placed in the position that number occupies in the sorted series.
(With a tie for S[8] = S[9] = 210).

For example: which one of S occurs in 5th position in the sorted
series? It is the 11th of S (1045).

> I became still more astonished when I used the sort function and got the 
> right answer: 
> 
> sort(c(2465, 2255, 2085, 1545, 1335, 1210,  920,  210,  210,  505, 1045))
> [1]  210  210  505  920 1045 1210 1335 1545 2085 2255 2465
> since 'sort' documentation claims to be using 'order' to establish the right 
> order.

Indeed, once you have order(S), you know which element of S to put in
each position of the sorted order:

  S[order(S)]
  [1]  210  210  505  920 1045 1210 1335 1545 2085 2255 2465

Does this help to explain it?
Ted.

> Please help me to understand all this!
> 
>   Thanks,
> 
>   -Sergio.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 16-Apr-2013  Time: 19:12:21
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] %*%

2013-04-11 Thread Ted Harding
On 11-Apr-2013 10:25:17 Shane Carey wrote:
> What does these operators do: %*%
> 
> Thanks
> -- 
> Shane

Enter the command

  ?"%*%"

and you will see:

  Matrix Multiplication
  Description:
 Multiplies two matrices, if they are conformable.  If one argument
 is a vector, it will be promoted to either a row or column matrix
 to make the two arguments conformable.  If both are vectors it
 will return the inner product (as a matrix).
  Usage:
 x %*% y
  [etc.]
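
A one-line illustration (a 2x2 matrix times a length-2 vector):

  matrix(1:4, nrow=2) %*% c(1, 1)
  #      [,1]
  # [1,]    4
  # [2,]    6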

Ted.

---------
E-Mail: (Ted Harding) 
Date: 11-Apr-2013  Time: 11:41:22
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rep() fails at times=0.29*100

2013-04-09 Thread Ted Harding
[See at end]
On 09-Apr-2013 16:11:18 Jorge Fernando Saraiva de Menezes wrote:
> Dear list,
> 
> I have found an unusual behavior and would like to check if it is a
> possible bug, and if updating R would fix it. I am not sure if should post
> it in this mail list but I don't where is R bug tracker. The only mention I
> found that might relate to this is "If times is a computed quantity it is
> prudent to add a small fuzz." in rep() help, but not sure if it is related
> to this particular problem
> 
> Here it goes:
> 
>> rep(TRUE,29)
>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [28] TRUE TRUE
>> rep(TRUE,0.29*100)
>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [28] TRUE
>> length(rep(TRUE,29))
> [1] 29
>> length(rep(TRUE,0.29*100))
> [1] 28
> 
> Just to make sure:
>> 0.29*100
> [1] 29
> 
> This behavior seems to be independent of what is being repeated (rep()'s
> first argument)
>> length(rep(1,0.29*100))
> [1] 28
> 
> Also it occurs only with the 0.29.
>> length(rep(1,0.291*100))
> [1] 29
>> for(a in seq(0,1,0.01)) {print(sum(rep(TRUE,a*100)))} #also shows correct
> values in values from 0 to 1 except for 0.29.
> 
> I have confirmed that this behavior happens in more than one machine
> (though I only have session info of this one)
> 
> 
>> sessionInfo()
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> [1]  LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252
>  LC_MONETARY=Portuguese_Brazil.1252
> [4] LC_NUMERIC=C   LC_TIME=Portuguese_Brazil.1252
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
> 
> other attached packages:
> [1] spatstat_1.31-1 deldir_0.0-21   mgcv_1.7-22
> 
> loaded via a namespace (and not attached):
> [1] grid_2.15.3 lattice_0.20-13 Matrix_1.0-11   nlme_3.1-108
>  tools_2.15.3

The basic issue is, believe or not, that despite apparently:
  0.29*100
  # [1] 29

in "reality":
  0.29*100 == 29
  # [1] FALSE

In other words, as computed by R, 0.29*100 is not exactly equal to 29:

  29 - 0.29*100
  # [1] 3.552714e-15

The difference is tiny, but it is sufficient to make 0.29*100 slightly
smaller than 29, so rep(TRUE,0.29*100) uses the largest integer compatible
with "times = 0.29*100", i.e. 28. Hence the recommendation to "add a
little fuzz".

On the other hand, when you use rep(1,0.291*100) you will be OK:
This is because:

  29 - 0.291*100
  # [1] -0.1

so 0.291*100 is comfortably greater than 29 (but well clear of 30).

The reason for the small inaccuracy (compared with "mathematical
truth") is that R performs numerical calculations using binary
representations of numbers, and there is no exact binary representation
of 0.29, so the result of 0.29*100 will be slightly inaccurate.

If you do need to do this sort of thing (e.g. the value of "times"
will be the result of a calculation) then one useful precaution
could be to round the result:

  round(0.29*100)
  # [1] 29
  29-round(0.29*100)
  # [1] 0
  length(rep(TRUE,0.29*100))
  # [1] 28
  length(rep(TRUE,round(0.29*100)))
  # [1] 29

(The default for round() is 0 decimal places, i.e. it rounds to
an integer).

So, compared with:
  0.29*100 == 29
  # [1] FALSE

we have:
  round(0.29*100) == 29
  # [1] TRUE

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 09-Apr-2013  Time: 17:56:33
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Console display "buffer size"

2013-04-01 Thread Ted Harding
On 01-Apr-2013 21:26:07 Robert Baer wrote:
> On 4/1/2013 4:08 PM, Peter Ehlers wrote:
>> On 2013-04-01 13:37, Ted Harding wrote:
>>> Greetings All.
>>> This is a somewhat generic query (I'm really asking on behalf
>>> of a friend who uses R on Windows, whereas I'm on Linux, but
>>> the same phenomenon appears on both).
>>>
>>> Say one has a largish dataframe -- call it "G" -- which in the
>>> case under discussion has 592 rows and 41 columns. The intention
>>> is to display the data by simply entering its name (G) as command.
>>>
>>> Say the display console has been set wide enough to display 15
>>> columns (determined by lengths of column names). Then R will output
>>> successively to the console, in continuous flow:
>>>Chunk 1: rows 1:592 of columns 1:15
>>>Chunk 2: rows 1:592 of columns 16:30
>>>Chunk 3: rows 1:592 of columns 31:41
>>> If the number of rows that can be displayed on the screen is, say, 60,
>>> then only rows 533:592 of Chunk 3 will be visible in the first instance.
>>>
>>> However, on my Linux system at any rate, I can use Shift+PgUp to
>>> scroll back up through what has been output to the console. It seems
>>> that my friend proceeds similarly.
>>>
>>> But now, after a certain number of (Shift+PgUps), one runs out
>>> of road before getting to Row 1 of Chunk 1 (in my friend's case,
>>> only Rows 468-592 of Chunk 1 can be seen).
>>>
>>> The explanation which occurs to me is that the console has a "buffer"
>>> in which such an output is stored, and if the dataframe is too big
>>> then lines 1:N (for some N) of the output are dropped from the start
>>> of the buffer, and it is impossible to go further back than line (N+1)
>>> of Chunk 1 where in this case N=467 (of course one may not even be
>>> able to go further back than Chunk K, for some K > 1, for a bigger
>>> dataframe).
>>>
>>> The query I have is: In the light of the above, is there a way to
>>> change the size of the "buffer" so that one can scroll all the way
>>> back to the very first row of Chunk 1? (The size-change may perhaps
>>> have to be determined empirically).
>>
>> Isn't this set by the 'bufbytes' and 'buflines' specifications in the
>> Rconsole file?
>> Anyway, it's probably best to use 'View' to inspect data.
>>
> Although I have not tried them, Windows RGUI  [ -- Edit | GUI 
> Preferences --] has settings for both buffer and lines which on my 
> 64-bit machine default to 250000 and 8000 respectively.
> 
> Rob Baer
> 
>> Peter Ehlers

Thanks for the replies, Rob and Peter. Rob's reply in particular
corresponds to what my friend has himself just found out in his
Windows installation of R. It seems, however, that this setting
is once-and-for-all in an R session (I can't check that myself).

Re Peter's comment: I don't seem to have an "Rconsole" file
anywhere in my Linux installation. Is it unique to Windows?

Once again, thanks. Very helpful.
Ted.

-
E-Mail: (Ted Harding) 
Date: 01-Apr-2013  Time: 23:06:21
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Console display "buffer size"

2013-04-01 Thread Ted Harding
Greetings All.
This is a somewhat generic query (I'm really asking on behalf
of a friend who uses R on Windows, whereas I'm on Linux, but
the same phenomenon appears on both).

Say one has a largish dataframe -- call it "G" -- which in the
case under discussion has 592 rows and 41 columns. The intention
is to display the data by simply entering its name (G) as command.

Say the display console has been set wide enough to display 15
columns (determined by lengths of column names). Then R will output
successively to the console, in continuous flow:
  Chunk 1: rows 1:592 of columns 1:15
  Chunk 2: rows 1:592 of columns 16:30
  Chunk 3: rows 1:592 of columns 31:41
If the number of rows that can be displayed on the screen is, say, 60,
then only rows 533:592 of Chunk 3 will be visible in the first instance.

However, on my Linux system at any rate, I can use Shift+PgUp to
scroll back up through what has been output to the console. It seems
that my friend proceeds similarly.

But now, after a certain number of (Shift+PgUps), one runs out
of road before getting to Row 1 of Chunk 1 (in my friend's case,
only Rows 468-592 of Chunk 1 can be seen).

The explanation which occurs to me is that the console has a "buffer"
in which such an output is stored, and if the dataframe is too big
then lines 1:N (for some N) of the output are dropped from the start
of the buffer, and it is impossible to go further back than line (N+1)
of Chunk 1 where in this case N=467 (of course one may not even be
able to go further back than Chunk K, for some K > 1, for a bigger
dataframe).

The query I have is: In the light of the above, is there a way to
change the size of the "buffer" so that one can scroll all the way
back to the very first row of Chunk 1? (The size-change may perhaps
have to be determined empirically).

With thanks,
Ted.

-------------
E-Mail: (Ted Harding) 
Date: 01-Apr-2013  Time: 21:37:17
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prop.test correct true and false gives same answer

2013-03-27 Thread Ted Harding
On 27-Mar-2013 21:04:51 David Arnold wrote:
> All,
> How come both of these are the same.  Both say "1-sample proportions
> test without continuity correction." I would suspect one would say
> "without" and one would say "with."
> 
> 
>> prop.test(118,236,.5,correct=FALSE,conf.level=0.95)
> 
>   1-sample proportions test without continuity correction
> 
> data:  118 out of 236, null probability 0.5 
> X-squared = 0, df = 1, p-value = 1
> alternative hypothesis: true p is not equal to 0.5 
> 95 percent confidence interval:
>  0.4367215 0.5632785 
> sample estimates:
>   p 
> 0.5 
> 
>> prop.test(118,236,.5,correct=TRUE,conf.level=0.95)
> 
>   1-sample proportions test without continuity correction
> 
> data:  118 out of 236, null probability 0.5 
> X-squared = 0, df = 1, p-value = 1
> alternative hypothesis: true p is not equal to 0.5 
> 95 percent confidence interval:
>  0.4367215 0.5632785 
> sample estimates:
>   p 
> 0.5

Note what is said (admittedly somewhat deeply tucked away)
under "Details" in ?prop.test:

 "Continuity correction is used only if it does not exceed
  the difference between sample and null proportions
  in absolute value."

In your example, the sample proportion exactly matches the
null-hypothesis proportion (0.5).
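
Numerically (a small check, assuming the correction applied is the
usual Yates term of 1/(2n) on the proportion scale):

  abs(118/236 - 0.5)  # difference between sample and null proportions
  # [1] 0
  1/(2*236)           # the continuity correction
  # [1] 0.002118644

The correction would exceed the (zero) difference, so it is not used.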

Confirmation:
[A] Your same example:
  prop.test(118,236,.5,correct=TRUE,conf.level=0.95)
  # 1-sample proportions test without continuity correction
  # data:  118 out of 236, null probability 0.5 
  # X-squared = 0, df = 1, p-value = 1
  # alternative hypothesis: true p is not equal to 0.5 
  # 95 percent confidence interval:
  #  0.4367215 0.5632785 
  # sample estimates:
  #   p 
  # 0.5 

[B1] Slightly change x, but keep "correct=TRUE":
  prop.test(117,236,.5,correct=TRUE,conf.level=0.95)
  # 1-sample proportions test with continuity correction
  # data:  117 out of 236, null probability 0.5 
  # X-squared = 0.0042, df = 1, p-value = 0.9481
  # alternative hypothesis: true p is not equal to 0.5 
  # 95 percent confidence interval:
  #  0.4304724 0.5611932 
  # sample estimates:
  # p 
  # 0.4957627 

[B2] Slightly change x, but now "correct=FALSE":
  prop.test(117,236,.5,correct=FALSE,conf.level=0.95)
  # 1-sample proportions test without continuity correction
  # data:  117 out of 236, null probability 0.5 
  # X-squared = 0.0169, df = 1, p-value = 0.8964
  # alternative hypothesis: true p is not equal to 0.5 
  # 95 percent confidence interval:
  #  0.4325543 0.5591068 
  # sample estimates:
  # p 
  # 0.4957627 

So in [A] it skips the requested continuity correction, because the
correction would exceed the (zero) difference between the sample and
null proportions. But in [B1] it makes a difference (compare with
[B2]), so the correction is applied.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 27-Mar-2013  Time: 21:21:39
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] edit.data() read-only?

2013-03-26 Thread Ted Harding
Thanks! ?View does indeed state "The object is then viewed
in a spreadsheet-like data viewer, a read-only version of
'data.entry'" -- which is what I was looking for!
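For instance (a minimal illustration, with an arbitrary data frame):

  d <- data.frame(x = 1:10, y = runif(10))
  View(d)  # opens the read-only spreadsheet-like viewer
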
Ted.

On 26-Mar-2013 10:23:59 Blaser Nello wrote:
> Try ?View()
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Ted Harding
> Sent: Dienstag, 26. März 2013 11:09
> To: r-help@r-project.org
> Subject: [R] edit.data() read-only?
> 
> Greetings All.
> 
> The function edit.data() allows a convenient spreadsheet-like view of a
> dataframe with too many rows/columns to fit on the screen (especially when
> there are many columns). Very useful when scanning through a dataset (row &
> column are conveniently identified by the labels at the side and above).
> 
> However, there seems to be no option to set it "read-only" on start-up, with
> the consequence that a clumsy key-press or mouse-click could cause a change
> in the data which would then be stored after quitting edit.data().
> 
> Is there a possibility of a read-only option? Or some other function which
> could offer similar viewing capability without the risk of data change?
> 
> With thanks,
> Ted.
> 
> -
> E-Mail: (Ted Harding) 
> Date: 26-Mar-2013  Time: 10:08:58
> This message was sent by XFMail
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-
E-Mail: (Ted Harding) 
Date: 26-Mar-2013  Time: 10:38:34
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] edit.data() read-only?

2013-03-26 Thread Ted Harding
Sorry, I meant "data.entry()", not "edit.data()" (the latter due
to mental cross-wiring with "edit.data.frame()").

I think that Nello Blaser's suggestion of "View" may be what I
seek (when I can persuade it to find the font it seeks ... )!

With thanks, Barry.
Ted.

On 26-Mar-2013 10:20:59 Barry Rowlingson wrote:
> On Tue, Mar 26, 2013 at 10:09 AM, Ted Harding 
> wrote:
>> Greetings All.
>>
>> The function edit.data() allows a convenient spreadsheet-like
>> view of a dataframe with too many rows/columns to fit on the
>> screen (especially when there are many columns). Very useful
>> when scanning through a dataset (row & column are conveniently
>> identified by the labels at the side and above).
> 
>  Do you mean:
> 
> d=data.frame(x=1:10,y=runif(10))
> edit(d)
> 
> ? Because I don't have an edit.data function (maybe its windows only).
> 
>> However, there seems to be no option to set it "read-only" on
>> start-up, with the consequence that a clumsy key-press or
>> mouse-click could cause a change in the data which would then
>> be stored after quitting edit.data().
>>
>> Is there a possibility of a read-only option? Or some other
>> function which could offer similar viewing capability without
>> the risk of data change?
> 
>  If you just want to view the data, don't assign it back. The "edit"
> function only updates the data if you assign it back, as in:
> 
> d=edit(d)
> 
> so a 'read only' version would be:
> 
> invisible(edit(d))
> 
> except the user here can change the values in the cells, they just
> don't go anywhere. Except into .Last.value if you decide you really
> did want to get the values...
> 
> Barry
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 26-Mar-2013  Time: 10:36:32
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] edit.data() read-only?

2013-03-26 Thread Ted Harding
Greetings All.

The function edit.data() allows a convenient spreadsheet-like
view of a dataframe with too many rows/columns to fit on the
screen (especially when there are many columns). Very useful
when scanning through a dataset (row & column are conveniently
identified by the labels at the side and above).

However, there seems to be no option to set it "read-only" on
start-up, with the consequence that a clumsy key-press or
mouse-click could cause a change in the data which would then
be stored after quitting edit.data().

Is there a possibility of a read-only option? Or some other
function which could offer similar viewing capability without
the risk of data change?

With thanks,
Ted.

-----
E-Mail: (Ted Harding) 
Date: 26-Mar-2013  Time: 10:08:58
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] order statistic of multivariate normal

2013-03-22 Thread Ted Harding
On 22-Mar-2013 13:02:25 li li wrote:
> Thank you all for the reply.
> 
> One example of my question is as follows.
> 
> Suppose X1, ..., X10 has multivariate normal distribution
> and X(1), ..., X(10) are the corresponding order statistics.
> 
> My question is whether there is an R function that would
> help compute the c which satisfies P(X(4) < c) = beta?
> Here beta is a known constant between 0 and 1.
> 
> Thank you.
>Hanna

The basic question which needs to be answered (which has been hinted
at in earlier replies) is: How do you define "order statistic" for
multivariate observations?

For example, here is a sample of 10 (to 3 d.p.) from a bivariate
normal distribution:

  [,1]   [,2]
   [1,]  1.143 -0.396
   [2,] -0.359 -0.217
   [3,] -0.391 -0.601
   [4,] -0.416 -1.093
   [5,] -1.810 -1.499
   [6,] -0.367 -0.636
   [7,] -2.238  0.563
   [8,]  0.811  1.230
   [9,]  0.082  0.174
  [10,] -1.359 -0.364

Which one of these 10 rows is X(4)?

There is an alternative interpretation of your question:

  "Suppose X1, ..., X10 has multivariate normal distribution
  and X(1), ..., X(10) are the corresponding order statistics."

This could mean that the vector (X1,...,X10) has a multivariate
normal distribution with 10 dimensions, and, for a single vector
(X1,...,X10) drawn from this distribution, (X(1), ..., X(10))
is a vector consisting of these same values (X1,...,X10), but
in increasing order.

Is that what you mean?
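
(If so, note that in the special case where X1,...,X10 are independent
N(0,1) -- an assumption which a general multivariate normal need not
satisfy -- such a c can be found numerically, since P(X(4) <= c) is the
probability that at least 4 of the 10 values fall below c:

  # P(X(4) <= c) for 10 iid N(0,1) variables, as a function of c
  p4 <- function(cc) pbinom(3, size = 10, prob = pnorm(cc),
                            lower.tail = FALSE)
  beta <- 0.95  # an illustrative target value
  uniroot(function(cc) p4(cc) - beta, interval = c(-10, 10))$root

With correlated components there is no such simple formula, and one
would have to resort to simulation.)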

Hoping this helps,
Ted.


---------
E-Mail: (Ted Harding) 
Date: 22-Mar-2013  Time: 13:31:31
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random Sample with constraints

2013-03-03 Thread Ted Harding
On 03-Mar-2013 16:29:05 Angelo Scozzarella Tiscali wrote:
> For example,  I want to simulate different populations with same mean and
> standard deviation but different distribution.
> 
> On 03/mar/2013, at 17:14, Angelo Scozzarella Tiscali wrote:
>> Dear R friends,
>> 
>> I'd like to generate random sample (variable size and range) without a
>> specified distribution but with given mean and standard deviation.
>> 
>> Could you help me?
>> 
>> thanks in advance
>> Angelo

As Ben Bolker said, any random sample must come from some distribution,
so you cannot generate one without specifying some distribution.

Insofar as your question can be interpreted, it will be satisfied
if, given the desired mean, M, and SD, S, you take any two available
distributions with, respectively, known means M1 and M2 and known
SDs S1 and S2. Let X1 denote a sample from the first, and X2 a
sample from the second.

Then M + (X1 - M1)*(S/S1) is a sample from the first distribution
re-scaled to have mean M and SD S, as required.

Similarly, M + (X2 - M2)*(S/S2) is a sample from the second distribution
re-scaled to have mean M and SD S, as required.

As for which distributions you sample from, the choice is yours --
for example, the first could be the Standard Normal (M1 = 0, S1 = 1);
use rnorm(). The second could be the uniform on (0,1)
(M2 = 0.5, S2 = 1/sqrt(12)); use runif().

Similarly for other arbitrary choices of first and second distribution
(so long as each has a finite second moment, hence excluding, for
example, the Cauchy distribution).
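
A minimal sketch of the above (M = 10 and S = 2 are just example
targets):

  M <- 10; S <- 2
  X1 <- rnorm(100)                       # M1 = 0,   S1 = 1
  Y1 <- M + (X1 - 0)*(S/1)
  X2 <- runif(100)                       # M2 = 0.5, S2 = 1/sqrt(12)
  Y2 <- M + (X2 - 0.5)*(S/(1/sqrt(12)))
  c(mean(Y1), sd(Y1))  # sample values near 10 and 2
  c(mean(Y2), sd(Y2))  # likewise, from a differently-shaped distribution

(The population mean and SD are M and S exactly; the sample values will
of course vary around them.)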

That's about as far as one can go with your question!

Hoping it helps, however.
Ted.

-
E-Mail: (Ted Harding) 
Date: 03-Mar-2013  Time: 17:12:50
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] arithmetic and logical operators

2013-01-30 Thread Ted Harding
On 30-Jan-2013 20:39:34 Berend Hasselman wrote:
> 
> On 30-01-2013, at 21:32, Dave Mitchell  wrote:
> 
>> Why, in R, does (0.1 + 0.05) > 0.15 evaluate to True?  What am I missing
>> here?  How can I ensure this (ostensibly incorrect) behavior doesn't
>> introduce bugs into my code?  Thanks for your time.
>> 
> 
> R-FAQ 7.31:
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
> 
> Berend

And, to put Dave's specific query into the context of that FAQ:

  (0.1 + 0.05) > 0.15
  # [1] TRUE
  (0.1 + 0.05) - 0.15
  # [1] 2.775558e-17

so that tiny 2.775558e-17 is the inexactitude (due to finite
binary representation).

As an interesting variant:

  (1.0 + 0.5) > 1.5
  # [1] FALSE
  (1.0 + 0.5) - 1.5
  # [1] 0

and that is because 1.0 and 0.5, and also 1.5, have exact finite
binary representations, e.g.:

  1.0 == 1.0000
  0.5 == 0.1000
  1.5 == 1.1000

whereas 0.1, 0.05 and 0.15 are these numbers divided by 10 = 2*5;
and while you can exactly do the "/2" part (just shift right by
one binary place), you can't exactly divide by 5 in finite binary
arithmetic (any more than you can exactly divide by 3 in decimal),
because 5 is not a factor of the base (2) of the binary representation
(whereas, in decimal, both 2 and 5 are factors of 10; but 3 isn't).
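
You can see the inexact representation directly by asking for more
decimal digits than R prints by default:

  sprintf("%.20f", 0.1)
  # [1] "0.10000000000000000555"
  sprintf("%.20f", 0.5)
  # [1] "0.50000000000000000000"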

Whereas R has the function all.equal() to give the "right" answer
for most things like

  (0.1 + 0.05) == 0.15
  # [1] FALSE
  all.equal((0.1 + 0.05), 0.15)
  # [1] TRUE

(because it tests for equality to within a very small tolerance,
by default the square root of the binary precision of a double
precision binary real number), R does not have a straightforward
method for testing the truth of "(0.1 + 0.05) > 0.15" (and that
is because the question is not clearly discernible from the
expression, when imprecision underlies it).

You could emulate all.equal() on the lines of

  (0.1 + 0.05) > (0.15 + .Machine$double.eps^0.5)
  # [1] FALSE
  (0.1 + 0.05) < (0.15 - .Machine$double.eps^0.5)
  # [1] FALSE

(or similar). Observe that

  .Machine$double.eps^0.5
  # [1] 1.490116e-08
  .Machine$double.eps
  # [1] 2.220446e-16

  (0.1 + 0.05) - 0.15
  # [1] 2.775558e-17

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 30-Jan-2013  Time: 23:22:53
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package: VennDiagram. Error in draw.pairwise.venn Impossible: cross section area too large

2013-01-27 Thread Ted Harding

On 27-Jan-2013 23:50:57 Ben Bolker wrote:
> Fabrice Tourre  gmail.com> writes:
> 
>> Dear list,
>> When I use VennDiagram package, I got a error as follow:
>> 
>> venn.plot <- draw.pairwise.venn(
>> area1 = 3186,
>> area2 = 325,
>> cross.area = 588);
>> 
>> Error in draw.pairwise.venn(area1 = 3186, area2 = 325, cross.area = 588) :
>>   Impossible: cross section area too large.
>> 
>> Does anyone have suggestion?
>> 
>> Thank you.
> 
>   I don't know the package, but it looks like you're trying to 
> draw two bubbles with areas of 3186 and 325.  cross.area sounds like
> the area of the intersection.  You can't have an area of intersection
> that's bigger than one of the categories ...

According to the PDF manual for VennDiagram:

  Arguments
area1 The size of the first set
area2 The size of the second set
cross.area The size of the intersection between the sets

so Ben has hit the nail on the head! (Maybe "area2 = 325" was a typo?).
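
For instance, a call with a feasible intersection should work (the 200
here is an arbitrary illustrative value not exceeding the smaller set):

  library(VennDiagram)
  venn.plot <- draw.pairwise.venn(area1 = 3186, area2 = 325,
                                  cross.area = 200)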

Best wishes to all,
Ted.

-
E-Mail: (Ted Harding) 
Date: 28-Jan-2013  Time: 00:09:32
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-test behavior given that the null hypothesis is true

2013-01-09 Thread Ted Harding
Ah! You have assigned a parameter "equal.var=TRUE", and "equal.var"
is not a listed parameter for t.test() -- see ?t.test :

  t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)

Try it instead with "var.equal=TRUE", i.e. in your code:
  for(i in 1:k){
rv.t.pvalues[i] <- t.test(rv[i, 1:(c/2)], rv[i, (c/2+1):c],
  ##equal.var=TRUE, alternative="two.sided")$p.value
var.equal=TRUE, alternative="two.sided")$p.value
  }

When I run your code with "equal.var", I indeed repeatedly see
the deficient bin for the lowest P-values that you observed.
When I run your code with "var.equal" I do not see it.

The explanation is that, since "equal.var" is not a recognised
parameter for t.test(), it has assumed the default value FALSE
for var.equal, and has therefore (since it is a 2-sample test)
adopted the Welch/Satterthwaite procedure:

  var.equal: a logical variable indicating whether to treat
the two variances as being equal. If 'TRUE' then the
pooled variance is used to estimate the variance
otherwise the Welch (or Satterthwaite) approximation
to the degrees of freedom is used.

This has the effect of somewhat adapting the test procedure to
the data, so that extreme (i.e. small) values of P are even
rarer than they should be.
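
Note that nothing flags the misspelt argument: t.test() has a "..."
argument, which silently absorbs "equal.var=TRUE". A quick way to check
which test was actually run is the $method component of the result:

  t.test(rnorm(3), rnorm(3), equal.var = TRUE)$method  # the Welch test
  t.test(rnorm(3), rnorm(3), var.equal = TRUE)$method  # the pooled test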

With best wishes,
Ted.

On 09-Jan-2013 13:24:59 Pavlos Pavlidis wrote:
> Hi Ted,
> thanks for the reply. I use a similar code which you can see below:
> 
> k <- 10000
> c <- 6
> rv <- array(NA, dim=c(k, c) )
> for(i in 1:k){
>   rv[i,] <- rnorm(c, mean=0, sd=1)
> }
> 
> rv.t.pvalues <- array(NA, k)
> 
> for(i in 1:k){
>   rv.t.pvalues[i] <- t.test(rv[i, 1:(c/2)], rv[i, (c/2+1):c],
> equal.var=TRUE, alternative="two.sided")$p.value
> }
> 
> hist(rv.t.pvalues)
> 
> The histogram is this one:
> http://tinyurl.com/histogram-rt-pvalues-pdf
> 
> all the best
> idaios
> 
> 
> On Wed, Jan 9, 2013 at 12:29 PM, Ted Harding wrote:
> 
>> On 09-Jan-2013 08:50:46 Pavlos Pavlidis wrote:
>> > Dear all,
>> > I observe a strange behavior of the pvalues of the t-test under
>> > the null hypothesis. Specifically, I obtain 2 samples of 3
>> > individuals each from a normal distribution of mean 0 and variance 1.
>> > Then, I calculate the pvalue using the t-test (var.equal=TRUE,
>> > samples are independent). When I make a histogram of pvalues
>> > I see that consistently the bin of the smallest pvalues has a
>> > lower frequency. Is this a known behavior of the t-test or it's
>> > a kind of bug/random number generation problem?
>> >
>> > kind regards,
>> > idaios
>>
>> Using the following code, I did not observe the behaviour you describe.
>> The histograms are consistent with a uniform distribution of the
>> P-values, and the lowest bin for the P-values (when the code is
>> run repeatedly) is not consistently lower (or higher, or anything
>> else) than the other bins.
>>
>> ## My code:
>> N <- 10000
>> Ps <- numeric(N)
>> for(i in (1:N)){
>>   X1 <- rnorm(3,0,1) ; X2 <- rnorm(3,0,1)
>>   Ps[i] <- t.test(X1,X2,var.equal=TRUE)$p.value
>> }
>> hist(Ps)
>> 
>>
>> If you would post the code you used, the reason why you are observing
>> this may become more evident!
>>
>> Hoping this helps,
>> Ted.
>>
>> -
>> E-Mail: (Ted Harding) 
>> Date: 09-Jan-2013  Time: 10:29:21
>> This message was sent by XFMail
>> -
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) 
Date: 09-Jan-2013  Time: 14:51:04
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-test behavior given that the null hypothesis is true

2013-01-09 Thread Ted Harding
On 09-Jan-2013 08:50:46 Pavlos Pavlidis wrote:
> Dear all,
> I observe a strange behavior of the pvalues of the t-test under
> the null hypothesis. Specifically, I obtain 2 samples of 3
> individuals each from a normal distribution of mean 0 and variance 1.
> Then, I calculate the pvalue using the t-test (var.equal=TRUE,
> samples are independent). When I make a histogram of pvalues
> I see that consistently the bin of the smallest pvalues has a
> lower frequency. Is this a known behavior of the t-test or it's
> a kind of bug/random number generation problem?
> 
> kind regards,
> idaios

Using the following code, I did not observe the behaviour you describe.
The histograms are consistent with a uniform distribution of the
P-values, and the lowest bin for the P-values (when the code is
run repeatedly) is not consistently lower (or higher, or anything
else) than the other bins.

## My code:
N <- 10000
Ps <- numeric(N)
for(i in (1:N)){
  X1 <- rnorm(3,0,1) ; X2 <- rnorm(3,0,1)
  Ps[i] <- t.test(X1,X2,var.equal=TRUE)$p.value
}
hist(Ps)


If you would post the code you used, the reason why you are observing
this may become more evident!

Hoping this helps,
Ted.

-------------
E-Mail: (Ted Harding) 
Date: 09-Jan-2013  Time: 10:29:21
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression line does not show on scatterplot

2012-12-18 Thread Ted Harding
Sorry, I made a mistake in re-writing your code below. See at [***]
On 18-Dec-2012 21:00:28 Ted Harding wrote:
 On 18-Dec-2012 20:09:36 Beatriz González Domínguez wrote:
> Hello,
> 
> I have done a scatterplot and now would like to add its regression
> line but it does not show.
> Below, the code I have used. 
> 
> lm3 <- lm(data$S_pH_KCl2.5_BCx~data$B_OleicoPF_BCx_per)
> plot(data$S_pH_KCl2.5_BCx, data$B_OleicoPF_BCx_per)
> abline(lm3)
> 
> I have been able to do the complete operation using the software
> STATISTICA but it would be great to do it with R.
> 
> If you require more details please get in touch.
> 
> Thanks a lot!
> Bea
 
By the look of things you have either the regression or the plot
the wrong way round. I suspect it is the regression. So try:

[***] The following should be correct (I previously mis-copied
[***] your variable names).
Either:
##lm3 <- lm(data$S_pH_KCl2.5_BCx~data$B_OleicoPF_BCx_per)
  lm3 <- lm(data$B_OleicoPF_BCx_per ~ data$S_pH_KCl2.5_BCx)
  plot(data$S_pH_KCl2.5_BCx, data$B_OleicoPF_BCx_per)
  abline(lm3)

Or:
  lm3 <- lm(data$S_pH_KCl2.5_BCx ~ data$B_OleicoPF_BCx_per)
##plot(data$S_pH_KCl2.5_BCx, data$B_OleicoPF_BCx_per)
  plot(data$B_OleicoPF_BCx_per, data$S_pH_KCl2.5_BCx)
  abline(lm3)

The point is that in lm(V~U) the variable "U" is taken as
corresponding to the x-axis (independent variable), and the
variable "V" to the y-axis (dependent variable).

Similarly for plot(U,V).

So, for lm3 <- lm(V~U), abline(lm3) will plot the fitted V-values
(y-axis) against the U-values (x-axis).

Your original code was equivalent to:

  lm3 <- lm(V~U)
  plot(V,U)
  abline(lm3)

whereas it should be
Either:
  lm3 <- lm(V~U)
  plot(U,V)
  abline(lm3)
Or:
  lm3 <- lm(U~V)
  plot(V,U)
  abline(lm3)

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 18-Dec-2012  Time: 21:15:47
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression line does not show on scatterplot

2012-12-18 Thread Ted Harding
On 18-Dec-2012 20:09:36 Beatriz González Domínguez wrote:
> Hello,
> 
> I have done a scatterplot and now would like to add its regression
> line but it does not show.
> Below, the code I have used. 
> 
> lm3 <- lm(data$S_pH_KCl2.5_BCx~data$B_OleicoPF_BCx_per)
> plot(data$S_pH_KCl2.5_BCx, data$B_OleicoPF_BCx_per)
> abline(lm3)
> 
> I have been able to do the complete operation using the software
> STATISTICA but it would be great to do it with R.
> 
> If you require more details please get in touch.
> 
> Thanks a lot!
> Bea

By the look of things you have either the regression or the plot
the wrong way round. I suspect it is the regression. So try:

Either:
##lm3 <- lm(data$S_pH_KCl2.5_BCx~data$B_OleicoPF_BCx_per)
  lm3 <- lm(data$S_pH_KCl2.5_BCx_per~data$B_OleicoPF_BCx)
  plot(data$S_pH_KCl2.5_BCx, data$B_OleicoPF_BCx_per)
  abline(lm3)

Or:
  lm3 <- lm(data$S_pH_KCl2.5_BCx~data$B_OleicoPF_BCx_per)
##plot(data$S_pH_KCl2.5_BCx, data$B_OleicoPF_BCx_per)
  plot(data$S_pH_KCl2.5_BCx_per, data$B_OleicoPF_BCx)
  abline(lm3)

The point is that in lm(V~U) the variable "U" is taken as
corresponding to the x-axis (independent variable), and the
variable "V" to the y-axis (dependent variable).

Similarly for plot(U,V).

So, for lm3 <- lm(V~U), abline(lm3) will plot the fitted V-values
(y-axis) against the U-values (x-axis).

Your original code was equivalent to:

  lm3 <- lm(V~U)
  plot(V,U)
  abline(lm3)

whereas it should be
Either:
  lm3 <- lm(V~U)
  plot(U,V)
  abline(lm3)
Or:
  lm3 <- lm(U~V)
  plot(V,U)
  abline(lm3)

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 18-Dec-2012  Time: 21:00:25
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chi-squared test when observed near expected

2012-12-03 Thread Ted Harding
On 03-Dec-2012 21:40:35 Troy S wrote:
> Dear UseRs,
> I'm running a chi-squared test where the expected matrix is the same
> as the observed, after rounding. R reports a X-squared of zero with
> a p value of one. I can justify this because any other result will
> deviate at least as much from the expected because what we observe
> is the expected, after rounding. But the formula for X-squared,
> sum (O-E)^2/E gives a positive value. What is the reason for X-Squared
being zero in this case?
> 
> Troy
> 
>> trial<-as.table(matrix(c(26,16,13,7),ncol=2))
>> x<-chisq.test(trial)
>> x
> 
> data:  trial
> X-squared = 0, df = 1, p-value = 1
> 
>> x$expected
>  A B
> A 26.41935 12.580645
> B 15.58065  7.419355
>>
>> x$statistic
>X-squared
> 5.596653e-31
>> (x$observed-x$expected)^2/x$expected
> A   B
> A 0.006656426 0.013978495
> B 0.011286983 0.023702665
>> sum((x$observed-x$expected)^2/x$expected)
> [1] 0.05562457

The reason is that (by default, see ?chisq.test ) the statistic
is calculated using the "continuity correction": one half is subtracted
from each abs(O-E) difference, but the correction is never allowed to
exceed the difference itself. The default setting in chisq.test()
is "correct = TRUE". Try it with "correct = FALSE":

  x0<-chisq.test(trial,correct=FALSE)
  x0
  #  Pearson's Chi-squared test
  # data:  trial 
  # X-squared = 0.0556, df = 1, p-value = 0.8136

which agrees with your calculation of

  sum((x$observed-x$expected)^2/x$expected)
  # [1] 0.05562457
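
As to why the corrected statistic collapses to (essentially) zero in
[A]: as ?chisq.test notes, "the correction will not be bigger than the
differences themselves", i.e. min(0.5, abs(O-E)) is subtracted. A
sketch of that calculation:

  YATES <- min(0.5, abs(x$observed - x$expected))
  sum((abs(x$observed - x$expected) - YATES)^2 / x$expected)
  # effectively 0 (your x$statistic of 5.596653e-31 is floating-point
  # noise)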

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 03-Dec-2012  Time: 22:44:14
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] discrepancy in fisher exact test between R and wiki formula

2012-12-03 Thread Ted Harding
On 03-Dec-2012 21:22:28 JiangMei wrote:
> Hi All. Sorry to bother you. I have a question about fisher exact test.
> 
> I counted the presence of gene mutation in two groups of samples.
> My data is as follows
>            Presence   Absence
> GroupA         4          6
> GroupB         5         11
> 
> When using the formula of fisher exact test provided by wiki
> (http://en.wikipedia.org/wiki/Fisher%27s_exact_test), the p-value is 0.29.
> 
> But when calculated by R, the p-value is 0.69. My code is shown below
> counts<-c(4,5,6,11)
> data<-matrix(counts,nrow=2)
> fisher.test(data)
> 
> Why did I get two different numbers? Is there anything wrong with my R codes?
> 
> Wish your help! Thanks very much! I really appreciate it.

The reason is that the formula given in Wikipedia is for one particular
set of values (a,b,c,d). In your case, a=4, b=6, c=5, d=11 and the
Wikipedia formula for p gives the probability of (a,b,c,d) = (4,6,5,11).

However, this is not the P-value for the test. For a two-sided
alternative (see ?fisher.test ) the P-value is the sum of all such
probabilities for values of (a,b,c,d) such that a+b = 10, c+d = 16,
a+c = 9, b+d = 17 AND the probability p is less than or equal to
the probability of (4,6,5,11). So it includes the case that has been
observed and (in general) others, so will be greater (0.69) than the
value (0.29) given by the formula.
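
Both numbers can be checked with the hypergeometric distribution,
which gives the probability of a single table with these fixed margins
(outputs rounded here):

  dhyper(4, m = 10, n = 16, k = 9)  # probability of the observed table
  # [1] 0.2936 (approx.) -- the 0.29 from the Wikipedia formula
  fisher.test(matrix(c(4, 5, 6, 11), nrow = 2))$p.value
  # [1] 0.69 (approx.) -- the sum over all tables at least as improbable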

The default alternative for R's fisher.test() is "two-sided".
If you look at ?fisher.test() you will see:

  Two-sided tests are based on the probabilities of the tables,
  and take as 'more extreme' all tables with probabilities less
  than or equal to that of the observed table, the p-value being
  the sum of such probabilities.

I hope this helps.
Ted.

-
E-Mail: (Ted Harding) 
Date: 03-Dec-2012  Time: 22:24:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Getting all possible contingency tables

2012-12-02 Thread Ted Harding
On 02-Dec-2012 14:17:20 Christofer Bogaso wrote:
> Thanks Ted (and others) for your suggestion. Here I have implemented
> the following:
> 
> Tab <- matrix(c(8, 10, 12, 6), nr = 2)
> 
> Simu_Number <- 50000
> Tab_Simulate <- vector("list", length = Simu_Number)
> for (i in 1:Simu_Number) {
>   Tab_Simulate[[i]] <- matrix(rmultinom(1, sum(Tab), rep(0.25, 4)),
>                               nrow = 2)   ## All cells have equal probability
> }
> Sample_ChiSq <- sapply(Tab_Simulate, function(x) {
>   Statistic <- sum((chisq.test(as.table(x))$observed -
>                     chisq.test(as.table(x))$expected)^2 /
>                    chisq.test(as.table(x))$expected)
>   return(Statistic)
> })
> 
> length(Sample_ChiSq[Sample_ChiSq < qchisq(0.95, 1)])/Simu_Number
> 
> hist(Sample_ChiSq, freq = FALSE)
> lines(dchisq(seq(min(Sample_ChiSq), max(Sample_ChiSq), by = 0.5), 1))
> 
> 
> However I think I am making some serious mistake, as the histogram did
> not match the density curve.
> 
> Can somebody help me where I am making mistake?
> 
> Thanks and regards,
> [the remainder (copies of previous posts) snipped]

The reasons for the mis-match are:

A: that you have put the curve in the wrong place, by not
supplying x-coordinates to lines(), so that it plots its
points at x = 1,2,3,4,...

B: that you need to multiply the plotted density by the width
of the histogram cells, so as to match the density curve to the
discrete density of the histogram. It will also then look better
when the chi-squared curve is plotted at the mid-points of the cells.

Hence, try something like:

hist(Sample_ChiSq, freq = FALSE, breaks=0.5*(0:40))
x0 <- 0.25+0.5*(0:20)
lines(x0,dchisq(x0,1))

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 02-Dec-2012  Time: 15:02:45
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   4   5   6   7   8   9   10   >