[R] phyl.RMA error
Hello, We're trying to run phylogenetically corrected reduced major axes regression analyses and have encountered an error we can't debug. We're using the function phyl.RMA in the package 'phytools'. Here is the code we are using and the error it returns:

> Model <- phyl.RMA(log(Skull), log(Tusk), tree, h0=1.0)
Error in if (sign(beta1) != sign(h0)) { : missing value where TRUE/FALSE needed

We can't seem to figure out which argument is missing, and we've tried including all of the T/F based arguments we think are possible. Our species dataset and nexus file are printed below. Any advice would be greatly appreciated.

We have the following dataset:

   Binomial                  Skull  Tusk
1  Tragulus_javanicus         93.7  14.6
2  Tragulus_kanchil           99.7  13.9
3  Tragulus_napu              98.1  11.1
4  Tragulus_nigricans         99.8  13.2
5  Moschiola_meminna         101.   14.6
6  Moschus_berezovskii       134.   55.0
7  Moschus_moschiferus       152.   52.9
8  Muntiacus_muntjak         193.   26.4
9  Muntiacus_reevesi         159.   23.4
10 Muntiacus_truongsonensis  184.   27.7
11 Muntiacus_vaginalis       203.   28.6
12 Hydropotes_inermis        162.   48.5
13 Hyemoschus_aquaticus      122.   20.1
14 Elaphodus_cephalophus     186.   17.3

And the following nexus tree:

#NEXUS
[R-package APE, Mon Jun 08 12:20:01 2020]

BEGIN TAXA;
DIMENSIONS NTAX = 12;
TAXLABELS
Tragulus_napu
Tragulus_kanchil
Tragulus_javanicus
Hyemoschus_aquaticus
Moschiola_meminna
Muntiacus_reevesi
Muntiacus_muntjak
Muntiacus_truongsonensis
Elaphodus_cephalophus
Hydropotes_inermis
Moschus_moschiferus
Moschus_berezovskii
;
END;
BEGIN TREES;
TRANSLATE
1 Tragulus_napu,
2 Tragulus_kanchil,
3 Tragulus_javanicus,
4 Hyemoschus_aquaticus,
5 Moschiola_meminna,
6 Muntiacus_reevesi,
7 Muntiacus_muntjak,
8 Muntiacus_truongsonensis,
9 Elaphodus_cephalophus,
10 Hydropotes_inermis,
11 Moschus_moschiferus,
12 Moschus_berezovskii
;
TREE * UNTITLED = [&R] 1:5.540957781,(2:2.978817423,3:2.978817423):2.562139698):10.78911152,4:16.33006601):6.360692368,5:22.69076035):5.725388419,(6:1.611149584,7:1.611149848):1.556474893,8:3.167624477):4.130280196,9:7.297904013):1.497063399,10:8.794967413):7.19682079,(11:2.539095678,12:2.539096008):13.45269085):12.42436025);

Dr. Ted Stankowich
Associate Professor
Department of Biological Sciences
California State University Long Beach
Long Beach, CA 90840
theodore.stankow...@csulb.edu
562-985-4826
http://www.csulb.edu/mammal-lab/
@CSULBMammalLab
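[A hedged note, not from the original thread: one likely cause is that the trait vectors and the tree do not line up -- the table has 14 species but the tree only 12 tips, and phyl.RMA() expects named vectors whose names match the tip labels; a mismatch can introduce NA into the fitted slope, which would make sign(beta1) NA. A minimal sketch of the check, assuming 'dat' is the species table above and 'tree' the phylogeny read with ape::read.nexus():]

library(ape)
library(phytools)
skull <- setNames(log(dat$Skull), dat$Binomial)  # phyl.RMA() wants *named* vectors
tusk  <- setNames(log(dat$Tusk),  dat$Binomial)
setdiff(names(skull), tree$tip.label)   # species in the data but not in the tree
setdiff(tree$tip.label, names(skull))   # tips in the tree but not in the data
keep  <- intersect(names(skull), tree$tip.label)
tree2 <- drop.tip(tree, setdiff(tree$tip.label, keep))
Model <- phyl.RMA(skull[keep], tusk[keep], tree2, h0 = 1.0)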
Re: [R] na.omit not omitting rows
Thanks - a previous response resolved the issue and I'm off and running with the analyses. -Original Message- From: David Winsemius [mailto:dwinsem...@comcast.net] Sent: Thursday, June 4, 2020 5:02 PM To: Ted Stankowich Cc: Rui Barradas ; William Dunlap ; r-help@r-project.org Subject: Re: [R] na.omit not omitting rows CAUTION: This email was sent from an external source. Use caution when replying, opening links or attachments. Perhaps indexing with rowSums(is.na(dfrm))? — David Sent from my iPhone > On Jun 4, 2020, at 4:58 PM, Ted Stankowich > wrote: > > This worked! Thank you! > > -Original Message- > From: Rui Barradas [mailto:ruipbarra...@sapo.pt] > Sent: Thursday, June 4, 2020 2:49 PM > To: Ted Stankowich ; William Dunlap > > Cc: r-help@r-project.org > Subject: Re: [R] na.omit not omitting rows > > CAUTION: This email was sent from an external source. Use caution when > replying, opening links or attachments. > > > Hello, > > If the problem is the "na.action" attribute, here are two ways of solving it. > > First, an example data set. > > set.seed(2020)# Make the example reproducible > phamComplBinomial <- sprintf("f%003d", 1:356) > is.na(UphamComplBinomial) <- sample(356, 37) DarkEum <- > factor(sample(1:2, 356, TRUE)) > Protect1 <- data.frame(UphamComplBinomial = > factor(UphamComplBinomial), > DarkEum) > > > 1. Setting the attribute "na.action" to NULL removes it > > Protect2 <- na.omit(Protect1) > attributes(Protect2) > attr(Protect2, "na.action") <- NULL > attributes(Protect2) > > > 2. Use an index vector to subset the data > > na <- is.na(Protect1$UphamComplBinomial) > Protect3 <- Protect1[!na, ] > > > The results are identical. But if you have more than one column with NA's, > this second way will be more complicated. > > identical(Protect2, Protect3) > #[1] TRUE > > > Hope this helps, > > Rui Barradas > > Às 22:27 de 04/06/20, Ted Stankowich escreveu: >> Thanks, but no that doesn’t work. The na.omit attributes are still in >> the dataframe, which you can see in the str outputs from the post. >> The problem line is likely: - attr(*, "na.action")= 'omit' Named int >> [1:2] 2 3 >> >> From: William Dunlap [mailto:wdun...@tibco.com] >> Sent: Thursday, June 4, 2020 12:39 PM >> To: Ted Stankowich >> Cc: r-help@r-project.org >> Subject: Re: [R] na.omit not omitting rows >> >> CAUTION: This email was sent from an external source. Use caution when >> replying, opening links or attachments. >> >> Does droplevels() help? >> >>> d <- data.frame(size = factor(c("S","M","M","L","L"), >>> levels=c("S","M","L")), id=c(101,NA,NA,104,105)) >>> str(d) >> 'data.frame': 5 obs. of 2 variables: >> $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3 $ id : num 101 >> NA NA 104 105 >>> str(na.omit(d)) >> 'data.frame': 3 obs. of 2 variables: >> $ size: Factor w/ 3 levels "S","M","L": 1 3 3 $ id : num 101 104 >> 105 >> - attr(*, "na.action")= 'omit' Named int [1:2] 2 3 >> ..- attr(*, "names")= chr [1:2] "2" "3" >>> str(droplevels(na.omit(d))) >> 'data.frame': 3 obs. of 2 variables: >> $ size: Factor w/ 2 levels "S","L": 1 2 2 $ id : num 101 104 105 >> - attr(*, "na.action")= 'omit' Named int [1:2] 2 3 >> ..- attr(*, "names")= chr [1:2] "2" "3" >> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com<http://tibco.com> >> >> >> On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich >> mailto:theodore.stankow...@csulb.edu>> wrote: >> Hello! I'm trying to create a subset of a dataset and then remove all rows >> with NAs in them. 
Ultimately, I am running phylogenetic analyses with trees >> that require the tree tiplabels to match exactly with the rows in the >> dataframe. But when I use na.omit to delete the rows with NAs, there is >> still a trace of those omitted rows in the data.frame, which then causes an >> error in the phylogenetic analyses. Is there any way to completely scrub >> those omitted rows from the dataframe? The code is below. As you can see >> from the result of the final str(Protect1) line, there are attributes with >> the omitted features still in the data
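[For the record, David's rowSums(is.na()) suggestion above amounts to subsetting with an index vector, which never attaches an "na.action" attribute in the first place. A minimal sketch on a made-up data frame:]

df <- data.frame(sp = c("a", "b", "c", "d"),
                 x  = c(1, NA, 3, 4),
                 y  = c(10, 20, NA, 40))
clean <- df[rowSums(is.na(df)) == 0, ]  # keep only rows with no NA in any column
attr(clean, "na.action")                # NULL -- nothing left behind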
Re: [R] na.omit not omitting rows
This worked! Thank you! -Original Message- From: Rui Barradas [mailto:ruipbarra...@sapo.pt] Sent: Thursday, June 4, 2020 2:49 PM To: Ted Stankowich ; William Dunlap Cc: r-help@r-project.org Subject: Re: [R] na.omit not omitting rows CAUTION: This email was sent from an external source. Use caution when replying, opening links or attachments. Hello, If the problem is the "na.action" attribute, here are two ways of solving it. First, an example data set. set.seed(2020)# Make the example reproducible phamComplBinomial <- sprintf("f%003d", 1:356) is.na(UphamComplBinomial) <- sample(356, 37) DarkEum <- factor(sample(1:2, 356, TRUE)) Protect1 <- data.frame(UphamComplBinomial = factor(UphamComplBinomial), DarkEum) 1. Setting the attribute "na.action" to NULL removes it Protect2 <- na.omit(Protect1) attributes(Protect2) attr(Protect2, "na.action") <- NULL attributes(Protect2) 2. Use an index vector to subset the data na <- is.na(Protect1$UphamComplBinomial) Protect3 <- Protect1[!na, ] The results are identical. But if you have more than one column with NA's, this second way will be more complicated. identical(Protect2, Protect3) #[1] TRUE Hope this helps, Rui Barradas Às 22:27 de 04/06/20, Ted Stankowich escreveu: > Thanks, but no that doesn’t work. The na.omit attributes are still in > the dataframe, which you can see in the str outputs from the post. The > problem line is likely: - attr(*, "na.action")= 'omit' Named int > [1:2] 2 3 > > From: William Dunlap [mailto:wdun...@tibco.com] > Sent: Thursday, June 4, 2020 12:39 PM > To: Ted Stankowich > Cc: r-help@r-project.org > Subject: Re: [R] na.omit not omitting rows > > CAUTION: This email was sent from an external source. Use caution when > replying, opening links or attachments. > > Does droplevels() help? > >> d <- data.frame(size = factor(c("S","M","M","L","L"), >> levels=c("S","M","L")), id=c(101,NA,NA,104,105)) >> str(d) > 'data.frame': 5 obs. of 2 variables: > $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3 > $ id : num 101 NA NA 104 105 >> str(na.omit(d)) > 'data.frame': 3 obs. of 2 variables: > $ size: Factor w/ 3 levels "S","M","L": 1 3 3 > $ id : num 101 104 105 > - attr(*, "na.action")= 'omit' Named int [1:2] 2 3 >..- attr(*, "names")= chr [1:2] "2" "3" >> str(droplevels(na.omit(d))) > 'data.frame': 3 obs. of 2 variables: > $ size: Factor w/ 2 levels "S","L": 1 2 2 > $ id : num 101 104 105 > - attr(*, "na.action")= 'omit' Named int [1:2] 2 3 >..- attr(*, "names")= chr [1:2] "2" "3" > > Bill Dunlap > TIBCO Software > wdunlap tibco.com<http://tibco.com> > > > On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich > mailto:theodore.stankow...@csulb.edu>> wrote: > Hello! I'm trying to create a subset of a dataset and then remove all rows > with NAs in them. Ultimately, I am running phylogenetic analyses with trees > that require the tree tiplabels to match exactly with the rows in the > dataframe. But when I use na.omit to delete the rows with NAs, there is still > a trace of those omitted rows in the data.frame, which then causes an error > in the phylogenetic analyses. Is there any way to completely scrub those > omitted rows from the dataframe? The code is below. As you can see from the > result of the final str(Protect1) line, there are attributes with the omitted > features still in the dataframe (356 species names in the UphamComplBinomial > factor, but only 319 observations). These traces are causing errors with the > phylo analyses. 
> >> Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, >> Shade)) #Create the dataframe with variables of interest from an >> attached dataset row.names(Protect1)=Protect1$UphamComplBinomial >> #assign species names as rownames >> Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing >> data >> str(Protect1) > 'data.frame': 319 obs. of 4 variables: > $ UphamComplBinomial: Factor w/ 356 levels > "Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 > 11 12 ... > $ DarkEum : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ... > $ NoctCrep : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ... > $ Shade : Factor w/ 59 levels
Re: [R] na.omit not omitting rows
Thanks, but no that doesn’t work. The na.omit attributes are still in the dataframe, which you can see in the str outputs from the post. The problem line is likely: - attr(*, "na.action")= 'omit' Named int [1:2] 2 3 From: William Dunlap [mailto:wdun...@tibco.com] Sent: Thursday, June 4, 2020 12:39 PM To: Ted Stankowich Cc: r-help@r-project.org Subject: Re: [R] na.omit not omitting rows CAUTION: This email was sent from an external source. Use caution when replying, opening links or attachments. Does droplevels() help? > d <- data.frame(size = factor(c("S","M","M","L","L"), levels=c("S","M","L")), > id=c(101,NA,NA,104,105)) > str(d) 'data.frame': 5 obs. of 2 variables: $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3 $ id : num 101 NA NA 104 105 > str(na.omit(d)) 'data.frame': 3 obs. of 2 variables: $ size: Factor w/ 3 levels "S","M","L": 1 3 3 $ id : num 101 104 105 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3 ..- attr(*, "names")= chr [1:2] "2" "3" > str(droplevels(na.omit(d))) 'data.frame': 3 obs. of 2 variables: $ size: Factor w/ 2 levels "S","L": 1 2 2 $ id : num 101 104 105 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3 ..- attr(*, "names")= chr [1:2] "2" "3" Bill Dunlap TIBCO Software wdunlap tibco.com<http://tibco.com> On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich mailto:theodore.stankow...@csulb.edu>> wrote: Hello! I'm trying to create a subset of a dataset and then remove all rows with NAs in them. Ultimately, I am running phylogenetic analyses with trees that require the tree tiplabels to match exactly with the rows in the dataframe. But when I use na.omit to delete the rows with NAs, there is still a trace of those omitted rows in the data.frame, which then causes an error in the phylogenetic analyses. Is there any way to completely scrub those omitted rows from the dataframe? The code is below. As you can see from the result of the final str(Protect1) line, there are attributes with the omitted features still in the dataframe (356 species names in the UphamComplBinomial factor, but only 319 observations). These traces are causing errors with the phylo analyses. > Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade)) > #Create the dataframe with variables of interest from an attached dataset > row.names(Protect1)=Protect1$UphamComplBinomial #assign species names as > rownames > Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing data > str(Protect1) 'data.frame': 319 obs. of 4 variables: $ UphamComplBinomial: Factor w/ 356 levels "Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 11 12 ... $ DarkEum : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ... $ NoctCrep : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ... $ Shade : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53 17 49 52 52 39 39 41 ... - attr(*, "na.action")= 'omit' Named int 6 7 23 36 37 40 42 50 51 60 ... ..- attr(*, "names")= chr "Alouatta_macconnelli_ATELIDAE_PRIMATES" "Alouatta_nigerrima_ATELIDAE_PRIMATES" "Ateles_fusciceps_ATELIDAE_PRIMATES" "Callicebus_baptista_PITHECIIDAE_PRIMATES" ... Dr. 
Ted Stankowich Associate Professor Department of Biological Sciences California State University Long Beach Long Beach, CA 90840 theodore.stankow...@csulb.edu<mailto:theodore.stankow...@csulb.edu><mailto:theodore.stankow...@csulb.edu<mailto:theodore.stankow...@csulb.edu>> 562-985-4826 http://www.csulb.edu/mammal-lab/ @CSULBMammalLab [[alternative HTML version deleted]] __ R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] na.omit not omitting rows
Hello! I'm trying to create a subset of a dataset and then remove all rows with NAs in them. Ultimately, I am running phylogenetic analyses with trees that require the tree tiplabels to match exactly with the rows in the dataframe. But when I use na.omit to delete the rows with NAs, there is still a trace of those omitted rows in the data.frame, which then causes an error in the phylogenetic analyses. Is there any way to completely scrub those omitted rows from the dataframe? The code is below. As you can see from the result of the final str(Protect1) line, there are attributes with the omitted features still in the dataframe (356 species names in the UphamComplBinomial factor, but only 319 observations). These traces are causing errors with the phylo analyses. > Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade)) > #Create the dataframe with variables of interest from an attached dataset > row.names(Protect1)=Protect1$UphamComplBinomial #assign species names as > rownames > Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing data > str(Protect1) 'data.frame': 319 obs. of 4 variables: $ UphamComplBinomial: Factor w/ 356 levels "Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 11 12 ... $ DarkEum : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ... $ NoctCrep : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ... $ Shade : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53 17 49 52 52 39 39 41 ... - attr(*, "na.action")= 'omit' Named int 6 7 23 36 37 40 42 50 51 60 ... ..- attr(*, "names")= chr "Alouatta_macconnelli_ATELIDAE_PRIMATES" "Alouatta_nigerrima_ATELIDAE_PRIMATES" "Ateles_fusciceps_ATELIDAE_PRIMATES" "Callicebus_baptista_PITHECIIDAE_PRIMATES" ... Dr. Ted Stankowich Associate Professor Department of Biological Sciences California State University Long Beach Long Beach, CA 90840 theodore.stankow...@csulb.edu<mailto:theodore.stankow...@csulb.edu> 562-985-4826 http://www.csulb.edu/mammal-lab/ @CSULBMammalLab [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
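[A hedged sketch of the cleanup the replies converge on, using the column names from the post but otherwise hypothetical data; building with data.frame() rather than cbind() also avoids coercing everything through a character matrix:]

Protect1 <- data.frame(UphamComplBinomial, DarkEum, NoctCrep, Shade)  # not cbind()
Protect1 <- na.omit(Protect1)            # drop rows containing NA
attr(Protect1, "na.action") <- NULL      # discard the record of the dropped rows
Protect1 <- droplevels(Protect1)         # drop factor levels for the removed species
rownames(Protect1) <- as.character(Protect1$UphamComplBinomial)
str(Protect1)                            # factor levels now match the remaining rows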
Re: [R] ks.test ; impossible to calculate exact exact value with ex-aequos
On Mon, 2018-12-10 at 22:17 +0100, Fatma Ell wrote:
> Dear all,
> I'm trying to use ks.test in order to compare two curve. I've 0 values i
> think this is why I have the follonwing warnings :impossible to calculate
> exact exact value with ex-aequos
>
> a=c(3.02040816326531, 7.95918367346939, 10.6162790697674, 4.64150943396226,
> 1.86538461538462, 1.125, 1.01020408163265, 1.2093023255814, 0.292452830188679,
> 0, 0, 0)
> b=c(2.30769230769231, 4.19252873563218, 5.81924882629108, 6.2248243559719,
> 5.02682926829268, 4.50728862973761, 3.61741424802111, 5.05479452054795,
> 3.68095238095238, 1.875, 5.25, 0)
>
> ks.test(a,b)
>
> data: a and b
> D = 0.58333, p-value = 0.0337
> alternative hypothesis: two-sided
>
> Warning message:
> In ks.test(a, b) :
> impossible to calculate exact exact value with ex-aequos
>
> Does the p-value is correct ? Otherwise, how to solve this issue ?
> Thanks a lot.

The warning arises not because you have "0" values as such, but because there are repeated values (which happen to be 0). The K-S test is designed for continuous random variables, for which the probability of repeated values is (theoretically) zero: theoretically, ties can't happen. From the help page ?ks.test:

"The presence of ties always generates a warning, since continuous distributions do not generate them. If the ties arose from rounding the tests may be approximately valid, but even modest amounts of rounding can have a significant effect on the calculated statistic."

The fact that your sample 'a' has three zeros along with nine other values which are all different from 0 (and all *very* different from 0 except for 0.292452830188679), together with the fact that your sample 'b' has 11 values all *very* different from 0 and one final value equal to 0, plus the fact that in each sample the '0' values occur at the end, strongly suggests that the data source is not one for which a K-S test is suitable.

The K-S test is a non-parametric test of whether (a) a given sample comes from a given kind of distribution, or (b) two samples are drawn from the same distribution. In either case, it is assumed that the sample values are drawn independently of each other; if there is some reason why they may not be independent of each other, the test is not valid.

You say "I'm trying to use ks.test in order to compare two curve". When I execute plot(a) and plot(b) on your data, I see (approximately) in each case a rise from a medium value (~2 or ~3) to a higher value (~6 or ~10) followed by a decline down to an exact 0. This is not the sort of situation that the K-S test is for!

Hoping this helps,
Ted.
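[If the test's assumptions did hold, one hedged way around the ties warning is a simple permutation p-value for the K-S statistic -- a sketch only, with the sample values abbreviated here:]

a <- c(3.020, 7.959, 10.616, 4.642, 1.865, 1.125, 1.010, 1.209, 0.292, 0, 0, 0)
b <- c(2.308, 4.193, 5.819, 6.225, 5.027, 4.507, 3.617, 5.055, 3.681, 1.875, 5.25, 0)
obs <- suppressWarnings(ks.test(a, b)$statistic)  # ties trigger the warning
set.seed(1)
pooled <- c(a, b)
perm <- replicate(9999, {
  idx <- sample(length(pooled), length(a))
  suppressWarnings(ks.test(pooled[idx], pooled[-idx])$statistic)
})
mean(c(perm, obs) >= obs)  # permutation p-value; unaffected by the ties warning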
Re: [R] Ignoring the domain of RV in punif()
Well, as a final (I hope!) clarification: It is not the case that "the bigger cube does not exists (because the Q is the reference space)". It does exist! Simply, the probability of the random point being in the bigger cube, and NOT in the cube Q, is 0. Hence "the cumulative probability of reaching a point outside the cube (u or v or w > A) is 1" is badly phrased. The "cumulative probability" is not the probability of *reaching* a point, but of being (in the case of a real random variable) less than or equal to the given value. If Prob[X <= x1] = 1, then Prob[X > x1] = 0. Hence if x0 is the minimum value such that Prob[X <= x0] = 1, then X "can reach" x0. But for any x1 > x0, Prob[x0 < X <= x1] = 0. Therefore, since X cannot be greater than x0, X *cannot reach* x1! Best wishes, Ted. On Tue, 2018-10-23 at 12:06 +0100, Hamed Ha wrote: > Hi Ted, > > Thanks for the explanation. > > I am convinced at least more than average by Eric and your answer. But > still have some shadows of confusion that is definitely because I have > forgotten some fundamentals in probabilities. > > In your cube example, the cumulative probability of reaching a point > outside the cube (u or v or w > A) is 1 however, the bigger cube does not > exists (because the Q is the reference space). Other words, I feel that we > extend the space to accommodate any cube of any size! Looks a bit weird to > me! > > > Hamed. > > On Tue, 23 Oct 2018 at 11:52, Ted Harding wrote: > > > Sorry -- stupid typos in my definition below! > > See at ===*** below. > > > > On Tue, 2018-10-23 at 11:41 +0100, Ted Harding wrote: > > Before the ticket finally enters the waste bin, I think it is > > necessary to explicitly explain what is meant by the "domain" > > of a random variable. This is not (though in special cases > > could be) the space of possible values of the random variable. > > > > Definition of (real-valued) Random Variable (RV): > > Let Z be a probability space, i.e. a set {z} of entities z > > on which a probability distribution is defined. The entities z > > do not need to be numeric. A real-valued RV X is a function > > X:Z --> R defined on Z such that, for any z in Z, X(z) is a > > real number. The set Z, in tthis context, is (by definitipon) > > the *domain* of X, i.e. the space on which X is defined. > > It may or may not be (and usually is not) the same as the set > > of possible values of X. > > > > Then. given any real value x0, the CDF of X at x- is Prob[X <= X0]. > > The distribution function of X does not define the domain of X. > > > > As a simple exam[ple: Suppose Q is a cube of side A, consisting of > > points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space > > of points z with a uniform distribution of position within Q. > > Define the random variable X:Q --> [0,1] as > > ===*** > > X[u,v,w) = x/A > > > > Wrong! That should have been: > > > > X[u,v,w) = w/A > > ===*** > > Then X is uniformly distributed on [0,1], the domain of X is Q. > > Then for x <= 0 _Prob[X <= x] = 0, for 0 <= x <= 1 Prob(X >=x] = x, > > for x >= 1 Prob(X <= x] = 1. These define the CDF. The set of poaaible > > values of X is 1-dimensional, and is not the same as the domain of X, > > which is 3-dimensional. > > > > Hopiong this helps! > > Ted. > > > > On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote: > > > > Yes, now it makes more sense. > > > > > > > > Okay, I think that I am convinced and we can close this ticket. > > > > > > > > Thanks Eric. > > > > Regards, > > > > Hamed. 
> > > > > > > > On Tue, 23 Oct 2018 at 10:42, Eric Berger > > wrote: > > > > > > > > > Hi Hamed, > > > > > That reference is sloppy. Try looking at > > > > > https://en.wikipedia.org/wiki/Cumulative_distribution_function > > > > > and in particular the first example which deals with a Unif[0,1] r.v. > > > > > > > > > > Best, > > > > > Eric > > > > > > > > > > > > > > > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha > > wrote: > > > > > > > > > >> Hi Eric, > > > > >> > > > > >> Thank you for your reply. > > > > >> > > > > >> I should say that your justification makes sense to me. However, I > > am in > > > > >> doubt that CDF defines
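[A small simulation of the cube construction discussed in this thread -- a sketch, not from the original posts: Z is uniform in a cube of side A and X(u,v,w) = w/A, so X is Uniform[0,1] and its empirical CDF agrees with punif() everywhere on the real line, including outside [0,1], where the value is simply 0 or 1.]

set.seed(42)
A <- 3
w <- runif(1e5, 0, A)          # third coordinate of points uniform in the cube
x <- w / A                     # the random variable X defined on the cube
xs <- c(-0.5, 0.25, 0.75, 1.5)
rbind(empirical = ecdf(x)(xs), theoretical = punif(xs))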
Re: [R] Ignoring the domain of RV in punif()
Sorry -- stupid typos in my definition below! See at ===*** below. On Tue, 2018-10-23 at 11:41 +0100, Ted Harding wrote: Before the ticket finally enters the waste bin, I think it is necessary to explicitly explain what is meant by the "domain" of a random variable. This is not (though in special cases could be) the space of possible values of the random variable. Definition of (real-valued) Random Variable (RV): Let Z be a probability space, i.e. a set {z} of entities z on which a probability distribution is defined. The entities z do not need to be numeric. A real-valued RV X is a function X:Z --> R defined on Z such that, for any z in Z, X(z) is a real number. The set Z, in tthis context, is (by definitipon) the *domain* of X, i.e. the space on which X is defined. It may or may not be (and usually is not) the same as the set of possible values of X. Then. given any real value x0, the CDF of X at x- is Prob[X <= X0]. The distribution function of X does not define the domain of X. As a simple exam[ple: Suppose Q is a cube of side A, consisting of points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space of points z with a uniform distribution of position within Q. Define the random variable X:Q --> [0,1] as ===*** X[u,v,w) = x/A Wrong! That should have been: X[u,v,w) = w/A ===*** Then X is uniformly distributed on [0,1], the domain of X is Q. Then for x <= 0 _Prob[X <= x] = 0, for 0 <= x <= 1 Prob(X >=x] = x, for x >= 1 Prob(X <= x] = 1. These define the CDF. The set of poaaible values of X is 1-dimensional, and is not the same as the domain of X, which is 3-dimensional. Hopiong this helps! Ted. On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote: > > Yes, now it makes more sense. > > > > Okay, I think that I am convinced and we can close this ticket. > > > > Thanks Eric. > > Regards, > > Hamed. > > > > On Tue, 23 Oct 2018 at 10:42, Eric Berger wrote: > > > > > Hi Hamed, > > > That reference is sloppy. Try looking at > > > https://en.wikipedia.org/wiki/Cumulative_distribution_function > > > and in particular the first example which deals with a Unif[0,1] r.v. > > > > > > Best, > > > Eric > > > > > > > > > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha wrote: > > > > > >> Hi Eric, > > >> > > >> Thank you for your reply. > > >> > > >> I should say that your justification makes sense to me. However, I am in > > >> doubt that CDF defines by the Pr(x <= X) for all X? that is the domain of > > >> RV is totally ignored in the definition. > > >> > > >> It makes a conflict between the formula and the theoretical definition. > > >> > > >> Please see page 115 in > > >> > > >> https://books.google.co.uk/books?id=FEE8D1tRl30C&printsec=frontcover&dq=statistical+distribution&hl=en&sa=X&ved=0ahUKEwjp3PGZmJzeAhUQqxoKHV7OBJgQ6AEIKTAA#v=onepage&q=uniform&f=false > > >> The > > >> > > >> > > >> Thanks. > > >> Hamed. > > >> > > >> > > >> > > >> On Tue, 23 Oct 2018 at 10:21, Eric Berger wrote: > > >> > > >>> Hi Hamed, > > >>> I disagree with your criticism. > > >>> For a random variable X > > >>> X: D - - - > R > > >>> its CDF F is defined by > > >>> F: R - - - > [0,1] > > >>> F(z) = Prob(X <= z) > > >>> > > >>> The fact that you wrote a convenient formula for the CDF > > >>> F(z) = (z-a)/(b-a) a <= z <= b > > >>> in a particular range for z is your decision, and as you noted this > > >>> formula will give the wrong value for z outside the interval [a,b]. 
> > >>> But the problem lies in your formula, not the definition of the CDF > > >>> which would be, in your case: > > >>> > > >>> F(z) = 0 if z <= a > > >>>= (z-a)/(b-a) if a <= z <= b > > >>>= 1 if 1 <= z > > >>> > > >>> HTH, > > >>> Eric > > >>> > > >>> > > >>> > > >>> > > >>> On Tue, Oct 23, 2018 at 12:05 PM Hamed Ha wrote: > > >>> > > >>>> Hi All, > > >>>> > > >>>> I recently discovered an interesting issue with the punif() function. > > >>>> Let > > >>>> X~Uiform[a,b] then the CDF is defined by F(x)=(x-a)/(b-a) fo
Re: [R] Ignoring the domain of RV in punif()
Before the ticket finally enters the waste bin, I think it is necessary to explicitly explain what is meant by the "domain" of a random variable. This is not (though in special cases could be) the space of possible values of the random variable.

Definition of (real-valued) Random Variable (RV):
Let Z be a probability space, i.e. a set {z} of entities z on which a probability distribution is defined. The entities z do not need to be numeric. A real-valued RV X is a function X:Z --> R defined on Z such that, for any z in Z, X(z) is a real number. The set Z, in this context, is (by definition) the *domain* of X, i.e. the space on which X is defined. It may or may not be (and usually is not) the same as the set of possible values of X.

Then, given any real value x0, the CDF of X at x0 is Prob[X <= x0]. The distribution function of X does not define the domain of X.

As a simple example: Suppose Q is a cube of side A, consisting of points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space of points z with a uniform distribution of position within Q. Define the random variable X:Q --> [0,1] as

X(u,v,w) = w/A

Then X is uniformly distributed on [0,1], and the domain of X is Q. Then for x <= 0, Prob[X <= x] = 0; for 0 <= x <= 1, Prob[X <= x] = x; for x >= 1, Prob[X <= x] = 1. These define the CDF. The set of possible values of X is 1-dimensional, and is not the same as the domain of X, which is 3-dimensional.

Hoping this helps!
Ted.

On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote: > Yes, now it makes more sense. > > Okay, I think that I am convinced and we can close this ticket. > > Thanks Eric. > Regards, > Hamed. > > On Tue, 23 Oct 2018 at 10:42, Eric Berger wrote: > > > Hi Hamed, > > That reference is sloppy. Try looking at > > https://en.wikipedia.org/wiki/Cumulative_distribution_function > > and in particular the first example which deals with a Unif[0,1] r.v. > > > > Best, > > Eric > > > > > > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha wrote: > > > >> Hi Eric, > >> > >> Thank you for your reply. > >> > >> I should say that your justification makes sense to me. However, I am in > >> doubt that CDF defines by the Pr(x <= X) for all X? that is the domain of > >> RV is totally ignored in the definition. > >> > >> It makes a conflict between the formula and the theoretical definition. > >> > >> Please see page 115 in > >> > >> https://books.google.co.uk/books?id=FEE8D1tRl30C&printsec=frontcover&dq=statistical+distribution&hl=en&sa=X&ved=0ahUKEwjp3PGZmJzeAhUQqxoKHV7OBJgQ6AEIKTAA#v=onepage&q=uniform&f=false > >> The > >> > >> > >> Thanks. > >> Hamed. > >> > >> > >> > >> On Tue, 23 Oct 2018 at 10:21, Eric Berger wrote: > >> > >>> Hi Hamed, > >>> I disagree with your criticism. > >>> For a random variable X > >>> X: D - - - > R > >>> its CDF F is defined by > >>> F: R - - - > [0,1] > >>> F(z) = Prob(X <= z) > >>> > >>> The fact that you wrote a convenient formula for the CDF > >>> F(z) = (z-a)/(b-a) a <= z <= b > >>> in a particular range for z is your decision, and as you noted this > >>> formula will give the wrong value for z outside the interval [a,b]. > >>> But the problem lies in your formula, not the definition of the CDF > >>> which would be, in your case: > >>> > >>> F(z) = 0 if z <= a > >>>= (z-a)/(b-a) if a <= z <= b > >>>= 1 if 1 <= z > >>> > >>> HTH, > >>> Eric > >>> > >>> > >>> > >>> > >>> On Tue, Oct 23, 2018 at 12:05 PM Hamed Ha wrote: > >>> > >>>> Hi All, > >>>> > >>>> I recently discovered an interesting issue with the punif() function. 
> >>>> Let > >>>> X~Uiform[a,b] then the CDF is defined by F(x)=(x-a)/(b-a) for (a<= x<= > >>>> b). > >>>> The important fact here is the domain of the random variable X. Having > >>>> said > >>>> that, R returns CDF for any value in the real domain. > >>>> > >>>> I understand that one can justify this by extending the domain of X and > >>>> assigning zero probabilities to the values outside the domain. However, > >>>> theoretically, it is not true to return a value for the CDF outside the > >>>> d
Re: [R] differing behavior of mean(), median() and sd() with na.rm
I think that one can usefully look at this question from the point of view of what "NaN" and "NA" are abbreviations for (at any rate, according to the understanding I have adopted for many years -- maybe over-simplified):

NaN: Not a Number
NA: Not Available

So NA is typically used for missing values, whereas NaN represents the result of a numerical calculation which cannot give a definite number. Hence 0/0 is not a number, so NaN; similarly Inf/Inf.

Thus, with your x <- c(NA, NA, NA) and mean(x, na.rm=TRUE): sum(x, na.rm=TRUE) = 0, and since the set of values of x with na.rm=TRUE is empty the number of elements is 0; hence mean = 0/0 = NaN.

But for median(x, na.rm=TRUE), because there are no available elements in x with na.rm=TRUE, and the median is found by searching among available elements for the value which divides the set of values into two halves, the median is not available, hence NA.

Best wishes to all,
Ted.

On Wed, 2018-08-22 at 11:24 -0400, Marc Schwartz via R-help wrote: > Hi, > > It might even be worthwhile to review this recent thread on R-Devel: > > https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html > > which touches upon a subtly related topic vis-a-vis NaN handling. > > Regards, > > Marc Schwartz > > > > On Aug 22, 2018, at 10:55 AM, Bert Gunter wrote: > > > > ... And FWIW (not much, I agree), note that if z = numeric(0) and sum(z) = > > 0, then mean(z) = NaN makes sense, as length(z) = 0, so dividing by 0 gives > > NaN. So you can see the sorts of issues you may need to consider. > > > > Bert Gunter > > > > "The trouble with having an open mind is that people keep coming along and > > sticking things into it." > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > > > > On Wed, Aug 22, 2018 at 7:47 AM Bert Gunter wrote: > > > >> Actually, the dissonance is a bit more basic. > >> > >> After xxx(, na.rm=TRUE) with all NA's in ... you have numeric(0). So > >> what you see is actually: > >> > >>> z <- numeric(0) > >>> mean(z) > >> [1] NaN > >>> median(z) > >> [1] NA > >>> sd(z) > >> [1] NA > >>> sum(z) > >> [1] 0 > >> etc. > >> > >> I imagine that there may be more of these little inconsistencies due to > >> the organic way R evolved over time. What the conventions should be can be > >> purely a matter of personal opinion in the absence of accepted standards. > >> But I would look to see what accepted standards were, if any, first. > >> > >> -- Bert > >> > >> > >> On Wed, Aug 22, 2018 at 7:34 AM Ivan Calandra wrote: > >> > >>> Dear useRs, > >>> > >>> I have just noticed that when input is only NA with na.rm=TRUE, mean() > >>> results in NaN, whereas median() and sd() produce NA. Shouldn't it all > >>> be the same? I think NA makes more sense than NaN in that case. > >>> > >>> x <- c(NA, NA, NA) mean(x, na.rm=TRUE) [1] NaN median(x, na.rm=TRUE) [1] > >>> NAsd(x, na.rm=TRUE) [1] NA > >>> > >>> Thanks for any feedback. > >>> > >>> Best, > >>> Ivan > >>> > >>> -- > >>> Dr. 
Ivan Calandra > >>> TraCEr, laboratory for Traceology and Controlled Experiments > >>> MONREPOS Archaeological Research Centre and > >>> Museum for Human Behavioural Evolution > >>> Schloss Monrepos > >>> 56567 Neuwied, Germany > >>> +49 (0) 2631 9772-243 > >>> https://www.researchgate.net/profile/Ivan_Calandra > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > >> > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
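[A short illustration of the distinction drawn above -- a sketch:]

0/0                      # NaN -- a calculation that cannot yield a number
Inf - Inf                # NaN
c(1, NA, 3)              # NA marks a value that is simply not available
x <- c(NA, NA, NA)
mean(x, na.rm = TRUE)    # NaN: a sum of 0 divided by a count of 0
median(x, na.rm = TRUE)  # NA:  no elements left to pick a middle value from
sd(x, na.rm = TRUE)      # NA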
Re: [R] Error custom indicator Quantstrat colnames
Pietro, Please post this to r-help@r-project.org not to r-help-ow...@r-project.org which is a mailing liat concerned with list management, and does not deal with questions regarding the use of R. Best wishes, Ted. On Sat, 2018-07-14 at 13:04 +, Pietro Fabbro via R-help wrote: > I will try to be as clear as possible as I have been rebuked by some users. I > deleted the last questions and I will try to be sufficiently explicative in > this one. I apologize if the data I will insert will not be enough. > > So, I am trying to run a strategy through the package Quantstrat. > > install.packages("quantstrat") > My problem is that I get the following error > > Error incolnames<-(tmp, value = seq(ncol(tmp_val))) : > attempt to set 'colnames' on an object with less than two dimensions > > when I try to run the following command: > > > out <- applyStrategy(strategy=strategy.st,portfolios=portfolio.st) > I do not have this problem if I use, as indicator, one or more indicators, > which are already defined by the package TTR. > > I have this error only when I try to use a custom indicator. Here is the code > for the custom indicator that I use: > > wma <- WMA(Cl(mktdata), 4, wts=c(1:4)) > wmamaxt <- rollmaxr(wma, 30, fill = NA) > wmamint <- - rollmaxr(- wma, 30, fill = NA) > CNOwma <- function (mktdata=quote(mktdata),x) {(wma - wmamint) / (wmamaxt - > wmamint)} > Please refer to the following code: > > library(devtools) > library(quantmod) > library(quantstrat) > library(TTR) > library(png) > library(IKTrading) > > wma <- WMA(Cl(mktdata), 4, wts=c(1:4)) > wmamaxt <- rollmaxr(wma, 30, fill = NA) > wmamint <- - rollmaxr(- wma, 30, fill = NA) > CNOwma <- function (mktdata=quote(mktdata),x) {(wma - wmamint) / (wmamaxt - > wmamint)} > initdate <- "2010-01-01" > from <- "2012-01-01" #start of backtest > to <- "2017-31-12" #end of backtest > > Sys.setenv(TZ= "EST") #Set up environment for timestamps > > currency("USD") #Set up environment for currency to be used > > symbols <- c("RUT", "IXIC") #symbols used in our backtest > getSymbols(Symbols = symbols, src = "google", from=from, to=to, adjust = > TRUE) #receive data from google finance, adjusted for splits/dividends > > stock(symbols, currency = "USD", multiplier = 1) #tells quanstrat what > instruments present and what currency to use > > tradesize <-1 #default trade size > initeq <- 10 #default initial equity in our portfolio > > strategy.st <- portfolio.st <- account.st <- "firststrat" #naming strategy, > portfolio and account > > #removes old portfolio and strategy from environment > rm.strat(portfolio.st) > rm.strat(strategy.st) > > #initialize portfolio, account, orders and strategy objects > initPortf(portfolio.st, symbols = symbols, initDate = initdate, currency = > "USD") > > initAcct(account.st, portfolios = portfolio.st, initDate = initdate, currency > = "USD", initEq = initeq) > > initOrders(portfolio.st, initDate = initdate) > strategy(strategy.st, store=TRUE) > > add.indicator(strategy = strategy.st, > name = 'CNOwma', > arguments = list(x = quote(Cl(mktdata)), n=4), > label = 'CNOwma4') > > > > > > add.signal(strategy.st, name = "sigThreshold", > arguments = list(column = "CNOwma4", threshold = 0.6, > relationship = "gt", cross = TRUE), > label = "longthreshold") > > > add.signal(strategy.st, name = "sigThreshold", > arguments = list(column = "CNOwma4", threshold = 0.6, > relationship = "lt", cross = TRUE), > label = "shortthreshold") > > > > > add.rule(strategy.st, name = "ruleSignal", > arguments = list(sigcol = "longthreshold", sigval = 
TRUE, > orderqty = "all", ordertype = "market", > orderside = "long", replace = FALSE, > prefer = "Open"), > type = "enter") > > > add.rule(strategy.st, name = "ruleSignal", > arguments = list(sigcol = "shortthreshold", sigval = TRUE, > orderqty = "all", ordertype = "market", > orderside = "long", replace = FALSE, > prefer = "Open"), > type = "exit") > > add.rule(strategy.st, name = "ruleSignal", > arguments = list(sigcol = "shortthreshold", sigval = TRUE, > orderqty = "all", ordertype = "market", > orderside = "short", replac
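[A hedged sketch, not the poster's exact fix: make the custom indicator a self-contained function of its arguments and give its result a column name, since the "attempt to set 'colnames'" error suggests applyStrategy() received a dimensionless vector. The function body assumes x is an xts series such as Cl(mktdata); the column label "cno" is illustrative.]

library(TTR)   # WMA()
library(zoo)   # rollmaxr()

CNOwma <- function(x, n = 4, lookback = 30) {
  wma    <- WMA(x, n = n, wts = seq_len(n))
  wmamax <- rollmaxr(wma, lookback, fill = NA)
  wmamin <- -rollmaxr(-wma, lookback, fill = NA)
  out    <- (wma - wmamin) / (wmamax - wmamin)
  colnames(out) <- "cno"   # a named, two-dimensional result for applyStrategy()
  out
}

## then, as in the post:
## add.indicator(strategy.st, name = "CNOwma",
##               arguments = list(x = quote(Cl(mktdata)), n = 4),
##               label = "CNOwma4")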
Re: [R] prod(NaN, NA) vs. prod(NA, NaN)
I've been following this thread, and wondering where it might lead. My (possibly naive) view of these matters is basically logical, relying on (possibly over-simplified) interpretations of "NA" and "NaN". These are that:

"NaN" means "Not a Number", though it can result from a numerical calculation, e.g. '0/0' or 'Inf/Inf', while:
"NA" means "Not Available" (e.g. "Missing Value"), but should be interpreted as being in the appropriate class for its context -- so '2 + NA' interprets "NA" as numeric, while 'c("abc", NA)' interprets "NA" as character.

On that basis, the result of 'sum(NaN, ...)' should be "NaN", since 'sum' presumes that its arguments are numeric, and a sum involving NaN is not a number. Likewise 'sum(..., NaN)' should be NaN. And in both of 'sum(NA, NaN)' and 'sum(NaN, NA)', the "NA" should be interpreted as a "not-available number", and because of the "NaN" the result cannot be a number, hence is "NaN".

So it would seem that Martin Møller Skarbiniks Pedersen's inconsistency:

sum(c(NaN,NA))
[1] NaN
sum(NaN,NA)
[1] NA

is not consistent with the above reasoning. However, in my R version 2.14.0 (2011-10-31):

sum(NaN,NA)
[1] NA
sum(NA,NaN)
[1] NA

which **is** consistent!

Hmmm...
Best wishes to all,
Ted.

On Wed, 2018-07-04 at 12:06 +0100, Barry Rowlingson wrote: > I'm having deja-vu of a similar discussion on R-devel: > > https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html > > This was the funniest inconsistency I could find: > > > sum(c(NaN,NA)) > [1] NaN > > sum(NaN,NA) > [1] NA > > THEY'RE IN THE SAME ORDER!!! > > The doc in ?NaN has this clause: > > In R, basically all mathematical functions (including basic > ‘Arithmetic’), are supposed to work properly with ‘+/- Inf’ and > ‘NaN’ as input or output. > > which doesn't define "properly", but you'd think commutativity was a > "proper" property of addition. So although they "are supposed to" they > don't. Naughty mathematical functions! > > And then there's... > > Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’: > which of those two is not guaranteed and may depend on the R > platform (since compilers may re-order computations). > > Which is at least telling you there is vagueness in the system. But > hey, mathematics is not a precise science... oh wait... > > Barry > > > > > > On Tue, Jul 3, 2018 at 10:09 PM, Rolf Turner wrote: > > > > On 04/07/18 00:24, Martin Møller Skarbiniks Pedersen wrote: > > > >> Hi, > >>I am currently using R v3.4.4 and I just discovered this: > >> > >>> prod(NA, NaN) ; prod(NaN, NA) > >> [1] NA > >> [1] NaN > >> > >> ?prod says: > >> If ‘na.rm’ is ‘FALSE’ an ‘NA’ value in any of the arguments will > >> cause a value of ‘NA’ to be returned, otherwise ‘NA’ values are > >> ignored. > >> > >> So according to the manual-page for prod() NA should be returned in both > >> cases? > >> > >> > >> However for sum() is opposite is true: > >>> sum(NA, NaN) ; sum(NaN, NA) > >> [1] NA > >> [1] NA > >> > >> ?sum says: > >> If ‘na.rm’ is ‘FALSE’ an ‘NA’ or ‘NaN’ value in any of the > >> arguments will cause a value of ‘NA’ or ‘NaN’ to be returned, > >> otherwise ‘NA’ and ‘NaN’ values are ignored. > >> > >> > >> Maybe the manual for prod() should say the same as sum() that > >> both NA and NaN can be returned? > > > > But: > > > > > sum(NA,NaN) > > [1] NA > > > sum(NaN,NA) > > [1] NA > > > > so sum gives NA "both ways around". Perhaps a slight inconsistency > > here? I doubt that it's worth losing any sleep over, however. 
> > > > Interestingly (???): > > > > > NaN*NA > > [1] NaN > > > NA*NaN > > [1] NA > > > NaN+NA > > [1] NaN > > > NA+NaN > > [1] NA > > > > So we have an instance of non-commutative arithmetic operations. And > > sum() is a wee bit inconsistent with "+". > > > > Again I doubt that the implications are all that serious. > > > > cheers, > > > > Rolf Turner > > > > -- > > Technical Editor ANZJS > > Department of Statistics > > University of Auc
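[If it matters which of NA or NaN comes back, it may be safer to test explicitly than to rely on the order-dependent propagation discussed above. A hedged sketch; safe_prod is a made-up helper name:]

safe_prod <- function(x) {
  if (any(is.na(x) & !is.nan(x))) return(NA_real_)  # a genuine missing value is present
  if (any(is.nan(x))) return(NaN)                   # only NaN "contaminants"
  prod(x)
}
safe_prod(c(NA, NaN))  # NA
safe_prod(c(NaN, NA))  # NA -- the same either way round
safe_prod(1:4)         # 24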
Re: [R] R maintains old values
On Tue, Jul 3, 2018 at 9:25 AM, J C Nash wrote: > > > . . . Now, to add to the controversy, how do you set a computer on fire? > > > > JN Perhaps by exploring the context of this thread, where new values strike a match with old values??? Ted __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] OT --- grammar.
On Mon, 2018-06-25 at 09:46 +1200, Rolf Turner wrote:
> Does/should one say "the degrees of freedom is defined to be" or "the
> degrees of freedom are defined to be"?
>
> Although the value of "degrees of freedom" is a single number, the first
> formulation sounds very odd to my ear.
>
> I would like to call upon the collective wisdom of the R community to
> help me decide.
>
> Thanks, and my apologies for the off-topic post.
>
> cheers,
> Rolf Turner

Interesting question, Rolf!

From my point of view, I see "degrees of freedom" as a plural noun, because of "degrees". But in some cases we have only 1 degree of freedom; then the degrees of freedom is 1. Yet we do not say, in that case, "the degree of freedom is defined to be", or "the degree of freedom are 1". Nor would we say "The degrees of freedom are 19"!

So I think that the solution is to encapsulate the term within single quotes, so that it becomes a singular entity. Thus: "The 'degrees of freedom' is defined to be ..."; and "The 'degrees of freedom' is 1." Or "The 'degrees of freedom' is 19."

This is not the same issue as (one of my prime hates) saying "the data is stored in the dataframe ...". "Data" is a plural noun (singular "datum"), and I would insist on "the data are stored ...". The French use "une donnée" and "les données"; the Germans use "ein Datum" and "die Daten"; so they know what they're doing! English speakers mostly do not!

Best wishes to all,
Ted.
Re: [R] mysterious rounding digits output
Thanks Martin! Good to know that you have made this important change, And, regarding Maybe we should additionally say that this is *not* round()ing, and give a link to the help for signif() ? I think that also would be most useful. In fact, ?signif leads to a whole survey of "Rounding of Numbers", covering the functions ceiling(), floor(), trunc(), round(), signif(). Well worth reading! Best wishes, Ted. On Thu, 2018-05-31 at 08:58 +0200, Martin Maechler wrote: > >>>>> Ted Harding > >>>>> on Thu, 31 May 2018 07:10:32 +0100 writes: > > > Well pointed out, Jim! > > > It is infortunate that the documentation for options(digits=...) > > does not mention that these are *significant digits* > > and not *decimal places* (which is what Joshua seems to want): > > Since R 3.4.0 the help on ?options *does* say significant! > > The change in R's source code was this one, 14 months ago : > > r72380 | maechler | 2017-03-21 11:28:13 +0100 (Tue, 21. Mar 2017) | 2 Zeilen > GeƤnderte Pfade: >M /trunk/src/library/base/man/options.Rd > > digits: + "signficant" > > > and since then, the text has been > > ‘digits’: controls the number of significant digits to print when > printing numeric values. It is a suggestion only. > ... > > whereas what you (Ted) cite is from R 3.3.x and earlier > > > "‘digits’: controls the number of digits to print when > > printing numeric values." > > Maybe we should additionally say that this is *not* round()ing, > and give a link to the help for signif() ? > > > > On the face of it, printing the value "0,517" of 'ccc' looks > > like printing 4 digits! Joshua's could look even worse if 'ddd' > > had values in the 1000s! > > > To achieve exactly what Joshua seems to want, use the round() > > function. Starting with his original assignment of values to > > the variable itemInfo, the result of round(itemInfo,digits=3) is: > > > aaa bbb ccc dddeee > > skill 1.396 6.225 0.517 5.775 2.497 > > predict 1.326 5.230 0.462 5.116 -2.673 > > waiting 1.117 4.948NANA NA > > complex 1.237 4.170 0.220 4.713 5.642 > > novelty 1.054 4.005 0.442 4.260 2.076 > > creative 1.031 3.561 0.362 3.689 2.549 > > evaluated 0.963 3.013NANA NA > > body 0.748 2.238 0.596 2.019 0.466 > > control 0.620 2.149NANA NA > > stakes0.541 1.905 0.227 2.020 2.343 > > spont 0.496 1.620NANA NA > > chatter 0.460 1.563 0.361 1.382 0.540 > > present 0.428 1.236 0.215 1.804 2.194 > > reward0.402 1.101 0.288 1.208 0.890 > > feedback 0.283 0.662NANA NA > > goal 0.237 0.474NANA NA > > > Best wishes to all, > > Ted. > > > > On Thu, 2018-05-31 at 15:30 +1000, Jim Lemon wrote: > >> Hi Joshua, > >> Because there are no values in column ddd less than 1. 
> >> > >> itemInfo[3,"ddd"]<-0.3645372 > >> itemInfo > >> aaa bbb ccc dddeee > >> skill 1.396 6.225 0.517 5.775 2.497 > >> predict 1.326 5.230 0.462 5.116 -2.673 > >> waiting 1.117 4.948NA 0.365 NA > >> complex 1.237 4.170 0.220 4.713 5.642 > >> novelty 1.054 4.005 0.442 4.260 2.076 > >> creative 1.031 3.561 0.362 3.689 2.549 > >> evaluated 0.963 3.013NANA NA > >> body 0.748 2.238 0.596 2.019 0.466 > >> control 0.620 2.149NANA NA > >> stakes0.541 1.905 0.227 2.020 2.343 > >> spont 0.496 1.620NANA NA > >> chatter 0.460 1.563 0.361 1.382 0.540 > >> present 0.428 1.236 0.215 1.804 2.194 > >> reward0.402 1.101 0.288 1.208 0.890 > >> feedback 0.283 0.662NANA NA > >> goal 0.237 0.474NANA NA > >> > >> digits specifies the number of significant digits ("It is a suggestion > >> only"), so when at least one number is padded with a leading zero it > >> affects the formatting of the other numbers in that column. I suspect > >> that this is an esthetic consideration to line
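[For reference, the rounding-related functions surveyed on that help page, side by side -- a quick sketch:]

x <- 1234.5678
ceiling(x)           # 1235
floor(x)             # 1234
trunc(x)             # 1234
round(x, 2)          # 1234.57 -- 2 decimal places
signif(x, 3)         # 1230    -- 3 significant digits
options(digits = 3)  # affects how numbers are *printed*, not their stored values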
Re: [R] mysterious rounding digits output
Well pointed out, Jim!

It is unfortunate that the documentation for options(digits=...) does not mention that these are *significant digits* and not *decimal places* (which is what Joshua seems to want):

"‘digits’: controls the number of digits to print when printing numeric values."

On the face of it, printing the value "0.517" of 'ccc' looks like printing 4 digits! Joshua's output could look even worse if 'ddd' had values in the 1000s!

To achieve exactly what Joshua seems to want, use the round() function. Starting with his original assignment of values to the variable itemInfo, the result of round(itemInfo, digits=3) is:

            aaa   bbb   ccc   ddd    eee
skull     1.396 6.225 0.517 5.775  2.497
predict   1.326 5.230 0.462 5.116 -2.673
waiting   1.117 4.948    NA    NA     NA
complex   1.237 4.170 0.220 4.713  5.642
novelty   1.054 4.005 0.442 4.260  2.076
creative  1.031 3.561 0.362 3.689  2.549
evaluated 0.963 3.013    NA    NA     NA
body      0.748 2.238 0.596 2.019  0.466
control   0.620 2.149    NA    NA     NA
stakes    0.541 1.905 0.227 2.020  2.343
spont     0.496 1.620    NA    NA     NA
chatter   0.460 1.563 0.361 1.382  0.540
present   0.428 1.236 0.215 1.804  2.194
reward    0.402 1.101 0.288 1.208  0.890
feedback  0.283 0.662    NA    NA     NA
goal      0.237 0.474    NA    NA     NA

Best wishes to all,
Ted.

On Thu, 2018-05-31 at 15:30 +1000, Jim Lemon wrote:
> Hi Joshua,
> Because there are no values in column ddd less than 1. 
> > Jim > > > On Thu, May 31, 2018 at 11:18 AM, Joshua N Pritikin > wrote: > > R version 3.5.0 (2018-04-23) -- "Joy in Playing" > > Platform: x86_64-pc-linux-gnu (64-bit) > > > > options(digits=3) > > > > itemInfo <- structure(list("aaa" = c(1.39633732316667, 1.32598263816667, > > 1.1165832407, 1.23651072616667, 1.0536867998, 1.0310073738, > > 0.9630728395, 0.7483865045, 0.62008664617, 0.5411017985, > > 0.49639760783, 0.45952804467, 0.42787704783, 0.40208597967, > > 0.28316118123, 0.23689627723), "bbb" = c(6.22533860696667, > > 5.229736804, 4.94816041772833, 4.17020503255333, 4.00453781427167, > > 3.56058007398333, 3.0125202404, 2.2378235873, 2.14863910661167, > > 1.90460903044777, 1.62001089796667, 1.56341257968151, 1.23618558850667, > > 1.10086688908262, 0.661981500639833, 0.47397754310745), "ccc" = > > c(0.5165911355, 0.46220470767, NA, 0.21963592433, > > 0.44186378083, 0.36150286583, NA, 0.59613794667, NA, > > 0.22698477157, NA, 0.36092266158, 0.2145347068, 0.28775624948, > > NA, NA ), "ddd" = c(5.77538400186667, 5.115877113, NA, 4.71294520316667, > > 4.25952652129833, 3.68879921863167, NA, 2.01942456211145, NA, > > 2.02032557108, NA, 1.381810875 > 9571, 1.80436759778167, 1.20789851993367, NA, NA), "eee" = > c(2.4972534717, -2.67340172316667, NA, 5.6419520633, > 2.0763355523, 2.548949539, NA, 0.465537272243167, NA, 2.34255027516667, > NA, 0.5400824922975, 2.1935000655, 0.89000797687, NA, NA)), row.names = > c("skill", "predict", "waiting", "complex", "novelty", "creative", > "evaluated", "body", "control", "stakes", "spont", "chatter", "present", > "reward", "feedback", "goal"), class = "data.frame") > > > > itemInfo # examine column ddd > > > > When I try this, column ddd has 1 fewer digits than expected. See
[R] TEST message
Apologies for disturbance! Just checking that I can get through to r-help. Ted. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hacked
Not necessarily. The R-help archives are publicly accessible, with a "sort by date" option. So if someone sets up a web=page monitor which reports back when new messages appear there (at the bpottom end), then their email addresses are readily copied (subject to " at " --> "@". Once they have the address then anything can happen! Best wishes, Ted (eagerly awaiting attempted seduction ... ). On Wed, 2018-04-18 at 10:36 +, Fowler, Mark wrote: > Seems it must be the R-list. A horde of ‘solicitation’ emails began arriving > about 27 minutes after I posted about not seeing any! Had left work by that > time, so did not encounter them until now. > > From: Mark Leeds [mailto:marklee...@gmail.com] > Sent: April 18, 2018 12:33 AM > To: Rui Barradas > Cc: Ding, Yuan Chun; Fowler, Mark; Luis Puerto; Peter Langfelder; R-Help ML > R-Project; Neotropical bat risk assessments > Subject: Re: [R] Hacked > > Hi All: I lately get a lot more spam-porn type emails lately also but I don't > know if they are due to me being on > the R-list. > > > On Tue, Apr 17, 2018 at 5:09 PM, Rui Barradas > mailto:ruipbarra...@sapo.pt>> wrote: > Hello, > > Nor do I, no gmail, also got spam. > > Rui Barradas > > On 4/17/2018 8:34 PM, Ding, Yuan Chun wrote: > No, I do not use gmail, still got dirty spam email twice. > > -Original Message- > From: R-help > [mailto:r-help-boun...@r-project.org<mailto:r-help-boun...@r-project.org>] On > Behalf Of Fowler, Mark > Sent: Tuesday, April 17, 2018 12:32 PM > To: Luis Puerto; Peter Langfelder > Cc: R-Help ML R-Project; Neotropical bat risk assessments > Subject: Re: [R] Hacked > > [Attention: This email came from an external source. Do not open attachments > or click on links from unknown senders or unexpected emails.] > > > > > > Just an observation. I have not seen the spam you are discussing. Possibly it > is specific to gmail addresses? > > -Original Message- > From: R-help > [mailto:r-help-boun...@r-project.org<mailto:r-help-boun...@r-project.org>] On > Behalf Of Luis Puerto > Sent: April 17, 2018 4:11 PM > To: Peter Langfelder > Cc: R-Help ML R-Project; Neotropical bat risk assessments > Subject: Re: [R] Hacked > > Hi! > > This happened to me also! I just got a spam email just after posting and then > in following days I got obnoxious spam emails in my spam filter. As the > others, I think that there is some kind of bot subscribed to the list, but > also perhaps a spider or crawler monitoring the R-Help archive and getting > email addresses there. Nabble is a possibility too. > On 17 Apr 2018, at 21:50, Peter Langfelder > mailto:peter.langfel...@gmail.com>> wrote: > > I got some spam emails after my last post to the list, and the emails > did not seem to go through r-help. The spammers may be subscribed to > the r-help, or they get the poster emails from some of the web copies > of this list (nabble or similar). > > Peter > > On Tue, Apr 17, 2018 at 11:37 AM, Ulrik Stervbo > mailto:ulrik.ster...@gmail.com>> wrote: > I asked the moderators about it. This is the reply > > "Other moderators have looked into this a bit and may be able to shed > more light on it. This is a "new" tactic where the spammers appear to > reply to the r-help post. They are not, however, going through the r-help > server. > > It also seems that this does not happen to everyone. > > I am not sure how you can automatically block the spammers. > > Sorry I cannot be of more help." > > --Ulrik > > Jeff Newmiller mailto:jdnew...@dcn.davis.ca.us>> > schrieb am Di., 17. Apr. 
> 2018, > 14:59: > Likely a spammer has joined the mailing list and is auto-replying to > posts made to the list. Unlikely that the list itself has been > "hacked". Agree that it is obnoxious. > > On April 17, 2018 5:01:10 AM PDT, Neotropical bat risk assessments < > neotropical.b...@gmail.com<mailto:neotropical.b...@gmail.com>> wrote: > Hi all, > > Site has been hacked? > Bad SPAM arriving > > __ > R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To > UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- > Sent from my phone. Please excuse my brevity. > > __ > R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To > UNSUBSCRIB
Re: [R] R help
On Sat, 2018-03-31 at 15:45 +0200, Henri Moolman wrote: > Could you please provide help with something from R that I find rather > puzzling? In the small program below x[1]=1, . . . , x[5]=5. R also > finds that x[1]<=5 is TRUE. Yet when you attempt to execute while, R does > not seem to recognize the condition. Any thoughts on why this happens? > > Regards > > Henri Moolman > > > x=c(1,2,3,4,5) > > x[1] > [1] 1 > > i=1 > > x[1]<=5 > [1] TRUE > > while(x[i]<=5){ > + i=i+1 > + } > Error in while (x[i] <= 5) { : missing value where TRUE/FALSE needed If you run the following you should understand why (the only change is to include "print(i)" in the loop, so you can see what is happening): x=c(1,2,3,4,5) x[1] # [1] 1 i=1 x[1]<=5 # [1] TRUE while(x[i]<=5){ i = i+1 ; print(i) } # [1] 2 # [1] 3 # [1] 4 # [1] 5 # [1] 6 # Error in while (x[i] <= 5) { : missing value where TRUE/FALSE needed So everything is fine so long as i <= 5 (i.e. x[i] <= 5), but then the loop sets i = 6, and then: i # [1] 6 x[i] # [1] NA x[i] <= 5 # [1] NA Helpful? Best wishes, Ted.
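A minimal sketch of one way to avoid the NA (assuming the intent is simply to stop at the end of the vector rather than index past it): guard the index before it is used. Because && evaluates left to right and stops as soon as a condition fails, x[i] is never touched with an out-of-range i.

x <- c(1,2,3,4,5)
i <- 1
while (i <= length(x) && x[i] <= 5) {   # check the index first
  i <- i + 1
}
i
# [1] 6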
Re: [R] substr gives empty output
On Sun, 2018-01-21 at 09:59 +0100, Luigi Marongiu wrote: > Dear all, > I have a string, let's say "testing", and I would like to extract in > sequence each letter (character) from it. But when I use substr() I only > properly get the first character, the rest is empty (""). What am I getting > wrong? > For example, I have this code: > > >>> > x <- "testing" > k <- nchar(x) > for (i in 1:k) { > y <- substr(x, i, 1) > print(y) > } From the help page: substr(x, start, stop) where 'start' is the position in the character vector x at which the substring starts, and 'stop' is the position at which it stops. Hence 'stop' must be >= 'start'; and if they are equal then you get just the single character. That is the case in your code, when i=1; when i > 1 then stop < start, so you get nothing. Compare with: x <- "testing" k <- nchar(x) for (i in 1:k) { y <- substr(x, i, i) ### was: substr(x, i, 1) print(y) } [1] "t" [1] "e" [1] "s" [1] "t" [1] "i" [1] "n" [1] "g" Hoping this helps, Ted. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
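As a side note (an addition, not part of the original exchange), no loop is needed for this: base R can return all the single characters at once, either with the vectorised substring() or with strsplit(). A small sketch:

x <- "testing"
substring(x, 1:nchar(x), 1:nchar(x))   # vectorised start/stop positions
# [1] "t" "e" "s" "t" "i" "n" "g"
strsplit(x, "")[[1]]                   # split on the empty string
# [1] "t" "e" "s" "t" "i" "n" "g"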
Re: [R] How (in general) take out data sets (available in the packages)?
Suzen, thank you very much for your very useful information (I will try to understand it)! And my sincere gratitude to the moderator! >"Suzen, Mehmet" < msu...@gmail.com >: >I also suggest you Hadley's optimized package for interoperating xls >files with R: >https://github.com/tidyverse/readxl >https://cran.r-project.org/web/packages/readxl/index.html [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
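For completeness, a minimal sketch of reading a spreadsheet with the readxl package suggested above; the file name and sheet below are placeholders, not from the thread:

# install.packages("readxl")                  # once
library(readxl)
dat <- read_excel("mydata.xlsx", sheet = 1)   # "mydata.xlsx" is a hypothetical file
head(dat)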
[R] How export data set (available in the package) from R?
"Data set flchain available in the survival package". How can I get it (from R) as Excel file? Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R: How to multiplying y-axis ticks value by 100?
Many thanks, Jim!!! >Jim Lemon < drjimle...@gmail.com >: >Have a look at axis.mult in the plotrix package. >Jim >>iPad via R-help < r-help@r-project.org > wrote: >> How to multiplying y-axis ticks value by 100 (without put the % symbol next >> to the number) here: >> plot (CI.overall, curvlab=c("Discharge", "Death"), xlab="Days", las=1) >> P.S. So instead 0.00, 0.20 etc. the 0, 20 etc. __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
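A base-graphics sketch of the same idea for any plot (the data here are made up, and axTicks()/axis() are used in place of the plotrix shortcut): suppress the default y axis and redraw it with the tick labels multiplied by 100.

y <- cumsum(runif(50))/50                                # made-up values on a 0-1 scale
plot(y, xlab = "Days", ylab = "", yaxt = "n", las = 1)   # draw plot without the y axis
ticks <- axTicks(2)                                      # positions the default axis would use
axis(2, at = ticks, labels = ticks * 100, las = 1)       # 0.20 is labelled 20, etc.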
Re: [R] Precision of values > 53 bits
On Thu, 2017-07-20 at 14:33 +0200, peter dalgaard wrote: > > On 10 Jan 2013, at 15:56 , S Ellison wrote: > > > > > > > >> I am working with large numbers and identified that R looses > >> precision for such high numbers. > > Yes. R uses standard 32-bit double precision. > > > Well, for large values of 32... such as 64. Hmmm ... Peter, as one of your compatriots (guess who) once solemnly said to me: 2 plus 2 is never equal to 5 -- not even for large values of 2. Best wishes, Ted. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
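A short demonstration of the 53-bit point that anyone can run: doubles carry .Machine$double.digits = 53 significand bits, so beyond 2^53 consecutive integers can no longer all be distinguished.

.Machine$double.digits
# [1] 53
2^53 == 2^53 + 1   # TRUE: 2^53 + 1 is not representable and rounds back to 2^53
2^52 == 2^52 + 1   # FALSE: below 2^53 the integers are still exact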
Re: [R] [FORGED] Logical Operators' inconsistent Behavior
[I inadvertently sent my reply below to Jeremie, instead of R-help. Also, I have had an additional thought which may clarify things for R users]. [Original reply]: The point about this is that (as Rolf wrote) FALSE & (anything) is FALSE, provided logical NA is either TRUE or FALSE but, because the "NA" says that it is not known which it is, it could be "anything". And, indeed, if "NA" is given the "missing" meaning and if we assume that a missing logical value did indeed have a value (necessarily either TRUE or FALSE), then it follows logically that FALSE & NA = FALSE. On the other hand, if with the "missing" interpretation of "NA" we don't even know that it is a logical, then it might be fair enough to say FALSE & NA = NA. Ted. [Additional thought]: Testing to see what would happen if the NA were not logical, I put myself (not being logical ... ) on the line, facing up to R: X <- "Ted" FALSE & X Error in FALSE & X : operations are possible only for numeric, logical or complex types So R will refuse to deal with any variable which cannot partake in a logical expression. Ted. On Fri, 2017-05-19 at 14:24 +0200, Jérémie Juste wrote: > My apologies if I was not clear enough, > > TRUE & NA could be either TRUE or FALSE and consequently is NA. > why is FALSE & NA = FALSE? NA could be TRUE or FALSE, so FALSE & NA > should be NA? > > > On Fri, May 19, 2017 at 2:13 PM, Rolf Turner > wrote: > > > On 20/05/17 00:01, Jérémie Juste wrote: > > > >> Hello, > >> > >> Rolf said, > >> > >> TRUE & FALSE is FALSE but TRUE & TRUE is TRUE, so TRUE & NA could be > >> either TRUE or FALSE and consequently is NA. > >> > >> OTOH FALSE & (anything) is FALSE so FALSE & NA is FALSE. > >> > >> > >> According to this logic why is > >> > >> FALSE & NA > >> > >> [1] FALSE > >> > > > > Huh > > > > > > cheers, > > > > Rolf Turner > > > > -- > > Technical Editor ANZJS > > Department of Statistics > > University of Auckland > > Phone: +64-9-373-7599 ext. 88276 > > > > > > -- > Jérémie Juste > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
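The full three-valued behaviour can be seen in one line each way (a small demonstration, not from the original thread): NA propagates only when the known operand does not already decide the result.

NA & c(TRUE, FALSE)   # TRUE & NA is NA, but FALSE & NA is FALSE
# [1]    NA FALSE
NA | c(TRUE, FALSE)   # dually, TRUE | NA is TRUE, but FALSE | NA is NA
# [1] TRUE    NA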
Re: [R] Interesting quirk with fractions and rounding
I've been following this thread with interest. A nice collection of things to watch out for, if you don't want the small arithmetic errors due to finite-length digital representations of fractions to cause trouble! However, as well as these small discrepancies, major malfunctions can also result. Back on Dec 22, 2013, I posted a Christmas Greetings message to R-help: Season's Greetings (and great news ... )! which starts: Greetings All! With the Festive Season fast approaching, I bring you joy with the news (which you will surely wish to celebrate) that R cannot do arithmetic! Usually, this is manifest in a trivial way when users report puzzlement that, for instance, sqrt(pi)^2 == pi # [1] FALSE which is the result of a (generally trivial) rounding or truncation error: sqrt(pi)^2 - pi # [1] -4.440892e-16 But for some very simple calculations R goes off its head. And the example given is: Consider a sequence generated by the recurrence relation x[n+1] = 2*x[n] if 0 <= x[n] <= 1/2 x[n+1] = 2*(1 - x[n]) if 1/2 < x[n] <= 1 (for 0 <= x[n] <= 1). This has equilibrium points (x[n+1] = x[n]) at x[n] = 0 and at x[n] = 2/3: 2/3 -> 2*(1 - 2/3) = 2/3 It also has periodic points, e.g. 2/5 -> 4/5 -> 2/5 (period 2) 2/9 -> 4/9 -> 8/9 -> 2/9 (period 3) The recurrence relation can be implemented as the R function nextx <- function(x){ if( (0<=x)&(x<=1/2) ) {x <- 2*x} else {x <- 2*(1 - x)} } Now have a look at what happens when we start at the equilibrium point x = 2/3: N <- 1 ; x <- 2/3 while(x > 0){ cat(sprintf("%i: %.9f\n",N,x)) x <- nextx(x) ; N <- N+1 } cat(sprintf("%i: %.9f\n",N,x)) For a while [run it and see!], this looks as though it's doing what the arithmetic would lead us to expect: the first 24 results will all be printed as 0.666666667, which looks fine as 2/3 to 9 places. But then the "little errors" start to creep in: N=25: 0.666666666 N=28: 0.666666672 N=46: 0.667968750 N=47: 0.664062500 N=48: 0.671875000 N=49: 0.656250000 N=50: 0.687500000 N=51: 0.625000000 N=52: 0.750000000 N=53: 0.500000000 N=54: 1.000000000 N=55: 0.000000000 What is happening is that, each time R multiplies by 2, the binary representation is shifted up by one and a zero bit is introduced at the bottom end. At N=53, the first binary bit of 'x' is 1, and all the rest are 0, so now 'x' is exactly 0.5 = 1/2, hence the final two are also exact results; 53 is the .Machine$double.digits = 53 binary places. So this normally "almost" trivial feature can, for such a simple calculation, lead to chaos or catastrophe (in the literal technical sense). For more detail, including an extension of the above, look at the original posting in the R-help archives for Dec 22, 2013: From: (Ted Harding) Subject: [R] Season's Greetings (and great news ... )! Date: Sun, 22 Dec 2013 09:59:16 - (GMT) (Apologies, but I couldn't track down the URL for this posting in the R-help archives; there were a few follow-ups). I gave this as an example to show that the results of the "little" arithmetic errors (such as have recently been discussed from many aspects) can, in certain contexts, destroy a computation. So be careful to consider what can happen in the particular context you are working with. There are ways to dodge the issue -- such as using the R interface to the 'bc' calculator, which computes arithmetic expressions in a way which is quite different from the fixed-finite-length binary representation and algorithms used, not only by R, but also by many other numerical computation software suites. Best wishes to all, Ted. 
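A small related sketch (an addition, not from the original post): when exact binary equality is not what is wanted, compare with a tolerance via all.equal() rather than ==.

sqrt(pi)^2 == pi                    # FALSE, as above
isTRUE(all.equal(sqrt(pi)^2, pi))   # TRUE: equal up to a small numerical tolerance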
- E-Mail: (Ted Harding) Date: 21-Apr-2017 Time: 22:03:15 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginner needs help with R
Bert, your solution seems to presuppose that the programmer knows beforehand that the leading digit in the number is "0" (which in fact is clearly the case in Nabila Arbi's original query). However, the sequence might arise from some process outside of the programmer's control, and may then either have a leading 0 or not. In that case, I think Jim's solution is safer! Best wishes, Ted. On 07-Feb-2017 16:02:18 Bert Gunter wrote: > No need for sprintf(). Simply: > >> paste0("DQ0",seq.int(60054,60060)) > > [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059" > [7] "DQ060060" > > > Cheers, > Bert > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Mon, Feb 6, 2017 at 5:45 AM, jim holtman wrote: >> You need the leading zeros, and 'numerics' just give the number without >> leading zeros. You can use 'sprintf' for create a character string with >> the leading zeros: >> >>> # this is using 'numeric' and drops leading zeros >>> >>> seq1 <- paste("DQ", seq(060054, 060060), sep = "") >>> seq1 >> [1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060" >>> >>> # use 'sprintf' to create leading zeros >>> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060))) >>> seq2 >> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059" >> "DQ060060" >>> >> >> >> Jim Holtman >> Data Munger Guru >> >> What is the problem that you are trying to solve? >> Tell me what you want to do, not how you want to do it. >> >> On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi >> wrote: >> >>> Dear R-Help Team! >>> >>> I have some trouble with R. It's probably nothing big, but I can't find a >>> solution. >>> My problem is the following: >>> I am trying to download some sequences from ncbi using the ape package. >>> >>> seq1 <- paste("DQ", seq(060054, 060060), sep = "") >>> >>> sequences <- read.GenBank(seq1, >>> seq.names = seq1, >>> species.names = TRUE, >>> gene.names = FALSE, >>> as.character = TRUE) >>> >>> write.dna(sequences, "mysequences.fas", format = "fasta") >>> >>> My problem is, that R doesn't take the whole sequence number as "060054" >>> but it puts it as DQ60054 (missing the zero in the beginning, which is >>> essential). >>> >>> Could please tell me, how I can get R to accepting the zero in the >>> beginning of the accession number? >>> >>> Thank you very much in advance and all the best! >>> >>> Nabila >>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/ >>> posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
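A sketch illustrating that point: sprintf()'s field width pads with zeros only when needed, so the same format string works whether or not the accession number happens to start with a zero (the second value below is made up for illustration).

ids <- c(60054L, 123456L)   # one needs a leading zero, one does not
sprintf("DQ%06d", ids)
# [1] "DQ060054" "DQ123456"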
- E-Mail: (Ted Harding) Date: 07-Feb-2017 Time: 16:48:41 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] histogram first bar wrong position
William has lifted the lid on the essence of the problem, which is that in R the way that breaks (and therefore counts) in a histogram are evaluated is an area of long grass with lurking snakes! To get a glimpse of this, have a look at ?hist and in the section "Arguments", look at "breaks", "freq", "right". Also see under "Details", and, as suggested under "See also", look at ?nclass.Sturges. As William suggests, if you know what class intervals you want, create them yourself! For your example (with N=100), look at: hist(y,freq=TRUE, col='red', breaks=0.5+(0:6)) or hist(y,freq=TRUE, col='red', breaks=0.25+(0:12)/2) Hoping this helps! Best wishes, Ted. On 22-Dec-2016 16:36:34 William Dunlap via R-help wrote: > Looking at the return value of hist will show you what is happening: > >> x <- rep(1:6,10*(6:1)) >> z <- hist(x, freq=TRUE) >> z > $breaks > [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 > > $counts > [1] 60 50 0 40 0 30 0 20 0 10 > ... > > The first bin is [1-1.5], including both endpoints, while the other > bins include only the upper endpoint. I recommend defining your > own breakpoints, ones don't include possible data points, as in > >> print(hist(x, breaks=seq(min(x)-0.5, max(x)+0.5, by=1), freq=TRUE)) > $breaks > [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 > > $counts > [1] 60 50 40 30 20 10 > ... > > S+ had a 'factor' method for hist() that did this sort of thing, but R does > not. > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Thu, Dec 22, 2016 at 5:17 AM, itpro wrote: > >> Hi, everyone. >> >> >> I stumbled upon weird histogram behaviour. >> >> Consider this "dice emulator": >> Step 1: Generate uniform random array x of size N. >> Step 2: Multiply each item by six and round to next bigger integer to get >> numbers 1 to 6. >> Step 3: Plot histogram. >> >> > x<-runif(N) >> > y<-ceiling(x*6) >> > hist(y,freq=TRUE, col='orange') >> >> >> Now what I get with N=10 >> >> > x<-runif(10) >> > y<-ceiling(x*6) >> > hist(y,freq=TRUE, col='green') >> >> At first glance looks OK. >> >> Now try N=100 >> >> > x<-runif(100) >> > y<-ceiling(x*6) >> > hist(y,freq=TRUE, col='red') >> >> Now first bar is not where it should be. >> Hmm. Look again to 10 histogram... First bar is not where I want it, >> it's only less striking due to narrow bars. >> >> So, first bar is always in wrong position. How do I fix it to make >> perfectly spaced bars? >> >> >> >> >> >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/ >> posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 22-Dec-2016 Time: 17:23:26 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
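For genuinely discrete outcomes such as dice faces, a sketch of another way round hist()'s break conventions (an addition, not from the thread): tabulate the values and draw the counts with barplot(), which needs no breaks at all.

set.seed(1)
y <- ceiling(runif(100) * 6)              # simulated dice throws, as in the post
barplot(table(factor(y, levels = 1:6)),   # keep all six faces, even empty ones
        col = "red", xlab = "face", ylab = "count")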
Re: [R] detecting if a variable has changed
Ah, perhaps I'm beginning to understand the question! Presumably the issue is that evaluating X[X<=y] takes O(n) time, where n = length(X), and similarly X[X>y]. So I suppose that one needs to be looking at some procedure for a "bisecting" search for the largest r such that X[r] <= y, which would then be O(log2(n)). Perhaps not altogether straightforward to program, but straightforward in concept! Apologies for misunderstanding. Ted. On 05-Jun-2016 18:13:15 Bert Gunter wrote: > Nope, Ted. I asked for a O(log(n)) solution, not an O(n) one. > > I will check out the data.table package, as suggested. > > -- Bert > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Sun, Jun 5, 2016 at 11:01 AM, Ted Harding > wrote: >> Surely it is straightforward, since the vector (say 'X') is already sorted? >> >> Example (raw code with explicit example): >> >> set.seed(54321) >> X <- sort(runif(10)) >> # X ## The initial sorted vector >> # [1] 0.04941009 0.17669234 0.20913493 0.21651016 0.27439354 >> # [6] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110 >> >> y <- runif(1) >> # y ## The new value to be inserted >> [1] 0.1366424 >> >> Y <- c(X[X<=y],y,X[X>y]) ## Now insert y into X: >> Y >> [1] 0.04941009 0.13664239 0.17669234 0.20913493 0.21651016 0.27439354 >> [7] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110 >> >> ## And there it is at Y[2] >> >> Easy to make such a function! >> Best wishes to all, >> Ted. >> >> On 05-Jun-2016 17:44:29 Neal H. Walfield wrote: >>> On Sun, 05 Jun 2016 19:34:38 +0200, >>> Bert Gunter wrote: >>>> This help thread suggested a question to me: >>>> >>>> Is there a function in some package that efficiently (I.e. O(log(n)) ) >>>> inserts a single new element into the correct location in an >>>> already-sorted vector? My assumption here is that doing it via sort() >>>> is inefficient, but maybe that is incorrect. Please correct me if so. >>> >>> I think data.table will do this if the the column is marked >>> appropriately. >>> >>>> I realize that it would be straightforward to write such a function, >>>> but I just wondered if it already exists. My google & rseek searches >>>> did not succeed, but maybe I used the wrong keywords. >>>> >>>> Cheers, >>>> Bert >>>> >>>> >>>> Bert Gunter >>>> >>>> "The trouble with having an open mind is that people keep coming along >>>> and sticking things into it." >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>> >>>> >>>> On Sun, Jun 5, 2016 at 9:47 AM, William Dunlap via R-help >>>> wrote: >>>> > I don't know what you mean by "without having to use any special >>>> > interfaces", but "reference classes" will do what I think you want. >>>> > E.g., >>>> > the following makes a class called 'SortedNumeric' that only sorts the >>>> > vector when you want to get its value, not when you append values. It >>>> > stores the sorted vector so it does not get resorted each time you ask >>>> > for >>>> > it. 
>>>> > >>>> > SortedNumeric <- setRefClass("sortedNumeric", >>>> > fields = list( >>>> > fData = "numeric", >>>> > fIsUnsorted = "logical"), >>>> > methods = list( >>>> > initialize = function(Data = numeric(), isUnsorted = >>>> > TRUE) >>>> > { >>>> > fData <<- Data >>>> > stopifnot(is.logical(isUnsorted), >>>> > length(isUnsorted)==1, >>>> > !is.na(isUnsorted)) >>>> > fIsUnsorted <<- isUnsorted >>>> > }, >>>> > getData = function() { >>>> > if (isUnsorted) { >>>> > fData <<- sort(
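A sketch of the O(log(n)) lookup in base R (an addition to the thread): findInterval() does a binary search on a sorted vector, so locating the insertion point is O(log(n)); building the new vector with append() is still O(n), since the elements have to be copied.

set.seed(54321)
X <- sort(runif(10))
y <- runif(1)
pos <- findInterval(y, X)        # binary search: how many X values are <= y
Y <- append(X, y, after = pos)   # insert y so that Y remains sorted
is.unsorted(Y)
# [1] FALSE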
Re: [R] detecting if a variable has changed
Surely it is straightforward, since the vector (say 'X') is already sorted? Example (raw code with explicit example): set.seed(54321) X <- sort(runif(10)) # X ## The initial sorted vector # [1] 0.04941009 0.17669234 0.20913493 0.21651016 0.27439354 # [6] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110 y <- runif(1) # y ## The new value to be inserted [1] 0.1366424 Y <- c(X[X<=y],y,X[X>y]) ## Now insert y into X: Y [1] 0.04941009 0.13664239 0.17669234 0.20913493 0.21651016 0.27439354 [7] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110 ## And there it is at Y[2] Easy to make such a function! Best wishes to all, Ted. On 05-Jun-2016 17:44:29 Neal H. Walfield wrote: > On Sun, 05 Jun 2016 19:34:38 +0200, > Bert Gunter wrote: >> This help thread suggested a question to me: >> >> Is there a function in some package that efficiently (I.e. O(log(n)) ) >> inserts a single new element into the correct location in an >> already-sorted vector? My assumption here is that doing it via sort() >> is inefficient, but maybe that is incorrect. Please correct me if so. > > I think data.table will do this if the the column is marked > appropriately. > >> I realize that it would be straightforward to write such a function, >> but I just wondered if it already exists. My google & rseek searches >> did not succeed, but maybe I used the wrong keywords. >> >> Cheers, >> Bert >> >> >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along >> and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Sun, Jun 5, 2016 at 9:47 AM, William Dunlap via R-help >> wrote: >> > I don't know what you mean by "without having to use any special >> > interfaces", but "reference classes" will do what I think you want. E.g., >> > the following makes a class called 'SortedNumeric' that only sorts the >> > vector when you want to get its value, not when you append values. It >> > stores the sorted vector so it does not get resorted each time you ask for >> > it. >> > >> > SortedNumeric <- setRefClass("sortedNumeric", >> > fields = list( >> > fData = "numeric", >> > fIsUnsorted = "logical"), >> > methods = list( >> > initialize = function(Data = numeric(), isUnsorted = TRUE) >> > { >> > fData <<- Data >> > stopifnot(is.logical(isUnsorted), >> > length(isUnsorted)==1, >> > !is.na(isUnsorted)) >> > fIsUnsorted <<- isUnsorted >> > }, >> > getData = function() { >> > if (isUnsorted) { >> > fData <<- sort(fData) >> > fIsUnsorted <<- FALSE >> > } >> > fData >> > }, >> > appendData = function(newEntries) { >> > fData <<- c(fData, newEntries) >> > fIsUnsorted <<- TRUE >> > } >> > )) >> > >> > Use it as: >> > >> >> x <- SortedNumeric$new() >> >> x$appendData(c(4,2,5)) >> >> x$appendData(c(1,8,9)) >> >> x >> > Reference class object of class "sortedNumeric" >> > Field "fData": >> > [1] 4 2 5 1 8 9 >> > Field "fIsUnsorted": >> > [1] TRUE >> >> x$getData() >> > [1] 1 2 4 5 8 9 >> >> x >> > Reference class object of class "sortedNumeric" >> > Field "fData": >> > [1] 1 2 4 5 8 9 >> > Field "fIsUnsorted": >> > [1] FALSE >> > >> > >> > Outside of base R, I think the R6 package gives another approach to this. >> > >> > >> > Bill Dunlap >> > TIBCO Software >> > wdunlap tibco.com >> > >> > On Sun, Jun 5, 2016 at 6:53 AM, Neal H. Walfield >> > wrote: >> > >> >> Hi, >> >> >> >> I have a huge list. Normally it is sorted, but I want to be able to >> >> add elements to it without having to use any special interface
Re: [R] LaplacesDemon package installation
See at [***] below. On 04-Feb-2016 21:23:05 Rolf Turner wrote: > > (1) You might get better mileage asking this on the r-sig-mac list. > > (2) The phenomena you describe are puzzling and are beyond my capacity > to explain. Perhaps someone else will be able to enlighten you. > > (3) Out of idle curiosity I went to the github site and downloaded the > zip file of the package. (I could not see a *.tar.gz file, but perhaps > I just don't understand how github works. Actually, there's no > "perhaps" about it!) > > I unzipped the download and then did > > R CMD build LaplacesDemon-master > > in the appropriate directory. This created a file > > LaplacesDemon_15.03.19.tar.gz > > Note that the version number seems to be as you require. > > I then used your install.packages syntax, and the package installed > seamlessly. It also loaded seamlessly. > > So I don't know why the computer gods are picking on you. > [***] > Note that I am not working on a Mac, but rather running Linux (as do all > civilized human beings! :-) ) Might this be yet another candidate for a Fortune today? Ted. > cheers, > > Rolf Turner > > -- > Technical Editor ANZJS > Department of Statistics > University of Auckland > Phone: +64-9-373-7599 ext. 88276 > > On 05/02/16 06:42, lluis.hurt...@uv.es wrote: >> Dear all, >> >> I've recently changed my Mac and I am trying to reinstall my commonly >> used R-packages. I'm having troubles with a package called >> LaplacesDemon. >> >> This package is no more in the CRAN list and the developers web page >> (http://www.bayesian-inference.com/software) is out for more than >> half a year. Old versions of the package can still be found in tar.gz >> in >> >> https://cran.r-project.org/src/contrib/Archive/LaplacesDemon/ >> >> and in github >> >> https://github.com/ecbrown/LaplacesDemon >> >> Last version is LaplacesDemon_13.03.04.tar.gz, but I was able to get >> version LaplacesDemon_15.03.19.tar.gz time ago (can't find it >> anymore). >> >> I have tried to install this packages from source in my computer >> using >> >>> install.packages("/Users/.../LaplacesDemon_15.03.19.tar.gz", repos >>> = NULL, type="source") >> >> answer: >> >>> Warning: invalid package 'Users/.../LaplacesDemon_15.03.19.tar.gz' >>> Error: ERROR: no packages specified Warning message: In >>> install.packages("Users/.../LaplacesDemon_15.03.19.tar.gz", : >> installation of package 'Users/.../LaplacesDemon_15.03.19.tar.gz' had >> non-zero exit status >> >> I also tried the 'Packages & Data' menu in R, selecting the source >> file or the directory from Finder and I got this message: >> >>> * installing *source* package 'LaplacesDemon' ... ** R ** data ** >>> inst ** byte-compile and prepare package for lazy loading ** help >>> *** installing help indices ** building package indices ** >>> installing vignettes ** testing if installed package can be loaded >>> * DONE (LaplacesDemon) >> >> but >> >>> library(LaplacesDemon) >> Error in library(LaplacesDemon) : there is no package called >> 'LaplacesDemon' >> >> Finally I tried >> >>> install.packages("devtools") library(devtools) >>> install_github("ecbrown/LaplacesDemon") >> >> but I am not able to install devtools (for similar reasons). So my >> questions are: >> >> -Do anyone knows how to install this packages in my Mac? -Do anyone >> knows were can I find the information previously content in >> http://www.bayesian-inference.com/software? 
> > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 04-Feb-2016 Time: 22:03:31 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R project and the TPP
Saludos José! Could you please give a summary of the relevant parts of TPP that might affect the use of R? I have looked up TPP on Wikipedia without beginning to understand what it might imply for the use of R. Best wishes, Ted. On 04-Feb-2016 14:43:29 José Bustos wrote: > Hi everyone, > > I have a question regarding the use R software under the new TPP laws > adopted by some governments in the region. Who know how this new agreements > will affect researchers and the R community? > > Hope some of you knows better and can give ideas about it. > > saludos, > José ----- E-Mail: (Ted Harding) Date: 04-Feb-2016 Time: 22:01:42 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Has R-help changed reply-to policy?
Steve, I'm inclined to suspect that something *has* changed at your end (or along the line between you and R-help). In replying to your message, selecting "include all recipients" (i.e. reply to all), the result was: To: S Ellison Cc: r-help@r-project.org just as it always has been! So no change that *I* can perceive at the R-help end. Hoping this is useful, Ted. On 04-Feb-2016 16:33:29 S Ellison wrote: > Apologies if I've missed a post, but have the default treatment of posts and > reply-to changed on R-Help of late? > > I ask because as of today, my email client now only lists the OP email when > replying to an R-help message, even with a reply-all, so the default reply is > not to the list. > I also noticed a few months back that I no longer see my own posts, other > than as a receipt notification, and tinkering with my account defaults > doesn't change that. > > I don't _think_ things have changed at my end ... > > Steve Ellison - E-Mail: (Ted Harding) Date: 04-Feb-2016 Time: 17:03:37 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help mailing list activity / R-not-help?
My feelings exactly! (And since quite some time ago). Ted. On 25-Jan-2016 12:23:16 Fowler, Mark wrote: > I'm glad to see the issue of negative feedback addressed. I can especially > relate to the 'cringe' feeling when reading some authoritarian backhand to a > new user. We do see a number of obviously inappropriate or overly lazy > postings, but I encounter far more postings where I don't feel competent to > judge their merit. It might be better to simply disregard a posting one does > not like for some reason. It might also be worthwhile to actively counter > negative feedback when we experience that 'cringing' moment. I'm not thinking > to foster contention, but simply to provide some tangible reassurance to new > users, and not just the ones invoking the negative feedback, that a > particular respondent may not represent the perspective of the list. > > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael > Friendly > Sent: January 24, 2016 5:43 PM > To: Jean-Luc Dupouey; r-help@r-project.org > Subject: Re: [R] R-help mailing list activity / R-not-help? > > > On 1/23/2016 7:28 AM, Jean-Luc Dupouey wrote: >> Dear members, >> >> Not a technical question: > But one worth raising... >> >> The number of threads in this mailing list, following a long period of >> increase, has been regularly and strongly decreasing since 2010, >> passing from more than 40K threads to less than 11K threads last year. >> The trend is similar for most of the "ancient" mailing lists of the >> R-project. > [snip ...] >> >> I hope it is the wright place to ask this question. Thanks in advance, >> > > In addition to the other replies, there is another trend I've seen that has > actively worked to suppress discussion on R-help and move it elsewhere. The > general things: > - R-help was too unwieldy and so it was a good idea to hive-off specialized > topics to various sub lists, R-SIG-Mac, R-SIG-Geo, etc. > - Many people posted badly-formed questions to R-help, and so it was a good > idea to develop and refer to the posting guide to mitigate the number of > purely junk postings. > > > Yet, the trend I've seen is one of increasing **R-not-help**, in that there > are many posts, often by new R users who get replies that not infrequently > range from just mildly off-putting to actively hostile: > > - Is this homework? We don't do homework (sometimes false alarms, where the > OP has to reply to say it is not) > - Didn't you bother to do your homework, RTFM, or Google? > - This is off-topic because XXX (e.g., it is not strictly an R programming > question). > - You asked about doing XXX, but this is a stupid thing to want to do. > - Don't ask here; you need to talk to a statistical consultant. > > I find this sad in a public mailing list sent to all R-help subscribers and I > sometimes cringe when I read replies to people who were actually trying to > get help with some R-related problem, but expressed it badly, didn't know > exactly what to ask for, or how to format it, or somehow motivated a > frequent-replier to publicly dis the OP. > > On the other hand, I still see a spirit of great generosity among some people > who frequently reply to R-help, taking a possibly badly posed or > ill-formatted question, and going to some lengths to provide a a helpful > answer of some sort. I applaud those who take the time and effort to do > this. 
> > I use R in a number of my courses, and used to advise students to post to > R-help for general programming questions (not just homework) they couldn't > solve. I don't do this any more, because several of them reported a negative > experience. > > In contrast, in the Stackexchange model, there are numerous sublists > cross-classified by their tags. If I have a specific knitr, ggplot2, LaTeX, > or statistical modeling question, I'm now more likely to post it there, and > the worst that can happen is that no one "upvotes" it or someone (helpfully) > marks it as a duplicate of a similar question. > But comments there are not propagated to all subscribers, and those who reply > helpfully, can see their solutions accepted or not, or commented on in that > specific topic. > > Perhaps one solution would be to create a new "R-not-help" list where, as in > a Monty Python skit, people could be directed there to be insulted and all > these unhelpful replies could be sent. > > A milder alternative is to encourage some R-help subscribers to click the > "Don't send" or "Save" button and think bett
Re: [R] If else
[Apologies if the message below should arrive twice. When first sent there was apparently something wrong with the email address to r-help, and it was held for moderation because "Message has implicit destination" (whatever that means). I have made sure that this time the email address is correct.] John Fox has given a neat expression to achieve the desired result! I would like to comment, however, on the somewhat insistent criticism of Val's request from several people. It can make sense to have three "sex"es. Suppose, for example, that the data are records of street crime reported by victims. The victim may be able to identify the sex of the perpetrator as definitely "M", or definitely "F". One of the aims of the analysis is to investigate whether there is an association between the gender of the offender and the type of crime. But in some cases the victim may not have been able to recognise the offender's sex. Then it would have to go in the record as "NA" (or equivalent). There can be two kinds of reason why the victim was unable to recognise the sex. One kind is where the victim simply did not see the offender (e.g. their purse was stolen while they were concentrating on something else, and they only found out later). Another kind is where the offender deliberately disguises their gender, so that it cannot be determined from their appearance. This second kind could be associated with a particular category of crime (and I leave it to people's lurid imaginations to think of possible examples ... ). Then one indeed has three "sex"es: Male, Female, and Indeterminate, for each of which there is a potential association with type of crime. With most analyses, however, a category recorded as "NA" would be ignored (at least by R). Coding it as a third level instead gives a variable which is a factor with 3 levels, all of which can (as above) be meaningful, and those cases would not be ignored. Hoping this helps to clarify! (And, Val, does the above somehow correspond to your objectives?) Best wishes to all, Ted. On 31-Oct-2015 17:41:02 Jeff Newmiller wrote: > Rolf gave you two ways. There are others. They all misrepresent the data > (there are only two sexes but you are effectively acting as if there are > three); hence the inquisition in hopes of diverting you to a more correct > method of analysis. However, this is not the support forum for whatever other > software you plan to proceed with so never mind. > --- > Jeff NewmillerThe . . Go Live... > DCN:Basics: ##.#. ##.#. Live Go... > Live: OO#.. Dead: OO#.. Playing > Research Engineer (Solar/BatteriesO.O#. #.O#. with > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k > --- > Sent from my phone. Please excuse my brevity. > > On October 31, 2015 10:15:33 AM PDT, Val wrote: >>Hi Jeff, >> >>I thought I answered. Yes I was not clear about it. The further >>analysis >>will no be done by R. It is another software that will not accept a >>character response variable. >> >>Why R is so complicated to do that. If it is SAS then I can do it on >>one >>statement. . >> >> >>On Sat, Oct 31, 2015 at 11:39 AM, Jeff Newmiller >> >>wrote: >> >>> You haven't actually answered John's question as to the type of >>analysis >>> you plan to do. It still looks from here like you should be using >>factor >>> data rather than numeric, but since you are not being clear we cannot >>give >>> specifics as to how to proceed. >>> >>--- >>> Jeff NewmillerThe . . Go >>Live... >>> DCN:Basics: ##.#. ##.#. Live >>> Go... >>> Live: OO#.. Dead: OO#.. >>Playing >>> Research Engineer (Solar/BatteriesO.O#. #.O#. 
with >>> /Software/Embedded Controllers) .OO#. .OO#. >>rocks...1k >>> >>--- >>> Sent from my phone. Please excuse my brevity. >>> >>> On October 31, 2015 8:23:05 AM PDT, Val wrote: >>> >Hi All, >>> > >>> > >>> >Yes I need to change to numeric because I am preparing a data set >>> >for >>> >further analysis. The variable to be changed from character to
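A sketch of the kind of recoding being discussed, using a small hypothetical vector: keep the three meaningful categories as factor levels, and turn them into integer codes only when the data are exported to the other software.

sex  <- c("M", "F", "I", "M", NA)                # hypothetical data; "I" = indeterminate
sexf <- factor(sex, levels = c("M", "F", "I"))   # three meaningful levels; NA stays NA
as.integer(sexf)                                 # numeric codes for export
# [1]  1  2  3  1 NA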
Re: [R] Subscription request
On 14-Oct-2015 15:19:06 Manish Sindagi wrote: > Hi, > > I have a few R programming related questions that i wanted to ask. > Can you please accept my subscription request. > > Regards, > Manish. Visit the R-help info web page: https://stat.ethz.ch/mailman/listinfo/r-help Towards the bottom of this page is a section "Subscribing to R-help". Follow the instructions in this section, and it should work! Best wishes, Ted. ----- E-Mail: (Ted Harding) Date: 14-Oct-2015 Time: 19:34:55 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beta distribution approximate to Normal distribution
Using non-central chi-squared (especially with df=1) is unlikely to generate random numbers anywhere near a Normal distribution (see below). And "rchisq(100, df=1, ncp=u/a)" won't work anyway with u<0, since ncp must be >= 0 (if < 0 then all are NA). Better to shoot straight for the target (truncated Normal), though several shots are likely to be required! For example (code which spells it out), taking u=3 and a=2: n <- 100 u <- 3 ; a <- 2 x <- NULL N <- length(x) while(N < n){ x <- c(x,rnorm(n,mean=u,sd=a)) x <- x[x>0] N <- length(x) } x <- x[1:n] Comparison with non-central chi-squared: y <- rchisq(100, df=1, ncp=u/a) hist(x) hist(y) On 15-Sep-2015 13:26:44 jlu...@ria.buffalo.edu wrote: > Your question makes no sense as stated. However, guessing at what you > want, you should perhaps consider the non-central chi-square density with > 1 df and ncp = u/a, i.e, > > rchisq(100, df=1, ncp=u/a) > > Joe > Joseph F. Lucke, PhD > Senior Statistician > Research Institute on Addictions > University at Buffalo > State University of New York > 1021 Main Street > Buffalo, NY 14203-1016 > > Chien-Pang Chin > Sent by: "R-help" > 09/15/2015 06:58 AM > > To > "r-help@r-project.org" , > > Subject > [R] Beta distribution approximate to Normal distribution > > Hi, > I need to generate 1000 numbers from N(u, a^2), however I don't > want to include 0 and negative values. How can I use beta distribution > approximate to N(u, a^2) in R. > > Thx for help - E-Mail: (Ted Harding) Date: 15-Sep-2015 Time: 16:12:35 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
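A loop-free alternative sketch (an addition, not from the thread): inverse-CDF sampling restricted to the part of the Normal distribution above 0 draws exactly n strictly positive values in one pass, i.e. from the Normal truncated at 0.

n <- 100; u <- 3; a <- 2
p <- runif(n, min = pnorm(0, mean = u, sd = a), max = 1)   # uniform over the kept tail
x <- qnorm(p, mean = u, sd = a)                            # all values are > 0
range(x)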
Re: [R] Variance is different in R vs. Excel?
[See at end] On 09-Feb-2015 21:45:11 David L Carlson wrote: > Time for a new version of Excel? I cannot duplicate your results in Excel > 2013. > > R: >> apply(dat, 2, var) > [1] 21290.80 24748.75 > > Excel 2013: > =VAR.S(A2:A21) =VAR.S(B2:B21) > 21290.8 24748.74737 > > - > David L Carlson > Department of Anthropology > Texas A&M University > College Station, TX 77840-4352 > > > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Fetter > Sent: Monday, February 9, 2015 3:33 PM > To: r-help@r-project.org > Subject: [R] Variance is different in R vs. Excel? > > Hello everyone, I have a simple question. when I use the var() function in > R to find a variance, it differs greatly from the variance found in excel > using the =VAR.S function. Any explanations on what those two functions are > actually doing? > > Here is the data and the results: > > dat<-matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,597,872 ,476,1026,423,476,419,591,376,640,550,601,588,499,646,693,351,730,632,707,779, 838,814,771,533,818), > nrow=20, ncol=2, byrow=T) > > var(dat[,1]) >#21290.8 > > var(dat[,2]) >#24748.75 > >#in Excel, the variance of dat[,1] = 44763.91; for dat[,2] = 52034.2 > > Thanks, > Karl I suspect that something has happened to the reading-in of the data into Excel. (I don't know much about Excel, and that's because I don't want to ... ). The ratio of the variances of the two datasets in R is: var(dat[,2])/var(dat[,1]) # [1] 1.162415 while the ratio of the results from Excel is: 52034.2/44763.91 # [1] 1.162414 so they are almost identical. So it is as if Excel was evaluating the variances for data which are sqrt(44763.91/var(dat[,1])) # [1] 1.45 sqrt(52034.2/var(dat[,2])) # [1] 1.44 times the data used by R. So maybe there's a "nasty" lurking somewhere in the spreadsheet? (Excel is notorious for planting things invisibly in its spreadsheets which lead to messed-up results for no apparent reason ... ). Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 09-Feb-2015 Time: 22:15:44 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
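For reference (an addition, not from the thread): var() in R and VAR.S in Excel both use the sample-variance formula with an n-1 denominator, which a couple of lines confirm; so the discrepancy has to come from what actually reached the spreadsheet, not from the formula.

x <- dat[, 1]                            # dat as defined in the original post
sum((x - mean(x))^2) / (length(x) - 1)   # the n-1 formula, done by hand
# [1] 21290.8
var(x)                                   # identical
# [1] 21290.8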
Re: [R] the less-than-minus gotcha
On 02-Feb-2015 11:58:10 Rolf Turner wrote: > > On 02/02/15 14:26, Steve Taylor wrote: > >> All the more reason to use = instead of <- > > Couldn't agree less, Steve. The "<-" should be used for assignment. The > "=" sign should be reserved for handling function arguments in the > "name=value" form. Doing anything else invites confusion and > occasionally chaos. > > Lots of examples have been given in the past involving syntax of the > form foo(x = y) and foo(x <- y). > > cheers, > Rolf > -- > Rolf Turner As well as agreeing with Rolf, it should also be said that Steve's advice "All the more reason to use = instead of <-" is not applicable in this context, which was: which( frame$var>4 ) # no problem which( frame$var<-4 ) # huge problem: frame$var is destroyed There is no place for an "=" here! What does not seem to have been mentioned so far is that this kind of thing can be safely wrapped in parentheses: which( frame$var>4 ) # no problem which( frame$var<(-4) ) # [again no problem] Personally, I'm prone to using parentheses if I have any feeling that a "gotcha" may be lurking -- not only in the distinction between "var<-4" and "var< -4", but also to be confident that, for instance, my intentions are not being over-ridden by operator precedence rules. Some people object to code "clutter" from parentheses that could be more simply replaced (e.g. "var< -4" instead of "var<(-4)"), but parentheses ensure that it's right and also make it clear when one reads it. Best wishes to all, Ted. - E-Mail: (Ted Harding) Date: 02-Feb-2015 Time: 12:43:51 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculate the median age interval
Sorry, a typo in my reply below. See at "###". On 12-Jan-2015 11:12:43 Ted Harding wrote: > On 12-Jan-2015 10:32:41 Erik B Svensson wrote: >> Hello >> I've got a problem I don't know how to solve. I have got a dataset that >> contains age intervals (age groups) of people and the number of persons in >> each age group each year (y1994-y1996). The number of persons varies each >> year. I only have access to the age intervals, not the age of each person, >> which would make things easier. >> >> I want to know the median age interval (not the median number) for each >> year. Let's say that in y1994 23 corresponds to the median age interval >> "45-54", I want to "45-54" as a result. How is that done? >> >> This is the sample dataset: >> > agegrp <- > c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74", > "75-84","84-") > y1994 <- c(0,5,7,9,25,44,23,32,40,36,8) > y1995 <- c(2,4,1,7,20,39,32,18,21,23,5) > y1996 <- c(1,3,1,4,22,37,41,24,24,26,8) > >> I look forward to your response >> >> Best regards, >> Erik Svensson > > In principle, this is straightforward. But in practice you may > need to be careful about how to deal with borderline cases -- and > about what you mean by "median age interval". > The underlying idea is based on: > > cumsum(y1994)/sum(y1994) > # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336 > # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000 > > Thus age intervals 1-7 ("<1" - "45-54") contain less than 50% > (0.49344978...), though "45-54" almost gets there. However, > age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence > the median age is within "49-64". ### Should be: age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence the median age is within "55-64". > Implementing the above as a procedure: > > agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)] > # [1] "55-64" > > Note that the "obvious solution": > > agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))] > # [1] "45-54" > > gives an incorrect answer, since with these data it returns a group > whose maximum age is below the median. This is because the "<=" is > satisfied by "<" also. > > Hoping this helps! > Ted. > > - > E-Mail: (Ted Harding) > Date: 12-Jan-2015 Time: 11:12:39 > This message was sent by XFMail > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 12-Jan-2015 Time: 11:21:11 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculate the median age interval
On 12-Jan-2015 10:32:41 Erik B Svensson wrote: > Hello > I've got a problem I don't know how to solve. I have got a dataset that > contains age intervals (age groups) of people and the number of persons in > each age group each year (y1994-y1996). The number of persons varies each > year. I only have access to the age intervals, not the age of each person, > which would make things easier. > > I want to know the median age interval (not the median number) for each > year. Let's say that in y1994 23 corresponds to the median age interval > "45-54", I want to "45-54" as a result. How is that done? > > This is the sample dataset: > agegrp <- c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74", "75-84","84-") y1994 <- c(0,5,7,9,25,44,23,32,40,36,8) y1995 <- c(2,4,1,7,20,39,32,18,21,23,5) y1996 <- c(1,3,1,4,22,37,41,24,24,26,8) > I look forward to your response > > Best regards, > Erik Svensson In principle, this is straightforward. But in practice you may need to be careful about how to deal with borderline cases -- and about what you mean by "median age interval". The underlying idea is based on: cumsum(y1994)/sum(y1994) # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336 # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000 Thus age intervals 1-7 ("<1" - "45-54") contain less than 50% (0.49344978...), though "45-54" almost gets there. However, age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence the median age is within "49-64". Implementing the above as a procedure: agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)] # [1] "55-64" Note that the "obvious solution": agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))] # [1] "45-54" gives an incorrect answer, since with these data it returns a group whose maximum age is below the median. This is because the "<=" is satisfied by "<" also. Hoping this helps! Ted. - E-Mail: (Ted Harding) Date: 12-Jan-2015 Time: 11:12:39 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
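A sketch extending this to all three years at once (an addition, not from the original reply); this variant looks for the first group whose cumulative share reaches 50%, which gives the same answer here and also copes with a first group that already holds half the cases.

medgrp <- function(counts) agegrp[which(cumsum(counts)/sum(counts) >= 0.5)[1]]
sapply(list(y1994 = y1994, y1995 = y1995, y1996 = y1996), medgrp)
#   y1994   y1995   y1996
# "55-64" "45-54" "45-54"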
Re: [R] diff question
I should have added an extra line to the code below, to complete the picture. Here it is (see below line "##"). Ted. On 11-Jan-2015 08:48:06 Ted Harding wrote: > Troels, this is due to the usual tiny difference between numbers > as computed by R and the numbers that you think they are! tt <- seq(0,20,by=0.02) dtt <- diff(tt) length(dtt) # [1] 1000 r02 <- rep(0.02,1000) unique(r02 - dtt) # [1] 0.00e+00 3.469447e-18 -3.469447e-18 1.040834e-17 # [5] -1.734723e-17 3.816392e-17 9.367507e-17 2.046974e-16 # [9] 4.267420e-16 -4.614364e-16 -1.349615e-15 -3.125972e-15 ## sum(dtt != 0.02) # [1] 998 So only 2 values among the 1000 in diff(tt) are exactly equal to [R's representation of] 0.02! > Hoping this helps! > Ted. > On 11-Jan-2015 08:29:26 Troels Ring wrote: >> R version 3.1.1 (2014-07-10) -- "Sock it to Me" >> Copyright (C) 2014 The R Foundation for Statistical Computing >> Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> Dear friends - I have a small problem with diff (I guess) >> I made a sequence with fixed interval between consecutive elements - and >> hence thought the diff would be as specified >> but had a vector with apparently identical 12 elements returned from diff >> tt <- seq(0,20,by=0.02) >> unique(diff(tt)) #[1] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 >> 0.02 0.02 >> Trying to see if these elements in diff were duplicated >> duplicated(diff(tt)) >>#[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE and from >> sum(duplicated(diff(tt))) >> [1] 988 >> saw that 12 of the elements in duplicated(diff(tt)) were FALSE. Would it >> be expected that the first was FALSE and the rest TRUE? >>|duplicated()|determines which elements of a vector or data frame are >> duplicates of elements with smaller subscripts, and returns a logical >> vector indicating which elements (rows) are duplicates. >> >> All best wishes >> Troels >> Aalborg, Denmark >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > - > E-Mail: (Ted Harding) > Date: 11-Jan-2015 Time: 08:48:03 > This message was sent by XFMail > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 11-Jan-2015 Time: 11:41:27 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] diff question
Troels, this is due to the usual tiny difference between numbers as computed by R and the numbers that you think they are! tt <- seq(0,20,by=0.02) dtt <- diff(tt) length(dtt) # [1] 1000 r02 <- rep(0.02,1000) unique(r02 - dtt) # [1] 0.00e+00 3.469447e-18 -3.469447e-18 1.040834e-17 # [5] -1.734723e-17 3.816392e-17 9.367507e-17 2.046974e-16 # [9] 4.267420e-16 -4.614364e-16 -1.349615e-15 -3.125972e-15 Hoping this helps! Ted. On 11-Jan-2015 08:29:26 Troels Ring wrote: > R version 3.1.1 (2014-07-10) -- "Sock it to Me" > Copyright (C) 2014 The R Foundation for Statistical Computing > Platform: x86_64-w64-mingw32/x64 (64-bit) > > Dear friends - I have a small problem with diff (I guess) > I made a sequence with fixed interval between consecutive elements - and > hence thought the diff would be as specified > but had a vector with apparently identical 12 elements returned from diff > tt <- seq(0,20,by=0.02) > unique(diff(tt)) #[1] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 > 0.02 0.02 > Trying to see if these elements in diff were duplicated > duplicated(diff(tt)) >#[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE and from > sum(duplicated(diff(tt))) > [1] 988 > saw that 12 of the elements in duplicated(diff(tt)) were FALSE. Would it > be expected that the first was FALSE and the rest TRUE? >|duplicated()|determines which elements of a vector or data frame are > duplicates of elements with smaller subscripts, and returns a logical > vector indicating which elements (rows) are duplicates. > > All best wishes > Troels > Aalborg, Denmark > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 11-Jan-2015 Time: 08:48:03 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rounding down with as.integer
I've been followeing this little tour round the murkier bistros in the back-streets of R with interest! Then it occurred to me: What is wrong with [using example data]: x0 <- c(0,1,2,0.325,1.12,1.9,1.003) x1 <- as.integer(as.character(1000*x0)) n1 <- c(0,1000,2000,325,1120,1900,1003) x1 - n1 ## [1] 0 0 0 0 0 0 0 ## But, of course: 1000*x0 - n1 ## [1] 0.00e+00 0.00e+00 0.00e+00 0.00e+00 ## [5] 0.00e+00 0.00e+00 -1.136868e-13 Or am I missing somthing else in what Mike Miller is seeking to do? Ted. On 01-Jan-2015 19:58:02 Mike Miller wrote: > I'd have to say thanks, but no thanks, to that one! ;-) The problem is > that it will take a long time and it will give the same answer. > > The first time I did this kind of thing, a year or two ago, I manipulated > the text data to produce integers before putting the data into R. The > data were a little different -- already zero padded with three digits to > the right of the decimal and one to the left, so all I had to do was drop > the decimal point. The as.integer(1000*x+.5) method is very fast and it > works great. > > I could have done that this time, but I was also saving to other formats, > so I had the data already in the format I described. > > Mike > > > On Thu, 1 Jan 2015, Richard M. Heiberger wrote: > >> Interesting. Following someone on this list today the goal is input >> the data correctly. >> My inclination would be to read the file as text, pad each number to >> the right, drop the decimal point, >> and then read it as an integer. >> 0 1 2 0.325 1.12 1.9 >> 0.000 1.000 2.000 0.325 1.120 1.900 >> 1000 2000 0325 1120 1900 >> >> The pad step is the interesting step. >> >> ## 0 1 2 0.325 1.12 1.9 >> ## 0.000 1.000 2.000 0.325 1.120 1.900 >> ## 1000 2000 0325 1120 1900 >> >> x.in <- scan(text=" >> 0 1 2 0.325 1.12 1.9 1. >> ", what="") >> >> padding <- c(".000", "000", "00", "0", "") >> >> x.pad <- paste(x.in, padding[nchar(x.in)], sep="") >> >> x.nodot <- sub(".", "", x.pad, fixed=TRUE) >> >> x <- as.integer(x.nodot) >> >> >> Rich >> >> >> On Thu, Jan 1, 2015 at 1:21 PM, Mike Miller wrote: >>> On Thu, 1 Jan 2015, Duncan Murdoch wrote: >>> >>>> On 31/12/2014 8:44 PM, David Winsemius wrote: >>>>> >>>>> >>>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote: >>>>> >>>>>> This is probably a FAQ, and I don't really have a question about it, but >>>>>> I just ran across this in something I was working on: >>>>>> >>>>>>> as.integer(1000*1.003) >>>>>> >>>>>> [1] 1002 >>>>>> >>>>>> I didn't expect it, but maybe I should have. I guess it's about the >>>>>> machine precision added to the fact that as.integer always rounds down: >>>>>> >>>>>> >>>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps) >>>>>> >>>>>> [1] 1002 >>>>>> >>>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps) >>>>>> >>>>>> [1] 1003 >>>>>> >>>>>> >>>>>> This does it right... >>>>>> >>>>>>> as.integer( round( 1000*1.003 ) ) >>>>>> >>>>>> [1] 1003 >>>>>> >>>>>> ...but this seems to always give the same answer and it is a little >>>>>> faster in my application: >>>>>> >>>>>>> as.integer( 1000*1.003 + .1 ) >>>>>> >>>>>> [1] 1003 >>>>>> >>>>>> >>>>>> FYI - I'm reading in a long vector of numbers from a text file with no >>>>>> more than three digits to the right of the decimal. I'm converting them >>>>>> to >>>>>> integers and saving them in binary format. >>>>>> >>>>> >>>>> So just add 0.0001 or even .001 to all of them and coerce to integer. 
>>>> >>>> >>>> I don't think the original problem was stated clearly, so I'm not sure >>>> whether this is a solution, but it looks wrong to me. If you want to >>>> round >>>
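Whatever the original poster's exact goal, the usual safe idiom for this conversion is to round before coercing; a small sketch (my addition) using the example values from the thread:

x0 <- c(0, 1, 2, 0.325, 1.12, 1.9, 1.003)
as.integer(1000 * x0)          # truncates: the last element comes out as 1002, not 1003
as.integer(round(1000 * x0))   # [1]    0 1000 2000  325 1120 1900 1003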
Re: [R] Cox model -missing data.
Yes, your basic reasoning is correct. In general, the observed variables carry information about the variables with missing values, so (in some way) the missing values can be replaced with estimates ("imputations") and the standard regression method will then work as though the replacements were there is the first place. To incorporate the inevitable uncertainty about what the missing values really were, one approach ("multiple imputation") is to do the replacement many times over, sampling the replacement values from a posterior distribution estimated from the non-missing data. There are other approaches. This is where the "many questions" kick in! I don't have time at the moment, to go into further detail (there's a lot of it, and several R packages which deal with missing data in different ways), but I hope that someone can meanwhile point you in the right direction. With best wishes, Ted. On 19-Dec-2014 11:17:27 aoife doherty wrote: > Many thanks, I appreciate the response. > > When I convert the missing values to NA and run the cox model as described > in previous post, the cox model seems to remove all of the rows with a > missing value (as the number of rows "n" in the cox output after I > completely remove any row with missing data is the same as the number of > rows "n" in the cox output after I change the missing values to NA). > > What I had been hoping to do is not completely remove a row with missing > data for a co-variable, but rather somehow censor or estimate a value for > the missing value? > > In reality, I have ~600 people with survival data and say 6 variables > attached to them. After I incorporate a 7th variable (for which the > information isn't available for every individual), I have 400 people left. > Since I still have survival data and almost all of the information for the > other 200 people (the only thing missing is information about that 7th > variable), it seems a waste to remove all of the survival data for 200 > people over one co-variate. So I was hoping instead of completely removing > the rows, to just somehow acknowledge that the data for this particular > co-variate is missing in the model but not completely remove the row? This > is more what I was hoping someone would know if it's possible to > incorporate into the model I described above? > > Thanks > > > > On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding > wrote: >> >> Hi Aoife, >> I think that if you simply replace each "*" in the data file >> with "NA", then it should work ("NA" is usually interpreted >> as "missing" for those functions for which missingness is >> relevant). How you subsequently deal with records which have >> missing values is another question (or many questions ... ). >> >> So your data should look like: >> >> V1 V2 V3 Survival Event >> ann 13 WTHomo 41 >> ben 20 NA 51 >> tom 40 Variant 61 >> >> Hoping this helps, >> Ted. >> >> On 19-Dec-2014 10:12:00 aoife doherty wrote: >> > Hi all, >> > >> > I have a data set like this: >> > >> > Test.cox file: >> > >> > V1V2 V3 Survival Event >> > ann 13 WTHomo 41 >> > ben 20 *51 >> > tom 40 Variant 61 >> > >> > >> > where "*" indicates that I don't know what the value is for V3 for Ben. 
>> > >> > I've set up a Cox model to run like this: >> > >> >#!/usr/bin/Rscript >> > library(bdsmatrix) >> > library(kinship2) >> > library(survival) >> > library(coxme) >> > death.dat <- read.table("Test.cox",header=T) >> > deathdat.kmat <-2*with(death.dat,makekinship(famid,ID,faid,moid)) >> > sink("Test.cox.R.Output") >> > Model <- coxme(Surv(Survival,Event)~ strata(factor(V1)) + >> > strata(factor(V2)) + factor(V3)) + >> > (1|ID),data=death.dat,varlist=deathdat.kmat) >> > Model >> > sink() >> > >> > >> > >> > As you can see from the Test.cox file, I have a missing value "*". How >> and >> > where do I tell the R script "treat * as a missing variable". If I can't >> > incorporate missing values into the model, I assume the alternativ
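Purely as an illustration of the multiple-imputation route mentioned above (an addition, not from the thread): with the 'mice' package the analysis has roughly the shape below. The file name and variables follow the Test.cox example; a plain coxph() from 'survival' is used here, since pooling a coxme fit is a further complication.

library(survival)
library(mice)
# read the data, treating "*" as a missing value
death.dat <- read.table("Test.cox", header = TRUE, na.strings = "*")
imp  <- mice(death.dat, m = 5, seed = 1)    # 5 imputed copies of the data
fits <- with(imp, coxph(Surv(Survival, Event) ~ strata(V1) + strata(V2) + factor(V3)))
summary(pool(fits))                         # estimates combined by Rubin's rules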
Re: [R] Cox model -missing data.
Hi Aoife, I think that if you simply replace each "*" in the data file with "NA", then it should work ("NA" is usually interpreted as "missing" for those functions for which missingness is relevant). How you subsequently deal with records which have missing values is another question (or many questions ... ). So your data should look like: V1 V2 V3 Survival Event ann 13 WTHomo 41 ben 20 NA 51 tom 40 Variant 6 1 Hoping this helps, Ted. On 19-Dec-2014 10:12:00 aoife doherty wrote: > Hi all, > > I have a data set like this: > > Test.cox file: > > V1V2 V3 Survival Event > ann 13 WTHomo 41 > ben 20 *51 > tom 40 Variant 61 > > > where "*" indicates that I don't know what the value is for V3 for Ben. > > I've set up a Cox model to run like this: > >#!/usr/bin/Rscript > library(bdsmatrix) > library(kinship2) > library(survival) > library(coxme) > death.dat <- read.table("Test.cox",header=T) > deathdat.kmat <-2*with(death.dat,makekinship(famid,ID,faid,moid)) > sink("Test.cox.R.Output") > Model <- coxme(Surv(Survival,Event)~ strata(factor(V1)) + > strata(factor(V2)) + factor(V3)) + > (1|ID),data=death.dat,varlist=deathdat.kmat) > Model > sink() > > > > As you can see from the Test.cox file, I have a missing value "*". How and > where do I tell the R script "treat * as a missing variable". If I can't > incorporate missing values into the model, I assume the alternative is to > remove all of the rows with missing data, which will greatly reduce my data > set, as most rows have at least one missing variable. > > Thanks > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 19-Dec-2014 Time: 10:21:23 This message was sent by XFMail __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Printing/Generating/Outputting a Table (Not Latex)
The program 'gv' is installed on just about any linux system. It has many available options (one, which might be useful, being "-watch", whose effect is that if the file being displayed is changed, e.g. by being over-written by a new file with the same name, then 'gv' automatically updates what it is displaying). And of course many linux users install 'acroread' (Acrobat Reader), though some object! Hoping this helps, Ted. On 09-Dec-2014 20:47:06 Richard M. Heiberger wrote: > the last one is wrong. That is the one for which I don't know the > right answer on linux. > > 'xdvi' displays dvi files. you need to display a pdf file. > whatever is the right program on linux to display pdf files is what > belongs there. > > On Macintosh we can avoid knowing by using 'open', which means use the > system standard. > I don't know what the linux equivalent is, either the exact program or > the instruction to use the standard. > > On Tue, Dec 9, 2014 at 3:36 PM, Kate Ignatius > wrote: >> I set these options: >> >> options(latexcmd='pdflatex') >> options(dviExtension='pdf') >> options(xdvicmd='xdvi') >> >> Maybe one too many? I'm running in Linux. >> >> >> >> On Tue, Dec 9, 2014 at 3:24 PM, Richard M. Heiberger wrote: >>> It looks like you skipped the step of setting the options. >>> the latex function doesn't do pdflatex (by default it does regular >>> latex) unless you set the options >>> as I indicated. >>> >>> On Tue, Dec 9, 2014 at 3:11 PM, Kate Ignatius >>> wrote: >>>> Ah yes, you're right. >>>> >>>> The log has this error: >>>> >>>> ! LaTeX Error: Missing \begin{document}. >>>> >>>> Though can't really find much online on how to resolve it. >>>> >>>> On Tue, Dec 9, 2014 at 1:15 PM, Jeff Newmiller >>>> wrote: >>>>> pdflatex appears to have run, because it exited. You should look at the >>>>> tex log file, the problem is more likely that the latex you sent out to >>>>> pdflatex was incomplete. >>>>> -- >>>>> - >>>>> Jeff NewmillerThe . . Go >>>>> Live... >>>>> DCN:Basics: ##.#. ##.#. Live >>>>> Go... >>>>> Live: OO#.. Dead: OO#.. Playing >>>>> Research Engineer (Solar/BatteriesO.O#. #.O#. with >>>>> /Software/Embedded Controllers) .OO#. .OO#. >>>>> rocks...1k >>>>> -- >>>>> - >>>>> Sent from my phone. Please excuse my brevity. >>>>> >>>>> On December 9, 2014 8:43:02 AM PST, Kate Ignatius >>>>> wrote: >>>>>>Thanks! I do get several errors though when running on Linux. >>>>>> >>>>>>Running your code, I get this: >>>>>> >>>>>>Error in system(cmd, intern = TRUE, wait = TRUE) : >>>>>>error in running command >>>>>> >>>>>>Fiddling around with the code and running this: >>>>>> >>>>>>tmp <- matrix(1:9,3,3) >>>>>>tmp.tex <- latex(tmp, file='tmp.tex') >>>>>>print.default(tmp.tex) >>>>>>tmp.dvi <- dvi(tmp.tex) >>>>>>tmp.dvi >>>>>>tmp.tex >>>>>>dvips(tmp.dvi) >>>>>>dvips(tmp.tex) >>>>>>library(tools) >>>>>>texi2dvi(file='tmp.tex', pdf=TRUE, clean=TRUE) >>>>>> >>>>>>I get this: >>>>>> >>>>>>Error in texi2dvi(file="tmp.tex",, : >>>>>> Running 'texi2dvi' on 'tmp.tex' failed. >>>>>>Messages: >>>>>>/usr/bin/texi2dvi: pdflatex exited with bad status, quitting. >>>>>> >>>>>>I've read that it may have something to do with the path of pdflatex. >>>>>> >>>>>>Sys.which('pdflatex') >>>>>> >>>>>> pdflatex >>>>>> >>>>>>"/usr/bin/pdflatex" >>>>>> >>
Re: [R] Inverse Student t-value
And, with 1221 degrees of freedom, one cannot be far off a Normal distribution. So:

abs(qnorm(0.0000408831/2))
[1] 4.102431

which is nearer to Duncan's answer (4.117, and a bit smaller, as it should be) than to yours (4.0891672, which, if accurate, should be greater than the Normal value 4.102431). Ted.

On 30-Sep-2014 18:20:39 Duncan Murdoch wrote:
> On 30/09/2014 2:11 PM, Andre wrote:
>> Hi Duncan,
>>
>> No, that's correct. Actually, I have data set below;
>
> Then it seems Excel is worse than I would have expected. I confirmed
> R's value in two other pieces of software,
> OpenOffice and some software I wrote a long time ago based on an
> algorithm published in 1977 in Applied Statistics. (They are probably
> all using the same algorithm. I wonder what Excel is doing?)
>
>> N= 1223
>> alpha= 0.05
>>
>> Then
>> probability= 0.05/1223 = 0.0000408831
>> degree of freedom= 1223-2= 1221
>>
>> So, TINV(0.0000408831,1221) returns 4.0891672
>>
>> Could you show me more detail a manual equation. I really appreciate
>> it if you may give more detail.
>
> I already gave you the expression: abs(qt(0.0000408831/2, df=1221)).
> For more detail, I suppose you could look at the help page for the qt
> function, using help("qt").
>
> Duncan Murdoch
>
>> Cheers!
>>
>> On Wed, Oct 1, 2014 at 1:01 AM, Duncan Murdoch
>> mailto:murdoch.dun...@gmail.com> wrote:
>>
>> On 30/09/2014 1:31 PM, Andre wrote:
>>
>> Dear Sir/Madam,
>>
>> I am trying to use calculation for two-tailed inverse of the
>> student`s t-distribution function presented by Excel functions like
>> =TINV(probability, deg_freedom).
>>
>> For instance: The Excel function =TINV(0.0000408831,1221) returns
>> 4.0891672.
>>
>> Would you like to show me a manual calculation for this?
>>
>> Appreciate your helps in advance.
>>
>> That number looks pretty far off the true value. Have you got a
>> typo in your example?
>>
>> You can compute the answer to your question as
>> abs(qt(0.0000408831/2, df=1221)), but you'll get 4.117.
>>
>> Duncan Murdoch
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding)
Date: 30-Sep-2014 Time: 19:35:45
This message was sent by XFMail
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
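For the record, the same comparison done in R (an added sketch; the numbers are those of the thread):

p  <- 0.05/1223        # = 0.0000408831, the two-tailed probability
df <- 1223 - 2
abs(qt(p/2, df))       # about 4.117, the value Duncan quotes
abs(qnorm(p/2))        # about 4.102, the Normal approximation (slightly smaller)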
Re: [R] data.table/ifelse conditional new variable question
On 17-Aug-2014 03:50:33 John McKown wrote:
> On Sat, Aug 16, 2014 at 9:02 PM, Kate Ignatius wrote:
>
>> Actually - your code is not wrong... because this is a large file I
>> went through the file to see if there was anything wrong with it -
>> looks like there are two fathers or three mothers in some families.
>> Taking these duplicates out fixed the problem.
>>
>> Sorry about the confusion! And thanks so much for your help!
>>
> Kate,
> I hope you don't mind, but I have a curiosity question on my part.
> Were the families with multiple fathers or mothers a mistake, just
> duplicates (same Family.ID & Sample.ID), or more like an "intermixed"
> family due to divorce and remarriage. Or even, like in some countries,
> a case of polygamy? Sorry, I just get curious about the strangest
> things sometimes.
> --
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
>
> Maranatha! <><
> John McKown

When Kate first posted her query, similar thoughts to John's occurred to me. The potential for convoluted ancestry and kinship is enormous! For perhaps (or perhaps not) ultimate convolution, try reconstructing a canine pedigree from a breeding register of thoroughbreds, where again the primary data for each individual is
* ID of individual
* ID of litter the individual was born in ("family")
* ID of male parent
* ID of female parent
(as, for instance, registered with the UK Kennel Club). Similar convolutions can be found with race-horses. But even humans can compete.

Here is a little challenge for anyone who has an R program that will work out a pedigree from data such as described above. I have used Kate's notation. Individuals are numbered from 1 up (with a gap): Sample.ID; Families from 101 up: Family.ID. Relationships are "sibling", "father", "mother". ID for father/mother may be "NA" (data not given).

Family.ID  Sample.ID  Relationship
101        01         sibling
101        02         father
101        03         mother
102        02         sibling
102        04         father
102        05         mother
103        03         sibling
103        06         father
103        07         mother
104        04         sibling
104        08         father
104        09         mother
104        05         sibling
104        08         father
104        09         mother
104        06         sibling
104        08         father
104        09         mother
104        15         sibling
104        08         father
104        09         mother
105        07         sibling
105        04         father
105        15         mother
106        08         sibling
106        16         father
106        17         mother
106        18         sibling
106        16         father
106        17         mother
106        19         sibling
106        16         father
106        17         mother
107        09         sibling
107        18         father
107        19         mother
108        16         sibling
108        NA         father
108        NA         mother
109        17         sibling
109        NA         father
109        NA         mother

That's the data. Now a little quiz question: Can you guess the identity of the person with Sample.ID = 01 ?

Best wishes to all,
Ted.
-
E-Mail: (Ted Harding)
Date: 17-Aug-2014 Time: 19:41:38
This message was sent by XFMail
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
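As a first step towards the quiz, a base-R sketch (an addition, using the column names above) that folds the long table into one row per individual, with father and mother alongside:

ped <- data.frame(
  Family.ID = c(101,101,101, 102,102,102, 103,103,103,
                104,104,104, 104,104,104, 104,104,104, 104,104,104,
                105,105,105, 106,106,106, 106,106,106, 106,106,106,
                107,107,107, 108,108,108, 109,109,109),
  Sample.ID = c("01","02","03", "02","04","05", "03","06","07",
                "04","08","09", "05","08","09", "06","08","09", "15","08","09",
                "07","04","15", "08","16","17", "18","16","17", "19","16","17",
                "09","18","19", "16",NA,NA, "17",NA,NA),
  Relationship = rep(c("sibling","father","mother"), 14),
  stringsAsFactors = FALSE)

kids <- subset(ped, Relationship == "sibling", select = c(Family.ID, Sample.ID))
dads <- unique(subset(ped, Relationship == "father", select = c(Family.ID, Sample.ID)))
moms <- unique(subset(ped, Relationship == "mother", select = c(Family.ID, Sample.ID)))
names(kids)[2] <- "child"; names(dads)[2] <- "father"; names(moms)[2] <- "mother"
trios <- merge(merge(kids, dads, by = "Family.ID"), moms, by = "Family.ID")
trios   # one row per individual: child, father, mother (NA where unknown)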
Re: [R] A basic statistics question
On 12-Aug-2014 22:22:13 Ted Harding wrote: > On 12-Aug-2014 21:41:52 Rolf Turner wrote: >> On 13/08/14 07:57, Ron Michael wrote: >>> Hi, >>> >>> I would need to get a clarification on a quite fundamental statistics >>> property, hope expeRts here would not mind if I post that here. >>> >>> I leant that variance-covariance matrix of the standardized data is >>> equal to the correlation matrix for the unstandardized data. So I >>> used following data. >> >> >> >>> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1] >>> >>> Point is that I am not getting exact CORR matrix. Can somebody point >>> me what I am missing here? >> >> You are using a denominator of "n" in calculating your "covariance" >> matrix for your normalized data. But these data were normalized using >> the sd() function which (correctly) uses a denominator of n-1 so as to >> obtain an unbiased estimator of the population standard deviation. >> >> If you calculated >> >> (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1) >> >> then you would get the same result as you get from cor(Data) (to within >> about 1e-15). >> >> cheers, >> Rolf Turner > > One could argue about "(correctly)"! > >>From the "descriptive statistics" point of view, if one is given a single > number x, then this dataset has no variation, so one could say that > sd(x) = 0. And this is what one would get with a denominator of "n". > > But if the single value x is viewed as sampled from a distribution > (with positive dispersion), then the value of x gives no information > about the SD of the distribution. If you use denominator (n-1) then > sd(x) = NA, i.e. is indeterminate (as it should be in this application). > > The important thing when using pre-programmed functions is to know > which is being used. R uses (n-1), and this can be found from > looking at > > ?sd > > or (with more detail) at > > ?cor > > Ron had assumed that the denominator was n, apparently not being aware > that R uses (n-1). > > Just a few thoughts ... > Ted. > - > E-Mail: (Ted Harding) > Date: 12-Aug-2014 Time: 23:22:09 After some hesitation (not wanting to prolong a thread whose original query has been effectively settled), nevertheless I think it may be worth stating the general picture, for the sake of users who might be confused or misled regarding the use of the functions var(), cov(), cor(), sd() etc. The distinction to become aware of is between "summary" statistics and statistics which will be used for estimation/inference. Given a set of numbers, x[1], x[2], ... , x[N], they have a mean MEAN = sum(x)/N and their variance VAR is the mean of the deviations between the x[i] and MEAN: VAR = (sum(x - MEAN)^2)/N If a random value X is drawn from {x[1], x[2], ... , x[N]}, with uniform probability, then the expectation of X is again E(X) = MEAN and the variance of X is again Var(X) = E((X - MEAN)^2) = VAR with MEAN and VAR as given above. And the R function mean(x) will return MEAN is its value. However, the R function var(x) will not return VAR -- it will return (N/(N-1))*VAR The above definitions of MEAN and VAR use division by N, i.e. "denominator = N". But the R functions var(x), sd(x) etc. divide by (N-1), i.e. use "denominator = N-1", as explained in ?var ?sd etc. The basic reason for this is that, given a random sample X[1], ... ,X[n] of size n from {x[1], x[2], ... 
, x[N]}, the expectation of sum((X - mean(X))^2) is (n-1)*VAR, so to obtain an unbiased estimate of VAR from the X sample one must use the "bias-corrected" sample variance var(X) = sum((X - mean(X))^2)/(n-1) i.e. "denominator = (n-1)" as described in the help pages. So the function var(), with denominator = (n-1), is "correct" for obtaining an unbiased estimator. But it will not be correct for the variance sum((X - mean(X))^2)/n of the numbers X[1], ... ,X[n]. Since sd() also uses denominator = (n-1), sd(X) is the square root of var(X). But, while var(X) is unbiased for VAR, sd(X) is not unbiased for SD = sqrt(VAR), since, for a positive ransom variable Y, in general E(sqrt(Y)) < sqrt(E(Y)) i.e. E(sd(X)) = E(sqrt(var(X))) < sqrt(E(var(X))) = sqrt(VAR) = SD. The R functions var() etc., which use denominator = (n-1), do not have an option which allows the user to choose denominator = n. Therefore, in particular, a user who has a set of numbers {x[1], x[2], ... , x[N]} (e.g. a record of a popuation) and wants the SD of these (e.g. for use in summary stat
Re: [R] A basic statistics question
On 12-Aug-2014 21:41:52 Rolf Turner wrote: > On 13/08/14 07:57, Ron Michael wrote: >> Hi, >> >> I would need to get a clarification on a quite fundamental statistics >> property, hope expeRts here would not mind if I post that here. >> >> I leant that variance-covariance matrix of the standardized data is equal to >> the correlation matrix for the unstandardized data. So I used following >> data. > > > >> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1] >> >> Point is that I am not getting exact CORR matrix. Can somebody point >> me what I am missing here? > > You are using a denominator of "n" in calculating your "covariance" > matrix for your normalized data. But these data were normalized using > the sd() function which (correctly) uses a denominator of n-1 so as to > obtain an unbiased estimator of the population standard deviation. > > If you calculated > > (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1) > > then you would get the same result as you get from cor(Data) (to within > about 1e-15). > > cheers, > Rolf Turner One could argue about "(correctly)"! >From the "descriptive statistics" point of view, if one is given a single number x, then this dataset has no variation, so one could say that sd(x) = 0. And this is what one would get with a denominator of "n". But if the single value x is viewed as sampled from a distribution (with positive dispersion), then the value of x gives no information about the SD of the distribution. If you use denominator (n-1) then sd(x) = NA, i.e. is indeterminate (as it should be in this application). The important thing when using pre-programmed functions is to know which is being used. R uses (n-1), and this can be found from looking at ?sd or (with more detail) at ?cor Ron had assumed that the denominator was n, apparently not being aware that R uses (n-1). Just a few thoughts ... Ted. - E-Mail: (Ted Harding) Date: 12-Aug-2014 Time: 23:22:09 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] A basic statistics question
On 12-Aug-2014 19:57:29 Ron Michael wrote: > Hi, > > I would need to get a clarification on a quite fundamental statistics > property, hope expeRts here would not mind if I post that here. > > I leant that variance-covariance matrix of the standardized data is equal to > the correlation matrix for the unstandardized data. So I used following data. > > Data <- structure(c(7L, 5L, 9L, 7L, 8L, 7L, 6L, 6L, 5L, 7L, 8L, 6L, 7L, 7L, > 6L, 7L, 7L, 6L, 8L, 6L, 7L, 7L, 7L, 8L, 7L, 9L, 8L, 7L, 7L, 0L, 10L, 10L, > 10L, 7L, 6L, 8L, 5L, 5L, 6L, 6L, 7L, 11L, 9L, 10L, 0L, 13L, 13L, 10L, 7L, > 7L, 7L, 10L, 7L, 5L, 8L, 7L, 10L, 10L, 10L, 6L, 7L, 6L, 6L, 8L, 8L, 7L, 7L, > 7L, 7L, 8L, 7L, 8L, 6L, 6L, 8L, 7L, 4L, 7L, 7L, 10L, 10L, 6L, 7L, 7L, 12L, > 12L, 8L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 5L, 4L, 5L, 5L, 5L, 6L, > 7L, 5L, 7L, 5L, 7L, 7L, 7L, 7L, 8L, 7L, 6L, 7L, 7L, 6L, 7L, 7L, 6L, 4L, 4L, > 6L, 6L, 7L, 8L, 7L, 11L, 10L, 8L, 7L, 6L, 6L, 11L, 5L, 4L, 6L, 6L, 6L, 7L, > 8L, 7L, 12L, 4L, 4L, 2L, 5L, 6L, 7L, 6L, 6L, 5L, 6L, 5L, 7L, 7L, 7L, 6L, 5L, > 6L, 6L, 5L, 5L, 6L, 6L, 4L, 4L, 5L, 10L, 10L, 7L, 7L, 6L, 4L, 6L, 10L, 7L, > 4L, 6L, 6L, 6L, 8L, 8L, 8L, 7L, 8L, 9L, 10L, 7L, 6L, 6L, 8L, 6L, 8L, 3L, > 3L, 4L, 5L, 5L, 6L, 5L, 5L, 6L, 4L, 8L, 7L, 3L, 5L, 6L, 9L, 8L, 9L, 10L, 8L, > 9L, 8L, 9L, 8L, 8L, 9L, 11L, 10L, 9L, 9L, 13L, > 13L, 10L, 7L, 7L, 7L, 9L, 8L, 7L, 6L, 10L, 8L, 7L, 8L, 8L, 3L, 4L, 3L, 7L, > 6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 2L, 5L, 7L, 9L, 8L, 9L, 10L, 8L, 8L, 9L, 9L, > 11L, 11L, 11L, 10L, 9L, 9L, 11L, 2L, 3L, 2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 1L, > 1L, 1L, 3L, 3L, 4L, 6L, 4L, 5L, 2L, 3L, 5L, 4L, 4L, 2L, 4L, 4L, 5L, 4L, 2L, > 7L, 3L, 3L, 10L, 13L, 11L, 9L, 9L, 7L, 8L, 9L, 6L, 7L, 6L, 5L, 3L, 13L, 3L, > 3L, 0L, 1L, 4L, 5L, 3L, 3L, 0L, 2L, 20L, 3L, 2L, 6L, 5L, 5L, 5L, 2L, 2L, > 5L, 5L, 5L, 4L, 3L, 4L, 4L, 3L, 4L, 10L, 10L, 9L, 8L, 4L, 4L, 8L, 7L, 10L, > 3L, 1L, 9L, 5L, 11L, 9L), .Dim = c(45L, 8L), .Dimnames = list(NULL, c("V1", > "V7", "V13", "V19", "V25", "V31", "V37", "V43"))) > > > Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x))) > > (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1] > > > > Point is that I am not getting exact CORR matrix. Can somebody point me > what I am missing here? > > Thanks for your pointer. Try: Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x))) (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1) and compare the result with cor(Data) And why? Look at ?sd and note that: Details: Like 'var' this uses denominator n - 1. Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 12-Aug-2014 Time: 22:32:26 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generate quasi-random positive numbers
On 05-Aug-2014 10:27:54 Frederico Mestre wrote: > Hello all: > > Is it possible to generate quasi-random positive numbers, given a standard > deviation and mean? I need all positive values to have the same probability > of selection (uniform distribution). Something like: > > runif(10, min = 0, max = 100) > > This way I'm generating random positive numbers from a uniform > distribution. However, using runif I can't previously select SD and mean > (as in rnorm). > > Alternatively, I'm able to generate a list of quasi-random numbers given a > SD and a mean. > > b <- (sqrt(SD^2*12)+(MEAN*2))/2 > a <- (MEAN*2) - b > x1 <- runif(N,a,b) > > However, negative values might be included, since "a" can assume a negative > value. > > Any help? > > Thanks, > Frederico There is an inevitable constraint on MEAN and SD for a uniform ditribution of positive numbers. Say the parent distribution is uniform on (a,b) with a >= 0 and b > a. Then MEAN = (a+b)/2, SD^2 = ((b-a)^2)/12, so 12*SD^2 = b^2 - 2*a*b + a^2 4*MEAN^2 = b^2 + 2*a*b + a^2 4*MEAN^2 - 12*SD^2 = 4*a*b MEAN^2 - 3*SD^2 = a*b Hence for a >= 0 and b > a you must have MEAN^2 >= 3*SD^2. Once you have MEAN and SD satisfying this constraint, you should be able to solve the equations for a and b. Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 05-Aug-2014 Time: 11:46:52 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
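Solving those two equations gives an explicit recipe; a quick sketch (an addition, with MEAN, SD and N standing for whatever values the user has in mind):

MEAN <- 50; SD <- 10; N <- 1000      # example values; they must satisfy MEAN^2 >= 3*SD^2
a <- MEAN - sqrt(3) * SD             # from a + b = 2*MEAN and b - a = sqrt(12)*SD
b <- MEAN + sqrt(3) * SD
stopifnot(a >= 0)                    # otherwise no uniform on positive numbers will do
x <- runif(N, a, b)
c(mean(x), sd(x))                    # close to 50 and 10 for large N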
Re: [R] Simple permutation question
I think Robert wants deterministic permutations. In the e1071 package -- load with library(e1071) -- there is a function permutations(): Description: Returns a matrix containing all permutations of the integers '1:n' (one permutation per row). Usage: permutations(n) Arguments: n: Number of element to permute. so, starting with x <- c("A","B","C","D","E") library(e1071) P <- permutations(length(x)) then, for say the 27th of these 120 permutations of x, x[P[27,]] will return it. Ted. On 25-Jun-2014 20:38:45 Cade, Brian wrote: > It is called sample(,replace=F), where the default argument is sampling > without replacement. > Try > x <- c("A","B","C","D","E") > sample(x) > > Brian > > Brian S. Cade, PhD > > U. S. Geological Survey > Fort Collins Science Center > 2150 Centre Ave., Bldg. C > Fort Collins, CO 80526-8818 > > email: ca...@usgs.gov > tel: 970 226-9326 > > > > On Wed, Jun 25, 2014 at 2:22 PM, Robert Latest wrote: > >> So my company has hired a few young McKinsey guys from overseas for a >> couple of weeks to help us with a production line optimization. They >> probably charge what I make in a year, but that's OK because I just >> never have the time to really dive into one particular time, and I have >> to hand it to the consultants that they came up with one or two really >> clever ideas to model the production line. Of course it's up to me to >> feed them the real data which they then churn through their Excel >> models that they cook up during the nights in their hotel rooms, and >> which I then implement back into my experimental system using live data. >> >> Anyway, whenever they need something or come up with something I skip >> out of the room, hack it into R, export the CSV and come back in about >> half the time it takes Excel to even read in the data, let alone >> process it. Of course that gor them curious, and I showed off a couple >> of scripts that condense their abysmal Excel convolutions in a few >> lean and mean lines of R code. >> >> Anyway, I'm in my office with this really attractive, clever young >> McKinsey girl (I'm in my mid-forties, married with kids and all, but I >> still enjoyed impressing a woman with computer stuff, of all things!), >> and one of her models involves a simple permutation of five letters -- >> "A" through "E". >> >> And that's when I find out that R doesn't have a permutation function. >> How is that possible? R has EVERYTHING, but not that? I'm >> flabbergasted. Stumped. And now it's up to me to spend the evening at >> home coding that model, and the only thing I really need is that >> permutation. >> >> So this is my first attempt: >> >> perm.broken <- function(x) { >> if (length(x) == 1) return(x) >> sapply(1:length(x), function(i) { >> cbind(x[i], perm(x[-i])) >> }) >> } >> >> But it doesn't work: >> > perm.broken(c("A", "B", "C")) >> [,1] [,2] [,3] >> [1,] "A" "B" "C" >> [2,] "A" "B" "C" >> [3,] "B" "A" "A" >> [4,] "C" "C" "B" >> [5,] "C" "C" "B" >> [6,] "B" "A" "A" >> > >> >> And I can't figure out for the life of me why. It should work because I >> go through the elements of x in order, use that in the leftmost column, >> and slap the permutation of the remaining elements to the right. What >> strikes me as particularly odd is that there doesn't even seem to be a >> systematic sequence of letters in any of the columns. 
>> OK, since I really need that function I wrote this piece of crap:
>>
>> perm.stupid <- function(x) {
>>   b <- as.matrix(expand.grid(rep(list(x), length(x))))
>>   b[!sapply(1:nrow(b), function(r) any(duplicated(b[r,]))),]
>> }
>>
>> It works, but words cannot describe its ugliness. And it gets really
>> slow really fast with growing x.
>>
>> So, anyway. My two questions are:
>> 1. Does R really, really, seriously lack a permutation function?
>> 2. OK, stop kidding me. So what's it called?
>> 3. Why doesn't my recursive function work, and what would a
>>    working version look like?
>>
>> Thanks,
>> robert
>>
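To round this off, a sketch of a working recursive version of the kind asked for in question 3 (an addition, not from the thread). The original attempt stumbles mainly because sapply() simplifies the sub-results into one column per i, and because the recursive call refers to perm() rather than to the function's own name; rbind-ing the pieces explicitly avoids both problems.

perm <- function(x) {
  if (length(x) <= 1) return(matrix(x, nrow = 1))
  do.call(rbind,
          lapply(seq_along(x), function(i) cbind(x[i], perm(x[-i]))))
}
perm(c("A", "B", "C"))
#      [,1] [,2] [,3]
# [1,] "A"  "B"  "C"
# [2,] "A"  "C"  "B"
# [3,] "B"  "A"  "C"
# [4,] "B"  "C"  "A"
# [5,] "C"  "A"  "B"
# [6,] "C"  "B"  "A"

(The result has factorial(length(x)) rows, so it grows very quickly, just as the e1071 permutations() matrix does.)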
Re: [R] C: drive memory full
Don't forget the possibility of "hidden files", which will not be shown using normal file-listing procedures. It may be that R has stored those files as hidden files. See, for example: http://windows.microsoft.com/en-gb/windows/show-hidden-files http://www.sevenforums.com/tutorials/394-hidden-files-folders-show-hide.html [NB: These are the results of a google search. I am no expert on Windows myself ... ] Hoping this helps, Ted. On 17-Jun-2014 12:48:54 Hiyoshi, Ayako wrote: > Dear Martyn and Professor Ripley, > > Thank you so much for your help. I used Window's large file search (it was > useful! thank you), but there is no big files detected in C: drive . > Perhaps I will have to reinstall Windows.. > > Thank you so much for your replies. > > Best wishes, > Ayako > > From: Martyn Byng > Sent: 17 June 2014 12:10 > To: Hiyoshi, Ayako; Prof Brian Ripley; r-help@R-project.org > Subject: RE: [R] C: drive memory full > > Hi, > > Try > > http://social.technet.microsoft.com/wiki/contents/articles/19295.windows-8-how > -to-search-for-large-files.aspx > > Martyn > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of Hiyoshi, Ayako > Sent: 17 June 2014 11:40 > To: Prof Brian Ripley; r-help@R-project.org > Subject: Re: [R] C: drive memory full > > Dear Professor Ripley, > > Thank you so much for your quick reply. > > I tried 'dianame(tempdir())' and removed several 'Rtemp' and all other files. > Unfortunately it did not changed C: drive space much. > > I am really sorry, but could there be other things stored in somewhere in C: > drive ? > > I called IT support, but they could not spot either. > > Kind regards, > Ayako > > > From: Prof Brian Ripley > Sent: 17 June 2014 10:37 > To: Hiyoshi, Ayako; r-help@R-project.org > Subject: Re: [R] C: drive memory full > > tempdir() is specific to an R session. > > Start up R and run > > dirname(tempdir()) > > and look at that directory. Shut down R, then remove all old files/folders > in that directory, especially those beginning with 'Rtmp'. > > An R process tries to clean up after itself but > > - it cannot do so if it segfaults > - Windows has rules which no other OS has which can result in files being > locked and hence not deletable. > > > On 17/06/2014 08:49, Hiyoshi, Ayako wrote: >> Dear R users, >> >> Hello, I am new to R and confused about my PC's memory space after >> using R a while. >> >> My PC is Windows 8, RAM 8GB. I have run some analyses using commands >> like "vglm", "aalen(Sruv())", lm()... some regressions. >> >> I also use Stata, and when I tried to run Stata (big file), Stata >> could not do something which used to do because of the lack of memory >> space. I suspect R's something because R is the only new activity I >> did recently. >> >> I tried to google and found 'tempdir()'. >> >> I checked my temporary file but it was empty. >> >> Just in case, after running 'unlink(tempdir(),recursive=TRUE)', I >> restarted my computer, but memory space did not change. But there >> seems still something big in my C: drive storage and nearly 12GB >> is eaten. >> >> Could it be possible that R saved something somewhere? >> >> As I finished analyses, all I need is to erase everything stored >> so that I can get my memory space back. 
>> >> Ayako >> [[alternative HTML version deleted]] - E-Mail: (Ted Harding) Date: 17-Jun-2014 Time: 14:14:40 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matrix column division by vector
Maybe I am missing the point -- but what is wrong with line 3 of: m=rbind(c(6,4,2),c(3,2,1)) v= c(3,2,1) m%*%diag(1/v) # [,1] [,2] [,3] # [1,]222 # [2,]111 Ted. On 14-May-2014 15:03:36 Frede Aakmann Tøgersen wrote: > Have a look at ?sweep > > Br. Frede > > > Sendt fra Samsung mobil > Oprindelig meddelelse > Fra: carol white > Dato:14/05/2014 16.53 (GMT+01:00) > Til: r-h...@stat.math.ethz.ch > Emne: [R] matrix column division by vector > > Hi, > What is the elegant script to divide the columns of a matrix by the > respective position of a vector elements? > > m=rbind(c(6,4,2),c(3,2,1)) > > v= c(3,2,1) > > res= 6/3 4/2 2/1 > 3/3 2/21/1 > > > this is correct > mat2 = NULL > > for (i in 1: ncol(m)) > > mat2 = cbind(mat2, m[,i]/ v[i]) > > > but how to do more compact and elegant with for ex do.call? > > Many thanks > > Carol > [[alternative HTML version deleted]] > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 14-May-2014 Time: 18:16:12 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
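For completeness, two further idioms for the same column-wise division (an added note): sweep(), as suggested in the thread, and plain recycling on the transpose.

m <- rbind(c(6, 4, 2), c(3, 2, 1))
v <- c(3, 2, 1)
sweep(m, 2, v, "/")   # divide each column by the corresponding element of v
t(t(m) / v)           # same result: v recycles along the rows of t(m)
m %*% diag(1/v)       # the matrix-algebra version shown above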
Re: [R] Precedence and parentheses
On 06-May-2014 18:09:12 Göran Broström wrote:
> A thread on r-devel ("Historical NA question") went (finally) off-topic,
> heading towards "Precedence". This triggered a question that I think is
> better put on this list:
>
> I have been more or less regularly writing programs since the
> seventies (Fortran, later C) and I early got the habit of using
> parentheses almost everywhere, for two reasons. The first is the
> obvious, to avoid mistakes with precedences, but the second is almost
> as important: Readability.
>
> Now, I think I have seen somewhere that unnecessary parentheses in R
> functions may slow down execution time considerably. Is this really
> true, and should I consequently get rid of my faiblesse for parentheses?
> Or are there rules for when it matters and doesn't matter?
>
> Thanks, Göran

I have much sympathy with your motives for using parentheses! (And I have a similar computing history). My general encouragement would be: continue using them when you feel that each usage brings a benefit. Of course, you would avoid silliness like

x <- (1*(2*(3*(4*(5)))))

and indeed, that does slow it down somewhat (it takes about twice as long as a <- 1*2*3*4*5):

system.time(for(i in (1:1)) 1*2*3*4*5)
#  user  system elapsed
# 0.032   0.000   0.032
system.time(for(i in (1:1)) (1*(2*(3*(4*(5))))))
#  user  system elapsed
# 0.056   0.000   0.054

The main reason, I suppose, is that a "(" forces a new level on a stack, which is not popped until the matching ")" arrives. Interestingly, the above silliness takes a little longer when the nesting is the other way round:

system.time(for(i in (1:1)) 1*2*3*4*5)
#  user  system elapsed
# 0.028   0.000   0.029
system.time(for(i in (1:1)) ((((1)*2)*3)*4)*5)
#  user  system elapsed
# 0.052   0.000   0.081

(though in fact the times are somewhat variable in both cases, so I'm not sure of the value of the relationship).

Best wishes,
Ted.
-
E-Mail: (Ted Harding)
Date: 06-May-2014 Time: 19:41:13
This message was sent by XFMail
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with products in R ?
Thanks for this link, Gabor (and especially for the link therein to your posting on Thu May 7 14:10:53 CEST 2009). This confirms that the R 'bc' package is not on CRAN and points to where it can be sourced from. I used to have bc.R installed on an old machine, which has gone into terminal coma now. Best wishes, Ted. On 04-May-2014 17:10:00 Gabor Grothendieck wrote: > Checking this with the bc R package (https://code.google.com/p/r-bc/), > the Ryacas package (CRAN), the gmp package (CRAN) and the Windows 8.1 > calculator all four give the same result: > >> library(bc) >> bc("168988580159 * 36662978") > [1] "6195624596620653502" > >> library(Ryacas) >> yacas("168988580159 * 36662978", retclass = "character") > 6195624596620653502 > >> library(gmp) >> as.bigz("168988580159") * as.bigz("36662978") > Big Integer ('bigz') : > [1] 6195624596620653502 > > > On Sun, May 4, 2014 at 12:50 PM, Ted Harding > wrote: >> On 04-May-2014 14:13:27 Jorge I Velez wrote: >>> Try >>> >>> options(digits = 22) >>> 168988580159 * 36662978 >>># [1] 6195624596620653568 >>> >>> HTH, >>> Jorge.- >> >> Err, not quite ... ! >> I hitch my horses to my plough (with help from R): >> >> options(digits=22) >> 168988580159*8 = 1351908641272 (copy down) >> 168988580159*7 = 1182920061113 ( " " ) >> 168988580159*9 = 1520897221431 ( " " ) >> 168988580159*2 = 337977160318 ( " " ) >> 168988580159*6 = 1013931480954 ( " " )^3 >> 168988580159*3 = 506965740477 ( " " ) >> >> 1351908641272 >> 11829200611130 >>152089722143100 >>337977160318000 >> 1013931480954 >> 10139314809540 >>101393148095400 >>506965740477000 >> == >>6195624596620653502 >> [after adding up mentally] >> >> compared with Jorge's: >>6195624596620653568 >> >> ("02" vs "68" in the final two digits). >> >> Alternatively, if using a unixoid system with 'bc' present, >> one can try interfacing R with 'bc'. 'bc' is an calculating >> engine which works to arbitrary precision. >> >> There certainly used to be a utility in which R can evoke 'bc', >> into which one can enter a 'bc' command and get the result >> returned as a string, but I can't seem to find it on CRAN now. >> In any case, the raw UNIX command line for this calculation >> with 'bc' (with result) is: >> >> $ bc -l >> [...] >> 168988580159 * 36662978 >> 6195624596620653502 >> quit >> >> which agrees with my horse-drawn working. >> >> Best wishes to all, >> Ted. >> >>> On Sun, May 4, 2014 at 10:44 PM, ARTENTOR Diego Tentor < >>> diegotento...@gmail.com> wrote: >>> >>>> Trying algorithm for products with large numbers i encountered a >>>> difference >>>> between result of 168988580159 * 36662978 in my algorithm and r product. >>>> The Microsoft calculator confirm my number. >>>> >>>> Thanks. >>>> -- >>>> *Gráfica ARTENTOR * >>>> >>>> de Diego L. Tentor >>>> Echagüe 558 >>>> Tel.:0343 4310119 >>>> Paraná - Entre RÃos >> >> - >> E-Mail: (Ted Harding) >> Date: 04-May-2014 Time: 17:50:54 >> This message was sent by XFMail >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. 
> tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com - E-Mail: (Ted Harding) Date: 04-May-2014 Time: 18:28:23 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with products in R ?
On 04-May-2014 14:13:27 Jorge I Velez wrote:
> Try
>
> options(digits = 22)
> 168988580159 * 36662978
> # [1] 6195624596620653568
>
> HTH,
> Jorge.-

Err, not quite ... !
I hitch my horses to my plough (with help from R):

options(digits=22)
168988580159*8 = 1351908641272  (copy down)
168988580159*7 = 1182920061113  ( "  " )
168988580159*9 = 1520897221431  ( "  " )
168988580159*2 = 337977160318   ( "  " )
168988580159*6 = 1013931480954  ( "  " )^3
168988580159*3 = 506965740477   ( "  " )

      1351908641272
     11829200611130
    152089722143100
    337977160318000
  10139314809540000
 101393148095400000
1013931480954000000
5069657404770000000
===================
6195624596620653502
[after adding up mentally]

compared with Jorge's:
6195624596620653568

("02" vs "68" in the final two digits).

Alternatively, if using a unixoid system with 'bc' present, one can try interfacing R with 'bc'. 'bc' is a calculating engine which works to arbitrary precision.

There certainly used to be a utility by which R could invoke 'bc', into which one can enter a 'bc' command and get the result returned as a string, but I can't seem to find it on CRAN now. In any case, the raw UNIX command line for this calculation with 'bc' (with result) is:

$ bc -l
[...]
168988580159 * 36662978
6195624596620653502
quit

which agrees with my horse-drawn working.

Best wishes to all,
Ted.

> On Sun, May 4, 2014 at 10:44 PM, ARTENTOR Diego Tentor <
> diegotento...@gmail.com> wrote:
>
>> Trying algorithm for products with large numbers i encountered a difference
>> between result of 168988580159 * 36662978 in my algorithm and r product.
>> The Microsoft calculator confirm my number.
>>
>> Thanks.
>> --
>> *Gráfica ARTENTOR *
>>
>> de Diego L. Tentor
>> Echagüe 558
>> Tel.:0343 4310119
>> Paraná - Entre Ríos

-
E-Mail: (Ted Harding)
Date: 04-May-2014 Time: 17:50:54
This message was sent by XFMail
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Getting a particular weekday for a given month
On 07-Apr-2014 17:49:09 Christofer Bogaso wrote:
> Hi,
> Given a month name, I am looking for some script to figure out,
> what is the date for 3rd Wednesday. For example let say I have
> following month:
>
> library(zoo)
> Month <- as.yearmon(as.Date(Sys.time()))
>
> I need to answer: What is the date for 3rd Wednesday of 'Month'?
>
> Really appreciate for any pointer.
>
> Thanks for your time.

The following may not suit you, but it is the sort of approach I tend to adopt myself, using things I know about rather than getting lost in R documentation! (Outline of general method, not details). And it also assumes you are using a Unixoid system (e.g. Linux or Mac OS X).

Your two commands currently give:

library(zoo)
Month <- as.yearmon(as.Date(Sys.time()))
Month
# [1] "Apr 2014"

and it is straightforward to extract "Apr" and "2014" from Month. This is the point at which I attach my horses to my wooden plough ...

In Unixoid systems there is a command 'cal' which, for "2014", yields output:

$ cal 2014
                             2014
      January               February               March
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
       1  2  3  4  5                  1  2                  1  2
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   3  4  5  6  7  8  9
13 14 15 16 17 18 19  10 11 12 13 14 15 16  10 11 12 13 14 15 16
20 21 22 23 24 25 26  17 18 19 20 21 22 23  17 18 19 20 21 22 23
27 28 29 30 31        24 25 26 27 28        24 25 26 27 28 29 30
                                            31

       April                  May                   June
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6            1  2  3  4                     1
 7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                            30

        July                 August              September
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6               1  2  3   1  2  3  4  5  6  7
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   8  9 10 11 12 13 14
14 15 16 17 18 19 20  11 12 13 14 15 16 17  15 16 17 18 19 20 21
21 22 23 24 25 26 27  18 19 20 21 22 23 24  22 23 24 25 26 27 28
28 29 30 31           25 26 27 28 29 30 31  29 30

      October               November              December
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
       1  2  3  4  5                  1  2   1  2  3  4  5  6  7
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   8  9 10 11 12 13 14
13 14 15 16 17 18 19  10 11 12 13 14 15 16  15 16 17 18 19 20 21
20 21 22 23 24 25 26  17 18 19 20 21 22 23  22 23 24 25 26 27 28
27 28 29 30 31        24 25 26 27 28 29 30  29 30 31

After the first two lines, this consists of 4 blocks, each with 8 rows, each covering 3 months, where each month consists of 7 columns, one for each day of the week (Mo - Su). Each column occupies 3 character spaces (except for the last -- only 2).

From "April" you can readily identify that this is the 4th month, so you need to go to Month 1 of the 2nd block of rows. The "We" column is the 3rd in that month, and you are looking for the date of the 3rd Wednesday. So count down to the 3rd non-blank entry[*] in this 3rd column, and you will find "16". Done.

[*] Some months, e.g. November above, have an initial blank entry because this day belongs to the previous month.

Quite how you would program this efficiently in R is another matter! But the principle is simple. To give R a text file to work on, at the shell prompt use a command like:

$ cal 2014 > cal2014.txt

and then "cal2014.txt" is accessible as a plain text file. Even simpler (if it is only one particular month you want, as in your example) is:

$ cal April 2014

which yields:

     April 2014
Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30

and now just count down the 3rd column (as before).
Maybe this helps ... Ted. - E-Mail: (Ted Harding) Date: 07-Apr-2014 Time: 19:29:41 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
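Since the question was about R, a small sketch of doing the lookup directly with Date objects rather than via 'cal' (an addition; note that weekdays() is locale-dependent, so the string "Wednesday" assumes an English locale):

third.wednesday <- function(year, month) {
  # the first 28 days of a month always contain the first four of each weekday
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  days  <- seq(first, by = "day", length.out = 28)
  days[weekdays(days) == "Wednesday"][3]
}
third.wednesday(2014, 4)
# [1] "2014-04-16"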
Re: [R] about lm()
I suspect the problem may be with the structure of the data. Si Qi L wrote: [...] acc1<- lm(data$acc ~ dummy + data$Reqloanamount + data$Code + data$Code.1 + data$EmpNetMonthlyPay + data$Customer_Age + data$RTI) [...] These attributes are all numerical except the "acc"(factors) [...] If, as he implies, the "acc" variable in "data" is a factor, then lm() will not enjoy fitting an lm where the dependent variables (response) is a factor! Just a shot in the dark ... Ted. On 30-Mar-2014 18:46:27 Bert Gunter wrote: > 1. Post in plain text, not HTML. > > 2. Read ?lm and note the data argument. Use it in your lm call instead > of all the $data extractions. > > 3. Your problem is with the summary() call, so read ?summary.lm. Learn > about S3 methods if you do not know where the ".lm" part is coming > from by reading the "Introduction to R " tutorial that ships with R, > which you should have read already anyway. > > Cheers, > Bert > > Bert Gunter > Genentech Nonclinical Biostatistics > (650) 467-7374 > > "Data is not information. Information is not knowledge. And knowledge > is certainly not wisdom." > H. Gilbert Welch > > > > > On Sun, Mar 30, 2014 at 9:43 AM, Frank Schwidom wrote: >> Please provide some data from your variable data. >> >> Show the output of dput( data) or an subset >> of data which leads toe the specific error. >> >> Regards >> >> >> On Sun, Mar 30, 2014 at 02:23:09PM +0100, Si Qi L. wrote: >>> Hi >>> >>> I have a problem with linear regression. This is my codes: >>> >>> acc1<- lm(data$acc ~ dummy + data$Reqloanamount + data$Code + data$Code.1 + >>> data$EmpNetMonthlyPay + data$Customer_Age + data$RTI) >>> summary(acc1) >>> >>> These attributes are all numerical except the "acc"(factors), so how can I >>> fix the problem R showed me? can anyone please help me out? Many thanks! >>> >>> But the R shows: >>> >>> Residuals:Error in quantile.default(resid) : factors are not allowedIn >>> addition: Warning message:In Ops.factor(r, 2) : ^ not meaningful for >>> factors >>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 30-Mar-2014 Time: 21:12:16 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] points with-in boundaries of a map
On 23-Mar-2014 22:50:50 Jim Lemon wrote:
> On 03/23/2014 10:29 PM, eliza botto wrote:
>> Thankyou very much jim. it worked! but regarding second part of my
>> question, isn't there a way to read the coordinates of intersecting
>> lines with the premises of the map?
>
> Hi Eliza,
> I think you want the "locator" function, which will return the
> coordinates to the accuracy that you can click on a point in the plot.
>
> Jim

There is a possible analytical approach to this. If the boundary of a simple closed region is given as a series of straight lines which join successive points as one proceeds continuously round the boundary of the region, and one has the data of the (x,y) coordinates of the points in that order (with the last point the same as the first), then one can proceed as follows:

Let the points be P1=(x1,y1), P2=(x2,y2), ... , PN=(xN,yN)=P1 in the order specified above. Since the grid lines have been drawn at specified intervals, you can work through every point (x0,y0) which is an intersection. Let (x0,y0) be a point of intersection of the grid of lines. Now you know the coordinates of (x0,y0) already, so the question is whether this point is inside or outside the region.

Consider a line joining (x0,y0) to (x1,y1). Now work out the angle through which the line rotates when it is moved so that it joins (x0,y0) to (x2,y2); say this is phi2 (clockwise positive, anticlockwise negative). Similarly work out phi3, the angle through which the line next rotates when it is further moved so that it joins (x0,y0) to (x3,y3); and so on. Now add up

phi2 + phi3 + ... + phiN

If the point (x0,y0) is inside the region, and you have proceeded consecutively along the boundary, then this sum of angles will be 2*pi or -2*pi. If (x0,y0) is outside the region, then this sum will be zero. So you can work through all the intersections of the grid lines, and for each intersection you can determine whether it is inside or outside; and, if it is inside, you already know its coordinates.

It can get more complicated if the region is not simply-connected. E.g. if you do not want to count points (x0,y0) which are within the boundary coastline but also are within the boundary of an inland lake; or, further, you do want to count points which are on an island in the lake; and so on.

The essential requirement for following a procedure of this kind is that you can extract, from the data from which the map is drawn, the coordinates of a series of points which are in consecutive order as one proceeds along the boundary of a region. Not all geographic data have the coordinates in such an order -- one can find datasets such that the boundary is drawn as a set of separate partial boundaries which are in no particular order as a whole; and in some datasets the different separate parts of the boundary do not exactly match up at the points where they should exactly join.

Hoping this helps,
Ted.
-
E-Mail: (Ted Harding)
Date: 23-Mar-2014 Time: 23:47:15
This message was sent by XFMail
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
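A bare-bones version of the angle-summing test in R (an added sketch; it assumes the boundary is supplied as closed coordinate vectors px, py with the first point repeated at the end, as described above):

in.polygon <- function(x0, y0, px, py) {
  theta  <- atan2(py - y0, px - x0)              # direction from (x0,y0) to each vertex
  dtheta <- diff(theta)                          # successive rotations ...
  dtheta <- dtheta - 2*pi*round(dtheta/(2*pi))   # ... folded into (-pi, pi]
  abs(sum(dtheta)) > pi                          # total is +/- 2*pi inside, 0 outside
}

px <- c(0, 1, 1, 0, 0); py <- c(0, 0, 1, 1, 0)   # unit square, closed
in.polygon(0.5, 0.5, px, py)   # TRUE
in.polygon(2.0, 0.5, px, py)   # FALSE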
Re: [R] rounding to whole number
On 20-Mar-2014 19:42:35 Kristi Glover wrote: > Hi R User, > I was trying to convert a decimal value into integral (whole number). I used > round function but some of the cells have the value less than 0.5 and it > converted into 0. But I wanted these cell to have 1 instead of 0. Another > way, I could multiply by 10. But l did not want it because it affected my > results later. > I was wondering about how I can convert 0.2 to 1. For example > data<-structure(list(site1 = structure(1:4, .Label = c("s1", "s2", > "s3", "s4"), class = "factor"), A = c(0, 2.3, 2.6, 1.3), B = c(0.5, > 0.17, 2.9, 3)), .Names = c("site1", "A", "B"), class = "data.frame", > row.names = c(NA, > -4L)) > output<-structure(list(site1 = structure(1:4, .Label = c("s1", "s2", > "s3", "s4"), class = "factor"), A = c(0L, 3L, 3L, 2L), B = c(1L, > 1L, 3L, 3L)), .Names = c("site1", "A", "B"), class = "data.frame", row.names > = c(NA, > -4L)) > > Thanks for your suggestions in advance. > cheers, > KG Hi Kristi, the function round() has peculiarites (because of the IEEE standard). E.g. round(1+1/2) # [1] 2 round(0+1/2) # [1] 0 i.e. in "equipoised" cases like these, it rounds to the nearest *even" number; otherwise, if it is nearer to one integer value (above or below) than to the other (below or above), then it rounds to the nearest integer value. There are two unambiguous functions you could use for "equipoised" cases, depending on which way you want to go: floor(1+1/2) # [1] 1 floor(0+1/2) # [1] 0 ceiling(1+1/2) # [1] 2 ceiling(0+1/2) # [1] 1 The latter would certainly meet your request for "how I can convert 0.2 to 1", since ceiling(0.2) # [1] 1 But it will not round, say, (1+1/4) to the nearest integer. On the other hand, if you want a function which rounds to the nearest integer when not exactly halfway between, and rounds either always up or always down when the fractional part is exactly 1/2, then I think (but others will probably correct me) that you may have to write your own -- say roundup() or rounddown(): roundup <- function(x){ if((x-floor(x))==1/2){ ceiling(x) } else {round(x)} } rounddown <- function(x){ if((x-floor(x))==1/2){ floor(x) } else {round(x)} } Also have a look at the help page ?round Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 20-Mar-2014 Time: 20:04:20 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
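Note that roundup() and rounddown() above use if(), so they work one value at a time; a vectorised variant (a sketch) is

roundup_v <- function(x) ifelse(x - floor(x) == 1/2, ceiling(x), round(x))

Also, the 'output' Kristi shows is simply ceiling() applied to the numeric columns of 'data' (0 stays 0, anything with a fractional part moves up), so one direct way -- assuming that really is what is wanted -- is

data[-1] <- lapply(data[-1], ceiling)
data
#   site1 A B
# 1    s1 0 1
# 2    s2 3 1
# 3    s3 3 3
# 4    s4 2 3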
Re: [R] Pattern Matching
On 02-Mar-2014 20:12:57 Benno Pütz wrote: > Try > > as.numeric(sub(".*\\(","", sub('\\)','',aa))) > > You may also want to look at regexec/regmatches for a more general approach > ... > > On 02 Mar 2014, at 20:55, Doran, Harold wrote: > >> "1 (472)" "2 (445)" "3 (431)" "3 (431)" "5 (415)" "6 (405)" "7 (1) > > Benno Pütz Another formulation, which breaks it into steps and may therefore be easier to adopt to similar but different cases, is aa0<-gsub("^[0-9]+ ","",aa) aa0 # [1] "(472)" "(445)" "(431)" "(431)" "(415)" "(405)" "(1)" as.numeric(gsub("[()]","",aa0)) # [1] 472 445 431 431 415 405 1 Ted. - E-Mail: (Ted Harding) Date: 02-Mar-2014 Time: 20:57:07 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
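A sketch of the regexec()/regmatches() route that Benno mentions, using the strings from Harold's post:

aa <- c("1 (472)", "2 (445)", "3 (431)", "3 (431)", "5 (415)", "6 (405)", "7 (1)")
m  <- regmatches(aa, regexec("\\(([0-9]+)\\)", aa))
as.numeric(sapply(m, "[", 2))   # second element of each match is the captured group
# [1] 472 445 431 431 415 405   1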
Re: [R] geneation
[see at end] On 20-Feb-2014 10:47:50 Rui Barradas wrote: > Hello, > > I'm not sure I understand the question. When you use set.seed, it will > have effect in all calls to the random number generator following it. So > the value for y is also fixed. > As for your code, you don't need the second set.seed. And though it is > not syntatically incorrect, the way you are coding it is not very usual. > Try instead > > set.seed(100) > x <- 10*runif(10) > x > > y <- rnorm(10) > y > > y <- rnorm(10) #different y > y > > > Hope this helps, > > Rui Barradas > > Em 20-02-2014 07:05, IZHAK shabsogh escreveu: >> how do i use set.seed? for example i want to generate fix x with different >> value of y each time i.e >> >> genarate x<-rnorm(10) >> generate y<-rnorm(10) >> i want have x fix but y changes at each iteration. this what i try but is >> not working >> >> >> { >> set.seed(100) >> >> x<-10*runif(10) >> } >> x >> set.seed(y<-rnorm(10)) >> y It seems clear that Izhak seeks to detach the random generation of y from the random generation of x after using set.seed(). On my reading of ?RNG once set.seed has been used, as RUI says, it affects all subsequent calls to the generator. Initially, however: Note: Initially, there is no seed; a new one is created from the current time when one is required. Hence, different sessions started at (sufficiently) different times will give different simulation results, by default. But, even so, it still seems (perhaps) that using a RNG without the call to set.seed() will still establish a seed (and its consequences) for that session (in effect there is an implicit call to set.seed()). This leads me to suggest that a useful innovation could be to add a feature to set.seed() so that set.seed(NULL) (which currently generates an error) would undo the effect of any previous (explicit or implicit) call to set.seed() so that, for instance, set.seed(100) x<-10*runif(10) set.seed(NULL) y <- rnorm(10) would result in y being generated from a seed which was set from the system clock. There is no form of argument to set.seed() which instructs it to take its value from the system clock (or other sourc of external random events); and indeed it seems that only the system clock is available. On Linux/UNIX systems, at least, there is a possible accessible source of external randomness, namely '/dev/random', and its companion '/dev/urandom'. This accumulates random noise from high-resolution timings of system events like key-presses, mouse-clicks, disk accesses, etc., which take place under external influences and are by nature irregular in timings. The difference between /dev/random and /dev/urandom is that one read from /dev/random effectively resets it, and further reads may be blocked until sufficient new noise has accumulated; while repeated reads from /dev/urandom are always possible (though with short time-lapses there may not be much difference between successive reads). The basic mechanism for this is via the command 'dd', on the lines of dd if=/dev/urandom of=newseed count=1 bs=32 which makes one ("count=1") read from input file ("if") /dev/urandom of 32 bytes ("bs=32") into the output file ("of") newseed. When I ran the above command on my machine just now, and inspected the results in hex notation ('od -x newseed') I got (on-screen): od -x newseed 000 4be9 7634 41cf 5e17 b068 7898 879e 8b5f 020 fb4f 52e6 59ef 0b58 5258 4a3a df04 c18d 040 where the initial "000" etc. 
denote byte-counts to the beginning of the current line (expressed also in hex); so the actual byte content of newseed is: 4b e9 76 34 41 cf 5e 17 b0 68 78 98 87 9e 8b 5f fb 4f 52 e6 59 ef 0b 58 52 58 4a 3a df 04 c1 8d This could be achieved via a system() call from R; and the contents of newseed would then need to be converted into a format suitable for use as argument to set.seed(). For the time being (not having time just now) I leave the details to others ... Ted. - E-Mail: (Ted Harding) Date: 20-Feb-2014 Time: 12:00:07 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
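One way to fill in those details -- a sketch only, on Unix-like systems, reading /dev/urandom directly from R rather than via dd -- is to pull a few random bytes in as an integer and hand that to set.seed():

con <- file("/dev/urandom", open = "rb")
newseed <- readBin(con, what = "integer", n = 1, size = 4)   # one 32-bit integer
close(con)
set.seed(newseed)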
Re: [R] calculate probability of union of independent events
On 18-Feb-2014 22:08:38 Meinfelder, Florian wrote: > Dear all, > > I am looking for a way to calculate the probability of the union of k > independent events in R. Is there a function that takes a k-dimensional > probability vector as input and returns p(A_1 U A_2 U...U A_k)? > > Thanks, > Florian I don't know (off-hand); but it is very easy to write one's own! Since P(A1 U A2 U ... U Ak ) = 1 - P(not(A1 U A2 U ... u Ak)) = 1 - P((not A1) & (not A2) & ... & (not Ak)) = 1 - P(not A1)*P(not A2)* ... *P(not Ak) [by independence] = 1 - (1-p1)*(1-p2)* ... *(1-pk) where pj is P(Aj). Hence punion <- function(p){1 - prod(1-p)} should do it! Ted. ----- E-Mail: (Ted Harding) Date: 18-Feb-2014 Time: 23:51:31 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
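A quick check with made-up probabilities (the hand calculation is inclusion-exclusion):

punion <- function(p){ 1 - prod(1 - p) }
punion(c(0.1, 0.2, 0.3))
# [1] 0.496
# by inclusion-exclusion: 0.6 - (0.02 + 0.03 + 0.06) + 0.006 = 0.496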
Re: [R] How to plot a shifted Gamma distribution
On 13-Feb-2014 15:30:43 Rodrigo Cesar Silva wrote: > I have the parameters of a gamma distribution that I am trying to plot. > The parameters are shape = 2, scale = 5.390275 and the minimum value > x0 is 65.44945. > > Since the gamma is shifted by x0, we have > Mean = shape*scale + x0 = 76.23 > > My question is, how can I do it in r? > > I was trying to use dgamma function, but I saw no options to offset > the gamma distribution. I can only use the shape and scale parameters, > like this: > >> x <- seq(from=0, to=100, length.out=100) > >> plot(x, dgamma(x, shape=2, scale=5.390275), > main="Gamma",type='l') If all that is happening is that the distribution is the same as that of a Gamma distribution with origin at 0, but simply shifted to the right by an amount x0 (though I am wondering what context this could arise in), then the commands x <- seq(from=0, to=100, length.out=100) x0 <- 65.44945 plot(x+x0, dgamma(x, shape=2, scale=5.390275), main="Gamma",type='l') will produce such a plot. However, I wonder if you have correctly expressed the problem! Ted. > This generates a distribution with origin equal zero, but I want the > origin to be x0 > > How can I handle shifted gamma distribution in r? > > Thanks a lot! > Rodrigo. - E-Mail: (Ted Harding) Date: 13-Feb-2014 Time: 17:27:29 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
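If an explicit density function for the shifted distribution is wanted, a small sketch along the same lines (assuming, as above, a pure shift of the origin to x0; dshiftgamma is a made-up name):

dshiftgamma <- function(x, shape, scale, x0) dgamma(x - x0, shape = shape, scale = scale)
x0 <- 65.44945
x  <- seq(from = x0, to = x0 + 60, length.out = 200)
plot(x, dshiftgamma(x, shape = 2, scale = 5.390275, x0 = x0),
     main = "Gamma", type = "l")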
Re: [R] creating an equivalent of r-help on r.stackexchange.com ?
Ditto. And ditto. And (by the way -- no-one seems to have mentioned it) what are the possibilities, for mail appearing on something like Stack Exchange, of having the mail sent to oneself so that it can be stored locally, on one's own machine? That is the only way I would want to work -- anything interesting is sitting in my disk, I can edit it if I wish, I can make local copies, etc. etc. etc. etc. Anything which is not interesting gets deleted (though I can always dig into R-help archives if need be). Best wishes, Ted. On 03-Feb-2014 21:36:21 Rolf Turner wrote: > > For what it's worth, I would like to say that I concur completely with > Don and Bert. (Also I would like second Bert's vote of thanks to Don > for expressing the position so clearly.) > > cheers, > > Rolf Turner > > On 04/02/14 09:56, Bert Gunter wrote: >> Don: >> >> First, I apologize if this is off topic, but I thought I should reply >> publicly. >> >> I would only like to say thank you for so eloquently and elegantly >> summarizing my views, also. Maybe that makes me a dinosaur. If so, I >> happily accept the label. >> >> I find SO's voting for posting business especially irritating. I wish >> merely to post or to read the posts of others without being subjected >> to some kind of online pseudo game and ratings competition. That alone >> keeps me away. But Don said it better. >> >> I realize that I may be out of step with the masses here, and the >> masses should certainly decide. Hopefully I won't be around if/when >> they decide that R-help should go. >> >> Best, >> Bert >> >> Bert Gunter >> Genentech Nonclinical Biostatistics >> (650) 467-7374 >> >> "Data is not information. Information is not knowledge. And knowledge >> is certainly not wisdom." >> H. Gilbert Welch >> >> >> >> >> On Mon, Feb 3, 2014 at 12:42 PM, MacQueen, Don wrote: >>> Every browser-based interface I've ever seen has a number of features that >>> I find to be huge deterrents. To mention just two: >>> >>> - They waste copious amounts of screen space on irrelevant things such as >>> "votes", the number of views, the elapsed time since something or other >>> happened, fancy web-page headers, and so on. Oh, and advertisements. The >>> Mathematica stackexchange example given in a link in one of the emails >>> below (http://mathematica.stackexchange.com/) illustrates these >>> shortcomings -- and it's not the worst such example. >>> >>> - In most if not all cases, one has to login before posting. I have too >>> many usernames and passwords as it is. >>> >>> Right now, at this very moment, in my email client's window I can see and >>> browse the subject lines of 20 threads in r-help. And that's using only >>> about half of my screens vertical space. In contrast, in the Mathematica >>> stackexchange example, I can see at most 10, and that only by using the >>> entire vertical space of my screen. The "From" column in my email client >>> shows the names of several of the people contributing to the thread, which >>> the browser interface does not. In the email client, I can move through >>> messages, and between messages in a thread using my keyboard. In a >>> browser, I have to do lots of mousing and clicking, which is much less >>> efficient. >>> >>> As it is now, r-help messages come to me. I don't have to start up a >>> browser. So it's much easier to go take a quick look at what's new at any >>> time. >>> >>> True, I had to subscribe to the mailing list, which involves a username >>> and password. But once it's done, it's done. 
I don't have to login before >>> posting, which means I don't have to remember yet another username and >>> password. >>> >>> What "...duplicated efforts of monitoring multiple mailing lists)"? I have >>> no duplicated effort...in fact, I have almost no effort at all, since the >>> messages come to me. There was some initial setup, i.e., to filter >>> different r-* messages to different mailboxes in my email client, but now >>> that that's done, it's as simple as clicking on the correct mailbox. >>> >>> In other words, in every way that's important to me, the mailing list >>> approach is superior. I do not support abandoning the mailing list system >>> for any alternative. >>> >>> -Don >&
Re: [R] How to subscribe this mailing list
[Also inline] On 12-Jan-2014 17:45:03 Rui Barradas wrote: > Hello, > > Inline. > > Em 12-01-2014 10:48, gj1989lh escreveu: >> Hi, >> >> >> How can I subscribe this mailing list? > > Apparently, you already have. > Welcome. Well, apparently not. "gj1989lh" is not listed in the R-help subscriber list, and the original posting was moderator approved, so had been held for moderation (presumably because the address was not subscribed). The way to subscribe is to visit the web-page: https://stat.ethz.ch/mailman/listinfo/r-help and follow the instructions which are set out in the section "Subscribing to R-help" (including the instructions relating to the follow-up confirmation email which the subscriber will subsequently receive). Hoping this helps (and replying to the R-help list so that everone else can see that it has been dealt with). The best address for enquiries about subscribing to/using/posting to R-help is r-help-ow...@r-project.org Ted. >> thx >> [[alternative HTML version deleted]] > > Don't post in html, please. > > Rui Barradas >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 12-Jan-2014 Time: 18:01:25 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tracking what R actually executes
On 02-Jan-2014 23:55:28 Duncan Murdoch wrote: > On 14-01-02 6:05 PM, Fisher Dennis wrote: >> R 3.0.2 >> All platforms >> >> Colleagues >> >> This question is probably conceptual rather than technical and I have not >> thought out all of the issues yet. Lets say I have an extensive list of >> functions and some lines of code that call the functions. I would like to >> have a record of all the commands that were actually executed. I realize > that this could be voluminous but it might be necessary. >> For example, the code and function might be: >> >> ### >> INPUT: >> COUNTER <- function(N) >> for (i in 1:N) cat(count, i, \n) >> COUNTER(10) >> >> ### >> OUTPUT: >> cat(count, 1, \n) >> cat(count, 2, \n) >> cat(count, 3, \n) >> cat(count, 4, \n) >> cat(count, 5, \n) >> cat(count, 6, \n) >> cat(count, 7, \n) >> cat(count, 8, \n) >> cat(count, 9, \n) >> cat(count, 10, \n) >> >> # >> If I have formulated the question poorly, please do you best to understand >> the intent. >> >> Dennis > > As far as I know, R doesn't have exactly this built in, but the Rprof() > function gives an approximation. It will interrupt the execution at a > regular time interval (1/50 sec is the default, I think), and record all > functions that are currently active on the execution stack. So tiny > little functions could be missed, but bigger ones probably won't be. > > There are also options to Rprof to give other profiling information, > including more detail on execution position (down to the line number), > and various measures of memory use. > > Duncan Murdoch Also have a look at ?trace which you may be able to use for what you want. Ted. - E-Mail: (Ted Harding) Date: 03-Jan-2014 Time: 00:14:51 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
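A minimal sketch of the ?trace suggestion, applied to Dennis's example (with the quoting in cat() fixed so that it runs):

COUNTER <- function(N) for (i in 1:N) cat("count", i, "\n")
trace(cat)    # a trace line is printed on every call to cat()
COUNTER(3)
untrace(cat)
# Note: the default tracer shows the unevaluated call (cat("count", i, "\n"))
# each time, not the substituted values, so this only approximates the OUTPUT
# Dennis sketched; Rprof(), as Duncan describes, samples rather than records.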
Re: [R] Season's Greetings (and great news ... )!
Thanks for the comments, Bert and Mehmet! It is of course a serious and interesting area to explore (and I'm aware of the chaos context; I initially got into this areas year ago when I was exploring the possibilities for chaos in fish population dynamics -- and they're certainly there)! But, before anyone takes my posting *too* seriously, let me say that it was written tongue-in-cheek (or whatever the keyboard analogue of that may be). I'm certainly not "blaming R". Have fun anyway! Ted. On 22-Dec-2013 17:35:56 Bert Gunter wrote: > Yes. > > See also Feigenbaum's constant and chaos theory for the general context. > > Cheers, > Bert > > On Sun, Dec 22, 2013 at 8:54 AM, Suzen, Mehmet wrote: >> I wouldn't blame R for floating-point arithmetic and our personal >> feeling of what 'zero' should be. >> >>> options(digits=20) >>> pi >> [1] 3.141592653589793116 >>> sqrt(pi)^2 >> [1] 3.1415926535897926719 >>> (pi - sqrt(pi)^2) < 1e-15 >> [1] TRUE >> >> There was a similar post before, for example see: >> http://r.789695.n4.nabble.com/Why-does-sin-pi-not-return-0-td4676963.html >> >> There is an example by Martin Maechler (author of Rmpfr) on how to use >> arbitrary precision >> with your arithmetic. >> >> On 22 December 2013 10:59, Ted Harding wrote: >>> Greetings All! >>> With the Festive Season fast approaching, I bring you joy >>> with the news (which you will surely wish to celebrate) >>> that R cannot do arithmetic! >>> >>> Usually, this is manifest in a trivial way when users report >>> puzzlement that, for instance, >>> >>> sqrt(pi)^2 == pi >>> # [1] FALSE >>> >>> which is the result of a (generally trivial) rounding or >>> truncation error: >>> >>> sqrt(pi)^2 - pi >>> [1] -4.440892e-16 >>> >>> But for some very simple calculations R goes off its head. >>> >>> I had originally posted this example some years ago, but I >>> have since generalised it, and the generalisation is even >>> more entertaining than the original. >>> >>> The Original: >>> Consider a sequence generated by the recurrence relation >>> >>> x[n+1] = 2*x[n] if 0 <= x[n] <= 1/2 >>> x[n+1] = 2*(1 - x[n]) if 1/2 < x[n] <= 1 >>> >>> (for 0 <= x[n] <= 1). >>> >>> This has equilibrium points (x[n+1] = x[n]) at x[n] = 0 >>> and at x[n] = 2/3: >>> >>> 2/3 -> 2*(1 - 2/3) = 2/3 >>> >>> It also has periodic points, e.g. >>> >>> 2/5 -> 4/5 -> 2/5 (period 2) >>> 2/9 -> 4/9 -> 8/9 -> 2/9 (period 3) >>> >>> The recurrence relation can be implemented as the R function >>> >>> nextx <- function(x){ >>> if( (0<=x)&(x<=1/2) ) {x <- 2*x} else {x <- 2*(1 - x)} >>> } >>> >>> Now have a look at what happens when we start at the equilibrium >>> point x = 2/3: >>> >>> N <- 1 ; x <- 2/3 >>> while(x > 0){ >>> cat(sprintf("%i: %.9f\n",N,x)) >>> x <- nextx(x) ; N <- N+1 >>> } >>> cat(sprintf("%i: %.9f\n",N,x)) >>> >>> Run that, and you will see that successive values of x collapse >>> towards zero. Things look fine to start with: >>> >>> 1: 0.7 >>> 2: 0.7 >>> 3: 0.7 >>> 4: 0.7 >>> 5: 0.7 >>> ... >>> >>> but, later on, >>> >>> 24: 0.7 >>> 25: 0.6 >>> 26: 0.8 >>> 27: 0.4 >>> 28: 0.66672 >>> ... >>> >>> 46: 0.667968750 >>> 47: 0.664062500 >>> 48: 0.671875000 >>> 49: 0.65625 >>> 50: 0.68750 >>> 51: 0.62500 >>> 52: 0.75000 >>> 53: 0.5 >>> 54: 1.0 >>> 55: 0.0 >>> >>> What is happening is that, each time R multiplies by 2, the binary >>> representation is shifted up by one and a zero bit is introduced >>> at the bottom end. 
To illustrate this, do the calculation in >>> 7-bit arithmetic where 2/3 = 0.1010101, so: >>> >>> 0.1010101 x[1], >1/2 so subtract from 1 = 1.000 -> 0.0101011, >>> and then multiply by 2 to get x[2] = 0.1010110. Hence >>
[R] Season's Greetings (and great news ... )!
2 4 8 16 6 12 ... so period = 9. And so on ... So, one sniff of something like S<-19, and R is off its head! All it has to do is multiply by 2 -- and it gets it cumulatively wrong! R just doesn't add up ... Season's Greetings to all -- and may your calculations always be accurate -- to within machine precision ... Ted. - E-Mail: (Ted Harding) Date: 22-Dec-2013 Time: 09:59:00 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] iterated sum
On 14-Dec-2013 10:46:10 水静流深 wrote: > x<-c(1,4,9,20,3,7) > i want to get a serie c(5,13,29,23,10). > y <- c() > for (i in 2:length(x)){ > y[i-1] <- x[i-1]+x[i]} > is there more simple way to get?

x <- c(1,4,9,20,3,7)
N <- length(x)
x[1:(N-1)] + x[2:N]
# [1] 5 13 29 23 10

Best wishes, Ted. --------- E-Mail: (Ted Harding) Date: 14-Dec-2013 Time: 10:54:00 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
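Two equivalent idioms, for comparison:

head(x, -1) + tail(x, -1)
# [1] 5 13 29 23 10
x[-length(x)] + x[-1]
# [1] 5 13 29 23 10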
Re: [R] Converting decimal to binary in R
> On Fri, Dec 13, 2013 at 10:11 PM, æ°´éæµæ·± <1248283...@qq.com> wrote: >> i have write a function to convert decimal number into binary number in R. >> >> dectobin<-function(x){ >> as.numeric(intToBits(x))->x1 >> paste(x1,collapse="")->x2 >> as.numeric(gsub("0+$","",x2))->x3 >> return(as.character(x3))} >> >> dectobin can get right result ,it is so long ,is there a build-in function >> to do ? On 14-Dec-2013 06:17:30 Richard M. Heiberger wrote: > I recommend > ?sprintf > > (4^(1/3))^3 != 4 > (4^(1/3))^3 == 4 > (4^(1/3))^3 - 4 > format(c((4^(1/3))^3 , 4), digits=17) > sprintf("%+13.13a", c((4^(1/3))^3 , 4)) The above generates a hexadecinal representation, not binary! So it needs further substitutions to get the binary representation. Can I add a tip which I have very often found useful for this kind of global substitution? Not just binary -- I first cooked it up in text-editing when faced with replacing "European" numbers by "Anglo-Saxon" numbers -- e.g. "1.234.567,89" needs to be converted into "1,234,567.89", therefore swapping "." and ",". But you don't want to do "." --> "," and then "," --> "." since you would then end up with "1.234.567,89" --> "1,234,567,89" --> "1.234.567.89" There, the trick was to use a character such as "#", which does not appear in the text, as a marker for the first substitution while the second is being made. Then substitute the desired character for "#": "1.234.567,89" --> "1#234#567,89" --> "1#234#567.89" --> "1,234,567.89" (first replacing "." by "#", then finally "#" by ","). You need to replace, for instance, "0" in hex by "" in binary, "1" by "0001", "2" by "0010", ... , "A" by "1010", and so on. However, you need to avoid replacing already-replaced symbols. So I suggest using, in a first round, "U" for "1" and "Z" for "0" (or whatever you prefer, provided it avoids "0" and "1"). So 0 -> , 1 -> ZZZU, ... , A -> UZUZ, etc. Then, finally, replace each "Z" by "0" and each "U" by "1". Hence (using a truncated representation), sqrt(pi) in hex is: sprintf("%+8.8A", sqrt(pi)) # [1] "+0X1.C5BF891BP+0" Then the successive substitutions (which can of course be programmed) would be: "+0X1.C5BF891BP+0" 0: "+X1.C5BF891BP+" 1: "+XZZZU.C5BF89ZZZUBP+" 2: "+XZZZU.C5BF89ZZZUBP+" 3: "+XZZZU.C5BF89ZZZUBP+" 4: "+XZZZU.C5BF89ZZZUBP+" 5: "+XZZZU.CZUZUBF89ZZZUBP+" 6: "+XZZZU.CZUZUBF89ZZZUBP+" 7: "+XZZZU.CZUZUBF89ZZZUBP+" 8: "+XZZZU.CZUZUBFUZZZ9ZZZUBP+" 9: "+XZZZU.CZUZUBFUZZZUZZUZZZUBP+" A: "+XZZZU.CZUZUBFUZZZUZZUZZZUBP+" B: "+XZZZU.CZUZUUZUUFUZZZUZZUZZZUUZUUP+" C: "+XZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+" D: "+XZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+" E: "+XZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+" F: "+XZZZU.UUZZZUZUUZUUUZZZUZZUZZZUUZUUP+" Z: "+X000U.UU000U0UU0UUU000U00U000UU0UUP+" U: "+X0001.1100010110111000100100011011P+" The final result probably needs tidying up in accordance with the needs of subsequent uses! Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 14-Dec-2013 Time: 10:50:03 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
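Ted's two-pass marker substitution can be coded directly. A sketch (hex2bin is a made-up helper; only uppercase hex digits are handled, and the sign/exponent decoration from sprintf() would still need the "tidying up" mentioned above):

hex2bin <- function(h) {
  hexdigits <- c(0:9, LETTERS[1:6])
  # the 4-bit pattern for each hex digit, most significant bit first
  bits4 <- sapply(0:15, function(i)
    paste(rev(as.integer(intToBits(i))[1:4]), collapse = ""))
  # first pass: hex digit -> Z/U markers, so substituted text is never re-hit
  zu <- chartr("01", "ZU", bits4)
  for (k in seq_along(hexdigits))
    h <- gsub(hexdigits[k], zu[k], h, fixed = TRUE)
  # second pass: Z -> 0, U -> 1
  chartr("ZU", "01", h)
}
hex2bin("1C5")
# [1] "000111000101"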
[R] Fortune? [was: Re: quotation marks and scan]
[See in-line below] On 17-Nov-2013 22:38:30 Rolf Turner wrote: > > (1) The backslashes are not really there; they are an artefact of the R > print() function. > Try cat(u,"\n"). I think this might be an FAQ. > > (2) Is not your problem the fact that your are setting "replacement" > equal to the > thing you are trying to get rid of? I.e. don't you want > > v <- gsub(pattern='\"',replacement='',x=u) ??? > > Either I am misunderstanding your intent or you need another cup of coffee. Is the above line a Fortune? > cheers, > > Rolf > > On 11/18/13 11:07, Erin Hodgess wrote: >> Dear R People: >> >> I'm sure that this is a very simple problem, but I have been wresting with >> it for some time. >> >> I have the following file that has the following one line: >> >> CRS("+init=epsg:28992") >> >> Fair enough. I scan it into R and get the following: >> >>> u >> [1] "CRS(\"+init=epsg:28992\")" >>> gsub(pattern='\"',replacement='"',x=u) >> [1] "CRS(\"+init=epsg:28992\")" >> >> I need to get rid of the extra quotation marks and slashes. I've tried all >> sorts of things, including gsub, as you see, but no good. > > ______ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 17-Nov-2013 Time: 23:55:48 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
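To see Rolf's point (1) directly (constructing the string in R here, rather than scan()ing it from Erin's file):

u <- 'CRS("+init=epsg:28992")'
print(u)       # [1] "CRS(\"+init=epsg:28992\")"   <- escapes added by print()
cat(u, "\n")   # CRS("+init=epsg:28992")           <- the string itself has none
nchar(u)       # [1] 23
gsub('"', '', u)   # and point (2): replace the quote character with nothing
# [1] "CRS(+init=epsg:28992)"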
Re: [R] all combinations with replacement not ordered
On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote: > Hello! > > I need to obtain all possible combinations with replacement when > order is not important. > E.g. I have a population x{1,2,3}. > So I can get (choose(3+3-1,3)=) 10 combinations from this population > with 'size=3'. > How can I get a list of all that combinations? > > I have tried 'expand.grid()' and managed to get only samples where > order is important. > 'combn()' gave me samples without replacement. > > Best regards, > Konstantin Tretyakov.

From your description I infer that, from {1,2,3}, you want the result:

1 1 1
1 1 2
1 1 3
1 2 2
1 2 3
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3

The following will do that:

u <- c(1,2,3)
unique(t(unique(apply(expand.grid(u,u,u),1,sort),margin=1)))
#       [,1] [,2] [,3]
#  [1,]    1    1    1
#  [2,]    1    1    2
#  [3,]    1    1    3
#  [4,]    1    2    2
#  [5,]    1    2    3
#  [6,]    1    3    3
#  [7,]    2    2    2
#  [8,]    2    2    3
#  [9,]    2    3    3
# [10,]    3    3    3

There may be a simpler way! Ted. - E-Mail: (Ted Harding) Date: 07-Nov-2013 Time: 17:04:50 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
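One simpler route (a sketch) is to keep just the non-decreasing rows of expand.grid():

u <- c(1,2,3)
g <- expand.grid(u, u, u)
g[g[, 1] <= g[, 2] & g[, 2] <= g[, 3], ]
# 10 rows: the same combinations, in a different order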
Re: [R] Is there something wrong with R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"?
On 16-Oct-2013 14:54:00 tom soyer wrote: > Hi, > > pnorm(-1.53,0,1) under version 3.0.2 gives 0.05155075. I am pretty sure it > should be 0.063. Is there something wrong with this version of R? > > I am using: > R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" > Copyright (C) 2013 The R Foundation for Statistical Computing > Platform: i686-pc-linux-gnu (32-bit) > -- > Tom If you did exactly as you describe, then something is indeed wrong: pnorm(-1.53,0,1) # [1] 0.06300836 [R version 2.11.0 (2010-04-22)] as you expected. However: qnorm(0.05155075) [1] -1.63 so maybe you mistyped "1.63" instead of "1.53"? - E-Mail: (Ted Harding) Date: 16-Oct-2013 Time: 16:12:56 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help: concurrent R sessions for different settings of simulations
[See at end] On 29-Sep-2013 17:22:24 Chee Chen wrote: > Dear All, > I have spent almost 2 days but did not succeed yet. > > Problem description: I have 3 parameters, p1, p2 and p3, for which > p1 take 1 of 5 possible distributions (e.g., normal, laplace), > p2 takes 1 of 3 possible distributions, and p3 takes 1 of 5 possible > distribution. These 3 parameters create 75 settings, and these 3 > parameters are arguments of a function F; and F is part of simulation > codes. To summarize: different value of the ordered triple (p1,p2,p3) > means different setting and this is the only difference in the > simulation codes. > > Target to achieve: instead of loop through each of the 75 settings > one after another, I would like to concurrently run all 75 settings > on the cluster. > > My attempts: via loops, I used Perl to create 75 files, each for a > different triple (p1,p2,p3), and Perl uses "system(R ..)" to execute > this setting once it is created. The Perl codes are submitted to > cluster correctly. But when I looked into the log file, the cluster > still executes it one setting after another setting. > > Request: any help is appreciated! It is because of the loops of Perl > that executes a setting once it is created? > > Have a nice day! > Chee Just a simple comment (which does not cionsider the technicalities of using Perl, using a cluster, etc.). >From your description, it looks as though the system waits for one item in the loop to finish before it starts the next one. If that is the case, and *if* you are using UNIX/Linux (or other UNIX-like OS), then you could try appending " &" to each submitted command. An outline exemplar: for( s in settings ){ system("R &") } The " &" has the effect, in a UNIX command line, of detaching the command from the executing program. So the program can continue to run (and take as long as it likes) while the system command-shell is immediately freed up for the next command. Therefore, with the above exemplar, is there were say 75 settings, then that loop would complete in a very short time, after which you would have 75 copies of R executing simulations, and your original R command-line would be available. Just a suggestion (which may have missed the essential point of your query, but worth a try ... ). I have no idea how to achieve a similar effect in Windows ... Ted. - E-Mail: (Ted Harding) Date: 29-Sep-2013 Time: 19:31:29 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
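A more concrete sketch of that idea (the script name sim.R and the output file names are hypothetical; Unix-like systems only):

settings <- expand.grid(p1 = 1:5, p2 = 1:3, p3 = 1:5)   # the 75 settings
for (i in seq_len(nrow(settings))) {
  cmd <- sprintf("R CMD BATCH --no-save '--args setting=%d' sim.R sim_%02d.Rout &",
                 i, i)
  system(cmd)   # the trailing '&' detaches each run, so the loop does not wait
}
# system(cmd, wait = FALSE) is another way of not waiting for each run.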
Re: [R] Why does sin(pi) not return 0?
On 26-Sep-2013 07:55:38 Rolf Turner wrote: > On 09/26/13 19:31, Babak Bastan wrote: >> Hi experts, >> >> If I test sin(pi) in r, it returns me 1.224606e-16 >> >> Why doesn't return me 0? > > If you think that 1.224606e-16 is different from 0, you should probably not > be using computers. Is that a Fortune? And, if so, should R be using computers? sin(pi) # [1] 1.224606e-16 sin(pi)==0 # [1] FALSE > See FAQ 7.31 (which is in a way about the inverse of > your question, but it should provide the necessary insight). > > cheers, > Rolf Turner Though, mind you, FAQ 3.71 does also offer some consolation to R: all.equal(0,sin(pi)) # [1] TRUE So it depends on what you mean by "different from". Computers have their own fuzzy concept of this ... Babak has too fussy a concept. Ted. - E-Mail: (Ted Harding) Date: 26-Sep-2013 Time: 09:13:33 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
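The default tolerance behind all.equal() puts that difference in perspective:

.Machine$double.eps^0.5                  # [1] 1.490116e-08  (all.equal's default tolerance)
abs(sin(pi)) < .Machine$double.eps^0.5   # [1] TRUE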
Re: [R] Coordinate scales for pairs plot
On 21-Aug-2013 19:08:29 David Winsemius wrote: > > On Aug 21, 2013, at 10:30 AM, (Ted Harding) wrote: > >> Greetings all. >> >> I suspect this question has already been asked. Apologies >> for not having taced it ... >> >> In the default pairs plot produces by the function pairs(), >> the coordinate scales alternate between top and bottom and >> right and left sides. >> >> For example, in a 5x5 plot for variables X1, X2, X3, X4, X5 >> the coordinate scales for X2, X4 appear beside rows 2 and 4 >> on the left, and the scales for X1, X3, X5 appear beside rows >> 1, 3, 5 on the right. >> >> Similarly, the scales for X2 and X4 appear above columns 2 and 4, >> and the scales for X1, X3, X5 appear below columns 1, 3, 5. >> >> Is there a parameter lurking somewhere in the depths of this >> function which can be set so that the scales for all the variables >> X1,X2,X3,X4,X5 appear both above and below columns 1,2,3,4,5; >> and both to the left and to the right of rows 1,2,3,4,5? > > I've searched for a parameter and come up empty; Hacking the code for > pairs.default is not that difficult. I stripped out the conditionals that > were driving the Axis calls to alternating "sides": > Search for `box()` to start this surgery and replace everything to the 'mfg' > assignment to get uniform axis locations on sides 1 and 2. > > pairs.12 <- function(x, ... arglist same as pairs.default) >{upper portion of code > box() > if (i == nc ) > localAxis(1L , x[, j], x[, i], > ...) > if (j == 1 ) > localAxis(2L, x[, j], x[, i], ...) > > mfg <- par("mfg") >lower portion of code } > > Oooops, that wasn't what you asked for ... Use this instead: > > > box() # begin surgery > if (i == 1 ) > localAxis(3, x[, j], x[, i], ...) > if (i == nc ) > localAxis(1, x[, j], x[, i], ...) > if (j == 1 ) > localAxis(2L, x[, j], x[, i], ...) > if (j == nc ) > localAxis(4L, x[, j], x[, i], ...) > # end anastomosis > mfg <- par("mfg") > .. > -- > David Winsemius > Alameda, CA, USA Thanks very much, David! I'll give it a try. It looks promising. Good surgery, steady hand! Ted. - E-Mail: (Ted Harding) Date: 21-Aug-2013 Time: 21:23:39 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Coordinate scales for pairs plot
Greetings all. I suspect this question has already been asked. Apologies for not having taced it ... In the default pairs plot produces by the function pairs(), the coordinate scales alternate between top and bottom and right and left sides. For example, in a 5x5 plot for variables X1, X2, X3, X4, X5 the coordinate scales for X2, X4 appear beside rows 2 and 4 on the left, and the scales for X1, X3, X5 appear beside rows 1, 3, 5 on the right. Similarly, the scales for X2 and X4 appear above columns 2 and 4, and the scales for X1, X3, X5 appear below columns 1, 3, 5. Is there a parameter lurking somewhere in the depths of this function which can be set so that the scales for all the variables X1,X2,X3,X4,X5 appear both above and below columns 1,2,3,4,5; and both to the left and to the right of rows 1,2,3,4,5? With thanks, Ted. - E-Mail: (Ted Harding) Date: 21-Aug-2013 Time: 18:30:44 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] odds ratio per standard deviation
[Replies transposed so as to achieve bottom-posting ... ] On 12-Jun-2013 14:53:02 Greg Snow wrote: > > On Tue, Jun 11, 2013 at 7:38 PM, vinhnguyen04x wrote: > >> Hi all >> i have a question: >> >> why and when do we use odds ratio per standard deviation instead of odds >> ratio? >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/odds-ratio-per-standard-deviation-tp4669315.htm >> l >> Sent from the R help mailing list archive at Nabble.com. >> __ > > Without context this is a shot in the dark, but my guess is this is > referring to something like a logistic regression where the odds ratio > (exponential of the coefficient) refers to the change in odds for the > outcome for a 1 unit change in x. Now often a 1 unit change in x is very > meaningful, but other times it is not that meaningful, e.g. if x is > measuring the size of diamonds in carats and the data does not span an > entire carat, or x is measured in days and our x variable spans years. In > these cases it can make more sense to talk about the change in the odds > relative to a different step size than just a 1 unit change in x, a > reasonable thing to use is the change in odds for a one standard deviation > change in x (much smaller than 1 for the diamonds, much larger for the > days), this may be the odds ratio per standard deviation that you mention. > -- > Gregory (Greg) L. Snow Ph.D. > 538...@gmail.com I think there is one comment that needs to be made about this! The odds ratio "per unit change in x" means exactly what it says, and can be converted into the odds ratio per any other amount of change in x very easily. With x originally in (say) days, and with estimated logistic regression logodds = a + b*x, if you change your unit of x to, say weeks, so that x' = x/7, then this is equivalent to changing b to b' = 7*b. Now just take your sliderule; no need for R (oops, now off-topic ... ). Another comment: I do not favour blindly "standardising" a variable relative to its standard deviation in the data. The SD may be what it is for any number of reasons, ranging from x being randomly sampled fron a population to x being assigned specific values in a designed experiment. Since, for exactly the same meanings of the variables in the regression, the standard deviation may change from one set of data to another of exactly the same kind, the "odds per standard deviation" could vary from dataset to dataset of the same investigation, and even vary dramatically. That looks like a situation to avoid, unless it is very carefully discussed! The one argument that might give some sense to "odds ratio per standard deviation" could apply when x has been sampled from a population in which the variation of x can be approximately described by a Normal distribution. Then "odds ratio per standard deviation" refers to a change from, say, the mean/median of the population to the 84th percentile, or from the 31st percentile to the 69th percentile, or from the 69th percentile to the 93rd percentile, etc. But these steps cover somewhat different proportions of the populatipn: 50th to 85th = 35%; 31st to 69th = 38%; 69th to 93rd = 24%. So you are still facing issues of what you mean, or what you want to mean. Simpler to stick to the original "odds per unit of x" and then apply it to whatever multiple of the unit you happen to be interested in as a change (along with the reasons for that interest). Ted. 
- E-Mail: (Ted Harding) Date: 12-Jun-2013 Time: 17:14:00 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
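For concreteness, a small simulated sketch of the distinction (all numbers made up):

set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)
y <- rbinom(200, 1, plogis(-3 + 0.06 * x))
fit <- glm(y ~ x, family = binomial)
b <- coef(fit)["x"]
exp(b)           # odds ratio per 1 unit of x
exp(b * sd(x))   # odds ratio per one standard deviation of x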
Re: [R] Fwd: Your message to R-help awaits moderator approval
[See at end] On 01-Jun-2013 17:52:01 Janh Anni wrote: > Hello, > > I don't understand why my mails are being held up. What could be the > problem? > > Thanks > Janh > -- Forwarded message -- > From: > Date: Sat, Jun 1, 2013 at 1:48 PM > Subject: Your message to R-help awaits moderator approval > To: annij...@gmail.com > > > Your mail to 'R-help' with the subject > > Re: [R] wilcox_test function in coin package > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > The message headers matched a filter rule > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision. If you would like to cancel > this posting, please visit the following URL: > > https://stat.ethz.ch/mailman/confirm/r-help/067afe28f7ead30dfea844b8a34449526c > d665d8 This can happen to anyone, depending on the current sensitivity of the mail-server's spam-detection filter to potential "triggers" in the message. It is particularly likely to arise with mails posted from a gmail account, as your was. This is because gmail is a major source of spam emails, and the spam filter is alert to these. I have had a look at your message (which was duly approved), and I see that it is in reply to a message which itself is in a thread that includes several messages sent via gmail. From the headers of your message: In-Reply-To: <51a9291c.70...@ucalgary.ca> References: <51a9291c.70...@ucalgary.ca> so that's a total of 6 references to gmail (including your own message) which is probably why the spam filter felt a bit twitchy! Don't worry about it. As I say, it can happen to anyone (though more often to some than to others). If it is a proper message to R-help, one of the moderators will approve it (though quite possible not immediately). Hoping this helps, Ted (one of the moderators) - E-Mail: (Ted Harding) Date: 01-Jun-2013 Time: 20:11:00 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected behavior of "apply" when FUN=sample
On 14-May-2013 09:46:32 Duncan Murdoch wrote: > On 13-05-14 4:52 AM, Luca Nanetti wrote: >> Dear experts, >> >> I wanted to signal a peculiar, unexpected behaviour of 'apply'. >> It is not a bug, it is per spec, but it is so counterintuitive >> that I thought it could be interesting. >> >> I have an array, let's say "test", dim=c(7,5). >> >>> test <- array(1:35, dim=c(7, 5)) >>> test >> >> [,1] [,2] [,3] [,4] [,5] >> [1,]18 15 22 29 >> [2,]29 16 23 30 >> [3,]3 10 17 24 31 >> [4,]4 11 18 25 32 >> [5,]5 12 19 26 33 >> [6,]6 13 20 27 34 >> [7,]7 14 21 28 35 >> >> I want a new array where the content of the rows (columns) are >> permuted, differently per row (per column) >> >> Let's start with the columns, i.e. the second MARGIN of the array: >>> test.m2 <- apply(test, 2, sample) >>> test.m2 >> >> [,1] [,2] [,3] [,4] [,5] >> [1,]1 10 18 23 32 >> [2,]79 16 25 30 >> [3,]6 14 17 22 33 >> [4,]4 11 15 24 34 >> [5,]2 12 21 28 31 >> [6,]58 20 26 29 >> [7,]3 13 19 27 35 >> >> perfect. That was exactly what I wanted: the content of each column is >> shuffled, and differently for each column. >> However, if I use the same with the rows (MARGIIN = 1), the output is >> transposed! >> >>> test.m1 <- apply(test, 1, sample) >>> test.m1 >> >> [,1] [,2] [,3] [,4] [,5] [,6] [,7] >> [1,]12345 13 21 >> [2,] 22 30 17 18 19 20 35 >> [3,] 15 23 24 32 26 27 14 >> [4,] 29 16 31 25 33 34 28 >> [5,]89 10 11 1267 >> >> In other words, I wanted to permute the content of the rows of "test", and >> I expected to see in the output, well, the shuffled rows as rows, not as >> column! >> >> I would respectfully suggest to make this behavior more explicit in the >> documentation. > > It's is already very explicit: "If each call to FUN returns a vector of > length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) > if n > 1." In your first case, sample is applied to columns, and > returns length 7 results, so the shape of the final result is c(7, 5). > In the second case it is applied to rows, and returns length 5 results, > so the shape is c(5, 7). > > Duncan Murdoch And the (quite simple) practical implication of what Duncan points out is: test <- array(1:35, dim=c(7, 5)) test # [,1] [,2] [,3] [,4] [,5] # [1,]18 15 22 29 # [2,]29 16 23 30 # [3,]3 10 17 24 31 # [4,]4 11 18 25 32 # [5,]5 12 19 26 33 # [6,]6 13 20 27 34 # [7,]7 14 21 28 35 # To permute the rows: t(apply(t(test), 2, sample)) # [,1] [,2] [,3] [,4] [,5] # [1,] 22 298 151 # [2,] 30 16 2329 # [3,] 10 31 243 17 # [4,] 114 25 32 18 # [5,] 265 12 33 19 # [6,] 27 34 20 136 # [7,] 35 28 147 21 which looks right! Ted. - E-Mail: (Ted Harding) Date: 14-May-2013 Time: 11:07:46 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
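Equivalently, and a little more briefly, one can transpose the result rather than the input:

t(apply(test, 1, sample))   # each row of 'test' permuted, returned in 7 x 5 shape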
Re: [R] Choice of statistical test (in R) of two apparently different distributions
On 09-May-2013 01:42:07 Pascal Oettli wrote: > On 05/09/2013 10:29 AM, Gundala Viswanath wrote: >> I have the following list of data each has 10 samples. >> The values indicate binding strength of a particular molecule. >> >> What I want so show is that 'x' is statistically different from >> 'y', 'z' and 'w'. Which it does if you look at X it has >> more values greater than zero (2.8,1.00,5.4, etc) than others. >> >> I tried t-test, but all of them shows insignificant difference >> with high P-value. >> >> What's the appropriate test for that? >> >> Below is my code: >> >> x <- >> c(2.852672123,0.076840264,1.009542943,0.430716968,5.4016,0.084281843,0.065654 >> 548,0.971907344,3.325405405,0.606504718) >> y <- >> c(0.122615039,0.844203734,0.002128992,0.628740077,0.87752229,0.888600425,0.72 >> 8667099,0.000375047,0.911153571,0.553786408); >> z <- >> c(0.766445916,0.726801899,0.389718652,0.978733927,0.405585807,0.408554832,0.7 >> 99010791,0.737676439,0.433279599,0.947906524) >> w <- >> c(0.000124984,1.486637663,0.979713013,0.917105894,0.660855127,0.338574774,0.2 >> 11689885,0.434050179,0.955522972,0.014195184) >> >> t.test(x,y) >> t.test(x,z) >> >> --END-- >> >> G.V. > > Hello, > > 1) Why 'x' should be statistically different from others? > 2) 'y' looks to be bimodal. The mean is not an appropriate measurement > for this kind of distribution. > > Regards, > Pascal Running the commands: plot(x,pch="+",col="red",ylim=c(0,6)) points(y,pch="+",col="green") points(z,pch="+",col="blue") points(w,pch="+",col="black") lines(x,col="red") lines(y,col="green") lines(z,col="blue") lines(w,col="black") indicates that y, z and w are similar to each other (with some suggestion of a serial structure). However, while part of x is also similar to y, z and w, it is clear that 3 values of x are "outliers" (well above the range of all other values, including those of x). [And I think Pascal meant "x" when he wrote "'y' looks to be bimodal."] And it may be of interest that these exceptional values of x occur at x[1], x[5], x[9] (i.e. every 4th observation). Taken together, these facts suggest that an examination of the procedure giving rise to the data may be relevant. As one example of the sort of thing to look for: were the 3 outlying observations obtained by the same worker/laboratory/apparatus as the others (or a similar question for x as opposed to y, z, w, raising issues of reliability). There are many similar questions one could think of raising, but knowledge of the background is essential for appropriate choice! I would agree with Pascal that a "routine" t-test is not appropriate. One thing that can be directly looked at statistically is, taking as given that there are 3 outliers somewhere in all 40 data, what is the probability that all three occur in one of the 4 groups (x,y,z,w) of data? This is 4 times the probability that they occur is a specific group (say x). The chance of all 3 being in x is the number of ways of choosing the remaining 7 out of the remaining 37, divided by the number of ways of choosing any 10 out of 40, i.e. (in R-speak) choose(37,7)/choose(40,10) # [1] 0.01214575 so the chance of all 3 being in some one of the 4 groups is 4*choose(37,7)/choose(40,10) # [1] 0.048583 which, if you are addicted to P-values, is just significant at the 5% (P <= 0.05) level. So this gives some indication that the "x" group of data is not on the same footing as the other ("y", "z", "w") groups. 
However, such a test does not address any question of why such outliers should be there in the first place; this needs to be addressed differently (see above). And one must not forget that the above "P-value" has been obtained by a method which was prompted by looking at the data in the first place. Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 09-May-2013 Time: 09:35:05 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
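The combinatorial calculation can be cross-checked with the hypergeometric distribution:

dhyper(3, 3, 37, 10)       # P(all 3 outliers fall in one specific group of 10)
# [1] 0.01214575
4 * dhyper(3, 3, 37, 10)   # P(all 3 fall in some one of the 4 groups)
# [1] 0.048583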
Re: [R] Decomposing a List
Thanks, Jorge, that seems to work beautifully! (Now to try to understand why ... but that's for later). Ted. On 25-Apr-2013 10:21:29 Jorge I Velez wrote: > Dear Dr. Harding, > > Try > > sapply(L, "[", 1) > sapply(L, "[", 2) > > HTH, > Jorge.- > > > > On Thu, Apr 25, 2013 at 8:16 PM, Ted Harding wrote: > >> Greetings! >> For some reason I am not managing to work out how to do this >> (in principle) simple task! >> >> As a result of applying strsplit() to a vector of character strings, >> I have a long list L (N elements), where each element is a vector >> of two character strings, like: >> >> L[1] = c("A1","B1") >> L[2] = c("A2","B2") >> L[3] = c("A3","B3") >> [etc.] >> >> >From L, I wish to obtain (as directly as possible, e.g. avoiding >> a loop) two vectors each of length N where one contains the strings >> that are first in the pair, and the other contains the strings >> which are second, i.e. from L (as above) I would want to extract: >> >> V1 = c("A1","A2","A3",...) >> V2 = c("B1","B2","B3",...) >> >> Suggestions? >> >> With thanks, >> Ted. >> >> - >> E-Mail: (Ted Harding) >> Date: 25-Apr-2013 Time: 11:16:46 >> This message was sent by XFMail >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> - E-Mail: (Ted Harding) Date: 25-Apr-2013 Time: 11:31:57 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
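Another common idiom that yields the same two vectors:

M  <- do.call(rbind, L)   # an N x 2 character matrix
V1 <- M[, 1]
V2 <- M[, 2]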
[R] Decomposing a List
Greetings! For some reason I am not managing to work out how to do this (in principle) simple task! As a result of applying strsplit() to a vector of character strings, I have a long list L (N elements), where each element is a vector of two character strings, like: L[1] = c("A1","B1") L[2] = c("A2","B2") L[3] = c("A3","B3") [etc.] >From L, I wish to obtain (as directly as possible, e.g. avoiding a loop) two vectors each of length N where one contains the strings that are first in the pair, and the other contains the strings which are second, i.e. from L (as above) I would want to extract: V1 = c("A1","A2","A3",...) V2 = c("B1","B2","B3",...) Suggestions? With thanks, Ted. - E-Mail: (Ted Harding) Date: 25-Apr-2013 Time: 11:16:46 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subsetting a large number into smaller numbers and find the largest product
On 18-Apr-2013 08:47:18 Janesh Devkota wrote: > Hello, > > I have a big number lets say of around hundred digits. I want to subset > that big number into consecutive number of 5 digits and find the product of > those 5 digits. For example my first 5 digit number would be 73167. I need > to check the product of the individual numbers in 73167 and so on. > > The sample number is as follows: > > > 731671765313306249192251196744265747423553491949349698352031277450632623957831 > 801698480186947885184385861560789112949495459501737958331952853208805511125406 > 98747158523863050715693290963295227443043557 > > I have a problem subsetting the small numbers out of the big number. > > Any help is highly appreciated. > > Best Regards, > Janesh Devkota Since the number is too large to be stored in any numerical format in R, it needs to be stored as a character string: X <- "73167176531330624963295227443043557". Then you can easily access successive 5-digit blocks as, e.g. for block i (i = 1:N, where 5*N is the length of the number): block <- substr(X, 1+5*(i-1), 5*i) You now have a 5-character string from which you can extract the individual digits. And then on to whatever you want to do ... Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 18-Apr-2013 Time: 10:06:43 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
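Building on the substr() idea, a sketch with a short made-up number (the same steps apply to the full number):

X <- "7316717653133062"              # 16 digits, so 3 complete blocks of 5
N <- nchar(X) %/% 5
blocks <- sapply(1:N, function(i) substr(X, 1 + 5*(i-1), 5*i))
blocks
# [1] "73167" "17653" "13306"
prods <- sapply(strsplit(blocks, ""), function(d) prod(as.numeric(d)))
prods
# [1] 882 630   0
max(prods)
# [1] 882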
Re: [R] I don't understand the 'order' function
[See in-line below[ On 16-Apr-2013 17:51:41 Julio Sergio wrote: > I thought I've understood the 'order' function, using simple examples like: > >order(c(5,4,-2)) >[1] 3 2 1 > > However, I arrived to the following example: > >order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) >[1] 8 9 10 7 11 6 5 4 3 2 1 > > and I was completely perplexed! > Shouldn't the output vector be 11 10 9 8 7 6 4 1 2 3 5 ? > Do I have a damaged version of R? I think the simplest explanation can be given as: S <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045) cbind(Index=1:length(S), S, Order=order(S), Sort=sort(S)) IndexS Order Sort [1,] 1 2465 8 210 [2,] 2 2255 9 210 [3,] 3 208510 505 [4,] 4 1545 7 920 [5,] 5 133511 1045 [6,] 6 1210 6 1210 [7,] 7 920 5 1335 [8,] 8 210 4 1545 [9,] 9 210 3 2085 [10,]10 505 2 2255 [11,]11 1045 1 2465 showing that the value of 'order' for any one of the numbers is the Index (position) of that number in the original series, placed in the position that number occupies in the sorted series. (With a tie for S[8] = S[9] = 210). For example: which one of S occurs in 5th position in the sorted series? It is the 11th of S (1045). > I became still more astonished when I used the sort function and got the > right answer: > > sort(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) > [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 > since 'sort' documentation claims to be using 'order' to establish the right > order. Indeed, once you have order(S), you know which element of S to put in each position of the sorted order: S[order(S)] [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 Does this help to explain it? Ted. > Please help me to understand all this! > > Thanks, > > -Sergio. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 16-Apr-2013 Time: 19:12:21 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
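It may also help to note that the vector Julio expected is what rank() returns, not order():

rank(S, ties.method = "first")
#  [1] 11 10  9  8  7  6  4  1  2  3  5
order(order(S))   # the inverse permutation of order(S) -- the same thing
#  [1] 11 10  9  8  7  6  4  1  2  3  5
all(S[order(S)] == sort(S))
# [1] TRUE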
Re: [R] %*%
On 11-Apr-2013 10:25:17 Shane Carey wrote: > What does these operators do: %*% > > Thanks > -- > Shane Enter the command ?"%*%" and you will see: Matrix Multiplication Description: Multiplies two matrices, if they are conformable. If one argument is a vector, it will be promoted to either a row or column matrix to make the two arguments conformable. If both are vectors it will return the inner product (as a matrix). Usage: x %*% y [etc.] Ted. ----- E-Mail: (Ted Harding) Date: 11-Apr-2013 Time: 11:41:22 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
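A few lines illustrating those behaviours (a sketch added for this write-up, not from the original reply):

A <- matrix(1:6, nrow = 2)   # a 2 x 3 matrix
B <- matrix(1:6, nrow = 3)   # a 3 x 2 matrix
A %*% B                      # matrix product: a 2 x 2 matrix
v <- c(1, 2, 3)
A %*% v                      # v is promoted to a 3 x 1 column matrix
v %*% v                      # two vectors: inner product, returned as a 1 x 1 matrix
drop(v %*% v)                # drop() reduces that 1 x 1 matrix to a plain scalar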
Re: [R] rep() fails at times=0.29*100
[See at end] On 09-Apr-2013 16:11:18 Jorge Fernando Saraiva de Menezes wrote: > Dear list, > > I have found an unusual behavior and would like to check if it is a > possible bug, and if updating R would fix it. I am not sure if should post > it in this mail list but I don't where is R bug tracker. The only mention I > found that might relate to this is "If times is a computed quantity it is > prudent to add a small fuzz." in rep() help, but not sure if it is related > to this particular problem > > Here it goes: > >> rep(TRUE,29) > [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE > TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE > [28] TRUE TRUE >> rep(TRUE,0.29*100) > [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE > TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE > [28] TRUE >> length(rep(TRUE,29)) > [1] 29 >> length(rep(TRUE,0.29*100)) > [1] 28 > > Just to make sure: >> 0.29*100 > [1] 29 > > This behavior seems to be independent of what is being repeated (rep()'s > first argument) >> length(rep(1,0.29*100)) > [1] 28 > > Also it occurs only with the 0.29. >> length(rep(1,0.291*100)) > [1] 29 >> for(a in seq(0,1,0.01)) {print(sum(rep(TRUE,a*100)))} #also shows correct > values in values from 0 to 1 except for 0.29. > > I have confirmed that this behavior happens in more than one machine > (though I only have session info of this one) > > >> sessionInfo() > R version 2.15.3 (2013-03-01) > Platform: x86_64-w64-mingw32/x64 (64-bit) > [1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 > LC_MONETARY=Portuguese_Brazil.1252 > [4] LC_NUMERIC=C LC_TIME=Portuguese_Brazil.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] spatstat_1.31-1 deldir_0.0-21 mgcv_1.7-22 > > loaded via a namespace (and not attached): > [1] grid_2.15.3 lattice_0.20-13 Matrix_1.0-11 nlme_3.1-108 > tools_2.15.3 The basic issue is, believe it or not, that despite apparently:
0.29*100 # [1] 29
in "reality":
0.29*100 == 29 # [1] FALSE
In other words, as computed by R, 0.29*100 is not exactly equal to 29:
29 - 0.29*100 # [1] 3.552714e-15
The difference is tiny, but it is sufficient to make 0.29*100 slightly smaller than 29, so rep(TRUE,0.29*100) uses the largest integer compatible with "times = 0.29*100", i.e. 28. Hence the recommendation to "add a little fuzz". On the other hand, when you use rep(1,0.291*100) you will be OK: This is because:
29 - 0.291*100 # [1] -0.1
so 0.291*100 is comfortably greater than 29 (but well clear of 30). The reason for the small inaccuracy (compared with "mathematical truth") is that R performs numerical calculations using binary representations of numbers, and there is no exact binary representation of 0.29, so the result of 0.29*100 will be slightly inaccurate. If you do need to do this sort of thing (e.g. the value of "times" will be the result of a calculation) then one useful precaution could be to round the result:
round(0.29*100) # [1] 29
29-round(0.29*100) # [1] 0
length(rep(TRUE,0.29*100)) # [1] 28
length(rep(TRUE,round(0.29*100))) # [1] 29
(The default for round() is 0 decimal places, i.e. it rounds to an integer). So, compared with:
0.29*100 == 29 # [1] FALSE
we have:
round(0.29*100) == 29 # [1] TRUE
Hoping this helps, Ted. 
- E-Mail: (Ted Harding) Date: 09-Apr-2013 Time: 17:56:33 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
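Building on the rounding advice in the reply above, one might wrap the precaution in a small helper for cases where 'times' is always the result of a floating-point calculation (a sketch only; the name rep_n is made up for illustration):

rep_n <- function(x, times) rep(x, times = round(times))  # round a computed 'times' before rep()
length(rep(TRUE, 0.29*100))    # 28 -- 0.29*100 is fractionally below 29 and gets truncated
length(rep_n(TRUE, 0.29*100))  # 29 -- rounding first avoids the truncation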
Re: [R] Console display "buffer size"
On 01-Apr-2013 21:26:07 Robert Baer wrote: > On 4/1/2013 4:08 PM, Peter Ehlers wrote: >> On 2013-04-01 13:37, Ted Harding wrote: >>> Greetings All. >>> This is a somewhat generic query (I'm really asking on behalf >>> of a friend who uses R on Windows, whereas I'm on Linux, but >>> the same phenomenon appears on both). >>> >>> Say one has a largish dataframe -- call it "G" -- which in the >>> case under discussion has 592 rows and 41 columns. The intention >>> is to display the data by simply entering its name (G) as command. >>> >>> Say the display console has been set wide enough to display 15 >>> columns (determined by lengths of column names). Then R will output >>> succesively to the console, in continuous flow: >>>Chunk 1: rows 1:592 of columns 1:15 >>>Chunk 2: rows 1:592 of columns 16:30 >>>Chunk 3: rows 1:592 of columns 31:41 >>> If the number of rows that can be displayed on the screen is, say, 60, >>> then only rows 533:592 of Chunk 3 will be visible in the first instance. >>> >>> However, on my Linux system at any rate, I can use Shift+PgUp to >>> scroll back up through what has been output to the console. It seems >>> that my friend proceeds similarly. >>> >>> But now, after a certain number of (Shift+PgUps), one runs out >>> of road before getting to Row 1 of Chunk 1 (in my friend's case, >>> only Rows 468-592 of Chunk 1 can be seen). >>> >>> The explanation which occurs to me is that the console has a "buffer" >>> in which such an output is stored, and if the dataframe is too big >>> then lines 1:N (for some N) of the output are dropped from the start >>> of the buffer, and it is impossible to go further back than line (N+1) >>> of Chunk 1 where in this case N=467 (of course one may not even be >>> able to go further back than Chunk K, for some K > 1, for a bigger >>> dataframe). >>> >>> The query I have is: In the light of the above, is there a way to >>> change the size of the "buffer" so that one can scroll all the way >>> back to the very first row of Chunk 1? (The size-change may perhaps >>> have to be determined empirically). >> >> Isn't this set by the 'bufbytes' and 'buflines' specifications in the >> Rconsole file? >> Anyway, it's probably best to use 'View' to inspect data. >> > Although I have not tried them, Windows RGUI [ -- Edit | GUI > Preferences --] has settings for both buffer and lines which on my > 64-bit machine default to 250000 and 8000 respectively. > > Rob Baer > >> Peter Ehlers Thanks for the replies, Rob and Peter. Rob's reply in particular corresponds to what my friend has himself just found out in his Windows installation of R. It seems, however, that this setting is once-and-for-all in an R session (I can't check that myself). Re Peter's comment: I don't seem to have an "Rconsole" file anywhere in my Linux installation. Is it unique to Windows? Once again, thanks. Very helpful. Ted. - E-Mail: (Ted Harding) Date: 01-Apr-2013 Time: 23:06:21 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Console display "buffer size"
Greetings All. This is a somewhat generic query (I'm really asking on behalf of a friend who uses R on Windows, whereas I'm on Linux, but the same phenomenon appears on both). Say one has a largish dataframe -- call it "G" -- which in the case under discussion has 592 rows and 41 columns. The intention is to display the data by simply entering its name (G) as a command. Say the display console has been set wide enough to display 15 columns (determined by lengths of column names). Then R will output successively to the console, in continuous flow:
Chunk 1: rows 1:592 of columns 1:15
Chunk 2: rows 1:592 of columns 16:30
Chunk 3: rows 1:592 of columns 31:41
If the number of rows that can be displayed on the screen is, say, 60, then only rows 533:592 of Chunk 3 will be visible in the first instance. However, on my Linux system at any rate, I can use Shift+PgUp to scroll back up through what has been output to the console. It seems that my friend proceeds similarly. But now, after a certain number of (Shift+PgUps), one runs out of road before getting to Row 1 of Chunk 1 (in my friend's case, only Rows 468-592 of Chunk 1 can be seen). The explanation which occurs to me is that the console has a "buffer" in which such an output is stored, and if the dataframe is too big then lines 1:N (for some N) of the output are dropped from the start of the buffer, and it is impossible to go further back than line (N+1) of Chunk 1 where in this case N=467 (of course one may not even be able to go further back than Chunk K, for some K > 1, for a bigger dataframe). The query I have is: In the light of the above, is there a way to change the size of the "buffer" so that one can scroll all the way back to the very first row of Chunk 1? (The size-change may perhaps have to be determined empirically). With thanks, Ted. --------- E-Mail: (Ted Harding) Date: 01-Apr-2013 Time: 21:37:17 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prop.test correct true and false gives same answer
On 27-Mar-2013 21:04:51 David Arnold wrote: > All, > How come both of these are the same. Both say "1-sample proportions > test without continuity correction." I would suspect one would say > "without" and one would say "with." > > >> prop.test(118,236,.5,correct=FALSE,conf.level=0.95) > > 1-sample proportions test without continuity correction > > data: 118 out of 236, null probability 0.5 > X-squared = 0, df = 1, p-value = 1 > alternative hypothesis: true p is not equal to 0.5 > 95 percent confidence interval: > 0.4367215 0.5632785 > sample estimates: > p > 0.5 > >> prop.test(118,236,.5,correct=TRUE,conf.level=0.95) > > 1-sample proportions test without continuity correction > > data: 118 out of 236, null probability 0.5 > X-squared = 0, df = 1, p-value = 1 > alternative hypothesis: true p is not equal to 0.5 > 95 percent confidence interval: > 0.4367215 0.5632785 > sample estimates: > p > 0.5 Note what is said (admittedly somewhat deeply tucked away) under "Details" in ?prop.test: "Continuity correction is used only if it does not exceed the difference between sample and null proportions in absolute value." In your example, the sample proportion exactly matches the null-hypothesis proportion (0.5). Confirmation: [A] Your same example: prop.test(118,236,.5,correct=TRUE,conf.level=0.95) # 1-sample proportions test without continuity correction # data: 118 out of 236, null probability 0.5 # X-squared = 0, df = 1, p-value = 1 # alternative hypothesis: true p is not equal to 0.5 # 95 percent confidence interval: # 0.4367215 0.5632785 # sample estimates: # p # 0.5 [B1] Slightly change x, but keep "correct=TRUE": prop.test(117,236,.5,correct=TRUE,conf.level=0.95) # 1-sample proportions test with continuity correction # data: 117 out of 236, null probability 0.5 # X-squared = 0.0042, df = 1, p-value = 0.9481 # alternative hypothesis: true p is not equal to 0.5 # 95 percent confidence interval: # 0.4304724 0.5611932 # sample estimates: # p # 0.4957627 [B2] Slightly change x, but now "correct=FALSE": prop.test(117,236,.5,correct=FALSE,conf.level=0.95) # 1-sample proportions test without continuity correction # data: 117 out of 236, null probability 0.5 # X-squared = 0.0169, df = 1, p-value = 0.8964 # alternative hypothesis: true p is not equal to 0.5 # 95 percent confidence interval: # 0.4325543 0.5591068 # sample estimates: # p # 0.4957627 So it doesn't do the requested continuity correction in [A] because there is no need to. But in [B1] it makes a difference (compare with [B2]), so it does it. Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 27-Mar-2013 Time: 21:21:39 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
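The rule quoted from the Details section can be checked numerically; here is a small sketch (an illustration added here, not from the original reply) of the condition under which the Yates correction of 0.5/n is actually applied:

# Sketch of the documented rule: the correction is used only if it does not
# exceed |sample proportion - null proportion|.
uses_correction <- function(x, n, p) (0.5/n) <= abs(x/n - p)
uses_correction(118, 236, 0.5)  # FALSE: the sample proportion is exactly 0.5
uses_correction(117, 236, 0.5)  # TRUE:  |117/236 - 0.5| exceeds 0.5/236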
Re: [R] edit.data() read-only?
Thanks! ?View does indeed state "The object is then viewed in a spreadsheet-like data viewer, a read-only version of 'data.entry'", which is what I was looking for! Ted. On 26-Mar-2013 10:23:59 Blaser Nello wrote: > Try ?View() > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of Ted Harding > Sent: Dienstag, 26. März 2013 11:09 > To: r-help@r-project.org > Subject: [R] edit.data() read-only? > > Greetings All. > > The function edit.data() allows a convenient spreadsheet-like view of a > dataframe with too many rows/columns to fit on the screen (especially when > there are many columns). Very useful when scanning through a dataset (row & > column are conveniently identified by the labels at the side and above). > > However, there seens to be no option to set it "read-only" on start-up, with > the consequence that a clumsy key-press or mouse-click could cause a change > in the data which would then be stored after quitting edit.data(). > > Is there a possibility of a read-only option? Or some other function which > could offer similar viewing capability without the risk of data change? > > With thanks, > Ted. > > - > E-Mail: (Ted Harding) > Date: 26-Mar-2013 Time: 10:08:58 > This message was sent by XFMail > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > - E-Mail: (Ted Harding) Date: 26-Mar-2013 Time: 10:38:34 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
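For completeness, a two-line illustration (added here, not part of the original exchange) of the read-only viewer in an interactive session:

d <- data.frame(x = 1:10, y = runif(10))
View(d)   # opens a spreadsheet-like, read-only viewer; nothing is ever assigned back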
Re: [R] edit.data() read-only?
Sorry, I meant "data.entry()", not "edit.data()" (the latter due to mental cross-wiring with "edit.data.frame()"). I think that Nello Blaser's suggestion of "View" may be what I seek (when I can persuade it to find the font it seeks ... )! With thanks, Barry. Ted. On 26-Mar-2013 10:20:59 Barry Rowlingson wrote: > On Tue, Mar 26, 2013 at 10:09 AM, Ted Harding > wrote: >> Greetings All. >> >> The function edit.data() allows a convenient spreadsheet-like >> view of a dataframe with too many rows/columns to fit on the >> screen (especially when there are many columns). Very useful >> when scanning through a dataset (row & column are conveniently >> identified by the labels at the side and above). > > Do you mean: > > d=data.frame(x=1:10,y=runif(10)) > edit(d) > > ? Because I don't have an edit.data function (maybe its windows only). > >> However, there seens to be no option to set it "read-only" on >> start-up, with the consequence that a clumsy key-press or >> mouse-click could cause a change in the data which would then >> be stored after quitting edit.data(). >> >> Is there a possibility of a read-only option? Or some other >> function which could offer similar viewing capability without >> the risk of data change? > > If you just want to view the data, don't assign it back. The "edit" > function only updates the data if you assign it back, as in: > > d=edit(d) > > so a 'read only' version would be: > > invisible(edit(d)) > > except the user here can change the values in the cells, they just > don't go anywhere. Except into .Last.value if you decide you really > did want to get the values... > > Barry > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) Date: 26-Mar-2013 Time: 10:36:32 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] edit.data() read-only?
Greetings All. The function edit.data() allows a convenient spreadsheet-like view of a dataframe with too many rows/columns to fit on the screen (especially when there are many columns). Very useful when scanning through a dataset (row & column are conveniently identified by the labels at the side and above). However, there seems to be no option to set it "read-only" on start-up, with the consequence that a clumsy key-press or mouse-click could cause a change in the data which would then be stored after quitting edit.data(). Is there a possibility of a read-only option? Or some other function which could offer similar viewing capability without the risk of data change? With thanks, Ted. ----- E-Mail: (Ted Harding) Date: 26-Mar-2013 Time: 10:08:58 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] order statistic of multivariate normal
On 22-Mar-2013 13:02:25 li li wrote: > Thank you all for the reply. > > One example of my question is as follows. > > Suppose X1, ..., X10 has multivariate normal distribution > and X(1), ..., X(10) are the corresponding order statistics. > > My question is that whether there is a R function that would > help compute the c which satisfies > P(X(4) < c) = beta. Here beta is a known constant between 0 and 1. > > Thank you. >Hanna The basic question which needs to be answered (which has been hinted at in earlier replies) is: How do you define "order statistic" for multivariate observations? For example, here is a sample of 10 (to 3 d.p.) from a bivariate normal distribution:
       [,1]   [,2]
 [1,]  1.143 -0.396
 [2,] -0.359 -0.217
 [3,] -0.391 -0.601
 [4,] -0.416 -1.093
 [5,] -1.810 -1.499
 [6,] -0.367 -0.636
 [7,] -2.238  0.563
 [8,]  0.811  1.230
 [9,]  0.082  0.174
[10,] -1.359 -0.364
Which one of these 10 rows is X(4)? There is an alternative interpretation of your question: "Suppose X1, ..., X10 has multivariate normal distribution and X(1), ..., X(10) are the corresponding order statistics." This could mean that the vector (X1,...,X10) has a multivariate normal distribution with 10 dimensions, and, for a single vector (X1,...,X10) drawn from this distribution, (X(1), ..., X(10)) is a vector consisting of these same values (X1,...,X10), but in increasing order. Is that what you mean? Hoping this helps, Ted. ----- E-Mail: (Ted Harding) Date: 22-Mar-2013 Time: 13:31:31 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
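Under the second interpretation (one 10-dimensional normal vector whose components are then sorted), there is no ready-made closed-form function, but the constant c can be approximated by simulation. A sketch added for illustration, assuming the mean vector mu and covariance matrix Sigma are known and taking beta = 0.95:

# Monte Carlo sketch (illustrative assumptions: mu, Sigma and beta as set below).
library(MASS)                 # for mvrnorm()
set.seed(1)
mu    <- rep(0, 10)           # assumed known mean vector
Sigma <- diag(10)             # assumed known covariance matrix
beta  <- 0.95
draws <- mvrnorm(n = 1e5, mu = mu, Sigma = Sigma)   # each row is one draw of (X1,...,X10)
x4    <- apply(draws, 1, function(z) sort(z)[4])    # 4th order statistic of each draw
c_hat <- quantile(x4, probs = beta)                 # estimate of c with P(X(4) < c) = beta
c_hat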
Re: [R] Random Sample with constraints
On 03-Mar-2013 16:29:05 Angelo Scozzarella Tiscali wrote: > For example, I want to simulate different populations with same mean and > standard deviation but different distribution. > > On 03/Mar/2013, at 17:14, Angelo Scozzarella Tiscali wrote: >> Dear R friends, >> >> I'd like to generate random sample (variable size and range) without a >> specified distribution but with given mean and standard deviation. >> >> Could you help me? >> >> thanks in advance >> Angelo As Ben Bolker said, any random sample must come from some distribution, so you cannot generate one without specifying some distribution. Insofar as your question can be interpreted, it will be satisfied if, given the desired mean, M, and SD, S, you take any two available distributions with, respectively, known means M1 and M2 and known SDs S1 and S2. Let X1 denote a sample from the first, and X2 a sample from the second. Then M + (X1 - M1)/(S1/S) is a sample from the first distribution re-scaled to have mean M and SD S, as required. Similarly, M + (X2 - M2)/(S2/S) is a sample from the second distribution re-scaled to have mean M and SD S, as required. As for which distribution you take as the first, and which as the second, that is at your own choice -- for example, the first could be the Standard Normal (M1 = 0, S1 = 1); use rnorm(). The second could be the uniform on (0,1) (M2 = 0.5, S2 = 1/sqrt(12)); use runif(). Similar for other arbitrary choices of first and second distribution (so long as each has at least a second moment, hence excluding, for example, the Cauchy distribution). That's about as far as one can go with your question! Hoping it helps, however. Ted. - E-Mail: (Ted Harding) Date: 03-Mar-2013 Time: 17:12:50 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
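A short sketch of the re-scaling described above (an illustration added here; the sample size and targets are arbitrary). Part (a) uses the theoretical moments of each parent distribution, as in the reply; part (b) instead forces the sample mean and SD to match the targets exactly:

set.seed(42)
M <- 10; S <- 2                      # target mean and SD
n <- 1000
# (a) re-scale using the theoretical moments of the parent distributions:
x1 <- rnorm(n)                       # parent 1: N(0,1),  M1 = 0,   S1 = 1
x2 <- runif(n)                       # parent 2: U(0,1),  M2 = 0.5, S2 = 1/sqrt(12)
y1 <- M + (x1 - 0)   * S / 1
y2 <- M + (x2 - 0.5) * S / (1/sqrt(12))
# (b) or centre and scale by the sample statistics, so the sample moments match exactly:
z1 <- M + (x1 - mean(x1)) * S / sd(x1)
c(mean(z1), sd(z1))                  # 10 and 2, up to rounding error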
Re: [R] arithmetic and logical operators
On 30-Jan-2013 20:39:34 Berend Hasselman wrote: > > On 30-01-2013, at 21:32, Dave Mitchell wrote: > >> Why, in R, does (0.1 + 0.05) > 0.15 evaluate to True? What am I missing >> here? How can I ensure this (ostensibly incorrect) behavior doesn't >> introduce bugs into my code? Thanks for your time. >> > > R-FAQ 7.31: > http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f > > Berend And, to put Dave's specific query into the context of that FAQ:
(0.1 + 0.05) > 0.15 # [1] TRUE
(0.1 + 0.05) - 0.15 # [1] 2.775558e-17
so that tiny 2.775558e-17 is the inexactitude (due to finite binary representation). As an interesting variant:
(1.0 + 0.5) > 1.5 # [1] FALSE
(1.0 + 0.5) - 1.5 # [1] 0
and that is because 1.0 and 0.5, and also 1.5, have exact finite binary representations, e.g. (in binary):
1.0 == 1.0000...
0.5 == 0.1000...
1.5 == 1.1000...
whereas 0.1, 0.05 and 0.15 are these numbers divided by 10 = 2*5; and while you can exactly do the "/2" part (just shift right by one binary place), you can't exactly divide by 5 in finite binary arithmetic (any more than you can exactly divide by 3 in decimal), because 5 is not a factor of the base (2) of the binary representation (whereas, in decimal, both 2 and 5 are factors of 10; but 3 isn't). Whereas R has the function all.equal() to give the "right" answer for most things like
(0.1 + 0.05) == 0.15 # [1] FALSE
all.equal((0.1 + 0.05), 0.15) # [1] TRUE
(because it tests for equality to within a very small tolerance, by default the square root of the binary precision of a double precision binary real number), R does not have a straightforward method for testing the truth of "(0.1 + 0.05) > 0.15" (and that is because the question is not clearly discernible from the expression, when imprecision underlies it). You could emulate all.equal() on the lines of
(0.1 + 0.05) > (0.15 + .Machine$double.eps^0.5) # [1] FALSE
(0.1 + 0.05) < (0.15 - .Machine$double.eps^0.5) # [1] FALSE
(or similar). Observe that
.Machine$double.eps^0.5 # [1] 1.490116e-08
.Machine$double.eps # [1] 2.220446e-16
(0.1 + 0.05) - 0.15 # [1] 2.775558e-17
Hoping this helps, Ted. - E-Mail: (Ted Harding) Date: 30-Jan-2013 Time: 23:22:53 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
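The emulation of all.equal() suggested above can be packaged into small helpers (a sketch added for illustration; the function names are made up):

tol <- sqrt(.Machine$double.eps)            # roughly the default tolerance used by all.equal()
gt_tol <- function(x, y) (x - y) >  tol     # "definitely greater than"
lt_tol <- function(x, y) (y - x) >  tol     # "definitely less than"
eq_tol <- function(x, y) abs(x - y) <= tol  # "equal to within tolerance"
gt_tol(0.1 + 0.05, 0.15)  # FALSE
eq_tol(0.1 + 0.05, 0.15)  # TRUE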