[R] notation question
Dear list, I am currently writing up some of my R models in a more formal sense for a paper, and I am having trouble with the notation. Although this isn't really an 'R' question, it should help me to understand a bit better what I am actually doing when fitting my models! Using the analysis of co-variance example from MASS (fourth edition, p 142), what is the correct notation for the formula Gas, ~ Insul/Temp - 1? Obviously, if we fit it as two separate models (as in the example above it), we would have something like y_i = \beta x_i for each of the two models. So my question is, when we have a single model with a k-level factor interaction term as in the equation above, what is the correct/standard statistical (LaTeX style) notation? Cheers, Carson __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] notation question
Thank you Rolf, Using the analysis of co-variance example from MASS (fourth edition, p 142), what is the correct notation for the formula Gas, ~ Insul/Temp There shouldn't be a comma after ``Gas'' in that formula. - 1? Obviously, if we fit it as two separate models (as in the example above it), we would have something like y_i = \beta x_i for each of the two models. No. y_i = alpha + beta x_i . Clearly you need an intercept. Do you really expect gas consumption to be nil when the average external temperature is 0 degrees C ? Right, of course... both silly mistakes, my apologies! So my question is, when we have a single model with a k-level factor interaction term as in the equation above, what is the correct/standard statistical (LaTeX style) notation? The model is simply y_ij = alpha_i + beta_i x_ij where i = 1 (before) or 2 (after). I.e. you are allowing a different slope and intercept for each of the scenarios (before and after). But this is the ``deterministic'' part of the model. You should really include the random part: y_ij = alpha_i + beta_i x_ij + E_ij where the E_ij are independent random variables with mean 0 and common variance sigma^2. (Often the E_ij are assumed to be Gaussian, mainly because if all you have is a hammer, then everything looks like a nail). Perfect! Thanks for the clarification. I think I was previously trying to be a bit more clever than necessary (and as a result not being very clever at all :-p) Carson __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Deviance of zeroinfl/hurdle models
Dear list, I'm wondering if anyone can help me calculate the deviance of either a zeroinfl or hurdle model from package pscl? Even if someone could point me to the correct formula for calculating the deviance, I could do the rest on my own. I am trying to calculate a pseudo-R-squared measure based on the R^{2}_{DEV} of [1], so I need to be able to calculate the deviance of the full and null models. Does anyone have any suggestions? Alternatively, does anyone have a suggestion for a better measure to report (I'm aware that R^2 measures aren't really appropriate here), preferably something that is easy enough to program or compute using existing packages... Thanks in advance, Carson [1] Cameron, A.C., Windmeijer, F.A.G., 1996. R^2 measures for count data regression models with applications to health-care utilization. J. Bus. Econom. Statist. 14, 209–220 -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 'constrained' negative.binomial model estimates
Hello list, I am not sure if the terminology that I am using here is widely used, however, I provide an example in the hopes that my problem will become clear. My basic problem is that I am unsure of how to 'constrain' my model estimates to reproduce the aggregate (by factor levels) observed dependent variable for a negative.binomial model. I realize this sounds rather vague, so I provide an example to illustrate: When working with migration data, it is possible to model migration flows via a simple Poisson GLM like the following: model1 - glm(flows ~ destination.pop+distance+origin-0, data=migration, family=poisson()) model2 - glm(flows ~ destination.pop+distance+origin-0, data=migration, family=quasipoisson()) where origin is a factor of origins (1 - n). This has the effect of 'constraining' the predicted flows to reproduce the observed outgoing flows from each origin: sums - aggregate(migration$flows, by=list(migration$origin), sum) sums1 - aggregate(predict(model1, type='response'), by=list(migration$origin), sum) sums2 - aggregate(predict(model2, type='response'), by=list(migration$origin), sum) which works fine for both poisson and quasipoisson models. However, I would also like to fit a negative.binomial model to this data to (again) account for over-dispersion (which may still exist even after the constraint is imposed), however, the effect of the 'constraint' factor does not seem to work, due likely to the additional theta parameter. model3 - glm.nb(flows ~ destination.pop+distance+origin-0, data=migration) sums3 - aggregate(predict(model3,type='response'), by=list(migration$origin), sum) data.frame(origin=unique(migration$origin), observed=sums$x, poisson=sums1$x,quasi=sums2$x,negbin=sums3$x) origin observed poisson quasinegbin 1 1 676308 676308 676308 651113.7 2 2 1155811 1155811 1155811 1141729.4 3 3 1789112 1789112 1789112 1845908.1 4 4 942162 942162 942162 978599.6 5 5 2484387 2484387 2484387 2486435.1 6 6 819222 819222 819222 747022.1 7 7 1237079 1237079 1237079 1405065.0 8 8 1067069 1067069 1067069 963339.5 9 9 2143172 2143172 2143172 2139171.5 Can anyone tell me how I might go about doing this for the negative.binomial model? Or perhaps better still, an alternative way of enforcing this 'constraint'? The following dataset is what I used for the above example (this data comes from Fotheringham and O’Kelly, 1989. Spatial interaction models: Formulations and applications. Dordrecht: Kluwer Academic Publishers): dput(migration) structure(list(flows = c(283049L, 87267L, 29877L, 130830L, 21434L, 30287L, 21450L, 72114L, 180048L, 237229L, 60681L, 382565L, 53772L, 64645L, 43749L, 133122L, 79223L, 300345L, 286580L, 346407L, 287340L, 161645L, 97808L, 229764L, 26887L, 67280L, 281791L, 92308L, 49828L, 144980L, 113683L, 165405L, 198144L, 718673L, 551483L, 143860L, 316650L, 199466L, 89806L, 266305L, 17995L, 55094L, 230788L, 49892L, 252189L, 121366L, 25574L, 66324L, 35563L, 93434L, 178517L, 185618L, 192223L, 141679L, 158006L, 252039L, 30528L, 87987L, 172711L, 181868L, 89389L, 27409L, 134229L, 342948L, 110792L, 268458L, 394481L, 274629L, 279739L, 87938L, 289880L, 437255L), distance = c(219L, 1009L, 1514L, 974L, 1268L, 1795L, 2420L, 3174L, 219L, 831L, 1336L, 755L, 1049L, 1576L, 2242L, 2996L, 1009L, 831L, 505L, 1019L, 662L, 933L, 1451L, 2205L, 1514L, 1336L, 505L, 1370L, 888L, 654L, 946L, 1700L, 974L, 755L, 1019L, 1370L, 482L, 1144L, 2278L, 2862L, 1268L, 1049L, 662L, 888L, 484L, 662L, 1795L, 2380L, 1795L, 1576L, 933L, 654L, 1144L, 662L, 1278L, 1779L, 2420L, 2242L, 1451L, 946L, 2278L, 1795L, 1278L, 754L, 3174L, 2996L, 2205L, 1700L, 2862L, 2380L, 1779L, 754L), origin.pop = c(16.2876696349298, 16.2876696349298, 16.2876696349298, 16.2876696349298, 16.2876696349298, 16.2876696349298, 16.2876696349298, 16.2876696349298, 17.4279408399148, 17.4279408399148, 17.4279408399148, 17.4279408399148, 17.4279408399148, 17.4279408399148, 17.4279408399148, 17.4279408399148, 17.5110179983684, 17.5110179983684, 17.5110179983684, 17.5110179983684, 17.5110179983684, 17.5110179983684, 17.5110179983684, 17.5110179983684, 16.6083307371083, 16.6083307371083, 16.6083307371083, 16.6083307371083, 16.6083307371083, 16.6083307371083, 16.6083307371083, 16.6083307371083, 17.2140377110705, 17.2140377110705, 17.2140377110705, 17.2140377110705, 17.2140377110705, 17.2140377110705, 17.2140377110705, 17.2140377110705, 16.3878173980331, 16.3878173980331, 16.3878173980331, 16.3878173980331, 16.3878173980331, 16.3878173980331, 16.3878173980331, 16.3878173980331, 16.761264461712, 16.761264461712, 16.761264461712, 16.761264461712, 16.761264461712, 16.761264461712, 16.761264461712, 16.761264461712, 15.9304398925737, 15.9304398925737, 15.9304398925737, 15.9304398925737, 15.9304398925737, 15.9304398925737, 15.9304398925737, 15.9304398925737, 17.0532473904734, 17.0532473904734, 17.0532473904734, 17.0532473904734, 17.0532473904734, 17.0532473904734,
[R] rank of Matrix
Dear list, Can anyone tell me how to obtain the rank of a sparse Matrix, for example from package Matrix (class dgCMatrix)? Here is an example of QR decomposition of a sparse matrix (from the sparseQR class help). library(Matrix) data(KNex) mm - KNex$mm str(mmQR - qr(mm)) Similarly, using the functions/classes from the relatively new MatrixModels package: library(MatrixModels) str(trial - data.frame(counts=c(18,17,15,20,10,20,25,13,12), outcome=gl(3,1,9,labels=LETTERS[1:3]), treatment=gl(3,3,labels=letters[1:3]))) glmS - glm4(counts ~ 0+outcome + treatment, poisson, trial, verbose = TRUE, sparse = TRUE) str(glmS) str(X - glmS@pred@X) str(QR - qr(X)) I'm not very familiar with matrix decomposition, but is the 'p' slot from a sparseQR object the same as $pivot from base qr? Additionally, how do I obtain the rank of the input matrix? qr from the base package produces an object with several components, two of which are rank, and pivot, is there an equivalent way of getting at these values/vectors for sparse matrices? Basically, I am trying to get at the variance-covariance matrix in order to compute t-values for the coefficients of a (sparse) GLM (i.e. a summary function for class glpModel). Am I going about this the right way? Are they perhaps more efficient ways of doing this when working with sparse matrices. Is there a preferred way of calculating t-statistics? Note that 'rankMatrix' from package Matrix appears to densify the matrix before use, so this is not a viable option. I have asked a similar question before [1], though I am now trying to 'answer' it myself by digging in and actually coding a bit more myself, however, as you can see I'm still a bit stuck :-( Thanks for any suggestions, Carson [1] http://www.mail-archive.com/r-help@r-project.org/msg130145.html -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rank of Matrix
Hmm, looks like I'm 'answering' my own question here... library(Matrix) data(KNex) mm - KNex$mm str(mmQR - qr(mm)) # new stuff: R - mmQR@R Rdiag - diag(R) rank - sum(Rdiag max(dim(mm))*.Machine$double.eps*abs(R[1,1])) # this is the matlab default I think? # 712 for comparison, rankMatrix from package Matrix gives the same answer, but takes considerably more time and memory! rankMatrix(mm) # 712 Does the above make sense to others? Alternatively, is it possible to derive rank from the model itself (I can't see how)? library(MatrixModels) trial - data.frame(counts=c(18,17,15,20,10,20,25,13,12), outcome=gl(3,1,9,labels=LETTERS[1:3]), treatment=gl(3,3,labels=letters[1:3])) glmS - glm4(counts ~ 0+outcome + treatment, family=poisson, data=trial, verbose = TRUE, sparse = TRUE) str(glmS) -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] model diagnostics for MatrixModels
Dear list, I have been working with the MatrixModels package quite a bit this week, and it is proving to be extremely valuable for my current work (I am working with several factors with many levels, leading to a sparse model matrix). However, as my knowledge of statistical theory leaves much to be desired, there are certain aspects of model evaluation etc that I am having trouble with. Has anyone developed any examples of model diagnostics using the glpModel class (from the MatrixModels package)? Something akin to 'summary', or perhaps helper functions such as 'logLik', 'AIC', 'predict' or even a means of running some of the tests from package 'lmtest'? The structure of the 'glpModel' class is pretty straight-forward, so I am able to extract *some* of the relevant information to perform *some* of these tests myself, however, an example of performing some 'standard' model diagnostics or tests/comparisons would be very helpful (In this case I am really only interested in tests etc relevant to GLMs). I'm not against writing some of these functions myself, so any tips/suggestions/resources/examples are appreciated. Furthermore, does anyone know if there facilities to obtain, for example, the rank of the (sparse) model matrix efficiently? Quite a few tests require rank, but computing this separately using for example rankMatrix from package Matrix uses up tonnes of memory, which is exactly what I was trying to avoid by using sparse matrices and glm4 :-p Thanks for any pointers, Carson sessionInfo() R version 2.12.2 (2011-02-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_IE.utf8 LC_NUMERIC=C [3] LC_TIME=en_IE.utf8LC_COLLATE=en_IE.utf8 [5] LC_MONETARY=C LC_MESSAGES=en_IE.utf8 [7] LC_PAPER=en_IE.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_IE.utf8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] MatrixModels_0.2-1 pscl_1.03.6gam_1.03 akima_0.5-4 [5] coda_0.13-5mvtnorm_0.9-92 lmtest_0.9-27 zoo_1.6-4 [9] Matrix_0.999375-48 lattice_0.19-17MASS_7.3-7 loaded via a namespace (and not attached): [1] grid_2.12.2 tools_2.12.2 -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mixture models/latent class regression comparison
Dear list, I have been comparing the outputs of two packages for latent class regression, namely 'flexmix', and 'mmlcr'. What I have noticed is that the flexmix package appears to come up with a much better fit than the mmlcr package (based on logLik, AIC, BIC, and visual inspection). Has anyone else observed such behaviour? Has anyone else been successful in using the mmlcr package? I ask because I am interested in latent class negative binomial regression, which the mmlcr package appears to support, however, the results for basic Poisson latent class regression appear to be inferior to the results from flexmix. Below is a simple reproducible example to illustrate the comparison: library(flexmix) library(mmlcr) data(NPreg) # from package flexmix m1 - flexmix(yp ~ x, k=2, data=NPreg, model=FLXMRglm(family='poisson')) NPreg$id - 1:200 # mmlcr requires an id column m2 - mmlcr(outer=~1|id, components=list(list(formula=yp~x, class=poisonce)), data=NPreg, n.groups=2) # summary and coefficients for flexmix model summary(m1) summary(refit(m1)) # summary and coefficients for mmlcr model summary(m2) m2 Regards, Carson P.S. I have attached a copy of the mmlcr package with a modified mmlcr.poisonce function due to errors in the version available here: http://cran.r-project.org/src/contrib/Archive/mmlcr/. See also http://jeldi.com/Members/jthacher/tips-and-tricks/programs/r/mmlcr section Bugs? subsection Poisson. -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mixture models/latent class regression comparison
Thanks for the reply Christian, I have never used mmlcr for this, but quite generally when fitting such models, the likelihood has often very many local optima. This means that the result of the EM (or a similar) algorithm depends on the initialisation, which in flexmix (and perhaps also in mmlcr) is done in a random fashion. This means that results may differ even if the same method is applied twice, and unfortunately, depending on the dataset, the result may be quite unstable. This may explain that the two functions give you strongly different results, not of course implying that one of them is generally better. I though this might be the issue here. So is the solution to simply run it many times and look for the best likelihood? I am still confused as to why the mmlcr function consistently produces 'poorer' results? Perhaps the flexmix package is being a bit more 'clever' in terms of avoiding local optima? I will have to dig a bit deeper. Cheers, Carson Dear list, I have been comparing the outputs of two packages for latent class regression, namely 'flexmix', and 'mmlcr'. What I have noticed is that the flexmix package appears to come up with a much better fit than the mmlcr package (based on logLik, AIC, BIC, and visual inspection). Has anyone else observed such behaviour? Has anyone else been successful in using the mmlcr package? I ask because I am interested in latent class negative binomial regression, which the mmlcr package appears to support, however, the results for basic Poisson latent class regression appear to be inferior to the results from flexmix. Below is a simple reproducible example to illustrate the comparison: library(flexmix) library(mmlcr) data(NPreg) # from package flexmix m1 - flexmix(yp ~ x, k=2, data=NPreg, model=FLXMRglm(family='poisson')) NPreg$id - 1:200 # mmlcr requires an id column m2 - mmlcr(outer=~1|id, components=list(list(formula=yp~x, class=poisonce)), data=NPreg, n.groups=2) # summary and coefficients for flexmix model summary(m1) summary(refit(m1)) # summary and coefficients for mmlcr model summary(m2) m2 Regards, Carson P.S. I have attached a copy of the mmlcr package with a modified mmlcr.poisonce function due to errors in the version available here: http://cran.r-project.org/src/contrib/Archive/mmlcr/. See also http://jeldi.com/Members/jthacher/tips-and-tricks/programs/r/mmlcr section Bugs? subsection Poisson. -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ *** --- *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] negative binomial latent class regression in package flexmix
Hello list, Has anyone had any luck creating an M-step driver for negative binomial regression for use with package flexmix? I've had a look here: http://cran.r-project.org/web/packages/flexmix/vignettes/flexmix-intro.pdf as well as poking around in the flexmix source, but I haven't had much luck getting anything to work. I can't figure out how to a) come up with an initial estimate of theta, as I don't think this really belongs in the M-step, and b) how to properly implement the M-step so that refit() doesn't give me: Error in .local(object, ...) : not implemented yet for restricted parameters. (which I think is due to the additional theta parameter)? I realize this is a bit of a vague question, but my hope is that there is someone out there who has experience working with NB in a latent class setting that can help me out here. Frankly I'm pretty stumped on this one, and I don't think posting any code in this case will enlighten any one! Thanks in advance for any tips or pointers. P.S. I'm not married to using flexmix, though I do like the implementation and extensibility so far. -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] spatial interaction (gravity) model as Poisson regression
Dear list, I posted essentially this same question to the r-sig-geo mailing list last week with no response :( Unfortunately I am no closer to reaching a solution, so I now post it here (with some clarifications) in the hope that someone following this list might have an answer for me: Has anyone had much experience with spatial interaction (or gravity) models, specifically in the form of Poisson regression? I'm a bit unsure of how to operationalize this using glm(), and would appreciate any pointers from those with more experience. I have searched the archives extensively, and there are several mentions of this question, but I have yet to find anything concrete that I can wrap my head around... apologies if I have missed something. Basically, the conventional origin constrained model would look something like this: T_{ij} = exp(\delta_{i} + \log{A_{j}} - \beta D_{ij}) ~ \varepsilon_{ij} where \delta_{i} is a constant parameter specific to the ith zone, A_{j} is the attractiveness of the jth location, and D_{ij} is the distance between i and j. Note that \varepsilon_{ij} is just the multiplicative error term of the flow from i to j, and \beta is the distance decay parameter. Similarly, the doubly constrained model follows the form: T_{ij} = exp(\delta_{i} + \gamma_{j} - \beta D_{ij}) ~ \varepsilon_{ij} where everything is defined as above, except exp(\gamma_{j}) is an estimate of the attractiveness of location A_{j}. Hopefully the above description makes things a bit clearer, essentially my question is this: What factors or in what form do I have to have my data in order to be able to run such a model following the glm syntax? I know this should be relatively straight-forward, I just can't seem to get my head wrapped around it at all? If it helps, I can provide some sample data to those who request it. TIA, Carson -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] plotting radial dendrograms
Dear list, I am trying to plot a radial dendrogram using the ape package, which requires my data to be of class 'phylo'. Currently I have my dendrogram stored as an object of class 'dendrogram' which was produced from an outside bit of C code, but was made into an object of class 'igraph.eigenc' and converted to a dendrogram using 'as.dendrogram()' from the igraph package. I would like to find a way to convert this dendrogram to a 'phylo' object, either directly, or through conversion via 'hclust'. I have tried creating an 'hclust' object 'manually' but keep running into Error: segfault from C stack overflow when I try to plot it: result - list() result$merge-merges result$method - NULL result$dist.method - modularity result$height - seq(length(merges[,1]),1,-1) result$order - seq(length(merges[,1])+1) result$call - some call class(result) - hclust plot(result) where merges is a two column matrix of cluster joins that looks something like this: merges [,1] [,2] [1,]02 [2,]31 The dendrogram object was created using something like this: result - list() class(result) - igraph.eigenc result$merges - merges result$membership - membership dend - as.dendrogram(result) where merges is as above, and membership is: membership [1] 0 0 0 0 0 2 2 2 2 2 1 1 1 1 1 Any thoughts on how to convert a dendrogram object to an hclust or phylo object, or barring that, another way to plot a radial dendrogam in R? Regards, Carson -- Carson J. Q. Farmer ISSP Doctoral Fellow National Centre for Geocomputation National University of Ireland, Maynooth, http://www.carsonfarmer.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] set values in data.frame to NA conditional on another data.frame
Hello List, Is there a faster way to set values in one data.frame equal to NA conditional on the corresponding value in another data.frame? Currently I am using: b[is.na(a)] - NA where 'a' and 'b' are data.frames of equal size/dimensions, and 'a' contains NAs but 'b' does not. This is extremely slow as is, as my data.frames are about 6000 columns by 6000 rows in size. Are there any tricks to speed things up here? Thank you in advance, Carson __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Replace values in data.frame conditional on another data.frame
Dear List, I am looking for an efficient method for replacing values in a data.frame conditional on the values of a separate data.frame. Here is my scenario: I have a data.frame (A) with say 1000 columns, and 365 rows. Each cell in the data.frame has either valid value, or NA. I have an additional data.frame (B) with the same number of rows and columns, with valid values in all cells. What I would like to do, is replace the cells in B with NA if and only if the corresponding cell in A is NA. I have search extensively for a method to do this, and I'm sure that in the end the solution will be embarrassingly simple, however, any help is greatly appreciated! Cheers, Carson Apologies if this has been posted twice, we are currently experiencing server problems... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fourier Analysis and Curve Fitting in R
Rolf Turner wrote: On 26/01/2008, at 10:54 AM, Carson Farmer wrote: Dear List, I am attempting to perform a harmonic analysis on a time series of snow depth, in which the annual curve is essentially asymmetric (i.e. snow accumulates slowly over time, and the subsequent melt occurs relatively rapidly). I am trying to fit a curve to the data, however, the actual frequency is unknown. In general the actual frequency of the curve will indeed be close to 1/(1 year). However, because I intend to perform this analysis on many regions, this will not always be the case. This is perhaps an acceptable assumption however... Obviously there is something I am not understanding here. I would have thought that the ``actual frequency'' would be 1/(1 year) (period = 1 year) --- modulo the fact that the length of the year is constantly changing a tiny bit. (But I would've thought that this would have no practical impact in respect of any observed series.) My sampling interval is daily. What is your sampling interval, BTW? Day? Week? Month? I have been trying to follow the methods in Peter Bloomfields text Fourier Analysis of Time Series, but am having trouble implementing this in R. Yes it certainly would. Note that even though the ``actual frequency'' is (???) 1/(1 year), the representation of the mean function in terms of sinusoids will involve in theory infinitely many terms/frequencies since the mean function is clearly (!) not a sinusoid. Does anyone have any suggestions, or perhaps directions on how this might be done properly? Am I using the right methods for fitting an asymmetric curve? What I am really trying to do is fit a relatively smooth line to my data which will preferentially weight the larger values. This method needs to be able to fit through data gaps however, which is why I was originally looking to fit sinusoids. A jpg of a single year of the data is available here: http://www.geog.uvic.ca/spar/carson/snowDepth.jpg to give you an idea of the shape of my curve. Thank you again for your help, Carson I would have to know more about what you are *really* trying to do, and what the data are like, before I could make any useful suggestions. Many modelling issues could come into play, and many modelling strategies are potentially applicable. cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fourier Analysis and Curve Fitting in R
Dear List, I am attempting to perform a harmonic analysis on a time series of snow depth, in which the annual curve is essentially asymmetric (i.e. snow accumulates slowly over time, and the subsequent melt occurs relatively rapidly). I am trying to fit a curve to the data, however, the actual frequency is unknown. I have been trying to follow the methods in Peter Bloomfields text Fourier Analysis of Time Series, but am having trouble implementing this in R. Does anyone have any suggestions, or perhaps directions on how this might be done properly? Am I using the right methods for fitting an asymmetric curve? Any help is greatly appreciated, Sincerely, Carson __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Cycle Regression Analysis in R?
Hello R community, Does anyone know of a package that will perform cycle regression analysis? I have searched the R-help archives etc. but have come up with nothing so far. If I am unable to find an existing R package to do so, is there anyone familiar with fitting sine functions to data. My problem is this: I have a long time-series of daily SWE estimates (SWE = snow water equivalence, or the amount of water stored in a snowpack) which follows a sinusoidal pattern, and I need to estimate the parameters of the sine function that best fits this data. While there may be many contributing sine functions and/or linear trends, I am only interested in a single sine function that most closely fits the data (trends can be removed separately if need be). Perhaps some sort of non-linear least squares method would be best? Any help, or suggestions to get me on the right track are greatly appreciated. Carson __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.