Re: [R] MGCV: Use of irls.reg option
Hi Simon, Thanks for taking the time to reply. Please let me explain a few more details. The problem that I am working on is essentially the same as the Bristol Channel Sole Egg distribution example in your book and in the soap paper but instead it is Herring Larvae in the English Channel - same same, but different. The model structure is: mdl - gam(nlarv ~ s(lon,lat) + s(day of year) + factor(year),data=dat, family=poisson(log=link)) ie a multiplicative structure with a poisson observation model. Now, the problem is that at the edges of the distribution in space (lon, lat) (and to a lesser extent time) the observations are rich in zeros, as we move away from the main spawning grounds. The model appears to be converging ok, but the residuals look horrible. In particular, there are some extremely large residuals around the edges (pearson residuals of 1000 or so), where we get a few larvae in a region where they are otherwise unlikely. When I look at the TPRS (on the linear predictor scale) it appears to be heading towards minus infinity - essentially we end up in a situation where we observe a single larvae, but the expected mean number is 1e-10, which creates these very large residuals. This was where I happened across the irls.reg argument - the description in the help file (i.e. lack of identifiability) sounds very much like the problem that I am having, which is what inspired the question. I've also tried using the soap smoother in place of the TPRS - the problem is not as severe, and I can limit it by making the boundaries of the soap film extremely tight around the non-zero data but the same underlying problem is still lurking in the corners... Do you have any suggestions as to how I can get around this edge-effects problem? Mark Hi Mark, irls.reg is kind of `legacy code'. Does model fitting actually fail for your example, or is it just that the estimated spatial smooth looks unpleasant? best, Simon On 06/21/2012 01:28 AM, r-help.20.trevva at spamgourmet.com wrote: Hi, In the help files in the mgcv package for the gam.control() function, there is an option irls.reg. The help files describe this option as: For most models this should be 0. The iteratively re-weighted least squares method by which GAMs are fitted can fail to converge in some circumstances. For example, data with many zeroes can cause problems in a model with a log link, because a mean of zero corresponds to an infinite range of linear predictor values. Such convergence problems are caused by a fundamental lack of identifiability, but do not show up as lack of identifiability in the penalized linear model problems that have to be solved at each stage of iteration. In such circumstances it is possible to apply a ridge regression penalty to the model to impose identifiability, and irls.reg is the size of the penalty. I am trying to fit a poisson GLM model with a log-link function and am having problems similar to those described - in particular, the model has a spatial s(lon,lat) term and there are lot of zeros around the edges of my domain which are making the TPRS do strange thing. It sounds like irls.reg might be the answer to my problems. The question I have is how to use it? What is an appropriate value? I can't seem to find any more information than that provided, and I don't know if I really understand what it is doing. Are there any examples or references on this that I have overlooked during my googling that could help? Best wishes, Mark Payne DTU Aqua, Copenhagen, Denmark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] scatter.smooth() line colour
Hi, I really like the scatter.smooth() function - its extremeley useful. However, as far as I understand it, there is no way to change the properties of the smoothing line e.g. col, lty, lwd. The scatter.smooth function accepts a ... argument, but these are not passed to the plotting function that actually plots the smoother - only to the function that plots the points. Could I please therefore request that an argument be added to this function to give easier control over line properties? e.g. line.par=list() Best wishes, Mark Payne __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] MGCV: Use of irls.reg option
Hi, In the help files in the mgcv package for the gam.control() function, there is an option irls.reg. The help files describe this option as: For most models this should be 0. The iteratively re-weighted least squares method by which GAMs are fitted can fail to converge in some circumstances. For example, data with many zeroes can cause problems in a model with a log link, because a mean of zero corresponds to an infinite range of linear predictor values. Such convergence problems are caused by a fundamental lack of identifiability, but do not show up as lack of identifiability in the penalized linear model problems that have to be solved at each stage of iteration. In such circumstances it is possible to apply a ridge regression penalty to the model to impose identifiability, and irls.reg is the size of the penalty. I am trying to fit a poisson GLM model with a log-link function and am having problems similar to those described - in particular, the model has a spatial s(lon,lat) term and there are lot of zeros around the edges of my domain which are making the TPRS do strange thing. It sounds like irls.reg might be the answer to my problems. The question I have is how to use it? What is an appropriate value? I can't seem to find any more information than that provided, and I don't know if I really understand what it is doing. Are there any examples or references on this that I have overlooked during my googling that could help? Best wishes, Mark Payne DTU Aqua, Copenhagen, Denmark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Manually modifying an hclust dendrogram to remove singletons
Hi all, Thanks for the replies - they have helped shaped my thinking and are starting to push me in a better direction. Maybe I should explain a little more about what I'm trying to achieve. I am analysing satellite data across the global ocean, and am interested in trying to classify areas of the ocean according to the similarity between the pixels. Singletons in this case therefore represent individual pixels that are different to the rest in terms of the similarity metric, but aren't really all that interesting in terms of the broad picture - I consider them outliers or noise. However, they are annoying when it comes to splitting up the dendrogram, because I'm mainly interested in the reclassification of large areas of ocean at each step, rather than changes in the similarity. The dynamic tree-cut approach looks like a promising and sensible solution to the problem - I'll see if I can get something out of it. However, this discussion has started me wondering how I can use the spatial proximity of the pixels in the analysis - does anyone have any insights? Can the WGCNA approach be used in such a context? Best wishes, Mark Payne __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Manually modifying an hclust dendrogram to remove singletons
Dear R-Help, I have a clustering problem with hclust that I hope someone can help me with. Consider the classic hclust example: hc - hclust(dist(USArrests), ave) plot(hc) I would like to cut the tree up in such a way so as to avoid small clusters, so that we get a minimum number of items in each cluster, and therefore avoid singletons. e.g. in this example, you can see that Hawaii is split off onto its own at quite a high level. I would like to avoid having a single item clustered on its own like this. How can I achieve this? I have tried manually modifying the tree using dendrapply but have not been able to produce a valid solution thus far.. Suggestions are welcome. Best wishes, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Package install error: _intel_fast_memcpy
Hi, I am trying to install the ncdf package on my Ubuntu 8.04 (Karmic) box in the basement. I have installed the most recent distribution of R, 2.12.0 without too many problems. However, when it comes to install the ncdf netcdf package, I unfortunately get the following error: Error in dyn.load(file, DLLpath = DLLpath, ...) : unable to load shared object '/home/mpa/R/i486-pc-linux-gnu-library/2.12/ncdf/libs/ncdf.so': /home/mpa/R/i486-pc-linux-gnu-library/2.12/ncdf/libs/ncdf.so: undefined symbol: _intel_fast_memcpy I also get a similar error when trying to install the ncdf4 package. I have tried to find the meaning of the error, but I can't find anything - I'm assuming that its a dependency problem. Can anyone help me out here please? Best wishes, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Lattice and high-resolution tiffs
Hi, I am trying to produce high-resolution (600dpi+) TIFF figures for use in a publication. Everything seems to work well when using the normal R-graphics and the relative text size obtained is independent of the output resolution. However, when I try and make lattice plots, the relative size of the text labels changes when I adjust the resolution - they become extremely small at high resolution. Some sample code is attached below - as you will see, the text in the low resolution plot is as expected, whilst that in the high-resolution is there, but very very small. I'm running R 2.9.1 on Windows - R.Version() info is contained at the very bottom. Any ideas? Cheers, Mark Sample code --- Depth - equal.count(quakes$depth, number=8, overlap=.1) p- xyplot(lat ~ long | Depth, data = quakes) tiff(Low res plot.tiff,width=155/25.4,height=155/25.4,units=in,res=72,pointsize=12,compression=none) print(p) dev.off() tiff(High res plot.tiff,width=155/25.4,height=155/25.4,units=in,res=600,pointsize=12,compression=none) print(p) dev.off() R.version() info R.Version() $platform [1] i386-pc-mingw32 $arch [1] i386 $os [1] mingw32 $system [1] i386, mingw32 $status [1] $major [1] 2 $minor [1] 9.1 $year [1] 2009 $month [1] 06 $day [1] 26 $`svn rev` [1] 48839 $language [1] R $version.string [1] R version 2.9.1 (2009-06-26) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Titles on lattice colorkey
Dear R-ers, I'm not sure if this is a missing feature, a support request, or stupidity on my part, but nevertheless, its a question. Is it possible to add titles to colorkey legends? As far as I can tell, there is a command to do it for normal key legends, but not for colorkeys. eg it works for a normal key, created through auto.key xyplot(decrease ~ treatment, OrchardSprays, groups = rowpos, type = a, auto.key = list(space = right, points = FALSE, lines = TRUE,title=Key title)) but there is no comparable command for a colorkey x - seq(pi/4, 5 * pi, length = 100) y - seq(pi/4, 5 * pi, length = 100) r - as.vector(sqrt(outer(x^2, y^2, +))) grid - expand.grid(x=x, y=y) grid$z - cos(r^2) * exp(-r/(pi^3)) levelplot(z~x*y, grid, cuts = 50, scales=list(log=e), xlab=, ylab=, main=Weird Function, sub=with log scales, region = TRUE, colorkey = list(space=right,title=Doesn't work)) Cheers, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Custom colourkey spacing in levelplot, contourplot
Dear R gurus, I have got stuck on how to customise the colorkey generated by levelplot and contourplot, in the lattice package. This best illustrated by an example: library(lattice) levelplot(volcano^20/1e45,at=c(0,0.001,0.01,0.1,1,10)) The reason for raising the volcano dataset to the 20th power is to create a dataset with a large vertical exaggeration - this is similar in nature to the data set that I am dealing with. Now, if you look at the colorkey to the right of the plot, you'll see that its basically dominated by the interval 1:10, and that the 0.001:0.01 color is virtually invisible, even though it makes up quite a decent fraction of the actual plot (its the light pink/lavendar colour between the white and the magenta) . So the question is, how can I customise the colourkey so that it consists of five, equally spaced colour intervals, with the appropriate annotation - in this case I am essentially try to have a logarithmic distribution of the ticks, but you can imagine that there will be cases when one wants something completely custom. Cheers, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.