Re: [R] MGCV: Use of irls.reg option

2012-06-22 Thread r-help . 20 . trevva
Hi Simon,

Thanks for taking the time to reply. Please let me explain a few more details.

The problem that I am working on is essentially the same as the
Bristol Channel Sole Egg distribution example in your book and in the
soap paper but instead it is Herring Larvae in the English Channel -
same same, but different. The model structure is:

mdl - gam(nlarv ~ s(lon,lat)  + s(day of year) +
factor(year),data=dat, family=poisson(log=link))

ie a multiplicative structure with a poisson observation model. Now,
the problem is that at the edges of the distribution in space (lon,
lat) (and to a lesser extent time) the observations are rich in zeros,
as we move away from the main spawning grounds. The model appears to
be converging ok, but the residuals look horrible. In particular,
there are some extremely large residuals around the edges (pearson
residuals of 1000 or so), where we get a few larvae in a region where
they are otherwise unlikely. When I look at the TPRS (on the linear
predictor scale) it appears to be heading towards minus infinity -
essentially we end up in a situation where we observe a single larvae,
but the expected mean number is 1e-10, which creates these very large
residuals. This was where I happened across the irls.reg argument -
the description in the help file (i.e. lack of identifiability) sounds
very much like the problem that I am having, which is what inspired
the question.

I've also tried using the soap smoother in place of the TPRS - the
problem is not as severe, and I can limit it by making the boundaries
of the soap film extremely tight around the non-zero data but the same
underlying problem is still lurking in the corners...

Do you have any suggestions as to how I can get around this
edge-effects problem?

Mark


Hi Mark,

irls.reg is kind of `legacy code'. Does model fitting actually fail for
your example, or is it just that the
estimated spatial smooth looks unpleasant?

best,
Simon


On 06/21/2012 01:28 AM, r-help.20.trevva at spamgourmet.com wrote:
 Hi,

 In the help files in the  mgcv package for the gam.control() function,
 there is an option irls.reg. The help files describe this option as:

 For most models this should be 0. The iteratively re-weighted least squares
 method by which GAMs are fitted can fail to converge in some circumstances.
 For example, data with many zeroes can cause problems in a model with a log
 link, because a mean of zero corresponds to an infinite range of
 linear predictor
 values. Such convergence problems are caused by a fundamental lack of
 identifiability, but do not show up as lack of identifiability in the
 penalized linear
 model problems that have to be solved at each stage of iteration. In such
 circumstances it is possible to apply a ridge regression penalty to the model 
 to
 impose identifiability, and irls.reg is the size of the penalty.

 I am trying to fit a poisson GLM model with a log-link function and am
 having problems similar to those described - in particular, the model
 has a spatial s(lon,lat) term and there are lot of zeros around the
 edges of my domain which are making the TPRS do strange thing. It
 sounds like irls.reg might be the answer to my problems. The question
 I have is how to use it? What is an appropriate value? I can't seem to
 find any more information than that provided, and I don't know if I
 really understand what it is doing. Are there any examples or
 references on this that I have overlooked during my googling that
 could help?

 Best wishes,

 Mark Payne
 DTU Aqua,
 Copenhagen, Denmark


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scatter.smooth() line colour

2012-06-22 Thread r-help . 20 . trevva
Hi,

I really like the scatter.smooth() function - its extremeley useful.
However, as far as I understand it, there is no way to change the
properties of the smoothing line e.g. col, lty, lwd. The
scatter.smooth function accepts a ... argument, but these are not
passed to the plotting function that actually plots the smoother -
only to the function that plots the points. Could I please therefore
request that an argument be added to this function to give easier
control over line properties? e.g. line.par=list()

Best wishes,

Mark Payne

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MGCV: Use of irls.reg option

2012-06-20 Thread r-help . 20 . trevva
Hi,

In the help files in the  mgcv package for the gam.control() function,
there is an option irls.reg. The help files describe this option as:

For most models this should be 0. The iteratively re-weighted least squares
method by which GAMs are fitted can fail to converge in some circumstances.
For example, data with many zeroes can cause problems in a model with a log
link, because a mean of zero corresponds to an infinite range of
linear predictor
values. Such convergence problems are caused by a fundamental lack of
identifiability, but do not show up as lack of identifiability in the
penalized linear
model problems that have to be solved at each stage of iteration. In such
circumstances it is possible to apply a ridge regression penalty to the model to
impose identifiability, and irls.reg is the size of the penalty.

I am trying to fit a poisson GLM model with a log-link function and am
having problems similar to those described - in particular, the model
has a spatial s(lon,lat) term and there are lot of zeros around the
edges of my domain which are making the TPRS do strange thing. It
sounds like irls.reg might be the answer to my problems. The question
I have is how to use it? What is an appropriate value? I can't seem to
find any more information than that provided, and I don't know if I
really understand what it is doing. Are there any examples or
references on this that I have overlooked during my googling that
could help?

Best wishes,

Mark Payne
DTU Aqua,
Copenhagen, Denmark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Manually modifying an hclust dendrogram to remove singletons

2012-05-25 Thread r-help . 20 . trevva
Hi all,

Thanks for the replies - they have helped shaped my thinking and are
starting to push me in a better direction. Maybe I should explain a
little more about what I'm trying to achieve.

I am analysing satellite data across the global ocean, and am
interested in trying to classify areas of the ocean according to the
similarity between the pixels. Singletons in this case therefore
represent individual pixels that are different to the rest in terms of
the similarity metric, but aren't really all that interesting in terms
of the broad picture - I consider them outliers or noise. However,
they are annoying when it comes to splitting up the dendrogram,
because I'm mainly interested in the reclassification of large areas
of ocean at each step, rather than changes in the similarity.

The dynamic tree-cut approach looks like a promising and sensible
solution to the problem - I'll see if I can get something out of it.

However,  this discussion has started me wondering how I can use the
spatial proximity of the pixels in the analysis - does anyone have any
insights? Can the WGCNA approach be used in such a context?

Best wishes,

Mark Payne

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Manually modifying an hclust dendrogram to remove singletons

2012-05-24 Thread r-help . 20 . trevva
Dear R-Help,

I have a clustering problem with hclust that I hope someone can help
me with. Consider the classic hclust example:

 hc - hclust(dist(USArrests), ave)
 plot(hc)

I would like to cut the tree up in such a way so as to avoid small
clusters, so that we get a minimum number of items in each cluster,
and therefore avoid singletons. e.g. in this example, you can see that
Hawaii is split off onto its own at quite a high level. I would like
to avoid having a single item clustered on its own like this. How can
I achieve this?

I have tried manually modifying the tree using dendrapply but have not
been able to produce a valid solution thus far..

Suggestions are welcome.

Best wishes,

Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Package install error: _intel_fast_memcpy

2010-12-07 Thread r-help . 20 . trevva
Hi,

I am trying to install the ncdf package on my Ubuntu 8.04 (Karmic)
box in the basement. I have installed the most recent distribution of
R, 2.12.0 without too many problems. However, when it comes to install
the ncdf netcdf package, I unfortunately get the following error:

Error in dyn.load(file, DLLpath = DLLpath, ...) :
 unable to load shared object
'/home/mpa/R/i486-pc-linux-gnu-library/2.12/ncdf/libs/ncdf.so':
 /home/mpa/R/i486-pc-linux-gnu-library/2.12/ncdf/libs/ncdf.so:
undefined symbol: _intel_fast_memcpy

I also get a similar error when trying to install the ncdf4 package.

I have tried to find the meaning of the error, but I can't find
anything - I'm assuming that its a dependency problem. Can anyone help
me out here please?

Best wishes,

Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Lattice and high-resolution tiffs

2009-06-29 Thread r-help . 20 . trevva
Hi,

I am trying to produce high-resolution (600dpi+) TIFF figures for use
in a publication. Everything seems to work well when using the
normal R-graphics and the relative text size obtained is independent
of the output resolution. However, when I try and make lattice plots,
the relative size of the text labels changes when I adjust the
resolution - they become extremely small at high resolution. Some
sample code is attached below - as you will see, the text in the low
resolution plot is as expected, whilst that in the high-resolution is
there, but very very small.
I'm running R 2.9.1 on Windows - R.Version() info is contained at the
very bottom.

Any ideas?
Cheers,

Mark


Sample code
---

Depth - equal.count(quakes$depth, number=8, overlap=.1)
p- xyplot(lat ~ long | Depth, data = quakes)

tiff(Low res 
plot.tiff,width=155/25.4,height=155/25.4,units=in,res=72,pointsize=12,compression=none)
print(p)
dev.off()

tiff(High res 
plot.tiff,width=155/25.4,height=155/25.4,units=in,res=600,pointsize=12,compression=none)
print(p)
dev.off()


R.version() info


 R.Version()

$platform
[1] i386-pc-mingw32

$arch
[1] i386

$os
[1] mingw32

$system
[1] i386, mingw32

$status
[1] 

$major
[1] 2

$minor
[1] 9.1

$year
[1] 2009

$month
[1] 06

$day
[1] 26

$`svn rev`
[1] 48839

$language
[1] R

$version.string
[1] R version 2.9.1 (2009-06-26)



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Titles on lattice colorkey

2009-04-03 Thread r-help . 20 . trevva

Dear R-ers,

I'm not sure if this is a missing feature, a support request, or stupidity on my part, but nevertheless, its a question. Is it possible to add titles to colorkey legends? As far as I can tell, there is a command to do it for normal key legends, but not for colorkeys. 


eg it works for a normal key, created through auto.key

xyplot(decrease ~ treatment, OrchardSprays, groups = rowpos,
  type = a,
  auto.key = list(space = right, points = FALSE, lines = TRUE,title=Key 
title))

but there is no comparable command for a colorkey

x - seq(pi/4, 5 * pi, length = 100)
y - seq(pi/4, 5 * pi, length = 100)
r - as.vector(sqrt(outer(x^2, y^2, +)))
grid - expand.grid(x=x, y=y)
grid$z - cos(r^2) * exp(-r/(pi^3))
levelplot(z~x*y, grid, cuts = 50, scales=list(log=e), xlab=,
 ylab=, main=Weird Function, sub=with log scales,
 region = TRUE,
 colorkey = list(space=right,title=Doesn't work))


Cheers,

Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Custom colourkey spacing in levelplot, contourplot

2007-10-23 Thread r-help . 20 . trevva
Dear R gurus,

I have got stuck on how to customise the colorkey generated by
levelplot and contourplot, in the lattice package. This best
illustrated by an example:

library(lattice)
levelplot(volcano^20/1e45,at=c(0,0.001,0.01,0.1,1,10))

The reason for raising the volcano dataset to the 20th power is to
create a dataset with a large vertical exaggeration - this is similar
in nature to the data set that I am dealing with. Now, if you look at
the colorkey to the right of the plot, you'll see that its basically
dominated by the interval 1:10, and that the 0.001:0.01 color is
virtually invisible, even though it makes up quite a decent fraction
of the actual plot (its the light pink/lavendar colour between the
white and the magenta) .

So the question is, how can I customise the colourkey so that it
consists of five, equally spaced colour intervals, with the
appropriate annotation - in this case I am essentially try to have a
logarithmic distribution of the ticks, but you can imagine that there
will be cases when one wants something completely custom.

Cheers,

Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.