A couple of days of detective work have gone a long way toward revealing how one
may recover the actual knots and coefficients of the splines used in ppr.
It's not finished yet, and I may be due some mid-course correction.

First, ... I was asked, ... why?

On Jun 27, 2006, at 11:50 PM, Prof Brian Ripley wrote:

> It is normal to report smooth curves via plots of smooth curves.  
> There are examples for ppr in MASS4 p.241 (referenced on the help  
> page).  This is done by the plot() method.
>
> Please do consult the references in the documentation: those who  
> very carefully documented these tools put them there because they  
> do give additional information.

This reading and plotting of course was all done before contacting r-help.

The plots of the smooth curves will be used to communicate with most
readers of our publication.  However, reviewers for J. Geophysical
Research and Atmospheric Environment will require specifics of the fit
performed.  We hope that the results will be used to greatly improve
our understanding of smog ozone formation, and for field use the pure
spline fit is required, unencumbered by the ~2000 fitted points of our
estimation data set.

So, we have found that (my colleague Bob E's words)
> Both smooth.spline and ppr, with spline or gcvspline option,
> eventually come to a call to qsbart.f. Therefore, find out
> which parameters ppr is using in the qsbart call, and
> determine which parameters to smooth.spline cause the
> same arguments to qsbart.
>
>
> From ppr, qsbart is called at the end of the chain (some wrappers
> left out):
> ppr -> smart1 -> {subfit, fulfit} -> onetrm -> oneone -> supsmu -> spline -> qsbart
> Nearly all the code in the fortran routines deals with the
> supersmoother case.
>
> smooth.spline (stats/R/smspline.R) calls subroutine spline directly.
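
Pending that determination, my working guess -- an assumption about the
correspondence, not something I have traced through qsbart yet -- is
that the analogous smooth.spline calls would look roughly like this
(the data below are made-up stand-ins):

## z: projected abscissae (alpha' x); f: ridge-function values at those points
set.seed(1)
z  <- sort(runif(200))
f  <- sin(2 * pi * z) + rnorm(200, sd = 0.2)
wt <- rep(1, 200)

## analogue of ppr(..., sm.method = "spline", df = 5):
## fix the equivalent degrees of freedom directly
ss.df  <- smooth.spline(z, f, w = wt, df = 5)

## analogue of ppr(..., sm.method = "gcvspline", gcvpen = 1):
## let GCV choose the smoothness, assuming ppr's gcvpen plays the role
## of smooth.spline's `penalty' argument
ss.gcv <- smooth.spline(z, f, w = wt, cv = FALSE, penalty = 1)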

I have found that qsbart is in this file: ..../library/stats/src/sbart.c

and that the inputs and outputs seem to be clearly documented at the
beginning of the code:

> /* A Cubic B-spline Smoothing routine.
>
>    The algorithm minimises:
>
>         (1/n) * sum ws(i)^2 * (ys(i)-sz(i))^2 + lambda * int ( s"(x) )^2 dx
>
>    lambda is a function of the spar which is assumed to be between  
> 0 and 1
>
> INPUT
> -----
>    penalt       A penalty > 1 to be used in the gcv criterion
>    dofoff       either `df.offset' for GCV or `df' (to be matched).
>    n            number of data points
>    ys(n)        vector of length n containing the observations
>    ws(n)        vector containing the weights given to each data point
>    xs(n)        vector containing the ordinates of the observations
>    ssw          `centered weighted sum of y^2'
>    nk           number of b-spline coefficients to be estimated
>                 nk <= n+2
>    knot(nk+4)   vector of knot points defining the cubic b-spline  
> basis.
>                 To obtain full cubic smoothing splines one might
>                 have (provided the xs-values are strictly increasing)
>    spar         penalised likelihood smoothing parameter
>    ispar        indicating if spar is supplied (ispar=1) or to be  
> estimated
>    lspar, uspar lower and upper values for spar search;  0.,1. are  
> good values
>    tol, eps     used in Golden Search routine
>    isetup       setup indicator [initially 0
>    icrit        indicator saying which cross validation score is to  
> be computed
>                 0: none ;  1: GCV ;  2: CV ;  3: 'df matching'
>    ld4          the leading dimension of abd (ie ld4=4)
>    ldnk         the leading dimension of p2ip (not referenced)
>
> OUTPUT
> ------
>    coef(nk)     vector of spline coefficients
>    sz(n)        vector of smoothed z-values
>    lev(n)       vector of leverages
>    crit         either ordinary or generalized CV score
>    spar         if ispar != 1
>    lspar         == lambda (a function of spar and the design)
>    iter         number of iterations needed for spar search (if  
> ispar != 1)
>    ier          error indicator
>                 ier = 0 ___  everything fine
>                 ier = 1 ___  spar too small or too big
>                         problem in cholesky decomposition

These are returned to the calling FORTRAN code (we should note the
need to respect FORTRAN <-> C array and call conventions).

So it remains for us to find the results of the last call to qsbart
made from subroutine spline (n, x, y, w, smo, edf) in the ppr.f code:
>       subroutine spline (n, x, y, w, smo, edf)
> c
> c------------------------------------------------------------------
> c
> c input:
> c    n : number of observations.
> c    x(n) : ordered abscissa values.
> c    y(n) : corresponding ordinate (response) values.
> c    w(n) : weight for each (x,y) observation.
> c output:
> c   smo(n) : smoothed ordinate (response) values.
> c   edf : equivalent degrees of freedom
> c
> c------------------------------------------------------------------

Note that coefficients are not passed back up from here.

The predict method, in subroutine pppred(np,x,smod,y,sc), appears
to use interpolation on the fitted points, certainly a fast technique.
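
For intuition only, here is a minimal sketch of that
interpolation-on-fitted-points idea, with made-up vectors standing in
for the projected abscissae and the stored smooth values (this is not
the actual smod layout):

## zfit: projections at the training points; ffit: fitted smooth there
zfit <- sort(runif(50))
ffit <- sin(2 * pi * zfit)

## evaluate the ridge function at new projections by linear
## interpolation between the stored points (rule = 2 holds end values)
znew  <- seq(min(zfit), max(zfit), length.out = 200)
fpred <- approx(x = zfit, y = ffit, xout = znew, rule = 2)$y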

So it appears necessary either to

(a) hack the code to return the last successfully used spline
parameters for each direction vector used,

or

(b) use smooth.spline or some other technique to go back and fit the
fitted values all over again -- a first sketch follows below.
(Conceivably this would also make the equivalent degrees of freedom
more understandable to the non-ppr user.)
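
As a first stab at (b) for a one-term ("single index") fit, something
like the following might work.  It leans on the model form stated on
the ppr help page, yhat = ybar + sum_j beta_j * f_j(alpha_j' x), with
the f_j standardized; I have not yet verified that the refit reproduces
ppr's internal spline exactly, so treat it as a sketch only (the data
below are made up):

## hypothetical one-term example; the variable names are mine,
## not ppr internals
set.seed(2)
X <- matrix(rnorm(200 * 3), 200, 3)
y <- drop(sin(X %*% c(0.6, 0.8, 0))) + rnorm(200, sd = 0.1)
fit <- ppr(X, y, nterms = 1, sm.method = "gcvspline")

## project the data onto the single fitted direction
z <- drop(X %*% fit$alpha)

## recover the ridge-function values from the fitted values, assuming
## yhat = mean(y) + beta * f(z) as documented for ppr
f <- drop((fitted(fit) - mean(y)) / fit$beta)

## refit at the same equivalent degrees of freedom, so that knots and
## B-spline coefficients become available
ss <- smooth.spline(z, f, df = fit$edf)
str(ss$fit)    # knot, nk, coef, min, range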

smooth.spline (laudably!) returns the values:
> fit
> list for use by predict.smooth.spline, with components
> knot:
> the knot sequence (including the repeated boundary knots).
> nk:
> number of coefficients or number of “proper” knots plus 2.
> coef:
> coefficients for the spline basis used.
> min, range:
> numbers giving the corresponding quantities of x.
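
To illustrate how those components could be reported and re-used
outside R, here is a small self-contained check (made-up data, not our
ozone set).  I believe, though I have not verified it against the C
code, that inside the data range the cubic B-spline basis evaluated on
the scaled x, times the coefficients, reproduces predict.smooth.spline:

library(splines)            # for splineDesign()

set.seed(3)
x  <- sort(runif(100, 0, 10))
y  <- sin(x) + rnorm(100, sd = 0.2)
sm <- smooth.spline(x, y, df = 6)

kn <- sm$fit$knot           # knot sequence on the scale (x - min)/range
cf <- sm$fit$coef           # B-spline coefficients

xnew <- seq(min(x), max(x), length.out = 50)
xs   <- (xnew - sm$fit$min) / sm$fit$range

## order-4 (cubic) B-spline basis at the scaled points, times coefficients
B    <- splineDesign(knots = kn, x = xs, ord = 4, outer.ok = TRUE)
yhat <- drop(B %*% cf)

max(abs(yhat - predict(sm, xnew)$y))   # should be essentially zero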

So the remaining work appears to be determining the last, most
successful call to sbart.c for each direction.

Any help and correction would be gratefully accepted.

To wrap up the previous discussion ...
>
> On Tue, 27 Jun 2006, Robert Chatfield wrote:
>
>> The projection pursuit package ppr is an excellent contribution
>> to R.  It is great for one-to-three ridge fits, often somewhat
>> intuitive, and for multi-ridge fits, where it at least describes a
>> lot of variance.
>>
>> Like many folk, I need to report the fits obtained from ppr to the  
>> greater, outside, non-R world.  It is fairly obvious how to use  
>> the terms alpha and beta to report on directionality and importance.

> In what sense do you claim it to be `undocumented'?  Methods should  
> do what the generic is documented to do, and that is the case here.
Fully documented inputs and outputs do indeed help.
Even better would be a commented guide, deep in the algorithm, to the
critical code that produces the estimates and decides that they are
the best for each direction.

> [...]
>
> -- 
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

To conclude, I am pleased that, after all the statistical chasing after
smog ozone data (just check the various R datasets), simple-to-modest
ppr studies have helped.

I have found the control over smoothness and the sheer fitting ability
of ppr very helpful.  That I cannot push the degrees of freedom down
(via df or gcvpen) beyond a certain point is a minor annoyance -- one
about which I do not have enough information at this point.
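
For concreteness, the two smoothness controls I am referring to are df
and gcvpen; a minimal illustration on made-up data (only the arguments
matter here):

set.seed(4)
X <- matrix(rnorm(300 * 2), 300, 2)
y <- drop(exp(-(X %*% c(1, 1))^2 / 2)) + rnorm(300, sd = 0.05)

## fix the equivalent degrees of freedom of each ridge term directly
fit.df  <- ppr(X, y, nterms = 1, sm.method = "spline", df = 4)

## or let GCV choose; raising gcvpen should push toward fewer
## equivalent degrees of freedom (smoother ridge functions)
fit.gcv <- ppr(X, y, nterms = 1, sm.method = "gcvspline", gcvpen = 2)

c(fixed = fit.df$edf, gcv = fit.gcv$edf)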

Sometimes I need 2 or 3 directions; often I need only one.  I even
find that a spline fit on a conjectured, so-called "single index"
variable (one direction, one term) is all that I think I can credibly
fit.  (Parsimony and physical insight are big-time review criteria in
my field.)  Knowledge of the specific spline-fit parameters is crucial
in these cases.  The plots will indicate that the listed equivalent
degrees of freedom are appropriate.

It's a really good idea to have all of this in one package.

Maybe we can work toward a confluence of techniques to address these  
needs.

Thanks again for ppr and the MASS book which led me to it.

Dr. Robert Chatfield
Earth Sciences, MS 245-5
NASA Ames Research Center
Moffett Field, CA 94035    USA

Ph: 650-604-5490  FAX 650-604-3625

http://geo.arc.nasa.gov/sgg/chatfield


