I looked at this some more last night and now agree that if you are
just computing SSE, scoring the data and running that one sum in a
second pass should in general be more accurate.  The problem is, as
Luc pointed out, the need to store all of the data and I don't see any
way around that.  If there are better stateless formulas, then we
should look at them.  I am still -0 on adding a separate stateful
impl, but could be convinced if others feel differently and someone is
willing to volunteer to research, code, doc and write tests for it.
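
To make the trade-off concrete, here is a minimal sketch in Java. It is
not the actual implementation under discussion: the class name SseSketch
and all of its methods are hypothetical, and the updating formulas are
the standard running-sum updates for centered sums of squares and cross
products. The accumulator keeps only a handful of doubles, while the
two-pass method needs the complete x and y arrays in memory.

```java
/** Hypothetical sketch: streaming SSE versus a second pass over stored data. */
final class SseSketch {
    private long n;                       // number of points seen so far
    private double xbar, ybar;            // running means of x and y
    private double sumXX, sumYY, sumXY;   // running centered sums

    /** Folds one point into the running sums; the point itself is not stored. */
    void addData(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;
            double dy = y - ybar;
            double w = (double) n / (n + 1.0);
            sumXX += dx * dx * w;
            sumYY += dy * dy * w;
            sumXY += dx * dy * w;
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    double slope() {
        return sumXY / sumXX;
    }

    double intercept() {
        return ybar - slope() * xbar;
    }

    /** SSE from the running sums only (algebraic form, subject to cancellation). */
    double streamingSse() {
        return Math.max(0d, sumYY - sumXY * sumXY / sumXX);
    }

    /** SSE from a second pass over the stored data: sum of squared residuals. */
    static double twoPassSse(double[] x, double[] y, double intercept, double slope) {
        double sse = 0d;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - (intercept + slope * x[i]);
            sse += residual * residual;
        }
        return sse;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2.1, 3.9, 6.2, 7.8, 10.1};
        SseSketch acc = new SseSketch();
        for (int i = 0; i < x.length; i++) {
            acc.addData(x[i], y[i]);
        }
        System.out.println("streaming SSE: " + acc.streamingSse());
        System.out.println("two-pass SSE:  "
                + twoPassSse(x, y, acc.intercept(), acc.slope()));
    }
}
```

On well-conditioned data the two results agree closely. The difference
should only matter when the fit is nearly exact, because the subtraction
in streamingSse() then cancels most of the significant digits.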

Phil
On 7/11/06, Phil Steitz <[EMAIL PROTECTED]> wrote:
On 7/11/06, Luc Maisonobe <[EMAIL PROTECTED]> wrote:
> J.Pietschmann wrote :
>
> > Well, the majority of the num math text books on my shelf actually
> > recommend computing the sum of the squared errors instead of the
> > algebraic equivalent form given in the more analytically oriented
> > text books (and used above). This is, of course, more complicated
> > and still prone to adverse numerical effects unless the sequence
> > is also sorted.
>

Can you provide references?

> You are right, but this would also imply storing all values and either
> recomputing everything as points are added/removed or setting up a "dirty"
> flag to perform lazy evaluation only when needed. This has an impact on
> both memory and CPU usage.
>
> The current implementation does not retain each point; it simply
> handles them on the fly by updating a few running sums. It can handle an
> extremely large number of points with a very small memory footprint.
>
> Do you think we should provide two implementations, one being memory/CPU
> friendly and the other one being accuracy-friendly ?
>

No, unless there are compelling arguments indicating that direct
computation is in fact more accurate for many instances (contradicting
references in the javadoc), in which case we would as you point out
need to maintain two versions, since we can't abandon the scalability
and performance of the current (essentially stateless) impl.   See the
references to the Chan / Golub article on accumulating sums of squares
in the addData javadoc and the applied regression text (Weisberg)
cited there.  See also, e.g., Neter and Wasserman,  Applied Linear
Statistical Models [isbn 0256117365].

Phil

