On 9/26/10 2:30 PM, Luc Maisonobe wrote:
Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit :
2010/9/26 Gilles Sadowski<gil...@harfang.homelinux.org>:
[...]

  1) do nothing to check the array is the same between calls and blindly
     assumes it IS the same. Users would really need to call clearCache
     when they provide a new array
     pros: very simple
     cons: error-prone for the user as it relies only on reading and
     understanding a documentation that would change with new version

  2) check only array reference equality (with ==) to check the array
     is the same as the previous call or not. Users would need to call
     clearCache only if they use the same array but have changed its
     content.
     pros: trade-off between efficiency and reliability,
           handles most cases properly
     cons: may be wrong in corner cases

  3) check array content using an hash code. Users would generally don't
     need to call clearCache at all so it would not be provided
     pros: works in all cases
     cons: add linear cost for array checking

  4) remove the double[] values parameter from the API and use a separate
     addValues method, hence the class will reset its cache only when
     addValues is called
     pros: works in all cases
     cons: Percentile could not implement UnivariateStatistic anymore

My preference is choice 2.

What do you think ?

IIUC, the interface method ("evaluate") was designed around the assumption
that successive calls are independent: the array argument is not
encapsulated for further processing (e.g. caching).
IMO, the equality check and the "clearCache" method look like workarounds,
not a clean solution.
As an example, what if a user calls "evaluate" several times alternating on
two different arrays:

---CUT---
    double[] a = new double[] {1, 2, 3};
    double[] b = new double[] {1, 2, 3, 4, 5};
    Percentile p = new Percentile();

    dourble r;
    r = p.evaluate(a, 50);
    r = p.evaluate(b, 50);
    r = p.evaluate(a, 50);
    r = p.evaluate(b, 50);

    // etc.
---CUT---

Doesn't this kind of use-case nullify the expected optimization?

If the array is going to be reused, the user call should reflect that fact;
e.g. by passing the array in a constructor:

---CUT---
    double[] a = new double[] {1, 2, 3};
    double[] b = new double[] {1, 2, 3, 4, 5};
    Percentile pA = new Percentile(a);
    Percentile pB = new Percentile(b);

    double r;
    r = pA.evaluate(50);
    r = pB.evaluate(50);
    r = pA.evaluate(50);
    r = pB.evaluate(50);
---CUT---

That way, later calls can benefit from whatever preprocessing was done in
previous calls.
The instance will always control all the information needed (e.g. after a
call to an "addValues" method) for the processing without the need to rely
on the user for calling "clearCache" whenever necessary.


Gilles

+1
I think that is a really good idea and I agree on the points made.

This proposal is point 4. It breaks the UnivariateStatistic API and it
breaks what the user found interesting in this API, i.e. have a general
statistics framework where one statistic can be replaced by another one.

If you read again one of my earlier messages from today, we will combine
this method (i.e. evaluate without values) and the UnivariateStatistics API.

Perhaps we could add these new methods (i.e. addValues and evaluate
without values) to UnivariateStatistics, but this can only be done on 3.0.


We could add to AbstractUnivariateStatistics in 2.x and update the interface in 3.0, if we can work out and agree on the setData() idea in my last post. While only some statistics can benefit internally from caching, I think the setData idea is consistent with UnivariateStatistics, as this is the interface for stats that require the full set of data to be stored in memory.

Phil
Luc


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to