On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
On 6/29/14, 9:48 AM, venkatesha murthy wrote:
On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
Le 25/06/2014 06:05, venkatesha murthy a écrit :
On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <l...@spaceroots.org>
wrote:
Hi Venkat,

Le 23/06/2014 21:08, venkatesha murthy a écrit :
On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org>
wrote:
Hi all,

While looking further in Percentile class for MATH-1120, I have found another problem in the current implementation. NaNStrategy.FIXED
should
leave the NaNs in place, but at the end of the KthSelector.select method, a call to Arrays.sort moves the NaNs to the end of the small sub-array. What is really implemented here is therefore closer to NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
test cases because they use very short arrays, and we directly
switch to
this part of the select method.
Are NaNs considered higher than +Inf ?
If MAXIMAL represent replacing for +inf ; you need something to
indicate beyond this for NaN.
Well, we can just keep the NaN themselves and handled them
appropriately, hoping not to trash performances too much.

Agreed.

What is the test input you see an issue and what is the actual error
you are seeing. Please share the test case.
Just look at PercentileTest.testReplaceNanInRange(). The first check in the test corresponds to a Percentile configuration at 50% percentile, and NaNStrategy.FIXED. The array has an odd number of entries, so the 50% percentile is exactly one of the entries: the one at index 5 in the
final array.

The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So for the NaNStrategy.FIXED setting, it should not be modified at all in the selection algorithm and the result for 50% should be the first NaN of the array, at index 5. In fact, due to the Arrays.sort, we *do* reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
the result is 5.

Agreed. just verified by putting back the earlier insertionSort
function.
If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
result Double.POSITIVE_INFINITY instead of Double.NaN.

If we agree to leave FIXED as unchanged behaviour with your
insertionSort
code; then atleast MAXIMAL/MINIMAL should be allowed for transformation
of
NaN to +/-Inf
I'm fine with it, as long as we clearly state it in the NaNStrategy
documentation, saying somethig like:

When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by +/- infinity, hence the results will be computed as if infinities were
 there in the first place.
-1 - this is mixing concerns.  NaNStrategy exists for one specific
purpose - to turn extend the ordering on doubles a total ordering.
Strategies prescribe only how NaNs are to be treated in the
ordering.  Side effects on the input array don't make sense in this
context.  The use case for which this class was created was rank
transformations. Returning infinities in rank transformations would
blow up computations in these classes.  If a floating point
computations need to transform NaNs, the specific stats / other
classes that do this transformation should perform and document the
behavior.

Phil

OK. Agreed realized it later. Hence i have not touched NaNStrategy in my patch(MATH-1132) . Now i have added a separate transformer to transform NaNs but it uses NanStrategy. Please let me know on this as this trasnformation
itself
can be used in different places.

Honestly, I don't see the value of this.  I would be more favorable
to an addition to MathArrays supporting NaN (or infinity) filtering
/ transformation.  Several years back we made the decision to just
let NaNs propagate in computations, which basically means that if
you feed NaNs to [math] you are going to get NaNs out in results.
If we do get into the business of filtering NaNs (or infinities for
that matter), I think it is probably best to do that in MathArrays
or somesuch - i.e., don't decorate APIs with NaN or
infinity-handling handling descriptions.  That would be a huge task
and would likely end up bleeding implementation detail into the
APIs.  As I said above, NaNStrategy has to exist for rank
transformations to be well-defined.  NaN-handling in floating point
computations is defined and as long as we fully document what our
algorithms do, I think it is fair enough to "let the NaNs fall where
they may" - which basically means users need to do the filtering /
transformation themselves.

So, do I understand correctly that it's OK to add the "remove" and
"replace" utility methods (cf. MATH-1130) in "MathArrays"?
Or does this thread conclude that we don't have a use for this within
Commons Math?

Gilles


Phil

[...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to