[
https://issues.apache.org/jira/browse/MAHOUT-653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016766#comment-13016766
]
Sean Owen commented on MAHOUT-653:
----------------------------------
I think the approximations are useful to have in the code, sure.
These DistanceMeasures are used by passing their class name to a job -- that
is, they have to have a no-arg constructor. So I think MinkowskiDistanceMeasure
needs to retain a default no-arg constructor that selects Math.pow() rather
than the approximation. (I might call the param "fastApproximation" but that's
a matter of taste.)
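
To make the constructor point concrete, here is a minimal, self-contained sketch. It is not Mahout's actual MinkowskiDistanceMeasure: the class name MinkowskiSketch, the double[] signature, the default exponent of 3.0 and the byName() helper are all assumptions made for illustration. The approxPow() body is the bit-trick from the blog post linked in the issue, reproduced from memory, so its constants should be checked against the patch.

```java
// Hypothetical stand-in for a distance measure; not Mahout's real class.
final class MinkowskiSketch {
    private final double exponent;
    private final boolean fastApproximation;

    // No-arg constructor: required when the class is created from its name.
    // It selects the exact Math.pow() path. The exponent 3.0 is a placeholder.
    MinkowskiSketch() {
        this(3.0, false);
    }

    MinkowskiSketch(double exponent, boolean fastApproximation) {
        this.exponent = exponent;
        this.fastApproximation = fastApproximation;
    }

    boolean usesFastApproximation() {
        return fastApproximation;
    }

    double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += pow(Math.abs(a[i] - b[i]), exponent);
        }
        return pow(sum, 1.0 / exponent);
    }

    private double pow(double base, double exp) {
        if (!fastApproximation) {
            return Math.pow(base, exp);
        }
        // The bit trick does not return 0 for a zero base, so handle it here.
        return base == 0.0 ? 0.0 : approxPow(base, exp);
    }

    // Approximation that works on the high 32 bits of the IEEE 754 double.
    static double approxPow(double a, double b) {
        int x = (int) (Double.doubleToLongBits(a) >> 32);
        int y = (int) (b * (x - 1072632447) + 1072632447);
        return Double.longBitsToDouble(((long) y) << 32);
    }

    // Mimics "pass the class name to a job": only the no-arg constructor is usable.
    static MinkowskiSketch byName(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(MinkowskiSketch.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With this shape, anything created by name gets the exact path, and the approximation is opt-in for callers that construct the measure themselves.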
My first question is whether this may get used in any situation where the
accuracy is a problem. I don't see any at the moment. Distance is used as an
indicator of closeness and doesn't necessarily need exact answers.
In theory, a less exact answer means very slightly less accurate clustering
results. I worry that the approximations may be badly off at very large or
very small values, which you might encounter in clustering situations. I don't
know; that's an open question.
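
One way to probe the accuracy question is to print the relative error of the approximation against Math.pow() over a range of magnitudes. The sketch below assumes the pow() bit-trick from the blog post linked in the issue, reproduced from memory rather than taken from the patch. The class name, the sample bases and the exponent are arbitrary choices for illustration.

```java
// Hypothetical probe; approxPow() is assumed to match the linked blog post.
final class ApproxPowErrorProbe {

    // Approximation that works on the high 32 bits of the IEEE 754 double.
    static double approxPow(double a, double b) {
        int x = (int) (Double.doubleToLongBits(a) >> 32);
        int y = (int) (b * (x - 1072632447) + 1072632447);
        return Double.longBitsToDouble(((long) y) << 32);
    }

    // Relative error of the approximation, measured against Math.pow().
    static double relativeError(double base, double exponent) {
        double exact = Math.pow(base, exponent);
        return Math.abs(approxPow(base, exponent) - exact) / Math.abs(exact);
    }

    public static void main(String[] args) {
        double[] bases = {1e-12, 1e-6, 1e-3, 1.0, 1e3, 1e6, 1e12};
        double exponent = 3.0;
        for (double base : bases) {
            System.out.printf("base=%.1e exact=%.6e approx=%.6e relErr=%.4f%n",
                    base,
                    Math.pow(base, exponent),
                    approxPow(base, exponent),
                    relativeError(base, exponent));
        }
    }
}
```

Running the same loop over the value ranges and exponents that actually occur in a clustering job would show whether the error stays bounded there.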
And then the question is comparing that to the speedup -- how much time is
spent in this versus I/O, since I'm thinking these come up in the context of
Hadoop jobs.
Open questions, anyone know more?
> Approximations to standard functions
> ------------------------------------
>
> Key: MAHOUT-653
> URL: https://issues.apache.org/jira/browse/MAHOUT-653
> Project: Mahout
> Issue Type: New Feature
> Reporter: Lance Norskog
> Attachments: MAHOUT-653.patch, MAHOUT-653.patch
>
>
> These give approximate versions of pow(value, exponent), exp(value), and
> natural log(value).
> log() and exp() stolen from:
> [http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/]
> pow() stolen from:
> [http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/]