[ 
https://issues.apache.org/jira/browse/MAHOUT-653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016766#comment-13016766
 ] 

Sean Owen commented on MAHOUT-653:
----------------------------------

I think the approximations are useful to have in the code, sure. 

These DistanceMeasures are used by passing their class name to a job -- that 
is, they have to have a no-arg constructor. So I think MinkowskiDistanceMeasure 
needs to retain a default no-arg constructor that selects Math.pow() rather 
than the approximation. (I might call the param "fastApproximation" but that's 
a matter of taste.)
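
To make the constructor point concrete, here is a minimal sketch, not the 
patch itself. MinkowskiSketch, fromClassName() and the default exponent of 3.0 
are hypothetical names and values chosen for illustration, and approxPow() is 
the bit-trick pow() described in the Ankerl post linked from the issue, which 
may differ from what the patch contains.

```java
/** Hypothetical stand-in for a Minkowski distance measure; not Mahout's class. */
class MinkowskiSketch {
    private final double exponent;
    private final boolean fastApproximation;

    /** No-arg constructor: needed for by-name creation; selects exact Math.pow(). */
    public MinkowskiSketch() {
        this(3.0, false); // 3.0 is an assumed default exponent
    }

    public MinkowskiSketch(double exponent, boolean fastApproximation) {
        this.exponent = exponent;
        this.fastApproximation = fastApproximation;
    }

    /** Bit-trick approximation of a^b for a > 0; no overflow or underflow handling. */
    static double approxPow(double a, double b) {
        int x = (int) (Double.doubleToLongBits(a) >> 32);
        int y = (int) (b * (x - 1072632447) + 1072632447);
        return Double.longBitsToDouble(((long) y) << 32);
    }

    private double pow(double a, double b) {
        return fastApproximation ? approxPow(a, b) : Math.pow(a, b);
    }

    /** Minkowski distance between two equal-length vectors. */
    public double distance(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            double d = Math.abs(u[i] - v[i]);
            if (d > 0.0) { // the bit trick is only meaningful for positive inputs
                sum += pow(d, exponent);
            }
        }
        return sum > 0.0 ? pow(sum, 1.0 / exponent) : 0.0;
    }

    /** Mimics a job that only knows the class name: this requires the no-arg constructor. */
    static MinkowskiSketch fromClassName(String className) {
        try {
            return (MinkowskiSketch) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With this shape, creating the measure by class name silently gets the exact 
version, and the approximation is strictly opt-in through the two-argument 
constructor.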

My first question is whether this may get used in any situation where the 
accuracy is a problem. I don't see any at the moment. Distance is used as an 
indicator of closeness and doesn't necessarily need exact answers.

In theory, a less exact answer means very slightly less accurate clustering 
results. My worry is whether the approximations become badly inaccurate at 
very large or very small values, which you might encounter in clustering. I 
don't know; it's an open question.
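
One way to probe the accuracy question would be to measure the relative error 
against Math.pow() over a range of magnitudes. The following is a rough 
sketch under assumptions: PowErrorProbe and relativeError() are hypothetical 
names, approxPow() is the bit-trick version from the Ankerl post linked in the 
issue rather than necessarily the patch's code, and the sample values are 
arbitrary.

```java
/** Hypothetical probe comparing a bit-trick pow() against Math.pow(). */
class PowErrorProbe {

    /** Bit-trick approximation of a^b for a > 0; no overflow or underflow handling. */
    static double approxPow(double a, double b) {
        int x = (int) (Double.doubleToLongBits(a) >> 32);
        int y = (int) (b * (x - 1072632447) + 1072632447);
        return Double.longBitsToDouble(((long) y) << 32);
    }

    /** Relative error of the approximation; assumes Math.pow(a, b) is finite and nonzero. */
    static double relativeError(double a, double b) {
        double exact = Math.pow(a, b);
        return Math.abs(approxPow(a, b) - exact) / Math.abs(exact);
    }

    public static void main(String[] args) {
        double exponent = 3.0; // arbitrary sample exponent
        double[] samples = {1e-6, 1e-3, 0.5, 2.0, 1e3, 1e6};
        for (double a : samples) {
            System.out.printf("a=%g exact=%g approx=%g relErr=%g%n",
                a, Math.pow(a, exponent), approxPow(a, exponent),
                relativeError(a, exponent));
        }
    }
}
```

Since the trick does integer arithmetic on the high 32 bits of the double with 
no range checks, it would also be worth trying inputs where the true result 
approaches the overflow or underflow limits.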

The other question is how that trades off against the speedup: how much time 
is actually spent in these functions versus I/O, since I expect they mostly 
come up in the context of Hadoop jobs.

Open questions, anyone know more?

> Approximations to standard functions
> ------------------------------------
>
>                 Key: MAHOUT-653
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-653
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Lance Norskog
>         Attachments: MAHOUT-653.patch, MAHOUT-653.patch
>
>
> These give approximate versions of pow(value, exponent), exp(value), and 
> natural log(value).
> log() and exp() stolen from:
> [http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/]
> pow() stolen from:
> [http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/]
