[ 
https://issues.apache.org/jira/browse/MADLIB-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168125#comment-16168125
 ] 

Frank McQuillan commented on MADLIB-1059:
-----------------------------------------

Adding a comment from [~njayaram] that he put in
https://issues.apache.org/jira/browse/MADLIB-1129

"Himanshu, 
Since you are working on including more distance functions for kNN, I thought
extending that to the output layer might also be useful. Right now, it looks
like MADlib does a simple average of the k-nearest neighbors to come up with
the final value for both classification and regression. Doing a weighted
average instead might be a desirable functionality. The weighting for the
average can be based on the distance of the k-nearest neighbors.
We can probably provide an optional parameter to let users choose how the final
classification label or regression score has to be computed (avg or weighted
avg).
Frank McQuillan any thoughts?"

I think this is a good idea to do at the same time as adding the distance 
functions.
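
To make the suggestion concrete, here is a minimal Python sketch of how the
two output modes could differ. It is only an illustration and not MADlib
code: the function name knn_predict, its parameters, and the choice of
inverse-distance weights (1/d) are all assumptions, and other weighting
schemes would be equally valid.

```python
from collections import defaultdict


def knn_predict(neighbors, weighted=False, classify=True, eps=1e-12):
    """Combine the k nearest neighbors into one prediction.

    neighbors: list of (distance, value) pairs, where value is a class
    label (classify=True) or a number (classify=False).
    Hypothetical helper for illustration; not part of MADlib.
    """
    # Assumed weighting: inverse distance. eps avoids division by zero
    # when a neighbor coincides with the query point.
    weights = [1.0 / (d + eps) if weighted else 1.0 for d, _ in neighbors]

    if classify:
        # Sum the weights per label and return the label with the most weight.
        votes = defaultdict(float)
        for w, (_, label) in zip(weights, neighbors):
            votes[label] += w
        return max(votes, key=votes.get)

    # Regression: (weighted) mean of the neighbor values.
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / total
```

With neighbors [(0.1, 'a'), (2.0, 'b'), (3.0, 'b')], the unweighted vote
picks 'b' (two votes to one), while the distance-weighted vote picks 'a'
because the single 'a' neighbor is much closer.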

> Add additional distance metrics for k-NN
> ----------------------------------------
>
>                 Key: MADLIB-1059
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1059
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: k-NN
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>              Labels: starter
>             Fix For: v2.0
>
>
> Follow on from https://issues.apache.org/jira/browse/MADLIB-927
> which supports one distance function.  This JIRA is to:
> (1) Add additional distance metrics.  The model to follow is
> http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
> fn_dist (optional)
> TEXT, default: 'squared_dist_norm2'. The name of the function to use to
> calculate the distance between data points.
> The following distance functions can be used (computation of barycenter/mean 
> in parentheses):
> dist_norm1: 1-norm/Manhattan (element-wise median [Note that MADlib does not 
> provide a median aggregate function for support and performance reasons.])
> dist_norm2: 2-norm/Euclidean (element-wise mean)
> squared_dist_norm2: squared Euclidean distance (element-wise mean)
> dist_angle: angle (element-wise mean of normalized points)
> dist_tanimoto: tanimoto (element-wise mean of normalized points [5])
> user defined function with signature
> DOUBLE PRECISION[] x, DOUBLE PRECISION[] y -> DOUBLE PRECISION
> Also check if there are other distance functions under
> http://madlib.apache.org/docs/latest/group__grp__linalg.html
> that might make sense to include while you are at it, in addition to the
> ones listed above.
> (2) Add an option for weighted average in the voting.

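For reference, the distance functions named in the description can be
sketched in plain Python as below. This is an illustration under stated
assumptions, not the MADlib implementation (MADlib provides these as SQL
functions in its linear algebra module). The formulas for dist_angle
(arccosine of the cosine similarity) and dist_tanimoto (one minus the
Tanimoto coefficient) are the usual textbook definitions and are assumed
here; the k_nearest helper and the DISTANCES lookup table are hypothetical
names showing how an fn_dist parameter could select the metric by name or
accept a user-defined function.

```python
import math


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def dist_norm1(x, y):
    # 1-norm / Manhattan distance
    return sum(abs(a - b) for a, b in zip(x, y))


def squared_dist_norm2(x, y):
    # Squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dist_norm2(x, y):
    # 2-norm / Euclidean distance
    return math.sqrt(squared_dist_norm2(x, y))


def dist_angle(x, y):
    # Angle between the vectors (assumed definition); the clamp guards
    # against rounding errors pushing the cosine outside [-1, 1].
    cos = _dot(x, y) / (math.sqrt(_dot(x, x)) * math.sqrt(_dot(y, y)))
    return math.acos(max(-1.0, min(1.0, cos)))


def dist_tanimoto(x, y):
    # One minus the Tanimoto coefficient (assumed definition)
    xy = _dot(x, y)
    return 1.0 - xy / (_dot(x, x) + _dot(y, y) - xy)


# Hypothetical lookup table mapping fn_dist names to functions.
DISTANCES = {
    "dist_norm1": dist_norm1,
    "dist_norm2": dist_norm2,
    "squared_dist_norm2": squared_dist_norm2,
    "dist_angle": dist_angle,
    "dist_tanimoto": dist_tanimoto,
}


def k_nearest(points, query, k, fn_dist="squared_dist_norm2"):
    # fn_dist is either a name from DISTANCES or a user-defined callable
    # taking two float lists and returning a float.
    dist = DISTANCES[fn_dist] if isinstance(fn_dist, str) else fn_dist
    return sorted(points, key=lambda p: dist(p, query))[:k]
```

For example, k_nearest(points, query, 3, "dist_norm1") would rank by
Manhattan distance, and passing a callable instead of a name covers the
user-defined case.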


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
