[ https://issues.apache.org/jira/browse/FLINK-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532343#comment-14532343 ]
ASF GitHub Bot commented on FLINK-1933:
---------------------------------------

Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/629#discussion_r29837576

    --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/metrics/distances/CosineDistanceMeasure.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.ml.math.metrics.distances
    +
    +import org.apache.flink.ml.math.Vector
    +
    +/** This class implements a cosine distance metric. The class calculates the distance between
    +  * the given vectors by dividing the dot product of two vectors by the product of their lengths.
    +  * We convert the result of division to a usable distance. So, 1 - cos(angle) is actually returned.
    +  *
    +  * @see http://en.wikipedia.org/wiki/Cosine_similarity
    +  */
    +class CosineDistanceMeasure extends DistanceMeasure {
    +  override def distance(a: Vector, b: Vector): Double = {
    +    checkValidArguments(a, b)
    +
    +    val dotProd: Double = a.dot(b)
    +    val denominator: Double = a.magnitude * b.magnitude
    +    if (dotProd == 0 && denominator == 0) {
    --- End diff --

@tillrohrmann That check handles the zero-vector case.
Without the code that handles the zero-vector case, we get a `DivisionByZeroError`. I followed [the result of Wolfram Alpha](http://goo.gl/NXGLgo) and [the Mahout implementation](https://github.com/apache/mahout/blob/master/mr/src/main/java/org/apache/mahout/common/distance/CosineDistanceMeasure.java#L66) to resolve this case.


> Add distance measure interface and basic implementation to machine learning library
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-1933
>                 URL: https://issues.apache.org/jira/browse/FLINK-1933
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Chiwan Park
>            Assignee: Chiwan Park
>              Labels: ML
>
> Add a distance measure interface to calculate the distance between two vectors,
> along with some implementations of the interface. In FLINK-1745, [~till.rohrmann]
> suggests the following interface:
> {code}
> trait DistanceMeasure {
>   def distance(a: Vector, b: Vector): Double
> }
> {code}
> I think the following list of implementations is sufficient as a first offering
> to ML library users:
> * Manhattan distance [1]
> * Cosine distance [2]
> * Euclidean distance (and Squared) [3]
> * Tanimoto distance [4]
> * Minkowski distance [5]
> * Chebyshev distance [6]
> [1]: http://en.wikipedia.org/wiki/Taxicab_geometry
> [2]: http://en.wikipedia.org/wiki/Cosine_similarity
> [3]: http://en.wikipedia.org/wiki/Euclidean_distance
> [4]: http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29
> [5]: http://en.wikipedia.org/wiki/Minkowski_distance
> [6]: http://en.wikipedia.org/wiki/Chebyshev_distance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
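
The zero-vector guard discussed in the review comment can be sketched in isolation as follows. This is a minimal sketch, not the code from the pull request: plain `Array[Double]` stands in for `org.apache.flink.ml.math.Vector`, the object name `CosineDistanceSketch` and its `dot` and `magnitude` helpers are hypothetical, and returning `0.0` from the guarded branch is an assumption modeled on how the linked Mahout implementation appears to treat this corner case (the body of that branch is not shown in the quoted diff).

```scala
// Sketch only: Array[Double] replaces Flink's Vector, and all names here are hypothetical.
object CosineDistanceSketch {

  /** Dot product of two equally sized vectors. */
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** Euclidean length of a vector. */
  def magnitude(a: Array[Double]): Double = math.sqrt(dot(a, a))

  /** Cosine distance, 1 - cos(angle), with an explicit guard for zero vectors. */
  def distance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same size")

    val dotProd: Double = dot(a, b)
    val denominator: Double = magnitude(a) * magnitude(b)

    if (dotProd == 0.0 && denominator == 0.0) {
      // At least one input is a zero vector, so the division below would be 0.0 / 0.0,
      // which evaluates to NaN for Doubles on the JVM.
      // Assumption: report a distance of 0.0 in this case.
      0.0
    } else {
      1.0 - dotProd / denominator
    }
  }
}
```

With this guard, `CosineDistanceSketch.distance(Array(0.0, 0.0), Array(0.0, 0.0))` takes the guarded branch instead of dividing zero by zero, while two non-zero orthogonal vectors still go through the ordinary `1 - cos(angle)` path.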