[ https://issues.apache.org/jira/browse/FLINK-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824436#comment-15824436 ]
ASF GitHub Bot commented on FLINK-5423:
---------------------------------------
Github user Fokko commented on a diff in the pull request:
https://github.com/apache/flink/pull/3077#discussion_r96289143
--- Diff: flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/outlier/StochasticOutlierSelectionITSuite.scala ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.outlier
+
+import breeze.linalg.{sum, DenseVector => BreezeDenseVector}
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common.LabeledVector
+import org.apache.flink.ml.math.DenseVector
+import org.apache.flink.ml.outlier.StochasticOutlierSelection.BreezeLabeledVector
+import org.apache.flink.ml.util.FlinkTestBase
+import org.scalatest.{FlatSpec, Matchers}
+
+class StochasticOutlierSelectionITSuite extends FlatSpec with Matchers with FlinkTestBase {
+ behavior of "Stochastic Outlier Selection algorithm"
+ val EPSILON = 1e-16
+
+ /*
+ Unit-tests created based on the Python scripts of the algorithms author'
+ https://github.com/jeroenjanssens/scikit-sos
+
+ For more information about SOS, see https://github.com/jeroenjanssens/sos
+ J.H.M. Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik. Stochastic
+ Outlier Selection. Technical Report TiCC TR 2012-001, Tilburg University,
+ Tilburg, the Netherlands, 2012.
+ */
+
+ val perplexity = 3
+ val errorTolerance = 0
+ val maxIterations = 5000
+ val parameters = new StochasticOutlierSelection().setPerplexity(perplexity).parameters
+
+ val env = ExecutionEnvironment.getExecutionEnvironment
+
+ it should "Compute the perplexity of the vector and return the correct error" in {
+ val vector = BreezeDenseVector(Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0))
+
+ val output = Array(
+ 0.39682901665799636,
+ 0.15747326846175236,
+ 0.06248996227359784,
+ 0.024797830280027126,
+ 0.009840498605275054,
+ 0.0039049953849556816,
+ 6.149323865970302E-4,
+ 2.4402301428445443E-4,
+ 9.683541280042027E-5
+ )
+
+ val search = StochasticOutlierSelection.binarySearch(
+ vector,
+ Math.log(perplexity),
+ maxIterations,
+ errorTolerance
+ ).toArray
+
+ search should be(output)
+ }
+
+ it should "Compute the distance matrix and give symmetrical distances" in {
+
+ val data = env.fromCollection(List(
+ BreezeLabeledVector(0, BreezeDenseVector(Array(1.0, 3.0))),
+ BreezeLabeledVector(1, BreezeDenseVector(Array(5.0, 1.0)))
+ ))
+
+ val distanceMatrix = StochasticOutlierSelection
+ .computeDissimilarityVectors(data)
+ .map(_.data)
+ .collect()
+ .toArray
+
+ print(distanceMatrix)
--- End diff --
Oops, still in there from the debugging.
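For context, `print` on a Scala `Array` only shows the JVM array reference, not its contents, so the leftover line adds nothing to the test output anyway. A minimal sketch of what could replace it, going by the test name ("give symmetrical distances"), is an assertion that the two distance rows agree. The sketch below is plain Scala with no Flink or Breeze dependency; `euclidean` and `dissimilarityVectors` are hypothetical helpers written for illustration, and the assumption that each row holds the distances to all other points is mine, not taken from the pull request.

```scala
object SymmetricDistanceSketch {
  // Hypothetical helper, not part of Flink: Euclidean distance between two points.
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Hypothetical helper: for every point, the distances to all other points.
  // This is an assumed stand-in for the distributed computation in the test.
  def dissimilarityVectors(points: Seq[Array[Double]]): Seq[Array[Double]] =
    points.indices.map { i =>
      points.indices
        .filter(_ != i)
        .map(j => euclidean(points(i), points(j)))
        .toArray
    }

  def main(args: Array[String]): Unit = {
    // Same two points as in the test above.
    val points = Seq(Array(1.0, 3.0), Array(5.0, 1.0))
    val distances = dissimilarityVectors(points)

    // If output is wanted while debugging, mkString renders the values;
    // printing the array itself would only show its reference.
    println(distances.map(_.mkString("[", ", ", "]")).mkString("\n"))

    // With two points, each row holds a single distance and both rows must match.
    assert(distances(0).sameElements(distances(1)))
  }
}
```

In the actual suite the equivalent check would presumably be written with the ScalaTest matchers already imported there, comparing the collected rows rather than printing them.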
> Implement Stochastic Outlier Selection
> --------------------------------------
>
> Key: FLINK-5423
> URL: https://issues.apache.org/jira/browse/FLINK-5423
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
>
> I've implemented the Stochastic Outlier Selection (SOS) algorithm by Jeroen
> Janssens.
> http://jeroenjanssens.com/2013/11/24/stochastic-outlier-selection.html
> It is integrated as much as possible with the components of the machine
> learning library.
> The algorithm itself has been compared to four other algorithms, and the
> comparison shows that SOS has a higher performance on most of these
> real-world datasets.