[ https://issues.apache.org/jira/browse/FLINK-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824436#comment-15824436 ]

ASF GitHub Bot commented on FLINK-5423:
---------------------------------------

Github user Fokko commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3077#discussion_r96289143
  
    --- Diff: flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/outlier/StochasticOutlierSelectionITSuite.scala ---
    @@ -0,0 +1,240 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.ml.outlier
    +
    +import breeze.linalg.{sum, DenseVector => BreezeDenseVector}
    +import org.apache.flink.api.scala._
    +import org.apache.flink.ml.common.LabeledVector
    +import org.apache.flink.ml.math.DenseVector
    +import org.apache.flink.ml.outlier.StochasticOutlierSelection.BreezeLabeledVector
    +import org.apache.flink.ml.util.FlinkTestBase
    +import org.scalatest.{FlatSpec, Matchers}
    +
    +class StochasticOutlierSelectionITSuite extends FlatSpec with Matchers with FlinkTestBase {
    +  behavior of "Stochastic Outlier Selection algorithm"
    +  val EPSILON = 1e-16
    +
    +  /*
    +    Unit tests created based on the Python scripts of the algorithm's author:
    +    https://github.com/jeroenjanssens/scikit-sos
    +
    +    For more information about SOS, see https://github.com/jeroenjanssens/sos
    +    J.H.M. Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik. Stochastic
    +    Outlier Selection. Technical Report TiCC TR 2012-001, Tilburg University,
    +    Tilburg, the Netherlands, 2012.
    +   */
    +
    +  val perplexity = 3
    +  val errorTolerance = 0
    +  val maxIterations = 5000
    +  val parameters = new StochasticOutlierSelection().setPerplexity(perplexity).parameters
    +
    +  val env = ExecutionEnvironment.getExecutionEnvironment
    +
    +  it should "Compute the perplexity of the vector and return the correct error" in {
    +    val vector = BreezeDenseVector(Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0))
    +
    +    val output = Array(
    +      0.39682901665799636,
    +      0.15747326846175236,
    +      0.06248996227359784,
    +      0.024797830280027126,
    +      0.009840498605275054,
    +      0.0039049953849556816,
    +      6.149323865970302E-4,
    +      2.4402301428445443E-4,
    +      9.683541280042027E-5
    +    )
    +
    +    val search = StochasticOutlierSelection.binarySearch(
    +      vector,
    +      Math.log(perplexity),
    +      maxIterations,
    +      errorTolerance
    +    ).toArray
    +
    +    search should be(output)
    +  }
    +
    +  it should "Compute the distance matrix and give symmetrical distances" in {
    +
    +    val data = env.fromCollection(List(
    +      BreezeLabeledVector(0, BreezeDenseVector(Array(1.0, 3.0))),
    +      BreezeLabeledVector(1, BreezeDenseVector(Array(5.0, 1.0)))
    +    ))
    +
    +    val distanceMatrix = StochasticOutlierSelection
    +      .computeDissimilarityVectors(data)
    +      .map(_.data)
    +      .collect()
    +      .toArray
    +
    +    print(distanceMatrix)
    --- End diff --
    
    Oops, that `print` is still in there from debugging.
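
    For illustration, a leftover debug print like this one could give way to
    an explicit symmetry assertion. The sketch below is plain Scala with no
    Flink or Breeze dependency. `pairwiseDistances` and `isSymmetric` are
    hypothetical helpers that are not part of the pull request, and Euclidean
    distance is an assumption made for the example.

```scala
object SymmetryCheckSketch {
  // Hypothetical helper (not from the PR): full pairwise Euclidean distance matrix.
  def pairwiseDistances(points: Seq[Array[Double]]): Array[Array[Double]] =
    points.map { a =>
      points.map { b =>
        math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
      }.toArray
    }.toArray

  // Hypothetical helper: true when m(i)(j) equals m(j)(i) within a tolerance.
  // Assumes a square matrix.
  def isSymmetric(m: Array[Array[Double]], eps: Double = 1e-12): Boolean =
    m.indices.forall { i =>
      m.indices.forall(j => math.abs(m(i)(j) - m(j)(i)) <= eps)
    }

  def main(args: Array[String]): Unit = {
    // The same two points that the quoted test uses.
    val points = Seq(Array(1.0, 3.0), Array(5.0, 1.0))
    val distances = pairwiseDistances(points)

    assert(isSymmetric(distances), "distance matrix should be symmetric")

    // A Scala Array prints as a JVM reference, so render the rows
    // explicitly if the values are needed while debugging.
    println(distances.map(_.mkString("[", ", ", "]")).mkString("\n"))
  }
}
```

    Checking the property directly keeps the test output quiet and makes a
    failure point at the asymmetric entry rather than at a printed reference.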


> Implement Stochastic Outlier Selection
> --------------------------------------
>
>                 Key: FLINK-5423
>                 URL: https://issues.apache.org/jira/browse/FLINK-5423
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>
> I've implemented the Stochastic Outlier Selection (SOS) algorithm by Jeroen 
> Janssens.
> http://jeroenjanssens.com/2013/11/24/stochastic-outlier-selection.html
> It is integrated as much as possible with the components of the machine learning 
> library.
> The algorithm has been compared to four other algorithms, and the comparison 
> shows that SOS performs better on most of the real-world datasets used.


