[ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705023#comment-14705023 ]
ASF GitHub Bot commented on FLINK-2030:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/861#discussion_r37536712

    --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/DiscreteHistogram.scala ---
    @@ -0,0 +1,126 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.ml.statistics
    +
    +import scala.collection.mutable
    +
    +/** Implementation of a discrete valued online histogram
    +  *
    +  * =Parameters=
    +  * -[[numCategories]]:
    +  *   Number of categories in the histogram
    +  */
    +class DiscreteHistogram(numCategories: Int) extends OnlineHistogram {
    +
    +  require(numCategories > 0, "Capacity must be greater than zero")
    +  val data = new mutable.HashMap[Double, Int]()
    +
    +  /** Number of categories in the histogram
    +    *
    +    * @return number of categories
    +    */
    +  override def bins: Int = {
    +    numCategories
    +  }
    +
    +  /** Increment count of category c
    +    *
    +    * @param c category whose count needs to be incremented
    +    */
    +  override def add(c: Double): Unit = {
    +    data.get(c) match {
    +      case None =>
    +        require(data.size < numCategories, "Insufficient capacity. Failed to add.")
    +        data.put(c, 1)
    +      case Some(value) =>
    +        data.update(c, value + 1)
    +    }
    +  }
    +
    +  /** Merges the histogram with h and returns a new histogram
    --- End diff --

    What is *h*? It would be easier to understand if you wrote *Merges this histogram with the given histogram h. The result is a new histogram*.


> Implement an online histogram with Merging and equalization features
> --------------------------------------------------------------------
>
>                 Key: FLINK-2030
>                 URL: https://issues.apache.org/jira/browse/FLINK-2030
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Machine Learning Library
>            Reporter: Sachin Goel
>            Assignee: Sachin Goel
>            Priority: Minor
>              Labels: ML
>
> For the implementation of the decision tree in
> https://issues.apache.org/jira/browse/FLINK-1727, we need to implement a
> histogram with online updates, merging and equalization features. A reference
> implementation is provided in [1].
> [1] http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)