[ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704862#comment-14704862 ]
ASF GitHub Bot commented on FLINK-2030:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/861#discussion_r37527495

    --- Diff: docs/libs/ml/statistics.md ---
    @@ -0,0 +1,69 @@
    +---
    +mathjax: include
    +htmlTitle: FlinkML - Statistics
    +title: <a href="../ml">FlinkML</a> - Statistics
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements. See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership. The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License. You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied. See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +## Description
    +
    + The statistics utility provides features such as building histograms over data.
    +
    +## Methods
    +
    + The Statistics utility provides two major functions: `createHistogram` and
    + `createDiscreteHistogram`.
    +
    +### Creating a histogram
    +
    + There are two types of histograms:
    + 1. **Continuous Histograms**: These histograms are formed on a data set `X: DataSet[Double]`
    + when the values in `X` are from a continuous range. These histograms support
    + `quantile` and `sum` operations. Here `quantile(q)` refers to a value $x_q$ such that $|x: x
    + \leq x_q| = q * |X|$. Further, `sum(s)` refers to the number of elements $x \leq s$, which can
    + be construed as a cumulative probability value at $s$[Of course, *scaled* probability].
    + 2.
    + A continuous histogram can be formed by calling `X.createHistogram(b)` where `b` is the
    --- End diff --

    What's the meaning of the bins?


> Implement an online histogram with Merging and equalization features
> --------------------------------------------------------------------
>
>                 Key: FLINK-2030
>                 URL: https://issues.apache.org/jira/browse/FLINK-2030
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Machine Learning Library
>            Reporter: Sachin Goel
>            Assignee: Sachin Goel
>            Priority: Minor
>              Labels: ML
>
> For the implementation of the decision tree in
> https://issues.apache.org/jira/browse/FLINK-1727, we need to implement a
> histogram with online updates, merging and equalization features. A reference
> algorithm is given in [1].
>
> [1] Y. Ben-Haim and E. Tom-Tov, "A Streaming Parallel Decision Tree Algorithm", JMLR 11 (2010):
> http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
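In the streaming histogram of [1], each bin is a (centroid, count) pair and the histogram never keeps more than a fixed number of bins. Whenever a new point or a merge pushes it over that bound, the two adjacent bins with the closest centroids are combined into one bin at their count-weighted mean. One plausible reading of `b` in `X.createHistogram(b)` is that bound, although the diff excerpt does not say so. The Scala sketch below illustrates the scheme under stated assumptions: the class `OnlineHistogram` and its methods `add`, `merge`, `sum` and `quantile` are hypothetical names, not the FlinkML API; `sum` follows the trapezoid estimate of [1] and, as a simplifying assumption, returns 0 below the smallest centroid and the full count at or above the largest; `quantile` uses plain bisection on `sum` rather than the procedure described in the paper.

```scala
import scala.collection.mutable.ArrayBuffer

/** Hypothetical sketch of the streaming histogram of Ben-Haim and Tom-Tov [1]:
  * at most `maxBins` bins, each a (centroid, count) pair, kept sorted by centroid. */
final class OnlineHistogram(val maxBins: Int) {
  require(maxBins >= 1, "maxBins must be at least 1")

  private val bins = ArrayBuffer.empty[(Double, Long)]

  def numBins: Int = bins.size
  def total: Long = bins.iterator.map(_._2).sum

  /** Adds a single observation `x`. */
  def add(x: Double): Unit = {
    insertBin(x, 1L)
    shrink()
  }

  /** Merges the bins of `other` into this histogram, then shrinks back to `maxBins`. */
  def merge(other: OnlineHistogram): Unit = {
    other.bins.toList.foreach { case (c, m) => insertBin(c, m) } // snapshot in case other eq this
    shrink()
  }

  /** Estimated number of observations <= b, using the trapezoid rule of [1].
    * Assumption of this sketch: 0 below the first centroid, the full count at or above the last. */
  def sum(b: Double): Double =
    if (bins.isEmpty || b < bins.head._1) 0.0
    else if (b >= bins.last._1) total.toDouble
    else {
      val i = bins.lastIndexWhere(_._1 <= b) // centroid(i) <= b < centroid(i + 1)
      val (pL, mL) = bins(i)
      val (pR, mR) = bins(i + 1)
      val t = (b - pL) / (pR - pL) // relative position of b between the two centroids
      val mB = mL + (mR - mL) * t  // linearly interpolated bin height at b
      val before = bins.iterator.take(i).map(_._2).sum
      before + mL / 2.0 + (mL + mB) / 2.0 * t
    }

  /** Approximate q-quantile by bisection on `sum` (a stand-in, not the procedure of [1]). */
  def quantile(q: Double): Double = {
    require(bins.nonEmpty && q >= 0.0 && q <= 1.0, "need data and 0 <= q <= 1")
    val target = q * total
    var lo = bins.head._1
    var hi = bins.last._1 // invariant: sum(hi) >= target
    for (_ <- 0 until 100) {
      val mid = (lo + hi) / 2
      if (sum(mid) < target) lo = mid else hi = mid
    }
    hi
  }

  // Adds count m at centroid c, reusing an existing bin with exactly that centroid.
  private def insertBin(c: Double, m: Long): Unit = {
    val i = bins.indexWhere(_._1 >= c)
    if (i >= 0 && bins(i)._1 == c) bins(i) = (c, bins(i)._2 + m)
    else bins.insert(if (i < 0) bins.size else i, (c, m))
  }

  // While there are too many bins, combine the adjacent pair with the closest
  // centroids into a single bin at their count-weighted mean.
  private def shrink(): Unit =
    while (bins.size > maxBins) {
      val i = (0 until bins.size - 1).minBy(j => bins(j + 1)._1 - bins(j)._1)
      val (c1, m1) = bins(i)
      val (c2, m2) = bins(i + 1)
      val m = m1 + m2
      bins(i) = ((c1 * m1 + c2 * m2) / m, m)
      bins.remove(i + 1)
    }
}
```

For example, adding 1.0, 2.0 and 3.0 to an `OnlineHistogram(10)` keeps three unit bins, and `sum(2.0)` returns 1.5: all of the first bin plus half of the bin centred at 2.0, following the convention in [1] that half of a bin's mass lies on each side of its centroid. Merging two histograms reuses the same closest-pair step, which is what makes the structure suitable for combining partial histograms computed in parallel.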