[ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728625#comment-14728625 ]
ASF GitHub Bot commented on FLINK-2030:
---------------------------------------
Github user chiwanpark commented on a diff in the pull request:
https://github.com/apache/flink/pull/861#discussion_r38619843
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java ---
@@ -248,6 +251,58 @@ public void mapPartition(Iterable<T> values, Collector<Tuple2<Long, T>> out) thr
             input.getType(), sampleInCoordinator, callLocation);
     }
+    /**
+     * Creates a {@link org.apache.flink.api.common.accumulators.DiscreteHistogram} from the data set
+     *
+     * @param data Discrete valued data set
+     * @return A histogram over data
+     */
+    public static DataSet<DiscreteHistogram> createDiscreteHistogram(DataSet<Double> data) {
+        return data.mapPartition(new RichMapPartitionFunction<Double, DiscreteHistogram>() {
+            @Override
+            public void mapPartition(Iterable<Double> values, Collector<DiscreteHistogram> out)
+                    throws Exception {
+                DiscreteHistogram histogram = new DiscreteHistogram();
+                for (double value : values) {
+                    histogram.add(value);
+                }
+                out.collect(histogram);
+            }
+        }).reduce(new ReduceFunction<DiscreteHistogram>() {
+            @Override
+            public DiscreteHistogram reduce(DiscreteHistogram value1, DiscreteHistogram value2) throws Exception {
+                value1.merge(value2);
+                return value1;
+            }
+        });
+    }
+
+    /**
+     * Creates a {@link org.apache.flink.api.common.accumulators.DiscreteHistogram} from the data set
+     *
+     * @param data Discrete valued data set
+     * @param bins Number of bins in the histogram
+     * @return A histogram over data
+     */
+    public static DataSet<ContinuousHistogram> createContinuousHistogram(DataSet<Double> data, final int bins) {
+        return data.mapPartition(new RichMapPartitionFunction<Double, ContinuousHistogram>() {
+            @Override
+            public void mapPartition(Iterable<Double> values, Collector<ContinuousHistogram> out)
+                    throws Exception {
--- End diff --
Same here (unnecessary new line)
> Implement an online histogram with Merging and equalization features
> --------------------------------------------------------------------
>
> Key: FLINK-2030
> URL: https://issues.apache.org/jira/browse/FLINK-2030
> Project: Flink
> Issue Type: Sub-task
> Components: Machine Learning Library
> Reporter: Sachin Goel
> Assignee: Sachin Goel
> Priority: Minor
> Labels: ML
>
> For the implementation of the decision tree in
> https://issues.apache.org/jira/browse/FLINK-1727, we need to implement a
> histogram with online updates, merging, and equalization features. A reference
> implementation is provided in [1].
>
> [1] http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
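
For readers unfamiliar with the technique the issue asks for, the following is a minimal sketch of a bounded-size histogram with online updates and merging. It is not the code from the pull request. The class name `StreamingHistogramSketch` and all of its methods are hypothetical, and the bin-collapsing rule (repeatedly fuse the two closest centroids into their weighted mean) is an assumption about the general approach rather than a description of Flink's `ContinuousHistogram`. Equalization is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Flink's ContinuousHistogram.
final class StreamingHistogramSketch {
    private final int maxBins;
    // centroid -> number of points represented by that centroid
    private final TreeMap<Double, Long> bins = new TreeMap<>();

    StreamingHistogramSketch(int maxBins) {
        if (maxBins < 1) {
            throw new IllegalArgumentException("maxBins must be at least 1");
        }
        this.maxBins = maxBins;
    }

    /** Online update: record one value, then shrink back to maxBins. */
    void add(double value) {
        if (!Double.isFinite(value)) {
            throw new IllegalArgumentException("value must be finite");
        }
        insert(value, 1L);
        shrink();
    }

    /** Merge: take the union of both bin sets, then shrink back to maxBins. */
    void merge(StreamingHistogramSketch other) {
        for (Map.Entry<Double, Long> e : other.bins.entrySet()) {
            insert(e.getKey(), e.getValue());
        }
        shrink();
    }

    /** Total number of values recorded so far. */
    long total() {
        long sum = 0L;
        for (long count : bins.values()) {
            sum += count;
        }
        return sum;
    }

    int binCount() {
        return bins.size();
    }

    private void insert(double centroid, long count) {
        bins.merge(centroid, count, Long::sum);
    }

    // While over capacity, fuse the two adjacent centroids with the smallest
    // gap into their count-weighted mean. Ties resolve to the rightmost pair.
    private void shrink() {
        while (bins.size() > maxBins) {
            Double prev = null;
            double left = 0.0;
            double right = 0.0;
            double bestGap = Double.POSITIVE_INFINITY;
            for (double c : bins.keySet()) {
                if (prev != null && c - prev <= bestGap) {
                    bestGap = c - prev;
                    left = prev;
                    right = c;
                }
                prev = c;
            }
            long nLeft = bins.remove(left);
            long nRight = bins.remove(right);
            insert((left * nLeft + right * nRight) / (nLeft + nRight), nLeft + nRight);
        }
    }
}
```

Because `merge` is just "union, then shrink", per-partition histograms can be built independently and combined pairwise, which is the same shape as the `mapPartition` followed by `reduce` in the diff above.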