[ 
https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736617#comment-14736617
 ] 

ASF GitHub Bot commented on FLINK-1297:
---------------------------------------

Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/605#issuecomment-138866235
  
    I tried again, this works:
    
    ```java
        @Override
        public OperatorStatistics clone(){
                OperatorStatistics clone = new OperatorStatistics(config);
                clone.min = min;
                clone.max = max;
                clone.cardinality = cardinality;
    
                try {
                        ICardinality copy;
                        if (countDistinct instanceof LinearCounting) {
                                copy = new LinearCounting(config.getCountDbitmap());
                        } else if (countDistinct instanceof HyperLogLog) {
                                copy = new HyperLogLog(config.getCountDlog2m());
                        } else {
                                throw new IllegalStateException("Unsupported counter.");
                        }
                        clone.countDistinct = copy.merge(countDistinct);
                } catch (CardinalityMergeException e) {
                        throw new RuntimeException("Failed to clone OperatorStatistics!");
                }
    
                try {
                        HeavyHitter copy;
                        if (heavyHitter instanceof LossyCounting) {
                                copy = new LossyCounting(config.getHeavyHitterFraction(), config.getHeavyHitterError());
                        } else if (heavyHitter instanceof CountMinHeavyHitter) {
                                copy = new CountMinHeavyHitter(config.getHeavyHitterFraction(),
                                                config.getHeavyHitterError(),
                                                config.getHeavyHitterConfidence(),
                                                config.getHeavyHitterSeed());
                        } else {
                                throw new IllegalStateException("Unsupported counter.");
                        }
                        copy.merge(heavyHitter);
                        clone.heavyHitter = copy;
                } catch (HeavyHitterMergeException e) {
                        throw new RuntimeException("Failed to clone OperatorStatistics!");
                }
    
                return clone;
        }
    ```
    
    Do you think we could merge your pull request with this change?
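
The pattern in the snippet above (build an empty structure from the same configuration, then merge the original into it) can be shown in isolation. The following is a minimal sketch only: `ExactCounter`, `expectedSize`, `offer`, and `copy` are hypothetical names invented for illustration, and a plain `HashSet` stands in for the sketch classes used in the pull request.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical mergeable counter; a HashSet stands in for a real sketch. */
final class ExactCounter {
    // Stand-in for the configuration the real counters are built from.
    private final int expectedSize;
    private final Set<Object> seen;

    ExactCounter(int expectedSize) {
        this.expectedSize = expectedSize;
        this.seen = new HashSet<>(expectedSize);
    }

    /** Records one element. */
    void offer(Object element) {
        seen.add(element);
    }

    /** Number of distinct elements recorded so far. */
    long cardinality() {
        return seen.size();
    }

    /** Folds the state of another counter into this one. */
    void merge(ExactCounter other) {
        seen.addAll(other.seen);
    }

    /** Copy-by-merge: start from an empty counter with the same configuration. */
    ExactCounter copy() {
        ExactCounter result = new ExactCounter(expectedSize);
        result.merge(this);
        return result;
    }
}
```

Because the copy owns its own state, elements offered to the original afterwards do not show up in the copy.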


> Add support for tracking statistics of intermediate results
> -----------------------------------------------------------
>
>                 Key: FLINK-1297
>                 URL: https://issues.apache.org/jira/browse/FLINK-1297
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>            Reporter: Alexander Alexandrov
>            Assignee: Alexander Alexandrov
>             Fix For: 0.10
>
>   Original Estimate: 1,008h
>  Remaining Estimate: 1,008h
>
> One of the major problems related to the optimizer at the moment is the lack 
> of proper statistics.
> With the introduction of staged execution, it is possible to instrument the 
> runtime code with a statistics facility that collects the required 
> information for optimizing the next execution stage.
> I would therefore like to contribute code that can be used to gather basic 
> statistics for the (intermediate) result of dataflows (e.g. min, max, count, 
> count distinct) and make them available to the job manager.
> Before I start, I would like to hear some feedback from the other users.
> In particular, to handle skew (e.g. on grouping) it might be good to have 
> some sort of detailed sketch about the key distribution of an intermediate 
> result. I am not sure whether a simple histogram is the most effective way to 
> go. Maybe somebody would propose another lightweight sketch that provides 
> better accuracy.
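
The basic statistics named in the description (min, max, count, count distinct) could be collected per partition and then merged, roughly as sketched below. This is an illustrative sketch under stated assumptions, not the code from the pull request: `BasicStatistics` and all of its methods are hypothetical, values are assumed to be `long`, and distinct values are tracked exactly in a `HashSet` where a real implementation would more likely use a lightweight sketch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical accumulator for min, max, count and count distinct. */
final class BasicStatistics {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long count = 0;
    // Exact tracking for illustration; memory grows with the number of distinct values.
    private final Set<Long> distinct = new HashSet<>();

    /** Updates all statistics with one observed value. */
    void add(long value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
        count++;
        distinct.add(value);
    }

    /** Combines the statistics of another partition into this one. */
    void merge(BasicStatistics other) {
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        count += other.count;
        distinct.addAll(other.distinct);
    }

    long getMin() { return min; }
    long getMax() { return max; }
    long getCount() { return count; }
    long getCountDistinct() { return distinct.size(); }
}
```

Each parallel task would fill its own instance with `add`, and the partial results would be combined with `merge` before being reported.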


