
[jira] [Updated] (FLINK-1297) Add support for tracking statistics of intermediate results

2015-08-25 Thread Ufuk Celebi (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-1297:
---
Fix Version/s: 0.10 (was: 0.9)

Add support for tracking statistics of intermediate results
---

Key: FLINK-1297
URL: https://issues.apache.org/jira/browse/FLINK-1297
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
Fix For: 0.10

Original Estimate: 1,008h
Remaining Estimate: 1,008h

One of the major problems related to the optimizer at the moment is the lack
of proper statistics.
With the introduction of staged execution, it is possible to instrument the
runtime code with a statistics facility that collects the required
information for optimizing the next execution stage.
I would therefore like to contribute code that gathers basic statistics
about the (intermediate) results of dataflows (e.g. min, max, count, and
count distinct) and makes them available to the job manager.
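To illustrate the kind of per-result statistics described above, here is a minimal Java sketch. It assumes a single long-valued field per intermediate result; the class `IntermediateResultStats` and its methods are hypothetical and not part of Flink's API. Each parallel task could keep one instance, and the partial statistics could be merged before being reported to the job manager. Count distinct is computed exactly with a `HashSet` here, which is simple but grows with the number of distinct values; at scale an approximate distinct-count sketch (e.g. HyperLogLog) would be the more likely choice.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical statistics collector for one long-valued field of an
 * intermediate result: min, max, count and (exact) count distinct.
 * Partial results from parallel tasks can be merged.
 */
public class IntermediateResultStats {
    private long count = 0;
    private long min = Long.MAX_VALUE;  // stays MAX_VALUE while empty
    private long max = Long.MIN_VALUE;  // stays MIN_VALUE while empty
    // Exact distinct tracking; memory grows with the number of distinct values.
    private final Set<Long> distinct = new HashSet<>();

    /** Records one value as it flows through the operator. */
    public void add(long value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
        distinct.add(value);
    }

    /** Combines another partition's statistics into this one. */
    public void merge(IntermediateResultStats other) {
        count += other.count;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        distinct.addAll(other.distinct);
    }

    public long count() { return count; }
    public long min() { return min; }
    public long max() { return max; }
    public long countDistinct() { return distinct.size(); }

    public static void main(String[] args) {
        IntermediateResultStats p1 = new IntermediateResultStats();
        IntermediateResultStats p2 = new IntermediateResultStats();
        for (long v : new long[] {5, 3, 5}) p1.add(v);  // partition 1
        for (long v : new long[] {9, 3}) p2.add(v);     // partition 2
        p1.merge(p2);
        // prints "5 3 9 3"
        System.out.println(p1.count() + " " + p1.min() + " " + p1.max() + " " + p1.countDistinct());
    }
}
```

Because `merge` is associative and commutative, the order in which partial statistics arrive at the collecting side does not affect the result.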
Before I start, I would like to hear some feedback from the other users.
In particular, to handle skew (e.g. when grouping), it might be useful to
have a more detailed sketch of the key distribution of an intermediate result.
I am not sure whether a simple histogram is the most effective way to go.
Perhaps someone can propose another lightweight sketch that provides better
accuracy.
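One candidate for the lightweight key-distribution sketch asked about above is a Count-Min sketch; this is a technique swapped in here for illustration, not something the issue settles on. It summarizes key frequencies in a fixed `depth x width` table of counters: estimates never undercount, and with a pairwise-independent hash family the overestimate is at most about e·N/width (N = total count) with probability 1 − e^(−depth). The Java below is a minimal sketch under those assumptions; the class name, the seeded hash mixing (which only approximates such a hash family), and the parameters are hypothetical choices, not Flink API.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Count-Min sketch over arbitrary keys (hypothetical helper, not Flink API).
 * Each of the `depth` rows hashes a key to one of `width` counters; the
 * estimate is the minimum over rows, so it can overcount but never undercount.
 */
public class CountMinSketch {
    private final int depth;
    private final int width;
    private final long[][] table;
    private final int[] seeds;

    public CountMinSketch(int depth, int width, long seed) {
        if (depth <= 0 || width <= 0) throw new IllegalArgumentException("depth and width must be positive");
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(seed); // same seed => same per-row hash functions
        for (int i = 0; i < depth; i++) seeds[i] = rnd.nextInt();
    }

    /** Maps a key to a column in the given row by mixing its hashCode with a per-row seed. */
    private int bucket(int row, Object key) {
        int h = key.hashCode() ^ seeds[row];
        h *= 0x9E3779B1;  // multiplicative (Fibonacci) hashing constant
        h ^= (h >>> 16);  // fold high bits into low bits
        return Math.floorMod(h, width);
    }

    /** Adds `count` occurrences of `key`. */
    public void add(Object key, long count) {
        for (int r = 0; r < depth; r++) table[r][bucket(r, key)] += count;
    }

    /** Returns an estimate of how often `key` was added (never below the true count). */
    public long estimate(Object key) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < depth; r++) min = Math.min(min, table[r][bucket(r, key)]);
        return min;
    }

    /** Merges a sketch built with the same depth, width and seed (e.g. from another partition). */
    public void merge(CountMinSketch other) {
        if (depth != other.depth || width != other.width || !Arrays.equals(seeds, other.seeds)) {
            throw new IllegalArgumentException("incompatible sketches");
        }
        for (int r = 0; r < depth; r++) {
            for (int c = 0; c < width; c++) table[r][c] += other.table[r][c];
        }
    }

    public static void main(String[] args) {
        CountMinSketch p1 = new CountMinSketch(4, 256, 42L);
        CountMinSketch p2 = new CountMinSketch(4, 256, 42L);
        p1.add("hot-key", 10_000);  // a skewed grouping key
        for (int i = 0; i < 1_000; i++) p2.add("key-" + i, 1);
        p2.add("hot-key", 5_000);
        p1.merge(p2);
        // at least 15000, since the sketch never undercounts
        System.out.println("hot-key ~ " + p1.estimate("hot-key"));
    }
}
```

Unlike an equi-width histogram over the key domain, the sketch's size is independent of the number of distinct keys, and partial sketches from parallel tasks merge by simple counter addition. A skew check could compare `estimate(key)` against the total record count for candidate keys; the threshold would be a tuning choice.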

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (FLINK-1297) Add support for tracking statistics of intermediate results

2015-01-19 Thread Robert Metzger (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-1297:
---
Fix Version/s: 0.9 (was: 0.8)

Add support for tracking statistics of intermediate results
---

Original Estimate: 1,008h
Remaining Estimate: 1,008h


