[jira] [Created] (FLINK-5396) flink-dist replace scala version in opt.xml by change-scala-version.sh

2016-12-27 Thread shijinkui (JIRA)
shijinkui created FLINK-5396:


 Summary: flink-dist replace scala version in opt.xml by 
change-scala-version.sh
 Key: FLINK-5396
 URL: https://issues.apache.org/jira/browse/FLINK-5396
 Project: Flink
  Issue Type: Improvement
Reporter: shijinkui


In flink-dist, change-scala-version.sh is configured to replace the Scala 
version in bin.xml, but not in opt.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5395) support locally build distribution by script create_release_files.sh

2016-12-27 Thread shijinkui (JIRA)
shijinkui created FLINK-5395:


 Summary: support locally build distribution by script 
create_release_files.sh
 Key: FLINK-5395
 URL: https://issues.apache.org/jira/browse/FLINK-5395
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: shijinkui


create_release_files.sh currently only builds the official Flink release, 
which makes it hard to build a custom local Flink distribution.

create_release_files.sh should support:
1. a custom git repo URL
2. building against a specific Scala and Hadoop version
3. a fix for flink-dist's opt.xml, whose Scala version is not replaced by 
change-scala-version.sh





[jira] [Created] (FLINK-5394) the estimateRowCount method of DataSetCalc didn't work

2016-12-27 Thread zhangjing (JIRA)
zhangjing created FLINK-5394:


 Summary: the estimateRowCount method of DataSetCalc didn't work
 Key: FLINK-5394
 URL: https://issues.apache.org/jira/browse/FLINK-5394
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: zhangjing
Assignee: zhangjing


The estimateRowCount method of DataSetCalc currently has no effect. 
If I run the following code,
`
Table table = tableEnv
    .fromDataSet(data, "a, b, c")
    .groupBy("a")
    .select("a, a.avg, b.sum, c.count")
    .where("a == 1");
`
the cost of each node in the optimized node tree is:
`
DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}, id = 28
  DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}, id = 27
    DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}, id = 20
`
We expect the input rowcount of DataSetAggregate to be less than 1000. However, 
the actual input rowcount is still 1000, because the estimateRowCount method of 
DataSetCalc is never used. 

This has two causes:
1. When DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to 
estimate its input rowcount, the call is dispatched to RelMdRowCount.
2. DataSetCalc is a subclass of SingleRel, so that call matches 
getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses 
DataSetCalc.estimateRowCount.

I plan to resolve this problem by adding a FlinkRelMdRowCount that contains 
getRowCount methods specific to Flink's RelNodes.
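
To illustrate the dispatch problem and the planned fix, here is a minimal 
standalone sketch. It does not use Calcite or Flink: the classes below are toy 
stand-ins that only borrow the names from this issue, MetadataQuery is a 
hypothetical simplification of the handler lookup, and the 0.5 selectivity is 
an assumed value. It only shows how a handler registered for SingleRel can 
shadow a subclass's own estimateRowCount, and how a more specific handler 
restores it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

public class RowCountDispatchSketch {

    // Toy stand-ins for the node types named in the issue (not the real classes).
    static class RelNode {
        double estimateRowCount() { return 1000.0; }
    }

    static class SingleRel extends RelNode {
        final RelNode input;
        SingleRel(RelNode input) { this.input = input; }
    }

    static class DataSetScan extends RelNode { }

    static class DataSetCalc extends SingleRel {
        DataSetCalc(RelNode input) { super(input); }

        // Assumed selectivity of 0.5 for the filter, purely for illustration.
        @Override
        double estimateRowCount() { return input.estimateRowCount() * 0.5; }
    }

    // Hypothetical simplification of a metadata query: handlers are looked up
    // by class, starting at the node's own class and walking up to its parents.
    static class MetadataQuery {
        private final Map<Class<?>, ToDoubleFunction<RelNode>> handlers = new HashMap<>();

        void register(Class<?> type, ToDoubleFunction<RelNode> handler) {
            handlers.put(type, handler);
        }

        double getRowCount(RelNode rel) {
            for (Class<?> c = rel.getClass(); c != null; c = c.getSuperclass()) {
                ToDoubleFunction<RelNode> handler = handlers.get(c);
                if (handler != null) {
                    return handler.applyAsDouble(rel);
                }
            }
            throw new IllegalStateException("no handler for " + rel.getClass());
        }
    }

    // Mirrors the situation described above: a catch-all handler that defers to
    // the node, and a SingleRel handler that simply forwards the input's count.
    static MetadataQuery defaultQuery() {
        MetadataQuery mq = new MetadataQuery();
        mq.register(RelNode.class, rel -> rel.estimateRowCount());
        mq.register(SingleRel.class, rel -> mq.getRowCount(((SingleRel) rel).input));
        return mq;
    }

    // Sketch of the proposed fix: a handler specific to DataSetCalc that
    // delegates to the node's own estimate.
    static MetadataQuery flinkQuery() {
        MetadataQuery mq = defaultQuery();
        mq.register(DataSetCalc.class, rel -> rel.estimateRowCount());
        return mq;
    }

    public static void main(String[] args) {
        RelNode calc = new DataSetCalc(new DataSetScan());
        // The SingleRel handler wins, so the filter is ignored.
        System.out.println("default: " + defaultQuery().getRowCount(calc));
        // The DataSetCalc handler wins, so the node's own estimate is used.
        System.out.println("with specific handler: " + flinkQuery().getRowCount(calc));
    }
}
```

With the default handlers the calc node reports its input's rowcount unchanged, 
which matches the 1000.0 seen in the plan above. With the more specific handler 
registered, the node's own estimate is returned instead.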


