[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

2017-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3058


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

2017-01-14 Thread beyond1920
Github user beyond1920 commented on a diff in the pull request:

https://github.com/apache/flink/pull/3058#discussion_r96112195
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala
 ---
@@ -71,6 +72,21 @@ class DataSetSort(
 )
   }
 
+  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
+    val inputRowCnt = metadata.getRowCount(this.getInput)
+    if (inputRowCnt == null) {
+      inputRowCnt
+    } else {
+      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
--- End diff --

Yes, that makes sense. I have already fixed it. Thanks @fhueske.




[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

2017-01-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3058#discussion_r95968750
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala
 ---
@@ -71,6 +72,21 @@ class DataSetSort(
 )
   }
 
+  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
+    val inputRowCnt = metadata.getRowCount(this.getInput)
+    if (inputRowCnt == null) {
+      inputRowCnt
+    } else {
+      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
--- End diff --

inputRowCount might also just be an estimate and not guaranteed to be 
precise. Returning 1 is more robust, because it does not result in no-cost 
operators downstream.
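
The suggestion above can be sketched as a small standalone function. This is a minimal sketch, not the code merged in the PR: `estimateAfterOffset` is a hypothetical name, `Option[Double]` stands in for the nullable value returned by `getRowCount`, and the floor of `1.0` simply follows the value proposed in this thread.

```scala
object RowCountSketch {
  // Hypothetical helper for illustration; not part of Flink or Calcite.
  // inputRowCnt: estimated number of input rows, None when no estimate exists.
  // limitStart:  number of rows skipped by the OFFSET clause.
  def estimateAfterOffset(inputRowCnt: Option[Double], limitStart: Long): Option[Double] =
    inputRowCnt.map { cnt =>
      // Floor at 1 instead of 0: the input count is itself only an estimate,
      // so a result of 0 would make every downstream operator look free.
      math.max(cnt - limitStart, 1.0)
    }
}
```

With an input estimate of 5 rows and an offset of 10, this sketch yields 1 rather than 0, and a missing input estimate stays missing.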




[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

2017-01-13 Thread beyond1920
Github user beyond1920 commented on a diff in the pull request:

https://github.com/apache/flink/pull/3058#discussion_r95967156
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala
 ---
@@ -71,6 +72,21 @@ class DataSetSort(
 )
   }
 
+  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
+    val inputRowCnt = metadata.getRowCount(this.getInput)
+    if (inputRowCnt == null) {
+      inputRowCnt
+    } else {
+      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
--- End diff --

Hmm, the estimate is 0 only if inputRowCount <= start and (fetch is null or
fetchValue <= 0). I think the estimate is reasonable in that case.
Besides, why return 1 rather than 0.1, 0.01, or some other value?




[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

2017-01-10 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3058#discussion_r95372931
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala
 ---
@@ -71,6 +72,21 @@ class DataSetSort(
 )
   }
 
+  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
+    val inputRowCnt = metadata.getRowCount(this.getInput)
+    if (inputRowCnt == null) {
+      inputRowCnt
+    } else {
+      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
--- End diff --

Returning a cardinality estimate of `0` is not a good idea because all 
remaining operations might appear to have no cost at all. It is better to be 
conservative and return `1`, which is still low but does not invalidate any 
subsequent costs.
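
To illustrate the concern, here is a toy cost model. It is an assumption made for illustration, not the cost function used by Calcite or Flink: each operator's cost is taken to be proportional to its input row count, and `Op`, `pipelineCost`, and the numbers in `downstream` are hypothetical.

```scala
object ZeroCostSketch {
  // Hypothetical operator: cost charged per input row, and selectivity
  // (output rows divided by input rows).
  final case class Op(name: String, costPerRow: Double, selectivity: Double)

  // Sums the per-operator costs along a linear pipeline fed with `inputRows`.
  def pipelineCost(inputRows: Double, ops: Seq[Op]): Double =
    ops.foldLeft((inputRows, 0.0)) { case ((rows, total), op) =>
      // Each operator pays for the rows it reads and passes on its output rows.
      (rows * op.selectivity, total + rows * op.costPerRow)
    }._2

  // Made-up operators that run after the sort/limit node.
  val downstream: Seq[Op] = Seq(Op("join", 5.0, 2.0), Op("aggregate", 1.0, 0.1))
}
```

In this model, `pipelineCost(0.0, downstream)` is 0 for any choice of operators, so alternative plans for the remaining operators cannot be told apart. Starting from 1 row, the same pipeline costs 7, which is still small but keeps plans comparable.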




[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

2017-01-04 Thread beyond1920
GitHub user beyond1920 opened a pull request:

https://github.com/apache/flink/pull/3058

[FLINK-5394] [Table API & SQL]the estimateRowCount method of DataSetCalc 
didn't work

This PR fixes the bug described in 
https://issues.apache.org/jira/browse/FLINK-5394.
The main changes are:
1. Add FlinkRelMdRowCount and FlinkDefaultRelMetadataProvider to override 
getRowCount for some Flink RelNodes.
2. Add a getRowCount method to DataSetSort to provide a more accurate estimate.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alibaba/flink flink-5394

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3058.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3058


commit 8099920fb8759ed1068e7b8153816a7b63089e45
Author: beyond1920 
Date:   2016-12-29T07:52:17Z

the estimateRowCount method of DataSetCalc didn't work now, fix it



