[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/3058

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...
Github user beyond1920 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r96112195

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --

    Yes, that makes sense. I have already fixed it. Thanks @fhueske.
[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r95968750

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --

    inputRowCount might also just be an estimate and not guaranteed to be precise. Returning 1 is more robust, because it does not result in no-cost operators downstream.
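The suggestion above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code that was merged: `SortRowCountSketch`, `estimate`, and the `Option`-based signature are hypothetical, and the handling of `fetch` (capping the result with `min`) is assumed, since the quoted diff is cut off before that part.

```scala
// Hypothetical sketch, not the code from the pull request.
object SortRowCountSketch {
  /** Estimates the output row count of a sort with an offset and an optional fetch.
    *
    * @param inputRowCnt estimated input rows, None if no estimate is available
    * @param limitStart  number of rows skipped (the offset)
    * @param fetch       optional maximum number of rows returned (assumed semantics)
    */
  def estimate(inputRowCnt: Option[Double], limitStart: Long, fetch: Option[Long]): Option[Double] =
    inputRowCnt.map { in =>
      // Rows remaining after the offset, never negative.
      val afterOffset = math.max(in - limitStart, 0d)
      // Apply the fetch limit when present.
      val limited = fetch.fold(afterOffset)(f => math.min(afterOffset, f.toDouble))
      // Floor at 1 so that downstream operators never appear cost-free.
      math.max(limited, 1d)
    }
}
```

With this floor, an input estimate that is smaller than the offset yields 1 rather than 0, which stays low but keeps downstream costs comparable.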
[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...
Github user beyond1920 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r95967156

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --

    Hmm, an estimate of 0 only happens if inputRowCount <= start and (fetch is null or fetchValue <= 0). I think the estimate is reasonable in this case. Besides, why return 1 rather than 0.1, 0.01, or some other value?
[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r95372931

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --

    Returning a cardinality estimate of `0` is not a good idea because all remaining operations might appear to have no costs at all. Rather be conservative and return `1` which is still low but does not invalidate any subsequent costs.
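To illustrate the concern about a zero estimate, here is a toy cost model. It is a deliberately simplified assumption (cost = input rows times a per-row cost, summed over operators) and does not reflect Flink's or Calcite's actual cost functions; `ZeroCardinalitySketch`, `Op`, and `planCost` are hypothetical names.

```scala
// Toy model only: not the optimizer's real cost function.
object ZeroCardinalitySketch {
  // An operator with an assumed constant cost per processed row.
  final case class Op(name: String, perRowCost: Double)

  // Cost of running all operators on the same estimated input cardinality.
  def planCost(inputRows: Double, ops: Seq[Op]): Double =
    ops.map(op => inputRows * op.perRowCost).sum

  // Two alternative downstream plans, one far more expensive per row.
  val cheapPlan: Seq[Op] = Seq(Op("map", 1.0))
  val expensivePlan: Seq[Op] = Seq(Op("map", 1.0), Op("crossJoin", 1000.0))
}
```

In this model an estimate of 0 rows gives both plans a cost of 0, so a cost-based optimizer has no basis for preferring the cheap one, whereas an estimate of 1 preserves the ordering between them.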
[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...
GitHub user beyond1920 opened a pull request:

    https://github.com/apache/flink/pull/3058

    [FLINK-5394] [Table API & SQL]the estimateRowCount method of DataSetCalc didn't work

    This PR fixes the bug reported in https://issues.apache.org/jira/browse/FLINK-5394.
    The main changes are:
    1. Add FlinkRelMdRowCount and FlinkDefaultRelMetadataProvider to override getRowCount for some Flink RelNodes.
    2. Add a getRowCount method in DataSetSort to provide a more accurate estimate.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alibaba/flink flink-5394

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3058.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3058

commit 8099920fb8759ed1068e7b8153816a7b63089e45
Author: beyond1920
Date:   2016-12-29T07:52:17Z

    the estimateRowCount method of DataSetCalc didn't work now, fix it