[jira] [Comment Edited] (FLINK-4758) Remove IOReadableWritable from classes where not needed
[ https://issues.apache.org/jira/browse/FLINK-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782230#comment-15782230 ] jiwengang edited comment on FLINK-4758 at 12/28/16 6:53 AM: Just remove the interface? Do the implementing classes still have to keep the methods from *IOReadableWritable*? was (Author: jeffreyji666): just remove the interface? Does the implementation classes still have to keep the method in **IOReadableWritable**? > Remove IOReadableWritable from classes where not needed > --- > > Key: FLINK-4758 > URL: https://issues.apache.org/jira/browse/FLINK-4758 > Project: Flink > Issue Type: Sub-task > Components: Core >Affects Versions: 1.2.0 >Reporter: Stephan Ewen > Fix For: 2.0.0 > > > Many classes implement for historic reasons the {{IOReadableWritable}} > interface, where it is not needed any more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4758) Remove IOReadableWritable from classes where not needed
[ https://issues.apache.org/jira/browse/FLINK-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782230#comment-15782230 ] jiwengang commented on FLINK-4758: -- Just remove the interface? Do the implementing classes still have to keep the methods from IOReadableWritable? > Remove IOReadableWritable from classes where not needed > --- > > Key: FLINK-4758 > URL: https://issues.apache.org/jira/browse/FLINK-4758 > Project: Flink > Issue Type: Sub-task > Components: Core >Affects Versions: 1.2.0 >Reporter: Stephan Ewen > Fix For: 2.0.0 > > > Many classes implement for historic reasons the {{IOReadableWritable}} > interface, where it is not needed any more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
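[Editor's note] On the question above: dropping the interface does not force the implementing classes to drop their serialization methods; they can keep `read`/`write` as plain methods with the same shape. A minimal sketch, not Flink code: the class name is made up, and `java.io.DataInput`/`DataOutput` stand in for Flink's `DataInputView`/`DataOutputView`.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical example class. After removing "implements IOReadableWritable",
// the write/read pair can remain as ordinary methods, and callers that used
// them directly keep working; only polymorphic use via the interface goes away.
class SerializableConfig {

    private String name = "";
    private int parallelism;

    public void set(String name, int parallelism) {
        this.name = name;
        this.parallelism = parallelism;
    }

    // Formerly mandated by IOReadableWritable, now just a plain method.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(parallelism);
    }

    // Counterpart to write(); restores the fields in the same order.
    public void read(DataInput in) throws IOException {
        name = in.readUTF();
        parallelism = in.readInt();
    }

    public String getName() { return name; }
    public int getParallelism() { return parallelism; }
}
```

The round-trip behaves exactly as before; only the `implements` clause disappears.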
[GitHub] flink issue #3048: Clarified the import path of the Breeze DenseVector
Github user Fokko commented on the issue: https://github.com/apache/flink/pull/3048 Locally the tests just pass. Looking at the error logs, it doesn't have to do with the changes in the PR, for example:
```
java.io.FileNotFoundException: build-target/lib/flink-dist-*.jar (No such file or directory)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:220)
	at java.util.zip.ZipFile.<init>(ZipFile.java:150)
	at java.util.zip.ZipFile.<init>(ZipFile.java:121)
	at sun.tools.jar.Main.list(Main.java:1060)
	at sun.tools.jar.Main.run(Main.java:291)
	at sun.tools.jar.Main.main(Main.java:1233)
find: `./flink-yarn-tests/target/flink-yarn-tests*': No such file or directory
```
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Comment Edited] (FLINK-4920) Add a Scala Function Gauge
[ https://issues.apache.org/jira/browse/FLINK-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781253#comment-15781253 ] Pattarawat Chormai edited comment on FLINK-4920 at 12/27/16 9:04 PM: - [~Zentol] Thanks for your suggestion. [~StephanEwen] could you please assign me to the issue? was (Author: heytitle): @Chesney Thanks for your suggestion. @Stephan, could you please assign me to the issue? > Add a Scala Function Gauge > -- > > Key: FLINK-4920 > URL: https://issues.apache.org/jira/browse/FLINK-4920 > Project: Flink > Issue Type: Improvement > Components: Metrics, Scala API >Reporter: Stephan Ewen > Labels: easyfix, starter > > A useful metrics utility for the Scala API would be to add a Gauge that > obtains its value by calling a Scala Function0. > That way, one can add Gauges in Scala programs using Scala lambda notation or > function references. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4920) Add a Scala Function Gauge
[ https://issues.apache.org/jira/browse/FLINK-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781253#comment-15781253 ] Pattarawat Chormai commented on FLINK-4920: --- @Chesney Thanks for your suggestion. @Stephan, could you please assign me to the issue? > Add a Scala Function Gauge > -- > > Key: FLINK-4920 > URL: https://issues.apache.org/jira/browse/FLINK-4920 > Project: Flink > Issue Type: Improvement > Components: Metrics, Scala API >Reporter: Stephan Ewen > Labels: easyfix, starter > > A useful metrics utility for the Scala API would be to add a Gauge that > obtains its value by calling a Scala Function0. > That way, one can add Gauges in Scala programs using Scala lambda notation or > function references. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
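[Editor's note] The steps described in the issue can be sketched as follows. This is illustrative only: the `Gauge` interface below is a local stand-in mirroring `org.apache.flink.metrics.Gauge` (which declares `T getValue()`), and in the actual Scala API the constructor argument would be a `scala.Function0[T]`; `java.util.function.Supplier` plays that role here.

```java
import java.util.function.Supplier;

// Stand-in for org.apache.flink.metrics.Gauge.
interface Gauge<T> {
    T getValue();
}

// Sketch of the proposed utility: a gauge whose value is obtained by
// calling a function each time the metric is reported. In Scala this lets
// users register gauges with lambda notation or function references,
// e.g. ScalaGauge(() => queue.size).
class FunctionGauge<T> implements Gauge<T> {

    private final Supplier<T> fun;

    FunctionGauge(Supplier<T> fun) {
        this.fun = fun;
    }

    @Override
    public T getValue() {
        return fun.get(); // re-evaluated on every call, so the value stays current
    }
}
```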
[jira] [Commented] (FLINK-5280) Extend TableSource to support nested data
[ https://issues.apache.org/jira/browse/FLINK-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780914#comment-15780914 ] ASF GitHub Bot commented on FLINK-5280: --- Github user mushketyk commented on the issue: https://github.com/apache/flink/pull/3039 Hi @fhueske, @wuchong Thank you for your reviews. I think creating three abstract classes is a good idea, since we don't expect any new types of table sources, so there will not be a lot of combinations. I'll try to update the PR tomorrow according to all current comments. > Extend TableSource to support nested data > - > > Key: FLINK-5280 > URL: https://issues.apache.org/jira/browse/FLINK-5280 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Affects Versions: 1.2.0 >Reporter: Fabian Hueske >Assignee: Ivan Mushketyk > > The {{TableSource}} interface does currently only support the definition of > flat rows. > However, there are several storage formats for nested data that should be > supported such as Avro, Json, Parquet, and Orc. The Table API and SQL can > also natively handle nested rows. > The {{TableSource}} interface and the code to register table sources in > Calcite's schema need to be extended to support nested data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink issue #3039: [FLINK-5280] Update TableSource to support nested data
Github user mushketyk commented on the issue: https://github.com/apache/flink/pull/3039 Hi @fhueske, @wuchong Thank you for your reviews. I think creating three abstract classes is a good idea, since we don't expect any new types of table sources, so there will not be a lot of combinations. I'll try to update the PR tomorrow according to all current comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5280) Extend TableSource to support nested data
[ https://issues.apache.org/jira/browse/FLINK-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780667#comment-15780667 ] ASF GitHub Bot commented on FLINK-5280: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3039 Thanks for working on this @mushketyk and the reviews @wuchong. I just added a comment regarding the Scala trait with implemented methods. I'll do a more thorough review in the next days. Thanks, Fabian > Extend TableSource to support nested data > - > > Key: FLINK-5280 > URL: https://issues.apache.org/jira/browse/FLINK-5280 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Affects Versions: 1.2.0 >Reporter: Fabian Hueske >Assignee: Ivan Mushketyk > > The {{TableSource}} interface does currently only support the definition of > flat rows. > However, there are several storage formats for nested data that should be > supported such as Avro, Json, Parquet, and Orc. The Table API and SQL can > also natively handle nested rows. > The {{TableSource}} interface and the code to register table sources in > Calcite's schema need to be extended to support nested data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink issue #3039: [FLINK-5280] Update TableSource to support nested data
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3039 Thanks for working on this @mushketyk and the reviews @wuchong. I just added a comment regarding the Scala trait with implemented methods. I'll do a more thorough review in the next days. Thanks, Fabian --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #3039: [FLINK-5280] Update TableSource to support nested ...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3039#discussion_r93943868
--- Diff: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaTableSource.java ---
@@ -111,24 +112,17 @@
 		return kafkaSource;
 	}
 
-	@Override
-	public int getNumberOfFields() {
-		return fieldNames.length;
-	}
-
-	@Override
 	public String[] getFieldsNames() {
-		return fieldNames;
+		return TableSource$class.getFieldsNames(this);
--- End diff --
Hi, I'm not sure about implementing this as a Scala trait with implemented methods. IMO, this makes it much harder to implement TableSources in Java (esp. for users who are not familiar with Scala and its implications). What do you think about implementing `TableSource` as an abstract class and providing three other abstract classes that extend `TableSource` with the batch, the stream, and both interfaces?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
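[Editor's note] The abstract-class structure proposed in this review comment could look roughly as below. All names, signatures, and the placeholder return types are illustrative, not Flink's actual API; the point is that shared behavior lives in a base class that is equally usable from Java and Scala, instead of in a Scala trait with implemented methods.

```java
// Hypothetical base class replacing the Scala trait.
abstract class TableSourceSketch {

    public abstract String[] getFieldsNames();

    // Shared default behavior: derived from the abstract method, so
    // subclasses in Java or Scala inherit it without trait machinery.
    public int getNumberOfFields() {
        return getFieldsNames().length;
    }
}

// One abstract subclass per execution mode, as suggested in the comment.
abstract class BatchTableSourceSketch extends TableSourceSketch {
    // Placeholder for DataSet<T> in the real API.
    public abstract Iterable<Object[]> getDataSet();
}

abstract class StreamTableSourceSketch extends TableSourceSketch {
    // Placeholder for DataStream<T> in the real API.
    public abstract Iterable<Object[]> getDataStream();
}
```

A Java implementor then only extends the variant it needs and overrides the abstract methods.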
[jira] [Commented] (FLINK-5395) support locally build distribution by script create_release_files.sh
[ https://issues.apache.org/jira/browse/FLINK-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780661#comment-15780661 ] ASF GitHub Bot commented on FLINK-5395: --- GitHub user shijinkui opened a pull request: https://github.com/apache/flink/pull/3049 [FLINK-5395] [Build System] support locally build distribution by script create_release_files.sh
create_release_files.sh builds only the Flink release. It's hard to build a custom local Flink release distribution. Let create_release_files.sh support:
1. custom git repo URL
2. building a specific Scala and Hadoop version
3. adding `tools/flink` to .gitignore
4. adding a usage message
- [X] General
  - The pull request references the related JIRA issue ("[FLINK-5395] support locally build distribution by script create_release_files.sh")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)
- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed
You can merge this pull request into a Git repository by running: $ git pull https://github.com/shijinkui/flink FLINK-5395 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3049.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3049 commit 3b41c0942ef7ddd5921a32afbee2133392a594b7 Author: shijinkui Date: 2016-12-27T15:51:10Z [FLINK-5395] [Build System] support locally build distribution by script create_release_files.sh > support locally build distribution by script create_release_files.sh > > > Key: FLINK-5395 > URL: https://issues.apache.org/jira/browse/FLINK-5395 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: shijinkui > > create_release_files.sh builds only the Flink release. It's hard to build > a custom local Flink release distribution. > Let create_release_files.sh support: > 1. custom git repo URL > 2. building a specific Scala and Hadoop version > 3. adding `tools/flink` to .gitignore > 4. adding a usage message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5280) Extend TableSource to support nested data
[ https://issues.apache.org/jira/browse/FLINK-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780663#comment-15780663 ] ASF GitHub Bot commented on FLINK-5280: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3039#discussion_r93943868
--- Diff: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaTableSource.java ---
@@ -111,24 +112,17 @@
 		return kafkaSource;
 	}
 
-	@Override
-	public int getNumberOfFields() {
-		return fieldNames.length;
-	}
-
-	@Override
 	public String[] getFieldsNames() {
-		return fieldNames;
+		return TableSource$class.getFieldsNames(this);
--- End diff --
Hi, I'm not sure about implementing this as a Scala trait with implemented methods. IMO, this makes it much harder to implement TableSources in Java (esp. for users who are not familiar with Scala and its implications). What do you think about implementing `TableSource` as an abstract class and providing three other abstract classes that extend `TableSource` with the batch, the stream, and both interfaces?
> Extend TableSource to support nested data > - > > Key: FLINK-5280 > URL: https://issues.apache.org/jira/browse/FLINK-5280 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Affects Versions: 1.2.0 >Reporter: Fabian Hueske >Assignee: Ivan Mushketyk > > The {{TableSource}} interface does currently only support the definition of > flat rows. > However, there are several storage formats for nested data that should be > supported such as Avro, Json, Parquet, and Orc. The Table API and SQL can > also natively handle nested rows. > The {{TableSource}} interface and the code to register table sources in > Calcite's schema need to be extended to support nested data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #3049: [FLINK-5395] [Build System] support locally build ...
GitHub user shijinkui opened a pull request: https://github.com/apache/flink/pull/3049 [FLINK-5395] [Build System] support locally build distribution by script create_release_files.sh
create_release_files.sh builds only the Flink release. It's hard to build a custom local Flink release distribution. Let create_release_files.sh support:
1. custom git repo URL
2. building a specific Scala and Hadoop version
3. adding `tools/flink` to .gitignore
4. adding a usage message
- [X] General
  - The pull request references the related JIRA issue ("[FLINK-5395] support locally build distribution by script create_release_files.sh")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)
- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed
You can merge this pull request into a Git repository by running: $ git pull https://github.com/shijinkui/flink FLINK-5395 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3049.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3049 commit 3b41c0942ef7ddd5921a32afbee2133392a594b7 Author: shijinkui Date: 2016-12-27T15:51:10Z [FLINK-5395] [Build System] support locally build distribution by script create_release_files.sh --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-5395) support locally build distribution by script create_release_files.sh
[ https://issues.apache.org/jira/browse/FLINK-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shijinkui updated FLINK-5395: - Description: create_release_files.sh builds only the Flink release. It's hard to build a custom local Flink release distribution. Let create_release_files.sh support: 1. custom git repo URL 2. building a specific Scala and Hadoop version 3. adding `tools/flink` to .gitignore 4. adding a usage message was: create_release_files.sh is build flink release only. It's hard to build custom local Flink release distribution. Let create_release_files.sh support: 1. custom git repo url 2. custom build special scala and hadoop version 3. fix flink-dist opt.xml have no replace the scala version by change-scala-version.sh > support locally build distribution by script create_release_files.sh > > > Key: FLINK-5395 > URL: https://issues.apache.org/jira/browse/FLINK-5395 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: shijinkui > > create_release_files.sh builds only the Flink release. It's hard to build > a custom local Flink release distribution. > Let create_release_files.sh support: > 1. custom git repo URL > 2. building a specific Scala and Hadoop version > 3. adding `tools/flink` to .gitignore > 4. adding a usage message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #3048: Clarified the import path of the Breeze DenseVecto...
GitHub user Fokko opened a pull request: https://github.com/apache/flink/pull/3048 Clarified the import path of the Breeze DenseVector Guys, I'm working on an extension of the ML library in Flink, but I stumbled upon this. Since it is such a trivial fix, I didn't create a JIRA issue. Keep up the good work! Cheers, You can merge this pull request into a Git repository by running: $ git pull https://github.com/Fokko/flink fd-cleanup-package-structure Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3048.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3048 commit 3fd38fe9785d607a05d045cd54a05af9ed48e350 Author: Fokko Driesprong Date: 2016-12-27T14:43:15Z Replaced the full import path with the BreezeDenseVector itself to make it more readable --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-5376) Misleading log statements in UnorderedStreamElementQueue
[ https://issues.apache.org/jira/browse/FLINK-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated FLINK-5376: -- Description: The following are two examples where ordered stream element queue is mentioned:
{code}
LOG.debug("Put element into ordered stream element queue. New filling degree " +
	"({}/{}).", numberEntries, capacity);
	return true;
} else {
	LOG.debug("Failed to put element into ordered stream element queue because it " +
{code}
I guess OrderedStreamElementQueue was coded first. was: The following are two examples where ordered stream element queue is mentioned:
{code}
LOG.debug("Put element into ordered stream element queue. New filling degree " +
	"({}/{}).", numberEntries, capacity);
	return true;
} else {
	LOG.debug("Failed to put element into ordered stream element queue because it " +
{code}
I guess OrderedStreamElementQueue was coded first. > Misleading log statements in UnorderedStreamElementQueue > > > Key: FLINK-5376 > URL: https://issues.apache.org/jira/browse/FLINK-5376 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > The following are two examples where ordered stream element queue is > mentioned: > {code} > LOG.debug("Put element into ordered stream element queue. New filling > degree " + > "({}/{}).", numberEntries, capacity); > return true; > } else { > LOG.debug("Failed to put element into ordered stream element queue > because it " + > {code} > I guess OrderedStreamElementQueue was coded first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3849) Add FilterableTableSource interface and translation rule
[ https://issues.apache.org/jira/browse/FLINK-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780544#comment-15780544 ] Fabian Hueske commented on FLINK-3849: -- Do you mean we need a single rule for pushing projection and filters into a {{BatchTableSourceScan}}, so basically extending the existing {{PushProjectIntoBatchTableSourceScanRule}} and {{PushProjectIntoStreamTableSourceScanRule}}? Can you explain why it would not be possible to have two separate rules? Thanks, Fabian > Add FilterableTableSource interface and translation rule > > > Key: FLINK-3849 > URL: https://issues.apache.org/jira/browse/FLINK-3849 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Reporter: Fabian Hueske >Assignee: Anton Solovev > > Add a {{FilterableTableSource}} interface for {{TableSource}} implementations > which support filter push-down. > The interface could look as follows > {code} > trait FilterableTableSource { > // returns unsupported predicate expression > def setPredicate(predicate: Expression): Expression > } > {code} > In addition we need Calcite rules to push a predicate (or parts of it) into a > TableScan that refers to a {{FilterableTableSource}}. We might need to tweak > the cost model as well to push the optimizer in the right direction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
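[Editor's note] The push-down contract from the issue (accept the predicates you can evaluate, hand back what you cannot, so the planner keeps a Filter node for the rest) can be sketched as below. This is purely illustrative: plain `String` predicates stand in for the planner's `Expression` trees, and a list-based variant of the issue's single-`Expression` signature is used to make the supported/unsupported split visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogue of FilterableTableSource.
interface FilterableSource {
    // Returns the predicates this source could NOT push down.
    List<String> setPredicates(List<String> predicates);
}

class CsvSourceSketch implements FilterableSource {

    // Predicates the source will evaluate itself while reading.
    final List<String> pushedDown = new ArrayList<>();

    @Override
    public List<String> setPredicates(List<String> predicates) {
        List<String> unsupported = new ArrayList<>();
        for (String p : predicates) {
            // Assumption for the sketch: this source only handles equality filters.
            if (p.contains("=")) {
                pushedDown.add(p);
            } else {
                unsupported.add(p); // the translation rule keeps these in a Filter
            }
        }
        return unsupported;
    }
}
```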
[jira] [Commented] (FLINK-5358) Support RowTypeInfo extraction in TypeExtractor by Row instance
[ https://issues.apache.org/jira/browse/FLINK-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780533#comment-15780533 ] ASF GitHub Bot commented on FLINK-5358: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3027 Thanks @tonycox. PR is good to merge. > Support RowTypeInfo extraction in TypeExtractor by Row instance > --- > > Key: FLINK-5358 > URL: https://issues.apache.org/jira/browse/FLINK-5358 > Project: Flink > Issue Type: Improvement >Reporter: Anton Solovev >Assignee: Anton Solovev > > {code} > Row[] data = new Row[]{}; > TypeInformation typeInfo = TypeExtractor.getForObject(data[0]); > {code} > method {{getForObject}} wraps it into > {code} > GenericTypeInfo > {code} > the method should return {{RowTypeInfo}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink issue #3027: [FLINK-5358] add RowTypeInfo exctraction in TypeExtractor
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3027 Thanks @tonycox. PR is good to merge. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #3027: [FLINK-5358] add RowTypeInfo exctraction in TypeEx...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3027#discussion_r93937117 --- Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java --- @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); } - + @Test + public void testRow() { --- End diff -- Oh, I'm sorry. I overlooked that check. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5358) Support RowTypeInfo extraction in TypeExtractor by Row instance
[ https://issues.apache.org/jira/browse/FLINK-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780528#comment-15780528 ] ASF GitHub Bot commented on FLINK-5358: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3027#discussion_r93937117 --- Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java --- @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); } - + @Test + public void testRow() { --- End diff -- Oh, I'm sorry. I overlooked that check. > Support RowTypeInfo extraction in TypeExtractor by Row instance > --- > > Key: FLINK-5358 > URL: https://issues.apache.org/jira/browse/FLINK-5358 > Project: Flink > Issue Type: Improvement >Reporter: Anton Solovev >Assignee: Anton Solovev > > {code} > Row[] data = new Row[]{}; > TypeInformation typeInfo = TypeExtractor.getForObject(data[0]); > {code} > method {{getForObject}} wraps it into > {code} > GenericTypeInfo > {code} > the method should return {{RowTypeInfo}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
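[Editor's note] The behavior the issue asks for is a type-dispatch on the sample object: a `Row` instance should yield a `RowTypeInfo` built from its field values instead of an opaque `GenericTypeInfo`. A minimal sketch follows; every class here is a simplified stand-in, none of it is Flink's actual implementation.

```java
// Stand-in for org.apache.flink.types.Row: just an array of field values.
class Row {
    final Object[] fields;
    Row(Object... fields) { this.fields = fields; }
}

abstract class TypeInfo {}

// Fallback wrapper, analogous to GenericTypeInfo<T>.
class GenericTypeInfo extends TypeInfo {
    final Class<?> clazz;
    GenericTypeInfo(Class<?> clazz) { this.clazz = clazz; }
}

// Structured row type, analogous to RowTypeInfo.
class RowTypeInfo extends TypeInfo {
    final Class<?>[] fieldTypes;
    RowTypeInfo(Class<?>[] fieldTypes) { this.fieldTypes = fieldTypes; }
}

class ExtractorSketch {
    static TypeInfo getForObject(Object value) {
        if (value instanceof Row) {
            // Derive per-field types from the sample row's values.
            Object[] fields = ((Row) value).fields;
            Class<?>[] types = new Class<?>[fields.length];
            for (int i = 0; i < fields.length; i++) {
                types[i] = fields[i] == null ? Object.class : fields[i].getClass();
            }
            return new RowTypeInfo(types);
        }
        // Previous behavior: wrap anything unrecognized generically.
        return new GenericTypeInfo(value.getClass());
    }
}
```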
[jira] [Commented] (FLINK-5392) flink-dist build failed when change scala version to 2.11
[ https://issues.apache.org/jira/browse/FLINK-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780523#comment-15780523 ] Fabian Hueske commented on FLINK-5392: -- There is another PR addressing this issue: https://github.com/apache/flink/pull/3047 > flink-dist build failed when change scala version to 2.11 > - > > Key: FLINK-5392 > URL: https://issues.apache.org/jira/browse/FLINK-5392 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.3.0 > Environment: jdk 1.8.112 >Reporter: 刘喆 > > When using > tools/change-scala-version.sh 2.11, the build will fail at flink-dist, > because some Scala version strings are hard-coded in > flink-dist/src/main/assemblies/opt.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5396) flink-dist replace scala version in opt.xml by change-scala-version.sh
[ https://issues.apache.org/jira/browse/FLINK-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-5396. Resolution: Duplicate Duplicate of FLINK-5392. I'll add a link to PR [#3047|https://github.com/apache/flink/pull/3047] to FLINK-5392. > flink-dist replace scala version in opt.xml by change-scala-version.sh > -- > > Key: FLINK-5396 > URL: https://issues.apache.org/jira/browse/FLINK-5396 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: shijinkui > > flink-dist has the Scala version replacement configured for bin.xml, but not for opt.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4088) Add interface to save and load TableSources
[ https://issues.apache.org/jira/browse/FLINK-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780452#comment-15780452 ] Fabian Hueske commented on FLINK-4088: -- A Scala trait can be used as a Java interface if it does not have member variables and implemented methods. It would be good if the Scala trait fulfilled these conditions. > Add interface to save and load TableSources > --- > > Key: FLINK-4088 > URL: https://issues.apache.org/jira/browse/FLINK-4088 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Affects Versions: 1.1.0 >Reporter: Fabian Hueske >Assignee: Minudika Malshan > > Add an interface to save and load table sources similar to Java's > {{Serializable}} interface. > TableSources should implement the interface to become saveable and loadable. > This could be used as follows: > {code} > val cts = new CsvTableSource( > "/path/to/csvfile", > Array("name", "age", "address"), > Array(BasicTypeInfo.STRING_TYPEINFO, ...), > ... > ) > cts.saveToFile("/path/to/tablesource/file") > // - > val tEnv: TableEnvironment = ??? > tEnv.loadTableSource("persons", "/path/to/tablesource/file") > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
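[Editor's note] One way the save/load behavior from the issue could be realized is plain Java serialization, which also matches the comparison to `Serializable` above. A minimal sketch; the class name, fields, and method names are illustrative, not the interface the issue will actually define.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical table source that is saveable/loadable via serialization.
class CsvTableSourceSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final String path;
    final String[] fieldNames;

    CsvTableSourceSketch(String path, String[] fieldNames) {
        this.path = path;
        this.fieldNames = fieldNames;
    }

    // Corresponds to cts.saveToFile(...) in the issue's example.
    void saveToFile(String file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Counterpart used by something like tEnv.loadTableSource(name, file).
    static CsvTableSourceSketch loadFromFile(String file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (CsvTableSourceSketch) in.readObject();
        }
    }
}
```

The open question in the comment remains: if the Scala trait carries member variables or implemented methods, Java implementors cannot treat it as a plain interface, so a design like the sketch would need an abstract class instead.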
[jira] [Commented] (FLINK-5396) flink-dist replace scala version in opt.xml by change-scala-version.sh
[ https://issues.apache.org/jira/browse/FLINK-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780308#comment-15780308 ] ASF GitHub Bot commented on FLINK-5396: --- GitHub user shijinkui opened a pull request: https://github.com/apache/flink/pull/3047 [FLINK-5396] [Build System] flink-dist replace scala version in opt.x…
flink-dist has the Scala version replacement configured for bin.xml, but not for opt.xml
- [X] General
  - The pull request references the related JIRA issue ("[FLINK-5396] flink-dist replace scala version in opt.xml by change-scala-version.sh")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)
- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed
You can merge this pull request into a Git repository by running: $ git pull https://github.com/shijinkui/flink FLINK-5396 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3047.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3047 commit d8dc5015079dd0780c1831aa8b52d82e5a9bbbed Author: shijinkui Date: 2016-12-27T12:29:16Z [FLINK-5396] [Build System] flink-dist replace scala version in opt.xml by change-scala-version.sh > flink-dist replace scala version in opt.xml by change-scala-version.sh > -- > > Key: FLINK-5396 > URL: https://issues.apache.org/jira/browse/FLINK-5396 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: shijinkui > > flink-dist has the Scala version replacement configured for bin.xml, but not for opt.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5396) flink-dist replace scala version in opt.xml by change-scala-version.sh
[ https://issues.apache.org/jira/browse/FLINK-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shijinkui updated FLINK-5396: - Component/s: Build System > flink-dist replace scala version in opt.xml by change-scala-version.sh > -- > > Key: FLINK-5396 > URL: https://issues.apache.org/jira/browse/FLINK-5396 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: shijinkui > > flink-dist has been configured to replace the Scala version in bin.xml, but not in opt.xml
[jira] [Created] (FLINK-5396) flink-dist replace scala version in opt.xml by change-scala-version.sh
shijinkui created FLINK-5396: Summary: flink-dist replace scala version in opt.xml by change-scala-version.sh Key: FLINK-5396 URL: https://issues.apache.org/jira/browse/FLINK-5396 Project: Flink Issue Type: Improvement Reporter: shijinkui flink-dist has been configured to replace the Scala version in bin.xml, but not in opt.xml
[jira] [Created] (FLINK-5395) support locally build distribution by script create_release_files.sh
shijinkui created FLINK-5395: Summary: support locally build distribution by script create_release_files.sh Key: FLINK-5395 URL: https://issues.apache.org/jira/browse/FLINK-5395 Project: Flink Issue Type: Improvement Components: Build System Reporter: shijinkui create_release_files.sh builds the official Flink release only, which makes it hard to build a custom local Flink release distribution. Let create_release_files.sh support: 1. a custom git repo url 2. building against a specific Scala and Hadoop version 3. fixing flink-dist's opt.xml, whose Scala version is not replaced by change-scala-version.sh
[jira] [Updated] (FLINK-5394) the estimateRowCount method of DataSetCalc didn't work
[ https://issues.apache.org/jira/browse/FLINK-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangjing updated FLINK-5394: - Description: The estimateRowCount method of DataSetCalc does not work at present. If I run the following code, ` Table table = tableEnv .fromDataSet(data, "a, b, c") .groupBy("a") .select("a, a.avg, b.sum, c.count") .where("a == 1"); ` the cost of every node in the optimized node tree is: ` DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io} DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io} DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io} ` We expect the input rowcount of DataSetAggregate to be less than 1000; however, the actual input rowcount is still 1000 because the estimateRowCount method of DataSetCalc does not work. There are two reasons for this: 1. No custom metadataProvider is provided yet, so when DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to estimate its input rowcount, the call dispatches to RelMdRowCount. 2. DataSetCalc is a subclass of SingleRel, so the previous call matches getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses DataSetCalc.estimateRowCount. The same problem appears for all Flink RelNodes that are subclasses of SingleRel. I plan to resolve this by adding a FlinkRelMdRowCount which contains specific getRowCount implementations for Flink RelNodes. was: The estimateRowCount method of DataSetCalc didn't work now. If I run the following code, ` Table table = tableEnv .fromDataSet(data, "a, b, c") .groupBy("a") .select("a, a.avg, b.sum, c.count") .where("a == 1"); ` the cost of every node in Optimized node tree is: ` DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io} DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io} DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io} ` We expect the input rowcount of DataSetAggregate less than 1000, however the actual input rowcount is still 1000 because the estimateRowCount method of DataSetCalc didn't work. There are two reasons caused to this: 1. Didn't provide custom metadataProvider yet. So when DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to estimate its input rowcount which would dispatch to RelMdRowCount. 2. DataSetCalc is subclass of SingleRel. So previous function call would match getRowCount(SingleRel rel, RelMetadataQuery mq) which would never use DataSetCalc.estimateRowCount. I plan to resolve this problem by adding a FlinkRelMdRowCount which contains specific getRowCount of Flink RelNodes. > the estimateRowCount method of DataSetCalc didn't work > -- > > Key: FLINK-5394 > URL: https://issues.apache.org/jira/browse/FLINK-5394 > Project: Flink > Issue Type: Bug > Components: Table API & SQL >Reporter: zhangjing >Assignee: zhangjing
[jira] [Created] (FLINK-5394) the estimateRowCount method of DataSetCalc didn't work
zhangjing created FLINK-5394: Summary: the estimateRowCount method of DataSetCalc didn't work Key: FLINK-5394 URL: https://issues.apache.org/jira/browse/FLINK-5394 Project: Flink Issue Type: Bug Components: Table API & SQL Reporter: zhangjing Assignee: zhangjing The estimateRowCount method of DataSetCalc does not work at present. If I run the following code, ` Table table = tableEnv .fromDataSet(data, "a, b, c") .groupBy("a") .select("a, a.avg, b.sum, c.count") .where("a == 1"); ` the cost of every node in the optimized node tree is: ` DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}, id = 28 DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}, id = 27 DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}, id = 20 ` We expect the input rowcount of DataSetAggregate to be less than 1000; however, the actual input rowcount is still 1000 because the estimateRowCount method of DataSetCalc does not work. There are two reasons for this: 1. When DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to estimate its input rowcount, the call dispatches to RelMdRowCount. 2. DataSetCalc is a subclass of SingleRel, so the previous call matches getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses DataSetCalc.estimateRowCount. I plan to resolve this by adding a FlinkRelMdRowCount which contains specific getRowCount implementations for Flink RelNodes.
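The dispatch failure described in FLINK-5394 can be sketched in plain Java. This is a simplified, self-contained model, not Calcite's actual metadata API: the class names mirror the real ones, but the signatures, the instanceof-based dispatch, and the 0.5 filter selectivity are illustrative assumptions. The generic handler treats every SingleRel as a pass-through, so the subclass's own estimate is never consulted; a Flink-specific handler consulted first fixes that.

```java
// Simplified model of the FLINK-5394 dispatch problem (NOT Calcite's real API;
// class names mirror the real ones, everything else is illustrative).
class SingleRel {
    final double inputRowCount;
    SingleRel(double inputRowCount) { this.inputRowCount = inputRowCount; }
}

class DataSetCalc extends SingleRel {
    final double selectivity; // fraction of rows the predicate keeps (assumed value)
    DataSetCalc(double inputRowCount, double selectivity) {
        super(inputRowCount);
        this.selectivity = selectivity;
    }
    double estimateRowCount() { return inputRowCount * selectivity; }
}

class RelMdRowCount {
    // Generic handler: any SingleRel passes its input row count through unchanged,
    // so DataSetCalc.estimateRowCount is never consulted -- the reported bug.
    static double getRowCount(SingleRel rel) { return rel.inputRowCount; }
}

class FlinkRelMdRowCount {
    // Proposed Flink-specific handler: use the node's own estimate when it has one,
    // then fall back to the generic rule.
    static double getRowCount(SingleRel rel) {
        if (rel instanceof DataSetCalc) {
            return ((DataSetCalc) rel).estimateRowCount();
        }
        return RelMdRowCount.getRowCount(rel);
    }
}

public class RowCountDemo {
    public static void main(String[] args) {
        // 1000 input rows; assume "where a == 1" keeps half of them
        DataSetCalc calc = new DataSetCalc(1000.0, 0.5);
        System.out.println(RelMdRowCount.getRowCount(calc));      // 1000.0, filter ignored
        System.out.println(FlinkRelMdRowCount.getRowCount(calc)); // 500.0, filter applied
    }
}
```

With the specific handler in place, DataSetAggregate would see the reduced row count of its DataSetCalc input instead of the raw scan cardinality.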
[GitHub] flink pull request #3036: [FLINK-5368] Throw exception if kafka topic doesn'...
GitHub user HungUnicorn reopened a pull request: https://github.com/apache/flink/pull/3036 [FLINK-5368] Throw exception if kafka topic doesn't exist As a developer reading data from many topics, I want the Kafka consumer to show something if any topic is not available. The motivation is that we read many topics as a list at one time, and sometimes we fail to recognize that one or two topics' names have been changed or deprecated, and the Flink Kafka connector doesn't show the error. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HungUnicorn/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3036.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3036 commit b632edead770fd8386a65b6f67c739ad9c280a7c Author: HungUnicorn Date: 2016-12-27T10:37:30Z [FLINK-5368] log msg if kafka topic doesn't have any partitions --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5368) Let Kafka consumer show something when it fails to read one topic out of topic list
[ https://issues.apache.org/jira/browse/FLINK-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780108#comment-15780108 ] ASF GitHub Bot commented on FLINK-5368: --- Github user HungUnicorn closed the pull request at: https://github.com/apache/flink/pull/3036 > Let Kafka consumer show something when it fails to read one topic out of > topic list > --- > > Key: FLINK-5368 > URL: https://issues.apache.org/jira/browse/FLINK-5368 > Project: Flink > Issue Type: Improvement > Components: Kafka Connector >Reporter: Sendoh >Assignee: Sendoh >Priority: Critical > > As a developer reading data from many topics, I want the Kafka consumer to > show something if any topic is not available. The motivation is that we read > many topics as a list at one time, and sometimes we fail to recognize that one > or two topics' names have been changed or deprecated, and the Flink Kafka > connector doesn't show the error. > My proposed change would be either to throw a RuntimeException or to use > LOG.error(topic + "doesn't have any partition") if partitionsForTopic is null > at this function: > https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer09.java#L208 > Any suggestion is welcome.
[GitHub] flink pull request #3036: [FLINK-5368] Throw exception if kafka topic doesn'...
Github user HungUnicorn closed the pull request at:

    https://github.com/apache/flink/pull/3036

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Commented] (FLINK-5368) Let Kafka consumer show something when it fails to read one topic out of topic list
[ https://issues.apache.org/jira/browse/FLINK-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780093#comment-15780093 ]

ASF GitHub Bot commented on FLINK-5368:
---------------------------------------

Github user HungUnicorn commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3036#discussion_r93916373

--- Diff: flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer09.java ---
@@ -208,13 +208,12 @@ public FlinkKafkaConsumer09(List<String> topics, KeyedDeserializationSchema<T> d
 			if (partitionsForTopic != null) {
 				partitions.addAll(convertToFlinkKafkaTopicPartition(partitionsForTopic));
 			}
+			else {
+				throw new RuntimeException("Unable to retrieve any partitions for the requested topics " + topic);
--- End diff --

Agree. I will change the exception message to an INFO log message.
[GitHub] flink pull request #3036: [FLINK-5368] Throw exception if kafka topic doesn'...
Github user HungUnicorn commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3036#discussion_r93916373

--- Diff: flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer09.java ---
@@ -208,13 +208,12 @@ public FlinkKafkaConsumer09(List<String> topics, KeyedDeserializationSchema<T> d
 			if (partitionsForTopic != null) {
 				partitions.addAll(convertToFlinkKafkaTopicPartition(partitionsForTopic));
 			}
+			else {
+				throw new RuntimeException("Unable to retrieve any partitions for the requested topics " + topic);
--- End diff --

Agree. I will change the exception message to an INFO log message.
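The review discussion above converges on surfacing topics with no partition metadata instead of skipping them silently. A minimal standalone sketch of that check, with hypothetical names (not the actual FlinkKafkaConsumer09 code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the behavior discussed in the PR: given the
// requested topics and the partition metadata returned for each, collect
// the topics for which nothing came back, so the caller can report them
// (e.g. via a log message) rather than dropping them silently.
public class MissingTopicCheck {

    public static List<String> findMissingTopics(
            List<String> topics, Map<String, List<Integer>> partitionsPerTopic) {
        List<String> missing = new ArrayList<>();
        for (String topic : topics) {
            List<Integer> partitions = partitionsPerTopic.get(topic);
            if (partitions == null || partitions.isEmpty()) {
                // This is the case the PR targets: no partitions retrieved
                // for a requested topic (renamed or deprecated, perhaps).
                missing.add(topic);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> metadata = new HashMap<>();
        metadata.put("orders", Arrays.asList(0, 1, 2));
        // "renamed-topic" has no metadata, so it is reported as missing
        List<String> missing = findMissingTopics(
                Arrays.asList("orders", "renamed-topic"), metadata);
        System.out.println("Topics without partitions: " + missing);
    }
}
```

Whether the caller then throws a RuntimeException or only logs at INFO level is exactly the trade-off debated in the thread; collecting the missing topics first keeps that decision in one place.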
[jira] [Closed] (FLINK-4670) Add watch mechanism on current RPC framework
[ https://issues.apache.org/jira/browse/FLINK-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangjing closed FLINK-4670.
Resolution: Not A Problem

> Add watch mechanism on current RPC framework
> --------------------------------------------
>
>                 Key: FLINK-4670
>                 URL: https://issues.apache.org/jira/browse/FLINK-4670
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Cluster Management
>            Reporter: zhangjing
>            Assignee: zhangjing
>             Fix For: 1.2.0
>
> Add a watch mechanism on the current RPC framework so that an RPC gateway
> can be watched to make sure the RPC server is running, just like the
> previous DeathWatch in Akka.