[jira] [Commented] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()
[ https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251365#comment-15251365 ] Fabian Hueske commented on FLINK-3781: -- [~readman] Yes, definitely :-) Thanks for your contribution. I gave you contributor permissions as well. You can now assign issues to yourself if you'd like to continue contributing.
> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Chenguang He
> Priority: Minor
> Fix For: 1.1.0
>
> {code}
> public void deleteGlobal(BlobKey key) throws IOException {
>   delete(key);
>   BlobClient bc = createClient();
>   bc.delete(key);
>   bc.close();
> }
> {code}
> If delete() throws an IOException, the BlobClient would be left unclosed.
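For context, a minimal sketch of how such a leak can be avoided with try-with-resources, assuming {{BlobClient}} implements {{java.io.Closeable}} (its {{close()}} method suggests it does); this is illustrative, not necessarily the exact patch that was merged:
{code}
public void deleteGlobal(BlobKey key) throws IOException {
  // remove the blob from the local cache first
  delete(key);
  // try-with-resources guarantees bc.close() runs even if delete() throws
  try (BlobClient bc = createClient()) {
    bc.delete(key);
  }
}
{code}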
[jira] [Updated] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()
[ https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske updated FLINK-3781: - Assignee: Chenguang He
> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Chenguang He
> Priority: Minor
> Fix For: 1.1.0
>
> {code}
> public void deleteGlobal(BlobKey key) throws IOException {
>   delete(key);
>   BlobClient bc = createClient();
>   bc.delete(key);
>   bc.close();
> }
> {code}
> If delete() throws an IOException, the BlobClient would be left unclosed.
[jira] [Commented] (FLINK-3229) Kinesis streaming consumer with integration of Flink's checkpointing mechanics
[ https://issues.apache.org/jira/browse/FLINK-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251355#comment-15251355 ] ASF GitHub Bot commented on FLINK-3229: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/1911#discussion_r60531075
--- Diff: flink-streaming-connectors/pom.xml ---
@@ -45,6 +45,7 @@ under the License.
 flink-connector-rabbitmq
 flink-connector-twitter
 flink-connector-nifi
+flink-connector-kinesis
--- End diff --
Thanks, I missed the "include-kinesis" profile defined below. We'll probably need a more general profile name in the future though (e.g. include-aws-connectors), for example when we start including more Amazon-licensed libraries for other connectors, such as DynamoDB.
> Kinesis streaming consumer with integration of Flink's checkpointing mechanics
> ---
>
> Key: FLINK-3229
> URL: https://issues.apache.org/jira/browse/FLINK-3229
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming Connectors
> Affects Versions: 1.0.0
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
>
> Opening a sub-task to implement the data source consumer for the Kinesis streaming
> connector (https://issues.apache.org/jira/browse/FLINK-3211).
> An example of the planned user API for the Flink Kinesis Consumer:
> {code}
> Properties kinesisConfig = new Properties();
> kinesisConfig.put(KinesisConfigConstants.CONFIG_AWS_REGION, "us-east-1");
> kinesisConfig.put(KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_TYPE, "BASIC");
> kinesisConfig.put(
>   KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_ACCESSKEYID,
>   "aws_access_key_id_here");
> kinesisConfig.put(
>   KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_SECRETKEY,
>   "aws_secret_key_here");
> kinesisConfig.put(KinesisConfigConstants.CONFIG_STREAM_INIT_POSITION_TYPE, "LATEST"); // or TRIM_HORIZON
> DataStream<String> kinesisRecords = env.addSource(new FlinkKinesisConsumer<>(
>   "kinesis_stream_name",
>   new SimpleStringSchema(),
>   kinesisConfig));
> {code}
[GitHub] flink pull request: [FLINK-3229] Flink streaming consumer for AWS ...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/1911#discussion_r60531075
--- Diff: flink-streaming-connectors/pom.xml ---
@@ -45,6 +45,7 @@ under the License.
 flink-connector-rabbitmq
 flink-connector-twitter
 flink-connector-nifi
+flink-connector-kinesis
--- End diff --
Thanks, I missed the "include-kinesis" profile defined below. We'll probably need a more general profile name in the future though (e.g. include-aws-connectors), for example when we start including more Amazon-licensed libraries for other connectors, such as DynamoDB.
[jira] [Closed] (FLINK-3629) In wikiedits Quick Start example, "The first call, .window()" should be "The first call, .timeWindow()"
[ https://issues.apache.org/jira/browse/FLINK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Fanxi closed FLINK-3629.
> In wikiedits Quick Start example, "The first call, .window()" should be "The
> first call, .timeWindow()"
> ---
>
> Key: FLINK-3629
> URL: https://issues.apache.org/jira/browse/FLINK-3629
> Project: Flink
> Issue Type: Bug
> Components: Quickstarts
> Affects Versions: 1.0.0
> Reporter: Li Fanxi
> Priority: Trivial
> Fix For: 1.0.0, 1.0.1
>
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> On the Quick Start page
> (https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/run_example_quickstart.html):
> {quote}
> The first call, .window(), specifies that we want to have tumbling
> (non-overlapping) windows of five seconds.
> {quote}
> This is not consistent with the sample code. In the sample code, the
> {code:java}KeyedStream.timeWindow(){code} function is used.
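For reference, the quickstart code that the corrected sentence describes looks roughly like the snippet below (a close paraphrase of the quickstart, not a verbatim copy); it is the {{.timeWindow()}} call that creates the tumbling five-second windows:
{code}
DataStream<Tuple2<String, Long>> result = keyedEdits
  // tumbling (non-overlapping) windows of five seconds
  .timeWindow(Time.seconds(5))
  // sum up the byte diff per user within each window
  .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
    @Override
    public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
      acc.f0 = event.getUser();
      acc.f1 += event.getByteDiff();
      return acc;
    }
  });
{code}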
[jira] [Commented] (FLINK-3789) Overload methods which trigger program execution to allow naming job
[ https://issues.apache.org/jira/browse/FLINK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251023#comment-15251023 ] Greg Hogan commented on FLINK-3789: --- I was thinking about Clustering Coefficient, for which we return the local clustering coefficient for each vertex as a DataSet via a GraphAlgorithm, and it would also be nice to compute the global clustering coefficient, which would need to access accumulators. Both the local and the global clustering coefficient count triangles, so there is certainly an advantage in computing the two simultaneously, but there is extra cost for each, so we should allow separate computation. So there is a need to do similar things as collect and count, but still allow the user to perform the execute themselves (which of course allows direct configuration of the job name) so that they can compose multiple algorithms and analytics. Perhaps instead of overloading these functions we can provide alternative, slightly more sophisticated options which would allow configuring a job name. In many ways the current implementation of count, collect, print, and checksum is very limiting because you can only perform that single action per job. You can't print and count, or print and write. The current DataSet API works well because it's simple, but I think we could expand on this.
> Overload methods which trigger program execution to allow naming job
> ---
>
> Key: FLINK-3789
> URL: https://issues.apache.org/jira/browse/FLINK-3789
> Project: Flink
> Issue Type: Improvement
> Components: Java API
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Priority: Minor
>
> Overload the following functions to additionally accept a job name to pass to
> {{ExecutionEnvironment.execute(String)}}:
> * {{DataSet.collect()}}
> * {{DataSet.count()}}
> * {{DataSetUtils.checksumHashCode(DataSet)}}
> * {{GraphUtils.checksumHashCode(Graph)}}
> Once the deprecated {{DataSet.print(String)}} and
> {{DataSet.printToErr(String)}} are removed, we can overload
> {{DataSet.print()}}.
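For illustration, a sketch of what one such overload could look like; it mirrors the body of the existing {{DataSet.count()}} implementation and differs only in forwarding a caller-supplied job name (the {{String}} parameter is the proposal, not existing API):
{code}
public long count(String jobName) throws Exception {
  final String id = new AbstractID().toString();
  // attach an accumulator-backed sink that counts all records
  output(new Utils.CountHelper<T>(id)).name("count()");
  // trigger execution under the caller-supplied job name
  JobExecutionResult res = getExecutionEnvironment().execute(jobName);
  return res.<Long>getAccumulatorResult(id);
}
{code}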
[jira] [Commented] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()
[ https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250980#comment-15250980 ] Chenguang He commented on FLINK-3781: - Hello Fabian, could you please assign this to me? This is the first time I have tried to contribute via JIRA, and I really want to remember this moment.
> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
> Fix For: 1.1.0
>
> {code}
> public void deleteGlobal(BlobKey key) throws IOException {
>   delete(key);
>   BlobClient bc = createClient();
>   bc.delete(key);
>   bc.close();
> }
> {code}
> If delete() throws an IOException, the BlobClient would be left unclosed.
[jira] [Closed] (FLINK-3665) Range partitioning lacks support to define sort orders
[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-3665. Resolution: Implemented
Implemented with d8fb23085839a514fbd0b56cc681dd52aac503ac
Thanks for the contribution [~dawidwys]!
> Range partitioning lacks support to define sort orders
> ---
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Affects Versions: 1.0.0
> Reporter: Fabian Hueske
> Assignee: Dawid Wysakowicz
> Fix For: 1.1.0
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of
> fields. This is fine if range partitioning is used to reduce skewed
> partitioning.
> However, it is not sufficient if range partitioning is used to sort a data
> set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily
> changed, I propose to add a method {{withOrders(Order... orders)}} to
> {{PartitionOperator}}. The method should throw an exception if the
> partitioning method of {{PartitionOperator}} is not range partitioning.
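For illustration, a minimal usage sketch of the new method (the data set here is made up; {{withOrders()}} takes one {{Order}} per partitioning key):
{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<Integer, String>> data = env.fromElements(
    Tuple2.of(3, "c"), Tuple2.of(1, "a"), Tuple2.of(2, "b"));

// Range-partition on field 0 with a fixed sort order, then sort each
// partition locally: together this yields a parallel total sort.
data.partitionByRange(0)
    .withOrders(Order.ASCENDING)
    .sortPartition(0, Order.ASCENDING)
    .print();
{code}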
[jira] [Resolved] (FLINK-2998) Support range partition comparison for multi input nodes.
[ https://issues.apache.org/jira/browse/FLINK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-2998. Resolution: Implemented Fix Version/s: 1.1.0
Implemented with 605b6d870d0da38fb5446675709c7243127cdff1
Thanks for the contribution!
> Support range partition comparison for multi input nodes.
> ---
>
> Key: FLINK-2998
> URL: https://issues.apache.org/jira/browse/FLINK-2998
> Project: Flink
> Issue Type: New Feature
> Components: Optimizer
> Reporter: Chengxiang Li
> Assignee: GaoLun
> Priority: Minor
> Fix For: 1.1.0
>
> The optimizer may have an opportunity to optimize the DAG when it finds that
> two input range partitionings are equivalent; we do not support this
> comparison yet.
[jira] [Updated] (FLINK-2998) Support range partition comparison for multi input nodes.
[ https://issues.apache.org/jira/browse/FLINK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske updated FLINK-2998: - Assignee: GaoLun
> Support range partition comparison for multi input nodes.
> ---
>
> Key: FLINK-2998
> URL: https://issues.apache.org/jira/browse/FLINK-2998
> Project: Flink
> Issue Type: New Feature
> Components: Optimizer
> Reporter: Chengxiang Li
> Assignee: GaoLun
> Priority: Minor
>
> The optimizer may have an opportunity to optimize the DAG when it finds that
> two input range partitionings are equivalent; we do not support this
> comparison yet.
[jira] [Resolved] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()
[ https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-3781. Resolution: Fixed Fix Version/s: 1.1.0
Fixed with a43bade0d87d373ff236152d13d8d56e17220605
> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
> Fix For: 1.1.0
>
> {code}
> public void deleteGlobal(BlobKey key) throws IOException {
>   delete(key);
>   BlobClient bc = createClient();
>   bc.delete(key);
>   bc.close();
> }
> {code}
> If delete() throws an IOException, the BlobClient would be left unclosed.
[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders
[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250953#comment-15250953 ] ASF GitHub Bot commented on FLINK-3665: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1848
> Range partitioning lacks support to define sort orders
> ---
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Affects Versions: 1.0.0
> Reporter: Fabian Hueske
> Assignee: Dawid Wysakowicz
> Fix For: 1.1.0
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of
> fields. This is fine if range partitioning is used to reduce skewed
> partitioning.
> However, it is not sufficient if range partitioning is used to sort a data
> set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily
> changed, I propose to add a method {{withOrders(Order... orders)}} to
> {{PartitionOperator}}. The method should throw an exception if the
> partitioning method of {{PartitionOperator}} is not range partitioning.
[jira] [Commented] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()
[ https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250955#comment-15250955 ] ASF GitHub Bot commented on FLINK-3781: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1908
> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
>
> {code}
> public void deleteGlobal(BlobKey key) throws IOException {
>   delete(key);
>   BlobClient bc = createClient();
>   bc.delete(key);
>   bc.close();
> }
> {code}
> If delete() throws an IOException, the BlobClient would be left unclosed.
[jira] [Commented] (FLINK-2998) Support range partition comparison for multi input nodes.
[ https://issues.apache.org/jira/browse/FLINK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250954#comment-15250954 ] ASF GitHub Bot commented on FLINK-2998: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1838
> Support range partition comparison for multi input nodes.
> ---
>
> Key: FLINK-2998
> URL: https://issues.apache.org/jira/browse/FLINK-2998
> Project: Flink
> Issue Type: New Feature
> Components: Optimizer
> Reporter: Chengxiang Li
> Priority: Minor
>
> The optimizer may have an opportunity to optimize the DAG when it finds that
> two input range partitionings are equivalent; we do not support this
> comparison yet.
[GitHub] flink pull request: [FLINK-2998] Support range partition compariso...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1838
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1908
[GitHub] flink pull request: [FLINK-3665] Implemented sort orders support i...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1848
[GitHub] flink pull request: Flink 3691 extend avroinputformat to support g...
GitHub user gna-phetsarath opened a pull request: https://github.com/apache/flink/pull/1920
Flink 3691 extend avroinputformat to support generic records
Thanks for contributing to Apache Flink. Before you open your pull request, please take the following checklist into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.
- [ ] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message
- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed
You can merge this pull request into a Git repository by running:
  $ git pull https://github.com/gna-phetsarath/flink FLINK-3691-extend_avroinputformat_to_support_generic_records
Alternatively you can review and apply these changes as the patch at:
  https://github.com/apache/flink/pull/1920.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
  This closes #1920
commit 78b16a080105a188fcfc0f2a1731b87857f4080f
  Author: Phetsarath, Sourigna
  Date: 2016-04-05T22:01:24Z
  [FLINK-3691] Extend AvroInputFormat to support Avro GenericRecord
commit d122c6b0e125af39163514d286fe7abdbf16765d
  Author: Phetsarath, Sourigna
  Date: 2016-04-06T19:21:06Z
  [FLINK-3691] Extend AvroInputFormat to support Avro GenericRecord. Fixed style issue after running verify.
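The PR title suggests usage along the lines of the sketch below; the {{AvroInputFormat(Path, Class)}} constructor already exists, while the path and surrounding setup are made-up illustrations (the actual API changes live in the PR diff, not shown here):
{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// read Avro files as GenericRecord, without generated record classes
AvroInputFormat<GenericRecord> format = new AvroInputFormat<>(
    new Path("hdfs:///data/events.avro"), GenericRecord.class);
DataSet<GenericRecord> records = env.createInput(format);
{code}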
[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library
[ https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250825#comment-15250825 ] ASF GitHub Bot commented on FLINK-2157: --- Github user rawkintrevo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-212633912
Also two quick issues.
*pipelines*
```scala
val scaler = MinMaxScaler()
val pipeline = scaler.chainPredictor(mlr)
val evaluationDS = survivalLV.map(x => (x.vector, x.label))
pipeline.fit(survivalLV)
scorer.evaluate(evaluationDS, pipeline).collect().head
```
When using this with a ChainedPredictor as the predictor I get the following error:
error: could not find implicit value for parameter evaluateOperation: org.apache.flink.ml.pipeline.EvaluateDataSetOperation[org.apache.flink.ml.pipeline.ChainedPredictor[org.apache.flink.ml.preprocessing.MinMaxScaler,org.apache.flink.ml.regression.MultipleLinearRegression],(org.apache.flink.ml.math.Vector, Double),Double]
*MinMaxScaler()*
Merging for me broke the following code:
```scala
val scaler = MinMaxScaler()
val scaledSurvivalLV = scaler.transform(survivalLV)
```
With the following error (omitting part of the stack trace):
Caused by: java.lang.NoSuchMethodError: breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf;
at org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:156)
at org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:154)
at org.apache.flink.api.scala.DataSet$$anon$7.reduce(DataSet.scala:584)
at org.apache.flink.runtime.operators.chaining.ChainedAllReduceDriver.collect(ChainedAllReduceDriver.java:93)
at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:97)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:480)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
I'm looking for a workaround. Just saying I found a regression. Other than that, it looks/works AWESOME. Well done.
> Create evaluation framework for ML library
> ---
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Theodore Vasiloudis
> Labels: ML
> Fix For: 1.0.0
>
> Currently, FlinkML lacks the means to evaluate the performance of trained models.
> It would be great to add some {{Evaluators}} which can calculate a score
> based on the information about true and predicted labels. This could also be
> used for cross-validation to choose the right hyperparameters.
> Possible scores could be the F score [1], zero-one-loss score, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]
[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...
Github user rawkintrevo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-212633912
Also two quick issues.
*pipelines*
```scala
val scaler = MinMaxScaler()
val pipeline = scaler.chainPredictor(mlr)
val evaluationDS = survivalLV.map(x => (x.vector, x.label))
pipeline.fit(survivalLV)
scorer.evaluate(evaluationDS, pipeline).collect().head
```
When using this with a ChainedPredictor as the predictor I get the following error:
error: could not find implicit value for parameter evaluateOperation: org.apache.flink.ml.pipeline.EvaluateDataSetOperation[org.apache.flink.ml.pipeline.ChainedPredictor[org.apache.flink.ml.preprocessing.MinMaxScaler,org.apache.flink.ml.regression.MultipleLinearRegression],(org.apache.flink.ml.math.Vector, Double),Double]
*MinMaxScaler()*
Merging for me broke the following code:
```scala
val scaler = MinMaxScaler()
val scaledSurvivalLV = scaler.transform(survivalLV)
```
With the following error (omitting part of the stack trace):
Caused by: java.lang.NoSuchMethodError: breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf;
at org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:156)
at org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:154)
at org.apache.flink.api.scala.DataSet$$anon$7.reduce(DataSet.scala:584)
at org.apache.flink.runtime.operators.chaining.ChainedAllReduceDriver.collect(ChainedAllReduceDriver.java:93)
at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:97)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:480)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
I'm looking for a workaround. Just saying I found a regression. Other than that, it looks/works AWESOME. Well done.
[jira] [Updated] (FLINK-3665) Range partitioning lacks support to define sort orders
[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske updated FLINK-3665: - Assignee: Dawid Wysakowicz
> Range partitioning lacks support to define sort orders
> ---
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Affects Versions: 1.0.0
> Reporter: Fabian Hueske
> Assignee: Dawid Wysakowicz
> Fix For: 1.1.0
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of
> fields. This is fine if range partitioning is used to reduce skewed
> partitioning.
> However, it is not sufficient if range partitioning is used to sort a data
> set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily
> changed, I propose to add a method {{withOrders(Order... orders)}} to
> {{PartitionOperator}}. The method should throw an exception if the
> partitioning method of {{PartitionOperator}} is not range partitioning.
[jira] [Commented] (FLINK-2998) Support range partition comparison for multi input nodes.
[ https://issues.apache.org/jira/browse/FLINK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250727#comment-15250727 ] ASF GitHub Bot commented on FLINK-2998: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1838#issuecomment-212608296 Merging this PR
> Support range partition comparison for multi input nodes.
> ---
>
> Key: FLINK-2998
> URL: https://issues.apache.org/jira/browse/FLINK-2998
> Project: Flink
> Issue Type: New Feature
> Components: Optimizer
> Reporter: Chengxiang Li
> Priority: Minor
>
> The optimizer may have an opportunity to optimize the DAG when it finds that
> two input range partitionings are equivalent; we do not support this
> comparison yet.
[GitHub] flink pull request: [FLINK-3665] Implemented sort orders support i...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1848#issuecomment-212608472 Merging this PR
[jira] [Commented] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()
[ https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250730#comment-15250730 ] ASF GitHub Bot commented on FLINK-3781: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1908#issuecomment-212608557 Merging this PR
> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
>
> {code}
> public void deleteGlobal(BlobKey key) throws IOException {
>   delete(key);
>   BlobClient bc = createClient();
>   bc.delete(key);
>   bc.close();
> }
> {code}
> If delete() throws an IOException, the BlobClient would be left unclosed.
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1908#issuecomment-212608557 Merging this PR
[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders
[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250729#comment-15250729 ] ASF GitHub Bot commented on FLINK-3665: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1848#issuecomment-212608472 Merging this PR
> Range partitioning lacks support to define sort orders
> ---
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Affects Versions: 1.0.0
> Reporter: Fabian Hueske
> Fix For: 1.1.0
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of
> fields. This is fine if range partitioning is used to reduce skewed
> partitioning.
> However, it is not sufficient if range partitioning is used to sort a data
> set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily
> changed, I propose to add a method {{withOrders(Order... orders)}} to
> {{PartitionOperator}}. The method should throw an exception if the
> partitioning method of {{PartitionOperator}} is not range partitioning.
[GitHub] flink pull request: [FLINK-2998] Support range partition compariso...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1838#issuecomment-212608296 Merging this PR
[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders
[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250645#comment-15250645 ] ASF GitHub Bot commented on FLINK-3665: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1848#issuecomment-212590297 Thanks @dawidwys, unrelated test failures are OK. I'll run the tests again after rebasing and merging the PR.
> Range partitioning lacks support to define sort orders
> ---
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Affects Versions: 1.0.0
> Reporter: Fabian Hueske
> Fix For: 1.1.0
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of
> fields. This is fine if range partitioning is used to reduce skewed
> partitioning.
> However, it is not sufficient if range partitioning is used to sort a data
> set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily
> changed, I propose to add a method {{withOrders(Order... orders)}} to
> {{PartitionOperator}}. The method should throw an exception if the
> partitioning method of {{PartitionOperator}} is not range partitioning.
[GitHub] flink pull request: [FLINK-3665] Implemented sort orders support i...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1848#issuecomment-212590297 Thanks @dawidwys, unrelated test failures are OK. I'll run the tests again after rebasing and merging the PR.
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250610#comment-15250610 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1917#issuecomment-212583555 Thanks for the PR. Looks good. I had a few minor comments. I also marked a few tests that could be removed because they do not increase test coverage, IMO. In general, we should be more careful when adding tests. Builds start to fail on Travis because the 2h threshold is exceeded.
> Add support for embedded streaming SQL (projection, filter, union)
> ---
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this
> issue tracks the support for SQL embedded in stream Table API programs. The
> only currently supported operations on streaming Tables are projection,
> filtering, and union.
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1917#issuecomment-212583555 Thanks for the PR. Looks good. I had a few minor comments. I also marked a few tests that could be removed because they do not increase test coverage, IMO. In general, we should be more careful when adding tests. Builds start to fail on Travis because the 2h threshold is exceeded.
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250602#comment-15250602 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477843
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/FilterITCase.scala ---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.streaming.test
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData}
+import org.apache.flink.api.table.{Row, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Assert._
+import org.junit._
+
+import scala.collection.mutable
+
+class FilterITCase extends StreamingMultipleProgramsTestBase {
+
+  @Test
+  def testSimpleFilter(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM * FROM MyTable WHERE a = 3"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList("3,2,Hello world")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+  @Test
+  def testAllRejectingFilter(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM * FROM MyTable WHERE false"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    assertEquals(true, StreamITCase.testResults.isEmpty)
+  }
+
+  @Test
+  def testAllPassingFilter(): Unit = {
--- End diff --
This test might be removed.
> Add support for embedded streaming SQL (projection, filter, union)
> ---
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this
> issue tracks the support for SQL embedded in stream Table API programs. The
> only currently supported operations on streaming Tables are projection,
> filtering, and union.
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250606#comment-15250606 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477908
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/SelectITCase.scala ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.streaming.test
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData}
+import org.apache.flink.api.table.{Row, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Assert._
+import org.junit._
+
+import scala.collection.mutable
+
+class SelectITCase extends StreamingMultipleProgramsTestBase {
+
+  @Test
+  def testSelectStarFromTable(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM * FROM MyTable"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList(
+      "1,1,Hi",
+      "2,2,Hello",
+      "3,2,Hello world")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+  @Test
+  def testSelectFirstFromTable(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM a FROM MyTable"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList("1", "2", "3")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+  @Test
+  def testSelectExpressionFromTable(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM a * 2, b - 1 FROM MyTable"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList("2,0", "4,1", "6,1")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+
+  @Test
+  def testSelectSecondFromDataStream(): Unit = {
--- End diff --
This test might be removed.
> Add support for embedded streaming SQL (projection, filter, union)
> ---
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this
> issue tracks the support for SQL embedded in stream Table API programs. The
> only currently supported operations on streaming Tables are projection,
> filtering, and union.
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250604#comment-15250604 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477897
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/SelectITCase.scala ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.streaming.test
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData}
+import org.apache.flink.api.table.{Row, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Assert._
+import org.junit._
+
+import scala.collection.mutable
+
+class SelectITCase extends StreamingMultipleProgramsTestBase {
+
+  @Test
+  def testSelectStarFromTable(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM * FROM MyTable"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList(
+      "1,1,Hi",
+      "2,2,Hello",
+      "3,2,Hello world")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+  @Test
+  def testSelectFirstFromTable(): Unit = {
--- End diff --
This test might be removed.
> Add support for embedded streaming SQL (projection, filter, union)
> ---
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this
> issue tracks the support for SQL embedded in stream Table API programs. The
> only currently supported operations on streaming Tables are projection,
> filtering, and union.
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250600#comment-15250600 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477818
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/FilterITCase.scala ---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.streaming.test
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData}
+import org.apache.flink.api.table.{Row, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Assert._
+import org.junit._
+
+import scala.collection.mutable
+
+class FilterITCase extends StreamingMultipleProgramsTestBase {
+
+  @Test
+  def testSimpleFilter(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM * FROM MyTable WHERE a = 3"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList("3,2,Hello world")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+  @Test
+  def testAllRejectingFilter(): Unit = {
--- End diff --
This test might be removed.
> Add support for embedded streaming SQL (projection, filter, union)
> ---
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this
> issue tracks the support for SQL embedded in stream Table API programs. The
> only currently supported operations on streaming Tables are projection,
> filtering, and union.
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250603#comment-15250603 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477888
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/SelectITCase.scala ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.streaming.test
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData}
+import org.apache.flink.api.table.{Row, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Assert._
+import org.junit._
+
+import scala.collection.mutable
+
+class SelectITCase extends StreamingMultipleProgramsTestBase {
+
+  @Test
+  def testSelectStarFromTable(): Unit = {
--- End diff --
This test might be removed.
> Add support for embedded streaming SQL (projection, filter, union)
> ---
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this
> issue tracks the support for SQL embedded in stream Table API programs. The
> only currently supported operations on streaming Tables are projection,
> filtering, and union.
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477908
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/SelectITCase.scala ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.streaming.test
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData}
+import org.apache.flink.api.table.{Row, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Assert._
+import org.junit._
+
+import scala.collection.mutable
+
+class SelectITCase extends StreamingMultipleProgramsTestBase {
+
+  @Test
+  def testSelectStarFromTable(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM * FROM MyTable"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList(
+      "1,1,Hi",
+      "2,2,Hello",
+      "3,2,Hello world")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+  @Test
+  def testSelectFirstFromTable(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM a FROM MyTable"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList("1", "2", "3")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+  @Test
+  def testSelectExpressionFromTable(): Unit = {
+
+    val env = StreamExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    StreamITCase.testResults = mutable.MutableList()
+
+    val sqlQuery = "SELECT STREAM a * 2, b - 1 FROM MyTable"
+
+    val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c)
+    tEnv.registerTable("MyTable", t)
+
+    val result = tEnv.sql(sqlQuery).toDataStream[Row]
+    result.addSink(new StreamITCase.StringSink)
+    env.execute()
+
+    val expected = mutable.MutableList("2,0", "4,1", "6,1")
+    assertEquals(expected.sorted, StreamITCase.testResults.sorted)
+  }
+
+
+  @Test
+  def testSelectSecondFromDataStream(): Unit = {
--- End diff --
This test might be removed.
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477888 --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/SelectITCase.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala.sql.streaming.test + +import org.apache.flink.api.scala._ +import org.apache.flink.api.scala.table._ +import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData} +import org.apache.flink.api.table.{Row, TableEnvironment} +import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment +import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase +import org.junit.Assert._ +import org.junit._ + +import scala.collection.mutable + +class SelectITCase extends StreamingMultipleProgramsTestBase { + + @Test + def testSelectStarFromTable(): Unit = { --- End diff -- This test might be removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477897 --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/SelectITCase.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala.sql.streaming.test + +import org.apache.flink.api.scala._ +import org.apache.flink.api.scala.table._ +import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData} +import org.apache.flink.api.table.{Row, TableEnvironment} +import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment +import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase +import org.junit.Assert._ +import org.junit._ + +import scala.collection.mutable + +class SelectITCase extends StreamingMultipleProgramsTestBase { + + @Test + def testSelectStarFromTable(): Unit = { + +val env = StreamExecutionEnvironment.getExecutionEnvironment +val tEnv = TableEnvironment.getTableEnvironment(env) +StreamITCase.testResults = mutable.MutableList() + +val sqlQuery = "SELECT STREAM * FROM MyTable" + +val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv) +tEnv.registerTable("MyTable", t) + +val result = tEnv.sql(sqlQuery).toDataStream[Row] +result.addSink(new StreamITCase.StringSink) +env.execute() + +val expected = mutable.MutableList( + "1,1,Hi", + "2,2,Hello", + "3,2,Hello world") +assertEquals(expected.sorted, StreamITCase.testResults.sorted) + } + + @Test + def testSelectFirstFromTable(): Unit = { --- End diff -- This test might be removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Assigned] (FLINK-3794) Add checks for unsupported operations in streaming table API
[ https://issues.apache.org/jira/browse/FLINK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri reassigned FLINK-3794: Assignee: Vasia Kalavri > Add checks for unsupported operations in streaming table API > > > Key: FLINK-3794 > URL: https://issues.apache.org/jira/browse/FLINK-3794 > Project: Flink > Issue Type: Improvement > Components: Table API >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > Unsupported operations on streaming tables currently fail during plan > translation. It would be nicer to add checks in the Table API methods and > fail with an informative message that the operation is not supported. The > operations that are not currently supported are: > - aggregations inside select > - groupBy > - distinct > - join > We can simply check if the Table's environment is a streaming environment and > throw an unsupported operation exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
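For illustration, a minimal, self-contained sketch of the proposed check; the stub classes below are hypothetical stand-ins for Flink's real `Table` and environment classes, and the exact message wording is assumed:
```scala
// Stub environments standing in for the real classes, to keep the sketch self-contained.
abstract class TableEnvironmentStub
class BatchTableEnvironmentStub extends TableEnvironmentStub
class StreamTableEnvironmentStub extends TableEnvironmentStub

class TableStub(val tableEnv: TableEnvironmentStub) {

  // The check the issue proposes: reject unsupported operators eagerly,
  // with an informative message, instead of failing during plan translation.
  private def checkNotStreaming(operation: String): Unit = tableEnv match {
    case _: StreamTableEnvironmentStub =>
      throw new UnsupportedOperationException(
        s"$operation is currently not supported on streaming tables.")
    case _ => // batch environment: operation is supported
  }

  def groupBy(fields: Symbol*): TableStub = {
    checkNotStreaming("groupBy")
    this // existing translation logic would follow here
  }
}
```
The same one-line guard would be repeated in `distinct`, `join`, and the aggregating `select` variants listed above.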
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477843 --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/FilterITCase.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala.sql.streaming.test + +import org.apache.flink.api.scala._ +import org.apache.flink.api.scala.table._ +import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData} +import org.apache.flink.api.table.{Row, TableEnvironment} +import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment +import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase +import org.junit.Assert._ +import org.junit._ + +import scala.collection.mutable + +class FilterITCase extends StreamingMultipleProgramsTestBase { + + @Test + def testSimpleFilter(): Unit = { + +val env = StreamExecutionEnvironment.getExecutionEnvironment +val tEnv = TableEnvironment.getTableEnvironment(env) +StreamITCase.testResults = mutable.MutableList() + +val sqlQuery = "SELECT STREAM * FROM MyTable WHERE a = 3" + +val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c) +tEnv.registerTable("MyTable", t) + +val result = tEnv.sql(sqlQuery).toDataStream[Row] +result.addSink(new StreamITCase.StringSink) +env.execute() + +val expected = mutable.MutableList("3,2,Hello world") +assertEquals(expected.sorted, StreamITCase.testResults.sorted) + } + + @Test + def testAllRejectingFilter(): Unit = { + +val env = StreamExecutionEnvironment.getExecutionEnvironment +val tEnv = TableEnvironment.getTableEnvironment(env) +StreamITCase.testResults = mutable.MutableList() + +val sqlQuery = "SELECT STREAM * FROM MyTable WHERE false" + +val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv) +tEnv.registerTable("MyTable", t) + +val result = tEnv.sql(sqlQuery).toDataStream[Row] +result.addSink(new StreamITCase.StringSink) +env.execute() + +assertEquals(true, StreamITCase.testResults.isEmpty) + } + + @Test + def testAllPassingFilter(): Unit = { --- End diff -- This test might be removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60477818 --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/streaming/test/FilterITCase.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala.sql.streaming.test + +import org.apache.flink.api.scala._ +import org.apache.flink.api.scala.table._ +import org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, StreamTestData} +import org.apache.flink.api.table.{Row, TableEnvironment} +import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment +import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase +import org.junit.Assert._ +import org.junit._ + +import scala.collection.mutable + +class FilterITCase extends StreamingMultipleProgramsTestBase { + + @Test + def testSimpleFilter(): Unit = { + +val env = StreamExecutionEnvironment.getExecutionEnvironment +val tEnv = TableEnvironment.getTableEnvironment(env) +StreamITCase.testResults = mutable.MutableList() + +val sqlQuery = "SELECT STREAM * FROM MyTable WHERE a = 3" + +val t = StreamTestData.getSmall3TupleDataStream(env).toTable(tEnv).as('a, 'b, 'c) +tEnv.registerTable("MyTable", t) + +val result = tEnv.sql(sqlQuery).toDataStream[Row] +result.addSink(new StreamITCase.StringSink) +env.execute() + +val expected = mutable.MutableList("3,2,Hello world") +assertEquals(expected.sorted, StreamITCase.testResults.sorted) + } + + @Test + def testAllRejectingFilter(): Unit = { --- End diff -- This test might be removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-3665] Implemented sort orders support i...
Github user dawidwys commented on the pull request: https://github.com/apache/flink/pull/1848#issuecomment-212581664 I think I have applied all the changes. Unfortunately, the Travis build fails and I don't know why; the test that fails has nothing in common with my changes, and it also passes locally. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders
[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250596#comment-15250596 ] ASF GitHub Bot commented on FLINK-3665: --- Github user dawidwys commented on the pull request: https://github.com/apache/flink/pull/1848#issuecomment-212581664 I think I have applied all the changes. Unfortunately, the Travis build fails and I don't know why; the test that fails has nothing in common with my changes, and it also passes locally. > Range partitioning lacks support to define sort orders > -- > > Key: FLINK-3665 > URL: https://issues.apache.org/jira/browse/FLINK-3665 > Project: Flink > Issue Type: Improvement > Components: DataSet API >Affects Versions: 1.0.0 >Reporter: Fabian Hueske > Fix For: 1.1.0 > > > {{DataSet.partitionByRange()}} does not allow specifying the sort order of > fields. This is fine if range partitioning is used to reduce partition skew. > However, it is not sufficient if range partitioning is used to sort a data > set in parallel. > Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily > changed, I propose to add a method {{withOrders(Order... orders)}} to > {{PartitionOperator}}. The method should throw an exception if the > partitioning method of {{PartitionOperator}} is not range partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
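For context, a hedged sketch of how the proposed method would read. `partitionByRange()` is existing DataSet API; `withOrders()` exists only as a proposal in this issue, so the snippet does not compile against the 1.0 API and its final shape may differ:
```scala
// Uses the Java DataSet API from Scala, since the proposal targets the Java
// PartitionOperator class. `withOrders` is the proposed addition.
import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.api.java.tuple.Tuple2

val env = ExecutionEnvironment.getExecutionEnvironment
val data = env.fromElements(Tuple2.of(3, "c"), Tuple2.of(1, "a"), Tuple2.of(2, "b"))

val partitioned = data
  .partitionByRange(0)          // existing API: returns a PartitionOperator
  .withOrders(Order.ASCENDING)  // proposed: one Order per range-partition key field
```
Combined with a per-partition sort, this would give a parallel, globally ordered result, which is the use case the issue describes.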
[jira] [Assigned] (FLINK-3790) Rolling File sink does not pick up hadoop configuration
[ https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora reassigned FLINK-3790: - Assignee: Gyula Fora > Rolling File sink does not pick up hadoop configuration > --- > > Key: FLINK-3790 > URL: https://issues.apache.org/jira/browse/FLINK-3790 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.0.2 >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Critical > > In the rolling file sink function, a new hadoop configuration is created to > get the FileSystem every time, which completely ignores the hadoop config set > in flink-conf.yaml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
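As a rough illustration of the fix direction (a sketch under stated assumptions, not the actual change in PR #1919), the sink would need to pick up the cluster's Hadoop resources instead of instantiating a bare default configuration:
```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Resolve the cluster's Hadoop settings instead of relying on the defaults
// that a bare `new Configuration()` picks up from the classpath. How the
// config directory is located is an assumption here.
def hadoopConfFrom(hadoopConfDir: String): Configuration = {
  val conf = new Configuration()
  conf.addResource(new Path(hadoopConfDir, "core-site.xml"))
  conf.addResource(new Path(hadoopConfDir, "hdfs-site.xml"))
  conf
}

// Usage: new Path("hdfs:///out").getFileSystem(hadoopConfFrom("/etc/hadoop/conf"))
```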
[jira] [Assigned] (FLINK-3798) Clean up RocksDB state backend access modifiers
[ https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora reassigned FLINK-3798: - Assignee: Gyula Fora > Clean up RocksDB state backend access modifiers > --- > > Key: FLINK-3798 > URL: https://issues.apache.org/jira/browse/FLINK-3798 > Project: Flink > Issue Type: Improvement > Components: Streaming Connectors >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Minor > > The RocksDB state backend uses a lot of package-private methods and fields, > which makes it very hard to subclass the different parts for added functionality. > I think these should be protected instead. > Also, the AbstractRocksDBState declares some methods final when there are > use cases where a subclass might want to override them. > Just to give an example: I am creating a version of the value state which > would keep a small cache on heap. For this it would be enough to subclass the > RocksDBStateBackend and RocksDBValueState classes if the above-mentioned > changes were made. As it stands, I have to use reflection to access > package-private fields and actually copy classes due to the final methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
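To make the motivation concrete, a self-contained sketch of the kind of on-heap caching layer that `protected`, non-`final` members would allow; all names below are hypothetical stand-ins, not the actual RocksDB backend classes:
```scala
// All names here are hypothetical stand-ins for the RocksDB backend classes.
abstract class SketchValueState[V] {
  // If this were package-private (Java default access), a subclass in another
  // package could not touch it; `protected` keeps it open to extensions.
  protected var defaultValue: Option[V] = None

  // Deliberately non-final, so a subclass can layer caching on top.
  def value(): V
}

// The caching extension the issue describes: check a small on-heap cache
// first, fall back to the (simulated) RocksDB lookup only on a miss.
class CachedSketchValueState[V](lookup: () => V) extends SketchValueState[V] {
  private var cached: Option[V] = None

  override def value(): V = cached.getOrElse {
    val v = lookup() // would be the RocksDB read in the real backend
    cached = Some(v)
    v
  }
}
```
With package-private fields and `final` methods, this extension is only possible via reflection and copied classes, which is exactly the pain point reported above.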
[GitHub] flink pull request: [FLINK-3790] [streaming] Use proper hadoop con...
GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/1919 [FLINK-3790] [streaming] Use proper hadoop config in rolling sink You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink rolling-conf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1919.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1919 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3790) Rolling File sink does not pick up hadoop configuration
[ https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250570#comment-15250570 ] ASF GitHub Bot commented on FLINK-3790: --- GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/1919 [FLINK-3790] [streaming] Use proper hadoop config in rolling sink You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink rolling-conf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1919.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1919 > Rolling File sink does not pick up hadoop configuration > --- > > Key: FLINK-3790 > URL: https://issues.apache.org/jira/browse/FLINK-3790 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.0.2 >Reporter: Gyula Fora >Priority: Critical > > In the rolling file sink function, a new hadoop configuration is created to > get the FileSystem every time, which completely ignores the hadoop config set > in flink-conf.yaml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250559#comment-15250559 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60473805 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/TableEnvironment.scala --- @@ -98,8 +98,42 @@ abstract class TableEnvironment(val config: TableConfig) { checkValidTableName(name) -val tableTable = new TableTable(table.getRelNode) -registerTableInternal(name, tableTable) +table.tableEnv match { + case e: BatchTableEnvironment => +val tableTable = new TableTable(table.getRelNode) +registerTableInternal(name, tableTable) + case e: StreamTableEnvironment => +val sTableTable = new TransStreamTable(table.getRelNode, true) +tables.add(name, sTableTable) +} + + } + + protected def registerStreamTableInternal(name: String, table: AbstractTable): Unit = { + +if (isRegistered(name)) { + throw new TableException(s"Table \'$name\' already exists. " + +s"Please, choose a different name.") +} else { + tables.add(name, table) +} + } + + /** + * Replaces a registered Table with another Table under the same name. + * We use this method to replace a [[org.apache.flink.api.table.plan.schema.DataStreamTable]] + * with a [[org.apache.calcite.schema.TranslatableTable]]. + * + * @param name + * @param table + */ + protected def replaceRegisteredStreamTable(name: String, table: AbstractTable): Unit = { --- End diff -- rename to `replaceRegisteredTable()` since it is not specific for stream tables? > Add support for embedded streaming SQL (projection, filter, union) > -- > > Key: FLINK-3727 > URL: https://issues.apache.org/jira/browse/FLINK-3727 > Project: Flink > Issue Type: Sub-task > Components: Table API >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > Similar to the support for SQL embedded in batch Table API programs, this > issue tracks the support for SQL embedded in stream Table API programs. The > only currently supported operations on streaming Tables are projection, > filtering, and union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60473805 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/TableEnvironment.scala --- @@ -98,8 +98,42 @@ abstract class TableEnvironment(val config: TableConfig) { checkValidTableName(name) -val tableTable = new TableTable(table.getRelNode) -registerTableInternal(name, tableTable) +table.tableEnv match { + case e: BatchTableEnvironment => +val tableTable = new TableTable(table.getRelNode) +registerTableInternal(name, tableTable) + case e: StreamTableEnvironment => +val sTableTable = new TransStreamTable(table.getRelNode, true) +tables.add(name, sTableTable) +} + + } + + protected def registerStreamTableInternal(name: String, table: AbstractTable): Unit = { + +if (isRegistered(name)) { + throw new TableException(s"Table \'$name\' already exists. " + +s"Please, choose a different name.") +} else { + tables.add(name, table) +} + } + + /** + * Replaces a registered Table with another Table under the same name. + * We use this method to replace a [[org.apache.flink.api.table.plan.schema.DataStreamTable]] + * with a [[org.apache.calcite.schema.TranslatableTable]]. + * + * @param name + * @param table + */ + protected def replaceRegisteredStreamTable(name: String, table: AbstractTable): Unit = { --- End diff -- rename to `replaceRegisteredTable()` since it is not specific for stream tables? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250557#comment-15250557 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60473701 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/TableEnvironment.scala --- @@ -98,8 +98,42 @@ abstract class TableEnvironment(val config: TableConfig) { checkValidTableName(name) -val tableTable = new TableTable(table.getRelNode) -registerTableInternal(name, tableTable) +table.tableEnv match { + case e: BatchTableEnvironment => +val tableTable = new TableTable(table.getRelNode) +registerTableInternal(name, tableTable) + case e: StreamTableEnvironment => +val sTableTable = new TransStreamTable(table.getRelNode, true) +tables.add(name, sTableTable) +} + + } + + protected def registerStreamTableInternal(name: String, table: AbstractTable): Unit = { --- End diff -- same as `registerTableInternal()`, no? So can be removed, IMO. > Add support for embedded streaming SQL (projection, filter, union) > -- > > Key: FLINK-3727 > URL: https://issues.apache.org/jira/browse/FLINK-3727 > Project: Flink > Issue Type: Sub-task > Components: Table API >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > Similar to the support for SQL embedded in batch Table API programs, this > issue tracks the support for SQL embedded in stream Table API programs. The > only currently supported operations on streaming Tables are projection, > filtering, and union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60473701 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/TableEnvironment.scala --- @@ -98,8 +98,42 @@ abstract class TableEnvironment(val config: TableConfig) { checkValidTableName(name) -val tableTable = new TableTable(table.getRelNode) -registerTableInternal(name, tableTable) +table.tableEnv match { + case e: BatchTableEnvironment => +val tableTable = new TableTable(table.getRelNode) +registerTableInternal(name, tableTable) + case e: StreamTableEnvironment => +val sTableTable = new TransStreamTable(table.getRelNode, true) +tables.add(name, sTableTable) +} + + } + + protected def registerStreamTableInternal(name: String, table: AbstractTable): Unit = { --- End diff -- same as `registerTableInternal()`, no? So can be removed, IMO. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3798) Clean up RocksDB state backend access modifiers
[ https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250554#comment-15250554 ] ASF GitHub Bot commented on FLINK-3798: --- GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/1918 [FLINK-3798] [streaming] Clean up RocksDB backend field/method access The RocksDB state backend uses a lot of package-private methods and fields, which makes it very hard to subclass the different parts for added functionality. I think these should be protected instead. Also, the AbstractRocksDBState declares some methods final when there are use cases where a subclass might want to override them. Just to give an example: I am creating a version of the value state which would keep a small cache on heap. For this it would be enough to subclass the RocksDBStateBackend and RocksDBValueState classes if the above-mentioned changes were made. As it stands, I have to use reflection to access package-private fields and actually copy classes due to the final methods. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink rocks-access Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1918.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1918 commit a1e323650bd5f603fbf3809fe1fe901cbab1a56b Author: Gyula Fora Date: 2016-04-20T19:33:30Z [FLINK-3798] [streaming] Clean up RocksDB backend field/method access > Clean up RocksDB state backend access modifiers > --- > > Key: FLINK-3798 > URL: https://issues.apache.org/jira/browse/FLINK-3798 > Project: Flink > Issue Type: Improvement > Components: Streaming Connectors >Reporter: Gyula Fora >Priority: Minor > > The RocksDB state backend uses a lot of package-private methods and fields, > which makes it very hard to subclass the different parts for added functionality. > I think these should be protected instead. > Also, the AbstractRocksDBState declares some methods final when there are > use cases where a subclass might want to override them. > Just to give an example: I am creating a version of the value state which > would keep a small cache on heap. For this it would be enough to subclass the > RocksDBStateBackend and RocksDBValueState classes if the above-mentioned > changes were made. As it stands, I have to use reflection to access > package-private fields and actually copy classes due to the final methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...
GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/1918 [FLINK-3798] [streaming] Clean up RocksDB backend field/method access The RocksDB state backend uses a lot of package-private methods and fields, which makes it very hard to subclass the different parts for added functionality. I think these should be protected instead. Also, the AbstractRocksDBState declares some methods final when there are use cases where a subclass might want to override them. Just to give an example: I am creating a version of the value state which would keep a small cache on heap. For this it would be enough to subclass the RocksDBStateBackend and RocksDBValueState classes if the above-mentioned changes were made. As it stands, I have to use reflection to access package-private fields and actually copy classes due to the final methods. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink rocks-access Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1918.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1918 commit a1e323650bd5f603fbf3809fe1fe901cbab1a56b Author: Gyula Fora Date: 2016-04-20T19:33:30Z [FLINK-3798] [streaming] Clean up RocksDB backend field/method access --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250550#comment-15250550 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60473002 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/StreamTableEnvironment.scala --- @@ -112,19 +122,31 @@ abstract class StreamTableEnvironment(config: TableConfig) extends TableEnvironm * * @param name The name under which the table is registered in the catalog. * @param dataStream The [[DataStream]] to register as table in the catalog. +* @param wrapper True if the registration has to wrap the datastreamTable + *into a [[org.apache.calcite.schema.StreamableTable]] --- End diff -- comment indention > Add support for embedded streaming SQL (projection, filter, union) > -- > > Key: FLINK-3727 > URL: https://issues.apache.org/jira/browse/FLINK-3727 > Project: Flink > Issue Type: Sub-task > Components: Table API >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > Similar to the support for SQL embedded in batch Table API programs, this > issue tracks the support for SQL embedded in stream Table API programs. The > only currently supported operations on streaming Tables are projection, > filtering, and union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60473002 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/StreamTableEnvironment.scala --- @@ -112,19 +122,31 @@ abstract class StreamTableEnvironment(config: TableConfig) extends TableEnvironm * * @param name The name under which the table is registered in the catalog. * @param dataStream The [[DataStream]] to register as table in the catalog. +* @param wrapper True if the registration has to wrap the datastreamTable + *into a [[org.apache.calcite.schema.StreamableTable]] --- End diff -- comment indention --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60472854 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/StreamTableEnvironment.scala --- @@ -86,6 +86,8 @@ abstract class StreamTableEnvironment(config: TableConfig) extends TableEnvironm if (isRegistered(tableName)) { relBuilder.scan(tableName) + //val delta: LogicalDelta = LogicalDelta.create(relBuilder.build()) --- End diff -- can be removed? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250549#comment-15250549 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60472854 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/StreamTableEnvironment.scala --- @@ -86,6 +86,8 @@ abstract class StreamTableEnvironment(config: TableConfig) extends TableEnvironm if (isRegistered(tableName)) { relBuilder.scan(tableName) + //val delta: LogicalDelta = LogicalDelta.create(relBuilder.build()) --- End diff -- can be removed? > Add support for embedded streaming SQL (projection, filter, union) > -- > > Key: FLINK-3727 > URL: https://issues.apache.org/jira/browse/FLINK-3727 > Project: Flink > Issue Type: Sub-task > Components: Table API >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > Similar to the support for SQL embedded in batch Table API programs, this > issue tracks the support for SQL embedded in stream Table API programs. The > only currently supported operations on streaming Tables are projection, > filtering, and union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250547#comment-15250547 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60472476 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/java/table/StreamTableEnvironment.scala --- @@ -83,11 +83,54 @@ class StreamTableEnvironment( .toArray val name = createUniqueTableName() -registerDataStreamInternal(name, dataStream, exprs) +registerDataStreamInternal(name, dataStream, exprs, false) ingest(name) } /** + * Registers the given [[DataStream]] as table in the --- End diff -- nitpicking, ScalaDoc indents the `*` under the second `*` of the first line, i.e., ``` /** * */ ``` > Add support for embedded streaming SQL (projection, filter, union) > -- > > Key: FLINK-3727 > URL: https://issues.apache.org/jira/browse/FLINK-3727 > Project: Flink > Issue Type: Sub-task > Components: Table API >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > Similar to the support for SQL embedded in batch Table API programs, this > issue tracks the support for SQL embedded in stream Table API programs. The > only currently supported operations on streaming Tables are projection, > filtering, and union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60472476 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/java/table/StreamTableEnvironment.scala --- @@ -83,11 +83,54 @@ class StreamTableEnvironment( .toArray val name = createUniqueTableName() -registerDataStreamInternal(name, dataStream, exprs) +registerDataStreamInternal(name, dataStream, exprs, false) ingest(name) } /** + * Registers the given [[DataStream]] as table in the --- End diff -- nitpicking, ScalaDoc indents the `*` under the second `*` of the first line, i.e., ``` /** * */ ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-3798) Clean up RocksDB state backend access modifiers
Gyula Fora created FLINK-3798: - Summary: Clean up RocksDB state backend access modifiers Key: FLINK-3798 URL: https://issues.apache.org/jira/browse/FLINK-3798 Project: Flink Issue Type: Improvement Components: Streaming Connectors Reporter: Gyula Fora Priority: Minor The RocksDB state backend uses a lot of package-private methods and fields, which makes it very hard to subclass the different parts for added functionality. I think these should be protected instead. Also, the AbstractRocksDBState declares some methods final when there are use cases where a subclass might want to override them. Just to give an example: I am creating a version of the value state which would keep a small cache on heap. For this it would be enough to subclass the RocksDBStateBackend and RocksDBValueState classes if the above-mentioned changes were made. As it stands, I have to use reflection to access package-private fields and actually copy classes due to the final methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60471750 --- Diff: docs/apis/batch/libs/table.md --- @@ -419,8 +438,8 @@ Only the types `LONG` and `STRING` can be casted to `DATE` and vice versa. A `LO SQL The Table API also supports embedded SQL queries. -In order to use a `Table` or `DataSet` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name. -A registered `Table` can be retrieved back from the `TableEnvironment` using the `scan()` method: +In order to use a `Table`, `DataSet`, or `DataStream` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name. +A registered Dataset `Table` can be retrieved back from the `TableEnvironment` using the `scan()` method: --- End diff -- Add a sentence about the `ingest()` method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250542#comment-15250542 ] ASF GitHub Bot commented on FLINK-3727: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1917#discussion_r60471750 --- Diff: docs/apis/batch/libs/table.md --- @@ -419,8 +438,8 @@ Only the types `LONG` and `STRING` can be casted to `DATE` and vice versa. A `LO SQL The Table API also supports embedded SQL queries. -In order to use a `Table` or `DataSet` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name. -A registered `Table` can be retrieved back from the `TableEnvironment` using the `scan()` method: +In order to use a `Table`, `DataSet`, or `DataStream` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name. +A registered Dataset `Table` can be retrieved back from the `TableEnvironment` using the `scan()` method: --- End diff -- Add a sentence about the `ingest()` method. > Add support for embedded streaming SQL (projection, filter, union) > -- > > Key: FLINK-3727 > URL: https://issues.apache.org/jira/browse/FLINK-3727 > Project: Flink > Issue Type: Sub-task > Components: Table API >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > Similar to the support for SQL embedded in batch Table API programs, this > issue tracks the support for SQL embedded in stream Table API programs. The > only currently supported operations on streaming Tables are projection, > filtering, and union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
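For reference, the suggested sentence would contrast the two retrieval paths roughly as follows; this is a sketch based on the API in this PR, with `ingest()` taken from the PR description:
```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.table.TableEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// ...register a DataStream under the name "MyTable" as shown in this PR...

// scan("MyTable") retrieves a registered DataSet table in batch programs;
// ingest("MyTable") is the streaming counterpart the comment asks to document.
val t = tEnv.ingest("MyTable")
```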
[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library
[ https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250529#comment-15250529 ] ASF GitHub Bot commented on FLINK-2157: --- Github user rawkintrevo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-212563858 Are there going to be usage docs on this? > Create evaluation framework for ML library > -- > > Key: FLINK-2157 > URL: https://issues.apache.org/jira/browse/FLINK-2157 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library >Reporter: Till Rohrmann >Assignee: Theodore Vasiloudis > Labels: ML > Fix For: 1.0.0 > > > Currently, FlinkML lacks the means to evaluate the performance of trained models. > It would be great to add some {{Evaluators}} which can calculate a score > based on the true and predicted labels. This could also be > used for cross-validation to choose the right hyperparameters. > Possible scores could be the F score [1], the zero-one loss, etc. > Resources > [1] [http://en.wikipedia.org/wiki/F1_score] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
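For readers unfamiliar with the proposed scores, a minimal self-contained sketch (not FlinkML's eventual evaluation API) of the two metrics the issue names, computed from (trueLabel, predictedLabel) pairs:
```scala
// Degenerate cases (empty input, no positive predictions) are ignored for brevity.
def zeroOneLoss(pairs: Seq[(Int, Int)]): Double =
  pairs.count { case (t, p) => t != p }.toDouble / pairs.size

def f1Score(pairs: Seq[(Int, Int)], positive: Int = 1): Double = {
  val tp = pairs.count { case (t, p) => t == positive && p == positive }
  val fp = pairs.count { case (t, p) => t != positive && p == positive }
  val fn = pairs.count { case (t, p) => t == positive && p != positive }
  val precision = tp.toDouble / (tp + fp)
  val recall    = tp.toDouble / (tp + fn)
  2 * precision * recall / (precision + recall)
}

// Example: zeroOneLoss(Seq((1, 1), (0, 1), (1, 0))) == 2.0 / 3
```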
[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...
Github user rawkintrevo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-212563858 Are there going to be usage docs on this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-3708] Scala API for CEP (initial).
Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/1905#discussion_r60452526 --- Diff: flink-libraries/flink-cep-scala/src/main/scala/org/apache/flink/cep/scala/pattern/package.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.cep.scala + +import org.apache.flink.cep.pattern.{FollowedByPattern => JFollowedByPattern, Pattern => JPattern} + +import _root_.scala.reflect.ClassTag + +package object pattern { + /** +* Utility method to wrap { @link org.apache.flink.cep.pattern.Pattern} and its subclasses +* for usage with the Scala API. +* +* @param javaPattern The underlying pattern from the Java API +* @tparam T Base type of the elements appearing in the pattern +* @tparam F Subtype of T to which the current pattern operator is constrained +* @return A pattern from the Scala API which wraps the pattern from the Java API +*/ + private[flink] def wrapPattern[ + T: ClassTag, F <: T : ClassTag](javaPattern: JPattern[T, F]) + : Pattern[T, F] = javaPattern match { +case f: JFollowedByPattern[T, F] => FollowedByPattern[T, F](f) +case p: JPattern[T, F] => Pattern[T, F](p) +case _ => null --- End diff -- I am not sure if this case should trigger an exception, because the Java API is actually allowed to return null here in case there is no previous pattern. Our pattern will then wrap this as an undefined Option. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
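The "undefined Option" behavior mentioned here comes from Scala's `Option.apply`, which maps `null` to `None`; a tiny generic illustration (not the actual CEP classes):
```scala
// Option(x) yields None when x is null, so a null from the Java side is
// absorbed as an "undefined Option" rather than surfacing as an error.
def wrapNullable[A <: AnyRef](fromJava: A): Option[A] = Option(fromJava)

object NullWrapDemo extends App {
  println(wrapNullable[String](null)) // None -> no previous pattern
  println(wrapNullable("previous"))   // Some(previous)
}
```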
[jira] [Commented] (FLINK-3708) Scala API for CEP
[ https://issues.apache.org/jira/browse/FLINK-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250334#comment-15250334 ] ASF GitHub Bot commented on FLINK-3708: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/1905#discussion_r60452526 --- Diff: flink-libraries/flink-cep-scala/src/main/scala/org/apache/flink/cep/scala/pattern/package.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.cep.scala + +import org.apache.flink.cep.pattern.{FollowedByPattern => JFollowedByPattern, Pattern => JPattern} + +import _root_.scala.reflect.ClassTag + +package object pattern { + /** +* Utility method to wrap { @link org.apache.flink.cep.pattern.Pattern} and its subclasses +* for usage with the Scala API. +* +* @param javaPattern The underlying pattern from the Java API +* @tparam T Base type of the elements appearing in the pattern +* @tparam F Subtype of T to which the current pattern operator is constrained +* @return A pattern from the Scala API which wraps the pattern from the Java API +*/ + private[flink] def wrapPattern[ + T: ClassTag, F <: T : ClassTag](javaPattern: JPattern[T, F]) + : Pattern[T, F] = javaPattern match { +case f: JFollowedByPattern[T, F] => FollowedByPattern[T, F](f) +case p: JPattern[T, F] => Pattern[T, F](p) +case _ => null --- End diff -- I am not sure if this case should trigger an exception, because the Java API is actually allowed to return null here in case there is no previous pattern. Our pattern will then wrap this as an undefined Option. > Scala API for CEP > - > > Key: FLINK-3708 > URL: https://issues.apache.org/jira/browse/FLINK-3708 > Project: Flink > Issue Type: Improvement > Components: CEP >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Stefan Richter > > Currently, the CEP library does not support Scala case classes, because the > {{TypeExtractor}} cannot handle them. In order to support them, it would be > necessary to offer a Scala API for the CEP library. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3797) Restart flow button
Maxim Dobryakov created FLINK-3797: -- Summary: Restart flow button Key: FLINK-3797 URL: https://issues.apache.org/jira/browse/FLINK-3797 Project: Flink Issue Type: Improvement Components: Web Client Affects Versions: 1.0.1 Reporter: Maxim Dobryakov Priority: Minor It would be better to have a "Restart" button in the Flink Dashboard for restarting a flow (create a savepoint, cancel the flow, run a new flow from the created savepoint). Such functionality can be useful for starting a flow on new task managers after downtime or after adding new nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3796) FileSourceFunction doesn't respect InputFormat's life cycle methods
Maximilian Michels created FLINK-3796: - Summary: FileSourceFunction doesn't respect InputFormat's life cycle methods Key: FLINK-3796 URL: https://issues.apache.org/jira/browse/FLINK-3796 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 1.0.1, 1.0.0, 1.0.2 Reporter: Maximilian Michels Assignee: Maximilian Michels Fix For: 1.1.0 The {{FileSourceFunction}} wraps {{InputFormat}} but doesn't execute its life cycle functions correctly. 1) It doesn't call {{close()}} before reading the next InputSplit via {{open(InputSplit split)}}. 2) It calls {{close()}} even if no InputSplit has been read (and thus open(..) hasn't been called previously). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
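For clarity, a hedged sketch of the lifecycle contract the issue describes, written as a simplified driver loop rather than Flink's actual `FileSourceFunction`:
```scala
import org.apache.flink.api.common.io.InputFormat
import org.apache.flink.core.io.InputSplit

// Simplified driver loop, not the real FileSourceFunction: every open(split)
// is paired with exactly one close(), and close() never runs for a format
// that was never opened (the two violations the issue lists).
def readAllSplits[T](format: InputFormat[T, InputSplit], splits: Seq[InputSplit]): Unit = {
  for (split <- splits) {
    format.open(split)
    try {
      while (!format.reachedEnd()) {
        val record = format.nextRecord(null.asInstanceOf[T])
        // ... emit `record` downstream ...
      }
    } finally {
      format.close() // 1) close the current split before the next open()
    }
  }
  // 2) if `splits` is empty, open() never ran, so close() is never called either
}
```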
[jira] [Created] (FLINK-3795) Cancel button is hidden on MacBook Pro 13''
Maxim Dobryakov created FLINK-3795: -- Summary: Cancel button is hidden on MacBook Pro 13'' Key: FLINK-3795 URL: https://issues.apache.org/jira/browse/FLINK-3795 Project: Flink Issue Type: Improvement Components: Web Client Affects Versions: 1.0.1 Reporter: Maxim Dobryakov Priority: Minor The Flink Dashboard doesn't show the "Cancel" button on a MacBook Pro 13'', so it is not possible to use it. Screenshot [here|http://i.imgur.com/5IH7caH.png]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3794) Add checks for unsupported operations in streaming table API
Vasia Kalavri created FLINK-3794: Summary: Add checks for unsupported operations in streaming table API Key: FLINK-3794 URL: https://issues.apache.org/jira/browse/FLINK-3794 Project: Flink Issue Type: Improvement Components: Table API Affects Versions: 1.1.0 Reporter: Vasia Kalavri Unsupported operations on streaming tables currently fail during plan translation. It would be nicer to add checks in the Table API methods and fail with an informative message that the operation is not supported. The operations that are not currently supported are: - aggregations inside select - groupBy - distinct - join We can simply check if the Table's environment is a streaming environment and throw an unsupported operation exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3793) Re-organize the Table API and SQL docs
Vasia Kalavri created FLINK-3793: Summary: Re-organize the Table API and SQL docs Key: FLINK-3793 URL: https://issues.apache.org/jira/browse/FLINK-3793 Project: Flink Issue Type: Bug Components: Documentation, Table API Affects Versions: 1.1.0 Reporter: Vasia Kalavri Now that we have added SQL and soon streaming SQL support, we need to reorganize the Table API documentation. - The current guide is under "apis/batch/libs". We should either split it into a streaming and a batch part or move it to under "apis". The second option might be preferable, as the batch and stream APIs have a lot in common. - The current guide has separate sections for Java and Scala APIs. These can be merged and organized with tabs, like other parts of the docs. - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the software stack figure and homepage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3229) Kinesis streaming consumer with integration of Flink's checkpointing mechanics
[ https://issues.apache.org/jira/browse/FLINK-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250214#comment-15250214 ] ASF GitHub Bot commented on FLINK-3229: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/1911#discussion_r60443897 --- Diff: flink-streaming-connectors/pom.xml --- @@ -45,6 +45,7 @@ under the License. flink-connector-rabbitmq flink-connector-twitter flink-connector-nifi + flink-connector-kinesis --- End diff -- I think we have to remove this line again. The module is included in the profile below (you have to activate the "include-kinesis" maven build profile) > Kinesis streaming consumer with integration of Flink's checkpointing mechanics > -- > > Key: FLINK-3229 > URL: https://issues.apache.org/jira/browse/FLINK-3229 > Project: Flink > Issue Type: Sub-task > Components: Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > > Opening a sub-task to implement data source consumer for Kinesis streaming > connector (https://issues.apache.org/jira/browser/FLINK-3211). > An example of the planned user API for Flink Kinesis Consumer: > {code} > Properties kinesisConfig = new Properties(); > config.put(KinesisConfigConstants.CONFIG_AWS_REGION, "us-east-1"); > config.put(KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_TYPE, > "BASIC"); > config.put( > KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_ACCESSKEYID, > "aws_access_key_id_here"); > config.put( > KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_SECRETKEY, > "aws_secret_key_here"); > config.put(KinesisConfigConstants.CONFIG_STREAM_INIT_POSITION_TYPE, > "LATEST"); // or TRIM_HORIZON > DataStream kinesisRecords = env.addSource(new FlinkKinesisConsumer<>( > "kinesis_stream_name", > new SimpleStringSchema(), > kinesisConfig)); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3229] Flink streaming consumer for AWS ...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/1911#discussion_r60443897 --- Diff: flink-streaming-connectors/pom.xml --- @@ -45,6 +45,7 @@ under the License. flink-connector-rabbitmq flink-connector-twitter flink-connector-nifi + flink-connector-kinesis --- End diff -- I think we have to remove this line again. The module is included in the profile below (you have to activate the "include-kinesis" maven build profile) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3229) Kinesis streaming consumer with integration of Flink's checkpointing mechanics
[ https://issues.apache.org/jira/browse/FLINK-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250211#comment-15250211 ] ASF GitHub Bot commented on FLINK-3229: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1911#issuecomment-212504465 Great, thank you. I'll review the PR soon. > Kinesis streaming consumer with integration of Flink's checkpointing mechanics > -- > > Key: FLINK-3229 > URL: https://issues.apache.org/jira/browse/FLINK-3229 > Project: Flink > Issue Type: Sub-task > Components: Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > > Opening a sub-task to implement data source consumer for Kinesis streaming > connector (https://issues.apache.org/jira/browser/FLINK-3211). > An example of the planned user API for Flink Kinesis Consumer: > {code} > Properties kinesisConfig = new Properties(); > config.put(KinesisConfigConstants.CONFIG_AWS_REGION, "us-east-1"); > config.put(KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_TYPE, > "BASIC"); > config.put( > KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_ACCESSKEYID, > "aws_access_key_id_here"); > config.put( > KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_SECRETKEY, > "aws_secret_key_here"); > config.put(KinesisConfigConstants.CONFIG_STREAM_INIT_POSITION_TYPE, > "LATEST"); // or TRIM_HORIZON > DataStream kinesisRecords = env.addSource(new FlinkKinesisConsumer<>( > "kinesis_stream_name", > new SimpleStringSchema(), > kinesisConfig)); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3229] Flink streaming consumer for AWS ...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1911#issuecomment-212504465
Great, thank you. I'll review the PR soon.
[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)
[ https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250196#comment-15250196 ] ASF GitHub Bot commented on FLINK-3727: --- GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/1917
[FLINK-3727] Embedded streaming SQL projection, filtering, union
This PR adds support for embedded streaming SQL (projection, filtering, union):
- methods to register DataStreams
- sql translation method in StreamTableEnvironment
- a custom rule to convert to streamable table
- docs for streaming table and sql
- java SQL tests

A streaming SQL query can be executed on a streaming Table by simply adding the `STREAM` keyword in front of the table name. Registering DataStream tables and conversions work in a similar way to that of DataSet tables. Here's a filtering example:
```
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

val dataStream = env.addSource(...)
val t = dataStream.toTable(tEnv).as('a, 'b, 'c)
tEnv.registerTable("MyTable", t)

val sqlQuery = "SELECT STREAM * FROM MyTable WHERE a = 3"
val result = tEnv.sql(sqlQuery).toDataStream[Row]
```
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/vasia/flink stream-sql
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/flink/pull/1917.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1917

commit 6b747bd6902074d0475de9519c1c3bd693487eef
Author: vasia
Date: 2016-04-15T11:35:24Z

    [FLINK-3727] Add support for embedded streaming SQL (projection, filter, union)

    - add methods to register DataStreams
    - add sql translation method in StreamTableEnvironment
    - add a custom rule to convert to streamable table
    - add docs for streaming table and sql
> Add support for embedded streaming SQL (projection, filter, union)
> --
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this issue tracks the support for SQL embedded in stream Table API programs. The only currently supported operations on streaming Tables are projection, filtering, and union.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...
GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/1917
[FLINK-3727] Embedded streaming SQL projection, filtering, union
This PR adds support for embedded streaming SQL (projection, filtering, union):
- methods to register DataStreams
- sql translation method in StreamTableEnvironment
- a custom rule to convert to streamable table
- docs for streaming table and sql
- java SQL tests

A streaming SQL query can be executed on a streaming Table by simply adding the `STREAM` keyword in front of the table name. Registering DataStream tables and conversions work in a similar way to that of DataSet tables. Here's a filtering example:
```
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

val dataStream = env.addSource(...)
val t = dataStream.toTable(tEnv).as('a, 'b, 'c)
tEnv.registerTable("MyTable", t)

val sqlQuery = "SELECT STREAM * FROM MyTable WHERE a = 3"
val result = tEnv.sql(sqlQuery).toDataStream[Row]
```
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/vasia/flink stream-sql
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/flink/pull/1917.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1917

commit 6b747bd6902074d0475de9519c1c3bd693487eef
Author: vasia
Date: 2016-04-15T11:35:24Z

    [FLINK-3727] Add support for embedded streaming SQL (projection, filter, union)

    - add methods to register DataStreams
    - add sql translation method in StreamTableEnvironment
    - add a custom rule to convert to streamable table
    - add docs for streaming table and sql
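Since the PR also covers union, here is a sketch of the same API applied to a streaming UNION ALL query. It reuses only the methods shown in the example above (`toTable`, `as`, `registerTable`, `sql`, `toDataStream`); `stream1` and `stream2` are assumed to be DataStreams with matching schemas:
```scala
val t1 = stream1.toTable(tEnv).as('a, 'b, 'c)
val t2 = stream2.toTable(tEnv).as('a, 'b, 'c)
tEnv.registerTable("T1", t1)
tEnv.registerTable("T2", t2)

// STREAM marks the tables as streaming, exactly as in the filtering example
val union = tEnv.sql(
  "SELECT STREAM a, b, c FROM T1 UNION ALL (SELECT STREAM a, b, c FROM T2)")
val result = union.toDataStream[Row]
```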
[jira] [Commented] (FLINK-3717) Add functionality to be able to restore from a specific point in a FileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250154#comment-15250154 ] ASF GitHub Bot commented on FLINK-3717: --- Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212492749
Thanks for the comments @aljoscha! I will integrate them and get back to you.
> Add functionality to be able to restore from a specific point in a FileInputFormat
> --
>
> Key: FLINK-3717
> URL: https://issues.apache.org/jira/browse/FLINK-3717
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
>
> This is the first step in order to make the File Sources fault-tolerant. We have to be able to start reading from a specific point in a file, despite any caching performed during reading. This will guarantee that the task that takes over the execution of the failed one can start from the correct point in the file.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-3717 - Be able to re-start reading from ...
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212492749
Thanks for the comments @aljoscha! I will integrate them and get back to you.
[jira] [Commented] (FLINK-3792) RowTypeInfo equality should not depend on field names
[ https://issues.apache.org/jira/browse/FLINK-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250140#comment-15250140 ] Fabian Hueske commented on FLINK-3792: --
I think this should be checked at the API level, not when the plan is optimized and the DataSet / DataStream program is constructed. Field names are not relevant when the program (Table or SQL) is executed. Also, {{Row}} does not provide access by name but only by position.
> RowTypeInfo equality should not depend on field names
> -
>
> Key: FLINK-3792
> URL: https://issues.apache.org/jira/browse/FLINK-3792
> Project: Flink
> Issue Type: Bug
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
>
> Currently, two Rows with the same field types but different field names are not considered equal by the Table API and SQL. This behavior might create problems, e.g. it makes the following union query fail:
> {code}
> SELECT STREAM a, b, c FROM T1 UNION ALL
> (SELECT STREAM d, e, f FROM T2 WHERE d < 3)
> {code}
> where a, b, c and d, e, f are fields of corresponding types.
> {code}
> Cannot union streams of different types: org.apache.flink.api.table.Row(a: Integer, b: Long, c: String) and org.apache.flink.api.table.Row(d: Integer, e: Long, f: String)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3777] Managed closeInputFormat()
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1903#issuecomment-212483482
Thanks for this PR @fpompermaier. We need to check all places where `InputFormat.configure()` is called to ensure that we free resources as promised in the JavaDocs of `RichInputFormat.closeInputFormat()`. I found a few more spots where we need to call `closeInputFormat()` if the IF is a `RichInputFormat`:
- `ReplicatingInputFormat`, which is a wrapper for other input formats.
- `DataSourceNode.computeOperatorSpecificDefaultEstimates()`, which tries to obtain some input statistics and initializes the input format with `configure()`.
- `InputFormatVertex.initializeOnMaster()`, which calls `configure()` as well.

It would also be good to add a test that checks that `closeInputFormat()` is actually called.
[jira] [Commented] (FLINK-3754) Add a validation phase before constructing RelNode using the Table API
[ https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250128#comment-15250128 ] ASF GitHub Bot commented on FLINK-3754: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1916#issuecomment-212485745
Thanks @yjshen for working on this issue! Unified validation and exceptions are a big improvement, IMO. I'll try to have a look soon.
@twalthr, can you have a look at the changes done on the expressions?
> Add a validation phase before constructing RelNode using the Table API
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
> Issue Type: Improvement
> Components: Table API
> Affects Versions: 1.0.0
> Reporter: Yijie Shen
> Assignee: Yijie Shen
>
> Unlike SQL string execution, which has a separate validation phase before RelNode construction, the Table API lacks a counterpart and its validation is scattered across many places.
> I suggest adding a single validation phase to detect problems as early as possible.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1916#issuecomment-212485745
Thanks @yjshen for working on this issue! Unified validation and exceptions are a big improvement, IMO. I'll try to have a look soon.
@twalthr, can you have a look at the changes done on the expressions?
[jira] [Commented] (FLINK-3758) Add possibility to register accumulators in custom triggers
[ https://issues.apache.org/jira/browse/FLINK-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250106#comment-15250106 ] Konstantin Knauf commented on FLINK-3758: -
{quote}
In the Trigger we would not know when to call addAccumulator()
{quote}
That's what I thought, yes. {getAccumulator(String name, Accumulator defaultAccumulator)} sounds good. I will have a look at it.
> Add possibility to register accumulators in custom triggers
> ---
>
> Key: FLINK-3758
> URL: https://issues.apache.org/jira/browse/FLINK-3758
> Project: Flink
> Issue Type: Improvement
> Reporter: Konstantin Knauf
> Priority: Minor
>
> For monitoring purposes it would be nice to be able to use accumulators in custom trigger functions.
> Basically, the trigger context could just expose {{getAccumulator}} of {{RuntimeContext}} - or does this create problems I am not aware of?
> Adding accumulators in a trigger function is more difficult, I think, but that's not really necessary, as the accumulator could just be added in some other upstream operator.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3777) Add open and close methods to manage IF lifecycle
[ https://issues.apache.org/jira/browse/FLINK-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250112#comment-15250112 ] ASF GitHub Bot commented on FLINK-3777: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1903#issuecomment-212483482
Thanks for this PR @fpompermaier. We need to check all places where `InputFormat.configure()` is called to ensure that we free resources as promised in the JavaDocs of `RichInputFormat.closeInputFormat()`. I found a few more spots where we need to call `closeInputFormat()` if the IF is a `RichInputFormat`:
- `ReplicatingInputFormat`, which is a wrapper for other input formats.
- `DataSourceNode.computeOperatorSpecificDefaultEstimates()`, which tries to obtain some input statistics and initializes the input format with `configure()`.
- `InputFormatVertex.initializeOnMaster()`, which calls `configure()` as well.

It would also be good to add a test that checks that `closeInputFormat()` is actually called.
> Add open and close methods to manage IF lifecycle
> -
>
> Key: FLINK-3777
> URL: https://issues.apache.org/jira/browse/FLINK-3777
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.0.1
> Reporter: Flavio Pompermaier
> Assignee: Flavio Pompermaier
> Labels: inputformat, lifecycle
>
> At the moment the opening and closing of an InputFormat are not managed, although open() could be (improperly, IMHO) simulated by configure().
> This limits the possibility to reuse expensive resources (like database connections) and to manage their release.
> Probably the best option would be to add 2 methods (i.e. openInputFormat() and closeInputFormat()) to RichInputFormat*
> * NOTE: the best option from a "semantic" point of view would be to rename the current open() and close() to openSplit() and closeSplit() respectively, while using open() and close() for the IF lifecycle management, but this would cause a backward compatibility issue...
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
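As a reading aid, a minimal sketch of the call-site pattern the review asks for, assuming the openInputFormat()/closeInputFormat() hooks proposed in this issue; the helper class below is illustrative, not Flink code:
{code}
import org.apache.flink.api.common.io.InputFormat;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.configuration.Configuration;

public final class InputFormatLifecycleSketch {

    // Runs an action against a configured input format and guarantees that
    // resources are released again when the format is a RichInputFormat.
    public static void withConfigured(InputFormat<?, ?> format,
                                      Configuration parameters,
                                      Runnable action) {
        format.configure(parameters);
        try {
            action.run(); // e.g. compute statistics, create input splits, ...
        } finally {
            if (format instanceof RichInputFormat) {
                ((RichInputFormat<?, ?>) format).closeInputFormat();
            }
        }
    }
}
{code}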
[jira] [Comment Edited] (FLINK-3758) Add possibility to register accumulators in custom triggers
[ https://issues.apache.org/jira/browse/FLINK-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250106#comment-15250106 ] Konstantin Knauf edited comment on FLINK-3758 at 4/20/16 3:42 PM: --
{quote}
In the Trigger we would not know when to call addAccumulator()
{quote}
That's what I thought, yes. {{getAccumulator(String name, Accumulator defaultAccumulator)}} sounds good. I will have a look at it.

was (Author: knaufk):
{quote}
In the Trigger we would not know when to call addAccumulator()
{quote}
That's what I thought, yes. {getAccumulator(String name, Accumulator defaultAccumulator)} sounds good. I will have a look at it.
> Add possibility to register accumulators in custom triggers
> ---
>
> Key: FLINK-3758
> URL: https://issues.apache.org/jira/browse/FLINK-3758
> Project: Flink
> Issue Type: Improvement
> Reporter: Konstantin Knauf
> Priority: Minor
>
> For monitoring purposes it would be nice to be able to use accumulators in custom trigger functions.
> Basically, the trigger context could just expose {{getAccumulator}} of {{RuntimeContext}} - or does this create problems I am not aware of?
> Adding accumulators in a trigger function is more difficult, I think, but that's not really necessary, as the accumulator could just be added in some other upstream operator.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
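A sketch of how the proposed method could look from inside a custom trigger. Note that TriggerContext.getAccumulator(name, default) is the hypothetical API discussed above and does not exist in Flink; the rest follows the standard Trigger interface:
{code}
import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

public class CountingTrigger<W extends Window> extends Trigger<Object, W> {

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window,
                                   TriggerContext ctx) throws Exception {
        // hypothetical: returns the accumulator, creating it on first use
        LongCounter firings =
            ctx.getAccumulator("trigger-firings", new LongCounter());
        firings.add(1L);
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }
}
{code}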
[jira] [Commented] (FLINK-3750) Make JDBCInputFormat a parallel source
[ https://issues.apache.org/jira/browse/FLINK-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250099#comment-15250099 ] ASF GitHub Bot commented on FLINK-3750: --- Github user fpompermaier commented on the pull request: https://github.com/apache/flink/pull/1885#issuecomment-212479891
Depends on FLINK-3777 (https://github.com/apache/flink/pull/1903) to properly manage the single connection instantiation.
> Make JDBCInputFormat a parallel source
> --
>
> Key: FLINK-3750
> URL: https://issues.apache.org/jira/browse/FLINK-3750
> Project: Flink
> Issue Type: Improvement
> Components: Batch
> Affects Versions: 1.0.1
> Reporter: Flavio Pompermaier
> Assignee: Flavio Pompermaier
> Priority: Minor
> Labels: connector, jdbc
>
> At the moment the batch JDBC InputFormat does not support parallelism (NonParallelInput). I'd like to remove this limitation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Improved JDBCInputFormat (FLINK-3750) and othe...
Github user fpompermaier commented on the pull request: https://github.com/apache/flink/pull/1885#issuecomment-212479891
Depends on FLINK-3777 (https://github.com/apache/flink/pull/1903) to properly manage the single connection instantiation.
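A sketch of what that dependency buys, assuming FLINK-3777's openInputFormat()/closeInputFormat() hooks land as proposed; class and field names below are illustrative, not the actual JDBCInputFormat code:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// One connection per parallel task instance, reused across all input
// splits that the instance processes, instead of one connection per split.
public class JdbcLifecycleSketch {

    private final String dbUrl; // illustrative: the JDBC URL to connect to
    private transient Connection connection;

    public JdbcLifecycleSketch(String dbUrl) {
        this.dbUrl = dbUrl;
    }

    public void openInputFormat() throws SQLException {
        // called once per parallel instance, before the first split is opened
        connection = DriverManager.getConnection(dbUrl);
    }

    public void closeInputFormat() throws SQLException {
        // called once per parallel instance, after the last split is closed
        if (connection != null) {
            connection.close();
        }
    }
}
```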
[jira] [Commented] (FLINK-3717) Add functionality to be able to restore from a specific point in a FileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250085#comment-15250085 ] ASF GitHub Bot commented on FLINK-3717: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212477715
I like that you're adding comments to existing parts of the code. This will help other people who might be looking at it in the future. 😃
> Add functionality to be able to restore from a specific point in a FileInputFormat
> --
>
> Key: FLINK-3717
> URL: https://issues.apache.org/jira/browse/FLINK-3717
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
>
> This is the first step in order to make the File Sources fault-tolerant. We have to be able to start reading from a specific point in a file, despite any caching performed during reading. This will guarantee that the task that takes over the execution of the failed one can start from the correct point in the file.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-3717 - Be able to re-start reading from ...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212477715
I like that you're adding comments to existing parts of the code. This will help other people who might be looking at it in the future. 😃
[jira] [Commented] (FLINK-3792) RowTypeInfo equality should not depend on field names
[ https://issues.apache.org/jira/browse/FLINK-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250083#comment-15250083 ] Vasia Kalavri commented on FLINK-3792: --
That's a valid point. I don't know what Calcite does, but the SQL validation phase does not fail. Renaming the fields does not work either.
> RowTypeInfo equality should not depend on field names
> -
>
> Key: FLINK-3792
> URL: https://issues.apache.org/jira/browse/FLINK-3792
> Project: Flink
> Issue Type: Bug
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
>
> Currently, two Rows with the same field types but different field names are not considered equal by the Table API and SQL. This behavior might create problems, e.g. it makes the following union query fail:
> {code}
> SELECT STREAM a, b, c FROM T1 UNION ALL
> (SELECT STREAM d, e, f FROM T2 WHERE d < 3)
> {code}
> where a, b, c and d, e, f are fields of corresponding types.
> {code}
> Cannot union streams of different types: org.apache.flink.api.table.Row(a: Integer, b: Long, c: String) and org.apache.flink.api.table.Row(d: Integer, e: Long, f: String)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
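A small sketch of the equality semantics the ticket asks for (an illustrative helper, not the actual RowTypeInfo code): two row types compare equal exactly when their field types match position by position, with field names ignored.
{code}
import java.util.Arrays;

import org.apache.flink.api.common.typeinfo.TypeInformation;

final class RowTypeEqualitySketch {

    // Field names are deliberately not part of the comparison; only the
    // positional field types decide equality.
    static boolean sameRowType(TypeInformation<?>[] fieldTypesA,
                               TypeInformation<?>[] fieldTypesB) {
        return Arrays.equals(fieldTypesA, fieldTypesB);
    }
}
{code}
Under these semantics, Row(a: Integer, b: Long, c: String) and Row(d: Integer, e: Long, f: String) would be union-compatible.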
[jira] [Commented] (FLINK-3717) Add functionality to be able to restore from a specific point in a FileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250075#comment-15250075 ] ASF GitHub Bot commented on FLINK-3717: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212473493
Could it happen that an `InputFormat` is opened with some `InputSplit` but then restored to a position that would lie in another split? Maybe it would make sense to have, instead of a `restore` method, an alternative `open()` method that takes not only an `InputSplit` but also a read position that was snapshotted earlier. This would make the "open from snapshot" consistent.
> Add functionality to be able to restore from a specific point in a FileInputFormat
> --
>
> Key: FLINK-3717
> URL: https://issues.apache.org/jira/browse/FLINK-3717
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
>
> This is the first step in order to make the File Sources fault-tolerant. We have to be able to start reading from a specific point in a file, despite any caching performed during reading. This will guarantee that the task that takes over the execution of the failed one can start from the correct point in the file.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-3717 - Be able to re-start reading from ...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212473493
Could it happen that an `InputFormat` is opened with some `InputSplit` but then restored to a position that would lie in another split? Maybe it would make sense to have, instead of a `restore` method, an alternative `open()` method that takes not only an `InputSplit` but also a read position that was snapshotted earlier. This would make the "open from snapshot" consistent.
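A sketch of that alternative as a hypothetical interface (all names illustrative, not Flink API): passing the split and the snapshotted read position to a single open() call means the two can never disagree, which a separate restore() step cannot guarantee.
```java
import java.io.IOException;
import org.apache.flink.core.io.InputSplit;

interface RestorableReaderSketch<S extends InputSplit, P> {

    /** Plain open: start reading the given split from its beginning. */
    void open(S split) throws IOException;

    /** Open from a snapshot: resume the given split at a saved read position. */
    void open(S split, P snapshottedPosition) throws IOException;

    /** Capture the current read position for checkpointing. */
    P snapshotReadPosition();
}
```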
[GitHub] flink pull request: FLINK-3717 - Be able to re-start reading from ...
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212468812
I will also squash the commits as soon as I integrate any possible comments!
[jira] [Commented] (FLINK-3717) Add functionality to be able to restore from a specific point in a FileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250064#comment-15250064 ] ASF GitHub Bot commented on FLINK-3717: --- Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212468812
I will also squash the commits as soon as I integrate any possible comments!
> Add functionality to be able to restore from a specific point in a FileInputFormat
> --
>
> Key: FLINK-3717
> URL: https://issues.apache.org/jira/browse/FLINK-3717
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
>
> This is the first step in order to make the File Sources fault-tolerant. We have to be able to start reading from a specific point in a file, despite any caching performed during reading. This will guarantee that the task that takes over the execution of the failed one can start from the correct point in the file.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
Github user gaoyike commented on the pull request: https://github.com/apache/flink/pull/1908#issuecomment-212460975
Done!
[jira] [Commented] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()
[ https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250035#comment-15250035 ] ASF GitHub Bot commented on FLINK-3781: --- Github user gaoyike commented on the pull request: https://github.com/apache/flink/pull/1908#issuecomment-212460975
Done!
> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
>
> {code}
> public void deleteGlobal(BlobKey key) throws IOException {
> 	delete(key);
> 	BlobClient bc = createClient();
> 	bc.delete(key);
> 	bc.close();
> {code}
> If delete() throws IOException, BlobClient would be left unclosed.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
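For reference, one way to guarantee the client is closed on every path is a try/finally around the remote delete. This is a sketch of the obvious fix, not necessarily the patch that was merged:
{code}
public void deleteGlobal(BlobKey key) throws IOException {
	// delete the local copy first; if this throws, no client was created yet
	delete(key);
	BlobClient bc = createClient();
	try {
		bc.delete(key);
	} finally {
		bc.close(); // runs even when the remote delete throws
	}
}
{code}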
[jira] [Commented] (FLINK-3717) Add functionality to be able to restore from a specific point in a FileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250028#comment-15250028 ] ASF GitHub Bot commented on FLINK-3717: --- Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212457887
Could somebody review this? This is also part of the FLINK-2314 issue.
> Add functionality to be able to restore from a specific point in a FileInputFormat
> --
>
> Key: FLINK-3717
> URL: https://issues.apache.org/jira/browse/FLINK-3717
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
>
> This is the first step in order to make the File Sources fault-tolerant. We have to be able to start reading from a specific point in a file, despite any caching performed during reading. This will guarantee that the task that takes over the execution of the failed one can start from the correct point in the file.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-3717 - Be able to re-start reading from ...
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/1895#issuecomment-212457887
Could somebody review this? This is also part of the FLINK-2314 issue.