[jira] [Created] (FLINK-9854) Allow passing multi-line input to SQL Client CLI
Timo Walther created FLINK-9854:
------------------------------------

Summary: Allow passing multi-line input to SQL Client CLI
Key: FLINK-9854
URL: https://issues.apache.org/jira/browse/FLINK-9854
Project: Flink
Issue Type: Sub-task
Components: Table API & SQL
Reporter: Timo Walther

We should support {{flink-cli < query01.sql}} or {{echo "INSERT INTO bar SELECT * FROM foo" | flink-cli}} for convenience. I'm not sure how well we support multi-line input and EOF right now.

Currently, with the experimental {{-u}} flag the user also gets the correct error code after the submission; with {{flink-cli < query01.sql}} the CLI would either stay in interactive mode or always return success.

We should also discuss which statements are allowed. Actually, only DDL and {{INSERT INTO}} statements make sense so far.
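As a discussion aid, here is a minimal Java sketch of the statement handling such a non-interactive mode would need: read a multi-line script, split it into statements, and accept only DDL and {{INSERT INTO}}. The class name, the naive split on semicolons, and the prefix whitelist are illustrative assumptions, not the SQL Client's actual implementation.

{code}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SqlScriptSplitter {

    // Illustrative whitelist: DDL plus INSERT INTO, per the discussion above.
    private static final List<String> ALLOWED_PREFIXES =
            Arrays.asList("CREATE", "DROP", "ALTER", "INSERT INTO");

    public static void main(String[] args) throws Exception {
        // Read the whole (possibly multi-line) script, e.g. query01.sql.
        String script = new String(Files.readAllBytes(Paths.get(args[0])));

        // Naive split on ';' -- a real implementation must respect quoted
        // strings and comments.
        List<String> statements = Arrays.stream(script.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());

        for (String stmt : statements) {
            boolean allowed = ALLOWED_PREFIXES.stream()
                    .anyMatch(p -> stmt.toUpperCase().startsWith(p));
            System.out.println((allowed ? "OK:   " : "SKIP: ") + stmt);
        }
    }
}
{code}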
Re: Flink Query Optimizer
+1. Having table statistics is one of the main blockers for more advanced optimization rules. I would love to contribute to this effort! However, I think @Albert's case is more on the DataSet side. Was there any plan to integrate table statistics for the DataSet API first and then extend them to the DataStream domain?

-- Rong

On Sun, Jul 15, 2018 at 7:21 AM Piotr Nowojski wrote:

> Hi,
>
> Currently the biggest limitation preventing better query optimisation is the lack of table statistics (which are not trivial to provide in streaming), so join/aggregation reordering doesn't work. We have some ideas on how to tackle this issue, and we will definitely improve it at some point.
>
> Piotrek
>
> > On 14 Jul 2018, at 06:48, Xingcan Cui wrote:
> >
> > Hi Albert,
> >
> > Calcite provides a rule-based optimizer (as a framework), which means users can customize it by adding rules. That's exactly what Flink did. From the logical plan to the physical plan, the translations are triggered by different sets of rules, according to which the relational expressions are replaced, reordered, or optimized.
> >
> > However, IMO, the current optimization rules in the Flink Table API are quite primitive. Some SQL statements (e.g., multiple joins) are just translated to feasible execution plans instead of optimized ones, since it's much more difficult to conduct query optimization on large datasets or dynamic streams. You could first start from the Calcite query optimizer and then try to write your own rules.
> >
> > Best,
> > Xingcan
> >
> >> On Jul 14, 2018, at 11:55 AM, vino yang wrote:
> >>
> >> Hi Albert,
> >>
> >> First, I guess the query optimizer you mentioned is the one for Flink Table & SQL (for the batch API there is another optimizer implemented by Flink itself).
> >>
> >> Yes, for Table & SQL, Flink uses Apache Calcite's query optimizer: queries are translated into a Calcite plan, which is then optimized according to Calcite's optimization rules.
> >>
> >> The following rules are applied so far:
> >> https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/FlinkRuleSets.scala
> >>
> >> Given that Flink depends on Calcite for the optimization, I think enhancing Flink and Calcite would be the right direction.
> >>
> >> We hope you can provide more ideas and details. The Flink community welcomes your ideas and contributions.
> >>
> >> Thanks,
> >> Vino.
> >>
> >>
> >> 2018-07-13 23:39 GMT+08:00 Albert Jonathan :
> >>
> >>> Hello,
> >>>
> >>> I am just wondering: does Flink use Apache Calcite's query optimizer to generate an optimal logical plan, or does it have its own query optimizer? From what I have observed so far, Flink's query optimizer only groups operators together without changing the order of aggregation operators (e.g., joins). Did I miss anything?
> >>>
> >>> I am thinking of extending Flink to apply query optimization as in an RDBMS, either by integrating it with Calcite or by implementing it as a new module. Any feedback or guidelines will be highly appreciated.
> >>>
> >>> Thank you,
> >>> Albert
Re: Flink Query Optimizer
Hi,

Currently the biggest limitation preventing better query optimisation is the lack of table statistics (which are not trivial to provide in streaming), so join/aggregation reordering doesn't work. We have some ideas on how to tackle this issue, and we will definitely improve it at some point.

Piotrek

> On 14 Jul 2018, at 06:48, Xingcan Cui wrote:
>
> Hi Albert,
>
> Calcite provides a rule-based optimizer (as a framework), which means users can customize it by adding rules. That's exactly what Flink did. From the logical plan to the physical plan, the translations are triggered by different sets of rules, according to which the relational expressions are replaced, reordered, or optimized.
>
> However, IMO, the current optimization rules in the Flink Table API are quite primitive. Some SQL statements (e.g., multiple joins) are just translated to feasible execution plans instead of optimized ones, since it's much more difficult to conduct query optimization on large datasets or dynamic streams. You could first start from the Calcite query optimizer and then try to write your own rules.
>
> Best,
> Xingcan
>
>> On Jul 14, 2018, at 11:55 AM, vino yang wrote:
>>
>> Hi Albert,
>>
>> First, I guess the query optimizer you mentioned is the one for Flink Table & SQL (for the batch API there is another optimizer implemented by Flink itself).
>>
>> Yes, for Table & SQL, Flink uses Apache Calcite's query optimizer: queries are translated into a Calcite plan, which is then optimized according to Calcite's optimization rules.
>>
>> The following rules are applied so far:
>> https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/FlinkRuleSets.scala
>>
>> Given that Flink depends on Calcite for the optimization, I think enhancing Flink and Calcite would be the right direction.
>>
>> We hope you can provide more ideas and details. The Flink community welcomes your ideas and contributions.
>>
>> Thanks,
>> Vino.
>>
>>
>> 2018-07-13 23:39 GMT+08:00 Albert Jonathan :
>>
>>> Hello,
>>>
>>> I am just wondering: does Flink use Apache Calcite's query optimizer to generate an optimal logical plan, or does it have its own query optimizer? From what I have observed so far, Flink's query optimizer only groups operators together without changing the order of aggregation operators (e.g., joins). Did I miss anything?
>>>
>>> I am thinking of extending Flink to apply query optimization as in an RDBMS, either by integrating it with Calcite or by implementing it as a new module. Any feedback or guidelines will be highly appreciated.
>>>
>>> Thank you,
>>> Albert
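For readers who want to follow Xingcan's suggestion of writing their own rules, below is a minimal Java skeleton of a custom Calcite rule using the classic {{RelOptRule}} pattern-matching API. The rule name and the filter-over-join pattern are illustrative assumptions, and the actual rewrite is left as a comment; this is a sketch, not an existing Flink rule.

{code}
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.logical.LogicalFilter;
import org.apache.calcite.rel.logical.LogicalJoin;

public class FilterOverJoinRule extends RelOptRule {

    public static final FilterOverJoinRule INSTANCE = new FilterOverJoinRule();

    private FilterOverJoinRule() {
        // Fire whenever a LogicalFilter sits directly on top of a LogicalJoin.
        super(operand(LogicalFilter.class, operand(LogicalJoin.class, any())),
                "FilterOverJoinRule");
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        LogicalFilter filter = call.rel(0);
        LogicalJoin join = call.rel(1);
        // A real rule would split filter.getCondition() into conjunctions,
        // push the parts that reference only one join input below the join,
        // and register the rewritten plan via call.transformTo(...).
    }
}
{code}

Such a rule would then have to be added to the rule sets Vino linked (FlinkRuleSets.scala) for the planner to pick it up.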
[jira] [Created] (FLINK-9853) add hex support in table api and sql
xueyu created FLINK-9853:

Summary: add hex support in table api and sql
Key: FLINK-9853
URL: https://issues.apache.org/jira/browse/FLINK-9853
Project: Flink
Issue Type: Improvement
Components: Table API & SQL
Reporter: xueyu

Like in MySQL, HEX could take int or string arguments. For an integer argument N, it returns a hexadecimal string representation of the value of N. For a string argument str, it returns a hexadecimal string representation of str where each byte of each character in str is converted to two hexadecimal digits.

Syntax:
HEX(100) = 64
HEX('This is a test String.') = '546869732069732061207465737420537472696e672e'

See more: [MySQL string functions|https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]
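A small Java sketch of the semantics described above (illustrative only, not the proposed Flink implementation; lowercase output follows the examples in this issue):

{code}
import java.nio.charset.StandardCharsets;

public class HexExamples {

    // HEX(N) for an integer argument: hexadecimal representation of N's value.
    static String hex(long n) {
        return Long.toHexString(n);
    }

    // HEX(str) for a string argument: two hex digits per byte of the string.
    static String hex(String str) {
        StringBuilder sb = new StringBuilder();
        for (byte b : str.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(100));                      // 64
        System.out.println(hex("This is a test String.")); // 546869732069732061207465737420537472696e672e
    }
}
{code}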
[jira] [Created] (FLINK-9852) Expose descriptor-based sink creation in table environments
Timo Walther created FLINK-9852:
------------------------------------

Summary: Expose descriptor-based sink creation in table environments
Key: FLINK-9852
URL: https://issues.apache.org/jira/browse/FLINK-9852
Project: Flink
Issue Type: New Feature
Components: Table API & SQL
Reporter: Timo Walther
Assignee: Timo Walther

Currently, only a table source can be created using the unified table descriptors with {{tableEnv.from(...)}}. A similar approach should be supported for defining sinks, or even both types at the same time.

I suggest the following syntax:

{code}
tableEnv.connect(Kafka(...)).registerSource("name")
tableEnv.connect(Kafka(...)).registerSink("name")
tableEnv.connect(Kafka(...)).registerSourceAndSink("name")
{code}

A table could then access the registered source/sink.
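Assuming the proposed methods land, usage might look like the following sketch, in the same abbreviated style as the proposal above (the table name and the follow-up query are hypothetical):

{code}
// Hypothetical usage once the proposed methods exist (descriptor details elided):
tableEnv.connect(Kafka(...)).registerSource("clicks")

// The registered name would then be queryable like any other table:
Table result = tableEnv.sqlQuery("SELECT * FROM clicks")
{code}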
[jira] [Created] (FLINK-9851) Add documentation for unified table sources/sinks
Timo Walther created FLINK-9851:
------------------------------------

Summary: Add documentation for unified table sources/sinks
Key: FLINK-9851
URL: https://issues.apache.org/jira/browse/FLINK-9851
Project: Flink
Issue Type: Improvement
Components: Documentation, Table API & SQL
Reporter: Timo Walther
Assignee: Timo Walther

FLINK-8558 and FLINK-8866 reworked a lot of the existing table sources/sinks and the way they are discovered. We should rework the documentation about:
- Built-in sinks/sources/formats and their properties for the Table API and SQL Client
- How to write custom sinks/sources/formats
- Limitations such as {{property-version}}, {{rowtime.timestamps.from-source}}
[jira] [Created] (FLINK-9850) Add a string to the print method to identify output for DataStream
Hequn Cheng created FLINK-9850:
------------------------------------

Summary: Add a string to the print method to identify output for DataStream
Key: FLINK-9850
URL: https://issues.apache.org/jira/browse/FLINK-9850
Project: Flink
Issue Type: New Feature
Components: DataStream API
Reporter: Hequn Cheng

The print method of {{DataSet}} allows the user to supply a string to identify the output (see [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]). But {{DataStream}} doesn't support this yet. It would be valuable to add this feature to {{DataStream}}.
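Until {{DataStream}} has such support, the behavior can be approximated with a {{map}} that prefixes each record before printing. A minimal Java sketch (the prefix format and job name are just examples):

{code}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PrintWithIdentifier {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.fromElements("a", "b", "c");

        // Emulate DataSet#print(String) by prefixing each record before printing.
        stream.map(value -> "my-stream> " + value).print();

        env.execute("print-with-identifier");
    }
}
{code}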