[jira] [Created] (FLINK-9854) Allow passing multi-line input to SQL Client CLI

2018-07-15 Thread Timo Walther (JIRA)
Timo Walther created FLINK-9854:
---

 Summary: Allow passing multi-line input to SQL Client CLI
 Key: FLINK-9854
 URL: https://issues.apache.org/jira/browse/FLINK-9854
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Timo Walther


We should support {{flink-cli < query01.sql}} or {{echo "INSERT INTO bar SELECT 
* FROM foo" | flink-cli}} for convenience. I'm not sure how well we support 
multi-line statements and EOF right now. Currently, with the experimental {{-u}} 
flag the user also gets the correct error code after the submission; with 
{{flink-cli < query01.sql}} the CLI would either stay in interactive mode or 
always return success.

We should also discuss which statements are allowed. So far, only DDL and 
{{INSERT INTO}} statements make sense.
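
For illustration, a hypothetical {{query01.sql}} could contain a single 
statement spanning several lines (the file name and table names are 
placeholders taken from the examples above):

{code}
-- query01.sql: one INSERT INTO statement spread over multiple lines
INSERT INTO bar
SELECT *
FROM foo;
{code}

Piped in via {{flink-cli < query01.sql}}, the CLI should then execute the 
statement non-interactively and return a non-zero exit code if the submission 
fails.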



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Flink Query Optimizer

2018-07-15 Thread Rong Rong
+1. Having table statistics is one of the main blockers for more advanced
optimization rules. I would love to contribute to this effort!

However, I think @Albert's case is more on the DataSet side. Is there any
plan to integrate table statistics for the DataSet API first and then extend
them to the DataStream domain?

--
Rong

On Sun, Jul 15, 2018 at 7:21 AM Piotr Nowojski wrote:

> Hi,
>
> Currently the biggest limitation that prevents better query optimisation
> is the lack of table statistics (which are not trivial to provide in
> streaming), so join/aggregation reordering doesn’t work. We have some
> ideas about how to tackle this issue, and we will definitely improve this
> at some point.
>
> Piotrek
>
> > On 14 Jul 2018, at 06:48, Xingcan Cui  wrote:
> >
> > Hi Albert,
> >
> > Calcite provides a rule-based optimizer (as a framework), which means
> > users can customize it by adding rules. That’s exactly what Flink does.
> > From the logical plan to the physical plan, the translations are
> > triggered by different sets of rules, according to which the relational
> > expressions are replaced, reordered or optimized.
> >
> > However, IMO, the current optimization rules in the Flink Table API are
> > quite primitive. Some SQL statements (e.g., multiple joins) are just
> > translated to feasible execution plans instead of optimized ones, since
> > it’s much more difficult to conduct query optimization on large datasets
> > or dynamic streams. You could start with the Calcite query optimizer and
> > then try to write your own rules.
> >
> > Best,
> > Xingcan
> >
> >> On Jul 14, 2018, at 11:55 AM, vino yang  wrote:
> >>
> >> Hi Albert,
> >>
> >> First, I guess the query optimizer you mentioned is the one for Flink
> >> Table & SQL (for the batch API there is another optimizer implemented
> >> by Flink itself).
> >>
> >> Yes, for Table & SQL, Flink currently uses Apache Calcite's query
> >> optimizer: queries are translated into a Calcite plan, which is then
> >> optimized according to Calcite's optimization rules.
> >>
> >> The following rules are applied so far:
> >> https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/FlinkRuleSets.scala
> >>
> >> Given that Flink depends on Calcite for the optimization, I think
> >> enhancing both Flink and Calcite would be the right direction.
> >>
> >> We hope you can provide more ideas and details. The Flink community
> >> welcomes your ideas and contributions.
> >>
> >> Thanks.
> >> Vino.
> >>
> >>
> >> 2018-07-13 23:39 GMT+08:00 Albert Jonathan :
> >>
> >>> Hello,
> >>>
> >>> I am just wondering, does Flink use Apache Calcite's query optimizer
> >>> to generate an optimal logical plan, or does it have its own query
> >>> optimizer? From what I observed so far, Flink's query optimizer only
> >>> groups operators together without changing the order of aggregation
> >>> operators (e.g., joins). Did I miss anything?
> >>>
> >>> I am thinking of extending Flink to apply query optimization as in the
> >>> RDBMS by either integrating it with Calcite or implementing it as a new
> >>> module.
> >>> Any feedback or guidelines will be highly appreciated.
> >>>
> >>> Thank you,
> >>> Albert
> >>>
> >
>
>
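
For readers wondering what "adding rules" looks like in practice, below is a
minimal, hedged sketch of a custom Calcite rule skeleton in Scala; the rule
name and the matched operator are placeholders, and this is not one of
Flink's actual rules:

{code}
import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
import org.apache.calcite.rel.core.Filter

// Hypothetical skeleton: matches any Filter node; a real rule would build
// an equivalent but cheaper plan and register it via call.transformTo(...).
class MyFilterRule extends RelOptRule(
    RelOptRule.operand(classOf[Filter], RelOptRule.any()), "MyFilterRule") {

  override def onMatch(call: RelOptRuleCall): Unit = {
    val filter: Filter = call.rel(0)
    // Inspect `filter` and, if a better alternative exists, replace it:
    // call.transformTo(newEquivalentRel)
  }
}
{code}

Rules of this shape are what the {{FlinkRuleSets}} linked above collect and
hand to Calcite's planner.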


Re: Flink Query Optimizer

2018-07-15 Thread Piotr Nowojski
Hi,

Currently the biggest limitation that prevents better query optimisation is 
the lack of table statistics (which are not trivial to provide in streaming), 
so join/aggregation reordering doesn’t work. We have some ideas about how to 
tackle this issue, and we will definitely improve this at some point.

Piotrek

> On 14 Jul 2018, at 06:48, Xingcan Cui  wrote:
> 
> Hi Albert,
> 
> Calcite provides a rule-based optimizer (as a framework), which means users 
> can customize it by adding rules. That’s exactly what Flink does. From the 
> logical plan to the physical plan, the translations are triggered by 
> different sets of rules, according to which the relational expressions are 
> replaced, reordered or optimized.
> 
> However, IMO, the current optimization rules in the Flink Table API are quite 
> primitive. Some SQL statements (e.g., multiple joins) are just translated to 
> feasible execution plans instead of optimized ones, since it’s much more 
> difficult to conduct query optimization on large datasets or dynamic streams. 
> You could start with the Calcite query optimizer and then try to write your 
> own rules.
> 
> Best,
> Xingcan
> 
>> On Jul 14, 2018, at 11:55 AM, vino yang  wrote:
>> 
>> Hi Albert,
>> 
>> First, I guess the query optimizer you mentioned is the one for Flink
>> Table & SQL (for the batch API there is another optimizer implemented by
>> Flink itself).
>> 
>> Yes, for Table & SQL, Flink currently uses Apache Calcite's query
>> optimizer: queries are translated into a Calcite plan, which is then
>> optimized according to Calcite's optimization rules.
>> 
>> The following rules are applied so far:
>> https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/FlinkRuleSets.scala
>> 
>> Given that Flink depends on Calcite for the optimization, I think
>> enhancing both Flink and Calcite would be the right direction.
>> 
>> We hope you can provide more ideas and details. The Flink community
>> welcomes your ideas and contributions.
>> 
>> Thanks.
>> Vino.
>> 
>> 
>> 2018-07-13 23:39 GMT+08:00 Albert Jonathan :
>> 
>>> Hello,
>>> 
>>> I am just wondering, does Flink use Apache Calcite's query optimizer to
>>> generate an optimal logical plan, or does it have its own query optimizer?
>>> From what I observed so far, Flink's query optimizer only groups
>>> operators together without changing the order of aggregation operators
>>> (e.g., joins). Did I miss anything?
>>> 
>>> I am thinking of extending Flink to apply query optimization as in the
>>> RDBMS by either integrating it with Calcite or implementing it as a new
>>> module.
>>> Any feedback or guidelines will be highly appreciated.
>>> 
>>> Thank you,
>>> Albert
>>> 
> 



[jira] [Created] (FLINK-9853) add hex support in table api and sql

2018-07-15 Thread xueyu (JIRA)
xueyu created FLINK-9853:


 Summary: add hex support in table api and sql
 Key: FLINK-9853
 URL: https://issues.apache.org/jira/browse/FLINK-9853
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: xueyu


Like in MySQL, HEX could take int or string arguments. For an integer argument 
N, it returns a hexadecimal string representation of the value of N. For a 
string argument str, it returns a hexadecimal string representation of str, 
where each byte of each character in str is converted to two hexadecimal 
digits.

Examples:

HEX(100) = 64

HEX('This is a test String.') = '546869732069732061207465737420537472696e672e'

See more: [MySQL HEX|https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]
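
For illustration, a hedged Scala sketch of the semantics described above (a
reference model only, not the actual Flink implementation; the function names
are placeholders):

{code}
// Hypothetical reference semantics for HEX, matching the examples above.
def hex(n: Long): String =
  java.lang.Long.toHexString(n)            // hex(100) == "64"

def hex(s: String): String =
  s.getBytes("UTF-8").map(b => f"${b & 0xff}%02x").mkString
// hex("This is a test String.") ==
//   "546869732069732061207465737420537472696e672e"
{code}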



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9852) Expose descriptor-based sink creation in table environments

2018-07-15 Thread Timo Walther (JIRA)
Timo Walther created FLINK-9852:
---

 Summary: Expose descriptor-based sink creation in table 
environments
 Key: FLINK-9852
 URL: https://issues.apache.org/jira/browse/FLINK-9852
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther
Assignee: Timo Walther


Currently, only a table source can be created using the unified table 
descriptors with {{tableEnv.from(...)}}. A similar approach should be supported 
for defining sinks or even both types at the same time.

I suggest the following syntax:
{code}
tableEnv.connect(Kafka(...)).registerSource("name")
tableEnv.connect(Kafka(...)).registerSink("name")
tableEnv.connect(Kafka(...)).registerSourceAndSink("name")
{code}

A table could then access the registered source/sink.
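
For context, a hedged sketch of how the registered name might then be used,
assuming the existing {{scan()}} and {{insertInto()}} Table API methods (the
descriptor arguments are elided as in the proposal above, and the table name
is a placeholder):

{code}
tableEnv.connect(Kafka(...)).registerSourceAndSink("transactions")

// read via the registered source ...
val large = tableEnv.scan("transactions").filter("amount > 100")

// ... and write back via the registered sink
large.insertInto("transactions")
{code}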



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9851) Add documentation for unified table sources/sinks

2018-07-15 Thread Timo Walther (JIRA)
Timo Walther created FLINK-9851:
---

 Summary: Add documentation for unified table sources/sinks
 Key: FLINK-9851
 URL: https://issues.apache.org/jira/browse/FLINK-9851
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Table API & SQL
Reporter: Timo Walther
Assignee: Timo Walther


FLINK-8558 and FLINK-8866 reworked a lot of the existing table sources/sinks 
and the way they are discovered. We should rework the documentation to cover:

- Built-in sources/sinks/formats and their properties for the Table API and 
SQL Client
- How to write custom sources/sinks/formats
- Limitations such as {{property-version}} and {{rowtime.timestamps.from-source}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9850) Add a string to the print method to identify output for DataStream

2018-07-15 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-9850:
--

 Summary: Add a string to the print method to identify output for 
DataStream
 Key: FLINK-9850
 URL: https://issues.apache.org/jira/browse/FLINK-9850
 Project: Flink
  Issue Type: New Feature
  Components: DataStream API
Reporter: Hequn Cheng


The print method of {{DataSet}} allows the user to supply a String to identify 
the output (see [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]). 
But {{DataStream}} does not support this yet. It would be valuable to add this 
feature for {{DataStream}} as well.
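
For illustration, a hedged sketch of the existing {{DataSet}} behaviour and
the proposed {{DataStream}} analogue (the identifier strings are placeholders):

{code}
// Existing: DataSet.print(sinkIdentifier) prefixes every record (FLINK-1486),
// e.g. "my-dataset> 42".
dataSet.print("my-dataset")

// Proposed (hypothetical until implemented): the same identifier support
// for DataStream.
dataStream.print("my-stream")
{code}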



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)