[jira] [Created] (FLINK-17126) Correct the execution behavior of BatchTableEnvironment

2020-04-13 Thread godfrey he (Jira)
godfrey he created FLINK-17126:
--

 Summary: Correct the execution behavior of BatchTableEnvironment
 Key: FLINK-17126
 URL: https://issues.apache.org/jira/browse/FLINK-17126
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.11.0


This issue is similar to 
[FLINK-16363|https://issues.apache.org/jira/browse/FLINK-16363].
In previous versions, BatchTableEnvironment.execute() could trigger both table 
and DataSet programs. Since 1.11.0, table programs can only be triggered by 
BatchTableEnvironment.execute(). Once a table program is converted into a DataSet 
program (through the toDataSet() method), it can only be triggered by 
ExecutionEnvironment.execute().
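
A minimal sketch of the resulting split (table/sink names and the output path are
illustrative; based on the pre-1.11 batch table API):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

// table program: since 1.11.0, only BatchTableEnvironment.execute() triggers it
tEnv.sqlUpdate("INSERT INTO sinkTable SELECT * FROM sourceTable");
tEnv.execute("table job");

// once converted into a DataSet program, only ExecutionEnvironment.execute() triggers it
Table table = tEnv.sqlQuery("SELECT * FROM sourceTable");
DataSet<Row> dataSet = tEnv.toDataSet(table, Row.class);
dataSet.writeAsText("/tmp/illustrative-output");
env.execute("dataset job");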






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-71: E2E View support in Flink SQL

2020-04-12 Thread godfrey he
+1 (non-binding)

Best,
Godfrey

Benchao Li wrote on Sun, Apr 12, 2020 at 12:28 PM:

> +1 (non-binding)
>
> zoudan wrote on Sun, Apr 12, 2020 at 9:52 AM:
>
> > +1 (non-binding)
> >
> > Best,
> > Dan Zou
> >
> >
> > > On Apr 10, 2020, at 09:30, Danny Chan wrote:
> > >
> > > +1 from my side.
> > >
> > > Best,
> > > Danny Chan
> > > On Apr 9, 2020 at 9:23 PM +0800, Timo Walther wrote:
> > >> +1 (binding)
> > >>
> > >> Thanks for your efforts.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 09.04.20 14:46, Zhenghua Gao wrote:
> > >>> Hi all,
> > >>>
> > >>> I'd like to start the vote for FLIP-71[1] which adds E2E view support
> > in
> > >>> Flink SQL.
> > >>> This FLIP is discussed in the thread[2].
> > >>>
> > >>> The vote will be open for at least 72 hours. Unless there is an
> > objection.
> > >>> I will try to
> > >>> close it by April 13, 2020 09:00 UTC if we have received sufficient
> > votes.
> > >>>
> > >>> [1]
> > >>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-71%3A+E2E+View+support+in+FLINK+SQL
> > >>>
> > >>> [2]
> > >>>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-71-E2E-View-support-in-Flink-SQL-td33131.html#a39787
> > >>>
> > >>> *Best Regards,*
> > >>> *Zhenghua Gao*
> > >>>
> > >>
> >
> >
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


[jira] [Created] (FLINK-17052) Introduce PlanGenerator

2020-04-08 Thread godfrey he (Jira)
godfrey he created FLINK-17052:
--

 Summary: Introduce PlanGenerator
 Key: FLINK-17052
 URL: https://issues.apache.org/jira/browse/FLINK-17052
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: godfrey he
 Fix For: 1.11.0


As discussed in [FLINK-16533|https://issues.apache.org/jira/browse/FLINK-16533], 
we move most of the logic of the {{ExecutionEnvironment#createProgramPlan}} 
method to {{PlanGenerator}}, which can then be used by both {{ExecutionEnvironment}} 
and flink-table-planner. 
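
A rough sketch of the intended extraction (the class shape and signatures below are
hypothetical; the final API is defined by this ticket):

// hypothetical shape: build a Plan from the sinks the same way
// ExecutionEnvironment#createProgramPlan does today
public class PlanGenerator {
    private final List<GenericDataSinkBase<?>> sinks;
    private final ExecutionConfig config;
    private final String jobName;

    public PlanGenerator(List<GenericDataSinkBase<?>> sinks, ExecutionConfig config, String jobName) {
        this.sinks = sinks;
        this.config = config;
        this.jobName = jobName;
    }

    public Plan generate() {
        Plan plan = new Plan(sinks, jobName);
        plan.setExecutionConfig(config);
        return plan;
    }
}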



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-08 Thread godfrey he
Hi everyone,

Thanks all for the votes.
So far, we have

   - 3 binding +1 votes (Timo, Kurt, Dawid)
   - 1 non-binding +1 vote (Terry)
   - No -1 votes

The voting time has passed and there are enough +1 votes to consider FLIP-84
approved.
Thank you all.


Best,
Godfrey

Dawid Wysakowicz wrote on Tue, Apr 7, 2020 at 2:29 PM:

> +1
>
> Best,
>
> Dawid
>
> On 07/04/2020 07:44, godfrey he wrote:
> > Hi, Kurt
> >
> > yes. `TableEnvironement#executeSql` also could execute `SELECT`
> statement,
> > which is similar to `Table#execute`.
> > I add this to the document.
> >
> > Best,
> > Godfrey
> >
> > Kurt Young wrote on Tue, Apr 7, 2020 at 11:52 AM:
> >
> >> +1 (binding)
> >>
> >> The latest doc looks good to me. One minor comment is with the latest
> >> changes, it seems also very easy
> >> to support running SELECT query in TableEnvironement#executeSql method.
> >> Will this also be supported?
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Mon, Apr 6, 2020 at 10:49 PM Timo Walther 
> wrote:
> >>
> >>> Thanks, for the update.
> >>>
> >>> +1 (binding) for this FLIP
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 06.04.20 16:47, godfrey he wrote:
> >>>> Hi Timo,
> >>>>
> >>>> Sorry for late reply, and thanks for your correction. I have fixed the
> >>> typo
> >>>> and updated the document.
> >>>>
> >>>> Best,
> >>>> Godfrey
> >>>>
> > >>>> Timo Walther wrote on Mon, Apr 6, 2020 at 6:05 PM:
> >>>>
> >>>>> Hi Godfrey,
> >>>>>
> >>>>> did you see my remaining feedback in the discussion thread? We could
> >>>>> finish this FLIP if this gets resolved.
> >>>>>
> >>>>> Thanks,
> >>>>> Timo
> >>>>>
> >>>>> On 03.04.20 15:12, Terry Wang wrote:
> >>>>>> +1 (non-binding)
> >>>>>> Looks great to me, Thanks for driving on this.
> >>>>>>
> >>>>>> Best,
> >>>>>> Terry Wang
> >>>>>>
> >>>>>>
> >>>>>>
> > >>>>>>> On Apr 3, 2020, at 21:07, godfrey he wrote:
> >>>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> I'd like to start the vote of FLIP-84[1] again, which is discussed
> >> and
> >>>>>>> reached consensus in the discussion thread[2].
> >>>>>>>
> >>>>>>> The vote will be open for at least 72 hours. Unless there is an
> >>>>> objection,
> >>>>>>> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> >>>>>>> sufficient votes.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1]
> >>>>>>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>> [2]
> >>>>>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>>>>>>
> >>>>>>> Bests,
> >>>>>>> Godfrey
> >>>>>>>
> > >>>>>>> godfrey he wrote on Tue, Mar 31, 2020 at 8:42 PM:
> >>>>>>>
> >>>>>>>> Hi, Timo
> >>>>>>>>
> >>>>>>>> So sorry about that, I'm in a little hurry. Let's wait for 24h.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Godfrey
> >>>>>>>>
> > >>>>>>>> Timo Walther wrote on Tue, Mar 31, 2020 at 5:26 PM:
> >>>>>>>>
> >>>>>>>>> -1
> >>>>>>>>>
> >>>>>>>>> The current discussion has not completed. The last comments were
> >>> sent
> >>>>>>>>> less than 24h ago.
> >>>>>>>>>
> >>>>>>>>> Let's wait a bit longer to collect feedback from all
> stakeholders.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>> On 31.03.20 08:31, godfrey he wrote:
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> I'd like to start the vote of FLIP-84[1] again, because we have
> >>> some
> >>>>>>>>>> feedbacks. The feedbacks are all about new introduced methods,
> >> here
> >>>>> is
> >>>>>>>>> the
> >>>>>>>>>> discussion thread [2].
> >>>>>>>>>>
> >>>>>>>>>> The vote will be open for at least 72 hours. Unless there is an
> >>>>>>>>> objection,
> >>>>>>>>>> I will try to close it by Apr 3, 2020 06:30 UTC if we have
> >> received
> >>>>>>>>>> sufficient votes.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>> [2]
> >>>>>>>>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>>>>>>>>>
> >>>>>>>>>> Bests,
> >>>>>>>>>> Godfrey
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>
> >>>
>
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread godfrey he
Hi, Kurt

Yes. `TableEnvironment#executeSql` can also execute `SELECT` statements,
which is similar to `Table#execute`.
I have added this to the document.
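
For example (a sketch against the proposed FLIP-84 API; names follow the FLIP and the
behavior is not final):

// executing a SELECT directly via executeSql ...
TableResult result = tEnv.executeSql("SELECT * FROM Orders");
result.print();

// ... is similar to building a Table and calling execute()
TableResult result2 = tEnv.sqlQuery("SELECT * FROM Orders").execute();
result2.print();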

Best,
Godfrey

Kurt Young wrote on Tue, Apr 7, 2020 at 11:52 AM:

> +1 (binding)
>
> The latest doc looks good to me. One minor comment is with the latest
> changes, it seems also very easy
> to support running SELECT query in TableEnvironement#executeSql method.
> Will this also be supported?
>
> Best,
> Kurt
>
>
> On Mon, Apr 6, 2020 at 10:49 PM Timo Walther  wrote:
>
> > Thanks, for the update.
> >
> > +1 (binding) for this FLIP
> >
> > Regards,
> > Timo
> >
> >
> > On 06.04.20 16:47, godfrey he wrote:
> > > Hi Timo,
> > >
> > > Sorry for late reply, and thanks for your correction. I have fixed the
> > typo
> > > and updated the document.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Timo Walther wrote on Mon, Apr 6, 2020 at 6:05 PM:
> > >
> > >> Hi Godfrey,
> > >>
> > >> did you see my remaining feedback in the discussion thread? We could
> > >> finish this FLIP if this gets resolved.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >> On 03.04.20 15:12, Terry Wang wrote:
> > >>> +1 (non-binding)
> > >>> Looks great to me, Thanks for driving on this.
> > >>>
> > >>> Best,
> > >>> Terry Wang
> > >>>
> > >>>
> > >>>
> > >>>> On Apr 3, 2020, at 21:07, godfrey he wrote:
> > >>>>
> > >>>> Hi everyone,
> > >>>>
> > >>>> I'd like to start the vote of FLIP-84[1] again, which is discussed
> and
> > >>>> reached consensus in the discussion thread[2].
> > >>>>
> > >>>> The vote will be open for at least 72 hours. Unless there is an
> > >> objection,
> > >>>> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> > >>>> sufficient votes.
> > >>>>
> > >>>>
> > >>>> [1]
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>>
> > >>>> [2]
> > >>>>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> > >>>>
> > >>>>
> > >>>> Bests,
> > >>>> Godfrey
> > >>>>
> > >>>> godfrey he wrote on Tue, Mar 31, 2020 at 8:42 PM:
> > >>>>
> > >>>>> Hi, Timo
> > >>>>>
> > >>>>> So sorry about that, I'm in a little hurry. Let's wait for 24h.
> > >>>>>
> > >>>>> Best,
> > >>>>> Godfrey
> > >>>>>
> > >>>>> Timo Walther wrote on Tue, Mar 31, 2020 at 5:26 PM:
> > >>>>>
> > >>>>>> -1
> > >>>>>>
> > >>>>>> The current discussion has not completed. The last comments were
> > sent
> > >>>>>> less than 24h ago.
> > >>>>>>
> > >>>>>> Let's wait a bit longer to collect feedback from all stakeholders.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Timo
> > >>>>>>
> > >>>>>> On 31.03.20 08:31, godfrey he wrote:
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> I'd like to start the vote of FLIP-84[1] again, because we have
> > some
> > >>>>>>> feedbacks. The feedbacks are all about new introduced methods,
> here
> > >> is
> > >>>>>> the
> > >>>>>>> discussion thread [2].
> > >>>>>>>
> > >>>>>>> The vote will be open for at least 72 hours. Unless there is an
> > >>>>>> objection,
> > >>>>>>> I will try to close it by Apr 3, 2020 06:30 UTC if we have
> received
> > >>>>>>> sufficient votes.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> [1]
> > >>>>>>>
> > >>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>>>>>
> > >>>>>>> [2]
> > >>>>>>>
> > >>>>>>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Bests,
> > >>>>>>> Godfrey
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>
> > >>
> > >
> >
> >
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread godfrey he
Hi Timo,

Sorry for the late reply, and thanks for your correction. I have fixed the typo
and updated the document.

Best,
Godfrey

Timo Walther wrote on Mon, Apr 6, 2020 at 6:05 PM:

> Hi Godfrey,
>
> did you see my remaining feedback in the discussion thread? We could
> finish this FLIP if this gets resolved.
>
> Thanks,
> Timo
>
> On 03.04.20 15:12, Terry Wang wrote:
> > +1 (non-binding)
> > Looks great to me, Thanks for driving on this.
> >
> > Best,
> > Terry Wang
> >
> >
> >
> >> On Apr 3, 2020, at 21:07, godfrey he wrote:
> >>
> >> Hi everyone,
> >>
> >> I'd like to start the vote of FLIP-84[1] again, which is discussed and
> >> reached consensus in the discussion thread[2].
> >>
> >> The vote will be open for at least 72 hours. Unless there is an
> objection,
> >> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> >> sufficient votes.
> >>
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>
> >> [2]
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>
> >>
> >> Bests,
> >> Godfrey
> >>
> >> godfrey he wrote on Tue, Mar 31, 2020 at 8:42 PM:
> >>
> >>> Hi, Timo
> >>>
> >>> So sorry about that, I'm in a little hurry. Let's wait for 24h.
> >>>
> >>> Best,
> >>> Godfrey
> >>>
> >>> Timo Walther wrote on Tue, Mar 31, 2020 at 5:26 PM:
> >>>
> >>>> -1
> >>>>
> >>>> The current discussion has not completed. The last comments were sent
> >>>> less than 24h ago.
> >>>>
> >>>> Let's wait a bit longer to collect feedback from all stakeholders.
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>> On 31.03.20 08:31, godfrey he wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I'd like to start the vote of FLIP-84[1] again, because we have some
> >>>>> feedbacks. The feedbacks are all about new introduced methods, here
> is
> >>>> the
> >>>>> discussion thread [2].
> >>>>>
> >>>>> The vote will be open for at least 72 hours. Unless there is an
> >>>> objection,
> >>>>> I will try to close it by Apr 3, 2020 06:30 UTC if we have received
> >>>>> sufficient votes.
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>
> >>>>> [2]
> >>>>>
> >>>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>>>>
> >>>>>
> >>>>> Bests,
> >>>>> Godfrey
> >>>>>
> >>>>
> >>>>
>
>


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-06 Thread godfrey he
Hi Timo,

Sorry for the late reply, and thanks for your correction.
I missed DQL in the job submission scenario.
I'll fix the document right away.
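
For reference, the async pattern described below would look roughly like this (a
sketch against the proposed FLIP-84 API, not final):

// returns once the job is submitted, also for DQL
TableResult result = tEnv.executeSql("SELECT * FROM Orders");

// access the JobClient, e.g. to cancel a long-running batch query
result.getJobClient().ifPresent(JobClient::cancel);

// or wait for completion by consuming the result
result.collect().forEach(row -> System.out.println(row));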

Best,
Godfrey

Timo Walther wrote on Fri, Apr 3, 2020 at 9:53 PM:

> Hi Godfrey,
>
> I'm sorry to jump in again but I still need to clarify some things
> around TableResult.
>
> The FLIP says:
> "For DML, this method returns TableResult until the job is submitted.
> For other statements, TableResult is returned until the execution is
> finished."
>
> I thought we agreed on making every execution async? This also means
> returning a TableResult for DQLs even though the execution is not done
> yet. People need access to the JobClient also for batch jobs in order to
> cancel long lasting queries. If people want to wait for the completion
> they can hook into JobClient or collect().
>
> Can we rephrase this part to:
>
> "For DML and DQL, this method returns TableResult once the job has been
> submitted. For DDL and DCL statements, TableResult is returned once the
> operation has finished."
>
> Regards,
> Timo
>
>
> On 02.04.20 05:27, godfrey he wrote:
> > Hi Aljoscha, Dawid, Timo,
> >
> > Thanks so much for the detailed explanation.
> > Agree with you that the multiline story is not completed now, and we can
> > keep discussion.
> > I will add current discussions and conclusions to the FLIP.
> >
> > Best,
> > Godfrey
> >
> >
> >
> > Timo Walther wrote on Wed, Apr 1, 2020 at 11:27 PM:
> >
> >> Hi Godfrey,
> >>
> >> first of all, I agree with Dawid. The multiline story is not completed
> >> by this FLIP. It just verifies the big picture.
> >>
> >> 1. "control the execution logic through the proposed method if they know
> >> what the statements are"
> >>
> >> This is a good point that also Fabian raised in the linked google doc. I
> >> could also imagine to return a more complicated POJO when calling
> >> `executeMultiSql()`.
> >>
> >> The POJO would include some `getSqlProperties()` such that a platform
> >> gets insights into the query before executing. We could also trigger the
> >> execution more explicitly instead of hiding it behind an iterator.
> >>
> >> 2. "there are some special commands introduced in SQL client"
> >>
> >> For platforms and SQL Client specific commands, we could offer a hook to
> >> the parser or a fallback parser in case the regular table environment
> >> parser cannot deal with the statement.
> >>
> >> However, all of that is future work and can be discussed in a separate
> >> FLIP.
> >>
> >> 3. +1 for the `Iterator` instead of `Iterable`.
> >>
> >> 4. "we should convert the checked exception to unchecked exception"
> >>
> >> Yes, I meant using a runtime exception instead of a checked exception.
> >> There was no consensus on putting the exception into the `TableResult`.
> >>
> >> Regards,
> >> Timo
> >>
> >> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> >>> When considering the multi-line support I think it is helpful to start
> >>> with a use case in mind. In my opinion consumers of this method will
> be:
> >>>
> >>>   1. sql-client
> >>>   2. third-part sql based platforms
> >>>
> >>> @Godfrey As for the quit/source/... commands. I think those belong to
> >>> the responsibility of aforementioned. I think they should not be
> >>> understandable by the TableEnvironment. What would quit on a
> >>> TableEnvironment do? Moreover I think such commands should be prefixed
> >>> appropriately. I think it's a common practice to e.g. prefix those with
> >>> ! or : to say they are meta commands of the tool rather than a query.
> >>>
> >>> I also don't necessarily understand why platform users need to know the
> >>> kind of the query to use the proposed method. They should get the type
> >>> from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
> >>> DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's not enough
> >>> we can enrich the TableResult with more explicit kind of query, but so
> >>> far I don't see such a need.
> >>>
> >>> @Kurt In those cases I would assume the developers want to present
> >>> results of the queries anyway. Moreover I think it is safe to assume
> >>> they can adhere to such a contract that the results

Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-03 Thread godfrey he
Hi everyone,

I'd like to start the vote of FLIP-84[1] again, which has been discussed and
reached consensus in the discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Apr 6, 2020 13:10 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html


Bests,
Godfrey

godfrey he wrote on Tue, Mar 31, 2020 at 8:42 PM:

> Hi, Timo
>
> So sorry about that, I'm in a little hurry. Let's wait for 24h.
>
> Best,
> Godfrey
>
> Timo Walther wrote on Tue, Mar 31, 2020 at 5:26 PM:
>
>> -1
>>
>> The current discussion has not completed. The last comments were sent
>> less than 24h ago.
>>
>> Let's wait a bit longer to collect feedback from all stakeholders.
>>
>> Thanks,
>> Timo
>>
>> On 31.03.20 08:31, godfrey he wrote:
>> > Hi everyone,
>> >
>> > I'd like to start the vote of FLIP-84[1] again, because we have some
>> > feedbacks. The feedbacks are all about new introduced methods, here is
>> the
>> > discussion thread [2].
>> >
>> > The vote will be open for at least 72 hours. Unless there is an
>> objection,
>> > I will try to close it by Apr 3, 2020 06:30 UTC if we have received
>> > sufficient votes.
>> >
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> >
>> > [2]
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
>> >
>> >
>> > Bests,
>> > Godfrey
>> >
>>
>>


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread godfrey he
Hi Aljoscha, Dawid, Timo,

Thanks so much for the detailed explanation.
Agree with you that the multiline story is not complete yet, and we can
keep discussing it.
I will add the current discussions and conclusions to the FLIP.

Best,
Godfrey



Timo Walther wrote on Wed, Apr 1, 2020 at 11:27 PM:

> Hi Godfrey,
>
> first of all, I agree with Dawid. The multiline story is not completed
> by this FLIP. It just verifies the big picture.
>
> 1. "control the execution logic through the proposed method if they know
> what the statements are"
>
> This is a good point that also Fabian raised in the linked google doc. I
> could also imagine to return a more complicated POJO when calling
> `executeMultiSql()`.
>
> The POJO would include some `getSqlProperties()` such that a platform
> gets insights into the query before executing. We could also trigger the
> execution more explicitly instead of hiding it behind an iterator.
>
> 2. "there are some special commands introduced in SQL client"
>
> For platforms and SQL Client specific commands, we could offer a hook to
> the parser or a fallback parser in case the regular table environment
> parser cannot deal with the statement.
>
> However, all of that is future work and can be discussed in a separate
> FLIP.
>
> 3. +1 for the `Iterator` instead of `Iterable`.
>
> 4. "we should convert the checked exception to unchecked exception"
>
> Yes, I meant using a runtime exception instead of a checked exception.
> There was no consensus on putting the exception into the `TableResult`.
>
> Regards,
> Timo
>
> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> > When considering the multi-line support I think it is helpful to start
> > with a use case in mind. In my opinion consumers of this method will be:
> >
> >  1. sql-client
> >  2. third-part sql based platforms
> >
> > @Godfrey As for the quit/source/... commands. I think those belong to
> > the responsibility of aforementioned. I think they should not be
> > understandable by the TableEnvironment. What would quit on a
> > TableEnvironment do? Moreover I think such commands should be prefixed
> > appropriately. I think it's a common practice to e.g. prefix those with
> > ! or : to say they are meta commands of the tool rather than a query.
> >
> > I also don't necessarily understand why platform users need to know the
> > kind of the query to use the proposed method. They should get the type
> > from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
> > DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's not enough
> > we can enrich the TableResult with more explicit kind of query, but so
> > far I don't see such a need.
> >
> > @Kurt In those cases I would assume the developers want to present
> > results of the queries anyway. Moreover I think it is safe to assume
> > they can adhere to such a contract that the results must be iterated.
> >
> > For direct users of TableEnvironment/Table API this method does not make
> > much sense anyway, in my opinion. I think we can rather safely assume in
> > this scenario they do not want to submit multiple queries at a single
> time.
> >
> > Best,
> >
> > Dawid
> >
> >
> > On 01/04/2020 15:07, Kurt Young wrote:
> >> One comment to `executeMultilineSql`, I'm afraid sometimes user might
> >> forget to
> >> iterate the returned iterators, e.g. user submits a bunch of DDLs and
> >> expect the
> >> framework will execute them one by one. But it didn't.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek
> wrote:
> >>
> >>> Agreed to what Dawid and Timo said.
> >>>
> >>> To answer your question about multi line SQL: no, we don't think we
> need
> >>> this in Flink 1.11, we only wanted to make sure that the interfaces
> that
> >>> we now put in place will potentially allow this in the future.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
> >>> On 01.04.20 09:31, godfrey he wrote:
> >>>> Hi, Timo & Dawid,
> >>>>
> >>>> Thanks so much for the effort of `multiline statements supporting`,
> >>>> I have a few questions about this method:
> >>>>
> >>>> 1. users can well control the execution logic through the proposed
> method
> >>>>if they know what the statements are (a statement is a DDL, a DML
> or
> >>>> others).
> >>>> but if a statem

Re: [ANNOUNCE] New Committers and PMC member

2020-04-01 Thread godfrey he
Congratulations to all of you~

Best,
Godfrey

Ismaël Mejía wrote on Thu, Apr 2, 2020 at 6:42 AM:

> Congrats everyone!
>
> On Thu, Apr 2, 2020 at 12:16 AM Rong Rong  wrote:
> >
> > Congratulations to all!!!
> >
> > --
> > Rong
> >
> > On Wed, Apr 1, 2020 at 2:27 PM Thomas Weise  wrote:
> >
> > > Congratulations!
> > >
> > >
> > > On Wed, Apr 1, 2020 at 9:31 AM Fabian Hueske 
> wrote:
> > >
> > > > Congrats everyone!
> > > >
> > > > Cheers, Fabian
> > > >
> > > > Am Mi., 1. Apr. 2020 um 18:26 Uhr schrieb Yun Tang  >:
> > > >
> > > > > Congratulations to all of you!
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > > 
> > > > > From: Yang Wang 
> > > > > Sent: Wednesday, April 1, 2020 22:28
> > > > > To: dev 
> > > > > Subject: Re: [ANNOUNCE] New Committers and PMC member
> > > > >
> > > > > Congratulations all.
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Leonard Xu wrote on Wed, Apr 1, 2020 at 10:15 PM:
> > > > >
> > > > > > Congratulations Konstantin, Dawid and Zhijiang!  Well deserved!
> > > > > >
> > > > > > Best,
> > > > > > Leonard Xu
> > > > > > > On Apr 1, 2020, at 21:22, Jark Wu wrote:
> > > > > > >
> > > > > > > Congratulations to you all!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > > On Wed, 1 Apr 2020 at 20:33, Kurt Young 
> wrote:
> > > > > > >
> > > > > > >> Congratulations to you all!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Kurt
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Apr 1, 2020 at 7:41 PM Danny Chan <
> yuzhao@gmail.com>
> > > > > wrote:
> > > > > > >>
> > > > > > >>> Congratulations!
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Danny Chan
> > > > > > >>> On Apr 1, 2020 at 7:36 PM +0800, dev@flink.apache.org wrote:
> > > > > > 
> > > > > >  Congratulations!
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread godfrey he
Hi Timo,

Regarding to "`execute` method throws checked exception",
 is that mean we should convert the checked exception to unchecked
exception
or we need add ERROR type in ResultKind.

for the second approach, I still think it's not convenient
for the user to check exception when calling `collect` method and `print`
method.
the code looks like:

// Approach 1: add a `getError()` method in TableResult and store the exception
// in the TableResult independently
TableResult result = tEnv.executeSql("select xxx");
if (result.getResultKind() == ResultKind.ERROR) {
  System.out.println(result.getError());
} else {
  Iterator<Row> it = result.collect();
  // ... iterate over the rows
}

// Approach 2: treat the exception as a kind of result, and get the exception
// through the `collect` method
TableResult result = tEnv.executeSql("select xxx");
if (result.getResultKind() == ResultKind.ERROR) {
  Iterator<Row> it = result.collect();
  Row row = it.next();
  System.out.println(row.getField(0));
} else {
  Iterator<Row> it = result.collect();
  // ... iterate over the rows
}

// for fluent programming
Iterator<Row> it = tEnv.executeSql("select xxx").collect();
// ... iterate over the rows

Best,
Godfrey

Timo Walther wrote on Wed, Apr 1, 2020 at 1:27 AM:

> Hi Godfrey,
>
> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In
> particular, we discussed how the current status of the FLIP and the
> future requirements around multiline statements, async/sync, collect()
> fit together.
>
> We also updated the FLIP-84 Feedback Summary document [1] with some use
> cases.
>
> We believe that we found a good solution that also fits to what is in
> the current FLIP. So no bigger changes necessary, which is great!
>
> Our findings were:
>
> 1. Async vs sync submission of Flink jobs:
>
> Having a blocking `execute()` in DataStream API was rather a mistake.
> Instead all submissions should be async because this allows supporting
> both modes if necessary. Thus, submitting all queries async sounds good
> to us. If users want to run a job sync, they can use the JobClient and
> wait for completion (or collect() in case of batch jobs).
>
> 2. Multi-statement execution:
>
> For the multi-statement execution, we don't see a contradication with
> the async execution behavior. We imagine a method like:
>
> TableEnvironment#executeMultilineSql(String statements):
> Iterable<TableResult>
>
> Where the `Iterator#next()` method would trigger the next statement
> submission. This allows a caller to decide synchronously when to submit
> statements async to the cluster. Thus, a service such as the SQL Client
> can handle the result of each statement individually and process
> statement by statement sequentially.
>
> 3. The role of TableResult and result retrieval in general
>
> `TableResult` is similar to `JobClient`. Instead of returning a
> `CompletableFuture` of something, it is a concrete util class where some
> methods have the behavior of completable future (e.g. collect(),
> print()) and some are already completed (getTableSchema(),
> getResultKind()).
>
> `StatementSet#execute()` returns a single `TableResult` because the
> order is undefined in a set and all statements have the same schema. Its
> `collect()` will return a row for each executed `INSERT INTO` in the
> order of statement definition.
>
> For simple `SELECT * FROM ...`, the query execution might block until
> `collect()` is called to pull buffered rows from the job (from
> socket/REST API what ever we will use in the future). We can say that a
> statement finished successfully, when the `collect#Iterator#hasNext` has
> returned false.
>
> I hope this summarizes our discussion @Dawid/Aljoscha/Klou?
>
> It would be great if we can add these findings to the FLIP before we
> start voting.
>
> One minor thing: some `execute()` methods still throw a checked
> exception; can we remove that from the FLIP? Also the above mentioned
> `Iterator#next()` would trigger an execution without throwing a checked
> exception.
>
> Thanks,
> Timo
>
> [1]
>
> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
>
> On 31.03.20 06:28, godfrey he wrote:
> > Hi, Timo & Jark
> >
> > Thanks for your explanation.
> > Agree with you that async execution should always be async,
> > and sync execution scenario can be covered  by async execution.
> > It helps provide an unified entry point for batch and streaming.
> > I think we can also use sync execution for some testing.
> > So, I agree with you that we provide `executeSql` method and it's async
> > method.
> > If we want sync method in the future, we can add method named
> > `executeSqlSync`.
> >
> > I think we've reached an agreement. I will update the document, and start
> > voting process.
> >
> > Best,
> > Go

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread godfrey he
`, the query execution might block until
> > `collect()` is called to pull buffered rows from the job (from
> > socket/REST API what ever we will use in the future). We can say that
> > a statement finished successfully, when the `collect#Iterator#hasNext`
> > has returned false.
> >
> > I hope this summarizes our discussion @Dawid/Aljoscha/Klou?
> >
> > It would be great if we can add these findings to the FLIP before we
> > start voting.
> >
> > One minor thing: some `execute()` methods still throw a checked
> > exception; can we remove that from the FLIP? Also the above mentioned
> > `Iterator#next()` would trigger an execution without throwing a
> > checked exception.
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
> >
> > On 31.03.20 06:28, godfrey he wrote:
> >> Hi, Timo & Jark
> >>
> >> Thanks for your explanation.
> >> Agree with you that async execution should always be async,
> >> and sync execution scenario can be covered  by async execution.
> >> It helps provide an unified entry point for batch and streaming.
> >> I think we can also use sync execution for some testing.
> >> So, I agree with you that we provide `executeSql` method and it's async
> >> method.
> >> If we want sync method in the future, we can add method named
> >> `executeSqlSync`.
> >>
> >> I think we've reached an agreement. I will update the document, and
> >> start
> >> voting process.
> >>
> >> Best,
> >> Godfrey
> >>
> >>
> >> Jark Wu wrote on Tue, Mar 31, 2020 at 12:46 AM:
> >>
> >>> Hi,
> >>>
> >>> I didn't follow the full discussion.
> >>> But I share the same concern with Timo that streaming queries should
> >>> always
> >>> be async.
> >>> Otherwise, I can image it will cause a lot of confusion and problems if
> >>> users don't deeply keep the "sync" in mind (e.g. client hangs).
> >>> Besides, the streaming mode is still the majority use cases of Flink
> >>> and
> >>> Flink SQL. We should put the usability at a high priority.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Mon, 30 Mar 2020 at 23:27, Timo Walther  wrote:
> >>>
> >>>> Hi Godfrey,
> >>>>
> >>>> maybe I wasn't expressing my biggest concern enough in my last mail.
> >>>> Even in a singleline and sync execution, I think that streaming
> >>>> queries
> >>>> should not block the execution. Otherwise it is not possible to call
> >>>> collect() or print() on them afterwards.
> >>>>
> >>>> "there are too many things need to discuss for multiline":
> >>>>
> >>>> True, I don't want to solve all of them right now. But what I know is
> >>>> that our newly introduced methods should fit into a multiline
> >>>> execution.
> >>>> There is no big difference of calling `executeSql(A),
> >>>> executeSql(B)` and
> >>>> processing a multiline file `A;\nB;`.
> >>>>
> >>>> I think the example that you mentioned can simply be undefined for
> >>>> now.
> >>>> Currently, no catalog is modifying data but just metadata. This is a
> >>>> separate discussion.
> >>>>
> >>>> "result of the second statement is indeterministic":
> >>>>
> >>>> Sure this is indeterministic. But this is the implementers fault
> >>>> and we
> >>>> cannot forbid such pipelines.
> >>>>
> >>>> How about we always execute streaming queries async? It would unblock
> >>>> executeSql() and multiline statements.
> >>>>
> >>>> Having a `executeSqlAsync()` is useful for batch. However, I don't
> >>>> want
> >>>> `sync/async` be the new batch/stream flag. The execution behavior
> >>>> should
> >>>> come from the query itself.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 30.03.20 11:12, godfrey he wrote:
> >>>>> Hi Timo,
> >>>>>
> >>>>> Agree with you that streaming queries is our top priority

Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-03-31 Thread godfrey he
Hi, Timo

So sorry about that, I'm in a little hurry. Let's wait for 24h.

Best,
Godfrey

Timo Walther wrote on Tue, Mar 31, 2020 at 5:26 PM:

> -1
>
> The current discussion has not completed. The last comments were sent
> less than 24h ago.
>
> Let's wait a bit longer to collect feedback from all stakeholders.
>
> Thanks,
> Timo
>
> On 31.03.20 08:31, godfrey he wrote:
> > Hi everyone,
> >
> > I'd like to start the vote of FLIP-84[1] again, because we have some
> > feedbacks. The feedbacks are all about new introduced methods, here is
> the
> > discussion thread [2].
> >
> > The vote will be open for at least 72 hours. Unless there is an
> objection,
> > I will try to close it by Apr 3, 2020 06:30 UTC if we have received
> > sufficient votes.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >
> > [2]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >
> >
> > Bests,
> > Godfrey
> >
>
>


[jira] [Created] (FLINK-16881) use Catalog's total size info in planner

2020-03-31 Thread godfrey he (Jira)
godfrey he created FLINK-16881:
--

 Summary: use Catalog's total size info in planner
 Key: FLINK-16881
 URL: https://issues.apache.org/jira/browse/FLINK-16881
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he


In some cases, {{Catalog}} only contains {{totalSize}} and the row count is unknown. 
We can also use {{totalSize}} to infer the row count, or even use {{totalSize}} to 
decide whether the join should be a broadcast join.
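
A sketch of the estimation idea (the average row size and the threshold below are
assumptions for illustration, not Catalog fields):

// stats is a CatalogTableStatistics that only knows totalSize
long totalSize = stats.getTotalSize();
double avgRowSize = 100.0; // assumed average row width in bytes
long estimatedRowCount = (long) (totalSize / avgRowSize);

// totalSize can also drive the broadcast decision directly
long broadcastThreshold = 1024 * 1024; // cf. table.optimizer.join.broadcast-threshold
boolean useBroadcastJoin = totalSize <= broadcastThreshold;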



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-03-31 Thread godfrey he
Hi everyone,

I'd like to start the vote of FLIP-84[1] again, because we have received some
feedback. The feedback is all about newly introduced methods; here is the
discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Apr 3, 2020 06:30 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html


Bests,
Godfrey


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread godfrey he
Hi, Timo & Jark

Thanks for your explanation.
Agree with you that streaming execution should always be async,
and the sync execution scenario can be covered by async execution.
It helps provide a unified entry point for batch and streaming.
I think we can also use sync execution for some testing.
So, I agree with you that we provide the `executeSql` method and make it an async
method.
If we want a sync method in the future, we can add a method named
`executeSqlSync`.

I think we've reached an agreement. I will update the document and start
the voting process.

Best,
Godfrey


Jark Wu wrote on Tue, Mar 31, 2020 at 12:46 AM:

> Hi,
>
> I didn't follow the full discussion.
> But I share the same concern with Timo that streaming queries should always
> be async.
> Otherwise, I can image it will cause a lot of confusion and problems if
> users don't deeply keep the "sync" in mind (e.g. client hangs).
> Besides, the streaming mode is still the majority use cases of Flink and
> Flink SQL. We should put the usability at a high priority.
>
> Best,
> Jark
>
>
> On Mon, 30 Mar 2020 at 23:27, Timo Walther  wrote:
>
> > Hi Godfrey,
> >
> > maybe I wasn't expressing my biggest concern enough in my last mail.
> > Even in a singleline and sync execution, I think that streaming queries
> > should not block the execution. Otherwise it is not possible to call
> > collect() or print() on them afterwards.
> >
> > "there are too many things need to discuss for multiline":
> >
> > True, I don't want to solve all of them right now. But what I know is
> > that our newly introduced methods should fit into a multiline execution.
> > There is no big difference of calling `executeSql(A), executeSql(B)` and
> > processing a multiline file `A;\nB;`.
> >
> > I think the example that you mentioned can simply be undefined for now.
> > Currently, no catalog is modifying data but just metadata. This is a
> > separate discussion.
> >
> > "result of the second statement is indeterministic":
> >
> > Sure this is indeterministic. But this is the implementers fault and we
> > cannot forbid such pipelines.
> >
> > How about we always execute streaming queries async? It would unblock
> > executeSql() and multiline statements.
> >
> > Having a `executeSqlAsync()` is useful for batch. However, I don't want
> > `sync/async` be the new batch/stream flag. The execution behavior should
> > come from the query itself.
> >
> > Regards,
> > Timo
> >
> >
> > On 30.03.20 11:12, godfrey he wrote:
> > > Hi Timo,
> > >
> > > Agree with you that streaming queries is our top priority,
> > > but I think there are too many things need to discuss for multiline
> > > statements:
> > > e.g.
> > > 1. what's the behaivor of DDL and DML mixing for async execution:
> > > create table t1 xxx;
> > > create table t2 xxx;
> > > insert into t2 select * from t1 where xxx;
> > > drop table t1; // t1 may be a MySQL table, the data will also be
> deleted.
> > >
> > > t1 is dropped when "insert" job is running.
> > >
> > > 2. what's the behaivor of unified scenario for async execution: (as you
> > > mentioned)
> > > INSERT INTO t1 SELECT * FROM s;
> > > INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> > >
> > > The result of the second statement is indeterministic, because the
> first
> > > statement maybe is running.
> > > I think we need to put a lot of effort to define the behavior of
> > logically
> > > related queries.
> > >
> > > In this FLIP, I suggest we only handle single statement, and we also
> > > introduce an async execute method
> > > which is more important and more often used for users.
> > >
> > > Dor the sync methods (like `TableEnvironment.executeSql` and
> > > `StatementSet.execute`),
> > > the result will be returned until the job is finished. The following
> > > methods will be introduced in this FLIP:
> > >
> > >   /**
> > >* Asynchronously execute the given single statement
> > >*/
> > > TableEnvironment.executeSqlAsync(String statement): TableResult
> > >
> > > /**
> > >   * Asynchronously execute the dml statements as a batch
> > >   */
> > > StatementSet.executeAsync(): TableResult
> > >
> > > public interface TableResult {
> > > /**
> > >  * return JobClient for DQL and DML in async mode, else return
> > > Optional.empty
> > >  

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread godfrey he
Hi Timo,

Agree with you that streaming queries are our top priority,
but I think there are too many things that need to be discussed for multiline
statements:
e.g.
1. What's the behavior of mixing DDL and DML for async execution?
create table t1 xxx;
create table t2 xxx;
insert into t2 select * from t1 where xxx;
drop table t1; // t1 may be a MySQL table, the data will also be deleted.

t1 is dropped while the "insert" job is running.

2. What's the behavior of the unified scenario for async execution? (as you
mentioned)
INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

The result of the second statement is indeterministic, because the first
statement may still be running.
I think we need to put a lot of effort into defining the behavior of logically
related queries.

In this FLIP, I suggest we only handle single statements, and we also
introduce an async execute method,
which is more important and more often used by users.

For the sync methods (like `TableEnvironment.executeSql` and
`StatementSet.execute`),
the result will be returned only after the job is finished. The following
methods will be introduced in this FLIP:

/**
 * Asynchronously execute the given single statement.
 */
TableEnvironment.executeSqlAsync(String statement): TableResult

/**
 * Asynchronously execute the DML statements as a batch.
 */
StatementSet.executeAsync(): TableResult

public interface TableResult {
    /**
     * Return the JobClient for DQL and DML in async mode, else return
     * Optional.empty.
     */
    Optional<JobClient> getJobClient();
}
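
Usage would then look roughly like this (a sketch assuming the async methods above):

TableResult result = tEnv.executeSqlAsync("INSERT INTO sinkTable SELECT * FROM sourceTable");
result.getJobClient().ifPresent(client ->
    client.getJobStatus().thenAccept(status -> System.out.println("job status: " + status)));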

what do you think?

Best,
Godfrey

Timo Walther wrote on Thu, Mar 26, 2020 at 9:15 PM:

> Hi Godfrey,
>
> executing streaming queries must be our top priority because this is
> what distinguishes Flink from competitors. If we change the execution
> behavior, we should think about the other cases as well to not break the
> API a third time.
>
> I fear that just having an async execute method will not be enough
> because users should be able to mix streaming and batch queries in a
> unified scenario.
>
> If I remember it correctly, we had some discussions in the past about
> what decides about the execution mode of a query. Currently, we would
> like to let the query decide, not derive it from the sources.
>
> So I could image a multiline pipeline as:
>
> USE CATALOG 'mycat';
> INSERT INTO t1 SELECT * FROM s;
> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>
> For executeMultilineSql():
>
> sync because regular SQL
> sync because regular Batch SQL
> async because Streaming SQL
>
> For executeAsyncMultilineSql():
>
> async because everything should be async
> async because everything should be async
> async because everything should be async
>
> What we should not start for executeAsyncMultilineSql():
>
> sync because DDL
> async because everything should be async
> async because everything should be async
>
> What are you thoughts here?
>
> Regards,
> Timo
>
>
> On 26.03.20 12:50, godfrey he wrote:
> > Hi Timo,
> >
> > I agree with you that streaming queries mostly need async execution.
> > In fact, our original plan is only introducing sync methods in this FLIP,
> > and async methods (like "executeSqlAsync") will be introduced in the
> future
> > which is mentioned in the appendix.
> >
> > Maybe the async methods also need to be considered in this FLIP.
> >
> > I think sync methods is also useful for streaming which can be used to
> run
> > bounded source.
> > Maybe we should check whether all sources are bounded in sync execution
> > mode.
> >
> >> Also, if we block for streaming queries, we could never support
> >> multiline files. Because the first INSERT INTO would block the further
> >> execution.
> > agree with you, we need async method to submit multiline files,
> > and files should be limited that the DQL and DML should be always in the
> > end for streaming.
> >
> > Best,
> > Godfrey
> >
> > Timo Walther wrote on Thu, Mar 26, 2020 at 4:29 PM:
> >
> >> Hi Godfrey,
> >>
> >> having control over the job after submission is a requirement that was
> >> requested frequently (some examples are [1], [2]). Users would like to
> >> get insights about the running or completed job. Including the jobId,
> >> jobGraph etc., the JobClient summarizes these properties.
> >>
> >> It is good to have a discussion about synchronous/asynchronous
> >> submission now to have a complete execution picture.
> >>
> >> I thought we submit streaming queries mostly async and just wait for the
> >> successful submission. If we block for streaming queries, how can we
> >> collect() or print() results?
> 

[jira] [Created] (FLINK-16822) The config set by SET command does not work

2020-03-26 Thread godfrey he (Jira)
godfrey he created FLINK-16822:
--

 Summary: The config set by SET command does not work
 Key: FLINK-16822
 URL: https://issues.apache.org/jira/browse/FLINK-16822
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.10.0
Reporter: godfrey he
 Fix For: 1.11.0


Users can add or change properties for execution behavior through the SET 
command in the SQL client, e.g. {{SET execution.parallelism=10}}, {{SET 
table.optimizer.join-reorder-enabled=true}}. But the {{table.xx}} configs can't 
change the TableEnvironment behavior, because properties set from the CLI are 
not applied to the TableEnvironment's table config.
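
The missing wiring, roughly, is that {{table.*}} properties from the CLI should end up
in the session's {{TableConfig}} (hypothetical glue code, not the actual SQL client
implementation):

// properties collected from SET commands in the CLI session
Map<String, String> cliProperties = new HashMap<>();
cliProperties.put("table.optimizer.join-reorder-enabled", "true");

// forward table.* properties into the TableEnvironment's config
Configuration tableConf = tEnv.getConfig().getConfiguration();
cliProperties.forEach((k, v) -> {
    if (k.startsWith("table.")) {
        tableConf.setString(k, v);
    }
});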



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-26 Thread godfrey he
Hi Timo,

I agree with you that streaming queries mostly need async execution.
In fact, our original plan was to introduce only sync methods in this FLIP;
async methods (like "executeSqlAsync") would be introduced in the future,
as mentioned in the appendix.

Maybe the async methods also need to be considered in this FLIP.

I think sync methods are also useful for streaming, where they can be used to run
bounded sources.
Maybe we should check whether all sources are bounded in sync execution
mode.

>Also, if we block for streaming queries, we could never support
> multiline files. Because the first INSERT INTO would block the further
> execution.
Agree with you; we need an async method to submit multiline files,
and such files should be restricted so that DQL and DML always come at the
end for streaming.

Best,
Godfrey

Timo Walther wrote on Thu, Mar 26, 2020 at 4:29 PM:

> Hi Godfrey,
>
> having control over the job after submission is a requirement that was
> requested frequently (some examples are [1], [2]). Users would like to
> get insights about the running or completed job. Including the jobId,
> jobGraph etc., the JobClient summarizes these properties.
>
> It is good to have a discussion about synchronous/asynchronous
> submission now to have a complete execution picture.
>
> I thought we submit streaming queries mostly async and just wait for the
> successful submission. If we block for streaming queries, how can we
> collect() or print() results?
>
> Also, if we block for streaming queries, we could never support
> multiline files. Because the first INSERT INTO would block the further
> execution.
>
> If we decide to block entirely on streaming queries, we need the async
> execution methods in the design already. However, I would rather go for
> non-blocking streaming queries. Also with the `EMIT STREAM` key word in
> mind that we might add to SQL statements soon.
>
> Regards,
> Timo
>
> [1] https://issues.apache.org/jira/browse/FLINK-16761
> [2] https://issues.apache.org/jira/browse/FLINK-12214
>
> On 25.03.20 16:30, godfrey he wrote:
> > Hi Timo,
> >
> > Thanks for the updating.
> >
> > Regarding to "multiline statement support", I'm also fine that
> > `TableEnvironment.executeSql()` only supports single line statement, and
> we
> > can support multiline statement later (needs more discussion about this).
> >
> > Regarding to "StatementSet.explian()", I don't have strong opinions about
> > that.
> >
> > Regarding to "TableResult.getJobClient()", I think it's unnecessary. The
> > reason is: first, many statements (e.g. DDL, show xx, use xx)  will not
> > submit a Flink job. second, `TableEnvironment.executeSql()` and
> > `StatementSet.execute()` are synchronous method, `TableResult` will be
> > returned only after the job is finished or failed.
> >
> > Regarding to "whether StatementSet.execute() needs to throw exception", I
> > think we should choose a unified way to tell whether the execution is
> > successful. If `TableResult` contains ERROR kind (non-runtime exception),
> > users need to not only check the result but also catch the runtime
> > exception in their code. or `StatementSet.execute()` does not throw any
> > exception (including runtime exception), all exception messages are in
> the
> > result.  I prefer "StatementSet.execute() needs to throw exception". cc
> @Jark
> > Wu 
> >
> > I will update the agreed parts to the document first.
> >
> > Best,
> > Godfrey
> >
> >
> > Timo Walther wrote on Wed, Mar 25, 2020 at 6:51 PM:
> >
> >> Hi Godfrey,
> >>
> >> thanks for starting the discussion on the mailing list. And sorry again
> >> for the late reply to FLIP-84. I have updated the Google doc one more
> >> time to incorporate the offline discussions.
> >>
> >>   From Dawid's and my view, it is fine to postpone the multiline support
> >> to a separate method. This can be future work even though we will need
> >> it rather soon.
> >>
> >> If there are no objections, I suggest to update the FLIP-84 again and
> >> have another voting process.
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >> On 25.03.20 11:17, godfrey he wrote:
> >>> Hi community,
> >>> Timo, Fabian and Dawid have some feedbacks about FLIP-84[1]. The
> >> feedbacks
> >>> are all about new introduced methods. We had a discussion yesterday,
> and
> >>> most of feedbacks have been agreed upon. Here is the conclusions:
> >>>
> >>> *1. about propo

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-25 Thread godfrey he
Hi Timo,

Thanks for the update.

Regarding "multiline statement support", I'm also fine with
`TableEnvironment.executeSql()` only supporting single-line statements; we
can support multiline statements later (this needs more discussion).

Regarding "StatementSet.explain()", I don't have strong opinions about
that.

Regarding "TableResult.getJobClient()", I think it's unnecessary. The
reason is: first, many statements (e.g. DDL, show xx, use xx) will not
submit a Flink job. Second, `TableEnvironment.executeSql()` and
`StatementSet.execute()` are synchronous methods; `TableResult` will be
returned only after the job has finished or failed.

Regarding "whether StatementSet.execute() needs to throw exceptions", I
think we should choose a unified way to tell whether the execution is
successful. Either `TableResult` contains an ERROR kind (non-runtime
exception), in which case users need to not only check the result but also
catch runtime exceptions in their code, or `StatementSet.execute()` does not
throw any exception (including runtime exceptions) and all exception messages
are in the result. I prefer "StatementSet.execute() needs to throw
exceptions". cc @Jark Wu

I will update the agreed parts to the document first.

Best,
Godfrey


Timo Walther wrote on Wed, Mar 25, 2020 at 6:51 PM:

> Hi Godfrey,
>
> thanks for starting the discussion on the mailing list. And sorry again
> for the late reply to FLIP-84. I have updated the Google doc one more
> time to incorporate the offline discussions.
>
>  From Dawid's and my view, it is fine to postpone the multiline support
> to a separate method. This can be future work even though we will need
> it rather soon.
>
> If there are no objections, I suggest to update the FLIP-84 again and
> have another voting process.
>
> Thanks,
> Timo
>
>
> On 25.03.20 11:17, godfrey he wrote:
> > Hi community,
> > Timo, Fabian and Dawid have some feedbacks about FLIP-84[1]. The
> feedbacks
> > are all about new introduced methods. We had a discussion yesterday, and
> > most of feedbacks have been agreed upon. Here is the conclusions:
> >
> > *1. about proposed methods in `TableEnvironment`:*
> >
> > the original proposed methods:
> >
> > TableEnvironment.createDmlBatch(): DmlBatch
> > TableEnvironment.executeStatement(String statement): ResultTable
> >
> > the new proposed methods:
> >
> > // we should not use abbreviations in the API, and the term "Batch" is
> > easily confused with batch/streaming processing
> > TableEnvironment.createStatementSet(): StatementSet
> >
> > // every method that takes SQL should have `Sql` in its name
> > // supports multiline statement ???
> > TableEnvironment.executeSql(String statement): TableResult
> >
> > // new methods. supports explaining DQL and DML
> > TableEnvironment.explainSql(String statement, ExplainDetail... details):
> > String
> >
> >
> > *2. about proposed related classes:*
> >
> > the original proposed classes:
> >
> > interface DmlBatch {
> >  void addInsert(String insert);
> >  void addInsert(String targetPath, Table table);
> >  ResultTable execute() throws Exception ;
> >  String explain(boolean extended);
> > }
> >
> > public interface ResultTable {
> >  TableSchema getResultSchema();
> >  Iterable<Row> getResultRows();
> > }
> >
> > the new proposed classes:
> >
> > interface StatementSet {
> >  // every method that takes SQL should have `Sql` in its name
> >  // return StatementSet instance for fluent programming
> >  addInsertSql(String statement): StatementSet
> >
> >  // return StatementSet instance for fluent programming
> >  addInsert(String tablePath, Table table): StatementSet
> >
> >  // new method. support overwrite mode
> >  addInsert(String tablePath, Table table, boolean overwrite):
> > StatementSet
> >
> >  explain(): String
> >
> >  // new method. supports adding more details for the result
> >  explain(ExplainDetail... extraDetails): String
> >
> >  // throw exception ???
> >  execute(): TableResult
> > }
> >
> > interface TableResult {
> >  getTableSchema(): TableSchema
> >
> >  // avoid custom parsing of an "OK" row in programming
> >  getResultKind(): ResultKind
> >
> >  // instead of `get` make it explicit that this is might be
> triggering
> > an expensive operation
> >  collect(): Iterable<Row>
> >
> >  // for fluent programming
> >  print():

[DISCUSS] FLIP-84 Feedback Summary

2020-03-25 Thread godfrey he
Hi community,
Timo, Fabian and Dawid have some feedback about FLIP-84[1]. The feedback
is all about newly introduced methods. We had a discussion yesterday, and
most of the feedback has been agreed upon. Here are the conclusions:

*1. about proposed methods in `TableEnvironment`:*

the original proposed methods:

TableEnvironment.createDmlBatch(): DmlBatch
TableEnvironment.executeStatement(String statement): ResultTable

the new proposed methods:

// we should not use abbreviations in the API, and the term "Batch" is
easily confused with batch/streaming processing
TableEnvironment.createStatementSet(): StatementSet

// every method that takes SQL should have `Sql` in its name
// supports multiline statement ???
TableEnvironment.executeSql(String statement): TableResult

// new methods. supports explaining DQL and DML
TableEnvironment.explainSql(String statement, ExplainDetail... details): String


*2. about proposed related classes:*

the original proposed classes:

interface DmlBatch {
    void addInsert(String insert);
    void addInsert(String targetPath, Table table);
    ResultTable execute() throws Exception;
    String explain(boolean extended);
}

public interface ResultTable {
    TableSchema getResultSchema();
    Iterable<Row> getResultRows();
}

the new proposed classes:

interface StatementSet {
// every method that takes SQL should have `Sql` in its name
// return StatementSet instance for fluent programming
addInsertSql(String statement): StatementSet

// return StatementSet instance for fluent programming
addInsert(String tablePath, Table table): StatementSet

// new method. support overwrite mode
    addInsert(String tablePath, Table table, boolean overwrite): StatementSet

explain(): String

// new method. supports adding more details for the result
explain(ExplainDetail... extraDetails): String

// throw exception ???
execute(): TableResult
}

interface TableResult {
getTableSchema(): TableSchema

// avoid custom parsing of an "OK" row in programming
getResultKind(): ResultKind

 // instead of `get`, make it explicit that this might be triggering
 // an expensive operation
 collect(): Iterable<Row>

// for fluent programming
print(): Unit
}

enum ResultKind {
SUCCESS, // for DDL, DCL and statements with a simple "OK"
SUCCESS_WITH_CONTENT, // rows with important content are available
(DML, DQL)
}
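
For illustration, a minimal usage sketch under the proposed names (the table
paths and queries below are hypothetical, and the method names may still change):

StatementSet stmtSet = tEnv.createStatementSet();
stmtSet.addInsertSql("INSERT INTO sink1 SELECT * FROM src")
       .addInsert("sink2", tEnv.sqlQuery("SELECT a, b FROM src"));
System.out.println(stmtSet.explain());
TableResult result = stmtSet.execute(); // one job for all added inserts
result.print();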


*3. new proposed methods in `Table`*

`Table.insertInto()` will be deprecated, and the following methods are
introduced:

Table.executeInsert(String tablePath): TableResult
Table.executeInsert(String tablePath, boolean overwrite): TableResult
Table.explain(ExplainDetail... details): String
Table.execute(): TableResult
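
Again for illustration only, a sketch of how the new Table methods could be
used (the table and sink names are hypothetical):

Table table = tEnv.sqlQuery("SELECT a, b FROM src");
System.out.println(table.explain());
table.executeInsert("sink");          // insert into "sink" and wait for the job
TableResult result = table.execute(); // run the query
result.collect().forEach(System.out::println);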

There are two issues that need further discussion: one is whether
`TableEnvironment.executeSql(String statement): TableResult` needs to
support multiline statements (or whether `TableEnvironment` needs to support
multiline statements at all), and the other is whether `StatementSet.execute()`
needs to throw an exception.

please refer to the feedback document [2] for the details.

Any suggestions are warmly welcomed!

[1]
https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2]
https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit

Best,
Godfrey


Re: [DISCUSS] Features of Apache Flink 1.11

2020-03-11 Thread godfrey he
Hi Zhijiang and Piotr,

I think we can remove "FLIP-91 Introduce SQL client gateway and provide
JDBC driver" from the list, because we have decided that the first step is to
support the SQL gateway and JDBC driver as an ecosystem project in Ververica, so
we are not going to put more effort into it now.

Thanks for updating the list!

Bests,
Godfrey


Timo Walther  于2020年3月11日周三 下午4:13写道:

> Hi Zhijiang and Piotr,
>
> from the SQL side we also plan to rework the source and sink interfaces
> in 1.11. The FLIP is not yet published but already reserved and
> requirement for FLIP-105:
>
> FLIP-95: New TableSource and TableSink interfaces
>
> Thanks for compiling the list!
>
> Regards,
> Timo
>
>
> On 11.03.20 09:05, Hequn Cheng wrote:
> > Thanks Zhijiang and Piotr for kicking off the discussion and providing
> the
> > detailed list.
> > This would be very helpful for tracking the features.
> >
> > BTW, as for PyFlink, it would be great if the feature list can also
> include
> > the following features:
> > - FLIP-112: Support User-Defined Metrics in Python UDF
> > - FLIP-114: Support Python UDF in SQL Client
> >
> > Looking forward to the release!
> >
> > Best,
> > Hequn
> >
> >
> >
> > On Wed, Mar 11, 2020 at 1:02 PM Yu Li  wrote:
> >
> >> Thanks for compiling the list of 1.11 efforts Zhijiang and Piotr! This
> >> helps a lot to better understand what the community is currently working
> >> on. Looking forward to another successful release.
> >>
> >> Best Regards,
> >> Yu
> >>
> >>
> >> On Wed, 11 Mar 2020 at 11:17, Zhijiang  >> .invalid>
> >> wrote:
> >>
> >>> Hi community,
> >>>
> >>>
> >>> Not more than one month ago we have released Flink 1.10. We are now
> >>> heading for the Flink 1.11 release and we, as release managers, would
> >> like
> >>> to share with you what are the features that the community is currently
> >>> working on and we are hoping that will be part of the Flink 1.11
> release.
> >>> Currently we are aiming with the feature freeze to happen in late
> April.
> >>>
> >>> As for now, some of the features are in the very early stages of the
> >>> development or even brainstorming. Because of that, some of them do not
> >>> have associated JIRA tickets or FLIP documents. For the next progress
> >>> announcement we are hoping that this will be no longer the case.
> >>>
> >>> Please also note that because we are still relatively at the beginning
> of
> >>> the release cycle, some of the FLIPs haven’t yet been voted.
> >>>
> >>> - SQL / Table
> >>> - FLIP-42: Restructure documentation [1]
> >>> - FLIP-65: New type inference for Table API UDFs [2]
> >>> - FLIP-84: Improve TableEnv’s interface [3]
> >>> - FLIP-91 Introduce SQL client gateway and provide JDBC driver [4]
> >>> - FLIP-93: Introduce JDBC catalog and Postgres catalog [5]
> >>> - FLIP-105: Support to interpret and emit changelog in Flink SQL [6]
> >>> - FLIP-107: Reading table columns from different parts of source
> records
> >>> [7]
> >>> - [FLINK-14807] Add Table#collect API for fetching data [8]
> >>> - Support query and table hints
> >>> - ML / Connectors
> >>> - FLIP-27: New source API [9]
> >>> - [FLINK-15670] Wrap a source/sink pair to persist intermediate result
> >> for
> >>> subgraph failure recovery [10]
> >>> - Pulsar source / sink / catalog
> >>> - Update ML Pipeline API interface to better support Flink ML lib
> >>> algorithms
> >>> - PyFlink
> >>> - FLIP-58: Debugging and monitoring of Python UDF [11]
> >>> - FLIP-106: Expand the usage scope of Python UDF [12]
> >>> - Integration with most popular Python libraries (Pandas)
> >>> - Performance improvements of Python UDF
> >>> - Support running python UDF in docker workers
> >>> - Add Python ML API
> >>> - Fully support all kinds of Python UDF
> >>> - Web UI
> >>> - FLIP-98: Better back pressure detection [13]
> >>> - FLIP-99: Make max exception configurable [14]
> >>> - FLIP-100: Add attempt information [15]
> >>> - FLIP-102: Add more metrics to TaskManager [16]
> >>> - FLIP-103: Better TM/JM log display [17]
> >>> - [FLINK-14816] Add thread dump feature for TaskManager [18]
> >>> - Runtime
> >>> - FLIP-56: Support for dynamic slots on the TaskExecutor [19]
> >>> - FLIP-67: Support for cluster partitions [20]
> >>> - FLIP-76: Unaligned checkpoints [21]
> >>> - FLIP-83: Flink e2e performance testing framework [22]
> >>> - FLIP-85: Support cluster deploy mode [23]
> >>> - FLIP-92: Add N-Ary input stream operator in Flink [24]
> >>> - FLIP-108: Add GPU to the resource management (specifically for UDTF &
> >>> UDF) [25]
> >>> - FLIP-111: Consolidate docker images [26]
> >>> - Unified memory configuration for JobManager
> >>> - Specify upper bound for number of allocated TaskManagers
> >>> - [FLINK-9407] ORC format for StreamingFileSink [27]
> >>> - [FLINK-10742] Let Netty use Flink's buffers on downstream side [28]
> >>> - [FLINK-10934] Support per-job mode for Kubernetes integration [29]
> >>> - [FLINK-11395] Avro writer for StreamingFileSink [30]
> >>> - [FLINK-11427] Protobuf parquet writer for 

[jira] [Created] (FLINK-16535) BatchTableSink#emitDataSet returns DataSink

2020-03-10 Thread godfrey he (Jira)
godfrey he created FLINK-16535:
--

 Summary: BatchTableSink#emitDataSet returns DataSink
 Key: FLINK-16535
 URL: https://issues.apache.org/jira/browse/FLINK-16535
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.11.0


Add a return value to {{BatchTableSink#emitDataSet}} to support generating the 
{{DataSet}} plan in {{BatchTableEnvironment}}.
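
A rough sketch of the proposed signature change (the generics below are 
assumptions, since they are not spelled out in this issue):

{code:java}
public interface BatchTableSink<T> extends TableSink<T> {

    // previously: void emitDataSet(DataSet<T> dataSet);
    // returning the DataSink lets the table environment keep building
    // the DataSet plan
    DataSink<?> emitDataSet(DataSet<T> dataSet);
}
{code}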



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16533) ExecutionEnvironment supports executing plan

2020-03-10 Thread godfrey he (Jira)
godfrey he created FLINK-16533:
--

 Summary: ExecutionEnvironment supports executing plan
 Key: FLINK-16533
 URL: https://issues.apache.org/jira/browse/FLINK-16533
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: godfrey he
 Fix For: 1.11.0


Currently, {{ExecutionEnvironment}} only supports executing the plan generated 
by itself.
FLIP-84 proposes that {{TableEnvironment}} can only trigger the table program and 
that {{StreamExecutionEnvironment}}/{{ExecutionEnvironment}} can only trigger the 
{{DataStream}}/{{DataSet}} program. This requires that {{ExecutionEnvironment}} 
can execute a plan generated by {{TableEnvironment}}. We propose to add two 
methods to {{ExecutionEnvironment}}, similar to 
{{StreamExecutionEnvironment}}#execute(StreamGraph) and 
{{StreamExecutionEnvironment}}#executeAsync(StreamGraph):

{code:java}
@Internal
public JobExecutionResult execute(Plan plan) throws Exception {
.
}

@Internal
public JobClient executeAsync(Plan plan) throws Exception {
.
}
{code}
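
For illustration, a rough sketch of the intended call site: a batch table 
environment handing its translated plan over to the execution environment (the 
translation step below is hypothetical and only stands in for the planner logic):

{code:java}
// sketch: inside a batch table environment implementation
Plan plan = translateToPlan(bufferedModifyOperations); // hypothetical translation step

// blocking execution of the generated plan
JobExecutionResult result = execEnv.execute(plan);

// or non-blocking submission
JobClient jobClient = execEnv.executeAsync(plan);
{code}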








--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16519) CheckpointCoordinatorFailureTest logs LinkageErrors

2020-03-09 Thread godfrey he (Jira)
godfrey he created FLINK-16519:
--

 Summary: CheckpointCoordinatorFailureTest logs LinkageErrors
 Key: FLINK-16519
 URL: https://issues.apache.org/jira/browse/FLINK-16519
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Reporter: godfrey he
 Fix For: 1.11.0


This issue is in 
https://travis-ci.org/apache/flink/jobs/660152153?utm_medium=notification_source=slack

Log output

{code:java}
2020-03-09 15:52:14,550 main ERROR Could not reconfigure JMX 
java.lang.LinkageError: loader constraint violation: loader (instance of 
org/powermock/core/classloader/javassist/JavassistMockClassLoader) previously 
initiated loading for a different type with name "javax/management/MBeanServer"
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
at 
org.powermock.core.classloader.javassist.JavassistMockClassLoader.loadUnmockedClass(JavassistMockClassLoader.java:90)
at 
org.powermock.core.classloader.MockClassLoader.loadClassByThisClassLoader(MockClassLoader.java:104)
at 
org.powermock.core.classloader.DeferSupportingClassLoader.loadClass1(DeferSupportingClassLoader.java:147)
at 
org.powermock.core.classloader.DeferSupportingClassLoader.loadClass(DeferSupportingClassLoader.java:98)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at 
org.apache.logging.log4j.core.jmx.Server.unregisterAllMatching(Server.java:337)
at 
org.apache.logging.log4j.core.jmx.Server.unregisterLoggerContext(Server.java:261)
at 
org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:165)
at 
org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:141)
at 
org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:590)
at 
org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:651)
at 
org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:668)
at 
org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:253)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
at 
org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:138)
at 
org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:45)
at 
org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:48)
at 
org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:30)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:329)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:349)
at org.apache.flink.util.TestLogger.(TestLogger.java:36)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinatorFailureTest.(CheckpointCoordinatorFailureTest.java:55)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.createTestInstance(PowerMockJUnit44RunnerDelegateImpl.java:197)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.createTest(PowerMockJUnit44RunnerDelegateImpl.java:182)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:204)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:160)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:134)
at 
org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:34)
at 
org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:44)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:136)
at 
org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:117)
at 
org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunn

Re: [DISCUSS] FLIP-114: Support Python UDF in SQL Client

2020-03-09 Thread godfrey he
Hi Wei, thanks for the proposal.

I think it's better to give two more examples: one showing how to use a Python UDF
in SQL, and another showing how to start sql-client.sh with full Python dependencies.

Best,
Godfrey

Wei Zhong  于2020年3月9日周一 下午10:09写道:

> Hi everyone,
>
> I would like to start discussion about how to support Python UDF in SQL
> Client.
>
> Flink Python UDF(FLIP-58[1]) has already been introduced in the release of
> 1.10.0 and the support for SQL DDL is introduced in FLIP-106[2].
>
> SQL Client defines UDF via the environment file and has its own CLI
> implementation to manage dependencies, but neither of which supports Python
> UDF. We want to introduce the support of Python UDF for SQL Client,
> including the registration and the dependency management of Python UDF.
>
> Here is the design doc:
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client
>
> Looking forward to your feedback!
>
> Best,
> Wei
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL
>
>


[jira] [Created] (FLINK-16367) Introduce createDmlBatch method in TableEnvironment

2020-03-02 Thread godfrey he (Jira)
godfrey he created FLINK-16367:
--

 Summary: Introduce createDmlBatch method in TableEnvironment 
 Key: FLINK-16367
 URL: https://issues.apache.org/jira/browse/FLINK-16367
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.11.0


As we deprecate the {{execute}} and {{explain}} methods because of the buffered 
SQLs/Tables execution problem, this issue aims to introduce a new method named 
{{createDmlBatch}} to support executing and explaining the batched queries.

The method looks like:

{code:java}
interface TableEnvironment {

    /**
     * Creates a DmlBatch instance to which DML statements or Tables can be added.
     * The planner can optimize all added statements and Tables together for
     * better performance.
     */
    DmlBatch createDmlBatch();
}

interface DmlBatch {

    /**
     * Adds an insert statement to the batch.
     */
    void addInsert(String insert);

    /**
     * Adds a Table with the given sink table name to the batch.
     */
    void addInsert(String targetPath, Table table);

    /**
     * Executes all statements and Tables as a batch.
     *
     * <p>The added statements and Tables will be cleared when calling this method.
     */
    ResultTable execute() throws Exception;

    /**
     * Returns the AST and the execution plan to compute the result of all
     * statements and Tables.
     *
     * @param extended if the plan should contain additional properties, e.g.
     *                 estimated cost, traits
     */
    String explain(boolean extended);
}
{code}
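
For illustration, a minimal usage sketch of the proposed API (the table and 
sink names below are hypothetical):

{code:java}
DmlBatch batch = tEnv.createDmlBatch();
batch.addInsert("INSERT INTO sink1 SELECT a, b FROM src WHERE a > 10");
batch.addInsert("sink2", tEnv.sqlQuery("SELECT a, count(*) AS cnt FROM src GROUP BY a"));
System.out.println(batch.explain(false)); // plan for all added statements together
ResultTable result = batch.execute();     // executes both inserts as one job
{code}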






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16366) Introduce executeStatement method in TableEnvironment

2020-03-02 Thread godfrey he (Jira)
godfrey he created FLINK-16366:
--

 Summary: Introduce executeStatement method in TableEnvironment
 Key: FLINK-16366
 URL: https://issues.apache.org/jira/browse/FLINK-16366
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.11.0


This issue aims to introduce {{executeStatement}}, which synchronously executes 
the given single statement immediately and returns the execution result.


{code:java}
/**
 * Synchronously executes the given single statement immediately. The statement
 * can be DDL/DML/SHOW/DESCRIBE/EXPLAIN/USE.
 * If the statement is translated to a Flink job, the result is not returned
 * until the job is finished.
 *
 * @return result for SHOW/DESCRIBE/EXPLAIN, the affected row count for DML
 *         (-1 means unknown), or a string message ("OK") for other statements.
 * @throws Exception which occurs during the execution.
 */
ResultTable executeStatement(String statement) throws Exception;
{code}


{code:java}
/**
 * A ResultTable is the representation of a statement execution result.
 */
public interface ResultTable {

    /**
     * Gets the schema of the ResultTable.
     */
    TableSchema getResultSchema();

    /**
     * Gets the result contents as iterable rows.
     */
    Iterable<Row> getResultRows();
}
{code}
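
For illustration, a minimal usage sketch (the statements below are hypothetical):

{code:java}
// a DDL statement returns a ResultTable carrying a simple "OK" message
tEnv.executeStatement("DROP TABLE MyTable");

// a SHOW statement returns the rows describing the existing tables
ResultTable tables = tEnv.executeStatement("SHOW TABLES");
tables.getResultRows().forEach(System.out::println);
{code}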





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-03-01 Thread godfrey he
Thanks all for the votes.
So far, we have

   - 3 binding +1 votes (Kurt, Jark, Jingsong)
   - 2 non-binding +1 votes (Terry, Benchao)
   - No -1 votes

The voting time has passed and there are enough +1 votes to consider FLIP-84
approved.
Thank you all.


Best,
Godfrey

godfrey he  于2020年3月2日周一 下午3:32写道:

> Thanks Jingsong for the reminding. I will update it now.
>
> Jingsong Lee  于2020年3月2日周一 下午2:46写道:
>
>> Thanks for driving.
>>
>> +1 from my side.
>>
>> > For current messy Flink table program trigger point, we propose that:
>> for
>> TableEnvironment and StreamTableEnvironment, you must use
>> `TableEnvironment.execute()` to trigger table program execution.
>>
>> Looks like this is an incompatible change. You need update Compatibility
>> chapter? And should add it to 1.11 release note in future.
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Feb 28, 2020 at 10:10 PM Benchao Li  wrote:
>>
>> > +1 (non-binding)
>> >
>> > Jark Wu  于2020年2月28日周五 下午5:11写道:
>> >
>> > > +1 from my side.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Fri, 28 Feb 2020 at 15:07, kant kodali  wrote:
>> > >
>> > > > Nice!!
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > > On Feb 27, 2020, at 9:03 PM, godfrey he 
>> wrote:
>> > > > >
>> > > > > Hi kant, yes. We hope to deprecate the methods which confuse
>> users
>> > > ASAP.
>> > > > >
>> > > > > Bests,
>> > > > > godfrey
>> > > > >
>> > > > > kant kodali  于2020年2月28日周五 上午11:17写道:
>> > > > >
>> > > > >> Is this targeted towards Flink 1.11?
>> > > > >>
>> > > > >>> On Thu, Feb 27, 2020 at 6:32 PM Kurt Young 
>> > wrote:
>> > > > >>>
>> > > > >>> +1 (binding)
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Kurt
>> > > > >>>
>> > > > >>>
>> > > > >>>> On Fri, Feb 28, 2020 at 9:15 AM Terry Wang > >
>> > > > wrote:
>> > > > >>>
>> > > > >>>> I look through the whole design and it’s a big improvement of
>> > > > usability
>> > > > >>> on
>> > > > >>>> TableEnvironment’s api.
>> > > > >>>>
>> > > > >>>> +1 (non-binding)
>> > > > >>>>
>> > > > >>>> Best,
>> > > > >>>> Terry Wang
>> > > > >>>>
>> > > > >>>>
>> > > > >>>>
>> > > > >>>>> 2020年2月27日 14:59,godfrey he  写道:
>> > > > >>>>>
>> > > > >>>>> Hi everyone,
>> > > > >>>>>
>> > > > >>>>> I'd like to start the vote of FLIP-84[1], which proposes to
>> > > deprecate
>> > > > >>>> some
>> > > > >>>>> old APIs and introduce some new APIs in TableEnvironment. This
>> > FLIP
>> > > > >> is
>> > > > >>>>> discussed and reached consensus in the discussion thread[2].
>> > > > >>>>>
>> > > > >>>>> The vote will be open for at least 72 hours. Unless there is
>> an
>> > > > >>>> objection,
>> > > > >>>>> I will try to close it by Mar 1, 2020 07:00 UTC if we have
>> > received
>> > > > >>>>> sufficient votes.
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> [1]
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment
>> > > > >>>>>
>> > > > >>>>> [2]
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> Bests,
>> > > > >>>>> Godfrey
>> > > > >>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> >
>> > --
>> >
>> > Benchao Li
>> > School of Electronics Engineering and Computer Science, Peking
>> University
>> > Tel:+86-15650713730
>> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
>> >
>>
>>
>> --
>> Best, Jingsong Lee
>>
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-03-01 Thread godfrey he
Thanks Jingsong for the reminder. I will update it now.

Jingsong Lee  于2020年3月2日周一 下午2:46写道:

> Thanks for driving.
>
> +1 from my side.
>
> > For current messy Flink table program trigger point, we propose that: for
> TableEnvironment and StreamTableEnvironment, you must use
> `TableEnvironment.execute()` to trigger table program execution.
>
> Looks like this is an incompatible change. You need update Compatibility
> chapter? And should add it to 1.11 release note in future.
>
> Best,
> Jingsong Lee
>
> On Fri, Feb 28, 2020 at 10:10 PM Benchao Li  wrote:
>
> > +1 (non-binding)
> >
> > Jark Wu  于2020年2月28日周五 下午5:11写道:
> >
> > > +1 from my side.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 28 Feb 2020 at 15:07, kant kodali  wrote:
> > >
> > > > Nice!!
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Feb 27, 2020, at 9:03 PM, godfrey he 
> wrote:
> > > > >
> > > > > Hi kant, yes. We hope to deprecate the methods which confuse users
> > > ASAP.
> > > > >
> > > > > Bests,
> > > > > godfrey
> > > > >
> > > > > kant kodali  于2020年2月28日周五 上午11:17写道:
> > > > >
> > > > >> Is this targeted towards Flink 1.11?
> > > > >>
> > > > >>> On Thu, Feb 27, 2020 at 6:32 PM Kurt Young 
> > wrote:
> > > > >>>
> > > > >>> +1 (binding)
> > > > >>>
> > > > >>> Best,
> > > > >>> Kurt
> > > > >>>
> > > > >>>
> > > > >>>> On Fri, Feb 28, 2020 at 9:15 AM Terry Wang 
> > > > wrote:
> > > > >>>
> > > > >>>> I look through the whole design and it’s a big improvement of
> > > > usability
> > > > >>> on
> > > > >>>> TableEnvironment’s api.
> > > > >>>>
> > > > >>>> +1 (non-binding)
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Terry Wang
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>> 2020年2月27日 14:59,godfrey he  写道:
> > > > >>>>>
> > > > >>>>> Hi everyone,
> > > > >>>>>
> > > > >>>>> I'd like to start the vote of FLIP-84[1], which proposes to
> > > deprecate
> > > > >>>> some
> > > > >>>>> old APIs and introduce some new APIs in TableEnvironment. This
> > FLIP
> > > > >> is
> > > > >>>>> discussed and reached consensus in the discussion thread[2].
> > > > >>>>>
> > > > >>>>> The vote will be open for at least 72 hours. Unless there is an
> > > > >>>> objection,
> > > > >>>>> I will try to close it by Mar 1, 2020 07:00 UTC if we have
> > received
> > > > >>>>> sufficient votes.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment
> > > > >>>>>
> > > > >>>>> [2]
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Bests,
> > > > >>>>> Godfrey
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>
>
> --
> Best, Jingsong Lee
>


[jira] [Created] (FLINK-16364) Deprecate the methods in TableEnvironment proposed by FLIP-84

2020-03-01 Thread godfrey he (Jira)
godfrey he created FLINK-16364:
--

 Summary: Deprecate the methods in TableEnvironment proposed by 
FLIP-84
 Key: FLINK-16364
 URL: https://issues.apache.org/jira/browse/FLINK-16364
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.11.0


In 
[FLIP-84|https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment],
 we propose to deprecate the following methods in TableEnvironment: 
{code:java}
void sqlUpdate(String sql)
void insertInto(String targetPath, Table table)
void execute(String jobName)
String explain(boolean extended)
Table fromTableSource(TableSource source)
{code}
This issue aims to deprecate them.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16363) Correct the execution behavior of TableEnvironment and StreamTableEnvironment

2020-03-01 Thread godfrey he (Jira)
godfrey he created FLINK-16363:
--

 Summary: Correct the execution behavior of TableEnvironment and 
StreamTableEnvironment
 Key: FLINK-16363
 URL: https://issues.apache.org/jira/browse/FLINK-16363
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.11.0


Both {{TableEnvironment.execute()}} and {{StreamExecutionEnvironment.execute}} 
can trigger the execution of a Flink table program. However, if you use 
{{TableEnvironment}} to build a Flink table program, you must use 
{{TableEnvironment.execute()}} to trigger execution, because you can’t get the 
{{StreamExecutionEnvironment}} instance. If you use {{StreamTableEnvironment}} 
to build a Flink table program, you can use both to trigger execution. If you 
convert a table program to a {{DataStream}} program (using 
{{StreamTableEnvironment.toAppendStream/toRetractStream}}), you can also 
use both to trigger execution. So it’s hard to explain which `execute` method 
should be used.

To correct the current messy trigger points, we propose that: for 
{{TableEnvironment}} and {{StreamTableEnvironment}}, you must use 
{{TableEnvironment.execute()}} to trigger table program execution; once you 
convert the table program to a {{DataStream}} program (through the 
{{toAppendStream}} or {{toRetractStream}} method), you must use 
{{StreamExecutionEnvironment.execute()}} to trigger the {{DataStream}} program.
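
For illustration, a sketch of the proposed trigger points (the table names 
below are hypothetical):

{code:java}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// pure table program: must be triggered by TableEnvironment.execute()
tEnv.sqlUpdate("INSERT INTO sink SELECT * FROM src");
tEnv.execute("table job");

// after conversion to a DataStream program: must be triggered by
// StreamExecutionEnvironment.execute()
Table table = tEnv.sqlQuery("SELECT * FROM src");
tEnv.toAppendStream(table, Row.class).print();
env.execute("datastream job");
{code}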

please refer to 
[FLIP-84|https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment]
 for more detail.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16362) remove deprecated method in StreamTableSink

2020-03-01 Thread godfrey he (Jira)
godfrey he created FLINK-16362:
--

 Summary: remove deprecated method in StreamTableSink
 Key: FLINK-16362
 URL: https://issues.apache.org/jira/browse/FLINK-16362
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.11.0


[FLIP-84|https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment]
 proposes to unify the behavior of {{TableEnvironment}} and 
{{StreamTableEnvironment}}, and requires that {{StreamTableSink}} always returns 
a {{DataStream}}. However,
{{StreamTableSink.emitDataStream}} returns nothing and has been deprecated since 
Flink 1.9, so we will remove it.
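
For reference, a rough sketch of the interface shape after the removal; 
{{consumeDataStream}}, which already exists since Flink 1.9, stays as the 
replacement returning a {{DataStreamSink}}:

{code:java}
public interface StreamTableSink<T> extends TableSink<T> {

    // the deprecated void emitDataStream(DataStream<T>) is removed;
    // the remaining method returns the sink so the planner can keep
    // building the DataStream pipeline
    DataStreamSink<?> consumeDataStream(DataStream<T> dataStream);
}
{code}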



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16361) FLIP-84: Improve & Refactor API of TableEnvironment

2020-03-01 Thread godfrey he (Jira)
godfrey he created FLINK-16361:
--

 Summary: FLIP-84: Improve & Refactor API of TableEnvironment
 Key: FLINK-16361
 URL: https://issues.apache.org/jira/browse/FLINK-16361
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.11.0


As the 
[FLIP-84|https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment]
 document describes, 

We propose to deprecate the following methods in TableEnvironment:
{code:java}
void sqlUpdate(String sql)
void insertInto(String targetPath, Table table)
void execute(String jobName)
String explain(boolean extended)
Table fromTableSource(TableSource source)
{code}

Meanwhile, we propose to introduce the following new methods in 
TableEnvironment:

{code:java}
// synchronously executes the given single statement immediately and returns the
// execution result
ResultTable executeStatement(String statement)

public interface ResultTable {
    TableSchema getResultSchema();
    Iterable<Row> getResultRows();
}

// creates a DmlBatch instance which can add DML statements or Tables to the
// batch and explain or execute them as a batch
DmlBatch createDmlBatch()

interface DmlBatch {
    void addInsert(String insert);
    void addInsert(String targetPath, Table table);
    ResultTable execute() throws Exception;
    String explain(boolean extended);
}
{code}


We unify the Flink table program trigger point and propose that: for 
TableEnvironment and StreamTableEnvironment, you must use 
`TableEnvironment.execute()` to trigger table program execution; once you 
convert the table program to a DataStream program (through the `toAppendStream` or 
`toRetractStream` method), you must use `StreamExecutionEnvironment.execute` to 
trigger the DataStream program.
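
For illustration, a rough before/after sketch for a single insert statement (the 
statement text is hypothetical):

{code:java}
// before (to be deprecated): buffer the insert, then trigger it explicitly
tEnv.sqlUpdate("INSERT INTO sink SELECT * FROM src");
tEnv.execute("job");

// after (proposed): executes immediately and returns the result
ResultTable result = tEnv.executeStatement("INSERT INTO sink SELECT * FROM src");
{code}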



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-03-01 Thread godfrey he
Hi Benchao,

I think the document already contains both parts: the behavior is explained
when introducing the `executeStatement` method, and the asynchronous execution
methods are explained in the appendix.

Bests,
Godfrey

Benchao Li  于2020年2月28日周五 下午10:09写道:

> Hi godfrey,
>
> Thanks for your explanation.
>
> Do we need to clarify this in the FLIP? Maybe this confuses other users as
> well.
>
> godfrey he  于2020年2月28日周五 下午4:54写道:
>
> > Hi Benchao,
> >
> > > I have one question about this FLIP:
> > > executeStatement  accepts DML, what if it's a streaming DML ?
> > >does it submit the job to cluster directly and blocks forever?
> what's
> > > the behavior for the next statements?
> > `executeStatement` is a synchronous method, will execute the statement
> once
> > calling this method and return the result until the job is finished.
> > We will introduce asynchronous method like `executeStatementAsync` in the
> > future.
> >
> > > nit: there's a typo in "the table describing the result for each kind
> of
> > > statement", "*Result Scheam" -> "Result Schema"*
> > Thanks for the reminding, I will fix it now.
> >
> > Bests,
> > Godfrey
> >
> > Benchao Li  于2020年2月28日周五 下午4:00写道:
> >
> > > Hi Terry,
> > >
> > > Thanks for the propose, and sorry for joining the party late.
> > >
> > > I have one question about this FLIP:
> > > executeStatement  accepts DML, what if it's a streaming DML ?
> > > does it submit the job to cluster directly and blocks forever?
> what's
> > > the behavior for the next statements?
> > >
> > > nit: there's a typo in "the table describing the result for each kind
> of
> > > statement", "*Result Scheam" -> "Result Schema"*
> > >
> > >
> > > godfrey he  于2020年2月18日周二 下午4:41写道:
> > >
> > > > Thanks Kurt and Jark for explanation, I now also think we should make
> > the
> > > > TableEnvironment interface more statable and should not change
> > "sqlQuery"
> > > > method and "from" method.
> > > >
> > > > Hi Jingsong. Regarding to the "DmlBatch", I totally agree with
> > advantages
> > > > of "addBatch" method. However, there are two more questions need to
> > > solve:
> > > > one is how users write multi-sink programs in a Table API ? and
> another
> > > is
> > > > how users explain multi-sink program in both SQL and Table API ?
> > > > Currently, "DmlBatch" class can solve those questions. (the main
> > > > disadvantages is Inconsistent with the current interface)
> > > >
> > > > Bests,
> > > > godfrey
> > > >
> > > > Jingsong Li  于2020年2月15日周六 下午9:09写道:
> > > >
> > > > > Hi Kurt and Godfrey,
> > > > >
> > > > > Thank you for your explanation.
> > > > >
> > > > > Regarding to the "DmlBatch",
> > > > > I see there are some description for JDBC Statement.addBatch in the
> > > > > document.
> > > > > What do you think about introducing "addBatch" to the TableEnv
> > instead
> > > of
> > > > > introducing a new class?
> > > > > The advantage is:
> > > > > - Consistent with JDBC statement.
> > > > > - Consistent with current interface, what we need do is just modify
> > > > method
> > > > > name.
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > >
> > > > > On Sat, Feb 15, 2020 at 4:48 PM Kurt Young 
> wrote:
> > > > >
> > > > > > I don't think we should change `from` to `fromCatalog`,
> especially
> > > > `from`
> > > > > > is just
> > > > > > introduced in 1.10. I agree with Jark we should change interface
> > only
> > > > > when
> > > > > > necessary,
> > > > > > e.g. the semantic is broken or confusing. So I'm +1 to keep
> > > `sqlQuery`
> > > > as
> > > > > > it is.
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Sat, Feb 15, 2020 at 3:59 PM Jark Wu 
> wrote:
> > &g

Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-28 Thread godfrey he
Hi Benchao,

> I have one question about this FLIP:
> executeStatement  accepts DML, what if it's a streaming DML ?
>does it submit the job to cluster directly and blocks forever? what's
> the behavior for the next statements?
`executeStatement` is a synchronous method: it executes the statement as soon as
it is called and does not return the result until the job is finished.
We will introduce an asynchronous method like `executeStatementAsync` in the
future.

> nit: there's a typo in "the table describing the result for each kind of
> statement", "*Result Scheam" -> "Result Schema"*
Thanks for the reminder, I will fix it now.

Bests,
Godfrey

Benchao Li  于2020年2月28日周五 下午4:00写道:

> Hi Terry,
>
> Thanks for the propose, and sorry for joining the party late.
>
> I have one question about this FLIP:
> executeStatement  accepts DML, what if it's a streaming DML ?
> does it submit the job to cluster directly and blocks forever? what's
> the behavior for the next statements?
>
> nit: there's a typo in "the table describing the result for each kind of
> statement", "*Result Scheam" -> "Result Schema"*
>
>
> godfrey he  于2020年2月18日周二 下午4:41写道:
>
> > Thanks Kurt and Jark for explanation, I now also think we should make the
> > TableEnvironment interface more statable and should not change "sqlQuery"
> > method and "from" method.
> >
> > Hi Jingsong. Regarding to the "DmlBatch", I totally agree with advantages
> > of "addBatch" method. However, there are two more questions need to
> solve:
> > one is how users write multi-sink programs in a Table API ? and another
> is
> > how users explain multi-sink program in both SQL and Table API ?
> > Currently, "DmlBatch" class can solve those questions. (the main
> > disadvantages is Inconsistent with the current interface)
> >
> > Bests,
> > godfrey
> >
> > Jingsong Li  于2020年2月15日周六 下午9:09写道:
> >
> > > Hi Kurt and Godfrey,
> > >
> > > Thank you for your explanation.
> > >
> > > Regarding to the "DmlBatch",
> > > I see there are some description for JDBC Statement.addBatch in the
> > > document.
> > > What do you think about introducing "addBatch" to the TableEnv instead
> of
> > > introducing a new class?
> > > The advantage is:
> > > - Consistent with JDBC statement.
> > > - Consistent with current interface, what we need do is just modify
> > method
> > > name.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > >
> > > On Sat, Feb 15, 2020 at 4:48 PM Kurt Young  wrote:
> > >
> > > > I don't think we should change `from` to `fromCatalog`, especially
> > `from`
> > > > is just
> > > > introduced in 1.10. I agree with Jark we should change interface only
> > > when
> > > > necessary,
> > > > e.g. the semantic is broken or confusing. So I'm +1 to keep
> `sqlQuery`
> > as
> > > > it is.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Sat, Feb 15, 2020 at 3:59 PM Jark Wu  wrote:
> > > >
> > > > > Thanks Kurt and Godfrey for the explanation,
> > > > >
> > > > > It makes sense to me that renaming `from(tableName)` to
> > > > > `fromCatalog(tableName)`.
> > > > > However, I still think `sqlQuery(query)` is clear and works well.
> Is
> > it
> > > > > necessary to change it?
> > > > >
> > > > > We removed `sql(query)` and introduced `sqlQuery(query)`, we
> removed
> > > > > `scan(tableName)` and introduced `from(tableName)`,
> > > > > and now we want to remove them again. Users will feel like the
> > > interface
> > > > is
> > > > > very unstable, that really frustrates users.
> > > > > I think we should be cautious to remove interface and only when it
> is
> > > > > necessary.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > >
> > > > > On Thu, 13 Feb 2020 at 20:58, godfrey he 
> > wrote:
> > > > >
> > > > > > hi kurt,jark,jingsong
> > > > > >
> > > > > > Regarding to "fromQuery", I agree with kurt. In addition, I think
> > > > `Table
> > > > > > fro

Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-02-27 Thread godfrey he
Hi kant, yes. We hope to deprecate the methods which confuse users ASAP.

Bests,
godfrey

kant kodali  于2020年2月28日周五 上午11:17写道:

> Is this targeted towards Flink 1.11?
>
> On Thu, Feb 27, 2020 at 6:32 PM Kurt Young  wrote:
>
> >  +1 (binding)
> >
> > Best,
> > Kurt
> >
> >
> > On Fri, Feb 28, 2020 at 9:15 AM Terry Wang  wrote:
> >
> > > I look through the whole design and it’s a big improvement of usability
> > on
> > > TableEnvironment’s api.
> > >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > > > 2020年2月27日 14:59,godfrey he  写道:
> > > >
> > > > Hi everyone,
> > > >
> > > > I'd like to start the vote of FLIP-84[1], which proposes to deprecate
> > > some
> > > > old APIs and introduce some new APIs in TableEnvironment. This FLIP
> is
> > > > discussed and reached consensus in the discussion thread[2].
> > > >
> > > > The vote will be open for at least 72 hours. Unless there is an
> > > objection,
> > > > I will try to close it by Mar 1, 2020 07:00 UTC if we have received
> > > > sufficient votes.
> > > >
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment
> > > >
> > > > [2]
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html
> > > >
> > > >
> > > > Bests,
> > > > Godfrey
> > >
> > >
> >
>


[jira] [Created] (FLINK-16322) wrong result after filter push down in parquet table source

2020-02-27 Thread godfrey he (Jira)
godfrey he created FLINK-16322:
--

 Summary: wrong result after filter push down in parquet table 
source
 Key: FLINK-16322
 URL: https://issues.apache.org/jira/browse/FLINK-16322
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: godfrey he
 Attachments: parquet-1-1.parquet

I get the wrong result when running the following query:

source schema:
first VARCHAR
id INT
score DOUBLE 
last VARCHAR

data: (parquet file is in the attachment)
("Mike", 1, 12.3d, "Smith"),
("Bob", 2, 45.6d, "Taylor"),
("Sam", 3, 7.89d, "Miller"),
("Peter", 4, 0.12d, "Smith"),
("Liz", 5, 34.5d, "Williams"),
("Sally", 6, 6.78d, "Miller"),
("Alice", 7, 90.1d, "Smith"),
("Kelly", 8, 2.34d, "Williams")

query:
SELECT id, `first`, `last`, score FROM ParquetTable WHERE score < 3

The expected result size is 2; however, the actual result size is 0. 








--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16315) throw JsonMappingException when using BatchTableEnvironment#explain to get the plan of sql with constant string

2020-02-27 Thread godfrey he (Jira)
godfrey he created FLINK-16315:
--

 Summary: throw JsonMappingException when using 
BatchTableEnvironment#explain to get the plan of sql with constant string  
 Key: FLINK-16315
 URL: https://issues.apache.org/jira/browse/FLINK-16315
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Legacy Planner
Reporter: godfrey he



{code:java}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

tEnv.registerTableSource("MyTable", CommonTestData.getCsvTableSource());

Table table = tEnv.sqlQuery("select * from MyTable where first = '274' ");

System.out.println(tEnv.explain(table));
{code}

When executing the above code, the following exception occurs.

{panel:title=exception}
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException:
 Unexpected character ('U' (code 85)): was expecting comma to separate Object 
entries
 at [Source: (String)"{
"nodes": [

{
"id": 2,
"type": "source",
"pact": "Data Source",
"contents": "CsvTableSource(read fields: first, id, score, 
last)",
"parallelism": "8",
"global_properties": [
{ "name": "Partitioning", "value": "RANDOM_PARTITIONED" 
},
{ "name": "Partitioning Order", "value": "(none)" },
{ "name": "Uniqueness", "value": "not unique" }
],
"local_properties": [
{ "name": "Order", "value": "(none)" },
{ "name": "Grouping", "value": "not grouped" },
{ "name": "Uniq"[truncated 3501 chars]; line: 41, 
column: 15] (through reference chain: 
org.apache.flink.table.explain.PlanTree["nodes"]->java.util.ArrayList[1])

at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:394)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:365)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:302)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:245)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:27)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4202)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3205)
at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3173)
at 
org.apache.flink.table.explain.PlanJsonParser.getSqlExecutionPlan(PlanJsonParser.java:42)
at 
org.apache.flink.table.api.internal.BatchTableEnvImpl.explain(BatchTableEnvImpl.scala:208)
at 
org.apache.flink.table.api.internal.BatchTableEnvImpl.explain(BatchTableEnvImpl.scala:223)
{panel}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-02-26 Thread godfrey he
Hi everyone,

I'd like to start the vote of FLIP-84[1], which proposes to deprecate some
old APIs and introduce some new APIs in TableEnvironment. This FLIP is
discussed and reached consensus in the discussion thread[2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Mar 1, 2020 07:00 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html


Bests,
Godfrey


[jira] [Created] (FLINK-16195) append constant field to unique key set on project when deriving unique keys in FlinkRelMdUniqueKeys

2020-02-20 Thread godfrey he (Jira)
godfrey he created FLINK-16195:
--

 Summary: append constant field to unique key set on project when 
deriving unique keys in FlinkRelMdUniqueKeys
 Key: FLINK-16195
 URL: https://issues.apache.org/jira/browse/FLINK-16195
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he


Currently, `FlinkRelMdUniqueKeys` only supports deriving unique keys on 
non-constant fields. For example, for `select a, b, 1, count(*) from T group by a, b`, 
the derived unique key is currently only `(a, b)`. However, `(a, b, 1)` is also a unique 
key, so the result should be both `(a, b)` and `(a, b, 1)`.
Note: ideally, the planner does not require the constant key in the unique key 
set, because all constant values are pulled up or removed as much as possible. This 
improvement is supported to handle some corner cases in CBO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread godfrey he
Congrats Jingsong! Well deserved.

Best,
godfrey

Jeff Zhang  于2020年2月21日周五 上午11:49写道:

> Congratulations!Jingsong. You deserve it
>
> wenlong.lwl  于2020年2月21日周五 上午11:43写道:
>
>> Congrats Jingsong!
>>
>> On Fri, 21 Feb 2020 at 11:41, Dian Fu  wrote:
>>
>> > Congrats Jingsong!
>> >
>> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
>> > >
>> > > Congratulations Jingsong! Well deserved.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>> > >
>> > >> Congratulations! Jingsong
>> > >>
>> > >>
>> > >> Best,
>> > >> Dan Zou
>> > >>
>> >
>> >
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-18 Thread godfrey he
Thanks Kurt and Jark for the explanation. I now also think we should make the
TableEnvironment interface more stable and should not change the "sqlQuery"
method and the "from" method.

Hi Jingsong. Regarding the "DmlBatch", I totally agree with the advantages
of the "addBatch" method. However, there are two more questions to solve:
one is how users write multi-sink programs in the Table API, and another is
how users explain a multi-sink program in both SQL and the Table API.
Currently, the "DmlBatch" class can solve those questions. (The main
disadvantage is inconsistency with the current interface.)

Bests,
godfrey

Jingsong Li  于2020年2月15日周六 下午9:09写道:

> Hi Kurt and Godfrey,
>
> Thank you for your explanation.
>
> Regarding to the "DmlBatch",
> I see there are some description for JDBC Statement.addBatch in the
> document.
> What do you think about introducing "addBatch" to the TableEnv instead of
> introducing a new class?
> The advantage is:
> - Consistent with JDBC statement.
> - Consistent with current interface, what we need do is just modify method
> name.
>
> Best,
> Jingsong Lee
>
>
> On Sat, Feb 15, 2020 at 4:48 PM Kurt Young  wrote:
>
> > I don't think we should change `from` to `fromCatalog`, especially `from`
> > is just
> > introduced in 1.10. I agree with Jark we should change interface only
> when
> > necessary,
> > e.g. the semantic is broken or confusing. So I'm +1 to keep `sqlQuery` as
> > it is.
> >
> > Best,
> > Kurt
> >
> >
> > On Sat, Feb 15, 2020 at 3:59 PM Jark Wu  wrote:
> >
> > > Thanks Kurt and Godfrey for the explanation,
> > >
> > > It makes sense to me that renaming `from(tableName)` to
> > > `fromCatalog(tableName)`.
> > > However, I still think `sqlQuery(query)` is clear and works well. Is it
> > > necessary to change it?
> > >
> > > We removed `sql(query)` and introduced `sqlQuery(query)`, we removed
> > > `scan(tableName)` and introduced `from(tableName)`,
> > > and now we want to remove them again. Users will feel like the
> interface
> > is
> > > very unstable, that really frustrates users.
> > > I think we should be cautious to remove interface and only when it is
> > > necessary.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Thu, 13 Feb 2020 at 20:58, godfrey he  wrote:
> > >
> > > > hi kurt,jark,jingsong
> > > >
> > > > Regarding to "fromQuery", I agree with kurt. In addition, I think
> > `Table
> > > > from(String tableName)` should be renamed to `Table
> fromCatalog(String
> > > > tableName)`.
> > > >
> > > > Regarding to the "DmlBatch", DML contains "INSERT", "UPDATE",
> "DELETE",
> > > and
> > > > they can be executed in a same batch in the future. So we can add
> > > > "addUpdate" method and "addDelete" method to support them.
> > > >
> > > > Regarding to the "Inserts addInsert", maybe we can add a
> > > "DmlBatchBuilder".
> > > >
> > > > open to more discussion
> > > >
> > > > Best,
> > > > godfrey
> > > >
> > > >
> > > >
> > > > Kurt Young  于2020年2月13日周四 下午4:56写道:
> > > >
> > > > > Regarding to "fromQuery" is confusing users with "Table from(String
> > > > > tableName)", I have
> > > > > a just opposite opinion. I think this "fromXXX" pattern can make
> > users
> > > > > quite clear when they
> > > > > want to get a Table from TableEnvironment. Similar interfaces will
> > also
> > > > > include like "fromElements".
> > > > >
> > > > > Regarding to the name of DmlBatch, I think it's mainly for
> > > > > future flexibility, in case we can support
> > > > > other statement in a single batch. If that happens, the name
> > "Inserts"
> > > > will
> > > > > be weird.
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Thu, Feb 13, 2020 at 4:03 PM Jark Wu  wrote:
> > > > >
> > > > > > I agree with Jingsong.
> > > > > >
> > > > > > +1 to keep `sqlQuery`, it's clear from the method name and

[jira] [Created] (FLINK-16110) LogicalTypeParser can't parse "TIMESTAMP(3) *ROWTIME*" and TIMESTAMP(3) *PROCTIME*

2020-02-16 Thread godfrey he (Jira)
godfrey he created FLINK-16110:
--

 Summary: LogicalTypeParser can't parse "TIMESTAMP(3) *ROWTIME*" 
and TIMESTAMP(3) *PROCTIME*
 Key: FLINK-16110
 URL: https://issues.apache.org/jira/browse/FLINK-16110
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.10.0
Reporter: godfrey he


 {{TIMESTAMP(3) *ROWTIME*}} is the string representation of 
{{TimestampType(true, TimestampKind.ROWTIME, 3)}}; however, 
{{LogicalTypeParser}} can't convert it back to {{TimestampType(true, 
TimestampKind.ROWTIME, 3)}}.
{{TIMESTAMP(3) *PROCTIME*}} is the same case.
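
A minimal reproduction sketch (assuming the parser is expected to round-trip 
the summary string):

{code:java}
LogicalType rowtime = new TimestampType(true, TimestampKind.ROWTIME, 3);
String repr = rowtime.asSummaryString(); // "TIMESTAMP(3) *ROWTIME*"

// expected: the original type (including the ROWTIME kind) is restored;
// actual: parsing fails
LogicalType parsed = LogicalTypeParser.parse(repr);
{code}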



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-13 Thread godfrey he
Hi Kurt, Jark, Jingsong,

Regarding "fromQuery", I agree with Kurt. In addition, I think `Table
from(String tableName)` should be renamed to `Table fromCatalog(String
tableName)`.

Regarding the "DmlBatch", DML contains "INSERT", "UPDATE" and "DELETE", and
they can be executed in the same batch in the future. So we can add an
"addUpdate" method and an "addDelete" method to support them.

Regarding the "Inserts addInsert", maybe we can add a "DmlBatchBuilder".

Open to more discussion.

Best,
godfrey



Kurt Young  于2020年2月13日周四 下午4:56写道:

> Regarding to "fromQuery" is confusing users with "Table from(String
> tableName)", I have
> a just opposite opinion. I think this "fromXXX" pattern can make users
> quite clear when they
> want to get a Table from TableEnvironment. Similar interfaces will also
> include like "fromElements".
>
> Regarding to the name of DmlBatch, I think it's mainly for
> future flexibility, in case we can support
> other statement in a single batch. If that happens, the name "Inserts" will
> be weird.
>
> Best,
> Kurt
>
>
> On Thu, Feb 13, 2020 at 4:03 PM Jark Wu  wrote:
>
> > I agree with Jingsong.
> >
> > +1 to keep `sqlQuery`, it's clear from the method name and return type
> that
> > it accepts a SELECT query and returns a logic representation `Table`.
> > The `fromQuery` is a little confused users with the `Table from(String
> > tableName)` method.
> >
> > Regarding to the `DmlBatch`, I agree with Jingsong, AFAIK, the purpose of
> > `DmlBatch` is used to batching insert statements.
> > Besides, DML terminology is not commonly know among users. So what about
> > `InsertsBatching startBatchingInserts()` ?
> >
> > Best,
> > Jark
> >
> > On Thu, 13 Feb 2020 at 15:50, Jingsong Li 
> wrote:
> >
> > > Hi Godfrey,
> > >
> > > Thanks for updating. +1 sketchy.
> > >
> > > I have no idea to change "sqlQuery" to "fromQuery", I think "sqlQuery"
> is
> > > OK, It's not that confusing with return values.
> > >
> > > Can we change the "DmlBatch" to "Inserts"?  I don't see any other
> needs.
> > > "Dml" seems a little weird.
> > > It is better to support "Inserts addInsert" too. Users can
> > > "inserts.addInsert().addInsert()"
> > >
> > > I try to match the new interfaces with the old interfaces simply.
> > > - "startInserts -> addInsert" replace old "sqlUpdate(insert)" and
> > > "insertInto".
> > > - "executeStatement" new one, execute all kinds of sqls immediately.
> > > Including old "sqlUpdate(DDLs)".
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Wed, Feb 12, 2020 at 11:10 AM godfreyhe 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'd like to resume the discussion for FlIP-84 [0]. I had updated the
> > > > document, the mainly changes are:
> > > >
> > > > 1. about "`void sqlUpdate(String sql)`" section
> > > >   a) change "Optional executeSql(String sql) throws
> > > Exception"
> > > > to "ResultTable executeStatement(String statement, String jobName)
> > throws
> > > > Exception". The reason is: "statement" is a more general concept than
> > > > "sql",
> > > > e.g. "show xx" is not a sql command (refer to [1]), but is a
> statement
> > > > (just
> > > > like JDBC). "insert" statement also has return value which is the
> > > affected
> > > > row count, we can unify the return type to "ResultTable" instead of
> > > > "Optional".
> > > >   b) add two sub-interfaces for "ResultTable": "RowResultTable" is
> used
> > > for
> > > > non-streaming select statement and will not contain change flag;
> > > > "RowWithChangeFlagResultTable" is used for streaming select statement
> > and
> > > > will contain change flag.
> > > >
> > > > 2) about "Support batch sql execute and explain" section
> > > > introduce "DmlBatch" to support both sql and Table API (which is
> > borrowed
> > > > from the ideas Dawid mentioned in the slack)
> > > >
> > > > interface TableEnvironment {
> > > > DmlBatch startDmlBatch();
> > > > }
> > > >
> > > > interface DmlBatch {
> > > >   /**
> > > >   * add insert statement to the batch
> > > >   */
> > > > void addInsert(String insert);
> > > >
> > > >  /**
> > > >   * add Table with given sink name to the batch
> > > >   */
> > > > void addInsert(String sinkName, Table table);
> > > >
> > > >  /**
> > > >   * execute the dml statements as a batch
> > > >   */
> > > >   ResultTable execute(String jobName) throws Exception
> > > >
> > > >   /**
> > > >  * Returns the AST and the execution plan to compute the result of
> the
> > > > batch
> > > > dml statement.
> > > >   */
> > > >   String explain(boolean extended);
> > > > }
> > > >
> > > > 3) about "Discuss a parse method for multiple statements execute in
> SQL
> > > > CLI"
> > > > section
> > > > add the pros and cons for each solution
> > > >
> > > > 4) update the "Examples" section and "Summary" section based on the
> > above
> > > > changes
> > > >
> > > > Please refer the design doc[1] for more details and welcome any
> > feedback.
> > > >
> > > > Bests,
> > > > godfreyhe
> > > >
> > > >
> > > > [0]
> 

Re: [DISCUSS] FLIP-91 - Support SQL Client Gateway

2020-02-05 Thread godfrey he
nd never finished). Would having a
> > dedicated gateway component mean that we can simplify the client and make
> > it a simple "shell around the table environment"? I think that would be
> > good, it would make it much easier to have new Table API features
> available
> > in the SQL client.
> >
> > (2) Have you considered making this a standalone project? This seems like
> > unit of functionality that would be useful to have separately, and it
> would
> > have a few advantages:
> >
> >- Flink codebase is already very large and hard to maintain
> >- A separate project is simpler to develop, not limited by Flink
> > committer reviews
> >- Quicker independent releases when new features are added.
> >
> > I see other projects successfully putting ecosystem tools into separate
> > projects, like Livy for Spark.
> > Should we do the same here?
> >
> > Best,
> > Stephan
> >
> >
> > On Fri, Jan 17, 2020 at 1:48 PM godfrey he  wrote:
> >
> >> Hi devs,
> >>
> >> I've updated the FLIP-91 [0] according to feedbacks. Please take another
> >> look.
> >>
> >> Best,
> >> godfrey
> >>
> >> [0]
> >>
> >>
> https://docs.google.com/document/d/1DKpFdov1o_ObvrCmU-5xi-VrT6nR2gxq-BbswSSI9j8/
> >> <
> >>
> https://docs.google.com/document/d/1DKpFdov1o_ObvrCmU-5xi-VrT6nR2gxq-BbswSSI9j8/edit#heading=h.cje99dt78an2
> >>>
> >>
> >> Kurt Young wrote on Thu, Jan 9, 2020 at 4:21 PM:
> >>
> >>> Hi,
> >>>
> >>> +1 to the general idea. Supporting sql client gateway mode will bridge
> >> the
> >>> connection
> >>> between Flink SQL and production environment. Also the JDBC driver is a
> >>> quite good
> >>> supplement for usability of Flink SQL, users will have more choices to
> >> try
> >>> out Flink SQL
> >>> such as Tableau.
> >>>
> >>> I went through the document and left some comments there.
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Sun, Jan 5, 2020 at 1:57 PM tison  wrote:
> >>>
> >>>> The general idea sounds great. I'm going to keep up with the progress
> >>> soon.
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>>
> >>>> Bowen Li wrote on Sun, Jan 5, 2020 at 12:59 PM:
> >>>>
> >>>>> +1. It will improve user experience quite a bit.
> >>>>>
> >>>>>
> >>>>> On Thu, Jan 2, 2020 at 22:07 Yangze Guo  wrote:
> >>>>>
> >>>>>> Thanks for driving this, Xiaoling!
> >>>>>>
> >>>>>> +1 for supporting SQL client gateway.
> >>>>>>
> >>>>>> Best,
> >>>>>> Yangze Guo
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jan 2, 2020 at 9:58 AM 贺小令  wrote:
> >>>>>>>
> >>>>>>> Hey everyone,
> >>>>>>> FLIP-24
> >>>>>>> <
> >>>>>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >>>>
> >>>>>>> proposes the whole conception and architecture of SQL Client. The
> >>>>>> embedded
> >>>>>>> mode is already supported since release-1.5, which is helpful for
> >>>>>>> debugging/demo purposes.
> >>>>>>> Many users ask that how to submit a Flink job to online
> >> environment
> >>>>>> without
> >>>>>>> programming on Flink API. To solve this, we create FLIP-91 [0]
> >>> which
> >>>>>>> supports sql client gateway mode, then users can submit a job
> >>> through
> >>>>> CLI
> >>>>>>> client, REST API or JDBC.
> >>>>>>>
> >>>>>>> I'm glad that you can give me more feedback about FLIP-91.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> godfreyhe
> >>>>>>>
> >>>>>>> [0]
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>


[jira] [Created] (FLINK-15669) SQL client can't cancel flink job

2020-01-19 Thread godfrey he (Jira)
godfrey he created FLINK-15669:
--

 Summary: SQL client can't cancel flink job
 Key: FLINK-15669
 URL: https://issues.apache.org/jira/browse/FLINK-15669
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.10.0
Reporter: godfrey he
 Fix For: 1.10.0


In the SQL client, the CLI client cancels a query through the {{void cancelQuery(String 
sessionId, String resultId)}} method in {{Executor}}. However, the {{resultId}} 
is a random UUID, not the job id, so the CLI client can't cancel a running job.


{code:java}
private  ResultDescriptor executeQueryInternal(String sessionId, 
ExecutionContext context, String query) {
..

// store the result with a unique id
final String resultId = UUID.randomUUID().toString();
resultStore.storeResult(resultId, result);

   ..

// create execution
final ProgramDeployer deployer = new ProgramDeployer(
configuration, jobName, pipeline);

// start result retrieval
result.startRetrieval(deployer);

return new ResultDescriptor(
resultId,
removeTimeAttributes(table.getSchema()),
result.isMaterialized());
}

private  void cancelQueryInternal(ExecutionContext context, String 
resultId) {
..

// stop Flink job
try (final ClusterDescriptor clusterDescriptor = 
context.createClusterDescriptor()) {
ClusterClient clusterClient = null;
try {
// retrieve existing cluster
clusterClient = 
clusterDescriptor.retrieve(context.getClusterId()).getClusterClient();
try {
clusterClient.cancel(new 
JobID(StringUtils.hexStringToByte(resultId))).get();
} catch (Throwable t) {
// the job might has finished earlier
}
} catch (Exception e) {
throw new SqlExecutionException("Could not 
retrieve or create a cluster.", e);
} finally {
try {
if (clusterClient != null) {
clusterClient.close();
}
} catch (Exception e) {
// ignore
}
}
} catch (SqlExecutionException e) {
throw e;
} catch (Exception e) {
throw new SqlExecutionException("Could not locate a 
cluster.", e);
}
}
{code}
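
One possible direction, sketched below (a hypothetical helper, not the actual 
fix): remember the real JobID per resultId at submission time, so that 
cancelQueryInternal can cancel by JobID instead of decoding the random resultId.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.api.common.JobID;

/** Hypothetical helper: maps the random resultId to the real JobID of the job. */
final class ResultJobRegistry {

    private final Map<String, JobID> resultIdToJobId = new ConcurrentHashMap<>();

    /** Called once the job has been submitted and its JobID is known. */
    void register(String resultId, JobID jobId) {
        resultIdToJobId.put(resultId, jobId);
    }

    /** Called from cancelQueryInternal; returns null if the result is unknown. */
    JobID removeJobId(String resultId) {
        return resultIdToJobId.remove(resultId);
    }
}
{code}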







--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [jira] [Created] (FLINK-15644) Add support for SQL query validation

2020-01-18 Thread godfrey he
Hi Flavio, TableEnvironment.getCompletionHints may already meet the
requirement.
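
For reference, a minimal usage sketch (the statement text and cursor position
are illustrative):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CompletionHintsExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());
        String statement = "SELECT * FROM MyTa";
        // completion hints for the statement at the given cursor position
        String[] hints = tEnv.getCompletionHints(statement, statement.length());
        for (String hint : hints) {
            System.out.println(hint);
        }
    }
}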

Flavio Pompermaier wrote on Sat, Jan 18, 2020 at 3:39 PM:

> Why not also add a suggest() method (also unimplemented initially) that
> would return the list of suitable completions/tokens for the current query?
> How complex would it be to implement it, in your opinion?
>
> On Fri, Jan 17, 2020, 18:32 Fabian Hueske (Jira) wrote:
>
> > Fabian Hueske created FLINK-15644:
> > -
> >
> >  Summary: Add support for SQL query validation
> >  Key: FLINK-15644
> >  URL: https://issues.apache.org/jira/browse/FLINK-15644
> >  Project: Flink
> >   Issue Type: New Feature
> >   Components: Table SQL / API
> > Reporter: Fabian Hueske
> >
> >
> > It would be good if the {{TableEnvironment}} would offer methods to check
> > the validity of SQL queries. Such a method could be used by services (CLI
> > query shells, notebooks, SQL UIs) that are backed by Flink and execute
> > their queries on Flink.
> >
> > Validation should be available in two levels:
> >  # Validation of syntax and semantics: This includes parsing the query,
> > checking the catalog for dbs, tables, fields, type checks for expressions
> > and functions, etc. This will check if the query is a valid SQL query.
> >  # Validation that query is supported: Checks if Flink can execute the
> > given query. Some syntactically and semantically valid SQL queries are
> not
> > supported, esp. in a streaming context. This requires running the
> > optimizer. If the optimizer generates an execution plan, the query can be
> > executed. This check includes the first step and is more expensive.
> >
> > The reason for this separation is that the first check can be done much
> > faster as it does not involve calling the optimizer. Hence, it would be
> > suitable for fast checks in an interactive query editor. The second check
> > might take more time (depending on the complexity of the query) and might
> > not be suitable for rapid checks but only on explicit user request.
> >
> > Requirements:
> >  * validation does not modify the state of the {{TableEnvironment}}, i.e.
> > it does not add plan operators
> >  * validation does not require connector dependencies
> >  * validation can identify the update mode of a continuous query result
> > (append-only, upsert, retraction).
> >
> > Out of scope for this issue:
> >  * better error messages for unsupported features as suggested by
> > FLINK-7217
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.3.4#803005)
> >
>


Re: [DISCUSS] FLIP-91 - Support SQL Client Gateway

2020-01-17 Thread godfrey he
Hi devs,

I've updated the FLIP-91 [0] according to feedbacks. Please take another
look.

Best,
godfrey

[0]
https://docs.google.com/document/d/1DKpFdov1o_ObvrCmU-5xi-VrT6nR2gxq-BbswSSI9j8/


Kurt Young  于2020年1月9日周四 下午4:21写道:

> Hi,
>
> +1 to the general idea. Supporting sql client gateway mode will bridge the
> connection
> between Flink SQL and production environment. Also the JDBC driver is a
> quite good
> supplement for usability of Flink SQL, users will have more choices to try
> out Flink SQL
> such as Tableau.
>
> I went through the document and left some comments there.
>
> Best,
> Kurt
>
>
> On Sun, Jan 5, 2020 at 1:57 PM tison  wrote:
>
> > The general idea sounds great. I'm going to keep up with the progress
> soon.
> >
> > Best,
> > tison.
> >
> >
> > Bowen Li  于2020年1月5日周日 下午12:59写道:
> >
> > > +1. It will improve user experience quite a bit.
> > >
> > >
> > > On Thu, Jan 2, 2020 at 22:07 Yangze Guo  wrote:
> > >
> > > > Thanks for driving this, Xiaoling!
> > > >
> > > > +1 for supporting SQL client gateway.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > >
> > > > On Thu, Jan 2, 2020 at 9:58 AM 贺小令  wrote:
> > > > >
> > > > > Hey everyone,
> > > > > FLIP-24
> > > > > <
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >
> > > > > proposes the whole conception and architecture of SQL Client. The
> > > > embedded
> > > > > mode is already supported since release-1.5, which is helpful for
> > > > > debugging/demo purposes.
> > > > > Many users ask that how to submit a Flink job to online environment
> > > > without
> > > > > programming on Flink API. To solve this, we create FLIP-91 [0]
> which
> > > > > supports sql client gateway mode, then users can submit a job
> through
> > > CLI
> > > > > client, REST API or JDBC.
> > > > >
> > > > > I'm glad that you can give me more feedback about FLIP-91.
> > > > >
> > > > > Best,
> > > > > godfreyhe
> > > > >
> > > > > [0]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > >
> > >
> >
>


Re: What is the suggested way to validate SQL?

2020-01-08 Thread godfrey he
hi kaibo,
As we discussed offline, I think it would be cleaner for flink-table to provide
an interface (or a tool) to do the SQL validation for platform users.
`tEnv.sqlUpdate` or `tEnv.explain(false)` is a temporary solution which
involves too much unrelated logic (considering that the goal is just to check
whether a SQL statement is valid).
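
For reference, a minimal sketch of that temporary workaround (assuming the
blink planner; table names are illustrative and must already be registered):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ValidateByPlanning {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().build());
        try {
            // parses and validates the statement (catalog lookup, type checks)
            tEnv.sqlUpdate("INSERT INTO sinkTable SELECT * FROM sourceTable WHERE a > 1");
            // runs the optimizer, surfacing "unsupported" errors
            tEnv.explain(false);
        } catch (Exception e) {
            // report the error message/position back to the caller
            System.err.println(e.getMessage());
        }
    }
}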

Best,
godfrey



Arvid Heise wrote on Wed, Jan 8, 2020 at 9:40 PM:

> A common approach is to add the connector jar as test dependencies and have
> a smoke test that just starts the job with a temporary external system
> spawned with docker. I usually use test containers [1]. Then you simply
> need to execute the integration tests in your IDE and usually can even
> debug non-obvious errors.
>
> [1] https://www.testcontainers.org/
>
> On Mon, Dec 30, 2019 at 1:39 PM Kaibo Zhou  wrote:
>
> > Hi, Jingsong,
> >
> > Thank you very much for your suggestion.
> >
> > I verified that using `tEnv.sqlUpdate("xxx")` and `tEnv.explain(false)` to
> > do validation works.
> > But this method needs the connector jar, which is very inconvenient to
> use.
> >
> >
> > Hi, Danny,
> >
> > Many thanks for providing very useful explanations.
> >
> > The user case is users will register some source/sink tables, udf to
> > catalog service first, and then they will write and modify SQL like
> "insert
> > into sinkTable select * from sourceTable where a>1" on Web SQLEditor. The
> > platform wants to tell the user whether the SQL is valid includes the
> > detailed position if an error occurs.
> >
> > For the `insert target table`, the platform wants to validate the table
> > exists, field name and field type.
> >
> > Best,
> > Kaibo
> >
> > Danny Chan wrote on Mon, Dec 30, 2019 at 5:37 PM:
> >
> > > Hi, Kaibo Zhou ~
> > >
> > > There are several phases that a SQL text goes through before it becomes an
> > > execution graph that can be run with the Flink runtime:
> > >
> > >
> > > 1. Sql Parse: parse the sql text to AST(sql node tree)
> > > 2. Sql node(row type) validation, this includes the tables/schema
> > inference
> > > 3. Sql-to-rel conversion, convert the sql node to RelNode(relational
> > > algebra)
> > > 4. Promote the relational expression with planner(Volcano or Hep) then
> > > converts to execution convention nodes
> > > 5. Generate the code and the execution graph
> > >
> > > For the first 3 steps, Apache Flink uses Apache Calcite as the
> > > implementation; that means a SQL text passed to the table environment
> > > always goes through SQL parsing/validation/sql-to-rel conversion.
> > >
> > > For example, a code snippet like tableEnv.sqlQuery("INSERT INTO
> sinkTable
> > > SELECT f1,f2 FROM sourceTable”), the query part “SELECT f1,f2 FROM
> > > sourceTable” was validated.
> > >
> > > But you are right: for Flink SQL, an insert statement's target table is not
> > > validated during the validation phase. Actually we validate the "select"
> > > clause first, extract the target table identifier, and then validate that
> > > the schema of the "select" clause and the target table are the same when we
> > > invoke write-to-sink (after step 4).
> > >
> > >
> > > For most of the cases this is okay. Can you share your cases? What kind
> > > of validation do you want for the insert target table?
> > >
> > > We are planning to include the insert target table validation in step 2
> > > for 2 reasons:
> > >
> > > • The computed column validation (stored or virtual)
> > > • The insert implicit type coercion
> > >
> > > But this would come in Flink version 1.11 ~
> > >
> > >
> > > Best,
> > > Danny Chan
> > > On Dec 27, 2019 at 5:44 PM +0800, dev@flink.apache.org wrote:
> > > >
> > > > "INSERT INTO
> > > > sinkTable SELECT f1,f2 FROM sourceTable"
> > >
> >
>


[jira] [Created] (FLINK-15472) Support SQL Client Gateway

2020-01-03 Thread godfrey he (Jira)
godfrey he created FLINK-15472:
--

 Summary: Support SQL Client Gateway 
 Key: FLINK-15472
 URL: https://issues.apache.org/jira/browse/FLINK-15472
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Client
Reporter: godfrey he


FLIP-91: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
design document: 
https://docs.google.com/document/d/1T7--664rts4t_4gjRPw937S9ln9Plf1yghNQ9IiHQtQ



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15123) remove uniqueKeys from FlinkStatistic in blink planner

2019-12-07 Thread godfrey he (Jira)
godfrey he created FLINK-15123:
--

 Summary: remove uniqueKeys from FlinkStatistic in blink planner 
 Key: FLINK-15123
 URL: https://issues.apache.org/jira/browse/FLINK-15123
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he


{{uniqueKeys}} is a kind of constraint, so it is unreasonable to treat it as a 
statistic. Therefore we should remove uniqueKeys from {{FlinkStatistic}} in the 
blink planner. Some temporary solutions (e.g. 
{{RichTableSourceQueryOperation}}) should also be resolved after primaryKey is 
introduced in {{TableSchema}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15095) bridge table schema's primary key to metadata handler in blink planner

2019-12-06 Thread godfrey he (Jira)
godfrey he created FLINK-15095:
--

 Summary: bridge table schema's primary key to metadata handler in 
blink planner
 Key: FLINK-15095
 URL: https://issues.apache.org/jira/browse/FLINK-15095
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15004) Choose SortMergeJoin instead of HashJoin if the statistics is unknown

2019-12-01 Thread godfrey he (Jira)
godfrey he created FLINK-15004:
--

 Summary: Choose SortMergeJoin instead of HashJoin if the 
statistics is unknown
 Key: FLINK-15004
 URL: https://issues.apache.org/jira/browse/FLINK-15004
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.9.1, 1.9.0
Reporter: godfrey he
 Fix For: 1.10.0


Currently, the blink planner uses a default rowCount value (defined in 
{{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown, 
and may choose {{HashJoin}} instead of {{SortMergeJoin}}. The job may hang 
if the build side has a huge input. So it's better to use {{SortMergeJoin}} 
for execution stability when the statistics are unknown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15001) The digest of sub-plan reuse should contain RelNode's trait

2019-12-01 Thread godfrey he (Jira)
godfrey he created FLINK-15001:
--

 Summary: The digest of sub-plan reuse should contain RelNode's 
trait
 Key: FLINK-15001
 URL: https://issues.apache.org/jira/browse/FLINK-15001
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.1, 1.9.0
Reporter: godfrey he
 Fix For: 1.10.0
 Attachments: image-2019-12-02-10-49-46-916.png, 
image-2019-12-02-10-52-01-399.png

This bug is found in [FLINK-14946| 
https://issues.apache.org/jira/browse/FLINK-14946]:

The plan for the given sql in [FLINK-14946| 
https://issues.apache.org/jira/browse/FLINK-14946] is
 !image-2019-12-02-10-49-46-916.png! 

however, the plan after sub-plan reuse is:
 !image-2019-12-02-10-52-01-399.png! 

In the first picture, we can see that the accModes of the two joins are 
different, yet the two joins are reused in the second picture. 

The reason is that the digest used for sub-plan reuse does not contain the 
RelNode's traits yet.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14999) sub-plan reuse should consider

2019-12-01 Thread godfrey he (Jira)
godfrey he created FLINK-14999:
--

 Summary: sub-plan reuse should consider
 Key: FLINK-14999
 URL: https://issues.apache.org/jira/browse/FLINK-14999
 Project: Flink
  Issue Type: Bug
Reporter: godfrey he






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14899) can not be translated to StreamExecDeduplicate when PROCTIME is defined in query

2019-11-21 Thread godfrey he (Jira)
godfrey he created FLINK-14899:
--

 Summary: can not be translated to StreamExecDeduplicate when 
PROCTIME is defined in query
 Key: FLINK-14899
 URL: https://issues.apache.org/jira/browse/FLINK-14899
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.1, 1.9.0
Reporter: godfrey he
 Fix For: 1.10.0


CREATE TABLE user_log (
user_id VARCHAR,
item_id VARCHAR,
category_id VARCHAR,
behavior VARCHAR,
ts TIMESTAMP
) WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'user_behavior',
'connector.startup-mode' = 'earliest-offset',
'connector.properties.0.key' = 'zookeeper.connect',
'connector.properties.0.value' = 'localhost:2181',
'connector.properties.1.key' = 'bootstrap.servers',
'connector.properties.1.value' = 'localhost:9092',
'update-mode' = 'append',
'format.type' = 'json',
'format.derive-schema' = 'true'
);

CREATE TABLE user_dist (
dt VARCHAR,
user_id VARCHAR,
behavior VARCHAR
) WITH (
'connector.type' = 'jdbc',
'connector.url' = 'jdbc:mysql://localhost:3306/flink-test',
'connector.table' = 'user_behavior_dup',
'connector.username' = 'root',
'connector.password' = '**',
'connector.write.flush.max-rows' = '1'
);

INSERT INTO user_dist
SELECT
  dt,
  user_id,
  behavior
FROM (
   SELECT
  dt,
  user_id,
  behavior,
 ROW_NUMBER() OVER (PARTITION BY dt, user_id, behavior ORDER BY proc asc ) 
AS rownum
   FROM (select DATE_FORMAT(ts, 'yyyy-MM-dd HH:00') as 
dt,user_id,behavior,PROCTIME() as proc
from user_log) )
WHERE rownum = 1;

Exception in thread "main" org.apache.flink.table.api.TableException: 
UpsertStreamTableSink requires that Table has a full primary keys if it is 
updated.
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:114)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:50)
at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:54)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlan(StreamExecSink.scala:50)
at 
org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:61)
at 
org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:60)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:60)
at 
org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:149)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:439)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlUpdate(TableEnvironmentImpl.java:348)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14874) add local aggregate to solve data skew for ROLLUP/CUBE case

2019-11-20 Thread godfrey he (Jira)
godfrey he created FLINK-14874:
--

 Summary: add local aggregate to solve data skew for ROLLUP/CUBE 
case
 Key: FLINK-14874
 URL: https://issues.apache.org/jira/browse/FLINK-14874
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.9.1, 1.9.0
Reporter: godfrey he
 Fix For: 1.10.0


Many tpc-ds queries have the {{rollup}} keyword, which is translated into 
multiple groups. 
For example, {{group by rollup (channel, id)}} is equivalent to {{group by 
(channel, id)}} + {{group by (channel)}} + {{group by ()}}. 
All data of the empty group is shuffled to a single node, which is a typical 
data skew case. If there is a local aggregate, the data size shuffled to that 
single node is greatly reduced. However, currently the cost model can't 
estimate the local aggregate's cost, and the plan with a local aggregate may 
not be chosen even when the query has the {{rollup}} keyword.
We could add a rule-based phase (after the physical phase) to enforce a local 
aggregate if its input has an empty group.
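
As a stop-gap, two-phase aggregation can be forced via configuration in the 
blink planner. A hedged sketch (the option key corresponds to 
{{OptimizerConfigOptions#TABLE_OPTIMIZER_AGG_PHASE_STRATEGY}} and may differ 
between versions):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ForceTwoPhaseAgg {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());
        // force local/global (two-phase) aggregation instead of leaving the
        // choice to the cost model; valid values: AUTO, ONE_PHASE, TWO_PHASE
        tEnv.getConfig().getConfiguration()
                .setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");
    }
}
{code}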



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14789) extends max/min type in ColumnStats from Number to Comparable

2019-11-14 Thread godfrey he (Jira)
godfrey he created FLINK-14789:
--

 Summary: extends max/min type in ColumnStats from Number to 
Comparable
 Key: FLINK-14789
 URL: https://issues.apache.org/jira/browse/FLINK-14789
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.10.0


Many tpc-ds queries have predicates on dates, like `d_date between '1999-02-01' 
and (cast('1999-02-01' as date) + INTERVAL '60' day)`. It's very useful for 
finding a better plan if the planner knows the max/min values of the date 
column. However, max/min in {{ColumnStats}} currently only supports the 
{{Number}} type. This issue aims to extend the max/min type from {{Number}} to 
{{Comparable}}, so that {{Date}}, {{Time}}, {{Timestamp}} and even {{String}} 
can be supported.
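
A sketch of the current limitation (the constructor signature is taken from 
flink-table-common and is hedged):

{code:java}
import org.apache.flink.table.plan.stats.ColumnStats;

public class ColumnStatsLimitation {
    public static void main(String[] args) {
        // works: max/min are declared as Number, and Integer is a Number
        ColumnStats intStats = new ColumnStats(100L, 0L, 4.0, 4, 9999, 0);

        // does not compile: java.sql.Date is Comparable but not a Number
        // ColumnStats dateStats = new ColumnStats(
        //         100L, 0L, 4.0, 4,
        //         java.sql.Date.valueOf("1999-04-02"),
        //         java.sql.Date.valueOf("1999-02-01"));
    }
}
{code}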



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14724) Join condition could be simplified in logical phase

2019-11-12 Thread godfrey he (Jira)
godfrey he created FLINK-14724:
--

 Summary: Join condition could be simplified in logical phase
 Key: FLINK-14724
 URL: https://issues.apache.org/jira/browse/FLINK-14724
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.9.1, 1.9.0
Reporter: godfrey he
 Fix For: 1.10.0


Currently the plan of tpcds q38.sql contains a {{NestedLoopJoin}}, because its 
join condition is {{CAST(AND(IS NOT DISTINCT FROM($2, $3), IS NOT DISTINCT 
FROM($1, $4), IS NOT DISTINCT FROM($0, $5))):BOOLEAN}}, and the planner can't 
find equi-join keys in the condition via {{Join#analyzeCondition}}. 
{{SimplifyJoinConditionRule}} could solve this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14656) blink planner should convert catalog statistics to TableStats for permanent table instead of temporary table

2019-11-07 Thread godfrey he (Jira)
godfrey he created FLINK-14656:
--

 Summary: blink planner should convert catalog statistics to 
TableStats for permanent table instead of temporary table
 Key: FLINK-14656
 URL: https://issues.apache.org/jira/browse/FLINK-14656
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.1, 1.9.0
Reporter: godfrey he
 Fix For: 1.10.0


Currently, the blink planner converts {{CatalogTable}} to a Calcite {{Table}}, 
and converts the catalog statistics to `TableStats` in 
{{DatabaseCalciteSchema}}. However, the catalog statistics conversion is 
currently only done for temporary tables, which have no statistics at all. It 
should be done for permanent tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-13708) transformations should be cleared because a table environment could execute multiple job

2019-08-14 Thread godfrey he (JIRA)
godfrey he created FLINK-13708:
--

 Summary: transformations should be cleared because a table 
environment could execute multiple job
 Key: FLINK-13708
 URL: https://issues.apache.org/jira/browse/FLINK-13708
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.9.0


Currently, if a table environment executes more than one SQL job, each 
subsequent job contains the transformations of the previous job. The reason is 
that the transformations are not cleared after execution.
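
A minimal reproduction sketch (table names are illustrative and assumed to be 
registered):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MultiExecuteBug {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().build());
        tEnv.sqlUpdate("INSERT INTO sink1 SELECT * FROM src");
        tEnv.execute("job1");
        tEnv.sqlUpdate("INSERT INTO sink2 SELECT * FROM src");
        // bug: job2 also contains the transformations of job1, because the
        // buffered transformations are not cleared after the first execute()
        tEnv.execute("job2");
    }
}
{code}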



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13611) Introduce analyze statistic utility to generate table & column statistics

2019-08-07 Thread godfrey he (JIRA)
godfrey he created FLINK-13611:
--

 Summary: Introduce analyze statistic utility to generate table & 
column statistics
 Key: FLINK-13611
 URL: https://issues.apache.org/jira/browse/FLINK-13611
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.10.0


This issue aims to introduce a utility class to generate table & column 
statistics. The main steps are: 
1. generate SQL, like {{select approx_count_distinct(a) as ndv, count(1) - 
count(a) as nullCount, avg(char_length(a)) as avgLen, max(char_length(a)) as 
maxLen, max(a) as maxValue, min(a) as minValue, ... from MyTable}}
2. execute the query
3. convert the result to {{TableStats}} (maybe the source table is not a 
catalog table)
4. convert {{TableStats}} to {{CatalogTableStatistics}} if needed

This issue does not involve DDL; however, DDL could use this utility class 
once it's supported.
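
A self-contained sketch of step 1 (the class name is hypothetical; the 
generated query assumes string columns, matching the example above):

{code:java}
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: builds the analyze query of step 1 for string columns. */
public final class AnalyzeSqlGenerator {

    public static String generate(String tableName, List<String> columns) {
        String selectList = columns.stream()
                .map(c -> String.join(", ",
                        "approx_count_distinct(" + c + ") AS " + c + "_ndv",
                        "count(1) - count(" + c + ") AS " + c + "_nullCount",
                        "avg(char_length(" + c + ")) AS " + c + "_avgLen",
                        "max(char_length(" + c + ")) AS " + c + "_maxLen",
                        "max(" + c + ") AS " + c + "_maxValue",
                        "min(" + c + ") AS " + c + "_minValue"))
                .collect(Collectors.joining(", "));
        return "SELECT " + selectList + " FROM " + tableName;
    }
}
{code}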



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13563) TumblingGroupWindow should implement toString method

2019-08-02 Thread godfrey he (JIRA)
godfrey he created FLINK-13563:
--

 Summary: TumblingGroupWindow should implement toString method
 Key: FLINK-13563
 URL: https://issues.apache.org/jira/browse/FLINK-13563
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.0, 1.10.0
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


{code:scala}
  @Test
  def testAllEventTimeTumblingGroupWindowOverTime(): Unit = {
val util = streamTestUtil()
val table = util.addDataStream[(Long, Int, String)](
  "T1", 'long, 'int, 'string, 'rowtime.rowtime)

val windowedTable = table
  .window(Tumble over 5.millis on 'rowtime as 'w)
  .groupBy('w)
  .select('int.count)
util.verifyPlan(windowedTable)
  }
{code}

currently, its physical plan is 

{code:java}
HashWindowAggregate(window=[TumblingGroupWindow], select=[Final_COUNT(count$0) 
AS EXPR$0])
+- Exchange(distribution=[single])
   +- LocalHashWindowAggregate(window=[TumblingGroupWindow], 
select=[Partial_COUNT(int) AS count$0])
  +- TableSourceScan(table=[[default_catalog, default_database, Table1, 
source: [TestTableSource(long, int, string)]]], fields=[long, int, string])
{code}

we know nothing about the TumblingGroupWindow except its name




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13562) throws exception when FlinkRelMdColumnInterval meets two stage stream group aggregate

2019-08-02 Thread godfrey he (JIRA)
godfrey he created FLINK-13562:
--

 Summary: throws exception when FlinkRelMdColumnInterval meets two 
stage stream group aggregate
 Key: FLINK-13562
 URL: https://issues.apache.org/jira/browse/FLINK-13562
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


test case:

{code:scala}
  @Test
  def testTwoDistinctAggregateWithNonDistinctAgg(): Unit = {
util.addTableSource[(Int, Long, String)]("MyTable", 'a, 'b, 'c)
util.verifyPlan("SELECT c, SUM(DISTINCT a), SUM(a), COUNT(DISTINCT b) FROM 
MyTable GROUP BY c")
  }
{code}



org.apache.flink.table.api.TableException: Sum aggregate function does not 
support type: ''VARCHAR''.
Please re-check the data type.

at 
org.apache.flink.table.planner.plan.utils.AggFunctionFactory.createSumAggFunction(AggFunctionFactory.scala:191)
at 
org.apache.flink.table.planner.plan.utils.AggFunctionFactory.createAggFunction(AggFunctionFactory.scala:74)
at 
org.apache.flink.table.planner.plan.utils.AggregateUtil$$anonfun$9.apply(AggregateUtil.scala:285)
at 
org.apache.flink.table.planner.plan.utils.AggregateUtil$$anonfun$9.apply(AggregateUtil.scala:279)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.utils.AggregateUtil$.transformToAggregateInfoList(AggregateUtil.scala:279)
at 
org.apache.flink.table.planner.plan.utils.AggregateUtil$.getOutputIndexToAggCallIndexMap(AggregateUtil.scala:154)
at 
org.apache.flink.table.planner.plan.metadata.FlinkRelMdColumnInterval.getAggCallIndexInLocalAgg$1(FlinkRelMdColumnInterval.scala:504)
at 
org.apache.flink.table.planner.plan.metadata.FlinkRelMdColumnInterval.estimateColumnIntervalOfAggregate(FlinkRelMdColumnInterval.scala:526)
at 
org.apache.flink.table.planner.plan.metadata.FlinkRelMdColumnInterval.getColumnInterval(FlinkRelMdColumnInterval.scala:417)
at GeneratedMetadataHandler_ColumnInterval.getColumnInterval_$(Unknown 
Source)
at GeneratedMetadataHandler_ColumnInterval.getColumnInterval(Unknown 
Source)
at 
org.apache.flink.table.planner.plan.metadata.FlinkRelMetadataQuery.getColumnInterval(FlinkRelMetadataQuery.java:122)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13545) JoinToMultiJoinRule should not match SEMI/ANTI LogicalJoin

2019-08-01 Thread godfrey he (JIRA)
godfrey he created FLINK-13545:
--

 Summary: JoinToMultiJoinRule should not match SEMI/ANTI LogicalJoin
 Key: FLINK-13545
 URL: https://issues.apache.org/jira/browse/FLINK-13545
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.0, 1.10.0
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


When running tpcds 14.a on the blink planner, an exception is thrown:

java.lang.ArrayIndexOutOfBoundsException: 84

at 
org.apache.calcite.rel.rules.JoinToMultiJoinRule$InputReferenceCounter.visitInputRef(JoinToMultiJoinRule.java:564)
at 
org.apache.calcite.rel.rules.JoinToMultiJoinRule$InputReferenceCounter.visitInputRef(JoinToMultiJoinRule.java:555)
at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
at 
org.apache.calcite.rex.RexVisitorImpl.visitCall(RexVisitorImpl.java:80)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:191)
at 
org.apache.calcite.rel.rules.JoinToMultiJoinRule.addOnJoinFieldRefCounts(JoinToMultiJoinRule.java:481)
at 
org.apache.calcite.rel.rules.JoinToMultiJoinRule.onMatch(JoinToMultiJoinRule.java:166)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:284)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202)


The reason is that {{JoinToMultiJoinRule}} should not match SEMI/ANTI 
LogicalJoin. Before Calcite 1.20, a SEMI join was represented by {{SemiJoin}}, 
which is not matched by {{JoinToMultiJoinRule}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13502) CatalogTableStatisticsConverter should be in planner.utils package

2019-07-30 Thread godfrey he (JIRA)
godfrey he created FLINK-13502:
--

 Summary: CatalogTableStatisticsConverter should be in 
planner.utils package
 Key: FLINK-13502
 URL: https://issues.apache.org/jira/browse/FLINK-13502
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


Currently, {{CatalogTableStatisticsConverter}} is in 
{{org.apache.flink.table.util}}; its correct package is 
{{org.apache.flink.table.planner.utils}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13404) Port csv factories & validator from flink-table-planner to flink-csv

2019-07-24 Thread godfrey he (JIRA)
godfrey he created FLINK-13404:
--

 Summary: Port csv factories & validator from flink-table-planner 
to flink-csv
 Key: FLINK-13404
 URL: https://issues.apache.org/jira/browse/FLINK-13404
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Legacy Planner
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


The blink planner does not define any csv factories & validator, so port the 
csv factories & validator from flink-table-planner to flink-csv, and let both 
planners use them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13403) Correct package name after relocation

2019-07-24 Thread godfrey he (JIRA)
godfrey he created FLINK-13403:
--

 Summary: Correct package name after relocation
 Key: FLINK-13403
 URL: https://issues.apache.org/jira/browse/FLINK-13403
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


Some Scala classes' package names were not updated after 
[FLINK-13266|https://issues.apache.org/jira/browse/FLINK-13267]; this issue 
aims to correct the package names.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13347) should handle new JoinRelType(SEMI/ANTI) in switch case

2019-07-21 Thread godfrey he (JIRA)
godfrey he created FLINK-13347:
--

 Summary: should handle new JoinRelType(SEMI/ANTI) in switch case
 Key: FLINK-13347
 URL: https://issues.apache.org/jira/browse/FLINK-13347
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Legacy Planner, Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


Calcite 1.20 introduces {{SEMI}} & {{ANTI}} to {{JoinRelType}}; the blink 
planner & flink planner should handle them in each switch case.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13269) copy RelDecorrelator from blink planner to flink planner to fix CALCITE-3169 & CALCITE-3170

2019-07-15 Thread godfrey he (JIRA)
godfrey he created FLINK-13269:
--

 Summary: copy RelDecorrelator from blink planner to flink planner 
to fix CALCITE-3169 & CALCITE-3170
 Key: FLINK-13269
 URL: https://issues.apache.org/jira/browse/FLINK-13269
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he
 Fix For: 1.9.0, 1.10.0


[CALCITE-3169|https://issues.apache.org/jira/browse/CALCITE-3169] & 
[CALCITE-3170|https://issues.apache.org/jira/browse/CALCITE-3170] are not fixed 
in Calcite 1.20. 
`RelDecorrelator` was copied from Calcite to the blink planner to resolve those 
two bugs. To make both planners available in one jar, `RelDecorrelator` should 
also be copied to the flink planner.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13268) revert SqlSplittableAggFunction to make two planners available in one jar

2019-07-15 Thread godfrey he (JIRA)
godfrey he created FLINK-13268:
--

 Summary: revert SqlSplittableAggFunction to make two planners 
available in one jar
 Key: FLINK-13268
 URL: https://issues.apache.org/jira/browse/FLINK-13268
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he
 Fix For: 1.9.0, 1.10.0


Currently, SqlSplittableAggFunction is copied from Calcite and its `topSplit` 
method is extended to support Left/Right outer joins (see more: 
[CALCITE-2378|http://issues.apache.org/jira/browse/CALCITE-2378]). This new 
feature is only used for tpc-h now, so we will revert this class to make both 
planners available in one jar.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13263) supports explain DAG plan in flink-python

2019-07-15 Thread godfrey he (JIRA)
godfrey he created FLINK-13263:
--

 Summary: supports explain DAG plan in flink-python
 Key: FLINK-13263
 URL: https://issues.apache.org/jira/browse/FLINK-13263
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Reporter: godfrey he
Assignee: godfrey he
 Fix For: 1.9.0, 1.10.0


update existing `explain` to support explain DAG plan in flink-python



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13185) Bump Calcite dependency to 1.20.0 in sql parser & flink planner

2019-07-09 Thread godfrey he (JIRA)
godfrey he created FLINK-13185:
--

 Summary: Bump Calcite dependency to 1.20.0 in sql parser & flink 
planner
 Key: FLINK-13185
 URL: https://issues.apache.org/jira/browse/FLINK-13185
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Legacy Planner
Reporter: godfrey he
Assignee: godfrey he


The blink planner has upgraded its Calcite version to 1.20.0 (the previous 
version was 1.19.0), and the blink planner will support DDL in Flink 1.9, which 
depends on flink-sql-parser. So the Calcite version in flink-sql-parser should 
also be upgraded to 1.20.0.

[~walterddr], [FLINK-11935|https://issues.apache.org/jira/browse/FLINK-11935] 
will not be fixed in this issue, because supporting DDL in the blink planner is 
blocked by this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13168) clarify isBatch/isStreaming/isBounded flag in flink planner and blink planner

2019-07-09 Thread godfrey he (JIRA)
godfrey he created FLINK-13168:
--

 Summary: clarify isBatch/isStreaming/isBounded flag in flink 
planner and blink planner
 Key: FLINK-13168
 URL: https://issues.apache.org/jira/browse/FLINK-13168
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Legacy Planner, Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


In the blink planner & flink planner, there are many `isBatch` and 
`isStreaming` flags, and they have different meanings in different places, 
which drives readers and coders crazy. Especially in the blink planner, only a 
`StreamTableSource` can be used for both batch and stream. Does a `bounded 
StreamTableSource` mean batch, and an `unbounded` one mean stream?

We should make it clear:
1. `isBatch` in `ConnectorCatalogTable` tells whether the 
tableSource/tableSink is a BatchTableSource/BatchTableSink
2. `isStreaming` in `TableSourceTable` tells whether the current table is on 
the stream planner
3. a `bounded StreamTableSource` can be used for both batch and stream, while 
an `unbounded StreamTableSource` can only be used for stream.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13116) supports catalog statistic in blink planner

2019-07-05 Thread godfrey he (JIRA)
godfrey he created FLINK-13116:
--

 Summary: supports catalog statistic in blink planner 
 Key: FLINK-13116
 URL: https://issues.apache.org/jira/browse/FLINK-13116
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to let the blink planner use catalog statistics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13115) Introduce planner rule to support partition pruning for PartitionableTableSource

2019-07-05 Thread godfrey he (JIRA)
godfrey he created FLINK-13115:
--

 Summary: Introduce planner rule to support partition pruning for 
PartitionableTableSource
 Key: FLINK-13115
 URL: https://issues.apache.org/jira/browse/FLINK-13115
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to support partition pruning for {{PartitionableTableSource}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13114) adapt blink planner to new unified TableEnvironment

2019-07-05 Thread godfrey he (JIRA)
godfrey he created FLINK-13114:
--

 Summary: adapt blink planner to new unified TableEnvironment
 Key: FLINK-13114
 URL: https://issues.apache.org/jira/browse/FLINK-13114
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to adapt blink planner to new unified {{TableEnvironment}}, 
remove all legacy {{TableEnvironment}}s in blink planner,  and make sure all 
blink tests could run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13088) Supports execute DAG plan

2019-07-03 Thread godfrey he (JIRA)
godfrey he created FLINK-13088:
--

 Summary: Supports execute DAG plan
 Key: FLINK-13088
 URL: https://issues.apache.org/jira/browse/FLINK-13088
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


In [FLINK-13081|https://issues.apache.org/jira/browse/FLINK-13081], an 
{{explain}} method will be introduced to support explaining a multiple-sinks 
plan. This issue aims to introduce an {{execute}} method into 
{{TableEnvironment}} to trigger the program execution. In the blink planner, 
queries will not be optimized immediately in the {{insertInto}}/{{sqlUpdate}} 
methods, but will be optimized together in this method.

{code:java}
// Triggers the program execution
JobExecutionResult execute(String jobName) throws Exception;
{code}

There are two concerns about this method: 
1. which {{execute}} method ({{TableEnvironment#execute}} or 
{{StreamExecutionEnvironment#execute}}) should users use?
2. how to make sure users only use the {{TableEnvironment#execute}} method if 
their code only depends on the planner module instead of the bridge module?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13081) Supports explain and execute DAG plan

2019-07-03 Thread godfrey he (JIRA)
godfrey he created FLINK-13081:
--

 Summary: Supports explain and execute DAG plan
 Key: FLINK-13081
 URL: https://issues.apache.org/jira/browse/FLINK-13081
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


In the flink planner, a query is optimized while calling 
{{TableEnvironment#insertInto}} or {{TableEnvironment#sqlUpdate}}. However, if 
a job has multiple sinks (meaning {{insertInto}} or {{sqlUpdate}} is called 
multiple times), the final job contains several independent sub-graphs. In most 
cases, there is duplicate computation in a multiple-sinks job. So in the blink 
planner, multiple-sinks queries are optimized together to avoid duplicate 
computation: a query is not optimized in {{insertInto}} and {{sqlUpdate}}; 
instead, queries are optimized before executing.
This issue aims to support the above case.

two methods will be added into {{TableEnvironment}}:
{code:java}
// explain multiple-sinks plan
String explain(boolean extended);
// Triggers the program execution
// in blink planner, queries will be optimized together in this method
JobExecutionResult execute(String jobName) throws Exception;
{code}

To make sure the behavior of the flink planner is the same as before, an 
{{isLazyOptMode}} field is added to {{EnvironmentSettings}}, which tells the 
table environment whether it should optimize the query immediately in the 
{{insertInto}}/{{sqlUpdate}} methods (isLazyOptMode=false, flink planner) or in 
the execute method (blink planner).
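
A usage sketch under the proposed behavior (table names are illustrative):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MultiSinkJob {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().build());
        // both statements are only buffered here, not optimized yet
        tEnv.sqlUpdate("INSERT INTO sink1 SELECT a, b FROM src");
        tEnv.sqlUpdate("INSERT INTO sink2 SELECT a, c FROM src WHERE a > 0");
        // prints the whole multi-sink plan, with the scan of src shared
        System.out.println(tEnv.explain(false));
        // the two queries are optimized together and submitted as one job
        tEnv.execute("multi-sink-job");
    }
}
{code}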





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13076) Bump Calcite dependency to 1.20.0 in blink planner

2019-07-03 Thread godfrey he (JIRA)
godfrey he created FLINK-13076:
--

 Summary: Bump Calcite dependency to 1.20.0 in blink planner
 Key: FLINK-13076
 URL: https://issues.apache.org/jira/browse/FLINK-13076
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


Bump Calcite dependency to 1.20.0 in flink-table-planner-blink module.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13030) add more test case for blink planner

2019-06-28 Thread godfrey he (JIRA)
godfrey he created FLINK-13030:
--

 Summary: add more test case for blink planner
 Key: FLINK-13030
 URL: https://issues.apache.org/jira/browse/FLINK-13030
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to introduce more test cases from the internal Blink and the 
flink planner into the blink planner, to make sure the functionality of the 
blink planner aligns with the internal Blink and the flink planner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12969) Remove dependencies on RelNode from TableImpl in blink planner

2019-06-25 Thread godfrey he (JIRA)
godfrey he created FLINK-12969:
--

 Summary: Remove dependencies on RelNode from TableImpl in blink 
planner
 Key: FLINK-12969
 URL: https://issues.apache.org/jira/browse/FLINK-12969
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to remove dependencies on RelNode from TableImpl in blink 
planner, just as 
[FLINK-12737|https://issues.apache.org/jira/browse/FLINK-12737] does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12937) Introduce join reorder planner rules in blink planner

2019-06-21 Thread godfrey he (JIRA)
godfrey he created FLINK-12937:
--

 Summary: Introduce join reorder planner rules in blink planner
 Key: FLINK-12937
 URL: https://issues.apache.org/jira/browse/FLINK-12937
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to let the blink planner support join reorder. 
`LoptOptimizeJoinRule` in Calcite can meet our requirements for now, so we can 
use this rule directly in the blink planner. `JoinToMultiJoinRule`, 
`ProjectMultiJoinMergeRule` and `FilterMultiJoinMergeRule` should also be 
introduced to support `LoptOptimizeJoinRule`.

Additionally, we add a new rule named `RewriteMultiJoinConditionRule`, which 
applies transitive closure on `MultiJoin` equi-join predicates to create more 
optimization possibilities.
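
For reference, a hedged configuration sketch (the option key corresponds to 
{{OptimizerConfigOptions#TABLE_OPTIMIZER_JOIN_REORDER_ENABLED}} and may differ 
between versions; join reorder is off by default):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EnableJoinReorder {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());
        // join reorder must be switched on explicitly
        tEnv.getConfig().getConfiguration()
                .setBoolean("table.optimizer.join-reorder-enabled", true);
    }
}
{code}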



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12888) Introduce planner rule to push filter into TableSource

2019-06-18 Thread godfrey he (JIRA)
godfrey he created FLINK-12888:
--

 Summary: Introduce planner rule to push filter into TableSource
 Key: FLINK-12888
 URL: https://issues.apache.org/jira/browse/FLINK-12888
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to support pushing filters into a FilterableTableSource to 
reduce the output records of a TableSource.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12878) Add travis profile for flink-table-planner-blink/flink-table-runtime-blink

2019-06-17 Thread godfrey he (JIRA)
godfrey he created FLINK-12878:
--

 Summary: Add travis profile for 
flink-table-planner-blink/flink-table-runtime-blink
 Key: FLINK-12878
 URL: https://issues.apache.org/jira/browse/FLINK-12878
 Project: Flink
  Issue Type: Improvement
  Components: Travis
Reporter: godfrey he
Assignee: godfrey he


The flink-table-planner-blink/flink-table-runtime-blink builds take almost 
30 minutes, and that may cause the libraries profile to frequently hit 
timeouts; we can resolve this by moving flink-table-planner-blink and 
flink-table-runtime-blink into a separate profile.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12857) move FilterableTableSource into flink-table-common

2019-06-15 Thread godfrey he (JIRA)
godfrey he created FLINK-12857:
--

 Summary: move FilterableTableSource into flink-table-common
 Key: FLINK-12857
 URL: https://issues.apache.org/jira/browse/FLINK-12857
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


Move FilterableTableSource into flink-table-common, so that both the flink 
planner and the blink planner can use this interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12856) Introduce planner rule to push projection into ProjectableTableSource

2019-06-15 Thread godfrey he (JIRA)
godfrey he created FLINK-12856:
--

 Summary: Introduce planner rule to push projection into 
ProjectableTableSource
 Key: FLINK-12856
 URL: https://issues.apache.org/jira/browse/FLINK-12856
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to support pushing projections into a ProjectableTableSource 
to reduce the output fields of a TableSource.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12815) Supports CatalogManager in blink planner

2019-06-12 Thread godfrey he (JIRA)
godfrey he created FLINK-12815:
--

 Summary: Supports CatalogManager in blink planner
 Key: FLINK-12815
 URL: https://issues.apache.org/jira/browse/FLINK-12815
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to let the blink planner support the `CatalogManager`, which 
is what FLINK-11476 has done.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12795) Extracted creation & configuration of FrameworkConfig & RelBuilder to separate class in blink planner

2019-06-10 Thread godfrey he (JIRA)
godfrey he created FLINK-12795:
--

 Summary: Extracted creation & configuration of FrameworkConfig & 
RelBuilder to separate class in blink planner
 Key: FLINK-12795
 URL: https://issues.apache.org/jira/browse/FLINK-12795
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


Just as commit 
[e682395a|https://github.com/apache/flink/commit/e682395ae4e13caba0e2fdd98868f69ede9f3b3e] 
did in the flink planner, do similar things in the blink planner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12734) remove getVolcanoPlanner method from FlinkOptimizeContext and RelNodeBlock does not depend on TableEnvironment

2019-06-04 Thread godfrey he (JIRA)
godfrey he created FLINK-12734:
--

 Summary: remove getVolcanoPlanner method from FlinkOptimizeContext 
and RelNodeBlock does not depend on TableEnvironment
 Key: FLINK-12734
 URL: https://issues.apache.org/jira/browse/FLINK-12734
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


There are two improvements:
1. remove the {{getVolcanoPlanner}} method from {{FlinkOptimizeContext}}. 
{{VolcanoPlanner}} requires that the planner a {{RelNode}} tree belongs to and 
the {{VolcanoPlanner}} used to optimize that {{RelNode}} tree be the same 
instance (see: {{VolcanoPlanner#registerImpl}}). So we can use the planner 
instance in the {{RelNode}}'s cluster directly instead of calling 
{{getVolcanoPlanner}} on {{FlinkOptimizeContext}}.

2. {{RelNodeBlock}} does not need to depend on {{TableEnvironment}}: 
in {{RelNodeBlock}}, only {{TableConfig}} is used.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12703) Introduce metadata handlers on SEMI/ANTI join and look up join

2019-06-01 Thread godfrey he (JIRA)
godfrey he created FLINK-12703:
--

 Summary: Introduce metadata handlers on SEMI/ANTI join and look up 
join
 Key: FLINK-12703
 URL: https://issues.apache.org/jira/browse/FLINK-12703
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


In [FLINK-11822|https://issues.apache.org/jira/browse/FLINK-11822], we have 
introduced all Flink metadata handlers, but several RelNodes (e.g. look up 
join) have not been implemented. So this issue aims to introduce all metadata 
handlers on SEMI/ANTI join and look up join.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12685) Supports UNNEST query in blink planner

2019-05-30 Thread godfrey he (JIRA)
godfrey he created FLINK-12685:
--

 Summary: Supports UNNEST query in blink planner
 Key: FLINK-12685
 URL: https://issues.apache.org/jira/browse/FLINK-12685
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to support queries with the UNNEST keyword, which relates to 
nested fields.
For example: 
table name: 
MyTable

schema: 
a: int, b int, c array[int]

sql:
SELECT a, b, s FROM MyTable, UNNEST(MyTable.c) AS A (s)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12665) Introduce MiniBatchIntervalInferRule and watermark assigner operators

2019-05-29 Thread godfrey he (JIRA)
godfrey he created FLINK-12665:
--

 Summary:  Introduce MiniBatchIntervalInferRule and watermark 
assigner operators
 Key: FLINK-12665
 URL: https://issues.apache.org/jira/browse/FLINK-12665
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner, Table SQL / Runtime
Reporter: godfrey he
Assignee: godfrey he


This issue aims to introduce MiniBatchIntervalInferRule to infer the 
mini-batch interval of the watermark assigner. This rule can handle the 
following two kinds of operators:
1. operators which support mini-batch and do not require watermarks, e.g. 
group aggregate. In this case, a {{StreamExecWatermarkAssigner}} in proctime 
mode will be created if it does not exist, and the interval value will be set 
to {{SQL_EXEC_MINIBATCH_ALLOW_LATENCY}}.
2. operators which require watermarks, e.g. window join, window aggregate. In 
this case, the {{StreamExecWatermarkAssigner}} already exists, and its 
MiniBatchIntervalTrait will be updated to the merged intervals from its 
outputs.
Currently, mini-batched window aggregate is not supported, and will be 
introduced later.

this issue also introduces watermark assigner operators, including:
1. {{WatermarkAssignerOperator}}, that extracts timestamps from stream elements 
and generates periodic watermarks.
2. {{MiniBatchedWatermarkAssignerOperator}}, that extracts timestamps from 
stream elements and generates watermarks with specified emit latency.
3. {{MiniBatchAssignerOperator}}, that emits mini-batch markers in a given 
period.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12610) Introduce planner rules about aggregate

2019-05-23 Thread godfrey he (JIRA)
godfrey he created FLINK-12610:
--

 Summary: Introduce planner rules about aggregate
 Key: FLINK-12610
 URL: https://issues.apache.org/jira/browse/FLINK-12610
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to introduce planner rules about aggregates. The rules include:
1. {{AggregateCalcMergeRule}}, which recognizes an {{Aggregate}} on top of a 
{{Calc}} and, if possible, aggregates through the calc or removes the calc
2. {{AggregateReduceGroupingRule}}, which reduces unnecessary grouping columns 
(see the sketch after this list)
3. {{PruneAggregateCallRule}}, which removes unreferenced AggregateCalls 
from an Aggregate
4. {{FlinkAggregateRemoveRule}}, which is copied from Calcite's 
AggregateRemoveRule and supports SUM, MIN, MAX and AUXILIARY_GROUP aggregate 
functions in non-empty group aggregates
5. {{FlinkAggregateJoinTransposeRule}}, which is copied from Calcite's 
AggregateJoinTransposeRule and supports left/right outer joins and aggregates 
with AUXILIARY_GROUP
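
As a hedged sketch of rule 2 (the schema and the uniqueness assumption are 
invented for the example): when one grouping column is a unique key, the 
remaining grouping columns are functionally dependent on it and can be dropped 
from the grouping.
{code:sql}
-- assuming a is a unique key of MyTable, so b is functionally dependent on a
SELECT a, b, SUM(c) FROM MyTable GROUP BY a, b

-- can be reduced to grouping on a only; b is carried along as an auxiliary
-- grouping column (shown here in a planner-internal AUXILIARY_GROUP form,
-- which is not valid user SQL):
SELECT a, AUXILIARY_GROUP(b), SUM(c) FROM MyTable GROUP BY a
{code}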



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12600) Introduce planner rules to do deterministic rewriting on RelNode

2019-05-23 Thread godfrey he (JIRA)
godfrey he created FLINK-12600:
--

 Summary: Introduce planner rules to do deterministic rewriting on 
RelNode 
 Key: FLINK-12600
 URL: https://issues.apache.org/jira/browse/FLINK-12600
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to introduce planner rules to do deterministic rewriting on 
RelNode. The rules include:
1. {{FlinkLimit0RemoveRule}}, which rewrites `limit 0` to an empty {{Values}} 
(see the sketch after this list)
2. {{FlinkRewriteSubQueryRule}}, which rewrites a {{Filter}} with condition 
`(select count(*) from T) > 0` to a {{Filter}} with condition `exists(select * 
from T)`
3. {{ReplaceIntersectWithSemiJoinRule}}, which rewrites a distinct 
{{Intersect}} to a distinct {{Aggregate}} on a SEMI {{Join}}
4. {{ReplaceMinusWithAntiJoinRule}}, which rewrites a distinct {{Minus}} to a 
distinct {{Aggregate}} on an ANTI {{Join}}
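
Two of the rewrites as hedged before/after sketches (the table names S and T 
are illustrative):
{code:sql}
-- FlinkLimit0RemoveRule: a fetch of zero rows becomes an empty Values,
-- so T need not be scanned at all
SELECT * FROM T LIMIT 0

-- FlinkRewriteSubQueryRule: a COUNT-based guard becomes an EXISTS,
-- which can stop reading T after the first row
SELECT * FROM S WHERE (SELECT COUNT(*) FROM T) > 0
-- is rewritten to
SELECT * FROM S WHERE EXISTS (SELECT * FROM T)
{code}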



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12575) Introduce planner rules to remove redundant shuffle and collation

2019-05-21 Thread godfrey he (JIRA)
godfrey he created FLINK-12575:
--

 Summary: Introduce planner rules to remove redundant shuffle and 
collation
 Key: FLINK-12575
 URL: https://issues.apache.org/jira/browse/FLINK-12575
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


{{Exchange}} and {{Sort}} are the heaviest operators. They are created in 
{{FlinkExpandConversionRule}} when an operator requires its inputs to satisfy 
a distribution trait or collation trait during planning. However, many 
operators can provide distribution or collation, e.g. {{BatchExecHashAggregate}} 
or {{BatchExecHashJoin}} can provide distribution on their shuffle keys, and 
{{BatchExecSortMergeJoin}} can provide distribution and collation on its join 
keys. If the provided traits satisfy the required traits, the {{Exchange}} or 
the {{Sort}} is redundant.
e.g. 
{code:sql}
schema:
x: a int, b bigint, c varchar
y: d int, e bigint, f varchar
t1: a1 int, b1 bigint, c1 varchar
t2: d1 int, e1 bigint, f1 varchar

sql:
select * from x join y on a = d and b = e join t1 on d = a1 and e = b1 left 
outer join t2 on a1 = d1 and b1 = e1

the physical plan after redundant Exchange and Sort are removed:
SortMergeJoin(joinType=[LeftOuterJoin], where=[AND(=(a1, d1), =(b1, e1))], 
leftSorted=[true], ...)
:- SortMergeJoin(joinType=[InnerJoin], where=[AND(=(d, a1), =(e, b1))], 
leftSorted=[true], ...)
:  :- SortMergeJoin(joinType=[InnerJoin], where=[AND(=(a, d), =(b, e))], ...)
:  :  :- Exchange(distribution=[hash[a, b]])
:  :  :  +- TableSourceScan(table=[[x]], ...)
:  :  +- Exchange(distribution=[hash[d, e]])
:  : +- TableSourceScan(table=[[y]], ...)
:  +- Exchange(distribution=[hash[a1, b1]])
: +- TableSourceScan(table=[[t1]], ...)
+- Exchange(distribution=[hash[d1, e1]])
   +- TableSourceScan(table=[[t2]], ...)
{code}

In the above physical plan, the {{Exchange}}s between the {{SortMergeJoin}}s 
are redundant because their shuffle keys are the same, and the {{Sort}}s on 
the top two {{SortMergeJoin}}s' left-hand sides are redundant because their 
inputs are already sorted.

Another situation is that the shuffle and collation could be removed between 
multiple {{Over}}s, e.g.
{code:sql}
schema:
MyTable: a int, b int, c varchar

sql:
SELECT
COUNT(*) OVER (PARTITION BY c ORDER BY a),
SUM(a) OVER (PARTITION BY b ORDER BY a),
RANK() OVER (PARTITION BY c ORDER BY a, c),
SUM(a) OVER (PARTITION BY b ORDER BY a),
COUNT(*) OVER (PARTITION BY c ORDER BY c)
 FROM MyTable

the physical plan after redundant Exchange and Sort are removed:
Calc(select=[...])
+- OverAggregate(partitionBy=[c], orderBy=[c ASC], window#0=[COUNT(*) AS w3$o0 
RANG ...])
   +- OverAggregate(partitionBy=[c], orderBy=[a ASC], window#0=[COUNT(*) AS 
w1$o0 RANG ...], window#1=[RANK(*) AS w2$o0 RANG ...], ...)
  +- Sort(orderBy=[c ASC, a ASC])
 +- Exchange(distribution=[hash[c]])
+- OverAggregate(partitionBy=[b], orderBy=[a ASC], 
window#0=[COUNT(a) AS w1$o1, $SUM0(a) AS w0$o0 RANG ...], ...)
   +- Sort(orderBy=[b ASC, a ASC])
  +- Exchange(distribution=[hash[b]])
 +- TableSourceScan(table=[[MyTable]], ...)
{code}
The {{Exchange}}s and {{Sort}} between the top two {{OverAggregate}}s are 
redundant because their shuffle keys and sort keys are the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12559) Introduce metadata handlers on window aggregate

2019-05-20 Thread godfrey he (JIRA)
godfrey he created FLINK-12559:
--

 Summary: Introduce metadata handlers on window aggregate
 Key: FLINK-12559
 URL: https://issues.apache.org/jira/browse/FLINK-12559
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


In [FLINK-11822|https://issues.apache.org/jira/browse/FLINK-11822], we have 
introduced all Flink metadata handlers, but several {{RelNode}}s (e.g. window 
aggregate) have not been implemented. So this issue aims to introduce metadata 
handlers on window aggregate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12524) Introduce planner rules about rank

2019-05-15 Thread godfrey he (JIRA)
godfrey he created FLINK-12524:
--

 Summary: Introduce planner rules about rank
 Key: FLINK-12524
 URL: https://issues.apache.org/jira/browse/FLINK-12524
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to introduce planner rules about rank. The rules include:
1. {{CalcRankTransposeRule}}, which transposes a {{FlinkLogicalCalc}} past a 
{{FlinkLogicalRank}} to reduce the rank input fields.
2. {{RankNumberColumnRemoveRule}}, which removes the output column of the rank 
number iff there is an equality condition on the rank column (see the sketch 
after this list).
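
As a hedged sketch of rule 2 (the Top-1 query below is invented for the 
example): the equality condition pins the rank number to a constant, so the 
rank number column itself need not be produced.
{code:sql}
SELECT a, b, c FROM (
  SELECT a, b, c,
         ROW_NUMBER() OVER (PARTITION BY b ORDER BY c) AS rn
  FROM MyTable
) t
WHERE rn = 1
-- rn is fixed to the constant 1 by the equality condition and is not needed
-- by the outer query, so the rank number output column can be removed from
-- the Rank operator
{code}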



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12519) Introduce planner rules about semi/anti join

2019-05-15 Thread godfrey he (JIRA)
godfrey he created FLINK-12519:
--

 Summary: Introduce planner rules about semi/anti join
 Key: FLINK-12519
 URL: https://issues.apache.org/jira/browse/FLINK-12519
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue aims to introduce planner rules about semi/anti joins. The rules 
include:
1. {{FlinkSemiAntiJoinFilterTransposeRule}}, which pushes a semi/anti join 
down in a tree past a filter
2. {{FlinkSemiAntiJoinJoinTransposeRule}}, which pushes a semi/anti join down 
in a tree past a non-semi/anti join
3. {{FlinkSemiAntiJoinProjectTransposeRule}}, which pushes a semi/anti join 
down in a tree past a project
4. {{ProjectSemiAntiJoinTransposeRule}}, which pushes a project down in a tree 
past a semi/anti join

Planner rules about non-semi/anti joins will be introduced in 
[FLINK-12509|https://issues.apache.org/jira/browse/FLINK-12509].
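
For context, a hedged sketch of where semi/anti joins typically come from (the 
x and y schemas are reused from FLINK-12575 above):
{code:sql}
-- an IN subquery is typically planned as a SEMI join between x and y
SELECT * FROM x WHERE a IN (SELECT d FROM y)

-- a NOT EXISTS subquery is typically planned as an ANTI join
SELECT * FROM x WHERE NOT EXISTS (SELECT * FROM y WHERE y.d = x.a)
{code}
The transpose rules above then move such semi/anti joins past filters, 
projects and other joins so that further conversion rules can match.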




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

