access to the design doc of FLINK-12477

2023-09-27 Thread Shuyi Chen
Hi Devs, can anyone grant access to the design doc of FLINK-12477? Thanks a
lot.

https://docs.google.com/document/d/1eDpsUKv2FqwZiS1Pm6gYO5eFHScBHfULKmH1-ZEWB4g

Shuyi


Re: [ANNOUNCE] Zhu Zhu becomes a Flink committer

2019-12-14 Thread Shuyi Chen
Congratulations!

On Sat, Dec 14, 2019 at 7:59 AM Rong Rong  wrote:

> Congrats Zhu Zhu :-)
>
> --
> Rong
>
> On Sat, Dec 14, 2019 at 4:47 AM tison  wrote:
>
> > Congratulations!:)
> >
> > Best,
> > tison.
> >
> >
> > OpenInx  于2019年12月14日周六 下午7:34写道:
> >
> > > Congrats Zhu Zhu!
> > >
> > > On Sat, Dec 14, 2019 at 2:38 PM Jeff Zhang  wrote:
> > >
> > > > Congrats, Zhu Zhu!
> > > >
> > > > Paul Lam  于2019年12月14日周六 上午10:29写道:
> > > >
> > > > > Congrats Zhu Zhu!
> > > > >
> > > > > Best,
> > > > > Paul Lam
> > > > >
> > > > > Kurt Young  于2019年12月14日周六 上午10:22写道:
> > > > >
> > > > > > Congratulations Zhu Zhu!
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Sat, Dec 14, 2019 at 10:04 AM jincheng sun <
> > > > sunjincheng...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Congrats ZhuZhu and welcome on board!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jincheng
> > > > > > >
> > > > > > >
> > > > > > > Jark Wu  于2019年12月14日周六 上午9:55写道:
> > > > > > >
> > > > > > > > Congratulations, Zhu Zhu!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jark
> > > > > > > >
> > > > > > > > On Sat, 14 Dec 2019 at 08:20, Yangze Guo  >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Congrats, ZhuZhu!
> > > > > > > > >
> > > > > > > > > Bowen Li  于 2019年12月14日周六 上午5:37写道:
> > > > > > > > >
> > > > > > > > > > Congrats!
> > > > > > > > > >
> > > > > > > > > > On Fri, Dec 13, 2019 at 10:42 AM Xuefu Z <
> > usxu...@gmail.com>
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Congratulations, Zhu Zhu!
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Dec 13, 2019 at 10:37 AM Peter Huang <
> > > > > > > > > huangzhenqiu0...@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Congratulations!:)
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Dec 13, 2019 at 9:45 AM Piotr Nowojski <
> > > > > > > > pi...@ververica.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Congratulations! :)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On 13 Dec 2019, at 18:05, Fabian Hueske <
> > > > > fhue...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Congrats Zhu Zhu and welcome on board!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best, Fabian
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Am Fr., 13. Dez. 2019 um 17:51 Uhr schrieb Till
> > > > Rohrmann
> > > > > <
> > > > > > > > > > > > > > trohrm...@apache.org>:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Hi everyone,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I'm very happy to announce that Zhu Zhu accepted
> > the
> > > > > offer
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > > Flink
> > > > > > > > > > > > > PMC
> > > > > > > > > > > > > >> to become a committer of the Flink project.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Zhu Zhu has been an active community member for
> > more
> > > > > than
> > > > > > a
> > > > > > > > year
> > > > > > > > > > > now.
> > > > > > > > > > > > > Zhu
> > > > > > > > > > > > > >> Zhu played an essential role in the scheduler
> > > > > refactoring,
> > > > > > > > > helped
> > > > > > > > > > > > > >> implementing fine grained recovery, drives
> FLIP-53
> > > and
> > > > > > fixed
> > > > > > > > > > various
> > > > > > > > > > > > > bugs
> > > > > > > > > > > > > >> in the scheduler and runtime. Zhu Zhu also
> helped
> > > the
> > > > > > > > community
> > > > > > > > > by
> > > > > > > > > > > > > >> reporting issues, answering user mails and being
> > > > active
> > > > > on
> > > > > > > the
> > > > > > > > > dev
> > > > > > > > > > > > > mailing
> > > > > > > > > > > > > >> list.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Congratulations Zhu Zhu!
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Best, Till
> > > > > > > > > > > > > >> (on behalf of the Flink PMC)
> > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Xuefu Zhang
> > > > > > > > > > >
> > > > > > > > > > > "In Honey We Trust!"
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> >
>


Re: [VOTE] FLIP-79: Flink Function DDL Support (1.10 Release Feature Only)

2019-11-11 Thread Shuyi Chen
+1 (binding)

On Sat, Nov 9, 2019 at 11:17 PM Kurt Young  wrote:

> +1 (binding)
>
> Best,
> Kurt
>
>
> On Sun, Nov 10, 2019 at 12:25 PM Peter Huang 
> wrote:
>
> > Hi Yu,
> >
> > Thanks for your reminder about the timeline of delivering the basic
> > function DDL in release 1.10.
> > As I replied to Xuefu, the "CREATE FUNCTION" and "DROP FUNCTION" can
> > relatively easy achieve by revising the existing PR.
> > Definitely, I probably need to start to work on a basic version of PR for
> > "ALTER FUNCTION" and "SHOW FUNCTIONS".
> > Please let me know if you have any suggestion to better align the
> timeline
> > of the ongoing catalog related efforts.
> >
> > Best Regards
> > Peter Huang
> >
> >
> > On Sat, Nov 9, 2019 at 7:26 PM Yu Li  wrote:
> >
> > > Thanks for driving this Peter!
> > >
> > > I agree it would be great if we could include this feature in 1.10.
> > > However, FWIW, since we are following the time-based release policy [1]
> > and
> > > 1.10 release is approaching its feature freeze (planned to be at the
> end
> > of
> > > November) [2], I'm a little bit concerned about the schedule.
> > >
> > > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases
> > > [2]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-10-td32824.html
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Sat, 9 Nov 2019 at 04:12, Xuefu Z  wrote:
> > >
> > > > Hi Peter,
> > > >
> > > > Thanks for driving this. I'm all-in for this. However, as I read the
> > > latest
> > > > FLIP, I have a couple of questions/comments:
> > > >
> > > > 1. It seems that "JVM" is proposed as a language type in parallel to
> > > > python. I'm not sure that's very intuitive. JVM stands for "java
> > virtual
> > > > machine", so the language is really "JAVA", correct? I know "scala"
> is
> > > also
> > > > a language which can be generate java byte code that JVM can execute.
> > > >
> > > > 2. In the flip, "SHOW FUNCTIONS" and "ALTER FUNCTION" are mentioned
> > but I
> > > > don't see there is any implementation plan, either in 1.10 or
> beyond. I
> > > > think we could have more clarification on those.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Fri, Nov 8, 2019 at 10:36 AM Bowen Li 
> wrote:
> > > >
> > > > > Peter and I went thru the details and defined scope/plan for 1.10
> > > offline
> > > > > in the last few days. +1 (binding) from my side.
> > > > >
> > > > > On Fri, Nov 8, 2019 at 12:55 AM Terry Wang 
> > wrote:
> > > > >
> > > > > > Thanks Peter driving on this. LGTM for 1.10 release feature.
> > > > > >
> > > > > > +1 from my side. (non-binding)
> > > > > >
> > > > > > Best,
> > > > > > Terry Wang
> > > > > >
> > > > > >
> > > > > >
> > > > > > > 2019年11月8日 13:20,Peter Huang  写道:
> > > > > > >
> > > > > > > Dear All,
> > > > > > >
> > > > > > > I would like to start the vote for 1.10 release features in
> > FLIP-79
> > > > [1]
> > > > > > > which is discussed and research consensus in the discussion
> > thread
> > > > [2].
> > > > > > For
> > > > > > > the advanced feature, such as loading function from remote
> > > resources,
> > > > > > > support scala/python function, we will have the further
> > discussion
> > > > > after
> > > > > > > release 1.10.
> > > > > > >
> > > > > > > The vote will be open for at least 72 hours. If the voting
> > passes,
> > > I
> > > > > will
> > > > > > > close it by 2019-11-10 14:00 UTC.
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-79+Flink+Function+DDL+Support
> > > > > > > [2]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discussion-FLIP-79-Flink-Function-DDL-Support-td33965.html
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Peter Huang
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Xuefu Zhang
> > > >
> > > > "In Honey We Trust!"
> > > >
> > >
> >
>


Re: [ANNOUNCE] Becket Qin joins the Flink PMC

2019-10-28 Thread Shuyi Chen
Congratulations!

On Mon, Oct 28, 2019 at 11:18 AM Xingcan Cui  wrote:

> Congratulations, Becket!
>
> Best,
> Xingcan
>
> > On Oct 28, 2019, at 1:23 PM, Xuefu Z  wrote:
> >
> > Congratulations, Becket!
> >
> > On Mon, Oct 28, 2019 at 10:08 AM Zhu Zhu  wrote:
> >
> >> Congratulations Becket!
> >>
> >> Thanks,
> >> Zhu Zhu
> >>
> >> Peter Huang  于2019年10月29日周二 上午1:01写道:
> >>
> >>> Congratulations Becket Qin!
> >>>
> >>>
> >>> Best Regards
> >>> Peter Huang
> >>>
> >>> On Mon, Oct 28, 2019 at 9:19 AM Rong Rong  wrote:
> >>>
>  Congratulations Becket!!
> 
>  --
>  Rong
> 
>  On Mon, Oct 28, 2019, 7:53 AM Jark Wu  wrote:
> 
> > Congratulations Becket!
> >
> > Best,
> > Jark
> >
> > On Mon, 28 Oct 2019 at 20:26, Benchao Li 
> >> wrote:
> >
> >> Congratulations Becket.
> >>
> >> Dian Fu  于2019年10月28日周一 下午7:22写道:
> >>
> >>> Congrats, Becket.
> >>>
>  在 2019年10月28日,下午6:07,Fabian Hueske  写道:
> 
>  Hi everyone,
> 
>  I'm happy to announce that Becket Qin has joined the Flink PMC.
>  Let's congratulate and welcome Becket as a new member of the
> >>> Flink
> > PMC!
> 
>  Cheers,
>  Fabian
> >>>
> >>>
> >>
> >> --
> >>
> >> Benchao Li
> >> School of Electronics Engineering and Computer Science, Peking
>  University
> >> Tel:+86-15650713730
> >> Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >>
> >
> 
> >>>
> >>
> >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
>
>


Re: Timestamp(timezone) conversion bug in non blink Table/SQL runtime

2019-07-22 Thread Shuyi Chen
Hi Lasse,

Thanks for the reply. If your input is in epoch time, you are not getting
local time; instead, you are getting a wrong time that does not make sense.
For example, if the user input value is 0 (which means 00:00:00 UTC on 1
January 1970) and your local timezone is UTC-8, converting 00:00:00 UTC on
1 January 1970 to your local timezone should yield 16:00:00 on Dec 31, 1969.
But you will actually get 08:00:00 UTC on 1 January 1970 from the Table/SQL
runtime, which is 00:00:00 on 1 January 1970 in your local timezone (UTC-8).
Your input time just gets shifted by 8 hours in the output.

Shuyi

On Mon, Jul 22, 2019 at 12:49 PM Lasse Nedergaard 
wrote:

> Hi.
>
> I have encountered the same problem when you input epoch time to window
> table function and then use window.start and window.end the out doesn’t
> output in epoch but local time and I located the problem to the same
> internal function as you.
>
> Med venlig hilsen / Best regards
> Lasse Nedergaard
>
>
> Den 22. jul. 2019 kl. 20.46 skrev Shuyi Chen :
>
> Hi all,
>
> Currently, in the non-blink table/SQL runtime, Flink used
> SqlFunctions.internalToTimestamp(long v) from Calcite to convert event time
> (in long) to java.sql.Timestamp. However, as discussed in the recent
> Calcite mailing list (Jul. 19, 2019), SqlFunctions.internalToTimestamp()
> assumes the input timestamp value is in the current JVM’s default timezone
> (which is unusual), NOT milliseconds since epoch. And
> SqlFunctions.internalToTimestamp() is used to convert timestamp value in
> the current JVM’s default timezone to milliseconds since epoch, which
> java.sql.Timestamp constructor takes. Therefore, the results will not only
> be wrong, but change if the job runs in machines on different timezones as
> well. (The only exception is that all your production machines uses UTC
> timezone.)
>
> Here is an example, if the user input value is 0 (00:00:00 UTC on 1
> January 1970), and the table/SQL runtime runs in a machine in PST (UTC-8),
> the output sql.Timestamp after SqlFunctions.internalToTimestamp() will
> become 28800000 millisec since epoch (08:00:00 UTC on 1 January 1970); And
> with the same input, if the table/SQL runtime runs again in a different
> machine in EST (UTC-5), the output sql.Timestamp after
> SqlFunctions.internalToTimestamp() will become 18000000 millisec since
> epoch (05:00:00 UTC on 1 January 1970).
>
> More details are captured in
> https://issues.apache.org/jira/browse/FLINK-13372. Please let me know
> your thoughts and correct me if I am wrong. Thanks a lot.
>
> Shuyi
>
>


Timestamp(timezone) conversion bug in non blink Table/SQL runtime

2019-07-22 Thread Shuyi Chen
Hi all,

Currently, in the non-blink Table/SQL runtime, Flink uses
SqlFunctions.internalToTimestamp(long v) from Calcite to convert event time
(a long) to java.sql.Timestamp. However, as discussed on the Calcite mailing
list recently (Jul. 19, 2019), SqlFunctions.internalToTimestamp() assumes the
input timestamp value is in the current JVM’s default timezone (which is
unusual), NOT milliseconds since epoch: it converts a timestamp value in the
current JVM’s default timezone to milliseconds since epoch, which the
java.sql.Timestamp constructor takes. Therefore, the results will not only be
wrong, they will also change if the job runs on machines in different
timezones. (The only exception is if all your production machines use the UTC
timezone.)

Here is an example: if the user input value is 0 (00:00:00 UTC on 1 January
1970) and the Table/SQL runtime runs on a machine in PST (UTC-8), the output
sql.Timestamp after SqlFunctions.internalToTimestamp() will become 28800000
millisec since epoch (08:00:00 UTC on 1 January 1970); and with the same
input, if the Table/SQL runtime runs again on a different machine in EST
(UTC-5), the output sql.Timestamp after SqlFunctions.internalToTimestamp()
will become 18000000 millisec since epoch (05:00:00 UTC on 1 January 1970).

More details are captured in
https://issues.apache.org/jira/browse/FLINK-13372. Please let me know your
thoughts and correct me if I am wrong. Thanks a lot.

Shuyi


[jira] [Created] (FLINK-13372) Timestamp conversion bug in non-blink Table/SQL runtime

2019-07-22 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-13372:
--

 Summary: Timestamp conversion bug in non-blink Table/SQL runtime
 Key: FLINK-13372
 URL: https://issues.apache.org/jira/browse/FLINK-13372
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.8.1, 1.8.0, 1.7.2, 1.6.4, 1.6.3
Reporter: Shuyi Chen
Assignee: Shuyi Chen


Currently, in the non-blink Table/SQL runtime, Flink uses 
SqlFunctions.internalToTimestamp(long v) from Calcite to convert event time (a 
long) to java.sql.Timestamp:

{code:java}
public static Timestamp internalToTimestamp(long v) {
  return new Timestamp(v - (long) LOCAL_TZ.getOffset(v));
}
{code}

However, as discussed on the Calcite mailing list recently, 
SqlFunctions.internalToTimestamp() assumes the input timestamp value is in the 
current JVM’s default timezone (which is unusual), NOT milliseconds since 
epoch: it converts a timestamp value in the current JVM’s default timezone to 
milliseconds since epoch, which the java.sql.Timestamp constructor takes. 
Therefore, the results will not only be wrong, they will also change if the 
job runs on machines in different timezones.

Here is an example: if the user input value is 0 (00:00:00 UTC on 1 January 
1970) and the Table/SQL runtime runs on a machine in PST (UTC-8), the output 
sql.Timestamp after SqlFunctions.internalToTimestamp() will become 28800000 
millisec since epoch (08:00:00 UTC on 1 January 1970); and if the Table/SQL 
runtime runs on a machine in EST (UTC-5), the output sql.Timestamp after 
SqlFunctions.internalToTimestamp() will become 18000000 millisec since epoch 
(05:00:00 UTC on 1 January 1970).
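
For illustration, here is a minimal, self-contained sketch of the shift (the 
class name and hard-coded values are just for this demo; it mirrors the 
Calcite snippet above rather than Flink's actual code path):

{code:java}
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampShiftDemo {

  // Same arithmetic as SqlFunctions.internalToTimestamp(long v) above,
  // with the timezone passed in explicitly for clarity.
  static Timestamp internalToTimestamp(long v, TimeZone tz) {
    return new Timestamp(v - tz.getOffset(v));
  }

  public static void main(String[] args) {
    long epochMillis = 0L; // 00:00:00 UTC on 1 January 1970
    TimeZone pst = TimeZone.getTimeZone("America/Los_Angeles"); // UTC-8 in January
    // Prints 28800000, i.e. the input was shifted forward by 8 hours (08:00:00 UTC).
    System.out.println(internalToTimestamp(epochMillis, pst).getTime());
  }
}
{code}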

Currently, there are unit tests covering the Table/SQL API event-time 
input/output (e.g., GroupWindowITCase.testEventTimeTumblingWindow() and 
SqlITCase.testDistinctAggWithMergeOnEventTimeSessionGroupWindow()). They all 
pass today because they compare the string format of the time, which ignores 
the timezone. If you step into the code, the actual java.sql.Timestamp value 
is wrong and changes as the tests run in different timezones (e.g., one can 
use -Duser.timezone=PST to change the current JVM’s default timezone).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

2019-07-18 Thread Shuyi Chen
Congrats!

On Thu, Jul 18, 2019 at 10:21 AM Thomas Weise  wrote:

> Congrats!
>
>
> On Thu, Jul 18, 2019 at 9:58 AM Richard Deurwaarder 
> wrote:
>
> > Congrats Becket! :)
> >
> > Richard
> >
> > On Thu, Jul 18, 2019 at 5:52 PM Xuefu Z  wrote:
> >
> > > Congratulation, Becket! At least you're able to assign JIRAs now!
> > >
> > > On Thu, Jul 18, 2019 at 8:22 AM Rong Rong  wrote:
> > >
> > > > Congratulations Becket!
> > > >
> > > > --
> > > > Rong
> > > >
> > > > On Thu, Jul 18, 2019 at 7:05 AM Xingcan Cui 
> > wrote:
> > > >
> > > > > Congrats Becket!
> > > > >
> > > > > Best,
> > > > > Xingcan
> > > > >
> > > > > On Thu, Jul 18, 2019, 07:17 Dian Fu  wrote:
> > > > >
> > > > > > Congrats Becket!
> > > > > >
> > > > > > > 在 2019年7月18日,下午6:42,Danny Chan  写道:
> > > > > > >
> > > > > > >> Congratulations!
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > 在 2019年7月18日 +0800 PM6:29,Haibo Sun ,写道:
> > > > > > >> Congratulations Becket!Best,
> > > > > > >> Haibo
> > > > > > >> 在 2019-07-18 17:51:06,"Hequn Cheng" 
> 写道:
> > > > > > >>> Congratulations Becket!
> > > > > > >>>
> > > > > > >>> Best, Hequn
> > > > > > >>>
> > > > > > >>> On Thu, Jul 18, 2019 at 5:34 PM vino yang <
> > yanghua1...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>>
> > > > > >  Congratulations!
> > > > > > 
> > > > > >  Best,
> > > > > >  Vino
> > > > > > 
> > > > > >  Yun Gao  于2019年7月18日周四
> > 下午5:31写道:
> > > > > > 
> > > > > > > Congratulations!
> > > > > > >
> > > > > > > Best,
> > > > > > > Yun
> > > > > > >
> > > > > > >
> > > > > > >
> > > > --
> > > > > > > From:Kostas Kloudas 
> > > > > > > Send Time:2019 Jul. 18 (Thu.) 17:30
> > > > > > > To:dev 
> > > > > > > Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added
> > as
> > > a
> > > > > >  committer
> > > > > > > to the Flink project
> > > > > > >
> > > > > > > Congratulations Becket!
> > > > > > >
> > > > > > > Kostas
> > > > > > >
> > > > > > > On Thu, Jul 18, 2019 at 11:21 AM Guowei Ma <
> > > guowei@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >> Congrats Becket!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Guowei
> > > > > > >>
> > > > > > >>
> > > > > > >> Terry Wang  于2019年7月18日周四 下午5:17写道:
> > > > > > >>
> > > > > > >>> Congratulations Becket!
> > > > > > >>>
> > > > > >  在 2019年7月18日,下午5:09,Dawid Wysakowicz <
> > > dwysakow...@apache.org>
> > > > > 写道:
> > > > > > 
> > > > > >  Congratulations Becket! Good to have you onboard!
> > > > > > 
> > > > > >  On 18/07/2019 10:56, Till Rohrmann wrote:
> > > > > > > Congrats Becket!
> > > > > > >
> > > > > > > On Thu, Jul 18, 2019 at 10:52 AM Jeff Zhang <
> > > > zjf...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Congratulations Becket!
> > > > > > >>
> > > > > > >> Xu Forward  于2019年7月18日周四
> > > 下午4:39写道:
> > > > > > >>
> > > > > > >>> Congratulations Becket! Well deserved.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Cheers,
> > > > > > >>>
> > > > > > >>> forward
> > > > > > >>>
> > > > > > >>> Kurt Young  于2019年7月18日周四
> 下午4:20写道:
> > > > > > >>>
> > > > > >  Congrats Becket!
> > > > > > 
> > > > > >  Best,
> > > > > >  Kurt
> > > > > > 
> > > > > > 
> > > > > >  On Thu, Jul 18, 2019 at 4:12 PM JingsongLee <
> > > > > > >> lzljs3620...@aliyun.com
> > > > > >  .invalid>
> > > > > >  wrote:
> > > > > > 
> > > > > > > Congratulations Becket!
> > > > > > >
> > > > > > > Best, Jingsong Lee
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > --
> > > > > > > From:Congxian Qiu 
> > > > > > > Send Time:2019年7月18日(星期四) 16:09
> > > > > > > To:dev@flink.apache.org 
> > > > > > > Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has
> been
> > > > added
> > > > > >  as a
> > > > > >  committer
> > > > > > > to the Flink project
> > > > > > >
> > > > > > > Congratulations Becket! Well deserved.
> > > > > > >
> > > > > > > Best,
> > > > > > > Congxian
> > > > > > >
> > > > > > >
> > > > > > > Jark Wu  于2019年7月18日周四 下午4:03写道:
> > > > > > >
> > > > > > >> Congratulations Becket! Well deserved.
> > > > > > >>
> > > > > > >> Cheers,
> > > > > > >> Jark
> > > > > > >>
> 

Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Shuyi Chen
Congratulations, Rong!

On Thu, Jul 11, 2019 at 8:26 AM Yu Li  wrote:

> Congratulations Rong!
>
> Best Regards,
> Yu
>
>
> On Thu, 11 Jul 2019 at 22:54, zhijiang  wrote:
>
>> Congratulations Rong!
>>
>> Best,
>> Zhijiang
>>
>> --
>> From:Kurt Young 
>> Send Time:2019年7月11日(星期四) 22:54
>> To:Kostas Kloudas 
>> Cc:Jark Wu ; Fabian Hueske ; dev <
>> dev@flink.apache.org>; user 
>> Subject:Re: [ANNOUNCE] Rong Rong becomes a Flink committer
>>
>> Congratulations Rong!
>>
>> Best,
>> Kurt
>>
>>
>> On Thu, Jul 11, 2019 at 10:53 PM Kostas Kloudas 
>> wrote:
>> Congratulations Rong!
>>
>> On Thu, Jul 11, 2019 at 4:40 PM Jark Wu  wrote:
>> Congratulations Rong Rong!
>> Welcome on board!
>>
>> On Thu, 11 Jul 2019 at 22:25, Fabian Hueske  wrote:
>> Hi everyone,
>>
>> I'm very happy to announce that Rong Rong accepted the offer of the Flink
>> PMC to become a committer of the Flink project.
>>
>> Rong has been contributing to Flink for many years, mainly working on SQL
>> and Yarn security features. He's also frequently helping out on the
>> user@f.a.o mailing lists.
>>
>> Congratulations Rong!
>>
>> Best, Fabian
>> (on behalf of the Flink PMC)
>>
>>
>>


Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

2019-04-04 Thread Shuyi Chen
Hi Thomas,

I agree that UDF support is important. Since some of us might not be familiar
with the effort in Apache Beam, it would be great if you could share some
design documents on Beam's portability layer and multi-language support,
and their current status. Thanks a lot.

Regards
Shuyi

On Wed, Apr 3, 2019 at 5:03 PM Thomas Weise  wrote:

> Thanks for putting this proposal together.
>
> It would be nice, if you could share a few use case examples (maybe add
> them as section to the FLIP?).
>
> The reason I ask: The table API is immensely useful, but it isn't clear to
> me what value other language bindings provide without UDF support. With
> FLIP-38 it will be possible to write a program in Python, but not execute
> Python functions. Without UDF support, isn't it possible to achieve roughly
> the same with plain SQL? In which situation would I use the Python API?
>
> There was related discussion regarding UDF support in [1]. If the
> assumption is that such support will be added later, then I would like to
> circle back to the question why this cannot be built on top of Beam? It
> would be nice to clarify the bigger goal before embarking for the first
> milestone.
>
> I'm going to comment on other things in the doc.
>
> [1]
>
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
>
> Thomas
>
>
> On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen  wrote:
>
> > Thanks a lot for driving the FLIP, jincheng. The approach looks
> > good. Adding multi-lang support sounds a promising direction to expand
> the
> > footprint of Flink. Do we have plan for adding Golang support? As many
> > backend engineers nowadays are familiar with Go, but probably not Java as
> > much, adding Golang support would significantly reduce their friction to
> > use Flink. Also, do we have a design for multi-lang UDF support, and
> what's
> > timeline for adding DataStream API support? We would like to help and
> > contribute as well as we do have similar need internally at our company.
> > Thanks a lot.
> >
> > Shuyi
> >
> > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun 
> > wrote:
> >
> > > Hi All,
> > > As Xianda brought up in the previous email, There are a large number of
> > > data analysis users who want flink to support Python. At the Flink API
> > > level, we have DataStreamAPI/DataSetAPI/TableAPI, the Table API
> will
> > > become the first-class citizen. Table API is declarative and can be
> > > automatically optimized, which is mentioned in the Flink mid-term
> roadmap
> > > by Stephan. So we first considering supporting Python at the Table
> level
> > to
> > > cater to the current large number of analytics users. For further
> promote
> > > Python support in flink table level. Dian, Wei and I discussed offline
> a
> > > bit and came up with an initial features outline as follows:
> > >
> > > - Python TableAPI Interface
> > >   Introduce a set of Python Table API interfaces, including interface
> > > definitions such as Table, TableEnvironment, TableConfig, etc.
> > >
> > > - Implementation Architecture
> > >   We will offer two alternative architecture options, one for pure
> Python
> > > language support and one for extended multi-language design.
> > >
> > > - Job Submission
> > >   Provide a way that can submit(local/remote) Python Table API jobs.
> > >
> > > - Python Shell
> > >   Python Shell is to provide an interactive way for users to write and
> > > execute flink Python Table API jobs.
> > >
> > >
> > > The design document for FLIP-38 can be found here:
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> > >
> > > I am looking forward to your comments and feedback.
> > >
> > > Best,
> > > Jincheng
> > >
> >
>


Re: Expose per partition Kafka lag metric in Flink Kafka connector

2019-04-01 Thread Shuyi Chen
Pinging to bump this topic. Can we get some more input here or on the JIRA so
we can agree on how to resolve this issue? Thanks a lot.
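
For readers new to the thread, here is a rough, hypothetical sketch of the kind
of change FLINK-11912 proposes: forwarding the Kafka client's per-partition lag
metric into a Flink metric group. The metric name and tag keys ("records-lag",
"topic", "partition") follow Kafka consumer metric conventions and are
assumptions here, not the actual patch.

```
// Hypothetical sketch only -- not the actual FLINK-11912 change.
// Note: Metric#metricValue() requires kafka-clients 1.0+; older clients expose value().
import java.util.Map;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public final class KafkaLagMetrics {

    /** Registers one Flink gauge per partition-level "records-lag" metric of the consumer. */
    public static void registerLagGauges(KafkaConsumer<?, ?> consumer, MetricGroup group) {
        for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("records-lag".equals(name.name()) && name.tags().containsKey("partition")) {
                Metric metric = entry.getValue();
                Gauge<Double> lag = () -> ((Number) metric.metricValue()).doubleValue();
                group.addGroup("topic", name.tags().get("topic"))
                     .addGroup("partition", name.tags().get("partition"))
                     .gauge("records-lag", lag);
            }
        }
    }
}
```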

Shuyi

On Mon, Mar 25, 2019 at 8:43 AM Shuyi Chen  wrote:

> Thanks a lot, Becket. I am sorry that I was out of the loop for the last
> few days due to sickness. Let’s keep the discussion continue on the JIRA.
>
> Shuyi
>
> On Mon, Mar 25, 2019 at 4:46 AM Becket Qin  wrote:
>
>> Hi Shuyi,
>>
>> Thanks for bringing this issue up. Per partition lag is definitely
>> something that should be exposed. I replied to the JIRA with some of my
>> concerns. Do you mind keeping the discussion in the JIRA ticket so it is
>> easier for future readers to follow the issue?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Mon, Mar 18, 2019 at 2:14 PM Shuyi Chen  wrote:
>>
>> > Hi all,
>> >
>> > We found that Flink's Kafka connector does not expose the per-partition
>> > Kafka lag. The metric is available in KafkaConsumer after Kafka 0.10.2.
>> And
>> > it's an important metric to diagnose which Kafka partition is lagging in
>> > production. I've created a JIRA (
>> > https://issues.apache.org/jira/browse/FLINK-11912) and created a
>> proposed
>> > change in Flink to register and expose the metrics. Could someone help
>> take
>> > a look and give some suggestions? Thanks a lot.
>> >
>> > Shuyi
>> >
>>
>


Re: Expose per partition Kafka lag metric in Flink Kafka connector

2019-03-25 Thread Shuyi Chen
Thanks a lot, Becket. I am sorry that I was out of the loop for the last
few days due to sickness. Let's continue the discussion on the JIRA.

Shuyi

On Mon, Mar 25, 2019 at 4:46 AM Becket Qin  wrote:

> Hi Shuyi,
>
> Thanks for bringing this issue up. Per partition lag is definitely
> something that should be exposed. I replied to the JIRA with some of my
> concerns. Do you mind keeping the discussion in the JIRA ticket so it is
> easier for future readers to follow the issue?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Mar 18, 2019 at 2:14 PM Shuyi Chen  wrote:
>
> > Hi all,
> >
> > We found that Flink's Kafka connector does not expose the per-partition
> > Kafka lag. The metric is available in KafkaConsumer after Kafka 0.10.2.
> And
> > it's an important metric to diagnose which Kafka partition is lagging in
> > production. I've created a JIRA (
> > https://issues.apache.org/jira/browse/FLINK-11912) and created a
> proposed
> > change in Flink to register and expose the metrics. Could someone help
> take
> > a look and give some suggestions? Thanks a lot.
> >
> > Shuyi
> >
>


[jira] [Created] (FLINK-11914) Expose a REST endpoint in JobManager to disconnect specific TaskManager

2019-03-13 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-11914:
--

 Summary: Expose a REST endpoint in JobManager to disconnect 
specific TaskManager
 Key: FLINK-11914
 URL: https://issues.apache.org/jira/browse/FLINK-11914
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / REST
Reporter: Shuyi Chen
Assignee: Shuyi Chen


We want to add the capability in the Flink web UI to kill each individual TM by 
clicking a button; this would first require exposing the capability via a 
REST API endpoint. The reason is that some TM might be running on a heavily 
loaded YARN host over time, and we want to kill just that TM and have the Flink 
JM reallocate a TM to restart the job graph. The other approach would be to 
restart the entire YARN job, which is heavy-weight.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Releasing Flink 1.7.2

2019-02-05 Thread Shuyi Chen
HI Gordon,

Thanks for bringing up the discussion. The following JIRA/PR is almost
there, and it's a major/critical issue that blocks us from upgrading to
Flink 1.6 or above in production. Without this fix, a job might go into an
infinite resource-acquisition loop in YARN without ever failing itself.

FLINK-10868 : Flink's
JobCluster ResourceManager doesn't use maximum-failed-containers as limit
of resource acquirement

Would greatly appreciate that we could get it in for 1.7.2. Thanks a lot.

Shuyi

On Tue, Feb 5, 2019 at 7:32 AM Tzu-Li (Gordon) Tai 
wrote:

> Hi Flink devs,
>
> What do you think about releasing Flink 1.7.2 soon?
>
> We already have some critical fixes in the release-1.7 branch:
> - FLINK-11207: security vulnerability with currently used Apache
> commons-compress version
> - FLINK-11419: restore issue with StreamingFileSink
> - FLINK-11436: restore issue with Flink's AvroSerializer
> - FLINK-10761: potential deadlock with metrics system
> - FLINK-10774: connection leak in FlinkKafkaConsumer
> - FLINK-10848: problem with resource allocation in YARN mode
>
> Please let me know what you think. Ideally, we can kick off the release
> vote for the first RC early next week.
> If there are some other critical fixes for 1.7.2 that are almost completed
> (already have a PR opened and review is in progress), please let me know
> here by the end of the week to account for it for the 1.7.2 release.
>
> Cheers,
> Gordon
>


-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Flink Kerberos Improvement

2018-12-17 Thread Shuyi Chen
Hi Rong, thanks a lot for the proposal. Currently, Flink assumes the keytab
is located in a remote DFS. Pre-installing keytabs statically on the YARN
nodes' local filesystems is a common approach, so I think we should support
this mode in Flink natively. As an optimization to reduce the KDC access
frequency, we should also support method 3 (the DT approach) as discussed
in [1]. One question: why do we need to implement impersonation in Flink?
I assume the superuser can do the impersonation for 'joe', and 'joe' can
then invoke the Flink client to deploy the job. Thanks a lot.

Shuyi

[1]
https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit
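
To make the impersonation question above concrete, here is a hedged sketch (the
principal, keytab path and user name are placeholders) of how a superuser
process could proxy as 'joe' via Hadoop's standard proxy-user mechanism before
invoking the Flink client:

```
// Illustrative only -- not Flink code. Principal, keytab path and user name are
// placeholders; the cluster must whitelist the superuser via hadoop.proxyuser.*
// settings for createProxyUser to be honored.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {
    public static void main(String[] args) throws Exception {
        // The superuser logs in from its own keytab.
        UserGroupInformation superUser =
                UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                        "superuser@EXAMPLE.COM", "/etc/security/keytabs/superuser.keytab");

        // Create a proxy UGI for 'joe' backed by the superuser's credentials.
        UserGroupInformation joe = UserGroupInformation.createProxyUser("joe", superUser);

        joe.doAs((PrivilegedExceptionAction<Void>) () -> {
            // Whatever runs here (e.g. the Flink YARN client submission) acts as 'joe'.
            return null;
        });
    }
}
```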

On Mon, Dec 17, 2018 at 5:49 PM Rong Rong  wrote:

> Hi All,
>
> We have been experimenting integration of Kerberos with Flink in our Corp
> environment and found out some limitations on the current Flink-Kerberos
> security mechanism running with Apache YARN.
>
> Based on the Hadoop Kerberos security guide [1]. Apparently there are only
> a subset of the suggested long-running service security mechanism is
> supported in Flink. Furthermore, the current model does not work well with
> superuser impersonating actual users [2] for deployment purposes, which is
> a widely adopted way to launch application in corp environments.
>
> We would like to propose an improvement [3] to introduce the other comment
> methods [1] for securing long-running application on YARN and enable
> impersonation mode. Any comments and suggestions are highly appreciated.
>
> Many thanks,
> Rong
>
> [1]
>
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Securing_Long-lived_YARN_Services
> [2]
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
> [3]
>
> https://docs.google.com/document/d/1rBLCpyQKg6Ld2P0DEgv4VIOMTwv4sitd7h7P5r202IE/edit?usp=sharing
>


-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Flink SQL DDL Design

2018-12-12 Thread Shuyi Chen
t not included in the DDL draft. It will
> further
> > extend the SQL syntax, which is we should be cautious about. What do you
> > think about this two solutions?
> >
> > 4.d) Custom watermark strategies:
> > @Timo,  I don't have a strong opinion on this.
> >
> > 3) SOURCE/SINK/BOTH
> > Agree with Lin, GRANT/INVOKE [SELECT|UPDATE] ON TABLE is a clean and
> > standard way to manage the permission, which is also adopted by HIVE[2]
> and
> > many databases.
> >
> > [1]: https://docs.confluent.io/current/ksql/docs/tutorials/examples.html
> > [2]:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=45876173#Hivedeprecatedauthorizationmode/LegacyMode-Grant/RevokePrivileges
> >
> > @Timo, it's great if someone can conclude the discussion and summarize
> into
> > a FLIP.
> > @Shuyi, Thanks a lot for putting it all together. The google doc looks
> good
> > to me, and I left some minor comments there.
> >
> > Regarding to the FLIP, I have some suggestions:
> > 1. The FLIP can contain MILESTONE1 and FUTURE WORKS.
> > 2. The MILESTONE1 is the MVP. It describes the MVP DDL syntax.
> > 3. Separate FUTURE WORKS into two parts: UNDER DISCUSSION and ADOPTED. We
> > can derive MILESTONE2 from this easily when it is ready.
> >
> > I summarized the Future Works based on Shuyi's work:
> >
> > Adopted: (Should detailed described here...)
> > 1. support data type nullability and precision.
> > 2. comment on table and columns.
> >
> > Under Discussion: (Should briefly describe some options...)
> > 1. Ingesting and writing timestamps to systems.
> > 2. support custom watermark strategy.
> > 3. support table update mode
> > 4. support row/map/array data type
> > 5. support schema derivation
> > 6. support system versioned temporal table
> > 7. support table index
> >
> > We can continue the further discussion here, also can separate to an
> other
> > DISCUSS topic if it is a sophisticated problem such as Table Update Mode,
> > Temporal Table.
> >
> > Best,
> > Jark
> >
> > On Mon, 10 Dec 2018 at 11:54, Lin Li  wrote:
> >
> >> hi all,
> >> Thanks for your valuable input!
> >>
> >> 4) Event-Time Attributes and Watermarks:
> >> 4.b) @Fabian As you mentioned using a computed columns `ts AS
> >> SYSTEMROWTIME()`
> >> for writing out to kafka table sink will violate the rule that computed
> >> fields are not emitted.
> >> Since the timestamp column in kafka's header area is a specific
> >> materialization protocol,
> >> why don't we treat it as an connector property? For an example:
> >> ```
> >> CREATE TABLE output_kafka_t1(
> >>id bigint,
> >>ts timestamp,
> >>msg varchar
> >> ) WITH (
> >>type=kafka,
> >>header.timestamp=ts
> >>,...
> >> );
> >> ```
> >>
> >> 4d) For custom watermark strategies
> >> @Fabian Agree with you that opening another topic about this feature
> later.
> >>
> >> 3) SOURCE / SINK / BOTH
> >> I think the permissions and availabilities are two separately things,
> >> permissions
> >> can be managed well by using GRANT/INVOKE(you can call it DCL) solutions
> >> which
> >> commonly used in different DBs. The permission part can be an new topic
> for
> >> later discussion, what do you think?
> >>
> >> For the availabilities, @Fabian @Timo  I've another question,
> >> does instantiate a TableSource/Sink cost much or has some other
> downsides?
> >> IMO, create a new source/sink object via the construct seems not costly.
> >> When receiving a DDL we should associate it with the catalog object
> >> (reusing an existence or create a new one).
> >> Am I lost something important?
> >>
> >> 5. Schema declaration:
> >> @Timo  yes, your concern about the user convenience is very important.
> But
> >> I haven't seen a clear way to solve this so far.
> >> Do we put it later and wait for more inputs from the community?
> >>
> >> Shuyi Chen  于2018年12月8日周六 下午4:27写道:
> >>
> >>> Hi all,
> >>>
> >>> Thanks a lot for the great discussion. I think we can continue the
> >>> discussion here while carving out a MVP so that the community can start
> >>> working on. Based on the discussion so far, I try to summarize what we
> >> will
> >>> do for the MVP:
> >>&g

Re: [DISCUSS] Flink SQL DDL Design

2018-12-08 Thread Shuyi Chen
>>>> in-place
> >>>>>> can be a common case.
> >>>>>> e.g.,
> >>>>>> ```
> >>>>>> CREATE TABLE t1 (
> >>>>>> col1 varchar,
> >>>>>> col2 int,
> >>>>>> col3 varchar
> >>>>>> ...
> >>>>>> );
> >>>>>>
> >>>>>> INSERT [OVERWRITE] TABLE t1
> >>>>>> AS
> >>>>>> SELECT
> >>>>>> (some computing ...)
> >>>>>> FROM t1;
> >>>>>> ```
> >>>>>> So, let's forget these SOURCE/SINK keywords in DDL. For the
> >> validation
> >>>>>> purpose, we can find out other ways.
> >>>>>>
> >>>>>> 4. Time attributes
> >>>>>> As Shuyi mentioned before, there exists an
> >>>>>> `org.apache.flink.table.sources.tsextractors.TimestampExtractor` for
> >>>> custom
> >>>>>> defined time attributes usage, but this expression based class is
> >> more
> >>>>>> friendly for table api not the SQL.
> >>>>>> ```
> >>>>>> /**
> >>>>>> * Provides the an expression to extract the timestamp for a
> >> rowtime
> >>>>>> attribute.
> >>>>>> */
> >>>>>> abstract class TimestampExtractor extends FieldComputer[Long] with
> >>>>>> Serializable {
> >>>>>>
> >>>>>> /** Timestamp extractors compute the timestamp as Long. */
> >>>>>> override def getReturnType: TypeInformation[Long] =
> >>>>>> Types.LONG.asInstanceOf[TypeInformation[Long]]
> >>>>>> }
> >>>>>> ```
> >>>>>> BTW, I think both the Scalar function and the TimestampExtractor are
> >>>>>> expressing computing logic, the TimestampExtractor has no more
> >>>> advantage in
> >>>>>> SQL scenarios.
> >>>>>>
> >>>>>>
> >>>>>> 6. Partitioning and keys
> >>>>>> Primary Key is included in Constraint part, and partitioned table
> >>>> support
> >>>>>> can be another topic later.
> >>>>>>
> >>>>>> 5. Schema declaration
> >>>>>> Agree with you that we can do better schema derivation for user
> >>>>>> convenience, but this is not conflict with the syntax.
> >>>>>> Table properties can carry any useful informations both for the
> >> users
> >>>> and
> >>>>>> the framework, I like your `contract name` proposal,
> >>>>>> e.g., `WITH (format.type = avro)`, the framework can recognize some
> >>>>>> `contract name` like `format.type`, `connector.type` and etc.
> >>>>>> And also derive the table schema from an existing schema file can be
> >>>> handy
> >>>>>> especially one with too many table columns.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Lin
> >>>>>>
> >>>>>>
> >>>>>> Timo Walther  于2018年12月5日周三 下午10:40写道:
> >>>>>>
> >>>>>>> Hi Jark and Shuyi,
> >>>>>>>
> >>>>>>> thanks for pushing the DDL efforts forward. I agree that we should
> >>> aim
> >>>>>>> to combine both Shuyi's design and your design.
> >>>>>>>
> >>>>>>> Here are a couple of concerns that I think we should address in the
> >>>>>> design:
> >>>>>>> 1. Scope: Let's focuses on a MVP DDL for CREATE TABLE statements
> >>> first.
> >>>>>>> I think this topic has already enough potential for long
> >> discussions
> >>>> and
> >>>>>>> is very helpful for users. We can discuss CREATE VIEW and CREATE
> >>>>>>> FUNCTION afterwards as they are not related to each other.
> >>>>>>>
> >>>>>>> 2. Constraints: I think we should consider things like nullability,
> >>>>>>> VARCHAR length, and decimal scale and precision in the future as
> >> they
> >>>>>

Re: [DISCUSS] Flink SQL DDL Design

2018-12-05 Thread Shuyi Chen
) Library DDL
> > > (5) Drop statement
> > >
> > > [1] Flink DDL draft by Lin and Jark:
> > >
> > >
> >
> https://docs.google.com/document/d/1o16jC-AxnZoxMfHQptkKQkSC6ZDDBRhKg6gm8VGnY-k/edit#
> > > [2] Flink SQL DDL design by Shuyi:
> > >
> > >
> >
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#
> > >
> > > Cheers,
> > > Jark
> > >
> > > On Thu, 29 Nov 2018 at 16:13, Shaoxuan Wang 
> wrote:
> > >
> > > > Sure Shuyu,
> > > > What I hope is that we can reach an agreement on DDL gramma as soon
> as
> > > > possible. There are a few differences between your proposal and ours.
> > > Once
> > > > Lin and Jark propose our design, we can quickly discuss on the those
> > > > differences, and see how far away towards a unified design.
> > > >
> > > > WRT the external catalog, I think it is an orthogonal topic, we can
> > > design
> > > > it in parallel. I believe @Xuefu, @Bowen are already working on. We
> > > > should/will definitely involve them to review the final design of DDL
> > > > implementation. I would suggest that we should give it a higher
> > priority
> > > on
> > > > the DDL implementation, as it is a crucial component for the user
> > > > experience of SQL_CLI.
> > > >
> > > > Regards,
> > > > Shaoxuan
> > > >
> > > >
> > > >
> > > > On Thu, Nov 29, 2018 at 6:56 AM Shuyi Chen 
> wrote:
> > > >
> > > > > Thanks a lot, Shaoxuan, Jack and Lin. We should definitely
> > collaborate
> > > > > here, we have also our own DDL implementation running in production
> > for
> > > > > almost 2 years at Uber. With the joint experience from both
> > companies,
> > > we
> > > > > can definitely make the Flink SQL DDL better.
> > > > >
> > > > > As @shaoxuan suggest, Jark can come up with a doc that talks about
> > the
> > > > > current DDL design in Alibaba, and we can discuss and merge them
> into
> > > > one,
> > > > > make it as a FLIP, and plan the tasks for implementation. Also, we
> > > should
> > > > > take into account the new external catalog effort in the design.
> What
> > > do
> > > > > you guys think?
> > > > >
> > > > > Shuyi
> > > > >
> > > > > On Wed, Nov 28, 2018 at 6:45 AM Jark Wu  wrote:
> > > > >
> > > > > > Hi Shaoxuan,
> > > > > >
> > > > > > I think summarizing it into a google doc is a good idea. We will
> > > > prepare
> > > > > it
> > > > > > in the next few days.
> > > > > >
> > > > > > Thanks,
> > > > > > Jark
> > > > > >
> > > > > > Shaoxuan Wang  于2018年11月28日周三 下午9:17写道:
> > > > > >
> > > > > > > Hi Lin and Jark,
> > > > > > > Thanks for sharing those details. Can you please consider
> > > summarizing
> > > > > > your
> > > > > > > DDL design into a google doc.
> > > > > > > We can still continue the discussions on Shuyi's proposal. But
> > > > having a
> > > > > > > separate google doc will be easy for the DEV to
> > > > > > understand/comment/discuss
> > > > > > > on your proposed DDL implementation.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Shaoxuan
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 28, 2018 at 7:39 PM Jark Wu 
> > wrote:
> > > > > > >
> > > > > > > > Hi Shuyi,
> > > > > > > >
> > > > > > > > Thanks for bringing up this discussion and the awesome work!
> I
> > > have
> > > > > > left
> > > > > > > > some comments in the doc.
> > > > > > > >
> > > > > > > > I want to share something more about the watermark definition
> > > > learned
> > > > > > > from
> > > > > > > > Alibaba.
> > > > > > > >
> > > > > > > >

Re: [ANNOUNCE] Apache Flink 1.7.0 released

2018-12-03 Thread Shuyi Chen
Thanks a lot for the hard work, Till!

Shuyi

On Sat, Dec 1, 2018 at 4:07 AM Dominik Wosiński  wrote:

> Thanks Till for being the release manager!
> Thanks Everyone and Great Job.
>
> Best Regards,
> Dom.
>
> pt., 30 lis 2018 o 13:19 vino yang  napisał(a):
>
> > Thanks Till for your great work, also thanks to the whole community!
> >
> > Thanks, vino.
> >
> > Timo Walther  于2018年11月30日周五 下午7:42写道:
> >
> > > Thanks for being the release manager Till and thanks for the great work
> > > Flink community!
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > Am 30.11.18 um 10:39 schrieb Till Rohrmann:
> > > > The Apache Flink community is very happy to announce the release of
> > > Apache
> > > > Flink 1.7.0, which is the next major release.
> > > >
> > > > Apache Flink® is an open-source stream processing framework for
> > > > distributed, high-performing, always-available, and accurate data
> > > streaming
> > > > applications.
> > > >
> > > > The release is available for download at:
> > > > https://flink.apache.org/downloads.html
> > > >
> > > > Please check out the release blog post for an overview of the new
> > > features
> > > > and improvements for this release:
> > > > https://flink.apache.org/news/2018/11/30/release-1.7.0.html
> > > >
> > > > The full release notes are available in Jira:
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12343585
> > > >
> > > > We would like to thank all contributors of the Apache Flink community
> > who
> > > > made this release possible!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > >
> > >
> >
>


-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Shuyi Chen
> > [ computedColumnDefinition [, computedColumnDefinition]* ]
> > > > [ tableConstraint [, tableConstraint]* ]
> > > > [ tableIndex [, tableIndex]* ]
> > > > [ PERIOD FOR SYSTEM_TIME ]
> > > > [ WATERMARK watermarkName FOR rowTimeColumn AS
> > > > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ ,
> > > > tableOption]* ) ] [ ; ]
> > > >
> > > > columnDefinition ::=
> > > > columnName dataType [ NOT NULL ]
> > > >
> > > > dataType  ::=
> > > > {
> > > >   [ VARCHAR ]
> > > >   | [ BOOLEAN ]
> > > >   | [ TINYINT ]
> > > >   | [ SMALLINT ]
> > > >   | [ INT ]
> > > >   | [ BIGINT ]
> > > >   | [ FLOAT ]
> > > >   | [ DECIMAL ]
> > > >   | [ DOUBLE ]
> > > >   | [ DATE ]
> > > >   | [ TIME ]
> > > >   | [ TIMESTAMP ]
> > > >   | [ VARBINARY ]
> > > > }
> > > >
> > > > computedColumnDefinition ::=
> > > > columnName AS computedColumnExpression
> > > >
> > > > tableConstraint ::=
> > > > { PRIMARY KEY | UNIQUE }
> > > > (columnName [, columnName]* )
> > > >
> > > > tableIndex ::=
> > > > [ UNIQUE ] INDEX indexName
> > > >  (columnName [, columnName]* )
> > > >
> > > > rowTimeColumn ::=
> > > > columnName
> > > >
> > > > tableOption ::=
> > > > property=value
> > > > offset ::=
> > > > positive integer (unit: ms)
> > > >
> > > > CREATE VIEW
> > > >
> > > > CREATE VIEW viewName
> > > >   [
> > > > ( columnName [, columnName]* )
> > > >   ]
> > > > AS queryStatement;
> > > >
> > > > CREATE FUNCTION
> > > >
> > > >  CREATE FUNCTION functionName
> > > >   AS 'className';
> > > >
> > > >  className ::=
> > > > fully qualified name
> > > >
> > > >
> > > > Shuyi Chen  于2018年11月28日周三 上午3:28写道:
> > > >
> > > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the
> design
> > > doc
> > > > > first and start implementation w/o the unified connector API ready
> by
> > > > > skipping some featue.
> > > > >
> > > > > Xuefu, I like the idea of making Flink specific properties into
> > generic
> > > > > key-value pairs, so that it will make integration with Hive DDL (or
> > > > others,
> > > > > e.g. Beam DDL) easier.
> > > > >
> > > > > I'll run a final pass over the design doc and finalize the design
> in
> > > the
> > > > > next few days. And we can start creating tasks and collaborate on
> the
> > > > > implementation. Thanks a lot for all the comments and inputs.
> > > > >
> > > > > Cheers!
> > > > > Shuyi
> > > > >
> > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <
> > xuef...@alibaba-inc.com>
> > > > > wrote:
> > > > >
> > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being
> > > blocked
> > > > > by
> > > > > > connector API. We can leave the unknown out while defining the
> > basic
> > > > > syntax.
> > > > > >
> > > > > > @Shuyi
> > > > > >
> > > > > > As commented in the doc, I think we can probably stick with
> simple
> > > > syntax
> > > > > > with general properties, without extending the syntax too much
> that
> > > it
> > > > > > mimics the descriptor API.
> > > > > >
> > > > > > Part of our effort on Flink-Hive integration is also to make DDL
> > > syntax
> > > > > > compatible with Hive's. The one in the current proposal seems
> > making
> > > > our
> > > > > > effort more challenging.
> > > > > >
> > > > > > We can help and collaborate. At this moment, I think we can
> &

Re: [DISCUSS] Flink SQL DDL Design

2018-11-27 Thread Shuyi Chen
Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc
first and start the implementation without the unified connector API being
ready, by skipping some features.

Xuefu, I like the idea of turning Flink-specific properties into generic
key-value pairs, which will make integration with Hive DDL (or others,
e.g. Beam DDL) easier.

I'll run a final pass over the design doc and finalize the design in the
next few days. And we can start creating tasks and collaborate on the
implementation. Thanks a lot for all the comments and inputs.

Cheers!
Shuyi

On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu 
wrote:

> Yeah! I agree with Timo that DDL can actually proceed w/o being blocked by
> connector API. We can leave the unknown out while defining the basic syntax.
>
> @Shuyi
>
> As commented in the doc, I think we can probably stick with simple syntax
> with general properties, without extending the syntax too much that it
> mimics the descriptor API.
>
> Part of our effort on Flink-Hive integration is also to make DDL syntax
> compatible with Hive's. The one in the current proposal seems making our
> effort more challenging.
>
> We can help and collaborate. At this moment, I think we can finalize on
> the proposal and then we can divide the tasks for better collaboration.
>
> Please let me know if there are  any questions or suggestions.
>
> Thanks,
> Xuefu
>
>
>
>
> --
> Sender:Timo Walther 
> Sent at:2018 Nov 27 (Tue) 16:21
> Recipient:dev 
> Subject:Re: [DISCUSS] Flink SQL DDL Design
>
> Thanks for offering your help here, Xuefu. It would be great to move
> these efforts forward. I agree that the DDL is somehow releated to the
> unified connector API design but we can also start with the basic
> functionality now and evolve the DDL during this release and next releases.
>
> For example, we could identify the MVP DDL syntax that skips defining
> key constraints and maybe even time attributes. This DDL could be used
> for batch usecases, ETL, and materializing SQL queries (no time
> operations like windows).
>
> The unified connector API is high on our priority list for the 1.8
> release. I will try to update the document until mid of next week.
>
>
> Regards,
>
> Timo
>
>
> Am 27.11.18 um 08:08 schrieb Shuyi Chen:
> > Thanks a lot, Xuefu. I was busy for some other stuff for the last 2
> weeks,
> > but we are definitely interested in moving this forward. I think once the
> > unified connector API design [1] is done, we can finalize the DDL design
> as
> > well and start creating concrete subtasks to collaborate on the
> > implementation with the community.
> >
> > Shuyi
> >
> > [1]
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> >
> > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu 
> > wrote:
> >
> >> Hi Shuyi,
> >>
> >> I'm wondering if you folks still have the bandwidth working on this.
> >>
> >> We have some dedicated resource and like to move this forward. We can
> >> collaborate.
> >>
> >> Thanks,
> >>
> >> Xuefu
> >>
> >>
> >> --
> >> 发件人:wenlong.lwl
> >> 日 期:2018年11月05日 11:15:35
> >> 收件人:
> >> 主 题:Re: [DISCUSS] Flink SQL DDL Design
> >>
> >> Hi, Shuyi, thanks for the proposal.
> >>
> >> I have two concerns about the table ddl:
> >>
> >> 1. how about remove the source/sink mark from the ddl, because it is not
> >> necessary, the framework determine the table referred is a source or a
> sink
> >> according to the context of the query using the table. it will be more
> >> convenient for use defining a table which can be both a source and sink,
> >> and more convenient for catalog to persistent and manage the meta infos.
> >>
> >> 2. how about just keeping one pure string map as parameters for table,
> like
> >> create tabe Kafka10SourceTable (
> >> intField INTEGER,
> >> stringField VARCHAR(128),
> >> longField BIGINT,
> >> rowTimeField TIMESTAMP
> >> ) with (
> >> connector.type = ’kafka’,
> >> connector.property-version = ’1’,
> >> connector.version = ’0.10’,
> >> connector.properties.topic = ‘test-kafka-topic’,
> >> connector.properties.startup-mode = ‘latest-offset’,
> >> connector.properties.specific-offset = ‘offset’,
> >> format.type = 'json'
> >> format.prperties.version=’1’,
> >> fo

Re: [DISCUSS] Flink SQL DDL Design

2018-11-26 Thread Shuyi Chen
Thanks a lot, Xuefu. I was busy with other things for the last two weeks,
but we are definitely interested in moving this forward. I think once the
unified connector API design [1] is done, we can finalize the DDL design as
well and start creating concrete subtasks to collaborate on the
implementation with the community.

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu 
wrote:

> Hi Shuyi,
>
> I'm wondering if you folks still have the bandwidth working on this.
>
> We have some dedicated resource and like to move this forward. We can
> collaborate.
>
> Thanks,
>
> Xuefu
>
>
> --
> 发件人:wenlong.lwl
> 日 期:2018年11月05日 11:15:35
> 收件人:
> 主 题:Re: [DISCUSS] Flink SQL DDL Design
>
> Hi, Shuyi, thanks for the proposal.
>
> I have two concerns about the table ddl:
>
> 1. how about remove the source/sink mark from the ddl, because it is not
> necessary, the framework determine the table referred is a source or a sink
> according to the context of the query using the table. it will be more
> convenient for use defining a table which can be both a source and sink,
> and more convenient for catalog to persistent and manage the meta infos.
>
> 2. how about just keeping one pure string map as parameters for table, like
> create tabe Kafka10SourceTable (
> intField INTEGER,
> stringField VARCHAR(128),
> longField BIGINT,
> rowTimeField TIMESTAMP
> ) with (
> connector.type = ’kafka’,
> connector.property-version = ’1’,
> connector.version = ’0.10’,
> connector.properties.topic = ‘test-kafka-topic’,
> connector.properties.startup-mode = ‘latest-offset’,
> connector.properties.specific-offset = ‘offset’,
> format.type = 'json'
> format.prperties.version=’1’,
> format.derive-schema = 'true'
> );
> Because:
> 1. in TableFactory, what user use is a string map properties, defining
> parameters by string-map can be the closest way to mapping how user use the
> parameters.
> 2. The table descriptor can be extended by user, like what is done in Kafka
> and Json, it means that the parameter keys in connector or format can be
> different in different implementation, we can not restrict the key in a
> specified set, so we need a map in connector scope and a map in
> connector.properties scope. why not just give user a single map, let them
> put parameters in a format they like, which is also the simplest way to
> implement DDL parser.
> 3. whether we can define a format clause or not, depends on the
> implementation of the connector, using different clause in DDL may make a
> misunderstanding that we can combine the connectors with arbitrary formats,
> which may not work actually.
>
> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński  wrote:
>
> > +1, Thanks for the proposal.
> >
> > I guess this is a long-awaited change. This can vastly increase the
> > functionalities of the SQL Client as it will be possible to use complex
> > extensions like for example those provided by Apache Bahir[1].
> >
> > Best Regards,
> > Dom.
> >
> > [1]
> > https://github.com/apache/bahir-flink
> >
> > On Sat, 3 Nov 2018 at 17:17, Rong Rong wrote:
> >
> > > +1. Thanks for putting the proposal together Shuyi.
> > >
> > > DDL has been brought up in a couple of times previously [1,2].
> Utilizing
> > > DDL will definitely be a great extension to the current Flink SQL to
> > > systematically support some of the previously brought up features such
> as
> > > [3]. And it will also be beneficial to see the document closely aligned
> > > with the previous discussion for unified SQL connector API [4].
> > >
> > > I also left a few comments on the doc. Looking forward to the alignment
> > > with the other couple of efforts and contributing to them!
> > >
> > > Best,
> > > Rong
> > >
> > > [1]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> > > [2]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> > >
> > > [3] https://issues.apache.org/jira/browse/FLINK-8003
> > > [4]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
> > >
> > >
> > > On Fri, Nov 2, 2018 at 10:22 A

Re: [DISCUSS] Flink SQL DDL Design

2018-11-26 Thread Shuyi Chen
/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> > > [2]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> > >
> > > [3] https://issues.apache.org/jira/browse/FLINK-8003
> > > [4]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
> > >
> > >
> > > On Fri, Nov 2, 2018 at 10:22 AM Bowen Li  wrote:
> > >
> > > > Thanks Shuyi!
> > > >
> > > > I left some comments there. I think the design of SQL DDL and
> > Flink-Hive
> > > > integration/External catalog enhancements will work closely with each
> > > > other. Hope we are well aligned on the directions of the two designs,
> > > and I
> > > > look forward to working with you guys on both!
> > > >
> > > > Bowen
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen 
> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > SQL DDL support has been a long-time ask from the community.
> Current
> > > > Flink
> > > > > SQL support only DML (e.g. SELECT and INSERT statements). In its
> > > current
> > > > > form, Flink SQL users still need to define/create table sources and
> > > sinks
> > > > > programmatically in Java/Scala. Also, in SQL Client, without DDL
> > > support,
> > > > > the current implementation does not allow dynamical creation of
> > table,
> > > > type
> > > > > or functions with SQL, this adds friction for its adoption.
> > > > >
> > > > > I drafted a design doc [1] with a few other community members that
> > > > proposes
> > > > > the design and implementation for adding DDL support in Flink. The
> > > > initial
> > > > > design considers DDL for table, view, type, library and function.
> It
> > > will
> > > > > be great to get feedback on the design from the community, and
> align
> > > with
> > > > > latest effort in unified SQL connector API  [2] and Flink Hive
> > > > integration
> > > > > [3].
> > > > >
> > > > > Any feedback is highly appreciated.
> > > > >
> > > > > Thanks
> > > > > Shuyi Chen
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > > > [3]
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
> > > > > --
> > > > > "So you have to trust that the dots will somehow connect in your
> > > future."
> > > > >
> > > >
> > >
> >
>


-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-10848) Flink's Yarn ResourceManager can allocate too many excess containers

2018-11-10 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-10848:
--

 Summary: Flink's Yarn ResourceManager can allocate too many excess 
containers
 Key: FLINK-10848
 URL: https://issues.apache.org/jira/browse/FLINK-10848
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.6.2, 1.5.5, 1.4.2, 1.3.3
Reporter: Shuyi Chen
Assignee: Shuyi Chen


Currently, both the YarnFlinkResourceManager and YarnResourceManager do not 
call removeContainerRequest() on container allocation success. Because the YARN 
AM-RM protocol is not a delta protocol (please see YARN-1902), AMRMClient will 
keep all ContainerRequests that are added and send them to RM.

In production, we observed the following behavior, which verifies the theory: 16 containers 
are allocated and used upon cluster startup; when a TM is killed, 17 containers 
are allocated, 1 container is used, and 16 excess containers are returned; when 
another TM is killed, 18 containers are allocated, 1 container is used, and 17 
excess containers are returned.
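
A minimal sketch of the kind of fix this implies, assuming the Hadoop
AMRMClientAsync API (the resourceManagerClient field and the launch step are
hypothetical, and this is not the actual Flink patch):

import java.util.Collection;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Inside the AMRMClientAsync.CallbackHandler of the YARN resource manager,
// where resourceManagerClient is an AMRMClientAsync<ContainerRequest> field.
@Override
public void onContainersAllocated(List<Container> containers) {
  for (Container container : containers) {
    // Find pending requests that match this container and forget exactly one of them,
    // so the client stops re-sending already-satisfied requests on every heartbeat.
    List<? extends Collection<ContainerRequest>> matching =
        resourceManagerClient.getMatchingRequests(
            container.getPriority(), ResourceRequest.ANY, container.getResource());
    if (!matching.isEmpty() && !matching.get(0).isEmpty()) {
      resourceManagerClient.removeContainerRequest(matching.get(0).iterator().next());
    }
    // ... then launch the TaskManager in the container as before ...
  }
}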



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[DISCUSS] Flink SQL DDL Design

2018-11-01 Thread Shuyi Chen
Hi everyone,

SQL DDL support has been a long-time ask from the community. Currently, Flink
SQL supports only DML (e.g. SELECT and INSERT statements). In its current
form, Flink SQL users still need to define/create table sources and sinks
programmatically in Java/Scala. Also, in the SQL Client, without DDL support,
the current implementation does not allow dynamic creation of tables, types,
or functions with SQL, which adds friction to its adoption.

I drafted a design doc [1] with a few other community members that proposes
the design and implementation for adding DDL support in Flink. The initial
design considers DDL for table, view, type, library and function. It will
be great to get feedback on the design from the community, and align with
latest effort in unified SQL connector API  [2] and Flink Hive integration
[3].
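
As a rough illustration of the friction described above: today a table source has
to be built and registered programmatically, while the proposed DDL would let the
same definition be expressed in SQL. The snippet below is only a sketch: the CREATE
TABLE grammar is not finalized, the myKafkaJsonTableSource variable is assumed to be
built elsewhere, and passing DDL strings to the table environment is hypothetical
until this work lands.

// Today: programmatic registration in Java/Scala.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
tableEnv.registerTableSource("orders", myKafkaJsonTableSource);

// With DDL support (hypothetical syntax, grammar not finalized):
String ddl =
    "CREATE TABLE orders (" +
    "  orderId BIGINT," +
    "  product VARCHAR(64)" +
    ") WITH (" +
    "  connector.type = 'kafka'" +
    ")";
// e.g. handed to something like tableEnv.sqlUpdate(ddl) once DDL is supported.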

Any feedback is highly appreciated.

Thanks
Shuyi Chen

[1]
https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
[2]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
[3]
https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-31 Thread Shuyi Chen
Hi Xuefu,

Thanks a lot for driving this big effort. I would suggest converting your
proposal and design doc into a Google doc and sharing it on the dev mailing
list for the community to review and comment, with a title like "[DISCUSS] ...
Hive integration design ...". Once approved, we can document it as a FLIP
(Flink Improvement Proposal) and use JIRAs to track the implementation.
What do you think?

Shuyi

On Tue, Oct 30, 2018 at 11:32 AM Zhang, Xuefu 
wrote:

> Hi all,
>
> I have also shared a design doc on Hive metastore integration that is
> attached here and also to FLINK-10556[1]. Please kindly review and share
> your feedback.
>
>
> Thanks,
> Xuefu
>
> [1] https://issues.apache.org/jira/browse/FLINK-10556
>
> --
> Sender:Xuefu 
> Sent at:2018 Oct 25 (Thu) 01:08
> Recipient:Xuefu ; Shuyi Chen 
> Cc:yanghua1127 ; Fabian Hueske ;
> dev ; user 
> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>
> Hi all,
>
> To wrap up the discussion, I have attached a PDF describing the proposal,
> which is also attached to FLINK-10556 [1]. Please feel free to watch that
> JIRA to track the progress.
>
> Please also let me know if you have additional comments or questions.
>
> Thanks,
> Xuefu
>
> [1] https://issues.apache.org/jira/browse/FLINK-10556
>
>
> --
> Sender:Xuefu 
> Sent at:2018 Oct 16 (Tue) 03:40
> Recipient:Shuyi Chen 
> Cc:yanghua1127 ; Fabian Hueske ;
> dev ; user 
> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>
> Hi Shuyi,
>
> Thank you for your input. Yes, I agreed with a phased approach and like to
> move forward fast. :) We did some work internally on DDL utilizing babel
> parser in Calcite. While babel makes Calcite's grammar extensible, at
> first impression it still seems too cumbersome for a project when too
> many extensions are made. It's even challenging to find where the extension
> is needed! It would be certainly better if Calcite can magically support
> Hive QL by just turning on a flag, such as that for MYSQL_5. I can also
> see that this could mean a lot of work on Calcite. Nevertheless, I will
> bring up the discussion over there and to see what their community thinks.
>
> Would you mind sharing more info about the proposal on DDL that you
> mentioned? We can certainly collaborate on this.
>
> Thanks,
> Xuefu
>
> --
> Sender:Shuyi Chen 
> Sent at:2018 Oct 14 (Sun) 08:30
> Recipient:Xuefu 
> Cc:yanghua1127 ; Fabian Hueske ;
> dev ; user 
> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>
> Welcome to the community and thanks for the great proposal, Xuefu! I think
> the proposal can be divided into 2 stages: making Flink support Hive
> features, and making Hive work with Flink. I agree with Timo on
> starting with a smaller scope, so we can make progress faster. As for [6],
> a proposal for DDL is already in progress, and will come after the unified
> SQL connector API is done. For supporting Hive syntax, we might need to
> work with the Calcite community, and a recent effort called babel (
> https://issues.apache.org/jira/browse/CALCITE-2280) in Calcite might help
> here.
>
> Thanks
> Shuyi
>
> On Wed, Oct 10, 2018 at 8:02 PM Zhang, Xuefu 
> wrote:
> Hi Fabian/Vno,
>
> Thank you very much for your encouragement and inquiry. Sorry that I didn't
> see Fabian's email until I read Vino's response just now. (Somehow Fabian's
> went to the spam folder.)
>
> My proposal contains long-term and short-terms goals. Nevertheless, the
> effort will focus on the following areas, including Fabian's list:
>
> 1. Hive metastore connectivity - This covers both read/write access, which
> means Flink can make full use of Hive's metastore as its catalog (at least
> for the batch but can extend for streaming as well).
> 2. Metadata compatibility - Objects (databases, tables, partitions, etc)
> created by Hive can be understood by Flink and the reverse direction is
> true also.
> 3. Data compatibility - Similar to #2, data produced by Hive can be
> consumed by Flink and vice versa.
> 4. Support Hive UDFs - For all Hive's native udfs, Flink either provides
> its own implementation or make Hive's implementation work in Flink.
> Further, for user created UDFs in Hive, Flink SQL should provide a
> mechanism allowing user to import them into Flink without any code change
> required.
> 5. Data types -  Flink SQL should support all data types that are
> available in Hive.
> 6. SQL Language - Flink SQL should support SQL standard 

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-13 Thread Shuyi Chen
Welcome to the community and thanks for the great proposal, Xuefu! I think
the proposal can be divided into 2 stages: making Flink support Hive
features, and making Hive work with Flink. I agree with Timo on
starting with a smaller scope, so we can make progress faster. As for [6],
a proposal for DDL is already in progress, and will come after the unified
SQL connector API is done. For supporting Hive syntax, we might need to
work with the Calcite community, and a recent effort called babel (
https://issues.apache.org/jira/browse/CALCITE-2280) in Calcite might help
here.

Thanks
Shuyi

On Wed, Oct 10, 2018 at 8:02 PM Zhang, Xuefu 
wrote:

> Hi Fabian/Vno,
>
> Thank you very much for your encouragement and inquiry. Sorry that I didn't
> see Fabian's email until I read Vino's response just now. (Somehow Fabian's
> went to the spam folder.)
>
> My proposal contains long-term and short-terms goals. Nevertheless, the
> effort will focus on the following areas, including Fabian's list:
>
> 1. Hive metastore connectivity - This covers both read/write access, which
> means Flink can make full use of Hive's metastore as its catalog (at least
> for the batch but can extend for streaming as well).
> 2. Metadata compatibility - Objects (databases, tables, partitions, etc)
> created by Hive can be understood by Flink and the reverse direction is
> true also.
> 3. Data compatibility - Similar to #2, data produced by Hive can be
> consumed by Flink and vice versa.
> 4. Support Hive UDFs - For all Hive's native udfs, Flink either provides
> its own implementation or make Hive's implementation work in Flink.
> Further, for user created UDFs in Hive, Flink SQL should provide a
> mechanism allowing user to import them into Flink without any code change
> required.
> 5. Data types -  Flink SQL should support all data types that are
> available in Hive.
> 6. SQL Language - Flink SQL should support SQL standard (such as SQL2003)
> with extension to support Hive's syntax and language features, around DDL,
> DML, and SELECT queries.
> 7.  SQL CLI - this is currently developing in Flink but more effort is
> needed.
> 8. Server - provide a server that's compatible with Hive's HiverServer2 in
> thrift APIs, such that HiveServer2 users can reuse their existing client
> (such as beeline) but connect to Flink's thrift server instead.
> 9. JDBC/ODBC drivers - Flink may provide its own JDBC/ODBC drivers for
> other application to use to connect to its thrift server
> 10. Support other user's customizations in Hive, such as Hive Serdes,
> storage handlers, etc.
> 11. Better task failure tolerance and task scheduling at Flink runtime.
>
> As you can see, achieving all those requires significant effort and across
> all layers in Flink. However, a short-term goal could  include only core
> areas (such as 1, 2, 4, 5, 6, 7) or start  at a smaller scope (such as #3,
> #6).
>
> Please share your further thoughts. If we generally agree that this is the
> right direction, I could come up with a formal proposal quickly and then we
> can follow up with broader discussions.
>
> Thanks,
> Xuefu
>
>
>
> --
> Sender:vino yang 
> Sent at:2018 Oct 11 (Thu) 09:45
> Recipient:Fabian Hueske 
> Cc:dev ; Xuefu ; user <
> u...@flink.apache.org>
> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>
> Hi Xuefu,
>
> Appreciate this proposal, and like Fabian, it would look better if you can
> give more details of the plan.
>
> Thanks, vino.
>
> On Wed, Oct 10, 2018 at 5:27 PM, Fabian Hueske wrote:
> Hi Xuefu,
>
> Welcome to the Flink community and thanks for starting this discussion!
> Better Hive integration would be really great!
> Can you go into details of what you are proposing? I can think of a couple
> ways to improve Flink in that regard:
>
> * Support for Hive UDFs
> * Support for Hive metadata catalog
> * Support for HiveQL syntax
> * ???
>
> Best, Fabian
>
> On Tue, Oct 9, 2018 at 19:22, Zhang, Xuefu <
> xuef...@alibaba-inc.com> wrote:
> Hi all,
>
> Along with the community's effort, inside Alibaba we have explored Flink's
> potential as an execution engine not just for stream processing but also
> for batch processing. We are encouraged by our findings and have initiated
> our effort to make Flink's SQL capabilities full-fledged. When comparing
> what's available in Flink to the offerings from competitive data processing
> engines, we identified a major gap in Flink: a well integration with Hive
> ecosystem. This is crucial to the success of Flink SQL and batch due to the
> well-established data ecosystem around Hive. Therefore, we have done some
> initial work along this direction but there are still a lot of effort
> needed.
>
> We have two strategies in mind. The first one is to make Flink SQL
> full-fledged and well-integrated with Hive ecosystem. This is a similar
> approach to what Spark SQL adopted. The second strategy is to make Hive
> itself work with Flink, similar 

Re: Creating a slide set for a Flink intro talk

2018-10-12 Thread Shuyi Chen
Thanks a lot, Fabian. That's very useful. I made a quick pass and made a
few suggestions. I like the use cases/users section. One general comment I
have is that the slides seem to emphasize streaming in the first overview
slides, but only mention batch here and there, not consistently. In my
opinion, we should make this consistent and mention batch at the
beginning as well.

Thanks
Shuyi

On Fri, Oct 12, 2018 at 2:32 PM Fabian Hueske  wrote:

> Hi everybody,
>
> I'm currently creating a slide set for a Flink intro talk [1].
>
> The content will be mostly based on pages of the recently updated website
> * Main page [2]
> * What is Apache Flink? [3]
> * Use cases [4]
> * Powered By [5]
>
> The idea is to have a good set of slides that anybody can use to give a
> Flink intro talk.
> I'm doing the slides in Google Slides [1].
> So everybody can also make improvement suggestions.
>
> I'm not done yet, but comments are already welcome.
>
> Cheers,
> Fabian
>
> [1]
>
> https://docs.google.com/presentation/d/1GVdd11wZZdNW1PpIpHaWnhr-Q3L0DvmGD0lE42IcSR8
> [2] https://flink.apache.org
> [3] https://flink.apache.org/flink-architecture.html
> [4] https://flink.apache.org/usecases.html
> [5] https://flink.apache.org/poweredby.html
>


-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-10516) YarnApplicationMasterRunner fail to initialize FileSystem with correct Flink Configuration during setup

2018-10-09 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-10516:
--

 Summary: YarnApplicationMasterRunner fail to initialize FileSystem 
with correct Flink Configuration during setup
 Key: FLINK-10516
 URL: https://issues.apache.org/jira/browse/FLINK-10516
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.6.0, 1.5.0, 1.4.0, 1.7.0
Reporter: Shuyi Chen
Assignee: Shuyi Chen
 Fix For: 1.7.0


Will add a fix and refactor YarnApplicationMasterRunner to add a unit test to 
prevent future regressions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Improvements to the Unified SQL Connector API

2018-10-04 Thread Shuyi Chen
In the case of a normal Flink job, I agree we can infer the table type from
the queries. However, for the SQL Client, the query is ad hoc and not known
beforehand. In such cases, we might want to enforce the table open mode at
startup time, so users won't accidentally write to a Kafka topic that is
supposed to be written only by some producer. What do you guys think?

Shuyi

On Thu, Oct 4, 2018 at 7:31 AM Hequn Cheng  wrote:

> Hi,
>
> Thanks a lot for the proposal. I like the idea to unify table definitions.
> I think we can drop the table type since the type can be derived from the
> sql, i.e, a table be inserted can only be a sink table.
>
> I left some minor suggestions in the document, mainly include:
> - Maybe we also need to allow define properties for tables.
> - Support specify Computed Columns in a table
> - Support define keys for sources.
>
> Best, Hequn
>
>
> On Thu, Oct 4, 2018 at 4:09 PM Shuyi Chen  wrote:
>
> > Thanks a lot for the proposal, Timo. I left a few comments. Also, it
> seems
> > the example in the doc does not have the table type (source, sink and
> both)
> > property anymore. Are you suggesting drop it? I think the table type
> > properties is still useful as it can restrict a certain connector to be
> > only source/sink, for example, we usually want a Kafka topic to be either
> > read-only or write-only, but not both.
> >
> > Shuyi
> >
> > On Mon, Oct 1, 2018 at 1:53 AM Timo Walther  wrote:
> >
> > > Hi everyone,
> > >
> > > as some of you might have noticed, in the last two releases we aimed to
> > > unify SQL connectors and make them more modular. The first connectors
> > > and formats have been implemented and are usable via the SQL Client and
> > > Java/Scala/SQL APIs.
> > >
> > > However, after writing more connectors/example programs and talking to
> > > users, there are still a couple of improvements that should be applied
> > > to unified SQL connector API.
> > >
> > > I wrote a design document [1] that discusses limitations that I have
> > > observed and considers feedback that I have collected over the last
> > > months. I don't know whether we will implement all of these
> > > improvements, but it would be great to get feedback for a satisfactory
> > > API and for future priorization.
> > >
> > > The general goal should be to connect to external systems as convenient
> > > and type-safe as possible. Any feedback is highly appreciated.
> > >
> > > Thanks,
> > >
> > > Timo
> > >
> > > [1]
> > >
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > >
> > >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>


-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Improvements to the Unified SQL Connector API

2018-10-04 Thread Shuyi Chen
Thanks a lot for the proposal, Timo. I left a few comments. Also, it seems
the example in the doc does not have the table type (source, sink and both)
property anymore. Are you suggesting we drop it? I think the table type
property is still useful as it can restrict a certain connector to be
only a source/sink; for example, we usually want a Kafka topic to be either
read-only or write-only, but not both.

Shuyi

On Mon, Oct 1, 2018 at 1:53 AM Timo Walther  wrote:

> Hi everyone,
>
> as some of you might have noticed, in the last two releases we aimed to
> unify SQL connectors and make them more modular. The first connectors
> and formats have been implemented and are usable via the SQL Client and
> Java/Scala/SQL APIs.
>
> However, after writing more connectors/example programs and talking to
> users, there are still a couple of improvements that should be applied
> to unified SQL connector API.
>
> I wrote a design document [1] that discusses limitations that I have
observed and considers feedback that I have collected over the last
> months. I don't know whether we will implement all of these
> improvements, but it would be great to get feedback for a satisfactory
> API and for future priorization.
>
> The general goal should be to connect to external systems as convenient
> and type-safe as possible. Any feedback is highly appreciated.
>
> Thanks,
>
> Timo
>
> [1]
>
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
>
>

-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints

2018-08-22 Thread Shuyi Chen
+1 on the tooling. Also, you mentioned the state bootstrapping problem.
Could you please elaborate on how we can leverage the tooling to solve
state bootstrapping? I think this is a common problem in stream processing,
and it would be great if the community can work on it. Thanks.

Shuyi

On Wed, Aug 22, 2018 at 11:51 AM Gyula Fóra  wrote:

> Thanks,
>
> I guess the first thing that would be great help from anyone interested in
> helping is to try it for some streaming state :)
>
> We have tested these tools at King to analyze, transform and perform some
> aggregations on our user-states. The major limitation is that it requires
> RocksDB savepoints to work but other than that we successfully analyzed a
> few hundred gigabytes of state including reading keyed, and broadcast
> states from different operators. Also you need to have a savepoint before
> you can create a new savepoint (with whatever state).
>
> Once we have some people who have played with it we can probably greatly
> improve the API and user experience as it is pretty low level at the
> moment. I suggest we use the King git repo 
> for
> now to track some features before it is in a shape that deserves a Flink
> PR. We are super happy to take any improvements, code contributions from
> anyone so dont hesitate to reach out to me if you have some ideas.
>
> Gyula
>
>
> On Wed, Aug 22, 2018 at 17:06, Rong Rong wrote:
>
> > +1. Being able to analyze the state is a huge operational advantage.
> > Thanks Gyula for the POC and I would be very interested in contributing
> to
> > the work.
> >
> > --
> > Rong
> >
> > On Tue, Aug 21, 2018 at 4:26 AM Till Rohrmann 
> > wrote:
> >
> > > big +1 for this feature. A tool to get your state out of and into Flink
> > > will be tremendously helpful.
> > >
> > > On Mon, Aug 20, 2018 at 10:21 AM Aljoscha Krettek  >
> > > wrote:
> > >
> > > > +1 I'd like to have something like this in Flink a lot!
> > > >
> > > > > On 19. Aug 2018, at 11:57, Gyula Fóra 
> wrote:
> > > > >
> > > > > Hi all!
> > > > >
> > > > > Thanks for the feedback and I'm happy there is some interest :)
> > > > > Tomorrow I will start improving the proposal based on the feedback
> > and
> > > > will
> > > > > get back to work.
> > > > >
> > > > > If you are interested working together in this please ping me and
> we
> > > can
> > > > > discuss some ideas/plans and how to share work.
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > On Sat, Aug 18, 2018 at 9:03, Paris Carbone wrote:
> > > > >
> > > > >> +1
> > > > >>
> > > > >> Might also be a good start to implement queryable stream state
> with
> > > > >> snapshot isolation using that mechanism.
> > > > >>
> > > > >> Paris
> > > > >>
> > > > >>> On 17 Aug 2018, at 12:28, Gyula Fóra 
> wrote:
> > > > >>>
> > > > >>> Hi All!
> > > > >>>
> > > > >>> I want to share with you a little project we have been working on
> > at
> > > > King
> > > > >>> (with some help from some dataArtisans folks). I think this would
> > be
> > > a
> > > > >>> valuable addition to Flink and solve a bunch of outstanding
> > > production
> > > > >>> use-cases and headaches around state bootstrapping and state
> > > analytics.
> > > > >>>
> > > > >>> We have built a quick and dirty POC implementation on top of
> Flink
> > > 1.6,
> > > > >>> please check the README for some nice examples to get a quick
> idea:
> > > > >>>
> > > > >>> https://github.com/king/bravo
> > > > >>>
> > > > >>> *Short story*
> > > > >>> Bravo is a convenient state reader and writer library leveraging
> > the
> > > > >>> Flink’s batch processing capabilities. It supports processing and
> > > > writing
> > > > >>> Flink streaming savepoints. At the moment it only supports
> > processing
> > > > >>> RocksDB savepoints but this can be extended in the future for
> other
> > > > state
> > > > >>> backends and checkpoint types.
> > > > >>>
> > > > >>> Our goal is to cover a few basic features:
> > > > >>>
> > > > >>>  - Converting keyed states to Flink DataSets for processing and
> > > > >> analytics
> > > > >>>  - Reading/Writing non-keyed operators states
> > > > >>>  - Bootstrap keyed states from Flink DataSets and create new
> valid
> > > > >>>  savepoints
> > > > >>>  - Transform existing savepoints by replacing/changing some
> states
> > > > >>>
> > > > >>>
> > > > >>> Some example use-cases:
> > > > >>>
> > > > >>>  - Point-in-time state analytics across all operators and keys
> > > > >>>  - Bootstrap state of a streaming job from external resources
> such
> > as
> > > > >>>  reading from database/filesystem
> > > > >>>  - Validate and potentially repair corrupted state of a streaming
> > job
> > > > >>>  - Change max parallelism of a job
> > > > >>>
> > > > >>>
> > > > >>> Our main goal is to start working together with other Flink
> > > production
> > > > >>> users and make this something useful that can be part of Flink.
> So
> > if
> > > > you
> > > > >>> have 

[jira] [Created] (FLINK-10187) Fix LogicalUnnestRule to match Correlate/Uncollect correctly

2018-08-21 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-10187:
--

 Summary: Fix LogicalUnnestRule to match Correlate/Uncollect 
correctly
 Key: FLINK-10187
 URL: https://issues.apache.org/jira/browse/FLINK-10187
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Affects Versions: 1.6.0
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10076) Upgrade Calcite dependency to 1.18

2018-08-06 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-10076:
--

 Summary: Upgrade Calcite dependency to 1.18
 Key: FLINK-10076
 URL: https://issues.apache.org/jira/browse/FLINK-10076
 Project: Flink
  Issue Type: Task
  Components: Table API  SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New committer: Sihua Zhou

2018-06-22 Thread Shuyi Chen
Congratulations!

On Fri, Jun 22, 2018 at 11:08 AM Matthias J. Sax  wrote:

>
> Congrats!
>
> On 6/22/18 10:33 AM, shimin yang wrote:
> > Congrats!
> >
> > On Sat, Jun 23, 2018 at 1:13 AM Chen Qin 
> > wrote:
> >
> >> Congrats!
> >>
> >>> On Jun 22, 2018, at 9:48 AM, Ted Yu 
> >>> wrote:
> >>>
> >>> Congratulations Sihua!
> >>>
>  On Fri, Jun 22, 2018 at 6:42 AM, zhangminglei
>  <18717838...@163.com>
> >> wrote:
> 
>  Congrats! Sihua
> 
>  Cheers Minglei.
> 
> > On 22 Jun 2018, at 9:17 PM, Till Rohrmann wrote:
> >
> > Hi everybody,
> >
> > On behalf of the PMC I am delighted to announce Sihua Zhou
> > as a new
> >> Flink
> > committer!
> >
> > Sihua has been an active member of our community for
> > several months.
>  Among
> > other things, he helped developing Flip-6, improved
> > Flink's state
>  backends
> > and fixed a lot of major and minor issues. Moreover, he is
> > helping the Flink community reviewing PRs, answering users
> > on the mailing list and proposing new features.
> >
> > Please join me in congratulating Sihua for becoming a
> > Flink committer!
> >
> > Cheers, Till
> 
> 
> 
> >>
> >
>
-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Adding new interfaces in [Stream]ExecutionEnvironment

2018-06-08 Thread Shuyi Chen
Thanks a lot for the comments, Till and Fabian.

The RemoteEnvironment does provide a way to specify jar files at
construction, but we want the jar files to be specified dynamically in the
user code, e.g. in a DDL statement, and the jar files might be in a remote
DFS. As we discussed, I think there are 2 approaches:

1) Add a new interface env.registerJarFile(jarFiles...), which ships the JAR
files using JobGraph.addJar(). In this case, all jars will be loaded by
default at runtime. This approach is the same as how the SQL Client ships
UDF jars today.
2) Add a new interface env.registerJarFile(name, jarFiles...). It will do
something similar to env.registerCachedFile(), registering a set of
JAR files under a key name, and we can add a new interface in
RuntimeContext as Fabian suggests, i.e.,
RuntimeContext.getClassloaderWithJar(). Users will then be able to
load the functions in remote jars dynamically using the returned ClassLoader.

Comparing the 2 approaches:

   - Approach 1) will be simpler for user to use.
   - Approach 2) will allow us to use different versions of a class in the
   same code, and might solve some dependency conflict issues. Also in 2), we
   can load Jars on demand, while in 1) all jars will be loaded by default.

I think we can support both interfaces. For the SQL DDL implementation, both
will work; approach 2) will be more complicated, but with the nice
benefits stated above. However, the implementation choice should be
transparent to the end user. Also, I am wondering whether, outside of the SQL
DDL, this new functionality/interface would be helpful in other scenarios.
If so, that will help make the interface better and more generic. Thanks a
lot.
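
To make the comparison concrete, here is a hypothetical usage sketch of approach 2).
None of these methods exist yet; registerJarFile, getClassloaderWithJar, the jar
path and the UDF class name below are illustrative assumptions only.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// In the main program: register a (possibly remote) jar under a key.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.registerJarFile("my-udfs", "hdfs:///libs/my-udfs-1.0.jar"); // hypothetical API

// In a rich function: obtain a classloader that includes the registered jar
// and load classes from it on demand.
public class MyMapper extends RichMapFunction<String, String> {
  private transient Object udf;

  @Override
  public void open(Configuration parameters) throws Exception {
    ClassLoader cl = getRuntimeContext().getClassloaderWithJar("my-udfs"); // hypothetical API
    udf = Class.forName("com.example.MyUdf", true, cl).newInstance();
  }

  @Override
  public String map(String value) throws Exception {
    return value; // delegate to the dynamically loaded udf as needed
  }
}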

Shuyi

On Tue, Jun 5, 2018 at 1:47 AM Fabian Hueske  wrote:

> We could also offer a feature that users can request classloaders with
> additional jars.
> This could work as follows:
>
> 1) Users register jar files in the ExecutionEnvironment (similar to cached
> files) with a name, e.g., env.registerJarFile("~/myJar.jar", "myName");
> 2) In a function, the user can request a user classloader with the
> additional classes, e.g., RuntimeContext.getClassloaderWithJar("myName");
> This could also support to load multiple jar files in the same classloader.
>
> IMO, the interesting part of Shuyi's proposal is to be able to dynamically
> load code from remote locations without fetching it to the client first.
>
> Best, Fabian
>
>
> 2018-05-29 12:42 GMT+02:00 Till Rohrmann :
>
> > I see Shuyi's point that it would nice to allow adding jar files which
> > should be part of the user code classloader programmatically. Actually,
> we
> > expose this functionality in the `RemoteEnvironment` where you can
> specify
> > additional jars which shall be shipped to the cluster in the
> constructor. I
> > assume that is exactly the functionality you are looking for. In that
> > sense, it might be an API inconsistency that we allow it for some cases
> and
> > for others not.
> >
> > But I could also see that the whole functionality of dynamically loading
> > jars at runtime could also perfectly live in the `UdfSqlOperator`. This,
> of
> > course, would entail that one has to take care of clean up of the
> > downloaded resources. But it should be possible to first download the
> > resources and create a custom URLClassLoader at startup and then use this
> > class loader when calling into the UDF.
> >
> > Cheers,
> > Till
> >
> > On Wed, May 16, 2018 at 9:28 PM, Shuyi Chen  wrote:
> >
> > > Hi Aljoscha, Fabian, Rong, Ted and Timo,
> > >
> > > Thanks a lot for the feedback. Let me clarify the usage scenario in a
> bit
> > > more detail. The context is that we want to add support for SQL DDL to
> > load
> > > UDF from external JARs located either in local filesystem or HDFS or a
> > HTTP
> > > endpoint in Flink SQL. The local FS option is more for debugging
> purpose
> > > for user to submit the job jar locally, and the later 2 are for
> > production
> > > uses. Below is an example User application with the *CREATE FUNCTION*
> DDL
> > > (Note: grammar and interface not finalized yet).
> > >
> > > 
> > > -
> > >
> > >
> > >
> > >
> > > val env = StreamExecutionEnvironment.getExecutionEnvironment
> > > val tEnv = TableEnvironment.getTableEnvironment(env)
> > > // setup the DataStream
> > > // ...
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> &

[jira] [Created] (FLINK-9523) Add Kafka examples for Flink Table/SQL API

2018-06-04 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9523:
-

 Summary: Add Kafka examples for Flink Table/SQL API 
 Key: FLINK-9523
 URL: https://issues.apache.org/jira/browse/FLINK-9523
 Project: Flink
  Issue Type: Task
  Components: Examples
Reporter: Shuyi Chen


Given the popularity of Flink SQL and Kafka as streaming source, we want to add 
some examples of using Kafka JSON/Avro TableSource in 
flink-examples/flink-examples-table module. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Ask for SQL using kafka in Flink

2018-06-04 Thread Shuyi Chen
Given the popularity of Flink SQL and Kafka as streaming source, I think we
can add some examples of using Kafka[XXX]TableSource in
flink-examples/flink-examples-table module. What do you guys think?
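
Until such examples exist, here is a rough sketch of what one might look like,
based on the Kafka09JsonTableSource constructor quoted further down in this thread.
It assumes Flink 1.5-era Table API classes; exact signatures (and whether the
constructor or a builder should be used) differ between versions, and the topic,
properties and field names are made up.

import java.util.Properties;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.Kafka09JsonTableSource;
import org.apache.flink.streaming.connectors.kafka.KafkaJsonTableSource;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.api.Types;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "example-group");

TypeInformation<Row> typeInfo = Types.ROW(
    new String[] { "iotdevice", "sensorID" },
    new TypeInformation<?>[] { Types.STRING(), Types.STRING() });
TableSchema schema = TableSchema.fromTypeInfo(typeInfo);

// topic, consumer properties, table schema, JSON schema
KafkaJsonTableSource source =
    new Kafka09JsonTableSource("test-kafka-topic", props, schema, schema);

tableEnv.registerTableSource("iotdevice", source);
Table result = tableEnv.sqlQuery("SELECT sensorID FROM iotdevice");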

Cheers
Shuyi

On Mon, Jun 4, 2018 at 12:57 AM, Timo Walther  wrote:

> Hi,
>
> as you can see in code [1] Kafka09JsonTableSource takes a TableSchema. You
> can create table schema from type information see [2].
>
> Regards,
> Timo
>
> [1] https://github.com/apache/flink/blob/master/flink-connectors
> /flink-connector-kafka-0.9/src/main/java/org/apache/
> flink/streaming/connectors/kafka/Kafka09JsonTableSource.java
> [2] https://github.com/apache/flink/blob/master/flink-libraries/
> flink-table/src/main/scala/org/apache/flink/table/api/TableSchema.scala
>
On 02.06.18 at 18:31, Radhya Sahal wrote:
>
> Thanks Rong,
>>
>> I used Flink 1.3.0. In the case of using Flink 1.5, how can I define the jsonSchema?
>>
>> Yes, there are two names but now I put one name only and I want to define
>> jsonschema.
>>
>>
>>
>> Rong Rong wrote
>>
>>> Hi Radhya,
>>>
>>> Can you provide which Flink version you are using? Based on the latest
>>> FLINK 1.5 release, Kafka09JsonTableSource takes:
>>>
>>> /**
>>>   * Creates a Kafka 0.9 JSON {@link StreamTableSource}.
>>>   *
>>>   * @param topic   Kafka topic to consume.
>>>   * @param properties  Properties for the Kafka consumer.
>>>   * @param tableSchema The schema of the table.
>>>   * @param jsonSchema  The schema of the JSON messages to decode from
>>> Kafka.
>>>   */
>>>
>>> Also, your type definition: TypeInformation typeInfo2 = Types.ROW(...
>>> arguments seem to have different length for schema names and types.
>>>
>>> Thanks,
>>> Rong
>>>
>>> On Fri, Jun 1, 2018 at 9:09 AM, Radhya Sahal 
>>> radhya.sahal@
>>>  wrote:
>>>
>>> Hi,

 Could anyone help me to solve this problem


 /Exception in thread "main" java.lang.Error: Unresolved compilation
 problem:
  The constructor Kafka09JsonTableSource(String, Properties,
 TypeInformation) is undefined/
 *--This is the code *
 public class FlinkKafkaSQL {
  public static void main(String[] args) throws Exception {
  // Read parameters from command line
  final ParameterTool params = ParameterTool.fromArgs(args);

  if(params.getNumberOfParameters() < 5) {
  System.out.println("\nUsage: FlinkReadKafka " +
 "--read-topic

>>> 
>>>   " +
>>>
 "--write-topic

>>> 
>>>   " +
>>>
 "--bootstrap.servers

>>> 
>>>   " +
>>>
 "zookeeper.connect" +
 "--group.id

>>> 
>>> ");
>>>
  return;
  }

  // setup streaming environment
  StreamExecutionEnvironment env =
 StreamExecutionEnvironment.getExecutionEnvironment();

 env.getConfig().setRestartStrategy(RestartStrategies.fixedDe
 layRestart(4,
 1));
  env.enableCheckpointing(30); // 300 seconds
  env.getConfig().setGlobalJobParameters(params);

  StreamTableEnvironment tableEnv =
 TableEnvironment.getTableEnvironment(env);

  // specify JSON field names and types


  TypeInformation typeInfo2 = Types.ROW(
  new String[] { "iotdevice", "sensorID" },
  new TypeInformation[] { Types.STRING()}
  );

  // create a new tablesource of JSON from kafka
  KafkaJsonTableSource kafkaTableSource = new
 Kafka09JsonTableSource(
  params.getRequired("read-topic"),
  params.getProperties(),
  typeInfo2);

  // run some SQL to filter results where a key is not null
  String sql = "SELECT sensorID " +
   "FROM iotdevice ";
  tableEnv.registerTableSource("iotdevice",
 kafkaTableSource);
  Table result = tableEnv.sql(sql);

  // create a partition for the data going into kafka
  FlinkFixedPartitioner partition =  new
 FlinkFixedPartitioner();

  // create new tablesink of JSON to kafka
  KafkaJsonTableSink kafkaTableSink = new
 Kafka09JsonTableSink(
  params.getRequired("write-topic"),
  params.getProperties(),
  partition);

  result.writeToSink(kafkaTableSink);

  env.execute("FlinkReadWriteKafkaJSON");
  }
 }


 *This is the dependencies  

Re: [SQL] [CALCITE] not applicable function for TIME

2018-05-31 Thread Shuyi Chen
I think you might find some context in these 2 PRs in Flink & Calcite
respectively:

https://issues.apache.org/jira/browse/CALCITE-1987
https://issues.apache.org/jira/browse/FLINK-7934

We have different EXTRACT implementation paths in Calcite and Flink. Hope
it helps.

On Thu, May 31, 2018 at 7:13 AM, Viktor Vlasov <
viktorvlasovsiber...@gmail.com> wrote:

> Thank you for quick response, ok, I'll do it
>
>
> 2018-05-31 17:01 GMT+03:00 Fabian Hueske :
>
> > Hi Viktor,
> >
> > Welcome to the Flink dev mailing list!
> > You are certainly right, this is an unexpected behavior and IMO we should
> > fix this.
> >
> > It would be great if you could open a JIRA issue for that and maybe also
> > dig a bit into the issue to figure out why this happens.
> >
> > Thank you,
> > Fabian
> >
> > 2018-05-31 15:53 GMT+02:00 Viktor Vlasov  >:
> >
> > > Hi there!​
> > >
> > > First of all I want to thank you for your time and efforts about this
> > > project.
> > >
> > > I am Software Engineer with almost 3 years experience, most of the
> time I
> > > work with Java related technologies.
> > >
> > > Recently I have started to consider possibility to contribute to Flink.
> > > For begin I chose this issue: https://issues.apache.org/
> > > jira/browse/FLINK-9432.
> > >
> > > After implementation I have faced with an interesting question. When I
> > was
> > > trying to decide what tests to create for the function DECADE in class
> > > org/apache/flink/table/expressions/validation/
> > > ScalarFunctionsValidationTest.scala
> > > I've figured out that such functions as CENTURY and MILLENNIUM work
> with
> > > TIME type without problems.  Here an examples:
> > > EXTRACT(CENTURY FROM TIME '00:00:00') - returns 0
> > > EXTRACT(MILLENNIUM FROM TIME '00:00:00') - returns 0
> > >
> > > It's strange by my opinion, time is not date and how we can extract
> such
> > > things from that.
> > >
> > > Meanwhile when I try to use similar logic in calcite, error is occured.
> > > Here an example:
> > > SELECT EXTRACT(CENTURY FROM TIME '00:00:00');
> > > throws `java.lang.AssertionError: unexpected TIME`
> > >
> > > Is it necessary to create separate issue for that?
> > >
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-9477) Support SQL 2016 JSON functions in Flink SQL

2018-05-30 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9477:
-

 Summary: Support SQL 2016 JSON functions in Flink SQL
 Key: FLINK-9477
 URL: https://issues.apache.org/jira/browse/FLINK-9477
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] GitBox

2018-05-16 Thread Shuyi Chen
+1 :) A lot of projects are already
using it.

On Wed, May 16, 2018 at 3:40 AM, Chesnay Schepler 
wrote:

> Hello,
>
> during the discussion about how to better manage pull requests [1] the
> topic of GitBox integration came up again.
>
> This seems like a good opportunity to restart this discussion that we had
> about a year ago [2].
>
> * What is GitBox
>
>Essentially, GitBox allow us to use GitHub features.
>We can decide for ourselves which features we want enabled.
>
>We could merge PRs directly on GitHub at the button of a click.
>That said the merge functionality is fairly limited and would
>require picture-perfect commits in the pull requests.
>Commits can be squashed, but you cannot amend commits in any way, be
>it fixing typos or changing the commit message. Realistically this
>limits how much we can use this feature, and it may lead to a
>decline in the quality of commit messages.
>
>Labels can be useful for the management of PRs as (ready for review,
>delayed for next release, waiting for changes). This is really what
>I'm personally most interested in.
>
>We've been using GitBox for flink-shaded for a while now and i
>didn't run into any issue. AFAIK GitBox is also the default for new
>projects.
>
> * What this means for committers:
>
>The apache git remote URL will change, which will require all
>committers to update their git setup.
>This also implies that we may have to update the website build scripts.
>The new URL would (probably) be
>/https://gitbox.apache.org/repos/asf/flink.git/.
>
>To make use of GitHub features you have to link your GitHub and
>Apache accounts. [3]
>This also requires setting up two-factor authentication on GitHub.
>
>Update the scm entry in the parent pom.xml.
>
> * What this means for contributors:
>
>Nothing should change for contributors. Small changes (like typos)
>may be merged more quickly, if the commit message is appropriate, as
>it could be done directly through GitHub.
>
> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.
> com/Closing-automatically-inactive-pull-requests-tt22248.html
> [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.
> com/DISCUSS-GitBox-td18027.html
> [3] https://gitbox.apache.org/setup/
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Adding new interfaces in [Stream]ExecutionEnvironment

2018-05-16 Thread Shuyi Chen
 the JobGraph as well for
> > the SQL Client.
> >
> > I think the execution environment is not the right place to specify jars.
> > The location of the jars depends on the submission method. If a local
> path
> > is specified in the main() method of a packaged Flink jar, it would not
> > work when such a program is submitted through the REST API.
> >
> > Regards,
> > Timo
> >
> > Am 16.05.18 um 14:32 schrieb Aljoscha Krettek:
> >
> > I think this functionality is already there, we just have to expose it in
> >> the right places: ClusterClient.submitJob() takes a JobGraph, JobGraph
> has
> >> method addJar() for adding jars that need to be in the classloader for
> >> executing a user program.
> >>
> >> On 16. May 2018, at 12:34, Fabian Hueske <fhue...@gmail.com> wrote:
> >>>
> >>> Hi Ted,
> >>>
> >>> The design doc is in late draft status and proposes support for SQL DDL
> >>> statements (CREATE TABLE, CREATE  FUNCTION, etc.).
> >>> The question about registering JARs came up because we need a way to
> >>> distribute JAR files that contain the code of user-defined functions.
> >>>
> >>> The design doc will soon be shared on the dev mailing list to gather
> >>> feedback from the community.
> >>>
> >>> Best, Fabian
> >>>
> >>> 2018-05-16 10:45 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:
> >>>
> >>> bq. In a design document, Timo mentioned that we can ship multiple JAR
> >>>> files
> >>>>
> >>>> Mind telling us where the design doc can be retrieved ?
> >>>>
> >>>> Thanks
> >>>>
> >>>> On Wed, May 16, 2018 at 1:29 AM, Fabian Hueske <fhue...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> Hi,
> >>>>>
> >>>>> I'm not sure if we need to modify the existing method.
> >>>>> What we need is a bit different from what registerCachedFile()
> >>>>> provides.
> >>>>> The method ensures that a file is copied to each TaskManager and can
> be
> >>>>> locally accessed from a function's RuntimeContext.
> >>>>> In our case, we don't need to access the file but would like to make
> >>>>> sure
> >>>>> that it is loaded into the class loader.
> >>>>> So, we could also just add a method like registerUserJarFile().
> >>>>>
> >>>>> In a design document, Timo mentioned that we can ship multiple JAR
> >>>>> files
> >>>>> with a job.
> >>>>> So, we could also implement the UDF shipping logic by loading the Jar
> >>>>> file(s) to the client and distribute them from there.
> >>>>> In that case, we would not need to add new method to the execution
> >>>>> environment.
> >>>>>
> >>>>> Best,
> >>>>> Fabian
> >>>>>
> >>>>> 2018-05-15 3:50 GMT+02:00 Rong Rong <walter...@gmail.com>:
> >>>>>
> >>>>> +1. This could be very useful for "dynamic" UDF.
> >>>>>>
> >>>>>> Just to clarify, if I understand correctly, we are tying to use an
> >>>>>> ENUM
> >>>>>> indicator to
> >>>>>> (1) Replace the current Boolean isExecutable flag.
> >>>>>> (2) Provide additional information used by ExecutionEnvironment to
> >>>>>>
> >>>>> decide
> >>>>
> >>>>> when/where to use the DistributedCached file.
> >>>>>>
> >>>>>> In this case, DistributedCache.CacheType or
> DistributedCache.FileType
> >>>>>> sounds more intuitive, what do you think?
> >>>>>>
> >>>>>> Also, I was wondering is there any other useful information for the
> >>>>>>
> >>>>> cached
> >>>>>
> >>>>>> file to be passed to runtime.
> >>>>>> If we are just talking about including the library to the
> classloader,
> >>>>>>
> >>>>> can
> >>>>>
> >>>>>> we directly extend the interface with
> >>>>>>
> >>>>>> public void registerCachedFile(
>

Re: [DISCUSS] Configuration for local recovery

2018-05-14 Thread Shuyi Chen
+1 to the proposal. IMO, the current option "ENABLE_FILE_BASED" contains
too many implementation details and might confuse casual users. Having
a simple on/off toggle for the majority of users and an advanced option for
experts to do the tuning will definitely make for a better user experience.

On Sun, May 13, 2018 at 12:33 PM, Till Rohrmann 
wrote:

> I agree with Stephan that a simple on/off configuration option for local
> recovery would be easier to understand and gives more flexibility wrt
> future changes.
>
> Cheers,
> Till
>
> On Sun, May 13, 2018 at 4:00 PM, sihua zhou  wrote:
>
> > +1 for @Stephan's proposal, it makes the out of the box experience better
> > and also leaves some space for the expert.
> >
> > Best,
> > Sihua
> >
> >
> >
> > On 05/12/2018 02:41,Stephan Ewen 
> > wrote:
> >
> > Hi!
> >
> > The configuration option (in flink-conf.yaml) for local recovery is
> > currently an enumeration with the values "DISABLED" and
> > "ENABLE_FILE_BASED".
> >
> > I would suggest to change that, for a few reasons:
> >
> > - Having values like "ENABLE_FILE_BASED" breaks with the style of the
> > other config options. Having a homogeneous feel for the configuration of
> > the system is important for ease of use.
> >
> > - Do we need to require users to understand what file-based local
> > recovery means? It might be easier for users to have an option to
> activate
> > deactivate the mode (on by default in the future) and if we need to have
> > different modes in the future, then we can have a "mode" option as an
> > "expert option". That way we expose the simple fact of whether to use
> local
> > recovery or not in a simple boolean, and hide the complex tuning part
> > (which hopefully few users ever need to touch) in a separate option.
> >
> > - Are we sure already whether options beyond "on/off" are shared across
> > state backends? For example, memory snapshot based local recovery would
> be
> > specific to the Memoy/FsStateBackend. Persistent-volume based local
> > recovery may behave differently for RocksDB and FsStateBackend.
> >
> >
> > ==>  This config option looks like it sets things up in a tricky
> direction.
> > We can still change it, now that we have not yet released it.
> >
> > Best,
> > Stephan
> >
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[DISCUSS] Adding new interfaces in [Stream]ExecutionEnvironment

2018-05-14 Thread Shuyi Chen
Hi Flink devs,

In an effort to support loading external libraries and creating UDFs from
external libraries using DDL in Flink SQL, we want to use Flink’s Blob
Server to distribute the external libraries in runtime and load those
libraries into the user code classloader automatically.

However, the current [Stream]ExecutionEnvironment.registerCachedFile
interface is limited to registering executable or non-executable blobs.
It’s not possible to tell at runtime whether the blob files are libraries that
should be loaded into the user code classloader in RuntimeContext.
Therefore, I want to propose to add an enum called *BlobType* explicitly to
indicate the type of the Blob file being distributed, and the following
interface in [Stream]ExecutionEnvironment to support it. In general, I
think the new BlobType information can be used by Flink runtime to
preprocess the Blob files if needed.

/**
 * Registers a file at the distributed cache under the given name. The file will be accessible
 * from any user-defined function in the (distributed) runtime under a local path. Files
 * may be local files (as long as all relevant workers have access to it), or files in a
 * distributed file system. The runtime will copy the files temporarily to a local cache,
 * if needed.
 *
 * The {@link org.apache.flink.api.common.functions.RuntimeContext} can be obtained inside UDFs
 * via {@link org.apache.flink.api.common.functions.RichFunction#getRuntimeContext()} and
 * provides access {@link org.apache.flink.api.common.cache.DistributedCache} via
 * {@link org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache()}.
 *
 * @param filePath The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")
 * @param name The name under which the file is registered.
 * @param blobType indicating the type of the Blob file
 */

public void registerCachedFile(String filePath, String name, DistributedCache.BlobType blobType) {...}

Optionally, we can add another interface to register UDF JARs, which will
be implemented using the interface above.

public void registerJarFile(String filePath, String name) {...}

The following existing interface will be marked deprecated:

public void registerCachedFile(String filePath, String name, boolean executable) {...}

And the following interface will be implemented using the new interface
proposed above with an EXECUTABLE BlobType:

public void registerCachedFile(String filePath, String name) { ... }
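
For illustration, here is a possible usage of the proposed interface. The HDFS path
is made up, and the LIBRARY enum value is an assumption on top of the EXECUTABLE
value mentioned above:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Ship a UDF jar from a remote DFS; the runtime would know from the BlobType that it
// should be added to the user code classloader.
env.registerCachedFile("hdfs://namenode:8020/libs/my-udfs.jar", "my-udfs",
    DistributedCache.BlobType.LIBRARY); // LIBRARY is an assumed enum value

// Existing executable-file behavior, expressed through the new enum.
env.registerCachedFile("file:///tmp/helper.sh", "helper",
    DistributedCache.BlobType.EXECUTABLE);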

Thanks a lot.
Shuyi

"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-9327) Support explicit ROW value constructor in Flink SQL

2018-05-09 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9327:
-

 Summary: Support explicit ROW value constructor in Flink SQL
 Key: FLINK-9327
 URL: https://issues.apache.org/jira/browse/FLINK-9327
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen


Currently, the explicit ROW value constructor can only be used in a VALUES() 
statement. The parser will fail if ROW is explicitly used in SELECT, WHERE, 
etc. [CALCITE-2276|https://issues.apache.org/jira/browse/CALCITE-2276] fixes the 
problem. We should integrate this as part of the Calcite 1.17 upgrade, and add unit 
tests for it in Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Two new committers: Xingcan Cui and Nico Kruber

2018-05-08 Thread Shuyi Chen
Congratulations!

On Tue, May 8, 2018 at 12:18 PM, Dawid Wysakowicz <
wysakowicz.da...@gmail.com> wrote:

> Congratulations Nico and Xingcan! Well deserved!
>
>
> On 08.05.2018 20:52, Fabian Hueske wrote:
> > Hi everyone,
> >
> > I'm happy to announce that two members of the Flink community accepted
> the
> > offer of the PMC to become committers.
> >
> > * Xingcan Cui has been contributing to Flink for about a year, focusing
> on
> > Flink's relational APIs (SQL & Table API). In the past year, Xingcan has
> > started design discussions, helped reviewing several pull requests, and
> > replied to questions on the user mailing list.
> >
> > * Nico Kruber is an active contributor since 1.5 years and worked mostly
> on
> > internal features, such as the blob manager and a new network stack. Nico
> > answers many questions on the user mailing list, reports lots of bugs and
> > is a very active PR reviewer.
> >
> > Please join me in congratulating Xingcan and Nico.
> >
> > Cheers,
> > Fabian
> >
>
>
>


-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Releasing Flink 1.5.0

2018-04-23 Thread Shuyi Chen
Hi Aljoscha and Till,

I've added 2 PRs to fix and harden the YARN Kerberos security for flip-6. I
think they should go in for 1.5.0 (particularly FLINK-8286).

1) https://github.com/apache/flink/pull/5896 (FLINK-8286): fixes the broken Kerberos
configuration for the YARN TaskExecutor.
2) https://github.com/apache/flink/pull/5901 (FLINK-9235): adds an integration test
for YARN security in both legacy and new modes.

Thanks a lot.
Shuyi

On Thu, Apr 12, 2018 at 9:17 AM, Christophe Jolif  wrote:

> Hi all,
>
> A small patch: https://github.com/apache/flink/pull/5789 (JIRA:
> https://issues.apache.org/jira/browse/FLINK-9103) was issued to help with
> SSL certificates in Kubernetes deployment where you don't control your IPs.
> As this is very small and helpful (at least to me and Edward who issued the
> fix), I was wondering if that could be considered for 1.5?
>
> Thanks,
> --
> Christophe
>
>
> On Mon, Mar 12, 2018 at 12:42 PM, Till Rohrmann 
> wrote:
>
> > Hi Pavel,
> >
> > currently, it is extremely difficult to say when it will happen since
> Flink
> > 1.5 includes some very big changes which need thorough testing. Depending
> > on that and what else the community finds on the way, it may go faster or
> > slower. Personally, I hope to finish the release until end of
> > March/beginning of April.
> >
> > Cheers,
> > Till
> >
> > On Thu, Mar 8, 2018 at 7:28 PM, Pavel Ciorba  wrote:
> >
> > > Approximately when is the release of Flink 1.5 planned?
> > >
> > > Best,
> > >
> > > 2018-03-01 11:20 GMT+02:00 Till Rohrmann :
> > >
> > > > Thanks for bringing this issue up Shashank. I think Aljoscha is
> taking
> > a
> > > > look at the issue. It looks like a serious bug which we should
> > definitely
> > > > fix. What I've heard so far is that it's not so trivial.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Mar 1, 2018 at 9:56 AM, shashank734 
> > > wrote:
> > > >
> > > > > Can we have
> > > > > https://issues.apache.org/jira/browse/FLINK-7756
> > > > >    solved in
> this
> > > > > version.
> > > > > Cause unable to use checkpointing with CEP and RocksDB backend.
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sent from: http://apache-flink-mailing-list-archive.1008284.n3.
> > > > nabble.com/
> > > > >
> > > >
> > >
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-9235) Add Integration test for Flink-Yarn-Kerberos integration for flip-6

2018-04-22 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9235:
-

 Summary: Add Integration test for Flink-Yarn-Kerberos integration 
for flip-6
 Key: FLINK-9235
 URL: https://issues.apache.org/jira/browse/FLINK-9235
 Project: Flink
  Issue Type: Test
Affects Versions: 1.5.0
Reporter: Shuyi Chen
Assignee: Shuyi Chen


We need to provide an integration test for flip-6 similar to 
YARNSessionFIFOSecuredITCase for the legacy deployment mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Support for out-of-the-box external catalog for SQL Client

2018-04-13 Thread Shuyi Chen
I've created a master JIRA (https://issues.apache.org/jira/browse/FLINK-9171)
and included all HCatalog-related JIRAs as subtasks. This makes it easier to
track all HCatalog-related effort in Flink. Thanks.

Shuyi

On Fri, Apr 13, 2018 at 12:36 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi everybody,
>
> An HCatalog integration with the Table API/SQL would be great and be
> helpful for many users!
>
> A big +1 to that.
>
> Thank you,
> Fabian
>
> Shuyi Chen <suez1...@gmail.com> schrieb am Mi., 11. Apr. 2018, 14:36:
>
> > Thanks a lot, Rong and Peter.
> >
> > AFAIK, there is a flink hcatalog connector introduced in FLINK-1466
> > <https://issues.apache.org/jira/browse/FLINK-1466> that is added by
> > Fabian.
> > And there is another JIRA in FLINK-1913
> > <https://issues.apache.org/jira/browse/FLINK-1913> to document the use
> of
> > connector.
> >
> > I think we can start with looking at the existing hcatalog  connector,
> > adding missing documentation, and come up with a proposal to evolve the
> > Flink HCatalog integration with ExternalCatalog, and the SQL client to
> make
> > it both useful both SQL and non-SQL scenarios.
> >
> > Given we already have the integration implemented in AthenaX
> > <https://github.com/uber/AthenaX> internally, we can help drive and
> > contribute back to the community.
> >
> > Shuyi
> >
> > On Wed, Apr 11, 2018 at 12:01 PM, Peter Huang <
> huangzhenqiu0...@gmail.com>
> > wrote:
> >
> > > Hi Rong,
> > >
> > > It is a good point out. I aligned with Fabian yesterday. It is a good
> > work
> > > that I can involve
> > > to contribute back to Apache Flink after having AthenaX backfill
> support
> > > internally.
> > >
> > >
> > > Best Regards
> > > Peter Huang
> > >
> > > On Wed, Apr 11, 2018 at 10:52 AM, Rong Rong <walter...@gmail.com>
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I was wondering if it is a good idea to support some external catalog
> > > > software, such as Apache HCatalog[2], out-of-the-box for the FLIP-24
> > > > SQL-Client[1]. There are many widely used catalogs that we can
> > > incorporate.
> > > > This way users won't have to always extend and create their own
> > > > ExternalCatalog.class separately and this could potentially make the
> > > > configuration part easier for SQL users.
> > > >
> > > > Thanks,
> > > > Rong
> > > >
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> > > 24+-+SQL+Client
> > > > [2] https://cwiki.apache.org/confluence/display/Hive/HCatalog
> > > >
> > >
> >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-9171) Flink HCatalog integration

2018-04-13 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9171:
-

 Summary: Flink HCatalog integration 
 Key: FLINK-9171
 URL: https://issues.apache.org/jira/browse/FLINK-9171
 Project: Flink
  Issue Type: Task
Reporter: Shuyi Chen


This is a parent task for all HCatalog related integration in Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9170) HCatalog integration with Table/SQL API

2018-04-13 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9170:
-

 Summary: HCatalog integration with Table/SQL API
 Key: FLINK-9170
 URL: https://issues.apache.org/jira/browse/FLINK-9170
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Zhenqiu Huang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9161) Support AS STRUCT syntax to create named STRUCT in SQL

2018-04-12 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9161:
-

 Summary: Support AS STRUCT syntax to create named STRUCT in SQL
 Key: FLINK-9161
 URL: https://issues.apache.org/jira/browse/FLINK-9161
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen


As discussed on the [calcite dev mailing 
list|https://mail-archives.apache.org/mod_mbox/calcite-dev/201804.mbox/%3cCAMZk55avGNmp1vXeJwA1B_a8bGyCQ9ahxmE=R=6fklpf7jt...@mail.gmail.com%3e],
 we want to add support for named structure construction in SQL, e.g., 

{code:java}
SELECT STRUCT(a as first_name, b as last_name, STRUCT(c as zip_code, d as
street, e as state) as address) as record FROM example_table
{code}

This would require adding the necessary changes in Calcite first.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Support for out-of-the-box external catalog for SQL Client

2018-04-11 Thread Shuyi Chen
Thanks a lot, Rong and Peter.

AFAIK, there is a Flink HCatalog connector, introduced in FLINK-1466 and added
by Fabian. And there is another JIRA, FLINK-1913, to document the use of the
connector.

I think we can start by looking at the existing HCatalog connector, adding the
missing documentation, and coming up with a proposal to evolve the Flink
HCatalog integration with ExternalCatalog and the SQL client, to make it useful
for both SQL and non-SQL scenarios.

Given we already have the integration implemented in AthenaX internally, we can
help drive and contribute back to the community.

Shuyi

On Wed, Apr 11, 2018 at 12:01 PM, Peter Huang 
wrote:

> Hi Rong,
>
> It is a good point out. I aligned with Fabian yesterday. It is a good work
> that I can involve
> to contribute back to Apache Flink after having AthenaX backfill support
> internally.
>
>
> Best Regards
> Peter Huang
>
> On Wed, Apr 11, 2018 at 10:52 AM, Rong Rong  wrote:
>
> > Hi everyone,
> >
> > I was wondering if it is a good idea to support some external catalog
> > software, such as Apache HCatalog[2], out-of-the-box for the FLIP-24
> > SQL-Client[1]. There are many widely used catalogs that we can
> incorporate.
> > This way users won't have to always extend and create their own
> > ExternalCatalog.class separately and this could potentially make the
> > configuration part easier for SQL users.
> >
> > Thanks,
> > Rong
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> 24+-+SQL+Client
> > [2] https://cwiki.apache.org/confluence/display/Hive/HCatalog
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Flink security improvements

2018-03-22 Thread Shuyi Chen
Thanks a lot, Till and Eron, for the updates. I'll try to ping this thread
after the 1.5 release.

Also let me know how I can help more with Flink 1.5. I am currently working
on adding e2e tests for Kerberos and Cassandra.

On Wed, Mar 21, 2018 at 8:16 AM, Eron Wright <eronwri...@gmail.com> wrote:

> Please accept my apologies also.
>
> On Mon, Mar 19, 2018 at 2:46 AM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Hi Shuyi,
> >
> > sorry for the unresponsiveness on your proposal. Currently, the community
> > is strongly focused on fixing and testing Flink 1.5 so that we can
> release
> > it soon. My gut feeling is that the community will pick up the security
> > improvements thread once most of the blocking issues are resolved. Please
> > bear with us until then.
> >
> > Cheers,
> > Till
> >
> > On Fri, Mar 9, 2018 at 1:05 AM, Shuyi Chen <suez1...@gmail.com> wrote:
> >
> > > Ping :)
> > >
> > > On Wed, Feb 21, 2018 at 7:16 PM, Shuyi Chen <suez1...@gmail.com>
> wrote:
> > >
> > > > Hi Eron, thanks a lot for taking a look at the proposal, the comments
> > are
> > > > very useful. I've updated the document to address your concerns.
> Could
> > > you
> > > > please help take another look, and suggest what the next step is?
> > Highly
> > > > appreciated.
> > > >
> > > > Shuyi
> > > >
> > > > On Thu, Feb 15, 2018 at 4:19 AM, Shuyi Chen <suez1...@gmail.com>
> > wrote:
> > > >
> > > >> Hi community,
> > > >>
> > > >> I would like to propose a few improvements in Flink security
> regarding
> > > >> scalability and extensibility. Here is the proposal:
> > > >>
> > > >> https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t
> > > >> 6BrC6TZi3FGf2Dm6-i8/edit?usp=sharing
> > > >>
> > > >> Comments are highly appreciated. Please let me know what the next
> step
> > > >> will be.
> > > >>
> > > >> Thanks a lot
> > > >> Shuyi
> > > >>
> > > >> --
> > > >> "So you have to trust that the dots will somehow connect in your
> > > future."
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > "So you have to trust that the dots will somehow connect in your
> > future."
> > > >
> > >
> > >
> > >
> > > --
> > > "So you have to trust that the dots will somehow connect in your
> future."
> > >
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-9049) Create unified interfaces to configure and instantiate TableSink

2018-03-21 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9049:
-

 Summary: Create unified interfaces to configure and instantiate 
TableSink
 Key: FLINK-9049
 URL: https://issues.apache.org/jira/browse/FLINK-9049
 Project: Flink
  Issue Type: Task
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen


This is a similar effort to 
[FLINK-8240|https://issues.apache.org/jira/browse/FLINK-8240]: we want to 
create a unified interface for the discovery and instantiation of TableSink, and 
later support table DDL in Flink. The proposed solution would use a similar 
approach to [FLINK-8240|https://issues.apache.org/jira/browse/FLINK-8240] and 
can re-use most of the implementations already done there.

1) Add TableSinkFactory{Service} similar to TableSourceFactory{Service} (see 
the sketch below).
2) Add a common property called "tableType" with values (source, sink and both) 
for both TableSource and TableSink.
3) In the yaml file, replace "sources" with "tables", and use tableType to 
identify whether it's a source or a sink.
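
To make point 1 concrete, here is a rough sketch; the interface and method names 
below are illustrative only, not the final API:

{code:java}
import java.util.Map;

// Placeholder for the sink instance that the factory produces.
interface TableSinkSketch<T> {
    void emit(T row);
}

// Illustrative factory contract, mirroring how TableSourceFactory{Service} discovers sources.
public interface TableSinkFactorySketch<T> {

    // Properties this factory requires in order to match, e.g. "tableType" -> "sink".
    Map<String, String> requiredContext();

    // Creates and configures a TableSink from the normalized property map.
    TableSinkSketch<T> createTableSink(Map<String, String> properties);
}
{code}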





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9015) Upgrade Calcite dependency to 1.17

2018-03-16 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9015:
-

 Summary: Upgrade Calcite dependency to 1.17
 Key: FLINK-9015
 URL: https://issues.apache.org/jira/browse/FLINK-9015
 Project: Flink
  Issue Type: Task
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Flip 6 mesos support

2018-03-16 Thread Shuyi Chen
Hi Till,

For FLINK-8562, the test is passing now because it's not really
checking the right thing.

Yes, I can help with the Kerberos integration ticket.

Is there an example of how the e2e test should be structured and invoked?

Thanks
Shuyi

On Fri, Mar 16, 2018 at 6:51 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Shuyi,
>
> thanks for the working on FLINK-8562. Once this issue is fixed, it will
> automatically be executed on the Flip-6 components. In fact it is already
> being executed on Flip-6.
>
> But what you could help the community with is setting up an automated
> end-to-end test for the Kerberos integration if you want:
> https://issues.apache.org/jira/browse/FLINK-8981.
>
> The Flink community is currently working on automating more and more tests
> in order to facilitate faster releases and improve the test coverage. You
> can find more about this effort here:
> https://issues.apache.org/jira/browse/FLINK-8970.
>
> Cheers,
> Till
>
> On Thu, Mar 15, 2018 at 8:45 PM, Shuyi Chen <suez1...@gmail.com> wrote:
>
> > Hi Till,
> >
> > This is Shuyi :) Thanks a lot. In FLINK-8562, I already sent a PR to
> > resolve the issue, your help to take a look will be great.
> >
> > Please let me know what I can help to test the Kerberos authentication, I
> > am decently familiar with the Kerberos and YARN security part in Flink.
> >
> > As a starting point, I'd suggest to add an integration test similar to
> > YARNSessionFIFOSecuredITCase
> > for flip6.
> >
> > Shuyi
> >
> > On Thu, Mar 15, 2018 at 5:44 AM, Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > Hi Renjie,
> > >
> > > thanks for the pointer with the YARNSessionFIFOSecuredITCase. You're
> > right
> > > that we should fix this test. There is FLINK-8562 which seems to
> address
> > > the problem. Will take a look.
> > >
> > > Additionally, we want to test Kerberos authentication explicitly as
> part
> > of
> > > the release testing for Flink 1.5. I will shortly send around a mail
> > where
> > > I will lay out the ongoing testing efforts and where more is needed.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Mar 15, 2018 at 7:37 AM, Renjie Liu <liurenjie2...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the clarification
> > > >
> > > > On Thu, Mar 15, 2018 at 2:30 PM 周思华 <summerle...@163.com> wrote:
> > > >
> > > > > Hi Renjie,
> > > > > if I am not misunderstand, you just need to start the cluster as
> > normal
> > > > as
> > > > > before. The dispatcher and resourcemanager are spawned by
> > > > ClusterEntryPoint
> > > > > (you can have a look at yarn-session.sh & FlinkYarnSessionCli &
> > > > > YarnSessionClusterEntrypoint), and the TM are spawned by
> > > ResourceManager
> > > > > lazily (ResourceManager will setup TM according to the submitted
> job)
> > > or
> > > > > spawned by the setup script (you can have a look at
> > start-cluster.sh).
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Sihua Zhou
> > > > >
> > > > >
> > > > > 发自网易邮箱大师
> > > > >
> > > > >
> > > > > On 03/15/2018 10:14,Renjie Liu<liurenjie2...@gmail.com> wrote:
> > > > > Hi, Till:
> > > > > In fact I'm asking how to deploy other components such as
> dispatcher,
> > > > etc.
> > > > >
> > > > > Till Rohrmann <trohrm...@apache.org> 于 2018年3月15日周四 上午12:17写道:
> > > > >
> > > > > Hi Renjie,
> > > > >
> > > > > in the current master and release-1.5 branch flip-6 is activated by
> > > > > default. If you want to turn it off you have to add `mode: old` to
> > your
> > > > > flink-conf.yaml. I'm really happy that you want to test it out :-)
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Wed, Mar 14, 2018 at 3:03 PM, Renjie Liu <
> liurenjie2...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > Hi Till:
> > > > > Is there any doc on deploying flink in flip6 mode? We want to help
> > > > > testing
> > > > > it.

Re: Flip 6 mesos support

2018-03-15 Thread Shuyi Chen
Hi Till,

This is Shuyi :) Thanks a lot. In FLINK-8562, I already sent a PR to
resolve the issue; it would be great if you could take a look.

Please let me know how I can help to test the Kerberos authentication; I
am decently familiar with the Kerberos and YARN security parts of Flink.

As a starting point, I'd suggest adding an integration test similar to
YARNSessionFIFOSecuredITCase for flip6.

Shuyi

On Thu, Mar 15, 2018 at 5:44 AM, Till Rohrmann  wrote:

> Hi Renjie,
>
> thanks for the pointer with the YARNSessionFIFOSecuredITCase. You're right
> that we should fix this test. There is FLINK-8562 which seems to address
> the problem. Will take a look.
>
> Additionally, we want to test Kerberos authentication explicitly as part of
> the release testing for Flink 1.5. I will shortly send around a mail where
> I will lay out the ongoing testing efforts and where more is needed.
>
> Cheers,
> Till
>
> On Thu, Mar 15, 2018 at 7:37 AM, Renjie Liu 
> wrote:
>
> > Thanks for the clarification
> >
> > On Thu, Mar 15, 2018 at 2:30 PM 周思华  wrote:
> >
> > > Hi Renjie,
> > > if I am not misunderstand, you just need to start the cluster as normal
> > as
> > > before. The dispatcher and resourcemanager are spawned by
> > ClusterEntryPoint
> > > (you can have a look at yarn-session.sh & FlinkYarnSessionCli &
> > > YarnSessionClusterEntrypoint), and the TM are spawned by
> ResourceManager
> > > lazily (ResourceManager will setup TM according to the submitted job)
> or
> > > spawned by the setup script (you can have a look at start-cluster.sh).
> > >
> > >
> > > Best Regards,
> > > Sihua Zhou
> > >
> > >
> > > 发自网易邮箱大师
> > >
> > >
> > > On 03/15/2018 10:14,Renjie Liu wrote:
> > > Hi, Till:
> > > In fact I'm asking how to deploy other components such as dispatcher,
> > etc.
> > >
> > > Till Rohrmann  于 2018年3月15日周四 上午12:17写道:
> > >
> > > Hi Renjie,
> > >
> > > in the current master and release-1.5 branch flip-6 is activated by
> > > default. If you want to turn it off you have to add `mode: old` to your
> > > flink-conf.yaml. I'm really happy that you want to test it out :-)
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Mar 14, 2018 at 3:03 PM, Renjie Liu 
> > > wrote:
> > >
> > > Hi Till:
> > > Is there any doc on deploying flink in flip6 mode? We want to help
> > > testing
> > > it.
> > >
> > > Till Rohrmann  于 2018年3月14日周三 下午7:08写道:
> > >
> > > Hi Renjie,
> > >
> > > in order to make Mesos work, we only needed to implement a Mesos
> > > specific
> > > ResourceManager. Look at MesosResourceManager for more details. As
> > > dispatcher, we use the StandaloneDispatcher which is spawned by
> > > the MesosSessionClusterEntrypoint.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Mar 14, 2018 at 9:32 AM, Renjie Liu 
> > > wrote:
> > >
> > > Hi all:
> > > I'm reading the source code and it seems that flip6 does not support
> > > mesos?
> > > According to the design, client send job graph to dispatcher and
> > > dispatcher
> > > spawn job mananger and resource manager for job execution. But I
> > > can't
> > > find
> > > dispatcher implementation for mesos.
> > > --
> > > Liu, Renjie
> > > Software Engineer, MVAD
> > >
> > >
> > > --
> > > Liu, Renjie
> > > Software Engineer, MVAD
> > >
> > >
> > > --
> > > Liu, Renjie
> > > Software Engineer, MVAD
> > >
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: Flip 6 mesos support

2018-03-14 Thread Shuyi Chen
Hi Till, have we tested the YARN Kerberos integration in flip6? As far as I
remember, YARNSessionFIFOSecuredITCase is not functioning (FLINK-8562); do we
have a similar integration test for flip6? Also, the Flink YARN Kerberos
integration in the old deployment mode was broken in 1.3 while flip6 was being
developed (FLINK-8286). Thanks a lot.

Shuyi

On Wed, Mar 14, 2018 at 9:16 AM, Till Rohrmann  wrote:

> Hi Renjie,
>
> in the current master and release-1.5 branch flip-6 is activated by
> default. If you want to turn it off you have to add `mode: old` to your
> flink-conf.yaml. I'm really happy that you want to test it out :-)
>
> Cheers,
> Till
>
> On Wed, Mar 14, 2018 at 3:03 PM, Renjie Liu 
> wrote:
>
> > Hi Till:
> > Is there any doc on deploying flink in flip6 mode? We want to help
> testing
> > it.
> >
> > Till Rohrmann  于 2018年3月14日周三 下午7:08写道:
> >
> > > Hi Renjie,
> > >
> > > in order to make Mesos work, we only needed to implement a Mesos
> specific
> > > ResourceManager. Look at MesosResourceManager for more details. As
> > > dispatcher, we use the StandaloneDispatcher which is spawned by
> > > the MesosSessionClusterEntrypoint.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Mar 14, 2018 at 9:32 AM, Renjie Liu 
> > > wrote:
> > >
> > > > Hi all:
> > > > I'm reading the source code and it seems that flip6 does not support
> > > mesos?
> > > > According to the design, client send job graph to dispatcher and
> > > dispatcher
> > > > spawn job mananger and resource manager for job execution. But I
> can't
> > > find
> > > > dispatcher implementation for mesos.
> > > > --
> > > > Liu, Renjie
> > > > Software Engineer, MVAD
> > > >
> > >
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Flink security improvements

2018-03-08 Thread Shuyi Chen
Ping :)

On Wed, Feb 21, 2018 at 7:16 PM, Shuyi Chen <suez1...@gmail.com> wrote:

> Hi Eron, thanks a lot for taking a look at the proposal, the comments are
> very useful. I've updated the document to address your concerns. Could you
> please help take another look, and suggest what the next step is? Highly
> appreciated.
>
> Shuyi
>
> On Thu, Feb 15, 2018 at 4:19 AM, Shuyi Chen <suez1...@gmail.com> wrote:
>
>> Hi community,
>>
>> I would like to propose a few improvements in Flink security regarding
>> scalability and extensibility. Here is the proposal:
>>
>> https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t
>> 6BrC6TZi3FGf2Dm6-i8/edit?usp=sharing
>>
>> Comments are highly appreciated. Please let me know what the next step
>> will be.
>>
>> Thanks a lot
>> Shuyi
>>
>> --
>> "So you have to trust that the dots will somehow connect in your future."
>>
>
>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Convert main Table API classes into traits

2018-03-02 Thread Shuyi Chen
Hi Timo,

I am throwing in some second thoughts here, as I don't quite see what a trait
provides over an abstract class for the TableEnvironment case. A trait in Scala
can also have implementations, and you can have 'private[flink]' or
'protected' types and methods in a trait as well.

AFAIK, the differences between a Scala trait and an abstract class are:
1) you can have a constructor for an abstract class, but not for a trait
2) abstract classes are fully interoperable with Java: you can call them
from Java code without any wrappers, while traits are fully interoperable only
if they do not contain any implementation code (for Scala 2.11)
3) you can do multiple inheritance or mixin composition with traits

In the TableEnvironment case,
1) I don't see a need for mixins, and a class hierarchy seems to fit better
here by design.
2) to better interoperate with Java from Scala 2.11, it's better to use an
abstract class (but AFAIK, Scala 2.12 and Java 8 would be compatible).
3) you might pay a bit of performance overhead with a trait (compiled to an
interface) compared to an abstract class, but it's not a big deal here.

But in other cases, a trait might be the better choice if it needs to be
reused and mixed into multiple, unrelated classes.

So another option would be to refactor TableEnvironment to clean up or move
the 'private[flink]' or 'protected' stuff to the actual implementor
(e.g. 'InternalTableEnvironment'), as you would do for your trait approach for
TableEnvironment. I think this option might help with backward compatibility
as well. Thanks.

Shuyi

On Fri, Mar 2, 2018 at 10:25 AM, Rong Rong  wrote:

> Hi Timo,
>
> Thanks for looking into this Timo. It's becoming increasingly messy for my
> trying to locate the correct functions in IDE :-/
>
> This is probably due to the fact that Scala and Java access modifiers /
> qualifiers are subtly and fundamentally different. Using Trait might be the
> best solution here. Another way I can think of is to move the all
> TableEnvironment classes to Java side, but that would probably introduce a
> lot of issue we need to resolve on the Scala side though. "protected" is
> less restrictive in Java but there's really no equivalent of package
> private modifier on Java.
>
> I was wondering is there any better way to provide backward-compatible
> support though. I played around with it and seems like every "protected"
> field will create a private Java member and a public getter, should we add
> them all and annotate with "@Deprecated" ?
> --
> Rong
>
> On Thu, Mar 1, 2018 at 10:58 AM, Timo Walther  wrote:
>
> > Hi everyone,
> >
> > I'm currently thinking about how to implement FLINK-8606. The reason
> > behind it is that Java users are able to see all variables and methods
> that
> > are declared 'private[flink]' or even 'protected' in Scala. Classes such
> as
> > TableEnvironment look very messy from the outside in Java. Since we
> cannot
> > change the visibility of Scala protected members, I was thinking about a
> > bigger change to solve this issue once and for all. My idea is to convert
> > all TableEnvironment classes and maybe the Table class into traits. The
> > actual implementation would end up in some internal classes such as
> > "InternalTableEnvironment" that implement the public traits. The goal
> would
> > be to stay source code compatible.
> >
> > What do you think?
> >
> > Regards,
> > Timo
> >
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: Flink Table 1.5-SNAPSHOT

2018-02-24 Thread Shuyi Chen
Maybe you can explain a bit more about what you need. But here is the link to
the jar for your convenience:
https://drive.google.com/file/d/0BwG71AtfYG0aZk04MHNSMDZiZTlkbUh0dXktMXphZ1c3YTdR/view?usp=sharing.
Just rename it and remove the .tmp suffix.

On Sat, Feb 24, 2018 at 9:40 AM, Pavel Ciorba  wrote:

> Hi everyone!
>
> I want to test the nonWindowInnerJoin API.
>
> Could anyone share the flink-table 1.5-SNAPSHOT jar ? (dropbox link etc.)
>
> I've tried to build the flink-table myself using *mvn clean install *but
> when I import it in a Gradle project it is unresolved for some reason.
>
> I'd be glad if someone just simply share a working jar.
>
> Thanks!
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Flink security improvements

2018-02-21 Thread Shuyi Chen
Hi Eron, thanks a lot for taking a look at the proposal, the comments are
very useful. I've updated the document to address your concerns. Could you
please help take another look, and suggest what the next step is? Highly
appreciated.

Shuyi

On Thu, Feb 15, 2018 at 4:19 AM, Shuyi Chen <suez1...@gmail.com> wrote:

> Hi community,
>
> I would like to propose a few improvements in Flink security regarding
> scalability and extensibility. Here is the proposal:
>
> https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6
> TZi3FGf2Dm6-i8/edit?usp=sharing
>
> Comments are highly appreciated. Please let me know what the next step
> will be.
>
> Thanks a lot
> Shuyi
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[DISCUSS] Flink security improvements

2018-02-15 Thread Shuyi Chen
Hi community,

I would like to propose a few improvements in Flink security regarding
scalability and extensibility. Here is the proposal:

https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit?usp=sharing

Comments are highly appreciated. Please let me know what the next step will
be.

Thanks a lot
Shuyi

-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-8562) Fix YARNSessionFIFOSecuredITCase

2018-02-05 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8562:
-

 Summary: Fix YARNSessionFIFOSecuredITCase
 Key: FLINK-8562
 URL: https://issues.apache.org/jira/browse/FLINK-8562
 Project: Flink
  Issue Type: Bug
  Components: Security
Reporter: Shuyi Chen
Assignee: Shuyi Chen


Currently, YARNSessionFIFOSecuredITCase will not fail even if the current Flink 
YARN Kerberos integration test is failing. Please see FLINK-8275.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Releasing Flink 1.5.0

2018-02-05 Thread Shuyi Chen
Hi Aljoscha, can we get this feature in for 1.5.0? We have a lot of
internal users waiting for this feature.

[FLINK-7923] Support accessing subfields of a Composite element in an Object
Array type column

Thanks a lot
Shuyi


On Mon, Feb 5, 2018 at 6:59 AM, Christophe Jolif  wrote:

> Hi guys,
>
> Sorry for jumping in, but I think
>
> [FLINK-8101] Elasticsearch 6.X support
> [FLINK-7386]  Flink Elasticsearch 5 connector is not compatible with
> Elasticsearch 5.2+ client
>
>  have long been awaited and there was one PR from me and from someone else
> showing the interest ;) So if you could consider it for 1.5 that would be
> great!
>
> Thanks!
> --
> Christophe
>
> On Mon, Feb 5, 2018 at 2:47 PM, Timo Walther  wrote:
>
> > Hi Aljoscha,
> >
> > it would be great if we can include the first version of the SQL client
> > (see FLIP-24, Implementation Plan 1). I will open a PR this week. I think
> > we can merge this with explicit "experimental/alpha" status. It is far
> away
> > from feature completeness but will be a great tool for Flink beginners.
> >
> > In order to use the SQL client we would need to also add some table
> > sources with the new unified table factories (FLINK-8535), but this is
> > optional because a user can implement own table factories at the
> begining.
> >
> > Regards,
> > Timo
> >
> >
> > Am 2/5/18 um 2:36 PM schrieb Tzu-Li (Gordon) Tai:
> >
> > Hi Aljoscha,
> >>
> >> Thanks for starting the discussion.
> >>
> >> I think there’s a few connector related must-have improvements that we
> >> should get in before the feature freeze, since quite a few users have
> been
> >> asking for them:
> >>
> >> [FLINK-6352] FlinkKafkaConsumer should support to use timestamp to set
> up
> >> start offset
> >> [FLINK-5479] Per-partition watermarks in FlinkKafkaConsumer should
> >> consider idle partitions
> >> [FLINK-8516] Pluggable shard-to-subtask partitioning for
> >> FlinkKinesisConsumer
> >> [FLINK-6109] Add a “checkpointed offset” metric to FlinkKafkaConsumer
> >>
> >> These are still missing in the master branch. Only FLINK-5479 is still
> >> lacking a pull request.
> >>
> >> Cheers,
> >> Gordon
> >>
> >> On 31 January 2018 at 10:38:43 AM, Aljoscha Krettek (
> aljos...@apache.org)
> >> wrote:
> >> Hi Everyone,
> >>
> >> When we decided to do the 1.4.0 release a while back we did that to get
> a
> >> stable release out before putting in a couple of new features. Back
> then,
> >> some of those new features (FLIP-6, network stack changes, local state
> >> recovery) were almost ready and we wanted to do a shortened 1.5.0
> >> development cycle to allow for those features to become ready and then
> do
> >> the next release.
> >>
> >> We are now approaching the approximate time where we wanted to do the
> >> Flink 1.5.0 release so I would like to gauge where we are and gather
> >> opinions on how we should proceed now.
> >>
> >> With this, I'd also like to propose myself as the release manager for
> >> 1.5.0 but I'm very happy to yield if someone else would be interested in
> >> doing that.
> >>
> >> What do you think?
> >>
> >> Best,
> >> Aljoscha
> >>
> >
> >
> >
>
>
> --
> Christophe
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-8509) Remove SqlGroupedWindowFunction from Flink repo

2018-01-24 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8509:
-

 Summary: Remove SqlGroupedWindowFunction from Flink repo
 Key: FLINK-8509
 URL: https://issues.apache.org/jira/browse/FLINK-8509
 Project: Flink
  Issue Type: Task
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen


SqlGroupedWindowFunction was copied into the Flink repo due to 
[CALCITE-2133|https://issues.apache.org/jira/browse/CALCITE-2133]; we should 
remove it once Flink upgrades its Calcite dependency to 1.16.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8508) Remove RexSimplify from Flink repo

2018-01-24 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8508:
-

 Summary: Remove RexSimplify from Flink repo
 Key: FLINK-8508
 URL: https://issues.apache.org/jira/browse/FLINK-8508
 Project: Flink
  Issue Type: Task
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen


RexSimplify was copied into the Flink repo due to 
[CALCITE-2110|https://issues.apache.org/jira/browse/CALCITE-2110]; we should 
remove it once Flink upgrades its Calcite dependency to 1.16.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8507) Upgrade Calcite dependency to 1.16

2018-01-24 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8507:
-

 Summary: Upgrade Calcite dependency to 1.16
 Key: FLINK-8507
 URL: https://issues.apache.org/jira/browse/FLINK-8507
 Project: Flink
  Issue Type: Task
  Components: Table API & SQL
Reporter: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Flink-Yarn-Kerberos integration

2018-01-18 Thread Shuyi Chen
Thanks a lot, Eron. I'll draft a proposal and share it with the community.

On Thu, Jan 18, 2018 at 4:18 PM, Eron Wright <eronwri...@gmail.com> wrote:

> I would suggest that you draft a proposal that lays out your goals and the
> technical challenges that you perceive.  Then the community can provide
> some feedback on potential solutions to those challenges, culminating in a
> concrete improvement proposal.
>
> Thanks
>
> On Wed, Jan 17, 2018 at 7:29 PM, Shuyi Chen <suez1...@gmail.com> wrote:
>
> > Ping, any comments?  Thanks a lot.
> >
> > Shuyi
> >
> > On Wed, Jan 3, 2018 at 3:43 PM, Shuyi Chen <suez1...@gmail.com> wrote:
> >
> > > Thanks a lot for the clarification, Eron. That's very helpful.
> Currently,
> > > we are more concerned about 1) data access, but will get to 2) and 3)
> > > eventually.
> > >
> > > I was thinking doing the following:
> > > 1) extend the current HadoopModule to use and refresh DTs as suggested
> > on YARN
> > > Application Security docs.
> > > 2) I found the current SecurityModule interface might be enough for
> > > supporting other security mechanisms. However, the loading of security
> > > modules are hard-coded, not configuration based. I think we can extend
> > > SecurityUtils to load from configurations. So we can implement our own
> > > security mechanism in our internal repo, and have flink jobs to load it
> > at
> > > runtime.
> > >
> > > Please let me know your comments. Thanks a lot.
> > >
> > > On Fri, Dec 22, 2017 at 3:05 PM, Eron Wright <eronwri...@gmail.com>
> > wrote:
> > >
> > >> I agree that it is reasonable to use Hadoop DTs as you describe.  That
> > >> approach is even recommended in YARN's documentation (see Securing
> > >> Long-lived YARN Services on the YARN Application Security page).   But
> > one
> > >> of the goals of Kerberos integration is to support Kerberized data
> > access
> > >> for connectors other than HDFS, such as Kafka, Cassandra, and
> > >> Elasticsearch.   So your second point makes sense too, suggesting a
> > >> general
> > >> architecture for managing secrets (DTs, keytabs, certificates, oauth
> > >> tokens, etc.) within the cluster.
> > >>
> > >> There's quite a few aspects to Flink security, including:
> > >> 1. data access (e.g. how a connector authenticates to a data source)
> > >> 2. service authorization and network security (e.g. how a Flink
> cluster
> > >> protects itself from unauthorized access)
> > >> 3. multi-user support (e.g. multi-user Flink clusters, RBAC)
> > >>
> > >> I mention these aspects to clarify your point about AuthN, which I
> took
> > to
> > >> be related to (1).   Do tell if I misunderstood.
> > >>
> > >> Eron
> > >>
> > >>
> > >> On Wed, Dec 20, 2017 at 11:21 AM, Shuyi Chen <suez1...@gmail.com>
> > wrote:
> > >>
> > >> > Hi community,
> > >> >
> > >> > We are working on secure Flink on YARN. The current
> > Flink-Yarn-Kerberos
> > >> > integration will require each container of a job to log in Kerberos
> > via
> > >> > keytab every say, 24 hours, and does not use any Hadoop delegation
> > token
> > >> > mechanism except when localizing the container. As I fixed the
> current
> > >> > Flink-Yarn-Kerberos (FLINK-8275) and tried to add more
> > >> > features(FLINK-7860), I have some concern regarding the current
> > >> > implementation. It can pose a scalability issue to the KDC, e.g., if
> > >> YARN
> > >> > cluster is restarted and all 10s of thousands of containers suddenly
> > >> DDOS
> > >> > KDC.
> > >> >
> > >> > I would like to propose to improve the current Flink-YARN-Kerberos
> > >> > integration by doing something like the following:
> > >> > 1) AppMaster (JobManager) periodically authenticate the KDC, get all
> > >> > required DTs for the job.
> > >> > 2) all other TM or TE containers periodically retrieve new DTs from
> > the
> > >> > AppMaster (either through a secure HDFS folder, or a secure Akka
> > >> channel)
> > >> >
> > >> > Also, we want to extend Flink to support pluggable AuthN mechanism,
> > >> because
> > >> > we have our own internal AuthN mechanism. We would like add support
> in
> > >> > Flink to authenticate periodically to our internal AuthN service as
> > well
> > >> > through, e.g., dynamic class loading, and use similar mechanism to
> > >> > distribute the credential from the appMaster to containers.
> > >> >
> > >> > I would like to get comments and feedbacks. I can also write a
> design
> > >> doc
> > >> > or create a Flip if needed. Thanks a lot.
> > >> >
> > >> > Shuyi
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > "So you have to trust that the dots will somehow connect in your
> > >> future."
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > "So you have to trust that the dots will somehow connect in your
> future."
> > >
> >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: Flink-Yarn-Kerberos integration

2018-01-17 Thread Shuyi Chen
Ping, any comments?  Thanks a lot.

Shuyi

On Wed, Jan 3, 2018 at 3:43 PM, Shuyi Chen <suez1...@gmail.com> wrote:

> Thanks a lot for the clarification, Eron. That's very helpful. Currently,
> we are more concerned about 1) data access, but will get to 2) and 3)
> eventually.
>
> I was thinking doing the following:
> 1) extend the current HadoopModule to use and refresh DTs as suggested on YARN
> Application Security docs.
> 2) I found the current SecurityModule interface might be enough for
> supporting other security mechanisms. However, the loading of security
> modules are hard-coded, not configuration based. I think we can extend
> SecurityUtils to load from configurations. So we can implement our own
> security mechanism in our internal repo, and have flink jobs to load it at
> runtime.
>
> Please let me know your comments. Thanks a lot.
>
> On Fri, Dec 22, 2017 at 3:05 PM, Eron Wright <eronwri...@gmail.com> wrote:
>
>> I agree that it is reasonable to use Hadoop DTs as you describe.  That
>> approach is even recommended in YARN's documentation (see Securing
>> Long-lived YARN Services on the YARN Application Security page).   But one
>> of the goals of Kerberos integration is to support Kerberized data access
>> for connectors other than HDFS, such as Kafka, Cassandra, and
>> Elasticsearch.   So your second point makes sense too, suggesting a
>> general
>> architecture for managing secrets (DTs, keytabs, certificates, oauth
>> tokens, etc.) within the cluster.
>>
>> There's quite a few aspects to Flink security, including:
>> 1. data access (e.g. how a connector authenticates to a data source)
>> 2. service authorization and network security (e.g. how a Flink cluster
>> protects itself from unauthorized access)
>> 3. multi-user support (e.g. multi-user Flink clusters, RBAC)
>>
>> I mention these aspects to clarify your point about AuthN, which I took to
>> be related to (1).   Do tell if I misunderstood.
>>
>> Eron
>>
>>
>> On Wed, Dec 20, 2017 at 11:21 AM, Shuyi Chen <suez1...@gmail.com> wrote:
>>
>> > Hi community,
>> >
>> > We are working on secure Flink on YARN. The current Flink-Yarn-Kerberos
>> > integration will require each container of a job to log in Kerberos via
>> > keytab every say, 24 hours, and does not use any Hadoop delegation token
>> > mechanism except when localizing the container. As I fixed the current
>> > Flink-Yarn-Kerberos (FLINK-8275) and tried to add more
>> > features(FLINK-7860), I have some concern regarding the current
>> > implementation. It can pose a scalability issue to the KDC, e.g., if
>> YARN
>> > cluster is restarted and all 10s of thousands of containers suddenly
>> DDOS
>> > KDC.
>> >
>> > I would like to propose to improve the current Flink-YARN-Kerberos
>> > integration by doing something like the following:
>> > 1) AppMaster (JobManager) periodically authenticate the KDC, get all
>> > required DTs for the job.
>> > 2) all other TM or TE containers periodically retrieve new DTs from the
>> > AppMaster (either through a secure HDFS folder, or a secure Akka
>> channel)
>> >
>> > Also, we want to extend Flink to support pluggable AuthN mechanism,
>> because
>> > we have our own internal AuthN mechanism. We would like add support in
>> > Flink to authenticate periodically to our internal AuthN service as well
>> > through, e.g., dynamic class loading, and use similar mechanism to
>> > distribute the credential from the appMaster to containers.
>> >
>> > I would like to get comments and feedbacks. I can also write a design
>> doc
>> > or create a Flip if needed. Thanks a lot.
>> >
>> > Shuyi
>> >
>> >
>> >
>> > --
>> > "So you have to trust that the dots will somehow connect in your
>> future."
>> >
>>
>
>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [DISCUSS] Releasing Flink 1.4.1

2018-01-15 Thread Shuyi Chen
Can we add this to the 1.4.1 release? The Flink-YARN-Kerberos deployment is not
working in 1.4.0.

https://issues.apache.org/jira/browse/FLINK-8275

Thanks a lot.

On Mon, Jan 15, 2018 at 7:53 AM, vishal  wrote:

> Hello folks,
>
> https://issues.apache.org/jira/browse/FLINK-8226 . We would want a
> schedule on the 1.4.1 version. Without checkpointing and state, CEP is in
> effect not production ready IMHO. Is there any time line on 1.4.1 ? The
> earlier the better.
>
> Thank you and Regards,
>
> Vishal
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>



-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-8401) Allow subclass to override write-failure behavior in CassandraOutputFormat

2018-01-10 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8401:
-

 Summary: Allow subclass to override write-failure behavior in 
CassandraOutputFormat 
 Key: FLINK-8401
 URL: https://issues.apache.org/jira/browse/FLINK-8401
 Project: Flink
  Issue Type: Improvement
Reporter: Shuyi Chen
Assignee: Shuyi Chen


Currently it throws an exception and fails the entire job. We would like to 
keep the current default behavior, but refactor the code to allow a subclass to 
override and customize the failure handling.
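
A rough sketch of the intended extension point; the base class and hook name 
below are hypothetical, not the current CassandraOutputFormat API:

{code:java}
import java.io.IOException;

// Hypothetical shape of the refactored base class: the write path would call
// onWriteFailure(...) instead of throwing directly.
abstract class FailureAwareOutputFormatSketch<OUT> {

    // Default behavior: rethrow, which fails the job (the current behavior).
    protected void onWriteFailure(OUT record, Throwable cause) throws IOException {
        throw new IOException("Error while writing record: " + record, cause);
    }
}

// A subclass could then log and drop the failed record instead of failing the job.
public class LenientOutputFormatSketch<OUT> extends FailureAwareOutputFormatSketch<OUT> {

    @Override
    protected void onWriteFailure(OUT record, Throwable cause) {
        System.err.println("Dropping record " + record + ": " + cause.getMessage());
    }
}
{code}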



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8397) Support ROW type in CassandraOutputFormat

2018-01-09 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8397:
-

 Summary: Support ROW type in CassandraOutputFormat
 Key: FLINK-8397
 URL: https://issues.apache.org/jira/browse/FLINK-8397
 Project: Flink
  Issue Type: Improvement
 Environment: Currently, only tuple is supported.
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Flink-Yarn-Kerberos integration

2018-01-03 Thread Shuyi Chen
Thanks a lot for the clarification, Eron. That's very helpful. Currently,
we are more concerned about 1) data access, but will get to 2) and 3)
eventually.

I was thinking of doing the following:
1) extend the current HadoopModule to use and refresh DTs, as suggested in the
YARN Application Security docs.
2) I found that the current SecurityModule interface might be enough for
supporting other security mechanisms. However, the loading of security
modules is hard-coded, not configuration based. I think we can extend
SecurityUtils to load modules from the configuration, so we can implement our
own security mechanism in our internal repo and have Flink jobs load it at
runtime (a rough sketch follows below).
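
To illustrate point 2, a rough sketch of what configuration-based loading could
look like (the class and method names here are illustrative, not the actual
SecurityUtils/SecurityModule API):

import java.util.ArrayList;
import java.util.List;

public class ConfigurableSecurityLoader {

    // Illustrative stand-in for Flink's SecurityModule interface.
    public interface SecurityModuleSketch {
        void install() throws Exception;
    }

    // Instantiate and install every module class listed in the configuration.
    public static List<SecurityModuleSketch> loadModules(List<String> moduleClassNames)
            throws Exception {
        List<SecurityModuleSketch> modules = new ArrayList<>();
        for (String className : moduleClassNames) {
            Class<?> clazz = Class.forName(className);
            SecurityModuleSketch module =
                    (SecurityModuleSketch) clazz.getDeclaredConstructor().newInstance();
            module.install();
            modules.add(module);
        }
        return modules;
    }
}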

Please let me know your comments. Thanks a lot.

On Fri, Dec 22, 2017 at 3:05 PM, Eron Wright <eronwri...@gmail.com> wrote:

> I agree that it is reasonable to use Hadoop DTs as you describe.  That
> approach is even recommended in YARN's documentation (see Securing
> Long-lived YARN Services on the YARN Application Security page).   But one
> of the goals of Kerberos integration is to support Kerberized data access
> for connectors other than HDFS, such as Kafka, Cassandra, and
> Elasticsearch.   So your second point makes sense too, suggesting a general
> architecture for managing secrets (DTs, keytabs, certificates, oauth
> tokens, etc.) within the cluster.
>
> There's quite a few aspects to Flink security, including:
> 1. data access (e.g. how a connector authenticates to a data source)
> 2. service authorization and network security (e.g. how a Flink cluster
> protects itself from unauthorized access)
> 3. multi-user support (e.g. multi-user Flink clusters, RBAC)
>
> I mention these aspects to clarify your point about AuthN, which I took to
> be related to (1).   Do tell if I misunderstood.
>
> Eron
>
>
> On Wed, Dec 20, 2017 at 11:21 AM, Shuyi Chen <suez1...@gmail.com> wrote:
>
> > Hi community,
> >
> > We are working on secure Flink on YARN. The current Flink-Yarn-Kerberos
> > integration will require each container of a job to log in Kerberos via
> > keytab every say, 24 hours, and does not use any Hadoop delegation token
> > mechanism except when localizing the container. As I fixed the current
> > Flink-Yarn-Kerberos (FLINK-8275) and tried to add more
> > features(FLINK-7860), I have some concern regarding the current
> > implementation. It can pose a scalability issue to the KDC, e.g., if YARN
> > cluster is restarted and all 10s of thousands of containers suddenly DDOS
> > KDC.
> >
> > I would like to propose to improve the current Flink-YARN-Kerberos
> > integration by doing something like the following:
> > 1) AppMaster (JobManager) periodically authenticate the KDC, get all
> > required DTs for the job.
> > 2) all other TM or TE containers periodically retrieve new DTs from the
> > AppMaster (either through a secure HDFS folder, or a secure Akka channel)
> >
> > Also, we want to extend Flink to support pluggable AuthN mechanism,
> because
> > we have our own internal AuthN mechanism. We would like add support in
> > Flink to authenticate periodically to our internal AuthN service as well
> > through, e.g., dynamic class loading, and use similar mechanism to
> > distribute the credential from the appMaster to containers.
> >
> > I would like to get comments and feedbacks. I can also write a design doc
> > or create a Flip if needed. Thanks a lot.
> >
> > Shuyi
> >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>



-- 
"So you have to trust that the dots will somehow connect in your future."


Flink-Yarn-Kerberos integration

2017-12-20 Thread Shuyi Chen
Hi community,

We are working on secure Flink on YARN. The current Flink-Yarn-Kerberos
integration requires each container of a job to log in to Kerberos via
keytab every, say, 24 hours, and does not use any Hadoop delegation token
mechanism except when localizing the container. As I fixed the current
Flink-Yarn-Kerberos integration (FLINK-8275) and tried to add more
features (FLINK-7860), I have some concerns regarding the current
implementation. It can pose a scalability issue for the KDC, e.g., if the
YARN cluster is restarted and tens of thousands of containers suddenly DDoS
the KDC.

I would like to propose improving the current Flink-YARN-Kerberos
integration by doing something like the following (a rough sketch follows the
list):
1) the AppMaster (JobManager) periodically authenticates with the KDC and gets
all required DTs for the job.
2) all other TM or TE containers periodically retrieve new DTs from the
AppMaster (either through a secure HDFS folder or a secure Akka channel)
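
As a rough sketch of point 1 (assuming Hadoop's UserGroupInformation API; the
renewal interval and class name are only illustrative), the AppMaster could
periodically re-login from its keytab so that fresh DTs can be fetched and
distributed to the containers:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.security.UserGroupInformation;

public class AppMasterKerberosRenewer {

    public static void start(String principal, String keytabPath) throws Exception {
        // Initial login from the keytab shipped with the AppMaster container.
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath);

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // Re-login before the TGT expires; fresh delegation tokens for HDFS,
                // Kafka, etc. could then be obtained and published to the TMs.
                UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}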

Also, we want to extend Flink to support a pluggable AuthN mechanism, because
we have our own internal AuthN mechanism. We would like to add support in
Flink to authenticate periodically to our internal AuthN service as well,
e.g., through dynamic class loading, and use a similar mechanism to
distribute the credentials from the AppMaster to the containers.

I would like to get comments and feedback. I can also write a design doc
or create a FLIP if needed. Thanks a lot.

Shuyi



-- 
"So you have to trust that the dots will somehow connect in your future."


[jira] [Created] (FLINK-8286) Investigate Flink-Yarn-Kerberos integration for flip-6

2017-12-18 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8286:
-

 Summary: Investigate Flink-Yarn-Kerberos integration for flip-6
 Key: FLINK-8286
 URL: https://issues.apache.org/jira/browse/FLINK-8286
 Project: Flink
  Issue Type: Task
  Components: Security
Reporter: Shuyi Chen
Assignee: Shuyi Chen
Priority: Blocker
 Fix For: 1.5.0


We've found some issues with the Flink-Yarn-Kerberos integration in the current 
deployment model; we will need to investigate and test it for flip-6 when it's 
ready.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8275) Flink YARN deployment with Kerberos enabled not working

2017-12-17 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8275:
-

 Summary: Flink YARN deployment with Kerberos enabled not working 
 Key: FLINK-8275
 URL: https://issues.apache.org/jira/browse/FLINK-8275
 Project: Flink
  Issue Type: Bug
  Components: Security
Affects Versions: 1.4.0
Reporter: Shuyi Chen
Assignee: Shuyi Chen
Priority: Blocker
 Fix For: 1.5.0


The local keytab path in YarnTaskManagerRunner is incorrectly set to the 
ApplicationMaster's local keytab path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8039) Support "CREATE TYPE" DDL in Flink SQL

2017-11-08 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8039:
-

 Summary: Support "CREATE TYPE" DDL in Flink SQL
 Key: FLINK-8039
 URL: https://issues.apache.org/jira/browse/FLINK-8039
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen


Allow us to create custom types using DDL, e.g.,

{code:java}
CREATE TYPE myrowtype AS (f1 INTEGER, f2 VARCHAR(10));
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8003) Support Calcite's ROW value constructor in Flink SQL

2017-11-06 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-8003:
-

 Summary: Support Calcite's ROW value constructor in Flink SQL
 Key: FLINK-8003
 URL: https://issues.apache.org/jira/browse/FLINK-8003
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7860) Support YARN proxy user in Flink (impersonation)

2017-10-17 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-7860:
-

 Summary: Support YARN proxy user in Flink (impersonation)
 Key: FLINK-7860
 URL: https://issues.apache.org/jira/browse/FLINK-7860
 Project: Flink
  Issue Type: New Feature
  Components: YARN
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7690) Do not call actorSystem.awaitTermination from the main akka message handling thread

2017-09-26 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-7690:
-

 Summary: Do not call actorSystem.awaitTermination from the main 
akka message handling thread
 Key: FLINK-7690
 URL: https://issues.apache.org/jira/browse/FLINK-7690
 Project: Flink
  Issue Type: Bug
Reporter: Shuyi Chen
Assignee: Shuyi Chen


In flip-6, this bug causes the YARN job to hang forever in RUNNING status 
when the enclosing Flink job has already failed or finished.
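
A minimal sketch of the safe pattern (assuming the pre-2.5 Akka Java API):
block on termination from a dedicated thread instead of from an actor's
message-handling thread, which would otherwise prevent the very termination it
is waiting for:

{code:java}
import akka.actor.ActorSystem;

public class AwaitTerminationSketch {

    public static void main(String[] args) {
        final ActorSystem system = ActorSystem.create("flink");

        // Correct: wait for termination on a thread that is NOT an actor message thread.
        Thread waiter = new Thread(new Runnable() {
            @Override
            public void run() {
                system.awaitTermination(); // blocks until all actors have stopped
                System.out.println("Actor system terminated");
            }
        });
        waiter.start();

        system.shutdown(); // ask the system to terminate asynchronously
    }
}
{code}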



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7658) Support COLLECT Aggregate function in Flink TABLE API

2017-09-20 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-7658:
-

 Summary: Support COLLECT Aggregate function in Flink TABLE API
 Key: FLINK-7658
 URL: https://issues.apache.org/jira/browse/FLINK-7658
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7546) Support SUBMULTISET_OF Operator for Multiset SQL type

2017-08-28 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-7546:
-

 Summary: Support SUBMULTISET_OF Operator for Multiset SQL type
 Key: FLINK-7546
 URL: https://issues.apache.org/jira/browse/FLINK-7546
 Project: Flink
  Issue Type: New Feature
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7545) Support MEMBER OF Operator for Multiset SQL type

2017-08-28 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-7545:
-

 Summary: Support MEMBER OF Operator for Multiset SQL type
 Key: FLINK-7545
 URL: https://issues.apache.org/jira/browse/FLINK-7545
 Project: Flink
  Issue Type: New Feature
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7491) Support COLLECT Aggregate function in Flink SQL

2017-08-22 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-7491:
-

 Summary: Support COLLECT Aggregate function in Flink SQL
 Key: FLINK-7491
 URL: https://issues.apache.org/jira/browse/FLINK-7491
 Project: Flink
  Issue Type: New Feature
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7003) "select * from" in Flink SQL should not flatten all fields in the table by default

2017-06-25 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-7003:
-

 Summary: "select * from" in Flink SQL should not flatten all 
fields in the table by default
 Key: FLINK-7003
 URL: https://issues.apache.org/jira/browse/FLINK-7003
 Project: Flink
  Issue Type: Bug
Reporter: Shuyi Chen


Currently, CompositeRelDataType extends 
RelRecordType(StructKind.PEEK_FIELDS, ...). In Calcite, StructKind.PEEK_FIELDS 
allows us to peek at fields of nested types. However, when we use "select * 
from", Calcite will flatten all nested fields that are marked as 
StructKind.PEEK_FIELDS in the table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)