[jira] [Created] (FLINK-15050) DataFormatConverters will fail when timestamp type precision more than 3

2019-12-03 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-15050:


 Summary: DataFormatConverters will fail when timestamp type 
precision more than 3
 Key: FLINK-15050
 URL: https://issues.apache.org/jira/browse/FLINK-15050
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jingsong Lee
 Fix For: 1.10.0


We should add tests covering timestamp types with a precision greater than 3.
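A rough sketch of such a test, assuming the blink planner's {{DataFormatConverters.getConverterForDataType(...)}} entry point and a {{toInternal}}/{{toExternal}} round trip (the exact method names and packages are assumptions and may differ):

{code:java}
import static org.junit.Assert.assertEquals;

import java.time.LocalDateTime;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.dataformat.DataFormatConverters;
import org.apache.flink.table.types.DataType;
import org.junit.Test;

public class HighPrecisionTimestampConverterTest {

    @Test
    @SuppressWarnings("unchecked")
    public void testTimestampPrecision9RoundTrip() {
        // Precision greater than 3: TIMESTAMP(9) bridged to LocalDateTime.
        // NOTE: the converter API usage below is an assumption based on the blink planner.
        DataType type = DataTypes.TIMESTAMP(9).bridgedTo(LocalDateTime.class);

        DataFormatConverters.DataFormatConverter<Object, Object> converter =
                (DataFormatConverters.DataFormatConverter<Object, Object>)
                        DataFormatConverters.getConverterForDataType(type);

        LocalDateTime value = LocalDateTime.of(2019, 12, 3, 10, 15, 30, 123_456_789);

        // Expected: no exception and a lossless round trip including nanoseconds.
        Object internal = converter.toInternal(value);
        assertEquals(value, converter.toExternal(internal));
    }
}
{code}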



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15049) Compile bug when hash join with timestamp type key

2019-12-03 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-15049:


 Summary: Compile bug when hash join with timestamp type key
 Key: FLINK-15049
 URL: https://issues.apache.org/jira/browse/FLINK-15049
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jingsong Lee
 Fix For: 1.10.0


The internal format of the timestamp type has been changed to SqlTimestamp, but 
LongHashJoinGenerator still converts it directly to long in genGetLongKey.

This causes a bug when performing a hash join with a timestamp type key.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-72: Introduce Pulsar Connector

2019-12-03 Thread Becket Qin
Yes, you are absolutely right. Cannot believe I posted in the wrong
thread...

On Wed, Dec 4, 2019 at 1:46 PM Jark Wu  wrote:

> Thanks Becket for the update,
>
> But shouldn't this message be posted in FLIP-27 discussion thread[1]?
>
>
> Best,
> Jark
>
> [1]:
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952.html
>
> On Wed, 4 Dec 2019 at 12:12, Becket Qin  wrote:
>
> > Hi all,
> >
> > Sorry for the long belated update. I have updated FLIP-27 wiki page with
> > the latest proposals. Some noticeable changes include:
> > 1. A new generic communication mechanism between SplitEnumerator and
> > SourceReader.
> > 2. Some detail API method signature changes.
> >
> > We left a few things out of this FLIP and will address them in separate
> > FLIPs. Including:
> > 1. Per split event time.
> > 2. Event time alignment.
> > 3. Fine grained failover for SplitEnumerator failure.
> >
> > Please let us know if you have any question.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Nov 19, 2019 at 10:28 AM Yijie Shen 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I've put the catalog part design in separate doc with more details for
> > > easier communication.
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit?usp=sharing
> > >
> > > I would love to hear your thoughts on this.
> > >
> > > Best,
> > > Yijie
> > >
> > > On Mon, Oct 21, 2019 at 11:15 AM Yijie Shen  >
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Glad to receive your valuable feedbacks.
> > > >
> > > > I'd first separate the Pulsar catalog as another doc and show more
> > design
> > > > and implementation details there.
> > > >
> > > > For the current FLIP-72, I would separate it into the sink part for
> > > > current work and keep the source part as future works until we reach
> > > > FLIP-27 finals.
> > > >
> > > > I also reply to some of the comments in the design doc. I will
> rewrite
> > > the
> > > > catalog part in regarding to Bowen's advice in both email and
> comments.
> > > >
> > > > Thanks for the help again.
> > > >
> > > > Best,
> > > > Yijie
> > > >
> > > > On Fri, Oct 18, 2019 at 12:40 AM Rong Rong 
> > wrote:
> > > >
> > > >> Hi Yijie,
> > > >>
> > > >> I also agree with Jark on separating the Catalog part into another
> > FLIP.
> > > >>
> > > >> With FLIP-27[1] also in the air, it is also probably great to split
> > and
> > > >> unblock the sink implementation contribution.
> > > >> I would suggest either putting in a detail implementation plan
> section
> > > in
> > > >> the doc, or (maybe too much separation?) splitting them into
> different
> > > >> FLIPs. What do you guys think?
> > > >>
> > > >> --
> > > >> Rong
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > >>
> > > >> On Wed, Oct 16, 2019 at 9:00 PM Jark Wu  wrote:
> > > >>
> > > >> > Hi Yijie,
> > > >> >
> > > >> > Thanks for the design document. I agree with Bowen that the
> catalog
> > > part
> > > >> > needs more details.
> > > >> > And I would suggest to separate Pulsar Catalog as another FLIP.
> IMO,
> > > it
> > > >> has
> > > >> > little to do with source/sink.
> > > >> > Having a separate FLIP can unblock the contribution for sink (or
> > > source)
> > > >> > and keep the discussion more focus.
> > > >> > I also left some comments in the documentation.
> > > >> >
> > > >> > Thanks,
> > > >> > Jark
> > > >> >
> > > >> > On Thu, 17 Oct 2019 at 11:24, Yijie Shen <
> henry.yijies...@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Bowen,
> > > >> > >
> > > >> > > Thanks for your comments. I'll add catalog details as you
> > suggested.
> > > >> > >
> > > >> > > One more question: since we decide to not implement source part
> of
> > > the
> > > >> > > connector at the moment.
> > > >> > > What can users do with a Pulsar catalog?
> > > >> > > Create a table backed by Pulsar and check existing pulsar tables
> > to
> > > >> see
> > > >> > > their schemas? Drop tables maybe?
> > > >> > >
> > > >> > > Best,
> > > >> > > Yijie
> > > >> > >
> > > >> > > On Thu, Oct 17, 2019 at 1:04 AM Bowen Li 
> > > wrote:
> > > >> > >
> > > >> > > > Hi Yijie,
> > > >> > > >
> > > >> > > > Per the discussion, maybe you can move pulsar source to
> 'future
> > > >> work'
> > > >> > > > section in the FLIP for now?
> > > >> > > >
> > > >> > > > Besides, the FLIP seems to be quite rough at the moment, and
> I'd
> > > >> > > recommend
> > > >> > > > to add more details .
> > > >> > > >
> > > >> > > > A few questions mainly regarding the proposed pulsar catalog.
> > > >> > > >
> > > >> > > >- Can you provide some background of pulsar schema registry
> > and
> > > >> how
> > > >> > it
> > > >> > > >works?
> > > >> > > >- The proposed design of pulsar catalog is very vague now,
> > can
> > > >> you
> > > >> > > >share some details of how a

Re: [DISCUSS] FLIP-27: Refactor Source Interface

2019-12-03 Thread Becket Qin
Hi all,

Sorry for the long-belated update. I have updated the FLIP-27 wiki page with
the latest proposals. Some notable changes include:
1. A new generic communication mechanism between SplitEnumerator and
SourceReader.
2. Some API method signature changes in the details.

We left a few things out of this FLIP and will address them in separate
FLIPs, including:
1. Per-split event time.
2. Event time alignment.
3. Fine-grained failover for SplitEnumerator failures.

Please let us know if you have any questions.

Thanks,

Jiangjie (Becket) Qin

On Sat, Nov 16, 2019 at 6:10 AM Stephan Ewen  wrote:

> Hi  Łukasz!
>
> Becket and me are working hard on figuring out the last details and
> implementing the first PoC. We would update the FLIP hopefully next week.
>
> There is a fair chance that a first version of this will be in 1.10, but I
> think it will take another release to battle test it and migrate the
> connectors.
>
> Best,
> Stephan
>
>
>
>
> On Fri, Nov 15, 2019 at 11:14 AM Łukasz Jędrzejewski  wrote:
>
> > Hi,
> >
> > This proposal looks very promising for us. Do you have any plans in which
> > Flink release it is going to be released? We are thinking on using a Data
> > Set API for our future use cases but on the other hand Data Set API is
> > going to be deprecated so using proposed bounded data streams solution
> > could be more viable in the long term.
> >
> > Thanks,
> > Łukasz
> >
> > On 2019/10/01 15:48:03, Thomas Weise  wrote:
> > > Thanks for putting together this proposal!
> > >
> > > I see that the "Per Split Event Time" and "Event Time Alignment"
> sections
> > > are still TBD.
> > >
> > > It would probably be good to flesh those out a bit before proceeding
> too
> > far
> > > as the event time alignment will probably influence the interaction
> with
> > > the split reader, specifically ReaderStatus emitNext(SourceOutput
> > > output).
> > >
> > > We currently have only one implementation for event time alignment in
> the
> > > Kinesis consumer. The synchronization in that case takes place as the
> > last
> > > step before records are emitted downstream (RecordEmitter). With the
> > > currently proposed interfaces, the equivalent can be implemented in the
> > > reader loop, although note that in the Kinesis consumer the per shard
> > > threads push records.
> > >
> > > Synchronization has not been implemented for the Kafka consumer yet.
> > >
> > > https://issues.apache.org/jira/browse/FLINK-12675
> > >
> > > When I looked at it, I realized that the implementation will look quite
> > > different
> > > from Kinesis because it needs to take place in the pull part, where
> > records
> > > are taken from the Kafka client. Due to the multiplexing it cannot be
> > done
> > > by blocking the split thread like it currently works for Kinesis.
> Reading
> > > from individual Kafka partitions needs to be controlled via
> pause/resume
> > > on the Kafka client.
> > >
> > > To take on that responsibility the split thread would need to be aware
> of
> > > the
> > > watermarks or at least whether it should or should not continue to
> > consume
> > > a given split and this may require a different SourceReader or
> > SourceOutput
> > > interface.
> > >
> > > Thanks,
> > > Thomas
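For illustration, a minimal sketch of the pause/resume approach described above, written directly against the Kafka consumer API rather than any proposed FLIP-27 interface; the per-partition watermark bookkeeping and the drift bound are assumptions made only for this example.

import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AlignedKafkaPoller {

    private final KafkaConsumer<byte[], byte[]> consumer;
    private final Map<TopicPartition, Long> partitionWatermarks = new HashMap<>();
    private final long maxDriftMillis;

    public AlignedKafkaPoller(KafkaConsumer<byte[], byte[]> consumer, long maxDriftMillis) {
        this.consumer = consumer;
        this.maxDriftMillis = maxDriftMillis;
    }

    public ConsumerRecords<byte[], byte[]> pollAligned() {
        long globalMin = partitionWatermarks.values().stream()
                .mapToLong(Long::longValue).min().orElse(Long.MIN_VALUE);

        List<TopicPartition> tooFarAhead = new ArrayList<>();
        List<TopicPartition> withinBound = new ArrayList<>();
        for (TopicPartition tp : consumer.assignment()) {
            long wm = partitionWatermarks.getOrDefault(tp, Long.MIN_VALUE);
            if (wm - globalMin > maxDriftMillis) {
                tooFarAhead.add(tp);   // stop fetching from partitions that ran ahead
            } else {
                withinBound.add(tp);   // keep (or resume) fetching from the rest
            }
        }
        // pause()/resume() only affect fetching; the assignment itself is unchanged.
        consumer.pause(tooFarAhead);
        consumer.resume(withinBound);

        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<byte[], byte[]> record : records) {
            // Track per-partition event-time progress based on the record timestamps.
            partitionWatermarks.merge(
                    new TopicPartition(record.topic(), record.partition()),
                    record.timestamp(),
                    Math::max);
        }
        return records;
    }
}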
> > >
> > >
> > > On Fri, Jul 26, 2019 at 1:39 AM Biao Liu  wrote:
> > >
> > > > Hi Stephan,
> > > >
> > > > Thank you for feedback!
> > > > Will take a look at your branch before public discussing.
> > > >
> > > >
> > > > On Fri, Jul 26, 2019 at 12:01 AM Stephan Ewen 
> > wrote:
> > > >
> > > > > Hi Biao!
> > > > >
> > > > > Thanks for reviving this. I would like to join this discussion, but
> > am
> > > > > quite occupied with the 1.9 release, so can we maybe pause this
> > > > discussion
> > > > > for a week or so?
> > > > >
> > > > > In the meantime I can share some suggestion based on prior
> > experiments:
> > > > >
> > > > > How to do watermarks / timestamp extractors in a simpler and more
> > > > flexible
> > > > > way. I think that part is quite promising should be part of the new
> > > > source
> > > > > interface.
> > > > >
> > > > >
> > > >
> >
> https://github.com/StephanEwen/flink/tree/source_interface/flink-core/src/main/java/org/apache/flink/api/common/eventtime
> > > > >
> > > > >
> > > > >
> > > >
> >
> https://github.com/StephanEwen/flink/blob/source_interface/flink-core/src/main/java/org/apache/flink/api/common/src/SourceOutput.java
> > > > >
> > > > >
> > > > >
> > > > > Some experiments on how to build the source reader and its library
> > for
> > > > > common threading/split patterns:
> > > > >
> > > > >
> > > >
> >
> https://github.com/StephanEwen/flink/tree/source_interface/flink-core/src/main/java/org/apache/flink/api/common/src
> > > > >
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Thu, Jul 25, 2019 at 10:03 AM Biao Liu 
> > wrote:
> > > > >
> > > > >> Hi devs,
> > > > >>
> > > > >> Since 1.9 is nearly released, I think we could get back to
> FLIP-27.
> > I
> > > > >> believe it shoul

Re: Building with Hadoop 3

2019-12-03 Thread vino yang
cc @Chesnay Schepler  to answer this question.

On Wed, Dec 4, 2019 at 1:22 AM Foster, Craig wrote:

> Hi:
>
> I don’t see a JIRA for Hadoop 3 support. I see a comment on a JIRA here
> from a year ago that no one is looking into Hadoop 3 support [1]. Is there
> a document or JIRA that now exists which would point to what needs to be
> done to support Hadoop 3? Right now builds with Hadoop 3 obviously don't work
> because there are no flink-shaded-hadoop-3 artifacts.
>
>
>
> Thanks!
>
> Craig
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-11086
>
>
>


[jira] [Created] (FLINK-15048) Properly set internal.jobgraph-path when deploy job cluster

2019-12-03 Thread Zili Chen (Jira)
Zili Chen created FLINK-15048:
-

 Summary: Properly set internal.jobgraph-path when deploy job 
cluster
 Key: FLINK-15048
 URL: https://issues.apache.org/jira/browse/FLINK-15048
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.10.0
Reporter: Zili Chen
Assignee: Zili Chen
 Fix For: 1.10.0


Currently we don't ship this config. It happens to work only because the default 
value is used as the actual value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread vino yang
+1.

One concern: these two classes are marked with the `@PublicEvolving`
annotation.
Shall we mark them with the `@Deprecated` annotation first?

Best,
Vino

On Tue, Dec 3, 2019 at 8:56 PM Dian Fu wrote:

> +1 to remove them. It seems that we should also drop the class Option as
> it's currently only used in RequiredParameters.
>
> On Dec 3, 2019, at 8:34 PM, Robert Metzger wrote:
>
> +1 on removing it.
>
> On Tue, Dec 3, 2019 at 12:31 PM Stephan Ewen  wrote:
>
>> I just stumbled across these classes recently and was looking for sample
>> uses.
>> No examples or other tests in the code base seem to
>> use RequiredParameters and OptionType.
>>
>> They also seem quite redundant with how ParameterTool itself works
>> (tool.getRequired()).
>>
>> Should we drop them, in an attempt to reduce unnecessary code and
>> confusion for users (multiple ways to do the same thing)? There are also
>> many better command line parsing libraries out there, this seems like
>> something we don't need to solve in Flink.
>>
>> Best,
>> Stephan
>>
>
>
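For reference, a minimal sketch of the ParameterTool behavior mentioned above (tool.getRequired()); the argument names are purely illustrative:

import org.apache.flink.api.java.utils.ParameterTool;

public class ParameterToolExample {
    public static void main(String[] args) {
        ParameterTool params = ParameterTool.fromArgs(args);

        // Throws a RuntimeException if --input is missing, i.e. "required" behavior.
        String input = params.getRequired("input");

        // Optional parameter with a default value.
        int parallelism = params.getInt("parallelism", 1);

        System.out.println("input=" + input + ", parallelism=" + parallelism);
    }
}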


[jira] [Created] (FLINK-15047) YarnDistributedCacheITCase is unstable

2019-12-03 Thread Zili Chen (Jira)
Zili Chen created FLINK-15047:
-

 Summary: YarnDistributedCacheITCase is unstable
 Key: FLINK-15047
 URL: https://issues.apache.org/jira/browse/FLINK-15047
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.10.0
Reporter: Zili Chen
 Fix For: 1.10.0


See also https://api.travis-ci.com/v3/job/262854881/log.txt

cc [~ZhenqiuHuang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-72: Introduce Pulsar Connector

2019-12-03 Thread Jark Wu
Thanks Becket for the update,

But shouldn't this message be posted in the FLIP-27 discussion thread [1]?


Best,
Jark

[1]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952.html

On Wed, 4 Dec 2019 at 12:12, Becket Qin  wrote:

> Hi all,
>
> Sorry for the long belated update. I have updated FLIP-27 wiki page with
> the latest proposals. Some noticeable changes include:
> 1. A new generic communication mechanism between SplitEnumerator and
> SourceReader.
> 2. Some detail API method signature changes.
>
> We left a few things out of this FLIP and will address them in separate
> FLIPs. Including:
> 1. Per split event time.
> 2. Event time alignment.
> 3. Fine grained failover for SplitEnumerator failure.
>
> Please let us know if you have any question.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Nov 19, 2019 at 10:28 AM Yijie Shen 
> wrote:
>
> > Hi everyone,
> >
> > I've put the catalog part design in separate doc with more details for
> > easier communication.
> >
> >
> >
> https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit?usp=sharing
> >
> > I would love to hear your thoughts on this.
> >
> > Best,
> > Yijie
> >
> > On Mon, Oct 21, 2019 at 11:15 AM Yijie Shen 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Glad to receive your valuable feedbacks.
> > >
> > > I'd first separate the Pulsar catalog as another doc and show more
> design
> > > and implementation details there.
> > >
> > > For the current FLIP-72, I would separate it into the sink part for
> > > current work and keep the source part as future works until we reach
> > > FLIP-27 finals.
> > >
> > > I also reply to some of the comments in the design doc. I will rewrite
> > the
> > > catalog part in regarding to Bowen's advice in both email and comments.
> > >
> > > Thanks for the help again.
> > >
> > > Best,
> > > Yijie
> > >
> > > On Fri, Oct 18, 2019 at 12:40 AM Rong Rong 
> wrote:
> > >
> > >> Hi Yijie,
> > >>
> > >> I also agree with Jark on separating the Catalog part into another
> FLIP.
> > >>
> > >> With FLIP-27[1] also in the air, it is also probably great to split
> and
> > >> unblock the sink implementation contribution.
> > >> I would suggest either putting in a detail implementation plan section
> > in
> > >> the doc, or (maybe too much separation?) splitting them into different
> > >> FLIPs. What do you guys think?
> > >>
> > >> --
> > >> Rong
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > >>
> > >> On Wed, Oct 16, 2019 at 9:00 PM Jark Wu  wrote:
> > >>
> > >> > Hi Yijie,
> > >> >
> > >> > Thanks for the design document. I agree with Bowen that the catalog
> > part
> > >> > needs more details.
> > >> > And I would suggest to separate Pulsar Catalog as another FLIP. IMO,
> > it
> > >> has
> > >> > little to do with source/sink.
> > >> > Having a separate FLIP can unblock the contribution for sink (or
> > source)
> > >> > and keep the discussion more focus.
> > >> > I also left some comments in the documentation.
> > >> >
> > >> > Thanks,
> > >> > Jark
> > >> >
> > >> > On Thu, 17 Oct 2019 at 11:24, Yijie Shen  >
> > >> > wrote:
> > >> >
> > >> > > Hi Bowen,
> > >> > >
> > >> > > Thanks for your comments. I'll add catalog details as you
> suggested.
> > >> > >
> > >> > > One more question: since we decide to not implement source part of
> > the
> > >> > > connector at the moment.
> > >> > > What can users do with a Pulsar catalog?
> > >> > > Create a table backed by Pulsar and check existing pulsar tables
> to
> > >> see
> > >> > > their schemas? Drop tables maybe?
> > >> > >
> > >> > > Best,
> > >> > > Yijie
> > >> > >
> > >> > > On Thu, Oct 17, 2019 at 1:04 AM Bowen Li 
> > wrote:
> > >> > >
> > >> > > > Hi Yijie,
> > >> > > >
> > >> > > > Per the discussion, maybe you can move pulsar source to 'future
> > >> work'
> > >> > > > section in the FLIP for now?
> > >> > > >
> > >> > > > Besides, the FLIP seems to be quite rough at the moment, and I'd
> > >> > > recommend
> > >> > > > to add more details .
> > >> > > >
> > >> > > > A few questions mainly regarding the proposed pulsar catalog.
> > >> > > >
> > >> > > >- Can you provide some background of pulsar schema registry
> and
> > >> how
> > >> > it
> > >> > > >works?
> > >> > > >- The proposed design of pulsar catalog is very vague now,
> can
> > >> you
> > >> > > >share some details of how a pulsar catalog would work
> > internally?
> > >> > E.g.
> > >> > > >   - which APIs does it support exactly? E.g. I see from your
> > >> > > >   prototype that table creation is supported but not
> > alteration.
> > >> > > >   - is it going to connect to a pulsar schema registry via a
> > >> http
> > >> > > >   client or a pulsar client, etc
> > >> > > >   - will it be able to handle multiple versions of pulsar,
> or
> > >> just
> > >> > > >   one? How is compatibility handl

Re: [DISCUSS] FLIP-72: Introduce Pulsar Connector

2019-12-03 Thread Becket Qin
Hi all,

Sorry for the long-belated update. I have updated the FLIP-27 wiki page with
the latest proposals. Some notable changes include:
1. A new generic communication mechanism between SplitEnumerator and
SourceReader.
2. Some API method signature changes in the details.

We left a few things out of this FLIP and will address them in separate
FLIPs, including:
1. Per-split event time.
2. Event time alignment.
3. Fine-grained failover for SplitEnumerator failures.

Please let us know if you have any questions.

Thanks,

Jiangjie (Becket) Qin

On Tue, Nov 19, 2019 at 10:28 AM Yijie Shen 
wrote:

> Hi everyone,
>
> I've put the catalog part design in separate doc with more details for
> easier communication.
>
>
> https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit?usp=sharing
>
> I would love to hear your thoughts on this.
>
> Best,
> Yijie
>
> On Mon, Oct 21, 2019 at 11:15 AM Yijie Shen 
> wrote:
>
> > Hi everyone,
> >
> > Glad to receive your valuable feedbacks.
> >
> > I'd first separate the Pulsar catalog as another doc and show more design
> > and implementation details there.
> >
> > For the current FLIP-72, I would separate it into the sink part for
> > current work and keep the source part as future works until we reach
> > FLIP-27 finals.
> >
> > I also reply to some of the comments in the design doc. I will rewrite
> the
> > catalog part in regarding to Bowen's advice in both email and comments.
> >
> > Thanks for the help again.
> >
> > Best,
> > Yijie
> >
> > On Fri, Oct 18, 2019 at 12:40 AM Rong Rong  wrote:
> >
> >> Hi Yijie,
> >>
> >> I also agree with Jark on separating the Catalog part into another FLIP.
> >>
> >> With FLIP-27[1] also in the air, it is also probably great to split and
> >> unblock the sink implementation contribution.
> >> I would suggest either putting in a detail implementation plan section
> in
> >> the doc, or (maybe too much separation?) splitting them into different
> >> FLIPs. What do you guys think?
> >>
> >> --
> >> Rong
> >>
> >> [1]
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> >>
> >> On Wed, Oct 16, 2019 at 9:00 PM Jark Wu  wrote:
> >>
> >> > Hi Yijie,
> >> >
> >> > Thanks for the design document. I agree with Bowen that the catalog
> part
> >> > needs more details.
> >> > And I would suggest to separate Pulsar Catalog as another FLIP. IMO,
> it
> >> has
> >> > little to do with source/sink.
> >> > Having a separate FLIP can unblock the contribution for sink (or
> source)
> >> > and keep the discussion more focus.
> >> > I also left some comments in the documentation.
> >> >
> >> > Thanks,
> >> > Jark
> >> >
> >> > On Thu, 17 Oct 2019 at 11:24, Yijie Shen 
> >> > wrote:
> >> >
> >> > > Hi Bowen,
> >> > >
> >> > > Thanks for your comments. I'll add catalog details as you suggested.
> >> > >
> >> > > One more question: since we decide to not implement source part of
> the
> >> > > connector at the moment.
> >> > > What can users do with a Pulsar catalog?
> >> > > Create a table backed by Pulsar and check existing pulsar tables to
> >> see
> >> > > their schemas? Drop tables maybe?
> >> > >
> >> > > Best,
> >> > > Yijie
> >> > >
> >> > > On Thu, Oct 17, 2019 at 1:04 AM Bowen Li 
> wrote:
> >> > >
> >> > > > Hi Yijie,
> >> > > >
> >> > > > Per the discussion, maybe you can move pulsar source to 'future
> >> work'
> >> > > > section in the FLIP for now?
> >> > > >
> >> > > > Besides, the FLIP seems to be quite rough at the moment, and I'd
> >> > > recommend
> >> > > > to add more details .
> >> > > >
> >> > > > A few questions mainly regarding the proposed pulsar catalog.
> >> > > >
> >> > > >- Can you provide some background of pulsar schema registry and
> >> how
> >> > it
> >> > > >works?
> >> > > >- The proposed design of pulsar catalog is very vague now, can
> >> you
> >> > > >share some details of how a pulsar catalog would work
> internally?
> >> > E.g.
> >> > > >   - which APIs does it support exactly? E.g. I see from your
> >> > > >   prototype that table creation is supported but not
> alteration.
> >> > > >   - is it going to connect to a pulsar schema registry via a
> >> http
> >> > > >   client or a pulsar client, etc
> >> > > >   - will it be able to handle multiple versions of pulsar, or
> >> just
> >> > > >   one? How is compatibility handles between different
> >> Flink-Pulsar
> >> > > versions?
> >> > > >   - will it support only reading from pulsar schema registry ,
> >> or
> >> > > >   both read/write? Will it work end-to-end in Flink SQL for
> >> users
> >> > to
> >> > > create
> >> > > >   and manipulate a pulsar table such as "CREATE TABLE t WITH
> >> > > >   PROPERTIES(type=pulsar)" and "DROP TABLE t"?
> >> > > >   - Is a pulsar topic always gonna be a non-partitioned table?
> >> How
> >> > is
> >> > > >   a partitioned topic mapped to a Flink table?
> >> > > >- How to map Flink's catalog/database na

Re: [DISCUSS] Voting from apache.org addresses

2019-12-03 Thread Dian Fu
Thanks Dawid for starting this discussion.

I have the same feeling as Xuefu and Jingsong. Besides that, according to the
bylaws, for some kinds of votes, such as product releases, only the votes from
active PMC members are binding. So an email address alone doesn't help here: even
if a vote is from a Flink committer, it is still non-binding.

Thanks,
Dian

> On Dec 4, 2019, at 10:37 AM, Jingsong Lee wrote:
> 
> Thanks Dawid for driving this discussion.
> 
> +1 to Xuefu's viewpoint.
> I am not a Flink committer, but sometimes I use apache email address to
> send email.
> 
> Another way is that we require the binding ticket to must contain "binding".
> Otherwise it must be a "non-binding" ticket.
> In this way, we can let lazy people continue voting without any suffix too.
> 
> Best,
> Jingsong Lee
> 
> On Wed, Dec 4, 2019 at 3:58 AM Xuefu Z  wrote:
> 
>> Hi Dawid,
>> 
>> Thanks for initiating this discussion. I understand the problem you
>> described, but the solution might not work as having an apache.org email
> address doesn't necessarily mean it's from a Flink committer. This certainly
>> applies to me.
>> 
>> It probably helps for the voters to identify themselves by specifying
>> either "binding" or "non-binding", though I understand this cannot be
>> enforced but serves a general guideline.
>> 
>> Thanks,
>> Xuefu
>> 
>> On Tue, Dec 3, 2019 at 6:15 AM Dawid Wysakowicz 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I wanted to reach out primarily to the Flink's committers. I think
>>> whenever we cast a vote on a proposal, is it a FLIP, release candidate
>>> or any other proposal, we should use our apache.org email address.
>>> 
>>> It is not an easy task to check if a person voting is a committer/PMC if
>>> we do not work with him/her on a daily basis. This is important for
>>> verifying if a vote is binding or not.
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> 
>>> 
>> 
>> --
>> Xuefu Zhang
>> 
>> "In Honey We Trust!"
>> 
> 
> 
> -- 
> Best, Jingsong Lee



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Hi Becket,

Thanks for the kind reminder. I definitely agree with you. I have updated the 
progress of this vote on the discussion thread [1] and submitted a PR which 
updates the Flink website on how to report security issues.

Thanks,
Dian

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
 

> On Dec 4, 2019, at 7:29 AM, Becket Qin wrote:
> 
> Hi Dian,
> 
> Thanks for driving the effort regardless.
> 
> Even if we don't setup a security@f.a.o ML for Flink, we probably should
> have a clear pointer to the ASF guideline and secur...@apache.org in the
> project website. I think many people are not aware of the
> secur...@apache.org address. If they failed to find information in the
> Flink site, they will simply assume there is no special procedure for
> security problems.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Tue, Dec 3, 2019 at 4:54 PM Dian Fu  wrote:
> 
>> Hi all,
>> 
>> Thanks everyone for participating in this vote. As we have received only two
>> +1 votes and also one -1 vote, according to the bylaws, I'm
>> sorry to announce that this proposal has been rejected.
>> 
>> Nevertheless, I think we can always restart the discussion in the future if
>> we see more evidence that such a mailing list is necessary.
>> 
>> Thanks,
>> Dian
>> 
>> 
>>> On Dec 3, 2019, at 4:53 PM, Dian Fu wrote:
>>> 
>>> Actually I have tried to find out the reason why so many apache projects
>> choose to set up a project specific security mailing list in case that the
>> general secur...@apache.org mailing list seems working well.
>> Unfortunately, there is no open discussions in these projects and there is
>> also no clear guideline/standard in the ASF site whether a project should
>> set up such a mailing list (The project specific security mailing list
>> seems only an optional and we noticed that at the beginning of the
>> discussion). This is also one of the main reasons we start such a
>> discussion to see if somebody has more thoughts about this.
>>> 
 On Dec 2, 2019, at 6:03 PM, Chesnay Schepler wrote:
 
 Would security@f.a.o work as any other private ML?
 
 Contrary to what Becket said in the discussion thread,
>> secur...@apache.org is not just "another hop"; it provides guiding
>> material, the security team checks for activity and can be pinged easily as
>> they are cc'd in the initial report.
 
 I vastly prefer this over a separate mailing list; if these benefits
>> don't apply to security@f.a.o I'm -1 on this.
 
 On 02/12/2019 02:28, Becket Qin wrote:
> Thanks for driving this, Dian.
> 
> +1 from me, for the reasons I mentioned in the discussion thread.
> 
> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu 
>> wrote:
> 
>> NOTE: Only PMC votes is binding.
>> 
>> Thanks for sharing your thoughts. I also think that this doesn't fall
>> into
>> any of the existing categories listed in the bylaws. Maybe we could
>> do some
>> improvements for the bylaws.
>> 
>> This is not codebase change as Robert mentioned and it's related to
>> how to
>> manage Flink's development in a good way. So, I agree with Robert and
>> Jincheng that this VOTE should only count PMC votes for now.
>> 
>> Thanks,
>> Dian
>> 
>>> On Nov 26, 2019, at 11:43 AM, jincheng sun wrote:
>>> 
>>> I also think that we should only count PMC votes.
>>> 
>>> This ML is to improve the security mechanism for Flink. Of course we
>> don't
>>> expect to use this
>>> ML often. I hope that it's perfect if this ML is never used.
>> However, the
>>> Flink community is growing rapidly, it's better to
>>> make our security mechanism as convenient as possible. But I agree
>> that
>>> this ML is not a must to have, it's nice to have.
>>> 
>>> So, I give the vote as +1(binding).
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> On Mon, Nov 25, 2019 at 9:45 PM Robert Metzger wrote:
>>> 
 I agree that we are only counting PMC votes (because this decision
>> goes
 beyond the codebase)
 
 I'm undecided what to vote :) I'm not against setting up a new
>> mailing
 list, but I also don't think the benefit (having a private list with
>> PMC +
 committers) is enough to justify the work involved. As far as I
>> remember,
 we have received 2 security issue notices, both basically about the
>> same
 issue.  I'll leave it to other PMC members to support this if they
>> want
>> to
 ...
 
 
 On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
>> dwysakow...@apache.org>
 wrote:
 
> Hi all,
> 
> What is th

Re: [DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-12-03 Thread Dian Fu
Hi all,

Just to sync the result of the vote for setting up a security@f.a.o mailing
list: it has been rejected [1].

Another very important point is that everyone agrees there should
be a guideline on how to report security issues on the Flink website. Do you
think we should bring up a separate discussion/vote thread? If so, I will
do that. Personally I think that discussing it on the PR is enough. What do
you think?

I have created a PR [2]. I would appreciate it if you could take a look.

Regards,
Dian

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Setup-a-security-flink-apache-org-mailing-list-tt35205.html
[2] https://github.com/apache/flink-web/pull/287

On Thu, Nov 21, 2019 at 3:58 PM Dian Fu  wrote:

> Hi all,
>
> There are no new feedbacks and it seems that we have received enough
> feedback about setup a secur...@flink.apache.org mailing list[1] for
> security report and discussion. It shows that it's optional as we can use
> either secur...@flink.apache.org or secur...@apache.org. So I'd like to
> start the vote for setup a secur...@flink.apache.org mailing list to make
> the final decision.
>
> Thanks,
> Dian
>
> On Nov 19, 2019, at 6:06 PM, Dian Fu wrote:
>
> Hi all,
>
> Thanks for sharing your thoughts. Appreciated! Let me try to summarize the
> information and thoughts received so far. Please feel free to let me know
> if there is anything wrong or missing.
>
> 1. Setup project specific security mailing list
> Pros:
> - The security reports received by secur...@apache.org will be forwarded
> to the project private(PMC) mailing list. Having a project specific
> security mailing list is helpful in cases when the best person to address
> the security issue is not a PMC member, but a committer. It makes things
> simple as everyone(both PMCs and committers) is on the same table.
> - Even though the security issues are usually rare, they could be
> devastating and thus need to be treated seriously.
> - Most notable apache projects such as apache common, hadoop, spark,
> kafka, hive, etc have a security specific mailing list.
>
> Cons:
> - The ASF security mailing list secur...@apache.org could be used if
> there is no project specific security mailing list.
> - The number of security reports is very low.
>
> Additional information:
> - Security mailing list could only be subscribed by PMCs and committers.
> However everyone could report security issues to the security mailing list.
>
>
> 2. Guide users to report the security issues
> Why:
> - Security vulnerabilities should not be publicly disclosed (e.g. via dev
> ML or JIRA) until the project has responded. We should guide users on how
> to report security issues in Flink website.
>
> How:
> - Option 1: Set up secur...@flink.apache.org and ask users to report
> security issues there
> - Option 2: Ask users to send security report to secur...@apache.org
> - Option 3: Ask users to send security report directly to
> priv...@flink.apache.org
>
>
> 3. Dedicated page to show the security vulnerabilities
> - We may need a dedicated security page to describe the CVE list on the
> Flink website.
>
> I think it makes sense to open separate discussion thread on 2) and 3).
> I'll create separate discussion thread for them. Let's focus on 1) in this
> thread.
>
> If there is no other feedback on 1), I'll bring up a VOTE for this
> discussion.
>
> What do you think?
>
> Thanks,
> Dian
>
> On Fri, Nov 15, 2019 at 10:18 AM Becket Qin  wrote:
>
>> Thanks for bringing this up, Dian.
>>
>> +1 on creating a project specific security mailing list. My two cents, I
>> think it is worth doing in practice.
>>
>> Although the ASF security ML is always available, usually all the emails
>> are simply routed to the individual project PMC. This is an additional
>> hop.
>> And in some cases, the best person to address the reported issue may not
>> be
>> a PMC member, but a committer, so the PMC have to again involve them into
>> the loop. This make things unnecessarily complicated. Having a project
>> specific security ML would make it much easier to have everyone at the
>> same
>> table.
>>
>> Also, one thing to note is that even though the security issues are
>> usually
>> rare, they could be devastating, thus need to be treated seriously. So I
>> think it is a good idea to establish the handling mechanism regardless of
>> the frequency of the reported security vulnerabilities.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Nov 15, 2019 at 1:14 AM Yu Li  wrote:
>>
>> > Thanks for bringing up this discussion Dian! How to report security
>> bugs to
>> > our project is a very important topic!
>> >
>> > Big +1 on adding some explicit instructions in our document about how to
>> > report security issues, and I suggest to open another thread to vote the
>> > reporting way in Flink.
>> >
>> > FWIW, known options to report security issues include:
>> > 1. Set up secur...@flink.apache.org and ask users to report security
>> > issues
>> > there
>> > 2. Ask users to send

Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-03 Thread Wei Zhong
Hi Aljoscha,

Thanks for your reply! Before bringing up this discussion I did some research 
on commonly used separators for options that take multiple values. I have 
considered ",", ":" and "#". Finally I chose "#" as the separator of 
"--pyRequirements".

For ",", it is the most widely used separator. Many projects use it as the 
separator of the values in same level. e.g. "-Dexcludes" in Maven, "--files" in 
Spark and "-pyFiles" in Flink. But the second parameter of "--pyRequirements", 
the requirement cached directory, is not at the same level as its first 
parameter (the requirements file). It is secondary and is only needed when the 
packages in the requirements file can not be downloaded from the package index 
server.

For ":", it is used as a path separator in most cases. e.g. main arguments of 
scp (secure copy), "--volume" in Docker and "-cp" in Java. But as we support 
accept a URI as the file path, which contains ":" in most cases, ":" can not be 
used as the separator of "--pyRequirements".

For "#", it is really rarely used as a separator for multiple values. I only 
find Spark using "#" as the separator for option "--files" and "--archives" 
between file path and target file/directory name. After some research I find 
that this usage comes from the URI fragment. We can append a secondary resource 
as the fragment of the URI after a number sign ("#") character. As we treat 
user file paths as URIs when parsing command line, using "#" as the separator 
of "--pyRequirements" makes sense to me, which means the second parameter is 
the fragment of the first parameter. The definition of URI fragment can be 
found here [1].

The reason for using “#” in “--pyArchives” as the separator between the file path 
and the target directory name is the same as above.

Best,
Wei

[1] https://tools.ietf.org/html/rfc3986#section-3.5
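For illustration, a minimal sketch of the “#” convention described above using only java.net.URI; the helper and the option values are assumptions for the example, not the actual parsing code:

import java.net.URI;

public class FragmentSeparatorExample {

    // Splits "path#secondary" into {path, secondary}; secondary is null if absent.
    static String[] splitOptionValue(String value) {
        URI uri = URI.create(value);
        String fragment = uri.getFragment(); // text after '#', or null
        String primary = fragment == null
                ? value
                : value.substring(0, value.length() - fragment.length() - 1);
        return new String[] {primary, fragment};
    }

    public static void main(String[] args) {
        // --pyRequirements requirements.txt#cached_dir
        String[] req = splitOptionValue("hdfs:///tmp/requirements.txt#cached_dir");
        System.out.println(req[0] + " | " + req[1]);

        // --pyArchives venv.zip#my_venv
        String[] archive = splitOptionValue("venv.zip#my_venv");
        System.out.println(archive[0] + " | " + archive[1]);
    }
}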

> On Dec 3, 2019, at 22:02, Aljoscha Krettek wrote:
> 
> Hi,
> 
> Yes, I think it’s a good idea to make the options uniform. Using ‘#’ as a 
> separator for options that take two values seems a bit strange to me, did you 
> research if any other CLI tools have this convention?
> 
> Side note: I don’t like that our options use camel-case, I think that’s very 
> non-standard. But that’s how it is now…
> 
> Best,
> Aljoscha
> 
>> On 3. Dec 2019, at 10:14, jincheng sun  wrote:
>> 
>> Thanks for bringing up this discussion, Wei!
>> I think this is very important for Flink users; we should include these
>> changes in Flink 1.10.
>> +1 for the optimization from the perspective of user convenience and the
>> unified use of Flink command line parameters.
>> 
>> Best,
>> Jincheng
>> 
>> On Mon, Dec 2, 2019 at 3:26 PM Wei Zhong wrote:
>> 
>>> Hi everyone,
>>> 
>>> I wanted to bring up the discussion of improving the Pyflink command line
>>> options.
>>> 
>>> A few command line options have been introduced in the FLIP-78 [1], i.e.
>>> "python-executable-path", "python-requirements","python-archive", etc.
>>> There are a few problems with these options, i.e. the naming style,
>>> variable argument options, etc.
>>> 
>>> We want to make some adjustment of FLIP-78 to improve the newly introduced
>>> command line options, here is the design doc:
>>> 
>>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>>> <
>>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
 
>>> Looking forward to your feedback!
>>> 
>>> Best,
>>> Wei
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>>> <
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78:+Flink+Python+UDF+Environment+and+Dependency+Management
 
>>> 
>>> 
> 



Re: I love Flink

2019-12-03 Thread Caizhi Weng
Hi Boqi,

It's really nice to hear that you would like to contribute. However, the
community contribution rules have changed and we no longer grant contributor
permissions.

To contribute, you can just open a JIRA and share your ideas. A committer
will assign it to you after the discussion in the JIRA ticket.

Thanks.

On Wed, Dec 4, 2019 at 11:18 AM 博奇 wrote:

>
> Hi Guys,
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor? My JIRA ID is
> humanhaunt.


Re: I love Flink

2019-12-03 Thread Jingsong Li
Hi boqi,

You don't need any permission now.
For JIRA creation, you can create JIRAs by yourself to report bugs or
improvements.
For JIRA assignment, only committers can assign a JIRA to someone. If you
want to contribute some code, you can comment on the JIRA, and a committer can
assign it to you.

Best,
Jingsong Lee

On Wed, Dec 4, 2019 at 11:18 AM 博奇  wrote:

>
> Hi Guys,
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor? My JIRA ID is
> humanhaunt.



-- 
Best, Jingsong Lee


I love Flink

2019-12-03 Thread 博奇

Hi Guys,

I want to contribute to Apache Flink.
Would you please give me the permission as a contributor? My JIRA ID is 
humanhaunt.

[jira] [Created] (FLINK-15046) Add guideline on how to report security issues

2019-12-03 Thread Dian Fu (Jira)
Dian Fu created FLINK-15046:
---

 Summary: Add guideline on how to report security issues
 Key: FLINK-15046
 URL: https://issues.apache.org/jira/browse/FLINK-15046
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
 Environment: As discussed in the 
[ML|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951]
 , there should be a guideline on how to report security issues on the Flink 
website.
Reporter: Dian Fu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Voting from apache.org addresses

2019-12-03 Thread Jingsong Lee
Thanks Dawid for driving this discussion.

+1 to Xuefu's viewpoint.
I am not a Flink committer, but I sometimes use an apache email address to
send email.

Another option is to require that a binding vote explicitly contain "binding";
otherwise it is counted as "non-binding".
In this way, we can let lazy people continue voting without any suffix.

Best,
Jingsong Lee

On Wed, Dec 4, 2019 at 3:58 AM Xuefu Z  wrote:

> Hi Dawid,
>
> Thanks for initiating this discussion. I understand the problem you
> described, but the solution might not work as having an apache.org email
> address doesn't necessarily mean it's from a Flink committer. This certainly
> applies to me.
>
> It probably helps for the voters to identify themselves by specifying
> either "binding" or "non-binding", though I understand this cannot be
> enforced but serves a general guideline.
>
> Thanks,
> Xuefu
>
> On Tue, Dec 3, 2019 at 6:15 AM Dawid Wysakowicz 
> wrote:
>
> > Hi,
> >
> > I wanted to reach out primarily to the Flink's committers. I think
> > whenever we cast a vote on a proposal, is it a FLIP, release candidate
> > or any other proposal, we should use our apache.org email address.
> >
> > It is not an easy task to check if a person voting is a committer/PMC if
> > we do not work with him/her on a daily basis. This is important for
> > verifying if a vote is binding or not.
> >
> > Best,
> >
> > Dawid
> >
> >
> >
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>


-- 
Best, Jingsong Lee


Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Becket Qin
Hi Dian,

Thanks for driving the effort regardless.

Even if we don't set up a security@f.a.o ML for Flink, we probably should
have a clear pointer to the ASF guideline and secur...@apache.org on the
project website. I think many people are not aware of the
secur...@apache.org address. If they fail to find information on the
Flink site, they will simply assume there is no special procedure for
security problems.

Thanks,

Jiangjie (Becket) Qin

On Tue, Dec 3, 2019 at 4:54 PM Dian Fu  wrote:

> Hi all,
>
> Thanks everyone for participating in this vote. As we have received only two
> +1 votes and also one -1 vote, according to the bylaws, I'm
> sorry to announce that this proposal has been rejected.
>
> Nevertheless, I think we can always restart the discussion in the future if
> we see more evidence that such a mailing list is necessary.
>
> Thanks,
> Dian
>
>
> > On Dec 3, 2019, at 4:53 PM, Dian Fu wrote:
> >
> > Actually I have tried to find out the reason why so many apache projects
> choose to set up a project specific security mailing list in case that the
> general secur...@apache.org mailing list seems working well.
> Unfortunately, there is no open discussions in these projects and there is
> also no clear guideline/standard in the ASF site whether a project should
> set up such a mailing list (The project specific security mailing list
> seems only an optional and we noticed that at the beginning of the
> discussion). This is also one of the main reasons we start such a
> discussion to see if somebody has more thoughts about this.
> >
> >> On Dec 2, 2019, at 6:03 PM, Chesnay Schepler wrote:
> >>
> >> Would security@f.a.o work as any other private ML?
> >>
> >> Contrary to what Becket said in the discussion thread,
> secur...@apache.org is not just "another hop"; it provides guiding
> material, the security team checks for activity and can be pinged easily as
> they are cc'd in the initial report.
> >>
> >> I vastly prefer this over a separate mailing list; if these benefits
> don't apply to security@f.a.o I'm -1 on this.
> >>
> >> On 02/12/2019 02:28, Becket Qin wrote:
> >>> Thanks for driving this, Dian.
> >>>
> >>> +1 from me, for the reasons I mentioned in the discussion thread.
> >>>
> >>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu 
> wrote:
> >>>
>  NOTE: Only PMC votes is binding.
> 
>  Thanks for sharing your thoughts. I also think that this doesn't fall
> into
>  any of the existing categories listed in the bylaws. Maybe we could
> do some
>  improvements for the bylaws.
> 
>  This is not codebase change as Robert mentioned and it's related to
> how to
>  manage Flink's development in a good way. So, I agree with Robert and
>  Jincheng that this VOTE should only count PMC votes for now.
> 
>  Thanks,
>  Dian
> 
> > On Nov 26, 2019, at 11:43 AM, jincheng sun wrote:
> >
> > I also think that we should only count PMC votes.
> >
> > This ML is to improve the security mechanism for Flink. Of course we
>  don't
> > expect to use this
> > ML often. I hope that it's perfect if this ML is never used.
> However, the
> > Flink community is growing rapidly, it's better to
> > make our security mechanism as convenient as possible. But I agree
> that
> > this ML is not a must to have, it's nice to have.
> >
> > So, I give the vote as +1(binding).
> >
> > Best,
> > Jincheng
> >
> > On Mon, Nov 25, 2019 at 9:45 PM Robert Metzger wrote:
> >
> >> I agree that we are only counting PMC votes (because this decision
> goes
> >> beyond the codebase)
> >>
> >> I'm undecided what to vote :) I'm not against setting up a new
> mailing
> >> list, but I also don't think the benefit (having a private list with
>  PMC +
> >> committers) is enough to justify the work involved. As far as I
>  remember,
> >> we have received 2 security issue notices, both basically about the
> same
> >> issue.  I'll leave it to other PMC members to support this if they
> want
>  to
> >> ...
> >>
> >>
> >> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
>  dwysakow...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> What is the voting scheme for it? I am not sure if it falls into
> any of
> >>> the categories we have listed in our bylaws. Are committers votes
> >>> binding or just PMCs'? (Personally I think it should be PMCs') Is
> this
>  a
> >>> binding vote or just an informational vote?
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>> On 25/11/2019 07:34, jincheng sun wrote:
>  +1
> 
> > On Thu, Nov 21, 2019 at 4:11 PM Dian Fu wrote:
> 
> > Hi all,
> >
> > According to our previous discussion in [1], I'd like to bring
> up a
> >> vote
> > to set up a secur...@flink.apache.org mailing list.
> >
> > The vote will be open for at least 72 hours (excluding w

[jira] [Created] (FLINK-15045) SchedulerBase should only log the RestartStrategy in legacy scheduling mode

2019-12-03 Thread Gary Yao (Jira)
Gary Yao created FLINK-15045:


 Summary: SchedulerBase should only log the RestartStrategy in 
legacy scheduling mode
 Key: FLINK-15045
 URL: https://issues.apache.org/jira/browse/FLINK-15045
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Gary Yao
Assignee: Gary Yao
 Fix For: 1.10.0


In NG (new generation) scheduling, we configure {{ThrowingRestartStrategy}} in the 
execution graph to assert that legacy code paths are not executed. Hence, the 
restart strategy should not be logged in NG scheduling mode. Currently, the 
following obsolete log message is still logged:

{noformat}
2019-12-03 20:22:14,426 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
- Using restart strategy 
org.apache.flink.runtime.executiongraph.restart.ThrowingRestartStrategy@44003e98
 for General purpose test job (9af8d0845d449f3c3447f817b2150bc8).
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15044) Clean up TpcdsResultComparator

2019-12-03 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-15044:


 Summary: Clean up TpcdsResultComparator
 Key: FLINK-15044
 URL: https://issues.apache.org/jira/browse/FLINK-15044
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.10.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen


The TpcdsResultComparator has some code style issues and the validation logic 
is very hard to understand. It also has a bug that can lead to ignoring one 
trailing line in the expected answers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Voting from apache.org addresses

2019-12-03 Thread Xuefu Z
Hi Dawid,

Thanks for initiating this discussion. I understand the problem you
described, but the solution might not work as having an apache.org email
address doesn't necessarily mean it's from a Flink committer. This certainly
applies to me.

It probably helps for the voters to identify themselves by specifying
either "binding" or "non-binding", though I understand this cannot be
enforced but serves as a general guideline.

Thanks,
Xuefu

On Tue, Dec 3, 2019 at 6:15 AM Dawid Wysakowicz 
wrote:

> Hi,
>
> I wanted to reach out primarily to the Flink's committers. I think
> whenever we cast a vote on a proposal, is it a FLIP, release candidate
> or any other proposal, we should use our apache.org email address.
>
> It is not an easy task to check if a person voting is a committer/PMC if
> we do not work with him/her on a daily basis. This is important for
> verifying if a vote is binding or not.
>
> Best,
>
> Dawid
>
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"


[jira] [Created] (FLINK-15043) Possible thread leak in task manager

2019-12-03 Thread Jordan Hatcher (Jira)
Jordan Hatcher created FLINK-15043:
--

 Summary: Possible thread leak in task manager
 Key: FLINK-15043
 URL: https://issues.apache.org/jira/browse/FLINK-15043
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Affects Versions: 1.8.2
Reporter: Jordan Hatcher


We have a few Flink clusters running version 1.8.2, with a MongoDB 4.2 database. 
We initially noticed a large number (16,000-19,000) of connections to the 
database, which we traced back to the Flink task managers. Restarting the task 
managers dropped this number down to a few hundred connections. The number of 
database connections roughly corresponds to the number of task manager threads 
as reported by the metrics API 
(/taskmanagers/metrics?get=Status.JVM.Threads.Count) across a few different 
clusters, so we suspect there is some kind of thread leak occurring in the task 
manager. Running a few jobs through our Flink workflow shows that the number of 
database connections increases by about 50 for each job we run.
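For reference, a small sketch of polling the metric above over the REST API; the JobManager address/port is a placeholder and the aggregated JSON shape is an assumption:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class TaskManagerThreadCountProbe {
    public static void main(String[] args) throws Exception {
        // Aggregated thread count across all task managers (default REST port 8081 assumed).
        URL url = new URL(
                "http://localhost:8081/taskmanagers/metrics?get=Status.JVM.Threads.Count");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in =
                new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Prints the JSON returned by the endpoint (aggregated values for the metric).
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}
{code}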



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15042) Fix python compatibility by excluding the Env.executeAsync() (FLINK-14854)

2019-12-03 Thread Kostas Kloudas (Jira)
Kostas Kloudas created FLINK-15042:
--

 Summary: Fix python compatibility by excluding the 
Env.executeAsync() (FLINK-14854)
 Key: FLINK-15042
 URL: https://issues.apache.org/jira/browse/FLINK-15042
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.10.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15041) Remove default close() implementation from JobClient

2019-12-03 Thread Kostas Kloudas (Jira)
Kostas Kloudas created FLINK-15041:
--

 Summary: Remove default close() implementation from JobClient
 Key: FLINK-15041
 URL: https://issues.apache.org/jira/browse/FLINK-15041
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Stateful Functions - Contribution Details

2019-12-03 Thread Stephan Ewen
That's great!
Thanks, Robert

On Tue, Dec 3, 2019 at 2:31 PM Robert Metzger  wrote:

> No concerns were raised. I created a repository:
> https://github.com/apache/flink-statefun
>
> Looking forward to the first PRs :)
>
> On Wed, Nov 20, 2019 at 2:43 AM tison  wrote:
>
> > Thanks for your summary Stephan. All entries make sense to me. Let's play
> > statefun :-)
> >
> > Best,
> > tison.
> >
> >
> > On Wed, Nov 20, 2019 at 12:53 AM Stephan Ewen wrote:
> >
> > > I am also fine with skipping a FLIP, if no one objects.
> > >
> > > The discussion seemed rather converged (or stalled). There was a
> concern
> > > about the name, but in the absence of another candidate for the name, I
> > > would go ahead with the current one.
> > > For the other aspects, we seem to have converged in the discussion.
> > >
> > > Summary
> > >   - Repository name: "flink-statefun"
> > >   - Maven modules:
> > >  - group id: org.apache.flink
> > >  -  artifact ids: "statefun-*".
> > >  - Java package name: org.apache.flink.statefun.*
> > >  - Reuse the dev and user mailing lists of Flink
> > >  - Flink JIRA, with dedicated component
> > >
> > > Maybe one more point, which might have been implicit, but let me state
> it
> > > here explicitly:
> > >   - Because this is a regular part of the Flink project, common
> processes
> > > (like PRs, reviews, etc.) should be the same unless we find a reason to
> > > diverge.
> > >   - We could simplify the PR template (omit the flink-core specific
> > > checklist for serializers, public API, etc.)
> > >
> > > Please raise concerns soon, otherwise we would go ahead with this
> > proposal
> > > in a few days.
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Tue, Nov 19, 2019 at 3:44 PM Igal Shilman 
> wrote:
> > >
> > > > Hi Robert,
> > > > Your proposal skipping FLIP and the vote sounds reasonable to me.
> > > >
> > > > The project is currently built (with tests, shading, spotbugs etc')
> in
> > > > around 2-3 minutes, but since it will reside in its own repository,
> it
> > > will
> > > > not affect Flink
> > > > build time.
> > > >
> > > > Thanks,
> > > > Igal
> > > >
> > > > On Tue, Nov 19, 2019 at 3:36 PM Robert Metzger 
> > > > wrote:
> > > >
> > > > > +1 on what has been decided so far in this thread (including using
> > the
> > > > same
> > > > > ML, and sticking to the statefun name).
> > > > >
> > > > > I'm not 100% sure if we need a FLIP for this, as we have VOTEd
> > already
> > > > with
> > > > > a 2/3 majority on accepting this contribution, and there are no
> > changes
> > > > to
> > > > > the Flink codebase, or user-facing APIs. I would be fine adding
> this
> > > > > without a FLIP.
> > > > >
> > > > > Is this contribution going to add substantial additional build time
> > > > > (especially tests)?
> > > > >
> > > > >
> > > > > On Tue, Nov 12, 2019 at 10:56 AM Stephan Ewen 
> > > wrote:
> > > > >
> > > > > > As mentioned before, the name was mainly chosen to resonate with
> > > > > developers
> > > > > > form a different background (applications / services) and we
> > checked
> > > it
> > > > > > with some users. Unrelated to Flink and Stream Processing, it
> > seemed
> > > to
> > > > > > describe the target use case pretty well.
> > > > > >
> > > > > > What would you use as a name instead?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Nov 12, 2019 at 10:10 AM Chesnay Schepler <
> > > ches...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I'm concerned both about the abbreviation and full name.
> > > > > > >
> > > > > > > a) It's not distinguishing enough from existing APIs,
> > specifically
> > > > the
> > > > > > > Streaming API, which already features stateful functions.
> > > > > > > b) It doesn't describe use-cases that the existing APIs cannot
> > > > satisfy.
> > > > > > >
> > > > > > > On 11/11/2019 15:28, Stephan Ewen wrote:
> > > > > > > > Thanks, all for the discussion!
> > > > > > > >
> > > > > > > > About the name:
> > > > > > > >
> > > > > > > >- Like Igal mentioned, the name "Stateful Functions" and
> the
> > > > > > > abbreviation
> > > > > > > > "statefun" underwent some iterations and testing with a small
> > > > sample
> > > > > of
> > > > > > > > developers from a few companies.
> > > > > > > >  If anyone has an amazing suggestion for another name,
> > please
> > > > > > share.
> > > > > > > > Would be great to also test it with a small sample of
> > developers
> > > > > from a
> > > > > > > few
> > > > > > > > companies, just to make sure we have at least a bit of
> outside
> > > > > > feedback.
> > > > > > > >
> > > > > > > >- fun vs. fn vs. func: I think these are more or less
> > > > equivalent,
> > > > > > > there
> > > > > > > > are examples of each one in some language. Working with the
> > code
> > > > over
> > > > > > the
> > > > > > > > last months, we found "statefun" to be somehow appealing.
> > > > > > > >  Maybe as a datapoint, Beam uses "DoFn" but pronounces it
> > > > > > "doo-fun".
> > > > > > > So,
> > > > 

[DISCUSS] Voting from apache.org addresses

2019-12-03 Thread Dawid Wysakowicz
Hi,

I wanted to reach out primarily to Flink's committers. I think that
whenever we cast a vote on a proposal, be it a FLIP, a release candidate
or any other proposal, we should use our apache.org email addresses.

It is not an easy task to check if a person voting is a committer/PMC if
we do not work with him/her on a daily basis. This is important for
verifying if a vote is binding or not.

Best,

Dawid




signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-03 Thread Aljoscha Krettek
+1

Best,
Aljoscha

> On 2. Dec 2019, at 18:38, Konstantin Knauf  wrote:
> 
> +1 from my side to drop.
> 
> On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman  wrote:
> 
>> Hi all,
>> 
>> I'd like to discuss dropping vendor-specific deployment documentation from
>> Flink's official docs. To be clear, I am *NOT* suggesting we drop any of
>> the filesystem documentation, but the following three pages.
>> 
>> AWS:
>> 
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
>> Google Compute Engine:
>> 
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
>> MapR:
>> 
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
>> 
>> Unlike the filesystems, these docs do not refer to components maintained by
>> the Apache Flink community, but external commercial services and products.
>> None of these pages are well maintained and I do not think the open-source
>> community can reasonably be expected to keep them up to date. In
>> particular,
>> 
>> 
>>   - The AWS page contains sparse information and mostly just links to the
>>   official EMR docs.
>>   - The Google Compute Engine page is out of date and the commands do not
>>   work.
>>   - MapR contains some relevant information but the community has already
>>   dropped the MapR filesystem so I am not sure that deployment would work
>> (I
>>   have not tested).
>> 
>> There is also a larger question of which vendor products should be included
>> and which should not. That is why I would like to suggest dropping these
>> pages and referring users to vendor maintained documentation whenever they
>> are using one of these services.
>> 
>> Seth Wiesman
>> 
> 
> 
> -- 
> 
> Konstantin Knauf | Solutions Architect
> 
> +49 160 91394525
> 
> 
> Follow us @VervericaData Ververica 
> 
> 
> --
> 
> Join Flink Forward  - The Apache Flink
> Conference
> 
> Stream Processing | Event Driven | Real Time
> 
> --
> 
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> 
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Tony) Cheng



Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-03 Thread Aljoscha Krettek
Hi,

Yes, I think it’s a good idea to make the options uniform. Using ‘#’ as a 
separator for options that take two values seems a bit strange to me; did you 
research whether any other CLI tools have this convention?

Side note: I don’t like that our options use camel-case, I think that’s very 
non-standard. But that’s how it is now…

Best,
Aljoscha

> On 3. Dec 2019, at 10:14, jincheng sun  wrote:
> 
> Thanks for bringing up this discussion, Wei!
> I think this is very important for Flink users; we should include these
> changes in Flink 1.10.
> +1 for the optimization from the perspective of user convenience and the
> unified use of Flink command line parameters.
> 
> Best,
> Jincheng
> 
> On Mon, Dec 2, 2019 at 3:26 PM Wei Zhong  wrote:
> 
>> Hi everyone,
>> 
>> I wanted to bring up the discussion of improving the Pyflink command line
>> options.
>> 
>> A few command line options have been introduced in the FLIP-78 [1], i.e.
>> "python-executable-path", "python-requirements","python-archive", etc.
>> There are a few problems with these options, i.e. the naming style,
>> variable argument options, etc.
>> 
>> We want to make some adjustment of FLIP-78 to improve the newly introduced
>> command line options, here is the design doc:
>> 
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>> <
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>>> 
>> Looking forward to your feedback!
>> 
>> Best,
>> Wei
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78:+Flink+Python+UDF+Environment+and+Dependency+Management
>>> 
>> 
>> 



[jira] [Created] (FLINK-15040) Open function is not called in UDF AggregateFunction

2019-12-03 Thread LI Guobao (Jira)
LI Guobao created FLINK-15040:
-

 Summary: Open function is not called in UDF AggregateFunction
 Key: FLINK-15040
 URL: https://issues.apache.org/jira/browse/FLINK-15040
 Project: Flink
  Issue Type: Bug
Reporter: LI Guobao


I am trying to register a metric in an aggregate UDF by overriding the *open* 
function. According to the documentation, the *open* function can be overridden 
to retrieve the metric group and register the metric. However, this only works 
for ScalarFunction, not for AggregateFunction, because the *open* function is 
never called for AggregateFunction.
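
For illustration, a minimal sketch (hypothetical code, not taken from the reporter's job) of the pattern described above; the *open* override is honored for a ScalarFunction but, per this report, never invoked for an AggregateFunction:

import org.apache.flink.metrics.Counter;
import org.apache.flink.table.functions.AggregateFunction;
import org.apache.flink.table.functions.FunctionContext;

public class CountingSum extends AggregateFunction<Long, CountingSum.SumAcc> {

    public static class SumAcc {
        public long sum = 0L;
    }

    private transient Counter rowCounter;

    @Override
    public void open(FunctionContext context) throws Exception {
        // Expected to run before any accumulate() call; per this report it never does.
        rowCounter = context.getMetricGroup().counter("rowsAggregated");
    }

    @Override
    public SumAcc createAccumulator() {
        return new SumAcc();
    }

    public void accumulate(SumAcc acc, Long value) {
        if (rowCounter != null) {
            rowCounter.inc();
        }
        acc.sum += value;
    }

    @Override
    public Long getValue(SumAcc acc) {
        return acc.sum;
    }
}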



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Stateful Functions - Contribution Details

2019-12-03 Thread Robert Metzger
No concerns were raised. I created a repository:
https://github.com/apache/flink-statefun

Looking forward to the first PRs :)

On Wed, Nov 20, 2019 at 2:43 AM tison  wrote:

> Thanks for your summary Stephan. All entries make sense to me. Let's play
> statefun :-)
>
> Best,
> tison.
>
>
> On Wed, Nov 20, 2019 at 12:53 AM Stephan Ewen  wrote:
>
> > I am also fine with skipping a FLIP, if no one objects.
> >
> > The discussion seemed rather converged (or stalled). There was a concern
> > about the name, but in the absence of another candidate for the name, I
> > would go ahead with the current one.
> > For the other aspects, we seem to have converged in the discussion.
> >
> > Summary
> >   - Repository name: "flink-statefun"
> >   - Maven modules:
> >  - group id: org.apache.flink
> >  -  artifact ids: "statefun-*".
> >  - Java package name: org.apache.flink.statefun.*
> >  - Reuse the dev and user mailing lists of Flink
> >  - Flink JIRA, with dedicated component
> >
> > Maybe one more point, which might have been implicit, but let me state it
> > here explicitly:
> >   - Because this is a regular part of the Flink project, common processes
> > (like PRs, reviews, etc.) should be the same unless we find a reason to
> > diverge.
> >   - We could simplify the PR template (omit the flink-core specific
> > checklist for serializers, public API, etc.)
> >
> > Please raise concerns soon, otherwise we would go ahead with this
> proposal
> > in a few days.
> >
> > Best,
> > Stephan
> >
> >
> > On Tue, Nov 19, 2019 at 3:44 PM Igal Shilman  wrote:
> >
> > > Hi Robert,
> > > Your proposal skipping FLIP and the vote sounds reasonable to me.
> > >
> > > The project is currently built (with tests, shading, spotbugs etc') in
> > > around 2-3 minutes, but since it will reside in its own repository, it
> > will
> > > not affect Flink
> > > build time.
> > >
> > > Thanks,
> > > Igal
> > >
> > > On Tue, Nov 19, 2019 at 3:36 PM Robert Metzger 
> > > wrote:
> > >
> > > > +1 on what has been decided so far in this thread (including using
> the
> > > same
> > > > ML, and sticking to the statefun name).
> > > >
> > > > I'm not 100% sure if we need a FLIP for this, as we have VOTEd
> already
> > > with
> > > > a 2/3 majority on accepting this contribution, and there are no
> changes
> > > to
> > > > the Flink codebase, or user-facing APIs. I would be fine adding this
> > > > without a FLIP.
> > > >
> > > > Is this contribution going to add substantial additional build time
> > > > (especially tests)?
> > > >
> > > >
> > > > On Tue, Nov 12, 2019 at 10:56 AM Stephan Ewen 
> > wrote:
> > > >
> > > > > As mentioned before, the name was mainly chosen to resonate with
> > > > developers
> > > > > from a different background (applications / services) and we
> checked
> > it
> > > > > with some users. Unrelated to Flink and Stream Processing, it
> seemed
> > to
> > > > > describe the target use case pretty well.
> > > > >
> > > > > What would you use as a name instead?
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Nov 12, 2019 at 10:10 AM Chesnay Schepler <
> > ches...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > I'm concerned both about the abbreviation and full name.
> > > > > >
> > > > > > a) It's not distinguishing enough from existing APIs,
> specifically
> > > the
> > > > > > Streaming API, which already features stateful functions.
> > > > > > b) It doesn't describe use-cases that the existing APIs cannot
> > > satisfy.
> > > > > >
> > > > > > On 11/11/2019 15:28, Stephan Ewen wrote:
> > > > > > > Thanks, all for the discussion!
> > > > > > >
> > > > > > > About the name:
> > > > > > >
> > > > > > >- Like Igal mentioned, the name "Stateful Functions" and the
> > > > > > abbreviation
> > > > > > > "statefun" underwent some iterations and testing with a small
> > > sample
> > > > of
> > > > > > > developers from a few companies.
> > > > > > >  If anyone has an amazing suggestion for another name,
> please
> > > > > share.
> > > > > > > Would be great to also test it with a small sample of
> developers
> > > > from a
> > > > > > few
> > > > > > > companies, just to make sure we have at least a bit of outside
> > > > > feedback.
> > > > > > >
> > > > > > >- fun vs. fn vs. func: I think these are more or less
> > > equivalent,
> > > > > > there
> > > > > > > are examples of each one in some language. Working with the
> code
> > > over
> > > > > the
> > > > > > > last months, we found "statefun" to be somehow appealing.
> > > > > > >  Maybe as a datapoint, Beam uses "DoFn" but pronounces it
> > > > > "doo-fun".
> > > > > > So,
> > > > > > > why not go with "fun" directly?
> > > > > > >
> > > > > > > About mailing lists:
> > > > > > >
> > > > > > >- There are pros and cons for separating the mailing lists
> or
> > > not
> > > > to
> > > > > > do
> > > > > > > that.
> > > > > > >- Having the same mailing lists gives synergies around
> > questions
> > > > for
> > > > > > > operating the system.
> > > > > > >- 

[jira] [Created] (FLINK-15039) Remove default close() implementation from ClusterClient

2019-12-03 Thread Kostas Kloudas (Jira)
Kostas Kloudas created FLINK-15039:
--

 Summary: Remove default close() implementation from ClusterClient
 Key: FLINK-15039
 URL: https://issues.apache.org/jira/browse/FLINK-15039
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas


Currently the {{ClusterClient}} interface extends {{AutoCloseable}} but 
provides a default, empty implementation of the {{close()}} method. This is 
error-prone, as subclass implementations may not realize that they need to 
provide a proper {{close()}} of their own.

To this end, this issue aims at removing the default implementation so that each 
implementation of {{ClusterClient}} has to explicitly provide its own 
{{close()}}.
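
A simplified sketch of the intended change (hypothetical interface names, not the actual Flink source):

// Before: subclasses silently inherit a no-op close().
interface ClusterClientWithDefaultClose extends AutoCloseable {
    default void close() throws Exception {
        // no-op, so implementations can forget to release their resources
    }
}

// After: close() is abstract again, so every implementation (e.g. RestClusterClient,
// MiniClusterClient) must state explicitly how it cleans up.
interface ClusterClientWithExplicitClose extends AutoCloseable {
    void close() throws Exception;
}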
 
 
 
 
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread Dian Fu
+1 to remove them. It seems that we should also drop the class Option as it's 
currently only used in RequiredParameters.

> On Dec 3, 2019, at 8:34 PM, Robert Metzger  wrote:
> 
> +1 on removing it.
> 
> On Tue, Dec 3, 2019 at 12:31 PM Stephan Ewen  wrote:
> I just stumbled across these classes recently and was looking for sample uses.
> No examples and other tests in the code base seem to use RequiredParameters 
> and OptionType.
> 
> They also seem quite redundant with how ParameterTool itself works 
> (tool.getRequired()).
> 
> Should we drop them, in an attempt to reduce unnecessary code and confusion 
> for users (multiple ways to do the same thing)? There are also many better 
> command line parsing libraries out there, this seems like something we don't 
> need to solve in Flink.
> 
> Best,
> Stephan



[jira] [Created] (FLINK-15038) Get field DataType of TableSchema should return DataType with default conversion class

2019-12-03 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-15038:


 Summary: Get field DataType of TableSchema should return DataType 
with default conversion class
 Key: FLINK-15038
 URL: https://issues.apache.org/jira/browse/FLINK-15038
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Jingsong Lee
 Fix For: 1.10.0


Currently, users and the planner construct TableSchema with various DataTypes, 
which may carry various conversion classes.

For example, the return DataType of functions may be backed by various 
TypeInformations and therefore lead to various conversion classes.

TableSchema should only carry logical information instead of various conversion 
classes.

This ticket wants to clean up the field DataTypes in TableSchema and return 
consistent conversion classes to users.

Modify:

getFieldDataTypes/getFieldDataType/toRowDataType in TableSchema.

Before returning, we can bridge to the default conversion class using 
TypeConversions.fromLogicalToDataType(type.getLogicalType()).
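
For illustration, a minimal sketch (a hypothetical helper, not the final implementation) of the normalization the getters could apply before returning:

import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.utils.TypeConversions;

public final class DataTypeDefaults {

    /** Returns the same logical type, but bridged to its default conversion class. */
    public static DataType withDefaultConversionClass(DataType dataType) {
        return TypeConversions.fromLogicalToDataType(dataType.getLogicalType());
    }

    private DataTypeDefaults() {}
}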



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread Robert Metzger
+1 on removing it.

On Tue, Dec 3, 2019 at 12:31 PM Stephan Ewen  wrote:

> I just stumbled across these classes recently and was looking for sample
> uses.
> No examples and other tests in the code base seem to
> use RequiredParameters and OptionType.
>
> They also seem quite redundant with how ParameterTool itself works
> (tool.getRequired()).
>
> Should we drop them, in an attempt to reduce unnecessary code and
> confusion for users (multiple ways to do the same thing)? There are also
> many better command line parsing libraries out there, this seems like
> something we don't need to solve in Flink.
>
> Best,
> Stephan
>


[DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread Stephan Ewen
I just stumbled across these classes recently and was looking for sample
uses.
No examples and other tests in the code base seem to use RequiredParameters
and OptionType.

They also seem quite redundant with how ParameterTool itself works
(tool.getRequired()).
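
For illustration, a minimal sketch (assumed typical usage, not taken from the Flink
examples) of the ParameterTool pattern mentioned above, which already covers the
required-parameter case:

import org.apache.flink.api.java.utils.ParameterTool;

public class RequiredParamsSketch {
    public static void main(String[] args) {
        ParameterTool params = ParameterTool.fromArgs(args);

        // Throws if --input is missing, which is what RequiredParameters was meant for.
        String inputPath = params.getRequired("input");

        // Optional parameter with a default value.
        int parallelism = params.getInt("parallelism", 1);

        System.out.println("input=" + inputPath + ", parallelism=" + parallelism);
    }
}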

Should we drop them, in an attempt to reduce unnecessary code and confusion
for users (multiple ways to do the same thing)? There are also many better
command line parsing libraries out there, this seems like something we
don't need to solve in Flink.

Best,
Stephan


Re: [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-03 Thread Jingsong Li
+1 (non-binding)

Best,
Jingsong Lee

On Mon, Dec 2, 2019 at 5:30 PM Dian Fu  wrote:

> Hi Jingsong,
>
> It's fine. :)  Appreciated the comments!
>
> I have replied to you in the discussion thread, as I also think it's better to
> discuss these points there.
>
> Thanks,
> Dian
>
> > On Dec 2, 2019, at 3:47 PM, Jingsong Li  wrote:
> >
> > Sorry for interrupting the voting.
> > Let's discuss it in the discussion thread.
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Dec 2, 2019 at 3:32 PM Jingsong Lee 
> wrote:
> >
> >> Hi Dian:
> >>
> >> Thanks for driving this. I have some questions:
> >>
> >> - Where should these configurations belong? You have mentioned the
> >> Table API/SQL, so should they be in TableConfig?
> >> - If they are only for Table/SQL, should they be called table.python.*,
> >> because in the Table API all config options are called table.*?
> >> - What should the table module do? E.g. in CommonPythonCalc, should we read
> >> the options from the table config and set the resources on the
> >> OneInputTransformation?
> >> - Is all of buffer.memory off-heap memory? I took a look
> >> at AbstractPythonScalarFunctionOperator; there is a
> >> forwardedInputQueue, is
> >> this one a heap queue? If so, do we need heap memory too?
> >>
> >> Hope to get your reply.
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Mon, Dec 2, 2019 at 2:34 PM Dian Fu  wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'd like to start the vote of FLIP-88 [1] since that we have reached an
> >>> agreement on the design in the discussion thread [2].
> >>>
> >>> This vote will be open for at least 72 hours. Unless there is an
> >>> objection, I will try to close it by Dec 5, 2019 08:00 UTC if we have
> >>> received sufficient votes.
> >>>
> >>> Regards,
> >>> Dian
> >>>
> >>> [1]
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
> >>> [2]
> >>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html
> >>
> >>
> >>
> >> --
> >> Best, Jingsong Lee
> >>
> >
> >
> > --
> > Best, Jingsong Lee
>
>

-- 
Best, Jingsong Lee


[jira] [Created] (FLINK-15037) Introduce LimittingMemoryManager as operator scope MemoryManager

2019-12-03 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-15037:
---

 Summary: Introduce LimittingMemoryManager as operator scope 
MemoryManager
 Key: FLINK-15037
 URL: https://issues.apache.org/jira/browse/FLINK-15037
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Zhu Zhu


The current {{MemoryManager}} is a slot-scoped component, and operators need to 
use a fraction to compute the memory size/pages they can allocate and use it to 
reserve memory or allocate pages.
This, however, requires operators to be aware of the managed memory fraction. 
There is also a risk that one operator allocates more resources than it has 
declared and causes other operators to break.

To separate concerns, we can introduce a {{LimittingMemoryManager}} which wraps 
the original MemoryManager but limits the available memory size to the 
fraction of the total memory governed by the original one. This wrapper would be an 
operator-scoped {{MemoryManager}}.

cc [~azagrebin]
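
A very rough sketch of the idea (hypothetical types and method names, deliberately not the real MemoryManager API): the wrapper forwards to the slot-wide manager but caps what a single operator may reserve at its declared fraction of the total budget.

public final class OperatorScopedMemoryManagerSketch {

    /** Hypothetical stand-in for the slot-scoped MemoryManager. */
    interface SlotMemoryManager {
        long totalMemorySize();
        void reserveMemory(Object owner, long size);
    }

    private final SlotMemoryManager slotManager;
    private final long operatorBudget; // this operator's fraction of the slot budget
    private long reserved;

    OperatorScopedMemoryManagerSketch(SlotMemoryManager slotManager, double fraction) {
        this.slotManager = slotManager;
        this.operatorBudget = (long) (slotManager.totalMemorySize() * fraction);
    }

    /** Reserves memory for this operator, failing fast if it exceeds its declared share. */
    synchronized void reserveMemory(Object owner, long size) {
        if (reserved + size > operatorBudget) {
            throw new IllegalStateException(
                    "Operator tried to reserve more managed memory than its declared fraction");
        }
        slotManager.reserveMemory(owner, size);
        reserved += size;
    }
}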



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15036) Container startup error will be handled outside of the YarnResourceManager's main thread

2019-12-03 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-15036:
-

 Summary: Container startup error will be handled outside of the 
YarnResourceManager's main thread
 Key: FLINK-15036
 URL: https://issues.apache.org/jira/browse/FLINK-15036
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.10.0, 1.8.3, 1.9.2
Reporter: Till Rohrmann
 Fix For: 1.10.0, 1.8.3, 1.9.2


With FLINK-13184, we replaced the {{NMClient}} with the {{NMClientAsync}}. As 
part of this change, container startup errors are now handled by a callback to 
{{NMClientAsync.CallbackHandler}}. The implementation of 
{{NMClientAsync.CallbackHandler#onStartContainerError}} will be called by the 
{{NMClientAsync}} from one of its own threads. Since the implementation performs 
state-changing operations, it needs to run inside the {{YarnResourceManager}}'s main thread.
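
A minimal sketch of the fix pattern (hypothetical helper types, not the actual YarnResourceManager code): the callback only re-schedules the handling onto the resource manager's main thread instead of mutating state directly from the NMClientAsync callback thread.

import java.util.concurrent.Executor;

final class ContainerErrorHandlingSketch {

    /** Stand-in for the RpcEndpoint main-thread executor of the resource manager. */
    private final Executor mainThreadExecutor;

    ContainerErrorHandlingSketch(Executor mainThreadExecutor) {
        this.mainThreadExecutor = mainThreadExecutor;
    }

    /** Invoked by the async NodeManager client from one of its own threads. */
    void onStartContainerError(String containerId, Throwable failure) {
        // State-changing logic must not run here; hand it over to the main thread.
        mainThreadExecutor.execute(() -> handleContainerStartFailure(containerId, failure));
    }

    /** Runs in the main thread and may therefore safely modify resource manager state. */
    private void handleContainerStartFailure(String containerId, Throwable failure) {
        // e.g. release the container, update bookkeeping, request a replacement worker, ...
    }
}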



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15035) Introduce unknown memory setting to table in blink planner

2019-12-03 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-15035:


 Summary: Introduce unknown memory setting to table in blink planner
 Key: FLINK-15035
 URL: https://issues.apache.org/jira/browse/FLINK-15035
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Jingsong Lee
 Fix For: 1.10.0


After https://jira.apache.org/jira/browse/FLINK-14566

We can just set unknown resources together with a flag indicating whether managed memory is used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-03 Thread Dian Fu
Hi Jingsong,

Thanks for your valuable feedback. I have updated the "Example" section 
describing how to use these options in a Python Table API program.

Thanks,
Dian

> On Dec 2, 2019, at 6:12 PM, Jingsong Lee  wrote:
> 
> Hi Dian:
> 
> Thanks for your explanation.
> It might be better if you could update the document to add an explanation of
> the changes to the table layer (it's just a suggestion, it's up to you).
> About the forwardedInputQueue in AbstractPythonScalarFunctionOperator:
> will this queue take up a lot of memory?
> Can it occupy as much memory as buffer.memory?
> If so, are we now dealing with the silent use of heap memory?
> That feels a little strange to me, because the memory on the Python side is
> reserved explicitly, but the memory on the JVM side is used silently.
> 
> After carefully seeing your comments on Google doc:
>> The memory used by the Java operator is currently accounted as the task
> on-heap memory. We can revisit this if we find it's a problem in the future.
> I agree that we can ignore it for now, but we could add some content to the
> document to remind the user. What do you think?
> 
> Best,
> Jingsong Lee
> 
> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu  wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks a lot for your comments. Please see my reply inlined below.
>> 
>>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee  wrote:
>>> 
>>> Hi Dian:
>>> 
>>> 
>>> Thanks for driving this. I have some questions:
>>> 
>>> 
>>> - Where should these configurations belong? You have mentioned the
>>> Table API/SQL, so should they be in TableConfig?
>> 
>> All Python-related configurations are defined in PythonOptions. Users can
>> configure them via TableConfig.getConfiguration().setXXX for
>> Python Table API programs.
>> 
>>> 
>>> - If they are only for Table/SQL, should they be called table.python.*,
>>> because in the Table API all config options are called table.*?
>> 
>> These configurations are not table specific. They will be used for both
>> Python Table API programs and Python DataStream API programs (which is
>> planned to be supported in the future). So python.xxx seems more
>> appropriate, what do you think?
>> 
>>> - What should the table module do? E.g. in CommonPythonCalc, should we read
>>> the options from the table config and set the resources on the OneInputTransformation?
>> 
>> As described in the design doc, in compilation phase, for batch jobs, the
>> required memory of the Python worker will be calculated according to the
>> configuration and set as the managed memory for the operator. For stream
>> jobs, the resource spec will be unknown. (The reason is that currently the
>> resources of all the operators in stream jobs are unknown and it is not
>> supported to configure both known and unknown resources in a single job.)
>> 
>>> - Is all of buffer.memory off-heap memory? I took a look
>>> at AbstractPythonScalarFunctionOperator; there is a forwardedInputQueue.
>>> Is this one a heap queue? If so, do we need heap memory too?
>> 
>> Yes, they are all off-heap memory which is supposed to be used by the
>> Python process. The forwardedInputQueue is a buffer used in the Java
>> operator and its memory is accounted as the on-heap memory.
>> 
>> Regards,
>> Dian
>> 
>>> 
>>> Hope to get your reply.
>>> 
>>> 
>>> Best,
>>> 
>>> Jingsong Lee
>>> 
>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu  wrote:
>>> 
 Thanks for your votes and feedback. I have discussed with @Zhu Zhu
 offline and also on the design doc.
 
 It seems that we have reached consensus on the design. I will bring up
 the VOTE if there is no other feedback.
 
 Thanks,
 Dian
 
> On Nov 22, 2019, at 2:51 PM, Hequn Cheng  wrote:
> 
> Thanks a lot for putting this together, Dian! Definitely +1 for this!
> It is great to make sure that the resources used by the Python process
 are
> managed properly by Flink’s resource management framework.
> 
> Also, thanks to the guys working on the unified memory management
> framework.
> 
> Best, Hequn
> 
> 
> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo  wrote:
> 
>> Thanks for driving this discussion, Dian!
>> 
>> +1 for this proposal. It will help to reduce container failure due to
>> the memory overuse.
>> Some comments left in the design doc.
>> 
>> Best,
>> Yangze Guo
>> 
>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song 
>> wrote:
>>> 
>>> Sorry for the late reply.
>>> 
>>> +1 for the general proposal.
>>> 
>>> And one reminder: to use the UNKNOWN resource requirement, we need to
>> make
>>> sure the optimizer knows which operators use off-heap managed memory,
>> and
>>> compute and set a fraction for the operators. See FLIP-53[1] for more
>>> details, and I would suggest you to double check with @Zhu Zhu who
 works
>> on
>>> this part.
>>> 
>>> Thank you~
>>> 
>>> Xintong Song
>>> 
>>> 
>>> [1]
>>> 
>> 
 
>> https://cwiki.apache.org/con

[jira] [Created] (FLINK-15034) Bump FRocksDB version for memory control

2019-12-03 Thread Yun Tang (Jira)
Yun Tang created FLINK-15034:


 Summary: Bump FRocksDB version for memory control
 Key: FLINK-15034
 URL: https://issues.apache.org/jira/browse/FLINK-15034
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Reporter: Yun Tang
 Fix For: 1.10.0


Since FLINK-14483 has already been resolved and a new version of FRocksDB has 
been released, we should bump the FRocksDB version in Flink for next steps in 
memory control.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-03 Thread jincheng sun
Thanks for bringing up this discussion, Wei!
I think this is very important for Flink users; we should include these
changes in Flink 1.10.
+1 for the optimization from the perspective of user convenience and the
unified use of Flink command line parameters.

Best,
Jincheng

On Mon, Dec 2, 2019 at 3:26 PM Wei Zhong  wrote:

> Hi everyone,
>
> I wanted to bring up the discussion of improving the Pyflink command line
> options.
>
> A few command line options have been introduced in the FLIP-78 [1], i.e.
> "python-executable-path", "python-requirements","python-archive", etc.
> There are a few problems with these options, i.e. the naming style,
> variable argument options, etc.
>
> We want to make some adjustment of FLIP-78 to improve the newly introduced
> command line options, here is the design doc:
>
> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
> <
> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
> >
> Looking forward to your feedback!
>
> Best,
> Wei
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78:+Flink+Python+UDF+Environment+and+Dependency+Management
> >
>
>


[jira] [Created] (FLINK-15033) Remove unused RemoteEnvironment.executeRemotely() (FLINK-11048)

2019-12-03 Thread Kostas Kloudas (Jira)
Kostas Kloudas created FLINK-15033:
--

 Summary: Remove unused RemoteEnvironment.executeRemotely() 
(FLINK-11048) 
 Key: FLINK-15033
 URL: https://issues.apache.org/jira/browse/FLINK-15033
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Hi all,

Thanks everyone for participating in this vote. As we have received only two +1 
votes and there is also one -1 for this vote, according to the bylaws, I'm sorry to 
announce that this proposal was rejected. 

Nevertheless, I think we can always restart the discussion in the future if we 
see more evidence that such a mailing list is necessary.

Thanks,
Dian


> On Dec 3, 2019, at 4:53 PM, Dian Fu  wrote:
> 
> Actually, I have tried to find out why so many Apache projects choose to 
> set up a project-specific security mailing list given that the general 
> secur...@apache.org mailing list seems to work well. Unfortunately, there 
> are no open discussions about this in these projects, and there is also no 
> clear guideline/standard on the ASF site on whether a project should set up 
> such a mailing list (the project-specific security mailing list seems to be 
> optional only, and we noticed that at the beginning of the discussion). This 
> is also one of the main reasons we started this discussion: to see if 
> somebody has more thoughts about this.
> 
>> On Dec 2, 2019, at 6:03 PM, Chesnay Schepler  wrote:
>> 
>> Would security@f.a.o work as any other private ML?
>> 
>> Contrary to what Becket said in the discussion thread, secur...@apache.org 
>> is not just "another hop"; it provides guiding material, the security team 
>> checks for activity and can be pinged easily as they are cc'd in the initial 
>> report.
>> 
>> I vastly prefer this over a separate mailing list; if these benefits don't 
>> apply to security@f.a.o I'm -1 on this.
>> 
>> On 02/12/2019 02:28, Becket Qin wrote:
>>> Thanks for driving this, Dian.
>>> 
>>> +1 from me, for the reasons I mentioned in the discussion thread.
>>> 
>>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu  wrote:
>>> 
 NOTE: Only PMC votes are binding.
 
 Thanks for sharing your thoughts. I also think that this doesn't fall into
 any of the existing categories listed in the bylaws. Maybe we could do some
 improvements for the bylaws.
 
 This is not a codebase change, as Robert mentioned, and it's related to how to
 manage Flink's development in a good way. So, I agree with Robert and
 Jincheng that this VOTE should only count PMC votes for now.
 
 Thanks,
 Dian
 
> On Nov 26, 2019, at 11:43 AM, jincheng sun  wrote:
> 
> I also think that we should only count PMC votes.
> 
> This ML is to improve the security mechanism for Flink. Of course we
 don't
> expect to use this
> ML often. I hope that it's perfect if this ML is never used. However, the
> Flink community is growing rapidly, it's better to
> make our security mechanism as convenient as possible. But I agree that
> this ML is not a must to have, it's nice to have.
> 
> So, I give the vote as +1(binding).
> 
> Best,
> Jincheng
> 
> On Mon, Nov 25, 2019 at 9:45 PM Robert Metzger  wrote:
> 
>> I agree that we are only counting PMC votes (because this decision goes
>> beyond the codebase)
>> 
>> I'm undecided what to vote :) I'm not against setting up a new mailing
>> list, but I also don't think the benefit (having a private list with
 PMC +
>> committers) is enough to justify the work involved. As far as I
 remember,
>> we have received 2 security issue notices, both basically about the same
>> issue.  I'll leave it to other PMC members to support this if they want
 to
>> ...
>> 
>> 
>> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
 dwysakow...@apache.org>
>> wrote:
>> 
>>> Hi all,
>>> 
>>> What is the voting scheme for it? I am not sure if it falls into any of
>>> the categories we have listed in our bylaws. Are committers votes
>>> binding or just PMCs'? (Personally I think it should be PMCs') Is this
 a
>>> binding vote or just an informational vote?
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> On 25/11/2019 07:34, jincheng sun wrote:
 +1
 
> On Thu, Nov 21, 2019 at 4:11 PM Dian Fu  wrote:
 
> Hi all,
> 
> According to our previous discussion in [1], I'd like to bring up a
>> vote
> to set up a secur...@flink.apache.org mailing list.
> 
> The vote will be open for at least 72 hours (excluding weekend). I'll
>>> try
> to close it by 2019-11-26 18:00 UTC, unless there is an objection or
>> not
> enough votes.
> 
> Regards,
> Dian
> 
> [1]
> 
 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>>> 
 
>> 
> 



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Actually, I have tried to find out why so many Apache projects choose 
to set up a project-specific security mailing list given that the general 
secur...@apache.org mailing list seems to work well. Unfortunately, there are no 
open discussions about this in these projects, and there is also no clear 
guideline/standard on the ASF site on whether a project should set up such a 
mailing list (the project-specific security mailing list seems to be optional only, 
and we noticed that at the beginning of the discussion). This is also one of 
the main reasons we started this discussion: to see if somebody has more 
thoughts about this.

> On Dec 2, 2019, at 6:03 PM, Chesnay Schepler  wrote:
> 
> Would security@f.a.o work as any other private ML?
> 
> Contrary to what Becket said in the discussion thread, secur...@apache.org is 
> not just "another hop"; it provides guiding material, the security team 
> checks for activity and can be pinged easily as they are cc'd in the initial 
> report.
> 
> I vastly prefer this over a separate mailing list; if these benefits don't 
> apply to security@f.a.o I'm -1 on this.
> 
> On 02/12/2019 02:28, Becket Qin wrote:
>> Thanks for driving this, Dian.
>> 
>> +1 from me, for the reasons I mentioned in the discussion thread.
>> 
>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu  wrote:
>> 
>>> NOTE: Only PMC votes are binding.
>>> 
>>> Thanks for sharing your thoughts. I also think that this doesn't fall into
>>> any of the existing categories listed in the bylaws. Maybe we could do some
>>> improvements for the bylaws.
>>> 
>>> This is not a codebase change, as Robert mentioned, and it's related to how to
>>> manage Flink's development in a good way. So, I agree with Robert and
>>> Jincheng that this VOTE should only count PMC votes for now.
>>> 
>>> Thanks,
>>> Dian
>>> 
 On Nov 26, 2019, at 11:43 AM, jincheng sun  wrote:
 
 I also think that we should only count PMC votes.
 
 This ML is to improve the security mechanism for Flink. Of course we
>>> don't
 expect to use this
 ML often. I hope that it's perfect if this ML is never used. However, the
 Flink community is growing rapidly, it's better to
 make our security mechanism as convenient as possible. But I agree that
 this ML is not a must to have, it's nice to have.
 
 So, I give the vote as +1(binding).
 
 Best,
 Jincheng
 
 On Mon, Nov 25, 2019 at 9:45 PM Robert Metzger  wrote:
 
> I agree that we are only counting PMC votes (because this decision goes
> beyond the codebase)
> 
> I'm undecided what to vote :) I'm not against setting up a new mailing
> list, but I also don't think the benefit (having a private list with
>>> PMC +
> committers) is enough to justify the work involved. As far as I
>>> remember,
> we have received 2 security issue notices, both basically about the same
> issue.  I'll leave it to other PMC members to support this if they want
>>> to
> ...
> 
> 
> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
>>> dwysakow...@apache.org>
> wrote:
> 
>> Hi all,
>> 
>> What is the voting scheme for it? I am not sure if it falls into any of
>> the categories we have listed in our bylaws. Are committers votes
>> binding or just PMCs'? (Personally I think it should be PMCs') Is this
>>> a
>> binding vote or just an informational vote?
>> 
>> Best,
>> 
>> Dawid
>> 
>> On 25/11/2019 07:34, jincheng sun wrote:
>>> +1
>>> 
>>> On Thu, Nov 21, 2019 at 4:11 PM Dian Fu  wrote:
>>> 
 Hi all,
 
 According to our previous discussion in [1], I'd like to bring up a
> vote
 to set up a secur...@flink.apache.org mailing list.
 
 The vote will be open for at least 72 hours (excluding weekend). I'll
>> try
 to close it by 2019-11-26 18:00 UTC, unless there is an objection or
> not
 enough votes.
 
 Regards,
 Dian
 
 [1]
 
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>> 
>>> 
>