Re: [DISCUSS] Release flink-shaded 10.0

2020-02-07 Thread Jeff Zhang
+1, Thanks for driving this Chesnay


On Sat, Feb 8, 2020 at 11:53 AM Yu Li  wrote:

> +1 for starting a new flink-shaded release.
>
> Thanks for bringing this up Chesnay.
>
> Best Regards,
> Yu
>
>
> On Thu, 6 Feb 2020 at 20:59, Hequn Cheng  wrote:
>
> > +1. It sounds great to allow us to support zk 3.4 and 3.5.
> > Thanks for starting the discussion.
> >
> > Best,
> > Hequn
> >
> > On Thu, Feb 6, 2020 at 12:21 AM Till Rohrmann 
> > wrote:
> >
> > > Thanks for starting this discussion Chesnay. +1 for starting a new
> > > flink-shaded release.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Feb 5, 2020 at 2:10 PM Chesnay Schepler 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I would like to kick off the next release of flink-shaded. The main
> > > > feature is new modules that bundle zookeeper, which will allow
> > > > us to support zk 3.4 and 3.5.
> > > >
> > > > Additionally we fixed an issue where slightly older dependencies than
> > > > intended were bundled in the flink-shaded-hadoop-2-uber jar, which
> was
> > > > flagged by security checks.
> > > >
> > > > Are there any other changes that people are interested in doing?
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Chesnay
> > > >
> > > >
> > >
> >
>


-- 
Best Regards

Jeff Zhang


Re: [DISCUSS] Release flink-shaded 10.0

2020-02-07 Thread Yu Li
+1 for starting a new flink-shaded release.

Thanks for bringing this up Chesnay.

Best Regards,
Yu


On Thu, 6 Feb 2020 at 20:59, Hequn Cheng  wrote:

> +1. It sounds great to allow us to support zk 3.4 and 3.5.
> Thanks for starting the discussion.
>
> Best,
> Hequn
>
> On Thu, Feb 6, 2020 at 12:21 AM Till Rohrmann 
> wrote:
>
> > Thanks for starting this discussion Chesnay. +1 for starting a new
> > flink-shaded release.
> >
> > Cheers,
> > Till
> >
> > On Wed, Feb 5, 2020 at 2:10 PM Chesnay Schepler 
> > wrote:
> >
> > > Hello,
> > >
> > > I would like to kick off the next release of flink-shaded. The main
> > > feature is new modules that bundle zookeeper, which will allow
> > > us to support zk 3.4 and 3.5.
> > >
> > > Additionally we fixed an issue where slightly older dependencies than
> > > intended were bundled in the flink-shaded-hadoop-2-uber jar, which was
> > > flagged by security checks.
> > >
> > > Are there any other changes that people are interested in doing?
> > >
> > >
> > > Regards,
> > >
> > > Chesnay
> > >
> > >
> >
>


Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-07 Thread Kurt Young
Hi Timo,

tableEnv.fromElements/values() sounds good, do we have a jira ticket to
track the issue?
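
A minimal sketch of how such an inline-values source could look (purely
hypothetical; the fromValues() name and signature are assumptions for
illustration, not a committed Flink API):

    // Assumed classes: org.apache.flink.table.api.{TableEnvironment, EnvironmentSettings, Table}
    // and org.apache.flink.types.Row; only fromValues(...) is hypothetical here.
    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // Hypothetical: build a Table directly from literal rows, so tests no longer
    // need to register a dedicated TableSource just to provide input data.
    Table input = tEnv.fromValues(
        Row.of(1, "Alice"),
        Row.of(2, "Bob"));

    tEnv.createTemporaryView("people", input);
    Table result = tEnv.sqlQuery("SELECT * FROM people");

Something along these lines could replace registered test sources in many
connector-independent tests.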

Best,
Kurt


On Fri, Feb 7, 2020 at 10:56 PM Timo Walther  wrote:

> Hi Kurt,
>
> Dawid is currently working on making a tableEnv.fromElements/values()
> kind of source possible in the future. We can use this to replace some
> of the tests. Otherwise I guess we should come up with a better test
> infrastructure to make defining sources unnecessary.
>
> Regards,
> Timo
>
>
> On 07.02.20 11:24, Kurt Young wrote:
> > Thanks all for your feedback. Since no objection has been raised, I've
> > created
> > https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.
> >
> > Since this issue would require a lot of test adjustments before it can really
> > happen,
> > it won't be done in a short time. Feel free to give feedback anytime here
> > or in jira
> > if you have other opinions.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Feb 5, 2020 at 8:26 PM Kurt Young  wrote:
> >
> >> Hi Zhenghua,
> >>
> >> After removing TableSource::getTableSchema, during optimization, I could
> >> imagine
> >> the schema information might come from relational nodes such as
> TableScan.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young  wrote:
> >>
> >>> Hi Jingsong,
> >>>
> >>> Yes, the current TableFactory is not ideal for users to use either. I think we
> >>> should also spend some time in 1.11 to improve the usability of TableEnvironment
> >>> when users are trying to read or write something. Automatic schema inference would
> >>> be one of them. Apart from this, we also support converting a DataStream to a
> >>> Table, which can serve some flexible requirements to read or write data.
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao  wrote:
> >>>
>  +1 to remove these methods.
> 
>  One concern about invocations of TableSource::getTableSchema:
>  By removing such methods, we can stop calling
> TableSource::getTableSchema
>  in some place(such
>  as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>  ConnectorCatalogTable, TableSourceQueryOperation).
> 
>  But in other place we need field types and names of the table
> source(such
>  as
>  BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>  PushProjectIntoTableSourceScanRule,
>  CommonLookupJoin).  So how should we deal with this?
> 
>  *Best Regards,*
>  *Zhenghua Gao*
> 
> 
>  On Wed, Feb 5, 2020 at 2:36 PM Kurt Young  wrote:
> 
> > Hi all,
> >
> > I'd like to bring up a discussion about removing registration of
> > TableSource and
> > TableSink in TableEnvironment as well as in ConnectTableDescriptor.
> The
> > affected
> > method would be:
> >
> > TableEnvironment::registerTableSource
> > TableEnvironment::fromTableSource
> > TableEnvironment::registerTableSink
> > ConnectTableDescriptor::registerTableSource
> > ConnectTableDescriptor::registerTableSink
> > ConnectTableDescriptor::registerTableSourceAndSink
> >
> > (Most of them are already deprecated, except for
> > TableEnvironment::fromTableSource,
> > which was intended to deprecate but missed by accident).
> >
> > FLIP-64 [1] already explained why we want to deprecate TableSource &
> > TableSink from
> > user's interface. In a short word, these interfaces should only read
> &
> > write the physical
> > representation of the table, and they are not fitting well after we
>  already
> > introduced some
> > logical table fields such as computed column, watermarks.
> >
> > Another reason is the exposure of registerTableSource in Table Env
> just
> > make the whole
> > SQL protocol opposite. TableSource should be used as a reader of
>  table, it
> > should rely on
> > other metadata information held by framework, which eventually comes
>  from
> > DDL or
> > ConnectDescriptor. But if we register a TableSource to Table Env, we
>  have
> > no choice but
> > have to rely on TableSource::getTableSchema. It will make the design
> > obscure, sometimes
> > TableSource should trust the information comes from framework, but
> > sometimes it should
> > also generate its own schema information.
> >
> > Furthermore, if the authority about schema information is not clear,
> it
> > will make things much
> > more complicated if we want to improve the table api usability such
> as
> > introducing automatic
> > schema inference in the near future.
> >
> > Since this is an API breaking change, I've also included the user mailing
> > list to
> > gather more feedback.
> >
> > Best,
> > Kurt
> >
> > [1]
> >
> >
> 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

[VOTE] Release 1.10.0, release candidate #3

2020-02-07 Thread Gary Yao
Hi everyone,
Please review and vote on the release candidate #3 for the version 1.10.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.10.0-rc3" [5],
* website pull request listing the new release and adding announcement blog
post [6][7].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Yu & Gary

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc3/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1333
[5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc3
[6] https://github.com/apache/flink-web/pull/302
[7] https://github.com/apache/flink-web/pull/301


Re: [VOTE] Release 1.10.0, release candidate #2

2020-02-07 Thread Gary Yao
Hi everyone,

I am hereby canceling the vote due to:

FLINK-15917
FLINK-15918
FLINK-15935
FLINK-15937

RC3 will be created later today.

Best,
Gary

On Thu, Feb 6, 2020 at 4:39 PM Andrey Zagrebin  wrote:

> alright, thanks for confirming this Benchao!
>
> On Thu, Feb 6, 2020 at 6:36 PM Benchao Li  wrote:
>
> > Hi Andrey,
> >
> > I noticed that 1.10 has changed to enabling background cleanup by default
> > just after I posted this email.
> > So it won't affect 1.10 anymore, just 1.9.x. We can move to the
> > Jira ticket to discuss this further.
> >
> > On Thu, Feb 6, 2020 at 11:30 PM Andrey Zagrebin  wrote:
> >
> > > Hi Benchao,
> > >
> > > Do you observe this issue FLINK-15938 with 1.9 or 1.10?
> > > If with 1.9, I suggest to check with 1.10.
> > >
> > > Thanks,
> > > Andrey
> > >
> > > On Thu, Feb 6, 2020 at 4:07 PM Benchao Li  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I found another issue[1], I don't know if it should be a blocker. But
> > it
> > > > does affects joins without window in blink planner.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-15938
> > > >
> > > > On Thu, Feb 6, 2020 at 5:05 PM Jeff Zhang  wrote:
> > > >
> > > > > Hi Jingsong,
> > > > >
> > > > > Thanks for the suggestion. It works when running it in the IDE, but for a
> > > > > downstream project like Zeppelin, where I will include the Flink jars in the
> > > > > classpath,
> > > > > it only works when I specify the jars one by one explicitly in the
> > > > > classpath;
> > > > > using * doesn't work.
> > > > >
> > > > > e.g.
> > > > >
> > > > > The following command where I use * to specify classpath doesn't
> > work,
> > > > > jzhang 4794 1.2 6.3 8408904 1048828 s007 S 4:30PM 0:33.90
> > > > >
> > >
> /Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home/bin/java
> > > > > -Dfile.encoding=UTF-8
> > > > >
> > > > >
> > > >
> > >
> >
> -Dlog4j.configuration=file:///Users/jzhang/github/zeppelin/conf/log4j.properties
> > > > >
> > > > >
> > > >
> > >
> >
> -Dzeppelin.log.file=/Users/jzhang/github/zeppelin/logs/zeppelin-interpreter-flink-shared_process-jzhang-JeffdeMacBook-Pro.local.log
> > > > > -Xms1024m -Xmx2048m -XX:MaxPermSize=512m -cp
> > > > >
> > > > >
> > > >
> > >
> >
> *:/Users/jzhang/github/flink/build-target/lib/**:/Users/jzhang/github/zeppelin/interpreter/flink/*:/Users/jzhang/github/zeppelin/zeppelin-interpreter-api/target/*:/Users/jzhang/github/zeppelin/zeppelin-zengine/target/test-classes/*::/Users/jzhang/github/zeppelin/interpreter/zeppelin-interpreter-api-0.9.0-SNAPSHOT.jar:/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/classes:/Users/jzhang/github/zeppelin/zeppelin-zengine/target/test-classes:/Users/jzhang/github/flink/build-target/opt/flink-python_2.11-1.10-SNAPSHOT.jar
> > > > > org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
> > 0.0.0.0
> > > > > 52395 flink-shared_process :
> > > > >
> > > > >
> > > > > While this command where I specify jar one by one explicitly in
> > > classpath
> > > > > works
> > > > >
> > > > > jzhang5660 205.2  6.1  8399420 1015036 s005  R
>  4:43PM
> > > > > 0:24.82
> > > > >
> > >
> /Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home/bin/java
> > > > > -Dfile.encoding=UTF-8
> > > > >
> > > > >
> > > >
> > >
> >
> -Dlog4j.configuration=file:///Users/jzhang/github/zeppelin/conf/log4j.properties
> > > > >
> > > > >
> > > >
> > >
> >
> -Dzeppelin.log.file=/Users/jzhang/github/zeppelin/logs/zeppelin-interpreter-flink-shared_process-jzhang-JeffdeMacBook-Pro.local.log
> > > > > -Xms1024m -Xmx2048m -XX:MaxPermSize=512m -cp
> > > > >
> > > > >
> > > >
> > >
> >
> *:/Users/jzhang/github/flink/build-target/lib/slf4j-log4j12-1.7.15.jar:/Users/jzhang/github/flink/build-target/lib/flink-connector-hive_2.11-1.10-SNAPSHOT.jar:/Users/jzhang/github/flink/build-target/lib/hive-exec-2.3.4.jar:/Users/jzhang/github/flink/build-target/lib/flink-shaded-hadoop-2-uber-2.7.5-7.0.jar:/Users/jzhang/github/flink/build-target/lib/log4j-1.2.17.jar:/Users/jzhang/github/flink/build-target/lib/flink-table-blink_2.11-1.10-SNAPSHOT.jar:/Users/jzhang/github/flink/build-target/lib/flink-table_2.11-1.10-SNAPSHOT.jar:/Users/jzhang/github/flink/build-target/lib/flink-dist_2.11-1.10-SNAPSHOT.jar:/Users/jzhang/github/flink/build-target/lib/flink-python_2.11-1.10-SNAPSHOT.jar:/Users/jzhang/github/flink/build-target/lib/flink-hadoop-compatibility_2.11-1.9.1.jar*:/Users/jzhang/github/zeppelin/interpreter/flink/*:/Users/jzhang/github/zeppelin/zeppelin-interpreter-api/target/*:/Users/jzhang/github/zeppelin/zeppelin-zengine/target/test-classes/*::/Users/jzhang/github/zeppelin/interpreter/zeppelin-interpreter-api-0.9.0-SNAPSHOT.jar:/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/classes:/Users/jzhang/github/zeppelin/zeppelin-zengine/target/test-classes:/Users/jzhang/github/flink/build-target/opt/flink-python_2.11-1.10-SNAPSHOT.jar
> > > > >
> > > > >
> > > >
> > >
> >
> /Users/jzhang/github/flink/build-target/opt/tmp/flink-python_2.11-1.10-SNAPSHOT.jar
> > > > > 

Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-07 Thread Rong Rong
CC @Xu Yang 

Thanks for starting the discussion @Hequn Cheng  and
sorry for joining the discussion late.

I've mainly helped merge the code in flink-ml-api and flink-ml-lib in the
past several months.
IMO flink-ml-api is an extension on top of the Table API and I agree
that it should be treated as a part of the "core" of Flink.

However, given the fact that there are multiple PRs still under
review [1], would it be a better idea to come up with a long-term plan first
before making the decision to move it to /opt now?


--
Rong

[1]
https://github.com/apache/flink/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Acomponent%3DLibrary%2FMachineLearning+

On Fri, Feb 7, 2020 at 5:54 AM Hequn Cheng  wrote:

> Hi,
>
> @Till Rohrmann  Thanks for the great inputs. I agree
> with you that we should have a long term plan for this. It definitely
> deserves another discussion.
> @Jeff Zhang  Thanks for your reports and ideas. It's a
> good idea to improve the error messages. Do we have any JIRAs for it? If not,
> maybe we can create one for it.
>
> Thank you again for your feedback and suggestions. I will go on with the
> PR. Thanks!
>
> Best,
> Hequn
>
> On Thu, Feb 6, 2020 at 11:51 PM Jeff Zhang  wrote:
>
> > I have another concern which may not be closely related to this thread.
> > Since Flink doesn't include all the necessary jars, I think it is
> > critical
> > for Flink to display a meaningful error message when any class is missing.
> > E.g. here's the error message when I use Kafka but forget to
> > include flink-json. To be honest, this kind of error message is hard for
> > new users to understand.
> >
> >
> > Reason: No factory implements
> > 'org.apache.flink.table.factories.DeserializationSchemaFactory'. The
> > following properties are requested:
> > connector.properties.bootstrap.servers=localhost:9092
> > connector.properties.group.id=testGroup
> > connector.properties.zookeeper.connect=localhost:2181
> > connector.startup-mode=earliest-offset connector.topic=generated.events
> > connector.type=kafka connector.version=universal format.type=json
> > schema.0.data-type=VARCHAR(2147483647) schema.0.name=status
> > schema.1.data-type=VARCHAR(2147483647) schema.1.name=direction
> > schema.2.data-type=BIGINT schema.2.name=event_ts update-mode=append The
> > following factories have been considered:
> > org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory
> > org.apache.flink.table.module.hive.HiveModuleFactory
> > org.apache.flink.table.module.CoreModuleFactory
> > org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
> > org.apache.flink.table.sources.CsvBatchTableSourceFactory
> > org.apache.flink.table.sources.CsvAppendTableSourceFactory
> > org.apache.flink.table.sinks.CsvBatchTableSinkFactory
> > org.apache.flink.table.sinks.CsvAppendTableSinkFactory
> > org.apache.flink.table.planner.delegation.BlinkPlannerFactory
> > org.apache.flink.table.planner.delegation.BlinkExecutorFactory
> > org.apache.flink.table.planner.StreamPlannerFactory
> > org.apache.flink.table.executor.StreamExecutorFactory
> > org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory
> at
> >
> >
> org.apache.flink.table.factories.TableFactoryService.filterByFactoryClass(TableFactoryService.java:238)
> > at
> >
> >
> org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:185)
> > at
> >
> >
> org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:143)
> > at
> >
> >
> org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:113)
> > at
> >
> >
> org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.getDeserializationSchema(KafkaTableSourceSinkFactoryBase.java:277)
> > at
> >
> >
> org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.createStreamTableSource(KafkaTableSourceSinkFactoryBase.java:161)
> > at
> >
> >
> org.apache.flink.table.factories.StreamTableSourceFactory.createTableSource(StreamTableSourceFactory.java:49)
> > at
> >
> >
> org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSource(TableFactoryUtil.java:53)
> > ... 36 more
> >
> >
> >
> > On Thu, Feb 6, 2020 at 11:30 PM Till Rohrmann  wrote:
> >
> > > I would not object given that it is rather small at the moment. However, I
> > > also think that we should have a plan for how to handle the ever-growing Flink
> > > ecosystem and how to make it easily accessible to our users. E.g. one
> > > far-fetched idea could be something like a configuration script which downloads
> > > the required components for the user. But this definitely deserves a
> > > separate discussion and does not really belong here.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Feb 6, 2020 at 3:35 PM Hequn Cheng  wrote:
> > >
> > > >
> > > > Hi everyone,
> > > >
> > > > Thank you all for the great inputs!
> > > >
> > > > I think probably what we all agree on is we should try to make a
> leaner
> > > > flink-dist. However, 

Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-07 Thread Zhenghua Gao
Thanks Timo! Look forward your design!

*Best Regards,*
*Zhenghua Gao*


On Fri, Feb 7, 2020 at 5:26 PM Timo Walther  wrote:

> Hi Zhenghua,
>
> Jark is right. The reason why we haven't updated those interfaces yet is
> that we would actually like to introduce new interfaces. We
> should target new interfaces in this release. Even a short-term fix as
> you proposed with `getRecordDataType` does not actually help as Jingsong
> pointed out, because we cannot represent tuples in DataType and are also
> not planning to support them natively but only as a structured type in
> the future.
>
> In my envisioned design, the new sink interface should just always get a
> `ChangeRow` which is never serialized and just a data structure for
> communicating between the wrapping sink function and the returned sink
> function by the table sink.
>
> Let me sketch a rough design document that I will share with you
> shortly. Then we could also discuss alternatives.
>
> Thanks,
> Timo
>
>
> On 04.02.20 04:18, Zhenghua Gao wrote:
> > Hi Jark, thanks for your comments.
>  IIUC, the framework will only recognize getRecordDataType and
>  ignore getConsumedDataType for UpsertStreamTableSink, is that right?
> > You are right.
> >
>  getRecordDataType is little confused as UpsertStreamTableSink already
> has
>  three getXXXType().
> > the getRecordType and getOutputType are deprecated and mainly for backward
> > compatibility.
> >
> > *Best Regards,*
> > *Zhenghua Gao*
> >
> >
> > On Mon, Feb 3, 2020 at 10:11 PM Jark Wu  wrote:
> >
> >> Thanks Zhenghua for starting this discussion.
> >>
> >> Currently, all the UpsertStreamTableSinks can't upgrade to the new type
> >> system which affects usability a lot.
> >> I hope we can fix that in 1.11.
> >>
> >> I'm fine with *getRecordDataType* for a temporary solution.
> >> IIUC, the framework will only recognize getRecordDataType and
> >> ignore getConsumedDataType for UpsertStreamTableSink, is that right?
> >>
> >> I guess Timo is planning to design a new source/sink interface which
> will
> >> also fix this problem, but I'm not sure the timelines. cc @Timo
> >> It would be better if we can have a new and complete interface, because
> >> getRecordDataType is a little confusing as UpsertStreamTableSink already
> has
> >> three getXXXType().
> >>
> >> Best,
> >> Jark
> >>
> >>
> >> On Mon, 3 Feb 2020 at 17:48, Zhenghua Gao  wrote:
> >>
> >>> Hi Jingsong,  For now, only UpsertStreamTableSink and
> >>> RetractStreamTableSink consumes JTuple2
> >>> So the 'getConsumedDataType' interface is not necessary in validate &
> >>> codegen phase.
> >>> See
> >>>
> >>>
> >>
> https://github.com/apache/flink/pull/10880/files#diff-137d090bd719f3a99ae8ba6603ed81ebR52
> >>>   and
> >>>
> >>>
> >>
> https://github.com/apache/flink/pull/10880/files#diff-8405c17e5155aa63d16e497c4c96a842R304
> >>>
> >>> What about stay the same to use RAW type?
> >>>
> >>> *Best Regards,*
> >>> *Zhenghua Gao*
> >>>
> >>>
> >>> On Mon, Feb 3, 2020 at 4:59 PM Jingsong Li 
> >> wrote:
> >>>
>  Hi Zhenghua,
> 
>  The *getRecordDataType* looks good to me.
> 
>  But the main problem is how to represent the tuple type in DataType. I
>  understand that it is necessary to use StructuredType, but at present,
>  planner does not support StructuredType, so the other way is to
> support
>  StructuredType.
> 
>  Best,
>  Jingsong Lee
> 
>  On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:
> 
> > Would overriding `getConsumedDataType` do the job?
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao 
> >> wrote:
> >
> >> Hi all,
> >>
> >> FLINK-12254[1] [2] updated TableSink and related interfaces to new
> >>> type
> >> system which
> >> allows connectors use the new type system based on DataTypes.
> >>
> >> But FLINK-12911 port UpsertStreamTableSink and
> >> RetractStreamTableSink
> >>> to
> >> flink-api-java-bridge and returns TypeInformation of the requested
>  record
> >> type which
> >> can't support types with precision and scale, e.g. TIMESTAMP(p),
> >> DECIMAL(p,s).
> >>
> >> /**
> >>   * Returns the requested record type.
> >>   */
> >> TypeInformation<T> getRecordType();
> >>
> >>
> >> A proposal is deprecating the *getRecordType* API and adding a
> >> *getRecordDataType* API instead to return the data type of the
> >>> requested
> >> record. I have filed the issue FLINK-15469 and
> >> an initial PR to verify it.
> >>
> >> What do you think about this API changes? Any feedback are
> >>> appreciated.
> >> [1] https://issues.apache.org/jira/browse/FLINK-12254
> >> [2] https://github.com/apache/flink/pull/8596
> >> [3] https://issues.apache.org/jira/browse/FLINK-15469
> >>
> >> *Best Regards,*
> >> *Zhenghua Gao*
> >>
> >
> 
>  --
>  Best, Jingsong Lee
> 
> >>>
> 
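
For context, the method being discussed and the proposed addition would look
roughly like this (a sketch of the FLINK-15469 idea; the default implementation
and the exact signature are assumptions, not the final design):

    // Existing method on UpsertStreamTableSink<T>: based on TypeInformation, so
    // precision and scale (e.g. TIMESTAMP(p), DECIMAL(p, s)) cannot be expressed.
    TypeInformation<T> getRecordType();

    // Proposed addition: expose the requested record type through the new type
    // system (DataType). A bridging default could translate the legacy type until
    // sinks are migrated; shown here only to illustrate the idea.
    default DataType getRecordDataType() {
        return TypeConversions.fromLegacyInfoToDataType(getRecordType());
    }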

[jira] [Created] (FLINK-15958) Fully support RAW types in the API

2020-02-07 Thread Timo Walther (Jira)
Timo Walther created FLINK-15958:


 Summary: Fully support RAW types in the API
 Key: FLINK-15958
 URL: https://issues.apache.org/jira/browse/FLINK-15958
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Timo Walther
Assignee: Timo Walther


Currently, the Table API does not expose a way of creating RAW types without a 
{{DataTypeFactory}}. The reason for that is that RAW types need to be resolved 
at a later stage. This is similar to user-defined types that need to be 
resolved from the catalog.

A design document will follow as this will require some changes to the main API.

We also need better support in the Blink planner for that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-07 Thread Timo Walther

Hi Kurt,

Dawid is currently working on making a tableEnv.fromElements/values() 
kind of source possible in the future. We can use this to replace some 
of the tests. Otherwise I guess we should come up with a better test 
infrastructure to make defining sources unnecessary.


Regards,
Timo


On 07.02.20 11:24, Kurt Young wrote:

Thanks all for your feedback. Since no objection has been raised, I've
created
https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.

Since this issue would require a lot of test adjustments before it can really
happen,
it won't be done in a short time. Feel free to give feedback anytime here
or in jira
if you have other opinions.

Best,
Kurt


On Wed, Feb 5, 2020 at 8:26 PM Kurt Young  wrote:


Hi Zhenghua,

After removing TableSource::getTableSchema, during optimization, I could
imagine
the schema information might come from relational nodes such as TableScan.

Best,
Kurt


On Wed, Feb 5, 2020 at 8:24 PM Kurt Young  wrote:


Hi Jingsong,

Yes, the current TableFactory is not ideal for users to use either. I think we
should
also spend some time in 1.11 to improve the usability of TableEnvironment
when
users are trying to read or write something. Automatic schema inference would
be
one of them. Apart from this, we also support converting a DataStream to a
Table, which
can serve some flexible requirements to read or write data.

Best,
Kurt


On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao  wrote:


+1 to remove these methods.

One concern about invocations of TableSource::getTableSchema:
By removing such methods, we can stop calling TableSource::getTableSchema
in some place(such
as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
ConnectorCatalogTable, TableSourceQueryOperation).

But in other place we need field types and names of the table source(such
as
BatchExecLookupJoinRule/StreamExecLookupJoinRule,
PushProjectIntoTableSourceScanRule,
CommonLookupJoin).  So how should we deal with this?

*Best Regards,*
*Zhenghua Gao*


On Wed, Feb 5, 2020 at 2:36 PM Kurt Young  wrote:


Hi all,

I'd like to bring up a discussion about removing registration of
TableSource and
TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
affected
method would be:

TableEnvironment::registerTableSource
TableEnvironment::fromTableSource
TableEnvironment::registerTableSink
ConnectTableDescriptor::registerTableSource
ConnectTableDescriptor::registerTableSink
ConnectTableDescriptor::registerTableSourceAndSink

(Most of them are already deprecated, except for
TableEnvironment::fromTableSource,
which was intended to deprecate but missed by accident).

FLIP-64 [1] already explained why we want to deprecate TableSource &
TableSink from
user's interface. In a short word, these interfaces should only read &
write the physical
representation of the table, and they are not fitting well after we

already

introduced some
logical table fields such as computed column, watermarks.

Another reason is the exposure of registerTableSource in Table Env just
make the whole
SQL protocol opposite. TableSource should be used as a reader of

table, it

should rely on
other metadata information held by framework, which eventually comes

from

DDL or
ConnectDescriptor. But if we register a TableSource to Table Env, we

have

no choice but
have to rely on TableSource::getTableSchema. It will make the design
obscure, sometimes
TableSource should trust the information comes from framework, but
sometimes it should
also generate its own schema information.

Furthermore, if the authority about schema information is not clear, it
will make things much
more complicated if we want to improve the table api usability such as
introducing automatic
schema inference in the near future.

Since this is an API breaking change, I've also included the user mailing

list to

gather more feedback.

Best,
Kurt

[1]



https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module












[DISCUSS] FLINK-15831: Add Docker image publication to release documentation

2020-02-07 Thread Patrick Lucas
Hi all,

For FLINK-15831[1], I think the way to start is for the flink-docker
repo[2] itself to sufficiently document the workflow for publishing new
Dockerfiles, and then update the Flink release guide in the wiki to refer
to this documentation and to include this step in the "Finalize the
release" checklist.

To the first point, I have opened a PR[3] on flink-docker to improve its
documentation.

And for updating the release guide, I propose the following changes:

1. Add a new subsection to "Finalize the release", prior to "Checklist to
proceed to the next step" with the following content:

Publish the Dockerfiles for the new release
>
> Note: the official Dockerfiles fetch the binary distribution of the target
> Flink version from an Apache mirror. After publishing the binary release
> artifacts, mirrors can take some hours to start serving the new artifacts,
> so you may want to wait to do this step until you are ready to continue
> with the "Promote the release" steps below.
>
> Follow the instructions in the [flink-docker] repo to build the new
> Dockerfiles and send an updated manifest to Docker Hub so the new images
> are built and published.
>

2. Add an entry to the "Checklist to proceed to the next step" subsection
of "Finalize the release":

>
>- Dockerfiles in flink-docker updated for the new Flink release and
>pull request opened on the Docker official-images with an updated manifest
>
Please let me know if you have any questions or suggestions to improve
this proposal.

Thanks,
Patrick

[1]https://issues.apache.org/jira/browse/FLINK-15831
[2]https://github.com/apache/flink-docker
[3]https://github.com/apache/flink-docker/pull/5


Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-07 Thread Hequn Cheng
Hi,

@Till Rohrmann  Thanks for the great inputs. I agree
with you that we should have a long term plan for this. It definitely
deserves another discussion.
@Jeff Zhang  Thanks for your reports and ideas. It's a
good idea to improve the error messages. Do we have any JIRAs for it? If not,
maybe we can create one for it.

Thank you again for your feedback and suggestions. I will go on with the
PR. Thanks!

Best,
Hequn

On Thu, Feb 6, 2020 at 11:51 PM Jeff Zhang  wrote:

> I have another concern which may not be closely related to this thread.
> Since Flink doesn't include all the necessary jars, I think it is critical
> for Flink to display a meaningful error message when any class is missing.
> E.g. here's the error message when I use Kafka but forget to
> include flink-json. To be honest, this kind of error message is hard for
> new users to understand.
>
>
> Reason: No factory implements
> 'org.apache.flink.table.factories.DeserializationSchemaFactory'. The
> following properties are requested:
> connector.properties.bootstrap.servers=localhost:9092
> connector.properties.group.id=testGroup
> connector.properties.zookeeper.connect=localhost:2181
> connector.startup-mode=earliest-offset connector.topic=generated.events
> connector.type=kafka connector.version=universal format.type=json
> schema.0.data-type=VARCHAR(2147483647) schema.0.name=status
> schema.1.data-type=VARCHAR(2147483647) schema.1.name=direction
> schema.2.data-type=BIGINT schema.2.name=event_ts update-mode=append The
> following factories have been considered:
> org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory
> org.apache.flink.table.module.hive.HiveModuleFactory
> org.apache.flink.table.module.CoreModuleFactory
> org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
> org.apache.flink.table.sources.CsvBatchTableSourceFactory
> org.apache.flink.table.sources.CsvAppendTableSourceFactory
> org.apache.flink.table.sinks.CsvBatchTableSinkFactory
> org.apache.flink.table.sinks.CsvAppendTableSinkFactory
> org.apache.flink.table.planner.delegation.BlinkPlannerFactory
> org.apache.flink.table.planner.delegation.BlinkExecutorFactory
> org.apache.flink.table.planner.StreamPlannerFactory
> org.apache.flink.table.executor.StreamExecutorFactory
> org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory at
>
> org.apache.flink.table.factories.TableFactoryService.filterByFactoryClass(TableFactoryService.java:238)
> at
>
> org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:185)
> at
>
> org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:143)
> at
>
> org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:113)
> at
>
> org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.getDeserializationSchema(KafkaTableSourceSinkFactoryBase.java:277)
> at
>
> org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.createStreamTableSource(KafkaTableSourceSinkFactoryBase.java:161)
> at
>
> org.apache.flink.table.factories.StreamTableSourceFactory.createTableSource(StreamTableSourceFactory.java:49)
> at
>
> org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSource(TableFactoryUtil.java:53)
> ... 36 more
>
>
>
> On Thu, Feb 6, 2020 at 11:30 PM Till Rohrmann  wrote:
>
> > I would not object given that it is rather small at the moment. However, I
> > also think that we should have a plan for how to handle the ever-growing Flink
> > ecosystem and how to make it easily accessible to our users. E.g. one
> > far-fetched idea could be something like a configuration script which downloads
> > the required components for the user. But this definitely deserves a
> > separate discussion and does not really belong here.
> >
> > Cheers,
> > Till
> >
> > On Thu, Feb 6, 2020 at 3:35 PM Hequn Cheng  wrote:
> >
> > >
> > > Hi everyone,
> > >
> > > Thank you all for the great inputs!
> > >
> > > I think probably what we all agree on is that we should try to make a leaner
> > > flink-dist. However, we may also need to make some compromises considering
> > > the user experience, so that users don't need to download the dependencies
> > > from different places. Otherwise, we could move all the jars in the current
> > > opt folder to the download page.
> > >
> > > The lack of clear rules for guiding such compromises makes things more
> > > complicated now. I would agree that the decisive factor for what goes into
> > > Flink's binary distribution should be how core it is to Flink. Meanwhile,
> > > it's better to treat the Flink API as core to Flink. Not only is it a
> > > very clear rule that is easy to follow, but also, in most cases, an API is
> > > very significant and deserves to be included in the dist.
> > >
> > > Given this, it might make sense to put flink-ml-api and flink-ml-lib
> into
> > > the opt.
> > > What do you think?
> > >
> > > Best,
> > > Hequn
> > >
> > > On Wed, Feb 5, 2020 at 

Re: [DISCUSS] Does removing deprecated interfaces needs another FLIP

2020-02-07 Thread Stephan Ewen
I would also agree with the above.

Changing a stable API and deprecating stable methods would need a FLIP in
my opinion. But then executing the removal of previously deprecated methods
would be fine in my understanding.


On Fri, Feb 7, 2020 at 11:17 AM Kurt Young  wrote:

> Thanks for the clarification, that makes sense to me.
>
> Best,
> Kurt
>
>
> On Fri, Feb 7, 2020 at 4:56 PM Timo Walther  wrote:
>
> > Hi Kurt,
> >
> > I agree with Aljoscha. We don't need to introduce a big process or do
> > voting but we should ensure that all stakeholders are notified and have
> > a chance to raise doubts.
> >
> > Regards,
> > Timo
> >
> >
> > On 07.02.20 09:51, Aljoscha Krettek wrote:
> > > I would say a ML discussion or even a Jira issue is enough because
> > >
> > > a) the methods are already deprecated
> > > b) the methods are @PublicEvolving, which I don't consider a super
> > > strong guarantee to users (we still shouldn't remove them lightly, but
> > > we can if we have to...)
> > >
> > > Best,
> > > Aljoscha
> > >
> > > On 07.02.20 04:40, Kurt Young wrote:
> > >> Hi dev,
> > >>
> > >> Currently I want to remove some already deprecated methods from
> > >> TableEnvironment which are annotated with @PublicEvolving. And I also
> > >> created
> > >> a discussion thread [1] to both dev and user mailing lists to gather
> > >> feedback on that. But I didn't find any matching rule in the Flink bylaws
> > >> [2] to
> > >> follow. This is definitely an API breaking change, but we already
> > >> voted for that back in the FLIP which deprecated these methods.
> > >>
> > >> I'm not sure about how to proceed for now. Looks like I have 2
> choices:
> > >>
> > >> 1. If no one raise any objections in discuss thread in like 72 hours,
> I
> > >> will create a jira to start working on it.
> > >> 2. Since this is an API breaking change, I need to open another FLIP to
> > >> say
> > >> that I want to remove these deprecated methods. This seems a little
> > >> redundant with the first FLIP which deprecated the methods.
> > >>
> > >> What do you think?
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >> [1]
> > >>
> >
> https://lists.apache.org/thread.html/r98af66feb531ce9e6b94914e44391609cad857e16ea84db5357c1980%40%3Cdev.flink.apache.org%3E
> > >>
> > >> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > >>
> >
> >
>


[jira] [Created] (FLINK-15957) Document the release process for Stateful Functions in the community wiki

2020-02-07 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15957:
---

 Summary: Document the release process for Stateful Functions in 
the community wiki
 Key: FLINK-15957
 URL: https://issues.apache.org/jira/browse/FLINK-15957
 Project: Flink
  Issue Type: Sub-task
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
 Fix For: statefun-1.1


The release process for Stateful Functions should be documented under 
https://cwiki.apache.org/confluence/display/FLINK/Releasing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-07 Thread Stephan Ewen
+1 (binding) (belated)

Quick addendum to clarify some questions from recent discussions in other
threads:

  - The core interfaces (Source, SourceReader, Enumerator) and the core
architecture (Enumerator as coordinators on the JobManager, SourceReaders
in Tasks) seem to have no open questions

  - Some open questions remained about whether the proposed base
implementation (SplitReader) fits all possible sources (like block-based
file readers).
  - Even if that turns out to not be the case, this FLIP does not prevent
the creation of another base implementation. That would be completely in
line with this FLIP and exactly the reason why the FLIP proposes the
separation between the simpler core interfaces (SourceReader) and the base
implementations, of which the first one (so that we can start to
immediately build some connectors) is the SplitReader.
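
To make the separation above concrete, a highly simplified sketch of the roles
(illustrative only; the real FLIP-27 interfaces carry more type parameters,
checkpointing hooks and factory methods than shown here):

    // Simplified sketch, not the exact FLIP-27 signatures.
    interface SplitEnumerator<SplitT> {
        void start();                            // runs on the JobManager: discovers
        void handleSplitRequest(int subtaskId);  // splits and assigns them to readers
    }

    interface SourceReader<T, SplitT> {
        void addSplits(java.util.List<SplitT> splits);   // runs in the task: reads the
        InputStatus pollNext(ReaderOutput<T> output)     // assigned splits, emits records
            throws Exception;
    }

    // The SplitReader-based base implementation is one way to implement SourceReader;
    // alternative base implementations can be added later without changing the FLIP.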

On Fri, Feb 7, 2020 at 9:47 AM Becket Qin  wrote:

> Thanks everyone for voting. The voting result is following:
>
> +1 (Binding): 5 (Yu, Jark, Zhijiang, Piotr, Becket)
>
> +1 (Non-binding): 4 (Jingsong, Danny, Wei, Guowei)
>
> -1: 0
>
> FLIP-27 has passed.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Feb 4, 2020 at 3:42 PM Piotr Nowojski  wrote:
>
> > +1 (binding)
> >
> > Piotrek
> >
> > > On 4 Feb 2020, at 05:39, Zhijiang 
> > wrote:
> > >
> > > +1 (binding), we are waiting too long for it. :)
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > > --
> > > From:Guowei Ma 
> > > Send Time:2020 Feb. 4 (Tue.) 12:34
> > > To:dev 
> > > Subject:Re: [VOTE] FLIP-27 - Refactor Source Interface
> > >
> > > +1 (non-binding), thanks for driving.
> > >
> > > Best,
> > > Guowei
> > >
> > >
> > > On Tue, Feb 4, 2020 at 11:20 AM Jingsong Li  wrote:
> > >
> > >> +1 (non-binding), thanks for driving.
> > >> FLIP-27 is the basis of a lot of follow-up work.
> > >>
> > >> Best,
> > >> Jingsong Lee
> > >>
> > >> On Tue, Feb 4, 2020 at 10:26 AM Jark Wu  wrote:
> > >>
> > >>> Thanks for driving this Becket!
> > >>>
> > >>> +1 from my side.
> > >>>
> > >>> Cheers,
> > >>> Jark
> > >>>
> > >>> On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:
> > >>>
> >  +1, thanks for the efforts Becket!
> > 
> >  Best Regards,
> >  Yu
> > 
> > 
> >  On Mon, 3 Feb 2020 at 17:52, Becket Qin 
> wrote:
> > 
> > > Bump up the thread.
> > >
> > > On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
> >  wrote:
> > >
> > >> Hi Folks,
> > >>
> > >> I'd like to resume the voting thread for FlIP-27.
> > >>
> > >> Please note that the FLIP wiki has been updated to reflect the
> > >> latest
> > >> discussions in the discussion thread.
> > >>
> > >> To avoid confusion, I'll only count the votes casted after this
> > >>> point.
> > >>
> > >> FLIP wiki:
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> > >> %3A+Refactor+Source+Interface
> > >>
> > >> Discussion thread:
> > >>
> > >>
> > >
> > 
> > >>>
> > >>
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> > >>
> > >> The vote will last for at least 72 hours, following the consensus
> >  voting
> > >> process.
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On Thu, Dec 5, 2019 at 10:31 AM jincheng sun <
> > >>> sunjincheng...@gmail.com
> > >
> > >> wrote:
> > >>
> > >>> +1 (binding), and looking forward to seeing the new interface in
> > >> the
> > >>> master.
> > >>>
> > >>> Best,
> > >>> Jincheng
> > >>>
> > >>> On Thu, Dec 5, 2019 at 8:05 AM Becket Qin  wrote:
> > >>>
> >  Hi all,
> > 
> >  I would like to start the vote for FLIP-27 which proposes to
> > > introduce a
> >  new Source connector interface to address a few problems in the
> > > existing
> >  source connector. The main goals of the the FLIP are following:
> > 
> >  1. Unify the Source interface in Flink for batch and stream.
> >  2. Significantly reduce the work for developers to develop new
> >  source
> >  connectors.
> >  3. Provide a common abstraction for all the sources, as well as
> > >> a
> > >>> mechanism
> >  to allow source subtasks to coordinate among themselves.
> > 
> >  The vote will last for at least 72 hours, following the
> > >> consensus
> > > voting
> >  process.
> > 
> >  FLIP wiki:
> > 
> > 
> > >>>
> > >
> > 
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > 
> >  Discussion thread:
> > 
> > 
> > >>>
> > >
> > 
> > >>>
> > >>
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> 

[jira] [Created] (FLINK-15956) Introduce an HttpRemoteFunction

2020-02-07 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-15956:


 Summary: Introduce an HttpRemoteFunction
 Key: FLINK-15956
 URL: https://issues.apache.org/jira/browse/FLINK-15956
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Igal Shilman
 Fix For: statefun-1.1


This issue introduces the remote HttpFunction based on a multi-language
protocol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15955) Split RemoteFunction into GRPC Function/HttpFunction

2020-02-07 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-15955:


 Summary: Split RemoteFunction into GRPC Function/HttpFunction
 Key: FLINK-15955
 URL: https://issues.apache.org/jira/browse/FLINK-15955
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Igal Shilman


This issue deals with the preparation required to introduce a new type of
remote function (previously only gRPC); now we will also have an HTTP-based
remote function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15954) Introduce a PersistedTable to the SDK

2020-02-07 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-15954:


 Summary: Introduce a PersistedTable to the SDK
 Key: FLINK-15954
 URL: https://issues.apache.org/jira/browse/FLINK-15954
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Igal Shilman


A PersistedTable is a new SDK primitive that supports adding and removing keys 
at runtime, similar to a MapState.
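
A hypothetical usage sketch of such a primitive (the actual API is what this
ticket will define; the names below are modeled on the existing Persisted*
SDK primitives and are assumptions, as is the counting example):

    // Hypothetical sketch only; the real factory method and method names may differ
    // from what this ticket eventually introduces.
    @Persisted
    private final PersistedTable<String, Long> seenCounts =
        PersistedTable.of("seen-counts", String.class, Long.class);

    public void invoke(Context context, Object input) {
        String key = (String) input;
        Long current = seenCounts.get(key);                         // read a key
        seenCounts.set(key, current == null ? 1L : current + 1L);   // add or update a key
        // seenCounts.remove(key);                                  // keys can also be removed, like MapState
    }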



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Question: Modifying Checkpoint Interval at runtime

2020-02-07 Thread Congxian Qiu
In current Flink, the checkpoint interval cannot be modified after the job has
started. If you want to change the code to implement this feature, maybe
you can look at the logic in CheckpointCoordinator, and the usage of the
functions checkMinPauseBetweenCheckpoints()[1] and
scheduleTriggerWithDelay()[2].

[1]
https://github.com/apache/flink/blob/19689cb8ddb7cdc57a99d02217e67880c5e938a5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1463
[2]
https://github.com/apache/flink/blob/19689cb8ddb7cdc57a99d02217e67880c5e938a5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1485
Best,
Congxian
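
As a very rough sketch of the kind of change involved (hypothetical code against
CheckpointCoordinator internals, only to illustrate cancelling and re-scheduling
the periodic trigger; field and method names are assumptions and may differ
between Flink versions):

    // Hypothetical addition to CheckpointCoordinator, not existing Flink API.
    public synchronized void updateCheckpointInterval(long newIntervalMillis) {
        this.baseInterval = newIntervalMillis;        // assumed field holding the interval
        if (currentPeriodicTrigger != null) {
            currentPeriodicTrigger.cancel(false);     // stop the currently scheduled trigger
        }
        // re-schedule the periodic trigger with the new interval
        currentPeriodicTrigger = scheduleTriggerWithDelay(newIntervalMillis);
    }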


On Fri, Feb 7, 2020 at 7:12 PM Morgan Geldenhuys  wrote:

> I am working with Flink at the moment and am interested in modifying
> the codebase so that the checkpoint interval can be changed at runtime
> (e.g. if it was set to kick off the distributed snapshot process every 10
> seconds, and then I wanted to change it to every 15 seconds instead
> without restarting). I was wondering, in your professional opinion, is
> this possible? Of course I am not expecting this to be a simple
> implementation, however, what considerations would I need to take into
> account and which parts of the codebase would this modification involve.
>


[jira] [Created] (FLINK-15953) Job Status is hard to read for some Statuses

2020-02-07 Thread Gary Yao (Jira)
Gary Yao created FLINK-15953:


 Summary: Job Status is hard to read for some Statuses
 Key: FLINK-15953
 URL: https://issues.apache.org/jira/browse/FLINK-15953
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.9.2, 1.10.0
Reporter: Gary Yao
Assignee: Yadong Xie
 Fix For: 1.10.1, 1.11.0
 Attachments: 769B08ED-D644-4DEB-BA4C-14B18E562A52.png

The job status {{RESTARTING}} is rendered in a white font on a white background,
which makes it hard to read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15952) Don't use raw byte[] in MultiplexedState

2020-02-07 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-15952:


 Summary: Don't use raw byte[] in MultiplexedState 
 Key: FLINK-15952
 URL: https://issues.apache.org/jira/browse/FLINK-15952
 Project: Flink
  Issue Type: Bug
Reporter: Igal Shilman


Currently multiplexed state is backed by Flink's MapState and uses raw byte[]
for keys;
this breaks with the FsStateBackend.
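
The underlying problem is that Java arrays use identity-based equals()/hashCode(),
so a heap-based map (as used by the FsStateBackend) cannot look byte[] keys up by
content. A minimal plain-Java illustration (not Flink code):

    byte[] k1 = new byte[] {1, 2, 3};
    byte[] k2 = new byte[] {1, 2, 3};

    System.out.println(k1.equals(k2));                   // false: identity, not content
    System.out.println(k1.hashCode() == k2.hashCode());  // almost certainly false

    // A heap-backed map keyed by byte[] therefore cannot find an entry again when
    // the "same" key is re-created (e.g. deserialized); a key type with
    // content-based equals/hashCode is needed instead.
    java.util.Map<byte[], String> map = new java.util.HashMap<>();
    map.put(k1, "value");
    System.out.println(map.get(k2));                     // null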




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15951) Travis build fails on npm install

2020-02-07 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-15951:
-

 Summary: Travis build fails on npm install
 Key: FLINK-15951
 URL: https://issues.apache.org/jira/browse/FLINK-15951
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.11.0
Reporter: Roman Khachatryan


[https://api.travis-ci.com/v3/job/284796450/log.txt]

 
{code:java}
 10:58:09.614 [INFO] --- frontend-maven-plugin:1.6:npm (npm install) @ 
flink-runtime-web_2.11 ---
 10:58:09.616 [INFO] Running 'npm ci --cache-max=0 --no-save' in 
/home/travis/build/flink-ci/flink/flink-runtime-web/web-dashboard
 10:58:12.732 [ERROR] npm ERR! code E409
 10:58:12.733 [ERROR] npm ERR! 409 Conflict: 
@angular/platform-browser-dynamic@7.2.10
 10:58:13.545 [ERROR]
 10:58:13.545 [ERROR] npm ERR! A complete log of this run can be found in:
 10:58:13.545 [ERROR] npm ERR! 
/home/travis/.npm/_logs/2020-02-07T10_58_12_741Z-debug.log

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] Allow for getting late elements via side outputs from IntervalJoinOperator

2020-02-07 Thread Bakht Kahloon
I was hoping to make an enhancement to the interval join operator with
regard to having access to late elements from either the left or right
stream. I was thinking this could possibly be done using the side outputs
feature.
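
Something modeled on how WindowedStream already exposes late data could look like
this (purely hypothetical API for the proposed enhancement; the sideOutput*LateData
methods and the Order/Payment/EnrichFunction types below are made up for
illustration and do not exist today):

    final OutputTag<Order> lateLeft = new OutputTag<Order>("late-left") {};
    final OutputTag<Payment> lateRight = new OutputTag<Payment>("late-right") {};

    SingleOutputStreamOperator<Enriched> joined = orders
        .keyBy(Order::getId)
        .intervalJoin(payments.keyBy(Payment::getOrderId))
        .between(Time.minutes(-5), Time.minutes(5))
        .sideOutputLeftLateData(lateLeft)      // hypothetical
        .sideOutputRightLateData(lateRight)    // hypothetical
        .process(new EnrichFunction());

    DataStream<Order> droppedLeftElements = joined.getSideOutput(lateLeft);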


Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-07 Thread Jark Wu
Cool! Looking forward to the design doc.

Best,
Jark

On Fri, 7 Feb 2020 at 17:26, Timo Walther  wrote:

> Hi Zhenghua,
>
> Jark is right. The reason why we haven't updated those interfaces yet is
> that we would actually like to introduce new interfaces. We
> should target new interfaces in this release. Even a short-term fix as
> you proposed with `getRecordDataType` does not actually help as Jingsong
> pointed out, because we cannot represent tuples in DataType and are also
> not planning to support them natively but only as a structured type in
> the future.
>
> In my envisioned design, the new sink interface should just always get a
> `ChangeRow` which is never serialized and just a data structure for
> communicating between the wrapping sink function and the returned sink
> function by the table sink.
>
> Let me sketch a rough design document that I will share with you
> shortly. Then we could also discuss alternatives.
>
> Thanks,
> Timo
>
>
> On 04.02.20 04:18, Zhenghua Gao wrote:
> > Hi Jark, thanks for your comments.
>  IIUC, the framework will only recognize getRecordDataType and
>  ignore getConsumedDataType for UpsertStreamTableSink, is that right?
> > You are right.
> >
>  getRecordDataType is little confused as UpsertStreamTableSink already
> has
>  three getXXXType().
> > the getRecordType and getOutputType are deprecated and mainly for backward
> > compatibility.
> >
> > *Best Regards,*
> > *Zhenghua Gao*
> >
> >
> > On Mon, Feb 3, 2020 at 10:11 PM Jark Wu  wrote:
> >
> >> Thanks Zhenghua for starting this discussion.
> >>
> >> Currently, all the UpsertStreamTableSinks can't upgrade to the new type
> >> system which affects usability a lot.
> >> I hope we can fix that in 1.11.
> >>
> >> I'm fine with *getRecordDataType* for a temporary solution.
> >> IIUC, the framework will only recognize getRecordDataType and
> >> ignore getConsumedDataType for UpsertStreamTableSink, is that right?
> >>
> >> I guess Timo is planning to design a new source/sink interface which
> will
> >> also fix this problem, but I'm not sure the timelines. cc @Timo
> >> It would be better if we can have a new and complete interface, because
> >> getRecordDataType is a little confusing as UpsertStreamTableSink already
> has
> >> three getXXXType().
> >>
> >> Best,
> >> Jark
> >>
> >>
> >> On Mon, 3 Feb 2020 at 17:48, Zhenghua Gao  wrote:
> >>
> >>> Hi Jingsong,  For now, only UpsertStreamTableSink and
> >>> RetractStreamTableSink consumes JTuple2
> >>> So the 'getConsumedDataType' interface is not necessary in validate &
> >>> codegen phase.
> >>> See
> >>>
> >>>
> >>
> https://github.com/apache/flink/pull/10880/files#diff-137d090bd719f3a99ae8ba6603ed81ebR52
> >>>   and
> >>>
> >>>
> >>
> https://github.com/apache/flink/pull/10880/files#diff-8405c17e5155aa63d16e497c4c96a842R304
> >>>
> >>> What about stay the same to use RAW type?
> >>>
> >>> *Best Regards,*
> >>> *Zhenghua Gao*
> >>>
> >>>
> >>> On Mon, Feb 3, 2020 at 4:59 PM Jingsong Li 
> >> wrote:
> >>>
>  Hi Zhenghua,
> 
>  The *getRecordDataType* looks good to me.
> 
>  But the main problem is how to represent the tuple type in DataType. I
>  understand that it is necessary to use StructuredType, but at present,
>  planner does not support StructuredType, so the other way is to
> support
>  StructuredType.
> 
>  Best,
>  Jingsong Lee
> 
>  On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:
> 
> > Would overriding `getConsumedDataType` do the job?
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao 
> >> wrote:
> >
> >> Hi all,
> >>
> >> FLINK-12254[1] [2] updated TableSink and related interfaces to new
> >>> type
> >> system which
> >> allows connectors use the new type system based on DataTypes.
> >>
> >> But FLINK-12911 port UpsertStreamTableSink and
> >> RetractStreamTableSink
> >>> to
> >> flink-api-java-bridge and returns TypeInformation of the requested
>  record
> >> type which
> >> can't support types with precision and scale, e.g. TIMESTAMP(p),
> >> DECIMAL(p,s).
> >>
> >> /**
> >>   * Returns the requested record type.
> >>   */
> >> TypeInformation<T> getRecordType();
> >>
> >>
> >> A proposal is deprecating the *getRecordType* API and adding a
> >> *getRecordDataType* API instead to return the data type of the
> >>> requested
> >> record. I have filed the issue FLINK-15469 and
> >> an initial PR to verify it.
> >>
> >> What do you think about this API changes? Any feedback are
> >>> appreciated.
> >> [1] https://issues.apache.org/jira/browse/FLINK-12254
> >> [2] https://github.com/apache/flink/pull/8596
> >> [3] https://issues.apache.org/jira/browse/FLINK-15469
> >>
> >> *Best Regards,*
> >> *Zhenghua Gao*
> >>
> >
> 
>  --
>  Best, Jingsong Lee
> 
> >>>
> >>
> >
>
>


Question: Modifying Checkpoint Interval at runtime

2020-02-07 Thread Morgan Geldenhuys
I am working with Flink at the moment and am interested in modifying
the codebase so that the checkpoint interval can be changed at runtime
(e.g. if it was set to kick off the distributed snapshot process every 10
seconds, and then I wanted to change it to every 15 seconds instead
without restarting). I was wondering, in your professional opinion, is 
this possible? Of course I am not expecting this to be a simple 
implementation, however, what considerations would I need to take into 
account and which parts of the codebase would this modification involve.


Re: performances of S3 writing with many buckets in parallel

2020-02-07 Thread Kostas Kloudas
Hi Enrico,

Nice to hear from you and thanks for checking it out!

This can be helpful for people using the BucketingSink, but I would
recommend you switch to the StreamingFileSink, which is the "new
version" of the BucketingSink. In fact, the BucketingSink is going to
be removed in one of the following releases, as it has been deprecated for
quite a while.

If you try the StreamingFileSink, let us know if the problem persists.

Cheers,
Kostas
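
For reference, a minimal StreamingFileSink setup looks roughly like this (a sketch
with a made-up bucket path and simple String records; rolling-policy and bucket
tuning are left out):

    // Classes used: org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink,
    // org.apache.flink.api.common.serialization.SimpleStringEncoder, org.apache.flink.core.fs.Path
    StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("s3a://my-bucket/output"), new SimpleStringEncoder<String>("UTF-8"))
        .build();

    // Note: the StreamingFileSink requires checkpointing to be enabled,
    // since part files are only finalized on checkpoints.
    stream.addSink(sink);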


On Fri, Feb 7, 2020 at 11:20 AM Enrico Agnoli  wrote:
>
> I finally found the time to dig a little more on this and found the real 
> problem.
> The culprit of the slow-down is this piece of code:
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L543-L551
>
> This alone takes around 4-5 secs, with a total of 6 secs to open the file. 
> Logs from an instrumented call:
> 2020-02-07 08:51:05,825 INFO  BucketingSink  - openNewPartFile FS verification
> 2020-02-07 08:51:09,906 INFO  BucketingSink  - openNewPartFile FS 
> verification - done
> 2020-02-07 08:51:11,181 INFO  BucketingSink  - openNewPartFile FS - completed 
> partPath = s3a://
>
> This together with the default setup of the bucketing sink with 60 secs 
> inactivity rollover
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L195
> means that with more than 10 parallel buckets on a slot, by the time we finish
> creating the last bucket the first one has become stale, so it needs to be rotated,
> generating a blocking situation.
>
> We solved this by deleting the FS check mentioned above (now the file opening
> takes ~1.2 sec) and setting the default inactive threshold to 5 minutes. With these
> changes we can easily handle more than 200 buckets per slot (once the job
> picks up speed it will ingest on all the slots, postponing the inactive
> timeout).
>
> -Enrico


Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-07 Thread Kurt Young
Thanks all for your feedback. Since no objection has been raised, I've
created
https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.

Since this issue would require a lot of test adjustments before it can really
happen,
it won't be done in a short time. Feel free to give feedback anytime here
or in jira
if you have other opinions.

Best,
Kurt
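
For illustration, the replacement path for tableEnv.registerTableSource(...) is to declare the table through DDL (or a ConnectTableDescriptor) and let the framework own the schema. This assumes a TableEnvironment named tableEnv; the connector and format options below are placeholders only.

// Instead of tableEnv.registerTableSource("orders", ordersSource), declare the
// table via DDL; the schema, computed columns and watermarks then live in the DDL.
tableEnv.sqlUpdate(
    "CREATE TABLE orders (" +
    "  order_id BIGINT," +
    "  amount DECIMAL(10, 2)," +
    "  order_time TIMESTAMP(3)" +
    ") WITH (" +
    "  'connector.type' = 'filesystem'," +   // placeholder connector options
    "  'connector.path' = '/tmp/orders'," +
    "  'format.type' = 'csv'" +
    ")");
Table orders = tableEnv.from("orders");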


On Wed, Feb 5, 2020 at 8:26 PM Kurt Young  wrote:

> Hi Zhenghua,
>
> After removing TableSource::getTableSchema, during optimization, I could
> imagine
> the schema information might come from relational nodes such as TableScan.
>
> Best,
> Kurt
>
>
> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young  wrote:
>
>> Hi Jingsong,
>>
>> Yes, the current TableFactory is not ideal for users either. I think we
>> should also spend some time in 1.11 to improve the usability of
>> TableEnvironment when users are trying to read or write something.
>> Automatic schema inference would be one of them. Other than this, we also
>> support converting a DataStream to a Table, which can serve some flexible
>> requirements to read or write data.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao  wrote:
>>
>>> +1 to remove these methods.
>>>
>>> One concern about invocations of TableSource::getTableSchema:
>>> By removing such methods, we can stop calling TableSource::getTableSchema
>>> in some places (such as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>>> ConnectorCatalogTable, TableSourceQueryOperation).
>>>
>>> But in other places we need the field types and names of the table source (such as
>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule, PushProjectIntoTableSourceScanRule,
>>> CommonLookupJoin). So how should we deal with this?
>>>
>>> *Best Regards,*
>>> *Zhenghua Gao*
>>>
>>>
>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young  wrote:
>>>
>>> > Hi all,
>>> >
>>> > I'd like to bring up a discussion about removing registration of
>>> > TableSource and
>>> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
>>> > affected
>>> > method would be:
>>> >
>>> > TableEnvironment::registerTableSource
>>> > TableEnvironment::fromTableSource
>>> > TableEnvironment::registerTableSink
>>> > ConnectTableDescriptor::registerTableSource
>>> > ConnectTableDescriptor::registerTableSink
>>> > ConnectTableDescriptor::registerTableSourceAndSink
>>> >
>>> > (Most of them are already deprecated, except for
>>> > TableEnvironment::fromTableSource,
>>> > which was intended to deprecate but missed by accident).
>>> >
>>> > FLIP-64 [1] already explained why we want to deprecate TableSource &
>>> > TableSink from the user's interface. In short, these interfaces should only
>>> > read & write the physical representation of the table, and they no longer
>>> > fit well now that we have introduced logical table fields such as computed
>>> > columns and watermarks.
>>> >
>>> > Another reason is that exposing registerTableSource in TableEnvironment
>>> > turns the whole SQL contract upside down. A TableSource should be used as a
>>> > reader of a table; it should rely on metadata held by the framework, which
>>> > eventually comes from DDL or a ConnectDescriptor. But if we register a
>>> > TableSource with the TableEnvironment, we have no choice but to rely on
>>> > TableSource::getTableSchema. This makes the design obscure: sometimes the
>>> > TableSource should trust the information coming from the framework, but
>>> > sometimes it should also generate its own schema information.
>>> >
>>> > Furthermore, if the authority over schema information is not clear, it
>>> > will make things much more complicated when we want to improve Table API
>>> > usability, such as by introducing automatic schema inference in the near
>>> > future.
>>> >
>>> > Since this is an API breaking change, I've also included the user mailing
>>> > list to gather more feedback.
>>> >
>>> > Best,
>>> > Kurt
>>> >
>>> > [1]
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>> >
>>>
>>


[jira] [Created] (FLINK-15950) Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-07 Thread Kurt Young (Jira)
Kurt Young created FLINK-15950:
--

 Summary: Remove registration of TableSource/TableSink in Table Env 
and ConnectTableDescriptor
 Key: FLINK-15950
 URL: https://issues.apache.org/jira/browse/FLINK-15950
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Kurt Young
 Fix For: 1.11.0


This ticket tracks the removal of direct TableSource/TableSink 
registration in TableEnvironment. Since lots of tests rely on this, I 
will create some sub-tasks to divide this big one. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: performances of S3 writing with many buckets in parallel

2020-02-07 Thread Enrico Agnoli
I finally found the time to dig a little more on this and found the real 
problem.
The culprit of the slow-down is this piece of code:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L543-L551

This alone takes around 4-5 secs, with a total of 6 secs to open the file. Logs 
from an instrumented call:
2020-02-07 08:51:05,825 INFO  BucketingSink  - openNewPartFile FS verification
2020-02-07 08:51:09,906 INFO  BucketingSink  - openNewPartFile FS verification 
- done
2020-02-07 08:51:11,181 INFO  BucketingSink  - openNewPartFile FS - completed 
partPath = s3a://

This together with the default setup of the bucketing sink with 60 secs 
inactivity rollover 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L195
 
means that with more than 10 parallel buckets on a slot, by the time we finish 
creating the last bucket the first one has become stale, so it needs to be rotated, 
generating a blocking situation.

We solved this by deleting the FS check mentioned above (now the file opening 
takes ~1.2 sec) and setting the inactivity threshold to 5 minutes. With these 
changes we can easily handle more than 200 buckets per slot (once the job picks 
up speed it ingests on all the slots, which postpones the inactivity timeout).

-Enrico
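
For reference, the threshold change described above maps onto the BucketingSink setters roughly like this (bucketer, writer and batch size left at their defaults; the path and the DataStream `stream` are placeholders, and the values are just the ones mentioned in the thread):

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

BucketingSink<String> sink = new BucketingSink<>("s3a://my-bucket/output");
// Only roll a part file after 5 minutes of inactivity instead of the 60s default,
// so slowly filling buckets are not constantly rotated.
sink.setInactiveBucketThreshold(5 * 60 * 1000L);
// How often the sink checks for inactive buckets (the default is 60s as well).
sink.setInactiveBucketCheckInterval(60 * 1000L);
stream.addSink(sink);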


Re: [DISCUSS] Does removing deprecated interfaces needs another FLIP

2020-02-07 Thread Kurt Young
Thanks for the clarification, that makes sense to me.

Best,
Kurt


On Fri, Feb 7, 2020 at 4:56 PM Timo Walther  wrote:

> Hi Kurt,
>
> I agree with Aljoscha. We don't need to introduce a big process or do
> voting but we should ensure that all stakeholders are notified and have
> a chance to raise doubts.
>
> Regards,
> Timo
>
>
> On 07.02.20 09:51, Aljoscha Krettek wrote:
> > I would say a ML discussion or even a Jira issue is enough because
> >
> > a) the methods are already deprecated
> > b) the methods are @PublicEvolving, which I don't consider a super
> > strong guarantee to users (we still shouldn't remove them lightly, but
> > we can if we have to...)
> >
> > Best,
> > Aljoscha
> >
> > On 07.02.20 04:40, Kurt Young wrote:
> >> Hi dev,
> >>
> >> Currently I want to remove some already deprecated methods from
> >> TableEnvironment which are annotated with @PublicEvolving. And I also
> >> created
> >> a discussion thread [1] to both dev and user mailing lists to gather
> >> feedback on that. But I didn't find any matching rule in Flink bylaw
> >> [2] to
> >> follow. This is definitely an API breaking change, but we already
> >> voted for that back in the FLIP which deprecated these methods.
> >>
> >> I'm not sure about how to proceed for now. Looks like I have 2 choices:
> >>
> >> 1. If no one raises any objections in the discussion thread within 72 hours, I
> >> will create a jira to start working on it.
> >> 2. Since this is an API breaking change, I need to open another FLIP to
> >> tell
> >> that I want to remove these deprecated methods. This seems a little
> >> redundant with the first FLIP which deprecated the methods.
> >>
> >> What do you think?
> >>
> >> Best,
> >> Kurt
> >>
> >> [1]
> >>
> https://lists.apache.org/thread.html/r98af66feb531ce9e6b94914e44391609cad857e16ea84db5357c1980%40%3Cdev.flink.apache.org%3E
> >>
> >> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> >>
>
>


[jira] [Created] (FLINK-15949) Harden jackson dependency constraints

2020-02-07 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15949:


 Summary: Harden jackson dependency constraints
 Key: FLINK-15949
 URL: https://issues.apache.org/jira/browse/FLINK-15949
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.10.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0


Replace the individual jackson dependency management entries with the jackson 
bom, and introduce an enforcer check to ban older jackson dependencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15948) Resource will be wasted when the task manager memory is not a multiple of Yarn minimum allocation

2020-02-07 Thread Yang Wang (Jira)
Yang Wang created FLINK-15948:
-

 Summary: Resource will be wasted when the task manager memory is 
not a multiple of Yarn minimum allocation
 Key: FLINK-15948
 URL: https://issues.apache.org/jira/browse/FLINK-15948
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.10.0
Reporter: Yang Wang


If the {{taskmanager.memory.process.size}} is set to 2000m and the Yarn minimum 
allocation is 128m, we will get a container with 2048m. Currently, 
{{TaskExecutorProcessSpec}} is built with 2000m, so 48m will be wasted and 
cannot be used by Flink.

I think Flink has accounted for all the JVM heap, off-heap and overhead resources, 
so we should not leave this free memory unused. I suggest updating the 
{{TaskExecutorProcessSpec}} according to the Yarn-allocated container.
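
For illustration, the rounding described above, assuming Yarn normalizes the request up to a multiple of its minimum allocation (config names are the usual ones, values from the example):

long requestedMb = 2000;          // taskmanager.memory.process.size
long yarnMinAllocationMb = 128;   // yarn.scheduler.minimum-allocation-mb
// Yarn rounds the request up to the next multiple of the minimum allocation.
long containerMb = ((requestedMb + yarnMinAllocationMb - 1) / yarnMinAllocationMb) * yarnMinAllocationMb; // 2048
long wastedMb = containerMb - requestedMb; // 48m that the current TaskExecutorProcessSpec never sees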



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-07 Thread Gyula Fóra
Maybe we could improve the Pipeline interface in the long run, but as a
temporary solution the JobClient could expose a getPipeline() method.

That way the implementation of the JobListener could check whether it is a
StreamGraph or a Plan.

How bad does that sound?

Gyula
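
A rough sketch of what that could look like from the listener side, assuming the (not yet existing) JobClient#getPipeline() accessor from the proposal; the lineage-building steps are only comments here. It would be registered via env.registerJobListener(new AtlasJobListener()).

import javax.annotation.Nullable;
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.Plan;
import org.apache.flink.api.dag.Pipeline;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;
import org.apache.flink.streaming.api.graph.StreamGraph;

public class AtlasJobListener implements JobListener {

    @Override
    public void onJobSubmitted(@Nullable JobClient jobClient, @Nullable Throwable throwable) {
        if (jobClient == null) {
            return; // submission failed, nothing to report to Atlas
        }
        // getPipeline() is the hypothetical accessor proposed above; it does not exist yet.
        Pipeline pipeline = jobClient.getPipeline();
        if (pipeline instanceof StreamGraph) {
            // streaming job: walk the StreamGraph to find sources/sinks and build the Atlas entity
        } else if (pipeline instanceof Plan) {
            // batch job: walk the DataSet Plan instead
        }
    }

    @Override
    public void onJobExecuted(@Nullable JobExecutionResult jobExecutionResult, @Nullable Throwable throwable) {
        // nothing to do for lineage purposes
    }
}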

On Fri, Feb 7, 2020 at 10:19 AM Gyula Fóra  wrote:

> Hi Aljoscha!
>
> That's a valid concern, but we should try to figure something out; many
> users need this before they can use Flink.
>
> I think the closest thing we have right now is the StreamGraph. In
> contrast with the JobGraph  the StreamGraph is pretty nice from a metadata
> perspective :D
> The big downside of exposing the StreamGraph is that we don't have it in
> batch. On the other hand we could expose the JobGraph but then the
> integration component would still have to do the heavy lifting for batch
> and stream specific operators and UDFs.
>
> Instead of exposing either StreamGraph/JobGraph, we could come up with a
> metadata like representation for the users but that would be like
> implementing Atlas integration itself without Atlas dependencies :D
>
> As a comparison point, this is how it works in Storm:
> Every operator (spout/bolt) stores a config map (string->string) with all
> the metadata such as operator class, and the operator specific configs. The
> Atlas hook works on this map.
> This is very fragile and depends on a lot of internals. Kind of like
> exposing the JobGraph but much worse. I think we can do better.
>
> Gyula
>
> On Fri, Feb 7, 2020 at 9:55 AM Aljoscha Krettek 
> wrote:
>
>> If we need it, we can probably beef up the JobListener to allow
>> accessing some information about the whole graph or sources and sinks.
>> My only concern right now is that we don't have a stable interface for
>> our job graphs/pipelines right now.
>>
>> Best,
>> Aljoscha
>>
>> On 06.02.20 23:00, Gyula Fóra wrote:
>> > Hi Jeff & Till!
>> >
>> > Thanks for the feedback, this is exactly the discussion I was looking
>> for.
>> > The JobListener looks very promising if we can expose the JobGraph
>> somehow
>> > (correct me if I am wrong but it is not accessible at the moment).
>> >
>> > I did not know about this feature that's why I added my JobSubmission
>> hook
>> > which was pretty similar but only exposing the JobGraph. In general I
>> like
>> > the listener better and I would not like to add anything extra if we can
>> > avoid it.
>> >
>> > Actually the bigger part of the integration work that will need more
>> > changes in Flink will be regarding the accessibility of sources/sinks
>> from
>> > the JobGraph and their specific properties. For instance at the moment
>> the
>> > Kafka sources and sinks do not expose anything publicly such as topics,
>> > kafka configs, etc. Same goes for other data connectors that we need to
>> > integrate in the long run. I guess there will be a separate thread on
>> this
>> > once we iron out the initial integration points :)
>> >
>> > I will try to play around with the JobListener interface tomorrow and
>> see
>> > if I can extend it to meet our needs.
>> >
>> > Cheers,
>> > Gyula
>> >
>> > On Thu, Feb 6, 2020 at 4:08 PM Jeff Zhang  wrote:
>> >
>> >> Hi Gyula,
>> >>
>> >> Flink 1.10 introduced JobListener which is invoked after job
>> submission and
>> >> finished. Maybe we can add an API on JobClient to get the info you need for
>> >> Atlas integration.
>> >>
>> >>
>> >>
>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L46
>> >>
>> >>
>> >> Gyula Fóra  于2020年2月5日周三 下午7:48写道:
>> >>
>> >>> Hi all!
>> >>>
>> >>> We have started some preliminary work on the Flink - Atlas
>> integration at
>> >>> Cloudera. It seems that the integration will require some new hook
>> >>> interfaces at the jobgraph generation and submission phases, so I
>> >> figured I
>> >>> will open a discussion thread with my initial ideas to get some early
>> >>> feedback.
>> >>>
>> >>> *Minimal background*
>> >>> Very simply put Apache Atlas is a data governance framework that
>> stores
>> >>> metadata for our data and processing logic to track ownership, lineage
>> >> etc.
>> >>> It is already integrated with systems like HDFS, Kafka, Hive and many
>> >>> others.
>> >>>
>> >>> Adding Flink integration would mean that we can track the input output
>> >> data
>> >>> of our Flink jobs, their owners and how different Flink jobs are
>> >> connected
>> >>> to each other through the data they produce (lineage). This seems to
>> be a
>> >>> very big deal for a lot of companies :)
>> >>>
>> >>> *Flink - Atlas integration in a nutshell*
>> >>> In order to integrate with Atlas we basically need 2 things.
>> >>>   - Flink entity definitions
>> >>>   - Flink Atlas hook
>> >>>
>> >>> The entity definition is the easy part. It is a json that contains the
>> >>> objects (entities) that we want to store for any given Flink job. As a
>> >>> starter we could have a single FlinkApplication entity that has a set
>> of
>> >>> 

Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-07 Thread Timo Walther

Hi Zhenghua,

Jark is right. The reason why we haven't updated those interfaces yet is 
that we would actually like to introduce new interfaces. We should target 
new interfaces in this release. Even a short-term fix such as the 
`getRecordDataType` you proposed does not really help, as Jingsong pointed 
out, because we cannot represent tuples in DataType; we are also not 
planning to support them natively, but only as a structured type in 
the future.


In my envisioned design, the new sink interface should just always get a 
`ChangeRow`, which is never serialized and is just a data structure for 
communicating between the wrapping sink function and the sink function 
returned by the table sink.


Let me sketch a rough design document that I will share with you 
shortly. Then we could also discuss alternatives.


Thanks,
Timo


On 04.02.20 04:18, Zhenghua Gao wrote:

Hi Jark, thanks for your comments.

IIUC, the framework will only recognize getRecordDataType and
ignore getConsumedDataType for UpsertStreamTableSink, is that right?

You are right.


getRecordDataType is a little confusing as UpsertStreamTableSink already has
three getXXXType() methods.

The getRecordType and getOutputType are deprecated and kept mainly for backward
compatibility.

*Best Regards,*
*Zhenghua Gao*


On Mon, Feb 3, 2020 at 10:11 PM Jark Wu  wrote:


Thanks Zhenghua for starting this discussion.

Currently, none of the UpsertStreamTableSinks can be upgraded to the new type
system, which affects usability a lot.
I hope we can fix that in 1.11.

I'm fine with *getRecordDataType* as a temporary solution.
IIUC, the framework will only recognize getRecordDataType and
ignore getConsumedDataType for UpsertStreamTableSink, is that right?

I guess Timo is planning to design a new source/sink interface which will
also fix this problem, but I'm not sure about the timeline. cc @Timo
It would be better if we could have a new and complete interface, because
getRecordDataType is a little confusing as UpsertStreamTableSink already has
three getXXXType() methods.

Best,
Jark


On Mon, 3 Feb 2020 at 17:48, Zhenghua Gao  wrote:


Hi Jingsong, for now only UpsertStreamTableSink and
RetractStreamTableSink consume JTuple2,
so the 'getConsumedDataType' interface is not necessary in the validate &
codegen phases.
See



https://github.com/apache/flink/pull/10880/files#diff-137d090bd719f3a99ae8ba6603ed81ebR52

  and



https://github.com/apache/flink/pull/10880/files#diff-8405c17e5155aa63d16e497c4c96a842R304


What about staying the same and using the RAW type?

*Best Regards,*
*Zhenghua Gao*


On Mon, Feb 3, 2020 at 4:59 PM Jingsong Li 

wrote:



Hi Zhenghua,

The *getRecordDataType* looks good to me.

But the main problem is how to represent the tuple type in DataType. I
understand that it would be necessary to use StructuredType, but at present
the planner does not support StructuredType, so the other option is to first
add StructuredType support.

Best,
Jingsong Lee

On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:


Would overriding `getConsumedDataType` do the job?

Best,
Kurt


On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao 

wrote:



Hi all,

FLINK-12254 [1][2] updated TableSink and related interfaces to the new type
system, which allows connectors to use the new type system based on DataTypes.

But FLINK-12911 ported UpsertStreamTableSink and RetractStreamTableSink to
flink-api-java-bridge, and they return TypeInformation of the requested record
type, which can't support types with precision and scale, e.g. TIMESTAMP(p),
DECIMAL(p,s).

/**
  * Returns the requested record type.
  */
TypeInformation<T> getRecordType();


A proposal is to deprecate the *getRecordType* API and add a
*getRecordDataType* API instead, returning the data type of the requested
record. I have filed the issue FLINK-15469 [3] and an initial PR to verify it.
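
To make the shape of the proposal concrete, a minimal sketch (the method name is from the proposal; the default bridging body is only an assumption, not necessarily what the PR does):

/**
 * A sketch of the proposed method: the requested record type as a DataType,
 * which can carry precision and scale (e.g. TIMESTAMP(3), DECIMAL(10, 2)).
 */
default DataType getRecordDataType() {
    // assumed default bridging from the legacy TypeInformation-based method
    return TypeConversions.fromLegacyInfoToDataType(getRecordType());
}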

What do you think about these API changes? Any feedback is appreciated.

[1] https://issues.apache.org/jira/browse/FLINK-12254
[2] https://github.com/apache/flink/pull/8596
[3] https://issues.apache.org/jira/browse/FLINK-15469

*Best Regards,*
*Zhenghua Gao*





--
Best, Jingsong Lee











Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-07 Thread Gyula Fóra
Hi Aljoscha!

That's a valid concern, but we should try to figure something out; many
users need this before they can use Flink.

I think the closest thing we have right now is the StreamGraph. In contrast
with the JobGraph  the StreamGraph is pretty nice from a metadata
perspective :D
The big downside of exposing the StreamGraph is that we don't have it in
batch. On the other hand we could expose the JobGraph but then the
integration component would still have to do the heavy lifting for batch
and stream specific operators and UDFs.

Instead of exposing either StreamGraph/JobGraph, we could come up with a
metadata like representation for the users but that would be like
implementing Atlas integration itself without Atlas dependencies :D

As a comparison point, this is how it works in Storm:
Every operator (spout/bolt) stores a config map (string->string) with all
the metadata such as operator class, and the operator specific configs. The
Atlas hook works on this map.
This is very fragile and depends on a lot of internals. Kind of like
exposing the JobGraph but much worse. I think we can do better.

Gyula

On Fri, Feb 7, 2020 at 9:55 AM Aljoscha Krettek  wrote:

> If we need it, we can probably beef up the JobListener to allow
> accessing some information about the whole graph or sources and sinks.
> My only concern right now is that we don't have a stable interface for
> our job graphs/pipelines right now.
>
> Best,
> Aljoscha
>
> On 06.02.20 23:00, Gyula Fóra wrote:
> > Hi Jeff & Till!
> >
> > Thanks for the feedback, this is exactly the discussion I was looking
> for.
> > The JobListener looks very promising if we can expose the JobGraph
> somehow
> > (correct me if I am wrong but it is not accessible at the moment).
> >
> > I did not know about this feature that's why I added my JobSubmission
> hook
> > which was pretty similar but only exposing the JobGraph. In general I
> like
> > the listener better and I would not like to add anything extra if we can
> > avoid it.
> >
> > Actually the bigger part of the integration work that will need more
> > changes in Flink will be regarding the accessibility of sources/sinks
> from
> > the JobGraph and their specific properties. For instance at the moment
> the
> > Kafka sources and sinks do not expose anything publicly such as topics,
> > kafka configs, etc. Same goes for other data connectors that we need to
> > integrate in the long run. I guess there will be a separate thread on
> this
> > once we iron out the initial integration points :)
> >
> > I will try to play around with the JobListener interface tomorrow and see
> > if I can extend it to meet our needs.
> >
> > Cheers,
> > Gyula
> >
> > On Thu, Feb 6, 2020 at 4:08 PM Jeff Zhang  wrote:
> >
> >> Hi Gyula,
> >>
> >> Flink 1.10 introduced JobListener which is invoked after job submission
> and
> >> finished. Maybe we can add an API on JobClient to get the info you need for
> >> Atlas integration.
> >>
> >>
> >>
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L46
> >>
> >>
> >> Gyula Fóra  于2020年2月5日周三 下午7:48写道:
> >>
> >>> Hi all!
> >>>
> >>> We have started some preliminary work on the Flink - Atlas integration
> at
> >>> Cloudera. It seems that the integration will require some new hook
> >>> interfaces at the jobgraph generation and submission phases, so I
> >> figured I
> >>> will open a discussion thread with my initial ideas to get some early
> >>> feedback.
> >>>
> >>> *Minimal background*
> >>> Very simply put Apache Atlas is a data governance framework that stores
> >>> metadata for our data and processing logic to track ownership, lineage
> >> etc.
> >>> It is already integrated with systems like HDFS, Kafka, Hive and many
> >>> others.
> >>>
> >>> Adding Flink integration would mean that we can track the input output
> >> data
> >>> of our Flink jobs, their owners and how different Flink jobs are
> >> connected
> >>> to each other through the data they produce (lineage). This seems to
> be a
> >>> very big deal for a lot of companies :)
> >>>
> >>> *Flink - Atlas integration in a nutshell*
> >>> In order to integrate with Atlas we basically need 2 things.
> >>>   - Flink entity definitions
> >>>   - Flink Atlas hook
> >>>
> >>> The entity definition is the easy part. It is a json that contains the
> >>> objects (entities) that we want to store for any given Flink job. As a
> >>> starter we could have a single FlinkApplication entity that has a set
> of
> >>> inputs and outputs. These inputs/outputs are other Atlas entities that
> >> are
> >>> already defined, such as a Kafka topic or HBase table.
> >>>
> >>> The Flink atlas hook will be the logic that creates the entity instance
> >> and
> >>> uploads it to Atlas when we start a new Flink job. This is the part
> where
> >>> we implement the core logic.
> >>>
> >>> *Job submission hook*
> >>> In order to implement the Atlas hook we need a place where we can
> inspect
> >>> 

Re: [VOTE] Improve TableFactory to add Context

2020-02-07 Thread Leonard Xu
+1 (non-binding), nice design,
after reading the full discussion on the mailing list.

Best,
Leonard Xu

> 在 2020年2月6日,23:12,Timo Walther  写道:
> 
> +1
> 
> On 06.02.20 05:54, Bowen Li wrote:
>> +1, LGTM
>> On Tue, Feb 4, 2020 at 11:28 PM Jark Wu  wrote:
>>> +1 form my side.
>>> Thanks for driving this.
>>> 
>>> Btw, could you also attach a JIRA issue with the changes described in it,
>>> so that users can find the issue through the mailing list in the future.
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Wed, 5 Feb 2020 at 13:38, Kurt Young  wrote:
>>> 
 +1 from my side.
 
 Best,
 Kurt
 
 
 On Wed, Feb 5, 2020 at 10:59 AM Jingsong Li 
 wrote:
 
> Hi all,
> 
> Interface updated.
> Please re-vote.
> 
> Best,
> Jingsong Lee
> 
> On Tue, Feb 4, 2020 at 1:28 PM Jingsong Li 
 wrote:
> 
>> Hi all,
>> 
>> I would like to start the vote for the improve of
>> TableFactory, which is discussed and
>> reached a consensus in the discussion thread[2].
>> 
>> The vote will be open for at least 72 hours. I'll try to close it
>> unless there is an objection or not enough votes.
>> 
>> [1]
>> 
> 
 
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html
>> 
>> Best,
>> Jingsong Lee
>> 
> 
> 
> --
> Best, Jingsong Lee
> 
 
>>> 
> 



Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2020-02-07 Thread Yadong Xie
Hi Till

FLIP-75 has been open since September, and the design doc has been iterated
over 3 versions and more than 20 patches.
I had a try, but it is hard to split the design doc into sub-FLIPs and keep
all the discussion history at the same time.

Maybe it is better to start another discussion about voting on individual
sub-FLIPs, and make the next FLIP follow the new practice if possible.

Till Rohrmann  于2020年2月3日周一 下午6:28写道:

> I think there is no such description because we never did it before. I just
> figured that FLIP-75 could actually be a good candidate to start this
> practice. We would need a community discussion first, though.
>
> Cheers,
> Till
>
> On Mon, Feb 3, 2020 at 10:28 AM Yadong Xie  wrote:
>
> > Hi Till
> > I didn't find how to create a sub-FLIP at cwiki.apache.org;
> > do you mean to create 9 more FLIPs instead of FLIP-75?
> >
> > Till Rohrmann  于2020年1月30日周四 下午11:12写道:
> >
> > > Would it be easier if FLIP-75 would be the umbrella FLIP and we would
> > vote
> > > on the individual improvements as sub FLIPs? Decreasing the scope
> should
> > > make things easier.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Jan 30, 2020 at 2:35 PM Robert Metzger 
> > > wrote:
> > >
> > > > Thanks a lot for this work! I believe the web UI is very important,
> in
> > > > particular to new users. I'm very happy to see that you are putting
> > > effort
> > > > into improving the visibility into Flink through the proposed
> changes.
> > > >
> > > > I can not judge if all the changes make total sense, but the
> discussion
> > > has
> > > > been open since September, and a good number of people have commented
> > in
> > > > the document.
> > > > I wonder if we can move this FLIP to the VOTing stage?
> > > >
> > > > On Wed, Jan 22, 2020 at 6:27 PM Till Rohrmann 
> > > > wrote:
> > > >
> > > > > Thanks for the update Yadong. Big +1 for the proposed improvements
> > for
> > > > > Flink's web UI. I think they will be super helpful for our users.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jan 7, 2020 at 10:00 AM Yadong Xie 
> > > wrote:
> > > > >
> > > > > > Hi everyone
> > > > > >
> > > > > > We have spent some time updating the documentation since the last
> > > > > > discussion.
> > > > > >
> > > > > > In short, the latest FLIP-75 contains the following
> > > proposal(including
> > > > > both
> > > > > > frontend and RestAPI)
> > > > > >
> > > > > >1. Job Level
> > > > > >   - better job backpressure detection
> > > > > >   - load more feature in job exception
> > > > > >   - show attempt history in the subtask
> > > > > >   - show attempt timeline
> > > > > >   - add pending slots
> > > > > >2. Task Manager Level
> > > > > >   - add more metrics
> > > > > >   - better log display
> > > > > >3. Job Manager Level
> > > > > >   - add metrics tab
> > > > > >   - better log display
> > > > > >
> > > > > > To help everyone better understand the proposal, we spent efforts
> > on
> > > > > making
> > > > > > an online POC .
> > > > > >
> > > > > > Now you can compare the difference between the new and old
> > > Web/RestAPI
> > > > > (the
> > > > > > link is inside the doc)!
> > > > > >
> > > > > > Here is the latest FLIP-75 doc:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> > > > > >
> > > > > > Looking forward to your feedback
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yadong
> > > > > >
> > > > > > lining jing  于2019年10月24日周四 下午2:11写道:
> > > > > >
> > > > > > > Hi all, I have updated the backend design in FLIP-75
> > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > > > >
> > > > > > > .
> > > > > > >
> > > > > > > Here are some brief introductions:
> > > > > > >
> > > > > > >- Add metric for manage memory FLINK-14406
> > > > > > >.
> > > > > > >- Expose TaskExecutor resource configurations to REST API
> > > > > FLINK-14422
> > > > > > >.
> > > > > > >- Add TaskManagerResourceInfo in TaskManagerDetailsInfo to
> > show
> > > > > > >TaskManager Resource FLINK-14435
> > > > > > >.
> > > > > > >
> > > > > > > I will continue to update the rest part of the backend design
> in
> > > the
> > > > > doc,
> > > > > > > let's keep discuss here, any feedback is appreciated.
> > > > > > >
> > > > > > > Yadong Xie  于2019年9月27日周五 上午10:13写道:
> > > > > > >
> > > > > > > > Hi all
> > > > > > > >
> > > > > > > > Flink Web UI is the main platform for most users to monitor
> > their
> > > > > jobs
> > > > > > > and
> > > > > > > > clusters. We have reconstructed Flink web in 1.9.0 version,
> but
> 

Re: [DISCUSS] Does removing deprecated interfaces needs another FLIP

2020-02-07 Thread Timo Walther

Hi Kurt,

I agree with Aljoscha. We don't need to introduce a big process or do 
voting but we should ensure that all stakeholders are notified and have 
a chance to raise doubts.


Regards,
Timo


On 07.02.20 09:51, Aljoscha Krettek wrote:

I would say a ML discussion or even a Jira issue is enough because

a) the methods are already deprecated
b) the methods are @PublicEvolving, which I don't consider a super 
strong guarantee to users (we still shouldn't remove them lightly, but 
we can if we have to...)


Best,
Aljoscha

On 07.02.20 04:40, Kurt Young wrote:

Hi dev,

Currently I want to remove some already deprecated methods from
TableEnvironment which are annotated with @PublicEvolving. And I also 
created

a discussion thread [1] to both dev and user mailing lists to gather
feedback on that. But I didn't find any matching rule in Flink bylaw 
[2] to

follow. This is definitely an API breaking change, but we already
voted for that back in the FLIP which deprecated these methods.

I'm not sure about how to proceed for now. Looks like I have 2 choices:

1. If no one raises any objections in the discussion thread within 72 hours, I
will create a jira to start working on it.
2. Since this is an API breaking change, I need to open another FLIP to 
tell

that I want to remove these deprecated methods. This seems a little
redundant with the first FLIP which deprecated the methods.

What do you think?

Best,
Kurt

[1]
https://lists.apache.org/thread.html/r98af66feb531ce9e6b94914e44391609cad857e16ea84db5357c1980%40%3Cdev.flink.apache.org%3E 


[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws





Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-07 Thread Aljoscha Krettek
If we need it, we can probably beef up the JobListener to allow 
accessing some information about the whole graph or sources and sinks. 
My only concern right now is that we don't have a stable interface for 
our job graphs/pipelines right now.


Best,
Aljoscha

On 06.02.20 23:00, Gyula Fóra wrote:

Hi Jeff & Till!

Thanks for the feedback, this is exactly the discussion I was looking for.
The JobListener looks very promising if we can expose the JobGraph somehow
(correct me if I am wrong but it is not accessible at the moment).

I did not know about this feature that's why I added my JobSubmission hook
which was pretty similar but only exposing the JobGraph. In general I like
the listener better and I would not like to add anything extra if we can
avoid it.

Actually the bigger part of the integration work that will need more
changes in Flink will be regarding the accessibility of sources/sinks from
the JobGraph and their specific properties. For instance at the moment the
Kafka sources and sinks do not expose anything publicly such as topics,
kafka configs, etc. Same goes for other data connectors that we need to
integrate in the long run. I guess there will be a separate thread on this
once we iron out the initial integration points :)

I will try to play around with the JobListener interface tomorrow and see
if I can extend it to meet our needs.

Cheers,
Gyula

On Thu, Feb 6, 2020 at 4:08 PM Jeff Zhang  wrote:


Hi Gyula,

Flink 1.10 introduced JobListener which is invoked after job submission and
finished. Maybe we can add an API on JobClient to get the info you need for
Atlas integration.


https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L46


Gyula Fóra  于2020年2月5日周三 下午7:48写道:


Hi all!

We have started some preliminary work on the Flink - Atlas integration at
Cloudera. It seems that the integration will require some new hook
interfaces at the jobgraph generation and submission phases, so I

figured I

will open a discussion thread with my initial ideas to get some early
feedback.

*Minimal background*
Very simply put Apache Atlas is a data governance framework that stores
metadata for our data and processing logic to track ownership, lineage

etc.

It is already integrated with systems like HDFS, Kafka, Hive and many
others.

Adding Flink integration would mean that we can track the input output

data

of our Flink jobs, their owners and how different Flink jobs are

connected

to each other through the data they produce (lineage). This seems to be a
very big deal for a lot of companies :)

*Flink - Atlas integration in a nutshell*
In order to integrate with Atlas we basically need 2 things.
  - Flink entity definitions
  - Flink Atlas hook

The entity definition is the easy part. It is a json that contains the
objects (entities) that we want to store for any given Flink job. As a
starter we could have a single FlinkApplication entity that has a set of
inputs and outputs. These inputs/outputs are other Atlas entities that are
already defined, such as a Kafka topic or HBase table.

The Flink atlas hook will be the logic that creates the entity instance

and

uploads it to Atlas when we start a new Flink job. This is the part where
we implement the core logic.

*Job submission hook*
In order to implement the Atlas hook we need a place where we can inspect
the pipeline, create and send the metadata when the job starts. When we
create the FlinkApplication entity we need to be able to easily determine
the sources and sinks (and their properties) of the pipeline.

Unfortunately there is no JobSubmission hook in Flink that could execute
this logic and even if there was one there is a mismatch of abstraction
levels needed to implement the integration.
We could imagine a JobSubmission hook executed in the JobManager runner

as

this:

void onSuccessfulSubmission(JobGraph jobGraph, Configuration
configuration);

This is nice but the JobGraph makes it super difficult to extract sources
and UDFs to create the metadata entity. The atlas entity however could be
easily created from the StreamGraph object (used to represent the logical
flow) before the JobGraph is generated. To go around this limitation we
could add a JobGraphGeneratorHook interface:

void preProcess(StreamGraph streamGraph); void postProcess(JobGraph
jobGraph);

We could then generate the atlas entity in the preprocess step and add a
jobmission hook in the postprocess step that will simply send the already
baked in entity.

*This kinda works but...*
The approach outlined above seems to work and we have built a POC using

it.

Unfortunately it is far from nice as it exposes non-public APIs such as

the

StreamGraph. Also it feels a bit weird to have 2 hooks instead of one.

It would be much nicer if we could somehow go back from JobGraph to
StreamGraph or at least have an easy way to access source/sink UDFS.

What do you think?

Cheers,
Gyula




--
Best Regards

Jeff Zhang





Re: [DISCUSS] Does removing deprecated interfaces needs another FLIP

2020-02-07 Thread Aljoscha Krettek

I would say a ML discussion or even a Jira issue is enough because

a) the methods are already deprecated
b) the methods are @PublicEvolving, which I don't consider a super 
strong guarantee to users (we still shouldn't remove them lightly, but 
we can if we have to...)


Best,
Aljoscha

On 07.02.20 04:40, Kurt Young wrote:

Hi dev,

Currently I want to remove some already deprecated methods from
TableEnvironment which are annotated with @PublicEvolving. And I also created
a discussion thread [1] to both dev and user mailing lists to gather
feedback on that. But I didn't find any matching rule in Flink bylaw [2] to
follow. This is definitely an API breaking change, but we already
voted for that back in the FLIP which deprecated these methods.

I'm not sure about how to proceed for now. Looks like I have 2 choices:

1. If no one raises any objections in the discussion thread within 72 hours, I
will create a jira to start working on it.
2. Since this is an API breaking change, I need to open another FLIP to tell
that I want to remove these deprecated methods. This seems a little
redundant with the first FLIP which deprecated the methods.

What do you think?

Best,
Kurt

[1]
https://lists.apache.org/thread.html/r98af66feb531ce9e6b94914e44391609cad857e16ea84db5357c1980%40%3Cdev.flink.apache.org%3E
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws



Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-07 Thread Becket Qin
Thanks everyone for voting. The voting result is as follows:

+1 (Binding): 5 (Yu, Jark, Zhijiang, Piotr, Becket)

+1 (Non-binding): 4 (Jingsong, Danny, Wei, Guowei)

-1: 0

FLIP-27 has passed.

Thanks,

Jiangjie (Becket) Qin

On Tue, Feb 4, 2020 at 3:42 PM Piotr Nowojski  wrote:

> +1 (binding)
>
> Piotrek
>
> > On 4 Feb 2020, at 05:39, Zhijiang 
> wrote:
> >
> > +1 (binding), we are waiting too long for it. :)
> >
> > Best,
> > Zhijiang
> >
> >
> > --
> > From:Guowei Ma 
> > Send Time:2020 Feb. 4 (Tue.) 12:34
> > To:dev 
> > Subject:Re: [VOTE] FLIP-27 - Refactor Source Interface
> >
> > +1 (non-binding), thanks for driving.
> >
> > Best,
> > Guowei
> >
> >
> > Jingsong Li  于2020年2月4日周二 上午11:20写道:
> >
> >> +1 (non-binding), thanks for driving.
> >> FLIP-27 is the basis of a lot of follow-up work.
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Tue, Feb 4, 2020 at 10:26 AM Jark Wu  wrote:
> >>
> >>> Thanks for driving this Becket!
> >>>
> >>> +1 from my side.
> >>>
> >>> Cheers,
> >>> Jark
> >>>
> >>> On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:
> >>>
>  +1, thanks for the efforts Becket!
> 
>  Best Regards,
>  Yu
> 
> 
>  On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:
> 
> > Bump up the thread.
> >
> > On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
>  wrote:
> >
> >> Hi Folks,
> >>
> >> I'd like to resume the voting thread for FlIP-27.
> >>
> >> Please note that the FLIP wiki has been updated to reflect the
> >> latest
> >> discussions in the discussion thread.
> >>
> >> To avoid confusion, I'll only count the votes casted after this
> >>> point.
> >>
> >> FLIP wiki:
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> >> %3A+Refactor+Source+Interface
> >>
> >> Discussion thread:
> >>
> >>
> >
> 
> >>>
> >>
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> >>
> >> The vote will last for at least 72 hours, following the consensus
>  voting
> >> process.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Thu, Dec 5, 2019 at 10:31 AM jincheng sun <
> >>> sunjincheng...@gmail.com
> >
> >> wrote:
> >>
> >>> +1 (binding), and looking forward to seeing the new interface in
> >> the
> >>> master.
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> Becket Qin  于2019年12月5日周四 上午8:05写道:
> >>>
>  Hi all,
> 
>  I would like to start the vote for FLIP-27 which proposes to
> > introduce a
>  new Source connector interface to address a few problems in the
> > existing
>  source connector. The main goals of the the FLIP are following:
> 
>  1. Unify the Source interface in Flink for batch and stream.
>  2. Significantly reduce the work for developers to develop new
>  source
>  connectors.
>  3. Provide a common abstraction for all the sources, as well as
> >> a
> >>> mechanism
>  to allow source subtasks to coordinate among themselves.
> 
>  The vote will last for at least 72 hours, following the
> >> consensus
> > voting
>  process.
> 
>  FLIP wiki:
> 
> 
> >>>
> >
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> 
>  Discussion thread:
> 
> 
> >>>
> >
> 
> >>>
> >>
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> 
>  Thanks,
> 
>  Jiangjie (Becket) Qin
> 
> >>>
> >>
> >
> 
> >>>
> >>
> >>
> >> --
> >> Best, Jingsong Lee
> >>
> >
>
>


[jira] [Created] (FLINK-15947) Finish moving scala expression DSL to flink-table-api-scala

2020-02-07 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-15947:


 Summary: Finish moving scala expression DSL to 
flink-table-api-scala
 Key: FLINK-15947
 URL: https://issues.apache.org/jira/browse/FLINK-15947
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0


FLINK-13045 performed the first step of moving the implicit conversions to a 
long-term package object. It also added release notes so that users have time 
to adapt to the changes.

Now that two releases have passed since then, we can finish moving all the 
intended conversions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)