[jira] [Created] (FLINK-19390) mysql jdbc join error java.lang.ClassCastException: java.math.BigInteger cannot be cast to java.lang.Long

2020-09-24 Thread badqiu (Jira)
badqiu created FLINK-19390:
--

 Summary: mysql jdbc join error java.lang.ClassCastException: 
java.math.BigInteger cannot be cast to java.lang.Long
 Key: FLINK-19390
 URL: https://issues.apache.org/jira/browse/FLINK-19390
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.11.1
Reporter: badqiu


Joining on t1.order_id = t2.order_id, where order_id has the data type *bigint*, fails with the error below.

 

 
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: 
java.math.BigInteger cannot be cast to java.lang.Long
at 
org.apache.flink.table.runtime.operators.join.lookup.LookupJoinWithCalcRunner$CalcCollector.collect(LookupJoinWithCalcRunner.java:84)
at 
org.apache.flink.table.runtime.operators.join.lookup.LookupJoinWithCalcRunner$CalcCollector.collect(LookupJoinWithCalcRunner.java:71)
at 
org.apache.flink.table.functions.TableFunction.collect(TableFunction.java:203)
at 
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction.eval(JdbcRowDataLookupFunction.java:162)
at LookupFunction$43.flatMap(Unknown Source)
at 
org.apache.flink.table.runtime.operators.join.lookup.LookupJoinRunner.processElement(LookupJoinRunner.java:82)
at 
org.apache.flink.table.runtime.operators.join.lookup.LookupJoinRunner.processElement(LookupJoinRunner.java:36)
at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:717)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
at StreamExecCalc$40.processElement(Unknown Source)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:717)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at 
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordsWithTimestamps(AbstractFetcher.java:352)
at 
org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:185)
at 
org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:141)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:755)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)
Caused by: java.lang.ClassCastException: java.math.BigInteger cannot be cast to 
java.lang.Long
at 
org.apache.flink.table.data.GenericRowData.getLong(GenericRowData.java:154)
at TableCalcMapFunction$47.flatMap(Unknown Source)
at 
org.apache.flink.table.runtime.operators.join.lookup.LookupJoinWithCalcRunner$CalcCollector.collect(LookupJoinWithCalcRunner.java:82)
... 27 more
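
For reference, a minimal sketch that reproduces the failing cast in the
bottom frame (GenericRowData.getLong), assuming the lookup source stored a
java.math.BigInteger in the row (as the MySQL JDBC driver returns, e.g., for
BIGINT UNSIGNED columns) instead of a java.lang.Long; the value is
illustrative:

{code:java}
import java.math.BigInteger;

import org.apache.flink.table.data.GenericRowData;

public class BigIntegerCastRepro {
    public static void main(String[] args) {
        GenericRowData row = new GenericRowData(1);
        // Value as a JDBC driver may deliver it (e.g. BIGINT UNSIGNED -> BigInteger).
        row.setField(0, new BigInteger("42"));
        // GenericRowData.getLong() unboxes the field via a cast to java.lang.Long,
        // so this throws: ClassCastException: BigInteger cannot be cast to Long.
        long orderId = row.getLong(0);
        System.out.println(orderId);
    }
}
{code}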





[jira] [Created] (FLINK-19391) Deadlock during partition update

2020-09-24 Thread Arvid Heise (Jira)
Arvid Heise created FLINK-19391:
---

 Summary: Deadlock during partition update
 Key: FLINK-19391
 URL: https://issues.apache.org/jira/browse/FLINK-19391
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.12.0
Reporter: Arvid Heise
Assignee: Arvid Heise


Master cron job is currently failing because of a deadlock introduced in 
FLINK-19026.
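
The thread dump below shows a classic lock-ordering inversion: the "Temp
writer" thread holds the PrioritizedDeque monitor (<0x86501930>) and waits
for the channel lock (<0x86501948>), while the dispatcher thread acquires
the same two locks in the opposite order. For illustration, a minimal,
Flink-independent sketch of the pattern (class and method names are
placeholders, not the actual code):

{code:java}
public class LockOrderInversion {
    private final Object channelLock = new Object(); // plays the role of <0x86501948>
    private final Object dequeLock = new Object();   // plays the role of <0x86501930>

    void consumerPath() {            // like "Temp writer": deque lock, then channel lock
        synchronized (dequeLock) {
            synchronized (channelLock) { /* getNextBuffer(...) */ }
        }
    }

    void updatePath() {              // like the dispatcher: channel lock, then deque lock
        synchronized (channelLock) {
            synchronized (dequeLock) { /* queueChannel(...) */ }
        }
    }
}
{code}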
{noformat}
2020-09-23T21:50:39.2444176Z Found one Java-level deadlock:
2020-09-23T21:50:39.2444633Z =
2020-09-23T21:50:39.2445001Z "Temp writer":
2020-09-23T21:50:39.2445484Z   waiting to lock monitor 0x7f4e14004ca8 
(object 0x86501948, a java.lang.Object),
2020-09-23T21:50:39.2446418Z   which is held by 
"flink-akka.actor.default-dispatcher-2"
2020-09-23T21:50:39.2447193Z "flink-akka.actor.default-dispatcher-2":
2020-09-23T21:50:39.2447903Z   waiting to lock monitor 0x7f4e14004bf8 
(object 0x86501930, a 
org.apache.flink.runtime.io.network.partition.PrioritizedDeque),
2020-09-23T21:50:39.2448703Z   which is held by "Temp writer"
2020-09-23T21:50:39.2448965Z 
2020-09-23T21:50:39.2449384Z Java stack information for the threads listed 
above:
2020-09-23T21:50:39.2449900Z ===
2020-09-23T21:50:39.2450325Z "Temp writer":
2020-09-23T21:50:39.2451050Zat 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.checkAndWaitForSubpartitionView(LocalInputChannel.java:244)
2020-09-23T21:50:39.2452264Z- waiting to lock <0x86501948> (a 
java.lang.Object)
2020-09-23T21:50:39.2453183Zat 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:205)
2020-09-23T21:50:39.2454173Zat 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:642)
2020-09-23T21:50:39.2455422Z- locked <0x86501930> (a 
org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
2020-09-23T21:50:39.2456310Zat 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:619)
2020-09-23T21:50:39.2457311Zat 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNext(SingleInputGate.java:602)
2020-09-23T21:50:39.2458205Zat 
org.apache.flink.runtime.taskmanager.InputGateWithMetrics.getNext(InputGateWithMetrics.java:105)
2020-09-23T21:50:39.2459258Zat 
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:100)
2020-09-23T21:50:39.2460465Zat 
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
2020-09-23T21:50:39.2461344Zat 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
2020-09-23T21:50:39.2462164Zat 
org.apache.flink.runtime.operators.TempBarrier$TempWritingThread.run(TempBarrier.java:178)
2020-09-23T21:50:39.2463418Z "flink-akka.actor.default-dispatcher-2":
2020-09-23T21:50:39.2464109Zat 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.queueChannel(SingleInputGate.java:825)
2020-09-23T21:50:39.2465336Z- waiting to lock <0x86501930> (a 
org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
2020-09-23T21:50:39.2466228Zat 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.notifyChannelNonEmpty(SingleInputGate.java:791)
2020-09-23T21:50:39.2467222Zat 
org.apache.flink.runtime.io.network.partition.consumer.InputChannel.notifyChannelNonEmpty(InputChannel.java:154)
2020-09-23T21:50:39.2468212Zat 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.notifyDataAvailable(LocalInputChannel.java:236)
2020-09-23T21:50:39.2469577Zat 
org.apache.flink.runtime.io.network.partition.ResultPartitionManager.createSubpartitionView(ResultPartitionManager.java:76)
2020-09-23T21:50:39.2470607Zat 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:133)
2020-09-23T21:50:39.2471765Z- locked <0x86501948> (a 
java.lang.Object)
2020-09-23T21:50:39.2472685Zat 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.updateInputChannel(SingleInputGate.java:489)
2020-09-23T21:50:39.2473727Z- locked <0x86532500> (a 
java.lang.Object)
2020-09-23T21:50:39.2474449Zat 
org.apache.flink.runtime.io.network.NettyShuffleEnvironment.updatePartitionInfo(NettyShuffleEnvironment.java:279)
2020-09-23T21:50:39.2475394Zat 
org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$updatePartitions$12(TaskExecutor.java:758)
2020-09-23T21:50:39.2476235Zat 
org.apache.flink.runtime.taskexecutor.TaskExecutor$$Lambda$406/1860601696.run(Unknown
 Source)
2020-09-23T21:50:39.2476973Za

[jira] [Created] (FLINK-19392) Detect the execution mode based on the sources in the job.

2020-09-24 Thread Kostas Kloudas (Jira)
Kostas Kloudas created FLINK-19392:
--

 Summary: Detect the execution mode based on the sources in the job.
 Key: FLINK-19392
 URL: https://issues.apache.org/jira/browse/FLINK-19392
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Affects Versions: 1.12.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas


As part of FLIP-134, we introduce the option {{execution.runtime-mode}} which 
can take the values: BATCH, STREAMING, and AUTOMATIC. 

In the case of AUTOMATIC, the system will scan the sources and detect whether the job 
is to be executed using batch scheduling or streaming. If all the sources 
are bounded, the system will go with BATCH; if at least one is unbounded, 
the system will go with STREAMING.

This issue targets introducing the logic of detecting the runtime mode based on 
the sources without exposing it yet to the user. The latter will happen in a 
follow-up issue.
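
For illustration, a hedged sketch of the detection rule described above; the
enum and method names below are made up for this example and are not the
actual implementation:

{code:java}
import java.util.Arrays;
import java.util.List;

public class RuntimeModeDetection {
    enum RuntimeMode { BATCH, STREAMING }
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    static RuntimeMode detect(List<Boundedness> sources) {
        // All sources bounded -> batch scheduling; at least one unbounded -> streaming.
        boolean allBounded = sources.stream().allMatch(b -> b == Boundedness.BOUNDED);
        return allBounded ? RuntimeMode.BATCH : RuntimeMode.STREAMING;
    }

    public static void main(String[] args) {
        System.out.println(detect(Arrays.asList(
                Boundedness.BOUNDED, Boundedness.BOUNDED)));               // BATCH
        System.out.println(detect(Arrays.asList(
                Boundedness.BOUNDED, Boundedness.CONTINUOUS_UNBOUNDED)));  // STREAMING
    }
}
{code}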





Re: [DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-24 Thread Rui Li
+1

On Thu, Sep 24, 2020 at 2:59 PM Timo Walther  wrote:

> +1
>
> On 24.09.20 06:54, Jark Wu wrote:
> > +1 to move it there.
> >
> > On Thu, 24 Sep 2020 at 12:16, Jingsong Li 
> wrote:
> >
> >> Hi devs and users:
> >>
> >> After the 1.11 release, I heard some voices recently: why can't Hive's
> >> documents be found in the "Table & SQL Connectors"?
> >>
> >> Actually, Hive's documents are in the "Table API & SQL". Since the
> "Table &
> >> SQL Connectors" document was extracted separately, Hive is a little out
> of
> >> place.
> >> And Hive's code is also in "flink-connector-hive", which should be a
> >> connector.
> >> Hive also includes the concept of HiveCatalog. Is catalog a part of the
> >> connector? I think so.
> >>
> >> What do you think? If you don't object, I think we can move it.
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >
>
>

-- 
Best regards!
Rui Li


Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

2020-09-24 Thread Kurt Young
Thanks Jingsong for driving this, this is indeed a useful feature and lots
of users are asking for it.

For setting a fixed source parallelism, I'm wondering whether this is
necessary. For Kafka, I can imagine users would expect Flink to use the
number of partitions as the parallelism; if that is too large, one can use
the max parallelism to make it smaller. But for ES, which doesn't have the
ability to decide a reasonable parallelism on its own, it might make sense
to introduce a user-specified parallelism for such a table source.

So I think it would be better to reorganize the document a little bit, to
explain the connectors one by one. Briefly
introduce use cases and what kind of options are needed in your opinion.

Regarding the interface `DataStreamScanProvider`, a concrete example would
help the discussion. What kind
of scenarios do you want to support? And what kind of connectors need such
an interface?

Best,
Kurt


On Wed, Sep 23, 2020 at 9:30 PM admin <17626017...@163.com> wrote:

> +1,it’s a good news
>
> > 2020年9月23日 下午6:22,Jingsong Li  写道:
> >
> > Hi all,
> >
> > I'd like to start a discussion about improving the new TableSource and
> > TableSink interfaces.
> >
> > Most connectors have been migrated to FLIP-95, but there are still the
> > Filesystem and Hive that have not been migrated. They have some
> > requirements on table connector API. And users also have some additional
> > requirements:
> > - Some connectors have the ability to infer parallelism, the parallelism
> is
> > good for most cases.
> > - Users have customized parallelism configuration requirements for source
> > and sink.
> > - The connectors need to use topology to build their source/sink instead
> of
> > a single function. Like JIRA[1], Partition Commit feature and File
> > Compaction feature.
> >
> > Details are in [2].
> >
> > [1]https://issues.apache.org/jira/browse/FLINK-18674
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> >
> > Best,
> > Jingsong
>
>


[VOTE] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-24 Thread Timo Walther

Hi all,

after the discussion in [1], I would like to open a second voting thread 
for FLIP-136 [2], which covers different topics to improve the 
back-and-forth communication between the DataStream API and the Table API.


The vote will be open until September 29th (72h + weekend), unless there 
is an objection or not enough votes.


Regards,
Timo

[1] 
https://lists.apache.org/thread.html/r62b47ec6812ddbafed65ac79e31ca0305099893559f1e5a991dee550%40%3Cdev.flink.apache.org%3E
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API




Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

2020-09-24 Thread Flavio Pompermaier
Hi Kurt, in the past we had a very interesting use case in this regard: our
customer's (Oracle) DB was quite big, and running too many queries in parallel
was too heavy and caused the queries to fail.
So we had to limit the source parallelism to 2 threads. After fetching
the data, the other operators could use the max parallelism as usual.

Best,
Flavio

On Thu, Sep 24, 2020 at 9:59 AM Kurt Young  wrote:

> Thanks Jingsong for driving this, this is indeed a useful feature and lots
> of users are asking for it.
>
> For setting a fixed source parallelism, I'm wondering whether this is
> necessary. For kafka,
> I can imagine users would expect Flink will use the number of partitions as
> the parallelism. If it's too
> large, one can use the max parallelism to make it smaller.
> But for ES, which doesn't have ability to decide a reasonable parallelism
> on its own, it might make sense
> to introduce a user specified parallelism for such table source.
>
> So I think it would be better to reorganize the document a little bit, to
> explain the connectors one by one. Briefly
> introduce use cases and what kind of options are needed in your opinion.
>
> Regarding the interface `DataStreamScanProvider`, a concrete example would
> help the discussion. What kind
> of scenarios do you want to support? And what kind of connectors need such
> an interface?
>
> Best,
> Kurt
>
>
> On Wed, Sep 23, 2020 at 9:30 PM admin <17626017...@163.com> wrote:
>
> > +1,it’s a good news
> >
> > > 2020年9月23日 下午6:22,Jingsong Li  写道:
> > >
> > > Hi all,
> > >
> > > I'd like to start a discussion about improving the new TableSource and
> > > TableSink interfaces.
> > >
> > > Most connectors have been migrated to FLIP-95, but there are still the
> > > Filesystem and Hive that have not been migrated. They have some
> > > requirements on table connector API. And users also have some
> additional
> > > requirements:
> > > - Some connectors have the ability to infer parallelism, the
> parallelism
> > is
> > > good for most cases.
> > > - Users have customized parallelism configuration requirements for
> source
> > > and sink.
> > > - The connectors need to use topology to build their source/sink
> instead
> > of
> > > a single function. Like JIRA[1], Partition Commit feature and File
> > > Compaction feature.
> > >
> > > Details are in [2].
> > >
> > > [1]https://issues.apache.org/jira/browse/FLINK-18674
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> > >
> > > Best,
> > > Jingsong
> >
> >


Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

2020-09-24 Thread Kurt Young
Yeah, JDBC is definitely a popular use case we should consider.

Best,
Kurt


On Thu, Sep 24, 2020 at 4:11 PM Flavio Pompermaier 
wrote:

> Hi Kurt, in the past we had a very interesting use case in this regard: our
> customer (oracle) db was quite big and running too many queries in parallel
> was too heavy and it was causing the queries to fail.
> So we had to limit the source parallelism to 2 threads. After the fetching
> of the data the other operators could use the max parallelism as usual..
>
> Best,
> Flavio
>
> On Thu, Sep 24, 2020 at 9:59 AM Kurt Young  wrote:
>
> > Thanks Jingsong for driving this, this is indeed a useful feature and
> lots
> > of users are asking for it.
> >
> > For setting a fixed source parallelism, I'm wondering whether this is
> > necessary. For kafka,
> > I can imagine users would expect Flink will use the number of partitions
> as
> > the parallelism. If it's too
> > large, one can use the max parallelism to make it smaller.
> > But for ES, which doesn't have ability to decide a reasonable parallelism
> > on its own, it might make sense
> > to introduce a user specified parallelism for such table source.
> >
> > So I think it would be better to reorganize the document a little bit, to
> > explain the connectors one by one. Briefly
> > introduce use cases and what kind of options are needed in your opinion.
> >
> > Regarding the interface `DataStreamScanProvider`, a concrete example
> would
> > help the discussion. What kind
> > of scenarios do you want to support? And what kind of connectors need
> such
> > an interface?
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Sep 23, 2020 at 9:30 PM admin <17626017...@163.com> wrote:
> >
> > > +1,it’s a good news
> > >
> > > > 2020年9月23日 下午6:22,Jingsong Li  写道:
> > > >
> > > > Hi all,
> > > >
> > > > I'd like to start a discussion about improving the new TableSource
> and
> > > > TableSink interfaces.
> > > >
> > > > Most connectors have been migrated to FLIP-95, but there are still
> the
> > > > Filesystem and Hive that have not been migrated. They have some
> > > > requirements on table connector API. And users also have some
> > additional
> > > > requirements:
> > > > - Some connectors have the ability to infer parallelism, the
> > parallelism
> > > is
> > > > good for most cases.
> > > > - Users have customized parallelism configuration requirements for
> > source
> > > > and sink.
> > > > - The connectors need to use topology to build their source/sink
> > instead
> > > of
> > > > a single function. Like JIRA[1], Partition Commit feature and File
> > > > Compaction feature.
> > > >
> > > > Details are in [2].
> > > >
> > > > [1]https://issues.apache.org/jira/browse/FLINK-18674
> > > > [2]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> > > >
> > > > Best,
> > > > Jingsong
> > >
> > >
>


Re: [VOTE] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-24 Thread Kurt Young
+1 (binding)

Best,
Kurt


On Thu, Sep 24, 2020 at 4:01 PM Timo Walther  wrote:

> Hi all,
>
> after the discussion in [1], I would like to open a second voting thread
> for FLIP-136 [2] which covers different topic to improve the
> back-and-forth communication between DataStream API and Table API.
>
> The vote will be open until September 29th (72h + weekend), unless there
> is an objection or not enough votes.
>
> Regards,
> Timo
>
> [1]
>
> https://lists.apache.org/thread.html/r62b47ec6812ddbafed65ac79e31ca0305099893559f1e5a991dee550%40%3Cdev.flink.apache.org%3E
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API
>
>


Re: [VOTE] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-24 Thread Jingsong Li
+1 (binding)

Best,
Jingsong

On Thu, Sep 24, 2020 at 4:18 PM Kurt Young  wrote:

> +1 (binding)
>
> Best,
> Kurt
>
>
> On Thu, Sep 24, 2020 at 4:01 PM Timo Walther  wrote:
>
> > Hi all,
> >
> > after the discussion in [1], I would like to open a second voting thread
> > for FLIP-136 [2] which covers different topic to improve the
> > back-and-forth communication between DataStream API and Table API.
> >
> > The vote will be open until September 29th (72h + weekend), unless there
> > is an objection or not enough votes.
> >
> > Regards,
> > Timo
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/r62b47ec6812ddbafed65ac79e31ca0305099893559f1e5a991dee550%40%3Cdev.flink.apache.org%3E
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API
> >
> >
>


-- 
Best, Jingsong Lee


Re: [VOTE] Apache Flink Stateful Functions 2.2.0, release candidate #2

2020-09-24 Thread Tzu-Li (Gordon) Tai
FYI - the PR for the release announcement has just been drafted:
https://github.com/apache/flink-web/pull/379
Any comments there are also highly appreciated!

On Thu, Sep 24, 2020 at 5:47 AM Seth Wiesman  wrote:

> +1 (non binding)
>
> - Verified signatures and checksums
> - Checked licenses and notices
> - Clean build from source
> - Executed all end to end tests
> - Deployed to K8s with Go SDK (no sdk changes required btw :))
> - Simulated TM and remote function failure and verified recovery
> - Checked updated docs
>
> Seth
>
> On Wed, Sep 23, 2020 at 9:55 AM Igal Shilman  wrote:
>
> > +1 (non binding)
> >
> > - Verified the signatures and the checksums
> > - Built with JDK11 and JDK8
> > - Verified that the source distribution does not contain any binary data.
> > - Run e2e tests.
> > - Run few examples via docker-compose
> > - Deployed to Kubernetes with checkpointing to S3, with failure
> scenarios:
> > -- Killing a TM
> > -- Killing a remote function
> > -- Killing all the remote functions.
> >
> > Thanks,
> > Igal.
> >
> > On Wed, Sep 23, 2020 at 10:29 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #2 for the version
> 2.2.0
> > of
> > > Apache Flink Stateful Functions,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > ***Testing Guideline***
> > >
> > > You can find here [1] a page in the project wiki on instructions for
> > > testing.
> > > To cast a vote, it is not necessary to perform all listed checks,
> > > but please mention which checks you have performed when voting.
> > >
> > > ***Release Overview***
> > >
> > > As an overview, the release consists of the following:
> > > a) Stateful Functions canonical source distribution, to be deployed to
> > the
> > > release repository at dist.apache.org
> > > b) Stateful Functions Python SDK distributions to be deployed to PyPI
> > > c) Maven artifacts to be deployed to the Maven Central Repository
> > >
> > > ***Staging Areas to Review***
> > >
> > > The staging areas containing the above mentioned artifacts are as
> > follows,
> > > for your review:
> > > * All artifacts for a) and b) can be found in the corresponding dev
> > > repository at dist.apache.org [2]
> > > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> > >
> > > All artifacts are signed with the key
> > > 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> > >
> > > Other links for your review:
> > > * JIRA release notes [5]
> > > * source code tag "release-2.2.0-rc2" [6]
> > >
> > > ***Vote Duration***
> > >
> > > The only changes since the last RC were the following:
> > >
> > >- [FLINK-19327][k8s] Bump JobManager heap size to 1 GB: config
> change
> > in
> > >example helm charts (non-blocking)
> > >- [FLINK-19329] FunctionGroupOperator#dispose() might throw NPE
> during
> > >an unclean shutdown (non-blocking, as it doesn't affect recovery or
> > >execution)
> > >- [FLINK-19330][core] Move initialization logic to open() instead of
> > >initializeState (blocking, but was a very simple fix that was
> > thoroughly
> > >tested in previous RC).
> > >- [hotfix] Fixed out-of-date NOTICE entries caused by Flink version
> > >upgrade (blocking)
> > >
> > > Since the changes are fairly minimal and had been tested while fixing
> the
> > > previous RC,
> > > I'd like to propose a slightly shorter voting period of 48 hours for
> this
> > > RC.
> > > If there are objections for this, please let me know.
> > >
> > > The voting time will run until at least 25 Sep., 4pm UTC.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> > >
> > > Thanks,
> > > Gordon
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Stateful+Functions+Release
> > > [2]
> > https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.2.0-rc2/
> > > [3]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1399/
> > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [5]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348350
> > > [6]
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=4420c0382e6ee535daeda2b572ab8eea0af8a614
> > >
> >
>


[jira] [Created] (FLINK-19393) Translate the 'SQL Hints' page of 'Table API & SQL' into Chinese

2020-09-24 Thread Roc Marshal (Jira)
Roc Marshal created FLINK-19393:
---

 Summary: Translate the 'SQL Hints' page of 'Table API & SQL' into 
Chinese
 Key: FLINK-19393
 URL: https://issues.apache.org/jira/browse/FLINK-19393
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.11.2
Reporter: Roc Marshal


The file location is: flink/docs/dev/table/sql/hints.md
The link of this page is: 
https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/table/sql/hints.html





[jira] [Created] (FLINK-19394) Translate the 'Monitoring Checkpointing' page of 'Debugging & Monitoring' into Chinese

2020-09-24 Thread Roc Marshal (Jira)
Roc Marshal created FLINK-19394:
---

 Summary: Translate the 'Monitoring Checkpointing' page of 
'Debugging & Monitoring' into Chinese
 Key: FLINK-19394
 URL: https://issues.apache.org/jira/browse/FLINK-19394
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.11.2
Reporter: Roc Marshal


The file location: flink/docs/monitoring/checkpoint_monitoring.md
The link of the page: 
https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/monitoring/checkpoint_monitoring.html





Re: [VOTE] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-24 Thread Jark Wu
+1 (binding)

Best,
Jark

On Thu, 24 Sep 2020 at 16:22, Jingsong Li  wrote:

> +1 (binding)
>
> Best,
> Jingsong
>
> On Thu, Sep 24, 2020 at 4:18 PM Kurt Young  wrote:
>
> > +1 (binding)
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Sep 24, 2020 at 4:01 PM Timo Walther  wrote:
> >
> > > Hi all,
> > >
> > > after the discussion in [1], I would like to open a second voting
> thread
> > > for FLIP-136 [2] which covers different topic to improve the
> > > back-and-forth communication between DataStream API and Table API.
> > >
> > > The vote will be open until September 29th (72h + weekend), unless
> there
> > > is an objection or not enough votes.
> > >
> > > Regards,
> > > Timo
> > >
> > > [1]
> > >
> > >
> >
> https://lists.apache.org/thread.html/r62b47ec6812ddbafed65ac79e31ca0305099893559f1e5a991dee550%40%3Cdev.flink.apache.org%3E
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API
> > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


[jira] [Created] (FLINK-19395) Replace SqlConversionException with either TableException or ValidationException

2020-09-24 Thread Timo Walther (Jira)
Timo Walther created FLINK-19395:


 Summary: Replace SqlConversionException with either TableException 
or ValidationException
 Key: FLINK-19395
 URL: https://issues.apache.org/jira/browse/FLINK-19395
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Legacy Planner, Table SQL / Planner
Reporter: Timo Walther


We should avoid creating too many different exceptions without reason. 
`SqlConversionException` is not necessary and can be replaced by the two 
commonly used exceptions, TableException and ValidationException. Currently, 
we throw all three in the `SqlToOperation` converters.





[jira] [Created] (FLINK-19396) Fix properties type cast error

2020-09-24 Thread zhangmeng (Jira)
zhangmeng created FLINK-19396:
-

 Summary: Fix properties type cast error
 Key: FLINK-19396
 URL: https://issues.apache.org/jira/browse/FLINK-19396
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.11.2
Reporter: zhangmeng


Properties extends Hashtable<Object, Object>, so the keys and values of the 
Properties class can be arbitrary Objects. If the user's value is a Boolean, 
the cast "(String) k, (String) v" fails with an error 
such as: java.lang.ClassCastException: java.lang.Boolean cannot be cast to 
java.lang.String
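
A minimal sketch of the failing pattern and a tolerant alternative (the
property name below is illustrative):

{code:java}
import java.util.Properties;

public class PropertiesCastDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("enable.auto.commit", Boolean.FALSE);   // value is an Object, not a String

        // Fails with: java.lang.ClassCastException: Boolean cannot be cast to String
        // props.forEach((k, v) -> use((String) k, (String) v));

        // Tolerant alternative: convert instead of casting.
        props.forEach((k, v) -> use(String.valueOf(k), String.valueOf(v)));
    }

    private static void use(String key, String value) {
        System.out.println(key + "=" + value);
    }
}
{code}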





[VOTE] FLIP-143: Unified Sink API

2020-09-24 Thread Guowei Ma
Hi, all

After the discussion in [1], I would like to open a voting thread for
FLIP-143 [2], which proposes a unified sink api.

The vote will be open until September 29th (72h + weekend), unless there is
an objection or not enough votes.

[1]
https://lists.apache.org/thread.html/rf09dfeeaf35da5ee98afe559b5a6e955c9f03ade0262727f6b5c4c1e%40%3Cdev.flink.apache.org%3E
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API

Best,
Guowei


Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

2020-09-24 Thread Jingsong Li
Thanks Kurt and Flavio for your feedback.

To Kurt:

> Briefly introduce use cases and what kind of options are needed in your
opinion.

In the "Choose Scan Parallelism" chapter:
- I explained the user cases
- I adjusted the relationship to make user specified parallelism more
convenient

To Flavio:

Yes, you can configure `scan.infer-parallelism.max` or directly use
`scan.parallelism`.
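
For illustration, a hedged sketch of how these two options could interact
(the resolution rule below is just an example, not the exact algorithm in
the FLIP):

public class ScanParallelismExample {
    static int resolve(Integer scanParallelism,   // user-specified, may be null
                       int inferredParallelism,   // e.g. number of Kafka partitions
                       int inferParallelismMax) { // cap for the inferred value
        if (scanParallelism != null) {
            return scanParallelism;               // an explicit setting wins
        }
        return Math.min(inferredParallelism, inferParallelismMax);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, 128, 16)); // 128 partitions inferred, capped to 16
        System.out.println(resolve(2, 128, 16));    // user forces 2 (the JDBC/Oracle case)
    }
}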

To Kurt:

> Regarding the interface `DataStreamScanProvider`, a concrete example would
help the discussion.

From the user feedback in [1], two users gave similar feedback along the
following lines (CC: liu shouwei).
(It is from user-zh, translated to English:)

To briefly introduce the background: one of our group's tasks is to let
users write SQL on a web page. We are responsible for converting that SQL
into Flink jobs and running them on our platform. The conversion
process depends on our SQL SDK.
Let me give a few examples that we use often and that we feel are not easy
to implement with the new 1.11 API:
1. We have a specific Kafka data format where one Kafka record is
converted into n (n is a positive integer) rows. Our approach is to add
a process / flatMap phase on the DataStream to handle this situation,
which is transparent to users.
2. We have encapsulated some of our own sinks. We add a
process / filter before the sink to perform buffer pooling / micro-batching /
data filtering.
3. We adjust or set the source / sink parallelism to a user-specified
value. We also do this at the DataStream level.
4. Some special sources/sinks are combined with keyBy
operations (transparent to users). We also do this at the DataStream level.

For example, item 2 above could be implemented inside the SinkFunction,
but I personally think that is not ideal in design. When designing and
arranging functions / operators, I am used to following the principle of
"single responsibility of operators". This is why I put separate process
/ filter operators in front of the sink function instead of coupling these
functions into it. On the other hand, without DataStream, the
cost of migrating to the new API is relatively higher. Also, for some
special reasons, when operators are arranged we modify the task chaining
strategy. At this point the flexibility of DataStream is essential.

[1]https://issues.apache.org/jira/browse/FLINK-18674

Best,
Jingsong

On Thu, Sep 24, 2020 at 4:15 PM Kurt Young  wrote:

> Yeah, JDBC is definitely a popular use case we should consider.
>
> Best,
> Kurt
>
>
> On Thu, Sep 24, 2020 at 4:11 PM Flavio Pompermaier 
> wrote:
>
> > Hi Kurt, in the past we had a very interesting use case in this regard:
> our
> > customer (oracle) db was quite big and running too many queries in
> parallel
> > was too heavy and it was causing the queries to fail.
> > So we had to limit the source parallelism to 2 threads. After the
> fetching
> > of the data the other operators could use the max parallelism as usual..
> >
> > Best,
> > Flavio
> >
> > On Thu, Sep 24, 2020 at 9:59 AM Kurt Young  wrote:
> >
> > > Thanks Jingsong for driving this, this is indeed a useful feature and
> > lots
> > > of users are asking for it.
> > >
> > > For setting a fixed source parallelism, I'm wondering whether this is
> > > necessary. For kafka,
> > > I can imagine users would expect Flink will use the number of
> partitions
> > as
> > > the parallelism. If it's too
> > > large, one can use the max parallelism to make it smaller.
> > > But for ES, which doesn't have ability to decide a reasonable
> parallelism
> > > on its own, it might make sense
> > > to introduce a user specified parallelism for such table source.
> > >
> > > So I think it would be better to reorganize the document a little bit,
> to
> > > explain the connectors one by one. Briefly
> > > introduce use cases and what kind of options are needed in your
> opinion.
> > >
> > > Regarding the interface `DataStreamScanProvider`, a concrete example
> > would
> > > help the discussion. What kind
> > > of scenarios do you want to support? And what kind of connectors need
> > such
> > > an interface?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Sep 23, 2020 at 9:30 PM admin <17626017...@163.com> wrote:
> > >
> > > > +1,it’s a good news
> > > >
> > > > > 2020年9月23日 下午6:22,Jingsong Li  写道:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion about improving the new TableSource
> > and
> > > > > TableSink interfaces.
> > > > >
> > > > > Most connectors have been migrated to FLIP-95, but there are still
> > the
> > > > > Filesystem and Hive that have not been migrated. They have some
> > > > > requirements on table connector API. And users also have some
> > > additional
> > > > > requirements:
> > > > > - Some connectors have the ability to infer parallelism, the
> > > parallelism
> > > > is
> > > > > good for most cases.
> > > > > - Users have customized parallelism configuration requirements for
> > > sour

Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

2020-09-24 Thread Benchao Li
Hi Jingsong,

Thanks for preparing this FLIP.
WRT ParallelismProvider, it looks good to me.

Kurt Young  于2020年9月24日周四 下午4:14写道:

> Yeah, JDBC is definitely a popular use case we should consider.
>
> Best,
> Kurt
>
>
> On Thu, Sep 24, 2020 at 4:11 PM Flavio Pompermaier 
> wrote:
>
> > Hi Kurt, in the past we had a very interesting use case in this regard:
> our
> > customer (oracle) db was quite big and running too many queries in
> parallel
> > was too heavy and it was causing the queries to fail.
> > So we had to limit the source parallelism to 2 threads. After the
> fetching
> > of the data the other operators could use the max parallelism as usual..
> >
> > Best,
> > Flavio
> >
> > On Thu, Sep 24, 2020 at 9:59 AM Kurt Young  wrote:
> >
> > > Thanks Jingsong for driving this, this is indeed a useful feature and
> > lots
> > > of users are asking for it.
> > >
> > > For setting a fixed source parallelism, I'm wondering whether this is
> > > necessary. For kafka,
> > > I can imagine users would expect Flink will use the number of
> partitions
> > as
> > > the parallelism. If it's too
> > > large, one can use the max parallelism to make it smaller.
> > > But for ES, which doesn't have ability to decide a reasonable
> parallelism
> > > on its own, it might make sense
> > > to introduce a user specified parallelism for such table source.
> > >
> > > So I think it would be better to reorganize the document a little bit,
> to
> > > explain the connectors one by one. Briefly
> > > introduce use cases and what kind of options are needed in your
> opinion.
> > >
> > > Regarding the interface `DataStreamScanProvider`, a concrete example
> > would
> > > help the discussion. What kind
> > > of scenarios do you want to support? And what kind of connectors need
> > such
> > > an interface?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Sep 23, 2020 at 9:30 PM admin <17626017...@163.com> wrote:
> > >
> > > > +1,it’s a good news
> > > >
> > > > > 2020年9月23日 下午6:22,Jingsong Li  写道:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion about improving the new TableSource
> > and
> > > > > TableSink interfaces.
> > > > >
> > > > > Most connectors have been migrated to FLIP-95, but there are still
> > the
> > > > > Filesystem and Hive that have not been migrated. They have some
> > > > > requirements on table connector API. And users also have some
> > > additional
> > > > > requirements:
> > > > > - Some connectors have the ability to infer parallelism, the
> > > parallelism
> > > > is
> > > > > good for most cases.
> > > > > - Users have customized parallelism configuration requirements for
> > > source
> > > > > and sink.
> > > > > - The connectors need to use topology to build their source/sink
> > > instead
> > > > of
> > > > > a single function. Like JIRA[1], Partition Commit feature and File
> > > > > Compaction feature.
> > > > >
> > > > > Details are in [2].
> > > > >
> > > > > [1]https://issues.apache.org/jira/browse/FLINK-18674
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > >
> > > >
> >
>


-- 

Best,
Benchao Li


[jira] [Created] (FLINK-19397) Offering error handling for the SerializationSchema analogous to DeserializationSchema

2020-09-24 Thread Matthias (Jira)
Matthias created FLINK-19397:


 Summary: Offering error handling for the SerializationSchema 
analogous to DeserializationSchema
 Key: FLINK-19397
 URL: https://issues.apache.org/jira/browse/FLINK-19397
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Common
Reporter: Matthias


Yuval raised the issue about missing error handling when serializing objects 
into Kafka in this [ML 
thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Ignoring-invalid-values-in-KafkaSerializationSchema-td38299.html].

[FLIP-124|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988]
 already addressed this issue for the {{DeserializationSchema}}. We could 
introduce the same solution for {{SerializationSchema}}.
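
For illustration only, one possible shape of such error handling, mirroring 
the "return null to skip the record" behaviour of {{DeserializationSchema}}; 
this is a sketch, not the design the issue will settle on:

{code:java}
import org.apache.flink.api.common.serialization.SerializationSchema;

/** Sketch: wraps another SerializationSchema and skips records that fail to serialize. */
public class SkipOnErrorSerializationSchema<T> implements SerializationSchema<T> {

    private final SerializationSchema<T> inner;

    public SkipOnErrorSerializationSchema(SerializationSchema<T> inner) {
        this.inner = inner;
    }

    @Override
    public byte[] serialize(T element) {
        try {
            return inner.serialize(element);
        } catch (RuntimeException e) {
            // Signal "no output for this record" instead of failing the job.
            return null;
        }
    }
}
{code}

Whether a {{null}} result is actually treated as "skip this record" is up to 
the producer, which is part of what this issue would need to define.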





[jira] [Created] (FLINK-19398) Hive connector fails with IllegalAccessError if submitted as usercode

2020-09-24 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-19398:
-

 Summary: Hive connector fails with IllegalAccessError if submitted 
as usercode
 Key: FLINK-19398
 URL: https://issues.apache.org/jira/browse/FLINK-19398
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.11.2, 1.12.0
Reporter: Fabian Hueske


Using Flink's Hive connector fails with the following exception if the 
dependency is loaded with the user-code classloader.


{code:java}
java.lang.IllegalAccessError: tried to access method 
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.(Lorg/apache/flink/core/fs/Path;Lorg/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner;Lorg/apache/flink/streaming/api/functions/sink/filesystem/BucketFactory;Lorg/apache/flink/streaming/api/functions/sink/filesystem/BucketWriter;Lorg/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy;ILorg/apache/flink/streaming/api/functions/sink/filesystem/OutputFileConfig;)V
 from class 
org.apache.flink.streaming.api.functions.sink.filesystem.HadoopPathBasedBulkFormatBuilder
at 
org.apache.flink.streaming.api.functions.sink.filesystem.HadoopPathBasedBulkFormatBuilder.createBuckets(HadoopPathBasedBulkFormatBuilder.java:127)
 
~[flink-connector-hive_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.table.filesystem.stream.StreamingFileWriter.initializeState(StreamingFileWriter.java:81)
 ~[flink-table-blink_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:106)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:258)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528) 
~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) 
[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) 
[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
{code}

The problem is the {{Buckets}} constructor, which has default (package-private) 
visibility and is called from {{HadoopPathBasedBulkFormatBuilder}}. This works as long as both 
classes are loaded with the same classloader, but when they are loaded by 
different classloaders, the access fails.

{{Buckets}} is loaded with the system CL because it is part of 
flink-streaming-java. 

 

To solve this issue, we should change the visibility of the {{Buckets}} 
constructor to {{public}}.
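
As a side note, the reason the same-package call fails across classloaders is 
that package-private members are only accessible within the same runtime 
package, i.e. the same package name *and* the same defining classloader. A 
small stand-in example (hypothetical classes, not the real ones):

{code:java}
// File FormatBuilder.java, package example.sink (hypothetical stand-ins).
package example.sink;

class Buckets {                       // stands in for the real Buckets
    Buckets(String basePath) {        // package-private constructor
        System.out.println("buckets at " + basePath);
    }
}

public class FormatBuilder {          // stands in for HadoopPathBasedBulkFormatBuilder
    public static void main(String[] args) {
        // Works here: both classes come from the same classloader and thus share a
        // runtime package. If FormatBuilder were loaded by the user-code classloader
        // instead, this call would fail with IllegalAccessError at runtime.
        new Buckets("/tmp/warehouse");
    }
}
{code}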

 





Re: [DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-24 Thread Seth Wiesman
+1

On Thu, Sep 24, 2020 at 2:49 AM Rui Li  wrote:

> +1
>
> On Thu, Sep 24, 2020 at 2:59 PM Timo Walther  wrote:
>
> > +1
> >
> > On 24.09.20 06:54, Jark Wu wrote:
> > > +1 to move it there.
> > >
> > > On Thu, 24 Sep 2020 at 12:16, Jingsong Li 
> > wrote:
> > >
> > >> Hi devs and users:
> > >>
> > >> After the 1.11 release, I heard some voices recently: why can't Hive's
> > >> documents be found in the "Table & SQL Connectors"?
> > >>
> > >> Actually, Hive's documents are in the "Table API & SQL". Since the
> > "Table &
> > >> SQL Connectors" document was extracted separately, Hive is a little
> out
> > of
> > >> place.
> > >> And Hive's code is also in "flink-connector-hive", which should be a
> > >> connector.
> > >> Hive also includes the concept of HiveCatalog. Is catalog a part of
> the
> > >> connector? I think so.
> > >>
> > >> What do you think? If you don't object, I think we can move it.
> > >>
> > >> Best,
> > >> Jingsong Lee
> > >>
> > >
> >
> >
>
> --
> Best regards!
> Rui Li
>


Re: [VOTE] Apache Flink Stateful Functions 2.2.0, release candidate #2

2020-09-24 Thread Aljoscha Krettek

+1 (binding)

 - built from source
 - built docker image
 - verified Rust SDK works with the 2.2.0 docker image

Aljoscha

On 24.09.20 10:32, Tzu-Li (Gordon) Tai wrote:

FYI - the PR for the release announcement has just been drafted:
https://github.com/apache/flink-web/pull/379
Any comments there are also highly appreciated!

On Thu, Sep 24, 2020 at 5:47 AM Seth Wiesman  wrote:

+1 (non binding)

- Verified signatures and checksums
- Checked licenses and notices
- Clean build from source
- Executed all end to end tests
- Deployed to K8s with Go SDK (no sdk changes required btw :))
- Simulated TM and remote function failure and verified recovery
- Checked updated docs

Seth

On Wed, Sep 23, 2020 at 9:55 AM Igal Shilman  wrote:

+1 (non binding)

- Verified the signatures and the checksums
- Built with JDK11 and JDK8
- Verified that the source distribution does not contain any binary data.
- Run e2e tests.
- Run few examples via docker-compose
- Deployed to Kubernetes with checkpointing to S3, with failure scenarios:
-- Killing a TM
-- Killing a remote function
-- Killing all the remote functions.

Thanks,
Igal.

On Wed, Sep 23, 2020 at 10:29 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:

Hi everyone,

Please review and vote on the release candidate #2 for the version 2.2.0 of
Apache Flink Stateful Functions, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

***Testing Guideline***

You can find here [1] a page in the project wiki on instructions for
testing.
To cast a vote, it is not necessary to perform all listed checks,
but please mention which checks you have performed when voting.

***Release Overview***

As an overview, the release consists of the following:
a) Stateful Functions canonical source distribution, to be deployed to the
release repository at dist.apache.org
b) Stateful Functions Python SDK distributions to be deployed to PyPI
c) Maven artifacts to be deployed to the Maven Central Repository

***Staging Areas to Review***

The staging areas containing the above mentioned artifacts are as follows,
for your review:
* All artifacts for a) and b) can be found in the corresponding dev
repository at dist.apache.org [2]
* All artifacts for c) can be found at the Apache Nexus Repository [3]

All artifacts are signed with the key
1C1E2394D3194E1944613488F320986D35C33D6A [4]

Other links for your review:
* JIRA release notes [5]
* source code tag "release-2.2.0-rc2" [6]

***Vote Duration***

The only changes since the last RC were the following:

- [FLINK-19327][k8s] Bump JobManager heap size to 1 GB: config change in
example helm charts (non-blocking)
- [FLINK-19329] FunctionGroupOperator#dispose() might throw NPE during an
unclean shutdown (non-blocking, as it doesn't affect recovery or execution)
- [FLINK-19330][core] Move initialization logic to open() instead of
initializeState (blocking, but was a very simple fix that was thoroughly
tested in previous RC).
- [hotfix] Fixed out-of-date NOTICE entries caused by Flink version upgrade
(blocking)

Since the changes are fairly minimal and had been tested while fixing the
previous RC, I'd like to propose a slightly shorter voting period of 48
hours for this RC. If there are objections for this, please let me know.

The voting time will run until at least 25 Sep., 4pm UTC.
It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Gordon

[1]
https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Stateful+Functions+Release
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.2.0-rc2/
[3]
https://repository.apache.org/content/repositories/orgapacheflink-1399/
[4] https://dist.apache.org/repos/dist/release/flink/KEYS
[5]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348350
[6]
https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=4420c0382e6ee535daeda2b572ab8eea0af8a614












Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

2020-09-24 Thread Aljoscha Krettek
Thanks for the proposal! I think the use cases that we are trying to 
solve are indeed valid. However, I think we might have to take a step 
back to look at what we're trying to solve and how we can solve it.


The FLIP seems to have two broader topics: 1) add "get parallelism" to 
sinks/sources 2) let users write DataStream topologies for 
sinks/sources. I'll treat them separately below.


I think we should not add "get parallelism" to the Table Sink API 
because I think it's the wrong level of abstraction. The Table API 
connectors are (or should be) more or less thin wrappers around 
"physical" connectors. By "physical" I mean the underlying (mostly 
DataStream API) connectors. For example, with the Kafka Connector the 
Table API connector just does the configuration parsing and determines a 
good (de)serialization format and then creates the underlying 
FlinkKafkaConsumer/FlinkKafkaProducer.


If we wanted to add a "get parallelism" it would be in those underlying 
connectors but I'm also skeptical about adding such a method there 
because it is a static assignment and would preclude clever 
optimizations about the parallelism of a connector at runtime. But maybe 
that's thinking too much about future work so I'm open to discussion there.


Regarding the second point of letting Table connector developers use 
DataStream: I think we should not do it. One of the purposes of FLIP-95 
[1] was to decouple the Table API from the DataStream API for the basic 
interfaces. Coupling the two too closely at that basic level will make 
our lives harder in the future when we want to evolve those APIs or when 
we want the system to be better at choosing how to execute sources and 
sinks. An example of this is actually the past of the Table API. Before 
FLIP-95 we had connectors that dealt directly with DataSet and 
DataStream, meaning that if users wanted their Table Sink to work in 
both BATCH and STREAMING mode they had to provide two implementations. 
The trend is towards unifying the sources/sinks to common interfaces 
that can be used for both BATCH and STREAMING execution but, again, I 
think exposing DataStream here would be a step back in the wrong direction.


I think the solution to the existing user requirement of using 
DataStream sources and sinks with the Table API should be better 
interoperability between the two APIs, which is being tackled right now 
in FLIP-136 [2]. If FLIP-136 is not adequate for the use cases that 
we're trying to solve here, maybe we should think about FLIP-136 some more.


What do you think?

Best,
Aljoscha

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API




[jira] [Created] (FLINK-19399) Add Python AsyncRequestReplyHandler docs

2020-09-24 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-19399:


 Summary: Add Python AsyncRequestReplyHandler docs
 Key: FLINK-19399
 URL: https://issues.apache.org/jira/browse/FLINK-19399
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Igal Shilman
Assignee: Igal Shilman


Document the new handler.





Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-24 Thread Yu Li
*bq. What new confusion would be introduced here?*
No *new* confusion introduced, but as mentioned at the very beginning of
the motivation ("Apache Flink's durability story is a mystery to many
users"), I thought the FLIP aims at resolving some *existing*
confusions, i.e. the durability mystery to users.

For me, I'm not 100% clear about how to write the javadoc of the
setCheckpointStorage API. Would it be like "specify where the checkpoint
data is stored"? If so, do we need to explain the fact that when a
checkpoint path is given, JM will also persist the checkpoint data to DFS?
It's true that such confusion also exists today, but would the introduction
of the new API expose it further?

IMHO we need to document the newly introduced API / classes and their
semantics clearly in the FLIP to make sure everyone is on the same page,
but if we feel such work / discussions are all details and only need to
happen at the documenting and release note phase, it's also fine to me.

And if I'm the only one who has such questions / concerns on the new
`setCheckpointStorage` API and most of others feel its semantic is sound
and clear, then please just ignore me and move on.

Thanks.

Best Regards,
Yu


On Wed, 23 Sep 2020 at 17:08, Stephan Ewen  wrote:

> I am confused now with the concerns here. This is very much from the user
> perspective (which is partially also the developer perspective which is the
> sign of an intuitive abstraction).
>
> Of course, there will be docs describing what JMCheckpointStorage and
> FsCheckpointStorage are.
> And having release notes that describe that RocksDBStateBackend("s3://...")
> now corresponds to a combination of "RocksDBBackend" and
> "FsCheckpointStorage" is also straightforward.
>
> We said to keep the old RocksDBStateBackend class and let it implement both
> interfaces such that the old code still works exactly as before.
>
> What new confusion would be introduced here?
> Understanding the difference between JMCheckpointStorage and
> FsCheckpointStorage was always necessary when one needed to understand the
> difference between MemoryStateBackend and FsStateBackend. It should be
> easier to define this after this change, because it is the only thing that
> we describe when explaining what checkpoint storage to use (rather than
> also having the choice of index structure coupled to that).
>
>
> On Wed, Sep 23, 2020 at 10:39 AM Aljoscha Krettek 
> wrote:
>
> > On 23.09.20 04:40, Yu Li wrote:
> > > To be specific, with the old API users don't need to set checkpoint
> > > storage, instead they only need to pass the checkpoint path w/o caring
> > > about the storage. The new APIs are forcing users to set the storage so
> > > they have to know the difference between different storages. It's not
> an
> > > implementation change, but an API change that users have to understand
> > and
> > > follow-up.
> >
> > I think the main point of the FLIP is to make it more obvious to users
> > what is happening.
> >
> > With current Flink, they would do a `setStateBackend(new
> > FsStateBackend())`. What the user is actually "saying" with this
> > is: I want to keep state on heap but store checkpoints in DFS. They are
> > not actually changing the "State Backend", the thing that keeps state in
> > operators, but only where state is checkpointed. The thing that is used
> > for local state storage in operators is still the "Heap Backend".
> >
> > With the proposed FLIP, a user would do a `setCheckpointStorage(new
> > FsStorage())`. Which makes it obvious that they're changing where
> > checkpoints are stored but not the actual "State Backend", which is
> > still "Heap Backend" (the default).
> >
> > I do understand Yu's point, though, that this will be confusing for
> > current Flink users. They are used to setting a "State Backend" if/when
> > they want to change the storage location. To fit the new model they
> > would have to change the call from `setStateBackend()` to
> > `setCheckpointStorage()`.
> >
> > I think we need to live with this short-term confusion because in the
> > long run the proposed split between checkpoint location and state
> > backend makes sense and will be more straightforward for users to
> > understand.
> >
> > Best,
> > Aljoscha
> >
> >
>
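
To make the contrast in the quoted explanation concrete, a small sketch using
only the names mentioned in this thread (FsStateBackend, FsStorage,
setCheckpointStorage); the "proposed" part reflects the FLIP-142 proposal
under discussion, not a finished API:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigContrast {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Today: one call mixes two concerns -- "working state lives on heap"
        // and "checkpoints go to this DFS path".
        env.setStateBackend(new FsStateBackend("s3://bucket/checkpoints"));

        // Proposed by FLIP-142 (names from this thread, API not final):
        // env.setCheckpointStorage(new FsStorage("s3://bucket/checkpoints"));
    }
}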


Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-24 Thread Yu Li
And to make it clear, I'm +1 on the idea of decoupling state backends from
checkpointing. I don't have any question about making it clear that
heap/RocksDB is where we serve the routine state reads/writes, and that where
to put the checkpoint data is another story. My only concern lies in the newly
introduced setCheckpointStorage API and how we define its semantics, and I'm
not sure whether it's due to my ignorance.

Best Regards,
Yu


On Thu, 24 Sep 2020 at 23:11, Yu Li  wrote:

> *bq. What new confusion would be introduced here?*
> No *new* confusion introduced, but as mentioned at the very beginning of
> the motivation ("Apache Flink's durability story is a mystery to many
> users"), I thought the FLIP aims at resolving some *existing*
> confusions, i.e. the durability mystery to users.
>
> For me, I'm not 100% clear about how to write the javadoc of the
> setCheckpointStorage API. Would it be like "specify where the checkpoint
> data is stored"? If so, do we need to explain the fact that when a
> checkpoint path is given, JM will also persist the checkpoint data to DFS?
> It's true that such confusion also exists today, but would the introduction
> of the new API expose it further?
>
> IMHO we need to document the newly introduced API / classes and their
> semantics clearly in the FLIP to make sure everyone is on the same page,
> but if we feel such work / discussions are all details and only need to
> happen at the documenting and release note phase, it's also fine to me.
>
> And if I'm the only one who has such questions / concerns on the new
> `setCheckpointStorage` API and most of others feel its semantic is sound
> and clear, then please just ignore me and move on.
>
> Thanks.
>
> Best Regards,
> Yu
>
>
> On Wed, 23 Sep 2020 at 17:08, Stephan Ewen  wrote:
>
>> I am confused now with the concerns here. This is very much from the user
>> perspective (which is partially also the developer perspective which is
>> the
>> sign of an intuitive abstraction).
>>
>> Of course, there will be docs describing what JMCheckpointStorage and
>> FsCheckpointStorage are.
>> And having release notes that describe that
>> RocksDBStateBackend("s3://...")
>> now corresponds to a combination of "RocksDBBackend" and
>> "FsCheckpointStorage" is also straightforward.
>>
>> We said to keep the old RocksDBStateBackend class and let it implement
>> both
>> interfaces such that the old code still works exactly as before.
>>
>> What new confusion would be introduced here?
>> Understanding the difference between JMCheckpointStorage and
>> FsCheckpointStorage was always necessary when one needed to understand the
>> difference between MemoryStateBackend and FsStateBackend. It should be
>> easier to define this after this change, because it is the only thing that
>> we describe when explaining what checkpoint storage to use (rather than
>> also having the choice of index structure coupled to that).
>>
>>
>> On Wed, Sep 23, 2020 at 10:39 AM Aljoscha Krettek 
>> wrote:
>>
>> > On 23.09.20 04:40, Yu Li wrote:
>> > > To be specific, with the old API users don't need to set checkpoint
>> > > storage, instead they only need to pass the checkpoint path w/o caring
>> > > about the storage. The new APIs are forcing users to set the storage
>> so
>> > > they have to know the difference between different storages. It's not
>> an
>> > > implementation change, but an API change that users have to understand
>> > and
>> > > follow-up.
>> >
>> > I think the main point of the FLIP is to make it more obvious to users
>> > what is happening.
>> >
>> > With current Flink, they would do a `setStateBackend(new
>> > FsStateBackend())`. What the user is actually "saying" with this
>> > is: I want to keep state on heap but store checkpoints in DFS. They are
>> > not actually changing the "State Backend", the thing that keeps state in
>> > operators, but only where state is checkpointed. The thing that is used
>> > for local state storage in operators is still the "Heap Backend".
>> >
>> > With the proposed FLIP, a user would do a `setCheckpointStorage(new
>> > FsStorage())`. Which makes it obvious that they're changing where
>> > checkpoints are stored but not the actual "State Backend", which is
>> > still "Heap Backend" (the default).
>> >
>> > I do understand Yu's point, though, that this will be confusing for
>> > current Flink users. They are used to setting a "State Backend" if/when
>> > they want to change the storage location. To fit the new model they
>> > would have to change the call from `setStateBackend()` to
>> > `setCheckpointStorage()`.
>> >
>> > I think we need to live with this short-term confusion because in the
>> > long run the proposed split between checkpoint location and state
>> > backend makes sense and will be more straightforward for users to
>> > understand.
>> >
>> > Best,
>> > Aljoscha
>> >
>> >
>>
>


[jira] [Created] (FLINK-19400) Removed unused BufferPoolOwner

2020-09-24 Thread Arvid Heise (Jira)
Arvid Heise created FLINK-19400:
---

 Summary: Removed unused BufferPoolOwner
 Key: FLINK-19400
 URL: https://issues.apache.org/jira/browse/FLINK-19400
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.12.0
Reporter: Arvid Heise


{{BufferPoolOwner}} does not have any production usages and just complicates a 
few tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19401) Job stuck in restart loop due to "Could not find registered job manager"

2020-09-24 Thread Steven Zhen Wu (Jira)
Steven Zhen Wu created FLINK-19401:
--

 Summary: Job stuck in restart loop due to "Could not find 
registered job manager"
 Key: FLINK-19401
 URL: https://issues.apache.org/jira/browse/FLINK-19401
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.10.1
Reporter: Steven Zhen Wu


The Flink job sometimes gets into a restart loop for many hours and can't recover 
until it is redeployed. We had an issue with Kafka that initially caused the job to 
restart. 

Below is the first of the many exceptions for the "ResourceManagerException: Could 
not find registered job manager" error.
{code}
2020-09-19 00:03:31,614 INFO  
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl 
[flink-akka.actor.default-dispatcher-35973]  - Requesting new slot 
[SlotRequestId{171f1df017dab3a42c032abd07908b9b}] and profile ResourceP
rofile{UNKNOWN} from resource manager.
2020-09-19 00:03:31,615 INFO  
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl 
[flink-akka.actor.default-dispatcher-35973]  - Requesting new slot 
[SlotRequestId{cc7d136c4ce1f32285edd4928e3ab2e2}] and profile 
ResourceProfile{UNKNOWN} from resource manager.
2020-09-19 00:03:31,615 INFO  
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl 
[flink-akka.actor.default-dispatcher-35973]  - Requesting new slot 
[SlotRequestId{024c8a48dafaf8f07c49dd4320d5cc94}] and profile 
ResourceProfile{UNKNOWN} from resource manager.
2020-09-19 00:03:31,615 INFO  
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl 
[flink-akka.actor.default-dispatcher-35973]  - Requesting new slot 
[SlotRequestId{a591eda805b3081ad2767f5641d0db06}] and profile 
ResourceProfile{UNKNOWN} from resource manager.
2020-09-19 00:03:31,620 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   
[flink-akka.actor.default-dispatcher-35973]  - Source: k2-csevpc -> 
k2-csevpcRaw -> (vhsPlaybackEvents -> Flat Map, merchImpressionsClientLog -> 
Flat Map) (56/640) (1b0d3dd1f19890886ff373a3f08809e8) switched from SCHEDULED 
to FAILED.
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: No 
pooled slot available and request to ResourceManager for new slot failed
at 
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:433)
at 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$forward$21(FutureUtils.java:1065)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:792)
at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2153)
at 
org.apache.flink.runtime.concurrent.FutureUtils.forward(FutureUtils.java:1063)
at 
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager.createRootSlot(SlotSharingManager.java:155)
at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.allocateMultiTaskSlot(SchedulerImpl.java:511)
at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.allocateSharedSlot(SchedulerImpl.java:311)
at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.internalAllocateSlot(SchedulerImpl.java:160)
at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.allocateSlotInternal(SchedulerImpl.java:143)
at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.allocateSlot(SchedulerImpl.java:113)
at 
org.apache.flink.runtime.executiongraph.SlotProviderStrategy$NormalSlotProviderStrategy.allocateSlot(SlotProviderStrategy.java:115)
at 
org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator.lambda$allocateSlotsFor$0(DefaultExecutionSlotAllocator.java:104)
at 
java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:995)
at 
java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2137)
at 
org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator.allocateSlotsFor(DefaultExecutionSlotAllocator.java:102)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.allocateSlots(DefaultScheduler.java:342)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.allocateSlotsAndDeploy(DefaultScheduler.java:311)
at 
org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy.allocateSlotsAndD

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-24 Thread Steven Wu
Guowei,

Thanks a lot for updating the wiki page. It looks great.

I noticed one inconsistency between the wiki and your last summary email for
the GlobalCommitter interface. I think the version in the summary email is the
intended one, because rollover from previous failed commits can accumulate
a list.
CommitResult commit(GlobalCommT globalCommittable); // in the wiki
=>
CommitResult commit(List<GlobalCommT> globalCommittable); // in the summary email

I also have a clarifying question regarding WriterStateT. Since
IcebergWriter won't need to checkpoint any state, should we set it to the *Void*
type? Since getWriterStateSerializer() returns Optional, that part is clear and
we can return Optional.empty().

Thanks,
Steven
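
To illustrate the rollover argument, here is a tiny standalone sketch (placeholder names only, not the FLIP-143 interfaces): committables that fail to commit in one attempt are kept and handed back together with newly produced ones, which is what a list-typed commit signature makes natural.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch with made-up names, not the proposed FLIP-143 API.
final class RollOverCommitBuffer<GlobalCommT> {

    private final List<GlobalCommT> pending = new ArrayList<>();

    // commitCall returns true on success; on failure the whole list is kept
    // and retried (together with the next committable) on the next attempt.
    void commit(GlobalCommT newCommittable, Predicate<List<GlobalCommT>> commitCall) {
        pending.add(newCommittable);
        if (commitCall.test(pending)) {
            pending.clear();
        }
    }
}
```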

On Wed, Sep 23, 2020 at 6:59 PM Guowei Ma  wrote:

> Thanks Aljoscha for your suggestion.  I have updated FLIP. Any comments are
> welcome.
>
> Best,
> Guowei
>
>
> On Wed, Sep 23, 2020 at 4:25 PM Aljoscha Krettek 
> wrote:
>
> > Yes, that sounds good! I'll probably have some comments on the FLIP
> > about the names of generic parameters and the Javadoc but we can address
> > them later or during implementation.
> >
> > I also think that we probably need the FAIL,RETRY,SUCCESS result for
> > globalCommit() but we can also do that as a later addition.
> >
> > So I think we're good to go to update the FLIP, do any last minute
> > changes and then vote.
> >
> > Best,
> > Aljoscha
> >
> > On 23.09.20 06:13, Guowei Ma wrote:
> > > Hi, all
> > >
> > > Thank everyone very much for your ideas and suggestions. I would try to
> > > summarize again the consensus :). Correct me if I am wrong or
> > misunderstand
> > > you.
> > >
> > > ## Consensus-1
> > >
> > > 1. The motivation of the unified sink API is to decouple the sink
> > > implementation from the different runtime execution mode.
> > > 2. The initial scope of the unified sink API only covers the file
> system
> > > type, which supports the real transactions. The FLIP focuses more on
> the
> > > semantics the new sink api should support.
> > > 3. We prefer the first alternative API, which could give the framework
> a
> > > greater opportunity to optimize.
> > > 4. The `Writer` needs to add a method `prepareCommit`, which would be
> > > called from `prepareSnapshotPreBarrier`. And remove the `Flush` method.
> > > 5. The FLIP could move the `Snapshot & Drain` section in order to be
> more
> > > focused.
> > >
> > > ## Consensus-2
> > >
> > > 1. What should the “Unified Sink API” support/cover? It includes two
> > > aspects. 1. The same sink implementation would work for both the batch
> > and
> > > stream execution mode. 2. In the long run we should give the sink
> > developer
> > > the ability of building “arbitrary” topologies. But for Flink-1.12 we
> > > should be more focused on only satisfying the S3/HDFS/Iceberg sink.
> > > 2. Because the batch execution mode does not have the normal checkpoint
> > the
> > > sink developer should not depend on it any more if we want a unified
> > sink.
> > > 3. We can benefit by providing an asynchronous `Writer` version. But
> > > because the unified sink is already very complicated, we don’t add this
> > in
> > > the first version.
> > >
> > >
> > > According to these consensus I would propose the first version of the
> new
> > > sink api as follows. What do you think? Any comments are welcome.
> > >
> > > /**
> > >   * This interface lets the sink developer build a simple transactional
> > sink
> > > topology pattern, which satisfies the HDFS/S3/Iceberg sink.
> > >   * This sink topology includes one {@link Writer} + one {@link
> > Committer} +
> > > one {@link GlobalCommitter}.
> > >   * The {@link Writer} is responsible for producing the committable.
> > >   * The {@link Committer} is responsible for committing a single
> > > committable.
> > >   * The {@link GlobalCommitter} is responsible for committing an
> > aggregated
> > > committable, which we called global committables.
> > >   *
> > >   * But both the {@link Committer} and the {@link GlobalCommitter} are
> > > optional.
> > >   */
> > > interface TSink<IN, CommT, WriterStateT, GlobalCommT> {
> > >
> > >  Writer<IN, CommT, WriterStateT> createWriter(InitContext initContext);
> > >
> > >  Writer<IN, CommT, WriterStateT> restoreWriter(InitContext initContext,
> > > List<WriterStateT> states);
> > >
> > >  Optional<Committer<CommT>> createCommitter();
> > >
> > >  Optional<GlobalCommitter<CommT, GlobalCommT>> createGlobalCommitter();
> > >
> > >  SimpleVersionedSerializer<CommT> getCommittableSerializer();
> > >
> > >  Optional<SimpleVersionedSerializer<GlobalCommT>> getGlobalCommittableSerializer();
> > > }
> > >
> > > /**
> > >   * The {@link GlobalCommitter} is responsible for committing an
> > aggregated
> > > committable, which we called global committables.
> > >   */
> > > interface GlobalCommitter<CommT, GlobalCommT> {
> > >
> > >  /**
> > >   * This method is called when restoring from a failover.
> > >   * @param globalCommittables the global committables that are
> > not
> > > committed in the previous session.
> > >   * @return the global committables that should be committed

[jira] [Created] (FLINK-19402) Metrics for measuring Flink application deployment latency in Yarn

2020-09-24 Thread Yu Yang (Jira)
Yu Yang created FLINK-19402:
---

 Summary: Metrics for measuring Flink application deployment 
latency in Yarn
 Key: FLINK-19402
 URL: https://issues.apache.org/jira/browse/FLINK-19402
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Yu Yang


Real-time streaming applications often have strict downtime SLOs during 
deployment. We are trying to measure Flink job deployment latency on a YARN 
cluster, i.e. the elapsed time between submitting the Flink job to the YARN 
cluster and the job being up and running.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-24 Thread Seth Wiesman
Hi Yu,

bq. I thought the FLIP aims at resolving some *existing* confusion, i.e.
the durability mystery to users.

I think it might help to highlight specific stumbling blocks users have
today and why I believe this change addresses those issues. Some frequent
things I've heard over the past several years include:

1) "We use RocksDB because we don't need fault tolerance."
2) "We don't use RocksDB because we don't want to manage an external
database."
3) Believing RocksDB is reading and writing directly with S3 or HDFS (vs.
local disk)
4) Believing FsStateBackend spills to disk or has anything to do with the
local filesystem
5) Pointing RocksDB at network-attached storage, believing that the state
backend needs to be fault-tolerant

This question from the ml is very representative of where users are
struggling[1]. Many of these questions were not from new users but from
organizations that were in production! Just yesterday I was on the phone
with a company that didn't realize they were in production without
checkpointing; honestly, you would be shocked how often this happens. The
current state backend abstraction is too complex for many of our users. What
all these questions have in common is misunderstanding the relationship
between how data is stored locally on TMs vs how checkpoints make that
state durable.

The FLIP aims to actively help users by allowing them to reason about state
backends separately from checkpoint durability. In the future, a state
backend only defines where and how state is stored locally on the TM while
checkpoint storage defines where and how checkpoints are stored for
recovery. To be concrete I think the JavaDoc for setCheckpointStorage would
be something like:

```java
/**
 * CheckpointStorage defines how checkpoint snapshots are persisted for fault tolerance.
 * Various implementations store their checkpoints in different fashions and have
 * different requirements and availability guarantees.
 *
 * For example, JobManagerCheckpointStorage stores checkpoints in the memory of the
 * JobManager. It is lightweight and without additional dependencies but is not highly
 * available and only supports small state sizes. This checkpoint storage policy is
 * convenient for local testing and development.
 *
 * FileSystemCheckpointStorage stores checkpoints in a filesystem. For systems like
 * HDFS, NFS drives, S3, and GCS, this storage policy supports large state sizes, in
 * the magnitude of many terabytes, while providing a highly available foundation for
 * stateful applications. This checkpoint storage policy is recommended for most
 * production deployments.
 */
void setCheckpointStorage(CheckpointStorage storage) {}
```

Seth

[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/State-Storage-Questions-td37919.html
[2] Also naming, but we're aligned here
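
To make the javadoc draft concrete, here is a small usage sketch. The setCheckpointStorage call and the two storage class names follow the draft above; since the API does not exist in a released Flink version yet, the calls are kept in comments and should be read as assumptions, not as a finished interface.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public final class CheckpointStorageChoiceSketch {

    // Sketch only: setCheckpointStorage is still a proposal, so the calls
    // below are left as comments.
    public static void configure(StreamExecutionEnvironment env, boolean localDevelopment) {
        if (localDevelopment) {
            // Lightweight, no extra dependencies, small state only; fine for tests:
            // env.setCheckpointStorage(new JobManagerCheckpointStorage());
        } else {
            // Durable and highly available, supports very large state; recommended for production:
            // env.setCheckpointStorage(new FileSystemCheckpointStorage("hdfs:///flink/checkpoints"));
        }
    }
}
```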

On Thu, Sep 24, 2020 at 10:24 AM Yu Li  wrote:

> And to make it clear, I'm +1 on the idea of decoupling state backends with
> checkpointing. I don't have any question about making it clear that
> heap/RocksDB is where we serve the routine state read/write and where to
> put the checkpoint data is another story. My only concern lies in the newly
> introduced setCheckpointStorage API and how we define its semantics, and
> not sure whether it's due to my ignorance.
>
> Best Regards,
> Yu
>
>
> On Thu, 24 Sep 2020 at 23:11, Yu Li  wrote:
>
> > *bq. What new confusion would be introduced here?*
> > No *new* confusion introduced, but as mentioned at the very beginning of
> > the motivation ("Apache Flink's durability story is a mystery to many
> > users"), I thought the FLIP aims at resolving some *existing*
> > confusions, i.e. the durability mystery to users.
> >
> > For me, I'm not 100% clear about how to write the javadoc of the
> > setCheckpointStorage API. Would it be like "specify where the checkpoint
> > data is stored"? If so, do we need to explain the fact that when a
> > checkpoint path is given, JM will also persist the checkpoint data to
> DFS?
> > It's true that such confusion also exists today, but would the
> introduction
> > of the new API expose it further?
> >
> > IMHO we need to document the newly introduced API / classes and their
> > semantics clearly in the FLIP to make sure everyone is on the same page,
> > but if we feel such work / discussions are all details and only need to
> > happen at the documenting and release note phase, it's also fine to me.
> >
> > And if I'm the only one who has such questions / concerns on the new
> > `setCheckpointStorage` API and most of others feel its semantic is sound
> > and clear, then please just ignore me and move on.
> >
> > Thanks.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Wed, 23 Sep 2020 at 17:08, Stephan Ewen  wrote:
> >
> >> I am confused now with the concerns here. This is very much from the
> user
> >> perspective (which is partially also the developer perspective which is
> >> the
> >> sign of an intuitive abstraction).
> >>
> >> Of course, there wil

[jira] [Created] (FLINK-19403) Support Pandas Stream Group Window Aggregation

2020-09-24 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-19403:


 Summary: Support Pandas Stream Group Window Aggregation
 Key: FLINK-19403
 URL: https://issues.apache.org/jira/browse/FLINK-19403
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.12.0


We will add Stream Physical Pandas Group Window RelNode and 
StreamArrowPythonGroupWindowAggregateFunctionOperator to support Pandas Stream 
Group Window Aggregation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19404) Support Pandas Stream Over Window Aggregation

2020-09-24 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-19404:


 Summary: Support Pandas Stream Over Window Aggregation
 Key: FLINK-19404
 URL: https://issues.apache.org/jira/browse/FLINK-19404
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.12.0


We will add Stream Physical Pandas Over Window RelNode and 
StreamArrowPythonOverWindowAggregateFunctionOperator to support Pandas Stream 
Over Window Aggregation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19405) Translate "DataSet Connectors" page of "Connectors" into Chinese

2020-09-24 Thread weizheng (Jira)
weizheng created FLINK-19405:


 Summary: Translate "DataSet Connectors" page of "Connectors" into 
Chinese
 Key: FLINK-19405
 URL: https://issues.apache.org/jira/browse/FLINK-19405
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation, Documentation
Reporter: weizheng


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/dev/batch/connectors.html

The markdown file is located in flink/docs/dev/batch/connectors.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

2020-09-24 Thread wenlong.lwl
Hi Aljoscha, I would like to share a use case in favor of setting the parallelism
of a table sink (or limiting the parallelism range of a table sink): when writing
data to databases, there is a limit on the number of JDBC connections and on
query TPS. We would get "too many connections" errors, high load on the
database, and poor performance because of too many small requests if the
optimizer didn't know such information and set a large parallelism for the sink
to match the parallelism of its input.
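
As an illustration of the kind of hint being asked for, a hypothetical sketch follows. The interface and method names are invented for this example and are not the FLIP-146 proposal itself; the point is only that a sink could report a parallelism bound derived from the database's connection budget instead of inheriting the input's parallelism.

```java
import java.util.OptionalInt;

// Hypothetical names, not the FLIP-146 API: a sink runtime provider that can
// report a preferred/maximum parallelism to the planner.
interface ParallelismHintProvider {
    // Empty means "no preference, follow the input parallelism".
    OptionalInt preferredParallelism();
}

final class JdbcSinkRuntimeProviderSketch implements ParallelismHintProvider {

    private final int maxJdbcConnections;

    JdbcSinkRuntimeProviderSketch(int maxJdbcConnections) {
        this.maxJdbcConnections = maxJdbcConnections;
    }

    @Override
    public OptionalInt preferredParallelism() {
        // Keep the sink parallelism at or below the connection budget of the database.
        return OptionalInt.of(maxJdbcConnections);
    }
}
```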

On Thu, 24 Sep 2020 at 22:41, Aljoscha Krettek  wrote:

> Thanks for the proposal! I think the use cases that we are trying to
> solve are indeed valid. However, I think we might have to take a step
> back to look at what we're trying to solve and how we can solve it.
>
> The FLIP seems to have two broader topics: 1) add "get parallelism" to
> sinks/sources 2) let users write DataStream topologies for
> sinks/sources. I'll treat them separately below.
>
> I think we should not add "get parallelism" to the Table Sink API
> because I think it's the wrong level of abstraction. The Table API
> connectors are (or should be) more or less thin wrappers around
> "physical" connectors. By "physical" I mean the underlying (mostly
> DataStream API) connectors. For example, with the Kafka Connector the
> Table API connector just does the configuration parsing and determines a
> good (de)serialization format and then creates the underlying
> FlinkKafkaConsumer/FlinkKafkaProducer.
>
> If we wanted to add a "get parallelism" it would be in those underlying
> connectors but I'm also skeptical about adding such a method there
> because it is a static assignment and would preclude clever
> optimizations about the parallelism of a connector at runtime. But maybe
> that's thinking too much about future work so I'm open to discussion there.
>
> Regarding the second point of letting Table connector developers use
> DataStream: I think we should not do it. One of the purposes of FLIP-95
> [1] was to decouple the Table API from the DataStream API for the basic
> interfaces. Coupling the two too closely at that basic level will make
> our lives harder in the future when we want to evolve those APIs or when
> we want the system to be better at choosing how to execute sources and
> sinks. An example of this is actually the past of the Table API. Before
> FLIP-95 we had connectors that dealt directly with DataSet and
> DataStream, meaning that if users wanted their Table Sink to work in
> both BATCH and STREAMING mode they had to provide two implementations.
> The trend is towards unifying the sources/sinks to common interfaces
> that can be used for both BATCH and STREAMING execution but, again, I
> think exposing DataStream here would be a step back in the wrong direction.
>
> I think the solution to the existing user requirement of using
> DataStream sources and sinks with the Table API should be better
> interoperability between the two APIs, which is being tackled right now
> in FLIP-136 [2]. If FLIP-136 is not adequate for the use cases that
> we're trying to solve here, maybe we should think about FLIP-136 some more.
>
> What do you think?
>
> Best,
> Aljoscha
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API
>
>
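
For reference, a minimal sketch of the interoperability route mentioned above, using the bridge API that already exists in Flink 1.11: convert the Table to a DataStream at the boundary and attach any DataStream sink there, controlling its parallelism explicitly. The snippet assumes a StreamTableEnvironment and a Table are already available.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

final class TableToDataStreamSinkSketch {

    // Convert the query result to a DataStream and attach a DataStream sink,
    // setting the sink parallelism independently of the table pipeline.
    static void attachSink(StreamTableEnvironment tEnv, Table result) {
        DataStream<Row> rows = tEnv.toAppendStream(result, Row.class);
        rows.addSink(new PrintSinkFunction<>()).setParallelism(2);
    }
}
```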


Re: [DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-24 Thread Leonard Xu
+1

> 在 2020年9月24日,21:54,Seth Wiesman  写道:
> 
> +1
> 
> On Thu, Sep 24, 2020 at 2:49 AM Rui Li  wrote:
> 
>> +1
>> 
>> On Thu, Sep 24, 2020 at 2:59 PM Timo Walther  wrote:
>> 
>>> +1
>>> 
>>> On 24.09.20 06:54, Jark Wu wrote:
 +1 to move it there.
 
 On Thu, 24 Sep 2020 at 12:16, Jingsong Li 
>>> wrote:
 
> Hi devs and users:
> 
> After the 1.11 release, I heard some voices recently asking why Hive's
> documents can't be found in the "Table & SQL Connectors" section.
> 
> Actually, Hive's documents are in the "Table API & SQL". Since the
>>> "Table &
> SQL Connectors" document was extracted separately, Hive is a little
>> out
>>> of
> place.
> And Hive's code is also in "flink-connector-hive", which should be a
> connector.
> Hive also includes the concept of HiveCatalog. Is catalog a part of
>> the
> connector? I think so.
> 
> What do you think? If you don't object, I think we can move it.
> 
> Best,
> Jingsong Lee
> 
 
>>> 
>>> 
>> 
>> --
>> Best regards!
>> Rui Li
>> 



[jira] [Created] (FLINK-19406) Casting row time to timestamp loses nullability info

2020-09-24 Thread Rui Li (Jira)
Rui Li created FLINK-19406:
--

 Summary: Casting row time to timestamp loses nullability info
 Key: FLINK-19406
 URL: https://issues.apache.org/jira/browse/FLINK-19406
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Rui Li
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19407) Translate "Elasticsearch Connector" page of "Connectors" into Chinese

2020-09-24 Thread Jira
魏旭斌 created FLINK-19407:
---

 Summary: Translate "Elasticsearch Connector" page of "Connectors" 
into Chinese
 Key: FLINK-19407
 URL: https://issues.apache.org/jira/browse/FLINK-19407
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation, Documentation
Reporter: 魏旭斌


The page url is 
[https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/elasticsearch.html]

The markdown file is located in flink/docs/dev/connectors/elasticsearch.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19408) Update flink-statefun-docker release scripts for cross release Java 8 and 11

2020-09-24 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-19408:
---

 Summary: Update flink-statefun-docker release scripts for cross 
release Java 8 and 11
 Key: FLINK-19408
 URL: https://issues.apache.org/jira/browse/FLINK-19408
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
 Fix For: statefun-2.2.0


Currently, the {{add-version.sh}} script in the {{flink-statefun-docker}} repo 
does not generate Dockerfiles for different Java versions.
Since we have decided to cross-release images for Java 8 and 11, that script 
needs to be updated as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)