[jira] [Created] (FLINK-25640) Enhance the document for blocking shuffle

2022-01-12 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-25640:
---

 Summary: Enhance the document for blocking shuffle
 Key: FLINK-25640
 URL: https://issues.apache.org/jira/browse/FLINK-25640
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Yingjie Cao
 Fix For: 1.15.0


As discussed in
[https://lists.apache.org/thread/pt2b1f17x2l5rlvggwxs6m265lo4ly7p], this ticket
aims to enhance the documentation for blocking shuffle and add more operational
guidelines.





[jira] [Created] (FLINK-25639) Increase the default read buffer size of sort-shuffle to 64M

2022-01-12 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-25639:
---

 Summary: Increase the default read buffer size of sort-shuffle to 
64M
 Key: FLINK-25639
 URL: https://issues.apache.org/jira/browse/FLINK-25639
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Yingjie Cao
 Fix For: 1.15.0


As discussed in 
[https://lists.apache.org/thread/pt2b1f17x2l5rlvggwxs6m265lo4ly7p], this ticket 
aims to increase the default read buffer size of sort-shuffle to 64M.





[jira] [Created] (FLINK-25638) Increase the default write buffer size of sort-shuffle to 16M

2022-01-12 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-25638:
---

 Summary: Increase the default write buffer size of sort-shuffle to 
16M
 Key: FLINK-25638
 URL: https://issues.apache.org/jira/browse/FLINK-25638
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Yingjie Cao
 Fix For: 1.15.0


As discussed in 
[https://lists.apache.org/thread/pt2b1f17x2l5rlvggwxs6m265lo4ly7p], this ticket 
aims to increase the default write buffer size of sort-shuffle to 16M.





[jira] [Created] (FLINK-25637) Make sort-shuffle the default shuffle implementation for batch jobs

2022-01-12 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-25637:
---

 Summary: Make sort-shuffle the default shuffle implementation for 
batch jobs
 Key: FLINK-25637
 URL: https://issues.apache.org/jira/browse/FLINK-25637
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Yingjie Cao
 Fix For: 1.15.0


As discussed in 
[https://lists.apache.org/thread/pt2b1f17x2l5rlvggwxs6m265lo4ly7p], this ticket 
aims to make sort-shuffle the default shuffle implementation for batch jobs.





[jira] [Created] (FLINK-25636) FLIP-199: Change some default config values of blocking shuffle for better usability

2022-01-12 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-25636:
---

 Summary: FLIP-199: Change some default config values of blocking 
shuffle for better usability
 Key: FLINK-25636
 URL: https://issues.apache.org/jira/browse/FLINK-25636
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Reporter: Yingjie Cao
 Fix For: 1.15.0


This is the umbrella issue for FLIP-199. We will change several default
config values for batch shuffle and update the documentation accordingly.
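Taken together, the sub-tasks map onto a handful of existing configuration keys.
As a rough sketch of what the new defaults would look like in flink-conf.yaml
(the ticket-to-key mapping and the values follow the proposed defaults and
should be treated as illustrative, not final):

```yaml
# Use sort-shuffle for all batch jobs, regardless of parallelism (FLINK-25637).
taskmanager.network.sort-shuffle.min-parallelism: 1
# ~16M of write buffers, assuming the default 32K network buffer size (FLINK-25638).
taskmanager.network.sort-shuffle.min-buffers: 512
# Read buffer size for sort-shuffle (FLINK-25639).
taskmanager.memory.framework.off-heap.batch-shuffle.size: 64m
```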





Re: Flink native k8s integration vs. operator

2022-01-12 Thread Thomas Weise
Hi everyone,

Thanks for the feedback and discussion. A few additional thoughts:

[Konstantin] > With respect to common lifecycle management operations:
these features are
> not available (within Apache Flink) for any of the other resource providers
> (YARN, Standalone) either. From this perspective, I wouldn't consider this
> a shortcoming of the Kubernetes integration.

I think time and the evolution of the ecosystem are factors to consider as
well. The state and usage of Flink were much different when the YARN
integration was novel. Expectations are different today, and the
lifecycle functionality provided by an operator may well be
considered essential to support the concept of a Flink application on
k8s. After a few years of learning from operator experience outside of
Flink, it might be a good time to fill the gap.

[Konstantin] > I still believe that we should keep this focus on low
> level composable building blocks (like Jobs and Snapshots) in Apache Flink
> to make it easy for everyone to build fitting higher level abstractions
> like a FlinkApplication Custom Resource on top of it.

I completely agree that it is important that the basic functions of
Flink are solid and that continued focus is necessary. Thanks for sharing
the pointers; these are great improvements. At the same time, the
ecosystem, contributor base, and user spectrum are growing. There have
been significant additions in many areas of Flink, including connectors
and higher-level abstractions like Statefun, SQL and Python. This is also
evident from the additional repositories/subprojects that we have in Flink
today.

[Konstantin] > Having said this, if others in the community have the
capacity to push and
> *maintain* a somewhat minimal "reference" Kubernetes Operator for Apache
> Flink, I don't see any blockers. If or when this happens, I'd see some
> clear benefits of using a separate repository (easier independent
> versioning and releases, different build system & tooling (go, I assume)).

Naturally, different contributors to the project have different focus areas.
Let's find out if there is strong enough interest to take this on and
strong enough commitment to maintain it. As I see it, there is a
tremendous amount of internal investment going into operationalizing
Flink within many companies. Improvements to the operational side of
Flink, like the operator, would complement Flink nicely. I assume that
you are referring to a separate repository within Apache Flink, which
would give it the chance to achieve better sustainability than the
existing external operator efforts. There is also the fact that some
organizations which are heavily invested in operationalizing Flink
allow contributions to Apache Flink itself but less so to arbitrary
GitHub projects. Regarding the tooling, it could well turn out that
Java is a good alternative, given the ecosystem focus and the
opportunity for reuse in certain aspects (metrics, logging etc.).

[Yang] > I think Xintong has given a strong point on why we introduced
the native K8s integration, which is active resource management.
> I have a concrete example of this from production. When a K8s node is
> down, the standalone K8s deployment will take a longer recovery time,
> based on the K8s eviction time (IIRC, the default is 5 minutes). For
> the native K8s integration, the Flink RM can be aware of the lost TM
> heartbeat and allocate a new one in a timely manner.

Thanks for sharing this; we should evaluate it as part of a proposal.
If we can optimize recovery or scaling with active resource management,
then perhaps it is worth supporting it through the operator.
The previously mentioned operators all rely on the standalone model.

Cheers,
Thomas

On Wed, Jan 12, 2022 at 3:21 AM Konstantin Knauf  wrote:
>
> cc dev@
>
> Hi Thomas, Hi everyone,
>
> Thank you for starting this discussion and sorry for chiming in late.
>
> I agree with Thomas' and David's assessment of Flink's "Native Kubernetes
> Integration"; in particular, it does not actually integrate well with the
> Kubernetes ecosystem despite being called "native" (tooling, security
> concerns).
>
> With respect to common lifecycle management operations: these features are
> not available (within Apache Flink) for any of the other resource providers
> (YARN, Standalone) either. From this perspective, I wouldn't consider this
> a shortcoming of the Kubernetes integration. Instead, we have been focusing
> our efforts in Apache Flink on the operations of a single Job, and left
> orchestration and lifecycle management that spans multiple Jobs to
> ecosystem projects. I still believe that we should keep this focus on low
> level composable building blocks (like Jobs and Snapshots) in Apache Flink
> to make it easy for everyone to build fitting higher level abstractions
> like a FlinkApplication Custom Resource on top of it. For example, we are
> currently contributing multiple improvements [1,2,3,4] to the REST API and
> Application Mode that in our experience will make it easier to manage
> 

Re: [DISCUSS] Seek help for making JIRA links clickable in github

2022-01-12 Thread Jingsong Li
Thanks Yun Tang,

flink-table-store and flink-ml work now.

Best,
Jingsong

On Wed, Jan 12, 2022 at 4:41 PM Jingsong Li  wrote:
>
> Hi Yun Gao,
> Yes, the issues of flink-statefun and flink-ml are also managed on
> issues.apache.org, but the FLINK-XX links on GitHub are not clickable.
>
> Thanks Yun Tang! That's what I want.
>
> Best,
> Jingsong
>
> On Wed, Jan 12, 2022 at 4:08 PM Yun Tang  wrote:
> >
> > Hi Jingsong,
> >
> > I have already created the ticket [1] to ask for help from ASF 
> > infrastructure.
> >
> > Let's wait and see the progress on getting this done in GitHub.
> >
> > [1] https://issues.apache.org/jira/browse/INFRA-22729
> >
> > Best
> > Yun Tang
> > 
> > From: Yun Gao 
> > Sent: Wednesday, January 12, 2022 14:59
> > To: Jingsong Li ; dev 
> > Subject: Re: [DISCUSS] Seek help for making JIRA links clickable in github
> >
> > Currently it seems the issues of flink-statefun and flink-ml are
> > also managed on issues.apache.org?
> >
> > Best,
> > Yun
> >
> >
> >  --Original Mail --
> > Sender:Jingsong Li 
> > Send Date:Wed Jan 12 13:54:03 2022
> > Recipients:dev 
> > Subject:[DISCUSS] Seek help for making JIRA links clickable in github
> > Hi everyone,
> >
> > We are creating flink-table-store[1], and we also found that flink-ml[2]
> > does not have clickable JIRA links, while flink-statefun[3] and
> > flink[4] do.
> >
> > So I'm asking for the PMC's help on how to make JIRA links clickable in GitHub.
> >
> > [1] https://github.com/apache/flink-table-store
> > [2] https://github.com/apache/flink-ml
> > [3] https://github.com/apache/flink-statefun
> > [4] https://github.com/apache/flink
> >
> > Best,
> > Jingsong Lee


[jira] [Created] (FLINK-25635) Using SQL client to create Hive catalog throws exception

2022-01-12 Thread dalongliu (Jira)
dalongliu created FLINK-25635:
-

 Summary: Using SQL client to create Hive catalog throws exception
 Key: FLINK-25635
 URL: https://issues.apache.org/jira/browse/FLINK-25635
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Hive, Table SQL / Ecosystem
Affects Versions: 1.14.0, 1.15.0
Reporter: dalongliu
 Fix For: 1.15.0


CREATE CATALOG hive WITH('type' = 'hive', 'hive-conf-dir'='/usr/local/hive/conf');
[ERROR] Could not execute SQL statement. Reason:
java.util.ServiceConfigurationError: org.apache.flink.table.factories.TableFactory: Provider org.apache.flink.table.planner.delegation.hive.HiveParserFactory not a subtype
  at java.util.ServiceLoader.fail(ServiceLoader.java:239)
  at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
  at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
  at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
  at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
  at java.util.Iterator.forEachRemaining(Iterator.java:116)
  at org.apache.flink.table.factories.TableFactoryService.discoverFactories(TableFactoryService.java:208)
  at org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:153)
  at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:123)
  at org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:264)
  at org.apache.flink.table.api.internal.TableEnvironmentImpl.createCatalog(TableEnvironmentImpl.java:1292)
  at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1122)
  at org.apache.flink.table.client.gateway.local.LocalExecutor.lambda$executeOperation$3(LocalExecutor.java:209)
  at org.apache.flink.table.client.gateway.context.ExecutionContext.wrapClassLoader(ExecutionContext.java:88)
  at org.apache.flink.table.client.gateway.local.LocalExecutor.executeOperation(LocalExecutor.java:209)
  at org.apache.flink.table.client.cli.CliClient.executeOperation(CliClient.java:625)
  at org.apache.flink.table.client.cli.CliClient.callOperation(CliClient.java:447)
  at org.apache.flink.table.client.cli.CliClient.lambda$executeStatement$1(CliClient.java:332)
  at java.util.Optional.ifPresent(Optional.java:159)
  at org.apache.flink.table.client.cli.CliClient.executeStatement(CliClient.java:325)
  at org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:297)
  at org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:221)
  at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:151)
  at org.apache.flink.table.client.SqlClient.start(SqlClient.java:95)
  at org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:187)
  at org.apache.flink.table.client.SqlClient.main(SqlClient.java:161)





Re: [DISCUSS] FLIP-208: Update KafkaSource to detect EOF based on de-serialized record

2022-01-12 Thread Dong Lin
Hi Fabian,

Thank you for the explanation.

The current approach needs to add new constructors for SourceReaderBase
and SingleThreadMultiplexSourceReaderBase. This proposed change has now
been included in the Public Interfaces section in the FLIP.

And yes, I also agree it would be preferable if developers did not have to
change their SourceReaders to implement this new logic. The suggestion
mentioned by Qingsheng in this thread could achieve this goal. Qingsheng's
idea is to let the user specify the eofRecordEvaluator via
StreamExecutionEnvironment::fromSource(...).withEofRecordEvaluator(...) and
pass this evaluator through DataStreamSource to SourceOperator.
SourceOperator::emitNext(...) could then use this evaluator as appropriate.

For now I have chosen not to use this approach because it requires
users to pass some source configuration via
StreamExecutionEnvironment::fromSource(...) and other source
configuration via e.g. KafkaSourceBuilder(...). This might create a sense
of inconsistency/confusion. Given that the number of connector users is
much larger than the number of connector developers, I believe it is probably
better to optimize the user experience in this case.

The description of this alternative approach and its pros/cons has been
included in the FLIP.

And yep, I think I understand your suggestion. Indeed, those connector
configs (e.g. de-serializer, boundedness, eofRecordEvaluator) can be passed
from XXXSourceBuilder to their shared infra (e.g. SourceReaderBase). Both
solutions work. Given that the existing configs (e.g. serializer) are
already passed to SourceReaderBase via a constructor parameter, I guess
it is simpler to follow the existing pattern for now.
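To make the builder-based approach concrete, here is a minimal sketch of how
an EOF evaluator could be wired up (the setEofRecordEvaluator method name
follows the FLIP proposal and is an assumption, not a final API):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;

// Sketch only: setEofRecordEvaluator is the builder hook proposed in FLIP-208;
// the exact name and signature may change during review.
KafkaSource<String> source =
        KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")   // placeholder address
                .setTopics("transactions")            // placeholder topic
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // Stop reading a split once a record with the payload "EOF" arrives.
                .setEofRecordEvaluator(record -> "EOF".equals(record))
                .build();
```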

Regards,
Dong


On Wed, Jan 12, 2022 at 11:17 PM Fabian Paul  wrote:

> Hi Dong,
>
> I think I am beginning to understand your idea. Since SourceReaderBase
> is marked as PublicEvolving can you also update the FLIP with the
> changes you want to make to it? Ideally, connector developers do not
> have to change their SourceReaders to implement this new logic.
>
> My idea was to introduce a second source interface that extends the
> existing interface and offers only the method getRecordEvaluator().
> The record evaluator is still passed as you have described through the
> builder and at the end held by the source object. This way the source
> framework can automatically use the evaluator without the need that
> connector developers have to implement the complicated stopping logic
> or change their SourceReaders.
>
> Best,
> Fabian
>
>
> On Wed, Jan 12, 2022 at 2:22 AM Dong Lin  wrote:
> >
> > Hi Fabian,
> >
> > Thanks for the comments. Please see my reply inline.
> >
> > On Tue, Jan 11, 2022 at 11:46 PM Fabian Paul  wrote:
> >
> > > Hi Dong,
> > >
> > > I wouldn't change the org.apache.flink.api.connector.source.Source
> > > interface because it either breaks existing sinks or we introduce it
> > > as some kind of optional. I deem both options as not great. My idea is
> > > to introduce a new interface that extends the Source. This way users
> > > who want to develop a source that stops with the record evaluator can
> > > implement the new interface. It also has the nice benefit that we can
> > > give this new type of source a lower stability guarantee than Public
> > > to allow some changes.
> > >
> >
> > Currently the eofRecordEvaluator can be passed from
> > KafkaSourceBuilder/PulsarSourceBuilder
> > to SingleThreadMultiplexSourceReaderBase and SourceReaderBase. This
> > approach also allows developers who want to develop a source that stops
> > with the record evaluator to implement the new feature. Adding a new
> > interface could increase the complexity in our interface and
> > infrastructure. I am not sure if it has added benefits compared to the
> > existing proposal. Could you explain more?
> >
> > I am not very sure what "new type of source a lower stability guarantee"
> > you are referring to. Could you explain more? It looks like a new feature
> > not mentioned in the FLIP. If the changes proposed in this FLIP also
> > support the feature you have in mind, could we discuss this in a separate
> > FLIP?
> >
> > In the SourceOperatorFactory we can then access the record evaluator
> > > from the respective sources and pass it to the source operator.
> > >
> > > Hopefully, this makes sense. So far I did not find information about
> > > the actual stopping logic in the FLIP maybe you had something
> > > different in mind.
> > >
> >
> > By "actual stopping logic", do you mean an example implementation of the
> > RecordEvaluator? I think the use-case is described in the motivation
> > section, which is about a pipeline processing stock transaction data.
> >
> > We can support this use-case with this FLIP, by implementing this
> > RecordEvaluator that stops reading data from a split when there is a
> > message that says "EOF". Users can trigger this feature by sending
> messages
> > with "EOF" in the payload to all partitions 

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-12 Thread 张俊帆
Hi G,

Thanks for starting the discussion. I think this is an important improvement
for Flink. The proposal looks good to me, and I want to focus on one point.

1. I hope we keep consistency with the current implementation: we rely on the
config 'security.kerberos.fetch.delegation-token' to submit Flink Batch
Actions in Oozie. More details can be found in FLINK-21700.

Looking forward to your implementations.

Best
JunFan.
On Jan 12, 2022, 4:03 AM +0800, Márton Balassi , 
wrote:
> Hi G,
>
> Thanks for taking this challenge on. Scalable Kerberos authentication
> support is important for Flink, delegation tokens is a great mechanism to
> future-proof this. I second your assessment that the existing
> implementation could use some improvement too and like the approach you
> have outlined. It is crucial that the changes are self-contained and will
> not affect users that do not use Kerberos, while being minimal for the ones
> who do (configuration values change, but the defaults just keep working in
> most cases).
>
> Thanks,
> Marton
>
> On Tue, Jan 11, 2022 at 2:59 PM Gabor Somogyi 
> wrote:
>
> > Hi All,
> >
> > Hope all of you have enjoyed the holiday season.
> >
> > I would like to start the discussion on FLIP-211
> > <
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework
> > >
> > which
> > aims to provide a
> > Kerberos delegation token framework that /obtains/renews/distributes tokens
> > out-of-the-box.
> >
> > Please be aware that the FLIP wiki area is not fully done since the
> > discussion may
> > change the feature in major ways. The proposal can be found in a google doc
> > here
> > <
> > https://docs.google.com/document/d/1JzMbQ1pCJsLVz8yHrCxroYMRP2GwGwvacLrGyaIx5Yc/edit?fbclid=IwAR0vfeJvAbEUSzHQAAJfnWTaX46L6o7LyXhMfBUCcPrNi-uXNgoOaI8PMDQ
> > >
> > .
> > As the community agrees on the approach the content will be moved to the
> > wiki page.
> >
> > Feel free to add your thoughts to make this feature better!
> >
> > BR,
> > G
> >


[jira] [Created] (FLINK-25634) flink-read-onyarn-configuration

2022-01-12 Thread Jira
宇宙先生 created FLINK-25634:


 Summary: flink-read-onyarn-configuration
 Key: FLINK-25634
 URL: https://issues.apache.org/jira/browse/FLINK-25634
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.11.2
Reporter: 宇宙先生
 Attachments: image-2022-01-13-10-02-57-803.png, 
image-2022-01-13-10-03-28-230.png, image-2022-01-13-10-07-02-908.png, 
image-2022-01-13-10-09-36-890.png, image-2022-01-13-10-10-44-945.png

In the Flink source code:

!image-2022-01-13-10-03-28-230.png!

Set the number of retries for failed YARN ApplicationMasters/JobManagers in high
availability mode. This value is usually limited by YARN.
By default, it's 1 in the standalone case and 2 in the high availability case.

In my cluster, YARN's configuration also looks like this:

!image-2022-01-13-10-07-02-908.png!

But it keeps restarting when my task fails:

!image-2022-01-13-10-10-44-945.png!

I would like to know why the configuration is not taking effect.

Sincere thanks!





Re: [VOTE] Release 1.14.3, release candidate #1

2022-01-12 Thread Jingsong Li
+1 (non-binding)

- Verified checksums and signatures
- Built from source
- Started a standalone cluster; the web UI is OK
- Started sql-client; everything looks good

Best,
Jingsong

On Wed, Jan 12, 2022 at 11:00 PM Till Rohrmann  wrote:
>
> +1 (binding)
>
> - Verified checksums and signatures
> - Built from source and ran StateMachineExample
> - Reviewed the flink-web PR
> - Verified that there were no dependency changes other than testcontainers,
> japicmp-plugin and log4j.
>
> Cheers,
> Till
>
> On Wed, Jan 12, 2022 at 9:58 AM Yun Tang  wrote:
>
> > +1 (non-binding)
> >
> >
> >   *   Checked the signatures of the source code, some of the binaries and
> > some of the python packages.
> >   *   Launched a local cluster successfully on a Linux machine with the
> > correct Flink version and commit id, and ran the state machine example
> > successfully as expected.
> >   *   Reviewed the flink-web PR.
> >
> > Best
> > Yun Tang
> > 
> > From: Martijn Visser 
> > Sent: Wednesday, January 12, 2022 0:34
> > To: dev 
> > Subject: [VOTE] Release 1.14.3, release candidate #1
> >
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> > 1.14.3, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> > be deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 12DEE3E4D920A98C [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.14.3-rc1" [5],
> > * website pull request listing the new release and adding announcement
> > blog post [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks on behalf of Thomas Weise and myself,
> >
> > Martijn Visser
> > http://twitter.com/MartijnVisser82
> >
> > [1]
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351075
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1481/
> > [5] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
> > [6] https://github.com/apache/flink-web/pull/497
> >


Re: [VOTE] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Dian Fu
+1 (binding)

Regards,
Dian

On Thu, Jan 13, 2022 at 6:50 AM Thomas Weise  wrote:

> +1 (binding)
>
> On Wed, Jan 12, 2022 at 4:58 AM Till Rohrmann 
> wrote:
> >
> > +1 (binding)
> >
> > Cheers,
> > Till
> >
> > On Wed, Jan 12, 2022 at 11:07 AM Wei Zhong 
> wrote:
> >
> > > +1(binding)
> > >
> > > Best,
> > > Wei
> > >
> > > > On Jan 12, 2022, at 5:58 PM, Xingbo Huang  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I would like to start the vote for FLIP-206[1], which was discussed
> and
> > > > reached a consensus in the discussion thread[2].
> > > >
> > > > The vote will be open for at least 72h, unless there is an objection
> or
> > > not
> > > > enough votes.
> > > >
> > > > [1]
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode
> > > > [2] https://lists.apache.org/thread/c7d2mt1vh8v11x2sl8slm4sy9j3t2pdg
> > > >
> > > > Best,
> > > > Xingbo
> > >
> > >
>


Re: [VOTE] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Thomas Weise
+1 (binding)

On Wed, Jan 12, 2022 at 4:58 AM Till Rohrmann  wrote:
>
> +1 (binding)
>
> Cheers,
> Till
>
> On Wed, Jan 12, 2022 at 11:07 AM Wei Zhong  wrote:
>
> > +1(binding)
> >
> > Best,
> > Wei
> >
> > > On Jan 12, 2022, at 5:58 PM, Xingbo Huang  wrote:
> > >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-206[1], which was discussed and
> > > reached a consensus in the discussion thread[2].
> > >
> > > The vote will be open for at least 72h, unless there is an objection or
> > not
> > > enough votes.
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode
> > > [2] https://lists.apache.org/thread/c7d2mt1vh8v11x2sl8slm4sy9j3t2pdg
> > >
> > > Best,
> > > Xingbo
> >
> >


[jira] [Created] (FLINK-25633) CPUResourceTest does not work with German Locale

2022-01-12 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-25633:
-

 Summary: CPUResourceTest does not work with German Locale
 Key: FLINK-25633
 URL: https://issues.apache.org/jira/browse/FLINK-25633
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.14.2, 1.13.5
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.15.0, 1.13.6, 1.14.3


The {{CPUResourceTest}} does not work with a German Locale because it expects 
decimals to be formatted with a dot (e.g. {{0.00}} instead of {{0,00}}). 

I propose to fix the Locale in the {{pom.xml}} to US in order to fix this 
problem.
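For illustration, the Locale dependence is easy to reproduce with plain JDK
formatting; a minimal, self-contained example (not part of the test itself):

```java
import java.util.Locale;

public class LocaleFormatDemo {
    public static void main(String[] args) {
        // The German locale uses a comma as the decimal separator ...
        System.out.println(String.format(Locale.GERMANY, "%.2f", 0.0)); // prints 0,00
        // ... while the US locale uses a dot, which is what the test expects.
        System.out.println(String.format(Locale.US, "%.2f", 0.0));      // prints 0.00
    }
}
```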





Re: [DISCUSS] FLIP-208: Update KafkaSource to detect EOF based on de-serialized record

2022-01-12 Thread Fabian Paul
Hi Dong,

I think I am beginning to understand your idea. Since SourceReaderBase
is marked as PublicEvolving can you also update the FLIP with the
changes you want to make to it? Ideally, connector developers do not
have to change their SourceReaders to implement this new logic.

My idea was to introduce a second source interface that extends the
existing interface and offers only the method getRecordEvaluator().
The record evaluator is still passed as you have described through the
builder and at the end held by the source object. This way the source
framework can automatically use the evaluator without the need that
connector developers have to implement the complicated stopping logic
or change their SourceReaders.

Best,
Fabian


On Wed, Jan 12, 2022 at 2:22 AM Dong Lin  wrote:
>
> Hi Fabian,
>
> Thanks for the comments. Please see my reply inline.
>
> On Tue, Jan 11, 2022 at 11:46 PM Fabian Paul  wrote:
>
> > Hi Dong,
> >
> > I wouldn't change the org.apache.flink.api.connector.source.Source
> > interface because it either breaks existing sinks or we introduce it
> > as some kind of optional. I deem both options as not great. My idea is
> > to introduce a new interface that extends the Source. This way users
> > who want to develop a source that stops with the record evaluator can
> > implement the new interface. It also has the nice benefit that we can
> > give this new type of source a lower stability guarantee than Public
> > to allow some changes.
> >
>
> Currently the eofRecordEvaluator can be passed from
> KafkaSourceBuilder/PulsarSourceBuilder
> to SingleThreadMultiplexSourceReaderBase and SourceReaderBase. This
> approach also allows developers who want to develop a source that stops
> with the record evaluator to implement the new feature. Adding a new
> interface could increase the complexity in our interface and
> infrastructure. I am not sure if it has added benefits compared to the
> existing proposal. Could you explain more?
>
> I am not very sure what "new type of source a lower stability guarantee"
> you are referring to. Could you explain more? It looks like a new feature
> not mentioned in the FLIP. If the changes proposed in this FLIP also
> support the feature you have in mind, could we discuss this in a separate
> FLIP?
>
> In the SourceOperatorFactory we can then access the record evaluator
> > from the respective sources and pass it to the source operator.
> >
> > Hopefully, this makes sense. So far I did not find information about
> > the actual stopping logic in the FLIP maybe you had something
> > different in mind.
> >
>
> By "actual stopping logic", do you mean an example implementation of the
> RecordEvaluator? I think the use-case is described in the motivation
> section, which is about a pipeline processing stock transaction data.
>
> We can support this use-case with this FLIP, by implementing this
> RecordEvaluator that stops reading data from a split when there is a
> message that says "EOF". Users can trigger this feature by sending messages
> with "EOF" in the payload to all partitions of the source Kafka topic.
>
> Does this make sense?
>
>
> >
> > Best,
> > Fabian
> >
> > On Tue, Jan 11, 2022 at 1:40 AM Dong Lin  wrote:
> > >
> > > Hi Fabian,
> > >
> > > Thanks for the comments!
> > >
> > > By "add a source mixin interface", are you suggesting to update
> > > the org.apache.flink.api.connector.source.Source interface to add the API
> > > "RecordEvaluator getRecordEvaluator()"? If so, it seems to add more
> > > public API and thus more complexity than the solution in the FLIP. Could
> > > you help explain more about the benefits of doing this?
> > >
> > > Regarding the 2nd question, I think this FLIP does not change whether
> > > sources are treated as bounded or unbounded. For example, the
> > KafkaSource's
> > > boundedness will continue to be determined with the API
> > > KafkaSourceBuilder::setBounded(..) and
> > > KafkaSourceBuilder::setUnbounded(..). Does this answer your question?
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jan 10, 2022 at 8:01 PM Fabian Paul  wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > Thank you for updating the FLIP and making it applicable for all
> > > > sources. I am a bit unsure about the implementation part. I would
> > > > propose to add a source mixin interface that implements
> > > > `getRecordEvaluator` and sources that want to allow dynamically
> > > > stopping implement that interface.
> > > >
> > > > Another question I had was how do we treat sources using the record
> > > > evaluator as bounded or unbounded?
> > > >
> > > > Best,
> > > > Fabian
> > > >
> > > > On Sat, Jan 8, 2022 at 11:52 AM Dong Lin  wrote:
> > > > >
> > > > > Hi Martijn and Qingsheng,
> > > > >
> > > > > The FLIP has been updated to extend the dynamic EOF support for the
> > > > > PulsarSource. I have not extended this feature to other sources yet
> > > > since I
> > > > > am not sure it is a requirement 

[jira] [Created] (FLINK-25632) Make job name available in cleanup stage

2022-01-12 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-25632:
-

 Summary: Make job name available in cleanup stage
 Key: FLINK-25632
 URL: https://issues.apache.org/jira/browse/FLINK-25632
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Matthias Pohl


{{CheckpointResourcesCleanupRunner}} implements the {{JobManagerRunner}}
interface, also providing an {{ExecutionGraphInfo}} stub through {{requestJob}}
(similar to what
[JobMasterServiceLeadershipRunner.requestJob|https://github.com/apache/flink/blob/5b13772c954cd13993e792153a69e96e3706c6ce/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L228]
returns when the {{JobMaster}} is still in the initialization phase).

Currently, the job name is not passed as part of the {{ExecutionGraphInfo}}
stub because it's not provided by the {{JobResult}} that is stored in the
{{JobResultStore}}. We have multiple options to fix that:
 # {{JobResult}} is extended to also serve the job's name. As a consequence, it
will also be present in the REST API.
 # The {{JobResultStore}} stores this information as an additional field besides
the {{jobResult}}.





Re: [VOTE] Release 1.14.3, release candidate #1

2022-01-12 Thread Till Rohrmann
+1 (binding)

- Verified checksums and signatures
- Built from source and ran StateMachineExample
- Reviewed the flink-web PR
- Verified that there were no dependency changes other than testcontainers,
japicmp-plugin and log4j.

Cheers,
Till

On Wed, Jan 12, 2022 at 9:58 AM Yun Tang  wrote:

> +1 (non-binding)
>
>
>   *   Checked the signatures of the source code, some of the binaries and
> some of the python packages.
>   *   Launched a local cluster successfully on a Linux machine with the
> correct Flink version and commit id, and ran the state machine example
> successfully as expected.
>   *   Reviewed the flink-web PR.
>
> Best
> Yun Tang
> 
> From: Martijn Visser 
> Sent: Wednesday, January 12, 2022 0:34
> To: dev 
> Subject: [VOTE] Release 1.14.3, release candidate #1
>
> Hi everyone,
> Please review and vote on the release candidate #1 for the version
> 1.14.3, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to
> be deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 12DEE3E4D920A98C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.14.3-rc1" [5],
> * website pull request listing the new release and adding announcement
> blog post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks on behalf of Thomas Weise and myself,
>
> Martijn Visser
> http://twitter.com/MartijnVisser82
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351075
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1481/
> [5] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
> [6] https://github.com/apache/flink-web/pull/497
>


Re: OutOfMemoryError: Java heap space while implmentating flink sql api

2022-01-12 Thread Martijn Visser
Hi Ronak,

I would like to ask you to stop cross-posting to all the Flink mailing
lists and then also posting the same question to Stackoverflow. Both the
mailing lists and Stackoverflow are designed for asynchronous communication,
and you should allow the community some days to address your question.

Joins are state heavy. As mentioned in the documentation [1] "Thus, the
required state for computing the query result might grow infinitely
depending on the number of distinct input rows of all input tables and
intermediate join results."

Best regards,

Martijn

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
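As a side note on keeping the state of such an enrichment join bounded: since
ccmversionsumapTable is a small reference table read via JDBC, a lookup join
queries it per record instead of materializing both inputs in state. A rough
sketch, assuming cdrTable's DDL is additionally given a processing-time
attribute such as proc_time AS PROCTIME() (an assumption; the DDL quoted below
does not declare one):

```java
// Sketch only: requires "proc_time AS PROCTIME()" in cdrTable's schema so the
// JDBC table can be looked up at processing time instead of being fully joined.
Table enriched = tableEnv.sqlQuery(
        "SELECT cdr.org_id, cdr.version, cvsm.suname " +
        "FROM cdrTable AS cdr " +
        "LEFT JOIN ccmversionsumapTable FOR SYSTEM_TIME AS OF cdr.proc_time AS cvsm " +
        "ON cdr.version = cvsm.ccmversion");
```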


On Wed, 12 Jan 2022 at 11:06, Ronak Beejawat (rbeejawa)
 wrote:

> Hi Team,
>
> I was trying to implement a Flink SQL API join with 2 tables and it is
> throwing an OutOfMemoryError: Java heap space. PFB a screenshot of the Flink
> cluster memory details.
> [Flink Memory Model][1]
>
>
>   [1]: https://i.stack.imgur.com/AOnQI.png
>
> **PFB below code snippet which I was trying:**
> ```
> EnvironmentSettings settings =
> EnvironmentSettings.newInstance().inStreamingMode().build();
> TableEnvironment tableEnv = TableEnvironment.create(settings);
>
>
> tableEnv.getConfig().getConfiguration().setString("table.optimizer.agg-phase-strategy",
> "TWO_PHASE");
> tableEnv.getConfig().getConfiguration().setString("table.optimizer.join-reorder-enabled",
> "true");
> tableEnv.getConfig().getConfiguration().setString("table.exec.resource.default-parallelism",
> "16");
>
> tableEnv.executeSql("CREATE TEMPORARY TABLE ccmversionsumapTable (\r\n"
>  + "  suname STRING\r\n"
>  + "  ,ccmversion STRING\r\n"
>  + "   )\r\n"
>  + "   WITH (\r\n"
>  + "   'connector' =
> 'jdbc'\r\n"
>  + "   ,'url' =
> 'jdbc:mysql://:3306/ccucdb'\r\n"
>  + "   ,'table-name' =
> 'ccmversionsumap'\r\n"
>  + "   ,'username' =
> '*'\r\n"
>  + "   ,'password' =
> ''\r\n"
>  + "   )");
>
> tableEnv.executeSql("CREATE TEMPORARY TABLE cdrTable (\r\n"
>+ "   org_id STRING\r\n"
>+ "   ,cluster_id STRING\r\n"
>+ "   ,cluster_name STRING\r\n"
>+ "   ,version STRING\r\n"
>+ "   ,ip_address STRING\r\n"
>+ "   ,pkid STRING\r\n"
>+ "   ,globalcallid_callid INT\r\n"
>   ... --- multiple columns can be added
>+ "   )\r\n"
>+ "   WITH (\r\n"
>+ "   'connector' = 'kafka'\r\n"
>+ "   ,'topic' = 'cdr'\r\n"
>+ "   ,'properties.bootstrap.servers' =
> ':9092'\r\n"
>+ "   ,'scan.startup.mode' =
> 'earliest-offset'\r\n"
>//+ ",'value.fields-include' =
> 'EXCEPT_KEY'\r\n"
>+ "   ,'format' = 'json'\r\n"
>+ "   )");
>
>
> String sql = "SELECT cdr.org_id orgid,\r\n"
>   + "
>  cdr.cluster_name clustername,\r\n"
>   + "
>  cdr.cluster_id clusterid,\r\n"
>   + "
>  cdr.ip_address clusteripaddr,\r\n"
>   + "
>  cdr.version clusterversion,\r\n"
>   + "
>  cvsm.suname clustersuname,\r\n"
>   + "
>  cdr.pkid cdrpkid,\r\n"
>   ... ---
> multiple columns can be added
>   + " from
> cdrTable cdr\r\n"
>   + " left join
> ccmversionsumapTable cvsm ON (cdr.version = cvsm.ccmversion) group by
> TUMBLE(PROCTIME(), INTERVAL '1' MINUTE), cdr.org_id, cdr.cluster_name,
> cdr.cluster_id, cdr.ip_address, cdr.version, cdr.pkid,
> cdr.globalcallid_callid, ..."
>
> Table order20 = tableEnv.sqlQuery(sql);
> order20.executeInsert("outputCdrTable");
> ```
>
> **scenario / use case :**
>
> we are pushing 2.5 million json record in kafka topic and reading it via
> kafka connector as 

Re: OutOfMemoryError: Java heap space while implmentating flink sql api

2022-01-12 Thread Roman Khachatryan
Hi Ronak,

You shared a screenshot of JM. Do you mean that exception also happens
on JM? (I'd rather assume TM).

Could you explain the join clause: left join ccmversionsumapTable cvsm
ON (cdr.version = cvsm.ccmversion)?
"version" doesn't sound very selective, so maybe you end up with an
(almost) Cartesian product?

Regards,
Roman

On Wed, Jan 12, 2022 at 11:06 AM Ronak Beejawat (rbeejawa)
 wrote:
>
> Hi Team,
>
> I was trying to implement a Flink SQL API join with 2 tables and it is
> throwing an OutOfMemoryError: Java heap space. PFB a screenshot of the Flink
> cluster memory details.
> [Flink Memory Model][1]
>
>
>   [1]: https://i.stack.imgur.com/AOnQI.png
>
> **PFB below code snippet which I was trying:**
> ```
> EnvironmentSettings settings = 
> EnvironmentSettings.newInstance().inStreamingMode().build();
> TableEnvironment tableEnv = TableEnvironment.create(settings);
>
>
> tableEnv.getConfig().getConfiguration().setString("table.optimizer.agg-phase-strategy",
>  "TWO_PHASE");
> tableEnv.getConfig().getConfiguration().setString("table.optimizer.join-reorder-enabled",
>  "true");
> tableEnv.getConfig().getConfiguration().setString("table.exec.resource.default-parallelism",
>  "16");
>
> tableEnv.executeSql("CREATE TEMPORARY TABLE ccmversionsumapTable (\r\n"
>  + "  suname STRING\r\n"
>  + "  ,ccmversion STRING\r\n"
>  + "   )\r\n"
>  + "   WITH (\r\n"
>  + "   'connector' = 
> 'jdbc'\r\n"
>  + "   ,'url' = 
> 'jdbc:mysql://:3306/ccucdb'\r\n"
>  + "   ,'table-name' = 
> 'ccmversionsumap'\r\n"
>  + "   ,'username' = 
> '*'\r\n"
>  + "   ,'password' = 
> ''\r\n"
>  + "   )");
>
> tableEnv.executeSql("CREATE TEMPORARY TABLE cdrTable (\r\n"
>+ "   org_id STRING\r\n"
>+ "   ,cluster_id STRING\r\n"
>+ "   ,cluster_name STRING\r\n"
>+ "   ,version STRING\r\n"
>+ "   ,ip_address STRING\r\n"
>+ "   ,pkid STRING\r\n"
>+ "   ,globalcallid_callid INT\r\n"
>   ... --- multiple columns can be added
>+ "   )\r\n"
>+ "   WITH (\r\n"
>+ "   'connector' = 'kafka'\r\n"
>+ "   ,'topic' = 'cdr'\r\n"
>+ "   ,'properties.bootstrap.servers' = 
> ':9092'\r\n"
>+ "   ,'scan.startup.mode' = 
> 'earliest-offset'\r\n"
>//+ ",'value.fields-include' = 
> 'EXCEPT_KEY'\r\n"
>+ "   ,'format' = 'json'\r\n"
>+ "   )");
>
>
> String sql = "SELECT cdr.org_id orgid,\r\n"
>   + " 
> cdr.cluster_name clustername,\r\n"
>   + " 
> cdr.cluster_id clusterid,\r\n"
>   + " 
> cdr.ip_address clusteripaddr,\r\n"
>   + " 
> cdr.version clusterversion,\r\n"
>   + " 
> cvsm.suname clustersuname,\r\n"
>   + " 
> cdr.pkid cdrpkid,\r\n"
>   ... --- 
> multiple columns can be added
>   + " from 
> cdrTable cdr\r\n"
>   + " left join 
> ccmversionsumapTable cvsm ON (cdr.version = cvsm.ccmversion) group by 
> TUMBLE(PROCTIME(), INTERVAL '1' MINUTE), cdr.org_id, cdr.cluster_name, 
> cdr.cluster_id, cdr.ip_address, cdr.version, cdr.pkid, 
> cdr.globalcallid_callid, ..."
>
> Table order20 = tableEnv.sqlQuery(sql);
> order20.executeInsert("outputCdrTable");
> ```
>
> **scenario / use case :**
>
> we are pushing 2.5 million json record in kafka topic and reading it via 
> kafka connector as temporary cdrTable as shown in above code and we reading 
> 23 records from jdbc static/reference table via jdbc connector as temporary 
> ccmversionsumapTable as shown in above code and doing a left join for 1 min 
> tumble window .
>
> So while doing 

Re: Could not find any factory for identifier 'jdbc'

2022-01-12 Thread Roman Khachatryan
Hi,

I think Chesnay's suggestion to double-check the bundle makes sense.
Additionally, I'd try flink-connector-jdbc_2.12 instead of
flink-connector-jdbc_2.11.
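A common cause when the connector is bundled but still not found: the shade
plugin keeps only one META-INF/services file per service, so the SPI entries
that Flink's factory discovery relies on get overwritten by another
connector's. A sketch of the usual fix, assuming the job jar is built with the
maven-shade-plugin (the build setup is an assumption, not confirmed in this
thread):

```xml
<!-- Sketch: ServicesResourceTransformer merges META-INF/services files from
     all dependencies so JdbcDynamicTableFactory stays discoverable. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```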

Regards,
Roman

On Wed, Jan 12, 2022 at 12:23 PM Chesnay Schepler  wrote:
>
> I would try double-checking whether the jdbc connector was truly bundled
> in your jar, specifically whether
> org.apache.flink.connector.jdbc.table.JdbcDynamicTableFactory is.
>
> I can't think of a reason why this shouldn't work for the JDBC connector.
>
> On 12/01/2022 06:34, Ronak Beejawat (rbeejawa) wrote:
> > Hi Chesnay,
> >
> > How do you ensure that the connector is actually available at runtime?
> >
> > We are providing the below-mentioned dependency inside pom.xml with compile
> > scope, so it is available on the classpath, and it was there in my Flink
> > job's bundled jar. We are doing the same for other connectors, e.g. Kafka,
> > and it worked for those.
> >
> > 
> >org.apache.flink
> >flink-connector-jdbc_2.11
> >1.14.2
> > 
> > 
> >mysql
> >mysql-connector-java
> >5.1.41
> > 
> >
> > Are you bundling it in a jar or putting it into Flinks lib directory?
> > Yes, we are building a jar and the connector is bundled with it, but we
> > still saw this error. So we tried the workaround mentioned in an article,
> > putting the jar inside the Flink lib directory, and then it worked:
> > https://blog.csdn.net/weixin_44056920/article/details/118110949 . So this is
> > extra work we have to do to make it function, including a restart of the cluster.
> >
> > But the question is: how did it work for Kafka and not for JDBC? I didn't
> > put the Kafka jar explicitly in the Flink lib folder.
> >
> > Note: I am using the Flink 1.14 release for all my job execution and
> > implementation, which is a stable version I guess.
> >
> > Thanks
> > Ronak Beejawat
> > From: Chesnay Schepler mailto:ches...@apache.org>>
> > Date: Tuesday, 11 January 2022 at 7:45 PM
> > To: Ronak Beejawat (rbeejawa) 
> > mailto:rbeej...@cisco.com.INVALID>>, 
> > u...@flink.apache.org 
> > mailto:u...@flink.apache.org>>
> > Cc: Hang Ruan mailto:ruanhang1...@gmail.com>>, 
> > Shrinath Shenoy K (sshenoyk) 
> > mailto:sshen...@cisco.com>>, Karthikeyan Muthusamy 
> > (karmuthu) mailto:karmu...@cisco.com>>, Krishna 
> > Singitam (ksingita) mailto:ksing...@cisco.com>>, Arun 
> > Yadav (aruny) mailto:ar...@cisco.com>>, Jayaprakash 
> > Kuravatti (jkuravat) mailto:jkura...@cisco.com>>, Avi 
> > Sanwal (asanwal) mailto:asan...@cisco.com>>
> > Subject: Re: Could not find any factory for identifier 'jdbc'
> > How do you ensure that the connector is actually available at runtime?
> > Are you bundling it in a jar or putting it into Flinks lib directory?
> >
> > On 11/01/2022 14:14, Ronak Beejawat (rbeejawa) wrote:
> >> Correcting subject -> Could not find any factory for identifier 'jdbc'
> >>
> >> From: Ronak Beejawat (rbeejawa)
> >> Sent: Tuesday, January 11, 2022 6:43 PM
> >> To: 'dev@flink.apache.org' 
> >> mailto:dev@flink.apache.org>>; 
> >> 'commun...@flink.apache.org' 
> >> mailto:commun...@flink.apache.org>>; 
> >> 'u...@flink.apache.org' 
> >> mailto:u...@flink.apache.org>>
> >> Cc: 'Hang Ruan' mailto:ruanhang1...@gmail.com>>; 
> >> Shrinath Shenoy K (sshenoyk) 
> >> mailto:sshen...@cisco.com>>; Karthikeyan Muthusamy 
> >> (karmuthu) mailto:karmu...@cisco.com>>; Krishna 
> >> Singitam (ksingita) mailto:ksing...@cisco.com>>; Arun 
> >> Yadav (aruny) mailto:ar...@cisco.com>>; Jayaprakash 
> >> Kuravatti (jkuravat) mailto:jkura...@cisco.com>>; Avi 
> >> Sanwal (asanwal) mailto:asan...@cisco.com>>
> >> Subject: what is efficient way to write Left join in flink
> >>
> >> Hi Team,
> >>
> >> Getting below exception while using jdbc connector :
> >>
> >> Caused by: org.apache.flink.table.api.ValidationException: Could not find 
> >> any factory for identifier 'jdbc' that implements 
> >> 'org.apache.flink.table.factories.DynamicTableFactory' in the classpath.
> >>
> >> Available factory identifiers are:
> >>
> >> blackhole
> >> datagen
> >> filesystem
> >> kafka
> >> print
> >> upsert-kafka
> >>
> >>
> >> I have already added dependency for jdbc connector in pom.xml as mentioned 
> >> below:
> >>
> >> 
> >> org.apache.flink
> >>  flink-connector-jdbc_2.11
> >>  1.14.2
> >> 
> >> 
> >> mysql
> >>  mysql-connector-java
> >>  5.1.41
> >> 
> >>
> >> Referred release doc link for the same 
> >> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/table/jdbc/
> >>
> >>
> >>
> >> Please help me on this and provide the solution for it !!!
> >>
> >>
> >> Thanks
> >> Ronak Beejawat
>
>


Re: [VOTE] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Till Rohrmann
+1 (binding)

Cheers,
Till

On Wed, Jan 12, 2022 at 11:07 AM Wei Zhong  wrote:

> +1(binding)
>
> Best,
> Wei
>
> > 2022年1月12日 下午5:58,Xingbo Huang  写道:
> >
> > Hi all,
> >
> > I would like to start the vote for FLIP-206[1], which was discussed and
> > reached a consensus in the discussion thread[2].
> >
> > The vote will be open for at least 72h, unless there is an objection or
> not
> > enough votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode
> > [2] https://lists.apache.org/thread/c7d2mt1vh8v11x2sl8slm4sy9j3t2pdg
> >
> > Best,
> > Xingbo
>
>


[jira] [Created] (FLINK-25631) Support enhanced `show tables` statement

2022-01-12 Thread Yubin Li (Jira)
Yubin Li created FLINK-25631:


 Summary: Support enhanced `show tables` statement
 Key: FLINK-25631
 URL: https://issues.apache.org/jira/browse/FLINK-25631
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Affects Versions: 1.14.4
Reporter: Yubin Li


Enhanced `show tables` statements like `show tables from db1 like 't%'` are
broadly supported in many popular data processing engines like
Presto/Trino/Spark:
https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html
I have investigated the syntax of the engines mentioned above.

With such a statement we can easily show the tables of a specified database
without switching databases frequently, and we can use a regex pattern to
quickly find the tables of interest among plenty of tables. Besides, the new
statement is fully compatible with the old one; users can use `show tables` as
before.

Proposed syntax:
SHOW TABLES [ ( FROM | IN ) [catalog_name.]database_name ] [ [NOT] LIKE regex_pattern ]





[jira] [Created] (FLINK-25630) Introduce MergeTree writer and reader for table store

2022-01-12 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-25630:


 Summary: Introduce MergeTree writer and reader for table store
 Key: FLINK-25630
 URL: https://issues.apache.org/jira/browse/FLINK-25630
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.1.0








[jira] [Created] (FLINK-25629) Introduce CompactStrategy and CompactManager for table store

2022-01-12 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-25629:


 Summary: Introduce CompactStrategy and CompactManager for table 
store
 Key: FLINK-25629
 URL: https://issues.apache.org/jira/browse/FLINK-25629
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.1.0








[jira] [Created] (FLINK-25628) Introduce RecordReader and related classes for table store

2022-01-12 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-25628:


 Summary: Introduce RecordReader and related classes for table store
 Key: FLINK-25628
 URL: https://issues.apache.org/jira/browse/FLINK-25628
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.1.0








Re: Could not find any factory for identifier 'jdbc'

2022-01-12 Thread Chesnay Schepler
I would try double-checking whether the jdbc connector was truly bundled 
in your jar, specifically whether 
org.apache.flink.connector.jdbc.table.JdbcDynamicTableFactory is.


I can't think of a reason why this shouldn't work for the JDBC connector.

On 12/01/2022 06:34, Ronak Beejawat (rbeejawa) wrote:

Hi Chesnay,

How do you ensure that the connector is actually available at runtime?

We are providing the below-mentioned dependency inside pom.xml with compile
scope, so it is available on the classpath, and it was there in my Flink job's
bundled jar. We are doing the same for other connectors, e.g. Kafka, and it
worked for those.


   org.apache.flink
   flink-connector-jdbc_2.11
   1.14.2


   mysql
   mysql-connector-java
   5.1.41


Are you bundling it in a jar or putting it into Flinks lib directory?
Yes, we are building a jar and the connector is bundled with it, but we still
saw this error. So we tried the workaround mentioned in an article, putting the
jar inside the Flink lib directory, and then it worked:
https://blog.csdn.net/weixin_44056920/article/details/118110949 . So this is
extra work we have to do to make it function, including a restart of the cluster.

But the question is: how did it work for Kafka and not for JDBC? I didn't put
the Kafka jar explicitly in the Flink lib folder.

Note: I am using the Flink 1.14 release for all my job execution and
implementation, which is a stable version I guess.

Thanks
Ronak Beejawat
From: Chesnay Schepler mailto:ches...@apache.org>>
Date: Tuesday, 11 January 2022 at 7:45 PM
To: Ronak Beejawat (rbeejawa) mailto:rbeej...@cisco.com.INVALID>>, 
u...@flink.apache.org 
mailto:u...@flink.apache.org>>
Cc: Hang Ruan mailto:ruanhang1...@gmail.com>>, Shrinath Shenoy K (sshenoyk) 
mailto:sshen...@cisco.com>>, Karthikeyan Muthusamy (karmuthu) mailto:karmu...@cisco.com>>, Krishna 
Singitam (ksingita) mailto:ksing...@cisco.com>>, Arun Yadav (aruny) mailto:ar...@cisco.com>>, 
Jayaprakash Kuravatti (jkuravat) mailto:jkura...@cisco.com>>, Avi Sanwal (asanwal) 
mailto:asan...@cisco.com>>
Subject: Re: Could not find any factory for identifier 'jdbc'
How do you ensure that the connector is actually available at runtime?
Are you bundling it in a jar or putting it into Flinks lib directory?

On 11/01/2022 14:14, Ronak Beejawat (rbeejawa) wrote:

Correcting subject -> Could not find any factory for identifier 'jdbc'

From: Ronak Beejawat (rbeejawa)
Sent: Tuesday, January 11, 2022 6:43 PM
To: 'dev@flink.apache.org' mailto:dev@flink.apache.org>>; 
'commun...@flink.apache.org' mailto:commun...@flink.apache.org>>; 
'u...@flink.apache.org' mailto:u...@flink.apache.org>>
Cc: 'Hang Ruan' mailto:ruanhang1...@gmail.com>>; Shrinath Shenoy K (sshenoyk) 
mailto:sshen...@cisco.com>>; Karthikeyan Muthusamy (karmuthu) mailto:karmu...@cisco.com>>; Krishna 
Singitam (ksingita) mailto:ksing...@cisco.com>>; Arun Yadav (aruny) mailto:ar...@cisco.com>>; 
Jayaprakash Kuravatti (jkuravat) mailto:jkura...@cisco.com>>; Avi Sanwal (asanwal) 
mailto:asan...@cisco.com>>
Subject: what is efficient way to write Left join in flink

Hi Team,

Getting below exception while using jdbc connector :

Caused by: org.apache.flink.table.api.ValidationException: Could not find any 
factory for identifier 'jdbc' that implements 
'org.apache.flink.table.factories.DynamicTableFactory' in the classpath.

Available factory identifiers are:

blackhole
datagen
filesystem
kafka
print
upsert-kafka


I have already added dependency for jdbc connector in pom.xml as mentioned 
below:


org.apache.flink
 flink-connector-jdbc_2.11
 1.14.2


mysql
 mysql-connector-java
 5.1.41


Referred release doc link for the same 
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/table/jdbc/



Please help me on this and provide the solution for it !!!


Thanks
Ronak Beejawat





Re: Flink native k8s integration vs. operator

2022-01-12 Thread Konstantin Knauf
cc dev@

Hi Thomas, Hi everyone,

Thank you for starting this discussion and sorry for chiming in late.

I agree with Thomas' and David's assessment of Flink's "Native Kubernetes
Integration"; in particular, it does not actually integrate well with the
Kubernetes ecosystem despite being called "native" (tooling, security
concerns).

With respect to common lifecycle management operations: these features are
not available (within Apache Flink) for any of the other resource providers
(YARN, Standalone) either. From this perspective, I wouldn't consider this
a shortcoming of the Kubernetes integration. Instead, we have been focusing
our efforts in Apache Flink on the operations of a single Job, and left
orchestration and lifecycle management that spans multiple Jobs to
ecosystem projects. I still believe that we should keep this focus on low
level composable building blocks (like Jobs and Snapshots) in Apache Flink
to make it easy for everyone to build fitting higher level abstractions
like a FlinkApplication Custom Resource on top of it. For example, we are
currently contributing multiple improvements [1,2,3,4] to the REST API and
Application Mode that in our experience will make it easier to manage
Apache Flink with a Kubernetes operator. Given this background, I suspect a
Kubernetes Operator in Apache Flink would not be a priority for us at
Ververica - at least right now.

Having said this, if others in the community have the capacity to push and
*maintain* a somewhat minimal "reference" Kubernetes Operator for Apache
Flink, I don't see any blockers. If or when this happens, I'd see some
clear benefits of using a separate repository (easier independent
versioning and releases, different build system & tooling (go, I assume)).

Looking forward to your thoughts,

Konstantin

[1] https://issues.apache.org/jira/browse/FLINK-24275
[2] https://issues.apache.org/jira/browse/FLINK-24208
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore
[4] https://issues.apache.org/jira/browse/FLINK-24113

On Mon, Jan 10, 2022 at 2:11 PM Gyula Fóra  wrote:

> Hi All!
>
> This is a very interesting discussion.
>
> I think many users find it confusing which deployment mode to choose when
> considering a new production application on Kubernetes. With all the
> options of native, standalone and different operators this can get tricky :)
>
> I really like the idea that Thomas brought up to have at least a minimal
> operator implementation in Flink itself to cover the most common production
> job lifecycle management scenarios. I think the Flink community has a very
> strong experience in this area to create a successful implementation that
> would benefit most production users on Kubernetes.
>
> Cheers,
> Gyula
>
> On Mon, Jan 10, 2022 at 4:29 AM Yang Wang  wrote:
>
>> Thanks all for this fruitful discussion.
>>
>> I think Xintong has given a strong point why we introduced the native K8s
>> integration, which is active resource management.
>> I have a concrete example of this in production. When a K8s node is
>> down, the standalone K8s deployment will take a longer
>> recovery time, bounded by the K8s eviction time (IIRC, the default is 5 minutes).
>> With the native K8s integration, the Flink RM can detect the
>> lost TM heartbeat and allocate a new one in a timely manner.
>>
>> Also, when introducing the native K8s integration, another goal was to
>> make it easy for users to migrate from YARN deployments.
>> They already have a production-ready job life-cycle management system,
>> which is using Flink CLI to submit the Flink jobs.
>> So we provide a consistent command "bin/flink run-application -t
>> kubernetes-application/yarn-application" to start a Flink application and
>> "bin/flink cancel/stop ..."
>> to terminate a Flink application.
>>
>>
>> Compared with K8s operator, I know that this is not a K8s
>> native mechanism. Hence, I also agree that we still need a powerful K8s
>> operator which
>> could work with both standalone and native K8s modes. The major
>> difference between them is how to start the JM and TM pods. For standalone,
>> they are managed by K8s job/deployment. For native, maybe we could simply
>> create a submission carrying the "flink run-application" arguments
>> which is derived from the Flink application CR.
>>
>> Making Flink's active resource manager talk to the K8s operator is
>> an interesting option, which could support both standalone and native.
>> Then the Flink RM just needs to declare the resource requirements (e.g. 2 *
>> <2G, 1CPU>, 2 * <4G, 1CPU>) and defer the resource allocation/de-allocation
>> to the K8s operator. It feels like an intermediate form between native
>> and standalone mode :)
>>
>>
>>
>> Best,
>> Yang
>>
>>
>>
>> Xintong Song  于2022年1月7日周五 12:02写道:
>>
>>> Hi folks,
>>>
>>> Thanks for the discussion. I'd like to share my two cents on this topic.
>>>
>>> Firstly, I'd like to clarify my understanding of the concepts "native
>>> k8s 

[DISCUSS] Data quality by apache flink

2022-01-12 Thread tanjialiang
Hi everyone,

I would like to start a discussion thread on "Flink SQL support for data quality".

For example, I have a SQL job with a source table that has a column named
phone, and I want to define a data quality rule on that column, such as
requiring it to match a telephone-number pattern. If a record does not match,
I can choose to drop it or to ignore the violation. Also, we can record the
quality results in metrics, so that users can monitor the data quality at the
source and the sink.

After this, users can know about the data quality of the source and sink,
which is very useful for downstream consumers.

How to do that:
In my opinion, we can expose the quality rules as table properties, and add a
flatmap operator after the SourceOperator and before the SinkOperator. This
flatmap operator would apply the quality logic: match the pattern, drop or
ignore violating records, and record the results in metrics that users can
monitor. A minimal sketch of such an operator is shown below.
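
For illustration only, a rough sketch of what that flatmap could look like,
assuming the rule is a regex check on a Row field named "phone"; the class
name, metric names, and pattern here are made-up placeholders, not a proposed API:

```java
import java.util.regex.Pattern;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

public class QualityCheckFlatMap extends RichFlatMapFunction<Row, Row> {

    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}-\\d{4}");

    private final boolean dropOnViolation;
    private transient Counter passed;
    private transient Counter violated;

    public QualityCheckFlatMap(boolean dropOnViolation) {
        this.dropOnViolation = dropOnViolation;
    }

    @Override
    public void open(Configuration parameters) {
        // expose the quality results as metrics so users can monitor them
        passed = getRuntimeContext().getMetricGroup().counter("qualityPassed");
        violated = getRuntimeContext().getMetricGroup().counter("qualityViolated");
    }

    @Override
    public void flatMap(Row row, Collector<Row> out) {
        String phone = (String) row.getField("phone");
        if (phone != null && PHONE.matcher(phone).matches()) {
            passed.inc();
            out.collect(row);
        } else {
            violated.inc();          // always record the violation
            if (!dropOnViolation) {
                out.collect(row);    // "ignore" mode: forward the record anyway
            }
        }
    }
}
```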

This is just a draft; does anyone have any good ideas?

Sent from Mail for Windows



Re: [VOTE] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Wei Zhong
+1(binding)

Best,
Wei

> On Jan 12, 2022, at 5:58 PM, Xingbo Huang wrote:
> 
> Hi all,
> 
> I would like to start the vote for FLIP-206[1], which was discussed and
> reached a consensus in the discussion thread[2].
> 
> The vote will be open for at least 72h, unless there is an objection or not
> enough votes.
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode
> [2] https://lists.apache.org/thread/c7d2mt1vh8v11x2sl8slm4sy9j3t2pdg
> 
> Best,
> Xingbo



OutOfMemoryError: Java heap space while implementing Flink SQL API

2022-01-12 Thread Ronak Beejawat (rbeejawa)
Hi Team,

I was trying to implement a Flink SQL API join of 2 tables and it is throwing
OutOfMemoryError: Java heap space. PFB a screenshot of the Flink cluster memory
details.
[Flink Memory Model][1]


  [1]: https://i.stack.imgur.com/AOnQI.png

**PFB below code snippet which I was trying:**
```
EnvironmentSettings settings = 
EnvironmentSettings.newInstance().inStreamingMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);


tableEnv.getConfig().getConfiguration().setString("table.optimizer.agg-phase-strategy",
 "TWO_PHASE");
tableEnv.getConfig().getConfiguration().setString("table.optimizer.join-reorder-enabled",
 "true");
tableEnv.getConfig().getConfiguration().setString("table.exec.resource.default-parallelism",
 "16");

tableEnv.executeSql("CREATE TEMPORARY TABLE ccmversionsumapTable (\r\n"
 + "  suname STRING\r\n"
 + "  ,ccmversion STRING\r\n"
 + "   )\r\n"
 + "   WITH (\r\n"
 + "   'connector' = 'jdbc'\r\n"
 + "   ,'url' = 
'jdbc:mysql://:3306/ccucdb'\r\n"
 + "   ,'table-name' = 
'ccmversionsumap'\r\n"
 + "   ,'username' = 
'*'\r\n"
 + "   ,'password' = ''\r\n"
 + "   )");

tableEnv.executeSql("CREATE TEMPORARY TABLE cdrTable (\r\n"
   + "   org_id STRING\r\n"
   + "   ,cluster_id STRING\r\n"
   + "   ,cluster_name STRING\r\n"
   + "   ,version STRING\r\n"
   + "   ,ip_address STRING\r\n"
   + "   ,pkid STRING\r\n"
   + "   ,globalcallid_callid INT\r\n"
  ... --- multiple columns can be added
   + "   )\r\n"
   + "   WITH (\r\n"
   + "   'connector' = 'kafka'\r\n"
   + "   ,'topic' = 'cdr'\r\n"
   + "   ,'properties.bootstrap.servers' = 
':9092'\r\n"
   + "   ,'scan.startup.mode' = 
'earliest-offset'\r\n"
   //+ ",'value.fields-include' = 
'EXCEPT_KEY'\r\n"
   + "   ,'format' = 'json'\r\n"
   + "   )");


String sql = "SELECT cdr.org_id orgid,\r\n"
  + " 
cdr.cluster_name clustername,\r\n"
  + " 
cdr.cluster_id clusterid,\r\n"
  + " 
cdr.ip_address clusteripaddr,\r\n"
  + " 
cdr.version clusterversion,\r\n"
  + " 
cvsm.suname clustersuname,\r\n"
  + " cdr.pkid 
cdrpkid,\r\n"
  ... --- multiple 
columns can be added
  + " from 
cdrTable cdr\r\n"
  + " left join 
ccmversionsumapTable cvsm ON (cdr.version = cvsm.ccmversion) group by 
TUMBLE(PROCTIME(), INTERVAL '1' MINUTE), cdr.org_id, cdr.cluster_name, 
cdr.cluster_id, cdr.ip_address, cdr.version, cdr.pkid, cdr.globalcallid_callid, 
..."

Table order20 = tableEnv.sqlQuery(sql);
order20.executeInsert("outputCdrTable");
```

**scenario / use case:**

We are pushing 2.5 million JSON records into a Kafka topic and reading them via
the Kafka connector as the temporary cdrTable shown in the above code, and we
are reading 23 records from a static/reference table via the JDBC connector as
the temporary ccmversionsumapTable shown above, then doing a left join over a
1-minute tumble window.

While doing this join we are getting the OutOfMemoryError: Java heap space
error during processing.

However, in a similar use case we did a left join of two tables, cdr (2.5m
records) and cmr (5m records), with the same tumbling window, and we were able
to process it without any issue; both of those tables are read from Kafka as
shown in the above code snippet for cdrTable.
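
One thing worth measuring here is a lookup join for the small, static
ccmversionsumapTable, which queries the JDBC table per record instead of
materializing it in windowed join state. A minimal sketch, assuming cdrTable
additionally declares a processing-time attribute (the name proc_time is
illustrative):

```sql
-- assumes cdrTable declares: proc_time AS PROCTIME()
SELECT cdr.org_id, cdr.version, cvsm.suname
FROM cdrTable AS cdr
LEFT JOIN ccmversionsumapTable FOR SYSTEM_TIME AS OF cdr.proc_time AS cvsm
  ON cdr.version = cvsm.ccmversion
```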

Thanks
Ronak Beejawat


[VOTE] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Xingbo Huang
Hi all,

I would like to start the vote for FLIP-206[1], which was discussed and
reached a consensus in the discussion thread[2].

The vote will be open for at least 72h, unless there is an objection or not
enough votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode
[2] https://lists.apache.org/thread/c7d2mt1vh8v11x2sl8slm4sy9j3t2pdg

Best,
Xingbo


[jira] [Created] (FLINK-25627) Add basic structures of file store in table-store

2022-01-12 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-25627:


 Summary: Add basic structures of file store in table-store
 Key: FLINK-25627
 URL: https://issues.apache.org/jira/browse/FLINK-25627
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.1.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Xingbo Huang
Hi Thomas,

Thanks for the confirmation. I will now start a vote.

Best,
Xingbo

On Wed, Jan 12, 2022 at 02:20, Thomas Weise wrote:

> Hi Xingbo,
>
> +1 from my side
>
> Thanks for the clarification. For your use case the parameter size and
> therefore serialization overhead was the limiting factor. I have seen
> use cases where that is not the concern, because the Python logic
> itself is heavy and dwarfs the protocol overhead (for example when
> interacting with external systems from the UDF). Hence it is good to
> give users options to optimize for their application requirements.
>
> Cheers,
> Thomas
>
> On Tue, Jan 11, 2022 at 3:44 AM Xingbo Huang  wrote:
> >
> > Hi everyone,
> >
> > Thanks to all of you for the discussion.
> > If there are no objections, I would like to start a vote thread tomorrow.
> >
> > Best,
> > Xingbo
> >
> > On Fri, Jan 7, 2022 at 16:18, Xingbo Huang wrote:
> >
> > > Hi Till,
> > >
> > > I have written a more complicated PyFlink job. Compared with the
> previous
> > > single python udf job, there is an extra stage of converting between
> table
> > > and datastream. Besides, I added a python map function for the job.
> Because
> > > python datastream has not yet implemented Thread mode, the python map
> > > function operator is still running in Process Mode.
> > >
> > > ```
> > > source = t_env.from_path("source_table")  # schema [id: String, d: int]
> > >
> > > @udf(result_type=DataTypes.STRING(), func_type="general")
> > > def upper(x):
> > >     return x.upper()
> > >
> > > t_env.create_temporary_system_function("upper", upper)
> > > # python map function
> > > ds = t_env.to_data_stream(source) \
> > >     .map(lambda x: x,
> > >          output_type=Types.ROW_NAMED(["id", "d"],
> > >                                      [Types.STRING(), Types.INT()]))
> > >
> > > t = t_env.from_data_stream(ds)
> > > t.select('upper(id)').execute_insert('sink_table')
> > > ```
> > >
> > > The input data size is 1k.
> > >
> > > Mode                       | QPS
> > > Process Mode               | 3w (30,000)
> > > Thread Mode + Process mode | 4w (40,000)
> > >
> > > From the table, we can see that the nodes running in Process Mode are the
> > > performance bottleneck of the job.
> > >
> > > Best,
> > > Xingbo
> > >
> > > On Wed, Jan 5, 2022 at 23:16, Till Rohrmann wrote:
> > >
> > >> Thanks for the detailed answer Xingbo. Quick question on the last
> figure
> > >> in
> > >> the FLIP. You said that this is a real world Flink stream SQL job. The
> > >> title of the graph says UDF(String Upper). So do I understand
> correctly
> > >> that string upper is the real world use case you have measured? What I
> > >> wanted to ask is how a slightly more complex Flink Python job
> (involving
> > >> shuffles, with back pressure, etc.) performs using the thread and
> process
> > >> mode respectively.
> > >>
> > >> If the mode solely needs changes in the Python part of Flink, then I
> don't
> > >> have any concerns from the runtime perspective.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Wed, Jan 5, 2022 at 1:55 PM Xingbo Huang 
> wrote:
> > >>
> > >> > Hi Till and Thomas,
> > >> >
> > >> > Thanks a lot for joining the discussion.
> > >> >
> > >> > For Till:
> > >> >
> > >> > >>> Is the slower performance currently the biggest pain point for
> our
> > >> > Python users? What else are our Python users mainly complaining
> about?
> > >> >
> > >> > PyFlink users are most concerned about two parts, one is better
> > >> usability,
> > >> > the other is performance. Users often make some benchmarks when they
> > >> > investigate pyflink[1][2] at the beginning to decide whether to use
> > >> > PyFlink. The performance of a PyFlink job depends on two parts, one
> is
> > >> the
> > >> > overhead of the PyFlink framework, and the other is the Python
> function
> > >> > complexity implemented by the user. In the Python ecosystem, there
> are
> > >> many
> > >> > libraries and tools that can help Python users improve the
> performance
> > >> of
> > >> > their custom functions, such as pandas[3], numba[4] and cython[5].
> So we
> > >> > hope that the framework overhead of PyFlink itself can also be
> reduced.
> > >> >
> > >> > >>> Concerning the proposed changes, are there any changes required
> on
> > >> the
> > >> > runtime side (changes to Flink)? How will the deployment and memory
> > >> > management be affected when using the thread execution mode?
> > >> >
> > >> > The changes on PyFlink Runtime mentioned here are actually only
> > >> > modifications of PyFlink custom Operators, such as
> > >> > PythonScalarFunctionOperator[6], which won't affect deployment and
> > >> memory
> > >> > management.
> > >> >
> > >> > >>> One more question that came to my mind: How much performance
> > >> > improvement do we gain on a real-world Python use case? Were the
> > >> > measurements more like micro benchmarks where the Python UDF was
> called
> > >> w/o
> > >> > the overhead of Flink? I 

Re: what is efficient way to write Left join in flink

2022-01-12 Thread Chesnay Schepler

Your best bet is to try out both approaches with some representative data.
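
For a quick experiment, the SQL variant of such a one-minute windowed left join
can be sketched as below; the join key `id` and the use of PROCTIME() mirror the
pattern from the earlier thread, and all column names are placeholders since the
actual schemas were not shared:

```sql
SELECT t1.id,
       COUNT(t2.id) AS matched_rows
FROM testtopic1 AS t1
LEFT JOIN testtopic2 AS t2 ON t1.id = t2.id
GROUP BY TUMBLE(PROCTIME(), INTERVAL '1' MINUTE), t1.id
```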

On 12/01/2022 08:11, Ronak Beejawat (rbeejawa) wrote:

Hi Team,

Can you please help me with the below query? I wanted to know which approach
will be better and more efficient for multiple left joins within a one-minute
tumbling window (DataStream vs. SQL API w.r.t. performance and memory management).

Use case :
1. We have topic one (testtopic1) which will get half a million records every
minute.
2. We have topic two (testtopic2) which will get 23 data points as static or
reference data.
3. We have topic three (testtopic3) which will get one million records every
minute.


So we are doing a join like (select * from testtopic1 left join testtopic2 left
join testtopic3 group by a tumble window of 1 min duration).

So the question is: which API will be more efficient and faster for such a use
case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat



From: Ronak Beejawat (rbeejawa)
Sent: Tuesday, January 11, 2022 6:12 PM
To: 'dev@flink.apache.org'; 'commun...@flink.apache.org'; 'u...@flink.apache.org'
Cc: 'Hang Ruan'; Shrinath Shenoy K (sshenoyk)
Subject: RE: what is efficient way to write Left join in flink

Can please someone help / reply on below Question ?

From: Ronak Beejawat (rbeejawa)
Sent: Monday, January 10, 2022 7:40 PM
To: dev@flink.apache.org; commun...@flink.apache.org; u...@flink.apache.org
Cc: Hang Ruan <ruanhang1...@gmail.com>; Shrinath Shenoy K (sshenoyk) <sshen...@cisco.com>
Subject: what is efficient way to write Left join in flink

Hi Team,

We want clarification on a real-time processing scenario for the below-mentioned
use case.

Use case :
1. We have topic one (testtopic1) which will get half a million records every
minute.
2. We have topic two (testtopic2) which will get one million records every minute.

So we are doing a join as testtopic1 left join testtopic2, which have
correlated data 1:2.

So the question is: which API will be more efficient and faster for such a use
case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat





Re: [VOTE] Release 1.14.3, release candidate #1

2022-01-12 Thread Yun Tang
+1 (non-binding)


  *   Checked the signatures of the source code, some of the binaries, and some of the Python
packages.
  *   Launched a local cluster successfully on a Linux machine with the correct
Flink version and commit id, and ran the state machine example successfully as
expected.
  *   Reviewed the flink-web PR.

Best
Yun Tang

From: Martijn Visser 
Sent: Wednesday, January 12, 2022 0:34
To: dev 
Subject: [VOTE] Release 1.14.3, release candidate #1

Hi everyone,
Please review and vote on the release candidate #1 for the version
1.14.3, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to
be deployed to dist.apache.org [2], which are signed with the key with
fingerprint 12DEE3E4D920A98C [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.14.3-rc1" [5],
* website pull request listing the new release and adding announcement
blog post [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks on behalf of Thomas Weise and myself,

Martijn Visser
http://twitter.com/MartijnVisser82

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12351075=12315522
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1481/
[5] https://dist.apache.org/repos/dist/dev/flink/flink-1.14.3-rc1/
[6] https://github.com/apache/flink-web/pull/497


Re: Use of JIRA fixVersion

2022-01-12 Thread Till Rohrmann
Hi Thomas,

Thanks for creating this proposal. I agree with Konstantin and you that a
bit more guidance for this topic is definitely helpful. I would be ok with
adopting your proposal.

Cheers,
Till

On Wed, Jan 12, 2022 at 9:23 AM Konstantin Knauf  wrote:

> Hi Thomas,
>
> I am also +1 to set fixVersion more conservatively, in particular for patch
> releases. I am not sure we can or should solve this in a "strict" way. I am
> happy to give contributors and contributing teams a bit of freedom in how
> they use fixVersion for planning, but giving more concrete guidance than
> what we currently have in [1] would be desirable, I believe.
>
> Cheers,
>
> Konstantin
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-FixVersion/s
> :
>
> On Tue, Jan 11, 2022 at 8:44 PM Thomas Weise  wrote:
>
> > Hi,
> >
> > As part of preparing the 1.14.3 release, I observed that there were
> > around 200 JIRA issues with fixVersion 1.14.3 that were unresolved
> > (after blocking issues had been dealt with). Further cleanup resulted
> > in removing fixVersion 1.14.3  from most of these and we are left with
> > [1] - these are the tickets that rolled over to 1.14.4.
> >
> > The disassociated issues broadly fell into following categories:
> >
> > * test infrastructure / stability related (these can be found by label)
> > * stale tickets (can also be found by label)
> > * tickets w/o label that pertain to addition of features that don't
> > fit into or don't have to go into patch release
> >
> > I wanted to bring this up so that we can potentially come up with
> > better guidance for use of the fixVersion field, since it is important
> > for managing releases [2]. Manual cleanup as done in this case isn't
> > desirable. A few thoughts:
> >
> > * In most cases, it is not necessary to set fixVersion upfront.
> > Instead, we can set it when the issue is actually resolved, and set it
> > for all versions/branches for which a backport occurred after the
> > changes are merged
> > * How to know where to backport? "Affect versions" seems to be the
> > right field to use for that purpose. While resolving an issue for
> > master it can guide backporting.
> > * What if an issue should block a release? The priority of the issue
> > should be blocker. Blockers are low cardinality and need to be fixed
> > before release. So that would be the case where fixVersion is set
> > upfront.
> >
> > Thanks,
> > Thomas
> >
> > [1]
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20Flink%20and%20fixVersion%20%3D%201.14.4%20and%20resolution%20%3D%20Unresolved%20
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> >
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


what is efficient way to write Left join in flink

2022-01-12 Thread Ronak Beejawat (rbeejawa)
Hi Team,

Can you please help me with the below query? I wanted to know which approach
will be better and more efficient for multiple left joins within a one-minute
tumbling window (DataStream vs. SQL API w.r.t. performance and memory management).

Use case :
1. We have topic one (testtopic1) which will get half a million records every
minute.
2. We have topic two (testtopic2) which will get 23 data points as static or
reference data.
3. We have topic three (testtopic3) which will get one million records every
minute.


So we are doing a join like (select * from testtopic1 left join testtopic2 left
join testtopic3 group by a tumble window of 1 min duration).

So the question is: which API will be more efficient and faster for such a use
case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat



From: Ronak Beejawat (rbeejawa)
Sent: Tuesday, January 11, 2022 6:12 PM
To: 'dev@flink.apache.org'; 'commun...@flink.apache.org'; 'u...@flink.apache.org'
Cc: 'Hang Ruan'; Shrinath Shenoy K (sshenoyk)
Subject: RE: what is efficient way to write Left join in flink

Can please someone help / reply on below Question ?

From: Ronak Beejawat (rbeejawa)
Sent: Monday, January 10, 2022 7:40 PM
To: dev@flink.apache.org; commun...@flink.apache.org; u...@flink.apache.org
Cc: Hang Ruan <ruanhang1...@gmail.com>; Shrinath Shenoy K (sshenoyk) <sshen...@cisco.com>
Subject: what is efficient way to write Left join in flink

Hi Team,

We want clarification on a real-time processing scenario for the below-mentioned
use case.

Use case :
1. We have topic one (testtopic1) which will get half a million records every
minute.
2. We have topic two (testtopic2) which will get one million records every minute.

So we are doing a join as testtopic1 left join testtopic2, which have
correlated data 1:2.

So the question is: which API will be more efficient and faster for such a use
case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat


RE: Could not find any factory for identifier 'jdbc'

2022-01-12 Thread Ronak Beejawat (rbeejawa)
Hi Chesnay,

How do you ensure that the connector is actually available at runtime?

We are providing the below-mentioned dependency inside pom.xml with compile scope,
so it will be available on the classpath, and it was present in my Flink job's
bundled jar. We are doing the same for other connectors, e.g. Kafka, and it worked
for those.


<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-jdbc_2.11</artifactId>
  <version>1.14.2</version>
</dependency>
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>5.1.41</version>
</dependency>

Are you bundling it in a jar or putting it into Flink's lib directory?
Yes, we are building a jar and the connector is bundled in it, but we still saw
this error. So we tried the workaround mentioned in an article
(https://blog.csdn.net/weixin_44056920/article/details/118110949) to put the jar
inside the Flink lib directory, and then it worked. But this is extra work we
have to do to make it run, including a restart of the cluster.

But the question is: how did it work for Kafka and not for JDBC? I didn't put
the Kafka jar explicitly into the Flink lib folder.
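
One possible cause, assuming the fat jar is built with the maven-shade-plugin:
table factories such as 'jdbc' are discovered via Java SPI files under
META-INF/services, and a shaded jar that does not merge those files can lose
some of them, while a connector placed in lib/ is always discovered. A sketch
of the relevant shade configuration; the ServicesResourceTransformer line is
the essential part, the rest is standard boilerplate:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services files so table factory discovery keeps working -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```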

Note: I am using the Flink 1.14 release for all my job execution/implementation,
which I believe is a stable version.

Thanks
Ronak Beejawat
From: Chesnay Schepler <ches...@apache.org>
Date: Tuesday, 11 January 2022 at 7:45 PM
To: Ronak Beejawat (rbeejawa) <rbeej...@cisco.com.INVALID>, u...@flink.apache.org <u...@flink.apache.org>
Cc: Hang Ruan <ruanhang1...@gmail.com>, Shrinath Shenoy K (sshenoyk) <sshen...@cisco.com>,
Karthikeyan Muthusamy (karmuthu) <karmu...@cisco.com>, Krishna Singitam (ksingita) <ksing...@cisco.com>,
Arun Yadav (aruny) <ar...@cisco.com>, Jayaprakash Kuravatti (jkuravat) <jkura...@cisco.com>,
Avi Sanwal (asanwal) <asan...@cisco.com>
Subject: Re: Could not find any factory for identifier 'jdbc'
How do you ensure that the connector is actually available at runtime?
Are you bundling it in a jar or putting it into Flinks lib directory?

On 11/01/2022 14:14, Ronak Beejawat (rbeejawa) wrote:
> Correcting subject -> Could not find any factory for identifier 'jdbc'
>
> From: Ronak Beejawat (rbeejawa)
> Sent: Tuesday, January 11, 2022 6:43 PM
> To: 'dev@flink.apache.org' <dev@flink.apache.org>; 'commun...@flink.apache.org' <commun...@flink.apache.org>; 'u...@flink.apache.org' <u...@flink.apache.org>
> Cc: 'Hang Ruan' <ruanhang1...@gmail.com>; Shrinath Shenoy K (sshenoyk) <sshen...@cisco.com>;
> Karthikeyan Muthusamy (karmuthu) <karmu...@cisco.com>; Krishna Singitam (ksingita) <ksing...@cisco.com>;
> Arun Yadav (aruny) <ar...@cisco.com>; Jayaprakash Kuravatti (jkuravat) <jkura...@cisco.com>;
> Avi Sanwal (asanwal) <asan...@cisco.com>
> Subject: what is efficient way to write Left join in flink
>
> Hi Team,
>
> Getting below exception while using jdbc connector :
>
> Caused by: org.apache.flink.table.api.ValidationException: Could not find any 
> factory for identifier 'jdbc' that implements 
> 'org.apache.flink.table.factories.DynamicTableFactory' in the classpath.
>
> Available factory identifiers are:
>
> blackhole
> datagen
> filesystem
> kafka
> print
> upsert-kafka
>
>
> I have already added the dependency for the JDBC connector in pom.xml, as mentioned
> below:
>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-connector-jdbc_2.11</artifactId>
>   <version>1.14.2</version>
> </dependency>
> <dependency>
>   <groupId>mysql</groupId>
>   <artifactId>mysql-connector-java</artifactId>
>   <version>5.1.41</version>
> </dependency>
>
> Referred release doc link for the same 
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/table/jdbc/
>
>
>
> Please help me with this and provide a solution!
>
>
> Thanks
> Ronak Beejawat


Re: [DISCUSS] Seek help for making JIRA links clickable in github

2022-01-12 Thread Jingsong Li
Hi Yun Gao,
Yes, the issues of flink-statefun and flink-ml are also managed in
issues.apache.org. But the FLINK-XX references on GitHub are not clickable.

Thanks Yun Tang! That's what I want.

Best,
Jingsong

On Wed, Jan 12, 2022 at 4:08 PM Yun Tang  wrote:
>
> Hi Jingsong,
>
> I have already created the ticket [1] to ask for help from ASF infrastructure.
>
> Let's wait to see the progress to make it done in github.
>
> [1] https://issues.apache.org/jira/browse/INFRA-22729
>
> Best
> Yun Tang
> 
> From: Yun Gao 
> Sent: Wednesday, January 12, 2022 14:59
> To: Jingsong Li ; dev 
> Subject: Re: [DISCUSS] Seek help for making JIRA links clickable in github
>
> Currently it seems the issues of flink-statefun and flink-ml are
> also managed in issues.apache.org?
>
> Best,
> Yun
>
>
>  --Original Mail --
> Sender:Jingsong Li 
> Send Date:Wed Jan 12 13:54:03 2022
> Recipients:dev 
> Subject:[DISCUSS] Seek help for making JIRA links clickable in github
> Hi everyone,
>
> We are creating flink-table-store[1] and we also find that flink-ml[2]
> does not have clickable JIRA links, while flink-statefun[3] and
> flink[4] do.
>
> So I'm asking for PMC's help on how to make JIRA links clickable in github.
>
> [1] https://github.com/apache/flink-table-store
> [2] https://github.com/apache/flink-ml
> [3] https://github.com/apache/flink-statefun
> [4] https://github.com/apache/flink
>
> Best,
> Jingsong Lee


Re: [DISCUSS] Deprecate MapR FS

2022-01-12 Thread Martijn Visser
Thanks everyone. We've moved forward with removing the MapR file system in
Flink 1.15.

On Tue, 11 Jan 2022 at 10:37, Jingsong Li  wrote:

> +1 for dropping the MapR FS. Thanks for driving.
>
> Best,
> Jingsong
>
> On Tue, Jan 11, 2022 at 5:22 PM Chang Li  wrote:
> >
> > +1 for dropping the MapR FS.
> >
> > > On Wed, Jan 5, 2022 at 18:33, Till Rohrmann wrote:
> >
> > > +1 for dropping the MapR FS.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jan 5, 2022 at 10:11 AM Martijn Visser 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks for your input. I've checked the MapR implementation and it
> has no
> > > > annotation at all. Given the circumstances that we thought that MapR
> was
> > > > already dropped, I would propose to immediately remove MapR in Flink
> 1.15
> > > > instead of first marking it as deprecated and removing it in Flink
> 1.16.
> > > >
> > > > Please let me know what you think.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Thu, 9 Dec 2021 at 17:27, David Morávek  wrote:
> > > >
> > > >> +1, agreed with Seth's reasoning. There has been no real activity in
> > > MapR
> > > >> FS module for years [1], so the eventual users should be good with
> using
> > > >> the jars from the older Flink versions for quite some time
> > > >>
> > > >> [1]
> > > >>
> > >
> https://github.com/apache/flink/commits/master/flink-filesystems/flink-mapr-fs
> > > >>
> > > >> Best,
> > > >> D.
> > > >>
> > > >> On Thu, Dec 9, 2021 at 4:28 PM Konstantin Knauf 
> > > >> wrote:
> > > >>
> > > >>> +1 (what Seth said)
> > > >>>
> > > >>> On Thu, Dec 9, 2021 at 4:15 PM Seth Wiesman 
> > > wrote:
> > > >>>
> > > >>> > +1
> > > >>> >
> > > >>> > I actually thought we had already dropped this FS. If anyone is
> still
> > > >>> > relying on it in production, the file system abstraction in
> Flink has
> > > >>> been
> > > >>> > incredibly stable over the years. They should be able to use the
> 1.14
> > > >>> MapR
> > > >>> > FS with later versions of Flink.
> > > >>> >
> > > >>> > Seth
> > > >>> >
> > > >>> > On Wed, Dec 8, 2021 at 10:03 AM Martijn Visser <
> > > mart...@ververica.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> >> Hi all,
> > > >>> >>
> > > >>> >> Flink supports multiple file systems [1] which includes MapR FS.
> > > MapR
> > > >>> as
> > > >>> >> a company doesn't exist anymore since 2019, the technology and
> > > >>> intellectual
> > > >>> >> property has been sold to Hewlett Packard.
> > > >>> >>
> > > >>> >> I don't think that there's anyone who's using MapR anymore and
> > > >>> therefore
> > > >>> >> I think it would be good to deprecate this for Flink 1.15 and
> then
> > > >>> remove
> > > >>> >> it in Flink 1.16. Removing this from Flink will slightly shrink
> the
> > > >>> >> codebase and CI runtime.
> > > >>> >>
> > > >>> >> I'm also cross posting this to the User mailing list, in case
> > > there's
> > > >>> >> still anyone who's using MapR.
> > > >>> >>
> > > >>> >> Best regards,
> > > >>> >>
> > > >>> >> Martijn
> > > >>> >>
> > > >>> >> [1]
> > > >>> >>
> > > >>>
> > >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > >>> >>
> > > >>> >
> > > >>>
> > > >>> --
> > > >>>
> > > >>> Konstantin Knauf
> > > >>>
> > > >>> https://twitter.com/snntrable
> > > >>>
> > > >>> https://github.com/knaufk
> > > >>>
> > > >>
> > >
>


Re: Use of JIRA fixVersion

2022-01-12 Thread Konstantin Knauf
Hi Thomas,

I am also +1 to set fixVersion more conservatively, in particular for patch
releases. I am not sure we can or should solve this in a "strict" way. I am
happy to give contributors and contributing teams a bit of freedom in how
they use fixVersion for planning, but giving more concrete guidance than
what we currently have in [1] would be desirable, I believe.

Cheers,

Konstantin

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-FixVersion/s
:

On Tue, Jan 11, 2022 at 8:44 PM Thomas Weise  wrote:

> Hi,
>
> As part of preparing the 1.14.3 release, I observed that there were
> around 200 JIRA issues with fixVersion 1.14.3 that were unresolved
> (after blocking issues had been dealt with). Further cleanup resulted
> in removing fixVersion 1.14.3  from most of these and we are left with
> [1] - these are the tickets that rolled over to 1.14.4.
>
> The disassociated issues broadly fell into following categories:
>
> * test infrastructure / stability related (these can be found by label)
> * stale tickets (can also be found by label)
> * tickets w/o label that pertain to addition of features that don't
> fit into or don't have to go into patch release
>
> I wanted to bring this up so that we can potentially come up with
> better guidance for use of the fixVersion field, since it is important
> for managing releases [2]. Manual cleanup as done in this case isn't
> desirable. A few thoughts:
>
> * In most cases, it is not necessary to set fixVersion upfront.
> Instead, we can set it when the issue is actually resolved, and set it
> for all versions/branches for which a backport occurred after the
> changes are merged
> * How to know where to backport? "Affect versions" seems to be the
> right field to use for that purpose. While resolving an issue for
> master it can guide backporting.
> * What if an issue should block a release? The priority of the issue
> should be blocker. Blockers are low cardinality and need to be fixed
> before release. So that would be the case where fixVersion is set
> upfront.
>
> Thanks,
> Thomas
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20Flink%20and%20fixVersion%20%3D%201.14.4%20and%20resolution%20%3D%20Unresolved%20
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [VOTE] FLIP-199: Change some default config values of blocking shuffle for better usability

2022-01-12 Thread Guowei Ma
+1(binding)
Best,
Guowei


On Wed, Jan 12, 2022 at 3:44 PM Jingsong Li  wrote:

> +1 Thanks Yingjie for driving.
>
> Best,
> Jingsong Lee
>
> On Wed, Jan 12, 2022 at 3:16 PM 刘建刚  wrote:
> >
> > +1 for the proposal. In fact, we have used these parameters in our
> > internal Flink version with good performance.
> >
> > On Wed, Jan 12, 2022 at 10:42, Yun Gao wrote:
> >
> > > +1 since it would greatly improve the out-of-the-box experience for batch
> > > jobs.
> > >
> > > Thanks Yingjie for drafting the PR and initiating the discussion.
> > >
> > > Best,
> > > Yun
> > >
> > >
> > >
> > >  --Original Mail --
> > > Sender:Yingjie Cao 
> > > Send Date:Tue Jan 11 15:15:01 2022
> > > Recipients:dev 
> > > Subject:[VOTE] FLIP-199: Change some default config values of blocking
> > > shuffle for better usability
> > > Hi all,
> > >
> > > I'd like to start a vote on FLIP-199: Change some default config
> values of
> > > blocking shuffle for better usability [1] which has been discussed in
> this
> > > thread [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection or
> > > not enough votes.
> > >
> > > [1]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability
> > > [2] https://lists.apache.org/thread/pt2b1f17x2l5rlvggwxs6m265lo4ly7p
> > >
> > > Best,
> > > Yingjie
> > >
>


Re: [DISCUSS] Seek help for making JIRA links clickable in github

2022-01-12 Thread Yun Tang
Hi Jingsong,

I have already created the ticket [1] to ask for help from ASF infrastructure.

Let's wait to see the progress to make it done in github.

[1] https://issues.apache.org/jira/browse/INFRA-22729

Best
Yun Tang

From: Yun Gao 
Sent: Wednesday, January 12, 2022 14:59
To: Jingsong Li ; dev 
Subject: Re: [DISCUSS] Seek help for making JIRA links clickable in github

Currently it seems the issues of flink-statefun and flink-ml are
also managed in issues.apache.org?

Best,
Yun


 --Original Mail --
Sender:Jingsong Li 
Send Date:Wed Jan 12 13:54:03 2022
Recipients:dev 
Subject:[DISCUSS] Seek help for making JIRA links clickable in github
Hi everyone,

We are creating flink-table-store[1] and we also find that flink-ml[2]
does not have clickable JIRA links, while flink-statefun[3] and
flink[4] do.

So I'm asking for PMC's help on how to make JIRA links clickable in github.

[1] https://github.com/apache/flink-table-store
[2] https://github.com/apache/flink-ml
[3] https://github.com/apache/flink-statefun
[4] https://github.com/apache/flink

Best,
Jingsong Lee


[jira] [Created] (FLINK-25626) Azure failed due to "Error occurred in starting fork"

2022-01-12 Thread Yun Gao (Jira)
Yun Gao created FLINK-25626:
---

 Summary: Azure failed due to "Error occurred in starting fork"
 Key: FLINK-25626
 URL: https://issues.apache.org/jira/browse/FLINK-25626
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Affects Versions: 1.15.0
Reporter: Yun Gao


{code:java}
2022-01-11T17:00:14.3916937Z Jan 11 17:00:14 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test 
(integration-tests) on project flink-tests: There are test failures.
2022-01-11T17:00:14.3918346Z Jan 11 17:00:14 [ERROR] 
2022-01-11T17:00:14.3919668Z Jan 11 17:00:14 [ERROR] Please refer to 
/__w/2/s/flink-tests/target/surefire-reports for the individual test results.
2022-01-11T17:00:14.3921108Z Jan 11 17:00:14 [ERROR] Please refer to dump files 
(if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
2022-01-11T17:00:14.3922067Z Jan 11 17:00:14 [ERROR] ExecutionException Error 
occurred in starting fork, check output in log
2022-01-11T17:00:14.3923180Z Jan 11 17:00:14 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException Error occurred in starting fork, check output in log
2022-01-11T17:00:14.3924445Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:532)
2022-01-11T17:00:14.3925659Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:479)
2022-01-11T17:00:14.3926838Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:322)
2022-01-11T17:00:14.3928106Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:266)
2022-01-11T17:00:14.3929409Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
2022-01-11T17:00:14.3930681Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
2022-01-11T17:00:14.3931722Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
2022-01-11T17:00:14.3932849Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
2022-01-11T17:00:14.3934297Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
2022-01-11T17:00:14.3935334Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
2022-01-11T17:00:14.3936483Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
2022-01-11T17:00:14.3937954Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
2022-01-11T17:00:14.3939199Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
2022-01-11T17:00:14.3940419Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
2022-01-11T17:00:14.3941590Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
2022-01-11T17:00:14.3942627Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
2022-01-11T17:00:14.3943549Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
2022-01-11T17:00:14.3944460Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
2022-01-11T17:00:14.3945373Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
2022-01-11T17:00:14.3946231Z Jan 11 17:00:14 [ERROR] at 
org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
2022-01-11T17:00:14.3947072Z Jan 11 17:00:14 [ERROR] at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-01-11T17:00:14.3948163Z Jan 11 17:00:14 [ERROR] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-01-11T17:00:14.3949328Z Jan 11 17:00:14 [ERROR] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-01-11T17:00:14.3950312Z Jan 11 17:00:14 [ERROR] at 
java.lang.reflect.Method.invoke(Method.java:498)
2022-01-11T17:00:14.3951268Z Jan 11 17:00:14 [ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
2022-01-11T17:00:14.3952301Z Jan 11 17:00:14 [ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
2022-01-11T17:00:14.3953340Z Jan 11 17:00:14 [ERROR]