[DISCUSS] Introduce The Batch/Stream ExecutionEnvironment in Yarn mode

2020-05-01 Thread Roc Marshal
Hi all.
   I would like to have such a submission mode: build the job directly in the
Environment, and then submit the job in YARN mode. Just like
RemoteStreamEnvironment, as long as you specify the parameters of the YARN
cluster (host, port), or the YARN configuration directory and HADOOP_USER_NAME,
you can submit the job using the topology built in the Environment.
   This submission method should minimize the transmission of the resources
YARN requires to start the Flink JobManager and TaskManager runners, so that
Flink can deploy the job on the YARN cluster as quickly as possible.
A simple demo is shown in the picture; the parameter named 'env' contains
all the operators of the job, such as sources, maps, etc.
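
As a rough illustration, a minimal sketch contrasting today's
RemoteStreamEnvironment submission with the proposed YARN variant; the
YarnStreamEnvironment name and its parameters are hypothetical:

```
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class YarnSubmitSketch {

    public static void main(String[] args) throws Exception {
        // Today: RemoteStreamEnvironment-style submission to a known host/port.
        StreamExecutionEnvironment env = StreamExecutionEnvironment
                .createRemoteEnvironment("jobmanager-host", 8081, "/path/to/job.jar");

        // The proposal: an analogous environment targeting a YARN cluster via
        // the YARN configuration directory and HADOOP_USER_NAME. This API does
        // not exist today; the name and parameters are purely illustrative:
        //
        // StreamExecutionEnvironment env = YarnStreamEnvironment
        //         .createYarnEnvironment("/etc/hadoop/conf", "hadoop-user", "/path/to/job.jar");

        env.fromElements("a", "b", "c")
                .map(String::toUpperCase)
                .print();

        env.execute("yarn-submitted-job");
    }
}
```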


Thank you for your attention.

[jira] [Created] (FLINK-17496) Performance regression with amazon-kinesis-producer 0.13.1 in Flink 1.10.x

2020-05-01 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-17496:


 Summary: Performance regression with amazon-kinesis-producer 
0.13.1 in Flink 1.10.x
 Key: FLINK-17496
 URL: https://issues.apache.org/jira/browse/FLINK-17496
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.10.0
 Environment: The KPL upgrade in 1.10.0 has introduced a performance 
issue, which can be addressed by reverting to 0.12.9 or by a forward fix with 
0.14.0.
Reporter: Thomas Weise
Assignee: Thomas Weise






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-web] XBaith edited a comment on pull request #333: [FLINK-17490] Add training page

2020-05-01 Thread GitBox


XBaith edited a comment on pull request #333:
URL: https://github.com/apache/flink-web/pull/333#issuecomment-622439043


   > Hey, David! Thanks a lot for doing the whole integration of the training — 
it's a super valuable resource that indeed deserves more attention.
   > 
   > What about linking this from the "Getting Started" dropdown, instead? I'm 
afraid we're starting to accumulate quite a lot of disparate links in the 
navigation bar. Once the discussion to revamp the landing page kicks in, then 
we could make sure that the self-paced training is properly highlighted in 
there.
   
   Good idea. I also think that putting "Training" in "Getting Started" will help 
users quickly and systematically learn Flink.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] XBaith commented on pull request #333: [FLINK-17490] Add training page

2020-05-01 Thread GitBox


XBaith commented on pull request #333:
URL: https://github.com/apache/flink-web/pull/333#issuecomment-622439043


   > Hey, David! Thanks a lot for doing the whole integration of the training — 
it's a super valuable resource that indeed deserves more attention.
   > 
   > What about linking this from the "Getting Started" dropdown, instead? I'm 
afraid we're starting to accumulate quite a lot of disparate links in the 
navigation bar. Once the discussion to revamp the landing page kicks in, then 
we could make sure that the self-paced training is properly highlighted in 
there.
   
   Good idea. I also think that putting "Training" in "Getting Started" will help 
users quickly and systematically learn Flink.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] morsapaes commented on pull request #333: [FLINK-17490] Add training page

2020-05-01 Thread GitBox


morsapaes commented on pull request #333:
URL: https://github.com/apache/flink-web/pull/333#issuecomment-622432180


   Hey, David! Thanks a lot for doing the whole integration of the training — 
it's a super valuable resource that indeed deserves more attention.
   
   What about linking this from the "Getting Started" dropdown, instead? I'm 
afraid we're starting to accumulate quite a lot of disparate links in the 
navigation bar. Once the discussion to revamp the landing page kicks in, then 
we could make sure that the self-paced training is properly highlighted in 
there.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-05-01 Thread Austin Cawley-Edwards
Hey,

(Switching to my personal email)

Correct me if I'm wrong, but I think Aljoscha is proposing keeping the
public API as is, and adding some new constructors / custom deserialization
schemas as was done with Kafka. Here's what I was able to find on that
feature:

* https://issues.apache.org/jira/browse/FLINK-8354
*
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaDeserializationSchema.java
*
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer011.java#L100-L114

Best,
Austin
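
For illustration, a minimal sketch of what an RMQ-specific schema could look
like, by analogy with KafkaDeserializationSchema; the RMQDeserializationSchema
name and its exact signature are hypothetical:

```
import java.io.IOException;
import java.io.Serializable;

import org.apache.flink.api.java.typeutils.ResultTypeQueryable;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Envelope;

/**
 * Hypothetical RabbitMQ-specific deserialization schema, analogous to
 * KafkaDeserializationSchema: besides the raw body, it exposes the AMQP
 * envelope and properties (e.g. the correlation ID), so a user can decide per
 * message how to parse it, or how to treat a missing correlation ID.
 */
public interface RMQDeserializationSchema<T> extends Serializable, ResultTypeQueryable<T> {

    /** Deserializes one RabbitMQ delivery into a record of type T. */
    T deserialize(Envelope envelope, AMQP.BasicProperties properties, byte[] body)
            throws IOException;

    /** Signals end of stream, mirroring DeserializationSchema#isEndOfStream. */
    boolean isEndOfStream(T nextElement);
}
```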

On Fri, May 1, 2020 at 6:19 AM seneg...@gmail.com 
wrote:

> Hello,
>
> So the proposal is to keep the current RMQSource constructors / public API
> as is and create new ones that give more granular parsing?
>
> Regards,
> Karim Mansour
>
> On Thu, Apr 30, 2020 at 5:23 PM Austin Cawley-Edwards <
> aus...@fintechstudios.com> wrote:
>
> > Hey all + thanks Konstantin,
> >
> > As mentioned, we also ran into issues with the RMQ Source
> > inflexibility.
> > I think Aljoscha's idea of supporting both would be a nice way to
> > incorporate new changes without breaking the current API.
> >
> > We'd definitely benefit from the changes proposed here but have another
> > issue with the Correlation ID. When a message gets in the queue without a
> > correlation ID, the source errors and the job cannot recover, requiring
> > (painful) manual intervention. It would be nice to be able to dead-letter
> > these inputs from the source, but I don't think that's possible with the
> > current source interface (don't know too much about the source
> specifics).
> > We might be able to work around this with a custom Correlation ID
> > extractor, as proposed by Karim.
> >
> > Also, if there are other tickets in the RMQ integrations that have gone
> > unmaintained, I'm also happy to chip in at maintaining them!
> >
> > Best,
> > Austin
> > 
> > From: Konstantin Knauf 
> > Sent: Thursday, April 30, 2020 6:14 AM
> > To: dev 
> > Cc: Austin Cawley-Edwards 
> > Subject: Re: [DISCUSS] flink-connector-rabbitmq api changes
> >
> > Hi everyone,
> >
> > just looping in Austin, as he mentioned to me yesterday that they also
> > ran into issues due to the inflexibility of the RabbitMQSource.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Thu, Apr 30, 2020 at 11:23 AM seneg...@gmail.com wrote:
> > Hello Guys,
> >
> > Thanks for all the responses. I want to stress that I didn't feel
> > ignored; I just thought that I had forgotten an important step or something.
> >
> > Since I am a newbie, I will follow whatever route you suggest :)
> > and I agree that the RMQ connector still needs a lot of love, "which I
> > would be happy to submit gradually".
> >
> > As for the code, I have it here in the PR:
> > https://github.com/senegalo/flink/pull/1. It's not that much of a change
> > in terms of logic, but more about what is exposed.
> >
> > Let me know how you want me to proceed.
> >
> > Thanks again,
> > Karim Mansour
> >
> > On Thu, Apr 30, 2020 at 10:40 AM Aljoscha Krettek wrote:
> >
> > > Hi,
> > >
> > > I think it's good to contribute the changes to Flink directly since we
> > > already have the RMQ connector in the repository.
> > >
> > > I would propose something similar to the Kafka connector, which takes
> > > both the generic DeserializationSchema and a KafkaDeserializationSchema
> > > that is specific to Kafka and allows access to the ConsumerRecord and
> > > therefore all the Kafka features. What do you think about that?
> > >
> > > Best,
> > > Aljoscha
> > >
> > > On 30.04.20 10:26, Robert Metzger wrote:
> > > > Hey Karim,
> > > >
> > > > I'm sorry that you had such a bad experience contributing to Flink,
> > even
> > > > though you are nicely following the rules.
> > > >
> > > > You mentioned that you've implemented the proposed change already.
> > Could
> > > > you share a link to a branch here so that we can take a look? I can
> > > assess
> > > > the API changes easier if I see them :)
> > > >
> > > > Thanks a lot!
> > > >
> > > >
> > > > Best,
> > > > Robert
> > > >
> > > > On Thu, Apr 30, 2020 at 8:09 AM Dawid Wysakowicz <
> > dwysakow...@apache.org
> > > >
> > > > wrote:
> > > >
> > > >> Hi Karim,
> > > >>
> > > >> Sorry you did not have the best first-time experience. You certainly
> > > >> did everything right, which I definitely appreciate.
> > > >>
> > > >> The problem in that particular case, as I see it, is that RabbitMQ is
> > > >> not very actively maintained and therefore it is not easy to find a
> > > >> committer willing to take on this topic. The point of connectors not
> > > >> being properly maintained was raised a few times in the past on the
> > ML.
> > > >> One of 

Re: "[VOTE] FLIP-108: edit the Public API"

2020-05-01 Thread Yu Li
+1

I can see there has been a thorough discussion and the solution looks good.
Thanks for driving this, Yangze.

Best Regards,
Yu


On Fri, 1 May 2020 at 21:34, Till Rohrmann  wrote:

> Thanks for updating the FLIP, Yangze.
>
> +1 (binding)
>
> for the update.
>
> Cheers,
> Till
>
> On Thu, Apr 30, 2020 at 4:34 PM Yangze Guo  wrote:
>
> > Hi, there.
> >
> > The "FLIP-108: Add GPU support in Flink"[1] is now working in
> > progress. However, we met problems regarding class loader and
> > dependency. For more details, you could look at the discussion[2]. The
> > discussion thread is now converged and the solution is changing the
> > RuntimeContext#getExternalResourceInfos, let it return
> > ExternalResourceInfo and adding methods to ExternalResourceInfo
> > interface.
> >
> > Since the solution involves changes in the Public API. We'd like to
> > start a voting thread for it.
> >
> > The proposed change is:
> >
> > ```
> > public interface RuntimeContext {
> > /**
> >  * Get the specific external resource information by the
> resourceName.
> >  */
> > Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName);
> > }
> > ```
> >
> > ```
> > public interface ExternalResourceInfo {
> >   String getProperty(String key);
> >   Collection<String> getKeys();
> > }
> > ```
> >
> > The vote will be open for at least 72 hours. Unless there is an
> objection,
> > I will try to close it by May 4, 2020 14:00 UTC if we have received
> > sufficient votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > [2]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-108-Problems-regarding-the-class-loader-and-dependency-td40893.html
> >
> > Best,
> > Yangze Guo
> >
>
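
For reference, a sketch of how a user function might consume the proposed
interface; the resource name "gpu", the property key "index", and the package
of ExternalResourceInfo are illustrative assumptions:

```
import java.util.Set;

// The package of the proposed interface is an assumption for this sketch.
import org.apache.flink.api.common.externalresource.ExternalResourceInfo;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class GpuAwareMapper extends RichMapFunction<Long, Long> {

    @Override
    public void open(Configuration parameters) throws Exception {
        // Proposed API: look up external resources by resource name.
        Set<ExternalResourceInfo> gpus =
                getRuntimeContext().getExternalResourceInfos("gpu");
        for (ExternalResourceInfo gpu : gpus) {
            // "index" is an assumed property key, shown for illustration.
            String index = gpu.getProperty("index");
            // ... pin the GPU with this index to the computation ...
        }
    }

    @Override
    public Long map(Long value) {
        return value; // placeholder for the GPU-accelerated work
    }
}
```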


[jira] [Created] (FLINK-17495) Add custom labels on PrometheusReporter like PrometheusPushGatewayReporter's groupingKey

2020-05-01 Thread jinhai (Jira)
jinhai created FLINK-17495:
--

 Summary: Add custom labels on PrometheusReporter like 
PrometheusPushGatewayReporter's groupingKey
 Key: FLINK-17495
 URL: https://issues.apache.org/jira/browse/FLINK-17495
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: jinhai


We need to add some custom labels on Prometheus metrics, so we can query by them.

In version 1.10 we can add a jobName/groupingKey to the
PrometheusPushGatewayReporter, but not to the PrometheusReporter.

Can we add an AbstractPrometheusReporter#addDimension method to support this,
so that there will be no differences between the two reporters except for the
metrics exposing mechanism (pulling/pushing)?
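
For reference, a rough sketch of the contrast in flink-conf.yaml; the
push-gateway keys follow the 1.10 reporter options, while the commented-out
labels option for the pull-based reporter is hypothetical:

```
# Existing in 1.10: custom labels via the push-based reporter's grouping key.
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.groupingKey: env=prod;team=data

# Hypothetical equivalent for the pull-based reporter (proposed, not existing):
# metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
# metrics.reporter.prom.labels: env=prod;team=data
```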



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17494) Possible direct memory leak in cassandra sink

2020-05-01 Thread nobleyd (Jira)
nobleyd created FLINK-17494:
---

 Summary: Possible direct memory leak in cassandra sink
 Key: FLINK-17494
 URL: https://issues.apache.org/jira/browse/FLINK-17494
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Cassandra
Affects Versions: 1.10.0, 1.9.3
Reporter: nobleyd


# The Cassandra sink uses direct memory.
 # Start a standalone cluster (1 machine) for testing.
 # After the cluster has started, check the Flink web UI and record the task 
manager's memory info, specifically the direct memory part.
 # Start a job that reads from Kafka and writes to Cassandra using the Cassandra 
sink; you can see the direct memory count in the 'Outside JVM' part go up (a 
minimal job sketch follows this list).
 # Stop the job; the direct memory count does not decrease (use 'jmap 
-histo:live pid' to force a GC in the task manager).
 # Repeat several times; the direct memory count keeps growing.
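
A minimal sketch of such a reproduction job, assuming a local Kafka broker and
Cassandra node; the topic, host, keyspace, and table names are placeholders:

```
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.cassandra.CassandraSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToCassandraRepro {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "repro");

        // Read strings from Kafka and wrap them in Tuple1 for the query sink.
        DataStream<Tuple1<String>> stream = env
                .addSource(new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props))
                .map(Tuple1::of)
                .returns(Types.TUPLE(Types.STRING));

        // The Cassandra driver allocates direct (off-heap) memory via Netty;
        // this is the allocation the report is about.
        CassandraSink.addSink(stream)
                .setHost("localhost")
                .setQuery("INSERT INTO ks.messages (body) VALUES (?);")
                .build();

        env.execute("kafka-to-cassandra-repro");
    }
}
```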



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: "[VOTE] FLIP-108: edit the Public API"

2020-05-01 Thread Till Rohrmann
Thanks for updating the FLIP, Yangze.

+1 (binding)

for the update.

Cheers,
Till

On Thu, Apr 30, 2020 at 4:34 PM Yangze Guo  wrote:

> Hi, there.
>
> The "FLIP-108: Add GPU support in Flink"[1] is now working in
> progress. However, we met problems regarding class loader and
> dependency. For more details, you could look at the discussion[2]. The
> discussion thread is now converged and the solution is changing the
> RuntimeContext#getExternalResourceInfos, let it return
> ExternalResourceInfo and adding methods to ExternalResourceInfo
> interface.
>
> Since the solution involves changes in the Public API. We'd like to
> start a voting thread for it.
>
> The proposed change is:
>
> ```
> public interface RuntimeContext {
> /**
>  * Get the specific external resource information by the resourceName.
>  */
> Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName);
> }
> ```
>
> ```
> public interface ExternalResourceInfo {
>   String getProperty(String key);
>   Collection<String> getKeys();
> }
> ```
>
> The vote will be open for at least 72 hours. Unless there is an objection,
> I will try to close it by May 4, 2020 14:00 UTC if we have received
> sufficient votes.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-108-Problems-regarding-the-class-loader-and-dependency-td40893.html
>
> Best,
> Yangze Guo
>


[jira] [Created] (FLINK-17493) Possible direct memory leak in cassandra sink

2020-05-01 Thread nobleyd (Jira)
nobleyd created FLINK-17493:
---

 Summary: Possible direct memory leak in cassandra sink
 Key: FLINK-17493
 URL: https://issues.apache.org/jira/browse/FLINK-17493
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Cassandra
Affects Versions: 1.10.0, 1.9.3
Reporter: nobleyd


# The Cassandra sink uses direct memory.
 # Start a standalone cluster (1 machine) for testing.
 # After the cluster has started, check the Flink web UI and record the task 
manager's memory info, specifically the direct memory part.
 # Start a job that reads from Kafka and writes to Cassandra using the Cassandra 
sink; you can see the direct memory count in the 'Outside JVM' part go up.
 # Stop the job; the direct memory count does not decrease (use 'jmap 
-histo:live pid' to force a GC in the task manager).
 # Repeat several times; the direct memory count keeps growing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17492) Possible direct memory leak in cassandra sink

2020-05-01 Thread nobleyd (Jira)
nobleyd created FLINK-17492:
---

 Summary: Possible direct memory leak in cassandra sink
 Key: FLINK-17492
 URL: https://issues.apache.org/jira/browse/FLINK-17492
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Cassandra
Affects Versions: 1.10.0, 1.9.3
Reporter: nobleyd


# The Cassandra sink uses direct memory.
 # Start a standalone cluster (1 machine) for testing.
 # After the cluster has started, check the Flink web UI and record the task 
manager's memory info, specifically the direct memory part.
 # Start a job that reads from Kafka and writes to Cassandra using the Cassandra 
sink; you can see the direct memory count in the 'Outside JVM' part go up.
 # Stop the job; the direct memory count does not decrease (use 'jmap 
-histo:live pid' to force a GC in the task manager).
 # Repeat several times; the direct memory count keeps growing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [jira] [Created] (FLINK-17491) Translate Training page on project website

2020-05-01 Thread Ben Chen
It seems that the link to the md file needs to be updated.

On Fri, May 1, 2020 at 7:10 AM David Anderson (Jira) 
wrote:

> David Anderson created FLINK-17491:
> --
>
>  Summary: Translate Training page on project website
>  Key: FLINK-17491
>  URL: https://issues.apache.org/jira/browse/FLINK-17491
>  Project: Flink
>   Issue Type: Improvement
>   Components: chinese-translation, Project Website
> Reporter: David Anderson
>
>
> Translate the training page for the project website to Chinese. The file
> is training.zh.md.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


[VOTE] Release 1.10.1, release candidate #2

2020-05-01 Thread Yu Li
Hi everyone,

Please review and vote on the release candidate #2 for version 1.10.1, as
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint D8D3D42E84C753CA5F170BDF93C07902771AB743 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.10.1-rc2" [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Yu

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346891
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.1-rc2/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1363/
[5]
https://github.com/apache/flink/commit/f92e8a9d60ef664acd66230da43c6f0a1cd87adc
[6] https://github.com/apache/flink-web/pull/330


[GitHub] [flink-web] alpinegizmo commented on pull request #333: [FLINK-17490] Add training page

2020-05-01 Thread GitBox


alpinegizmo commented on pull request #333:
URL: https://github.com/apache/flink-web/pull/333#issuecomment-622351679


   I've created https://issues.apache.org/jira/browse/FLINK-17491 for the 
Chinese translation.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] alpinegizmo opened a new pull request #333: [FLINK-17490] Add training page

2020-05-01 Thread GitBox


alpinegizmo opened a new pull request #333:
URL: https://github.com/apache/flink-web/pull/333


   Now that the documentation has a training section, it would be good to help 
folks find it by promoting it from the project website.
   
   This adds training.md and training.zh.md, and adds a Training entry to the 
site navigation.
   
   
![image](https://user-images.githubusercontent.com/43608/80802203-ffc6a400-8bae-11ea-927a-2f2db8b79028.png)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-17491) Translate Training page on project website

2020-05-01 Thread David Anderson (Jira)
David Anderson created FLINK-17491:
--

 Summary: Translate Training page on project website
 Key: FLINK-17491
 URL: https://issues.apache.org/jira/browse/FLINK-17491
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation, Project Website
Reporter: David Anderson


Translate the training page for the project website to Chinese. The file is 
training.zh.md.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17490) Add Training page to project website

2020-05-01 Thread David Anderson (Jira)
David Anderson created FLINK-17490:
--

 Summary: Add Training page to project website
 Key: FLINK-17490
 URL: https://issues.apache.org/jira/browse/FLINK-17490
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: David Anderson
Assignee: David Anderson


Now that the documentation has a training section, it would be good to help 
folks find it by promoting it from the project website.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-05-01 Thread seneg...@gmail.com
Hello,

So the proposal is to keep the current RMQSource constructors / public API
as is and create new ones that give more granular parsing?

Regards,
Karim Mansour

On Thu, Apr 30, 2020 at 5:23 PM Austin Cawley-Edwards <
aus...@fintechstudios.com> wrote:

> Hey all + thanks Konstantin,
>
> As mentioned, we also ran into issues with the RMQ Source inflexibility.
> I think Aljoscha's idea of supporting both would be a nice way to
> incorporate new changes without breaking the current API.
>
> We'd definitely benefit from the changes proposed here but have another
> issue with the Correlation ID. When a message gets in the queue without a
> correlation ID, the source errors and the job cannot recover, requiring
> (painful) manual intervention. It would be nice to be able to dead-letter
> these inputs from the source, but I don't think that's possible with the
> current source interface (don't know too much about the source specifics).
> We might be able to work around this with a custom Correlation ID
> extractor, as proposed by Karim.
>
> Also, if there are other tickets in the RMQ integrations that have gone
> unmaintained, I'm also happy to chip in at maintaining them!
>
> Best,
> Austin
> 
> From: Konstantin Knauf 
> Sent: Thursday, April 30, 2020 6:14 AM
> To: dev 
> Cc: Austin Cawley-Edwards 
> Subject: Re: [DISCUSS] flink-connector-rabbitmq api changes
>
> Hi everyone,
>
> just looping in Austin, as he mentioned to me yesterday that they also
> ran into issues due to the inflexibility of the RabbitMQSource.
>
> Cheers,
>
> Konstantin
>
> On Thu, Apr 30, 2020 at 11:23 AM seneg...@gmail.com wrote:
> Hello Guys,
>
> Thanks for all the responses. I want to stress that I didn't feel
> ignored; I just thought that I had forgotten an important step or something.
>
> Since I am a newbie, I will follow whatever route you suggest :)
> and I agree that the RMQ connector still needs a lot of love, "which I
> would be happy to submit gradually".
>
> As for the code, I have it here in the PR:
> https://github.com/senegalo/flink/pull/1. It's not that much of a change in
> terms of logic, but more about what is exposed.
>
> Let me know how you want me to proceed.
>
> Thanks again,
> Karim Mansour
>
> On Thu, Apr 30, 2020 at 10:40 AM Aljoscha Krettek wrote:
>
> > Hi,
> >
> > I think it's good to contribute the changes to Flink directly since we
> > already have the RMQ connector in the repository.
> >
> > I would propose something similar to the Kafka connector, which takes
> > both the generic DeserializationSchema and a KafkaDeserializationSchema
> > that is specific to Kafka and allows access to the ConsumerRecord and
> > therefore all the Kafka features. What do you think about that?
> >
> > Best,
> > Aljoscha
> >
> > On 30.04.20 10:26, Robert Metzger wrote:
> > > Hey Karim,
> > >
> > > I'm sorry that you had such a bad experience contributing to Flink,
> even
> > > though you are nicely following the rules.
> > >
> > > You mentioned that you've implemented the proposed change already.
> Could
> > > you share a link to a branch here so that we can take a look? I can
> > assess
> > > the API changes easier if I see them :)
> > >
> > > Thanks a lot!
> > >
> > >
> > > Best,
> > > Robert
> > >
> > > On Thu, Apr 30, 2020 at 8:09 AM Dawid Wysakowicz <
> dwysakow...@apache.org
> > >
> > > wrote:
> > >
> > >> Hi Karim,
> > >>
> > >> Sorry you did not have the best first-time experience. You certainly
> > >> did everything right, which I definitely appreciate.
> > >>
> > >> The problem in that particular case, as I see it, is that RabbitMQ is
> > >> not very actively maintained and therefore it is not easy to find a
> > >> committer willing to take on this topic. The point of connectors not
> > >> being properly maintained was raised a few times in the past on the
> ML.
> > >> One of the ideas for improving the situation was to start the
> > >> https://flink-packages.org/ page. The idea is to ask active users of
> > >> certain connectors to maintain those connectors outside of the core
> > >> project, while giving them a platform within the community where they
> > >> can make their modules visible. That way it is possible to overcome
> > >> the lack of capacity among the core committers without losing much
> > >> visibility.
> > >>
> > >> I would kindly ask you to consider that path, if you are interested.
> You
> > >> can of course also wait/reach out to more committers if you feel
> > >> strongly about contributing those changes back to the Flink repository
> > >> itself.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> On 30/04/2020 07:29, seneg...@gmail.com
> wrote:
> > >>> Hello,
> > >>>
> > >>> I am new to the mailing list and to contributing to big open-source
> > >>> projects in general, and I don't know if I did 

[jira] [Created] (FLINK-17489) Support any kind of array in StringUtils.arrayAwareToString()

2020-05-01 Thread Timo Walther (Jira)
Timo Walther created FLINK-17489:


 Summary: Support any kind of array in 
StringUtils.arrayAwareToString()
 Key: FLINK-17489
 URL: https://issues.apache.org/jira/browse/FLINK-17489
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Reporter: Timo Walther
Assignee: Timo Walther


{{StringUtils.arrayAwareToString()}} is a very basic implementation for 
one-level nested arrays. However, such a prominent utility class should support 
all kinds of nested arrays. Both FLINK-16817 and FLINK-17175 tried to improve 
the implementation but the utility still does not support even {{int[][]}}.
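
A minimal sketch of one way to support arbitrary nesting, leaning on
Arrays.deepToString for object arrays (which also covers {{int[][]}}, since it
recurses into primitive leaves) plus the Arrays.toString overloads for
top-level primitive arrays; an illustration, not the actual Flink
implementation:

```
import java.util.Arrays;

public final class ArrayStrings {

    /** Returns a string for any object, handling arrays of arbitrary nesting. */
    public static String arrayAwareToString(Object o) {
        if (o == null) {
            return "null";
        }
        if (o instanceof Object[]) {
            // Covers int[][], String[][][], etc.: deepToString recurses and
            // applies the right Arrays.toString overload to primitive leaves.
            return Arrays.deepToString((Object[]) o);
        }
        // Top-level one-dimensional primitive arrays.
        if (o instanceof int[])     return Arrays.toString((int[]) o);
        if (o instanceof long[])    return Arrays.toString((long[]) o);
        if (o instanceof short[])   return Arrays.toString((short[]) o);
        if (o instanceof byte[])    return Arrays.toString((byte[]) o);
        if (o instanceof char[])    return Arrays.toString((char[]) o);
        if (o instanceof boolean[]) return Arrays.toString((boolean[]) o);
        if (o instanceof float[])   return Arrays.toString((float[]) o);
        if (o instanceof double[])  return Arrays.toString((double[]) o);
        return o.toString();
    }

    private ArrayStrings() {}
}
```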



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Hierarchies in ConfigOption

2020-05-01 Thread Timo Walther

Hi Jark,

yes, in theory every connector can design options as it likes. But for 
user experience and good coding style we should be consistent in Flink 
connectors and configuration, because implementers of new connectors 
will copy the design of existing ones.


Furthermore, I could imagine that people in the DataStream API would also 
like to configure their connectors based on options in the near future. 
It might be the case that Flink DataStream API connectors will reuse the 
ConfigOptions from the Table API for consistency.


I'm favoring either:

format.kind = json
format.fail-on-missing-field = true

Or:

format = json
json.fail-on-missing-field = true

Both are valid hierarchies.
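
Expressed as ConfigOption declarations, the two alternatives would look
roughly like this (the declarations are illustrative, not an actual proposal
for where these constants should live):

```
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public final class FormatOptionsSketch {

    // Alternative 1: the format name and all format options share the
    // "format" prefix.
    public static final ConfigOption<String> FORMAT_KIND =
            ConfigOptions.key("format.kind").stringType().noDefaultValue();

    public static final ConfigOption<Boolean> FAIL_ON_MISSING_FIELD =
            ConfigOptions.key("format.fail-on-missing-field")
                    .booleanType()
                    .defaultValue(false);

    // Alternative 2: "format" names the format, and the format's own options
    // live under a prefix derived from its value (here: "json").
    public static final ConfigOption<String> FORMAT =
            ConfigOptions.key("format").stringType().noDefaultValue();

    public static final ConfigOption<Boolean> JSON_FAIL_ON_MISSING_FIELD =
            ConfigOptions.key("json.fail-on-missing-field")
                    .booleanType()
                    .defaultValue(false);

    private FormatOptionsSketch() {}
}
```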

Regards,
Timo


On 30.04.20 17:57, Jark Wu wrote:

Hi Dawid,

I just want to mention one of your responses:


What you described with
'format' = 'csv',
'csv.allow-comments' = 'true',
'csv.ignore-parse-errors' = 'true'
would not work though as the `format` prefix is mandatory in the sources,
as only the properties with format will be passed to the format factory in
the majority of cases. We already have some implicit contracts.

IIUC, in FLIP-95 and FLIP-122, the property key style is totally decided
by connectors, not the framework.
So a custom connector can define the above properties, extract the value of
'format', i.e. 'csv', to find the format factory,
and extract the properties with the `csv.` prefix, remove the prefix, and
pass the properties (e.g. 'allow-comments' = 'true')
to the format factory to create the format.

So there is no strict guarantee of having "nested JSON style" properties.
Users can still develop a custom connector with these
un-hierarchical properties and it works well.

'format' = 'json',
'format.fail-on-missing-field' = 'false'

Best,
Jark


On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz 
wrote:


Hi all,

I'd like to start with a comment that I am ok with the current state of
the FLIP-122 if there is a strong preference for it. Nevertheless I still
like the idea of adding `type` to the `format` to have it as `format.type`
= `json`.

I wanted to clarify a few things though:

@Jingsong As far as I see it, most users copy/paste the properties
from the documentation into the SQL, so I don't think an additional four
characters are too cumbersome. Plus, if you force the additional suffix onto
all the options of a format, you introduce way more boilerplate than if we
added the `type/kind/name`.

@Kurt I agree that we cannot force it, but I think it is more a
question of setting standards/implicit contracts on the properties. What you
described with
'format' = 'csv',
'csv.allow-comments' = 'true',
'csv.ignore-parse-errors' = 'true'

would not work though as the `format` prefix is mandatory in the sources
as only the properties with format will be passed to the format factory in
majority of cases. We already have some implicit contracts.

@Forward I did not quite get the example. Aren't json and bson two
separate formats? Do you mean you can have those two at the same time? Why
do you need to differentiate the options for each? The way I see it is:

'format(.name)' = 'json',
'format.fail-on-missing-field' = 'false'

or

'format(.name)' = 'bson',
'format.fail-on-missing-field' = 'false'

@Benchao I'd be fine with any of name, kind, or type (this last one we
already had in the past).

Best,
Dawid

On 30/04/2020 04:17, Forward Xu wrote:

Here I have a little doubt. At present, our json only supports the
conventional json format. If we need to implement json with bson, json with
avro, etc., how should we express it?
Would it need to be like the following:

'format.name' = 'json',

'format.json.fail-on-missing-field' = 'false'


'format.name' = 'bson',

'format.bson.fail-on-missing-field' = 'false'


Best,

Forward

Benchao Li wrote on Thursday, April 30, 2020 at 9:58 AM:


Thanks Timo for starting the discussion.

Generally I like the idea of keeping the config aligned with a standard like
json/yaml.

From the user's perspective, I don't use table configs from a config file
like yaml or json for now, and it's ok to change it to a yaml-like style.
Actually, we didn't know that this could be a yaml-like configuration
hierarchy. If it has a hierarchy, we may consider loading the config from a
yaml/json file in the future.

Regarding the name, 'format.kind' looks fine to me. However, there is
another name off the top of my head: 'format.name'. WDYT?

Dawid Wysakowicz wrote on Wednesday, April 29, 2020 at 11:56 PM:


Hi all,

I also wanted to share my opinion.

When talking about the ConfigOption hierarchy we use for configuring a Flink
cluster, I would be a strong advocate for keeping a yaml/hocon/json/...
compatible style. Those options are primarily read from a file and thus
should at least try to follow common practices for nested formats if we
ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The
origin/destination of these will usually be an external catalog, usually in a
flattened (key/value) representation, so I agree it is not as 

[jira] [Created] (FLINK-17488) JdbcSink has to support setting auto-commit mode of DB

2020-05-01 Thread Khokhlov Pavel (Jira)
Khokhlov Pavel created FLINK-17488:
--

 Summary: JdbcSink has to support setting auto-commit mode of DB
 Key: FLINK-17488
 URL: https://issues.apache.org/jira/browse/FLINK-17488
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Affects Versions: 1.10.0
Reporter: Khokhlov Pavel


Just played with the new
{noformat}
org.apache.flink.api.java.io.jdbc.JdbcSink{noformat}
({{1.11-SNAPSHOT}})

and batch mode with the MySQL driver.

Noticed that *JdbcSink* supports only *autoCommit true* and the developer
cannot change that behaviour. But it's very important, from a transactional
and performance point of view, to support autoCommit *false* and call commit
explicitly.

When a connection is created, it is in auto-commit mode. This means that each
individual SQL statement is treated as a transaction and is automatically
committed right after it is executed.

For example, the Confluent connector disables it by default.

https://github.com/confluentinc/kafka-connect-jdbc/blob/da9619af1d7442dd91793dbc4dc65b8e7414e7b5/src/main/java/io/confluent/connect/jdbc/sink/JdbcDbWriter.java#L50
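
For reference, a minimal sketch of the desired behaviour in plain JDBC terms;
the connection URL, credentials, and SQL are placeholders:

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ExplicitCommitSketch {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/db", "user", "secret")) {
            // Desired: disable the per-statement transactions ...
            conn.setAutoCommit(false);

            try (PreparedStatement ps =
                    conn.prepareStatement("INSERT INTO t (id, name) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "name-" + i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }

            // ... and commit the whole batch as a single transaction.
            conn.commit();
        }
    }
}
```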

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)