[jira] [Assigned] (FLINK-3370) Add an aligned version of the window operator

2021-04-17 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen reassigned FLINK-3370:
---

Assignee: (was: Stephan Ewen)

> Add an aligned version of the window operator
> -
>
> Key: FLINK-3370
> URL: https://issues.apache.org/jira/browse/FLINK-3370
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Priority: Major
>  Labels: stale-assigned
>
> The windowing operators currently follow a generic implementation for support 
> of unaligned windows.
> We can gain efficiency by creating a variant that is optimized for aligned 
> windows:
>   - Aligned windows can use aligned triggers, which keep no per-key state
>   - Less trigger state means less checkpointing data
>   - Based on the aligned windows, we can create sliding event time windows 
> that do not replicate data into the different overlapping windows
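For illustration only (not part of the ticket): the reason an aligned trigger can keep no per-key state is that window membership is pure arithmetic on the element timestamp, so the same holds for every key. A minimal sketch:

```java
public class AlignedWindows {
    // Start of the tumbling window of the given size that contains
    // `timestamp`. Pure arithmetic: no per-key trigger state required.
    // (Assumes non-negative timestamps for simplicity.)
    static long windowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    // For sliding windows with the given slide, the start of the most
    // recent window containing `timestamp` is computed the same way; the
    // overlapping windows can be enumerated from it, so elements need not
    // be physically replicated into each overlapping window.
    static long lastWindowStart(long timestamp, long slide) {
        return timestamp - (timestamp % slide);
    }

    public static void main(String[] args) {
        System.out.println(windowStart(7, 5));  // prints 5
    }
}
```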



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-3370) Add an aligned version of the window operator

2021-04-17 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-3370.
---
Resolution: Abandoned

> Add an aligned version of the window operator
> -
>
> Key: FLINK-3370
> URL: https://issues.apache.org/jira/browse/FLINK-3370
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Priority: Major
>  Labels: stale-assigned
>
> The windowing operators currently follow a generic implementation for support 
> of unaligned windows.
> We can gain efficiency by creating a variant that is optimized for aligned 
> windows:
>   - Aligned windows can use aligned triggers, which keep no per-key state
>   - Less trigger state means less checkpointing data
>   - Based on the aligned windows, we can create sliding event time windows 
> that do not replicate data into the different overlapping windows



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-1101) Make memory management adaptive

2021-04-17 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1101.
---
Resolution: Won't Fix

A different solution has been implemented in the meantime. Instead of making 
the memory management adaptive, the batch scheduler now schedules smaller 
units (pipelined regions) at a time and configures the memory needs of the 
deployed tasks more precisely.

> Make memory management adaptive
> ---
>
> Key: FLINK-1101
> URL: https://issues.apache.org/jira/browse/FLINK-1101
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 0.7.0-incubating
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
>  Labels: stale-assigned
>
> I suggest to rework the memory management.
> Right now, it works the following way: When the program is submitted, it is 
> checked how many memory consuming operations happen (sort, hash, explicit 
> cache, ...). Each one is assigned a static relative memory fraction, which 
> the taskmanager provides.
> This is very conservative planning, mostly due to the fact that with the 
> streaming runtime, we may have all operations running concurrently. But in 
> fact we mostly do not, and are therefore wasting memory by being too 
> conservative.
> To make the most of the available memory, I suggest to make the management 
> adaptive:
>   - Operators need to be able to request memory bit by bit
>   - Operators need to be able to release memory on request. The sorter  / 
> hash table / cache do this naturally by spilling.
>   - Memory has to be redistributed between operators when new requesters come.
> This also plays nicely with the idea of leaving all non-assigned memory to 
> intermediate results, to allow for maximum caching of historic intermediate 
> results.
> This would solve [FLINK-170] and [FLINK-84].
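The three bullet points can be simulated in a few lines (a toy model only; none of these names exist in Flink, and the release callback stands in for spilling):

```java
import java.util.*;
import java.util.function.LongConsumer;

// Toy adaptive memory manager: operators acquire budget bit by bit and
// register a release callback (e.g. spilling) that is invoked when a new
// requester forces memory to be redistributed.
public class AdaptiveMemorySketch {
    private long free;
    private final Map<String, Long> held = new HashMap<>();
    private final Map<String, LongConsumer> releasers = new HashMap<>();

    public AdaptiveMemorySketch(long total) { this.free = total; }

    public void register(String op, LongConsumer onRelease) {
        releasers.put(op, onRelease);
        held.put(op, 0L);
    }

    // Grant as much of the request as possible, asking other operators to
    // give memory back (spill) if the free pool is too small.
    public long acquire(String op, long want) {
        if (free < want) redistribute(op, want - free);
        long granted = Math.min(free, want);
        free -= granted;
        held.merge(op, granted, Long::sum);
        return granted;
    }

    private void redistribute(String requester, long needed) {
        for (Map.Entry<String, Long> e : held.entrySet()) {
            if (needed <= 0 || e.getKey().equals(requester)) continue;
            long give = Math.min(e.getValue(), needed);
            if (give > 0) {
                releasers.get(e.getKey()).accept(give); // operator spills
                e.setValue(e.getValue() - give);
                free += give;
                needed -= give;
            }
        }
    }

    public long heldBy(String op) { return held.get(op); }
}
```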



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-1272) Add a "reduceWithKey" function

2021-04-17 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1272.
---
Resolution: Abandoned

Closing due to long inactivity.

> Add a "reduceWithKey" function
> --
>
> Key: FLINK-1272
> URL: https://issues.apache.org/jira/browse/FLINK-1272
> Project: Flink
>  Issue Type: New Feature
>  Components: API / DataSet
>Reporter: Stephan Ewen
>Assignee: Fabian Hueske
>Priority: Major
>  Labels: stale-assigned
>
> Flink does not assume a key/value model for grouping/aggregating/joining. The 
> keys are specified as positions or paths of the objects to be grouped/joined.
> Currently, we do not expose the key in the {{ReduceFunction}} and 
> {{GroupReduceFunction}}, but give (iterators over) the objects themselves.
> Since it is a common case to access the key, I suggest to add a convenience 
> function {{GroupReduceWithKey}} that has the following signature and can be 
> called as follows:
> {code}
> public interface GroupReduceWithKeyFunction<KEY, VALUE, OUT> {
>     void reduceGroup(KEY key, Iterable<VALUE> values, Collector<OUT> out);
> }
> {code}
> Scala:
> {code}
> val data : DataSet[SomePOJO] = ...
> data
>   .groupBy("id")
>   .reduceGroup( (key, values, out : Collector[(String, Long)]) =>
> out.collect( (key, values.minBy(_.timestamp).timestamp) ) )
> {code}
> Java:
> {code}
> DataSet<SomePOJO> data = ...
> data
>   .groupBy("id")
>   .reduceGroup(
>   new GroupReduceWithKeyFunction<String, SomePOJO, Tuple2<String, Long>>() {
>   ...
>   });
> {code}
> The sae 
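A self-contained sketch of how such an operation could behave, for illustration only (all names are hypothetical; this is not the DataSet API):

```java
import java.util.*;
import java.util.function.Function;

public class ReduceWithKeySketch {
    // Shape of the proposed interface: the key is handed in explicitly
    // instead of being re-extracted from the values by the user.
    interface GroupReduceWithKeyFunction<K, V, O> {
        void reduceGroup(K key, Iterable<V> values, List<O> out);
    }

    // Group `data` by `keyFn`, then hand each (key, group) to the function.
    static <K, V, O> List<O> groupReduceWithKey(
            List<V> data, Function<V, K> keyFn,
            GroupReduceWithKeyFunction<K, V, O> f) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (V v : data) {
            groups.computeIfAbsent(keyFn.apply(v), k -> new ArrayList<>()).add(v);
        }
        List<O> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> e : groups.entrySet()) {
            f.reduceGroup(e.getKey(), e.getValue(), out);
        }
        return out;
    }
}
```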



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-statefun] birbilis opened a new pull request #226: Update java.md

2021-04-17 Thread GitBox


birbilis opened a new pull request #226:
URL: https://github.com/apache/flink-statefun/pull/226


   Changed a Python mention to Java; probably a left-over from the Python doc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-1269) Easy way to "group count" dataset

2021-04-17 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1269.
---
Resolution: Abandoned

Closing this as abandoned due to long inactivity.

> Easy way to "group count" dataset
> -
>
> Key: FLINK-1269
> URL: https://issues.apache.org/jira/browse/FLINK-1269
> Project: Flink
>  Issue Type: New Feature
>  Components: API / DataSet
>Affects Versions: 0.7.0-incubating
>Reporter: Sebastian Schelter
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: stale-assigned
>
> Flink should offer an easy way to group datasets and compute the sizes of the 
> resulting groups. This is one of the most essential operations in distributed 
> processing, yet it is very hard to implement in Flink.
> I assume it could be a show-stopper for people trying Flink, because at the 
> moment, users have to perform the grouping and then write a groupReduce that 
> counts the tuples in the group and extracts the group key at the same time.
> Here is what I would semantically expect to happen:
> {noformat}
> def groupCount[T, K](data: DataSet[T], extractKey: (T) => K): DataSet[(K, Long)] = {
>   data.groupBy { extractKey }
>     .reduceGroup { group => countBy(extractKey, group) }
> }
>
> private[this] def countBy[T, K](extractKey: T => K, group: Iterator[T]): (K, Long) = {
>   val key = extractKey(group.next())
>   var count = 1L
>   while (group.hasNext) {
>     group.next()
>     count += 1
>   }
>   key -> count
> }
> {noformat}
>   
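The semantics sketched above reduce to a single counting pass per extracted key; a runnable plain-Java equivalent (illustrative only, not the DataSet API):

```java
import java.util.*;
import java.util.function.Function;

public class GroupCountSketch {
    // Count the size of each group, keyed by `extractKey` -- the operation
    // the ticket asks to expose as a one-liner.
    static <T, K> Map<K, Long> groupCount(Iterable<T> data, Function<T, K> extractKey) {
        Map<K, Long> counts = new LinkedHashMap<>();
        for (T t : data) {
            counts.merge(extractKey.apply(t), 1L, Long::sum);
        }
        return counts;
    }
}
```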



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15652: [FLINK-22305][network] Improve log messages of sort-merge blocking shuffle

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15652:
URL: https://github.com/apache/flink/pull/15652#issuecomment-821782680


   
   ## CI report:
   
   * 38caca4f7b38bbc5811cd9ba5e890243e1709ce8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16702)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15483: [FLINK-22092][hive] Ignore static conf file URLs in HiveConf

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15483:
URL: https://github.com/apache/flink/pull/15483#issuecomment-812520117


   
   ## CI report:
   
   * d635372648a9d043a3526d7122eae3099d7f0ede Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16068)
 
   * b91b371f5b6a9b522c2478ec56dfe48231c9c976 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16705)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-2646) User functions should be able to differentiate between successful close and erroneous close

2021-04-17 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2646.
---
Resolution: Staged

Closing this issue, as it has been inactive for a long time and a solution is 
expected as part of FLIP-147: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished

> User functions should be able to differentiate between successful close and 
> erroneous close
> ---
>
> Key: FLINK-2646
> URL: https://issues.apache.org/jira/browse/FLINK-2646
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Affects Versions: 0.10.0
>Reporter: Stephan Ewen
>Priority: Major
>  Labels: usability
>
> Right now, the {{close()}} method of rich functions is invoked in case of 
> proper completion, and in case of canceling in case of error (to allow for 
> cleanup).
> In certain cases, the user function needs to know why it is closed, whether 
> the task completed in a regular fashion, or was canceled/failed.
> I suggest to add a method {{closeAfterFailure()}} to the {{RichFunction}}. By 
> default, this method calls {{close()}}. The runtime is then changed to call 
> {{close()}} as part of the regular execution and {{closeAfterFailure()}} in 
> case of an irregular exit.
> Because by default all cases call {{close()}} the change would not be API 
> breaking.
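The compatibility claim in the last paragraph can be demonstrated with Java default methods (a hypothetical, simplified shape; the real RichFunction declares checked exceptions):

```java
public class CloseSemanticsSketch {
    // Hypothetical shape of the proposal: closeAfterFailure() defaults to
    // close(), so existing implementations that only override close()
    // compile and behave unchanged -- the change is not API-breaking.
    interface RichFunction {
        void close();
        default void closeAfterFailure() { close(); }
    }

    static String lastCall = "";

    public static void main(String[] args) {
        // Legacy function: implements only close().
        RichFunction legacy = () -> lastCall = "close()";
        legacy.closeAfterFailure();  // failure path falls back to close()
        System.out.println(lastCall);
    }
}
```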



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15483: [FLINK-22092][hive] Ignore static conf file URLs in HiveConf

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15483:
URL: https://github.com/apache/flink/pull/15483#issuecomment-812520117


   
   ## CI report:
   
   * d635372648a9d043a3526d7122eae3099d7f0ede Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16068)
 
   * b91b371f5b6a9b522c2478ec56dfe48231c9c976 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15653: [FLINK-22329] Inject current ugi credentials into jobconf when getting file split in hive connector

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15653:
URL: https://github.com/apache/flink/pull/15653#issuecomment-821801891


   
   ## CI report:
   
   * 41dbfa9649149d9616337cccf4fc8f6c6f59a9c5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16704)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zuston commented on pull request #15653: [FLINK-22329] Inject current ugi credentials into jobconf when getting file split in hive connector

2021-04-17 Thread GitBox


zuston commented on pull request #15653:
URL: https://github.com/apache/flink/pull/15653#issuecomment-821802718


   Do you have time to review it? @wuchong @lirui-apache . Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-22329) Missing crendentials in jobconf causes repeated authentication in Hive datasource

2021-04-17 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22329:
-
Description: 
Related Flink code: 
[https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107]

 

This {{getSplits}} method calls Hadoop's {{FileInputFormat.getSplits}} method; 
the related Hadoop code is 
[here|https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L426].
 Simplified code is as follows:
{code:java}
// Hadoop FileInputFormat

public InputSplit[] getSplits(JobConf job, int numSplits)
  throws IOException {
  StopWatch sw = new StopWatch().start();
  FileStatus[] stats = listStatus(job);

 
  ..
}


protected FileStatus[] listStatus(JobConf job) throws IOException {
  Path[] dirs = getInputPaths(job);
  if (dirs.length == 0) {
throw new IOException("No input paths specified in job");
  }

  // get tokens for all the required FileSystems..
  TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job);
  
  // Whether we need to recursive look into the directory structure

  ..
}
{code}
 

In the {{listStatus}} method, delegation tokens are obtained by calling 
{{TokenCache.obtainTokensForNamenodes}}. However, this method fails to obtain 
delegation tokens when the required credentials are missing from the jobconf.

So it is necessary to inject the current UGI credentials into the jobconf.

 

Besides, when Flink supports delegation tokens directly without a keytab 
([refer to this PR|https://issues.apache.org/jira/browse/FLINK-21700]), 
{{TokenCache.obtainTokensForNamenodes}} will fail without this patch because 
there are no corresponding credentials.
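The interaction can be simulated without Hadoop (every name below is illustrative, not the real TokenCache/UGI API): a token cache only authenticates when the job's credential set lacks a token for a path, so copying the caller's credentials into the jobconf up front avoids repeated authentication:

```java
import java.util.*;

public class CredentialInjectionSketch {
    static int authCount = 0;  // how many times we hit the authentication service

    // Stand-in for TokenCache.obtainTokensForNamenodes: fetch a token only
    // if the job's credentials don't already hold one for the path.
    static void obtainTokens(Map<String, String> jobCredentials, String path) {
        if (!jobCredentials.containsKey(path)) {
            authCount++;  // expensive authentication round-trip
            jobCredentials.put(path, "token-for-" + path);
        }
    }

    public static void main(String[] args) {
        Map<String, String> ugiCredentials = new HashMap<>();
        ugiCredentials.put("/warehouse/t1", "existing-token");

        // Without injection: each split enumeration starts from an empty
        // jobconf and re-authenticates every time.
        for (int i = 0; i < 3; i++) {
            obtainTokens(new HashMap<>(), "/warehouse/t1");
        }

        // With injection: the current UGI credentials are copied first,
        // so the existing token is found and no re-authentication happens.
        for (int i = 0; i < 3; i++) {
            obtainTokens(new HashMap<>(ugiCredentials), "/warehouse/t1");
        }

        System.out.println(authCount);  // prints 3
    }
}
```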


  was:
Related Flink code: 
[https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107]

 

In this {{getSplits}} method, it will call hadoop {{FileInputFormat's 
getSplits}} method. related hadoop code is here. Simple code is as follows
{code:java}
// Hadoop FileInputFormat

public InputSplit[] getSplits(JobConf job, int numSplits)
  throws IOException {
  StopWatch sw = new StopWatch().start();
  FileStatus[] stats = listStatus(job);

 
  ..
}


protected FileStatus[] listStatus(JobConf job) throws IOException {
  Path[] dirs = getInputPaths(job);
  if (dirs.length == 0) {
throw new IOException("No input paths specified in job");
  }

  // get tokens for all the required FileSystems..
  TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job);
  
  // Whether we need to recursive look into the directory structure

  ..
}
{code}
 

In {{listStatus}} method, it will obtain delegation tokens by calling  
{{TokenCache.obtainTokensForNamenodes}} method. Howerver this method will give 
up to get delegation tokens when credentials in jobconf.

So it's neccessary to inject current ugi credentials into jobconf.

 

Besides, when Flink support delegation tokens directly without keytab([refer to 
this PR|https://issues.apache.org/jira/browse/FLINK-21700]), 
{{TokenCache.obtainTokensForNamenodes}} will failed  without this patch because 
of no corresponding credentials.



> Missing crendentials in jobconf causes repeated authentication in Hive 
> datasource
> -
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Related Flink code: 
> [https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107]
>  
> In this {{getSplits}} method, it will call hadoop {{FileInputFormat's 
> getSplits}} method. related hadoop code is 
> [here|https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L426].
>  Simple code is as follows
> {code:java}
> // Hadoop FileInputFormat
> public InputSplit[] getSplits(JobConf job, int numSplits)
>   throws IOException {
>   StopWatch sw = new StopWatch().start();
>   FileStatus[] stats = listStatus(job);
>  
>   ..
> }
> protected FileStatus[] listStatus(JobConf job) throws IOException {
>   Path[] 

[jira] [Updated] (FLINK-22329) Missing crendentials in jobconf causes repeated authentication in Hive datasource

2021-04-17 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22329:
-
Description: 
Related Flink code: 
[https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107]

 

This {{getSplits}} method calls Hadoop's {{FileInputFormat.getSplits}} method; 
the related Hadoop code is here. Simplified code is as follows:
{code:java}
// Hadoop FileInputFormat

public InputSplit[] getSplits(JobConf job, int numSplits)
  throws IOException {
  StopWatch sw = new StopWatch().start();
  FileStatus[] stats = listStatus(job);

 
  ..
}


protected FileStatus[] listStatus(JobConf job) throws IOException {
  Path[] dirs = getInputPaths(job);
  if (dirs.length == 0) {
throw new IOException("No input paths specified in job");
  }

  // get tokens for all the required FileSystems..
  TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job);
  
  // Whether we need to recursive look into the directory structure

  ..
}
{code}
 

In the {{listStatus}} method, delegation tokens are obtained by calling 
{{TokenCache.obtainTokensForNamenodes}}. However, this method fails to obtain 
delegation tokens when the required credentials are missing from the jobconf.

So it is necessary to inject the current UGI credentials into the jobconf.

 

Besides, when Flink supports delegation tokens directly without a keytab 
([refer to this PR|https://issues.apache.org/jira/browse/FLINK-21700]), 
{{TokenCache.obtainTokensForNamenodes}} will fail without this patch because 
there are no corresponding credentials.


> Missing crendentials in jobconf causes repeated authentication in Hive 
> datasource
> -
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Related Flink code: 
> [https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107]
>  
> In this {{getSplits}} method, it will call hadoop {{FileInputFormat's 
> getSplits}} method. related hadoop code is here. Simple code is as follows
> {code:java}
> // Hadoop FileInputFormat
> public InputSplit[] getSplits(JobConf job, int numSplits)
>   throws IOException {
>   StopWatch sw = new StopWatch().start();
>   FileStatus[] stats = listStatus(job);
>  
>   ..
> }
> protected FileStatus[] listStatus(JobConf job) throws IOException {
>   Path[] dirs = getInputPaths(job);
>   if (dirs.length == 0) {
> throw new IOException("No input paths specified in job");
>   }
>   // get tokens for all the required FileSystems..
>   TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job);
>   
>   // Whether we need to recursive look into the directory structure
>   ..
> }
> {code}
>  
> In {{listStatus}} method, it will obtain delegation tokens by calling  
> {{TokenCache.obtainTokensForNamenodes}} method. Howerver this method will 
> give up to get delegation tokens when credentials in jobconf.
> So it's neccessary to inject current ugi credentials into jobconf.
>  
> Besides, when Flink support delegation tokens directly without keytab([refer 
> to this PR|https://issues.apache.org/jira/browse/FLINK-21700]), 
> {{TokenCache.obtainTokensForNamenodes}} will failed  without this patch 
> because of no corresponding credentials.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #15653: [FLINK-22329] Inject current ugi credentials into jobconf when getting file split in hive connector

2021-04-17 Thread GitBox


flinkbot commented on pull request #15653:
URL: https://github.com/apache/flink/pull/15653#issuecomment-821801891


   
   ## CI report:
   
   * 41dbfa9649149d9616337cccf4fc8f6c6f59a9c5 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] curcur commented on pull request #15636: [FLINK-22239][jdbc] Support exactly-once (sink) for PgSql and MySql

2021-04-17 Thread GitBox


curcur commented on pull request #15636:
URL: https://github.com/apache/flink/pull/15636#issuecomment-821800955


   Hey @rkhachatryan, I'm done with the review.
   
   1. Overall the logic is pretty clear; please refer to my comments.
   2. The exactly-once end-to-end test is too simple: it only covers inserts. 
Could you at least add one failure-recovery test?
   3. Address the connection closing issue found yesterday on top of 
https://github.com/apache/flink/pull/15627.
   4. I think this PR might also fix the problem a user reported when 
connecting to Oracle; please double-check (FLINK-22311).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15653: [FLINK-22329] Inject current ugi credentials into jobconf when getting file split in hive connector

2021-04-17 Thread GitBox


flinkbot commented on pull request #15653:
URL: https://github.com/apache/flink/pull/15653#issuecomment-821800733


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 41dbfa9649149d9616337cccf4fc8f6c6f59a9c5 (Sat Apr 17 
10:14:12 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-22329).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] curcur commented on a change in pull request #15636: [FLINK-22239][jdbc] Support exactly-once (sink) for PgSql and MySql

2021-04-17 Thread GitBox


curcur commented on a change in pull request #15636:
URL: https://github.com/apache/flink/pull/15636#discussion_r615231118



##
File path: 
flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/xa/XaFacadePoolingImpl.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.jdbc.xa;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.util.function.ThrowingConsumer;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import javax.transaction.xa.Xid;
+
+import java.io.Serializable;
+import java.sql.Connection;
+import java.sql.SQLException;
+import java.util.Collection;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.Map;
+import java.util.function.Supplier;
+
+import static org.apache.flink.util.ExceptionUtils.rethrow;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A "pooling" implementation of {@link XaFacade}. Some databases implement XA
+ * such that one connection is limited to a single transaction. As a
+ * workaround, this implementation creates a new XA resource after each
+ * xa_prepare call is made (and associates the current one with the xid to
+ * commit later).
+ */
+@Internal
+class XaFacadePoolingImpl implements XaFacade {
+private static final long serialVersionUID = 1L;
+
+public interface FacadeSupplier extends Serializable, Supplier<XaFacade> {}
+
+private static final transient Logger LOG = 
LoggerFactory.getLogger(XaFacadePoolingImpl.class);
+private final FacadeSupplier facadeSupplier;
+private transient XaFacade active;
+private transient Map<Xid, XaFacade> prepared;
+private transient Deque<XaFacade> pooled;

Review comment:
   make these two final and initialize them in the constructor instead?
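The pooling scheme described in the class Javadoc can be sketched standalone (illustrative names only, not the actual XaFacade API): after prepare, the active resource is parked under its xid for a later commit, and a fresh resource is taken from the pool.

```java
import java.util.*;

// Sketch of the pooling workaround: databases that tie one connection to
// one XA transaction force us to park the prepared resource under its xid
// and switch to a fresh one; commit(xid) later retrieves the parked
// resource and returns it to the pool for reuse.
public class XaPoolingSketch {
    static class Facade { String preparedXid; }

    private Facade active = new Facade();
    private final Map<String, Facade> prepared = new HashMap<>();
    private final Deque<Facade> pooled = new ArrayDeque<>();

    public void endAndPrepare(String xid) {
        active.preparedXid = xid;
        prepared.put(xid, active);             // keep for later commit
        Facade next = pooled.poll();           // reuse a pooled facade...
        active = (next != null) ? next : new Facade();  // ...or open a new one
    }

    public void commit(String xid) {
        Facade f = prepared.remove(xid);
        f.preparedXid = null;
        pooled.push(f);                        // connection is free again
    }

    public int pooledCount() { return pooled.size(); }
    public int preparedCount() { return prepared.size(); }
}
```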

##
File path: 
flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/xa/JdbcXaSinkFunction.java
##
@@ -286,12 +287,15 @@ private void prepareCurrentTx(long checkpointId) throws 
IOException {
 outputFormat.flush();
 try {
 xaFacade.endAndPrepare(currentXid);
+outputFormat.reconnect(false);

Review comment:
   Is this the fix for FLINK-22311?

##
File path: 
flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/xa/XaFacadePoolingImpl.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.jdbc.xa;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.util.function.ThrowingConsumer;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import javax.transaction.xa.Xid;
+
+import java.io.Serializable;
+import java.sql.Connection;
+import java.sql.SQLException;
+import java.util.Collection;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.Map;
+import java.util.function.Supplier;
+
+import static org.apache.flink.util.ExceptionUtils.rethrow;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A "pooling" implementation of {@link XaFacade}. Some databases implement XA such that one
+ * connection is limited to a single transaction. As a workaround, this implementation creates a new
+ * XA resource after each xa_prepare call 
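The workaround described in the Javadoc above can be illustrated with a toy sketch (hypothetical class and method names, not the actual Flink implementation): after each prepare, the active connection is parked under its transaction id, and a pooled or freshly created connection takes its place; on commit the parked connection returns to the pool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of the pooling workaround; strings stand in for real XA connections.
class PoolingSketch {
    private final Map<String, String> prepared = new HashMap<>(); // xid -> parked connection
    private final Deque<String> pooled = new ArrayDeque<>();      // idle, reusable connections
    private String active;
    private int created;

    private String obtainConnection() {
        // reuse an idle connection if one exists, otherwise create a new one
        String idle = pooled.poll();
        return idle != null ? idle : "conn-" + (++created);
    }

    void begin() {
        if (active == null) {
            active = obtainConnection();
        }
    }

    /** After xa_prepare, park the active connection and switch to a fresh one. */
    void endAndPrepare(String xid) {
        prepared.put(xid, active);
        active = obtainConnection();
    }

    /** On commit, the parked connection becomes reusable again. */
    void commit(String xid) {
        String conn = prepared.remove(xid);
        if (conn != null) {
            pooled.push(conn);
        }
    }

    int preparedCount() { return prepared.size(); }
    int poolSize() { return pooled.size(); }
}
```

The point of the pattern is that the connection bound to a prepared-but-uncommitted transaction is never reused for new work, which sidesteps the one-transaction-per-connection limitation.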

[jira] [Commented] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource

2021-04-17 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324223#comment-17324223
 ] 

Junfan Zhang commented on FLINK-22329:
--

[~jark] Could you assign this task to me? Thanks

> Missing credentials in jobconf causes repeated authentication in Hive 
> datasource
> -
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource

2021-04-17 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324223#comment-17324223
 ] 

Junfan Zhang edited comment on FLINK-22329 at 4/17/21, 10:11 AM:
-

[~jark] [~lirui] Could you assign this task to me? Thanks


was (Author: zuston):
[~jark] Could you assign this task to me? Thanks

> Missing credentials in jobconf causes repeated authentication in Hive 
> datasource
> -
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-22329:
---
Labels: pull-request-available  (was: )

> Missing credentials in jobconf causes repeated authentication in Hive 
> datasource
> -
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zuston opened a new pull request #15653: [FLINK-22329] Inject current ugi credentials into jobconf when getting file split in hive connector

2021-04-17 Thread GitBox


zuston opened a new pull request #15653:
URL: https://github.com/apache/flink/pull/15653


   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource

2021-04-17 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22329:
-
Summary: Missing credentials in jobconf causes repeated authentication in 
Hive datasource  (was: Missing credentials in jobconf causes repeated 
authentication in Hive datasources)

> Missing credentials in jobconf causes repeated authentication in Hive 
> datasource
> -
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasources

2021-04-17 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-22329:


 Summary: Missing credentials in jobconf causes repeated 
authentication in Hive datasources
 Key: FLINK-22329
 URL: https://issues.apache.org/jira/browse/FLINK-22329
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-22294) Hive reading fail when getting file numbers on different filesystem nameservices

2021-04-17 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322117#comment-17322117
 ] 

Junfan Zhang edited comment on FLINK-22294 at 4/17/21, 10:03 AM:
-

Could you assign this task to me? [~lirui]  [~jark]


was (Author: zuston):
Could you assign this task to me? [~lirui]

> Hive reading fail when getting file numbers on different filesystem 
> nameservices
> 
>
> Key: FLINK-22294
> URL: https://issues.apache.org/jira/browse/FLINK-22294
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.12.2
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> The same problem like https://issues.apache.org/jira/browse/FLINK-20710



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15569: [FLINK-21247][table] Fix problem in MapDataSerializer#copy when there…

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15569:
URL: https://github.com/apache/flink/pull/15569#issuecomment-817675895


   
   ## CI report:
   
   * c62a25d1ffa4f2f6f76be2b11b77d9c3126d50c4 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16701)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15651: [FLINK-22307][network] Increase the data writing cache size of sort-merge blocking shuffle

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15651:
URL: https://github.com/apache/flink/pull/15651#issuecomment-821767454


   
   ## CI report:
   
   * 3d65a719c16d9aaf8a44f843e2e21d1719947559 UNKNOWN
   * c840d852f79ed239c164c4809f36529bed7ffe5d Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16700)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-22325) Channel state iterator is accessed concurrently without proper synchronization

2021-04-17 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan closed FLINK-22325.
-
Fix Version/s: (was: 1.12.3)
   Resolution: Invalid

Closing as invalid (there is a memory fence after compareAndSet for 
visibility, and iteration cannot proceed simultaneously with close because of 
the state check).
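The closing comment's argument can be sketched in isolation (a minimal model with hypothetical names, not the Flink code): a successful compareAndSet on an atomic reference has volatile-write semantics, so ordinary writes made before it become visible to any thread that subsequently reads the updated reference, and the state check keeps iteration and close from proceeding at the same time.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of publication via compareAndSet.
class StateGuard {
    enum State { OPEN, CLOSED }

    private final AtomicReference<State> state = new AtomicReference<>(State.OPEN);
    private int data; // plain (non-volatile) field, published by the CAS below

    boolean close() {
        data = 42;                                            // write before the fence
        return state.compareAndSet(State.OPEN, State.CLOSED); // release: publishes data
    }

    Integer readIfClosed() {
        // volatile read acquires: if CLOSED is observed, data = 42 is visible too;
        // and while state is still OPEN, callers never touch the closed-only data
        return state.get() == State.CLOSED ? data : null;
    }
}
```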

> Channel state iterator is accessed concurrently without proper synchronization
> --
>
> Key: FLINK-22325
> URL: https://issues.apache.org/jira/browse/FLINK-22325
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> ChannelStateWriter adds input/output data that is written by a dedicated 
> thread.
> The data is passed as CloseableIterator.
> In some cases, iterator.close can be called from the task thread which can 
> lead to double release of buffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22325) Channel state iterator is accessed concurrently without proper synchronization

2021-04-17 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-22325:
--
Affects Version/s: (was: 1.12.2)

> Channel state iterator is accessed concurrently without proper synchronization
> --
>
> Key: FLINK-22325
> URL: https://issues.apache.org/jira/browse/FLINK-22325
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.13.0
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> ChannelStateWriter adds input/output data that is written by a dedicated 
> thread.
> The data is passed as CloseableIterator.
> In some cases, iterator.close can be called from the task thread which can 
> lead to double release of buffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] rkhachatryan closed pull request #15646: [FLINK-22325][runtime] Synchronize access to iterator when writing channel state

2021-04-17 Thread GitBox


rkhachatryan closed pull request #15646:
URL: https://github.com/apache/flink/pull/15646


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (FLINK-22117) Not print stack trace for checkpoint trigger failure if not all tasks are started.

2021-04-17 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan resolved FLINK-22117.
---
Fix Version/s: 1.13.0
 Assignee: Yun Gao
   Resolution: Fixed

Merged into master as 577113f0c339df844f2cc32b1d4a09d3da28085a

> Not print stack trace for checkpoint trigger failure if not all tasks are 
> started.
> --
>
> Key: FLINK-22117
> URL: https://issues.apache.org/jira/browse/FLINK-22117
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.13.0
>Reporter: Yun Gao
>Assignee: Yun Gao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> Currently the stack trace is printed (unlike in the previous versions), but 
> it might cover up the actual exception that the user wants to locate. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] rkhachatryan commented on pull request #15643: [FLINK-22117] Reduce the logs if not all tasks are RUNNING when checkpointing

2021-04-17 Thread GitBox


rkhachatryan commented on pull request #15643:
URL: https://github.com/apache/flink/pull/15643#issuecomment-821792260


   Thanks for updating the PR @gaoyunhaii!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rkhachatryan merged pull request #15643: [FLINK-22117] Reduce the logs if not all tasks are RUNNING when checkpointing

2021-04-17 Thread GitBox


rkhachatryan merged pull request #15643:
URL: https://github.com/apache/flink/pull/15643


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-22311) Flink JDBC XA connector need to set maxRetries to 0 to properly working

2021-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/FLINK-22311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324171#comment-17324171
 ] 

Maciej Bryński edited comment on FLINK-22311 at 4/17/21, 8:34 AM:
--

This is the log.
 I have a few duplicate records with the 2021-04-15T22:15:42 timestamp.
{code:java}
2021-04-15T23:16:49.762722475+00:00 stdout F 2021-04-15 23:16:49,761 INFO 
org.apache.kafka.clients.FetchSessionHandler [] - [Consumer 
clientId=consumer-flink-ingestion-raw-2, groupId=flink-ingestion-raw] Error 
sending fetch request (sessionId=842995853, epoch=11339) to node 1: {}. 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
java.lang.Thread.run(Thread.java:748) [?:1.8.0_282] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) 
[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) 
[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581) 
[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
 [flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
 [flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
 ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
 ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
 ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
 ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191)
 ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54)
 ~[flink-dist_2.12-1.12.2-stream1.jar:1.12.2-stream1] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
com.getindata.flink.ingestion.JdbcXaSinkDeserializationWrapper.invoke(JdbcXaSinkDeserializationWrapper.java:22)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
com.getindata.flink.ingestion.JdbcXaSinkDeserializationWrapper.invoke(JdbcXaSinkDeserializationWrapper.java:37)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.connector.jdbc.xa.JdbcXaSinkFunction.invoke(JdbcXaSinkFunction.java:287)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.writeRecord(JdbcBatchingOutputFormat.java:167)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.flush(JdbcBatchingOutputFormat.java:184)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.attemptFlush(JdbcBatchingOutputFormat.java:216)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
org.apache.flink.connector.jdbc.internal.executor.DynamicBatchStatementExecutor.executeBatch(DynamicBatchStatementExecutor.java:73)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:237)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:9487)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
oracle.jdbc.driver.T4CPreparedStatement.executeLargeBatch(T4CPreparedStatement.java:1447)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F at 
oracle.jdbc.driver.OraclePreparedStatement.executeLargeBatch(OraclePreparedStatement.java:9711)
 ~[ingestion-1.2.1.jar:?] 
2021-04-15T22:16:04.498583738+00:00 stdout F 
2021-04-15T22:16:04.498583738+00:00 

[jira] [Commented] (FLINK-22141) Manually test exactly-once JDBC sink

2021-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/FLINK-22141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324199#comment-17324199
 ] 

Maciej Bryński commented on FLINK-22141:


[~ym] 
No problem at all. 
We have this code integrated with 1.12 and want to go to production soon. So we 
did a lot of tests.

> Manually test exactly-once JDBC sink
> 
>
> Key: FLINK-22141
> URL: https://issues.apache.org/jira/browse/FLINK-22141
> Project: Flink
>  Issue Type: Test
>  Components: Connectors / JDBC
>Reporter: Roman Khachatryan
>Assignee: Yuan Mei
>Priority: Blocker
>  Labels: pull-request-available, release-testing
> Fix For: 1.13.0
>
>
> In FLINK-15578, an API and its implementation were added to JDBC connector to 
> support exactly-once semantics for sinks. The implementation uses JDBC XA 
> transactions.
> The scope of this task is to make sure:
>  # The feature is well-documented
>  # The API is reasonably easy to use
>  # The implementation works as expected
>  ## normal case: database is updated on checkpointing
>  ## failure and recovery case: no duplicates inserted, no records skipped
>  ## several DBs: postgressql, mssql, oracle (mysql has a known issue: 
> FLINK-21743)
>  ## concurrent checkpoints > 1, DoP > 1
>  # Logging is meaningful



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15652: [FLINK-22305][network] Improve log messages of sort-merge blocking shuffle

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15652:
URL: https://github.com/apache/flink/pull/15652#issuecomment-821782680


   
   ## CI report:
   
   * 38caca4f7b38bbc5811cd9ba5e890243e1709ce8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16702)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15652: [FLINK-22305][network] Improve log messages of sort-merge blocking shuffle

2021-04-17 Thread GitBox


flinkbot commented on pull request #15652:
URL: https://github.com/apache/flink/pull/15652#issuecomment-821782680


   
   ## CI report:
   
   * 38caca4f7b38bbc5811cd9ba5e890243e1709ce8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-17 Thread Jiayi Liao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324189#comment-17324189
 ] 

Jiayi Liao commented on FLINK-13247:


Actually we've already implemented a Yarn shuffle service in our internal 
version, and it can also be contributed to the Flink community. I can summarize 
our implementation and write a detailed design before 5th May, if the community 
still needs this. What do you think [~zjwang]? 

 

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput through an external shuffle service because the producers of 
> intermediate result partitions can be released once the intermediate result 
> partitions have been persisted on disk. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced a pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages out of the Flink runtime as a shuffle 
> service. I propose a YARN implementation of the Flink external shuffle 
> service since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermediate result partitions to local disks assigned by 
> the NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are informed of intermediate result partition descriptions by 
> producers;
> (3) Consumers fetch intermediate result partitions from the Yarn shuffle 
> servers.
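The three steps above can be modeled with a toy in-memory stand-in (hypothetical names, not the proposed YARN service): producers hand finished partitions to the shuffle server and can then be released, while consumers later fetch the data from the server rather than from the producer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an external shuffle server; byte arrays stand in for
// persisted intermediate result partition data.
class ToyShuffleServer {
    private final Map<String, List<byte[]>> partitions = new HashMap<>();

    // Steps (1)+(2): producer registers the persisted partition with the server,
    // after which the producer itself can be released.
    void registerPartition(String partitionId, List<byte[]> data) {
        partitions.put(partitionId, data);
    }

    // Step (3): consumer fetches from the shuffle server, not from the producer.
    List<byte[]> fetchPartition(String partitionId) {
        return partitions.getOrDefault(partitionId, new ArrayList<>());
    }
}
```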



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #15652: [FLINK-22305][network] Improve log messages of sort-merge blocking shuffle

2021-04-17 Thread GitBox


flinkbot commented on pull request #15652:
URL: https://github.com/apache/flink/pull/15652#issuecomment-821780333


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 38caca4f7b38bbc5811cd9ba5e890243e1709ce8 (Sat Apr 17 
07:11:55 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-22305).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-22305) Improve log messages of sort-merge blocking shuffle

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-22305:
---
Labels: pull-request-available  (was: )

> Improve log messages of sort-merge blocking shuffle
> ---
>
> Key: FLINK-22305
> URL: https://issues.apache.org/jira/browse/FLINK-22305
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Yingjie Cao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> Currently, the default value of taskmanager.network.sort-shuffle.min-buffers 
> is 64, which is pretty small. As suggested, we'd like to increase the default 
> value of taskmanager.network.sort-shuffle.min-buffers. By increasing the 
> default, the corner case of a very small in-memory sort buffer and write 
> buffer can be avoided, which is better for performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wsry opened a new pull request #15652: [FLINK-22305][network] Improve log messages of sort-merge blocking shuffle

2021-04-17 Thread GitBox


wsry opened a new pull request #15652:
URL: https://github.com/apache/flink/pull/15652


   ## What is the purpose of the change
   
   Improve log messages of sort-merge blocking shuffle.
   
   
   ## Brief change log
   
 - Improve log messages of sort-merge blocking shuffle.
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-22328) Failed to send data to Kafka: Producer attempted an operation with an old epoch

2021-04-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-22328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

孙峰 updated FLINK-22328:
---
Environment: 
kafka 3.0.0

Flink 1.11.1

  was:kafka 3.0.0

 Labels: Transactional kafka  (was: )

> Failed to send data to Kafka: Producer attempted an operation with an old 
> epoch
> ---
>
> Key: FLINK-22328
> URL: https://issues.apache.org/jira/browse/FLINK-22328
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.11.1
> Environment: kafka 3.0.0
> Flink 1.11.1
>Reporter: 孙峰
>Priority: Major
>  Labels: Transactional, kafka
>
> Flink job fails occasionally. Here is the stack trace:
> {code:java}
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to 
> send data to Kafka: Producer attempted an operation with an old epoch. Either 
> there is a newer producer with the same transactionalId, or the producer's 
> transaction has been expired by the broker.
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:640)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:157)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:81)
> at 
> org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.invoke(TwoPhaseCommitSinkFunction.java:235)
> at 
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
> 
> ...
> Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been 
> expired by the broker.{code}
> The job uses FlinkKafkaProducer with EXACTLY_ONCE and is deployed on YARN.
> In the debugging information I found the transactionalId is "Source: Custom 
> Source -> (Process -> Sink: errorMessageToKafka, Sink: etlMultiTopicSink) 
> -03f86923ea4164263684d81917202071-0".
> In the Kafka server.log, the exception is:
> {code:java}
> ERROR [ReplicaManager broker=1004] Error processing append on partition 
> ods_source-2 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is 
> no longer valid. There is probably another producer with a newer epoch. 158 
> (request epoch), 159 (server epoch){code}
> Here is the log where Kafka increases the epoch for this transactionalId "Source: 
> Custom Source -> (Process -> Sink: errorMessageToKafka, Sink: 
> etlMultiTopicSink) -03f86923ea4164263684d81917202071-0":
> {code:java}
> INFO [TransactionCoordinator id=1003] Initialized transactionalId Source: 
> Custom Source -> (Process -> Sink: errorMessageToKafka, Sink: 
> etlMultiTopicSink) -03f86923ea4164263684d81917202071-0 with producerId 21036 
> and producer epoch 158 on partition _transaction_state-3 
> (kafka.coordinator.transaction.TransactionCoordinator)
> INFO [TransactionCoordinator id=1003] Initialized transactionalId Source: 
> Custom Source -> (Process -> Sink: errorMessageToKafka, Sink: 
> etlMultiTopicSink) -03f86923ea4164263684d81917202071-0 with producerId 21036 
> and producer epoch 160 on partition _transaction_state-3 
> (kafka.coordinator.transaction.TransactionCoordinator)
> {code}
> There is no log entry recording that Kafka set the producer epoch to 159.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22328) Failed to send data to Kafka: Producer attempted an operation with an old epoch

2021-04-17 Thread Jira
孙峰 created FLINK-22328:
--

 Summary: Failed to send data to Kafka: Producer attempted an 
operation with an old epoch
 Key: FLINK-22328
 URL: https://issues.apache.org/jira/browse/FLINK-22328
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.11.1
 Environment: kafka 3.0.0
Reporter: 孙峰


Flink job fails occasionally. Here is the stacktrace:
{code:java}
org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send 
data to Kafka: Producer attempted an operation with an old epoch. Either there 
is a newer producer with the same transactionalId, or the producer's 
transaction has been expired by the broker.
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:640)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:157)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:81)
at 
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.invoke(TwoPhaseCommitSinkFunction.java:235)
at 
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)

...
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer 
attempted an operation with an old epoch. Either there is a newer producer with 
the same transactionalId, or the producer's transaction has been expired by 
the broker.{code}
The job uses FlinkKafkaProducer with EXACTLY_ONCE and is deployed on YARN.

In the debugging information I found the transactionalId is "Source: Custom 
Source -> (Process -> Sink: errorMessageToKafka, Sink: etlMultiTopicSink) 
-03f86923ea4164263684d81917202071-0".

In the Kafka server.log, the exception is:
{code:java}
ERROR [ReplicaManager broker=1004] Error processing append on partition 
ods_source-2 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no 
longer valid. There is probably another producer with a newer epoch. 158 
(request epoch), 159 (server epoch){code}
Here is the log where Kafka increases the epoch for this transactionalId "Source: 
Custom Source -> (Process -> Sink: errorMessageToKafka, Sink: 
etlMultiTopicSink) -03f86923ea4164263684d81917202071-0":
{code:java}
INFO [TransactionCoordinator id=1003] Initialized transactionalId Source: 
Custom Source -> (Process -> Sink: errorMessageToKafka, Sink: 
etlMultiTopicSink) -03f86923ea4164263684d81917202071-0 with producerId 21036 
and producer epoch 158 on partition _transaction_state-3 
(kafka.coordinator.transaction.TransactionCoordinator)
INFO [TransactionCoordinator id=1003] Initialized transactionalId Source: 
Custom Source -> (Process -> Sink: errorMessageToKafka, Sink: 
etlMultiTopicSink) -03f86923ea4164263684d81917202071-0 with producerId 21036 
and producer epoch 160 on partition _transaction_state-3 
(kafka.coordinator.transaction.TransactionCoordinator)
{code}
There is no log entry recording that Kafka set the producer epoch to 159.
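
For context, a frequent cause of this fencing pattern with EXACTLY_ONCE Flink Kafka sinks is a producer `transaction.timeout.ms` that is shorter than the maximum interval between checkpoint completions: the broker then expires the open transaction and bumps the producer epoch before Flink commits. Whether that is the cause of this report is not established. The sketch below only shows illustrative producer properties; the broker address and timeout value are assumptions, not recommendations:

```java
import java.util.Properties;

public class ProducerProps {
    // Illustrative properties for an EXACTLY_ONCE Kafka sink.
    // All values here are assumptions for the sketch.
    public static Properties kafkaProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // hypothetical address
        // The transaction timeout must comfortably exceed the maximum
        // checkpoint duration, otherwise the broker may expire the
        // transaction and fence the producer (bumping its epoch).
        props.setProperty("transaction.timeout.ms", "900000"); // 15 min, illustrative
        return props;
    }

    public static void main(String[] args) {
        System.out.println(kafkaProps().getProperty("transaction.timeout.ms"));
    }
}
```

With the Flink 1.11 API, such properties would be passed to the `FlinkKafkaProducer` constructor together with `FlinkKafkaProducer.Semantic.EXACTLY_ONCE`; note that the broker-side `transaction.max.timeout.ms` caps what the producer may request.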



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22305) Improve log messages of sort-merge blocking shuffle

2021-04-17 Thread Yingjie Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingjie Cao updated FLINK-22305:

Summary: Improve log messages of sort-merge blocking shuffle  (was: Improve 
log message of sort-merge blocking shuffle)

> Improve log messages of sort-merge blocking shuffle
> ---
>
> Key: FLINK-22305
> URL: https://issues.apache.org/jira/browse/FLINK-22305
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Yingjie Cao
>Priority: Major
> Fix For: 1.13.0
>
>
> Currently, the default value of taskmanager.network.sort-shuffle.min-buffers 
> is 64, which is pretty small. As suggested, we'd like to increase this 
> default: it avoids the corner case of a very small in-memory sort buffer 
> and write buffer, which is better for performance.
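
For reference, the option discussed above is set in flink-conf.yaml. The value below is illustrative only (the default in the versions discussed is 64), sketching what an increase would look like:

```yaml
# Illustrative value, not a recommendation. More buffers reduce the
# chance of a tiny in-memory sort buffer, at the cost of reserving
# more network memory per sort-merge result partition.
taskmanager.network.sort-shuffle.min-buffers: 512
```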



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22305) Improve log message of sort-merge blocking shuffle

2021-04-17 Thread Yingjie Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingjie Cao updated FLINK-22305:

Summary: Improve log message of sort-merge blocking shuffle  (was: Increase 
the default value of taskmanager.network.sort-shuffle.min-buffers)

> Improve log message of sort-merge blocking shuffle
> --
>
> Key: FLINK-22305
> URL: https://issues.apache.org/jira/browse/FLINK-22305
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Yingjie Cao
>Priority: Major
> Fix For: 1.13.0
>
>
> Currently, the default value of taskmanager.network.sort-shuffle.min-buffers 
> is 64, which is pretty small. As suggested, we'd like to increase this 
> default: it avoids the corner case of a very small in-memory sort buffer 
> and write buffer, which is better for performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22305) Increase the default value of taskmanager.network.sort-shuffle.min-buffers

2021-04-17 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324181#comment-17324181
 ] 

Yingjie Cao commented on FLINK-22305:
-

After some tests, I found that increasing this default value can hurt the 
out-of-the-box user experience of sort-merge blocking shuffle. We decided to 
only improve the log messages.

> Increase the default value of taskmanager.network.sort-shuffle.min-buffers
> --
>
> Key: FLINK-22305
> URL: https://issues.apache.org/jira/browse/FLINK-22305
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Yingjie Cao
>Priority: Major
> Fix For: 1.13.0
>
>
> Currently, the default value of taskmanager.network.sort-shuffle.min-buffers 
> is 64, which is pretty small. As suggested, we'd like to increase this 
> default: it avoids the corner case of a very small in-memory sort buffer 
> and write buffer, which is better for performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15651: [FLINK-22307][network] Increase the data writing cache size of sort-merge blocking shuffle

2021-04-17 Thread GitBox


flinkbot edited a comment on pull request #15651:
URL: https://github.com/apache/flink/pull/15651#issuecomment-821767454


   
   ## CI report:
   
   * 3d65a719c16d9aaf8a44f843e2e21d1719947559 UNKNOWN
   * c840d852f79ed239c164c4809f36529bed7ffe5d Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16700)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



