[jira] [Commented] (FLINK-13856) Reduce the delete file api when the checkpoint is completed

2021-04-20 Thread Andrew.D.lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325541#comment-17325541
 ] 

Andrew.D.lin commented on FLINK-13856:
--

[~sewen] Thanks for your careful reply.

Do we plan to do this in the next minor version of 1.13, or in 1.14? Knowing 
that would let me update the issue's expected fix version.

In addition, I noticed that we can get all the shared state through the 
[getSharedState|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-runtime/src/main/java/org/apache/flink/runtime/state/IncrementalRemoteKeyedStateHandle.java#L123]
 method. This way we could collect all the shared state for separate 
processing (discarding the shared state handles), and then directly and 
safely delete the exclusive folder.
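The idea in the comment above can be sketched as follows. This is a toy model 
with hypothetical names (CheckpointDiscard, discardShared), not Flink's actual 
CompletedCheckpoint API: shared handles are discarded one by one because newer 
checkpoints may still reference them, while the exclusive directory is removed 
with one recursive call instead of one delete call per file:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

/** Toy sketch (hypothetical names, not Flink's actual API) of the proposal:
 *  discard shared state handles individually, then delete the whole
 *  exclusive checkpoint directory with a single recursive call instead of
 *  one delete API call per exclusive file. */
public class CheckpointDiscard {

    /** Counts how many delete calls the storage layer receives. */
    static int deleteCalls = 0;

    static void discardShared(List<Path> sharedHandles) throws IOException {
        // Shared state may still be referenced by newer checkpoints,
        // so each handle must be discarded (or ref-counted) individually.
        for (Path p : sharedHandles) {
            Files.deleteIfExists(p);
            deleteCalls++;
        }
    }

    static void deleteExclusiveDirRecursively(Path dir) throws IOException {
        // One "recursive delete" of the subtree, counted as a single
        // logical call to the file system API. On object stores this is
        // typically a bulk operation and far cheaper than per-file deletes.
        try (var walk = Files.walk(dir)) {
            for (Path p : walk.sorted(Comparator.reverseOrder()).toList()) {
                Files.delete(p);
            }
        }
        deleteCalls++; // the whole subtree counts as one API call
    }

    public static void main(String[] args) throws IOException {
        Path ckp = Files.createTempDirectory("chk-42");
        Files.writeString(ckp.resolve("_metadata"), "meta");
        Files.writeString(ckp.resolve("op-state-1"), "x");
        Files.writeString(ckp.resolve("op-state-2"), "y");

        discardShared(List.of());            // nothing shared in this toy run
        deleteExclusiveDirRecursively(ckp);  // one logical call for 3 files

        System.out.println(Files.exists(ckp));
        System.out.println(deleteCalls);
    }
}
```

In the real implementation the "one call" would be the checkpoint storage 
location's recursive disposal, and the shared-state bookkeeping would stay 
with the SharedStateRegistry.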

> Reduce the delete file api when the checkpoint is completed
> ---
>
> Key: FLINK-13856
> URL: https://issues.apache.org/jira/browse/FLINK-13856
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.8.1, 1.9.0
>Reporter: Andrew.D.lin
>Assignee: Andrew.D.lin
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Attachments: after.png, before.png, 
> f6cc56b7-2c74-4f4b-bb6a-476d28a22096.png
>
>   Original Estimate: 48h
>  Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> When a new checkpoint is completed, an old checkpoint is deleted by 
> calling CompletedCheckpoint.discardOnSubsume().
> Deleting an old checkpoint follows these steps:
> 1. drop the metadata
> 2. discard private state objects
> 3. discard the location as a whole
> In some cases, would it be possible to delete the checkpoint folder 
> recursively with a single call?
> As far as I know, for a full (non-incremental) checkpoint it should be 
> possible to delete the folder directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wenlong88 opened a new pull request #15684: [FLINK-20899][table-planner-blink][hotfix] use cost factory in RelOptCluster to create HepPlanner.

2021-04-20 Thread GitBox


wenlong88 opened a new pull request #15684:
URL: https://github.com/apache/flink/pull/15684


   
   ## What is the purpose of the change
   Avoid a ClassCastException when calculating cost in the HepPlanner while 
the log level is set to TRACE.
   
   ## Brief change log
   Create the HepPlanner using the cost factory from the RelOptCluster.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage, 
and functionality of HepProgram is already validated by the existing planner 
test.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-19441) Performance regression on 24.09.2020

2021-04-20 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-19441.
--
Resolution: Cannot Reproduce

> Performance regression on 24.09.2020
> 
>
> Key: FLINK-19441
> URL: https://issues.apache.org/jira/browse/FLINK-19441
> Project: Flink
>  Issue Type: Bug
>Reporter: Arvid Heise
>Priority: Critical
>  Labels: pull-request-available
>
> A couple of benchmarks are showing a small performance regression on 
> 24.09.2020:
> http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=tupleKeyBy&env=2 (?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #14484: [FLINK-20722][hive] HiveTableSink should copy the record when convert…

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #14484:
URL: https://github.com/apache/flink/pull/14484#issuecomment-750730068


   
   ## CI report:
   
   * af484144ed52eb8074b78710915bff0fb049dc97 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16577)
 
   * d95bcdeacb60d70108bcb9fe880ad212e9f8a62e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15684: [FLINK-20899][table-planner-blink][hotfix] use cost factory in RelOptCluster to create HepPlanner.

2021-04-20 Thread GitBox


flinkbot commented on pull request #15684:
URL: https://github.com/apache/flink/pull/15684#issuecomment-823034191


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit d2e4faadb080c3c64fbae293f18cdaec3e0c3d95 (Tue Apr 20 
07:13:28 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from a reviewer.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15121:
URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240


   
   ## CI report:
   
   * cf26ce895a7956d258acc073817f578558e78227 UNKNOWN
   * 709c4110370b845f42d916a06e56c6026cf2fac8 UNKNOWN
   * 4ea99332e1997eebca1f3f0a9d9229b8265fe32c UNKNOWN
   * 8c146a5610a6438326789cf57390154e8f749329 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16832)
 
   * 599b7ee7a3505fcc875aafb75a2b061684881b74 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15657: [FLINK-22335][runtime][config] Increase default resource wait timeout for the adaptive scheduler.

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15657:
URL: https://github.com/apache/flink/pull/15657#issuecomment-822150567


   
   ## CI report:
   
   * 8932614d850db2c2295ad6906e61c61bd18cd381 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16835)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15658: [FLINK-18206][table-api / sql-client] Fix the incorrect timestamp display

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15658:
URL: https://github.com/apache/flink/pull/15658#issuecomment-822156568


   
   ## CI report:
   
   * 1a8bf8e0a3f13a0ab00b93ccdf4ee9ff94c61063 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16825)
 
   * afdd1ac85a3c04bcdb361597bab735d59cec19e6 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15668: [FLINK-21174][coordination] Optimize the performance of DefaultResourceAllocationStrategy

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15668:
URL: https://github.com/apache/flink/pull/15668#issuecomment-822310910


   
   ## CI report:
   
   * d2518709273629655be88001f8a2d2af71130b95 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16777)
 
   * 322e02f59c30488847a1973c15aeef3bfda04382 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16849)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15667: Reduce the delete file api when the checkpoint is completed. [FLINK-…

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15667:
URL: https://github.com/apache/flink/pull/15667#issuecomment-822292375


   
   ## CI report:
   
   * 19a17bf030e448ae4b68cf3b892366c23ae23397 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16848)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15670: [FLINK-22346][sql-client] Remove sql-client-defaults.yaml

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15670:
URL: https://github.com/apache/flink/pull/15670#issuecomment-822351871


   
   ## CI report:
   
   * 2bb46a5c653477296ba9ae26296c8a57c3826c9f Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16841)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-13856) Reduce the delete file api when the checkpoint is completed

2021-04-20 Thread Andrew.D.lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325549#comment-17325549
 ] 

Andrew.D.lin commented on FLINK-13856:
--

What I said above is wrong. The 
[getSharedState|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-runtime/src/main/java/org/apache/flink/runtime/state/IncrementalRemoteKeyedStateHandle.java#L123]
 method returns FileStateHandle, not IncrementalRemoteKeyedStateHandle (the 
shared state handle), so the method is easy to misunderstand and misuse.

So currently we can only collect IncrementalRemoteKeyedStateHandle (the only 
type of incremental state handle) for separate processing.

> Reduce the delete file api when the checkpoint is completed
> ---
>
> Key: FLINK-13856
> URL: https://issues.apache.org/jira/browse/FLINK-13856
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.8.1, 1.9.0
>Reporter: Andrew.D.lin
>Assignee: Andrew.D.lin
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Attachments: after.png, before.png, 
> f6cc56b7-2c74-4f4b-bb6a-476d28a22096.png
>
>   Original Estimate: 48h
>  Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> When a new checkpoint is completed, an old checkpoint is deleted by 
> calling CompletedCheckpoint.discardOnSubsume().
> Deleting an old checkpoint follows these steps:
> 1. drop the metadata
> 2. discard private state objects
> 3. discard the location as a whole
> In some cases, would it be possible to delete the checkpoint folder 
> recursively with a single call?
> As far as I know, for a full (non-incremental) checkpoint it should be 
> possible to delete the folder directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-21659) Running HA per-job cluster (rocks, incremental) end-to-end test fails

2021-04-20 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz reassigned FLINK-21659:


Assignee: Dawid Wysakowicz

> Running HA per-job cluster (rocks, incremental) end-to-end test fails
> -
>
> Key: FLINK-21659
> URL: https://issues.apache.org/jira/browse/FLINK-21659
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Assignee: Dawid Wysakowicz
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.13.0
>
> Attachments: screenshot-1.png
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14232&view=logs&j=4dd4dbdd-1802-5eb7-a518-6acd9d24d0fc&t=8d6b4dd3-4ca1-5611-1743-57a7d76b395a
> “Completed checkpoint” appears more than two times (42 times in 
> "flink-vsts-standalonejob-2-fv-az83-563.log") but the test still fails. 
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22369) RocksDB state backend might occur ClassNotFoundException when deserializing on TM side

2021-04-20 Thread Yun Tang (Jira)
Yun Tang created FLINK-22369:


 Summary: RocksDB state backend might occur ClassNotFoundException 
when deserializing on TM side
 Key: FLINK-22369
 URL: https://issues.apache.org/jira/browse/FLINK-22369
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Yun Tang
 Fix For: 1.13.0
 Attachments: image-2021-04-20-15-18-49-706.png

FLINK-19467 introduced the new {{EmbeddedRocksDBStateBackend}} and added a new 
interface, 
{{[EmbeddedRocksDBStateBackend#setLogger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L287]}},
 to ensure that users of the legacy {{RocksDBStateBackend}} see consistent 
logging.

However, this change introduced another non-transient 
{{[logger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L115]}}
 field, which is deserialized on the TM side first. If the client has a 
different log4j implementation, we might hit a ClassNotFoundException:
 !image-2021-04-20-15-18-49-706.png! 
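A minimal, self-contained illustration (toy classes, not Flink's actual 
EmbeddedRocksDBStateBackend) of why marking such a field {{transient}} is the 
usual fix: a transient field is excluded from the serialized bytes, so the 
deserializing side (standing in for the TM here) never needs the field's 
concrete class on its classpath:

```java
import java.io.*;

/** Toy illustration (not Flink's actual classes): Java serialization embeds
 *  the concrete class of every non-transient field, so whoever deserializes
 *  the object must have that class on its classpath. Marking the field
 *  transient keeps it out of the serialized form entirely. */
public class TransientLoggerDemo {

    static class Backend implements Serializable {
        // transient: excluded from serialization, so the deserializing side
        // never needs the originating side's logger implementation class.
        transient Object logger = new Object();
        String name = "embedded-rocksdb";
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Backend roundTripped = (Backend) deserialize(serialize(new Backend()));
        // The transient field comes back as null; the rest survives.
        System.out.println(roundTripped.logger);
        System.out.println(roundTripped.name);
    }
}
```

The transient field must then be re-initialized lazily on the receiving side, 
which is part of the extra code the follow-up comment complains about.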



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22369) RocksDB state backend might occur ClassNotFoundException when deserializing on TM side

2021-04-20 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-22369:
-
Description: 
FLINK-19467 introduced new {{EmbeddedRocksDBStateBackend}} and added new 
interface 
{{[EmbeddedRocksDBStateBackend#setLogger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L287]}}
 to ensures users of the legacy {{RocksDBStateBackend}} see consistent logging.

However, this change introduce another non transient 
{{[logger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L115]}}
 and it would be deserialized on TM side first. If the client has different 
log4j implementation from TM side, we might meet ClassNotFoundException:
 !image-2021-04-20-15-18-49-706.png!

  was:
FLINK-19467 introduced new {{EmbeddedRocksDBStateBackend}} and added new 
interface 
{{[EmbeddedRocksDBStateBackend#setLogger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L287]}}
 to ensures users of the legacy {{RocksDBStateBackend}} see consistent logging.

However, this change introduce another non transient 
{{[logger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L115]}}
 and it would be deserialized on TM side first. If the client has different 
log4j implementation, we might meet ClassNotFoundException:
 !image-2021-04-20-15-18-49-706.png! 


> RocksDB state backend might occur ClassNotFoundException when deserializing 
> on TM side
> --
>
> Key: FLINK-22369
> URL: https://issues.apache.org/jira/browse/FLINK-22369
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Yun Tang
>Priority: Blocker
> Fix For: 1.13.0
>
> Attachments: image-2021-04-20-15-18-49-706.png
>
>
> FLINK-19467 introduced new {{EmbeddedRocksDBStateBackend}} and added new 
> interface 
> {{[EmbeddedRocksDBStateBackend#setLogger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L287]}}
>  to ensures users of the legacy {{RocksDBStateBackend}} see consistent 
> logging.
> However, this change introduce another non transient 
> {{[logger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L115]}}
>  and it would be deserialized on TM side first. If the client has different 
> log4j implementation from TM side, we might meet ClassNotFoundException:
>  !image-2021-04-20-15-18-49-706.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22369) RocksDB state backend might occur ClassNotFoundException when deserializing on TM side

2021-04-20 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325553#comment-17325553
 ] 

Yun Tang commented on FLINK-22369:
--

[~sjwiesman], I already have a local PR to fix this. However, I wonder whether 
we really need this feature of ensuring that users of the legacy 
{{RocksDBStateBackend}} see consistent logging, since we would need to 
introduce more code and tests just to make such a small feature work. If we 
could simply remove this small feature, things would be much simpler.

> RocksDB state backend might occur ClassNotFoundException when deserializing 
> on TM side
> --
>
> Key: FLINK-22369
> URL: https://issues.apache.org/jira/browse/FLINK-22369
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Yun Tang
>Priority: Blocker
> Fix For: 1.13.0
>
> Attachments: image-2021-04-20-15-18-49-706.png
>
>
> FLINK-19467 introduced new {{EmbeddedRocksDBStateBackend}} and added new 
> interface 
> {{[EmbeddedRocksDBStateBackend#setLogger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L287]}}
>  to ensures users of the legacy {{RocksDBStateBackend}} see consistent 
> logging.
> However, this change introduce another non transient 
> {{[logger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L115]}}
>  and it would be deserialized on TM side first. If the client has different 
> log4j implementation from TM side, we might meet ClassNotFoundException:
>  !image-2021-04-20-15-18-49-706.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] xintongsong closed pull request #15657: [FLINK-22335][runtime][config] Increase default resource wait timeout for the adaptive scheduler.

2021-04-20 Thread GitBox


xintongsong closed pull request #15657:
URL: https://github.com/apache/flink/pull/15657


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-22179) Manual test time function changes and new time attributes support

2021-04-20 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-22179.
---
Resolution: Resolved

> Manual test time function changes and new time attributes support
> -
>
> Key: FLINK-22179
> URL: https://issues.apache.org/jira/browse/FLINK-22179
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.13.0
>Reporter: Leonard Xu
>Assignee: Jark Wu
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.13.0
>
> Attachments: time-functions-test.md
>
>
> * 1. Test that time functions return the expected values in both batch and 
> streaming mode; we can use the SQL client to test.
> - LOCALTIME
> - LOCALTIMESTAMP
> - CURRENT_DATE
> - CURRENT_TIME
> - CURRENT_TIMESTAMP
> - NOW()
> - PROCTIME()
> * 2. Test that the PROCTIME attribute works well in different cases, e.g. 
> `interval join`, `temporal join (lookup)`, `window`, and check that 
> sink.partition-commit.trigger using proctime behaves as expected in the 
> Hive/FileSystem sink 
> * 3. Test a ROWTIME attribute that uses TIMESTAMP_LTZ, covering `interval 
> join`, `temporal join`, and `window` scenarios
> * 4. Test that the `window` result is as expected in a daylight-saving 
> timezone



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-22176) Test SQL Client manually

2021-04-20 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-22176.
---
Resolution: Resolved

> Test SQL Client manually
> 
>
> Key: FLINK-22176
> URL: https://issues.apache.org/jira/browse/FLINK-22176
> Project: Flink
>  Issue Type: Test
>  Components: Table SQL / Client
>Affects Versions: 1.13.0
>Reporter: Shengkai Fang
>Assignee: Rui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.13.0
>
>
> Test SQL Client including
>  - SET/RESET runtime parameters, e.g. job name
>  - Use the SQL client to submit jobs via -i/-f
>  - Test statement sets and the new table option {{table.dml-sync}}
>  - Test executing a SQL file (containing SELECT) and redirecting the result 
> to a file
>  - Test that tableau/table/changelog modes print all result types correctly
>  - Test loading a savepoint to start the job
>  - Test starting with a YAML file and an init file
>  - Test running a job with a specified job name



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-22335) Increase default resource wait timeout for the adaptive scheduler

2021-04-20 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-22335.

Resolution: Done

Merged via
* master (1.14): 4a7db1c6b81d17eccb8ae582f0c94e006392cbec
* release-1.13: 5e336cf0f7e79f197e8e4c61568fc9c3d183fa8f

> Increase default resource wait timeout for the adaptive scheduler
> -
>
> Key: FLINK-22335
> URL: https://issues.apache.org/jira/browse/FLINK-22335
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration, Runtime / Coordination
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> As discussed in FLINK-22135, the current default value {{10s}} for 
> {{jobmanager.adaptive-scheduler.resource-wait-timeout}} is too short and can 
> easily lead to job failures when working with active resource managers. We'd 
> like to increase the default value to {{5min}}, aligning with 
> {{slot.request.timeout}}.
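For reference, the option discussed above is set in flink-conf.yaml; the 
snippet below shows the new default proposed in the issue (once the new 
default ships, no explicit entry is needed):

```yaml
# flink-conf.yaml: how long the adaptive scheduler waits for the required
# resources before failing the job. FLINK-22335 raises the default from
# 10 s to 5 min, aligning with slot.request.timeout.
jobmanager.adaptive-scheduler.resource-wait-timeout: 5 min
```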



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #14484: [FLINK-20722][hive] HiveTableSink should copy the record when convert…

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #14484:
URL: https://github.com/apache/flink/pull/14484#issuecomment-750730068


   
   ## CI report:
   
   * af484144ed52eb8074b78710915bff0fb049dc97 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16577)
 
   * d95bcdeacb60d70108bcb9fe880ad212e9f8a62e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16850)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15121:
URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240


   
   ## CI report:
   
   * cf26ce895a7956d258acc073817f578558e78227 UNKNOWN
   * 709c4110370b845f42d916a06e56c6026cf2fac8 UNKNOWN
   * 4ea99332e1997eebca1f3f0a9d9229b8265fe32c UNKNOWN
   * 8c146a5610a6438326789cf57390154e8f749329 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16832)
 
   * 599b7ee7a3505fcc875aafb75a2b061684881b74 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16851)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15658: [FLINK-18206][table-api / sql-client] Fix the incorrect timestamp display

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15658:
URL: https://github.com/apache/flink/pull/15658#issuecomment-822156568


   
   ## CI report:
   
   * 1a8bf8e0a3f13a0ab00b93ccdf4ee9ff94c61063 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16825)
 
   * afdd1ac85a3c04bcdb361597bab735d59cec19e6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16852)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15684: [FLINK-20899][table-planner-blink][hotfix] use cost factory in RelOptCluster to create HepPlanner.

2021-04-20 Thread GitBox


flinkbot commented on pull request #15684:
URL: https://github.com/apache/flink/pull/15684#issuecomment-823050966


   
   ## CI report:
   
   * d2e4faadb080c3c64fbae293f18cdaec3e0c3d95 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-22369) RocksDB state backend might occur ClassNotFoundException when deserializing on TM side

2021-04-20 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325553#comment-17325553
 ] 

Yun Tang edited comment on FLINK-22369 at 4/20/21, 7:39 AM:


[~sjwiesman], I wonder whether we really need this feature to ensure users of the 
legacy {{RocksDBStateBackend}} see consistent logging, since we would need to 
introduce additional code and tests just to make such a small feature work. If 
we could simply remove this small feature, things would be much simpler.


was (Author: yunta):
[~sjwiesman] , I already have local PR to fix this. However, I wonder do we 
really need this feature to ensure users of the legacy {{RocksDBStateBackend}} 
see consistent logging since we need to introduce other more code and test to 
just make such a small feature work. If we could just remove such small 
feature, things could be much simpler.

> RocksDB state backend might occur ClassNotFoundException when deserializing 
> on TM side
> --
>
> Key: FLINK-22369
> URL: https://issues.apache.org/jira/browse/FLINK-22369
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Yun Tang
>Priority: Blocker
> Fix For: 1.13.0
>
> Attachments: image-2021-04-20-15-18-49-706.png
>
>
> FLINK-19467 introduced the new {{EmbeddedRocksDBStateBackend}} and added a new 
> method 
> {{[EmbeddedRocksDBStateBackend#setLogger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L287]}}
>  to ensure users of the legacy {{RocksDBStateBackend}} see consistent 
> logging.
> However, this change introduced another non-transient 
> {{[logger|https://github.com/apache/flink/blob/24031e55e4cf35a5818db2e927e65b290a9b2aed/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java#L115]}}
>  field, which is deserialized on the TM side first. If the client has a different 
> log4j implementation from the TM side, we may hit a ClassNotFoundException:
>  !image-2021-04-20-15-18-49-706.png!
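The failure mode can be sketched without the actual Flink classes: Java serialization writes the concrete class of every non-transient field into the stream, so a logger created from the client's log4j binding must also be resolvable on the TM classpath. A minimal, self-contained sketch (class and field names below are illustrative stand-ins, not the real Flink or log4j types) showing that a `transient` field is simply never shipped:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative sketch, not the actual Flink code: a "state backend" whose
// logger reference is transient, so the client's logging classes never
// need to exist on the deserializing (TM) side.
public class TransientLoggerDemo {
    static class FakeLogger implements Serializable {
        final String name;
        FakeLogger(String name) { this.name = name; }
    }

    static class StateBackend implements Serializable {
        private static final long serialVersionUID = 1L;
        // transient: skipped by Java serialization, unlike the
        // non-transient field described in the issue
        transient FakeLogger logger = new FakeLogger("client-side");
        String name = "embedded-rocksdb";
    }

    static <T extends Serializable> T roundTrip(T obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) ois.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        StateBackend copy = roundTrip(new StateBackend());
        // the transient logger was never written to the stream,
        // so it comes back as null after deserialization
        System.out.println(copy.logger == null);  // true
        System.out.println(copy.name);            // embedded-rocksdb
    }
}
```

With a non-transient logger field, the same round trip would instead try to load the logger's concrete class on the reading side, which is where the ClassNotFoundException in the attached screenshot comes from.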



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong commented on pull request #15642: [FLINK-22159][FLINK-22302][docs][table] Restructure SQL "Queries" pages and add docs for the new window TVF based operations

2021-04-20 Thread GitBox


wuchong commented on pull request #15642:
URL: https://github.com/apache/flink/pull/15642#issuecomment-823060260


   Thanks @leonardBang for the review. I have addressed your comments. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-21986:
--
Affects Version/s: 1.13.0
   1.11.3

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>
> I ran a regular join job with flink_1.12.1 and found that taskmanager native 
> memory is not released timely after a restart caused by exceeding the 
> checkpoint tolerable failure threshold.
> *Problem job information:*
>  # The job first restarted because the checkpoint tolerable failure threshold was exceeded.
>  # Then the taskmanager was killed by YARN many times.
>  # In this case, TM heap is set to 7.68G, but all TM heap usage stays under 4.2G.
>  !image-2021-03-25-15-53-44-214.png|width=496,height=103!
>  # Non-heap size increases after restart, but stays under 160M.
>  
> !https://km.sankuai.com/api/file/cdn/706284607/716474606?contentType=1&isNewContent=false&isNewContent=false|width=493,height=102!
>  # Taskmanager process memory increases by 3-4G after restart (this figure shows 
> one of the taskmanagers).
>  !image-2021-03-25-16-07-29-083.png|width=493,height=107!
>  
> *My guess:*
> The [RocksDB 
> wiki|https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management]
>  mentions: "Many of the Java Objects used in the RocksJava API will be backed 
> by C++ objects for which the Java Objects have ownership. As C++ has no 
> notion of automatic garbage collection for its heap in the way that Java 
> does, we must explicitly free the memory used by the C++ objects when we are 
> finished with them."
> So, is it possible that RocksDBStateBackend does not call 
> AbstractNativeReference#close() to release the memory used by RocksDB C++ objects?
> *I made a change:*
>         Actively call System.gc() and System.runFinalization() every minute.
>  *And ran this test again:*
>  # Taskmanager process memory shows no obvious increase.
>  !image-2021-03-26-11-46-06-828.png|width=495,height=93!
>  # The job ran for several days and restarted many times, but no taskmanager was 
> killed by YARN like before.
>  
> *Summary:*
>  # First, some native memory is not released timely after restart in this situation.
>  # I guess it may be RocksDB C++ objects, but I have not checked the source 
> code of RocksDBStateBackend.
>  
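The RocksJava ownership rule quoted above boils down to: RocksJava wrapper objects extend `AbstractNativeReference`, which is `AutoCloseable`, and the native memory is only freed deterministically by `close()`. Relying on GC plus finalization (as the `System.gc()` workaround does) merely makes the finalizer run sooner. A self-contained sketch of the pattern, where `NativeHandle` is a stand-in for a RocksJava object and not the real `org.rocksdb` API:

```java
// Sketch of the RocksJava ownership pattern: the Java wrapper owns a
// "native" resource that only close() releases deterministically.
// NativeHandle is illustrative, not the real org.rocksdb classes.
public class NativeOwnershipDemo {
    static long nativeBytesInUse = 0;

    static class NativeHandle implements AutoCloseable {
        private final long bytes;
        private boolean closed = false;

        NativeHandle(long bytes) {
            this.bytes = bytes;
            nativeBytesInUse += bytes;      // "malloc" on the C++ side
        }

        @Override
        public void close() {
            if (!closed) {                  // idempotent, like close() on
                closed = true;              // AbstractNativeReference
                nativeBytesInUse -= bytes;  // "free" on the C++ side
            }
        }
    }

    public static void main(String[] args) {
        // try-with-resources releases the native memory at block exit,
        // with no dependency on GC or finalization timing
        try (NativeHandle column = new NativeHandle(64L << 20)) {
            System.out.println(nativeBytesInUse > 0);  // true
        }
        System.out.println(nativeBytesInUse);  // 0
    }
}
```

If the wrapper is instead abandoned to the garbage collector, the native allocation lives until a finalizer eventually runs, which matches the observed behavior of process memory only shrinking once `System.gc()`/`System.runFinalization()` are forced.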



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-21986:
--
Fix Version/s: 1.12.3
   1.13.0
   1.11.4

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22361) flink run -c not upload udf jar

2021-04-20 Thread todd (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325567#comment-17325567
 ] 

todd commented on FLINK-22361:
--

Thank you very much. I can run the program by using -yD 
yarn.ship-directories combined with the -C option. However, I think the 
documentation for these parameters is not detailed enough; I hope it can be improved.

> flink run -c not upload udf jar
> ---
>
> Key: FLINK-22361
> URL: https://issues.apache.org/jira/browse/FLINK-22361
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.11.1
>Reporter: todd
>Priority: Minor
>
> flink  run   \
> -m yarn-cluster \
> -C file:////flink-demo-1.0.jar \
> x
>  
> flink-demo-1.0.jar is not on the classpath, so a class-not-found error is thrown
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-21986:
--
Priority: Critical  (was: Major)

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-21986:
-

Assignee: Feifan Wang

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-22361) flink run -c not upload udf jar

2021-04-20 Thread todd (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

todd closed FLINK-22361.

Resolution: Invalid

> flink run -c not upload udf jar
> ---
>
> Key: FLINK-22361
> URL: https://issues.apache.org/jira/browse/FLINK-22361
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.11.1
>Reporter: todd
>Priority: Minor
>
> flink  run   \
> -m yarn-cluster \
> -C file:////flink-demo-1.0.jar \
> x
>  
> flink-demo-1.0.jar is not on the classpath, so a class-not-found error is thrown
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325570#comment-17325570
 ] 

Till Rohrmann commented on FLINK-21986:
---

Thanks for reporting this issue and analyzing it [~Feifan Wang]. From what you 
describe I believe that we should fix this problem for the upcoming release. cc 
[~dwysakowicz].

[~yunta] could you help review this change? If not, then we should find 
somebody else to take a look at the PR.

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong commented on pull request #15670: [FLINK-22346][sql-client] Remove sql-client-defaults.yaml

2021-04-20 Thread GitBox


wuchong commented on pull request #15670:
URL: https://github.com/apache/flink/pull/15670#issuecomment-823063644


   The build failed:
   
   ```
   Apr 20 06:11:54 Traceback (most recent call last):
   Apr 20 06:11:54   File "setup.py", line 225, in 
   Apr 20 06:11:54 copy_files(conf_paths, CONF_TEMP_PATH)
   Apr 20 06:11:54   File "setup.py", line 52, in copy_files
   Apr 20 06:11:54 dst_path = copy(src_path, os.path.join(output_directory, os.path.basename(src_path)))
   Apr 20 06:11:54   File "/__w/3/s/flink-python/dev/.conda/lib/python3.7/shutil.py", line 245, in copy
   Apr 20 06:11:54 copyfile(src, dst, follow_symlinks=follow_symlinks)
   Apr 20 06:11:54   File "/__w/3/s/flink-python/dev/.conda/lib/python3.7/shutil.py", line 120, in copyfile
   Apr 20 06:11:54 with open(src, 'rb') as fsrc:
   Apr 20 06:11:54 FileNotFoundError: [Errno 2] No such file or directory: '/__w/3/s/flink-dist/../flink-table/flink-sql-client/conf/'
   ```
   
   Maybe we should remove the "copy SQL client configuration files" entry in the 
dist assemblies. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-22370) ParquetColumnarRowSplitReader#reachedEnd() returns false after it returns true

2021-04-20 Thread Danny Chen (Jira)
Danny Chen created FLINK-22370:
--

 Summary: ParquetColumnarRowSplitReader#reachedEnd() returns false 
after it returns true
 Key: FLINK-22370
 URL: https://issues.apache.org/jira/browse/FLINK-22370
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.12.2, 1.13.1
Reporter: Danny Chen
 Fix For: 1.13.1


{{ParquetColumnarRowSplitReader#reachedEnd()}} should always return true after 
it has returned true for the first time.
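The contract can be kept with a sticky flag: once the reader observes end-of-input, `reachedEnd()` keeps returning true no matter how often it is called afterwards. A hedged sketch of the idea, with an iterator-backed reader standing in for the actual ParquetColumnarRowSplitReader (whose internals are not shown here):

```java
import java.util.Iterator;
import java.util.List;

// Sticky end-of-input sketch: once 'end' flips to true it never flips
// back, matching the contract described in this issue. Illustrative
// reader, not the real ParquetColumnarRowSplitReader.
public class StickyEndReader<T> {
    private final Iterator<T> records;
    private boolean end = false;
    private T next;

    public StickyEndReader(Iterator<T> records) {
        this.records = records;
    }

    public boolean reachedEnd() {
        if (end) {
            return true;               // sticky: never report "more data" again
        }
        if (next == null) {
            if (records.hasNext()) {
                next = records.next(); // buffer one record ahead
            } else {
                end = true;            // flips exactly once
            }
        }
        return end;
    }

    public T nextRecord() {
        if (reachedEnd()) {
            throw new IllegalStateException("end of input");
        }
        T record = next;
        next = null;
        return record;
    }

    public static void main(String[] args) {
        StickyEndReader<String> r = new StickyEndReader<>(List.of("a", "b").iterator());
        while (!r.reachedEnd()) {
            System.out.println(r.nextRecord());
        }
        // repeated calls after exhaustion must stay true
        System.out.println(r.reachedEnd() && r.reachedEnd());  // true
    }
}
```

A reader that re-evaluates the underlying split on every call can oscillate back to false; the sticky flag makes the answer monotone, which callers of `reachedEnd()` implicitly rely on.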



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong commented on a change in pull request #15595: [FLINK-22171][sql-client][docs] Update the SQL Client doc

2021-04-20 Thread GitBox


wuchong commented on a change in pull request #15595:
URL: https://github.com/apache/flink/pull/15595#discussion_r616439563



##
File path: docs/content/docs/dev/table/sqlClient.md
##
@@ -169,559 +177,523 @@ Mode "embedded" (default) submits Flink jobs from the local machine.
 
   Syntax: [embedded] [OPTIONS]
   "embedded" mode options:
- -d,--defaults   The environment properties with which
-   every new session is initialized.
-   Properties might be overwritten by
-   session properties.
- -e,--environmentThe environment properties to be
-   imported into the session. It might
-   overwrite default environment
-   properties.
- -h,--help Show the help message with
-   descriptions of all options.
- -hist,--historyThe file which you want to save the
-   command history into. If not
-   specified, we will auto-generate one
-   under your user's home directory.
- -j,--jarA JAR file to be imported into the
-   session. The file might contain
-   user-defined classes needed for the
-   execution of statements such as
-   functions, table sources, or sinks.
-   Can be used multiple times.
- -l,--library   A JAR file directory with which every
-   new session is initialized. The files
-   might contain user-defined classes
-   needed for the execution of
-   statements such as functions, table
-   sources, or sinks. Can be used
-   multiple times.
- -pyarch,--pyArchives Add python archive files for job. The
-   archive files will be extracted to
-   the working directory of python UDF
-   worker. Currently only zip-format is
-   supported. For each archive file, a
-   target directory can be specified. If the
-   target directory name is specified,
-   the archive file will be extracted to
-   a directory with the
-   specified name. Otherwise, the
-   archive file will be extracted to a
-   directory with the same name of the
-   archive file. The files uploaded via
-   this option are accessible via
-   relative path. '#' could be used as
-   the separator of the archive file
-   path and the target directory name.
-   Comma (',') could be used as the
-   separator to specify multiple archive
-   files. This option can be used to
-   upload the virtual environment, the
-   data files used in Python UDF (e.g.:
-   --pyArchives
-   file:///tmp/py37.zip,file:///tmp/data
-   .zip#data --pyExecutable
-   py37.zip/py37/bin/python). The data
-   files could be accessed in Python
-   UDF, e.g.: f = open('data/data.txt',
-   'r').
- -pyexec,--pyExecutable   Specify the path of the python
-   interpreter used to execute the
-   python UDF worker (e.g.:
-   --pyExecutable
-   /usr/local/bin/python3). The python
-   UDF worker depends on Python 3.6+,
-

[jira] [Commented] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325573#comment-17325573
 ] 

Till Rohrmann commented on FLINK-21986:
---

The change is so simple that I can review and merge it. I will take care of it.

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325575#comment-17325575
 ] 

Yun Tang commented on FLINK-21986:
--

Sorry for the late reply, and glad to see you finally figured out the root cause, 
[~Feifan Wang]. Great work! 

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>
> I run a regular join job with flink_1.12.1 , and find taskmanager native 
> memory not release timely after restart cause by exceeded checkpoint 
> tolerable failure threshold.
> *problem job information:*
>  # job first restart cause by exceeded checkpoint tolerable failure threshold.
>  # then taskmanager be killed by yarn many times
>  # in this case,tm heap is set to 7.68G,bug all tm heap size is under 4.2G
>  !image-2021-03-25-15-53-44-214.png|width=496,height=103!
>  # nonheap size increase after restart,but still under 160M.
>  
> !https://km.sankuai.com/api/file/cdn/706284607/716474606?contentType=1&isNewContent=false&isNewContent=false|width=493,height=102!
>  # taskmanager process memory increase 3-4G after restart(this figure show 
> one of taskmanager)
>  !image-2021-03-25-16-07-29-083.png|width=493,height=107!
>  
> *my guess:*
> [RocksDB 
> wiki|https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management]
>  mentioned :Many of the Java Objects used in the RocksJava API will be backed 
> by C++ objects for which the Java Objects have ownership. As C++ has no 
> notion of automatic garbage collection for its heap in the way that Java 
> does, we must explicitly free the memory used by the C++ objects when we are 
> finished with them.
> So, is it possible that RocksDBStateBackend not call 
> AbstractNativeReference#close() to release memory use by RocksDB C++ Object ?
> *I made a change:*
>         Actively call System.gc() and System.runFinalization() every minute.
>  *And ran this test again:*
>  # taskmanager process memory showed no obvious increase
>  !image-2021-03-26-11-46-06-828.png|width=495,height=93!
>  # the job ran for several days and restarted many times, but no taskmanager was 
> killed by yarn as before
>  
> *Summary:*
>  # first, some native memory is not released in a timely manner after a restart 
> in this situation
>  # I guess it may be RocksDB C++ objects, but I have not verified this in the 
> source code of RocksDBStateBackend
>  
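The ownership pattern the RocksJava wiki describes can be sketched in plain Java. The class below is a hypothetical stand-in (`NativeHandle` is not a RocksDB class): a wrapper that owns native memory must be closed explicitly, because waiting for GC and finalization leaves the C++ allocation alive for an unbounded time, which matches the symptom described above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NativeHandleDemo {
    // Stands in for the bytes held on the C++ heap by Java wrappers.
    public static final AtomicInteger liveNativeObjects = new AtomicInteger();

    // Hypothetical stand-in for a RocksJava object backed by native memory
    // (in RocksJava these extend AbstractNativeReference).
    public static class NativeHandle implements AutoCloseable {
        private boolean closed;

        public NativeHandle() {
            liveNativeObjects.incrementAndGet(); // "allocate" native memory
        }

        @Override
        public void close() {
            // Explicit, deterministic release: does not wait for GC/finalization.
            if (!closed) {
                closed = true;
                liveNativeObjects.decrementAndGet();
            }
        }
    }
}
```

Using try-with-resources gives deterministic release of the native allocation; a handle that is only dropped for the GC to collect is exactly the kind of leak the reporter suspects.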



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #14880: [FLINK-21302][table-planner-blink] Fix NPE when use row_number() in over agg

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #14880:
URL: https://github.com/apache/flink/pull/14880#issuecomment-774029330


   
   ## CI report:
   
   * ebe907b1a908c981b23450d81f95350f43826c09 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16834)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14894: [FLINK-20895][flink-table-planner-blink] support local aggregate push down in blink planner

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #14894:
URL: https://github.com/apache/flink/pull/14894#issuecomment-774905432


   
   ## CI report:
   
   * e6b4615624d467a6df5fdd98d00f4c19a4973800 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16838)
 
   
   
   






[GitHub] [flink] flinkbot edited a comment on pull request #15684: [FLINK-20899][table-planner-blink][hotfix] use cost factory in RelOptCluster to create HepPlanner.

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15684:
URL: https://github.com/apache/flink/pull/15684#issuecomment-823050966


   
   ## CI report:
   
   * d2e4faadb080c3c64fbae293f18cdaec3e0c3d95 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16855)
 
   
   
   






[jira] [Assigned] (FLINK-3089) State API Should Support Data Expiration (State TTL)

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-3089:


Assignee: (was: Andrey Zagrebin)

> State API Should Support Data Expiration (State TTL)
> 
>
> Key: FLINK-3089
> URL: https://issues.apache.org/jira/browse/FLINK-3089
> Project: Flink
>  Issue Type: New Feature
>  Components: API / DataStream, Runtime / State Backends
>Reporter: Niels Basjes
>Priority: Major
>  Labels: stale-assigned
>
> In some use cases (web analytics) there is a need to have a state per visitor 
> on a website (i.e. keyBy(sessionid) ).
> At some point the visitor simply leaves and no longer creates new events (so 
> a special 'end of session' event will not occur).
> The only way to determine that a visitor has left is by choosing a timeout, 
> like "After 30 minutes no events we consider the visitor 'gone'".
> Only after this (chosen) timeout has expired should we discard this state.
> In the Trigger part of Windows we can set a timer and close/discard this kind 
> of information. But that introduces the buffering effect of the window (which 
> in some scenarios is unwanted).
> What I would like is to be able to set a timeout on a specific state which I 
> can update afterwards.
> This makes it possible to create a map function that assigns the right value 
> and that discards the state automatically.
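The primitive requested here can be sketched without windows: keyed state that carries a last-update timestamp and is discarded after a period of inactivity. All names below are illustrative (this is not a Flink API; Flink later resolved this issue with built-in state TTL), but the sketch shows the idea of a timeout that is refreshed on every update:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-key state that expires a fixed timeout after its last
// update, without buffering events in a window. Names are illustrative.
public class TtlStateDemo {
    public static final long TTL_MS = 30 * 60 * 1000L; // "30 minutes of inactivity"

    public static class Entry {
        public long value;
        public long lastUpdate;
    }

    public final Map<String, Entry> state = new HashMap<>();

    // Updating the state also refreshes its expiration time.
    public void update(String key, long value, long nowMs) {
        Entry e = state.computeIfAbsent(key, k -> new Entry());
        e.value = value;
        e.lastUpdate = nowMs;
    }

    // Periodically discard visitors that have been silent longer than the TTL.
    public void expire(long nowMs) {
        state.values().removeIf(e -> nowMs - e.lastUpdate >= TTL_MS);
    }
}
```

A visitor who keeps producing events keeps refreshing `lastUpdate` and survives; one who leaves is dropped at the next expiration sweep, without any end-of-session event.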





[jira] [Assigned] (FLINK-6800) PojoSerializer ignores added pojo fields

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-6800:


Assignee: (was: boshu Zheng)

> PojoSerializer ignores added pojo fields
> 
>
> Key: FLINK-6800
> URL: https://issues.apache.org/jira/browse/FLINK-6800
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Priority: Major
>  Labels: stale-assigned
>
> The {{PojoSerializer}} contains a list of pojo fields which are represented 
> as {{Field}} instances. Upon serialization the names of these fields are 
> serialized. When being deserialized these names are used to look up the 
> respective {{Fields}} of a dynamically loaded class. If the dynamically 
> loaded class has additional fields (compared to when the serializer was 
> serialized), then these fields will be ignored (for the read and for the 
> write path). While this is necessary to read stored data, it is dangerous 
> when writing new data, because all newly added fields won't be serialized. 
> This subtlety is really hard for the user to detect. Therefore, I think we 
> should eagerly fail if the newly loaded type contains new fields which 
> were not present before.
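The restore-time lookup described above can be illustrated with plain reflection (all names below are hypothetical, not the actual `PojoSerializer` code): the serializer knows only the field names it stored, so a field added to the class since the snapshot is never resolved and silently drops out of both the read and the write path.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class FieldLookupDemo {
    // Hypothetical POJO: field 'b' was added AFTER the serializer snapshot was taken.
    public static class PojoV2 {
        public int a;
        public int b;
    }

    // Resolve the stored field names against the (possibly newer) class, as
    // the deserialized serializer does. Fields not in 'storedNames' are ignored.
    public static List<Field> resolveFields(Class<?> clazz, List<String> storedNames)
            throws NoSuchFieldException {
        List<Field> resolved = new ArrayList<>();
        for (String name : storedNames) {
            resolved.add(clazz.getField(name));
        }
        return resolved;
    }
}
```

Because `b` exists on the class but not in the stored name list, anything written through the restored serializer loses `b` without any error, which is why failing eagerly is the safer behavior.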





[GitHub] [flink] XComp commented on a change in pull request #15020: [FLINK-21445][kubernetes] Application mode does not set the configuration when building PackagedProgram

2021-04-20 Thread GitBox


XComp commented on a change in pull request #15020:
URL: https://github.com/apache/flink/pull/15020#discussion_r616071812



##
File path: 
flink-clients/src/main/java/org/apache/flink/client/deployment/application/PythonBasedPackagedProgramRetriever.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.client.deployment.application;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.client.program.PackagedProgram;
+import org.apache.flink.client.program.PackagedProgramRetriever;
+import org.apache.flink.client.program.PackagedProgramUtils;
+import org.apache.flink.client.program.ProgramInvocationException;
+import org.apache.flink.configuration.Configuration;
+
+import javax.annotation.Nullable;
+
+import java.io.File;
+import java.io.IOException;
+
+/**
+ * A python based {@link PackagedProgramRetriever} which creates the {@link 
PackagedProgram} when
+ * program arguments contain "-py/--python" or "-pym/--pyModule".

Review comment:
   The JavaDoc should only describe what the class does. The decision about 
when to use this class (i.e. "when program arguments contain [...]") is made 
outside of it, in what's currently called 
`PackagedProgramRetrieverAdapter.Builder`.

##
File path: 
flink-clients/src/main/java/org/apache/flink/client/deployment/application/AbstractPackagedProgramRetriever.java
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.client.deployment.application;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.client.program.PackagedProgram;
+import org.apache.flink.client.program.PackagedProgramRetriever;
+import org.apache.flink.client.program.ProgramInvocationException;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.FileUtils;
+import org.apache.flink.util.FlinkException;
+import org.apache.flink.util.Preconditions;
+import org.apache.flink.util.function.FunctionUtils;
+
+import javax.annotation.Nullable;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URL;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.stream.Collectors;
+
+/**
+ * An abstract {@link PackagedProgramRetriever} which creates the {@link 
PackagedProgram} containing
+ * the user's {@code main()} from a class on the class path.
+ */
+@Internal
+public abstract class AbstractPackagedProgramRetriever implements 
PackagedProgramRetriever {
+
+private final String[] programArguments;
+
+private final Configuration configuration;
+
+/** User class paths in relative form to the working directory. */
+protected final Collection<URL> userClasspaths;
+
+AbstractPackagedProgramRetriever(
+String[] programArguments, Configuration configuration, @Nullable 
File userLibDirectory)
+throws IOException {
+this.programArguments = Preconditions.checkNotNull(programArguments);
+this.configuration = Preconditions.checkNotNull(configuration);
+this.userClasspaths = discoverUserClasspaths(userLibDirectory);
+}
+
+private static Collection<URL> discoverUserClasspaths(@Nullable File jobDir)

Review comment:
   ```suggestion
   private static List<URL> discoverUserClasspaths(@Nullable File jobDir)
   ```
   This would enable us to no

[jira] [Assigned] (FLINK-3089) State API Should Support Data Expiration (State TTL)

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-3089:


Assignee: Andrey Zagrebin

> State API Should Support Data Expiration (State TTL)
> 
>
> Key: FLINK-3089
> URL: https://issues.apache.org/jira/browse/FLINK-3089
> Project: Flink
>  Issue Type: New Feature
>  Components: API / DataStream, Runtime / State Backends
>Reporter: Niels Basjes
>Assignee: Andrey Zagrebin
>Priority: Major
>  Labels: stale-assigned
>
> In some use cases (web analytics) there is a need to have a state per visitor 
> on a website (i.e. keyBy(sessionid) ).
> At some point the visitor simply leaves and no longer creates new events (so 
> a special 'end of session' event will not occur).
> The only way to determine that a visitor has left is by choosing a timeout, 
> like "After 30 minutes no events we consider the visitor 'gone'".
> Only after this (chosen) timeout has expired should we discard this state.
> In the Trigger part of Windows we can set a timer and close/discard this kind 
> of information. But that introduces the buffering effect of the window (which 
> in some scenarios is unwanted).
> What I would like is to be able to set a timeout on a specific state which I 
> can update afterwards.
> This makes it possible to create a map function that assigns the right value 
> and that discards the state automatically.





[jira] [Closed] (FLINK-3089) State API Should Support Data Expiration (State TTL)

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-3089.

Fix Version/s: 1.8.4
   Resolution: Fixed

Flink now supports state TTL based on processing time.

> State API Should Support Data Expiration (State TTL)
> 
>
> Key: FLINK-3089
> URL: https://issues.apache.org/jira/browse/FLINK-3089
> Project: Flink
>  Issue Type: New Feature
>  Components: API / DataStream, Runtime / State Backends
>Reporter: Niels Basjes
>Assignee: Andrey Zagrebin
>Priority: Major
>  Labels: stale-assigned
> Fix For: 1.8.4
>
>
> In some use cases (web analytics) there is a need to have a state per visitor 
> on a website (i.e. keyBy(sessionid) ).
> At some point the visitor simply leaves and no longer creates new events (so 
> a special 'end of session' event will not occur).
> The only way to determine that a visitor has left is by choosing a timeout, 
> like "After 30 minutes no events we consider the visitor 'gone'".
> Only after this (chosen) timeout has expired should we discard this state.
> In the Trigger part of Windows we can set a timer and close/discard this kind 
> of information. But that introduces the buffering effect of the window (which 
> in some scenarios is unwanted).
> What I would like is to be able to set a timeout on a specific state which I 
> can update afterwards.
> This makes it possible to create a map function that assigns the right value 
> and that discards the state automatically.





[GitHub] [flink] tillrohrmann closed pull request #15619: [FLINK-21986][State Backends] fix native memory used by RocksDB not be released timely after job restart

2021-04-20 Thread GitBox


tillrohrmann closed pull request #15619:
URL: https://github.com/apache/flink/pull/15619


   






[jira] [Closed] (FLINK-6802) PojoSerializer does not create ConvertDeserializer for removed/added fields

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-6802.

  Assignee: (was: Tzu-Li (Gordon) Tai)
Resolution: Won't Do

Won't do at the moment since this feature is not really asked for.

> PojoSerializer does not create ConvertDeserializer for removed/added fields
> ---
>
> Key: FLINK-6802
> URL: https://issues.apache.org/jira/browse/FLINK-6802
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Priority: Major
>  Labels: stale-assigned
>
> When calling {{PojoSerializer#ensureCompatibility}}, the PojoSerializer 
> checks for compatibility. Currently, the method only constructs a 
> ConvertDeserializer if the number of old and new pojo fields is exactly the 
> same. However, given the {{TypeSerializerConfigurationSnapshots}} and the 
> current set of fields, it should also be possible to construct a 
> ConvertDeserializer if new fields were added or old fields removed from the 
> Pojo. I think that we should add this functionality.





[GitHub] [flink] wuchong commented on a change in pull request #15678: [FLINK-22349][table-api] Throw Exception for unsupported Zone ID instead of using wrong value.

2021-04-20 Thread GitBox


wuchong commented on a change in pull request #15678:
URL: https://github.com/apache/flink/pull/15678#discussion_r616455314



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/assigners/SlidingEventTimeWindows.java
##
@@ -123,9 +123,9 @@ public static SlidingEventTimeWindows of(Time size, Time 
slide) {
  * windows start at 0:15:00,1:15:00,2:15:00,etc.
  *
  * Rather than that,if you are living in somewhere which is not using 
UTC±00:00 time, such as
- * China which is using UTC+08:00,and you want a time window with size of 
one day, and window
+ * China which is using GMT+08:00,and you want a time window with size of 
one day, and window

Review comment:
   revert changes to `flink-streaming-java` module. 

##
File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/TableConfig.java
##
@@ -113,6 +113,15 @@ public void setSqlDialect(SqlDialect sqlDialect) {
  */
 public ZoneId getLocalTimeZone() {
 String zone = 
configuration.getString(TableConfigOptions.LOCAL_TIME_ZONE);
+if (zone.startsWith("UTC+") || zone.startsWith("UTC-")) {

Review comment:
   Have a shared private method for them?

##
File path: 
flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/api/TableConfigTest.java
##
@@ -63,6 +68,26 @@ public void testSetAndGetLocalTimeZone() {
 assertEquals(ZoneId.of("Asia/Shanghai"), 
configByConfiguration.getLocalTimeZone());
 }
 
+@Test
+public void testSetInvalidLocalTimeZone() {
+expectedException.expectMessage(
+"The supported Zone ID is either an abbreviation such as 
'PST',"
++ " a full name such as 'America/Los_Angeles', or a 
custom timezone id such as 'GMT-8:00',"
++ " but configured Zone ID is 'UTC-10:00'.");
+configByMethod.setLocalTimeZone(ZoneId.of("UTC-10:00"));
+}
+
+@Test
+public void testGetInvalidLocalTimeZone() {
+expectedException.expectMessage(

Review comment:
   move this after `configByConfiguration.addConfiguration(configuration);`
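A shared helper of the kind suggested above could look like the sketch below (names are illustrative, not Flink's actual code). Note that `java.time` itself happily parses "UTC-10:00" as an offset-style zone id, so an explicit guard is needed to reject it:

```java
import java.time.ZoneId;

public class ZoneValidationDemo {
    // Sketch of a shared validation method: reject "UTC+..."/"UTC-..." ids,
    // which java.time would parse but which are not supported here.
    public static ZoneId parseSessionTimeZone(String zone) {
        if (zone.startsWith("UTC+") || zone.startsWith("UTC-")) {
            throw new IllegalArgumentException(
                    "Unsupported Zone ID '" + zone + "'.");
        }
        return ZoneId.of(zone);
    }
}
```

Calling the same helper from both the setter and the getter (as the review asks) keeps the two validation paths from drifting apart.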








[jira] [Assigned] (FLINK-6131) Add side inputs for DataStream API

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-6131:


Assignee: (was: Aljoscha Krettek)

> Add side inputs for DataStream API
> --
>
> Key: FLINK-6131
> URL: https://issues.apache.org/jira/browse/FLINK-6131
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Aljoscha Krettek
>Priority: Major
>  Labels: stale-assigned
>
> This is an umbrella issue for tracking the implementation of FLIP-17: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API.





[jira] [Closed] (FLINK-5015) Add Tests/ITCase for Kafka Per-Partition Watermarks

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-5015.

  Assignee: (was: Tzu-Li (Gordon) Tai)
Resolution: Abandoned

This issue has been abandoned.

> Add Tests/ITCase for Kafka Per-Partition Watermarks
> ---
>
> Key: FLINK-5015
> URL: https://issues.apache.org/jira/browse/FLINK-5015
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common, Connectors / Kafka
>Reporter: Aljoscha Krettek
>Priority: Major
>  Labels: stale-assigned
>






[jira] [Closed] (FLINK-4175) Broadcast data sent increases with # slots per TM

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-4175.

Resolution: Abandoned

This issue has been abandoned.

> Broadcast data sent increases with # slots per TM
> -
>
> Key: FLINK-4175
> URL: https://issues.apache.org/jira/browse/FLINK-4175
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Affects Versions: 1.0.3
>Reporter: Felix Neutatz
>Priority: Major
>  Labels: performance, stale-assigned
>
> Problem:
> we experience some unexpected increase of data sent over the network for 
> broadcasts with increasing number of slots per Taskmanager.
> We provided a benchmark [1]. It not only increases the size of data sent over 
> the network but also hurts performance, as seen in the preliminary results 
> below. In these results, cloud-11 has 25 nodes and ibm-power has 8 nodes, with 
> the number of slots per node scaled from 1 to 16.
> +-----------------------+--------------+-------------+
> | suite                 | name         | median_time |
> +=======================+==============+=============+
> | broadcast.cloud-11    | broadcast.01 |        8796 |
> | broadcast.cloud-11    | broadcast.02 |       14802 |
> | broadcast.cloud-11    | broadcast.04 |       30173 |
> | broadcast.cloud-11    | broadcast.08 |       56936 |
> | broadcast.cloud-11    | broadcast.16 |      117507 |
> | broadcast.ibm-power-1 | broadcast.01 |        6807 |
> | broadcast.ibm-power-1 | broadcast.02 |        8443 |
> | broadcast.ibm-power-1 | broadcast.04 |       11823 |
> | broadcast.ibm-power-1 | broadcast.08 |       21655 |
> | broadcast.ibm-power-1 | broadcast.16 |       37426 |
> +-----------------------+--------------+-------------+
> After looking into the code base, it seems that the data is de-serialized 
> only once per TM, but the actual data is sent for all slots running the 
> operator with broadcast vars and just gets discarded if it has already been 
> de-serialized.
> We do not see a reason the data can't be shared among the slots of a TM and 
> therefore sent just once.
> [1] https://github.com/TU-Berlin-DIMA/flink-broadcast
> This Jira will continue the discussion started here: 
> https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3c1465386300767.94...@tu-berlin.de%3E





[jira] [Assigned] (FLINK-4175) Broadcast data sent increases with # slots per TM

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-4175:


Assignee: (was: Felix Neutatz)

> Broadcast data sent increases with # slots per TM
> -
>
> Key: FLINK-4175
> URL: https://issues.apache.org/jira/browse/FLINK-4175
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Affects Versions: 1.0.3
>Reporter: Felix Neutatz
>Priority: Major
>  Labels: performance, stale-assigned
>
> Problem:
> we experience some unexpected increase of data sent over the network for 
> broadcasts with increasing number of slots per Taskmanager.
> We provided a benchmark [1]. It not only increases the size of data sent over 
> the network but also hurts performance, as seen in the preliminary results 
> below. In these results, cloud-11 has 25 nodes and ibm-power has 8 nodes, with 
> the number of slots per node scaled from 1 to 16.
> +-----------------------+--------------+-------------+
> | suite                 | name         | median_time |
> +=======================+==============+=============+
> | broadcast.cloud-11    | broadcast.01 |        8796 |
> | broadcast.cloud-11    | broadcast.02 |       14802 |
> | broadcast.cloud-11    | broadcast.04 |       30173 |
> | broadcast.cloud-11    | broadcast.08 |       56936 |
> | broadcast.cloud-11    | broadcast.16 |      117507 |
> | broadcast.ibm-power-1 | broadcast.01 |        6807 |
> | broadcast.ibm-power-1 | broadcast.02 |        8443 |
> | broadcast.ibm-power-1 | broadcast.04 |       11823 |
> | broadcast.ibm-power-1 | broadcast.08 |       21655 |
> | broadcast.ibm-power-1 | broadcast.16 |       37426 |
> +-----------------------+--------------+-------------+
> After looking into the code base, it seems that the data is de-serialized 
> only once per TM, but the actual data is sent for all slots running the 
> operator with broadcast vars and just gets discarded if it has already been 
> de-serialized.
> We do not see a reason the data can't be shared among the slots of a TM and 
> therefore sent just once.
> [1] https://github.com/TU-Berlin-DIMA/flink-broadcast
> This Jira will continue the discussion started here: 
> https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3c1465386300767.94...@tu-berlin.de%3E





[jira] [Closed] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-3257.

  Assignee: (was: Paris Carbone)
Resolution: Abandoned

This issue has been abandoned.

> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> ---
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Paris Carbone
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work along 
> the following lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start 
> block output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but 
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603
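Steps 1-3 above can be sketched as follows. This is an illustrative model (the class and method names are not Flink's): while a snapshot is in flight, the iteration head logs every record arriving on the back-edge; the barrier travelling around the loop closes the snapshot, after which the logged records are re-emitted in FIFO order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of back-edge logging for ABS on cyclic graphs. Names are illustrative.
public class IterationHeadSketch {
    private boolean snapshotting;
    private final Deque<String> backEdgeLog = new ArrayDeque<>();
    public final List<String> emitted = new ArrayList<>();

    // Step 1: barrier from the TaskManager - start logging back-edge records.
    public void onBarrierFromTaskManager() {
        snapshotting = true;
    }

    // Records arriving via the back-edge: logged while snapshotting,
    // otherwise processed (here: emitted) immediately.
    public void onBackEdgeRecord(String record) {
        if (snapshotting) {
            backEdgeLog.addLast(record);
        } else {
            emitted.add(record);
        }
    }

    // Step 3: barrier returned via the iteration sink - finalize the snapshot,
    // then re-emit the in-transit records in FIFO order and resume.
    public List<String> onBarrierFromSink() {
        snapshotting = false;
        List<String> snapshot = new ArrayList<>(backEdgeLog);
        while (!backEdgeLog.isEmpty()) {
            emitted.add(backEdgeLog.pollFirst());
        }
        return snapshot; // what would be persisted with the checkpoint
    }
}
```

On restore, the persisted in-transit records would be injected ahead of the regular input, exactly as the recovery paragraph above describes.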





[GitHub] [flink] tillrohrmann closed pull request #6136: [FLINK-4303] [CEP] Add CEP examples

2021-04-20 Thread GitBox


tillrohrmann closed pull request #6136:
URL: https://github.com/apache/flink/pull/6136


   






[GitHub] [flink] tillrohrmann commented on pull request #6136: [FLINK-4303] [CEP] Add CEP examples

2021-04-20 Thread GitBox


tillrohrmann commented on pull request #6136:
URL: https://github.com/apache/flink/pull/6136#issuecomment-823084718


   Closing because the PR is outdated.






[jira] [Updated] (FLINK-4303) Add CEP examples

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-4303:
--
Labels: pull-request-available stale-assigned  (was: stale-assigned)

> Add CEP examples
> 
>
> Key: FLINK-4303
> URL: https://issues.apache.org/jira/browse/FLINK-4303
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / CEP
>Affects Versions: 1.1.0
>Reporter: Timo Walther
>Assignee: boshu Zheng
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> Neither CEP Java nor CEP Scala contain a runnable example. The example on the 
> website is also not runnable without adding some additional code.





[jira] [Closed] (FLINK-4303) Add CEP examples

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-4303.

  Assignee: (was: boshu Zheng)
Resolution: Abandoned

This issue has been abandoned.

> Add CEP examples
> 
>
> Key: FLINK-4303
> URL: https://issues.apache.org/jira/browse/FLINK-4303
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / CEP
>Affects Versions: 1.1.0
>Reporter: Timo Walther
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> Neither CEP Java nor CEP Scala contain a runnable example. The example on the 
> website is also not runnable without adding some additional code.





[jira] [Closed] (FLINK-4419) Batch improvement for supporting dfs as a ResultPartitionType

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-4419.

Resolution: Won't Do

Efforts have stopped.

> Batch improvement for supporting dfs as a ResultPartitionType
> -
>
> Key: FLINK-4419
> URL: https://issues.apache.org/jira/browse/FLINK-4419
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataSet
>Reporter: shuai.xu
>Assignee: shuai.xu
>Priority: Major
>  Labels: stale-assigned
>
> This is the root issue to track an improvement for batch, which will enable 
> dfs as a ResultPartitionType, so that upstream nodes can exit entirely after 
> finishing and need not be restarted if downstream nodes fail.
> Full design is shown in 
> (https://docs.google.com/document/d/15HtCtc9Gk8SyHsAezM7Od1opAHgnxLeHm3VX7A8fa-4/edit#).





[jira] [Commented] (FLINK-22345) CoordinatorEventsExactlyOnceITCase hangs on azure

2021-04-20 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325594#comment-17325594
 ] 

Matthias commented on FLINK-22345:
--

https://dev.azure.com/mapohl/flink/_build/results?buildId=402&view=logs&j=cc649950-03e9-5fae-8326-2f1ad744b536&t=51cab6ca-669f-5dc0-221d-1e4f7dc4fc85&l=7561

> CoordinatorEventsExactlyOnceITCase hangs on azure
> -
>
> Key: FLINK-22345
> URL: https://issues.apache.org/jira/browse/FLINK-22345
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Stephan Ewen
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16731&view=logs&j=02c4e775-43bf-5625-d1cc-542b5209e072&t=e5961b24-88d9-5c77-efd3-955422674c25&l=9896
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7fa8c800b800 nid=0x58b3 waiting on 
> condition [0x7fa8cfd1c000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x8147a7e8> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:802)
>   at 
> org.apache.flink.runtime.operators.coordination.CoordinatorEventsExactlyOnceITCase.test(CoordinatorEventsExactlyOnceITCase.java:187)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}





[GitHub] [flink] zentol commented on a change in pull request #15680: [FLINK-22359][python] Set the version to 1.13.dev0 for PyFlink

2021-04-20 Thread GitBox


zentol commented on a change in pull request #15680:
URL: https://github.com/apache/flink/pull/15680#discussion_r616465454



##
File path: flink-python/pyflink/version.py
##
@@ -20,4 +20,4 @@
 The pyflink version will be consistent with the flink version and follow the 
PEP440.
 .. seealso:: https://www.python.org/dev/peps/pep-0440
 """
-__version__ = "1.13.0"
+__version__ = "1.13.dev0"

Review comment:
   @HuangXingBo If we do a release, where is this being changed now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15595: [FLINK-22171][sql-client][docs] Update the SQL Client doc

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15595:
URL: https://github.com/apache/flink/pull/15595#issuecomment-818681408


   
   ## CI report:
   
   * b7a82f07683d5a76f6743d68cb4cd174c0453bca UNKNOWN
   * 0b78a5c66116de0181cc645da264115b4b1ee79c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16582)
 
   * 59cdddc569ec0ad3b19d3c2aaa33bb28df8626ea UNKNOWN
   * d3109e7684475c6794cd93f25d475ed2ae279852 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16842)
 
   * b2763a88c91a273536f0ab367ed283fa53ab4fea UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Commented] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325596#comment-17325596
 ] 

Yun Tang commented on FLINK-21986:
--

Unlike previous RocksDB memory-related problems, the created 
{{ColumnFamilyDescriptor}} is just a Java object rather than a RocksObject and 
holds no native reference, which might explain why forcing GC could help avoid 
such behavior. The current PR avoids creating the {{ColumnFamilyDescriptor}} 
one extra time; however, I wonder how one additional creation could cause a 
memory leak. Maybe we need some further effort to dig into this problem.
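The ownership rule from the RocksJava wiki (quoted in the issue below) can be sketched in plain Java — here {{NativeHandle}} is a hypothetical stand-in for a RocksObject, since the real RocksDB classes require the native library:

```java
// Sketch of the ownership pattern described above: objects backed by native
// (C++) memory must be released explicitly, since GC only reclaims the thin
// Java wrapper. NativeHandle is a hypothetical stand-in for RocksObject.
class NativeHandle implements AutoCloseable {
    private long nativePtr = 1L; // pretend pointer to C++ memory

    boolean isClosed() {
        return nativePtr == 0L;
    }

    @Override
    public void close() {
        if (nativePtr != 0L) {
            nativePtr = 0L; // a real RocksObject would call its JNI disposal method here
        }
    }
}

public class CloseDemo {
    public static void main(String[] args) {
        // try-with-resources guarantees the native memory is freed promptly,
        // instead of waiting for a finalizer that may never run in time.
        NativeHandle handle = new NativeHandle();
        try (NativeHandle h = handle) {
            // ... use the handle ...
        }
        System.out.println(handle.isClosed()); // prints "true"
    }
}
```

A plain {{ColumnFamilyDescriptor}}, by contrast, carries no such native pointer, which is why it is exempt from this rule.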

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>
> I ran a regular join job with Flink 1.12.1 and found that taskmanager native 
> memory is not released promptly after a restart caused by exceeding the 
> tolerable checkpoint failure threshold.
> *Problem job information:*
>  # The job first restarted because the tolerable checkpoint failure threshold 
> was exceeded.
>  # Then the taskmanagers were killed by YARN many times.
>  # In this case, the TM heap is set to 7.68G, but all TM heap usage stays 
> under 4.2G
>  !image-2021-03-25-15-53-44-214.png|width=496,height=103!
>  # Non-heap size increases after restart, but stays under 160M.
>  
> !https://km.sankuai.com/api/file/cdn/706284607/716474606?contentType=1&isNewContent=false&isNewContent=false|width=493,height=102!
>  # Taskmanager process memory increases by 3-4G after restart (this figure 
> shows one of the taskmanagers)
>  !image-2021-03-25-16-07-29-083.png|width=493,height=107!
>  
> *My guess:*
> The [RocksDB 
> wiki|https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management]
>  mentions: many of the Java objects used in the RocksJava API are backed 
> by C++ objects for which the Java objects have ownership. As C++ has no 
> notion of automatic garbage collection for its heap in the way that Java 
> does, we must explicitly free the memory used by the C++ objects when we are 
> finished with them.
> So, is it possible that RocksDBStateBackend does not call 
> AbstractNativeReference#close() to release the memory used by RocksDB C++ 
> objects?
> *I made a change:*
>         Actively call System.gc() and System.runFinalization() every minute.
>  *And ran this test again:*
>  # No obvious increase in taskmanager process memory
>  !image-2021-03-26-11-46-06-828.png|width=495,height=93!
>  # The job ran for several days and restarted many times, but no taskmanager 
> was killed by YARN as before
>  
> *Summary:*
>  # First, some native memory is not released promptly after restart in this 
> situation
>  # I guess it may be RocksDB C++ objects, but I have not checked the source 
> code of RocksDBStateBackend
>  





[jira] [Commented] (FLINK-9043) Introduce a friendly way to resume the job from externalized checkpoints automatically

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325597#comment-17325597
 ] 

Till Rohrmann commented on FLINK-9043:
--

Did you mean that S3 can now be strongly consistent [~sewen]?

> Introduce a friendly way to resume the job from externalized checkpoints 
> automatically
> --
>
> Key: FLINK-9043
> URL: https://issues.apache.org/jira/browse/FLINK-9043
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing
>Reporter: Godfrey He
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I know a Flink job can recover from a checkpoint with a restart strategy, but 
> it cannot recover the way Spark Streaming jobs do when the job is starting.
> Every time, a submitted Flink job is regarded as a new job, whereas a Spark 
> Streaming job can detect the checkpoint directory first and then recover from 
> the latest successful one. Flink, however, can only recover after the job has 
> failed first and then retried according to the strategy.
>  
> So, would Flink support recovering from the checkpoint directly in a new job?
> h2. New description by [~sihuazhou]
> Currently, it is quite unfriendly for users to recover a job from an 
> externalized checkpoint: the user needs to find the dedicated directory for 
> the job, which is not easy when there are many jobs. This ticket intends to 
> introduce a more friendly way to allow the user to use the externalized 
> checkpoint to do recovery.
> The implementation steps are copied from the comments of [~StephanEwen]:
>  - We could make this an option where you pass a flag (-r) to automatically 
> look for the latest checkpoint in a given directory.
>  - If more than one job checkpointed there before, this operation would fail.
>  - We might also need a way to have jobs not create the UUID subdirectory, 
> otherwise scanning for the latest checkpoint would not easily work.
>   
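The scanning step proposed above could look roughly like the following sketch — a hypothetical helper (names and the chk-<n> directory layout are assumptions for illustration, not Flink's actual API) that picks the latest checkpoint subdirectory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical sketch of the "-r" flag behaviour discussed above: scan a
// checkpoint directory and pick the chk-<n> subdirectory with the highest n.
public class LatestCheckpointFinder {

    static Optional<Path> findLatest(Path checkpointDir) throws IOException {
        try (Stream<Path> entries = Files.list(checkpointDir)) {
            return entries
                .filter(Files::isDirectory)
                .filter(p -> p.getFileName().toString().startsWith("chk-"))
                // compare numerically, so chk-10 beats chk-2
                .max(Comparator.comparingLong(
                    p -> Long.parseLong(p.getFileName().toString().substring(4))));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkpoints");
        Files.createDirectory(dir.resolve("chk-2"));
        Files.createDirectory(dir.resolve("chk-10"));
        // chk-10 wins because 10 > 2 numerically; a plain string sort would pick chk-2
        System.out.println(findLatest(dir).get().getFileName()); // prints "chk-10"
    }
}
```

This also shows why the UUID subdirectory matters: if each job nests its checkpoints under a random UUID, a single-directory scan like this cannot find them.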





[GitHub] [flink] flinkbot edited a comment on pull request #15642: [FLINK-22159][FLINK-22302][docs][table] Restructure SQL "Queries" pages and add docs for the new window TVF based operations

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15642:
URL: https://github.com/apache/flink/pull/15642#issuecomment-821006759


   
   ## CI report:
   
   * e351680d7b2404b5cdecf5597a975c117fc01f8b Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16682)
 
   * 61e289e59a8c81abe71567e01d9a32830b706c3e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15670: [FLINK-22346][sql-client] Remove sql-client-defaults.yaml

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15670:
URL: https://github.com/apache/flink/pull/15670#issuecomment-822351871


   
   ## CI report:
   
   * 2bb46a5c653477296ba9ae26296c8a57c3826c9f Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16841)
 
   * 445c928a8e2e721eea10fc3316f62da08fd65234 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Commented] (FLINK-22359) Python tests do not pass on 1.x branch if it has not been released

2021-04-20 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325599#comment-17325599
 ] 

Chesnay Schepler commented on FLINK-22359:
--

Shouldn't the change to the scripts be merged to master? Now we're gonna run 
into the same issue again in 1.14.

> Python tests do not pass on 1.x branch if it has not been released
> --
>
> Key: FLINK-22359
> URL: https://issues.apache.org/jira/browse/FLINK-22359
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Test Infrastructure
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Huang Xingbo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> The python tests fail because they try to download something for 0.0 version.
> {code}
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/38/ff06cec1b32e796c8422153d6d29a6c8c6dab962436779e34b0d72df0f2f/grpcio-tools-1.14.2.tar.gz
>  (1.9MB)
> Apr 19 11:39:38 Collecting apache-flink-libraries
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/b1/78dcaec55a437c3b8b3eb479b169d7fac15e86ffe9bd7340767934ceab2f/apache_flink_libraries-0.0.tar.gz
>  (3.5MB)
> Apr 19 11:39:38 Complete output from command python setup.py egg_info:
> Apr 19 11:39:38 The flink core files are not found. Please make sure your 
> installation package is complete, or do this in the flink-python directory of 
> the flink source directory.
> {code}





[jira] [Closed] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-21986.
-
Resolution: Fixed

Fixed via

master: 57ba2ff64add32df2598e1e35ee19a76eff8194c
1.13.0: 0e1468a4aadf68019f034eaad8bbf50a8ecf9589
1.12.3: 2cd2fecc121266109f9635a24ca956022b8ab283
1.11.4: 4dfef1b2e079a0f0e07aec994f721adf27a47b9a

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>





[GitHub] [flink] zentol merged pull request #15673: [FLINK-22352][mesos] Deprecates Mesos support

2021-04-20 Thread GitBox


zentol merged pull request #15673:
URL: https://github.com/apache/flink/pull/15673


   






[jira] [Commented] (FLINK-21986) taskmanager native memory not release timely after restart

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325600#comment-17325600
 ] 

Till Rohrmann commented on FLINK-21986:
---

I've merged the PR because it is definitely an improvement (in the worst case 
it avoids duplicate Java object creation). If the problem should still exist, 
then please re-open this issue.

> taskmanager native memory not release timely after restart
> --
>
> Key: FLINK-21986
> URL: https://issues.apache.org/jira/browse/FLINK-21986
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Environment: flink version:1.12.1
> run :yarn session
> job type:mock source -> regular join
>  
> checkpoint interval: 3m
> Taskmanager memory : 16G
>  
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
> Attachments: 82544.svg, image-2021-03-25-15-53-44-214.png, 
> image-2021-03-25-16-07-29-083.png, image-2021-03-26-11-46-06-828.png, 
> image-2021-03-26-11-47-21-388.png
>
>





[jira] [Commented] (FLINK-21844) Do not auto-configure maxParallelism when setting "scheduler-mode: reactive"

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325601#comment-17325601
 ] 

Till Rohrmann commented on FLINK-21844:
---

Yes, the idea would be to load the savepoint and then extract from it potential 
constraints for the {{ExecutionGraph}} creation [~nicholasjiang].

> Do not auto-configure maxParallelism when setting "scheduler-mode: reactive"
> 
>
> Key: FLINK-21844
> URL: https://issues.apache.org/jira/browse/FLINK-21844
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Konstantin Knauf
>Assignee: Austin Cawley-Edwards
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> I believe we should not automatically change the maxParallelism when the 
>  "scheduler-mode" is set to "reactive", because:
>  * it magically breaks savepoint compatibility, when you switch between 
> default and reactive scheduler mode
>  * the maximum parallelism is an orthogonal concern that in my opinion should 
> not be mixed with the scheduler mode. The reactive scheduler should respect 
> the maxParallelism, but it should not set/ change its default value.





[jira] [Updated] (FLINK-13247) Implement external shuffle service for YARN

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-13247:
--
Priority: Major  (was: Minor)

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput through an external shuffle service, because the producers of 
> intermediate result partitions can be released once the intermediate result 
> partitions have been persisted on disk. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> introduced a pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages out of the Flink runtime as a shuffle 
> service. I propose a YARN implementation of the Flink external shuffle 
> service, since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermediate result partitions to local disks assigned by 
> NodeManager;
> (2) YARN shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are notified of intermediate result partition descriptions by 
> producers;
> (3) Consumers fetch intermediate result partitions from the YARN shuffle 
> servers;
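The three-step protocol above can be captured by a small interface sketch. All names here are purely illustrative — this is not the actual pluggable shuffle API from FLINK-10653:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch of the three-step protocol described above.
interface ShuffleServer {
    // Step (2): a producer registers a finished, disk-persisted partition.
    void registerPartition(String partitionId, String localPath);

    // Step (3): a consumer looks up where to fetch a partition from.
    Optional<String> locatePartition(String partitionId);
}

class InMemoryShuffleServer implements ShuffleServer {
    private final Map<String, String> partitions = new HashMap<>();

    @Override
    public void registerPartition(String partitionId, String localPath) {
        partitions.put(partitionId, localPath);
    }

    @Override
    public Optional<String> locatePartition(String partitionId) {
        return Optional.ofNullable(partitions.get(partitionId));
    }
}

public class ShuffleDemo {
    public static void main(String[] args) {
        ShuffleServer server = new InMemoryShuffleServer();
        // Step (1) happened on the producer side: the partition is on local disk.
        server.registerPartition("result-0", "/data/nm-local/result-0");
        // The producer TaskManager can now exit; the consumer still finds the data
        // via the shuffle server, which is the decoupling this proposal is after.
        System.out.println(server.locatePartition("result-0").get());
    }
}
```

The key design point is that the partition's lifetime is owned by the NodeManager-side service, not by the producer task, so a TM failure no longer invalidates already-produced results.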





[GitHub] [flink] HuangXingBo commented on a change in pull request #15680: [FLINK-22359][python] Set the version to 1.13.dev0 for PyFlink

2021-04-20 Thread GitBox


HuangXingBo commented on a change in pull request #15680:
URL: https://github.com/apache/flink/pull/15680#discussion_r616471222



##
File path: flink-python/pyflink/version.py
##
@@ -20,4 +20,4 @@
 The pyflink version will be consistent with the flink version and follow the 
PEP440.
 .. seealso:: https://www.python.org/dev/peps/pep-0440
 """
-__version__ = "1.13.0"
+__version__ = "1.13.dev0"

Review comment:
   It will be changed in `create_release_branch.sh`








[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325603#comment-17325603
 ] 

Till Rohrmann commented on FLINK-13247:
---

[~wind_ljy] if you can contribute the external shuffle service to Flink, that 
would be really good. I think this feature will be quite helpful for our 
batch/batch-execution-mode users because it also makes failover much cheaper 
in cases where a TM dies.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Major
>





[jira] [Closed] (FLINK-22352) Deprecate Mesos Support in documentation

2021-04-20 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-22352.

Resolution: Fixed

master: 19ef3b09aa45377b181772d00c0ba11e44718fb9

1.13: 040df885147e30ae748c141e4c34aa2ae3b38c16

> Deprecate Mesos Support in documentation
> 
>
> Key: FLINK-22352
> URL: https://issues.apache.org/jira/browse/FLINK-22352
> Project: Flink
>  Issue Type: Task
>  Components: Deployment / Mesos
>Affects Versions: 1.13.0
>Reporter: Matthias
>Assignee: Matthias
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> According to the discussion on the dev mailing list ([[SURVEY] Remove Mesos 
> support|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Remove-Mesos-support-td45974.html]
>  and [[VOTE] Deprecating Mesos 
> support|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Deprecating-Mesos-support-td50142.html]),
>  the community decided to deprecate Mesos support in Apache Flink (see 
> [(RESULT)[VOTE]Deprecating Mesos 
> support|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/RESULT-VOTE-Deprecating-Mesos-support-td50260.html]).
> This issue covers adding corresponding warnings/notices:
>  * Log statements - Mesos-related {{ClusterEntrypoint}} implementations
>  * Warning in the docs
>  * {{@Deprecated}} in relevant classes





[GitHub] [flink] tillrohrmann commented on pull request #15501: [FLINK-22054][k8s] Using a shared watcher for ConfigMap watching

2021-04-20 Thread GitBox


tillrohrmann commented on pull request #15501:
URL: https://github.com/apache/flink/pull/15501#issuecomment-823096276


   I think it is a problem if a user cannot submit an arbitrarily large number 
of jobs to a Flink session cluster without configuring the number of threads 
for the Kubernetes client in order to support per-job watching.






[jira] [Comment Edited] (FLINK-11254) Unify serialization format of savepoint for switching state backends

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324736#comment-17324736
 ] 

Till Rohrmann edited comment on FLINK-11254 at 4/20/21, 8:42 AM:
-

This feature is already implemented by FLINK-20976, closing as duplicate.


was (Author: carp84):
This feature is already implemented by FLINK_20976, closing as duplicate.

> Unify serialization format of savepoint for switching state backends
> 
>
> Key: FLINK-11254
> URL: https://issues.apache.org/jira/browse/FLINK-11254
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.7.1
>Reporter: Congxian Qiu
>Assignee: Congxian Qiu
>Priority: Major
>  Labels: stale-assigned
>
> For the current version, the serialization formats of savepoint between 
> HeapKeyedStateBackend and RocksDBStateBackend are different, so we can not 
> switch state backend when using savepoint. We should unify the serialization 
> formats of the savepoint to support state backend switch.





[GitHub] [flink] tillrohrmann closed pull request #15602: [FLINK-22264][docs] Fix misleading statement about Flink Job Cluster Kubernetes Support in Flink Architecture page

2021-04-20 Thread GitBox


tillrohrmann closed pull request #15602:
URL: https://github.com/apache/flink/pull/15602


   






[jira] [Commented] (FLINK-19358) when submit job on application mode with HA,the jobid will be 0000000000

2021-04-20 Thread Wei-Che Wei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325611#comment-17325611
 ] 

Wei-Che Wei commented on FLINK-19358:
-

Hi [~trohrmann], [~kkl0u]

I think this issue might also affect history server. [1]

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/advanced/historyserver.html#available-requests

> when submit job on application mode with HA,the jobid will be 0000000000
> 
>
> Key: FLINK-19358
> URL: https://issues.apache.org/jira/browse/FLINK-19358
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Jun Zhang
>Priority: Major
>  Labels: usability
> Fix For: 1.13.0
>
>
> when submitting a flink job in application mode with HA, the flink job id will be 
> , when I have many jobs, they have the same 
> job id, which will lead to a checkpoint error





[jira] [Assigned] (FLINK-22367) JobMasterStopWithSavepointITCase.terminateWithSavepointWithoutComplicationsShouldSucceedAndLeadJobToFinished times out

2021-04-20 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-22367:


Assignee: Chesnay Schepler

> JobMasterStopWithSavepointITCase.terminateWithSavepointWithoutComplicationsShouldSucceedAndLeadJobToFinished
>  times out
> --
>
> Key: FLINK-22367
> URL: https://issues.apache.org/jira/browse/FLINK-22367
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16818&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=2c7d57b9-7341-5a87-c9af-2cf7cc1a37dc&l=3844
> {code}
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 13.135 s <<< FAILURE! - in 
> org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase
> Apr 19 22:28:44 [ERROR] 
> terminateWithSavepointWithoutComplicationsShouldSucceedAndLeadJobToFinished(org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase)
>   Time elapsed: 10.237 s  <<< ERROR!
> Apr 19 22:28:44 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public default 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.stopWithSavepoint(org.apache.flink.api.common.JobID,java.lang.String,boolean,org.apache.flink.api.common.time.Time)
>  timed out.
> Apr 19 22:28:44   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Apr 19 22:28:44   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Apr 19 22:28:44   at 
> org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase.stopWithSavepointNormalExecutionHelper(JobMasterStopWithSavepointITCase.java:123)
> Apr 19 22:28:44   at 
> org.apache.flink.runtime.jobmaster.JobMasterStopWithSavepointITCase.terminateWithSavepointWithoutComplicationsShouldSucceedAndLeadJobToFinished(JobMasterStopWithSavepointITCase.java:111)
> Apr 19 22:28:44   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Apr 19 22:28:44   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Apr 19 22:28:44   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Apr 19 22:28:44   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Apr 19 22:28:44   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> Apr 19 22:28:44   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Apr 19 22:28:44   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> Apr 19 22:28:44   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Apr 19 22:28:44   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Apr 19 22:28:44   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 19 22:28:44   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Apr 19 22:28:44   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> Apr 19 22:28:44   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Apr 19 22:28:44   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> Apr 19 22:28:44   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> Apr 19 22:28:44   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> Apr 19 22:28:44   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> Apr 19 22:28:44   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> Apr 19 22:28:44   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> Apr 19 22:28:44   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> Apr 19 22:28:44   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> Apr 19 22:28:44   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 19 22:28:44   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> Apr 19 22:28:44   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Apr 19 22:28:44   at 
> org.junit.runners.ParentRunner.run(Paren

[jira] [Reopened] (FLINK-22085) KafkaSourceLegacyITCase hangs/fails on azure

2021-04-20 Thread Matthias (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias reopened FLINK-22085:
--

[~lindong] {{KafkaSourceLegacyITCase}} timed out in [that 
build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16789&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=6593].
 The build itself included the fix 
[98424e6|https://github.com/apache/flink/commit/98424e6383bcce107844cbeecc2e9df4ffb4272a]
 provided by this issue. Could you have another look at it?

> KafkaSourceLegacyITCase hangs/fails on azure
> 
>
> Key: FLINK-22085
> URL: https://issues.apache.org/jira/browse/FLINK-22085
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dong Lin
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.13.0
>
>
> 1) Observations
> a) The Azure pipeline would occasionally hang without printing any test error 
> information.
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15939&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=8219]
> b) By running the test KafkaSourceLegacyITCase::testBrokerFailure() with INFO 
> level logging, the test would hang with the following error message 
> printed repeatedly:
> {code:java}
> 20451 [New I/O boss #50] ERROR 
> org.apache.flink.networking.NetworkFailureHandler [] - Closing communication 
> channel because of an exception
> java.net.ConnectException: Connection refused: localhost/127.0.0.1:50073
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
> ~[?:1.8.0_151]
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
> ~[?:1.8.0_151]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
>  ~[flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_151]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_151]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
> {code}
> *2) Root cause explanations*
> The test would hang because it enters the following loop:
>  - closeOnFlush() is called for a given channel
>  - closeOnFlush() calls channel.write(..)
>  - channel.write() triggers the exceptionCaught(...) callback
>  - closeOnFlush() is called for the same channel again.
> *3) Solution*
> Update closeOnFlush() so that, if a channel is being closed by this method, 
> then closeOnFlush() would not try to write to this channel if it is called on 
> this channel again.
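The re-entrancy guard described in step 3 can be sketched as follows. This is a minimal, self-contained model, not the actual Flink NetworkFailureHandler code: the class and method names are illustrative, and a plain `Object` stands in for a Netty channel.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the closeOnFlush() fix: a failing write during
// close triggers exceptionCaught(), which calls closeOnFlush() on the
// same channel again. Tracking channels that are already being closed
// breaks that loop.
public class CloseOnFlushGuard {
    private final Set<Object> closingChannels = new HashSet<>();
    private int writeAttempts = 0;

    void closeOnFlush(Object channel) {
        if (!closingChannels.add(channel)) {
            return; // channel is already being closed: do not write again
        }
        writeAttempts++;       // stands in for channel.write(...)
        closeOnFlush(channel); // simulates exceptionCaught() re-entering
    }

    /** Runs the failing-write scenario once and reports how many writes happened. */
    public static int simulateFailingWrite() {
        CloseOnFlushGuard handler = new CloseOnFlushGuard();
        handler.closeOnFlush(new Object());
        return handler.writeAttempts;
    }
}
```

Without the `closingChannels` check, the two methods would recurse until stack overflow; with it, the failing write happens exactly once and the close can proceed.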





[jira] [Closed] (FLINK-22264) Fix misleading statement about per-job mode support for Kubernetes in Concept/Flink Architecture page

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-22264.
-
Resolution: Fixed

Fixed via 

master: ce536974017f71ff7f430ca478a69236c4bed547
1.13.0: 0b1078c7298778880762dd5d686b5412f6b5309b

> Fix misleading statement about per-job mode support for Kubernetes in 
> Concept/Flink Architecture page
> -
>
> Key: FLINK-22264
> URL: https://issues.apache.org/jira/browse/FLINK-22264
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.12.2
>Reporter: Fuyao Li
>Assignee: Fuyao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> I noticed a conflict in the document for per-job mode support for Kubernetes.
> In the doc here [1], it mentions
> in a Flink Job Cluster, the available cluster manager (like YARN or 
> Kubernetes) is used to spin up a cluster for each submitted job and this 
> cluster is available to that job only.
> It implies per job mode is supported in Kubernetes.
>  
> However, in the docs [2] and [3], it clearly points out per-job mode is not 
> supported in Kubernetes.
>  
> To avoid the misunderstanding, I think we should fix the statement in the 
> concept page. I had a discussion with Yang Wang on the Flink user mailing list 
> earlier (the link is not yet available in the archive).
>  
> [1] 
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/concepts/flink-architecture/#flink-job-cluster]
> [2] 
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#per-job-cluster-mode]
> [3] 
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/]





[jira] [Comment Edited] (FLINK-22085) KafkaSourceLegacyITCase hangs/fails on azure

2021-04-20 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325615#comment-17325615
 ] 

Matthias edited comment on FLINK-22085 at 4/20/21, 8:50 AM:


[~lindong] {{KafkaSourceLegacyITCase}} timed out in [that 
build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16789&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=6593].
 The build itself included the fix 
[98424e6|https://github.com/apache/flink/commit/98424e6383bcce107844cbeecc2e9df4ffb4272a]
 provided by this issue. Could you have another look at it to double-check 
whether it's related or a completely different issue?


was (Author: mapohl):
[~lindong] {{KafkaSourceLegacyITCase}} timed out in [that 
build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16789&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=6593].
 The build itself included the fix 
[98424e6|https://github.com/apache/flink/commit/98424e6383bcce107844cbeecc2e9df4ffb4272a]
 provided by this issue. May you have another look at it?

> KafkaSourceLegacyITCase hangs/fails on azure
> 
>
> Key: FLINK-22085
> URL: https://issues.apache.org/jira/browse/FLINK-22085
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dong Lin
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.13.0
>
>
> 1) Observations
> a) The Azure pipeline would occasionally hang without printing any test error 
> information.
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15939&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=8219]
> b) By running the test KafkaSourceLegacyITCase::testBrokerFailure() with INFO 
> level logging, the test would hang with the following error message 
> printed repeatedly:
> {code:java}
> 20451 [New I/O boss #50] ERROR 
> org.apache.flink.networking.NetworkFailureHandler [] - Closing communication 
> channel because of an exception
> java.net.ConnectException: Connection refused: localhost/127.0.0.1:50073
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
> ~[?:1.8.0_151]
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
> ~[?:1.8.0_151]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
>  ~[flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.shaded.testutils.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>  [flink-test-utils_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_151]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_151]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
> {code}
> *2) Root cause explanations*
> The test would hang because it enters the following loop:
>  - closeOnFlush() is called for a given channel
>  - closeOnFlush() calls channel.write(..)
>  - channel.write() triggers the exceptionCaught(...) callback
>  - closeOnFlush() is called for the same channel again.
> *3) Solution*
> Update closeOnFlush() so that, if a channel is being closed by this method, 
> then closeOnFlush() would not try to write to this channel if it is called on 
> this channel again.





[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15121:
URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240


   
   ## CI report:
   
   * cf26ce895a7956d258acc073817f578558e78227 UNKNOWN
   * 709c4110370b845f42d916a06e56c6026cf2fac8 UNKNOWN
   * 4ea99332e1997eebca1f3f0a9d9229b8265fe32c UNKNOWN
   * 8c146a5610a6438326789cf57390154e8f749329 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16832)
 
   * 599b7ee7a3505fcc875aafb75a2b061684881b74 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16851)
 
   * 2cd2fecc121266109f9635a24ca956022b8ab283 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] XComp commented on pull request #15640: [FLINK-22276][runtime] Fixes concurrency bug in exception history

2021-04-20 Thread GitBox


XComp commented on pull request #15640:
URL: https://github.com/apache/flink/pull/15640#issuecomment-823103821


   FYI: The failure might be related to 
[FLINK-22085](https://issues.apache.org/jira/browse/FLINK-22085). I added a 
comment to the ticket and re-opened it as it appears that we included the fix 
of `FLINK-22085` already in that build.






[jira] [Updated] (FLINK-22359) Python tests do not pass on 1.x branch if it has not been released

2021-04-20 Thread Huang Xingbo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huang Xingbo updated FLINK-22359:
-
Fix Version/s: 1.14.0

> Python tests do not pass on 1.x branch if it has not been released
> --
>
> Key: FLINK-22359
> URL: https://issues.apache.org/jira/browse/FLINK-22359
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Test Infrastructure
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Huang Xingbo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.13.0, 1.14.0
>
>
> The Python tests fail because they try to download something for version 0.0.
> {code}
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/38/ff06cec1b32e796c8422153d6d29a6c8c6dab962436779e34b0d72df0f2f/grpcio-tools-1.14.2.tar.gz
>  (1.9MB)
> Apr 19 11:39:38 Collecting apache-flink-libraries
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/b1/78dcaec55a437c3b8b3eb479b169d7fac15e86ffe9bd7340767934ceab2f/apache_flink_libraries-0.0.tar.gz
>  (3.5MB)
> Apr 19 11:39:38 Complete output from command python setup.py egg_info:
> Apr 19 11:39:38 The flink core files are not found. Please make sure your 
> installation package is complete, or do this in the flink-python directory of 
> the flink source directory.
> {code}





[jira] [Commented] (FLINK-22359) Python tests do not pass on 1.x branch if it has not been released

2021-04-20 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325618#comment-17325618
 ] 

Huang Xingbo commented on FLINK-22359:
--

Merged commit 9e11078f483800776e9b0e12c0a9adb955bf49e9, which updates 
create_snapshot_branch.sh, into master.
[~chesnay] Thanks a lot for the reminder.

> Python tests do not pass on 1.x branch if it has not been released
> --
>
> Key: FLINK-22359
> URL: https://issues.apache.org/jira/browse/FLINK-22359
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Test Infrastructure
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Huang Xingbo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> The Python tests fail because they try to download something for version 0.0.
> {code}
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/38/ff06cec1b32e796c8422153d6d29a6c8c6dab962436779e34b0d72df0f2f/grpcio-tools-1.14.2.tar.gz
>  (1.9MB)
> Apr 19 11:39:38 Collecting apache-flink-libraries
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/b1/78dcaec55a437c3b8b3eb479b169d7fac15e86ffe9bd7340767934ceab2f/apache_flink_libraries-0.0.tar.gz
>  (3.5MB)
> Apr 19 11:39:38 Complete output from command python setup.py egg_info:
> Apr 19 11:39:38 The flink core files are not found. Please make sure your 
> installation package is complete, or do this in the flink-python directory of 
> the flink source directory.
> {code}





[jira] [Updated] (FLINK-22359) Python tests do not pass on 1.x branch if it has not been released

2021-04-20 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-22359:
-
Fix Version/s: (was: 1.14.0)

> Python tests do not pass on 1.x branch if it has not been released
> --
>
> Key: FLINK-22359
> URL: https://issues.apache.org/jira/browse/FLINK-22359
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Test Infrastructure
>Affects Versions: 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Huang Xingbo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> The Python tests fail because they try to download something for version 0.0.
> {code}
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/38/ff06cec1b32e796c8422153d6d29a6c8c6dab962436779e34b0d72df0f2f/grpcio-tools-1.14.2.tar.gz
>  (1.9MB)
> Apr 19 11:39:38 Collecting apache-flink-libraries
> Apr 19 11:39:38   Downloading 
> https://files.pythonhosted.org/packages/6c/b1/78dcaec55a437c3b8b3eb479b169d7fac15e86ffe9bd7340767934ceab2f/apache_flink_libraries-0.0.tar.gz
>  (3.5MB)
> Apr 19 11:39:38 Complete output from command python setup.py egg_info:
> Apr 19 11:39:38 The flink core files are not found. Please make sure your 
> installation package is complete, or do this in the flink-python directory of 
> the flink source directory.
> {code}





[GitHub] [flink] XComp commented on a change in pull request #15131: [FLINK-21700][yarn]Allow to disable fetching Hadoop delegation token on Yarn

2021-04-20 Thread GitBox


XComp commented on a change in pull request #15131:
URL: https://github.com/apache/flink/pull/15131#discussion_r616490919



##
File path: flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java
##
@@ -197,21 +197,31 @@ private static LocalResource registerLocalResource(
 }
 
 public static void setTokensFor(
-ContainerLaunchContext amContainer, List paths, 
Configuration conf)
+ContainerLaunchContext amContainer,
+List paths,
+Configuration conf,
+boolean obtainingDelegationTokens)
 throws IOException {
 Credentials credentials = new Credentials();
-// for HDFS
-TokenCache.obtainTokensForNamenodes(credentials, paths.toArray(new 
Path[0]), conf);
-// for HBase
-obtainTokenForHBase(credentials, conf);
+
+if (obtainingDelegationTokens) {

Review comment:
   I guess it's not necessary for now, considering that we don't have a 
use case where we want to disable them individually. So, I'm fine with keeping 
it like that, since it makes configuration easier.








[GitHub] [flink] flinkbot edited a comment on pull request #15642: [FLINK-22159][FLINK-22302][docs][table] Restructure SQL "Queries" pages and add docs for the new window TVF based operations

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15642:
URL: https://github.com/apache/flink/pull/15642#issuecomment-821006759


   
   ## CI report:
   
   * e351680d7b2404b5cdecf5597a975c117fc01f8b Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16682)
 
   * 61e289e59a8c81abe71567e01d9a32830b706c3e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16861)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15670: [FLINK-22346][sql-client] Remove sql-client-defaults.yaml

2021-04-20 Thread GitBox


flinkbot edited a comment on pull request #15670:
URL: https://github.com/apache/flink/pull/15670#issuecomment-822351871


   
   ## CI report:
   
   * 2bb46a5c653477296ba9ae26296c8a57c3826c9f Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16841)
 
   * 445c928a8e2e721eea10fc3316f62da08fd65234 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16862)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] XComp commented on a change in pull request #15131: [FLINK-21700][yarn]Allow to disable fetching Hadoop delegation token on Yarn

2021-04-20 Thread GitBox


XComp commented on a change in pull request #15131:
URL: https://github.com/apache/flink/pull/15131#discussion_r616496502



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -529,6 +530,17 @@ public void killCluster(ApplicationId applicationId) 
throws FlinkException {
 "Hadoop security with Kerberos is enabled but the 
login user "
 + "does not have Kerberos credentials or 
delegation tokens!");
 }
+
+boolean kerberosFetchDTEnabled =
+
flinkConfiguration.getBoolean(SecurityOptions.KERBEROS_FETCH_DELEGATION_TOKEN);
+boolean yarnAccessFSEnabled =
+flinkConfiguration.get(YarnConfigOptions.YARN_ACCESS) != 
null;
+if (!kerberosFetchDTEnabled && yarnAccessFSEnabled) {
+throw new RuntimeException(

Review comment:
   True, I missed the thread. Thanks for pointing to it.








[jira] [Commented] (FLINK-21798) Guard MemorySegment against multiple frees.

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325625#comment-17325625
 ] 

Till Rohrmann commented on FLINK-21798:
---

Nice :-)

> Guard MemorySegment against multiple frees.
> ---
>
> Key: FLINK-21798
> URL: https://issues.apache.org/jira/browse/FLINK-21798
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Major
>  Labels: Umbrella, pull-request-available
> Fix For: 1.14.0
>
>
> As discussed in FLINK-21419, freeing a memory segment multiple times 
> usually indicates that the ownership of the segment is unclear. It would be 
> good to gradually get rid of all such multiple frees.
> This ticket serves as an umbrella for detected multiple-free cases.
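One common way to surface such double-frees is an atomic "freed" flag that is checked on every release. The sketch below is illustrative only, not Flink's actual `MemorySegment` code; the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative double-free guard: the first free() wins, and any later
// free() fails loudly instead of silently releasing memory twice.
public class GuardedSegment {
    private final AtomicBoolean freed = new AtomicBoolean(false);

    public void free() {
        if (!freed.compareAndSet(false, true)) {
            throw new IllegalStateException("MemorySegment was already freed");
        }
        // release the underlying memory here
    }

    /** Frees a segment twice and reports whether the second free was caught. */
    public static boolean secondFreeDetected() {
        GuardedSegment segment = new GuardedSegment();
        segment.free();
        try {
            segment.free();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

The `compareAndSet` makes the check race-free: even if two supposed owners call `free()` concurrently, exactly one succeeds, which is precisely what exposes an unclear-ownership bug.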





[jira] [Assigned] (FLINK-21798) Guard MemorySegment against multiple frees.

2021-04-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-21798:
-

Assignee: Xintong Song

> Guard MemorySegment against multiple frees.
> ---
>
> Key: FLINK-21798
> URL: https://issues.apache.org/jira/browse/FLINK-21798
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Major
>  Labels: Umbrella, pull-request-available
> Fix For: 1.14.0
>
>
> As discussed in FLINK-21419, freeing a memory segment multiple times 
> usually indicates that the ownership of the segment is unclear. It would be 
> good to gradually get rid of all such multiple frees.
> This ticket serves as an umbrella for detected multiple-free cases.





[GitHub] [flink] XComp commented on a change in pull request #15131: [FLINK-21700][yarn]Allow to disable fetching Hadoop delegation token on Yarn

2021-04-20 Thread GitBox


XComp commented on a change in pull request #15131:
URL: https://github.com/apache/flink/pull/15131#discussion_r616496734



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1079,15 +1091,24 @@ private ApplicationReport startAppMaster(
 
 // setup security tokens
 if (UserGroupInformation.isSecurityEnabled()) {
-// set HDFS delegation tokens when security is enabled
-LOG.info("Adding delegation token to the AM container.");
-List yarnAccessList =
-ConfigUtils.decodeListFromConfig(
-configuration, YarnConfigOptions.YARN_ACCESS, 
Path::new);
+List yarnAccessList = new ArrayList<>();
+
+Boolean kerberosFetchDelegationTokenEnabled =
+
configuration.getBoolean(SecurityOptions.KERBEROS_FETCH_DELEGATION_TOKEN);
+
+if (kerberosFetchDelegationTokenEnabled) {
+// set HDFS delegation tokens when security is enabled
+LOG.info("Adding delegation token to the AM container.");
+yarnAccessList =
+ConfigUtils.decodeListFromConfig(
+configuration, YarnConfigOptions.YARN_ACCESS, 
Path::new);
+}
+
 Utils.setTokensFor(
 amContainer,
 ListUtils.union(yarnAccessList, 
fileUploader.getRemotePaths()),
-yarnConfiguration);
+yarnConfiguration,
+kerberosFetchDelegationTokenEnabled);

Review comment:
   Got it...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] akalash commented on a change in pull request #15628: [FLINK-21329][tests] Increase timeout and delay in local test_local_recovery_and_scheduling.sh

2021-04-20 Thread GitBox


akalash commented on a change in pull request #15628:
URL: https://github.com/apache/flink/pull/15628#discussion_r616442871



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
##
@@ -1285,7 +1285,8 @@ public void triggerCheckpointBarrier(
 
 final AbstractInvokable invokable = this.invokable;
 final CheckpointMetaData checkpointMetaData =
-new CheckpointMetaData(checkpointID, checkpointTimestamp);
+new CheckpointMetaData(
+checkpointID, checkpointTimestamp, System.currentTimeMillis());

Review comment:
   In fact, it can be the old constructor with 2 parameters.
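The reviewer's suggestion can be sketched roughly as follows (a hypothetical class shape, not Flink's exact CheckpointMetaData): keep the two-argument constructor and let it delegate to the three-argument one, so call sites that don't care about the receive timestamp need not pass it.

```java
// Hypothetical sketch of the suggestion: the 2-arg constructor delegates
// to the 3-arg one, filling in the receive timestamp itself.
class CheckpointMetaDataSketch {
    final long checkpointId;
    final long timestamp;
    final long receiveTimestamp;

    CheckpointMetaDataSketch(long checkpointId, long timestamp) {
        this(checkpointId, timestamp, System.currentTimeMillis());
    }

    CheckpointMetaDataSketch(long checkpointId, long timestamp, long receiveTimestamp) {
        this.checkpointId = checkpointId;
        this.timestamp = timestamp;
        this.receiveTimestamp = receiveTimestamp;
    }

    public static void main(String[] args) {
        // Call site only passes id and trigger timestamp; receive time is defaulted.
        CheckpointMetaDataSketch md = new CheckpointMetaDataSketch(42L, 1_000L);
        System.out.println(md.checkpointId + " " + md.timestamp);
        System.out.println(md.receiveTimestamp > 0);
    }
}
```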

##
File path: flink-end-to-end-tests/run-nightly-tests.sh
##
@@ -250,12 +250,12 @@ fi
 
 
 
 if [[ ${PROFILE} != *"enable-adaptive-scheduler"* ]]; then #FLINK-21450
-   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 3 hashmap false false" "skip_check_exceptions"
-   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 3 hashmap false true" "skip_check_exceptions"
-   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks false false" "skip_check_exceptions"
-   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks true false" "skip_check_exceptions"
-   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks false true" "skip_check_exceptions"
-   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks true true" "skip_check_exceptions"
+   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 3 hashmap false false 100" "skip_check_exceptions"
+   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 3 hashmap false true 100" "skip_check_exceptions"
+   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks false false 100" "skip_check_exceptions"
+   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks true false 100" "skip_check_exceptions"
+   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks false true 100" "skip_check_exceptions"
+   run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks true true 100" "skip_check_exceptions"

Review comment:
   I actually don't fully get the idea here. As I understand it, adding 
the delay will decrease the backpressure. But won't that decrease the test's 
effectiveness? Right now it tests something under high pressure, but after 
this change the pressure will be lower. Perhaps that is fine because this is 
not a stress test, but that is exactly why I ask.

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SubtaskCheckpointCoordinatorImpl.java
##
@@ -711,4 +715,13 @@ private static OperatorSnapshotFutures checkpointStreamOperator(
 throw ex;
 }
 }
+
+private void logCheckpointProcessingDelay(CheckpointMetaData checkpointMetaData) {

Review comment:
   can be static

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/checkpointing/CheckpointBarrierHandler.java
##
@@ -108,7 +108,10 @@ public long getCheckpointStartDelayNanos() {
 
 protected void notifyCheckpoint(CheckpointBarrier checkpointBarrier) throws IOException {
 CheckpointMetaData checkpointMetaData =
-new CheckpointMetaData(checkpointBarrier.getId(), checkpointBarrier.getTimestamp());
+new CheckpointMetaData(
+checkpointBarrier.getId(),
+checkpointBarrier.getTimestamp(),
+System.currentTimeMillis());

Review comment:
   it is the same about constructor

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SubtaskCheckpointCoordinatorImpl.java
##
@@ -75,6 +75,8 @@
 LoggerFactory.getLogger(SubtaskChe

[GitHub] [flink] wuchong commented on a change in pull request #15585: [FLINK-22065] [sql-client]Beautify the parse error exception

2021-04-20 Thread GitBox


wuchong commented on a change in pull request #15585:
URL: https://github.com/apache/flink/pull/15585#discussion_r616477380



##
File path: 
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/CliClient.java
##
@@ -325,11 +328,13 @@ private boolean executeStatement(String statement, ExecutionMode executionMode)
 try {
 final Optional<Operation> operation = parseCommand(statement);
 operation.ifPresent(op -> callOperation(op, executionMode));
+return true;
+} catch (SqlParseException e) {

Review comment:
   If we want to distinguish parse error messages, we don't need to 
introduce a new exception class. We can just catch exceptions around 
`parseCommand`. 
   
   The class hierarchy of `SqlClientException`, `SqlExecutionException`, and 
`SqlParseException` looks confusing now; it's not clear when to use 
which exception class. 
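The alternative the reviewer describes could look roughly like this (a hedged sketch with invented method shapes, not the real CliClient): whether a failure is a parse error or an execution error is determined by where the catch sits, without introducing a new exception type.

```java
import java.util.Optional;

// Hypothetical sketch: distinguish parse errors from execution errors by
// the placement of the catch, not by a new exception class.
class CliClientSketch {
    Optional<String> parseCommand(String statement) {
        // Stand-in for a real SQL parser: reject statements without ';'.
        if (!statement.endsWith(";")) {
            throw new IllegalArgumentException("Failed to parse statement: " + statement);
        }
        return Optional.of(statement);
    }

    boolean executeStatement(String statement) {
        final Optional<String> operation;
        try {
            operation = parseCommand(statement);
        } catch (IllegalArgumentException e) {
            // Only parse failures land here; execution failures would not.
            System.out.println("parse error: " + e.getMessage());
            return false;
        }
        operation.ifPresent(op -> System.out.println("executing: " + op));
        return true;
    }

    public static void main(String[] args) {
        CliClientSketch cli = new CliClientSketch();
        System.out.println(cli.executeStatement("SELECT 1;"));
        System.out.println(cli.executeStatement("invalid command"));
    }
}
```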

##
File path: 
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/LocalExecutor.java
##
@@ -175,10 +176,10 @@ public Operation parseStatement(String sessionId, String statement)
 try {
 operations = context.wrapClassLoader(() -> parser.parse(statement));
 } catch (Exception e) {
-throw new SqlExecutionException("Failed to parse statement: " + statement, e);
+throw new SqlParseException("Failed to parse statement: " + statement, e);

Review comment:
   Personally, I think this pull request still does not resolve the problem in 
the issue. You will still get the reason 
`org.apache.calcite.runtime.CalciteException: Non-query expression encountered 
in illegal context` when entering `invalid command;`, and that reason 
message still can't help users figure out what happened. 
   
   I think the purpose of this issue is to remind users that this is unsupported 
syntax. To improve the exception message, we can catch the exception here 
and wrap it into a new exception with the message "Unsupported statement syntax is 
encountered" (or sth. similar) when the exception class is `CalciteException` 
and the exception message matches.
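The wrapping the reviewer proposes could look roughly like this (a hedged sketch: the `CalciteException` type check is simulated here with a plain RuntimeException and a message match, and the wrapper message is illustrative, not the final wording):

```java
// Hypothetical sketch: rethrow a low-level parser error with a message
// telling the user the syntax is unsupported, keeping the original cause.
class ParseErrorWrapping {
    static RuntimeException wrap(RuntimeException e) {
        // In the real proposal this would check for CalciteException and
        // match its message; simulated here with the message text alone.
        if (e.getMessage() != null
                && e.getMessage().contains("Non-query expression encountered in illegal context")) {
            return new RuntimeException("Unsupported statement syntax is encountered.", e);
        }
        return e;
    }

    public static void main(String[] args) {
        RuntimeException raw =
                new RuntimeException("Non-query expression encountered in illegal context");
        RuntimeException wrapped = wrap(raw);
        System.out.println(wrapped.getMessage());
        System.out.println(wrapped.getCause() == raw); // original cause preserved
    }
}
```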








[GitHub] [flink] XComp commented on pull request #15131: [FLINK-21700][yarn]Allow to disable fetching Hadoop delegation token on Yarn

2021-04-20 Thread GitBox


XComp commented on pull request #15131:
URL: https://github.com/apache/flink/pull/15131#issuecomment-823119952


   The Yarn test cases are not that easy to maintain. I was just wondering 
whether you could provide an easy-to-run skeleton for reviewers to rerun the 
test you already performed. I am not familiar enough with Oozie/Yarn/Kerberos 
to set it up myself. Or would that be too tricky to accomplish?






[GitHub] [flink] tillrohrmann commented on a change in pull request #13055: [FLINK-18677] Added handling of suspended or lost connections

2021-04-20 Thread GitBox


tillrohrmann commented on a change in pull request #13055:
URL: https://github.com/apache/flink/pull/13055#discussion_r616501354



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/ZooKeeperLeaderRetrievalService.java
##
@@ -193,14 +185,20 @@ protected void handleStateChange(ConnectionState newState) {
break;
case SUSPENDED:
LOG.warn("Connection to ZooKeeper suspended. Can no longer retrieve the leader from " +
-   "ZooKeeper.");
+   "ZooKeeper.");
+   synchronized (lock) {
+   notifyLeaderLoss();

Review comment:
   I think we trigger an explicit 
`ZooKeeperLeaderRetrievalDriver.retrieveLeaderInformationFromZooKeeper` after 
we reconnect. Do you think that this won't be enough?
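The behavior under discussion can be sketched as follows (hypothetical types and state strings, not the real Curator/Flink API): on SUSPENDED the retrieval service forgets the cached leader, and on RECONNECTED it must re-read leader information so listeners recover.

```java
// Hypothetical sketch: revoke the cached leader on SUSPENDED, re-fetch
// it on RECONNECTED so downstream listeners see a consistent view.
class LeaderRetrievalSketch {
    private final Object lock = new Object();
    private String leaderAddress = "akka://jobmanager";

    void handleStateChange(String newState) {
        switch (newState) {
            case "SUSPENDED":
                synchronized (lock) {
                    leaderAddress = null; // corresponds to notifyLeaderLoss()
                }
                break;
            case "RECONNECTED":
                synchronized (lock) {
                    // corresponds to retrieveLeaderInformationFromZooKeeper()
                    leaderAddress = "akka://jobmanager";
                }
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        LeaderRetrievalSketch s = new LeaderRetrievalSketch();
        s.handleStateChange("SUSPENDED");
        System.out.println(s.leaderAddress);
        s.handleStateChange("RECONNECTED");
        System.out.println(s.leaderAddress);
    }
}
```

The open question in the thread is exactly whether the RECONNECTED-side re-fetch alone is sufficient, or whether the SUSPENDED-side loss notification is also needed.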








[GitHub] [flink] tillrohrmann commented on pull request #15484: [FLINK-21999][Runtime/Coordination] uniformize the logic about whether checkpoint is enabled and fix some typo.

2021-04-20 Thread GitBox


tillrohrmann commented on pull request #15484:
URL: https://github.com/apache/flink/pull/15484#issuecomment-823121421


   PR has been abandoned. Closing it now.





