[GitHub] flink pull request #6370: [FLINK-9894] [runtime] Potential Data Race
Github user tison1 closed the pull request at: https://github.com/apache/flink/pull/6370 ---
[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6370 @zentol ok ... close as suggested, would be resolved in #6353 ---
[GitHub] flink pull request #:
Github user tison1 commented on the pull request: https://github.com/apache/flink/commit/8231b62ff42aae53ca3a7b552980838ccab824ab#commitcomment-29765803 In flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java: In flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java on line 81: here is a question from 4 years later. why this method call `ensureCapacity` twice. it seems to solve some issue about concurrency but as the change made in #6353 , this method throws a out of index exception. so i try to add a synchronized block to make sure it is thread-safe #6370 . definitely i think my code is not that perfect. so i come to here, wonder the original purpose of this code and ask advice about the two PRs mentioned above @StephanEwen looking forward to your advice. thanks in advance! ---
[GitHub] flink issue #6367: [FLINK-9850] Add a string to the print method to identify...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6367 @yanghua also +1 this is a net win. ---
[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6370 but the original `ensureConstraints` is wired. For example it calls `ensureCapacity` twice and the only code path is from `ExecutionJobVertex` construct `ExecutionVertex` which calls `ensureConstraints` from `0` to `N`, which we gain little goodies from `ensureCapacity`. and so on. ---
[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6370 @yanghua AFAIK, yes. ---
[GitHub] flink issue #6360: [FLINK-9884] [runtime] fix slot request may not be remove...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6360 > When task executor report a slotA with allocationId1, it may happen that slot manager record slotA is assigned to allocationId2, and the slot request with allocationId1 is not assigned. Then slot manager will update itself with slotA assigned to allocationId1, by it does not clear the slot request with allocationId1. > > For example: > \# There is one free slot in slot manager. > \# Now come two slot request with allocationId1 and allocationId2. > \# The slot is assigned to allocationId1, but the requestSlot call timeout. > \# SlotManager assign the slot to allocationId2 and insert a slot request with allocationId1. > \# The second requestSlot call to task executor return SlotOccupiedException. > \# SlotManager update the slot to allocationID1, but the slot request is left. pick from the assigned JIRA for further discuss ---
[GitHub] flink pull request #6367: [FLINK-9850] Add a string to the print method to i...
Github user tison1 commented on a diff in the pull request: https://github.com/apache/flink/pull/6367#discussion_r203644208 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/PrintSinkFunction.java --- @@ -40,6 +40,8 @@ private boolean target; private transient PrintStream stream; private transient String prefix; + private String sinkIdentifier; + private String completedPrefix; --- End diff -- if `prefix` is `transient`, why not `completedPrefix`? ---
[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6370 by analyses the use path of this method, we gain little on `ensureCapacity`, in fact, test fails on #6353 is caused by too many `ensureCapacity` and then `Array.copyOf` race each other. ---
[GitHub] flink pull request #6370: [FLINK-9894] [runtime] Potential Data Race
GitHub user tison1 opened a pull request: https://github.com/apache/flink/pull/6370 [FLINK-9894] [runtime] Potential Data Race ## What is the purpose of the change *(CoLocationGroup#ensureConstraints may cause data race on `constraints`. synchronize its use to avoid that)* ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tison1/flink patential-race Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6370.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6370 commit 34eace0951363e1d221d2f145dce80d24e5372a7 Author: éæ¢ç« Date: 2018-07-19T08:13:18Z [FLINK-9894] [runtime] Potential Data Race ---
[GitHub] flink issue #6354: [FLINK-9881] Fixed a typo in table.scala
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6354 @twalthr thank you for the hint. I've updated my PRs #6353 #6345 #6339 as you suggested. ---
[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6353 Fix unstable case, the problem is code below, that may assign `constraints` to a long array and then to a short array, which cause out of index exception. to solve it we could init `constraints` in object construct. https://github.com/apache/flink/blob/056486a1b81e9648a6d3dc795e7e2c6976f8388c/flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java#L87-L91 ---
[GitHub] flink issue #6364: [hotfix] typo for SqlExecutionException msg
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6364 Thanks! LGTM cc @zentol ---
[GitHub] flink issue #6360: [FLINK-9884] [runtime] fix slot request may not be remove...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6360 @shuai-xu It makes sense. The message that TM has successfully allocated slot might lost in transport. When slot manager receives a slot status report which says one slot has allocation id irrelevant to this offer, then the slot is allocated to another slot request. It looks this PR prevents runtime from some potential resource leak, doesn't it? ---
[GitHub] flink issue #6345: [FLINK-9869] Send PartitionInfo in batch to Improve perfo...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6345 OK. This PR is about performance improvement. I will try to give out a benchmark, but since it is inspired by our own batch table tasks, it might take time to give one. Though since this PR concurrently send partition info and deploy task in another thread, it theoretically does good. Keep on on Flink 1.6! I will nudge you guys to review this one, though(laughed) ---
[GitHub] flink issue #6358: [FLINK-9882] [runtime] A function access can be private
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6358 LGTM ---
[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6353 This pr cause `HITSITCase` unstable, I retrigger ci two times to get more info going to fix it. ---
[GitHub] flink issue #6339: [FLINK-9859][Runtime] Distinguish TM akka config with JM ...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6339 cc @StephanEwen ---
[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6353 I think `ExecutionGraphConstructionTest` covers it. ---
[GitHub] flink issue #6354: [FLINK-9881] Fixed a typo in table.scala
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6354 LGTM! It's a net win. ---
[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6353 the existing tests verify correctness. I will take a try to give out a benchmark report since this PR is more relevant to performance. ---
[GitHub] flink issue #6345: [FLINK-9869] Send PartitionInfo in batch to Improve perfo...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6345 cc @sihuazhou ---
[GitHub] flink pull request #6353: [FLINK-9875] Add concurrent creation of execution ...
GitHub user tison1 opened a pull request: https://github.com/apache/flink/pull/6353 [FLINK-9875] Add concurrent creation of execution job vertex ## Add concurrent creation of execution job vertex in some case like inputformat vertex, creation of execution job vertex is time consuming, this pr add concurrent creation of execution job vertex to accelerate it. ## Brief change log - `ExecutionGraph` - add a method `createExecutionJobVertex` to concurrent creation of execution job vertex - modify method `attachJobGraph` to acquire goodies from `createExecutionJobVertex` ## Verifying this change Current tests confirm the correctness. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no, it's internal) You can merge this pull request into a Git repository by running: $ git pull https://github.com/tison1/flink execution-vertex-init-improvement Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6353.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6353 commit f219249ea99d25a41bb40aeb5a2d8c0d637f1cf7 Author: éæ¢ç« Date: 2018-07-17T13:25:16Z [FLINK-9875] Add concurrent creation of execution job vertex ---
[GitHub] flink pull request #6347: [hotfix] consistency: vertexes -> vertices
Github user tison1 closed the pull request at: https://github.com/apache/flink/pull/6347 ---
[GitHub] flink pull request #6347: [hotfix] consistency: vertexes -> vertices
Github user tison1 closed the pull request at: https://github.com/apache/flink/pull/6347 ---
[GitHub] flink issue #6347: [hotfix] consistency: vertexes -> vertices
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6347 > vertices is the correct plural, but this is another one of those cases where fixing it might cause more harm than good since it could cause merge conflicts, yet provides no functional benefit. ... close as @zentol suggested ---
[GitHub] flink issue #6347: [hotfix] consistency: vertexes -> vertices
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6347 > vertices is the correct plural, but this is another one of those cases where fixing it might cause more harm than good since it could cause merge conflicts, yet provides no functional benefit. ... close as @zentol suggested ---
[GitHub] flink issue #6347: [hotfix] consistency: vertexes -> vertices
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6347 > Additionally this PR makes a lot of whitespace changes that should be reverted in any case. did you mean the whitespace in comment `* ` is significant? ---
[GitHub] flink issue #6347: [hotfix] typo: vertexes -> vertices
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6347 look up dictionary and it says "vertexes" is also vertex pluralities :P but the most of our code use "vertices", so could this PR thought as keeping consistency? ---
[GitHub] flink pull request #6347: [hotfix] typo: vertexes -> vertices
GitHub user tison1 opened a pull request: https://github.com/apache/flink/pull/6347 [hotfix] typo: vertexes -> vertices cc @zentol @yanghua You can merge this pull request into a Git repository by running: $ git pull https://github.com/tison1/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6347.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6347 commit d3f8e633f7a7368c187612de837b88d6b09cf7c2 Author: éæ¢ç« Date: 2018-07-17T07:55:03Z [hotfix] typo: vertexes -> vertices ---
[GitHub] flink issue #6345: [FLINK-9869] Send PartitionInfo in batch to Improve perfo...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6345 cc @tillrohrmann @fhueske ---
[GitHub] flink pull request #6345: [FLINK-9869] Send PartitionInfo in batch to Improv...
GitHub user tison1 opened a pull request: https://github.com/apache/flink/pull/6345 [FLINK-9869] Send PartitionInfo in batch to Improve perfornance ## What is the purpose of the change Current we send partition info as soon as one arrive. we could `cachePartitionInfo` and then `sendPartitionInfoAsync`, which will improve performance. ... also improve task deployment ## Brief change log - `Execution` - now deploy task in another thread - as describe above, now we first `cachePartitionInfo` and then `sendPartitionInfoAsync` - add a config option `JobManagerOptions#UPDATE_PARTITION_INFO_SEND_INTERVAL`, which config the time window for cachePartitionInfo - update `ExecutionGraphDeploymentTest` and `ExecutionVertexDeploymentTest`, which also tests changes above ## Verifying this change This change is already covered by existing tests, such as `ExecutionGraphDeploymentTest` and `ExecutionVertexDeploymentTest` ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no, it's internal) You can merge this pull request into a Git repository by running: $ git pull https://github.com/tison1/flink partition-improve Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6345.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6345 commit ca9ffbb99e91a8415d7469cba4bf2075615edc0d Author: éæ¢ç« Date: 2018-07-17T04:11:36Z [FLINK-9869] Send PartitionInfo in batch to Improve perfornance ---
[GitHub] flink issue #6339: [FLINK-9859][Runtime] Distinguish TM akka config with JM ...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6339 cc @zentol @tzulitai ---
[GitHub] flink pull request #6339: [FLINK-9859][Runtime] Distinguish TM akka config w...
GitHub user tison1 opened a pull request: https://github.com/apache/flink/pull/6339 [FLINK-9859][Runtime] Distinguish TM akka config with JM config ## What is the purpose of the change Distinguish TM akka config with JM config. 1. increase the number of akka threads on JM, to improve its performance 2. decrease the number of akka threads on TM, to save resource. ## Brief change log - `AkkaUtils` - add a method `getTaskExecutorAkkaConfig` to distinguish TM's config with JM's - also the config details - `TaskManagerRunner` && `AkkaRpcServiceUtils` && `AkkaOptions` - add a method `AkkaRpcServiceUtils#createRpcServiceForTaskExecutor` - add a config option `AkkaOptions#AKKA_PORT_BIND_TIMEOUT`, which defines the timeout of binding retry. This is used in `AkkaRpcServiceUtils#createRpcServiceForTaskExecutor` - `TaskManagerRunner#createRpcService` now uses `AkkaRpcServiceUtils#createRpcServiceForTaskExecutor` to create RpcService - add a test `AkkaRpcServiceUtilsTest` to see createRpcService works after changes above ## Verifying this change This change added tests and can be verified as follows: - Unit test. Locally create rpc service using customized config, to see whether we could createRpcService properly. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no, it is internal) You can merge this pull request into a Git repository by running: $ git pull https://github.com/tison1/flink akka-improve Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6339.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6339 commit 019c018dfa994b6ef755823658adf613166659c8 Author: éæ¢ç« Date: 2018-07-16T09:47:29Z [FLINK-9859][Runtime] Distinguish TM akka config with JM config ... increase the number of akka threads on JM, to improve its performance; decrease the number of akka threads on TM, to save resource. ---