[GitHub] flink pull request #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 closed the pull request at:

https://github.com/apache/flink/pull/6370


---


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
@zentol ok

... close as suggested, would be resolved in #6353 


---


[GitHub] flink pull request #:

2018-07-19 Thread tison1
Github user tison1 commented on the pull request:


https://github.com/apache/flink/commit/8231b62ff42aae53ca3a7b552980838ccab824ab#commitcomment-29765803
  
In 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java:
In 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java
 on line 81:
here is a question from 4 years later.

why this method call `ensureCapacity` twice. it seems to solve some issue 
about concurrency but as the change made in #6353 , this method throws a out of 
index exception. so i try to add a synchronized block to make sure it is 
thread-safe #6370 . definitely i think my code is not that perfect. so i come 
to here, wonder the original purpose of this code and ask advice about the two 
PRs mentioned above

@StephanEwen looking forward to your advice. thanks in advance!


---


[GitHub] flink issue #6367: [FLINK-9850] Add a string to the print method to identify...

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6367
  
@yanghua also +1

this is a net win.


---


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
but the original `ensureConstraints` is wired. For example it calls 
`ensureCapacity` twice and the only code path is from `ExecutionJobVertex` 
construct `ExecutionVertex` which calls `ensureConstraints` from `0` to `N`, 
which we gain little goodies from  `ensureCapacity`. and so on.


---


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
@yanghua AFAIK, yes.


---


[GitHub] flink issue #6360: [FLINK-9884] [runtime] fix slot request may not be remove...

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6360
  
> When task executor report a slotA with allocationId1, it may happen that 
slot manager record slotA is assigned to allocationId2, and the slot request 
with allocationId1 is not assigned. Then slot manager will update itself with 
slotA assigned to allocationId1, by it does not clear the slot request with 
allocationId1.
>
> For example:
>  \# There is one free slot in slot manager.
>  \# Now come two slot request with allocationId1 and allocationId2.
>  \# The slot is assigned to allocationId1, but the requestSlot call 
timeout.
>  \# SlotManager assign the slot to allocationId2 and insert a slot 
request with allocationId1.
>  \# The second requestSlot call to task executor return 
SlotOccupiedException.
>  \# SlotManager update the slot to allocationID1, but the slot request is 
left.

pick from the assigned JIRA for further discuss


---


[GitHub] flink pull request #6367: [FLINK-9850] Add a string to the print method to i...

2018-07-19 Thread tison1
Github user tison1 commented on a diff in the pull request:

https://github.com/apache/flink/pull/6367#discussion_r203644208
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/PrintSinkFunction.java
 ---
@@ -40,6 +40,8 @@
private boolean target;
private transient PrintStream stream;
private transient String prefix;
+   private String sinkIdentifier;
+   private String completedPrefix;
--- End diff --

if `prefix` is `transient`, why not `completedPrefix`?


---


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
by analyses the use path of this method, we gain little on 
`ensureCapacity`, in fact, test fails on #6353  is caused by too many 
`ensureCapacity` and then `Array.copyOf` race each other.


---


[GitHub] flink pull request #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
GitHub user tison1 opened a pull request:

https://github.com/apache/flink/pull/6370

[FLINK-9894] [runtime] Potential Data Race

## What is the purpose of the change

*(CoLocationGroup#ensureConstraints may cause data race on `constraints`. 
synchronize its use to avoid that)*


## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tison1/flink patential-race

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6370.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6370


commit 34eace0951363e1d221d2f145dce80d24e5372a7
Author: 陈梓立 
Date:   2018-07-19T08:13:18Z

[FLINK-9894] [runtime] Potential Data Race




---


[GitHub] flink issue #6354: [FLINK-9881] Fixed a typo in table.scala

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6354
  
@twalthr thank you for the hint. I've updated my PRs #6353 #6345 #6339 as 
you suggested.


---


[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...

2018-07-18 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6353
  
Fix unstable case, the problem is code below, that may assign `constraints` 
to a long array and then to a short array, which cause out of index exception. 
to solve it we could init `constraints` in object construct.


https://github.com/apache/flink/blob/056486a1b81e9648a6d3dc795e7e2c6976f8388c/flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java#L87-L91


---


[GitHub] flink issue #6364: [hotfix] typo for SqlExecutionException msg

2018-07-18 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6364
  
Thanks! LGTM

cc @zentol 


---


[GitHub] flink issue #6360: [FLINK-9884] [runtime] fix slot request may not be remove...

2018-07-18 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6360
  
@shuai-xu 
It makes sense.
The message that TM has successfully allocated slot might lost in transport.
When slot manager receives a slot status report which says one slot has 
allocation id irrelevant to this offer, then the slot is allocated to another 
slot request.
It looks this PR prevents runtime from some potential resource leak, 
doesn't it?


---


[GitHub] flink issue #6345: [FLINK-9869] Send PartitionInfo in batch to Improve perfo...

2018-07-18 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6345
  
OK. This PR is about performance improvement. I will try to give out a 
benchmark, but since it is inspired by our own batch table tasks, it might take 
time to give one. Though since this PR concurrently send partition info and 
deploy task in another thread, it theoretically does good.

Keep on on Flink 1.6! I will nudge you guys to review this one, 
though(laughed)


---


[GitHub] flink issue #6358: [FLINK-9882] [runtime] A function access can be private

2018-07-18 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6358
  
LGTM


---


[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...

2018-07-18 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6353
  
This pr cause `HITSITCase` unstable, I retrigger ci two times to get more 
info

going to fix it.


---


[GitHub] flink issue #6339: [FLINK-9859][Runtime] Distinguish TM akka config with JM ...

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6339
  
cc @StephanEwen 


---


[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6353
  
I think `ExecutionGraphConstructionTest` covers it.


---


[GitHub] flink issue #6354: [FLINK-9881] Fixed a typo in table.scala

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6354
  
LGTM! It's a net win.


---


[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6353
  
the existing tests verify correctness. I will take a try to give out a 
benchmark report since this PR is more relevant to performance.


---


[GitHub] flink issue #6345: [FLINK-9869] Send PartitionInfo in batch to Improve perfo...

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6345
  
cc @sihuazhou 


---


[GitHub] flink pull request #6353: [FLINK-9875] Add concurrent creation of execution ...

2018-07-17 Thread tison1
GitHub user tison1 opened a pull request:

https://github.com/apache/flink/pull/6353

[FLINK-9875] Add concurrent creation of execution job vertex

## Add concurrent creation of execution job vertex

in some case like inputformat vertex, creation of execution job vertex is 
time
consuming, this pr add concurrent creation of execution job vertex to 
accelerate it.

## Brief change log

- `ExecutionGraph`
  - add a method `createExecutionJobVertex` to concurrent creation of 
execution job vertex
  - modify method `attachJobGraph` to acquire goodies from 
`createExecutionJobVertex`

## Verifying this change

Current tests confirm the correctness.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no, it's internal)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tison1/flink execution-vertex-init-improvement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6353.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6353


commit f219249ea99d25a41bb40aeb5a2d8c0d637f1cf7
Author: 陈梓立 
Date:   2018-07-17T13:25:16Z

[FLINK-9875] Add concurrent creation of execution job vertex




---


[GitHub] flink pull request #6347: [hotfix] consistency: vertexes -> vertices

2018-07-17 Thread tison1
Github user tison1 closed the pull request at:

https://github.com/apache/flink/pull/6347


---


[GitHub] flink pull request #6347: [hotfix] consistency: vertexes -> vertices

2018-07-17 Thread tison1
Github user tison1 closed the pull request at:

https://github.com/apache/flink/pull/6347


---


[GitHub] flink issue #6347: [hotfix] consistency: vertexes -> vertices

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6347
  
> vertices is the correct plural, but this is another one of those cases 
where fixing it might cause more harm than good since it could cause merge 
conflicts, yet provides no functional benefit.

... close as @zentol suggested


---


[GitHub] flink issue #6347: [hotfix] consistency: vertexes -> vertices

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6347
  
> vertices is the correct plural, but this is another one of those cases 
where fixing it might cause more harm than good since it could cause merge 
conflicts, yet provides no functional benefit.

... close as @zentol suggested


---


[GitHub] flink issue #6347: [hotfix] consistency: vertexes -> vertices

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6347
  
> Additionally this PR makes a lot of whitespace changes that should be 
reverted in any case.

did you mean the whitespace in comment `* ` is significant?


---


[GitHub] flink issue #6347: [hotfix] typo: vertexes -> vertices

2018-07-17 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6347
  
look up dictionary and it says "vertexes" is also vertex pluralities :P

but the most of our code use "vertices", so could this PR thought as 
keeping consistency?


---


[GitHub] flink pull request #6347: [hotfix] typo: vertexes -> vertices

2018-07-17 Thread tison1
GitHub user tison1 opened a pull request:

https://github.com/apache/flink/pull/6347

[hotfix] typo: vertexes -> vertices

cc @zentol @yanghua 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tison1/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6347


commit d3f8e633f7a7368c187612de837b88d6b09cf7c2
Author: 陈梓立 
Date:   2018-07-17T07:55:03Z

[hotfix] typo: vertexes -> vertices




---


[GitHub] flink issue #6345: [FLINK-9869] Send PartitionInfo in batch to Improve perfo...

2018-07-16 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6345
  
cc @tillrohrmann @fhueske 


---


[GitHub] flink pull request #6345: [FLINK-9869] Send PartitionInfo in batch to Improv...

2018-07-16 Thread tison1
GitHub user tison1 opened a pull request:

https://github.com/apache/flink/pull/6345

[FLINK-9869] Send PartitionInfo in batch to Improve perfornance

## What is the purpose of the change

Current we send partition info as soon as one arrive. we could 
`cachePartitionInfo` and then `sendPartitionInfoAsync`, which will improve 
performance.

... also improve task deployment

## Brief change log

- `Execution`
  - now deploy task in another thread
  - as describe above, now we first `cachePartitionInfo` and then 
`sendPartitionInfoAsync`
- add a config option 
`JobManagerOptions#UPDATE_PARTITION_INFO_SEND_INTERVAL`, which config the time 
window for cachePartitionInfo
- update `ExecutionGraphDeploymentTest` and 
`ExecutionVertexDeploymentTest`, which also tests changes above

## Verifying this change

This change is already covered by existing tests, such as 
`ExecutionGraphDeploymentTest` and `ExecutionVertexDeploymentTest`

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no, it's internal)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tison1/flink partition-improve

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6345


commit ca9ffbb99e91a8415d7469cba4bf2075615edc0d
Author: 陈梓立 
Date:   2018-07-17T04:11:36Z

[FLINK-9869] Send PartitionInfo in batch to Improve perfornance




---


[GitHub] flink issue #6339: [FLINK-9859][Runtime] Distinguish TM akka config with JM ...

2018-07-16 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6339
  
cc @zentol @tzulitai 


---


[GitHub] flink pull request #6339: [FLINK-9859][Runtime] Distinguish TM akka config w...

2018-07-16 Thread tison1
GitHub user tison1 opened a pull request:

https://github.com/apache/flink/pull/6339

[FLINK-9859][Runtime] Distinguish TM akka config with JM config

## What is the purpose of the change

 Distinguish TM akka config with JM config.

 1. increase the number of akka threads on JM, to improve its performance
 2. decrease the number of akka threads on TM, to save resource.


## Brief change log
  - `AkkaUtils`
- add a method `getTaskExecutorAkkaConfig` to distinguish TM's config 
with JM's
- also the config details
  - `TaskManagerRunner` && `AkkaRpcServiceUtils` && `AkkaOptions`
- add a method `AkkaRpcServiceUtils#createRpcServiceForTaskExecutor`
- add a config option `AkkaOptions#AKKA_PORT_BIND_TIMEOUT`, which 
defines the timeout of binding retry. This is used in 
`AkkaRpcServiceUtils#createRpcServiceForTaskExecutor`
- `TaskManagerRunner#createRpcService` now uses 
`AkkaRpcServiceUtils#createRpcServiceForTaskExecutor` to create RpcService
  - add a test `AkkaRpcServiceUtilsTest` to see createRpcService works 
after changes above

## Verifying this change

This change added tests and can be verified as follows:

  - Unit test. Locally create rpc service using customized config, to see 
whether we could createRpcService properly.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no, it is internal)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tison1/flink akka-improve

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6339


commit 019c018dfa994b6ef755823658adf613166659c8
Author: 陈梓立 
Date:   2018-07-16T09:47:29Z

[FLINK-9859][Runtime] Distinguish TM akka config with JM config

... increase the number of akka threads on JM, to improve its performance; 
decrease the number of akka threads on TM, to save resource.




---