[jira] [Created] (FLINK-10640) Enable Slot Resource Profile for Resource Management

2018-10-22 Thread Xintong Song (JIRA)
Xintong Song created FLINK-10640:


 Summary: Enable Slot Resource Profile for Resource Management
 Key: FLINK-10640
 URL: https://issues.apache.org/jira/browse/FLINK-10640
 Project: Flink
  Issue Type: New Feature
  Components: ResourceManager
Reporter: Xintong Song


Motivation & Background
 * The existing concept of task slots roughly represents how many pipelines of 
tasks a TaskManager can hold. However, it does not consider the differences in 
resource needs and usage of individual tasks. Enabling resource profiles for 
slots may allow Flink to better allocate execution resources according to the 
tasks' fine-grained resource needs.
 * The community version of Flink already contains APIs and some implementation 
for slot resource profiles. However, this logic is not actually used: the 
ResourceProfile of slot requests is by default set to UNKNOWN with negative 
values, and thus matches any given slot (see the sketch below).
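
A rough illustration of the current matching behavior (a simplified sketch with 
illustrative field names, not the actual ResourceProfile class):
{code:java}
// Simplified sketch: a request profile with negative (UNKNOWN) values matches any slot.
class SimpleResourceProfile {
    final double cpuCores;  // negative means UNKNOWN
    final long memoryBytes; // negative means UNKNOWN

    SimpleResourceProfile(double cpuCores, long memoryBytes) {
        this.cpuCores = cpuCores;
        this.memoryBytes = memoryBytes;
    }

    /** Returns true if a slot with this profile can satisfy the given request. */
    boolean isMatching(SimpleResourceProfile required) {
        if (required.cpuCores < 0 && required.memoryBytes < 0) {
            return true; // an UNKNOWN request matches any slot
        }
        return cpuCores >= required.cpuCores && memoryBytes >= required.memoryBytes;
    }
}
{code}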

Preliminary Design
 * Slot Management
 A slot represents a certain amount of resources for a single pipeline of tasks 
to run on a TaskManager. Initially, a TaskManager does not have any slots, only 
a total amount of resources. When allocating, the ResourceManager finds proper 
TMs on which to create new slots for the tasks to run, according to the slot 
requests. Once created, a slot's size (resource profile) does not change until 
it is freed. The ResourceManager can apply different, pluggable strategies for 
allocating slots from TaskManagers.
 * TM Management
 The size and number of TaskManagers, and when to start them, can also be 
flexible. TMs can be started and released dynamically and may have different 
sizes. Many different, pluggable strategies are possible. E.g., an elastic 
session that can run multiple jobs like the session mode, while dynamically 
adjusting the size of the session (number of TMs) according to the real-time 
workload.
 * About Slot Sharing
 Slot sharing is a good heuristic for easily estimating how many slots are 
needed to get a job running, and for achieving better utilization when slots 
have no resource profiles. However, with resource profiles enabling 
finer-grained resource management, each individual task has its own specific 
resource needs, and it does not make much sense to have multiple tasks sharing 
the resources of the same slot. Instead, we may introduce locality 
preferences/constraints to support the semantics of putting tasks in the 
same/different TMs in a more general way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13184) Support launching task executors with multi-thread on YARN.

2019-07-09 Thread Xintong Song (JIRA)
Xintong Song created FLINK-13184:


 Summary: Support launching task executors with multi-thread on 
YARN.
 Key: FLINK-13184
 URL: https://issues.apache.org/jira/browse/FLINK-13184
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.8.1, 1.9.0
Reporter: Xintong Song
Assignee: Xintong Song


Currently, YarnResourceManager starts all task executors in its main thread. 
This can cause the RM thread to become unresponsive when launching a large 
number of TEs (e.g. > 1000), leading to TE registration/heartbeat timeouts.

In Blink, we have a thread pool through which the RM starts TEs via the YARN 
NMClient in separate threads. I think we should add this feature to the Flink 
master branch as well.
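
A minimal sketch of the idea, assuming a plain JDK thread pool; the helper 
names and the pool size are illustrative, not the actual Blink/Flink code:
{code:java}
// Sketch: offload container launches to a dedicated thread pool so the
// ResourceManager's main thread stays responsive while many TEs are started.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TaskExecutorLauncherSketch {
    private final ExecutorService launcherPool = Executors.newFixedThreadPool(16);

    // startTaskExecutorInContainer is a hypothetical callback wrapping the
    // (potentially slow, blocking) YARN NMClient startContainer call.
    void launchAsync(Runnable startTaskExecutorInContainer) {
        launcherPool.submit(startTaskExecutorInContainer);
    }
}
{code}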



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13554) ResourceManager should have a timeout on starting new TaskExecutors.

2019-08-02 Thread Xintong Song (JIRA)
Xintong Song created FLINK-13554:


 Summary: ResourceManager should have a timeout on starting new 
TaskExecutors.
 Key: FLINK-13554
 URL: https://issues.apache.org/jira/browse/FLINK-13554
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song


Recently, we encountered a case where one TaskExecutor got stuck during 
launching on Yarn (without failing), so that the job could not recover from 
continuous failovers.

The TaskExecutor got stuck due to a problem in our environment, somewhere after 
the ResourceManager had started it and while the ResourceManager was waiting 
for it to be brought up and register. Later, when the slot request timed out, 
the job failed over and requested slots from the ResourceManager again; the 
ResourceManager still saw a TaskExecutor (the stuck one) being started and did 
not request a new container from Yarn. Therefore, the job could not recover 
from the failure.

To avoid such an unrecoverable state, I think the ResourceManager needs a 
timeout on starting new TaskExecutors. If starting a TaskExecutor takes too 
long, it should just fail that TaskExecutor and start a new one.
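
A sketch of such a startup timeout (illustrative only; the worker bookkeeping 
and the releaseAndRequestNewWorker callback are hypothetical):
{code:java}
// Sketch: after requesting a new TaskExecutor, schedule a check. If the worker has
// not registered within the timeout, give it up and request a replacement.
import java.util.Set;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TaskExecutorStartupTimeoutSketch {
    void onTaskExecutorRequested(String workerId,
                                 Set<String> registeredWorkers,
                                 ScheduledExecutorService scheduler,
                                 Runnable releaseAndRequestNewWorker) {
        scheduler.schedule(() -> {
            if (!registeredWorkers.contains(workerId)) {
                // The TaskExecutor got stuck during startup: fail it and start a new one.
                releaseAndRequestNewWorker.run();
            }
        }, 5, TimeUnit.MINUTES);
    }
}
{code}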



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13555) Failures of slot requests requiring unfulfillable managed memory should not be ignored.

2019-08-02 Thread Xintong Song (JIRA)
Xintong Song created FLINK-13555:


 Summary: Failures of slot requests requiring unfulfillable managed 
memory should not be ignored.
 Key: FLINK-13555
 URL: https://issues.apache.org/jira/browse/FLINK-13555
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song
 Fix For: 1.9.0
 Attachments: flink-unk-standalonesession-0-u-home.log, 
flink-unk-taskexecutor-0-u-home.log

Currently, SlotPool ignores failures of requesting slots from the 
ResourceManager for all batch slot requests. The idea behind this is to allow 
batch slot requests to stay pending at the SlotPool, waiting for other tasks to 
finish and release slots. A slot request is failed only if it is not fulfilled 
within its timeout.

However, there can be two kinds of failures when requesting slots from the RM:
 # The RM does not have available slots. All slots are in use at the moment, 
but they might become available later when the currently running tasks finish.
 # The slot request requires more resources than can be fulfilled by any slot 
(available or not) in the cluster. Such a request is also not likely to be 
fulfilled later.

For the 2nd kind of failure, it doesn't make sense to wait for the timeout. We 
should fail the job immediately, with a proper error message describing the 
problem and suggesting that the user tune the job or cluster configuration.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13579) Failed launching standalone cluster due to improperly configured irrelevant config options for active mode.

2019-08-05 Thread Xintong Song (JIRA)
Xintong Song created FLINK-13579:


 Summary: Failed launching standalone cluster due to improperly 
configured irrelevant config options for active mode.
 Key: FLINK-13579
 URL: https://issues.apache.org/jira/browse/FLINK-13579
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song
 Fix For: 1.9.0






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13980) FLIP-49 Unified Memory Configuration for TaskExecutors

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13980:


 Summary: FLIP-49 Unified Memory Configuration for TaskExecutors
 Key: FLINK-13980
 URL: https://issues.apache.org/jira/browse/FLINK-13980
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration, Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song
 Fix For: 1.10.0


This is the umbrella issue of 'FLIP-49: Unified Memory Configuration for 
TaskExecutors'.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13981) Introduce a switch for enabling the new task executor memory configurations

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13981:


 Summary: Introduce a switch for enabling the new task executor 
memory configurations
 Key: FLINK-13981
 URL: https://issues.apache.org/jira/browse/FLINK-13981
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13985) Use native memory for managed memory.

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13985:


 Summary: Use native memory for managed memory.
 Key: FLINK-13985
 URL: https://issues.apache.org/jira/browse/FLINK-13985
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13982) Implement memory calculation logics

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13982:


 Summary: Implement memory calculation logics
 Key: FLINK-13982
 URL: https://issues.apache.org/jira/browse/FLINK-13982
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13984) Separate on-heap and off-heap managed memory pools

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13984:


 Summary: Separate on-heap and off-heap managed memory pools
 Key: FLINK-13984
 URL: https://issues.apache.org/jira/browse/FLINK-13984
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13986) Clean-up of legacy mode

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13986:


 Summary: Clean-up of legacy mode
 Key: FLINK-13986
 URL: https://issues.apache.org/jira/browse/FLINK-13986
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13983) Launch task executor with new memory calculation logics

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13983:


 Summary: Launch task executor with new memory calculation logics
 Key: FLINK-13983
 URL: https://issues.apache.org/jira/browse/FLINK-13983
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14058) FLIP-53 Fine Grained Operator Resource Management

2019-09-11 Thread Xintong Song (Jira)
Xintong Song created FLINK-14058:


 Summary: FLIP-53 Fine Grained Operator Resource Management
 Key: FLINK-14058
 URL: https://issues.apache.org/jira/browse/FLINK-14058
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song
 Fix For: 1.10.0


This is the umbrella issue of 'FLIP-53: Fine Grained Operator Resource 
Management'.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14059) Introduce option allSourcesInSamePipelinedRegion in ExecutionConfig

2019-09-11 Thread Xintong Song (Jira)
Xintong Song created FLINK-14059:


 Summary: Introduce option allSourcesInSamePipelinedRegion in 
ExecutionConfig
 Key: FLINK-14059
 URL: https://issues.apache.org/jira/browse/FLINK-14059
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* Introduce option {{allSourcesInSamePipelinedRegion}} in {{ExecutionConfig}}
 * Set it to {{true}} by default
 * Set it to {{false}} for SQL/Table API bounded batch jobs by the Blink planner

This step should not introduce any behavior changes. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14061) Introduce managed memory fractions to StreamConfig

2019-09-11 Thread Xintong Song (Jira)
Xintong Song created FLINK-14061:


 Summary: Introduce managed memory fractions to StreamConfig
 Key: FLINK-14061
 URL: https://issues.apache.org/jira/browse/FLINK-14061
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


Introduce {{fracManagedMemOnHeap}} and {{fracManagedMemOffHeap}} in 
{{StreamConfig}}, so that they can be set by {{StreamingJobGraphGenerator}} and 
used by operators at runtime.

This step should not introduce any behavior changes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14063) Operators use fractions to decide how much managed memory to allocate

2019-09-11 Thread Xintong Song (Jira)
Xintong Song created FLINK-14063:


 Summary: Operators use fractions to decide how much managed memory 
to allocate
 Key: FLINK-14063
 URL: https://issues.apache.org/jira/browse/FLINK-14063
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* Operators allocate memory segments with the amount returned by 
{{MemoryManager#computeNumberOfPages}}.
 * Operators reserve memory with the amount returned by 
{{MemoryManager#computeMemorySize}}. 

This step activates the new fraction-based managed memory.
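
A rough sketch of the intended usage pattern; the formulas follow the 
description above and are assumptions, not the actual MemoryManager signatures:
{code:java}
// Sketch of fraction-based managed memory.
class FractionBasedManagedMemorySketch {
    // Batch-style operator: number of pages proportional to its fraction,
    // roughly what MemoryManager#computeNumberOfPages is described to return.
    int pagesFor(double fraction, long totalManagedMemory, int pageSize) {
        return (int) (totalManagedMemory * fraction / pageSize);
    }

    // Streaming-style operator (e.g. a state backend): reserve a memory budget,
    // roughly what MemoryManager#computeMemorySize is described to return.
    long reservedBytesFor(double fraction, long totalManagedMemory) {
        return (long) (totalManagedMemory * fraction);
    }
}
{code}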



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14060) Set slot sharing groups according to pipelined regions

2019-09-11 Thread Xintong Song (Jira)
Xintong Song created FLINK-14060:


 Summary: Set slot sharing groups according to pipelined regions
 Key: FLINK-14060
 URL: https://issues.apache.org/jira/browse/FLINK-14060
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


{{StreamingJobGraphGenerator}} sets slot sharing groups for operators at 
compile time.
 * Identify pipelined regions, with respect to 
{{allSourcesInSamePipelinedRegion}}
 * Set slot sharing groups according to pipelined regions
 ** By default, each pipelined region should go into a separate slot sharing 
group
 ** If the user puts operators from multiple pipelined regions into the same 
slot sharing group, this should be respected

This step should not introduce any behavior changes, given that pipelined 
regions scheduled later can reuse slots from previously scheduled pipelined 
regions.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14062) Set managed memory fractions according to slot sharing groups

2019-09-11 Thread Xintong Song (Jira)
Xintong Song created FLINK-14062:


 Summary: Set managed memory fractions according to slot sharing 
groups
 Key: FLINK-14062
 URL: https://issues.apache.org/jira/browse/FLINK-14062
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* For operators with specified {{ResourceSpecs}}, calculate fractions according 
to the operators' {{ResourceSpecs}}
 * For operators with unknown {{ResourceSpecs}}, calculate fractions according 
to the number of operators using managed memory

This step should not introduce any behavior changes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14106) Make SlotManager pluggable

2019-09-17 Thread Xintong Song (Jira)
Xintong Song created FLINK-14106:


 Summary: Make SlotManager pluggable
 Key: FLINK-14106
 URL: https://issues.apache.org/jira/browse/FLINK-14106
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song
 Fix For: 1.10.0


As we are enabling fine-grained resource management in 1.10, we can have 
various resource scheduling strategies. Such strategies generally need to make 
the following three decisions.
 * When to launch new / release existing TMs? (How many TMs?)
 * What and how many resources should TMs be started with?
 * How to match slot requests against TM resources?

We may want to make the above decisions differently in different scenarios 
(active/reactive mode, per-job/session mode, etc.). Therefore, we propose to 
make the scheduling strategies pluggable.

We propose the following changes (a rough sketch follows below):
 * Make SlotManager an interface, and implement it differently for different 
strategies.
 * Modify the ResourceManager-SlotManager interfaces so that all three 
decisions mentioned above are covered in SlotManager. In particular, 
SlotManager needs to allocate TM resources, instead of slot resources, from 
ResourceActions.
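
A minimal sketch of how the three decisions could be expressed as a pluggable 
interface; all names here are illustrative assumptions, not the actual RM/SM 
interfaces:
{code:java}
// Sketch only: a strategy interface covering the three scheduling decisions.
interface SlotAllocationStrategySketch {

    /** Decisions 1 & 2: a pending slot request may trigger launching new
     *  TaskManagers with a particular resource spec. */
    void onSlotRequest(WorkerResourceSketch requestedResources);

    /** Decision 3: decide how pending slot requests are matched onto the
     *  resources of a newly registered TaskManager. */
    void onTaskManagerRegistered(WorkerResourceSketch taskManagerResources);

    /** Decision 1: decide when idle TaskManagers should be released. */
    void checkIdleTaskManagers();
}

/** Placeholder resource description used only by this sketch. */
class WorkerResourceSketch {
    double cpuCores;
    long memoryBytes;
}
{code}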



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14187) FLIP-56 Dynamic Slot Allocation

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14187:


 Summary: FLIP-56 Dynamic Slot Allocation
 Key: FLINK-14187
 URL: https://issues.apache.org/jira/browse/FLINK-14187
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song
 Fix For: 1.10.0


This is the umbrella issue for 'FLIP-56: Dynamic Slot Allocation'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14189) Extend TaskExecutor to support dynamic slot allocation

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14189:


 Summary: Extend TaskExecutor to support dynamic slot allocation
 Key: FLINK-14189
 URL: https://issues.apache.org/jira/browse/FLINK-14189
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* TaskSlotTable
 ** Bookkeep the task manager's available resources
 ** Add and implement an interface for dynamically allocating slots (with a 
resource profile instead of a slot index)
 ** Create slot reports with dynamically allocated slots and the remaining 
available resources
 * TaskExecutor
 ** Support requesting slots with a resource profile rather than a slot id.

The slot report still contains the status of legacy free slots. When the 
ResourceManager requests slots with a slot id, convert it to the default slot 
resource profile for bookkeeping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14188) TaskExecutor derive and register with default slot resource profile

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14188:


 Summary: TaskExecutor derive and register with default slot 
resource profile
 Key: FLINK-14188
 URL: https://issues.apache.org/jira/browse/FLINK-14188
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* Introduce config option for defaultSlotFraction
 * Derive default slot resource profile from the new config option, or the 
legacy config option "taskmanager.numberOfTaskSlots".
 * Register task executor with the default slot resource profile.

This step should not introduce any behavior changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14191) Extend SlotManager to support dynamic slot allocation on pending task executors

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14191:


 Summary: Extend SlotManager to support dynamic slot allocation on 
pending task executors
 Key: FLINK-14191
 URL: https://issues.apache.org/jira/browse/FLINK-14191
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* Introduce PendingTaskManagerResources
 * Create PendingTaskManagerSlot on allocation, from PendingTaskManagerResource
 * Map registered task executors to matching PendingTaskManagerResources, and 
allocate slots for corresponding PendingTaskManagerSlots

Convert registered task executor free slots into equivalent available resources 
according to default slot resource profiles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14190) Extend SlotManager to support dynamic slot allocation on registered TaskExecutors.

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14190:


 Summary: Extend SlotManager to support dynamic slot allocation on 
registered TaskExecutors.
 Key: FLINK-14190
 URL: https://issues.apache.org/jira/browse/FLINK-14190
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* Bookkeep task manager available resources
 * Match between slot requests and task executor resources
 ** Find task executors with matching available resources for slot requests
 ** Find matching pending slot requests for task executors with new available 
resources
 * Create TaskManagerSlot on allocation and remove on free.
 * Request slot from TaskExecutor with resource profiles.

Use the RM-calculated default resource profile for all slot requests. Convert 
free slots in SlotReports into equivalent available resources according to the 
default slot resource profiles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14192) Enable the dynamic slot allocation feature.

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14192:


 Summary: Enable the dynamic slot allocation feature.
 Key: FLINK-14192
 URL: https://issues.apache.org/jira/browse/FLINK-14192
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* ResourceManager uses the default slot resource profiles registered by the 
TaskExecutors, instead of those calculated on the RM side.
 * ResourceManager uses the actually requested resource profiles for slot 
requests, instead of assuming the default profile for all requests.
 * TaskExecutor bookkeeps with the requested resource profiles, instead of 
assuming the default profile for all requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14193) Update RestAPI / Web UI

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14193:


 Summary: Update RestAPI / Web UI
 Key: FLINK-14193
 URL: https://issues.apache.org/jira/browse/FLINK-14193
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


* Update RestAPI / WebUI to properly display information of available resources 
and allocated slots of task executors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14194) Clean-up of legacy mode.

2019-09-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-14194:


 Summary: Clean-up of legacy mode.
 Key: FLINK-14194
 URL: https://issues.apache.org/jira/browse/FLINK-14194
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14405) Align ResourceProfile/ResourceSpec fields with the new TaskExecutor memory setups.

2019-10-16 Thread Xintong Song (Jira)
Xintong Song created FLINK-14405:


 Summary: Align ResourceProfile/ResourceSpec fields with the new 
TaskExecutor memory setups.
 Key: FLINK-14405
 URL: https://issues.apache.org/jira/browse/FLINK-14405
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15023) Remove on-heap managed memory

2019-12-02 Thread Xintong Song (Jira)
Xintong Song created FLINK-15023:


 Summary: Remove on-heap managed memory
 Key: FLINK-15023
 URL: https://issues.apache.org/jira/browse/FLINK-15023
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Configuration
Reporter: Xintong Song


As mentioned in [this discussion 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Make-Managed-Memory-always-off-heap-Adjustment-to-FLIP-49-td35365.html],
 we want to make managed memory always off-heap.

This task includes the following changes:
* Remove `MEMORY_OFF_HEAP`, `MANAGED_MEMORY_OFFHEAP_FRACTION` and 
`MANAGED_MEMORY_OFFHEAP_SIZE` from `TaskManagerOptions`.
* Remove `onHeapManagedMemory` from `ResourceProfile`, `ResourceSpec` and 
`TaskExecutorResourceSpec`.
* Remove on-heap managed memory from `MemoryManager`.
* Remove the on-heap managed memory fraction from `StreamConfig`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15053) Configurations with values containing spaces may cause TM failures on Yarn

2019-12-04 Thread Xintong Song (Jira)
Xintong Song created FLINK-15053:


 Summary: Configurations with values containing spaces may cause TM 
failures on Yarn
 Key: FLINK-15053
 URL: https://issues.apache.org/jira/browse/FLINK-15053
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN, Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Xintong Song
 Fix For: 1.10.0


Currently, on Yarn setups we pass task-executor-specific configurations through 
dynamic properties in the start command (see FLINK-13184).

If the value of a configuration option contains spaces, the dynamic properties 
may not be parsed correctly, which can cause task executor failures. One 
occurrence can be found in FLINK-15047.

It would be good to allow spaces when passing dynamic properties, e.g. by 
surrounding the values with quotation marks or escaping special characters.
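
For illustration, a sketch of the kind of quoting that could be applied when 
assembling the dynamic properties; the helper name and quoting scheme are 
assumptions, not the actual Flink/YARN code:
{code:java}
// Sketch: wrap a dynamic property value in single quotes if it contains spaces,
// escaping embedded single quotes, so the start command parses it as one token.
class DynamicPropertyQuotingSketch {
    static String asDynamicProperty(String key, String value) {
        if (value.contains(" ") || value.contains("'")) {
            value = "'" + value.replace("'", "'\\''") + "'";
        }
        return "-D" + key + "=" + value;
    }
}
{code}
For example, {{asDynamicProperty("env.java.opts", "-Xms1g -Xmx1g")}} would 
yield {{-Denv.java.opts='-Xms1g -Xmx1g'}}.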

cc [~fly_in_gis]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15055) YarnDistributedCacheITCase does not generate logs.

2019-12-04 Thread Xintong Song (Jira)
Xintong Song created FLINK-15055:


 Summary: YarnDistributedCacheITCase does not generate logs.
 Key: FLINK-15055
 URL: https://issues.apache.org/jira/browse/FLINK-15055
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN, Tests
Affects Versions: 1.10.0
Reporter: Xintong Song


According to FLINK-14630, in order to properly generate logs, test cases that 
start clusters with YarnClusterDescriptor on MiniYARNCluster should call the 
util function FlinkYarnSessionCli#setLogConfigFileInConfig to set the path of 
the log4j property file in the configuration.

YarnDistributedCacheITCase fails to do this.

We should probably also revisit the other Yarn IT cases to see if any other 
case fails to do this as well.

An alternative worth discussing is to make it the default behavior for 
YarnClusterDescriptor to always set the default log4j property file, if it can 
find one, when the property file is not explicitly configured.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15143) Create document for FLIP-49 TM memory model and configuration guide

2019-12-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-15143:


 Summary: Create document for FLIP-49 TM memory model and 
configuration guide
 Key: FLINK-15143
 URL: https://issues.apache.org/jira/browse/FLINK-15143
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Xintong Song
 Fix For: 1.10.0


In release 1.10, with FLIP-49, we introduced significant changes to the 
TaskExecutor memory model and its related configuration options / logic.

It is very important that we clearly state the changes and their potential 
effects, and guide our users in tuning their clusters with the new 
configuration, both for new setups and for migrations of previous setups. We 
should do this in both the documentation and the release notes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15145) Tune default values for FLIP-49 TM memory configurations with real production jobs.

2019-12-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-15145:


 Summary: Tune default values for FLIP-49 TM memory configurations 
with real production jobs.
 Key: FLINK-15145
 URL: https://issues.apache.org/jira/browse/FLINK-15145
 Project: Flink
  Issue Type: Task
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


In release 1.10, with FLIP-49, we introduced significant changes to the 
TaskExecutor memory model and its related configuration options / logic.

Since the model and configuration logic have changed, it is reasonable to also 
revisit the default configuration values. Currently, the default values are set 
based on gut feeling and experience from e2e tests. It would be good to try and 
tune the configurations with some real production jobs, of various scales if 
possible, before exposing the configurations in the release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15367) Handle backwards compatibility of "taskmanager.heap.size" differently for standalone / active setups

2019-12-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-15367:


 Summary: Handle backwards compatibility of "taskmanager.heap.size" 
differently for standalone / active setups
 Key: FLINK-15367
 URL: https://issues.apache.org/jira/browse/FLINK-15367
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


Previously, "taskmanager.heap.size" were used differently for calculating TM 
memory sizes on standalone / active setups. To fully align with the previous 
behaviors, we need to map this deprecated key to 
"taskmanager.memory.flink.size" for standalone setups and 
"taskmanager.memory.process.size" for active setups.

Detailed discussion can be found in this [ML 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15369) MiniCluster use fixed network / managed memory sizes by default

2019-12-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-15369:


 Summary: MiniCluster use fixed network / managed memory sizes by 
default
 Key: FLINK-15369
 URL: https://issues.apache.org/jira/browse/FLINK-15369
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


Currently, the MiniCluster may allocate off-heap memory (managed & network) 
according to the JVM free heap size and the configured off-heap fractions. This 
can lead to unnecessarily large off-heap memory usage and unpredictable / 
hard-to-understand behavior.

We believe fixed values for managed / network memory would be enough for such a 
setup that runs Flink as a library.

Detailed discussion can be found in this [ML 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15371) Change FLIP-49 memory configurations to use the new memory type config options

2019-12-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-15371:


 Summary: Change FLIP-49 memory configurations to use the new 
memory type config options
 Key: FLINK-15371
 URL: https://issues.apache.org/jira/browse/FLINK-15371
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


The FLIP-49 memory configurations can leverage the new strongly typed 
ConfigOption, to make validation automatic and to avoid breaking the options 
later.
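
A sketch of what a memory-typed option could look like, assuming the typed 
builder ({{memoryType()}}) introduced with the new ConfigOption is available; 
the option key shown is just an example:
{code:java}
// Sketch: the option value is parsed and validated as a MemorySize
// instead of being handled as a raw string.
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.MemorySize;

class TypedMemoryOptionSketch {
    static final ConfigOption<MemorySize> FRAMEWORK_HEAP_MEMORY =
        ConfigOptions.key("taskmanager.memory.framework.heap.size")
            .memoryType()
            .defaultValue(MemorySize.parse("128m"))
            .withDescription("Framework heap memory size for TaskExecutors.");
}
{code}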

Detailed discussion can be found in this [ML 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15372) Use shorter config keys for FLIP-49 total memory config options

2019-12-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-15372:


 Summary: Use shorter config keys for FLIP-49 total memory config 
options
 Key: FLINK-15372
 URL: https://issues.apache.org/jira/browse/FLINK-15372
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


We propose to use shorter keys for the total flink / process memory config 
options, to make them less clumsy without loss of expressiveness.

To be specific, we propose to:
* Change the config option key "taskmanager.memory.total-flink.size" to 
"taskmanager.memory.flink.size"
* Change the config option key "taskmanager.memory.total-process.size" to 
"taskmanager.memory.process.size"

Detailed discussion can be found in this [ML 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15373) Update descriptions for framework / task off-heap memory config options

2019-12-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-15373:


 Summary: Update descriptions for framework / task off-heap memory 
config options
 Key: FLINK-15373
 URL: https://issues.apache.org/jira/browse/FLINK-15373
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


Update the descriptions for "taskmanager.memory.framework.off-heap.size" and 
"taskmanager.memory.task.off-heap.size" to explicitly state that:
* Both direct and native memory are accounted for
* The configured size is fully counted into MaxDirectMemorySize

Detailed discussion can be found in this [ML 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15374) Update descriptions for jvm overhead config options

2019-12-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-15374:


 Summary: Update descriptions for jvm overhead config options
 Key: FLINK-15374
 URL: https://issues.apache.org/jira/browse/FLINK-15374
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


Update descriptions for "taskmanager.memory.jvm-overhead.[min|max|fraction]" to 
remove "I/O direct memory" and explicitly state that it's not counted into 
MaxDirectMemorySize.

Detailed discussion can be found in this [ML 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15375) Improve MemorySize to print / parse with better readability.

2019-12-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-15375:


 Summary: Improve MemorySize to print / parse with better 
readability.
 Key: FLINK-15375
 URL: https://issues.apache.org/jira/browse/FLINK-15375
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0


* Print MemorySize with a proper unit rather than a huge number of bytes.
* Parse memory sizes from numbers instead of {{parse(xxx + "m")}} (as 
illustrated below).
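
For illustration; the construction on the "proposed" side is an assumption 
about what the improvement could look like:
{code:java}
// Sketch of the readability issue and the proposed direction.
import org.apache.flink.configuration.MemorySize;

class MemorySizeReadabilitySketch {
    void example(int heapSizeMb) {
        // Current, clumsy: format a number into a string only to parse it back.
        MemorySize current = MemorySize.parse(heapSizeMb + "m");

        // Proposed direction (hypothetical factory): construct from a number directly,
        // and print with a human-readable unit, e.g. "1.5 gb" instead of a raw byte count.
        // MemorySize proposed = MemorySize.ofMebiBytes(heapSizeMb);

        System.out.println(current);
    }
}
{code}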

Detailed discussion can be found in this [ML 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15382) Flink failed generating python config docs

2019-12-24 Thread Xintong Song (Jira)
Xintong Song created FLINK-15382:


 Summary: Flink failed generating python config docs 
 Key: FLINK-15382
 URL: https://issues.apache.org/jira/browse/FLINK-15382
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Runtime / Configuration
Reporter: Xintong Song


When generating the config option docs with the command suggested by 
{{flink-docs/README.md}}, the generated 
{{docs/_includes/generated/python_configuration.html}} does not contain any 
config options, even though there are 4 options in {{PythonOptions}}.

I encountered this problem at commit 
{{545534e43ed37f518fe59b6ddd8ed56ae82a234b}} on the master branch.

Command used to generate the docs:
{code:bash}mvn package -Dgenerate-config-docs -pl flink-docs -am -nsu 
-DskipTests{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15403) 'State Migration end-to-end test from 1.6' is unstable on travis.

2019-12-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-15403:


 Summary: 'State Migration end-to-end test from 1.6' is unstable on 
travis.
 Key: FLINK-15403
 URL: https://issues.apache.org/jira/browse/FLINK-15403
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.10.0
Reporter: Xintong Song
 Fix For: 1.10.0


https://api.travis-ci.org/v3/job/629576631/log.txt

The test case fails because the log contains the following error message.
{code}
2019-12-26 09:19:35,537 ERROR 
org.apache.flink.streaming.runtime.tasks.StreamTask   - Received 
CancelTaskException while we are not canceled. This is a bug and should be 
reported
org.apache.flink.runtime.execution.CancelTaskException: Consumed partition 
PipelinedSubpartitionView(index: 0) of ResultPartition 
3886657fb8cc980139fac67e32d6e380@8cfcbe851fe3bb3fa00e9afc370bd963 has been 
released.
at 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:190)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:509)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:487)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.pollNext(SingleInputGate.java:475)
at 
org.apache.flink.runtime.taskmanager.InputGateWithMetrics.pollNext(InputGateWithMetrics.java:75)
at 
org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:125)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:311)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:488)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:702)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:527)
at java.lang.Thread.run(Thread.java:748)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15480) BlobsCleanupITCase is unstable on travis

2020-01-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-15480:


 Summary: BlobsCleanupITCase is unstable on travis
 Key: FLINK-15480
 URL: https://issues.apache.org/jira/browse/FLINK-15480
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Xintong Song
 Fix For: 1.10.0


https://api.travis-ci.com/v3/job/272321636/log.txt
{code}
03:52:11.256 [ERROR] 
testBlobServerCleanupFinishedJob(org.apache.flink.runtime.jobmanager.BlobsCleanupITCase)
  Time elapsed: 298.556 s  <<< FAILURE!
java.lang.AssertionError: 

Expected: is 
 but: was 
at 
org.apache.flink.runtime.jobmanager.BlobsCleanupITCase.testBlobServerCleanup(BlobsCleanupITCase.java:220)
at 
org.apache.flink.runtime.jobmanager.BlobsCleanupITCase.testBlobServerCleanupFinishedJob(BlobsCleanupITCase.java:133)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15530) Replace process memory with flink memory for TMs in default flink-conf.yaml

2020-01-08 Thread Xintong Song (Jira)
Xintong Song created FLINK-15530:


 Summary: Replace process memory with flink memory for TMs in 
default flink-conf.yaml
 Key: FLINK-15530
 URL: https://issues.apache.org/jira/browse/FLINK-15530
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Xintong Song
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15541) FlinkKinesisConsumerTest.testSourceSynchronization is unstable on travis.

2020-01-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-15541:


 Summary: FlinkKinesisConsumerTest.testSourceSynchronization is 
unstable on travis.
 Key: FLINK-15541
 URL: https://issues.apache.org/jira/browse/FLINK-15541
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.10.0
Reporter: Xintong Song
 Fix For: 1.10.0


[https://api.travis-ci.org/v3/job/634712405/log.txt]
{code:java}
13:16:19.144 [ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 4.338 s <<< FAILURE! - in 
org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumerTest
13:16:19.144 [ERROR] 
testSourceSynchronization(org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumerTest)
  Time elapsed: 1.001 s  <<< FAILURE!
java.lang.AssertionError: expected null, but was:
at 
org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumerTest.testSourceSynchronization(FlinkKinesisConsumerTest.java:1018)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15564) YarnClusterDescriptorTest failed to validate the original intended behavior

2020-01-12 Thread Xintong Song (Jira)
Xintong Song created FLINK-15564:


 Summary: YarnClusterDescriptorTest failed to validate the original 
intended behavior
 Key: FLINK-15564
 URL: https://issues.apache.org/jira/browse/FLINK-15564
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Reporter: Xintong Song


As discovered in [PR#10834|https://github.com/apache/flink/pull/10834], the 
following test cases of {{YarnClusterDescriptorTest}} fail to validate the 
originally intended behavior and are temporarily skipped by PR#10834.
- {{testFailIfTaskSlotsHigherThanMaxVcores}}
- {{testConfigOverwrite}}

The original purpose of these two test cases was to verify the validation logic 
against the yarn max allocation vcores (in 
{{5836f7eddb4849b95d4860cf20045bc61d061918}}).

These two cases should have started failing when we changed the validation 
logic to get the yarn max allocation vcores from the yarnClient instead of the 
configuration (in {{e959e6d0cd42f0c5b21c0f03ce547f2025ac58d5}}), because no 
yarn cluster (not even a {{MiniYARNCluster}}) is started in these cases, so 
{{yarnClient#getNodeReports}} will never return.

The cases did not fail because another {{IllegalConfigurationException}} was 
thrown in {{validateClusterSpecification}} due to a memory validation failure. 
The memory validation failure was by design; in order to verify the original 
purpose, these two test cases should have been updated with reasonable memory 
sizes, which was unfortunately overlooked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15598) Memory accuracy loss in YarnClusterDescriptor may lead to deployment failure.

2020-01-15 Thread Xintong Song (Jira)
Xintong Song created FLINK-15598:


 Summary: Memory accuracy loss in YarnClusterDescriptor may lead to 
deployment failure.
 Key: FLINK-15598
 URL: https://issues.apache.org/jira/browse/FLINK-15598
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Reporter: Xintong Song
 Fix For: 1.10.0


Currently, YarnClusterDescriptor parses/derives the TM process memory size from 
the configuration, stores it in the ClusterSpecification, validates the 
ClusterSpecification, and then writes the memory size back to the 
configuration.

This logic is unnecessary. The memory validation is already covered by creating 
a TaskExecutorResourceSpec from the configuration in TaskExecutorResourceUtils.

Moreover, the memory size is stored in MB in the ClusterSpecification. The 
resulting accuracy loss may lead to a memory validation failure, which prevents 
the cluster from being deployed.
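
A small worked example of the accuracy loss (numbers are illustrative):
{code:java}
// Converting a byte-accurate size to whole megabytes and back can yield a smaller
// value than the configured one, which a strict validation check may then reject.
class MemoryAccuracyLossExample {
    public static void main(String[] args) {
        long configuredBytes = 1536 * 1024 * 1024L + 512 * 1024; // 1.5 GiB + 512 KiB
        int storedMb = (int) (configuredBytes >> 20);            // 1536 MB, fraction dropped
        long restoredBytes = ((long) storedMb) << 20;            // 1610612736 bytes

        // The round trip lost 512 KiB: restoredBytes < configuredBytes.
        System.out.println(configuredBytes - restoredBytes);     // prints 524288
    }
}
{code}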



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16111) Kubernetes deployment does not respect "taskmanager.cpu.cores".

2020-02-16 Thread Xintong Song (Jira)
Xintong Song created FLINK-16111:


 Summary: Kubernetes deployment does not respect 
"taskmanager.cpu.cores".
 Key: FLINK-16111
 URL: https://issues.apache.org/jira/browse/FLINK-16111
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Xintong Song
 Fix For: 1.10.1, 1.11.0


The Kubernetes deployment uses `kubernetes.taskmanager.cpu` for configuring TM 
cpu cores, and falls back to the number of slots if not specified.

FLINK-14188 introduced a common option `taskmanager.cpu.cores` (at the moment 
not exposed to users and for internal usage only). The common logic is to 
decide the TM cpu cores following the fallback order "common option -> 
K8s/Yarn/Mesos-specific option -> numberOfSlots".

This fallback order is not respected by the Kubernetes deployment.
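
A sketch of the intended fallback logic; the config access is simplified and 
the helper itself is illustrative:
{code:java}
// Sketch of the fallback order: common option -> deployment-specific option -> slots.
import java.util.Map;

class TaskManagerCpuCoresFallbackSketch {
    static double getCpuCores(Map<String, String> config, int numberOfSlots) {
        if (config.containsKey("taskmanager.cpu.cores")) {
            return Double.parseDouble(config.get("taskmanager.cpu.cores"));
        }
        if (config.containsKey("kubernetes.taskmanager.cpu")) {
            return Double.parseDouble(config.get("kubernetes.taskmanager.cpu"));
        }
        return numberOfSlots;
    }
}
{code}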



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16299) Release containers recovered from previous attempt in which TaskExecutor is not started.

2020-02-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-16299:


 Summary: Release containers recovered from previous attempt in 
which TaskExecutor is not started.
 Key: FLINK-16299
 URL: https://issues.apache.org/jira/browse/FLINK-16299
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Xintong Song


As discussed in FLINK-16215, on Yarn deployments {{YarnResourceManager}} starts 
a new {{TaskExecutor}} in two steps:
 # Request a new container from Yarn
 # Start a {{TaskExecutor}} process in the allocated container

If a JM failover happens between the two steps, the new attempt's 
{{YarnResourceManager}} will not start {{TaskExecutor}} processes in the 
recovered containers. That means such containers are neither used nor released.

A potential fix to this problem is to query the container status by calling 
{{NMClientAsync#getContainerStatusAsync}}, release the containers whose state 
is {{NEW}}, and keep only those whose state is {{RUNNING}}, waiting for their 
TaskExecutors to register.
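
A sketch of the proposed handling in the container-status callback, assuming 
YARN's NMClientAsync callback types; the surrounding ResourceManager wiring 
(and the releaseContainer callback) is hypothetical:
{code:java}
// Sketch: release recovered containers in which no TaskExecutor was ever started.
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

class RecoveredContainerHandlingSketch {
    void onContainerStatusReceived(ContainerId containerId, ContainerStatus status,
                                   Runnable releaseContainer) {
        if (status.getState() == ContainerState.NEW) {
            // The previous attempt never started a TaskExecutor in this container;
            // it will never register, so release it back to Yarn.
            releaseContainer.run();
        }
        // RUNNING containers are kept; we wait for their TaskExecutors to register.
    }
}
{code}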



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16437) Make SlotManager allocate resource from ResourceManager at the worker granularity.

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16437:


 Summary: Make SlotManager allocate resource from ResourceManager 
at the worker granularity.
 Key: FLINK-16437
 URL: https://issues.apache.org/jira/browse/FLINK-16437
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Xintong Song
 Fix For: 1.11.0


This is the first step of FLINK-14106, including all the major changes inside 
SlotManager and changes to the RM/SM interfaces, except changes for metrics and 
status.

At the end of this step, SlotManager should allocate resources from the 
ResourceManager with a WorkerResourceSpec, instead of a slot ResourceProfile. 
At this step, the WorkerResourceSpec will not yet be used, and the active RMs 
will always use `ActiveResourceManager#taskExecutorProcessSpec` for requesting 
TMs. We will change that in subsequent steps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16438) Make YarnResourceManager start workers using WorkerResourceSpec requested by SlotManager

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16438:


 Summary: Make YarnResourceManager start workers using 
WorkerResourceSpec requested by SlotManager
 Key: FLINK-16438
 URL: https://issues.apache.org/jira/browse/FLINK-16438
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


This means YarnResourceManager will no longer:
 - be aware of the default task executor resources
 - assume that all workers are identical



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16439) Make KubernetesResourceManager start workers using WorkerResourceSpec requested by SlotManager

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16439:


 Summary: Make KubernetesResourceManager start workers using 
WorkerResourceSpec requested by SlotManager
 Key: FLINK-16439
 URL: https://issues.apache.org/jira/browse/FLINK-16439
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Xintong Song
 Fix For: 1.11.0


This means KubernetesResourceManager will no longer:
 - be aware of the default task executor resources
 - assume that all workers are identical



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16440) Extend SlotManager metrics and status for dynamic slot allocation.

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16440:


 Summary: Extend SlotManager metrics and status for dynamic slot 
allocation.
 Key: FLINK-16440
 URL: https://issues.apache.org/jira/browse/FLINK-16440
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Metrics
Reporter: Xintong Song
 Fix For: 1.11.0


* Create a slotManagerMetricGroup inside the resourceManagerMetricGroup, pass 
it into the SM and register slot-related metrics there.
 ** This allows registering different metrics for different SM implementations.
 ** For backwards compatibility, the slotManagerMetricGroup should have the 
same path as the resourceManagerMetricGroup.

 * Extend ResourceOverview and TaskManagerInfo to contain the TM total / free / 
allocated resources.
 ** Methods need to be added to the SM for getting the TM resource status.
 ** For SlotManagerImpl, the existing methods for getting the number of 
registered / free slots need no changes, and the TM resource status can be 
computed from the TaskExecutorProcessSpec, slot profiles and the number of free 
slots.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16442) Make MesosResourceManager start workers using WorkerResourceSpec requested by SlotManager

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16442:


 Summary: Make MesosResourceManager start workers using 
WorkerResourceSpec requested by SlotManager
 Key: FLINK-16442
 URL: https://issues.apache.org/jira/browse/FLINK-16442
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Mesos
Reporter: Xintong Song


This means MesosResourceManager will no longer:
 - be aware of the default task executor resources
 - assume that all workers are identical

TBH, I'm not sure how many use cases we have that need a different slot 
allocation strategy for the Mesos deployment. I think we can discuss whether we 
want to do this step or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16563) CommandLineParser should fail with explicit error message when parsing recognized arguments.

2020-03-12 Thread Xintong Song (Jira)
Xintong Song created FLINK-16563:


 Summary: CommandLineParser should fail with explicit error message 
when parsing recognized arguments.
 Key: FLINK-16563
 URL: https://issues.apache.org/jira/browse/FLINK-16563
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Affects Versions: 1.10.0
Reporter: Xintong Song
 Fix For: 1.11.0


Currently, {{CommandLineParser}} stops parsing silently if it meets an 
unrecognized option, leaving the remaining tokens in "args" rather than 
"options".

This sometimes leads to problems due to the absence of the subsequent options, 
and the error messages do not point to the true root cause. An 
[example|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-10-container-memory-configuration-with-Mesos-td33594.html]
 was reported in the user ML.

I've checked and it seems that the "args" generated by {{CommandLineParser}} 
are not really used anywhere. Therefore, I propose to make the parser fail fast 
with an explicit error message at unrecognized tokens.

The proposed changes are basically as follows (see the sketch below):
 * In {{CommandLineParser#parse}}, call {{DefaultParser#parse}} with the 
argument {{stopAtNonOption}} set to {{false}}.
 * Remove args from {{ClusterConfiguration}} and its sub-classes.
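
A sketch of the proposed fail-fast behavior, assuming Apache Commons CLI as 
used by Flink's CommandLineParser; the wrapping exception is illustrative:
{code:java}
// With stopAtNonOption = false, an unrecognized token makes parse() throw a
// ParseException instead of silently stopping and leaving the rest in "args".
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

class FailFastParsingSketch {
    CommandLine parse(Options options, String[] args) {
        try {
            return new DefaultParser().parse(options, args, false /* stopAtNonOption */);
        } catch (ParseException e) {
            // Surface an explicit error message pointing at the unrecognized token.
            throw new IllegalArgumentException(
                "Failed to parse the command line arguments: " + e.getMessage(), e);
        }
    }
}
{code}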



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-27514) Website links to up-to-date committer and PMC member lists

2022-05-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-27514:


 Summary: Website links to up-to-date committer and PMC member lists
 Key: FLINK-27514
 URL: https://issues.apache.org/jira/browse/FLINK-27514
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Xintong Song
Assignee: Xintong Song


According to the [ML 
discussion|https://lists.apache.org/thread/679ds6lfqs8f4q8lnt7tnlofl58str4y], 
we are going to add a link to the up-to-date [committer and PMC member 
list|https://projects.apache.org/committee.html?flink] from the community page 
of our project website, as well as a notice that the current list on the page 
could be outdated.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27719) Create Apache Flink slack workspace

2022-05-20 Thread Xintong Song (Jira)
Xintong Song created FLINK-27719:


 Summary: Create Apache Flink slack workspace
 Key: FLINK-27719
 URL: https://issues.apache.org/jira/browse/FLINK-27719
 Project: Flink
  Issue Type: Improvement
Reporter: Xintong Song
Assignee: Xintong Song


This is an umbrella issue for tasks related to setting up the Apache Flink 
Slack workspace.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27720) Slack: Update the project website regarding communication channels

2022-05-20 Thread Xintong Song (Jira)
Xintong Song created FLINK-27720:


 Summary: Slack: Update the project website regarding communication 
channels
 Key: FLINK-27720
 URL: https://issues.apache.org/jira/browse/FLINK-27720
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27721) Slack: set up archive

2022-05-20 Thread Xintong Song (Jira)
Xintong Song created FLINK-27721:


 Summary: Slack: set up archive
 Key: FLINK-27721
 URL: https://issues.apache.org/jira/browse/FLINK-27721
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27722) Slack: set up auto-updated invitation link

2022-05-20 Thread Xintong Song (Jira)
Xintong Song created FLINK-27722:


 Summary: Slack: set up auto-updated invitation link
 Key: FLINK-27722
 URL: https://issues.apache.org/jira/browse/FLINK-27722
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27723) Slack: revisit how this communication channel works

2022-05-20 Thread Xintong Song (Jira)
Xintong Song created FLINK-27723:


 Summary: Slack: revisit how this communication channel works
 Key: FLINK-27723
 URL: https://issues.apache.org/jira/browse/FLINK-27723
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27731) Cannot build documentation with Hugo docker image

2022-05-21 Thread Xintong Song (Jira)
Xintong Song created FLINK-27731:


 Summary: Cannot build documentation with Hugo docker image
 Key: FLINK-27731
 URL: https://issues.apache.org/jira/browse/FLINK-27731
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.16.0
Reporter: Xintong Song


Flink provides 2 ways of building the documentation: 1) using a Hugo docker 
image, and 2) using a local Hugo installation.

Currently, 1) is broken because the `setup_docs.sh` script requires a local 
Hugo installation.

This was introduced in FLINK-27394.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27848) ZooKeeperLeaderElectionDriver keeps writing leader information, using up zxid

2022-05-30 Thread Xintong Song (Jira)
Xintong Song created FLINK-27848:


 Summary: ZooKeeperLeaderElectionDriver keeps writing leader 
information, using up zxid
 Key: FLINK-27848
 URL: https://issues.apache.org/jira/browse/FLINK-27848
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.14.4, 1.15.0
Reporter: Xintong Song
Assignee: Weijie Guo
 Fix For: 1.14.5, 1.15.1


After a leadership change, the new leader may keep writing its leader 
information (which is identical) to ZK, causing the zxid on ZK to be used up 
quickly.

The problem is that, in 
{{ZooKeeperLeaderElectionDriver#retrieveLeaderInformationFromZooKeeper}}, 
{{leaderElectionEventHandler.onLeaderInformationChange(LeaderInformation.empty())}}
 is called no matter whether {{childData}} is {{null}} or not. In the non-null 
case, this causes the driver to keep re-writing the leader information to ZK.
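
A sketch of the fix direction; names and types are simplified, and the actual 
driver parses the child data into a LeaderInformation object:
{code:java}
// Sketch: only fall back to "empty" leader information when there really is no data
// in ZooKeeper; otherwise report what was read, so the leader does not keep
// re-writing identical data (and burning zxids).
import java.util.function.Consumer;

class LeaderInfoRetrievalSketch {
    void retrieveLeaderInformation(byte[] childData, Consumer<String> onLeaderInformationChange) {
        if (childData == null) {
            onLeaderInformationChange.accept(""); // stands in for LeaderInformation.empty()
        } else {
            // Placeholder for deserializing the leader address + session id from the znode data.
            onLeaderInformationChange.accept(new String(childData));
        }
    }
}
{code}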

The problem was introduced in FLINK-24038 and only affects the legacy 
{{ZooKeeperHaServices}}. Thus, only 1.14 / 1.15 are affected.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-25356) Add benchmarks for performance in OLAP scenarios

2021-12-16 Thread Xintong Song (Jira)
Xintong Song created FLINK-25356:


 Summary: Add benchmarks for performance in OLAP scenarios
 Key: FLINK-25356
 URL: https://issues.apache.org/jira/browse/FLINK-25356
 Project: Flink
  Issue Type: Sub-task
  Components: Benchmarks
Reporter: Xintong Song


As discussed in FLINK-25318, we would need a unified, publicly visible 
benchmark setup for supporting OLAP performance improvements and 
investigations.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25970) SerializedThrowable should record type of the original throwable.

2022-02-07 Thread Xintong Song (Jira)
Xintong Song created FLINK-25970:


 Summary: SerializedThrowable should record type of the original 
throwable.
 Key: FLINK-25970
 URL: https://issues.apache.org/jira/browse/FLINK-25970
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.14.3
Reporter: Xintong Song


Currently, only the message and stack trace of the original throwable are preserved 
in {{SerializedThrowable}}, while the type of the original throwable is discarded.

Sometimes, it would be helpful if the message of {{SerializedThrowable}} could also 
include the full class name of the original throwable.

E.g., in the following stack.
{code:java}
Caused by: org.apache.flink.util.SerializedThrowable
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1471) ~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1469) ~[?:1.8.0_102]
...
{code}
It's not easy to understand what is wrong from this stack. The JDK does not provide 
a message for the original exception, so we have to look into the JDK source code 
to find out what's going on. Sometimes it's even more annoying to have to find the 
JDK source code of exactly the same version in order to match the line numbers.

It turns out the original exception was a {{ConcurrentModificationException}}. It 
would be much more straightforward if we could have a stack like the following.
{code}
Caused by: org.apache.flink.util.SerializedThrowable: 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1471) ~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1469) ~[?:1.8.0_102]
...
{code}
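
For illustration, one way to get such a stack is to let the wrapper's message carry 
the original throwable's fully qualified class name. A minimal, self-contained 
sketch (not the actual {{SerializedThrowable}} implementation):
{code:java}
// Sketch only: build the message that a SerializedThrowable-like wrapper could
// preserve, so the original exception type survives serialization.
public class SerializedMessageExample {
    static String buildPreservedMessage(Throwable original) {
        String className = original.getClass().getName();
        String message = original.getMessage();
        return message == null ? className : className + ": " + message;
    }

    public static void main(String[] args) {
        Throwable t = new java.util.ConcurrentModificationException();
        // Prints "java.util.ConcurrentModificationException", even though the
        // original exception carries no message of its own.
        System.out.println(buildPreservedMessage(t));
    }
}
{code}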



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-18581) Cannot find GC cleaner with java version previous 8u202

2020-07-13 Thread Xintong Song (Jira)
Xintong Song created FLINK-18581:


 Summary: Cannot find GC cleaner with java version previous 8u202
 Key: FLINK-18581
 URL: https://issues.apache.org/jira/browse/FLINK-18581
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Xintong Song


{{JavaGcCleanerWrapper}} looks for the package-private method 
{{Reference.tryHandlePending}} using reflection. However, the method was first 
introduced in version 8u202. Therefore, if an older JDK version is used, the method 
cannot be found and Flink fails.

See also this [ML 
thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-GC-Cleaner-Provider-Flink-1-11-0-td36565.html].
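
For illustration, the reflective lookup and a graceful fallback look roughly like 
the following self-contained sketch (the parameter list and the fallback are 
illustrative; the real {{JavaGcCleanerWrapper}} logic is more involved):
{code:java}
import java.lang.reflect.Method;

// Sketch: look up the package-private Reference.tryHandlePending reflectively and
// fall back gracefully when the running JDK (e.g. pre-8u202) does not provide it,
// instead of failing the whole Flink process.
public class GcCleanerLookupExample {
    public static void main(String[] args) {
        Method tryHandlePending = null;
        try {
            tryHandlePending = Class.forName("java.lang.ref.Reference")
                    .getDeclaredMethod("tryHandlePending", boolean.class);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // Older JDKs do not have this method; a fallback GC-cleaner strategy
            // should be used here rather than throwing.
        }
        System.out.println(tryHandlePending != null
                ? "Reference.tryHandlePending found, GC cleaner available"
                : "Reference.tryHandlePending not found, using fallback");
    }
}
{code}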



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18620) Unify behaviors of active resource managers

2020-07-16 Thread Xintong Song (Jira)
Xintong Song created FLINK-18620:


 Summary: Unify behaviors of active resource managers
 Key: FLINK-18620
 URL: https://issues.apache.org/jira/browse/FLINK-18620
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Xintong Song
Assignee: Xintong Song


Flink supports various deployment modes: standalone, Kubernetes, Yarn & Mesos. 
For each deployment mode, a resource manager is implemented for managing the 
resources.

While StandaloneResourceManager is quite different from the others, in that it 
cannot dynamically request and release resources, the other three 
(KubernetesResourceManager, YarnResourceManager and MesosResourceManager) share a 
lot of logic in common. This common logic is currently implemented separately by 
each of the active resource managers. Such duplication leads to extra maintenance 
overhead and amplifies stability risks.

This ticket proposes a refactoring of the resource managers, with better 
abstractions that deduplicate the common logic and minimize the 
deployment-specific behaviors.

This proposal is a pure refactoring effort. It does not intend to change any of 
the current resource management behaviors.

A detailed design doc and a simplified proof-of-concept implementation for the 
Kubernetes deployment are linked to this ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18639) Error messages from BashJavaUtils is eaten

2020-07-19 Thread Xintong Song (Jira)
Xintong Song created FLINK-18639:


 Summary: Error messages from BashJavaUtils is eaten
 Key: FLINK-18639
 URL: https://issues.apache.org/jira/browse/FLINK-18639
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Scripts
Affects Versions: 1.11.0, 1.12.0
Reporter: Xintong Song
Assignee: Xintong Song
 Fix For: 1.12.0, 1.11.1


The shell scripts execute BashJavaUtils to generate memory-related JVM parameters 
and dynamic configurations. When there's a configuration problem such that the 
memory sizes cannot be properly calculated, the script does not launch the Flink 
daemon and exits with a non-zero code. In such cases, the error messages from 
BashJavaUtils describing the reason for the failure are missing, making it hard 
for users to understand what's wrong and how to fix the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18719) Define the interfaces and introduce ActiveResourceManager

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18719:


 Summary: Define the interfaces and introduce ActiveResourceManager
 Key: FLINK-18719
 URL: https://issues.apache.org/jira/browse/FLINK-18719
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Xintong Song


* Define the interfaces ResourceManagerDriver and ResourceEventHandler (see the sketch below).
* Rename the original ActiveResourceManager to LegacyActiveResourceManager.
* Introduce the new ActiveResourceManager.
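
A rough, hypothetical sketch of what such a split could look like (method and type 
names here are illustrative, not the actual interfaces being proposed):
{code:java}
// Illustrative sketch only: the driver encapsulates all deployment-specific calls
// to the external resource provider (Kubernetes/Yarn/Mesos), while the event
// handler is how the deployment-agnostic ActiveResourceManager learns about results.
interface ResourceManagerDriverSketch<WorkerType> {
    void initialize(ResourceEventHandlerSketch<WorkerType> eventHandler);
    void requestResource(String workerResourceSpec);
    void releaseResource(WorkerType worker);
}

interface ResourceEventHandlerSketch<WorkerType> {
    void onWorkerAllocated(WorkerType worker);
    void onWorkerTerminated(WorkerType worker, String diagnostics);
    void onError(Throwable exception);
}
{code}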



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18720) Migrate KubernetesResourceManager to the new KubernetesResourceManagerDriver

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18720:


 Summary: Migrate KubernetesResourceManager to the new 
KubernetesResourceManagerDriver
 Key: FLINK-18720
 URL: https://issues.apache.org/jira/browse/FLINK-18720
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes, Runtime / Coordination
Reporter: Xintong Song
 Fix For: 1.12.0


* Introduce KubernetesResourceManagerDriver
* Switch to ActiveResourceManager and KubernetesResourceManagerDriver for 
Kubernetes deployment
* Remove KubernetesResourceManager



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18722) Migrate MesosResourceManager to the new MesosResourceManagerDriver

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18722:


 Summary: Migrate MesosResourceManager to the new 
MesosResourceManagerDriver
 Key: FLINK-18722
 URL: https://issues.apache.org/jira/browse/FLINK-18722
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Mesos, Runtime / Coordination
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18721) Migrate YarnResourceManager to the new YarnResourceManagerDriver

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18721:


 Summary: Migrate YarnResourceManager to the new 
YarnResourceManagerDriver
 Key: FLINK-18721
 URL: https://issues.apache.org/jira/browse/FLINK-18721
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / YARN, Runtime / Coordination
Reporter: Xintong Song
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18738) Revisit resource management model for python processes.

2020-07-28 Thread Xintong Song (Jira)
Xintong Song created FLINK-18738:


 Summary: Revisit resource management model for python processes.
 Key: FLINK-18738
 URL: https://issues.apache.org/jira/browse/FLINK-18738
 Project: Flink
  Issue Type: Task
  Components: API / Python, Runtime / Coordination
Reporter: Xintong Song
Assignee: Xintong Song
 Fix For: 1.12.0


This ticket tracks the effort towards a proper long-term resource management model 
for python processes.

In FLINK-17923, we ran into problems because python processes are not well 
integrated with the task manager resource management mechanism. A temporary 
workaround has been merged for release-1.11.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18760) Redundant task managers should be released when there's no job running in session cluster

2020-07-29 Thread Xintong Song (Jira)
Xintong Song created FLINK-18760:


 Summary: Redundant task managers should be released when there's 
no job running in session cluster
 Key: FLINK-18760
 URL: https://issues.apache.org/jira/browse/FLINK-18760
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.12.0
Reporter: Xintong Song
Assignee: Liu
 Fix For: 1.12.0


In FLINK-18625, we introduced redundant task managers as backup resources for 
speeding up job recovery in case of task manager loss.

For a session cluster with no job running, it would be better not to keep such 
redundant resources. Currently, a Flink session cluster will not request redundant 
task managers until a job is submitted, but it also will not release them after 
all jobs have terminated.

This ticket proposes to check for and release task managers when only the 
redundant task managers remain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19057) Avoid usage of GlobalConfiguration in ResourceManager

2020-08-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-19057:


 Summary: Avoid usage of GlobalConfiguration in ResourceManager 
 Key: FLINK-19057
 URL: https://issues.apache.org/jira/browse/FLINK-19057
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Reporter: Xintong Song


This is a follow up of this PR 
[discussion|https://github.com/apache/flink/pull/13186/#discussion_r476459874].

On Kubernetes/Yarn deployments, the resource manager compares the effective 
configuration with the original configuration file shipped from the client, and 
only sets the differences as dynamic properties for task managers.

For this, {{GlobalConfiguration.loadConfiguration()}} is used to read the original 
configuration file. This strongly relies on the fact that the Kubernetes/Yarn 
entry points do not support custom configuration directories, which is true at 
the moment but brittle for the future.

It would be better to rethink the usage of GlobalConfiguration.
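
For illustration, the diff computation described above boils down to something like 
the following self-contained sketch (plain maps are used here; the real code works 
on Flink {{Configuration}} objects and depends on how the file is loaded):
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch: compute which effective settings differ from the original file-based
// configuration, so only the differences are passed as dynamic properties.
public class ConfigDiffExample {
    static Map<String, String> dynamicProperties(Map<String, String> original, Map<String, String> effective) {
        Map<String, String> diff = new HashMap<>();
        for (Map.Entry<String, String> e : effective.entrySet()) {
            if (!Objects.equals(original.get(e.getKey()), e.getValue())) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, String> original = Map.of("taskmanager.memory.process.size", "1728m");
        Map<String, String> effective = Map.of(
                "taskmanager.memory.process.size", "1728m",
                "taskmanager.numberOfTaskSlots", "4");
        // Only the slot setting differs, so only it becomes a dynamic property.
        System.out.println(dynamicProperties(original, effective));
    }
}
{code}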



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19068) Filter verbose pod events for KubernetesResourceManagerDriver

2020-08-27 Thread Xintong Song (Jira)
Xintong Song created FLINK-19068:


 Summary: Filter verbose pod events for 
KubernetesResourceManagerDriver
 Key: FLINK-19068
 URL: https://issues.apache.org/jira/browse/FLINK-19068
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Reporter: Xintong Song


The status of a Kubernetes pod consists of many detailed fields. Currently, Flink 
receives pod {{MODIFIED}} events from the {{KubernetesPodsWatcher}} on every 
single change to these fields, many of which Flink does not care about.

The verbose events do not affect the functionality of Flink, but they pollute the 
logs with repeated messages, because Flink only looks into the fields it is 
interested in and those fields are identical.

E.g., when a task manager is stopped due to idle timeout, Flink receives 3 
events:
* MODIFIED: container terminated
* MODIFIED: {{deletionGracePeriodSeconds}} changes from 30 to 0, which is a 
Kubernetes-internal status change after containers are gracefully terminated
* DELETED: Flink removes the metadata of the terminated pod

Among the 3 messages, Flink is only interested in the 1st MODIFIED message, but 
will try to process all of them because the container status is terminated.

I propose to filter the verbose events in 
{{KubernetesResourceManagerDriver.PodCallbackHandlerImpl}}, so that only the 
status changes Flink is interested in are processed. This probably requires 
recording the status of all living pods, to compare against incoming events for 
detecting status changes.
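
For illustration, the proposed filtering could look like the following 
self-contained sketch, assuming a reduced "interesting view" of the pod status 
(here simplified to a single string; the actual fields and types differ):
{code:java}
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: remember the last "interesting" view of each living pod and only forward
// events whose interesting fields actually changed, dropping verbose updates.
public class PodEventFilterExample {

    // Maps pod name -> last seen interesting view (e.g. the aggregated container state).
    private final Map<String, String> lastSeenInterestingView = new ConcurrentHashMap<>();

    boolean shouldProcess(String podName, String interestingView) {
        String previous = lastSeenInterestingView.put(podName, interestingView);
        return !Objects.equals(previous, interestingView);
    }

    public static void main(String[] args) {
        PodEventFilterExample filter = new PodEventFilterExample();
        System.out.println(filter.shouldProcess("tm-1", "container=terminated")); // true
        // A later MODIFIED event that only touches fields Flink ignores yields the
        // same interesting view and is filtered out.
        System.out.println(filter.shouldProcess("tm-1", "container=terminated")); // false
    }
}
{code}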



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19117) FLINK-18620

2020-09-01 Thread Xintong Song (Jira)
Xintong Song created FLINK-19117:


 Summary: FLINK-18620
 Key: FLINK-19117
 URL: https://issues.apache.org/jira/browse/FLINK-19117
 Project: Flink
  Issue Type: Improvement
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19151) Flink does not normalize container resource with correct configurations when Yarn FairScheduler is used

2020-09-07 Thread Xintong Song (Jira)
Xintong Song created FLINK-19151:


 Summary: Flink does not normalize container resource with correct 
configurations when Yarn FairScheduler is used 
 Key: FLINK-19151
 URL: https://issues.apache.org/jira/browse/FLINK-19151
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.11.2
Reporter: Xintong Song


h3. Problem

It's part of the Yarn protocol that the requested container resource is normalized 
for allocation. That means the allocated container may have a different resource 
(larger than or equal to) compared to the requested one.

Currently, Flink matches the allocated containers to the original requests by 
reading the Yarn configuration and calculating how the requested resources should 
be normalized.

What has been overlooked is that Yarn's FairScheduler (and its subclass 
SLSFairScheduler) overrides the normalization behavior. To be specific (see the 
arithmetic sketch below),
 * By default, Yarn normalizes container resources to an integer multiple of 
"yarn.scheduler.minimum-allocation-[mb|vcores]"
 * FairScheduler normalizes container resources to an integer multiple of 
"yarn.resource-types.[memory-mb|vcores].increment-allocation" (or the 
deprecated keys "yarn.scheduler.increment-allocation-[mb|vcores]"), while 
making sure the resource is no less than 
"yarn.scheduler.minimum-allocation-[mb|vcores]"

h3. Proposal for short term solution

To fix this problem, a quick and easy way is to also read from the Yarn 
configuration which scheduler is used, and perform the normalization calculation 
accordingly. This should be good enough to cover the behaviors of all the 
schedulers that Yarn currently provides. The limitation is that Flink will not be 
able to deal with custom Yarn schedulers that override the normalization behavior.
h3. Proposal for long term solution

In the long term, it would be good to use Yarn's 
ContainerRequest#allocationRequestId to match the allocated containers with the 
original requests, so that Flink no longer needs to understand how Yarn 
normalizes container resources.

Yarn's ContainerRequest#allocationRequestId was introduced in Hadoop 2.9, while at 
the moment Flink claims to be compatible with Hadoop 2.4+. Therefore, this 
solution would not work right now.

Another idea is to support different container matching logic for different 
Hadoop versions. We can abstract the container matching logic into a dedicated 
component and provide different implementations for it. This would allow Flink to 
take advantage of the new versions (e.g., work well with custom schedulers), 
while staying compatible with the old versions, without those advantages.

Given that we need the resource-based matching anyway for the old Hadoop 
versions, and the cost of maintaining two sets of matching logic, I tend to 
think of this approach as a back-up option to be worked on when we indeed see a 
need for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19177) FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19177:


 Summary: FLIP-141: Intra-Slot Managed Memory Sharing
 Key: FLINK-19177
 URL: https://issues.apache.org/jira/browse/FLINK-19177
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Xintong Song
Assignee: Xintong Song
 Fix For: 1.12.0


This is the umbrella ticket of [FLIP-141: Intra-Slot Managed Memory 
Sharing|https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing].
 

[FLIP-53|https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management]
 introduced a fraction based approach for sharing managed memory within a slot. 
This approach needs to be extended as python operators, which also use managed 
memory, are introduced. This FLIP proposes a design for extending intra-slot 
managed memory sharing for python operators and other potential future managed 
memory use cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19178) Introduce the memory weights configuration option

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19178:


 Summary: Introduce the memory weights configuration option
 Key: FLINK-19178
 URL: https://issues.apache.org/jira/browse/FLINK-19178
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19179) Implement the new fraction calculation logic

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19179:


 Summary: Implement the new fraction calculation logic
 Key: FLINK-19179
 URL: https://issues.apache.org/jira/browse/FLINK-19179
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


This also means migrating the batch operator use cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19180) Make RocksDB respect the calculated fraction

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19180:


 Summary: Make RocksDB respect the calculated fraction
 Key: FLINK-19180
 URL: https://issues.apache.org/jira/browse/FLINK-19180
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19182) Update document for intra-slot managed memory sharing

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19182:


 Summary: Update document for intra-slot managed memory sharing
 Key: FLINK-19182
 URL: https://issues.apache.org/jira/browse/FLINK-19182
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19181) Make python processes respect the calculated fraction

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19181:


 Summary: Make python processes respect the calculated fraction
 Key: FLINK-19181
 URL: https://issues.apache.org/jira/browse/FLINK-19181
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19267) Calculate managed memory fractions for fine grained resource specs

2020-09-16 Thread Xintong Song (Jira)
Xintong Song created FLINK-19267:


 Summary: Calculate managed memory fractions for fine grained 
resource specs
 Key: FLINK-19267
 URL: https://issues.apache.org/jira/browse/FLINK-19267
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Reporter: Xintong Song


This is a follow-up issue of FLIP-141, to support managed memory calculation 
for fine grained resource specs w.r.t. various use cases, per the discussion 
[here|https://github.com/apache/flink/pull/13397#discussion_r489250595].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19324) Map requested/allocated containers with priority on YARN

2020-09-21 Thread Xintong Song (Jira)
Xintong Song created FLINK-19324:


 Summary: Map requested/allocated containers with priority on YARN
 Key: FLINK-19324
 URL: https://issues.apache.org/jira/browse/FLINK-19324
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Reporter: Xintong Song


In the design doc of FLINK-14106, there was a 
[discussion|https://docs.google.com/document/d/1f8imSus3QwKEUPAldzR8CSMjZ-2a9O17-rn4oeKGtqw/edit?disco=GPX_tmg]
 on how we map allocated containers to the requested ones on YARN. We rejected 
the design option that uses container priorities for mapping containers of 
different resources, because we do not want to prioritize different container 
requests (which is the original purpose of this field). As a result, we have to 
interpret how the container request would be normalized by Yarn, and map the 
allocated/requested containers accordingly, which is complicated and fragile. 
See also FLINK-19151.

Recently, in our POC for fine grained resource management, we were surprised to 
discover that Yarn actually doesn't work with container requests of the same 
priority and different resources. I could not find this described as an official 
protocol in any of Yarn's documents. The issue has been raised in early Yarn 
versions (YARN-314) and was not fixed until Hadoop 2.9, when 
{{allocationRequestId}} was introduced. In Hadoop 2.8, the Yarn scheduler still 
internally uses the priority as the key of a container request (see 
[AppSchedulingInfo#updateResourceRequests 
|https://github.com/apache/hadoop/blob/eb818cdc64336ade273a960ba3b9b5a5d0c4d4ec/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java#L341]),
 thus requests with the same priority and different resources overwrite each other.

This new discovery suggests that, if we want to support containers with 
different resources on Hadoop 2.8 and earlier versions, we have to give them 
different priorities anyway. Thus, I would suggest getting rid of the container 
normalization simulation and going back to the previously rejected 
priority-based design option.
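
For illustration, the priority-based option essentially means assigning a stable, 
distinct priority to each distinct requested container resource, roughly like this 
self-contained sketch (not the actual YarnResourceManager logic):
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: assign a stable, distinct priority to each distinct container resource,
// so that requests of different sizes never share a priority and cannot overwrite
// each other on Hadoop 2.8 and earlier.
public class PriorityAssignmentExample {
    private final Map<String, Integer> priorityByResource = new HashMap<>();
    private final AtomicInteger nextPriority = new AtomicInteger(1);

    int priorityFor(int memoryMb, int vcores) {
        String key = memoryMb + "mb/" + vcores + "vcores";
        return priorityByResource.computeIfAbsent(key, k -> nextPriority.getAndIncrement());
    }

    public static void main(String[] args) {
        PriorityAssignmentExample priorities = new PriorityAssignmentExample();
        System.out.println(priorities.priorityFor(2048, 1)); // 1
        System.out.println(priorities.priorityFor(4096, 2)); // 2
        System.out.println(priorities.priorityFor(2048, 1)); // 1 again: same resource, same priority
    }
}
{code}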



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19568) Offload creating TM launch contexts to the IO executor

2020-10-11 Thread Xintong Song (Jira)
Xintong Song created FLINK-19568:


 Summary: Offload creating TM launch contexts to the IO executor
 Key: FLINK-19568
 URL: https://issues.apache.org/jira/browse/FLINK-19568
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Xintong Song
 Fix For: 1.12.0


Currently, for launching each TM container on Yarn, Flink creates a container 
launch context in the RM's RPC main thread. This includes accessing file status 
from remote file systems, which may block the RM's main thread, especially when 
the remote file system is slow. See [this 
thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/TM-heartbeat-timeout-due-to-ResourceManager-being-busy-td38626.html].

Creating the TM launch context neither accesses nor changes any of the RM's 
internal state. Therefore, I propose to offload the work to the IO executor. To be 
specific, I think the entire 
{{YarnResourceManagerDriver#createTaskExecutorLaunchContext}} can be invoked on 
the IO executor.
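
For illustration, the offloading follows the standard pattern of running the 
blocking work on a separate executor and hopping back to the main thread only to 
apply the result. A self-contained sketch with a stand-in for the launch context 
creation (the actual driver code and thread model differ):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the proposed offloading pattern (helper names are hypothetical): the
// potentially blocking launch-context creation runs on the IO executor; only the
// cheap "start container" step runs back on the single RM main thread.
public class OffloadLaunchContextExample {
    public static void main(String[] args) throws Exception {
        ExecutorService mainThreadExecutor = Executors.newSingleThreadExecutor();
        ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

        CompletableFuture
                .supplyAsync(OffloadLaunchContextExample::createTaskExecutorLaunchContext, ioExecutor)
                .thenAcceptAsync(ctx -> System.out.println("Starting container with " + ctx), mainThreadExecutor)
                .get();

        mainThreadExecutor.shutdown();
        ioExecutor.shutdown();
    }

    // Stands in for YarnResourceManagerDriver#createTaskExecutorLaunchContext, which
    // may block on remote file system access.
    private static String createTaskExecutorLaunchContext() {
        return "launch-context";
    }
}
{code}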



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16733) Refactor YarnClusterDescriptor

2020-03-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-16733:


 Summary: Refactor YarnClusterDescriptor
 Key: FLINK-16733
 URL: https://issues.apache.org/jira/browse/FLINK-16733
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Xintong Song


Currently, YarnClusterDescriptor is not in good shape. It has 1600+ lines of 
code, of which the method {{startAppMaster}} alone has 400+ lines, leading to 
poor maintainability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16734) Remove ClusterSpecification

2020-03-23 Thread Xintong Song (Jira)
Xintong Song created FLINK-16734:


 Summary: Remove ClusterSpecification
 Key: FLINK-16734
 URL: https://issues.apache.org/jira/browse/FLINK-16734
 Project: Flink
  Issue Type: Improvement
  Components: Command Line Client
Reporter: Xintong Song


Currently, {{ClusterSpecification}} has the following three fields.
* masterMemoryMB
* taskManagerMemoryMB
* slotsPerTaskManager

Among the three fields, {{taskManagerMemoryMB}} is only used in 
{{YarnClusterDescriptor#validateClusterResources}}. It can be replaced by 
"taskmanager.memory.process.size" in the configuration. Moreover, there are 
consistency risks in keeping the process memory in two places with different 
precision (MB vs. MemorySize).

{{masterMemoryMB}} should be the same as {{taskManagerMemoryMB}} after 
finishing [FLIP-116 Unified Memory Configuration for Job 
Managers|https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers].

That leaves only {{slotsPerTaskManager}}, which can easily be obtained from the 
configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16798) Logs from BashJavaUtils are not properly preserved and passed into TM logs.

2020-03-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-16798:


 Summary: Logs from BashJavaUtils are not properly preserved and 
passed into TM logs.
 Key: FLINK-16798
 URL: https://issues.apache.org/jira/browse/FLINK-16798
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Scripts
Affects Versions: 1.11.0
Reporter: Xintong Song
 Fix For: 1.11.0


With FLINK-15519, in the TM start-up scripts, we captured the logs from 
{{BashJavaUtils}} and passed them into the TM JVM process via an environment 
variable. These logs are merged with the other TM logs and written to the same 
places, respecting the user's log configuration.

This was broken in FLINK-15727, where the outputs from {{BashJavaUtils}} are 
thrown away, except for the resulting JVM parameters and dynamic configurations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16953) TableEnvHiveConnectorTest is unstable on travis.

2020-04-02 Thread Xintong Song (Jira)
Xintong Song created FLINK-16953:


 Summary: TableEnvHiveConnectorTest is unstable on travis.
 Key: FLINK-16953
 URL: https://issues.apache.org/jira/browse/FLINK-16953
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.11.0
Reporter: Xintong Song


[https://api.travis-ci.org/v3/job/670405441/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17061) Unset process/flink memory size from configuration once dynamic worker resource is activated.

2020-04-08 Thread Xintong Song (Jira)
Xintong Song created FLINK-17061:


 Summary: Unset process/flink memory size from configuration once 
dynamic worker resource is activated.
 Key: FLINK-17061
 URL: https://issues.apache.org/jira/browse/FLINK-17061
 Project: Flink
  Issue Type: Task
  Components: Runtime / Configuration, Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Xintong Song


With FLINK-14106, the memory of a TaskExecutor is decided in two steps on active 
resource managers.
- {{SlotManager}} decides {{WorkerResourceSpec}}, including memory used by 
Flink tasks: task heap, task off-heap, network and managed memory.
- {{ResourceManager}} derives {{TaskExecutorProcessSpec}} from 
{{WorkerResourceSpec}} and the configuration, deciding the sizes of memory used 
by the Flink framework and the JVM: framework heap, framework off-heap, jvm 
metaspace and jvm overhead.

This works fine for now, because both {{WorkerResourceSpec}} and 
{{TaskExecutorProcessSpec}} are derived from the same configuration. However, 
it might cause problems if we later have new {{SlotManager}} implementations 
that decide {{WorkerResourceSpec}} dynamically. In such cases, the 
process/flink memory sizes in the configuration should be ignored, or they may 
easily lead to configuration conflicts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17257) AbstractYarnClusterTest does not compile with Hadoop 2.10

2020-04-20 Thread Xintong Song (Jira)
Xintong Song created FLINK-17257:


 Summary: AbstractYarnClusterTest does not compile with Hadoop 2.10
 Key: FLINK-17257
 URL: https://issues.apache.org/jira/browse/FLINK-17257
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN, Tests
Affects Versions: 1.9.3, 1.10.1, 1.11.0
Reporter: Xintong Song
 Fix For: 1.11.0, 1.10.2, 1.9.4


In {{AbstractYarnClusterTest}}, we create {{ApplicationReport}} with the static 
method {{ApplicationReport.newInstance}}, which is annotated as private and 
unstable. This method is no longer compatible in Hadoop 2.10.

As a workaround, we can create {{ApplicationReport}} with its default 
constructor and set only the fields that we need.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17390) Container resource cannot be mapped on Hadoop 2.10+

2020-04-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-17390:


 Summary: Container resource cannot be mapped on Hadoop 2.10+
 Key: FLINK-17390
 URL: https://issues.apache.org/jira/browse/FLINK-17390
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.11.0
Reporter: Xintong Song
 Fix For: 1.11.0


In FLINK-16438, we introduced {{WorkerSpecContainerResourceAdapter}} for 
mapping Yarn container {{Resource}} to Flink {{WorkerResourceSpec}}. Inside 
this class, we use {{Resource}} for hash map keys and set elements, assuming 
that {{Resource}} instances that describe the same set of resources have the 
same hash code.

This assumption is not always true. {{Resource}} is an abstract class and may 
have different implementations. In Hadoop 2.10+, {{LightWeightResource}}, a new 
implementation of {{Resource}}, is introduced for {{Resource}} instances 
generated by {{Resource.newInstance}} on the AM side, and it overrides the 
{{hashCode}} method. That means a {{Resource}} generated on the AM may have a 
different hash code compared to an equal {{Resource}} returned from Yarn.

To solve this problem, we may introduce an {{InternalResource}} as an inner 
class of {{WorkerSpecContainerResourceAdapter}}, whose {{hashCode}} method 
depends only on the fields needed by Flink (ATM memory and vcores). 
{{WorkerSpecContainerResourceAdapter}} should only use {{InternalResource}} for 
its internal state management, and convert {{Resource}} instances passed into 
and returned from it.
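
A minimal sketch of such an {{InternalResource}}, assuming memory and vcores are 
the only fields Flink needs (in practice it would be a private inner class of 
{{WorkerSpecContainerResourceAdapter}}):
{code:java}
import java.util.Objects;

// Sketch: a value class whose equals/hashCode depend only on the fields Flink
// cares about, so it behaves consistently regardless of which Yarn Resource
// implementation (e.g. LightWeightResource) the instances originated from.
public final class InternalResourceExample {
    private final long memoryMb;
    private final int vcores;

    InternalResourceExample(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof InternalResourceExample)) return false;
        InternalResourceExample that = (InternalResourceExample) o;
        return memoryMb == that.memoryMb && vcores == that.vcores;
    }

    @Override
    public int hashCode() {
        return Objects.hash(memoryMb, vcores);
    }
}
{code}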



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17422) Create user document for the external resource framework and the GPU plugin..

2020-04-27 Thread Xintong Song (Jira)
Xintong Song created FLINK-17422:


 Summary: Create user document for the external resource framework 
and the GPU plugin..
 Key: FLINK-17422
 URL: https://issues.apache.org/jira/browse/FLINK-17422
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.11.0
Reporter: Xintong Song
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17551) Documentation of stable releases are actually built on top of snapshot code bases.

2020-05-06 Thread Xintong Song (Jira)
Xintong Song created FLINK-17551:


 Summary: Documentation of stable releases are actually built on 
top of snapshot code bases.
 Key: FLINK-17551
 URL: https://issues.apache.org/jira/browse/FLINK-17551
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Affects Versions: 1.10.0
Reporter: Xintong Song


When browsing Flink's documentation on the project website, we can choose between 
the latest snapshot version and the stable release versions. However, it seems the 
documentation of the stable release versions is actually built on top of the 
snapshot version of the release branch.

E.g., currently the latest stable release is 1.10.0, but the documentation 
described as "Flink 1.10 (Latest stable release)" is actually built with 
1.10-SNAPSHOT. As a consequence, users might be confused when they use release 
1.10.0 but see the latest documentation changes meant for 1.10.1.

[This 
comment|https://github.com/apache/flink/pull/11300#issuecomment-624776199] 
shows one such confusion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17677) FLINK_LOG_PREFIX recommended in docs is not always available

2020-05-13 Thread Xintong Song (Jira)
Xintong Song created FLINK-17677:


 Summary: FLINK_LOG_PREFIX recommended in docs is not always 
available
 Key: FLINK-17677
 URL: https://issues.apache.org/jira/browse/FLINK-17677
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.10.1, 1.9.3, 1.11.0
Reporter: Xintong Song
 Fix For: 1.11.0, 1.10.2, 1.9.4


The [Application Profiling & 
Debugging|https://ci.apache.org/projects/flink/flink-docs-master/monitoring/application_profiling.html]
 documentation recommends using the script variable {{FLINK_LOG_PREFIX}} for 
defining log file paths. However, this variable is only available in standalone 
mode. This is a bit misleading for users of other deployments (see this 
[thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Memory-analyze-on-AWS-EMR-td35036.html]).

I propose to replace {{FLINK_LOG_PREFIX}} with a general representation 
{{}}, and add a separate section to discuss how to set the log 
path (e.g., use {{FLINK_LOG_PREFIX}} with standalone deployments and 
{{}} with Yarn deployments).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18175) Add human readable summary for configured and derived memory sizes.

2020-06-08 Thread Xintong Song (Jira)
Xintong Song created FLINK-18175:


 Summary: Add human readable summary for configured and derived 
memory sizes.
 Key: FLINK-18175
 URL: https://issues.apache.org/jira/browse/FLINK-18175
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.10.1, 1.11.0
Reporter: Xintong Song


FLIP-49 & FLIP-116 introduces sophisticated memory configurations for 
TaskManager and Master processes. Before the JVM processes are started, Flink 
derives the accurate sizes for all necessary components, based on both explicit 
user configurations and implicit defaults.

To make the configuration results (especially those implicitly derived) clear 
to users, it would be helpful to print them in the beginning of the process 
logs. Currently, we only have printed JVM parameters (TM & Master) dynamic 
memory configurations (TM only). They are incomplete (jvm overhead for both 
processes and off-heap memory for the master process are not presented) and 
unfriendly (in bytes).

Therefore, I propose to add a human readable summary at the beginning of 
process logs.

See also this [PR 
discussion|https://github.com/apache/flink/pull/11445#discussion_r435721169].
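
For illustration, a summary line could render each derived size both human readable 
and exact, roughly like this self-contained sketch (not Flink's actual formatting 
code):
{code:java}
// Sketch: render a derived memory size both human readable and exact, e.g. for a
// summary line like "Total Process Memory: 1.688gb (1811939328 bytes)".
public class MemorySummaryExample {
    static String humanReadable(long bytes) {
        String[] units = {"bytes", "kb", "mb", "gb", "tb"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format("%.3f%s (%d bytes)", value, units[unit], bytes);
    }

    public static void main(String[] args) {
        System.out.println("Total Process Memory: " + humanReadable(1811939328L));
    }
}
{code}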



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18226) ResourceManager requests unnecessary new workers if previous workers are allocated but not registered.

2020-06-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-18226:


 Summary: ResourceManager requests unnecessary new workers if 
previous workers are allocated but not registered.
 Key: FLINK-18226
 URL: https://issues.apache.org/jira/browse/FLINK-18226
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
Coordination
Affects Versions: 1.11.0
Reporter: Xintong Song
Assignee: Xintong Song
 Fix For: 1.11.0


h2. Problem

Currently on Kubernetes & Yarn deployments, the ResourceManager compares 
*pending workers requested from Kubernetes/Yarn* against *pending workers 
required by SlotManager*, for deciding whether new workers should be requested 
in case of a worker failure.
 * {{KubernetesResourceManager#requestKubernetesPodIfRequired}}
 * {{YarnResourceManager#requestYarnContainerIfRequired}}

*Pending workers requested from Kubernetes/Yarn* is decreased when the worker 
is allocated, *before the worker is actually started and registered*.
 * Decreased in {{ActiveResourceManager#notifyNewWorkerAllocated}}, which is 
called in:
 ** {{KubernetesResourceManager#onAdded}}
 ** {{YarnResourceManager#onContainersOfResourceAllocated}}

On the other hand, *pending workers required by SlotManager* is derived from 
the number of pending slots inside SlotManager, which is decreased *when the 
new workers/slots are registered*.
 * {{SlotManagerImpl#registerSlot}}

Therefore, if a worker {{w1}} fails after another worker {{w2}} is allocated 
but before {{w2}} is registered, the ResourceManager will request an 
unnecessary new worker for {{w2}}.
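
For illustration, the decision described above needs to account for workers that 
are allocated but not yet registered. A simplified sketch of that bookkeeping 
(field and method names are illustrative, not the actual resource manager code):
{code:java}
// Simplified sketch of the bookkeeping this issue calls for (not the actual
// KubernetesResourceManager/YarnResourceManager code): workers that are allocated
// but not yet registered must still be counted when deciding whether a replacement
// worker is really needed.
public class PendingWorkerAccountingExample {
    int pendingRequests;              // requested from Kubernetes/Yarn, not yet allocated
    int allocatedButUnregistered;     // allocated, waiting for TaskExecutor registration
    int requiredBySlotManager;        // derived from pending slots in the SlotManager

    boolean shouldRequestNewWorker() {
        return requiredBySlotManager > pendingRequests + allocatedButUnregistered;
    }
}
{code}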

h2. Impact

Normally, the extra worker should be released soon after being allocated. But in 
cases where the Kubernetes/Yarn cluster does not have enough resources, this 
might create more and more pending pods/containers.

It's even more severe for Kubernetes, because 
{{KubernetesResourceManager#onAdded}} only indicates that the pod spec has been 
successfully added to the cluster; the pod may not actually have been allocated 
due to a lack of resources. Imagine there are {{N}} pending pods: a failure of a 
running pod means requesting another {{N}} new pods.

In a session cluster, such pending pods could take a long time to be cleared, 
even after all jobs in the session cluster have terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18229) Pending worker requests should be properly cleared

2020-06-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-18229:


 Summary: Pending worker requests should be properly cleared
 Key: FLINK-18229
 URL: https://issues.apache.org/jira/browse/FLINK-18229
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
Coordination
Affects Versions: 1.10.1, 1.9.3, 1.11.0
Reporter: Xintong Song


Currently, if Kubernetes/Yarn does not have enough resources to fulfill Flink's 
resource requirements, there will be pending pod/container requests on 
Kubernetes/Yarn. These pending resource requests are never cleared until they are 
either fulfilled or the Flink cluster is shut down.

However, sometimes Flink no longer needs the pending resources. E.g., the slot 
request is fulfilled by another slot that becomes available, or the job fails due 
to a slot request timeout (in a session cluster). In such cases, Flink does not 
remove the resource request until the resource is allocated, and only then 
discovers that it no longer needs the allocated resource and releases it. This 
affects the underlying Kubernetes/Yarn cluster, especially when the cluster is 
under heavy workload.

It would be good for Flink to cancel pod/container requests as early as possible 
if it can discover that some of the pending workers are no longer needed.

There are several approaches that could potentially achieve this.
 # We can always check whether there's a pending worker that can be canceled 
when a {{PendingTaskManagerSlot}} is unassigned.
 # We can have a separate timeout for requesting a new worker. If the resource 
cannot be allocated within the given time since it was requested, we should 
cancel that resource request and report a resource allocation failure (see the 
sketch below).
 # We can share the same timeout used for starting a new worker (proposed in 
FLINK-13554). This is similar to 2), but it requires the worker to be 
registered, rather than allocated, before the timeout.
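
For illustration, approach 2 could be sketched as follows, assuming hypothetical 
request identifiers and callbacks (not the actual ResourceManager/SlotManager 
code):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of approach 2 (hypothetical names): when a worker is requested, schedule a
// timeout; if the worker is allocated in time the timeout is cancelled, otherwise
// the pending request is cancelled and an allocation failure is reported.
public class WorkerRequestTimeoutExample {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> timeouts = new ConcurrentHashMap<>();

    void onWorkerRequested(String requestId, long timeoutMillis) {
        timeouts.put(requestId, scheduler.schedule(
                () -> cancelPendingRequest(requestId), timeoutMillis, TimeUnit.MILLISECONDS));
    }

    void onWorkerAllocated(String requestId) {
        ScheduledFuture<?> timeout = timeouts.remove(requestId);
        if (timeout != null) {
            timeout.cancel(false);
        }
    }

    private void cancelPendingRequest(String requestId) {
        timeouts.remove(requestId);
        System.out.println("Cancelling pending worker request " + requestId + " and reporting failure");
    }
}
{code}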



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

