[ 
https://issues.apache.org/jira/browse/TAJO-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912333#comment-13912333
 ] 

Hyunsik Choi commented on TAJO-602:
-----------------------------------

I've attached the patch for this issue. First of all, I apologize for also
taking on the job of separating the state part from TajoWorkerResourceManager.
I tried to split that job into another jira issue, but it overlapped heavily
with this issue (TAJO-602). This is because TAJO-602 is for separating the
heartbeat service from WorkerResourceManager, and the state machine is mostly
for the heartbeat service. Instead, I made an effort to narrow the scope of
changes and keep the existing logic as much as possible, although I found some
parts which should be improved. This patch mostly changes WorkerResourceManager
and TajoWorkerResourceManager, so I believe that it mostly does not affect
Min's work (TAJO-603).

During this work, I found some issues to be improved. Later, I'll create 
additional issues for them.

In detail, this patch does the following:

 * Separate the heartbeat service from TajoWorkerResourceManager into 
TajoResourceTracker class
 * Separate the worker information from WorkerResource into Worker class
 * Separate the ping expired node checker into WorkerLivelinessMonitor which 
extends AbstractLivelinessMonitor
 * Separate the resource heartbeat from TajoHeartbeat into NodeHeartbeat
  ** Add one more heartbeat protocol and its service for that
 * Add TajoRMContext, which contains the lists of active and inactive workers
managed by TajoWorkerResourceManager.
 * Extract the part to choose and start QueryMaster from WorkerResourceManager 
into QueryInProgress.
  ** I still keep allocateQueryMaster method in order to avoid radical API 
changes.
  ** But, allocateQueryMaster is changed to internally use 
allocateWorkerResources to allocate resources.
  ** In other words, unlike before, TajoWorkerResourceManager manages resources
for both QueryMasters and workers in a single resource pool.
 * Separate the heartbeat report service from Worker into WorkerHeartbeatService
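
To illustrate how the heartbeat pieces listed above fit together, here is a
minimal sketch. It is not code from the patch: the class names RmContext,
ResourceTracker, and LivelinessMonitor are hypothetical stand-ins for the roles
of TajoRMContext, TajoResourceTracker, and WorkerLivelinessMonitor, and the
method names, the string worker ids, and the expiry rule are assumptions made
only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; names mirror the roles in the patch, not its real classes. */
class HeartbeatSketch {

    /** Shared membership view (the role described for TajoRMContext). */
    static final class RmContext {
        // workerId -> timestamp (ms) of the last heartbeat received
        final Map<String, Long> activeWorkers = new HashMap<>();
        final Set<String> inactiveWorkers = new HashSet<>();
    }

    /** Receives heartbeats (the role described for TajoResourceTracker). */
    static final class ResourceTracker {
        private final RmContext context;

        ResourceTracker(RmContext context) {
            this.context = context;
        }

        /** Records a ping; a previously expired worker becomes active again. */
        void heartbeat(String workerId, long nowMillis) {
            synchronized (context) {
                context.inactiveWorkers.remove(workerId);
                context.activeWorkers.put(workerId, nowMillis);
            }
        }
    }

    /** Expires silent workers (the role described for WorkerLivelinessMonitor). */
    static final class LivelinessMonitor {
        private final RmContext context;
        private final long expiryMillis;

        LivelinessMonitor(RmContext context, long expiryMillis) {
            this.context = context;
            this.expiryMillis = expiryMillis;
        }

        /** Moves every worker whose last ping is older than expiryMillis to the inactive set. */
        void checkExpired(long nowMillis) {
            synchronized (context) {
                Iterator<Map.Entry<String, Long>> it =
                        context.activeWorkers.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<String, Long> entry = it.next();
                    if (nowMillis - entry.getValue() > expiryMillis) {
                        String workerId = entry.getKey(); // read the key before removing
                        it.remove();
                        context.inactiveWorkers.add(workerId);
                    }
                }
            }
        }
    }
}
```

Timestamps are passed in explicitly so that the sketch can be exercised
without a timer thread; a real monitor would presumably run checkExpired
periodically.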

The unit tests pass successfully. I've tested membership management of
active/inactive workers, resource heartbeat, and some queries in a local
cluster.

In addition, I don't think I have a good sense for naming. If you can suggest
better names for the newly added classes, I'd appreciate it.

Thanks!

> WorkerResourceManager  should be broke down into 3 parts
> --------------------------------------------------------
>
>                 Key: TAJO-602
>                 URL: https://issues.apache.org/jira/browse/TAJO-602
>             Project: Tajo
>          Issue Type: Sub-task
>    Affects Versions: 1.0-incubating
>            Reporter: Min Zhou
>            Assignee: Hyunsik Choi
>             Fix For: 1.0-incubating
>
>         Attachments: TAJO-602.patch
>
>
> Before implementing a scheduler, I think we should do some refactoring
> first. There are 2 interfaces and 4 classes related to resource management:
> WorkerResourceManager, YarnTajoResourceManager, and
> TajoWorkerResourceManager reside in TajoMaster, while ResourceAllocator,
> YarnResourceAllocator, and TajoResourceAllocator reside in QueryMasters.
> WorkerResourceManager actually plays 3 roles:
> 1. Choose or start a QueryMaster for a query, and manage it
> 2. Allocate resources for query tasks / task runners (only for standalone mode)
> 3. Handle workers' heartbeats (only for standalone mode)
> If the scheduler is a decentralized one, like sparrow, we can allocate
> resources for a QueryMaster in the same way as for a TaskRunner. So 1. and 2.
> can use the same interface, but called by 2 different callers. 3. is
> different from the others; we can create another service, let's say
> HeartbeatService, to deal with workers' heartbeats.
> Any suggestion?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
