[ 
https://issues.apache.org/jira/browse/MESOS-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated MESOS-4737:
---------------------------------
    Description: 
There are comments above the definition of TaskID in 
[mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
 that suggest it is OK to reuse a TaskID value so long as you guarantee that 
only one task with that ID is ever running at a time.

{code:title=existing comments for TaskID}
 * A framework generated ID to distinguish a task. The ID must remain
 * unique while the task is active. However, a framework can reuse an
 * ID _only_ if a previous task with the same ID has reached a
 * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
{code}

However, there are a few scenarios where problems can arise.

# The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
tasks that reuse an ID and get assigned to the same executor.
#* See [this 
email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
 for more info, as well as the attachment on this issue.
# Network partitions and master failover can make a TaskID appear to be unique 
in the system when in fact another task with that ID is still running and has 
merely been partitioned away for some time.
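
To make the second scenario concrete, here is a toy model of how a reused ID can end up running twice. Everything in it ({{ToyCluster}}, the agent names, {{task-42}}) is hypothetical and is not a Mesos API; it only mimics a master that cannot see tasks on a partitioned agent.

```python
# Toy model only: ToyCluster and its methods are hypothetical, not Mesos APIs.

class ToyCluster:
    def __init__(self):
        self.running = {}         # agent name -> set of task IDs running there
        self.partitioned = set()  # agents the master currently cannot reach

    def launch(self, agent, task_id):
        self.running.setdefault(agent, set()).add(task_id)

    def master_view(self):
        """Task IDs the master believes are active (reachable agents only)."""
        return {
            task_id
            for agent, task_ids in self.running.items()
            if agent not in self.partitioned
            for task_id in task_ids
        }

    def actual(self):
        """Every (agent, task ID) pair that is really running."""
        return sorted(
            (agent, task_id)
            for agent, task_ids in self.running.items()
            for task_id in task_ids
        )


cluster = ToyCluster()
cluster.launch("agent-1", "task-42")
cluster.partitioned.add("agent-1")       # agent-1 drops off the network

# From the master's side the ID now looks unused, so a framework might reuse it.
id_looks_free = "task-42" not in cluster.master_view()
cluster.launch("agent-2", "task-42")

cluster.partitioned.discard("agent-1")   # the partition heals
duplicates = [pair for pair in cluster.actual() if pair[1] == "task-42"]
```

After the partition heals, {{duplicates}} holds two entries for the same ID, one per agent, even though the ID looked free at the moment it was reused.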

In light of these issues, we should update the documentation to make it 
abundantly clear that reusing TaskIDs is never OK.  At a minimum, this should 
involve updating the aforementioned comments in {{mesos.proto}}.  Any framework 
development guides that discuss TaskID creation should be updated as well.
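
As a sketch of what a framework could do instead of reusing IDs, one option is to append a random UUID to a readable prefix each time a task is launched. The helper name {{new_task_id}} and the {{worker}} prefix are assumptions for illustration only; nothing in Mesos requires this particular scheme.

```python
import uuid


def new_task_id(prefix: str) -> str:
    """Return a fresh task ID: a readable prefix plus a random UUID4.

    Hypothetical helper, not part of any Mesos API. A UUID4 collision is
    extremely unlikely, so in practice an ID is not handed out twice.
    """
    return f"{prefix}-{uuid.uuid4()}"


# A relaunched task gets a new ID rather than inheriting the old one.
first = new_task_id("worker")
retry = new_task_id("worker")
```

Keeping the prefix makes the IDs easy to group in logs, while the random suffix means a retry never shares an ID with the attempt it replaces.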

  was:
There are comments above the definition of TaskID in 
[mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
 which lead one to believe it is ok to reuse TaskID values so long as you 
guarantee there will only ever be 1 such TaskID running at the same time.

{code title=existing comments for TaskID}
 * A framework generated ID to distinguish a task. The ID must remain
 * unique while the task is active. However, a framework can reuse an
 * ID _only_ if a previous task with the same ID has reached a
 * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
{code}

However, there are a few scenarios where problems can arise.

# The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
tasks that reuse an ID and get assigned to the same executor.
#* See [this 
email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
 for more info, as well as the attachment on this issue.
# Issues during network partitions and master failover, where a TaskID might 
appear to be unique in the system, whereas in actuality another Task is running 
with that ID and was just partitioned away for some time.

In light of these issues, we should simply update the document(s) to make it 
abundantly clear that reusing TaskIDs is never ok.  At the minimum this should 
involve updating the afore-mentioned comments in {{mesos.proto}}.  Also any 
framework development guides that talk about TaskID creation should be updated.


> document TaskID uniqueness requirement
> --------------------------------------
>
>                 Key: MESOS-4737
>                 URL: https://issues.apache.org/jira/browse/MESOS-4737
>             Project: Mesos
>          Issue Type: Task
>          Components: documentation
>    Affects Versions: 0.27.0
>            Reporter: Erik Weathers
>            Assignee: Erik Weathers
>            Priority: Minor
>              Labels: documentation
>
> There are comments above the definition of TaskID in 
> [mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
>  that suggest it is OK to reuse a TaskID value so long as you guarantee that 
> only one task with that ID is ever running at a time.
> {code:title=existing comments for TaskID}
>  * A framework generated ID to distinguish a task. The ID must remain
>  * unique while the task is active. However, a framework can reuse an
>  * ID _only_ if a previous task with the same ID has reached a
>  * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
> {code}
> However, there are a few scenarios where problems can arise.
> # The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
> tasks that reuse an ID and get assigned to the same executor.
> #* See [this 
> email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
>  for more info, as well as the attachment on this issue.
> # Network partitions and master failover can make a TaskID appear to be 
> unique in the system when in fact another task with that ID is still running 
> and has merely been partitioned away for some time.
> In light of these issues, we should update the documentation to make it 
> abundantly clear that reusing TaskIDs is never OK.  At a minimum, this 
> should involve updating the aforementioned comments in {{mesos.proto}}.  
> Any framework development guides that discuss TaskID creation should be 
> updated as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
