Re: Mesos rare TASK_LOST scenario v 0.21.0

2018-01-09 Thread Vinod Kone
0.21 is really old and not supported. I highly recommend you upgrade to
1.3+.

Regarding what you are seeing, we definitely had issues in the past where
the command executor didn't stay up long enough to guarantee that
TASK_FINISHED was delivered to the agent, so races like the one above were
possible.
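
To make the race concrete, here is a minimal toy model in Python. It is not Mesos code: the `ToyAgent` class, its method names, and the `run` helper are all hypothetical. It assumes only what the log below suggests, namely that the agent marks a still-running task as TASK_LOST when it sees the executor exit, and that an update arriving after that point is ignored.

```python
TERMINAL = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


class ToyAgent:
    """Toy stand-in for the agent's per-task bookkeeping (hypothetical, not Mesos code)."""

    def __init__(self):
        self.state = "TASK_RUNNING"
        self.executor_alive = True

    def status_update(self, state):
        # Assumption: updates from an executor that is already gone are
        # dropped, and a terminal state is never overwritten.
        if not self.executor_alive or self.state in TERMINAL:
            return
        self.state = state

    def executor_exited(self):
        self.executor_alive = False
        # Assumption: a task that is still non-terminal when its executor
        # exits is marked lost by the agent itself.
        if self.state not in TERMINAL:
            self.state = "TASK_LOST"


def run(events):
    """Apply events in the order the agent observes them; return the final state."""
    agent = ToyAgent()
    for kind, arg in events:
        if kind == "update":
            agent.status_update(arg)
        else:
            agent.executor_exited()
    return agent.state


# Executor exits right after sending: the agent may observe the exit first.
racy = run([("exit", None), ("update", "TASK_FINISHED")])
# Executor stays up until the update has been handled, then exits.
safe = run([("update", "TASK_FINISHED"), ("exit", None)])
print(racy, safe)
```

In this model the only difference between the two runs is the order in which the agent observes the events, which is why keeping the executor alive a little longer after the final update is a plausible mitigation on old versions.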

On Tue, Jan 9, 2018 at 5:33 PM, Ajay V wrote:

> Hello,
>
> I'm trying to debug a TASK_LOST that's generated on the agent, which I see
> on rare occasions.
>
> Following is a log that I'm trying to understand. This happens after
> driver.sendStatusUpdate() has been called with a task state of
> TASK_FINISHED from a Java executor. It looks to me like the container has
> already exited before the TASK_FINISHED is processed. Is there a timing
> issue in this version of Mesos that is causing this? The effect of
> this problem is that, even though the executor's work is complete and
> the executor calls sendStatusUpdate with TASK_FINISHED, the task is
> marked as LOST and the actual TASK_FINISHED update is ignored.
>
> I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container
> 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
>
> I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
> 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>
> W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>
> W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
> statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
> Failed to get usage: No process found at 28952
>
> I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
> 20171208-050805-140555025-5050-3470- exited with status 0
>
> I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470- from @0.0.0.0:0
>
> I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
> ff631ad1-cfab-493e-be18-961581abcf3d
>
> W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
> unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109
>
> I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received status
> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470- to the slave
>
> I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update
> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470- to master@17.179.96.8:5050
>
> I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
> successfully handled status update TASK_LOST (UUID:
> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received status
> update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for
> task ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up
> status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of
> framework 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager
> successfully handled status update acknowledgement (UUID:
> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task
> ff631ad1-cfab-493e-be18-961581abcf3d
>
> I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor
> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
> 20171208-050805-140555025-5050-3470-
>
> Regards,
> Ajay
>


Mesos rare TASK_LOST scenario v 0.21.0

2018-01-09 Thread Ajay V
Hello,

I'm trying to debug a TASK_LOST that's generated on the agent, which I see
on rare occasions.

Following is a log that I'm trying to understand. This happens after
driver.sendStatusUpdate() has been called with a task state of
TASK_FINISHED from a Java executor. It looks to me like the container has
already exited before the TASK_FINISHED is processed. Is there a timing
issue in this version of Mesos that is causing this? The effect of
this problem is that, even though the executor's work is complete and
the executor calls sendStatusUpdate with TASK_FINISHED, the task is
marked as LOST and the actual TASK_FINISHED update is ignored.

I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container
'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited

I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
'bb0e5f2d-4bdb-479c-b829-4741993c4109'

W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'

W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
Failed to get usage: No process found at 28952

I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
20171208-050805-140555025-5050-3470- exited with status 0

I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470- from @0.0.0.0:0

I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
ff631ad1-cfab-493e-be18-961581abcf3d

W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109

I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received status
update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-

I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470- to the slave

I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update TASK_LOST
(UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470- to master@17.179.96.8:5050

I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
successfully handled status update TASK_LOST (UUID:
f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-

I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for
task ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-

I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up
status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of
framework 20171208-050805-140555025-5050-3470-

I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager
successfully handled status update acknowledgement (UUID:
f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470-

I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task
ff631ad1-cfab-493e-be18-961581abcf3d

I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor
'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
20171208-050805-140555025-5050-3470-

Regards,
Ajay
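
One way to check the suspected ordering is to parse the agent log and measure the gap between the container exit and the first TASK_LOST entry. The sketch below is a rough helper, not a Mesos tool: the function names `parse` and `exit_before_lost` are made up for illustration, and the regular expression assumes the glog-style prefix seen in the excerpt above (severity letter, MMDD, time, thread id, file:line). `SAMPLE` contains two entries copied from that excerpt.

```python
import re
from datetime import datetime

# Assumed glog-style prefix: severity, MMDD, HH:MM:SS.micros, thread id, file:line]
LINE_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def parse(log_text):
    """Return [time, source, message] entries, re-joining wrapped lines."""
    entries = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            t = datetime.strptime(m.group("time"), "%H:%M:%S.%f")
            entries.append([t, m.group("src"), m.group("msg")])
        elif entries:
            # No prefix: treat the line as a continuation of the previous entry.
            entries[-1][2] += " " + line
    return entries


def exit_before_lost(entries):
    """Seconds from the container exit to the first TASK_LOST entry, or None."""
    exited = next((t for t, _, msg in entries if "has exited" in msg), None)
    lost = next((t for t, _, msg in entries if "TASK_LOST" in msg), None)
    if exited is None or lost is None:
        return None
    return (lost - exited).total_seconds()


SAMPLE = """\
I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container
'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
ff631ad1-cfab-493e-be18-961581abcf3d of framework
20171208-050805-140555025-5050-3470- from @0.0.0.0:0
"""

entries = parse(SAMPLE)
gap = exit_before_lost(entries)
print(f"{len(entries)} entries, gap {gap:.6f}s")
```

A positive gap means the container exit was logged before the agent generated TASK_LOST, which is consistent with the update from the executor never being processed. The helper only compares times of day, so it assumes all entries fall on the same date.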


Re: Doc-a-thon - January 11th, 2018

2018-01-09 Thread James Peach
Just a reminder that the doc-a-thon is this Thursday :)

> On Nov 21, 2017, at 4:14 PM, Judith Malnick wrote:
> 
> Hi all,
> 
> I'm excited to announce the next Apache Mesos doc-a-thon!
> 
> *Date:* January 11th, 2018
> 
> Location:
> 
> Mesosphere HQ
> 
> 88 Stevenson Street
> 
> San Francisco, CA
> 
> Schedule (Pacific time):
> 
> 3 - 3:30 PM: Discuss docs projects, split into groups
> 
> 3:30 - 6:30 PM: Work on docs
> 
> 6:30 - 7 PM: Present progress
> 
> 7 - 8 PM: Drinks and hangout!
> 
> 
> If you will be attending in person, please RSVP so we know how much
> food to get.
> If you plan on attending remotely, you can join with this Zoom link.
> Feel free to brainstorm project proposals on this planning doc.
> 
> 
> Let me know if you have any questions. I'm looking forward to seeing all of
> you and your amazing projects!
> 
> All the Best,
> Judith
> -- 
> Judith Malnick
> Community Manager
> 310-709-1517