Hi ben,

I ran into the same issue here.

This also happens in our hadoop framework. The slave log is like these. At
that time, I think the work load of the node is very high.

I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task
Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000'
for removal
I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224
for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
cutor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching
executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in
/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1
with resources cpus=1; mem=1280' for framework
201307040929-252063498-5050-27411-0
000
I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224'
for executor executor_Task_Tracker_224 of framework
'201307040929-252063498-5050-27411-0000
I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at
2220
I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
cutor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max
allowed age: 2.295155852123924days
I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor
'executor_Task_Tracker_224' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task
Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update
TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 f
rom executor(1)@10.47.6.21:2310
I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status
update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
Task_Tracker_224 of framework 201307040929-252063498-50
50-27411-0000 with checkpoint=false
I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating
StatusUpdate stream for task Task_Tracker_224 of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding
status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for
task Task_Tracker_224 of framework 201307040929-252063498-
5050-27411-0000 to master@10.47.6.15:5050
I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for
status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for
task Task_Tracker_224 of framework 201307040929-25206349
8-5050-27411-0000 to executor(1)@10.47.6.21:2310
I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status
update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task
Task_Tracker_224 of framework 201307040929-252063498-5050-27
411-0000
I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task
Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:50.259472 11005 slave.cpp:837] Launching task Task_Tracker_230
for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task 'Task_Tracker_230'
for executor executor_Task_Tracker_230 of framework
'201307040929-252063498-5050-27411-0000
I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching
executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in
/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd
with resources cpus=1; mem=1280' for framework
201307040929-252063498-5050-27411-0
000
I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor at
2851
I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for executor
'executor_Task_Tracker_230' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task
Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update
TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 f
rom executor(1)@10.47.6.21:27786
I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received status
update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
Task_Tracker_230 of framework 201307040929-252063498-50
50-27411-0000 with checkpoint=false
I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating
StatusUpdate stream for task Task_Tracker_230 of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding
status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for
task Task_Tracker_230 of framework 201307040929-252063498-
5050-27411-0000 to master@10.47.6.15:5050
I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for
status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for
task Task_Tracker_230 of framework 201307040929-25206349
8-5050-27411-0000 to executor(1)@10.47.6.21:27786
I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received status
update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task
Task_Tracker_230 of framework 201307040929-252063498-5050-27
411-0000
I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293704423241597days
I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293703916528542days
I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293639867998055days
I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292921551567535days
I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max
allowed age: 2.291521098018820days
I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293668041244063days
I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292935638190544days
I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292916079066516days
I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max
allowed age: 2.291485324076945days
I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293641894850289days
I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293629429709074days
I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.293525350847025days
I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292909289111539days
I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max
allowed age: 2.291438098419977days
I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293635104895313days
I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292983775931019days
I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max
allowed age: 2.286910009194236days
I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max
allowed age: 2.286909502481169days
I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max
allowed age: 2.284414244700093days
I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max
allowed age: 2.280636901540567days
I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max
allowed age: 2.279481899796968days
I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max
allowed age: 2.276566475548496days
I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max
allowed age: 2.275690368671817days
I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max
allowed age: 2.275057180034989days
I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max
allowed age: 2.273999467198449days
I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max
allowed age: 2.273472384275891days
I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max
allowed age: 2.282894612240220days
I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281966516603831days
I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281962260214144days
I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281791801941551days
I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281715288269849days
I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281699782850289days
I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
allowed age: 2.280776044946192days
I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
allowed age: 2.280772193926956days
I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
allowed age: 2.279204525069213days
I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
allowed age: 2.277132676719109days
I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
allowed age: 2.280012428368322days
I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
allowed age: 2.276733690857512days
I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
allowed age: 2.272715152282546days
I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
allowed age: 2.270354274804352days
I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
allowed age: 2.266927678423322days
I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
allowed age: 2.264218182361482days
I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
allowed age: 2.261509598383137days
I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
allowed age: 2.259379478031400days
I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
allowed age: 2.255819920144039days
I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
allowed age: 2.253314528101817days
I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
allowed age: 2.250524870034248days
I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
allowed age: 2.246784618270532days
I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
allowed age: 2.242399422127049days
I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
allowed age: 2.240250654734792days
I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
allowed age: 2.240516983117894days
I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
allowed age: 2.235834143724352days
I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
allowed age: 2.233297436815162days
W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
allowed age: 2.232469873049410days


Guodong


On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler
<benjamin.mah...@gmail.com>wrote:

> Are these the un-edited logs? I'm expecting to see some logs from the
> process_isolator or cgroups_isolator in there.
>
>
> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> brenden.matth...@airbedandbreakfast.com> wrote:
>
> > Hey guys,
> >
> > I'm currently having a problem where tasks will get stuck in the staging
> > state, though according to the logs they should have been terminated.
>  They
> > hang indefinitely, or until I restart the slave.  Below is a screenshot +
> > logs.  Also interesting is the 'Failed to collect resource usage ...'
> > messages.
> >
> > [image: Inline image 2]
> >
> > I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task
> >> ct:1373041190990:0:add_latest_reservation_survey_events_partitio
> >> n for framework chronos
> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task
> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition f
> >> or framework chronos
> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor directory
> >> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1
> >>
> >>
> 517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8c
> >> f2-4d1ce60d618f'
> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task
> >> 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' f
> >> or executor
> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition of
> >> framework 'c
> >> hronos
> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached file
> >>
> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%. Max
> >> allowed age: 22.955009563956388hrs
> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor
> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition of
> >> framework chronos because it did not register within 1mins
> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%. Max
> >> allowed age: 22.955009342481944hrs
> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect resource
> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >
> >
> >
>

Reply via email to