Re: [VOTE] Release Apache Mesos 0.26.1 (rc4)

2016-04-07 Thread Kapil Arya
+1 (binding)

CI runs with: amd64/centos/6 amd64/centos/7 amd64/debian/jessie
amd64/ubuntu/precise amd64/ubuntu/trusty amd64/ubuntu/vivid
amd64/ubuntu/wily

On Wed, Apr 6, 2016 at 11:53 PM, Vinod Kone  wrote:

> +1(binding)
>
> Tested on ASF CI.
>
> The centos failures are due to a known bug in the docker build script that
> cannot guess JAVA_HOME (has been fixed since).
>
> Configuration Matrix (gcc / clang):
> centos:7 --verbose --enable-libevent --enable-ssl: Failed / Not run
> centos:7 --verbose: Failed / Not run
> ubuntu:14.04 --verbose --enable-libevent --enable-ssl: Success / Success
> ubuntu:14.04 --verbose: Success / Success
>
> On Wed, Apr 6, 2016 at 8:50 PM, Vinod Kone  wrote:
>
>> oops wrong email thread. this should be for 28.1 voting thread.
>>
>> canceling this particular vote. i'll vote for 26.1 shortly.
>>
>> On Wed, Apr 6, 2016 at 8:46 PM, Vinod Kone  wrote:
>>
>>> +1 (binding)
>>>
>>> Tested on ASF CI. There was one flaky test that's new:
>>> https://issues.apache.org/jira/browse/MESOS-5139
>>>
>>> Configuration Matrix (gcc / clang):
>>> centos:7 --verbose --enable-libevent --enable-ssl: Success / Not run
>>> centos:7 --verbose: Success / Not run
>>> ubuntu:14.04 --verbose --enable-libevent --enable-ssl: Success / Failed
>>> ubuntu:14.04 --verbose: Success / Success
>>>
>>> On Wed, Apr 6, 2016 at 6:17 PM, Benjamin Mahler 
>>> wrote:
>>>
 +1 (binding)

 The following passes on OS X:
 $ ./configure CC=clang CXX=clang++ --disable-python --disable-java
 $ make check

 On Tue, Apr 5, 2016 at 11:41 PM, Michael Park  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos
> 0.26.1.
>
>
> 0.26.1 includes the following:
>
> 
> No changes from rc3:
>
> * Improvements
>   - `/state` endpoint performance
>   - `systemd` integration
>   - GLOG performance
>   - Configurable task/framework history
>   - Offer filter timeout fix for backlogged allocator
>   - Deletion of spec

Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread Greg Mann
Hi June,
Are these Spark tasks being run in cluster mode or client mode? If it's
client mode, then perhaps your local Spark scheduler is tearing itself down
before the executors exit, thus leaving them orphaned.
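
For reference, the deploy mode is normally chosen at submit time; a hypothetical
invocation (the dispatcher host/port and job file are placeholders, not from
your setup) would look like:

$ spark-submit --master mesos://<dispatcher-host>:7077 --deploy-mode cluster my_job.py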

I'd love to see master/agent logs during the time that the tasks are
becoming orphaned if you have them available.

Cheers,
Greg


On Thu, Apr 7, 2016 at 1:08 PM, June Taylor  wrote:

> Just a quick update... I was only able to get the orphans cleared by
> stopping mesos-slave, deleting the contents of the scratch directory, and
> then restarting mesos-slave.
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 12:01 PM, Vinod Kone  wrote:
>
>> A task/executor is called "orphaned" if the corresponding scheduler
>> doesn't register with Mesos. Is your framework scheduler running or gone
>> for good? The resources should be cleaned up if the agent (and consequently
>> the master) have realized that the executor exited.
>>
>> Can you paste the master and agent logs for one of the orphaned
>> tasks/executors (grep the logs for the task/executor id)?
>>
>> On Thu, Apr 7, 2016 at 9:00 AM, haosdent  wrote:
>>
>>> Hmm, sorry I didn't express my idea clearly. I meant killing those orphan
>>> tasks here.
>>>
>>> On Thu, Apr 7, 2016 at 11:57 PM, June Taylor  wrote:
>>>
 Forgive my ignorance, are you literally saying I should just sigkill
 these instances? How will that clean up the mesos orphans?


 Thanks,
 June Taylor
 System Administrator, Minnesota Population Center
 University of Minnesota

 On Thu, Apr 7, 2016 at 10:44 AM, haosdent  wrote:

> Suppose you use --work_dir=/tmp/mesos. Then you could run
>
> $ find /tmp/mesos -name $YOUR_EXECUTOR_ID
>
> Then you get a list of folders and can use lsof on them.
>
> As an example, my executor id is "test" here.
>
> $ find /tmp/mesos/ -name 'test'
>
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test
>
> When I execute
> lsof 
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
> (Keep in mind I appended runs/latest here.)
>
> Then you can see the pid list:
>
> COMMAND PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
> mesos-exe 21811 haosdent  cwdDIR8,36 3221463220
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
> sleep 21847 haosdent  cwdDIR8,36 3221463220
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>
> Kill all of them.
>
> On Thu, Apr 7, 2016 at 11:23 PM, June Taylor  wrote:
>
>> I do have the executor ID. Can you advise how to kill it?
>>
>> I have one master and three slaves. Each slave has one of these
>> orphans.
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Thu, Apr 7, 2016 at 10:14 AM, haosdent  wrote:
>>
>>> >Going to this slave I can find an executor within the mesos
>>> working directory which matches this framework ID
>>> The quickest way here is to use kill on the slave, if you can find the
>>> mesos-executor pid. You can use lsof/fuser or dig through the logs to find
>>> the executor pid.
>>>
>>> However, it still seems weird according to your feedback. Do you have
>>> multiple masters, and did a failover happen on your master? In that case the
>>> slave could not connect to the new master and the tasks became orphaned.
>>>
>>> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:
>>>
 Here is one of three orphaned tasks (first two octets of IP
 removed):

 "orphan_tasks": [
 {
 "executor_id": "",
 "name": "Task 1",
 "framework_id":
 "14cddded-e692-4838-9893-6e04a81481d8-0006",
 "state": "TASK_RUNNING",
 "statuses": [
 {
 "timestamp": 1459887295.05554,
 "state": "TASK_RUNNING",
 "container_status": {
 "network_infos": [
 {
 "ip_addresses": [
 {
 "ip_address":
 "xxx.xxx.163.205"
 }

Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread June Taylor
Just a quick update... I was only able to get the orphans cleared by
stopping mesos-slave, deleting the contents of the scratch directory, and
then restarting mesos-slave.
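
In shell terms that was roughly (a sketch; the service name and work_dir path
are assumptions, adjust them to your setup):

$ sudo service mesos-slave stop
$ sudo rm -rf /var/lib/mesos/*    # contents of the agent --work_dir
$ sudo service mesos-slave start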


Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota

On Thu, Apr 7, 2016 at 12:01 PM, Vinod Kone  wrote:

> A task/executor is called "orphaned" if the corresponding scheduler
> doesn't register with Mesos. Is your framework scheduler running or gone
> for good? The resources should be cleaned up if the agent (and consequently
> the master) have realized that the executor exited.
>
> Can you paste the master and agent logs for one of the orphaned
> tasks/executors (grep the logs for the task/executor id)?
>
> On Thu, Apr 7, 2016 at 9:00 AM, haosdent  wrote:
>
>> Hmm, sorry I didn't express my idea clearly. I meant killing those orphan
>> tasks here.
>>
>> On Thu, Apr 7, 2016 at 11:57 PM, June Taylor  wrote:
>>
>>> Forgive my ignorance, are you literally saying I should just sigkill
>>> these instances? How will that clean up the mesos orphans?
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Thu, Apr 7, 2016 at 10:44 AM, haosdent  wrote:
>>>
 Suppose you use --work_dir=/tmp/mesos. Then you could run

 $ find /tmp/mesos -name $YOUR_EXECUTOR_ID

 Then you get a list of folders and can use lsof on them.

 As an example, my executor id is "test" here.

 $ find /tmp/mesos/ -name 'test'

 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test

 When I execute
 lsof 
 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
 (Keep in mind I appended runs/latest here.)

 Then you can see the pid list:

 COMMAND PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
 mesos-exe 21811 haosdent  cwdDIR8,36 3221463220
 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
 sleep 21847 haosdent  cwdDIR8,36 3221463220
 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11

 Kill all of them.

 On Thu, Apr 7, 2016 at 11:23 PM, June Taylor  wrote:

> I do have the executor ID. Can you advise how to kill it?
>
> I have one master and three slaves. Each slave has one of these
> orphans.
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 10:14 AM, haosdent  wrote:
>
>> >Going to this slave I can find an executor within the mesos working
>> directory which matches this framework ID
>> The quickest way here is to use kill on the slave, if you can find the
>> mesos-executor pid. You can use lsof/fuser or dig through the logs to find
>> the executor pid.
>>
>> However, it still seems weird according to your feedback. Do you have
>> multiple masters, and did a failover happen on your master? In that case the
>> slave could not connect to the new master and the tasks became orphaned.
>>
>> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:
>>
>>> Here is one of three orphaned tasks (first two octets of IP removed):
>>>
>>> "orphan_tasks": [
>>> {
>>> "executor_id": "",
>>> "name": "Task 1",
>>> "framework_id":
>>> "14cddded-e692-4838-9893-6e04a81481d8-0006",
>>> "state": "TASK_RUNNING",
>>> "statuses": [
>>> {
>>> "timestamp": 1459887295.05554,
>>> "state": "TASK_RUNNING",
>>> "container_status": {
>>> "network_infos": [
>>> {
>>> "ip_addresses": [
>>> {
>>> "ip_address":
>>> "xxx.xxx.163.205"
>>> }
>>> ],
>>> "ip_address": "xxx.xxx.163.205"
>>> }
>>> ]
>>> }
>>> }
>>> ],
>>> "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
>>> "id": "1",
>>> "resources": {
>>> "mem": 112640.0,
>>> "disk": 0.0,
>>> "cpus": 30.0
>>> }
>>> }

Re: [VOTE] Release Apache Mesos 0.25.1 (rc4)

2016-04-07 Thread Kapil Arya
+1 (binding)

CI runs for amd64/centos/6 amd64/centos/7 amd64/debian/jessie
amd64/ubuntu/precise amd64/ubuntu/trusty amd64/ubuntu/vivid amd64/


On Wed, Apr 6, 2016 at 9:43 PM, Vinod Kone  wrote:

> +1 (binding)
>
> `./configure && make check` on ubuntu 14.04
>
> On Wed, Apr 6, 2016 at 6:18 PM, Benjamin Mahler 
> wrote:
>
>> +1 (binding)
>>
>> The following passes on OS X:
>> $ ./configure CC=clang CXX=clang++ --disable-python --disable-java
>> $ make check
>>
>> On Tue, Apr 5, 2016 at 11:41 PM, Michael Park  wrote:
>>
>> > s/No changes from rc4/No changes from rc3/
>> > s/New fixes in rc5/New fixes in rc4/
>> >
>> > On 5 April 2016 at 23:18, Michael Park  wrote:
>> >
>> >> Hi all,
>> >>
>> >> Please vote on releasing the following candidate as Apache Mesos
>> 0.25.1.
>> >>
>> >>
>> >> 0.25.1 includes the following:
>> >>
>> >>
>> 
>> >> No changes from rc4:
>> >>
>> >> * Improvements
>> >>   - `/state` endpoint performance
>> >>   - `systemd` integration
>> >>   - GLOG performance
>> >>   - Configurable task/framework history
>> >>   - Offer filter timeout fix for backlogged allocator
>> >>   - JSON-based credential files. (MESOS-3560)
>> >>   - Mesos health check within docker container. (MESOS-3738)
>> >>   - Deletion of special files. (MESOS-4979)
>> >>   - Memory leak in subprocess. (MESOS-5021)
>> >>
>> >> * Bugs
>> >>   - SSL
>> >>   - Libevent
>> >>   - Fixed point resources math
>> >>   - HDFS
>> >>   - Agent upgrade compatibility
>> >>   - Health checks
>> >>
>> >> New fixes in rc5:
>> >>   - Build failure on OS X 10.11 using Xcode 7. (MESOS-3030)
>> >>   - ExamplesTest.PersistentVolumeFramework does not work in OS X El
>> >> Capitan. (MESOS-3604)
>> >>
>> >> Thank you to Evan Krall from Yelp for requesting MESOS-3560 and
>> >> MESOS-3738 to be included,
>> >> and Ben Mahler for requesting MESOS-3030, MESOS-3604, MESOS-4979 and
>> >> MESOS-5021.
>> >>
>> >> The CHANGELOG for the release is available at:
>> >>
>> >>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.25.1-rc4
>> >>
>> >>
>> 
>> >>
>> >> The candidate for Mesos 0.25.1 release is available at:
>> >>
>> >>
>> https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc4/mesos-0.25.1.tar.gz
>> >>
>> >> The tag to be voted on is 0.25.1-rc4:
>> >>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.25.1-rc4
>> >>
>> >> The MD5 checksum of the tarball can be found at:
>> >>
>> >>
>> https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc4/mesos-0.25.1.tar.gz.md5
>> >>
>> >> The signature of the tarball can be found at:
>> >>
>> >>
>> https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc4/mesos-0.25.1.tar.gz.asc
>> >>
>> >> The PGP key used to sign the release is here:
>> >> https://dist.apache.org/repos/dist/release/mesos/KEYS
>> >>
>> >> The JAR is up in Maven in a staging repository here:
>> >> https://repository.apache.org/content/repositories/orgapachemesos-1136
>> >>
>> >> Please vote on releasing this package as Apache Mesos 0.25.1!
>> >>
>> >> The vote is open until Fri Apr 8 23:59:59 PDT 2016 and passes if a
>> >> majority of at least 3 +1 PMC votes are cast.
>> >>
>> >> [ ] +1 Release this package as Apache Mesos 0.25.1
>> >> [ ] -1 Do not release this package because ...
>> >>
>> >> Thanks,
>> >>
>> >> MPark
>> >>
>> >
>> >
>>
>
>


Re: [VOTE] Release Apache Mesos 0.24.2 (rc5)

2016-04-07 Thread Kapil Arya
+1 (binding)

CI runs for amd64/centos/6 amd64/centos/7 amd64/debian/jessie
amd64/ubuntu/precise amd64/ubuntu/trusty amd64/ubuntu/vivid amd64/




On Thu, Apr 7, 2016 at 12:44 AM, Vinod Kone  wrote:

> +1 (binding)
>
> make check on ubuntu 14.04
>
> On Wed, Apr 6, 2016 at 6:17 PM, Benjamin Mahler 
> wrote:
>
>> +1 (binding)
>>
>> The following passes on OS X:
>> $ ./configure CC=clang CXX=clang++ --disable-python --disable-java
>> $ make check
>>
>> On Tue, Apr 5, 2016 at 10:51 PM, Michael Park  wrote:
>>
>>> Hi all,
>>>
>>> Please vote on releasing the following candidate as Apache Mesos 0.24.2.
>>>
>>>
>>> 0.24.2 includes the following:
>>>
>>> 
>>> No changes from rc4:
>>>
>>> * Improvements
>>>   - Allocator filter performance
>>>   - Port Ranges performance
>>>   - UUID performance
>>>   - `/state` endpoint performance
>>>   - GLOG performance
>>>   - Configurable task/framework history
>>>   - Offer filter timeout fix for backlogged allocator
>>>   - JSON-based credential files. (MESOS-3560)
>>>   - Mesos health check within docker container. (MESOS-3738)
>>>   - Deletion of special files. (MESOS-4979)
>>>   - Memory leak in subprocess. (MESOS-5021)
>>>
>>> * Bugs
>>>   - SSL
>>>   - Libevent
>>>   - Fixed point resources math
>>>   - HDFS
>>>   - Agent upgrade compatibility
>>>   - Health checks
>>>
>>> New fixes in rc5:
>>>   - Build failure on OS X 10.11 using Xcode 7. (MESOS-3030)
>>>   - ExamplesTest.PersistentVolumeFramework does not work in OS X El
>>> Capitan. (MESOS-3604)
>>>
>>> Thank you to Evan Krall from Yelp for requesting MESOS-3560 and
>>> MESOS-3738
>>> to be included,
>>> and Ben Mahler for requesting MESOS-3030, MESOS-3604, MESOS-4979 and
>>> MESOS-5021.
>>>
>>> The CHANGELOG for the release is available at:
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.24.2-rc5
>>>
>>> 
>>>
>>> The candidate for Mesos 0.24.2 release is available at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc5/mesos-0.24.2.tar.gz
>>>
>>> The tag to be voted on is 0.24.2-rc5:
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.24.2-rc5
>>>
>>> The MD5 checksum of the tarball can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc5/mesos-0.24.2.tar.gz.md5
>>>
>>> The signature of the tarball can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc5/mesos-0.24.2.tar.gz.asc
>>>
>>> The PGP key used to sign the release is here:
>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>>
>>> The JAR is up in Maven in a staging repository here:
>>> https://repository.apache.org/content/repositories/orgapachemesos-1134
>>>
>>> Please vote on releasing this package as Apache Mesos 0.24.2!
>>>
>>> The vote is open until Fri Apr 8 23:59:59 PDT 2016 and passes if a
>>> majority
>>> of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Mesos 0.24.2
>>> [ ] -1 Do not release this package because ...
>>>
>>> Thanks,
>>>
>>> MPark
>>>
>>
>>
>


Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread Vinod Kone
A task/executor is called "orphaned" if the corresponding scheduler doesn't
register with Mesos. Is your framework scheduler running or gone for good?
The resources should be cleaned up if the agent (and consequently the
master) have realized that the executor exited.

Can you paste the master and agent logs for one of the orphaned tasks/executors
(grep the logs for the task/executor id)?
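
For example, something along these lines (a sketch; the log location depends on
how your daemons are launched, e.g. --log_dir=/var/log/mesos, and the id shown
is the framework id from the orphan_tasks snippet below):

$ grep 14cddded-e692-4838-9893-6e04a81481d8-0006 /var/log/mesos/mesos-master.INFO
$ grep 14cddded-e692-4838-9893-6e04a81481d8-0006 /var/log/mesos/mesos-slave.INFO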

On Thu, Apr 7, 2016 at 9:00 AM, haosdent  wrote:

> Hmm, sorry I didn't express my idea clearly. I meant killing those orphan
> tasks here.
>
> On Thu, Apr 7, 2016 at 11:57 PM, June Taylor  wrote:
>
>> Forgive my ignorance, are you literally saying I should just sigkill
>> these instances? How will that clean up the mesos orphans?
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Thu, Apr 7, 2016 at 10:44 AM, haosdent  wrote:
>>
>>> Suppose you use --work_dir=/tmp/mesos. Then you could run
>>>
>>> $ find /tmp/mesos -name $YOUR_EXECUTOR_ID
>>>
>>> Then you get a list of folders and can use lsof on them.
>>>
>>> As an example, my executor id is "test" here.
>>>
>>> $ find /tmp/mesos/ -name 'test'
>>>
>>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test
>>>
>>> When I execute
>>> lsof 
>>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
>>> (Keep in mind I appended runs/latest here.)
>>>
>>> Then you can see the pid list:
>>>
>>> COMMAND PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
>>> mesos-exe 21811 haosdent  cwdDIR8,36 3221463220
>>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>>> sleep 21847 haosdent  cwdDIR8,36 3221463220
>>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>>>
>>> Kill all of them.
>>>
>>> On Thu, Apr 7, 2016 at 11:23 PM, June Taylor  wrote:
>>>
 I do have the executor ID. Can you advise how to kill it?

 I have one master and three slaves. Each slave has one of these orphans.


 Thanks,
 June Taylor
 System Administrator, Minnesota Population Center
 University of Minnesota

 On Thu, Apr 7, 2016 at 10:14 AM, haosdent  wrote:

> >Going to this slave I can find an executor within the mesos working
> directory which matches this framework ID
> The quickest way here is to use kill on the slave, if you can find the
> mesos-executor pid. You can use lsof/fuser or dig through the logs to find the
> executor pid.
>
> However, it still seems weird according to your feedback. Do you have multiple
> masters, and did a failover happen on your master? In that case the slave could
> not connect to the new master and the tasks became orphaned.
>
> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:
>
>> Here is one of three orphaned tasks (first two octets of IP removed):
>>
>> "orphan_tasks": [
>> {
>> "executor_id": "",
>> "name": "Task 1",
>> "framework_id":
>> "14cddded-e692-4838-9893-6e04a81481d8-0006",
>> "state": "TASK_RUNNING",
>> "statuses": [
>> {
>> "timestamp": 1459887295.05554,
>> "state": "TASK_RUNNING",
>> "container_status": {
>> "network_infos": [
>> {
>> "ip_addresses": [
>> {
>> "ip_address":
>> "xxx.xxx.163.205"
>> }
>> ],
>> "ip_address": "xxx.xxx.163.205"
>> }
>> ]
>> }
>> }
>> ],
>> "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
>> "id": "1",
>> "resources": {
>> "mem": 112640.0,
>> "disk": 0.0,
>> "cpus": 30.0
>> }
>> }
>>
>> Going to this slave I can find an executor within the mesos working
>> directory which matches this framework ID. Reviewing the stdout messaging
>> within indicates the program has finished its work. But, it is still
>> holding these resources open.
>>
>> This framework ID is not shown as Active in the main Mesos Web UI,
>> but does show up if you display the Slave's web UI.
>>
>> The resources consumed c

Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread haosdent
Hmm, sorry I didn't express my idea clearly. I meant killing those orphan tasks
here.

On Thu, Apr 7, 2016 at 11:57 PM, June Taylor  wrote:

> Forgive my ignorance, are you literally saying I should just sigkill these
> instances? How will that clean up the mesos orphans?
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 10:44 AM, haosdent  wrote:
>
>> Suppose you use --work_dir=/tmp/mesos. Then you could run
>>
>> $ find /tmp/mesos -name $YOUR_EXECUTOR_ID
>>
>> Then you get a list of folders and can use lsof on them.
>>
>> As an example, my executor id is "test" here.
>>
>> $ find /tmp/mesos/ -name 'test'
>>
>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test
>>
>> When I execute
>> lsof 
>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
>> (Keep in mind I appended runs/latest here.)
>>
>> Then you can see the pid list:
>>
>> COMMAND PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
>> mesos-exe 21811 haosdent  cwdDIR8,36 3221463220
>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>> sleep 21847 haosdent  cwdDIR8,36 3221463220
>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>>
>> Kill all of them.
>>
>> On Thu, Apr 7, 2016 at 11:23 PM, June Taylor  wrote:
>>
>>> I do have the executor ID. Can you advise how to kill it?
>>>
>>> I have one master and three slaves. Each slave has one of these orphans.
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Thu, Apr 7, 2016 at 10:14 AM, haosdent  wrote:
>>>
 >Going to this slave I can find an executor within the mesos working
 directory which matches this framework ID
 The quickest way here is to use kill on the slave, if you can find the
 mesos-executor pid. You can use lsof/fuser or dig through the logs to find the
 executor pid.

 However, it still seems weird according to your feedback. Do you have multiple
 masters, and did a failover happen on your master? In that case the slave could
 not connect to the new master and the tasks became orphaned.

 On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:

> Here is one of three orphaned tasks (first two octets of IP removed):
>
> "orphan_tasks": [
> {
> "executor_id": "",
> "name": "Task 1",
> "framework_id":
> "14cddded-e692-4838-9893-6e04a81481d8-0006",
> "state": "TASK_RUNNING",
> "statuses": [
> {
> "timestamp": 1459887295.05554,
> "state": "TASK_RUNNING",
> "container_status": {
> "network_infos": [
> {
> "ip_addresses": [
> {
> "ip_address": "xxx.xxx.163.205"
> }
> ],
> "ip_address": "xxx.xxx.163.205"
> }
> ]
> }
> }
> ],
> "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
> "id": "1",
> "resources": {
> "mem": 112640.0,
> "disk": 0.0,
> "cpus": 30.0
> }
> }
>
> Going to this slave I can find an executor within the mesos working
> directory which matches this framework ID. Reviewing the stdout messaging
> within indicates the program has finished its work. But, it is still
> holding these resources open.
>
> This framework ID is not shown as Active in the main Mesos Web UI, but
> does show up if you display the Slave's web UI.
>
> The resources consumed count towards the Idle pool, and have resulted
> in zero available resources for other Offers.
>
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 9:46 AM, haosdent  wrote:
>
>> > pyspark executors hanging around and consuming resources marked as
>> Idle in mesos Web UI
>>
>> Do you have some logs about this?
>>
>> >is there an API call I can make to kill these orphans?
>>
>> As I know, mesos agent wou

Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread June Taylor
Forgive my ignorance, are you literally saying I should just sigkill these
instances? How will that clean up the mesos orphans?


Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota

On Thu, Apr 7, 2016 at 10:44 AM, haosdent  wrote:

> Suppose you use --work_dir=/tmp/mesos. Then you could run
>
> $ find /tmp/mesos -name $YOUR_EXECUTOR_ID
>
> Then you get a list of folders and can use lsof on them.
>
> As an example, my executor id is "test" here.
>
> $ find /tmp/mesos/ -name 'test'
>
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test
>
> When I execute
> lsof 
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
> (Keep in mind I appended runs/latest here.)
>
> Then you can see the pid list:
>
> COMMAND PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
> mesos-exe 21811 haosdent  cwdDIR8,36 3221463220
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
> sleep 21847 haosdent  cwdDIR8,36 3221463220
> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>
> Kill all of them.
>
> On Thu, Apr 7, 2016 at 11:23 PM, June Taylor  wrote:
>
>> I do have the executor ID. Can you advise how to kill it?
>>
>> I have one master and three slaves. Each slave has one of these orphans.
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Thu, Apr 7, 2016 at 10:14 AM, haosdent  wrote:
>>
>>> >Going to this slave I can find an executor within the mesos working
>>> directory which matches this framework ID
>>> The quickest way here is to use kill on the slave, if you can find the
>>> mesos-executor pid. You can use lsof/fuser or dig through the logs to find
>>> the executor pid.
>>>
>>> However, it still seems weird according to your feedback. Do you have multiple
>>> masters, and did a failover happen on your master? In that case the slave could
>>> not connect to the new master and the tasks became orphaned.
>>>
>>> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:
>>>
 Here is one of three orphaned tasks (first two octets of IP removed):

 "orphan_tasks": [
 {
 "executor_id": "",
 "name": "Task 1",
 "framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
 "state": "TASK_RUNNING",
 "statuses": [
 {
 "timestamp": 1459887295.05554,
 "state": "TASK_RUNNING",
 "container_status": {
 "network_infos": [
 {
 "ip_addresses": [
 {
 "ip_address": "xxx.xxx.163.205"
 }
 ],
 "ip_address": "xxx.xxx.163.205"
 }
 ]
 }
 }
 ],
 "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
 "id": "1",
 "resources": {
 "mem": 112640.0,
 "disk": 0.0,
 "cpus": 30.0
 }
 }

 Going to this slave I can find an executor within the mesos working
 directory which matches this framework ID. Reviewing the stdout messaging
 within indicates the program has finished its work. But, it is still
 holding these resources open.

 This framework ID is not shown as Active in the main Mesos Web UI, but
 does show up if you display the Slave's web UI.

 The resources consumed count towards the Idle pool, and have resulted
 in zero available resources for other Offers.



 Thanks,
 June Taylor
 System Administrator, Minnesota Population Center
 University of Minnesota

 On Thu, Apr 7, 2016 at 9:46 AM, haosdent  wrote:

> > pyspark executors hanging around and consuming resources marked as
> Idle in mesos Web UI
>
> Do you have some logs about this?
>
> >is there an API call I can make to kill these orphans?
>
> As far as I know, the mesos agent would try to clean up orphan containers when
> it restarts. But I'm not sure the orphans I mean here are the same as yours.
>
> On Thu, Apr 7, 2016 at 10:21 PM, June Taylor  wrote:
>
>> Greetings mesos users!
>>
>> I am debugging an issue with pyspark executors han

Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread haosdent
Suppose you use --work_dir=/tmp/mesos. Then you could run

$ find /tmp/mesos -name $YOUR_EXECUTOR_ID

Then you get a list of folders and can use lsof on them.

As an example, my executor id is "test" here.

$ find /tmp/mesos/ -name 'test'
/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test

When I execute
lsof 
/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
(Keep in mind I appended runs/latest here.)

Then you can see the pid list:

COMMAND PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
mesos-exe 21811 haosdent  cwdDIR8,36 3221463220
/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
sleep 21847 haosdent  cwdDIR8,36 3221463220
/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11

Kill all of them.
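
For example, a one-liner sketch to kill them (assuming your lsof supports -t,
which prints only the pids, and substituting your real ids for the
<agent-id>/<framework-id> placeholders):

$ kill $(lsof -t /tmp/mesos/0/slaves/<agent-id>/frameworks/<framework-id>/executors/test/runs/latest/)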

On Thu, Apr 7, 2016 at 11:23 PM, June Taylor  wrote:

> I do have the executor ID. Can you advise how to kill it?
>
> I have one master and three slaves. Each slave has one of these orphans.
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 10:14 AM, haosdent  wrote:
>
>> >Going to this slave I can find an executor within the mesos working
>> directory which matches this framework ID
>> The quickest way here is to use kill on the slave, if you can find the
>> mesos-executor pid. You can use lsof/fuser or dig through the logs to find the
>> executor pid.
>>
>> However, it still seems weird according to your feedback. Do you have multiple
>> masters, and did a failover happen on your master? In that case the slave could
>> not connect to the new master and the tasks became orphaned.
>>
>> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:
>>
>>> Here is one of three orphaned tasks (first two octets of IP removed):
>>>
>>> "orphan_tasks": [
>>> {
>>> "executor_id": "",
>>> "name": "Task 1",
>>> "framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
>>> "state": "TASK_RUNNING",
>>> "statuses": [
>>> {
>>> "timestamp": 1459887295.05554,
>>> "state": "TASK_RUNNING",
>>> "container_status": {
>>> "network_infos": [
>>> {
>>> "ip_addresses": [
>>> {
>>> "ip_address": "xxx.xxx.163.205"
>>> }
>>> ],
>>> "ip_address": "xxx.xxx.163.205"
>>> }
>>> ]
>>> }
>>> }
>>> ],
>>> "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
>>> "id": "1",
>>> "resources": {
>>> "mem": 112640.0,
>>> "disk": 0.0,
>>> "cpus": 30.0
>>> }
>>> }
>>>
>>> Going to this slave I can find an executor within the mesos working
>>> directory which matches this framework ID. Reviewing the stdout messaging
>>> within indicates the program has finished its work. But, it is still
>>> holding these resources open.
>>>
>>> This framework ID is not shown as Active in the main Mesos Web UI, but
>>> does show up if you display the Slave's web UI.
>>>
>>> The resources consumed count towards the Idle pool, and have resulted in
>>> zero available resources for other Offers.
>>>
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Thu, Apr 7, 2016 at 9:46 AM, haosdent  wrote:
>>>
 > pyspark executors hanging around and consuming resources marked as
 Idle in mesos Web UI

 Do you have some logs about this?

 >is there an API call I can make to kill these orphans?

 As far as I know, the mesos agent would try to clean up orphan containers when
 it restarts. But I'm not sure the orphans I mean here are the same as yours.

 On Thu, Apr 7, 2016 at 10:21 PM, June Taylor  wrote:

> Greetings mesos users!
>
> I am debugging an issue with pyspark executors hanging around and
> consuming resources marked as Idle in mesos Web UI. These tasks also show
> up in the orphaned_tasks key in `mesos state`.
>
> I'm first wondering how to clear them out - is there an API call I can
> make to kill these orphans? Secondly, how it happened at all.
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minn

Das

2016-04-07 Thread Stephan Hadan


Stephan Hadan
Senior IT Operations Engineer

NOLTE&LAUTH

Seidenstraße 19
70174  Stuttgart
+49 160 94 81 00 08
stephan.ha...@nolte-lauth.com
www.nolteundlauth.de

NOLTE&LAUTH GmbH - Seidenstraße 19 - 70174 Stuttgart - DE
Managing directors (Geschäftsführer): Karsten Lauth, Kai Müller, Markus Stauffenberg -
Amtsgericht Stuttgart HRB 720342 - VAT ID no. DE 246431089


Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread June Taylor
I do have the executor ID. Can you advise how to kill it?

I have one master and three slaves. Each slave has one of these orphans.


Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota

On Thu, Apr 7, 2016 at 10:14 AM, haosdent  wrote:

> >Going to this slave I can find an executor within the mesos working
> directory which matches this framework ID
> The quickest way here is to use kill on the slave, if you can find the
> mesos-executor pid. You can use lsof/fuser or dig through the logs to find the
> executor pid.
>
> However, it still seems weird according to your feedback. Do you have multiple
> masters, and did a failover happen on your master? In that case the slave could
> not connect to the new master and the tasks became orphaned.
>
> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:
>
>> Here is one of three orphaned tasks (first two octets of IP removed):
>>
>> "orphan_tasks": [
>> {
>> "executor_id": "",
>> "name": "Task 1",
>> "framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
>> "state": "TASK_RUNNING",
>> "statuses": [
>> {
>> "timestamp": 1459887295.05554,
>> "state": "TASK_RUNNING",
>> "container_status": {
>> "network_infos": [
>> {
>> "ip_addresses": [
>> {
>> "ip_address": "xxx.xxx.163.205"
>> }
>> ],
>> "ip_address": "xxx.xxx.163.205"
>> }
>> ]
>> }
>> }
>> ],
>> "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
>> "id": "1",
>> "resources": {
>> "mem": 112640.0,
>> "disk": 0.0,
>> "cpus": 30.0
>> }
>> }
>>
>> Going to this slave I can find an executor within the mesos working
>> directory which matches this framework ID. Reviewing the stdout messaging
>> within indicates the program has finished its work. But, it is still
>> holding these resources open.
>>
>> This framework ID is not shown as Active in the main Mesos Web UI, but
>> does show up if you display the Slave's web UI.
>>
>> The resources consumed count towards the Idle pool, and have resulted in
>> zero available resources for other Offers.
>>
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Thu, Apr 7, 2016 at 9:46 AM, haosdent  wrote:
>>
>>> > pyspark executors hanging around and consuming resources marked as
>>> Idle in mesos Web UI
>>>
>>> Do you have some logs about this?
>>>
>>> >is there an API call I can make to kill these orphans?
>>>
>>> As far as I know, the mesos agent would try to clean up orphan containers when
>>> it restarts. But I'm not sure the orphans I mean here are the same as yours.
>>>
>>> On Thu, Apr 7, 2016 at 10:21 PM, June Taylor  wrote:
>>>
 Greetings mesos users!

 I am debugging an issue with pyspark executors hanging around and
 consuming resources marked as Idle in mesos Web UI. These tasks also show
 up in the orphaned_tasks key in `mesos state`.

 I'm first wondering how to clear them out - is there an API call I can
 make to kill these orphans? Secondly, how it happened at all.

 Thanks,
 June Taylor
 System Administrator, Minnesota Population Center
 University of Minnesota

>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread haosdent
>Going to this slave I can find an executor within the mesos working
directory which matches this framework ID
The quickest way here is to use kill on the slave, if you can find the
mesos-executor pid. You can use lsof/fuser or dig through the logs to find the
executor pid.

However, it still seems weird according to your feedback. Do you have multiple
masters, and did a failover happen on your master? In that case the slave could
not connect to the new master and the tasks became orphaned.
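
For example, a rough sketch with fuser on the sandbox directory found via find
(the placeholders stand in for your real agent, framework and executor ids):

$ fuser -v /tmp/mesos/0/slaves/<agent-id>/frameworks/<framework-id>/executors/<executor-id>/runs/latest

That prints the pids of processes using that directory (for example as their
working directory), which you can then kill.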

On Thu, Apr 7, 2016 at 11:06 PM, June Taylor  wrote:

> Here is one of three orphaned tasks (first two octets of IP removed):
>
> "orphan_tasks": [
> {
> "executor_id": "",
> "name": "Task 1",
> "framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
> "state": "TASK_RUNNING",
> "statuses": [
> {
> "timestamp": 1459887295.05554,
> "state": "TASK_RUNNING",
> "container_status": {
> "network_infos": [
> {
> "ip_addresses": [
> {
> "ip_address": "xxx.xxx.163.205"
> }
> ],
> "ip_address": "xxx.xxx.163.205"
> }
> ]
> }
> }
> ],
> "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
> "id": "1",
> "resources": {
> "mem": 112640.0,
> "disk": 0.0,
> "cpus": 30.0
> }
> }
>
> Going to this slave I can find an executor within the mesos working
> directory which matches this framework ID. Reviewing the stdout messaging
> within indicates the program has finished its work. But, it is still
> holding these resources open.
>
> This framework ID is not shown as Active in the main Mesos Web UI, but
> does show up if you display the Slave's web UI.
>
> The resources consumed count towards the Idle pool, and have resulted in
> zero available resources for other Offers.
>
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 9:46 AM, haosdent  wrote:
>
>> > pyspark executors hanging around and consuming resources marked as
>> Idle in mesos Web UI
>>
>> Do you have some logs about this?
>>
>> >is there an API call I can make to kill these orphans?
>>
>> As far as I know, the mesos agent would try to clean up orphan containers when
>> it restarts. But I'm not sure the orphans I mean here are the same as yours.
>>
>> On Thu, Apr 7, 2016 at 10:21 PM, June Taylor  wrote:
>>
>>> Greetings mesos users!
>>>
>>> I am debugging an issue with pyspark executors hanging around and
>>> consuming resources marked as Idle in mesos Web UI. These tasks also show
>>> up in the orphaned_tasks key in `mesos state`.
>>>
>>> I'm first wondering how to clear them out - is there an API call I can
>>> make to kill these orphans? Secondly, how it happened at all.
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang


Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread June Taylor
Here is one of three orphaned tasks (first two octets of IP removed):

"orphan_tasks": [
{
"executor_id": "",
"name": "Task 1",
"framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
"state": "TASK_RUNNING",
"statuses": [
{
"timestamp": 1459887295.05554,
"state": "TASK_RUNNING",
"container_status": {
"network_infos": [
{
"ip_addresses": [
{
"ip_address": "xxx.xxx.163.205"
}
],
"ip_address": "xxx.xxx.163.205"
}
]
}
}
],
"slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
"id": "1",
"resources": {
"mem": 112640.0,
"disk": 0.0,
"cpus": 30.0
}
}

Going to this slave I can find an executor within the mesos working
directory which matches this framework ID. Reviewing the stdout messaging
within indicates the program has finished its work. But, it is still
holding these resources open.

This framework ID is not shown as Active in the main Mesos Web UI, but does
show up if you display the Slave's web UI.

The resources consumed count towards the Idle pool, and have resulted in
zero available resources for other Offers.



Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota

On Thu, Apr 7, 2016 at 9:46 AM, haosdent  wrote:

> > pyspark executors hanging around and consuming resources marked as Idle
> in mesos Web UI
>
> Do you have some logs about this?
>
> >is there an API call I can make to kill these orphans?
>
> As far as I know, the mesos agent would try to clean up orphan containers when
> it restarts. But I'm not sure the orphans I mean here are the same as yours.
>
> On Thu, Apr 7, 2016 at 10:21 PM, June Taylor  wrote:
>
>> Greetings mesos users!
>>
>> I am debugging an issue with pyspark executors hanging around and
>> consuming resources marked as Idle in mesos Web UI. These tasks also show
>> up in the orphaned_tasks key in `mesos state`.
>>
>> I'm first wondering how to clear them out - is there an API call I can
>> make to kill these orphans? Secondly, how it happened at all.
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: orphaned_tasks cleanup and prevention method

2016-04-07 Thread haosdent
> pyspark executors hanging around and consuming resources marked as Idle
in mesos Web UI

Do you have some logs about this?

>is there an API call I can make to kill these orphans?

As far as I know, the mesos agent would try to clean up orphan containers when
it restarts. But I'm not sure the orphans I mean here are the same as yours.
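
For example (a sketch; the service name depends on your packaging and init
system):

$ sudo service mesos-slave restart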

On Thu, Apr 7, 2016 at 10:21 PM, June Taylor  wrote:

> Greetings mesos users!
>
> I am debugging an issue with pyspark executors hanging around and
> consuming resources marked as Idle in mesos Web UI. These tasks also show
> up in the orphaned_tasks key in `mesos state`.
>
> I'm first wondering how to clear them out - is there an API call I can
> make to kill these orphans? Secondly, how it happened at all.
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>



-- 
Best Regards,
Haosdent Huang


Re: Mesos 0.28 SSL in official packages

2016-04-07 Thread haosdent
Hi, SSL isn't enabled by default. You need to compile Mesos yourself by following this doc:
http://mesos.apache.org/documentation/latest/ssl/
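
For reference, a minimal build sketch (the configure flags are the same ones
shown in the CI matrices earlier in this digest; install paths and prefixes
depend on your environment):

$ ./configure --enable-libevent --enable-ssl
$ make
$ sudo make install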

On Thu, Apr 7, 2016 at 10:04 PM, Kamil Wokitajtis 
wrote:

> This is my first post, so Hi everyone!
>
> Is SSL enabled in official packages (CentOS in my case)?
> I can see libssl in ldd output, but I cannot see libevent.
> I had to compile mesos from sources to run it over ssl.
> I would prefer to install it from packages.
>
> Regards,
> Kamil
>



-- 
Best Regards,
Haosdent Huang


orphaned_tasks cleanup and prevention method

2016-04-07 Thread June Taylor
Greetings mesos users!

I am debugging an issue with pyspark executors hanging around and consuming
resources marked as Idle in mesos Web UI. These tasks also show up in the
orphaned_tasks key in `mesos state`.
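
A quick way to list them is to query the master state endpoint directly, e.g.
(a sketch assuming a master on port 5050 and jq installed; older versions
expose /master/state.json instead of /state):

$ curl -s http://<master-host>:5050/state | jq '.orphan_tasks'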

I'm first wondering how to clear them out - is there an API call I can make
to kill these orphans? Secondly, how it happened at all.

Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota


Mesos 0.28 SSL in official packages

2016-04-07 Thread Kamil Wokitajtis
This is my first post, so Hi everyone!

Is SSL enabled in official packages (CentOS in my case)?
I can see libssl in ldd output, but I cannot see libevent.
I had to compile mesos from sources to run it over ssl.
I would prefer to install it from packages.

Regards,
Kamil