Re: Mesos 0.28 SSL in official packages

2016-04-11 Thread Zameer Manji
I have suggested this before and I will suggest it again here.

I think the Apache Mesos project should build and distribute packages
instead of relying on the generosity of a commercial vendor. The Apache
Aurora project does this already with good success. As a user of Apache
Mesos I don't care about Mesosphere Inc and I feel uncomfortable that the
project is so dependent on its employees.

Doing this would allow users to contribute packaging fixes directly to the
project, such as enabling SSL.

On Mon, Apr 11, 2016 at 3:02 AM, Adam Bordelon  wrote:

> Hi Kamil,
>
> Technically, there are no "official" Apache-built packages for Apache
> Mesos.
>
> At least one company (Mesosphere) chooses to build and distribute
> Mesos packages, but does not currently offer SSL builds. It wouldn't
> be hard to add an SSL build to our regular builds, but it hasn't been
> requested enough to prioritize it.
>
> cc: Joris, Kapil
>
> On Thu, Apr 7, 2016 at 7:42 AM, haosdent  wrote:
> > Hi, SSL isn't enabled by default. You need to compile it yourself by following this doc
> > http://mesos.apache.org/documentation/latest/ssl/
> >
> > On Thu, Apr 7, 2016 at 10:04 PM, Kamil Wokitajtis 
> > wrote:
> >>
> >> This is my first post, so Hi everyone!
> >>
> >> Is SSL enabled in official packages (CentOS in my case)?
> >> I can see libssl in ldd output, but I cannot see libevent.
> >> I had to compile mesos from sources to run it over ssl.
> >> I would prefer to install it from packages.
> >>
> >> Regards,
> >> Kamil
> >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
>
> --
> Zameer Manji
>
>
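
For readers repeating the check Kamil describes (libssl visible in the ldd output, libevent absent), a minimal sketch might look like the following. The helper name check_ssl_linkage and the library path in the usage comment are assumptions for illustration, not part of Mesos; the function only scans ldd-style text for the two library names mentioned in the thread.

```shell
#!/bin/sh
# check_ssl_linkage is a hypothetical helper, not part of Mesos.
# It reads ldd-style output on stdin and reports, for each of the two
# libraries named in the thread, whether the name appears in that output.
check_ssl_linkage() {
    input=$(cat)
    for lib in libssl libevent; do
        if printf '%s\n' "$input" | grep -q "$lib"; then
            echo "$lib: linked"
        else
            echo "$lib: missing"
        fi
    done
}

# Usage (the library path is an assumption; adjust it to your install):
#   ldd /usr/lib/libmesos.so | check_ssl_linkage
```

A package that reports libevent as missing would match what Kamil saw and would likely need a rebuild along the lines of the SSL documentation linked above.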


Re: mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Justin Ryan
I’m now using /var/mesos as my work_dir. I don’t have any logs from when the
tasks went missing, because it has been so long since I last got them to start.
:/

From: Greg Mann
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:46 PM
To: user
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

Hi Justin,
Do you have master/agent logs from a time when these tasks would have gone 
missing from the Mesos UI?

What location are you using for the work_dir on the agents?

Cheers,
Greg


On Mon, Apr 11, 2016 at 1:41 PM, Justin Ryan wrote:
Update: I noticed one of the clusters had a framework registered before I
cleared ZK, but is now seeing the same failure at scheduler start.

When the brokers do launch, in recent times, they disappear from mesos within a
day, although they keep running.  I have another thread on this list about
that; it’s unclear whether it is directly related. I had the same happen to
flume launched by marathon.

From: Justin Ryan
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:35 PM
To: "user@mesos.apache.org"
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

I have tried it without the &&, and I can ‘broker stop’, then ‘broker start’
with no change, though I’ll be sure to try without the && on my next zk clear.

I am, indeed, not seeing the framework at all, and when this happens, the last 
line of output running the scheduler is:

  I0411 13:34:37.174973 14368 sched.cpp:336] No credentials provided. 
Attempting to register without authentication

When it works, the next step is basically, ‘registered framework 
--XXX-XXX'

From: Kevin Lu
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:33 PM
To: "user@mesos.apache.org"
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

Can you try it step-by-step without the "&&"?

Also, IIRC, mesos creates separate tasks for the kafka framework and the 
broker. Are you not even seeing the framework in the mesos UI?

On Mon, Apr 11, 2016 at 1:29 PM, Justin Ryan wrote:
Hi, folks!

In pursuit of a mesos-based solution for distributed logging and processing, 
I’ve been experimenting with the mesos/kafka project 
(github.com/mesos/kafka).  I had great success 
for some weeks during initial testing, and am now having trouble getting 
brokers to launch at all.

This code has been adopted by the mesos project, but also as far as I can tell 
the meat of it relies on functionality from org.apache.mesos by implementing 
the Scheduler interface.

Let’s say, for instance, I run:

  ./kafka-mesos.sh broker add 0..2 --options 
log.retention.hours=1,log.retention.bytes.per.topic=1073741824 && 
./kafka-mesos.sh broker start 0..2

The broker start simply times out and status never changes.  I’ve cleared ZK a
number of times, which is the way I’ve been advised to get out of weird mesos
states in the past.  The mesos UI never shows a job in STARTING or other
state, or a failed / ended job.

Any idea what I might be running into? This was working consistently for weeks 
on end and recently stopped working altogether about 95% of the time.  When it 
works, it only sporadically works.

TIA,

JR


Please consider the environment before printing this e-mail

The information in this electronic mail message is the sender's confidential 
business and may be legally privileged. It is intended solely for the 
addressee(s). Access to this internet electronic mail message by anyone else is 
unauthorized. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken or omitted to be taken in reliance on it is 
prohibited and may be unlawful. The sender believes that this E-mail and any 
attachments were free of any virus, worm, Trojan horse, and/or malicious code 
when sent. This message and its attachments could have been infected during 
transmission. By reading the message and opening any attachments, the recipient 
accepts full responsibility for taking protective and remedial action about 
viruses and other defects. The sender's employer is not liable for any loss or 
damage arising in any way.




Re: mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Greg Mann
Hi Justin,
Do you have master/agent logs from a time when these tasks would have gone
missing from the Mesos UI?

What location are you using for the work_dir on the agents?

Cheers,
Greg


On Mon, Apr 11, 2016 at 1:41 PM, Justin Ryan  wrote:

> Update: I noticed one of the clusters had a framework registered before I
> cleared ZK, but is now seeing the same failure at scheduler start.
>
> When the brokers do launch, in recent times, they disappear from mesos
> within a day, although they keep running.  I have another thread on this
> list about that; it’s unclear whether it is directly related. I had the
> same happen to flume launched by marathon.
>
> From: Justin Ryan 
> Reply-To: "user@mesos.apache.org" 
> Date: Monday, April 11, 2016 at 1:35 PM
>
> To: "user@mesos.apache.org" 
> Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)
>
> I have tried it without the &&, and I can ‘broker stop’, then ‘broker
> start’ with no change, though I’ll be sure to try without the && on my
> next zk clear.
>
> I am, indeed, not seeing the framework at all, and when this happens, the
> last line of output running the scheduler is:
>
>   I0411 13:34:37.174973 14368 sched.cpp:336] No credentials provided.
> Attempting to register without authentication
>
> When it works, the next step is basically, ‘registered framework
> --XXX-XXX'
>
> From: Kevin Lu 
> Reply-To: "user@mesos.apache.org" 
> Date: Monday, April 11, 2016 at 1:33 PM
> To: "user@mesos.apache.org" 
> Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)
>
> Can you try it step-by-step without the "&&"?
>
> Also, IIRC, mesos creates separate tasks for the kafka framework and the
> broker. Are you not even seeing the framework in the mesos UI?
>
> On Mon, Apr 11, 2016 at 1:29 PM, Justin Ryan  wrote:
>
>> Hi, folks!
>>
>> In pursuit of a mesos-based solution for distributed logging and
>> processing, I’ve been experimenting with the mesos/kafka project (
>> github.com/mesos/kafka).  I had great success for some weeks during
>> initial testing, and am now having trouble getting brokers to launch at all.
>>
>> This code has been adopted by the mesos project, but also as far as I can
>> tell the meat of it relies on functionality from org.apache.mesos by
>> implementing the Scheduler interface.
>>
>> Let’s say, for instance, I run:
>>
>>   ./kafka-mesos.sh broker add 0..2 --options
>> log.retention.hours=1,log.retention.bytes.per.topic=1073741824 &&
>> ./kafka-mesos.sh broker start 0..2
>>
>> The broker start simply times out and status never changes.  I’ve cleared
>> ZK a number of times, which is the way I’ve been advised to get out of
>> weird mesos states in the past.  The mesos UI never shows a job in
>> STARTING or other state, or a failed / ended job.
>>
>> Any idea what I might be running into? This was working consistently for
>> weeks on end and recently stopped working altogether about 95% of the
>> time.  When it works, it only sporadically works.
>>
>> TIA,
>>
>> JR
>> --
>>
>>
>
>


Re: mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Justin Ryan
I have 3 hosts running zookeeper, mesos masters, and marathon, an HDFS 
namenode, and 10 worker nodes running mesos-slave and HDFS datanodes.

I don’t remember having set LIBPROCESS_IP in the past, maybe it’s part of some 
slightly newer code, so I went ahead and did this and verified 
MESOS_NATIVE_JAVA_LIBRARY as well, no change.

Shouldn’t be any firewall rules, and like I said, at least one of these 
clusters I built with chef a couple of months back.  I had some concern that 
when I launched the production cluster, I may have inadvertently copied some 
config related to the testing environment (e.g. wrong zk hosts), but I’ve 
re-verified this all a number of times, and see no indication that hosts from 
one are talking to the other.

From: Kevin Lu
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:40 PM
To: "user@mesos.apache.org"
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

What's your master/slave setup? What are your ports/firewall rules? In the 
past, when weird situations like these happen to me, it's usually because of 
some firewall rule, and at that point I'll ssh into the machine where my 
framework is running and see what ports it's trying to talk to via netstat.

I'm sure you've done this as well, but the github docs do say to set the 
LIBPROCESS_IP environment variable, if you haven't done so.

On Mon, Apr 11, 2016 at 1:35 PM, Justin Ryan wrote:
I have tried it without the &&, and I can ‘broker stop’, then ‘broker start’
with no change, though I’ll be sure to try without the && on my next zk clear.

I am, indeed, not seeing the framework at all, and when this happens, the last 
line of output running the scheduler is:

  I0411 13:34:37.174973 14368 sched.cpp:336] No credentials provided. 
Attempting to register without authentication

When it works, the next step is basically, ‘registered framework 
--XXX-XXX'

From: Kevin Lu
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:33 PM
To: "user@mesos.apache.org"
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

Can you try it step-by-step without the "&&"?

Also, IIRC, mesos creates separate tasks for the kafka framework and the 
broker. Are you not even seeing the framework in the mesos UI?

On Mon, Apr 11, 2016 at 1:29 PM, Justin Ryan wrote:
Hi, folks!

In pursuit of a mesos-based solution for distributed logging and processing, 
I’ve been experimenting with the mesos/kafka project 
(github.com/mesos/kafka).  I had great success 
for some weeks during initial testing, and am now having trouble getting 
brokers to launch at all.

This code has been adopted by the mesos project, but also as far as I can tell 
the meat of it relies on functionality from org.apache.mesos by implementing 
the Scheduler interface.

Let’s say, for instance, I run:

  ./kafka-mesos.sh broker add 0..2 --options 
log.retention.hours=1,log.retention.bytes.per.topic=1073741824 && 
./kafka-mesos.sh broker start 0..2

The broker start simply times out and status never changes.  I’ve cleared ZK a
number of times, which is the way I’ve been advised to get out of weird mesos
states in the past.  The mesos UI never shows a job in STARTING or other
state, or a failed / ended job.

Any idea what I might be running into? This was working consistently for weeks 
on end and recently stopped working altogether about 95% of the time.  When it 
works, it only sporadically works.

TIA,

JR



Re: mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Justin Ryan
Update: I noticed one of the clusters had a framework registered before I
cleared ZK, but is now seeing the same failure at scheduler start.

When the brokers do launch, in recent times, they disappear from mesos within a
day, although they keep running.  I have another thread on this list about
that; it’s unclear whether it is directly related. I had the same happen to
flume launched by marathon.

From: Justin Ryan
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:35 PM
To: "user@mesos.apache.org"
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

I have tried it without the &&, and I can ‘broker stop’, then ‘broker start’
with no change, though I’ll be sure to try without the && on my next zk clear.

I am, indeed, not seeing the framework at all, and when this happens, the last 
line of output running the scheduler is:

  I0411 13:34:37.174973 14368 sched.cpp:336] No credentials provided. 
Attempting to register without authentication

When it works, the next step is basically, ‘registered framework 
--XXX-XXX'

From: Kevin Lu
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:33 PM
To: "user@mesos.apache.org"
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

Can you try it step-by-step without the "&&"?

Also, IIRC, mesos creates separate tasks for the kafka framework and the 
broker. Are you not even seeing the framework in the mesos UI?

On Mon, Apr 11, 2016 at 1:29 PM, Justin Ryan wrote:
Hi, folks!

In pursuit of a mesos-based solution for distributed logging and processing, 
I’ve been experimenting with the mesos/kafka project 
(github.com/mesos/kafka).  I had great success 
for some weeks during initial testing, and am now having trouble getting 
brokers to launch at all.

This code has been adopted by the mesos project, but also as far as I can tell 
the meat of it relies on functionality from org.apache.mesos by implementing 
the Scheduler interface.

Let’s say, for instance, I run:

  ./kafka-mesos.sh broker add 0..2 --options 
log.retention.hours=1,log.retention.bytes.per.topic=1073741824 && 
./kafka-mesos.sh broker start 0..2

The broker start simply times out and status never changes.  I’ve cleared ZK a
number of times, which is the way I’ve been advised to get out of weird mesos
states in the past.  The mesos UI never shows a job in STARTING or other
state, or a failed / ended job.

Any idea what I might be running into? This was working consistently for weeks 
on end and recently stopped working altogether about 95% of the time.  When it 
works, it only sporadically works.

TIA,

JR





Re: mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Kevin Lu
What's your master/slave setup? What are your ports/firewall rules? In the
past, when weird situations like these happen to me, it's usually because
of some firewall rule, and at that point I'll ssh into the machine where my
framework is running and see what ports it's trying to talk to via netstat.

I'm sure you've done this as well, but the github docs do say to set the
LIBPROCESS_IP environment variable, if you haven't done so.

On Mon, Apr 11, 2016 at 1:35 PM, Justin Ryan  wrote:

> I have tried it without the &&, and I can ‘broker stop’, then ‘broker
> start’ with no change, though I’ll be sure to try without the && on my
> next zk clear.
>
> I am, indeed, not seeing the framework at all, and when this happens, the
> last line of output running the scheduler is:
>
>   I0411 13:34:37.174973 14368 sched.cpp:336] No credentials provided.
> Attempting to register without authentication
>
> When it works, the next step is basically, ‘registered framework
> --XXX-XXX'
>
> From: Kevin Lu 
> Reply-To: "user@mesos.apache.org" 
> Date: Monday, April 11, 2016 at 1:33 PM
> To: "user@mesos.apache.org" 
> Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)
>
> Can you try it step-by-step without the "&&"?
>
> Also, IIRC, mesos creates separate tasks for the kafka framework and the
> broker. Are you not even seeing the framework in the mesos UI?
>
> On Mon, Apr 11, 2016 at 1:29 PM, Justin Ryan  wrote:
>
>> Hi, folks!
>>
>> In pursuit of a mesos-based solution for distributed logging and
>> processing, I’ve been experimenting with the mesos/kafka project (
>> github.com/mesos/kafka).  I had great success for some weeks during
>> initial testing, and am now having trouble getting brokers to launch at all.
>>
>> This code has been adopted by the mesos project, but also as far as I can
>> tell the meat of it relies on functionality from org.apache.mesos by
>> implementing the Scheduler interface.
>>
>> Let’s say, for instance, I run:
>>
>>   ./kafka-mesos.sh broker add 0..2 --options
>> log.retention.hours=1,log.retention.bytes.per.topic=1073741824 &&
>> ./kafka-mesos.sh broker start 0..2
>>
>> The broker start simply times out and status never changes.  I’ve cleared
>> ZK a number of times, which is the way I’ve been advised to get out of
>> weird mesos states in the past.  The mesos UI never shows a job in
>> STARTING or other state, or a failed / ended job.
>>
>> Any idea what I might be running into? This was working consistently for
>> weeks on end and recently stopped working altogether about 95% of the
>> time.  When it works, it only sporadically works.
>>
>> TIA,
>>
>> JR
>> --
>>
>>
>
>
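
Kevin's two suggestions above (confirm LIBPROCESS_IP is set, then look at what the scheduler is talking to via netstat) could be sketched roughly as follows. The helper name missing_env, the IP address, and the library path are assumptions for illustration; only the variable names LIBPROCESS_IP and MESOS_NATIVE_JAVA_LIBRARY come from the thread.

```shell
#!/bin/sh
# missing_env is a hypothetical helper: for each variable name given as an
# argument, print the name if that variable is unset or empty in the
# current shell. It prints nothing when everything is set.
missing_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Usage (address and path are placeholders, not values from the thread):
#   export LIBPROCESS_IP=10.0.0.5
#   export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
#   missing_env LIBPROCESS_IP MESOS_NATIVE_JAVA_LIBRARY
#   netstat -tnp | grep java   # which ports the scheduler JVM is using
```

Empty output from missing_env only shows that the variables are set, not that their values are right for the host, so the netstat step is still worth doing.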


Re: mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Justin Ryan
I have tried it without the &&, and I can ‘broker stop’, then ‘broker start’
with no change, though I’ll be sure to try without the && on my next zk clear.

I am, indeed, not seeing the framework at all, and when this happens, the last 
line of output running the scheduler is:

  I0411 13:34:37.174973 14368 sched.cpp:336] No credentials provided. 
Attempting to register without authentication

When it works, the next step is basically, ‘registered framework 
--XXX-XXX'

From: Kevin Lu
Reply-To: "user@mesos.apache.org"
Date: Monday, April 11, 2016 at 1:33 PM
To: "user@mesos.apache.org"
Subject: Re: mesos/kafka issues (org.apache.mesos.Scheduler)

Can you try it step-by-step without the "&&"?

Also, IIRC, mesos creates separate tasks for the kafka framework and the 
broker. Are you not even seeing the framework in the mesos UI?

On Mon, Apr 11, 2016 at 1:29 PM, Justin Ryan wrote:
Hi, folks!

In pursuit of a mesos-based solution for distributed logging and processing, 
I’ve been experimenting with the mesos/kafka project 
(github.com/mesos/kafka).  I had great success 
for some weeks during initial testing, and am now having trouble getting 
brokers to launch at all.

This code has been adopted by the mesos project, but also as far as I can tell 
the meat of it relies on functionality from org.apache.mesos by implementing 
the Scheduler interface.

Let’s say, for instance, I run:

  ./kafka-mesos.sh broker add 0..2 --options 
log.retention.hours=1,log.retention.bytes.per.topic=1073741824 && 
./kafka-mesos.sh broker start 0..2

The broker start simply times out and status never changes.  I’ve cleared ZK a
number of times, which is the way I’ve been advised to get out of weird mesos
states in the past.  The mesos UI never shows a job in STARTING or other
state, or a failed / ended job.

Any idea what I might be running into? This was working consistently for weeks 
on end and recently stopped working altogether about 95% of the time.  When it 
works, it only sporadically works.

TIA,

JR





Re: mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Kevin Lu
Can you try it step-by-step without the "&&"?

Also, IIRC, mesos creates separate tasks for the kafka framework and the
broker. Are you not even seeing the framework in the mesos UI?

On Mon, Apr 11, 2016 at 1:29 PM, Justin Ryan  wrote:

> Hi, folks!
>
> In pursuit of a mesos-based solution for distributed logging and
> processing, I’ve been experimenting with the mesos/kafka project (
> github.com/mesos/kafka).  I had great success for some weeks during
> initial testing, and am now having trouble getting brokers to launch at all.
>
> This code has been adopted by the mesos project, but also as far as I can
> tell the meat of it relies on functionality from org.apache.mesos by
> implementing the Scheduler interface.
>
> Let’s say, for instance, I run:
>
>   ./kafka-mesos.sh broker add 0..2 --options
> log.retention.hours=1,log.retention.bytes.per.topic=1073741824 &&
> ./kafka-mesos.sh broker start 0..2
>
> The broker start simply times out and status never changes.  I’ve cleared
> ZK a number of times, which is the way I’ve been advised to get out of
> weird mesos states in the past.  The mesos UI never shows a job in
> STARTING or other state, or a failed / ended job.
>
> Any idea what I might be running into? This was working consistently for
> weeks on end and recently stopped working altogether about 95% of the
> time.  When it works, it only sporadically works.
>
> TIA,
>
> JR
> --
>
>
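
Running the two commands step by step, as suggested at the top of this message, might be sketched like this. The helper name run_step is an assumption for illustration; the kafka-mesos.sh invocations in the usage comment are the ones from the original post. The point is only to see each command's exit status separately instead of letting && hide which step failed.

```shell
#!/bin/sh
# run_step is a hypothetical helper: run one command, print its exit
# status, and return that status so the caller can stop at the first
# failing step.
run_step() {
    echo "running: $*"
    status=0
    "$@" || status=$?
    echo "exit status: $status"
    return "$status"
}

# Usage, with the commands from the original post:
#   run_step ./kafka-mesos.sh broker add 0..2 --options \
#       log.retention.hours=1,log.retention.bytes.per.topic=1073741824 || exit 1
#   run_step ./kafka-mesos.sh broker start 0..2 || exit 1
```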


mesos/kafka issues (org.apache.mesos.Scheduler)

2016-04-11 Thread Justin Ryan
Hi, folks!

In pursuit of a mesos-based solution for distributed logging and processing, 
I’ve been experimenting with the mesos/kafka project (github.com/mesos/kafka).  
I had great success for some weeks during initial testing, and am now having 
trouble getting brokers to launch at all.

This code has been adopted by the mesos project, but also as far as I can tell 
the meat of it relies on functionality from org.apache.mesos by implementing 
the Scheduler interface.

Let’s say, for instance, I run:

  ./kafka-mesos.sh broker add 0..2 --options 
log.retention.hours=1,log.retention.bytes.per.topic=1073741824 && 
./kafka-mesos.sh broker start 0..2

The broker start simply times out and status never changes.  I’ve cleared ZK a
number of times, which is the way I’ve been advised to get out of weird mesos
states in the past.  The mesos UI never shows a job in STARTING or other
state, or a failed / ended job.

Any idea what I might be running into? This was working consistently for weeks 
on end and recently stopped working altogether about 95% of the time.  When it 
works, it only sporadically works.

TIA,

JR




Re: [VOTE] Release Apache Mesos 0.28.1 (rc2)

2016-04-11 Thread Kapil Arya
+1 (binding)

CI runs with: amd64/centos/6 amd64/centos/7 amd64/debian/jessie
amd64/ubuntu/precise amd64/ubuntu/trusty amd64/ubuntu/vivid
amd64/ubuntu/wily

On Wed, Apr 6, 2016 at 11:51 PM, Vinod Kone  wrote:

> +1 (binding)
>
> Tested on ASF CI. There was one flaky test that's new:
> https://issues.apache.org/jira/browse/MESOS-5139
>
> Configuration Matrix:
>
> OS            Configuration                             gcc      clang
> centos:7      --verbose --enable-libevent --enable-ssl  Success  Not run
> centos:7      --verbose                                 Success  Not run
> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  Success  Failed
> ubuntu:14.04  --verbose                                 Success  Success
>
> Build links:
> centos:7, --verbose --enable-libevent --enable-ssl, gcc:
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/13/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> centos:7, --verbose, gcc:
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/13/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> ubuntu:14.04, --verbose --enable-libevent --enable-ssl, gcc:
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/13/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> ubuntu:14.04, --verbose --enable-libevent --enable-ssl, clang:
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/13/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> ubuntu:14.04, --verbose, gcc:
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/13/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> ubuntu:14.04, --verbose, clang:
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/13/COMPILER=clang,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
>
> On Wed, Apr 6, 2016 at 7:34 PM, Michael Park  wrote:
>
> > +1 (binding)
> >
> > Internal CI results with the corresponding JIRA tickets for the failed
> > tests:
> >
> > CentOS 6 (non-SSL):
> > CentOS 6 (SSL):
> >   - MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> >     (MESOS-4047, MESOS-4053)
> >
> > CentOS 7 (non-SSL):
> >   - HealthCheckTest.ROOT_DOCKER_DockerHealthyTask (MESOS-4604)
> >
> > CentOS 7 (SSL):
> >   - LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutorWithVolumes
> >     (Segfault during test teardown, likely addressed in 0.29.0 by
> >     MESOS-4633 and MESOS-4634)
> >
> > Debian 8 (non-SSL):
> > Debian 8 (SSL):
> >   - HealthCheckTest.ROOT_DOCKER_DockerHealthyTask (MESOS-4604)
> >
> > Ubuntu 12 (non-SSL): Success!
> > Ubuntu 12 (SSL):     Success!
> > Ubuntu 14 (non-SSL): Success!
> > Ubuntu 14 (SSL):     Success!
> > Ubuntu 15 (non-SSL): Success!
> > Ubuntu 15 (SSL):     Success!
> >
> > On 5 April 2016 at 22:30, Greg Mann  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Ran `sudo make check` on CentOS 7 with libevent and SSL enabled; all tests
> > > pass.
> > >
> > > I was also able to successfully simulate a simple upgrade scenario using
> > > 'test-upgrade.py'. Note that this initially failed due to some changes made
> > > to the test framework in this release, but after applying this patch
> > > the upgrade script succeeds. While ideally this patch for the upgrade script
> > > would be included in the release, I don't consider it to be a blocker. If we
> > > end up cutting another RC, it would be great to include.
> > >
> > > Cheers,
> > > Greg
> > >
> > >
> > > On Tue, Apr 5, 2016 at 6:30 PM, Jie Yu  wrote:
> > >
> > > > Hi all,
> > > >
> > > > > Please vote on releasing the following candidate as Apache Mesos 0.28.1.
> > > > >
> > > > >
> > > > > 0.28.1 includes the following bug fixes:
> > > > >
> > > > > [MESOS-4662] - PortMapping network isolator should not assume
> > > > > BIND_MOUNT_ROOT is a realpath.
> > > > > [MESOS-4874] - overlayfs does not work with kernel 4.2.3
> > > > > [MESOS-4877] - Mesos containerizer can't handle top level docker image
> > > > > like "alpine" (must use "library/alpine")
> > > > > [MESOS-4878] - Task stuck in TASK_STAGING when docker fetcher failed to
> > > > > fetch the image
> > > > [MESOS-4964] - curl based docker fetcher 

Re: SharedFilesystemIsolator (filesystem/shared)

2016-04-11 Thread Jie Yu
Hi Stephan,

Last time I asked, it looked like you were the only one using the
filesystem/shared isolator. Have you switched to the filesystem/linux isolator?
Please let us know if you run into any issues when switching.

We plan to retire the filesystem/shared isolator in the next Mesos release.
I'll send out an announcement shortly.

- Jie

On Mon, Apr 11, 2016 at 5:34 AM, Erb, Stephan 
wrote:

> Given that the "filesystem/linux" isolator has landed, is it now
> considered to be a drop-in replacement for the "filesystem/shared"
> isolator?
> --
> *From:* Erb, Stephan 
> *Sent:* Wednesday, July 29, 2015 21:08
> *To:* user@mesos.apache.org
> *Subject:* Re: SharedFilesystemIsolator (filesystem/shared)
>
>
> We are using the isolator in order to re-route writes to /tmp to a path
> inside the container sandbox:
>
>
> --default_container_info='{
>   "type": "MESOS",
>   "volumes": [
>     {"host_path": "system/tmp",    "container_path": "/tmp",     "mode": "RW"},
>     {"host_path": "system/vartmp", "container_path": "/var/tmp", "mode": "RW"}
>   ]
> }'
>
> We are running on linux, so I guess the new one will work for us just as
> well?
>
>
> Thanks,
>
> Stephan
> --
> *From:* Jie Yu 
> *Sent:* Wednesday, July 29, 2015 2:40 AM
> *To:* user@mesos.apache.org
> *Subject:* SharedFilesystemIsolator (filesystem/shared)
>
> Hi Mesos users,
>
> I am wondering if anyone is using this isolator (i.e.,
> --isolation=filesystem/shared)? If not, we plan to remove it from the
> source code in favor of using the upcoming more general linux filesystem
> isolator (https://reviews.apache.org/r/36429/).
>
> - Jie
>


Re: orphaned_tasks cleanup and prevention method

2016-04-11 Thread June Taylor
While I was waiting for more info the app finally did start up. I am trying
to figure out why it took so long.


Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota

On Mon, Apr 11, 2016 at 9:50 AM, haosdent  wrote:

> Can you find Marathon on the
> http://${YOUR_MASTER_IP}:${YOUR_MASTER_PORT}/#/frameworks
> page? Also, regarding:
>
> >While deploying I am looking at mesos-master.WARNING, mesos-master.INFO
> and mesos-master.ERROR log files, but I never see anything show up that
> would indicate a problem, or even an attempt.
>
> When you create a new task in Marathon, do you see any related logs in the
> Mesos master?
>
>
> On Mon, Apr 11, 2016 at 10:11 PM, June Taylor  wrote:
>
>> Hello again. I am not sure this has been resolved yet, because I am still
>> unable to get Marathon deployments to start.
>>
>> I have deleted the /marathon/ node from Zookeeper, and I now have the
>> Marathon WebUI accessible again. I try to add a new task to deploy, and
>> there seem to be available resources, but it is still stuck in a 'Waiting'
>> status.
>>
>> While deploying I am looking at mesos-master.WARNING, mesos-master.INFO
>> and mesos-master.ERROR log files, but I never see anything show up that
>> would indicate a problem, or even an attempt.
>>
>> Where am I going wrong?
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Sat, Apr 9, 2016 at 6:07 AM, Pradeep Chhetri <
>> pradeep.chhetr...@gmail.com> wrote:
>>
>>> Hi Greg & June,
>>>
>>> By looking at the above command, I can say that you are running spark in
>>> client mode because you are invoking the pyspark-shell.
>>>
>>> One simple way to distinguish is that in cluster mode, it's mandatory to
>>> start MesosClusterDispatcher in your mesos cluster which is the spark
>>> framework scheduler.
>>>
>>> As everyone said above, I guess the reason you are observing orphaned
>>> tasks is that the scheduler is getting killed before the tasks have
>>> finished.
>>>
>>> I would suggest that June run Spark in cluster mode (
>>> http://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode)
>>>
>>> Also, as Radek suggested above, run spark in coarse grained (default run
>>> mode) which will save you much of the JVM startup time.
>>>
>>> Keep us informed how it goes.
>>>
>>>
>>> On Sat, Apr 9, 2016 at 12:28 AM, Rad Gruchalski 
>>> wrote:
>>>
>>>> Greg,
>>>>
>>>> All you need to do is tell Spark that the master is mesos://…, as in
>>>> the example from June.
>>>> It’s all nicely documented here:
>>>>
>>>> http://spark.apache.org/docs/latest/running-on-mesos.html
>>>>
>>>> I’d suggest running in coarse mode as fine grained is a bit choppy.
>>>>
>>>> Best regards,
>>>> Radek Gruchalski
>>>> ra...@gruchalski.com
>>>> de.linkedin.com/in/radgruchalski/
>>>>
>>>>
>>>> *Confidentiality:*This communication is intended for the above-named
>>>> person and may be confidential and/or legally privileged.
>>>> If it has come to you in error you must take no action based on it, nor
>>>> must you copy or show it to anyone; please delete/destroy and inform the
>>>> sender immediately.
>>>>
>>>> On Saturday, 9 April 2016 at 00:48, Greg Mann wrote:
>>>>
>>>> Unfortunately I'm not able to glean much from that command, but perhaps
>>>> someone out there with more Spark experience can? I do know that there are
>>>> a couple ways to launch Spark jobs on a cluster: you can run them in client
>>>> mode, where the Spark driver runs locally on your machine and exits when
>>>> it's finished, or they can be run in cluster mode where the Spark driver
>>>> runs persistently on the cluster as a Mesos framework. How exactly are you
>>>> launching these tasks on the Mesos cluster?
>>>>
>>>> On Fri, Apr 8, 2016 at 5:41 AM, June Taylor  wrote:
>>>>
>>>> Greg,
>>>>
>>>> I'm on the ops side and fairly new to spark/mesos, so I'm not quite
>>>> sure I understand your question, here's how the task shows up in a process
>>>> listing:
>>>>
>>>> /usr/lib/jvm/java-8-oracle/bin/java -cp /path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/conf/:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar -Xms10G -Xmx10G org.apache.spark.deploy.SparkSubmit --master mesos://master.ourdomain.com:5050 --conf spark.driver.memory=10G --executor-memory 100G --total-executor-cores 90 pyspark-shell
>>>>
>>>>
>>>> Thanks,
>>>> June Taylor
>>>> System Administrator, Minnesota Population Center
>>>> University of Minnesota
>>>>
>>>> On Thu, Apr 7,

Re: orphaned_tasks cleanup and prevention method

2016-04-11 Thread haosdent
Can you find Marathon on the
http://${YOUR_MASTER_IP}:${YOUR_MASTER_PORT}/#/frameworks
page? Also, regarding:

>While deploying I am looking at mesos-master.WARNING, mesos-master.INFO
and mesos-master.ERROR log files, but I never see anything show up that
would indicate a problem, or even an attempt.

When you create a new task in Marathon, do you see any related logs in the
Mesos master?
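
If clicking through the web UI is inconvenient, the same check can be scripted. This is only a rough sketch: it assumes the master serves its state as JSON at /master/state.json and that the JSON has "frameworks" and "orphan_tasks" lists, which may differ between Mesos versions. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_master_state(master_ip, master_port=5050):
    """Download the master state as a dict.

    Assumption: the state is served at /master/state.json; the exact
    path may differ depending on the Mesos version.
    """
    url = "http://{}:{}/master/state.json".format(master_ip, master_port)
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_master_state(state):
    """Return registered framework names and orphan task IDs.

    Assumption: the state dict carries "frameworks" and "orphan_tasks"
    lists; missing keys are treated as empty lists.
    """
    frameworks = [fw.get("name", "") for fw in state.get("frameworks", [])]
    orphans = [task.get("id", "") for task in state.get("orphan_tasks", [])]
    return {"frameworks": frameworks, "orphan_tasks": orphans}


def is_registered(state, framework_name):
    """True if a framework with this name (case-insensitive) is registered."""
    wanted = framework_name.lower()
    return any(
        name.lower() == wanted
        for name in summarize_master_state(state)["frameworks"]
    )
```

For example, `is_registered(fetch_master_state("10.0.0.1"), "marathon")` (with your own master address) would tell you whether Marathon shows up as a registered framework.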


On Mon, Apr 11, 2016 at 10:11 PM, June Taylor  wrote:

> Hello again. I am not sure this has been resolved yet, because I am still
> unable to get Marathon deployments to start.
>
> I have deleted the /marathon/ node from Zookeeper, and I now have the
> Marathon WebUI accessible again. I try to add a new task to deploy, and
> there seem to be available resources, but it is still stuck in a 'Waiting'
> status.
>
> While deploying I am looking at mesos-master.WARNING, mesos-master.INFO
> and mesos-master.ERROR log files, but I never see anything show up that
> would indicate a problem, or even an attempt.
>
> Where am I going wrong?
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Sat, Apr 9, 2016 at 6:07 AM, Pradeep Chhetri <
> pradeep.chhetr...@gmail.com> wrote:
>
>> Hi Greg & June,
>>
>> By looking at the above command, I can say that you are running spark in
>> client mode because you are invoking the pyspark-shell.
>>
>> One simple way to distinguish is that in cluster mode, it's mandatory to
>> start MesosClusterDispatcher in your mesos cluster which is the spark
>> framework scheduler.
>>
>> As everyone said above, I guess the reason you are observing orphaned
>> tasks is that the scheduler is getting killed before the tasks have
>> finished.
>>
>> I would suggest that June run Spark in cluster mode (
>> http://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode)
>>
>> Also, as Radek suggested above, run spark in coarse grained (default run
>> mode) which will save you much of the JVM startup time.
>>
>> Keep us informed how it goes.
>>
>>
>> On Sat, Apr 9, 2016 at 12:28 AM, Rad Gruchalski 
>> wrote:
>>
>>> Greg,
>>>
>>> All you need to do is tell Spark that the master is mesos://…, as in the
>>> example from June.
>>> It’s all nicely documented here:
>>>
>>> http://spark.apache.org/docs/latest/running-on-mesos.html
>>>
>>> I’d suggest running in coarse mode as fine grained is a bit choppy.
>>>
>>> Best regards,
>>> Radek Gruchalski
>>> ra...@gruchalski.com 
>>> de.linkedin.com/in/radgruchalski/
>>>
>>>
>>> *Confidentiality:*This communication is intended for the above-named
>>> person and may be confidential and/or legally privileged.
>>> If it has come to you in error you must take no action based on it, nor
>>> must you copy or show it to anyone; please delete/destroy and inform the
>>> sender immediately.
>>>
>>> On Saturday, 9 April 2016 at 00:48, Greg Mann wrote:
>>>
>>> Unfortunately I'm not able to glean much from that command, but perhaps
>>> someone out there with more Spark experience can? I do know that there are
>>> a couple ways to launch Spark jobs on a cluster: you can run them in client
>>> mode, where the Spark driver runs locally on your machine and exits when
>>> it's finished, or they can be run in cluster mode where the Spark driver
>>> runs persistently on the cluster as a Mesos framework. How exactly are you
>>> launching these tasks on the Mesos cluster?
>>>
>>> On Fri, Apr 8, 2016 at 5:41 AM, June Taylor  wrote:
>>>
>>> Greg,
>>>
>>> I'm on the ops side and fairly new to spark/mesos, so I'm not quite sure
>>> I understand your question, here's how the task shows up in a process
>>> listing:
>>>
>>> /usr/lib/jvm/java-8-oracle/bin/java -cp /path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/conf/:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar -Xms10G -Xmx10G org.apache.spark.deploy.SparkSubmit --master mesos://master.ourdomain.com:5050 --conf spark.driver.memory=10G --executor-memory 100G --total-executor-cores 90 pyspark-shell
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Thu, Apr 7, 2016 at 3:37 PM, Greg Mann  wrote:
>>>
>>> Hi June,
>>> Are these Spark tasks being run in cluster mode or client mode? If it's
>>> client mode, then perhaps your local Spark scheduler is tearing itself down
>>> before the executors exit, thus leaving them orphaned.
>>>
>>> I'd love to see master/agent logs during the time that the tasks are
>>> becoming orphaned if you have them 

Re: orphaned_tasks cleanup and prevention method

2016-04-11 Thread June Taylor
Hello again. I am not sure this has been resolved yet, because I am still
unable to get Marathon deployments to start.

I have deleted the /marathon/ node from Zookeeper, and I now have the
Marathon WebUI accessible again. I try to add a new task to deploy, and
there seem to be available resources, but it is still stuck in a 'Waiting'
status.

While deploying I am looking at mesos-master.WARNING, mesos-master.INFO and
mesos-master.ERROR log files, but I never see anything show up that would
indicate a problem, or even an attempt.

Where am I going wrong?


Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota

On Sat, Apr 9, 2016 at 6:07 AM, Pradeep Chhetri  wrote:

> Hi Greg & June,
>
> By looking at the above command, I can say that you are running spark in
> client mode because you are invoking the pyspark-shell.
>
> One simple way to distinguish is that in cluster mode, it's mandatory to
> start MesosClusterDispatcher in your mesos cluster which is the spark
> framework scheduler.
>
> As everyone said above, I guess the reason you are observing orphaned
> tasks is that the scheduler is getting killed before the tasks have
> finished.
>
> I would suggest that June run Spark in cluster mode (
> http://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode)
>
> Also, as Radek suggested above, run spark in coarse grained (default run
> mode) which will save you much of the JVM startup time.
>
> Keep us informed how it goes.
>
>
> On Sat, Apr 9, 2016 at 12:28 AM, Rad Gruchalski 
> wrote:
>
>> Greg,
>>
>> All you need to do is tell Spark that the master is mesos://…, as in the
>> example from June.
>> It’s all nicely documented here:
>>
>> http://spark.apache.org/docs/latest/running-on-mesos.html
>>
>> I’d suggest running in coarse mode as fine grained is a bit choppy.
>>
>> Best regards,
>> Radek Gruchalski
>> ra...@gruchalski.com 
>> de.linkedin.com/in/radgruchalski/
>>
>>
>> *Confidentiality:*This communication is intended for the above-named
>> person and may be confidential and/or legally privileged.
>> If it has come to you in error you must take no action based on it, nor
>> must you copy or show it to anyone; please delete/destroy and inform the
>> sender immediately.
>>
>> On Saturday, 9 April 2016 at 00:48, Greg Mann wrote:
>>
>> Unfortunately I'm not able to glean much from that command, but perhaps
>> someone out there with more Spark experience can? I do know that there are
>> a couple ways to launch Spark jobs on a cluster: you can run them in client
>> mode, where the Spark driver runs locally on your machine and exits when
>> it's finished, or they can be run in cluster mode where the Spark driver
>> runs persistently on the cluster as a Mesos framework. How exactly are you
>> launching these tasks on the Mesos cluster?
>>
>> On Fri, Apr 8, 2016 at 5:41 AM, June Taylor  wrote:
>>
>> Greg,
>>
>> I'm on the ops side and fairly new to spark/mesos, so I'm not quite sure
>> I understand your question, here's how the task shows up in a process
>> listing:
>>
>> /usr/lib/jvm/java-8-oracle/bin/java -cp /path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/conf/:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/path/to/spark/spark-installations/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar -Xms10G -Xmx10G org.apache.spark.deploy.SparkSubmit --master mesos://master.ourdomain.com:5050 --conf spark.driver.memory=10G --executor-memory 100G --total-executor-cores 90 pyspark-shell
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Thu, Apr 7, 2016 at 3:37 PM, Greg Mann  wrote:
>>
>> Hi June,
>> Are these Spark tasks being run in cluster mode or client mode? If it's
>> client mode, then perhaps your local Spark scheduler is tearing itself down
>> before the executors exit, thus leaving them orphaned.
>>
>> I'd love to see master/agent logs during the time that the tasks are
>> becoming orphaned if you have them available.
>>
>> Cheers,
>> Greg
>>
>>
>> On Thu, Apr 7, 2016 at 1:08 PM, June Taylor  wrote:
>>
>> Just a quick update... I was only able to get the orphans cleared by
>> stopping mesos-slave, deleting the contents of the scratch directory, and
>> then restarting mesos-slave.
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Thu, Apr 7, 2016 at 12:01 PM, Vinod Kone  wrote:
>>
>> A task/executor is called "orphaned" if the corresponding scheduler
>> doesn't register with 

Re: Backup a Mesos Cluster

2016-04-11 Thread haosdent
Hi, @Paul. Mesos handles recovery well when a server crashes. For the Mesos
master, I suggest setting up multiple masters with ZooKeeper, so that the
cluster is not affected when some of the masters go down. The Mesos agent
recovers task information after a restart.

About backup, I am not clear about your idea here. Do you mean taking
snapshots at regular intervals and recovering to any point in time? If so,
Mesos doesn't support this yet.

On Mon, Apr 11, 2016 at 8:31 PM, Paul Bell  wrote:

> Piotr,
>
> Thank you for this link. I am looking at it now and notice right away
> that Exhibitor is designed to monitor (and back up) ZooKeeper (but not
> anything related to Mesos itself). Don't the Mesos master & agent nodes
> keep at least some state outside of the ZK znodes, e.g., under the default
> workdir?
>
> Shuai,
>
> Thank you for this observation. Happily (I think), we do not have a custom
> framework. Presently, Marathon is the only framework that we use.
>
> -Paul
>
> On Mon, Apr 11, 2016 at 8:12 AM, Shuai Lin  wrote:
>
>> If your product contains a custom framework, you should at least
>> implement some kind of high availability for your scheduler (like
>> Marathon/Chronos do), or let it be launched by Marathon so it can be
>> restarted when it fails.
>>
>> On Mon, Apr 11, 2016 at 7:27 PM, Paul Bell  wrote:
>>
>>> Hi All,
>>>
>>> As we get closer to shipping a Mesos-based version of our product, we've
>>> turned our attention to "protecting" (supporting backup & recovery) of not
>>> only our application databases, but the cluster as well.
>>>
>>> I'm not quite sure how to begin thinking about this, but I suppose the
>>> usual dimensions of B/R would come into play, e.g., hot/cold, application
>>> consistent/crash consistent, etc.
>>>
>>> Has anyone grappled with this issue and, if so, would you be so kind as
>>> to share your experience and solutions?
>>>
>>> Thank you.
>>>
>>> -Paul
>>>
>>>
>>
>


-- 
Best Regards,
Haosdent Huang


Re: SharedFilesystemIsolator (filesystem/shared)

2016-04-11 Thread Erb, Stephan
Given that the "filesystem/linux" isolator has landed, is it now considered to 
be a drop-in replacement for the "filesystem/shared" isolator?


From: Erb, Stephan 
Sent: Wednesday, July 29, 2015 21:08
To: user@mesos.apache.org
Subject: Re: SharedFilesystemIsolator (filesystem/shared)


We are using the isolator in order to re-route writes to /tmp to a path inside
the container sandbox:


--default_container_info='{
  "type": "MESOS",
  "volumes": [
    {"host_path": "system/tmp",    "container_path": "/tmp",     "mode": "RW"},
    {"host_path": "system/vartmp", "container_path": "/var/tmp", "mode": "RW"}
  ]
}'

We are running on linux, so I guess the new one will work for us just as well?
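
For anyone who wants to generate the --default_container_info value quoted above instead of writing the JSON by hand, here is a minimal sketch. It only reproduces the fields shown in this message ("type", "volumes", "host_path", "container_path", "mode"). The function name and the container-path-to-host-path mapping are hypothetical helpers, not part of Mesos.

```python
import json
import shlex


def default_container_info_flag(volume_map, mode="RW"):
    """Build a shell-quoted --default_container_info flag.

    volume_map maps container paths to host paths (relative host paths
    such as "system/tmp" are passed through unchanged).
    """
    info = {
        "type": "MESOS",
        "volumes": [
            {"host_path": host, "container_path": container, "mode": mode}
            # Sort by container path so the output is stable across runs.
            for container, host in sorted(volume_map.items())
        ],
    }
    # shlex.quote wraps the JSON in single quotes for safe shell use.
    return "--default_container_info=" + shlex.quote(json.dumps(info))
```

Calling `default_container_info_flag({"/tmp": "system/tmp", "/var/tmp": "system/vartmp"})` yields a flag equivalent to the one above, with matching opening and closing quotes.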


Thanks,

Stephan


From: Jie Yu 
Sent: Wednesday, July 29, 2015 2:40 AM
To: user@mesos.apache.org
Subject: SharedFilesystemIsolator (filesystem/shared)

Hi Mesos users,

I am wondering if anyone is using this isolator (i.e., 
--isolation=filesystem/shared)? If not, we plan to remove it from the source 
code in favor of using the upcoming more general linux filesystem isolator 
(https://reviews.apache.org/r/36429/).

- Jie


Re: Backup a Mesos Cluster

2016-04-11 Thread Paul Bell
Piotr,

Thank you for this link. I am looking at it now and notice right away
that Exhibitor is designed to monitor (and back up) ZooKeeper (but not
anything related to Mesos itself). Don't the Mesos master & agent nodes
keep at least some state outside of the ZK znodes, e.g., under the default
workdir?

Shuai,

Thank you for this observation. Happily (I think), we do not have a custom
framework. Presently, Marathon is the only framework that we use.

-Paul

On Mon, Apr 11, 2016 at 8:12 AM, Shuai Lin  wrote:

> If your product contains a custom framework, you should at least
> implement some kind of high availability for your scheduler (like
> Marathon/Chronos do), or let it be launched by Marathon so it can be
> restarted when it fails.
>
> On Mon, Apr 11, 2016 at 7:27 PM, Paul Bell  wrote:
>
>> Hi All,
>>
>> As we get closer to shipping a Mesos-based version of our product, we've
>> turned our attention to "protecting" (supporting backup & recovery) of not
>> only our application databases, but the cluster as well.
>>
>> I'm not quite sure how to begin thinking about this, but I suppose the
>> usual dimensions of B/R would come into play, e.g., hot/cold, application
>> consistent/crash consistent, etc.
>>
>> Has anyone grappled with this issue and, if so, would you be so kind as
>> to share your experience and solutions?
>>
>> Thank you.
>>
>> -Paul
>>
>>
>


Re: Backup a Mesos Cluster

2016-04-11 Thread Piotr Szwed
Do you know Exhibitor?
https://github.com/Netflix/exhibitor

This could be a good starting point, as it implements a sort of backup
mechanism that saves ZooKeeper cluster state to S3.

Cheers,

2016-04-11 13:27 GMT+02:00 Paul Bell :

> Hi All,
>
> As we get closer to shipping a Mesos-based version of our product, we've
> turned our attention to "protecting" (supporting backup & recovery) of not
> only our application databases, but the cluster as well.
>
> I'm not quite sure how to begin thinking about this, but I suppose the
> usual dimensions of B/R would come into play, e.g., hot/cold, application
> consistent/crash consistent, etc.
>
> Has anyone grappled with this issue and, if so, would you be so kind as to
> share your experience and solutions?
>
> Thank you.
>
> -Paul
>
>


-- 
--
Mesos Labs


Backup a Mesos Cluster

2016-04-11 Thread Paul Bell
Hi All,

As we get closer to shipping a Mesos-based version of our product, we've
turned our attention to "protecting" (supporting backup & recovery) of not
only our application databases, but the cluster as well.

I'm not quite sure how to begin thinking about this, but I suppose the
usual dimensions of B/R would come into play, e.g., hot/cold, application
consistent/crash consistent, etc.

Has anyone grappled with this issue and, if so, would you be so kind as to
share your experience and solutions?

Thank you.

-Paul


Re: Mesos 0.28 SSL in official packages

2016-04-11 Thread Adam Bordelon
Hi Kamil,

Technically, there are no "official" Apache-built packages for Apache Mesos.

At least one company (Mesosphere) chooses to build and distribute
Mesos packages, but does not currently offer SSL builds. It wouldn't
be hard to add an SSL build to our regular builds, but it hasn't been
requested enough to prioritize it.

cc: Joris, Kapil

On Thu, Apr 7, 2016 at 7:42 AM, haosdent  wrote:
> Hi, SSL isn't enabled by default. You need to compile it by following this doc
> http://mesos.apache.org/documentation/latest/ssl/
>
> On Thu, Apr 7, 2016 at 10:04 PM, Kamil Wokitajtis 
> wrote:
>>
>> This is my first post, so Hi everyone!
>>
>> Is SSL enabled in official packages (CentOS in my case)?
>> I can see libssl in ldd output, but I cannot see libevent.
>> I had to compile mesos from sources to run it over ssl.
>> I would prefer to install it from packages.
>>
>> Regards,
>> Kamil
>
>
>
>
> --
> Best Regards,
> Haosdent Huang
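
The check Kamil describes (looking for libssl and libevent in ldd output) can be scripted along these lines. Treat it as a heuristic sketch only: it assumes that a build configured with --enable-libevent --enable-ssl links against both libraries, and the helper names and the library path in the usage note are hypothetical.

```python
import subprocess


def linked_libraries(ldd_output):
    """Return the set of library file names listed in `ldd` output text."""
    names = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields:
            # The first field is the library name or path,
            # e.g. "libssl.so.10" or "/lib64/ld-linux-x86-64.so.2".
            names.add(fields[0].rsplit("/", 1)[-1])
    return names


def looks_ssl_enabled(ldd_output):
    """Heuristic: both libssl and libevent appear among the linked libraries."""
    names = linked_libraries(ldd_output)
    has_ssl = any(name.startswith("libssl") for name in names)
    has_libevent = any(name.startswith("libevent") for name in names)
    return has_ssl and has_libevent


def run_ldd(path):
    """Run `ldd` on a binary or shared library and return its text output."""
    result = subprocess.run(
        ["ldd", path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

Usage would be something like `looks_ssl_enabled(run_ldd("/usr/lib/libmesos.so"))`, where the path is a placeholder for wherever your package installs the Mesos library.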