High quality copies of Mesos logo for download

2014-06-19 Thread Dave Lester
Folks periodically ask for a high-quality version of the Mesos logo for
presentations and other uses; below is a link to the files, shared via
Dropbox.

https://www.dropbox.com/sh/7vmh5w3ukvwitb3/AACpGaEjRv1HXTKP9PAHALEfa?lst

The Dropbox link includes Illustrator and PNG images for both the dark blue
and white versions of the logo.

Dave


Re: Difficulties building libmesos.so

2014-06-19 Thread Tim St Clair
inline 

- Original Message -

> From: "Alexander Gallego" 
> To: user@mesos.apache.org
> Sent: Thursday, June 19, 2014 1:49:44 PM
> Subject: Re: Difficulties building libmesos.so

> Hi Tim,

> Thanks for the reply and apologies for the late response - gmail filter was
> pretty aggressive.

> At the moment I have a less than ideal setup - a hack - but the workaround
> is essentially to remove libproto and libzookeeper from my libs when linking
> against libmesos.

> That we happen to run the same version is just a plain coincidence. It
> would be great if libmesos could build as a normal dynamic lib as opposed to
> a statically linked library. That would give me greater flexibility to
> provide, for example, patched versions of the protobuf or zookeeper libs.

You might want to track: https://issues.apache.org/jira/browse/MESOS-1071 
I'll try to button up the protobuf patch tomorrow. 
https://issues.apache.org/jira/browse/MESOS-1174 

> Thanks!

> Sincerely,
> Alexander Gallego

> ---*---
> --*
> * * *

> On Mon, Jun 16, 2014 at 4:49 PM, Tim St Clair < tstcl...@redhat.com > wrote:
>
> > Greetings Alexander -
> >
> > My apologies for my delayed response, I've been inundated as of late.
> >
> > ./configure --disable-bundled
> >
> > is the easiest option if you have a system installed version of the
> > libraries, but it doesn't yet handle protobuf.
> >
> > I'm not entirely certain where ubuntu is @ with regard to the full
> > dep-graph, but it is available in fedora channels.
> >
> > You could also try https://github.com/timothysc/mesos/tree/0.18-integ if
> > you're willing to live behind the times for a bit.
> >
> > I'm going to make a hard push to get --disable-bundled fully completed
> > prior to a 1.0 release/MesosCon.
> >
> > Cheers,
> > Tim
> >
> > > From: "Alexander Gallego" < gallego.al...@gmail.com >
> > > To: user@mesos.apache.org
> > > Sent: Sunday, June 8, 2014 12:55:56 AM
> > > Subject: Difficulties building libmesos.so
> > >
> > > I'm having a hard time attempting to use libmesos.so and hoping for
> > > guidance.
> > >
> > > Issue:
> > >
> > > Libmesos.so as installed by the mesosphere .deb pkg or built from source
> > > statically links all sources including:
> > >
> > > 1. Protobuf (2.5)
> > > 2. Zookeeper (3.4.5)
> > >
> > > This is a problem because in any int main(argc, char**) the protobuf
> > > library suggests that you initialize it (to check versions) for proper
> > > behavior.
> > >
> > > Here is the snippet from source:
> > >
> > > build/include/google/protobuf/stubs/common.h
> > > 149:#define GOOGLE_PROTOBUF_VERIFY_VERSION
> > > // Place this macro in your main() function (or somewhere before you
> > > // attempt to use the protobuf library) to verify that the version you
> > > // link against matches the headers you compiled against. If a version
> > > // mismatch is detected, the process will abort.
> > > #define GOOGLE_PROTOBUF_VERIFY_VERSION \
> > >   ::google::protobuf::internal::VerifyVersion( \
> > >     GOOGLE_PROTOBUF_VERSION, GOOGLE_PROTOBUF_MIN_LIBRARY_VERSION, \
> > >     __FILE__)
> > >
> > > To deinitialize the library, you are advised to call 'shutdown':
> > >
> > > build/include/google/protobuf/stubs/common.cc
> > > void ShutdownProtobufLibrary() {
> > >   internal::InitShutdownFunctionsOnce();
> > >   // ... stuff
> > > }
> > >
> > > Well, the issue is that when linking w/ libmesos (the static fat lib,
> > > 298MB as of rc3) my protobufs now double free :(
> > >
> > > I haven't yet been able to play w/ zookeeper and its internal state as
> > > it interacts w/ libmesos.so. The issue is fundamentally static state
> > > (ugh). But I have to use these libs (zookeeper, protobuf) for a project.
> > >
> > > The tentative solution suggested (mesos/docs/configuration.md) is
> > > to use the compile time flag --with-zookeeper=/path/to/root/not/src/c
> > > *this is where i'd like guidance*
> > >
> > > I have not been able to compile libmesos as a lib without statically
> > > linking all the deps.
> > >
> > > Note: I have read the mailing list post describing why the build system
> > > was originally set up this way (mainly that there are patches -- look
> > > at the .patch files).
> > >
> > > Things I've done to try and build libmesos.so without statically
> > > linking protobufs, boost, zookeeper:
> > >
> > > Here are the command line args passed to configure:
> > >
> > > // assume TLD=/absolute/path/to/libs
> > >
> > > ./configure --enable-shared=yes \
> > >   --enable-bundled=no \
> > >   --disable-python \
> > >   --with-zookeeper="${TLD}/zookeeper" \
> > >   --with-leveldb="${TLD}/leveldb"

> > > The first problem is that pa
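For anyone chasing the same double free: the macro and the shutdown call quoted above are meant to be used exactly once per process. A minimal sketch of the intended pattern, assuming a single copy of protobuf ends up in the process (the application body is just a placeholder comment):

#include <google/protobuf/stubs/common.h>

int main(int argc, char** argv) {
  // Aborts if the protobuf headers we compiled against do not match the
  // protobuf library that actually got linked in (for example the copy
  // statically bundled inside libmesos.so).
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  // ... framework / application code that uses protobuf messages ...

  // Frees all global objects allocated by libprotobuf. Call it once, from
  // one place, after every user of protobuf is finished; tearing down
  // while another statically linked copy still holds state is where the
  // double free shows up.
  google::protobuf::ShutdownProtobufLibrary();
  return 0;
}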

Re: cgroups memory isolation

2014-06-19 Thread Tim St Clair
https://issues.apache.org/jira/browse/MESOS-1516 

- Original Message -

> From: "Vinod Kone" 
> To: user@mesos.apache.org
> Cc: "Ian Downes" , "Eric Abbott" 
> Sent: Thursday, June 19, 2014 2:35:20 PM
> Subject: Re: cgroups memory isolation

> On Thu, Jun 19, 2014 at 11:33 AM, Sharma Podila < spod...@netflix.com >
> wrote:

> > Yeah, having soft-limit for memory seems like the right thing to do
> > immediately. The only problem left to solve being that it would be nicer to
> > throttle I/O instead of OOM for high rate I/O jobs. Hopefully the soft
> > limits on memory push this problem to only the extreme edge cases.
> 

> The reason that Mesos uses hard limits for memory and cpu is to provide
> predictability for the users/tasks. For example, some users/tasks don't want
> to be in a place where the task has been improperly sized but was humming
> along fine because it was using idle resources on the machine (soft limits)
> but during crunch time (e.g., peak workload) cannot work as well because the
> machine had multiple tasks all utilizing their full allocations. In other
> words, this provides the users the ability to better predict their SLAs.

> That said, in some cases the tight SLAs probably don't make sense (e.g.,
> batch jobs). That is the reason we let operators configure soft and hard
> limits for cpu. Unless I misunderstand how memory soft limits work (
> https://www.kernel.org/doc/Documentation/cgroups/memory.txt ) I don't see
> why we can't provide a similar soft limit option for memory.

> IOW, feel free to file a ticket :)

-- 
Cheers, 
Tim 
Freedom, Features, Friends, First -> Fedora 
https://fedoraproject.org/wiki/SIGs/bigdata 


Re: cgroups memory isolation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 11:33 AM, Sharma Podila  wrote:

> Yeah, having soft-limit for memory seems like the right thing to do
> immediately. The only problem left to solve being that it would be nicer to
> throttle I/O instead of OOM for high rate I/O jobs. Hopefully the soft
> limits on memory push this problem to only the extreme edge cases.
>

The reason that Mesos uses hard limits for memory and cpu is to provide
predictability for the users/tasks. For example, some users/tasks don't
want to be in a place where the task has been improperly sized but was
humming along fine because it was using idle resources on the machine (soft
limits) but during crunch time (e.g., peak workload) cannot work as well
because the machine had multiple tasks all utilizing their full
allocations. In other words, this provides the users the ability to better
predict their SLAs.

That said, in some cases the tight SLAs probably don't make sense (e.g.,
batch jobs). That is the reason we let operators configure soft and hard
limits for cpu. Unless I misunderstand how memory soft limits work (
https://www.kernel.org/doc/Documentation/cgroups/memory.txt) I don't see
why we can't provide a similar soft limit option for memory.

IOW, feel free to file a ticket :)
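For concreteness, with the cgroup v1 memory controller described in that kernel document, the hard and soft limits are two separate control files. A rough sketch of setting both by hand (the /sys/fs/cgroup/memory mount point and the "mytask" cgroup name are illustrative):

# Hard limit: if the cgroup cannot reclaim below this, the OOM killer runs.
echo $((512*1024*1024)) > /sys/fs/cgroup/memory/mytask/memory.limit_in_bytes

# Soft limit: only reclaimed down to this under global memory pressure, so
# the task may use idle memory on the machine when it is available.
echo $((256*1024*1024)) > /sys/fs/cgroup/memory/mytask/memory.soft_limit_in_bytes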


Re: "Failed to perform recovery: Incompatible slave info detected"

2014-06-19 Thread Dick Davies
Fair enough, appreciate the explanation (and that you've clearly
thought hard about this in the design).

The cluster I hit this on was in the process of being built and had no
tasks deployed, it just violated
my Principle of Least Astonishment that dropping some more cores into
the slaves seemed to kill them
off.

I can see there must be cases where this design choice is the right
thing to do, and now we know we can
work around it easily enough - so thanks for the lesson :)

On 19 June 2014 18:43, Vinod Kone  wrote:
> Yes. The idea behind storing the whole slave info is to provide safety.
>
> Imagine, the slave resources were reduced on a restart. What does this mean
> for already running tasks that are using more resources than the newly
> configured resources? Should the slave kill them? If yes, which ones?
> Similarly what happens when the slave attributes are changed (e.g., "secure"
> to "unsecure")? Is it safe to keep running the existing tasks?
>
> As you can see, reconciliation of slave info is a complex problem. While
> there are some smarts we can add to the slave (e.g., increase of resources
> is OK while decrease is not) we haven't really seen a need for it yet.
>
>
> On Thu, Jun 19, 2014 at 3:03 AM, Dick Davies  wrote:
>>
>> Fab, thanks Vinod. Turns out that feature (different FQDN to serve the ui
>> up on)
>> might well be really useful for us, so every cloud has a silver lining :)
>>
>> back to the metadata feature though - do you know why just the 'id' of
>> the slaves isn't used?
>> As it stands adding disk storage, cores or RAM to a slave will cause
>> it to drop out of cluster -
>> does checking the whole metadata provide any benefit vs. checking the id?
>>
>> On 18 June 2014 19:46, Vinod Kone  wrote:
>> > Filed https://issues.apache.org/jira/browse/MESOS-1506 for fixing
>> > flags/documentation.
>> >
>> >
>> > On Wed, Jun 18, 2014 at 11:33 AM, Dick Davies 
>> > wrote:
>> >>
>> >> Thanks, it might be worth correcting the docs in that case then.
>> >> This URL says it'll use the system hostname, not the reverse DNS of
>> >> the ip argument:
>> >>
>> >> http://mesos.apache.org/documentation/latest/configuration/
>> >>
>> >> re: the CFS thing - this was while running Docker on the slaves - that
>> >> also uses cgroups
>> >> so maybe resources were getting split with mesos or something? (I'm
>> >> still reading up on
>> >> cgroups) - definitely wasn't the case until cfs was enabled.
>> >>
>> >>
>> >> On 18 June 2014 18:34, Vinod Kone  wrote:
>> >> > Hey Dick,
>> >> >
>> >> > Regarding slave recovery, any changes in the SlaveInfo (see
>> >> > mesos.proto)
>> >> > are
>> >> > considered as a new slave and hence recovery doesn't proceed forward.
>> >> > This
>> >> > is because Master caches SlaveInfo and it is quite complex to
>> >> > reconcile
>> >> > the
>> >> > differences in SlaveInfo. So we decided to fail on any SlaveInfo
>> >> > changes
>> >> > for
>> >> > now.
>> >> >
>> >> > In your particular case,
>> >> > https://issues.apache.org/jira/browse/MESOS-672
>> >> > was
>> >> > committed in 0.18.0 which fixed redirection
>> >> >  of WebUI. Included in this fix is
>> >> > https://reviews.apache.org/r/17573/
>> >> > which
>> >> > changed how SlaveInfo.hostname is calculated. Since you are not
>> >> > providing a
>> >> > hostname via "--hostname" flag, slave now deduces the hostname from
>> >> > "--ip"
>> >> > flag. Looks like in your cluster the hostname corresponding to that
>> >> > ip
>> >> > is
>> >> > different than what 'os::hostname()' gives.
>> >> >
>> >> > Couple of options to move forward. If you want slave recovery,
>> >> > provide
>> >> > "--hostname" that matches the previous hostname. If you don't care
>> >> > about
>> >> > recovery, just remove the meta directory ("rm -rf /var/mesos/meta")
>> >> > so
>> >> > that
>> >> > the slave starts as a fresh one (since you are not using cgroups, you
>> >> > will
>> >> > have to manually kill any old executors/tasks that are still alive on
>> >> > the
>> >> > slave).
>> >> >
>> >> > Not sure about your comment on CFS. Enabling CFS shouldn't change how
>> >> > much
>> >> > memory the slave sees as available. More details/logs would help
>> >> > diagnose
>> >> > the issue.
>> >> >
>> >> > HTH,
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Jun 18, 2014 at 4:26 AM, Dick Davies 
>> >> > wrote:
>> >> >>
>> >> >> Should have said, the CLI for this is :
>> >> >>
>> >> >> /usr/local/sbin/mesos-slave --master=zk://10.10.10.105:2181/mesos
>> >> >> --log_dir=/var/log/mesos --ip=10.10.10.101 --work_dir=/var/mesos
>> >> >>
>> >> >> (note IP is specified, hostname is not - docs indicated hostname arg
>> >> >> will default to the fqdn of host, but it appears to be using the
>> >> >> value
>> >> >> passed as 'ip' instead.)
>> >> >>
>> >> >> On 18 June 2014 12:00, Dick Davies  wrote:
>> >> >> > Hi, we recently bumped 0.17.0 -> 0.18.2 and the slaves
>> >> >> > now show their IPs rather than their FQDNs on the mesos UI.
>> >> >> >
>> >> >> > This broke slave recovery 

Re: Difficulties building libmesos.so

2014-06-19 Thread Alexander Gallego
Hi Tim,

Thanks for the reply and apologies for the late response - gmail filter was
pretty aggressive.

At the moment I have a less than ideal setup - a hack - but the workaround
is essentially to remove libproto and libzookeeper from my libs when
linking against libmesos.

That we happen to run the same version is just a plain coincidence. It
would be great if libmesos could build as a normal dynamic lib as opposed
to a statically linked library. That would give me greater flexibility to
provide, for example, patched versions of the protobuf or zookeeper libs.

Thanks!





Sincerely,
Alexander Gallego

---*---
--*
*  *  *




On Mon, Jun 16, 2014 at 4:49 PM, Tim St Clair  wrote:

> Greetings Alexander -
>
> My apologies for my delayed response, I've been inundated as of late.
>
> ./configure --disable-bundled
>
> is the easiest option if you have a system installed version of the
> libraries, but it doesn't yet handle protobuf.
>
> I'm not entirely certain where ubuntu is @ with regard to the full
> dep-graph, but it is available in fedora channels.
>
> You could also try https://github.com/timothysc/mesos/tree/0.18-integ if
> you're willing to live behind the times for a bit.
>
> I'm going to make a hard push to get --disable-bundled fully completed
> prior to a 1.0 release/MesosCon.
>
> Cheers,
> Tim
>
> --
>
> *From: *"Alexander Gallego" 
> *To: *user@mesos.apache.org
> *Sent: *Sunday, June 8, 2014 12:55:56 AM
> *Subject: *Difficulties building libmesos.so
>
>
> I'm having a hard time attempting to use libmesos.so and hoping for
> guidance.
>
> Issue:
>
> Libmesos.so as installed by the mesosphere .deb pkg or built from source
> statically links all sources including:
>
> 1. Protobuf (2.5)
> 2. Zookeeper (3.4.5)
>
> This is a problem because in any int main(argc, char**) the protobuf
> library suggests that you initialize it (to check versions) for proper
> behavior.
>
> Here is the snippet from source:
>
> build/include/google/protobuf/stubs/common.h
> 149:#define GOOGLE_PROTOBUF_VERIFY_VERSION
> // Place this macro in your main() function (or somewhere before you
> attempt
> // to use the protobuf library) to verify that the version you link against
> // matches the headers you compiled against.  If a version mismatch is
> // detected, the process will abort.
> #define GOOGLE_PROTOBUF_VERIFY_VERSION\
>   ::google::protobuf::internal::VerifyVersion(\
> GOOGLE_PROTOBUF_VERSION, GOOGLE_PROTOBUF_MIN_LIBRARY_VERSION, \
> __FILE__)
>
> To deinitialize the library, you are advised to call 'shutdown':
>
> build/include/google/protobuf/stubs/common.cc
> void ShutdownProtobufLibrary() {
>   internal::InitShutdownFunctionsOnce();
>   // ... stuff
> }
>
> Well the issue is that when linking w/ libmesos (the static fat lib 298MB
> as of rc3)
> my protobufs now double free :(
>
> I haven't yet been able to play w/ zookeeper and its internal state as it
> interacts w/
> libmesos.so. The issue is fundamentally static state (ugh). But I have to
> use
> these libs (zookeeper, protobuf) for a project.
>
>
> The tentative solution suggested (mesos/docs/configuration.md) is
> to use the compile time flag --with-zookeeper=/path/to/root/not/src/c
> *this is where i'd like guidance*
>
> I have not been able to compile libmesos as a lib without statically
> linking all the deps
>
> Note: I have read the mailing list post describing why the build system
> was originally
> set up this way (mainly that there are patches -- look at the .patch files).
>
>
> Things I've done to try and build libmesos.so without statically linking
> protobufs, boost, zookeeper:
>
>
> Here are the command line args passed to configure:
>
> // assume TLD=/absolute/path/to/libs
>
> ./configure --enable-shared=yes \
> --enable-bundled=no \
> --disable-python\
> --with-zookeeper="${TLD}/zookeeper" \
> --with-leveldb="${TLD}/leveldb"
>
> The first problem is that passing just --with-leveldb= pointing to the
> leveldb dir, for example,
> doesn't find -lleveldb.
>
> I tried looking at the make file and the paths seem correct mainly:
>
> Lines 129 & 130 (I guess this might change from system to system)
> am__append_2 = $(LEVELDB)/libleveldb.a
> am__append_3 = $(ZOOKEEPER)/src/c/libzookeeper_mt.la
>
>
> Notes about my desktop system:
>
> Using gcc 4.8
> $ lsb_release -a
> Distributor ID: Ubuntu
> Description:Ubuntu 14.04 LTS
> Release:14.04
> Codename:   trusty
>
>
> The questions I'm looking to get some pointers on are:
>
> * Has anyone actually built a shared libmesos.so without included
> dependencies? If so, how?
>
> * Are there pre-built binaries (libmesos w/out deps) for Linux x86_64
> systems available for download?
>   Perhaps the Mesosphere guys have some solution here.
>
> * How have other people building native apps (c++) linked and
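For anyone following along, a rough way to check how far a build actually got with external libraries (paths are illustrative, and as Tim notes earlier in the thread, --disable-bundled does not yet cover protobuf in this version):

# build against system-installed libraries where supported
./configure --disable-bundled
make

# then inspect which dependencies the resulting library links dynamically;
# anything still bundled statically (e.g. protobuf) will not show up here
# (the path to the built library may differ in your tree)
ldd src/.libs/libmesos.so | egrep 'protobuf|zookeeper|leveldb'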

Re: cgroups memory isolation

2014-06-19 Thread Sharma Podila
Yeah, having soft-limit for memory seems like the right thing to do
immediately. The only problem left to solve being that it would be nicer to
throttle I/O instead of OOM for high rate I/O jobs. Hopefully the soft
limits on memory push this problem to only the extreme edge cases.

Agreed on still enforcing limits in general. This tends to be an ongoing
issue from the operations perspective; I've had my share of dealing with
it, and I am sure I will continue to do so. Sometimes users can't estimate,
sometimes jobs' memory footprint changes drastically with minor changes,
etc. Memory usage prediction based on historic usage and reactive resizing
based on actual usage are two tools of the trade.

BTW, by resize, did you mean cgroups memory limits can be resized for
running jobs? That's nice to know (am relatively new to cgroups).



On Thu, Jun 19, 2014 at 10:55 AM, Tim St Clair  wrote:

> Awesome response!
>
> inline below -
>
> --
>
> *From: *"Sharma Podila" 
> *To: *user@mesos.apache.org
> *Cc: *"Ian Downes" , "Eric Abbott" <
> eabb...@hubspot.com>
> *Sent: *Thursday, June 19, 2014 11:54:34 AM
>
> *Subject: *Re: cgroups memory isolation
>
> Purely from a user expectation point of view, I am wondering if such an
> "abuse" (overuse?) of I/O bandwidth/rate should translate into I/O
> bandwidth getting throttled for the job instead of it manifesting into an
> OOM that results in a job kill. Such I/O overuse translating into memory
> overuse seems like an implementation detail (for lack of a better phrase)
> of the OS that uses cache'ing. It's not like the job asked for its memory
> to be used up for I/O cache'ing :-)
>
> In cgroups, you could optionally specify the memory limit as soft, vs.
> hard (OOM).
>
>
>
> I do see that this isn't Mesos specific, but, rather a containerization
> artifact that is inevitable in a shared resource environment.
>
> That said, specifying memory size for jobs is not trivial in a shared
> resource environment. Conservative safe margins do help prevent OOMs, but,
> they also come with the side effect of fragmenting resources and reducing
> utilization. In some cases, they can cause job starvation to some extent,
> if most available memory is allocated to the conservative buffering for
> every job.
>
> Yup, unless you develop tuning models / hunting algorithms.  You need some
> level of global visibility & history.
>
> Another approach that could help, if feasible, is to have containers with
> elastic boundaries (different from over-subscription) that manage things
> such that sum of actual usage of all containers is <= system resources.
> This helps when not all jobs have peak use of resources simultaneously.
>
>
> You "could" use soft limits & resize, I like to call it the "push-over"
> policy.  If the limits are not enforced, what prevents abusive users in
> absence of global visibility?
>
> IMHO - having soft cgroup memory limits as an option seems to be the
> right play given the environment.
>
> Thoughts?
>
>
>
> On Wed, Jun 18, 2014 at 1:42 PM, Tim St Clair  wrote:
>
>> FWIW -  There is classic grid mantra that applies here.  Test your
>> workflow on an upper bound, then over provision to be safe.
>>
> Mesos is no different than SGE, PBS, LSF, Condor, etc.
>>
>> Also, there is no hunting algo for "jobs", that would have to live
>> outside of mesos itself, on some batch system built atop.
>>
>> Cheers,
>> Tim
>>
>> --
>>
>> *From: *"Thomas Petr" 
>> *To: *"Ian Downes" 
>> *Cc: *user@mesos.apache.org, "Eric Abbott" 
>> *Sent: *Wednesday, June 18, 2014 9:36:51 AM
>> *Subject: *Re: cgroups memory isolation
>>
>>
>> Thanks for all the info, Ian. We're running CentOS 6 with the 2.6.32
>> kernel.
>>
>> I ran `dd if=/dev/zero of=lotsazeros bs=1M` as a task in Mesos and got
>> some weird results. I initially gave the task 256 MB, and it never exceeded
>> the memory allocation (I killed the task manually after 5 minutes when the
>> file hit 50 GB). Then I noticed your example was 128 MB, so I resized and
>> tried again. It exceeded memory almost
>> immediately. The next (replacement) task our framework started ran
>> successfully and never exceeded memory. I watched nr_dirty and it
>> fluctuated between 1 to 14000 when the task is running. The slave host
>> is a c3.xlarge in EC2, if it makes a difference.
>>
>> As Mesos users, we'd like an isolation strategy that isn't affected by
>> cache this much -- it makes it harder for us to appropriately size things.
>> Is it possible through Mesos or cgroups itself to make the page cache not
>> count towards the total memory consumption? If the answer is no, do you
>> think it'd be worth looking at using Docker for isolation instead?
>>
>> -Tom
>>
>>
>> On Tue, Jun 17, 2014 at 6:18 PM, Ian Downes  wrote:
>>
>>> Hello Thomas,
>>>
>>> Your impression is mostly correct: the kernel will *try* to reclaim
>>> memory by writing ou
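On the resize question Sharma raises above: with cgroup v1 the memory limit of a live cgroup can be changed by rewriting the same control file, which is roughly what a soft-limit-plus-resize policy would do. A sketch, with an illustrative cgroup path:

# inspect the current hard limit of a running container's cgroup
cat /sys/fs/cgroup/memory/mesos/<container-id>/memory.limit_in_bytes

# raise it to 1 GB without restarting the task; lowering it below current
# usage forces reclaim and can fail or OOM depending on the kernel
echo $((1024*1024*1024)) > /sys/fs/cgroup/memory/mesos/<container-id>/memory.limit_in_bytes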

Re: Framework Starvation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 10:46 AM, Vinod Kone  wrote:

> Waiting to see your blog post :)
>
> That said, what baffles me is that in the very beginning when only two
> frameworks are present and no tasks have been launched, one framework is
> getting more allocations than other (see the log lines I posted in the
> earlier email), which is unexpected.
>
>
> @vinodkone
>
>
> On Tue, Jun 17, 2014 at 9:41 PM, Claudiu Barbura <
> claudiu.barb...@atigeo.com> wrote:
>
>>  Hi Vinod,
>>
>>  You are looking at logs I had posted before we implemented our fix
>> (files attached in my last email).
>> I will write a detailed blog post on the issue … after the Spark Summit
>> at the end of this month.
>>
>>  What would happen before is that frameworks with the same share (0)
>> would also have the smallest allocation in the beginning, and after sorting
>> the list they would be at the top, always offered all the resources ahead
>> of other frameworks that had already been offered resources and were
>> running tasks with a share and allocation > 0.
>>
>>  Thanks,
>> Claudiu
>>
>>   From: Vinod Kone 
>> Reply-To: "user@mesos.apache.org" 
>> Date: Wednesday, June 18, 2014 at 4:54 AM
>>
>> To: "user@mesos.apache.org" 
>> Subject: Re: Framework Starvation
>>
>>   Hey Claudiu,
>>
>>  I spent some time trying to understand the logs you posted. What's
>> strange to me is that in the very beginning when frameworks 1 and 2 are
>> registered, only one framework gets offers for a period of 9s. It's not
>> clear why this happens. I even wrote a test (
>> https://reviews.apache.org/r/22714/) to repro but wasn't able to.
>>
>>  It would probably be helpful to add more logging to the drf sorting
>> comparator function to understand why frameworks are sorted in such a way
>> when their share is same (0). My expectation is that after each allocation,
>> the 'allocations' for a framework should increase causing the sort function
>> to behave correctly. But that doesn't seem to be happening in your case.
>>
>>
>>  I0604 22:12:43.715530 22270 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-
>>
>> I0604 22:12:44.276062 22273 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:44.756918 22292 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-
>>
>> I0604 22:12:45.794178 22276 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:46.841629 22291 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:47.884266 22262 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:48.926856 22268 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:49.966560 22280 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:51.007143 22267 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:52.047987 22280 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:53.089340 22291 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-0001
>>
>> I0604 22:12:54.130242 22263 master.cpp:2282] Sending 4 offers to
>> framework 20140604-221214-302055434-5050-22260-
>>
>>
>>  @vinodkone
>>
>>
>> On Fri, Jun 13, 2014 at 3:40 PM, Claudiu Barbura <
>> claudiu.barb...@atigeo.com> wrote:
>>
>>>  Hi Vinod,
>>>
>>>  Attached are the patch files. Hadoop has to be treated differently as
>>> it requires resources in order to shut down task trackers after a job is
>>> complete. Therefore we set the role name so that Mesos allocates resources
>>> for it first, ahead of the rest of the frameworks under the default role
>>> (*).
>>> This is not ideal; we are going to look into the Hadoop Mesos framework
>>> code and fix it if possible. Luckily, Hadoop is the only framework we use
>>> on top of Mesos that allows a configurable role name to be passed in when
>>> registering a framework (unlike Spark, Aurora, Storm, etc.).
>>> For the non-Hadoop frameworks, we are making sure that once a framework
>>> is running its jobs, Mesos no longer offers resources to it. At the same
>>> time, once a framework completes its job, we make sure its “client
>>> allocations” value is updated so that when it completes the execution of
>>> its jobs, it is placed back in the sorting list with a real chance of being
>>> offered again immediately (not starved!).
>>> What is also key is that mem type resources are ignored during share
>>> computation as only cpus are a good indicator of which frameworks are
>>> actually running jobs in the cluster.
>>>
>>>  Thanks,
>>> Claudiu
>>>
>>>   From: Claudiu Barbura 
>>> Reply-To: "user@mesos.apache.org" 
>>>  Date: Thursday, June 12, 2014 at
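For readers new to the allocator being discussed: DRF offers resources to the client with the smallest dominant share, i.e. the largest fraction of any single resource allocated to it, so two frameworks that have launched nothing both sit at share 0 and the tie-break decides who is offered first. A simplified illustration of that computation, not Mesos's actual sorter code:

#include <algorithm>
#include <map>
#include <string>

// Dominant share of one framework: the maximum, over all resource kinds,
// of (amount allocated to it) / (total amount in the cluster).
double dominantShare(const std::map<std::string, double>& allocated,
                     const std::map<std::string, double>& total) {
  double share = 0.0;
  for (const auto& entry : allocated) {
    const auto it = total.find(entry.first);
    if (it != total.end() && it->second > 0.0) {
      share = std::max(share, entry.second / it->second);
    }
  }
  return share;
}

// The allocator orders clients by this value, ascending, before offering;
// with no allocations every share is 0.0, so whatever secondary ordering
// the sorter uses (e.g. insertion order) determines who is offered first.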

Re: cgroups memory isolation

2014-06-19 Thread Tim St Clair
Awesome response! 

inline below - 

- Original Message -

> From: "Sharma Podila" 
> To: user@mesos.apache.org
> Cc: "Ian Downes" , "Eric Abbott" 
> Sent: Thursday, June 19, 2014 11:54:34 AM
> Subject: Re: cgroups memory isolation

> Purely from a user expectation point of view, I am wondering if such an
> "abuse" (overuse?) of I/O bandwidth/rate should translate into I/O bandwidth
> getting throttled for the job instead of it manifesting into an OOM that
> results in a job kill. Such I/O overuse translating into memory overuse
> seems like an implementation detail (for lack of a better phrase) of the OS
> that uses cache'ing. It's not like the job asked for its memory to be used
> up for I/O cache'ing :-)

In cgroups, you could optionally specify the memory limit as soft, vs. hard 
(OOM). 

> I do see that this isn't Mesos specific, but, rather a containerization
> artifact that is inevitable in a shared resource environment.

> That said, specifying memory size for jobs is not trivial in a shared
> resource environment. Conservative safe margins do help prevent OOMs, but,
> they also come with the side effect of fragmenting resources and reducing
> utilization. In some cases, they can cause job starvation to some extent, if
> most available memory is allocated to the conservative buffering for every
> job.

Yup, unless you develop tuning models / hunting algorithms. You need some level 
of global visibility & history. 

> Another approach that could help, if feasible, is to have containers with
> elastic boundaries (different from over-subscription) that manage things
> such that sum of actual usage of all containers is <= system resources. This
> helps when not all jobs have peak use of resources simultaneously.

You "could" use soft limits & resize, I like to call it the "push-over" policy. 
If the limits are not enforced, what prevents abusive users in absence of 
global visibility? 

IMHO - having soft cgroup memory limits as an option seems to be the right
play given the environment.

Thoughts? 

> On Wed, Jun 18, 2014 at 1:42 PM, Tim St Clair < tstcl...@redhat.com > wrote:

> > FWIW - There is classic grid mantra that applies here. Test your workflow
> > on an upper bound, then over provision to be safe.
> >
> > Mesos is no different than SGE, PBS, LSF, Condor, etc.
> >
> > Also, there is no hunting algo for "jobs", that would have to live outside
> > of mesos itself, on some batch system built atop.
> >
> > Cheers,
> > Tim
> >
> > > From: "Thomas Petr" < tp...@hubspot.com >
> > > To: "Ian Downes" < ian.dow...@gmail.com >
> > > Cc: user@mesos.apache.org , "Eric Abbott" < eabb...@hubspot.com >
> > > Sent: Wednesday, June 18, 2014 9:36:51 AM
> > > Subject: Re: cgroups memory isolation
> > >
> > > Thanks for all the info, Ian. We're running CentOS 6 with the 2.6.32
> > > kernel.
> > >
> > > I ran `dd if=/dev/zero of=lotsazeros bs=1M` as a task in Mesos and got
> > > some weird results. I initially gave the task 256 MB, and it never
> > > exceeded the memory allocation (I killed the task manually after 5
> > > minutes when the file hit 50 GB). Then I noticed your example was 128
> > > MB, so I resized and tried again. It exceeded memory almost immediately.
> > > The next (replacement) task our framework started ran successfully and
> > > never exceeded memory. I watched nr_dirty and it fluctuated between
> > > 1 to 14000 when the task is running. The slave host is a c3.xlarge in
> > > EC2, if it makes a difference.
> > >
> > > As Mesos users, we'd like an isolation strategy that isn't affected by
> > > cache this much -- it makes it harder for us to appropriately size
> > > things. Is it possible through Mesos or cgroups itself to make the page
> > > cache not count towards the total memory consumption? If the answer is
> > > no, do you think it'd be worth looking at using Docker for isolation
> > > instead?
> > >
> > > - Tom
> > >
> > > On Tue, Jun 17, 2014 at 6:18 PM, Ian Downes < ian.dow...@gmail.com >
> > > wrote:
> > >
> > > > Hello Thomas,
> > > >
> > > > Your impression is mostly correct: the kernel will *try* to reclaim
> > > > memory by writing out dirty pages before killing processes in a cgroup
> > > > but if it's unable to reclaim sufficient pages within some interval (I
> > > > don't recall this off-hand) then it will start killing things.
> > > >
> > > > We observed this on a 3.4 kernel where we could overwhelm the disk
> > > > subsystem and trigger an oom. Just how quickly this happens depends on
> > > > how fast you're writing compared to how fast your disk subsystem can
> > > > write it out. A simple "dd if=/dev/zero of=lotsazeros bs=1M" when
> > > > contained in a memory cgroup will fill the cache quickly, reach its
> > > > limit a

Re: Framework Starvation

2014-06-19 Thread Vinod Kone
Waiting to see your blog post :)

That said, what baffles me is that in the very beginning when only two
frameworks are present and no tasks have been launched, one framework is
getting more allocations than other (see the log lines I posted in the
earlier email), which is unexpected.


@vinodkone


On Tue, Jun 17, 2014 at 9:41 PM, Claudiu Barbura  wrote:

>  Hi Vinod,
>
>  You are looking at logs I had posted before we implemented our fix (files
> attached in my last email).
> I will write a detailed blog post on the issue … after the Spark Summit at
> the end of this month.
>
>  What would happen before is that frameworks with the same share (0) would
> also have the smallest allocation in the beginning, and after sorting the
> list they would be at the top, always offered all the resources ahead of
> other frameworks that had already been offered resources and were running
> tasks with a share and allocation > 0.
>
>  Thanks,
> Claudiu
>
>   From: Vinod Kone 
> Reply-To: "user@mesos.apache.org" 
> Date: Wednesday, June 18, 2014 at 4:54 AM
>
> To: "user@mesos.apache.org" 
> Subject: Re: Framework Starvation
>
>   Hey Claudiu,
>
>  I spent some time trying to understand the logs you posted. What's
> strange to me is that in the very beginning when frameworks 1 and 2 are
> registered, only one framework gets offers for a period of 9s. It's not
> clear why this happens. I even wrote a test (
> https://reviews.apache.org/r/22714/) to repro but wasn't able to.
>
>  It would probably be helpful to add more logging to the drf sorting
> comparator function to understand why frameworks are sorted in such a way
> when their share is same (0). My expectation is that after each allocation,
> the 'allocations' for a framework should increase causing the sort function
> to behave correctly. But that doesn't seem to be happening in your case.
>
>
>  I0604 22:12:43.715530 22270 master.cpp:2282] Sending 4 offers to
> framework 20140604-221214-302055434-5050-22260-
>
> I0604 22:12:44.276062 22273 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:44.756918 22292 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-
>
> I0604 22:12:45.794178 22276 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:46.841629 22291 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:47.884266 22262 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:48.926856 22268 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:49.966560 22280 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:51.007143 22267 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:52.047987 22280 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:53.089340 22291 master.cpp:2282] Sending 4 offers to framework
> 20140604-221214-302055434-5050-22260-0001
>
> I0604 22:12:54.130242 22263 master.cpp:2282] Sending 4 offers to
> framework 20140604-221214-302055434-5050-22260-
>
>
>  @vinodkone
>
>
> On Fri, Jun 13, 2014 at 3:40 PM, Claudiu Barbura <
> claudiu.barb...@atigeo.com> wrote:
>
>>  Hi Vinod,
>>
>>  Attached are the patch files. Hadoop has to be treated differently as it
>> requires resources in order to shut down task trackers after a job is
>> complete. Therefore we set the role name so that Mesos allocates resources
>> for it first, ahead of the rest of the frameworks under the default role
>> (*).
>> This is not ideal; we are going to look into the Hadoop Mesos framework
>> code and fix it if possible. Luckily, Hadoop is the only framework we use
>> on top of Mesos that allows a configurable role name to be passed in when
>> registering a framework (unlike Spark, Aurora, Storm, etc.).
>> For the non-Hadoop frameworks, we are making sure that once a framework
>> is running its jobs, Mesos no longer offers resources to it. At the same
>> time, once a framework completes its job, we make sure its “client
>> allocations” value is updated so that when it completes the execution of
>> its jobs, it is placed back in the sorting list with a real chance of being
>> offered again immediately (not starved!).
>> What is also key is that mem type resources are ignored during share
>> computation as only cpus are a good indicator of which frameworks are
>> actually running jobs in the cluster.
>>
>>  Thanks,
>> Claudiu
>>
>>   From: Claudiu Barbura 
>> Reply-To: "user@mesos.apache.org" 
>>  Date: Thursday, June 12, 2014 at 6:20 PM
>>
>> To: "user@mesos.apache.org" 
>> Subject: Re: Framework Starvation
>>
>>   Hi Vinod,
>>
>>  We have a fix (more like a hack) that works for us, but it requires us
>> to run each Hadoop framework with

Re: "Failed to perform recovery: Incompatible slave info detected"

2014-06-19 Thread Vinod Kone
Yes. The idea behind storing the whole slave info is to provide safety.

Imagine, the slave resources were reduced on a restart. What does this mean
for already running tasks that are using more resources than the newly
configured resources? Should the slave kill them? If yes, which ones?
Similarly what happens when the slave attributes are changed (e.g.,
"secure" to "unsecure")? Is it safe to keep running the existing tasks?

As you can see, reconciliation of slave info is a complex problem. While
there are some smarts we can add to the slave (e.g., increase of resources
is OK while decrease is not) we haven't really seen a need for it yet.


On Thu, Jun 19, 2014 at 3:03 AM, Dick Davies  wrote:

> Fab, thanks Vinod. Turns out that feature (different FQDN to serve the ui
> up on)
> might well be really useful for us, so every cloud has a silver lining :)
>
> back to the metadata feature though - do you know why just the 'id' of
> the slaves isn't used?
> As it stands adding disk storage, cores or RAM to a slave will cause
> it to drop out of cluster -
> does checking the whole metadata provide any benefit vs. checking the id?
>
> On 18 June 2014 19:46, Vinod Kone  wrote:
> > Filed https://issues.apache.org/jira/browse/MESOS-1506 for fixing
> > flags/documentation.
> >
> >
> > On Wed, Jun 18, 2014 at 11:33 AM, Dick Davies 
> > wrote:
> >>
> >> Thanks, it might be worth correcting the docs in that case then.
> >> This URL says it'll use the system hostname, not the reverse DNS of
> >> the ip argument:
> >>
> >> http://mesos.apache.org/documentation/latest/configuration/
> >>
> >> re: the CFS thing - this was while running Docker on the slaves - that
> >> also uses cgroups
> >> so maybe resources were getting split with mesos or something? (I'm
> >> still reading up on
> >> cgroups) - definitely wasn't the case until cfs was enabled.
> >>
> >>
> >> On 18 June 2014 18:34, Vinod Kone  wrote:
> >> > Hey Dick,
> >> >
> >> > Regarding slave recovery, any changes in the SlaveInfo (see
> mesos.proto)
> >> > are
> >> > considered as a new slave and hence recovery doesn't proceed forward.
> >> > This
> >> > is because Master caches SlaveInfo and it is quite complex to
> reconcile
> >> > the
> >> > differences in SlaveInfo. So we decided to fail on any SlaveInfo
> changes
> >> > for
> >> > now.
> >> >
> >> > In your particular case,
> https://issues.apache.org/jira/browse/MESOS-672
> >> > was
> >> > committed in 0.18.0 which fixed redirection
> >> >  of WebUI. Included in this fix is
> https://reviews.apache.org/r/17573/
> >> > which
> >> > changed how SlaveInfo.hostname is calculated. Since you are not
> >> > providing a
> >> > hostname via "--hostname" flag, slave now deduces the hostname from
> >> > "--ip"
> >> > flag. Looks like in your cluster the hostname corresponding to that ip
> >> > is
> >> > different than what 'os::hostname()' gives.
> >> >
> >> > Couple of options to move forward. If you want slave recovery, provide
> >> > "--hostname" that matches the previous hostname. If you don't care
> about
> >> > recovery, just remove the meta directory ("rm -rf /var/mesos/meta") so
> >> > that
> >> > the slave starts as a fresh one (since you are not using cgroups, you
> >> > will
> >> > have to manually kill any old executors/tasks that are still alive on
> >> > the
> >> > slave).
> >> >
> >> > Not sure about your comment on CFS. Enabling CFS shouldn't change how
> >> > much
> >> > memory the slave sees as available. More details/logs would help
> >> > diagnose
> >> > the issue.
> >> >
> >> > HTH,
> >> >
> >> >
> >> >
> >> > On Wed, Jun 18, 2014 at 4:26 AM, Dick Davies 
> >> > wrote:
> >> >>
> >> >> Should have said, the CLI for this is :
> >> >>
> >> >> /usr/local/sbin/mesos-slave --master=zk://10.10.10.105:2181/mesos
> >> >> --log_dir=/var/log/mesos --ip=10.10.10.101 --work_dir=/var/mesos
> >> >>
> >> >> (note IP is specified, hostname is not - docs indicated hostname arg
> >> >> will default to the fqdn of host, but it appears to be using the
> value
> >> >> passed as 'ip' instead.)
> >> >>
> >> >> On 18 June 2014 12:00, Dick Davies  wrote:
> >> >> > Hi, we recently bumped 0.17.0 -> 0.18.2 and the slaves
> >> >> > now show their IPs rather than their FQDNs on the mesos UI.
> >> >> >
> >> >> > This broke slave recovery with the error:
> >> >> >
> >> >> > "Failed to perform recovery: Incompatible slave info detected"
> >> >> >
> >> >> >
> >> >> > cpu, mem, disk, ports are all the same. so is the 'id' field.
> >> >> >
> >> >> > the only thing that's changed is are the 'hostname' and
> >> >> > webui_hostname
> >> >> > arguments
> >> >> > (the CLI we're passing in is exactly the same as it was on 0.17.0,
> so
> >> >> > presumably this is down to a change in mesos conventions).
> >> >> >
> >> >> > I've had similar issues enabling CFS in test environments (slaves
> >> >> > show
> >> >> > less free memory and refuse to recover).
> >> >> >
> >> >> > is the 'id' field not enough to uniquely identify a slave?
> >> >
> >> 

Re: cgroups memory isolation

2014-06-19 Thread Sharma Podila
Purely from a user expectation point of view, I am wondering if such an
"abuse" (overuse?) of I/O bandwidth/rate should translate into I/O
bandwidth getting throttled for the job instead of it manifesting into an
OOM that results in a job kill. Such I/O overuse translating into memory
overuse seems like an implementation detail (for lack of a better phrase)
of the OS that uses cache'ing. It's not like the job asked for its memory
to be used up for I/O cache'ing :-)

I do see that this isn't Mesos specific, but, rather a containerization
artifact that is inevitable in a shared resource environment.

That said, specifying memory size for jobs is not trivial in a shared
resource environment. Conservative safe margins do help prevent OOMs, but,
they also come with the side effect of fragmenting resources and reducing
utilization. In some cases, they can cause job starvation to some extent,
if most available memory is allocated to the conservative buffering for
every job.
Another approach that could help, if feasible, is to have containers with
elastic boundaries (different from over-subscription) that manage things
such that sum of actual usage of all containers is <= system resources.
This helps when not all jobs have peak use of resources simultaneously.


On Wed, Jun 18, 2014 at 1:42 PM, Tim St Clair  wrote:

> FWIW -  There is classic grid mantra that applies here.  Test your
> workflow on an upper bound, then over provision to be safe.
>
> Mesos is no different than SGE, PBS, LSF, Condor, etc.
>
> Also, there is no hunting algo for "jobs", that would have to live outside
> of mesos itself, on some batch system built atop.
>
> Cheers,
> Tim
>
> --
>
> *From: *"Thomas Petr" 
> *To: *"Ian Downes" 
> *Cc: *user@mesos.apache.org, "Eric Abbott" 
> *Sent: *Wednesday, June 18, 2014 9:36:51 AM
> *Subject: *Re: cgroups memory isolation
>
>
> Thanks for all the info, Ian. We're running CentOS 6 with the 2.6.32
> kernel.
>
> I ran `dd if=/dev/zero of=lotsazeros bs=1M` as a task in Mesos and got
> some weird results. I initially gave the task 256 MB, and it never exceeded
> the memory allocation (I killed the task manually after 5 minutes when the
> file hit 50 GB). Then I noticed your example was 128 MB, so I resized and
> tried again. It exceeded memory almost
> immediately. The next (replacement) task our framework started ran
> successfully and never exceeded memory. I watched nr_dirty and it
> fluctuated between 1 to 14000 when the task is running. The slave host
> is a c3.xlarge in EC2, if it makes a difference.
>
> As Mesos users, we'd like an isolation strategy that isn't affected by
> cache this much -- it makes it harder for us to appropriately size things.
> Is it possible through Mesos or cgroups itself to make the page cache not
> count towards the total memory consumption? If the answer is no, do you
> think it'd be worth looking at using Docker for isolation instead?
>
> -Tom
>
>
> On Tue, Jun 17, 2014 at 6:18 PM, Ian Downes  wrote:
>
>> Hello Thomas,
>>
>> Your impression is mostly correct: the kernel will *try* to reclaim
>> memory by writing out dirty pages before killing processes in a cgroup
>> but if it's unable to reclaim sufficient pages within some interval (I
>> don't recall this off-hand) then it will start killing things.
>>
>> We observed this on a 3.4 kernel where we could overwhelm the disk
>> subsystem and trigger an oom. Just how quickly this happens depends on
>> how fast you're writing compared to how fast your disk subsystem can
>> write it out. A simple "dd if=/dev/zero of=lotsazeros bs=1M" when
>> contained in a memory cgroup will fill the cache quickly, reach its
>> limit and get oom'ed. We were not able to reproduce this under 3.10
>> and 3.11 kernels. Which kernel are you using?
>>
>> Example: under 3.4:
>>
>> [idownes@hostname tmp]$ cat /proc/self/cgroup
>> 6:perf_event:/
>> 4:memory:/test
>> 3:freezer:/
>> 2:cpuacct:/
>> 1:cpu:/
>> [idownes@hostname tmp]$ cat
>> /sys/fs/cgroup/memory/test/memory.limit_in_bytes  # 128 MB
>> 134217728
>> [idownes@hostname tmp]$ dd if=/dev/zero of=lotsazeros bs=1M
>> Killed
>> [idownes@hostname tmp]$ ls -lah lotsazeros
>> -rw-r--r-- 1 idownes idownes 131M Jun 17 21:55 lotsazeros
>>
>>
>> You can also look in /proc/vmstat at nr_dirty to see how many dirty
>> pages there are (system wide). If you wrote at a rate sustainable by
>> your disk subsystem then you would see a sawtooth pattern _/|_/| ...
>> (use something like watch) as the cgroup approached its limit and the
>> kernel flushed dirty pages to bring it down.
>>
>> This might be an interesting read:
>>
>> http://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
>>
>> Hope this helps! Please do let us know if you're seeing this on a
>> kernel >= 3.10, otherwise it's likely this is a kernel issue rather
>> than something with Mesos.
>>
>> Thanks,
>> Ian
>>
>>
>> On Tue, Jun 17
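A small way to watch the sawtooth Ian describes while a dd task is writing (system-wide, so run it on the slave in question):

# refresh the dirty-page counter every second; with a sustainable write
# rate it climbs and drops repeatedly as the kernel flushes pages to disk
watch -n 1 "grep '^nr_dirty ' /proc/vmstat"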

mesos native lib issue

2014-06-19 Thread Eyal Levy
Hello everyone

I am trying to run Spark on Mesos from a Tomcat application
(trying to create the Spark context within the web application).
When running the Spark driver from the command line it works perfectly and I
can see the Spark executor through the Mesos UI, e.g.
 /bin/spark-shell --master mesos://:5050

But when I try to create the Spark context through Java code (running
inside Tomcat on a different machine) I get an error about
java.library.path missing the mesos lib (nested exception is
java.lang.UnsatisfiedLinkError: no mesos in java.library.path).
So I defined the lib as "export MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-0.19.0.so"
and it found the mesos lib,
but now I encounter a new issue with a dependency of this lib:

java.lang.UnsatisfiedLinkError: /usr/lib/libmesos-0.19.0.so:
/usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found
(required by /usr/lib/libmesos-0.19.0.so)

I am running on CentOS 6.5.

Any help on this issue will be appreciated.

Another question:
How can I run Tomcat on a Windows machine using Mesos?

Best regards,

Eyal Levy, elev...@gmail.com
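When a library fails to load with a missing GLIBCXX_x.y.z version like the one above, a quick first check is to list the symbol versions the installed libstdc++ actually provides; the usual remedy is a newer libstdc++ (matching the GCC that built libmesos), installed or placed on the library path:

# list the GLIBCXX versions exported by the system libstdc++; if
# GLIBCXX_3.4.18 is missing, this libstdc++ predates the GCC 4.8-era
# toolchain that libmesos-0.19.0.so was apparently built with
strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX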



Re: "Failed to perform recovery: Incompatible slave info detected"

2014-06-19 Thread Dick Davies
Fab, thanks Vinod. Turns out that feature (different FQDN to serve the ui up on)
might well be really useful for us, so every cloud has a silver lining :)

back to the metadata feature though - do you know why just the 'id' of
the slaves isn't used?
As it stands adding disk storage, cores or RAM to a slave will cause
it to drop out of cluster -
does checking the whole metadata provide any benefit vs. checking the id?

On 18 June 2014 19:46, Vinod Kone  wrote:
> Filed https://issues.apache.org/jira/browse/MESOS-1506 for fixing
> flags/documentation.
>
>
> On Wed, Jun 18, 2014 at 11:33 AM, Dick Davies 
> wrote:
>>
>> Thanks, it might be worth correcting the docs in that case then.
>> This URL says it'll use the system hostname, not the reverse DNS of
>> the ip argument:
>>
>> http://mesos.apache.org/documentation/latest/configuration/
>>
>> re: the CFS thing - this was while running Docker on the slaves - that
>> also uses cgroups
>> so maybe resources were getting split with mesos or something? (I'm
>> still reading up on
>> cgroups) - definitely wasn't the case until cfs was enabled.
>>
>>
>> On 18 June 2014 18:34, Vinod Kone  wrote:
>> > Hey Dick,
>> >
>> > Regarding slave recovery, any changes in the SlaveInfo (see mesos.proto)
>> > are
>> > considered as a new slave and hence recovery doesn't proceed forward.
>> > This
>> > is because Master caches SlaveInfo and it is quite complex to reconcile
>> > the
>> > differences in SlaveInfo. So we decided to fail on any SlaveInfo changes
>> > for
>> > now.
>> >
>> > In your particular case, https://issues.apache.org/jira/browse/MESOS-672
>> > was
>> > committed in 0.18.0 which fixed redirection
>> >  of WebUI. Included in this fix is https://reviews.apache.org/r/17573/
>> > which
>> > changed how SlaveInfo.hostname is calculated. Since you are not
>> > providing a
>> > hostname via "--hostname" flag, slave now deduces the hostname from
>> > "--ip"
>> > flag. Looks like in your cluster the hostname corresponding to that ip
>> > is
>> > different than what 'os::hostname()' gives.
>> >
>> > Couple of options to move forward. If you want slave recovery, provide
>> > "--hostname" that matches the previous hostname. If you don't care above
>> > recovery, just remove the meta directory ("rm -rf /var/mesos/meta") so
>> > that
>> > the slave starts as a fresh one (since you are not using cgroups, you
>> > will
>> > have to manually kill any old executors/tasks that are still alive on
>> > the
>> > slave).
>> >
>> > Not sure about your comment on CFS. Enabling CFS shouldn't change how
>> > much
>> > memory the slave sees as available. More details/logs would help
>> > diagnose
>> > the issue.
>> >
>> > HTH,
>> >
>> >
>> >
>> > On Wed, Jun 18, 2014 at 4:26 AM, Dick Davies 
>> > wrote:
>> >>
>> >> Should have said, the CLI for this is :
>> >>
>> >> /usr/local/sbin/mesos-slave --master=zk://10.10.10.105:2181/mesos
>> >> --log_dir=/var/log/mesos --ip=10.10.10.101 --work_dir=/var/mesos
>> >>
>> >> (note IP is specified, hostname is not - docs indicated hostname arg
>> >> will default to the fqdn of host, but it appears to be using the value
>> >> passed as 'ip' instead.)
>> >>
>> >> On 18 June 2014 12:00, Dick Davies  wrote:
>> >> > Hi, we recently bumped 0.17.0 -> 0.18.2 and the slaves
>> >> > now show their IPs rather than their FQDNs on the mesos UI.
>> >> >
>> >> > This broke slave recovery with the error:
>> >> >
>> >> > "Failed to perform recovery: Incompatible slave info detected"
>> >> >
>> >> >
>> >> > cpu, mem, disk, ports are all the same. so is the 'id' field.
>> >> >
>> >> > the only thing that's changed is are the 'hostname' and
>> >> > webui_hostname
>> >> > arguments
>> >> > (the CLI we're passing in is exactly the same as it was on 0.17.0, so
>> >> > presumably this is down to a change in mesos conventions).
>> >> >
>> >> > I've had similar issues enabling CFS in test environments (slaves
>> >> > show
>> >> > less free memory and refuse to recover).
>> >> >
>> >> > is the 'id' field not enough to uniquely identify a slave?
>> >
>> >
>
>
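To make Vinod's two options above concrete as commands (the hostname value is illustrative and should be whatever the slave reported before the upgrade; the master, ip, and directories are the ones from this thread):

# Option 1: keep slave recovery by pinning the pre-0.18 hostname explicitly
/usr/local/sbin/mesos-slave --master=zk://10.10.10.105:2181/mesos \
  --ip=10.10.10.101 --hostname=slave1.example.com \
  --log_dir=/var/log/mesos --work_dir=/var/mesos

# Option 2: give up recovery and start fresh by wiping the slave metadata
# (then manually kill any executors/tasks still running on the host)
rm -rf /var/mesos/meta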