Re: Proposal: moving Mesos website to project codebase

2015-10-09 Thread Kevin Sweeney
+1!

On Fri, Oct 9, 2015 at 3:35 PM Marco Massenzio wrote:

> +1
>
> Dave - great stuff!
>
> *Marco Massenzio*
>
> *Distributed Systems Engineer* http://codetrips.com
>
> On Fri, Oct 9, 2015 at 3:05 PM, Dave Lester wrote:
>
> > As part of the #MesosCon Europe hackathon, my team has been making
> > improvements to the website. Among these changes, we'd like to propose
> > changing where the website source files live by moving them to the main
> > Mesos codebase. Our current progress / working branch of this is
> > available on GitHub: https://github.com/fayusohenson/mesos/tree/site
> >
> > * What does this mean? *
> > We've added a /site directory to the Mesos codebase, which includes the
> > website source files. Today, these live in subversion. The rake file and
> > other parts of building the website all work in this new environment,
> > plus a number of related fixes (image linking, etc).
> >
> > For committers that are familiar with the current model for pushing the
> > site live, this immediate change still requires us to `svn commit` the
> > /publish directory for the website (static files that are generated).
> >
> > * Why this change? *
> > 1. Today we do not have an easy process for the community to contribute
> > to the project website. By merging this with the Mesos codebase, it will
> > be significantly easier to send a review or pull request.
> > 2. It'll be easier for committers to manage the website, and check that
> > documentation changes render on the website properly before committing.
> > Because it's difficult to do today, this is often not checked. :(
> > 3. It's a solid step toward an automated deployment of the website in
> > the future: https://issues.apache.org/jira/browse/MESOS-1309
> >
> > * Who approves of this change? *
> > As the Mesos website maintainer, I feel good about this change and its
> > direction for the project. Before committing this change, I'd like
> > community support that including this in the main Mesos codebase makes
> > sense.
> >
> > Comments? Questions?
> >
> > Dave
> >
>


Re: Prepping for next release

2015-09-02 Thread Kevin Sweeney
I'd be in favor of setting that flag to Java 7 as well - just because
classes are compiled in Java 6 format doesn't mean the standard library
classes they reference will be available on Java 6 - your compiler
classpath contains Java 7's rt.jar, which contains classes that don't exist
in Java 6's rt.jar.
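
To make the distinction concrete, here is a minimal sketch (not part of the Mesos build) that reads the bytecode version from a class file header, which is the same information `file` prints further down the thread. The helper names are hypothetical. Note that the header only records the class-file format; it says nothing about whether the library classes referenced by the code exist in Java 6's rt.jar.

```python
import struct
from typing import Tuple

# Class-file major versions for the two releases discussed in this thread.
MAJOR_TO_JAVA = {50: "Java 6", 51: "Java 7"}


def class_file_version(data: bytes) -> Tuple[int, int]:
    """Return (major, minor) parsed from the first 8 bytes of a .class file."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def describe(data: bytes) -> str:
    """Format the version roughly the way `file` reports it."""
    major, minor = class_file_version(data)
    return "version %d.%d (%s)" % (major, minor, MAJOR_TO_JAVA.get(major, "unknown"))
```

Usage would be something like `describe(open("Executor.class", "rb").read())`. A class can report 50.0 here and still fail on a Java 6 runtime if it calls a method that only exists in Java 7's standard library.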

On Tue, Sep 1, 2015 at 5:08 PM, Vinod Kone <vinodk...@apache.org> wrote:

> Actually looking at the RC1 jar more closely, it looks like the classes
> are built for 1.6 (our pom file
> <https://github.com/apache/mesos/blob/master/src/java/mesos.pom.in#L117> actually
> sets this via maven compiler plugin).
>
> $ file ~/Downloads/Executor.class
>
> /Users/vinod/Downloads/Executor.class: compiled Java class data, version
> 50.0 (Java 1.6)
>
> The confusing part (for me) is that jar's manifest says "Build-Jdk:
> 1.7.0_60" but AFAICT that just means JDK7 was used to build the JAR. It
> has nothing to do with the version of the generated byte code.
>
> So, I think we are OK here.
>
>
> On Tue, Sep 1, 2015 at 5:03 PM, Kevin Sweeney <kevi...@apache.org> wrote:
>
>> I'm generally in favor of dropping support for JDK6 as it's been
>> end-of-life for years.
>>
>> On Tue, Sep 1, 2015 at 4:46 PM, Vinod Kone <vinodk...@apache.org> wrote:
>>
>>> +user
>>>
>>> So looks like this issue is related to JDK6 and not my maven password
>>> settings.
>>>
>>> Related ASF ticket: https://issues.apache.org/jira/browse/BUILDS-85
>>>
>>> The reason it worked for me, when I tagged RC1, was because I also
>>> pointed my maven to use JDK7.
>>>
>>> So we have a couple of options here:
>>>
>>> #1) (Easy) Do same thing with RC2 as we did for RC1. This does mean the
>>> artifacts we upload to nexus will be compiled with JDK7. IIUC, if any JVM
>>> based frameworks are still on JDK6 they can't link in the new artifacts?
>>>
>>> #2) (Harder) As mentioned in the ticket, have maven compile Mesos jar
>>> with JDK6 but use JDK7 when uploading. Not sure how easy it is to adapt our
>>> Mesos build tool chain for this. Does anyone have expertise in this area?
>>>
>>> Thoughts?
>>>
>>>
>>> On Tue, Aug 18, 2015 at 3:14 PM, Vinod Kone <vinodk...@apache.org>
>>> wrote:
>>>
>>>> I re-encrypted the maven passwords and that seemed to have done the
>>>> trick. Thanks Adam!
>>>>
>>>> On Tue, Aug 18, 2015 at 1:59 PM, Adam Bordelon <a...@mesosphere.io>
>>>> wrote:
>>>>
>>>>> Update your ~/.m2/settings.xml?
>>>>> Also check that the output of `gpg --list-keys` and `--list-sigs`
>>>>> matches
>>>>> the keypair you expect
>>>>>
>>>>> On Tue, Aug 18, 2015 at 1:48 PM, Vinod Kone <vinodk...@apache.org>
>>>>> wrote:
>>>>>
>>>>> > I definitely had to create a new gpg key because my previous one
>>>>> > expired! I uploaded them to id.apache and our SVN repo containing KEYS.
>>>>> >
>>>>> > Do I need to do anything specific for maven?
>>>>> >
>>>>> > On Tue, Aug 18, 2015 at 1:25 PM, Adam Bordelon <a...@mesosphere.io> wrote:
>>>>> >
>>>>> > > Haven't seen that one. Are you sure you've got your gpg key properly
>>>>> > > set up with Maven?
>>>>> > >
>>>>> > > On Tue, Aug 18, 2015 at 1:13 PM, Vinod Kone <vinodk...@apache.org> wrote:
>>>>> > >
>>>>> > > > I'm getting the following error when running ./support/tag.sh. Has
>>>>> > > > any of the recent release managers seen this one before?
>>>>> > > >
>>>>> > > > [ERROR] Failed to execute goal
>>>>> > > > org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy (default-deploy)
>>>>> > > > on project mesos: Failed to deploy artifacts: Could not transfer artifact
>>>>> > > > org.apache.mesos:mesos:jar:0.24.0-rc1 from/to apache.releases.https
>>>>> > > > (https://repository.apache.org/service/local/staging/deploy/maven2):

Re: Prepping for next release

2015-09-01 Thread Kevin Sweeney
I'm generally in favor of dropping support for JDK6 as it's been
end-of-life for years.

On Tue, Sep 1, 2015 at 4:46 PM, Vinod Kone wrote:

> +user
>
> So looks like this issue is related to JDK6 and not my maven password
> settings.
>
> Related ASF ticket: https://issues.apache.org/jira/browse/BUILDS-85
>
> The reason it worked for me, when I tagged RC1, was because I also pointed
> my maven to use JDK7.
>
> So we have a couple of options here:
>
> #1) (Easy) Do same thing with RC2 as we did for RC1. This does mean the
> artifacts we upload to nexus will be compiled with JDK7. IIUC, if any JVM
> based frameworks are still on JDK6 they can't link in the new artifacts?
>
> #2) (Harder) As mentioned in the ticket, have maven compile Mesos jar with
> JDK6 but use JDK7 when uploading. Not sure how easy it is to adapt our
> Mesos build tool chain for this. Does anyone have expertise in this area?
>
> Thoughts?
>
>
> On Tue, Aug 18, 2015 at 3:14 PM, Vinod Kone wrote:
>
>> I re-encrypted the maven passwords and that seemed to have done the
>> trick. Thanks Adam!
>>
>> On Tue, Aug 18, 2015 at 1:59 PM, Adam Bordelon wrote:
>>
>>> Update your ~/.m2/settings.xml?
>>> Also check that the output of `gpg --list-keys` and `--list-sigs` matches
>>> the keypair you expect
>>>
>>> On Tue, Aug 18, 2015 at 1:48 PM, Vinod Kone wrote:
>>>
>>> > I definitely had to create a new gpg key because my previous one
>>> > expired! I uploaded them to id.apache and our SVN repo containing KEYS.
>>> >
>>> > Do I need to do anything specific for maven?
>>> >
>>> > On Tue, Aug 18, 2015 at 1:25 PM, Adam Bordelon wrote:
>>> >
>>> > > Haven't seen that one. Are you sure you've got your gpg key properly
>>> > > set up with Maven?
>>> > >
>>> > > On Tue, Aug 18, 2015 at 1:13 PM, Vinod Kone wrote:
>>> > >
>>> > > > I'm getting the following error when running ./support/tag.sh. Has
>>> > > > any of the recent release managers seen this one before?
>>> > > >
>>> > > > [ERROR] Failed to execute goal
>>> > > > org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy (default-deploy)
>>> > > > on project mesos: Failed to deploy artifacts: Could not transfer artifact
>>> > > > org.apache.mesos:mesos:jar:0.24.0-rc1 from/to apache.releases.https
>>> > > > (https://repository.apache.org/service/local/staging/deploy/maven2):
>>> > > > java.lang.RuntimeException: Could not generate DH keypair: Prime size
>>> > > > must be multiple of 64, and can only range from 512 to 1024 (inclusive)
>>> > > > -> [Help 1]
>>> > > >
>>> > > > On Mon, Aug 17, 2015 at 11:23 AM, Vinod Kone wrote:
>>> > > >
>>> > > > > Update:
>>> > > > >
>>> > > > > There are 3 outstanding tickets (all related to flaky tests), that we
>>> > > > > are trying to resolve. Any help fixing those (esp. MESOS-3050) would
>>> > > > > be appreciated!
>>> > > > >
>>> > > > > Planning to cut an RC as soon as they are fixed (assuming no new ones
>>> > > > > crop up).
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > On Fri, Aug 14, 2015 at 7:50 AM, James DeFelice <james.defel...@gmail.com> wrote:
>>> > > > >
>>> > > > >> Awesome - thanks so much!
>>> > > > >>
>>> > > > >> On Fri, Aug 14, 2015 at 9:37 AM, Bernd Mathiske <be...@mesosphere.io> wrote:
>>> > > > >>
>>> > > > >> > I just committed it. Thanks, James!
>>> > > > >> >
>>> > > > >> > > On Aug 13, 2015, at 9:53 PM, James DeFelice <james.defel...@gmail.com> wrote:
>>> > > > >> > >
>>> > > > >> > > Hi Vinod,
>>> > > > >> > >
>>> > > > >> > > Would *really* like to see
>>> > > > >> > > https://issues.apache.org/jira/browse/MESOS-2841
>>> > > > >> > > in 0.24.0. Currently in review.
>>> > > > >> > >
>>> > > > >> > > Any chance that can make it in?
>>> > > > >> > >
>>> > > > >> > >
>>> > > > >> > > On Wed, Aug 12, 2015 at 1:16 PM, Vinod Kone <vinodk...@apache.org> wrote:
>>> > > > >> > >
>>> > > > >> > >> Removed the target versions for all unresolved tickets (except for
>>> > > > >> > >> HTTP scheduler API ones) targeted for 0.24.0
>>> > > > >> > >>
>>> > > > >> > >> Hoping to cut an RC tomorrow.
>>> > > > >> > >>
>>> > > > >> > >> On Wed, Aug 5, 2015 at 11:31 AM, Vinod Kone <vinodk...@gmail.com> wrote:
>>> > > > >> > >>
>>> > > > >> > >>> Hi,
>>> > > > >> > >>>
>>> > > > >> > >>> The tracking ticket for the 0.24.0 release is
>>> > > > >> > >>> https://issues.apache.org/jira/browse/MESOS-2562
>>> > > > >> > >>>
>>> > > > >> > >>> The main feature of this release is going to be v1 (beta)

Re: Error While MESOS Setup

2015-05-29 Thread Kevin Sweeney
Would it make sense for configure to automatically disable Java support if
it can't find a JDK (with an appropriate warning)? That seems more in line
with how other projects' configure scripts work.

On Fri, May 29, 2015 at 12:31 AM, Adam Bordelon <a...@mesosphere.io> wrote:

> Roshan, do you even have Java installed on your machine? If not, you can
> disable Java support in Mesos by specifying `./configure --disable-java`
>
> On Thu, May 28, 2015 at 4:42 AM, Alex Rukletsov <a...@mesosphere.com>
> wrote:
>
> > Roshan,
> >
> > Let's assume the configure script is right: could you please check the
> > JAVA_HOME env var is set correctly in the session where you build Mesos?
> >
> > On Thu, May 28, 2015 at 5:55 AM, Roshan Bagdiya <roshanforyou...@gmail.com>
> > wrote:
> >
> > > Can you please help me to solve this issue
> > >
> > > configure: error: failed to determine linker flags for using Java (bad
> > > JAVA_HOME or missing support for your architecture?)
> > >
> > > what steps should i follow to resolve this
> > >
> > > --
> > > Thank You
> > > Roshan Bagdiya
> > > MITCOE, Pune



Re: Upcoming change to the Scheduler API

2015-02-13 Thread Kevin Sweeney
Regarding the backwards-compatibility concern, would it make sense to add a
TaskStatusID field to the existing TaskStatus message instead of changing
the Scheduler signature?

On Friday, February 13, 2015, Benjamin Mahler <benjamin.mah...@gmail.com>
wrote:

> Hi all,
>
> As part of https://issues.apache.org/jira/browse/MESOS-2347, there is a
> scalability concern with the reconciliation API. Performing an implicit
> reconciliation results in a status update being sent for each task in the
> cluster. For large clusters in the tens of thousands of slaves, this can
> begin to approach hundreds of thousands of status updates.
>
> With the current design of the driver, status updates must be persisted
> before the scheduler returns from the 'statusUpdate' callback, as the
> driver sends an acknowledgement implicitly once the call completes. This
> design forces the scheduler to synchronously process individual status
> updates.
>
> To remedy the issue, we're looking to introduce the ability to optionally
> specify whether the implicit acknowledgements are provided (during
> construction of the scheduler driver). If disabled, then the scheduler must
> send acknowledgments through a new 'acknowledgeStatusUpdate' call on the
> driver. Having explicit acknowledgements allows schedulers to process them
> asynchronously outside of the driver thread, and allows them to process
> updates in batch (e.g. 1:N storage operation:status updates).
>
> As part of the change, the underlying UUID of the status update needs to be
> exposed to the scheduler, which requires an update to the signature of
> 'statusUpdate'. What this means is that when schedulers include the new
> headers/JAR/egg, they need to adjust their code to accept the new uuid
> argument, regardless of whether implicit acknowledgements are desired (to
> my knowledge, there is no way to expose the uuid without requiring
> schedulers to update their code, because of Java's interface semantics).
>
> I'd like to get this change landed for 0.22.0 to make reconciliation usable
> for large clusters. The patches are up on MESOS-2347. I've outlined the
> compatibility details and upgrade steps in
> https://reviews.apache.org/r/30978/
>
> Please share any high level feedback or concerns!
>
> Ben



-- 
Sent from Gmail Mobile
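
For readers following the explicit-acknowledgement proposal quoted above, here is a minimal sketch of the batching pattern it enables (one storage operation for N status updates). It is an illustration under stated assumptions, not the Mesos driver API: `StatusUpdate`, `BatchingScheduler`, `persist_batch` and `acknowledge` are hypothetical names, and `acknowledge` merely stands in for the 'acknowledgeStatusUpdate' call, whose real signature is not shown in this thread.

```python
import queue
from typing import Callable, List, NamedTuple


class StatusUpdate(NamedTuple):
    """Hypothetical stand-in for a status update carrying its UUID."""
    task_id: str
    state: str
    uuid: bytes


class BatchingScheduler:
    """Queues updates on the driver thread; persists and acks them in batches."""

    def __init__(self,
                 persist_batch: Callable[[List[StatusUpdate]], None],
                 acknowledge: Callable[[StatusUpdate], None],
                 batch_size: int = 100) -> None:
        self._pending = queue.Queue()
        self._persist_batch = persist_batch
        self._acknowledge = acknowledge
        self._batch_size = batch_size

    def status_update(self, update: StatusUpdate) -> None:
        # Runs on the driver thread: enqueue only, so the callback returns fast.
        self._pending.put(update)

    def drain_once(self) -> int:
        """Persist up to batch_size queued updates in one call, then ack each."""
        batch = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return 0
        # Acknowledge only after the whole batch is durably stored.
        self._persist_batch(batch)
        for update in batch:
            self._acknowledge(update)
        return len(batch)
```

In a real scheduler `drain_once` would run on a worker thread outside the driver thread. If persistence raises, nothing in the batch is acknowledged, so the updates would be expected to be redelivered.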


Re: Review Request 26275: MESOS-444: Remove --checkpoint flag

2014-10-10 Thread Kevin Sweeney


> On Oct. 8, 2014, 5:26 p.m., Vinod Kone wrote:
> > Sorry for the delay in committing this. Since we didn't do a proper
> > deprecation, I'm waiting for some of Twitter's clusters to get updated
> > (i.e., removing --checkpoint from slave config files) before landing this
> > on trunk.
> >
> > Let me know if there is an urgency in landing this patch and we'll figure
> > out how to fast track it or do a proper deprecation.

IIUC, given that this flag has defaulted to the correct value for a while, there
shouldn't need to be a delay in committing this patch. If a particular
organization's deployment breaks because it's still explicitly specifying this
flag, it'll find out very quickly at start time, right? We can just add a note to
UPDATING that users should remove the flag before upgrading to 0.21.0 from
0.20.0.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26275/#review55918
---


On Oct. 6, 2014, 11:50 a.m., Cody Maloney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26275/
 ---
 
 (Updated Oct. 6, 2014, 11:50 a.m.)
 
 
 Review request for mesos and Vinod Kone.
 
 
 Bugs: MESOS-444
 https://issues.apache.org/jira/browse/MESOS-444
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Checkpointing has been enabled by default in the slave since 0.14, remove the 
 flag now because all slaves should checkpoint.
 
 Removing checkpoint from slaves throughout the codebase will occur in a 
 series of following commits.
 
 
 Diffs
 -
 
   src/slave/flags.hpp 16f0cc2 
 
 Diff: https://reviews.apache.org/r/26275/diff/
 
 
 Testing
 ---
 
 make check on ubuntu 14.04 with gcc.
 
 
 Thanks,
 
 Cody Maloney
 




Re: mesos authorization

2014-09-11 Thread Kevin Sweeney
On Tue, Sep 9, 2014 at 5:04 PM, Jay Buffington <m...@jaybuff.com> wrote:

> Hi All,
>
> I've had a few conversations offline with mesos contributors regarding
> authorization and authentication.  I'd like to solicit the larger
> community for comments.
>
> I want to create groups of people and allow those groups to only launch
> tasks as certain unix users.  Commonly, this unix user is a service user
> which has a 1:1 relationship to a group.
>
> Mesos users are frameworks.  Using the framework authorization features
> that were introduced in 0.20.0 frameworks can be authorized to run tasks
> as certain unix users.  Mesos delegates the question of what people can
> launch a task as what service users to the framework.

I think with the present (0.20.0) implementation of this feature that's not
possible without running a framework per (*nix) user, which doesn't work
well (in the case of Aurora at least) since the scheduler needs to be able
to prioritize tasks across different users.

But it's still a valuable feature in that it lets you isolate frameworks
from each other (no risk my Jenkins framework will launch tasks as my ads
user).


> I don't want to have to trust that two frameworks will enforce a consistent
> view of authorization.  From a security standpoint this transitive trust
> significantly raises the auditing burden.  What happens when one framework
> thinks jaybuff is in the ads group, but the other framework says he is not?

As the system exists today that's either a bug in one of the frameworks or
a general distributed computing problem (maybe one framework saw an LDAP
update adding or removing jaybuff to or from the ads group and the other
hasn't yet).


[jira] [Commented] (MESOS-1458) Restore cpus metric in addition to cpu_*_secs counters in container stats

2014-06-05 Thread Kevin Sweeney (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019079#comment-14019079 ]

Kevin Sweeney commented on MESOS-1458:
--

First, a car analogy. The speedometer is usually more useful than the 
odometer, even though you can theoretically compute the reading on the 
speedometer by sampling the odometer on a tight enough interval. I want the 
snapshot of the dashboard (what I get from hitting the metrics endpoint) to 
contain both an odometer (counter) and a speedometer (precomputed rate) 
reading.

Back to computers, I do want the ability to use the endpoint as a rudimentary 
{{top}}. Computing the aggregate closer to the source of the data has the 
advantage of being simpler to reason about and lets a snapshot have independent 
meaning. Also, while it is true that a polling interval needs to be exposed, I 
think a hardcoded interval of something like 1sec [1] (or something more 
empirically calculated) would be fine for this use case.

[1] The framework we use in Aurora makes it configurable - 
https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/application/modules/StatsModule.java#L66-L68
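
As an illustration of the rate computation under discussion, here is a minimal sketch of deriving cpus-style values from two samples of the counters. The `CpuSample` type and its field names are hypothetical placeholders following the cpu_*_secs pattern, not the actual stats schema; the output keys mirror the metric names proposed in the issue.

```python
from typing import Dict, NamedTuple


class CpuSample(NamedTuple):
    """One reading of the cumulative counters (field names are illustrative)."""
    timestamp: float        # seconds, from any monotonic source
    cpu_user_secs: float    # cumulative user CPU time
    cpu_system_secs: float  # cumulative system CPU time


def cpu_rates(earlier: CpuSample, later: CpuSample) -> Dict[str, float]:
    """Return CPU-seconds per second consumed between two samples."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    user = (later.cpu_user_secs - earlier.cpu_user_secs) / elapsed
    system = (later.cpu_system_secs - earlier.cpu_system_secs) / elapsed
    return {"user_cpus": user, "system_cpus": system, "total_cpus": user + system}
```

A client polling the endpoint has to keep the previous sample around to do this itself, which is exactly what a precomputed value in the snapshot would spare it.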

 Restore cpus metric in addition to cpu_*_secs counters in container stats
 ---

 Key: MESOS-1458
 URL: https://issues.apache.org/jira/browse/MESOS-1458
 Project: Mesos
  Issue Type: Story
  Components: isolation, slave
Affects Versions: 0.18.0
Reporter: Kevin Sweeney

 While a cpu-seconds per second metric can be computed with 2 samples of 
 cpu_*_secs at different times, it's often convenient to have a single 
 self-contained value, for example if sample collection interval is long, or 
 so that an operator can glance at the cpu usage of a task with a single curl 
 command.
 Add user_cpus, system_cpus, and total_cpus metrics to the available container 
 stats.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MESOS-1458) Restore cpus metric in addition to cpu_*_secs counters in container stats

2014-06-04 Thread Kevin Sweeney (JIRA)
Kevin Sweeney created MESOS-1458:


 Summary: Restore cpus metric in addition to cpu_*_secs counters 
in container stats
 Key: MESOS-1458
 URL: https://issues.apache.org/jira/browse/MESOS-1458
 Project: Mesos
  Issue Type: Story
  Components: isolation, slave
Affects Versions: 0.18.0
Reporter: Kevin Sweeney


While a cpu-seconds per second metric can be computed with 2 samples of 
cpu_*_secs at different times, it's often convenient to have a single 
self-contained value, for example if sample collection interval is long, or so 
that an operator can glance at the cpu usage of a task with a single curl 
command.

Add user_cpus, system_cpus, and total_cpus metrics to the available container 
stats.





[jira] [Commented] (MESOS-1374) Verify static libprocess scheduler port works with Mesos Master

2014-05-15 Thread Kevin Sweeney (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998309#comment-13998309 ]

Kevin Sweeney commented on MESOS-1374:
--

It seems to work fine in our vagrant environment:

https://github.com/apache/incubator-aurora/blob/master/examples/vagrant/upstart/aurora-scheduler.conf#L22

 Verify static libprocess scheduler port works with Mesos Master
 ---

 Key: MESOS-1374
 URL: https://issues.apache.org/jira/browse/MESOS-1374
 Project: Mesos
  Issue Type: Task
  Components: framework, master
Reporter: Chris Lambert
Assignee: Dominic Hamon
  Labels: 5
 Fix For: 0.19.0








[jira] [Commented] (MESOS-184) Log has a space leak

2014-05-15 Thread Kevin Sweeney (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997998#comment-13997998 ]

Kevin Sweeney commented on MESOS-184:
-

Any update on this?

 Log has a space leak
 

 Key: MESOS-184
 URL: https://issues.apache.org/jira/browse/MESOS-184
 Project: Mesos
  Issue Type: Bug
  Components: c++ api, replicated log
Affects Versions: 0.9.0
Reporter: John Sirois
Assignee: Jie Yu
Priority: Minor
  Labels: twitter

 In short, the access pattern of the Log of the underlying LevelDB storage is 
 such that background compactions are ineffective and a long running Log will 
 have a space leak on disk even in the presence of otherwise apparently 
 sufficient Log::Writer::truncate calls.
 It seems the right thing to do is to issue a DB::CompactRange(NULL, 
 Slice(truncateToKey)) after a replica learns an Action::TRUNCATE Record.  The 
 cost here is a synchronous compaction stall on every truncate so maybe this 
 should be a configuration option or even an explicit api.
 ===
 Snip of email explanation:
 I spent some time understanding what was going on here and our use pattern of 
 leveldb does in fact defeat the background compaction algorithm.
 The docs are here: http://leveldb.googlecode.com/svn/trunk/doc/impl.html in 
 the 'Compactions' section, but in short the gist is compaction operates on an 
 uncompacted file from a level (1 file) + all files overlapping its key range 
 in the next level.  Since we write sequential keys with no randomness at all, 
 by definition the only overlap we ever can get is in level 0 which is the 
 only level that leveldb allows for overlap in sstables in the 1st place.
 That leaves the question of why no compaction on open.  Looking there: 
 http://code.google.com/p/leveldb/source/browse/db/db_impl.cc#1376
 I see a call to MaybeScheduleCompaction, but following that trail, that just 
 leads to 
 http://code.google.com/p/leveldb/source/browse/db/version_set.cc?spec=svnbc1ee4d25e09b04e074db330a41f54ef4af0e31b&r=36a5f8ed7f9fb3373236d5eace4f5fea369856ee#1156
  which implements the compaction strategy I tried to summarize above, and 
 thus background compactions for our case are limited to level 0 - level 1 
 compactions and level 1 and higher never compact automatically.
 This seems borne out by the LOG files.  For example, from smf1-prod - restarts 
 after your manual compaction fix in bold:
 [jsirois@smf1-ajb-35-sr1 ~]$ grep Compacting 
 /var/lib/mesos/scheduler_db/mesos_log/LOG.old 
 2012/04/13-00:24:20.356673 44c1e940 Compacting 3@0 + 4@1 files
 2012/04/13-00:24:20.490113 44c1e940 Compacting 5@1 + 281@2 files
 2012/04/13-00:24:25.824995 44c1e940 Compacting 1@1 + 0@2 files
 2012/04/13-00:24:26.008857 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.196877 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.312465 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.429817 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.533483 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.631044 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.733702 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.832787 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:26.949864 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:27.052502 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:27.164623 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:27.275621 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:27.376748 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:27.477728 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:27.611332 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:28.050275 44c1e940 Compacting 50@2 + 242@3 files
 2012/04/13-00:24:32.455665 44c1e940 Compacting 1@2 + 0@3 files
 2012/04/13-00:24:32.538566 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:32.819205 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:33.052064 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:33.198850 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:33.350893 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:33.521784 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:33.693531 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:33.847151 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:34.034277 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:34.225582 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:34.390228 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:34.554127 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:34.715242 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:34.852110 44c1e940 Compacting 1@3 + 0@4 files
 2012/04/13-00:24:35.046899 44c1e940 Compacting 68@3 + 331@4 files
 2012/04/13-00:25:02.582758 44c1e940 Compacting 433@3

[jira] [Created] (MESOS-949) slave should wipe meta directory on startup if bootid changes

2014-01-27 Thread Kevin Sweeney (JIRA)
Kevin Sweeney created MESOS-949:
---

 Summary: slave should wipe meta directory on startup if bootid 
changes
 Key: MESOS-949
 URL: https://issues.apache.org/jira/browse/MESOS-949
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Kevin Sweeney


Right now, if slave metadata is persisted across a reboot the slave is left 
with useless metadata. Slave should detect this case and purge the metadata, 
perhaps by using bootid.
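
A minimal sketch of the detection described above, assuming a Linux host where the kernel exposes a per-boot identifier at /proc/sys/kernel/random/boot_id. The function names and the `boot_id` marker file inside the metadata directory are hypothetical; this is not how the slave actually lays out its metadata.

```python
import os
import shutil

# Linux-specific: the kernel generates a fresh random UUID here on every boot.
BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path: str = BOOT_ID_PATH) -> str:
    """Return the current boot id as a string."""
    with open(path) as f:
        return f.read().strip()


def purge_stale_meta(meta_dir: str, current_boot_id: str,
                     marker_name: str = "boot_id") -> bool:
    """Wipe meta_dir if it was written during a different boot.

    Returns True if the directory was purged. The current boot id is
    recorded in a marker file so the next startup can compare against it.
    """
    marker = os.path.join(meta_dir, marker_name)
    recorded = None
    if os.path.exists(marker):
        with open(marker) as f:
            recorded = f.read().strip()

    purged = False
    if recorded is not None and recorded != current_boot_id:
        # Metadata refers to processes that did not survive the reboot.
        shutil.rmtree(meta_dir)
        purged = True

    os.makedirs(meta_dir, exist_ok=True)
    with open(marker, "w") as f:
        f.write(current_boot_id)
    return purged
```

At startup this would be called as `purge_stale_meta(meta_dir, read_boot_id())`. A directory with no marker at all is left alone here, since the sketch has no way to tell how old it is; a real implementation would need to decide that case explicitly.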



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (MESOS-946) setup.py should be able to create publishable sdist

2014-01-24 Thread Kevin Sweeney (JIRA)
Kevin Sweeney created MESOS-946:
---

 Summary: setup.py should be able to create publishable sdist
 Key: MESOS-946
 URL: https://issues.apache.org/jira/browse/MESOS-946
 Project: Mesos
  Issue Type: Bug
  Components: python api
Reporter: Kevin Sweeney


Since setup.py uses the build root for compiler flags the sdist produced by 
python src/python/setup.py sdist can't be used outside the machine that built 
mesos, uploaded to PyPI, or used with a local PyPI mirror. Aurora's Python 
executor should be installable by `pip` and depend only on the presence of 
mesos headers and shared libraries, but since the produced sdist is unusable 
this currently isn't possible.





[jira] [Commented] (MESOS-684) Automate publishing of artifacts with release candidates and releases

2013-12-27 Thread Kevin Sweeney (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857662#comment-13857662 ]

Kevin Sweeney commented on MESOS-684:
-

It looks like the sdist created by python setup.py sdist currently hardcodes 
the buildroot of the mesos source tree. Using an sdist I created with

{noformat} 
% ./bootstrap
% ./configure
% make
% make install
% cd src
% python python/setup.py sdist
% cp python/dist/mesos-0.16.0.tar.gz ~
{noformat}

The build fails with an error message
{noformat}
% tar zxvf mesos-0.16.0.tar.gz
% cd mesos-0.16.0
% python setup.py install
Traceback (most recent call last):
  File "setup.py", line 58, in <module>
    for file in os.listdir(os.path.join(abs_top_srcdir, src_python_native))
OSError: [Errno 2] No such file or directory: 
'/Users/ksweeney/workspace/mesos/src/python/native'
{noformat}
{noformat}

In theory as long as {{make install}} (or the OS package manager) installed 
appropriate header files and a library that works with {{-lmesos}} it shouldn't 
be necessary to know about the path that the {{make dist}} tarball was 
originally extracted to in order to compile the Python bindings. I suspect the 
issue right now is that the Python egg is built before the libraries and 
headers are installed at the expense of a {{setup.py}} that can't be used to 
create a pip-compatible sdist.
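
To sketch what a relocatable configuration could look like: the helper below builds the compiler and linker settings from an installed prefix instead of an absolute source-tree path. It is an assumption-laden illustration, not the project's setup.py: the function name and the `MESOS_PREFIX` environment variable are hypothetical. The returned keys are the standard `include_dirs`, `library_dirs` and `libraries` arguments accepted by a setuptools `Extension`.

```python
import os
from typing import Dict, List, Mapping, Optional


def mesos_extension_kwargs(
        environ: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    """Build Extension keyword arguments without baking in a build root.

    By default only -lmesos is requested and the compiler's normal search
    paths are relied on. MESOS_PREFIX (a hypothetical variable) can point
    at a non-standard install location.
    """
    env = os.environ if environ is None else environ
    kwargs = {
        "libraries": ["mesos"],  # becomes -lmesos at link time
        "include_dirs": [],
        "library_dirs": [],
    }  # type: Dict[str, List[str]]
    prefix = env.get("MESOS_PREFIX")
    if prefix:
        kwargs["include_dirs"].append(os.path.join(prefix, "include"))
        kwargs["library_dirs"].append(os.path.join(prefix, "lib"))
    return kwargs
```

Because nothing here refers to the directory where the sdist was created, the same tarball could in principle be built on any machine that has the headers and shared library installed.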

 Automate publishing of artifacts with release candidates and releases
 -

 Key: MESOS-684
 URL: https://issues.apache.org/jira/browse/MESOS-684
 Project: Mesos
  Issue Type: Sub-task
Reporter: Vinod Kone
 Fix For: 0.16.0


 Currently, when we tag a release candidate, we never upload the relevant 
 artifacts (e.g., mesos jar) for the community to be able to test it. It would 
 be great if we can have a script to automate this.
 Since we also support Python, we could (should?) also publish a mesos egg to 
 somewhere in Apache?





[jira] [Commented] (MESOS-715) Allow framework followers to read log

2013-10-01 Thread Kevin Sweeney (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783376#comment-13783376 ]

Kevin Sweeney commented on MESOS-715:
-

We've observed very long failover times (on the order of 5-10 minutes) in large 
production Aurora clusters that could be significantly mitigated with this 
feature. While we've been able to make data-structure optimizations to reduce 
this and can still do more work in this area, there's still a near-linear 
processing delay during scheduler startup, and followers could perform this 
work ahead of time. As long as followers can see log entries in order we can 
tolerate being arbitrarily far behind; this feature request is only a 
performance optimization.

 Allow framework followers to read log
 -

 Key: MESOS-715
 URL: https://issues.apache.org/jira/browse/MESOS-715
 Project: Mesos
  Issue Type: Improvement
Reporter: Joe Smith

 If a framework has a leader election, the newly elected leader needs to 
 re-play the log of task transitions to build up the state of the world. It 
 would help frameworks failover much faster if followers were able to read the 
 log to keep themselves primed to take over.



--
This message was sent by Atlassian JIRA
(v6.1#6144)