Re: Mentors: question about Apache GitHub mirrors

2013-09-16 Thread Roman Shaposhnik
Hi!

Update -- turns out all of my connections at GitHub have left :-(
(I guess it's the valley -- people move around quickly).

The best advice I got is to pop into #github and
ask away, hoping that some of the GitHub folks will
be there.

Sorry -- that's all I've got.

Thanks,
Roman.

On Wed, Sep 11, 2013 at 9:57 PM, Mattmann, Chris A (398J)
 wrote:
> Thanks Roman..
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
>
> -----Original Message-----
> From: Roman Shaposhnik 
> Reply-To: "dev@spark.incubator.apache.org" 
> Date: Wednesday, September 11, 2013 8:44 PM
> To: "dev@spark.incubator.apache.org" 
> Subject: Re: Mentors: question about Apache GitHub mirrors
>
>>On Wed, Sep 11, 2013 at 6:54 PM, Matei Zaharia 
>>wrote:
>>> I'm not sure where to ask this, but here goes: since we'll receive pull
>>>requests on https://github.com/apache/incubator-spark, is there any way
>>>to subscribe the dev@ list (or some other list) to the GitHub emails
>>>from those discussions? I think that would be useful, and similar to
>>>subscribing it to JIRA. My main question is who manages the mirroring
>>>(i.e. the github.com/apache account), since they'd have to configure
>>>this.
>>
>>This is being managed by the GitHub folks. I can ping
>>my contacts there to see if what you're asking
>>for is feasible. Stay tuned.
>>
>>Thanks,
>>Roman.
>


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-16 Thread Andy Konwinski
Patrick, I took a quick look over your release_auditor.py script and it's
really great!

Then I ran it (had to add "--keyserver pgp.mit.edu" to the gpg command) and
everything passed on OS X!

Great job and +1 from me whenever you resolve the kafka jar issue you
mentioned.

Andy


On Mon, Sep 16, 2013 at 8:37 PM, Matei Zaharia wrote:

> FWIW, I tested it otherwise and it seems good modulo this issue.
>
> Matei
>
> On Sep 16, 2013, at 6:39 PM, Patrick Wendell  wrote:
>
> > Hey folks, just FYI, we found one minor issue with this RC (the kafka
> > jar in the streaming pom needs to be published as "provided" since it's
> > not available in Maven). Please continue to test this RC and provide
> > feedback here until the next RC is posted.
> >
> > - Patrick
> >
> > On Mon, Sep 16, 2013 at 1:28 PM, Reynold Xin 
> wrote:
> >> +1
> >>
> >>
> >> --
> >> Reynold Xin, AMPLab, UC Berkeley
> >> http://rxin.org
> >>
> >>
> >>
> >> On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell wrote:
> >>
> >>> I also wrote an audit script [1] to verify various aspects of the
> >>> release binaries and ran it on this RC. People are welcome to run this
> >>> themselves, but I haven't tested it on other machines yet, and some of
> >>> the Spark tests are very sensitive to the test environment :) Output
> >>> is pasted below:
> >>>
> >>> [1]
> https://github.com/pwendell/spark-utils/blob/master/release_auditor.py
> >>>
> >>> -
> >>>  Verifying download integrity for artifact:
> >>> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
> >>> [PASSED] Artifact signature verified.
> >>> [PASSED] Artifact MD5 verified.
> >>> [PASSED] Artifact SHA verified.
> >>> [PASSED] Tarball contains CHANGES.txt file
> >>> [PASSED] Tarball contains NOTICE file
> >>> [PASSED] Tarball contains LICENSE file
> >>> [PASSED] README file contains disclaimer
> >>>  Verifying download integrity for artifact:
> >>> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
> >>> [PASSED] Artifact signature verified.
> >>> [PASSED] Artifact MD5 verified.
> >>> [PASSED] Artifact SHA verified.
> >>> [PASSED] Tarball contains CHANGES.txt file
> >>> [PASSED] Tarball contains NOTICE file
> >>> [PASSED] Tarball contains LICENSE file
> >>> [PASSED] README file contains disclaimer
> >>>  Verifying download integrity for artifact:
> >>> spark-0.8.0-incubating-rc5.tgz 
> >>> [PASSED] Artifact signature verified.
> >>> [PASSED] Artifact MD5 verified.
> >>> [PASSED] Artifact SHA verified.
> >>> [PASSED] Tarball contains CHANGES.txt file
> >>> [PASSED] Tarball contains NOTICE file
> >>> [PASSED] Tarball contains LICENSE file
> >>> [PASSED] README file contains disclaimer
> >>>  Verifying build and tests for artifact:
> >>> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
> >>> ==> Running build
> >>> [PASSED] sbt build successful
> >>> [PASSED] Maven build successful
> >>> ==> Performing unit tests
> >>> [PASSED] Tests successful
> >>>  Verifying build and tests for artifact:
> >>> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
> >>> ==> Running build
> >>> [PASSED] sbt build successful
> >>> [PASSED] Maven build successful
> >>> ==> Performing unit tests
> >>> [PASSED] Tests successful
> >>>  Verifying build and tests for artifact:
> >>> spark-0.8.0-incubating-rc5.tgz 
> >>> ==> Running build
> >>> [PASSED] sbt build successful
> >>> [PASSED] Maven build successful
> >>> ==> Performing unit tests
> >>> [PASSED] Tests successful
> >>>
> >>> - Patrick
> >>>
> >>> On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell 
> >>> wrote:
>  Please vote on releasing the following candidate as Apache Spark
>  (incubating) version 0.8.0. This will be the first incubator release
> for
>  Spark in Apache.
> 
>  The tag to be voted on is v0.8.0-incubating (commit d9e80d5):
> 
> https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating
> 
>  The release files, including signatures, digests, etc can be found at:
>  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/
> 
>  Release artifacts are signed with the following key:
>  https://people.apache.org/keys/committer/pwendell.asc
> 
>  The staging repository for this release can be found at:
> 
> >>>
> https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/
> 
>  The documentation corresponding to this release can be found at:
>  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/
> 
>  Please vote on releasing this package as Apache Spark
> 0.8.0-incubating!
>  The vote is open until Thursday, September 19th at 05:00 UTC and
> passes
> >>> if
>  a majority of at least 3 +1 IPMC votes are cast.
> 
>  [ ] +1 Release this package as Apache Spark 0.8.0-incubating
>  [ ] -1 Do not release this package because ...
> 
>  To learn more about Apache Spark, please see
>  http://spark.incubator.apache.org/
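
For anyone re-checking an artifact by hand, the download-integrity steps in
the audit output above amount to recomputing each digest and comparing it to
the published checksum file. A minimal Scala sketch of that step (it assumes
the artifact and its .md5/.sha files sit side by side locally, that the
checksum file's first whitespace-separated token is the hex digest, and that
the .sha file is SHA-512; adjust to match what the release actually
publishes):

    import java.nio.file.{Files, Paths}
    import java.security.MessageDigest

    object DigestCheck {
      // Hex-encode a digest computed over the artifact's bytes.
      def digest(path: String, algo: String): String =
        MessageDigest.getInstance(algo)
          .digest(Files.readAllBytes(Paths.get(path)))
          .map(b => "%02x".format(b & 0xff))
          .mkString

      // Assumes the checksum file carries the hex digest as its first token.
      def published(path: String): String =
        new String(Files.readAllBytes(Paths.get(path))).trim.split("\\s+")(0).toLowerCase

      def main(args: Array[String]): Unit = {
        val artifact = "spark-0.8.0-incubating-rc5.tgz"
        val md5Ok = digest(artifact, "MD5") == published(artifact + ".md5")
        val shaOk = digest(artifact, "SHA-512") == published(artifact + ".sha")
        println(if (md5Ok) "[PASSED] Artifact MD5 verified." else "[FAILED] MD5 mismatch.")
        println(if (shaOk) "[PASSED] Artifact SHA verified." else "[FAILED] SHA mismatch.")
      }
    }

Signature verification (the gpg step Andy mentions above) still has to happen
out of band against the committer's published key.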

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-16 Thread Matei Zaharia
FWIW, I tested it otherwise and it seems good modulo this issue.

Matei

On Sep 16, 2013, at 6:39 PM, Patrick Wendell  wrote:

> Hey folks, just FYI, we found one minor issue with this RC (the kafka
> jar in the streaming pom needs to be published as "provided" since it's
> not available in Maven). Please continue to test this RC and provide
> feedback here until the next RC is posted.
> 
> - Patrick
> 
> On Mon, Sep 16, 2013 at 1:28 PM, Reynold Xin  wrote:
>> +1
>> 
>> 
>> --
>> Reynold Xin, AMPLab, UC Berkeley
>> http://rxin.org
>> 
>> 
>> 
>> On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell wrote:
>> 
>>> I also wrote an audit script [1] to verify various aspects of the
>>> release binaries and ran it on this RC. People are welcome to run this
>>> themselves, but I haven't tested it on other machines yet, and some of
>>> the Spark tests are very sensitive to the test environment :) Output
>>> is pasted below:
>>> 
>>> [1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py
>>> 
>>> -
>>>  Verifying download integrity for artifact:
>>> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
>>> [PASSED] Artifact signature verified.
>>> [PASSED] Artifact MD5 verified.
>>> [PASSED] Artifact SHA verified.
>>> [PASSED] Tarball contains CHANGES.txt file
>>> [PASSED] Tarball contains NOTICE file
>>> [PASSED] Tarball contains LICENSE file
>>> [PASSED] README file contains disclaimer
>>>  Verifying download integrity for artifact:
>>> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
>>> [PASSED] Artifact signature verified.
>>> [PASSED] Artifact MD5 verified.
>>> [PASSED] Artifact SHA verified.
>>> [PASSED] Tarball contains CHANGES.txt file
>>> [PASSED] Tarball contains NOTICE file
>>> [PASSED] Tarball contains LICENSE file
>>> [PASSED] README file contains disclaimer
>>>  Verifying download integrity for artifact:
>>> spark-0.8.0-incubating-rc5.tgz 
>>> [PASSED] Artifact signature verified.
>>> [PASSED] Artifact MD5 verified.
>>> [PASSED] Artifact SHA verified.
>>> [PASSED] Tarball contains CHANGES.txt file
>>> [PASSED] Tarball contains NOTICE file
>>> [PASSED] Tarball contains LICENSE file
>>> [PASSED] README file contains disclaimer
>>>  Verifying build and tests for artifact:
>>> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
>>> ==> Running build
>>> [PASSED] sbt build successful
>>> [PASSED] Maven build successful
>>> ==> Performing unit tests
>>> [PASSED] Tests successful
>>>  Verifying build and tests for artifact:
>>> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
>>> ==> Running build
>>> [PASSED] sbt build successful
>>> [PASSED] Maven build successful
>>> ==> Performing unit tests
>>> [PASSED] Tests successful
>>>  Verifying build and tests for artifact:
>>> spark-0.8.0-incubating-rc5.tgz 
>>> ==> Running build
>>> [PASSED] sbt build successful
>>> [PASSED] Maven build successful
>>> ==> Performing unit tests
>>> [PASSED] Tests successful
>>> 
>>> - Patrick
>>> 
>>> On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell 
>>> wrote:
 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.8.0. This will be the first incubator release for
 Spark in Apache.
 
 The tag to be voted on is v0.8.0-incubating (commit d9e80d5):
 https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating
 
 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/
 
 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc
 
 The staging repository for this release can be found at:
 
>>> https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/
 
 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/
 
 Please vote on releasing this package as Apache Spark 0.8.0-incubating!
 The vote is open until Thursday, September 19th at 05:00 UTC and passes
>>> if
 a majority of at least 3 +1 IPMC votes are cast.
 
 [ ] +1 Release this package as Apache Spark 0.8.0-incubating
 [ ] -1 Do not release this package because ...
 
 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/
>>> 



Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-16 Thread Patrick Wendell
Hey folks, just FYI, we found one minor issue with this RC (the kafka
jar in the streaming pom needs to be published as "provided" since it's
not available in Maven). Please continue to test this RC and provide
feedback here until the next RC is posted.

- Patrick
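
For reference, "provided" scope keeps a dependency on the compile classpath
without bundling it or advertising it as a transitive runtime dependency, so
downstream builds won't try to resolve a jar that isn't in a public
repository. An sbt-flavored sketch of the same idea (the coordinates below
are placeholders for illustration, not the actual fix):

    // build.sbt fragment (illustrative coordinates only): "provided" means
    // kafka is available at compile time but is not pulled in transitively
    // by consumers of the published artifact.
    libraryDependencies += "org.apache.kafka" % "kafka" % "0.7.2" % "provided"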

On Mon, Sep 16, 2013 at 1:28 PM, Reynold Xin  wrote:
> +1
>
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell wrote:
>
>> I also wrote an audit script [1] to verify various aspects of the
>> release binaries and ran it on this RC. People are welcome to run this
>> themselves, but I haven't tested it on other machines yet, and some of
>> the Spark tests are very sensitive to the test environment :) Output
>> is pasted below:
>>
>> [1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py
>>
>> -
>>  Verifying download integrity for artifact:
>> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
>> [PASSED] Artifact signature verified.
>> [PASSED] Artifact MD5 verified.
>> [PASSED] Artifact SHA verified.
>> [PASSED] Tarball contains CHANGES.txt file
>> [PASSED] Tarball contains NOTICE file
>> [PASSED] Tarball contains LICENSE file
>> [PASSED] README file contains disclaimer
>>  Verifying download integrity for artifact:
>> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
>> [PASSED] Artifact signature verified.
>> [PASSED] Artifact MD5 verified.
>> [PASSED] Artifact SHA verified.
>> [PASSED] Tarball contains CHANGES.txt file
>> [PASSED] Tarball contains NOTICE file
>> [PASSED] Tarball contains LICENSE file
>> [PASSED] README file contains disclaimer
>>  Verifying download integrity for artifact:
>> spark-0.8.0-incubating-rc5.tgz 
>> [PASSED] Artifact signature verified.
>> [PASSED] Artifact MD5 verified.
>> [PASSED] Artifact SHA verified.
>> [PASSED] Tarball contains CHANGES.txt file
>> [PASSED] Tarball contains NOTICE file
>> [PASSED] Tarball contains LICENSE file
>> [PASSED] README file contains disclaimer
>>  Verifying build and tests for artifact:
>> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
>> ==> Running build
>> [PASSED] sbt build successful
>> [PASSED] Maven build successful
>> ==> Performing unit tests
>> [PASSED] Tests successful
>>  Verifying build and tests for artifact:
>> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
>> ==> Running build
>> [PASSED] sbt build successful
>> [PASSED] Maven build successful
>> ==> Performing unit tests
>> [PASSED] Tests successful
>>  Verifying build and tests for artifact:
>> spark-0.8.0-incubating-rc5.tgz 
>> ==> Running build
>> [PASSED] sbt build successful
>> [PASSED] Maven build successful
>> ==> Performing unit tests
>> [PASSED] Tests successful
>>
>> - Patrick
>>
>> On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell 
>> wrote:
>> > Please vote on releasing the following candidate as Apache Spark
>> > (incubating) version 0.8.0. This will be the first incubator release for
>> > Spark in Apache.
>> >
>> > The tag to be voted on is v0.8.0-incubating (commit d9e80d5):
>> > https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating
>> >
>> > The release files, including signatures, digests, etc can be found at:
>> > http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> >
>> https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/
>> >
>> > Please vote on releasing this package as Apache Spark 0.8.0-incubating!
>> > The vote is open until Thursday, September 19th at 05:00 UTC and passes
>> if
>> > a majority of at least 3 +1 IPMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 0.8.0-incubating
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see
>> > http://spark.incubator.apache.org/
>>


Re: Propose to Re-organize the scripts and configurations

2013-09-16 Thread shannie.huang
I like the idea of using Typesafe Config.

Nick, we'd be glad to work with you once we've gathered enough opinions and
come to a consensus on the approach.

On 2013-9-16, at 16:52, Nick Pentreath  wrote:

> There was another discussion on the old dev list about this:
> https://groups.google.com/forum/#!msg/spark-developers/GL2_DwAeh5s/9rwQ3iDa2t4J
> 
> I tend to agree with having configuration sitting in JSON (or properties
> files) and using the Typesafe Config library which can parse both.
> 
> Something I've used in my apps is along these lines:
> https://gist.github.com/MLnick/6578146
> 
> It's then easy to have the default config overridden from the CLI, for example:
> val conf = cliConf.withFallback(defaultConf)
> 
> I'd be happy to be involved in working on this if there is a consensus
> about the best approach.
> 
> N
> 
> 
> 
> 
> 
> On Mon, Sep 16, 2013 at 9:29 AM, Mike  wrote:
> 
>> Shane Huang wrote:
>>> we found the current organization of the scripts and configuration a
>>> bit confusing and inconvenient
>> 
>> ditto
>> 
>>> - Scripts
>> 
>> I wonder why the work of these scripts wasn't mostly done in Scala.
>> Seems roundabout to use Bash (or Python, in spark-perf) to calculate
>> shell environment variables that are then read back into Scala code.
>> 
>>> 1. Define a Configuration class which contains all the options
>>> available for Spark application. A Configuration instance can be
>>> de-/serialized from/to a json formatted file.
>>> 2. Each application (SparkContext) has one Configuration instance and
>>> it is initialized by the application which creates it (either read
>>> from file or passed from command line options or env SPARK_JAVA_OPTS).
>> 
>> Reminiscent of what Hibernate's been doing for the past decade.  Would
>> be nice if the Configuration were also exposed through an MBean or such
>> so that one can check its values with certainty.
>> 


Re: Propose to Re-organize the scripts and configurations

2013-09-16 Thread shannie.huang
Yeah, I tend to agree that, executable or not, these common utility scripts
may not need to be exposed to end users in the sbin and bin folders. But it
seems we must still make some of these scripts executable, since they are not
only called from other scripts but also invoked from the Scala source code.
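
For context, such a call from Scala looks roughly like the sketch below
(scala.sys.process is in the standard library; the script path is
illustrative). It also shows why the script must keep its executable bit, or
else be invoked explicitly through bash:

    import scala.sys.process._

    object ScriptCall {
      def main(args: Array[String]): Unit = {
        // Run the utility script and capture its stdout; !! throws on a
        // non-zero exit code, and fails outright if the file isn't executable.
        val classpath = Seq("./bin/compute-classpath.sh").!!.trim
        println("computed classpath: " + classpath)
      }
    }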


On 2013-9-17, at 8:35, Mike  wrote:

> Shane Huang wrote:
>> - low-level or internally used utility scripts, i.e. 
>> compute-classpath, spark-config, spark-class, spark-executor
> 
> I'd like to see the scripts broken out into shell functions in a common file 
> that gets "."-included in every script, where that makes sense.  
> Specifically, I gather that compute-classpath.sh is only ever run as a 
> subroutine, so there's no need to promote it as an executable.


Re: Propose to Re-organize the scripts and configurations

2013-09-16 Thread Mike
Shane Huang wrote:
> - low-level or internally used utility scripts, i.e. 
> compute-classpath, spark-config, spark-class, spark-executor

I'd like to see the scripts broken out into shell functions in a common file 
that gets "."-included in every script, where that makes sense.  
Specifically, I gather that compute-classpath.sh is only ever run as a 
subroutine, so there's no need to promote it as an executable.


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-16 Thread Reynold Xin
+1


--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org



On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell wrote:

> I also wrote an audit script [1] to verify various aspects of the
> release binaries and ran it on this RC. People are welcome to run this
> themselves, but I haven't tested it on other machines yet, and some of
> the Spark tests are very sensitive to the test environment :) Output
> is pasted below:
>
> [1] https://github.com/pwendell/spark-utils/blob/master/release_auditor.py
>
> -
>  Verifying download integrity for artifact:
> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
> [PASSED] Artifact signature verified.
> [PASSED] Artifact MD5 verified.
> [PASSED] Artifact SHA verified.
> [PASSED] Tarball contains CHANGES.txt file
> [PASSED] Tarball contains NOTICE file
> [PASSED] Tarball contains LICENSE file
> [PASSED] README file contains disclaimer
>  Verifying download integrity for artifact:
> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
> [PASSED] Artifact signature verified.
> [PASSED] Artifact MD5 verified.
> [PASSED] Artifact SHA verified.
> [PASSED] Tarball contains CHANGES.txt file
> [PASSED] Tarball contains NOTICE file
> [PASSED] Tarball contains LICENSE file
> [PASSED] README file contains disclaimer
>  Verifying download integrity for artifact:
> spark-0.8.0-incubating-rc5.tgz 
> [PASSED] Artifact signature verified.
> [PASSED] Artifact MD5 verified.
> [PASSED] Artifact SHA verified.
> [PASSED] Tarball contains CHANGES.txt file
> [PASSED] Tarball contains NOTICE file
> [PASSED] Tarball contains LICENSE file
> [PASSED] README file contains disclaimer
>  Verifying build and tests for artifact:
> spark-0.8.0-incubating-bin-cdh4-rc5.tgz 
> ==> Running build
> [PASSED] sbt build successful
> [PASSED] Maven build successful
> ==> Performing unit tests
> [PASSED] Tests successful
>  Verifying build and tests for artifact:
> spark-0.8.0-incubating-bin-hadoop1-rc5.tgz 
> ==> Running build
> [PASSED] sbt build successful
> [PASSED] Maven build successful
> ==> Performing unit tests
> [PASSED] Tests successful
>  Verifying build and tests for artifact:
> spark-0.8.0-incubating-rc5.tgz 
> ==> Running build
> [PASSED] sbt build successful
> [PASSED] Maven build successful
> ==> Performing unit tests
> [PASSED] Tests successful
>
> - Patrick
>
> On Sun, Sep 15, 2013 at 9:48 PM, Patrick Wendell 
> wrote:
> > Please vote on releasing the following candidate as Apache Spark
> > (incubating) version 0.8.0. This will be the first incubator release for
> > Spark in Apache.
> >
> > The tag to be voted on is v0.8.0-incubating (commit d9e80d5):
> > https://github.com/apache/incubator-spark/releases/tag/v0.8.0-incubating
> >
> > The release files, including signatures, digests, etc can be found at:
> > http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/files/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> >
> https://repository.apache.org/content/repositories/orgapachespark-051/org/apache/spark/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc5/docs/
> >
> > Please vote on releasing this package as Apache Spark 0.8.0-incubating!
> > The vote is open until Thursday, September 19th at 05:00 UTC and passes
> if
> > a majority of at least 3 +1 IPMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 0.8.0-incubating
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> > http://spark.incubator.apache.org/
>


Re: git commit: Hard code scala version in pom files.

2013-09-16 Thread Henry Saputra
Agree with this. A POM file reflects the state of the project at a
particular moment, so it is project metadata, but it should still be
treated as part of the source code.


- Henry

On Sun, Sep 15, 2013 at 9:30 PM, Konstantin Boudnik  wrote:
> Guys,
>
> actually, POM files are artifact metadata. Hence, IMO they should be treated
> as part of the source code, simply because a POM file should reflect what is
> going into the source code at the moment.
>
> Makefiles are fun ;), but it would be an interesting experiment to say the
> least.
>
> Cos
>
> On Sun, Sep 15, 2013 at 07:19PM, Jey Kottalam wrote:
>> I'm overall in favor of treating POM files as some kind of disgusting
>> object code that our build process just has to generate in some way,
>> instead of treating POM files as source code itself. An analogue would
>> be that we don't write machine code by hand, but instead use a
>> high-level language that is comparatively sane combined with a
>> compiler that accounts for all the bizarre quirks and details specific
>> to getting a working x86/ELF executable.
>>
>> Maybe we should write our POM files in a macro language, and have a
>> Makefile that configures and generates the actual POM files seen by
>> Maven? This would allow us to both be in full compliance with Maven's
>> demands yet wrest back control of the build system.
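
As a toy illustration of that macro-language idea (in Scala rather than make;
the template name pom.xml.in and the @SCALA_VERSION@ placeholder are both
hypothetical), the generation step could be as small as:

    import java.io.{File, PrintWriter}
    import scala.io.Source

    // Substitute build parameters into a hand-written template to produce
    // the static pom.xml that Maven insists on seeing.
    object PomGen {
      def main(args: Array[String]): Unit = {
        val scalaVersion = "2.9.3"
        val pom = Source.fromFile("pom.xml.in").mkString
          .replace("@SCALA_VERSION@", scalaVersion)
        val out = new PrintWriter(new File("pom.xml"))
        try out.write(pom) finally out.close()
      }
    }
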
>>
>> However: for the 0.8.0 release, I support just hardcoding the scala
>> version in the POM files that we ship to the Maven repositories, and
>> revisiting this later. I think Patrick is right that Maven is warning
>> about these issues for some legitimate reason that may only be
>> encountered by downstream users, so we should proactively address the
>> warnings unless we're certain they can be ignored. We could revert
>> commit a1e7e519 on master as soon as the final 0.8.0 build is shipped.
>>
>> -Jey
>>
>> On Sun, Sep 15, 2013 at 6:46 PM, Mark Hamstra  
>> wrote:
>> > Ah sorry, I've gotten so used to using ClearStory's poms (where we make
>> > quite a lot of use of such parameterization) that I lost track of exactly
>> > when Spark's maven build was changed to work in a similar way.
>> >
>> > This all revolves around a basic difference of opinion as to whether the
>> > thing that specifies how a project is built should be a fixed, static
>> > document or is more of a program itself or a parameterized function that
>> > drives the build and results in an artifact.  SBT is of the latter opinion,
>> > while Maven (at least with Maven 3) is going the other way.  That means
>> > that building idiomatic Scala artifacts (which expect things like
>> > cross-versioning support and artifactIds that include the Scala binary
>> > version that was used to create them) is somewhat at odds with the Maven
>> > philosophy.  Hard-coding artifactIds, versions, and whatever else Maven now
>> > requires to guarantee that a pom file be a fixed, repeatable build
>> > description works okay for a single build of an artifact; and a user of
>> > just that built artifact won't have to change behavior if the pom is no
>> > longer parameterized.  However, users who are not just interested in using
>> > pre-built artifacts but also in modifying, adding to or reusing the code do
>> > have to change their behavior if parameterized Maven builds disappear (yes,
>> > you have pointed out the state of affairs with the 0.6 and 0.7 releases;
>> > I'll point out that some of those making further use of the code have been
>> > using the current, not-yet-released poms for a good while.)
>> >
>> > Without some form of parameterized Maven builds, developers who now rely
>> > upon such parameterized builds will have to choose to fork the Apache poms
>> > and maintain their own parameterized build, or to repeatedly and manually
>> > edit static Apache pom files in order to change artifactIds and dependency
>> > versions (which is a frequent need when integrating Spark into a much
>> > larger and more complicated technology stack), or to switch over to using
>> > SBT in order to get parameterized builds (which, of course, would
>> > necessitate a lot of other changes, not all of them welcome.)  Archetypes
>> > or something similar seems like a way to satisfy Maven's new requirement
>> > for static build configurations while at the same time providing a
>> > parameterized way to generate that configuration or a modified version of
>> > it -- solving the problem by adding a layer of abstraction.
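
For reference, the cross-versioning idiom Mark describes is built into sbt:
with crossPaths on (the default), the published artifactId gains a
Scala-version suffix, like the spark-streaming_2.9.3 pom linked below. A
build.sbt sketch with illustrative version numbers:

    // build.sbt fragment (versions illustrative). The published artifactId
    // becomes e.g. spark-core_2.9.3.
    name := "spark-core"

    version := "0.8.0-incubating"

    scalaVersion := "2.9.3"

    // `sbt +publish` builds and publishes once per listed Scala version.
    crossScalaVersions := Seq("2.9.3", "2.10.0")
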
>> >
>> >
>> > On Sun, Sep 15, 2013 at 6:12 PM, Patrick Wendell  
>> > wrote:
>> >
>> >> Hey Mark,
>> >>
>> >> Could you describe a user whose behavior is changed by this, and how
>> >> it is changed? This commit actually brings 0.8 in line with the 0.7
>> >> and 0.6 branches, where the scala version is hard coded in the
>> >> released artifacts:
>> >>
>> >>
>> >> http://repo1.maven.org/maven2/org/spark-project/spark-streaming_2.9.3/0.7.3/spark-streaming_2.9.3-0.7.3.pom
>> >>
>> >> That seems to me to minimize the changes in user behavior as 

Re: Propose to Re-organize the scripts and configurations

2013-09-16 Thread Mike
> I wonder why the work of these scripts wasn't mostly done in Scala.  

After some sleep, I guess the answer's obvious: to set the "java" 
command line.


Re: git commit: Hard code scala version in pom files.

2013-09-16 Thread Gary Struthers
Maven provides two ways for users to customize properties in POM files. The 
first is resource filtering, which is disabled by default; when it's turned 
on, Maven searches src/main/resources for overriding properties in 
.properties and XML files. The second is a production profile that likewise 
overrides the POM's default properties: typically the POM carries the 
developer defaults, and the profile reads production deployment properties 
from .properties and XML files in src/main/resources.

http://books.sonatype.com/mvnref-book/reference/resource-filtering-sect-description.html

If you post the warnings, I'll look into them.

Gary Struthers


On Sep 15, 2013, at 7:19 PM, Jey Kottalam  wrote:

> I'm overall in favor of treating POM files as some kind of disgusting
> object code that our build process just has to generate in some way,
> instead of treating POM files as source code itself. An analogue would
> be that we don't write machine code by hand, but instead use a
> high-level language that is comparatively sane combined with a
> compiler that accounts for all the bizarre quirks and details specific
> to getting a working x86/ELF executable.
> 
> Maybe we should write our POM files in a macro language, and have a
> Makefile that configures and generates the actual POM files seen by
> Maven? This would allow us to both be in full compliance with Maven's
> demands yet wrest back control of the build system.
> 
> However: for the 0.8.0 release, I support just hardcoding the scala
> version in the POM files that we ship to the Maven repositories, and
> revisiting this later. I think Patrick is right that Maven is warning
> about these issues for some legitimate reason that may only be
> encountered by downstream users, so we should proactively address the
> warnings unless we're certain they can be ignored. We could revert
> commit a1e7e519 on master as soon as the final 0.8.0 build is shipped.
> 
> -Jey
> 
> On Sun, Sep 15, 2013 at 6:46 PM, Mark Hamstra  wrote:
>> Ah sorry, I've gotten so used to using ClearStory's poms (where we make
>> quite a lot of use of such parameterization) that I lost track of exactly
>> when Spark's maven build was changed to work in a similar way.
>> 
>> This all revolves around a basic difference of opinion as to whether the
>> thing that specifies how a project is built should be a fixed, static
>> document or is more of a program itself or a parameterized function that
>> drives the build and results in an artifact.  SBT is of the latter opinion,
>> while Maven (at least with Maven 3) is going the other way.  That means
>> that building idiomatic Scala artifacts (which expect things like
>> cross-versioning support and artifactIds that include the Scala binary
>> version that was used to create them) is somewhat at odds with the Maven
>> philosophy.  Hard-coding artifactIds, versions, and whatever else Maven now
>> requires to guarantee that a pom file be a fixed, repeatable build
>> description works okay for a single build of an artifact; and a user of
>> just that built artifact won't have to change behavior if the pom is no
>> longer parameterized.  However, users who are not just interested in using
>> pre-built artifacts but also in modifying, adding to or reusing the code do
>> have to change their behavior if parameterized Maven builds disappear (yes,
>> you have pointed out the state of affairs with the 0.6 and 0.7 releases;
>> I'll point out that some of those making further use of the code have been
>> using the current, not-yet-released poms for a good while.)
>> 
>> Without some form of parameterized Maven builds, developers who now rely
>> upon such parameterized builds will have to choose to fork the Apache poms
>> and maintain their own parameterized build, or to repeatedly and manually
>> edit static Apache pom files in order to change artifactIds and dependency
>> versions (which is a frequent need when integrating Spark into a much
>> larger and more complicated technology stack), or to switch over to using
>> SBT in order to get parameterized builds (which, of course, would
>> necessitate a lot of other changes, not all of them welcome.)  Archetypes
>> or something similar seems like a way to satisfy Maven's new requirement
>> for static build configurations while at the same time providing a
>> parameterized way to generate that configuration or a modified version of
>> it -- solving the problem by adding a layer of abstraction.
>> 
>> 
>> On Sun, Sep 15, 2013 at 6:12 PM, Patrick Wendell  wrote:
>> 
>>> Hey Mark,
>>> 
>>> Could you describe a user whose behavior is changed by this, and how
>>> it is changed? This commit actually brings 0.8 in line with the 0.7
>>> and 0.6 branches, where the scala version is hard coded in the
>>> released artifacts:
>>> 
>>> 
>>> http://repo1.maven.org/maven2/org/spark-project/spark-streaming_2.9.3/0.7.3/spark-streaming_2.9.3-0.7.3.pom
>>> 
>>> That seems to me to minimize the changes in user behavior as 

Re: Propose to Re-organize the scripts and configurations

2013-09-16 Thread Nick Pentreath
There was another discussion on the old dev list about this:
https://groups.google.com/forum/#!msg/spark-developers/GL2_DwAeh5s/9rwQ3iDa2t4J

I tend to agree with having configuration sitting in JSON (or properties
files) and using the Typesafe Config library which can parse both.

Something I've used in my apps is along these lines:
https://gist.github.com/MLnick/6578146

It's then easy to have the default config overridden from the CLI, for example:
val conf = cliConf.withFallback(defaultConf)
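
Expanded into a self-contained sketch (the keys and the dependency version
are illustrative; the library is com.typesafe:config):

    // Assumes e.g. libraryDependencies += "com.typesafe" % "config" % "1.2.1"
    import com.typesafe.config.{Config, ConfigFactory}

    object ConfDemo {
      def main(args: Array[String]): Unit = {
        // Application defaults; could equally come from a JSON/.conf file
        // via ConfigFactory.parseFile.
        val defaultConf: Config = ConfigFactory.parseString(
          """|spark.master = "local[2]"
             |spark.executor.memory = 1g""".stripMargin)

        // Pretend this fragment was assembled from command-line options.
        val cliConf: Config = ConfigFactory.parseString("spark.executor.memory = 4g")

        // CLI values win; anything unset falls back to the defaults.
        val conf = cliConf.withFallback(defaultConf)

        println(conf.getString("spark.master"))          // local[2] (default)
        println(conf.getString("spark.executor.memory")) // 4g (CLI override)
      }
    }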

I'd be happy to be involved in working on this if there is a consensus
about the best approach.

N





On Mon, Sep 16, 2013 at 9:29 AM, Mike  wrote:

> Shane Huang wrote:
> > we found the current organization of the scripts and configuration a
> > bit confusing and inconvenient
>
> ditto
>
> > - Scripts
>
> I wonder why the work of these scripts wasn't mostly done in Scala.
> Seems roundabout to use Bash (or Python, in spark-perf) to calculate
> shell environment variables that are then read back into Scala code.
>
> > 1. Define a Configuration class which contains all the options
> > available for Spark application. A Configuration instance can be
> > de-/serialized from/to a json formatted file.
> > 2. Each application (SparkContext) has one Configuration instance and
> > it is initialized by the application which creates it (either read
> > from file or passed from command line options or env SPARK_JAVA_OPTS).
>
> Reminiscent of what Hibernate's been doing for the past decade.  Would
> be nice if the Configuration were also exposed through an MBean or such
> so that one can check its values with certainty.
>


Re: Propose to Re-organize the scripts and configurations

2013-09-16 Thread Mike
Shane Huang wrote:
> we found the current organization of the scripts and configuration a 
> bit confusing and inconvenient

ditto

> - Scripts

I wonder why the work of these scripts wasn't mostly done in Scala.  
Seems roundabout to use Bash (or Python, in spark-perf) to calculate 
shell environment variables that are then read back into Scala code.

> 1. Define a Configuration class which contains all the options 
> available for Spark application. A Configuration instance can be 
> de-/serialized from/to a json formatted file.
> 2. Each application (SparkContext) has one Configuration instance and 
> it is initialized by the application which creates it (either read 
> from file or passed from command line options or env SPARK_JAVA_OPTS).

Reminiscent of what Hibernate's been doing for the past decade.  Would 
be nice if the Configuration were also exposed through an MBean or such 
so that one can check its values with certainty.
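
A toy Scala sketch of both halves of that suggestion (JSON round-trip plus
JMX exposure). The option names, JSON layout, and MBean name are all
illustrative, and the JSON handling is deliberately naive so the sketch has
no library dependency:

    import java.io.{File, PrintWriter}
    import java.lang.management.ManagementFactory
    import javax.management.ObjectName
    import scala.io.Source

    // JMX's standard-MBean convention: a <ClassName>MBean interface whose
    // getters become readable attributes (visible in jconsole, etc.).
    trait SparkConfigurationMBean {
      def getMaster: String
      def getExecutorMemory: String
    }

    // A minimal, flat configuration holder; the real proposal would cover
    // every Spark option.
    class SparkConfiguration(val master: String, val executorMemory: String)
        extends SparkConfigurationMBean {
      def getMaster: String = master
      def getExecutorMemory: String = executorMemory

      // Serialize to a flat JSON file.
      def save(path: String): Unit = {
        val json = "{\"master\": \"" + master +
          "\", \"executorMemory\": \"" + executorMemory + "\"}"
        val out = new PrintWriter(new File(path))
        try out.write(json) finally out.close()
      }
    }

    object SparkConfiguration {
      // Naive flat-JSON field extraction, enough for the sketch; a real
      // implementation would use a JSON library.
      def load(path: String): SparkConfiguration = {
        val text = Source.fromFile(path).mkString
        def field(key: String): String = {
          val start = text.indexOf("\"" + key + "\"")
          val open = text.indexOf('"', text.indexOf(':', start) + 1)
          text.substring(open + 1, text.indexOf('"', open + 1))
        }
        new SparkConfiguration(field("master"), field("executorMemory"))
      }
    }

    object ConfigDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConfiguration("local[4]", "2g")
        conf.save("/tmp/spark-conf.json")
        // Register as an MBean so the running values can be checked with
        // certainty, e.g. from jconsole.
        ManagementFactory.getPlatformMBeanServer.registerMBean(
          conf, new ObjectName("org.apache.spark:type=Configuration"))
        println(SparkConfiguration.load("/tmp/spark-conf.json").getMaster) // local[4]
      }
    }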