Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)

2014-01-19 Thread Reynold Xin
+1


On Sat, Jan 18, 2014 at 11:11 PM, Patrick Wendell pwend...@gmail.com wrote:

 I'll kick off the voting with a +1.

 On Sat, Jan 18, 2014 at 11:05 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark
  (incubating) version 0.9.0.
 
  A draft of the release notes along with the changes file is attached
  to this e-mail.
 
  The tag to be voted on is v0.9.0-incubating (commit 00c847a):
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=00c847af1d4be2fe5fad887a57857eead1e517dc
 
  The release files, including signatures, digests, etc can be found at:
  http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1003/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc2-docs/
 
  Please vote on releasing this package as Apache Spark 0.9.0-incubating!
 
  The vote is open until Wednesday, January 22, at 07:05 UTC
  and passes if a majority of at least 3 +1 PPMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 0.9.0-incubating
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.incubator.apache.org/



Re: Config properties broken in master

2014-01-19 Thread Mridul Muralidharan
Chanced upon spill-related config properties that exhibit the same pattern ...

- Mridul

On Sun, Jan 19, 2014 at 1:10 AM, Reynold Xin r...@databricks.com wrote:
 I also just went over the config options to see how pervasive this is. In
 addition to speculation, there is one more conflict of this kind:

 spark.locality.wait
 spark.locality.wait.node
 spark.locality.wait.process
 spark.locality.wait.rack


 spark.speculation
 spark.speculation.interval
 spark.speculation.multiplier
 spark.speculation.quantile


 On Sat, Jan 18, 2014 at 11:36 AM, Matei Zaharia 
 matei.zaha...@gmail.com wrote:

 This is definitely an important issue to fix. Instead of renaming
 properties, one solution would be to replace Typesafe Config with just
 reading Java system properties, and disable config files for this release.
 I kind of like that over renaming.

 Matei

 On Jan 18, 2014, at 11:30 AM, Mridul Muralidharan mri...@gmail.com
 wrote:

  Hi,
 
   Speculation was an example, there are others in spark which are
  affected by this ...
  Some of them have been around for a while, so will break existing
 code/scripts.
 
  Regards,
  Mridul
 
  On Sun, Jan 19, 2014 at 12:51 AM, Nan Zhu zhunanmcg...@gmail.com
 wrote:
  change spark.speculation to spark.speculation.switch?
 
  maybe we can restrict that all properties in Spark should be three
 levels
 
 
  On Sat, Jan 18, 2014 at 2:10 PM, Mridul Muralidharan mri...@gmail.com
 wrote:
 
  Hi,
 
   Unless I am mistaken, the change to using typesafe ConfigFactory has
  broken some of the system properties we use in spark.
 
  For example: if we have both
  -Dspark.speculation=true -Dspark.speculation.multiplier=0.95
  set, then the spark.speculation property is dropped.
 
  The rules of parseProperties actually document this clearly [1]
 
 
  I am not sure what the right fix here would be (other than replacing
  use of config that is).
 
  Any thoughts ?
  I would vote -1 for 0.9 to be released before this is fixed.
 
 
  Regards,
  Mridul
 
 
  [1]
 
 http://typesafehub.github.io/config/latest/api/com/typesafe/config/ConfigFactory.html#parseProperties%28java.util.Properties,%20com.typesafe.config.ConfigParseOptions%29
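
The conflict discussed in this thread can be illustrated with a small model. This is only a sketch in Python, not Typesafe Config's actual code: `parse_nested` and `lookup` are hypothetical helpers, and the rule they encode (when a key is both a plain value and the prefix of another key, the nested object wins and the plain value is dropped) is taken from the behavior reported above. The flat lookup at the end mirrors the suggestion of reading Java system properties directly.

```python
def parse_nested(props):
    """Build a nested tree from dotted keys, as a hierarchical config would.

    Assumption for this sketch: if a key is both a leaf and a prefix of
    another key, the nested object wins and the plain value is dropped.
    """
    tree = {}
    # Handle shorter keys first so the outcome does not depend on dict order.
    for key in sorted(props, key=lambda k: k.count(".")):
        node = tree
        parts = key.split(".")
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                # Any plain value stored here is replaced by an object.
                child = {}
                node[part] = child
            node = child
        if not isinstance(node.get(parts[-1]), dict):
            node[parts[-1]] = props[key]
    return tree


def lookup(tree, key):
    """Return the plain value at a dotted key, or None if absent or an object."""
    node = tree
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return None if isinstance(node, dict) else node


props = {
    "spark.speculation": "true",
    "spark.speculation.multiplier": "0.95",
}

tree = parse_nested(props)
nested_flag = lookup(tree, "spark.speculation")  # value was dropped
nested_multiplier = lookup(tree, "spark.speculation.multiplier")

# Flat lookup, as with plain system properties: both keys survive.
flat_flag = props.get("spark.speculation")
flat_multiplier = props.get("spark.speculation.multiplier")
```

In this model the nested tree loses `spark.speculation` as soon as `spark.speculation.multiplier` is present, while the flat map keeps both, which is why a flat lookup avoids renaming any property.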
 




Re: Config properties broken in master

2014-01-19 Thread Patrick Wendell
Hey Mridul this was patched and we cut a new release candidate. There
were several different config options which had a.b and a.b.c... they
should all work in the new RC.

On Sun, Jan 19, 2014 at 4:56 AM, Mridul Muralidharan mri...@gmail.com wrote:
 Chanced upon spill-related config properties that exhibit the same pattern ...

 - Mridul





Re: Config properties broken in master

2014-01-19 Thread Mridul Muralidharan
Oh great, just saw the PR from Matei ... for some odd reason, the dev
mails are reaching me horribly delayed.


Thanks,
Mridul

On Sun, Jan 19, 2014 at 10:35 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Mridul this was patched and we cut a new release candidate. There
 were several different config options which had a.b and a.b.c... they
 should all work in the new RC.





Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)

2014-01-19 Thread Patrick Wendell
This vote is cancelled in favor of rc3 - which fixes the YARN issue
Sandy ran into.

@taka - thanks for reporting that bug. It's not enough to block this
release however. Once a fix exists we can merge it into the 0.9 branch
and it will be in 0.9.1

On Sun, Jan 19, 2014 at 12:37 PM, Taka Shinagawa taka.epsi...@gmail.com wrote:
 I've found a problem with the cartesian method in PySpark and filed it
 as SPARK-1034:
 https://spark-project.atlassian.net/browse/SPARK-1034

 0.8.1 doesn't have this problem. In Scala, the cartesian method works fine.

 It would also be nice if SPARK-978 could be fixed, too.
 https://spark-project.atlassian.net/browse/SPARK-978

 Thanks,
 Taka


 On Sun, Jan 19, 2014 at 1:24 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Has anybody tested against YARN 2.2?  I tried it out against a
 pseudo-distributed cluster and ran into an issue I just filed as
 SPARK-1031 (https://spark-project.atlassian.net/browse/SPARK-1031).

 thanks,
 Sandy


 On Sun, Jan 19, 2014 at 12:55 AM, Reynold Xin r...@databricks.com wrote:

  +1
 
 



Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)

2014-01-19 Thread Patrick Wendell
Attempting to attach the release notes again (I think it may have been
blocked previously due to not having an extension).

On Sun, Jan 19, 2014 at 8:05 PM, Patrick Wendell pwend...@gmail.com wrote:
 I'll add my +1 as well

 On Sun, Jan 19, 2014 at 7:33 PM, Matei Zaharia matei.zaha...@gmail.com 
 wrote:
 +1

 Re-tested on Mac.

 Matei

 On Jan 19, 2014, at 7:09 PM, Tathagata Das tathagata.das1...@gmail.com 
 wrote:

 Starting off.
 +1


 On Sun, Jan 19, 2014 at 2:15 PM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark
 (incubating) version 0.9.0.

 A draft of the release notes along with the changes file is attached
 to this e-mail.

 The tag to be voted on is v0.9.0-incubating (commit a7760eff):

 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=a7760eff4ea6a474cab68896a88550f63bae8b0d

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1004/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc3-docs/

 Please vote on releasing this package as Apache Spark 0.9.0-incubating!

 The vote is open until Wednesday, January 22, at 22:15 UTC and passes
 if a majority of at least 3 +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.0-incubating
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.incubator.apache.org/


Spark 0.9.0 is a major release that adds significant new features. It updates 
Spark to Scala 2.10, simplifies high availability, and updates numerous 
components of the project. This release includes a first version of GraphX, a 
powerful new framework for graph processing that comes with a library of 
standard algorithms. In addition, Spark Streaming is now out of alpha, and 
includes significant optimizations and simplified high availability deployment.

### Scala 2.10 Support

Spark now runs on Scala 2.10, letting users benefit from the language and 
library improvements in this version.

### Configuration System

The new [SparkConf] class is now the preferred way to configure advanced 
settings on your SparkContext, though the previous approach of setting Java 
system properties still works. SparkConf is especially useful in tests to make 
sure properties don’t stay set across tests.
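
Why a per-object configuration helps in tests can be sketched with a toy model. The Python below does not use the Spark API: `GLOBAL_PROPS` stands in for Java system properties, and `Conf` with its `load_defaults` flag is a hypothetical class invented for this illustration.

```python
# Process-wide settings, standing in for Java system properties.
GLOBAL_PROPS = {}


class Conf:
    """Hypothetical per-instance settings holder (not the real SparkConf)."""

    def __init__(self, load_defaults=True):
        # Copy, so changes to this object never touch the global state.
        self._settings = dict(GLOBAL_PROPS) if load_defaults else {}

    def set(self, key, value):
        self._settings[key] = value
        return self  # allow chained calls

    def get(self, key, default=None):
        return self._settings.get(key, default)


def case_a_with_globals():
    # Sets a process-wide property and never cleans it up.
    GLOBAL_PROPS["spark.speculation"] = "true"


def case_b_with_globals():
    # A later test unexpectedly sees the value left behind by case A.
    return GLOBAL_PROPS.get("spark.speculation")


def case_a_with_conf():
    return Conf(load_defaults=False).set("spark.speculation", "true")


def case_b_with_conf():
    # A fresh object starts empty, so nothing leaks in from case A.
    return Conf(load_defaults=False).get("spark.speculation")


case_a_with_globals()
leaked = case_b_with_globals()

case_a_with_conf()
isolated = case_b_with_conf()
```

Here `leaked` carries the value set by the earlier case, while `isolated` is unset, which is the isolation property the paragraph above describes.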

### Spark Streaming Improvements

Spark Streaming is no longer alpha, and comes with simplified high availability 
and several optimizations.

* When running on a Spark standalone cluster with the [standalone cluster high 
availability mode], you can submit a Spark Streaming driver application to the 
cluster and have it automatically recovered if either the driver or the cluster 
master crashes.
* Windowed operators have been sped up by 30-50%.
* Spark Streaming’s input source plugins (e.g. for Twitter, Kafka and Flume) 
are now separate projects, making it easier to pull in only the dependencies 
you need.
* A new StreamingListener interface has been added for monitoring statistics 
about the streaming computation.
* A few aspects of the API have been improved:
  * `DStream` and `PairDStream` classes have been moved from 
`org.apache.spark.streaming` to `org.apache.spark.streaming.dstream` to keep 
them consistent with `org.apache.spark.rdd.RDD`.
  * `DStream.foreach` has been renamed to `DStream.foreachRDD` to make it 
explicit that it works on every RDD, not every element.
  * `StreamingContext.awaitTermination()` allows you to wait for context 
shutdown and catch any exception that occurs in the streaming computation.
  * `StreamingContext.stop()` now allows stopping of StreamingContext without 
stopping the underlying SparkContext.
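
The distinction behind the `foreachRDD` name can be shown with a toy model. This is plain Python rather than Spark Streaming: `ToyDStream` and `foreach_rdd` are hypothetical names, and each batch is represented as an ordinary list standing in for an RDD.

```python
class ToyDStream:
    """Toy stand-in for a discretized stream: a sequence of batches."""

    def __init__(self, batches):
        self.batches = [list(b) for b in batches]

    def foreach_rdd(self, func):
        # The function is called once per batch and receives the whole batch.
        for batch in self.batches:
            func(batch)


stream = ToyDStream([[1, 2, 3], [4, 5], []])

# The callback sees batches, so it runs three times, once per batch.
batch_sizes = []
stream.foreach_rdd(lambda batch: batch_sizes.append(len(batch)))

# Per-element work has to be done explicitly inside the callback.
elements = []
stream.foreach_rdd(lambda batch: elements.extend(batch))
```

The callback runs even for the empty third batch, which a per-element operation would never touch.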

### GraphX Alpha

GraphX is a new API for graph processing that uses recent advances in 
graph-parallel computation. It lets you build a graph within a Spark program 
using the standard Spark operators, then process it with new graph operators 
that are optimized for distributed computation. It includes basic 
transformations, a Pregel API for iterative computation, and a standard library 
of graph loaders and analytics algorithms. By offering these features within 
the Spark engine, GraphX can significantly speed up processing tasks compared 
to workflows that use different engines.

GraphX features in this release include:

* Building graphs from arbitrary Spark RDDs
* Basic operations to transform graphs or extract subgraphs
* An optimized Pregel API that takes advantage of graph partitioning and 
indexing
* Standard algorithms including PageRank, connected components, strongly 
connected components, SVD++, and triangle counting
* Interactive use from the Spark shell
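
To give a feel for the vertex-centric style that a Pregel API supports, here is a minimal single-machine sketch in Python. It is not GraphX code: `pregel`, `connected_components`, and the parameter names are this sketch's own, there is no partitioning or indexing, and every edge is scanned in each superstep for simplicity.

```python
def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg,
           max_iterations=20):
    """Toy vertex-centric loop in the spirit of Pregel.

    vertices: dict mapping vertex id -> state
    edges: list of (src, dst) pairs
    """
    # First step: every vertex receives the initial message.
    state = {v: vprog(v, s, initial_msg) for v, s in vertices.items()}
    for _ in range(max_iterations):
        inbox = {}
        for src, dst in edges:
            for target, msg in send_msg(src, state[src], dst, state[dst]):
                # Combine messages addressed to the same vertex.
                if target in inbox:
                    inbox[target] = merge_msg(inbox[target], msg)
                else:
                    inbox[target] = msg
        if not inbox:
            break  # no messages in flight: converged
        for v, msg in inbox.items():
            state[v] = vprog(v, state[v], msg)
    return state


def connected_components(vertex_ids, edges):
    """Label each vertex with the smallest vertex id in its component."""

    def vprog(vid, label, msg):
        return min(label, msg)

    def send_msg(src, src_label, dst, dst_label):
        # Push the smaller label across the edge, in either direction.
        if src_label < dst_label:
            return [(dst, src_label)]
        if dst_label < src_label:
            return [(src, dst_label)]
        return []

    vertices = {v: v for v in vertex_ids}
    return pregel(vertices, edges, float("inf"), vprog, send_msg, min)


labels = connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (5, 4)])
```

Vertices 1, 2 and 3 end up sharing label 1, vertices 4 and 5 share label 4, and the isolated vertex 6 keeps its own id. A distributed implementation would exchange the same kind of messages between partitions instead of looping over a local edge list.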

GraphX 

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)

2014-01-19 Thread Henry Saputra
Hi Patrick, quick question, where are you planning to add the release notes?
I don't think it is part of the source, is it?

- Henry

On Sun, Jan 19, 2014 at 8:41 PM, Patrick Wendell pwend...@gmail.com wrote:
 Attempting to attach the release notes again (I think it may have been
 blocked previously due to not having an extension).




Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)

2014-01-19 Thread Patrick Wendell
Eventually the notes get posted on the apache website. I attached them
to this e-mail so that people can get a sense of what is in the
release before they vote on it.

On Sun, Jan 19, 2014 at 9:57 PM, Henry Saputra henry.sapu...@gmail.com wrote:
 Hi Patrick, quick question, where are you planning to add the release notes?
 I don't think it is part of the source, is it?

 - Henry




Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)

2014-01-19 Thread Henry Saputra
Ah yes, makes sense, thanks!

- Henry

On Sun, Jan 19, 2014 at 10:01 PM, Patrick Wendell pwend...@gmail.com wrote:
 Eventually the notes get posted on the apache website. I attached them
 to this e-mail so that people can get a sense of what is in the
 release before they vote on it.
