Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Henry Saputra
Hi Sandy,

Just curious if the Vote is for rc5 or rc6? Gmail shows me that you
replied to the rc5 thread.

Thanks,

- Henry

On Wed, May 14, 2014 at 1:28 PM, Sandy Ryza  wrote:
> +1 (non-binding)
>
> * Built the release from source.
> * Compiled Java and Scala apps that interact with HDFS against it.
> * Ran them in local mode.
> * Ran them against a pseudo-distributed YARN cluster in both yarn-client
> mode and yarn-cluster mode.
>
>
> On Tue, May 13, 2014 at 9:09 PM, witgo  wrote:
>
>> You need to set:
>> spark.akka.frameSize 5
>> spark.default.parallelism 1
>>
>>
>>
>>
>>
>> -- Original --
>> From:  "Madhu";;
>> Date:  Wed, May 14, 2014 09:15 AM
>> To:  "dev";
>>
>> Subject:  Re: [VOTE] Release Apache Spark 1.0.0 (rc5)
>>
>>
>>
>> I just built rc5 on Windows 7 and tried to reproduce the problem described
>> in
>>
>> https://issues.apache.org/jira/browse/SPARK-1712
>>
>> It works on my machine:
>>
>> 14/05/13 21:06:47 INFO DAGScheduler: Stage 1 (sum at <console>:17) finished
>> in 4.548 s
>> 14/05/13 21:06:47 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
>> have all completed, from pool
>> 14/05/13 21:06:47 INFO SparkContext: Job finished: sum at <console>:17,
>> took
>> 4.814991993 s
>> res1: Double = 5.05E11
>>
>> I used all defaults, no config files were changed.
>> Not sure if that makes a difference...
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-0-0-rc5-tp6542p6560.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
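Witgo's settings quoted above can also be applied programmatically; a minimal
sketch against the 1.0-era SparkConf API (the app name and master below are
placeholders, not from the thread):

import org.apache.spark.{SparkConf, SparkContext}

// The two properties suggested above for reproducing SPARK-1712.
val conf = new SparkConf()
  .setAppName("spark-1712-repro")    // placeholder
  .setMaster("local[2]")             // placeholder
  .set("spark.akka.frameSize", "5")
  .set("spark.default.parallelism", "1")

val sc = new SparkContext(conf)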


can RDD be shared across multiple spark applications?

2014-05-16 Thread qingyang li



Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
I'll start the voting with a +1.

On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.0.0!
>
> This patch has minor documentation changes and fixes on top of rc6.
>
> The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc7/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1015
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/
>
> Please vote on releasing this package as Apache Spark 1.0.0!
>
> The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == API Changes ==
> We welcome users to compile Spark applications against 1.0. There are
> a few API changes in this release. Here are links to the associated
> upgrade guides - user facing changes have been kept as small as
> possible.
>
> changes to ML vector specification:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>
> changes to the Java API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>
> changes to the streaming API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>
> changes to the GraphX API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>
> coGroup and related functions now return Iterable[T] instead of Seq[T]
> ==> Call toSeq on the result to restore the old behavior
>
> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> ==> Call toSeq on the result to restore old behavior
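The two migration notes above are mechanical; a minimal sketch of both changes
(the RDDs, sample data, and app setup are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD functions in 1.0

object MigrationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local"))
    val rdd1 = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
    val rdd2 = sc.parallelize(Seq(1 -> "x", 2 -> "y"))

    // coGroup now returns Iterable[T] per key instead of Seq[T];
    // call toSeq on each group to restore the 0.9 behavior.
    val asBefore = rdd1.cogroup(rdd2).mapValues { case (vs, ws) => (vs.toSeq, ws.toSeq) }
    asBefore.collect().foreach(println)

    // SparkContext.jarOfClass now returns Option[String] instead of Seq[String];
    // toSeq restores the old shape.
    val jars: Seq[String] = SparkContext.jarOfClass(this.getClass).toSeq

    sc.stop()
  }
}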


(test)

2014-05-16 Thread Andrew Or
Apache has been having some problems lately. Do you guys see this message?


Re: mllib vector templates

2014-05-16 Thread Xiangrui Meng
I submitted a PR for standardizing the text format for vectors and
labeled data: https://github.com/apache/spark/pull/685

Once it gets merged, saveAsTextFile and loading should be consistent.
I didn't choose LibSVM as the default format for two reasons:

1) It doesn't contain feature dimension info in the record. We need to
scan the dataset to get that info.
2) It saves index:value tuples. Putting indices together can help data
compression. Same for values if there are many binary features.

Best,
Xiangrui
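As a concrete illustration of reason 1): a LibSVM record has the form
"label index:value index:value ..." with 1-based indices, and carries no
overall feature dimension, so a full pass over the data is needed to discover
it. A minimal parser sketch (illustrative only, not code from the PR):

def parseLibSVMLine(line: String): (Double, Array[Int], Array[Double]) = {
  val parts = line.trim.split("\\s+").toSeq
  val label = parts.head.toDouble
  val (indices, values) = parts.tail.map { item =>
    val kv = item.split(':')
    (kv(0).toInt - 1, kv(1).toDouble)   // convert 1-based to 0-based index
  }.unzip
  (label, indices.toArray, values.toArray)
}

// e.g. parseLibSVMLine("1.0 3:4.5 10:0.25") yields label 1.0,
// indices Array(2, 9), values Array(4.5, 0.25)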

On Wed, May 7, 2014 at 10:25 PM, Debasish Das  wrote:
> Hi,
>
> I see ALS is still using Array[Int], but for other mllib algorithms we moved
> to Vector[Double] so that they can support either dense or sparse formats...
>
> ALS can stay with Array[Int] due to the Netflix format for input datasets,
> which is well defined, but it would help if we moved ALS to Vector[Double]
> as well... that way all algorithms will be consistent...
>
> The second issue is that toString on SparseVector does not write libsvm
> format but something not very generic... can we change
> SparseVector.toString to write libsvm output? I am dumping a sample of the
> dataset to see how mllib glm compares with the glmnet-R package for QoR...
>
> Thanks.
> Deb
>
> On Mon, May 5, 2014 at 4:05 PM, David Hall  wrote:
>>
>>> On Mon, May 5, 2014 at 3:40 PM, DB Tsai  wrote:
>>>
>>> > David,
>>> >
>>> > Could we use Int, Long, Float as the data feature spaces, and Double for
>>> > optimizer?
>>> >
>>>
>>> Yes. Breeze doesn't allow operations on mixed types, so you'd need to
>>> convert the double vectors to Floats if you wanted, e.g. dot product with
>>> the weights vector.
>>>
>>> You might also be interested in FeatureVector, which is just a wrapper
>>> around Array[Int] that emulates an indicator vector. It supports dot
>>> products, axpy, etc.
>>>
>>> -- David
>>>
>>>
>>> >
>>> >
>>> > Sincerely,
>>> >
>>> > DB Tsai
>>> > ---
>>> > My Blog: https://www.dbtsai.com
>>> > LinkedIn: https://www.linkedin.com/in/dbtsai
>>> >
>>> >
>>> > On Mon, May 5, 2014 at 3:06 PM, David Hall 
>>> wrote:
>>> >
>>> > > Lbfgs and other optimizers would not work immediately, as they require
>>> > > vector spaces over double. Otherwise it should work.
>>> > > On May 5, 2014 3:03 PM, "DB Tsai"  wrote:
>>> > >
>>> > > > Breeze could take any type (Int, Long, Double, and Float) in the
>>> matrix
>>> > > > template.
>>> > > >
>>> > > >
>>> > > > Sincerely,
>>> > > >
>>> > > > DB Tsai
>>> > > > ---
>>> > > > My Blog: https://www.dbtsai.com
>>> > > > LinkedIn: https://www.linkedin.com/in/dbtsai
>>> > > >
>>> > > >
>>> > > > On Mon, May 5, 2014 at 2:56 PM, Debasish Das <
>>> debasish.da...@gmail.com
>>> > > > >wrote:
>>> > > >
>>> > > > > Is this a breeze issue or breeze can take templates on float /
>>> > double ?
>>> > > > >
>>> > > > > If breeze can take templates then it is a minor fix for
>>> Vectors.scala
>>> > > > right
>>> > > > > ?
>>> > > > >
>>> > > > > Thanks.
>>> > > > > Deb
>>> > > > >
>>> > > > >
>>> > > > > On Mon, May 5, 2014 at 2:45 PM, DB Tsai 
>>> wrote:
>>> > > > >
>>> > > > > > +1  Would be nice if we could use different types in Vector.
>>> > > > > >
>>> > > > > >
>>> > > > > > Sincerely,
>>> > > > > >
>>> > > > > > DB Tsai
>>> > > > > > ---
>>> > > > > > My Blog: https://www.dbtsai.com
>>> > > > > > LinkedIn: https://www.linkedin.com/in/dbtsai
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, May 5, 2014 at 2:41 PM, Debasish Das <
>>> > > debasish.da...@gmail.com
>>> > > > > > >wrote:
>>> > > > > >
>>> > > > > > > Hi,
>>> > > > > > >
>>> > > > > > > Why is mllib vector using double as default?
>>> > > > > > >
>>> > > > > > > /**
>>> > > > > > >
>>> > > > > > >  * Represents a numeric vector, whose index type is Int and
>>> value
>>> > > > type
>>> > > > > is
>>> > > > > > > Double.
>>> > > > > > >
>>> > > > > > >  */
>>> > > > > > >
>>> > > > > > > trait Vector extends Serializable {
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >   /**
>>> > > > > > >
>>> > > > > > >* Size of the vector.
>>> > > > > > >
>>> > > > > > >*/
>>> > > > > > >
>>> > > > > > >   def size: Int
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >   /**
>>> > > > > > >
>>> > > > > > >* Converts the instance to a double array.
>>> > > > > > >
>>> > > > > > >*/
>>> > > > > > >
>>> > > > > > >   def toArray: Array[Double]
>>> > > > > > >
>>> > > > > > > Don't we need a template on float/double ? This will give us
>>> > memory
>>> > > > > > > savings...
>>> > > > > > >
>>> > > > > > > Thanks.
>>> > > > > > >
>>> > > > > > > Deb
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>


[RESULT][VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Patrick Wendell
This vote is cancelled in favor of rc7.

On Wed, May 14, 2014 at 1:02 PM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.0.0!
>
> This patch has a few minor fixes on top of rc5. I've also built the
> binary artifacts with Hive support enabled so people can test this
> configuration. When we release 1.0 we might just release both vanilla
> and Hive-enabled binaries.
>
> The tag to be voted on is v1.0.0-rc6 (commit 54133a):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=54133abdce0246f6643a1112a5204afb2c4caa82
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc6/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachestratos-1011
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc6-docs/
>
> Please vote on releasing this package as Apache Spark 1.0.0!
>
> The vote is open until Saturday, May 17, at 20:58 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == API Changes ==
> We welcome users to compile Spark applications against 1.0. There are
> a few API changes in this release. Here are links to the associated
> upgrade guides - user facing changes have been kept as small as
> possible.
>
> changes to ML vector specification:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>
> changes to the Java API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>
> changes to the streaming API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>
> changes to the GraphX API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>
> coGroup and related functions now return Iterable[T] instead of Seq[T]
> ==> Call toSeq on the result to restore the old behavior
>
> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> ==> Call toSeq on the result to restore old behavior


Re: (test)

2014-05-16 Thread Ted Yu
Yes.


On Thu, May 15, 2014 at 10:34 AM, Andrew Or  wrote:

> Apache has been having some problems lately. Do you guys see this message?
>


Re: (test)

2014-05-16 Thread DB Tsai
Yes.
On May 16, 2014 8:39 AM, "Andrew Or"  wrote:

> Apache has been having some problems lately. Do you guys see this message?
>


Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Henry Saputra
Hi Patrick,

Just want to make sure that the VOTE for rc6 is also cancelled?


Thanks,

Henry

On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell  wrote:
> I'll start the voting with a +1.
>
> On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell  wrote:
>> Please vote on releasing the following candidate as Apache Spark version 
>> 1.0.0!
>>
>> This patch has minor documentation changes and fixes on top of rc6.
>>
>> The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-1.0.0-rc7/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1015
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.0.0!
>>
>> The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.0.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> == API Changes ==
>> We welcome users to compile Spark applications against 1.0. There are
>> a few API changes in this release. Here are links to the associated
>> upgrade guides - user facing changes have been kept as small as
>> possible.
>>
>> changes to ML vector specification:
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>>
>> changes to the Java API:
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>>
>> changes to the streaming API:
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>>
>> changes to the GraphX API:
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>>
>> coGroup and related functions now return Iterable[T] instead of Seq[T]
>> ==> Call toSeq on the result to restore the old behavior
>>
>> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
>> ==> Call toSeq on the result to restore old behavior


Re: (test)

2014-05-16 Thread Nishkam Ravi
Yes.


On Fri, May 16, 2014 at 8:40 AM, DB Tsai  wrote:

> Yes.
> On May 16, 2014 8:39 AM, "Andrew Or"  wrote:
>
> > Apache has been having some problems lately. Do you guys see this
> message?
> >
>


Scala examples for Spark do not work as written in documentation

2014-05-16 Thread GlennStrycker
On the webpage http://spark.apache.org/examples.html, there is an example
written as

val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

This does not execute in Spark, which gives me an error:
<console>:2: error: illegal start of simple expression
 val x = Math.random()
 ^

If I rewrite the query slightly, adding in {}, it works:

val count = spark.parallelize(1 to 1).map(i =>
   {
   val x = Math.random()
   val y = Math.random()
   if (x*x + y*y < 1) 1 else 0
   }
).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / 1.0)





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Aaron Davidson
It was, but due to the apache infra issues, some may not have received the
email yet...

On Fri, May 16, 2014 at 10:48 AM, Henry Saputra wrote:

> Hi Patrick,
>
> Just want to make sure that the VOTE for rc6 is also cancelled?
>
>
> Thanks,
>
> Henry
>
> On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell 
> wrote:
> > I'll start the voting with a +1.
> >
> > On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell 
> wrote:
> >> Please vote on releasing the following candidate as Apache Spark
> version 1.0.0!
> >>
> >> This patch has minor documentation changes and fixes on top of rc6.
> >>
> >> The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
> >>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7/
> >>
> >> Release artifacts are signed with the following key:
> >> https://people.apache.org/keys/committer/pwendell.asc
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1015
> >>
> >> The documentation corresponding to this release can be found at:
> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/
> >>
> >> Please vote on releasing this package as Apache Spark 1.0.0!
> >>
> >> The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
> >> majority of at least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Spark 1.0.0
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see
> >> http://spark.apache.org/
> >>
> >> == API Changes ==
> >> We welcome users to compile Spark applications against 1.0. There are
> >> a few API changes in this release. Here are links to the associated
> >> upgrade guides - user facing changes have been kept as small as
> >> possible.
> >>
> >> changes to ML vector specification:
> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
> >>
> >> changes to the Java API:
> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
> >>
> >> changes to the streaming API:
> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
> >>
> >> changes to the GraphX API:
> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
> >>
> >> coGroup and related functions now return Iterable[T] instead of Seq[T]
> >> ==> Call toSeq on the result to restore the old behavior
> >>
> >> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> >> ==> Call toSeq on the result to restore old behavior
>


Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Patrick Wendell
Hey Everyone,

Just a heads up - I've sent other release candidates to the list, but
they appear to be getting swallowed (i.e. they are not on nabble). I
think there is an issue with Apache mail servers.

I'm going to keep trying... if you get duplicate e-mails I apologize in advance.

On Thu, May 15, 2014 at 10:23 AM, Patrick Wendell  wrote:
> Thanks for your feedback. Since it's not a regression, it won't block
> the release.
>
> On Wed, May 14, 2014 at 12:17 AM, witgo  wrote:
>> SPARK-1817 will cause users to get incorrect results, and RDD.zip is common
>> usage. This should be the highest priority. I think we should fix the bug,
>> and should also test the previous release.
>> -- Original --
>> From:  "Patrick Wendell";;
>> Date:  Wed, May 14, 2014 03:02 PM
>> To:  "dev@spark.apache.org";
>>
>> Subject:  Re: [VOTE] Release Apache Spark 1.0.0 (rc5)
>>
>>
>>
>> Hey @witgo - those bugs are not severe enough to block the release,
>> but it would be nice to get them fixed.
>>
>> At this point we are focused on severe bugs with an immediate fix, or
>> regressions from previous versions of Spark. Anything that misses this
>> release will get merged into the branch-1.0 branch and make it into
>> the 1.0.1 release, so people will have access to it.
>>
>> On Tue, May 13, 2014 at 5:32 PM, witgo  wrote:
>>> -1
>>> The following bug should be fixed:
>>> https://issues.apache.org/jira/browse/SPARK-1817
>>> https://issues.apache.org/jira/browse/SPARK-1712
>>>
>>>
>>> -- Original --
>>> From:  "Patrick Wendell";;
>>> Date:  Wed, May 14, 2014 04:07 AM
>>> To:  "dev@spark.apache.org";
>>>
>>> Subject:  Re: [VOTE] Release Apache Spark 1.0.0 (rc5)
>>>
>>>
>>>
>>> Hey all - there were some earlier RC's that were not presented to the
>>> dev list because issues were found with them. Also, there seems to be
>>> some issues with the reliability of the dev list e-mail. Just a heads
>>> up.
>>>
>>> I'll lead with a +1 for this.
>>>
>>> On Tue, May 13, 2014 at 8:07 AM, Nan Zhu  wrote:
 just curious, where is rc4 VOTE?

 I searched my gmail but didn't find it.




 On Tue, May 13, 2014 at 9:49 AM, Sean Owen  wrote:

> On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell 
> wrote:
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-1.0.0-rc5/
>
> Good news is that the sigs, MD5 and SHA are all correct.
>
> Tiny note: the Maven artifacts use SHA1, while the binary artifacts
> use SHA512, which took me a bit of head-scratching to figure out.
>
> If another RC comes out, I might suggest making it SHA1 everywhere?
> But there is nothing wrong with these signatures and checksums.
>
> Now to look at the contents...
>


Re: (test)

2014-05-16 Thread Reynold Xin
I didn't see the original message, but only a reply.


On Fri, May 16, 2014 at 10:38 AM, Nishkam Ravi  wrote:

> Yes.
>
>
> On Fri, May 16, 2014 at 8:40 AM, DB Tsai  wrote:
>
> > Yes.
> > On May 16, 2014 8:39 AM, "Andrew Or"  wrote:
> >
> > > Apache has been having some problems lately. Do you guys see this
> > message?
> > >
> >
>


Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread Reynold Xin
Thanks for pointing it out. We should update the website to fix the code.

val count = spark.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)



On Fri, May 16, 2014 at 9:41 AM, GlennStrycker wrote:

> On the webpage http://spark.apache.org/examples.html, there is an example
> written as
>
> val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
>   val x = Math.random()
>   val y = Math.random()
>   if (x*x + y*y < 1) 1 else 0
> ).reduce(_ + _)
> println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
>
> This does not execute in Spark, which gives me an error:
> <console>:2: error: illegal start of simple expression
>  val x = Math.random()
>  ^
>
> If I rewrite the query slightly, adding in {}, it works:
>
> val count = spark.parallelize(1 to 1).map(i =>
>{
>val x = Math.random()
>val y = Math.random()
>if (x*x + y*y < 1) 1 else 0
>}
> ).reduce(_ + _)
> println("Pi is roughly " + 4.0 * count / 1.0)
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>


Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread Mark Hamstra
Actually, the better way to write the multi-line closure would be:

val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)


On Fri, May 16, 2014 at 9:41 AM, GlennStrycker wrote:

> On the webpage http://spark.apache.org/examples.html, there is an example
> written as
>
> val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
>   val x = Math.random()
>   val y = Math.random()
>   if (x*x + y*y < 1) 1 else 0
> ).reduce(_ + _)
> println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
>
> This does not execute in Spark, which gives me an error:
> <console>:2: error: illegal start of simple expression
>  val x = Math.random()
>  ^
>
> If I rewrite the query slightly, adding in {}, it works:
>
> val count = spark.parallelize(1 to 1).map(i =>
>{
>val x = Math.random()
>val y = Math.random()
>if (x*x + y*y < 1) 1 else 0
>}
> ).reduce(_ + _)
> println("Pi is roughly " + 4.0 * count / 1.0)
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>


reduce only removes duplicates, cannot be arbitrary function

2014-05-16 Thread GlennStrycker
I am attempting to write a mapreduce job on a graph object to take an edge
list and return a new edge list.  Unfortunately I find that the current
function is

def reduce(f: (T, T) => T): T

not

def reduce(f: (T1, T2) => T3): T


I see this because the following 2 commands give different results for the
final number, which should be the same (tempMappedRDD is a MappedRDD of the
form (Edge,1), and I found that the A and B here are (1,4) and (7,3))

tempMappedRDD.reduce( (A,B) => (Edge(A._1.srcId, A._1.dstId,
A._1.dstId.toInt), 1) )  // (Edge(1,4,4),1)
tempMappedRDD.reduce( (A,B) => (Edge(A._1.srcId, B._1.dstId,
A._1.dstId.toInt), 1) )  // (Edge(1,3,3),1)

why is the 3rd digit above a '3' in the second line, and not a '4'?  Does it
have something to do with toInt?

The really weird thing is that this happens only for A, since the following
commands work correctly:

tempMappedRDD.reduce( (A,B) => (Edge(B._1.srcId, B._1.dstId,
B._1.dstId.toInt), 1) )  // (Edge(7,3,3),1)
tempMappedRDD.reduce( (A,B) => (Edge(B._1.srcId, A._1.dstId,
B._1.dstId.toInt), 1) )  // (Edge(7,4,3),1)
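A note on the behavior above: RDD.reduce requires the function to be
commutative and associative, because Spark combines elements and partial
results in a partition-dependent order. A function that ignores one of its
arguments (or treats them asymmetrically) is not associative, so there is no
well-defined "A" and "B": intermediate results are fed back in as either
argument, and the outcome depends on partitioning and task completion order.
A minimal sketch (not from the original thread; run in the Spark shell,
where sc is predefined):

val nums = sc.parallelize(1 to 4, 2)

// Commutative and associative: always 10, whatever the partitioning.
nums.reduce(_ + _)

// Ignores its second argument: not associative. The result is *some*
// element, chosen by combination order, not a well-defined "first".
nums.reduce((a, b) => a)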




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/reduce-only-removes-duplicates-cannot-be-arbitrary-function-tp6606.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread Mark Hamstra
Sorry, looks like an extra line got inserted in there.  One more try:

val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)



On Fri, May 16, 2014 at 12:36 PM, Mark Hamstra wrote:

> Actually, the better way to write the multi-line closure would be:
>
> val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
>
>   val x = Math.random()
>   val y = Math.random()
>   if (x*x + y*y < 1) 1 else 0
> }.reduce(_ + _)
>
>
> On Fri, May 16, 2014 at 9:41 AM, GlennStrycker 
> wrote:
>
>> On the webpage http://spark.apache.org/examples.html, there is an example
>> written as
>>
>> val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
>>   val x = Math.random()
>>   val y = Math.random()
>>   if (x*x + y*y < 1) 1 else 0
>> ).reduce(_ + _)
>> println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
>>
>> This does not execute in Spark, which gives me an error:
>> <console>:2: error: illegal start of simple expression
>>  val x = Math.random()
>>  ^
>>
>> If I rewrite the query slightly, adding in {}, it works:
>>
>> val count = spark.parallelize(1 to 1).map(i =>
>>{
>>val x = Math.random()
>>val y = Math.random()
>>if (x*x + y*y < 1) 1 else 0
>>}
>> ).reduce(_ + _)
>> println("Pi is roughly " + 4.0 * count / 1.0)
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>>
>
>


Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread GlennStrycker
Why does the reduce function only work on sums of keys of the same type, and
not support other functional forms?

I am having trouble in another example where instead of 1s and 0s, the
output of the map function is something like A=(1,2) and B=(3,4).  I need a
reduce function that can return something complicated based on reduce( (A,B)
=> (arbitrary fcn1 of A and B, arbitrary fcn2 of A and B) ), but I am only
getting reduce( (A,B) => (arbitrary fcn1 of A, arbitrary fcn2 of A) ).

See
http://apache-spark-developers-list.1001551.n3.nabble.com/reduce-only-removes-duplicates-cannot-be-arbitrary-function-td6606.html
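reduce does in fact support arbitrary per-component functions over tuples;
the constraint being hit above is that the combined (T, T) => T function must
be commutative and associative, not the tuple shape. A minimal sketch (not
from the original thread; run in the Spark shell, where sc is predefined):

val pairs = sc.parallelize(Seq((1, 2), (3, 4), (5, 6)))

// fcn1 = sum of first components, fcn2 = max of second components.
// Both are commutative and associative, so the result is deterministic.
val combined = pairs.reduce((a, b) => (a._1 + b._1, math.max(a._2, b._2)))
// combined == (9, 6)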




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593p6607.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


[VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Patrick Wendell
[Due to the ASF e-mail outage, I'm not sure if anyone will actually receive this.]

Please vote on releasing the following candidate as Apache Spark version 1.0.0!
This has only minor changes on top of rc7.

The tag to be voted on is v1.0.0-rc8 (commit 80eea0f):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=80eea0f111c06260ffaa780d2f3f7facd09c17bc

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.0.0-rc8/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1016/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/

Please vote on releasing this package as Apache Spark 1.0.0!

The vote is open until Monday, May 19, at 10:15 UTC and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== API Changes ==
We welcome users to compile Spark applications against 1.0. There are
a few API changes in this release. Here are links to the associated
upgrade guides - user facing changes have been kept as small as
possible.

changes to ML vector specification:
http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/mllib-guide.html#from-09-to-10

changes to the Java API:
http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark

changes to the streaming API:
http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x

changes to the GraphX API:
http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091

coGroup and related functions now return Iterable[T] instead of Seq[T]
==> Call toSeq on the result to restore the old behavior

SparkContext.jarOfClass returns Option[String] instead of Seq[String]
==> Call toSeq on the result to restore old behavior


Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Mark Hamstra
Sorry for the duplication, but I think this is the current VOTE candidate
-- we're not voting on rc8 yet?

+1, but just barely.  We've got quite a number of outstanding bugs
identified, and many of them have fixes in progress.  I'd hate to see those
efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 --
in other words, I'd like to see 1.0.1 retain a high priority relative to
1.1.0.

Looking through the unresolved JIRAs, it doesn't look like any of the
identified bugs are show-stoppers or strictly regressions (although I will
note that one that I have in progress, SPARK-1749, is a bug that we
introduced with recent work -- it's not strictly a regression because we
had equally bad but different behavior when the DAGScheduler exceptions
weren't previously being handled at all vs. being slightly mis-handled
now), so I'm not currently seeing a reason not to release.


On Fri, May 16, 2014 at 11:42 AM, Henry Saputra wrote:

> Ah ok, thanks Aaron
>
> Just to make sure we VOTE the right RC.
>
> Thanks,
>
> Henry
>
> On Fri, May 16, 2014 at 11:37 AM, Aaron Davidson 
> wrote:
> > It was, but due to the apache infra issues, some may not have received
> the
> > email yet...
> >
> > On Fri, May 16, 2014 at 10:48 AM, Henry Saputra  >
> > wrote:
> >>
> >> Hi Patrick,
> >>
> >> Just want to make sure that the VOTE for rc6 is also cancelled?
> >>
> >>
> >> Thanks,
> >>
> >> Henry
> >>
> >> On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell 
> >> wrote:
> >> > I'll start the voting with a +1.
> >> >
> >> > On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell 
> >> > wrote:
> >> >> Please vote on releasing the following candidate as Apache Spark
> >> >> version 1.0.0!
> >> >>
> >> >> This patch has minor documentation changes and fixes on top of rc6.
> >> >>
> >> >> The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
> >> >>
> >> >>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464
> >> >>
> >> >> The release files, including signatures, digests, etc. can be found
> at:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7/
> >> >>
> >> >> Release artifacts are signed with the following key:
> >> >> https://people.apache.org/keys/committer/pwendell.asc
> >> >>
> >> >> The staging repository for this release can be found at:
> >> >>
> https://repository.apache.org/content/repositories/orgapachespark-1015
> >> >>
> >> >> The documentation corresponding to this release can be found at:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/
> >> >>
> >> >> Please vote on releasing this package as Apache Spark 1.0.0!
> >> >>
> >> >> The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
> >> >> majority of at least 3 +1 PMC votes are cast.
> >> >>
> >> >> [ ] +1 Release this package as Apache Spark 1.0.0
> >> >> [ ] -1 Do not release this package because ...
> >> >>
> >> >> To learn more about Apache Spark, please see
> >> >> http://spark.apache.org/
> >> >>
> >> >> == API Changes ==
> >> >> We welcome users to compile Spark applications against 1.0. There are
> >> >> a few API changes in this release. Here are links to the associated
> >> >> upgrade guides - user facing changes have been kept as small as
> >> >> possible.
> >> >>
> >> >> changes to ML vector specification:
> >> >>
> >> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
> >> >>
> >> >> changes to the Java API:
> >> >>
> >> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
> >> >>
> >> >> changes to the streaming API:
> >> >>
> >> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
> >> >>
> >> >> changes to the GraphX API:
> >> >>
> >> >>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
> >> >>
> >> >> coGroup and related functions now return Iterable[T] instead of
> Seq[T]
> >> >> ==> Call toSeq on the result to restore the old behavior
> >> >>
> >> >> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> >> >> ==> Call toSeq on the result to restore old behavior
> >
> >
>


Calling external classes added by sc.addJar needs to be through reflection

2014-05-16 Thread DB Tsai
Finally found a way out of the ClassLoader maze! It took me some time to
understand how it works; I think it's worth documenting in a separate
thread.

We're trying to add an external utility.jar which contains CSVRecordParser,
and we added the jar to executors through the sc.addJar API.

If the instance of CSVRecordParser is created without reflection, it
raises a *ClassNotFoundException*.

data.mapPartitions(lines => {
  // Direct instantiation is resolved by the classloader that loaded this
  // closure's class, which does not see jars added via sc.addJar.
  val csvParser = new CSVRecordParser(delimiter.charAt(0))
  lines.foreach(line => {
    val lineElems = csvParser.parseLine(line)
  })
  ...
  ...
})


If the instance of CSVRecordParser is created through reflection, it works.

data.mapPartitions(lines => {
  // The context classloader on the executor does include jars added via
  // sc.addJar, so loading the class through it succeeds.
  val loader = Thread.currentThread.getContextClassLoader
  val CSVRecordParser =
    loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")

  val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
    .newInstance(delimiter.charAt(0).asInstanceOf[Character])

  val parseLine = CSVRecordParser
    .getDeclaredMethod("parseLine", classOf[String])

  lines.foreach(line => {
    val lineElems = parseLine.invoke(csvParser, line).asInstanceOf[Array[String]]
  })
  ...
  ...
})


This is identical to this question,
http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection

It's not intuitive for users to have to load external classes through
reflection, but the available workarounds, including 1) messing with the
system classloader by calling its protected addURL method through reflection,
or 2) forking another JVM to add jars to the classpath before the bootstrap
loader, are very tricky.
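A minimal sketch of workaround 1), for reference (the standard pre-Java-9
reflection trick, not code from the original message):

import java.net.{URL, URLClassLoader}

def addToSystemClassLoader(url: URL): Unit = {
  // On Java <= 8 the system classloader is a URLClassLoader.
  val sysLoader = ClassLoader.getSystemClassLoader.asInstanceOf[URLClassLoader]
  // addURL is protected, so it has to be opened up via reflection.
  val addURL = classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
  addURL.setAccessible(true)
  addURL.invoke(sysLoader, url)
}

// e.g. addToSystemClassLoader(new java.io.File("utility.jar").toURI.toURL)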

Any thought on fixing it properly?

@Xiangrui,
netlib-java jniloader is loaded from netlib-java through reflection, so
this problem will not be seen.

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


[VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0!

This patch has minor documentation changes and fixes on top of rc6.

The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.0.0-rc7/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1015

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/

Please vote on releasing this package as Apache Spark 1.0.0!

The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== API Changes ==
We welcome users to compile Spark applications against 1.0. There are
a few API changes in this release. Here are links to the associated
upgrade guides - user facing changes have been kept as small as
possible.

changes to ML vector specification:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10

changes to the Java API:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark

changes to the streaming API:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x

changes to the GraphX API:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091

coGroup and related functions now return Iterable[T] instead of Seq[T]
==> Call toSeq on the result to restore the old behavior

SparkContext.jarOfClass returns Option[String] instead of Seq[String]
==> Call toSeq on the result to restore old behavior


Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Mark Hamstra
+1, but just barely.  We've got quite a number of outstanding bugs
identified, and many of them have fixes in progress.  I'd hate to see those
efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 --
in other words, I'd like to see 1.0.1 retain a high priority relative to
1.1.0.

Looking through the unresolved JIRAs, it doesn't look like any of the
identified bugs are show-stoppers or strictly regressions (although I will
note that one that I have in progress, SPARK-1749, is a bug that we
introduced with recent work -- it's not strictly a regression because we
had equally bad but different behavior when the DAGScheduler exceptions
weren't previously being handled at all vs. being slightly mis-handled
now), so I'm not currently seeing a reason not to release.


On Tue, May 13, 2014 at 1:36 AM, Patrick Wendell  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.0.0!
>
> The tag to be voted on is v1.0.0-rc5 (commit 18f0623):
>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=18f062303303824139998e8fc8f4158217b0dbc3
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1012/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/
>
> Please vote on releasing this package as Apache Spark 1.0.0!
>
> The vote is open until Friday, May 16, at 09:30 UTC and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == API Changes ==
> We welcome users to compile Spark applications against 1.0. There are
> a few API changes in this release. Here are links to the associated
> upgrade guides - user facing changes have been kept as small as
> possible.
>
> changes to ML vector specification:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>
> changes to the Java API:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>
> changes to the streaming API:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>
> changes to the GraphX API:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>
> coGroup and related functions now return Iterable[T] instead of Seq[T]
> ==> Call toSeq on the result to restore the old behavior
>
> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> ==> Call toSeq on the result to restore old behavior
>


Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Patrick Wendell
Thanks for your feedback. Since it's not a regression, it won't block
the release.

On Wed, May 14, 2014 at 12:17 AM, witgo  wrote:
> SPARK-1817 will cause users to get incorrect results, and RDD.zip is common
> usage. This should be the highest priority. I think we should fix the bug,
> and should also test the previous release.
> -- Original --
> From:  "Patrick Wendell";;
> Date:  Wed, May 14, 2014 03:02 PM
> To:  "dev@spark.apache.org";
>
> Subject:  Re: [VOTE] Release Apache Spark 1.0.0 (rc5)
>
>
>
> Hey @witgo - those bugs are not severe enough to block the release,
> but it would be nice to get them fixed.
>
> At this point we are focused on severe bugs with an immediate fix, or
> regressions from previous versions of Spark. Anything that misses this
> release will get merged into the branch-1.0 branch and make it into
> the 1.0.1 release, so people will have access to it.
>
> On Tue, May 13, 2014 at 5:32 PM, witgo  wrote:
>> -1
>> The following bug should be fixed:
>> https://issues.apache.org/jira/browse/SPARK-1817
>> https://issues.apache.org/jira/browse/SPARK-1712
>>
>>
>> -- Original --
>> From:  "Patrick Wendell";;
>> Date:  Wed, May 14, 2014 04:07 AM
>> To:  "dev@spark.apache.org";
>>
>> Subject:  Re: [VOTE] Release Apache Spark 1.0.0 (rc5)
>>
>>
>>
>> Hey all - there were some earlier RC's that were not presented to the
>> dev list because issues were found with them. Also, there seems to be
>> some issues with the reliability of the dev list e-mail. Just a heads
>> up.
>>
>> I'll lead with a +1 for this.
>>
>> On Tue, May 13, 2014 at 8:07 AM, Nan Zhu  wrote:
>>> just curious, where is rc4 VOTE?
>>>
>>> I searched my gmail but didn't find it.
>>>
>>>
>>>
>>>
>>> On Tue, May 13, 2014 at 9:49 AM, Sean Owen  wrote:
>>>
 On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell 
 wrote:
 > The release files, including signatures, digests, etc. can be found at:
 > http://people.apache.org/~pwendell/spark-1.0.0-rc5/

 Good news is that the sigs, MD5 and SHA are all correct.

 Tiny note: the Maven artifacts use SHA1, while the binary artifacts
 use SHA512, which took me a bit of head-scratching to figure out.

 If another RC comes out, I might suggest making it SHA1 everywhere?
 But there is nothing wrong with these signatures and checksums.

 Now to look at the contents...



Re: [VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Tom Graves
Yes, rc5 and rc6 were cancelled. There is now an rc7. Unfortunately the
Apache mailing list issue has caused lots of emails not to come through.

Here are the details (hopefully this goes through):

Please vote on releasing the following candidate as Apache Spark version 1.0.0!

This patch has minor documentation changes and fixes on top of rc6.

The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.0.0-rc7/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1015

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/

Please vote on releasing this package as Apache Spark 1.0.0!

The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== API Changes ==
We welcome users to compile Spark applications against 1.0. There are
a few API changes in this release. Here are links to the associated
upgrade guides - user facing changes have been kept as small as
possible.

changes to ML vector specification:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10

changes to the Java API:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark

changes to the streaming API:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x

changes to the GraphX API:
http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091

coGroup and related functions now return Iterable[T] instead of Seq[T]
==> Call toSeq on the result to restore the old behavior

SparkContext.jarOfClass returns Option[String] instead of Seq[String]
==> Call toSeq on the result to restore old behavior


Tom


On Friday, May 16, 2014 10:22 AM, Mridul Muralidharan  wrote:
 


So was rc5 cancelled? I did not see a note indicating that or why ... [1]

- Mridul


[1] I could have easily missed it in the email storm though!


On Thu, May 15, 2014 at 1:32 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.0.0!
>
> This patch has a few minor fixes on top of rc5. I've also built the
> binary artifacts with Hive support enabled so people can test this
> configuration. When we release 1.0 we might just release both vanilla
> and Hive-enabled binaries.
>
> The tag to be voted on is v1.0.0-rc6 (commit 54133a):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=54133abdce0246f6643a1112a5204afb2c4caa82
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc6/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachestratos-1011
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc6-docs/
>
> Please vote on releasing this package as Apache Spark 1.0.0!
>
> The vote is open until Saturday, May 17, at 20:58 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == API Changes ==
> We welcome users to compile Spark applications against 1.0. There are
> a few API changes in this release. Here are links to the associated
> upgrade guides - user facing changes have been kept as small as
> possible.
>
> changes to ML vector specification:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>
> changes to the Java API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>
> changes to the streaming API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>
> changes to the GraphX API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>
> coGroup and related functions now return Iterable[T] instead of Seq[T]
> ==> Call toSeq on the result to restore the old behavior

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
Hey all,

My vote threads seem to be running about 24 hours behind and/or
getting swallowed by the infra e-mail problems.

I sent RC8 yesterday and we might send one tonight as well. I'll make
sure to close all existing ones.

There have been only small "polish" changes in the recent RC's since
RC5, so testing any of these should be pretty equivalent. I'll make
sure I close all the other threads by tonight.

- Patrick

On Fri, May 16, 2014 at 1:10 PM, Mark Hamstra  wrote:
> Sorry for the duplication, but I think this is the current VOTE candidate
> -- we're not voting on rc8 yet?
>
> +1, but just barely.  We've got quite a number of outstanding bugs
> identified, and many of them have fixes in progress.  I'd hate to see those
> efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 --
> in other words, I'd like to see 1.0.1 retain a high priority relative to
> 1.1.0.
>
> Looking through the unresolved JIRAs, it doesn't look like any of the
> identified bugs are show-stoppers or strictly regressions (although I will
> note that one that I have in progress, SPARK-1749, is a bug that we
> introduced with recent work -- it's not strictly a regression because we
> had equally bad but different behavior when the DAGScheduler exceptions
> weren't previously being handled at all vs. being slightly mis-handled
> now), so I'm not currently seeing a reason not to release.
>
>
> On Fri, May 16, 2014 at 11:42 AM, Henry Saputra 
> wrote:
>
>> Ah ok, thanks Aaron
>>
>> Just to make sure we VOTE the right RC.
>>
>> Thanks,
>>
>> Henry
>>
>> On Fri, May 16, 2014 at 11:37 AM, Aaron Davidson 
>> wrote:
>> > It was, but due to the apache infra issues, some may not have received
>> the
>> > email yet...
>> >
>> > On Fri, May 16, 2014 at 10:48 AM, Henry Saputra > >
>> > wrote:
>> >>
>> >> Hi Patrick,
>> >>
>> >> Just want to make sure that the VOTE for rc6 is also cancelled?
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Henry
>> >>
>> >> On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell 
>> >> wrote:
>> >> > I'll start the voting with a +1.
>> >> >
>> >> > On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell 
>> >> > wrote:
>> >> >> Please vote on releasing the following candidate as Apache Spark
>> >> >> version 1.0.0!
>> >> >>
>> >> >> This patch has minor documentation changes and fixes on top of rc6.
>> >> >>
>> >> >> The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
>> >> >>
>> >> >>
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464
>> >> >>
>> >> >> The release files, including signatures, digests, etc. can be found
>> at:
>> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7/
>> >> >>
>> >> >> Release artifacts are signed with the following key:
>> >> >> https://people.apache.org/keys/committer/pwendell.asc
>> >> >>
>> >> >> The staging repository for this release can be found at:
>> >> >>
>> https://repository.apache.org/content/repositories/orgapachespark-1015
>> >> >>
>> >> >> The documentation corresponding to this release can be found at:
>> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/
>> >> >>
>> >> >> Please vote on releasing this package as Apache Spark 1.0.0!
>> >> >>
>> >> >> The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
>> >> >> majority of at least 3 +1 PMC votes are cast.
>> >> >>
>> >> >> [ ] +1 Release this package as Apache Spark 1.0.0
>> >> >> [ ] -1 Do not release this package because ...
>> >> >>
>> >> >> To learn more about Apache Spark, please see
>> >> >> http://spark.apache.org/
>> >> >>
>> >> >> == API Changes ==
>> >> >> We welcome users to compile Spark applications against 1.0. There are
>> >> >> a few API changes in this release. Here are links to the associated
>> >> >> upgrade guides - user facing changes have been kept as small as
>> >> >> possible.
>> >> >>
>> >> >> changes to ML vector specification:
>> >> >>
>> >> >>
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>> >> >>
>> >> >> changes to the Java API:
>> >> >>
>> >> >>
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>> >> >>
>> >> >> changes to the streaming API:
>> >> >>
>> >> >>
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>> >> >>
>> >> >> changes to the GraphX API:
>> >> >>
>> >> >>
>> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>> >> >>
>> >> >> coGroup and related functions now return Iterable[T] instead of
>> Seq[T]
>> >> >> ==> Call toSeq on the result to restore the old behavior
>> >> >>
>> >> >> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
>> >> >> ==> Call toSeq on the result to restore old behavior
>> >
>> >
>>


Re: mllib vector templates

2014-05-16 Thread Xiangrui Meng
3) It is not designed for dense feature vectors.

On Thu, May 15, 2014 at 8:33 PM, Xiangrui Meng  wrote:
> I submitted a PR for standardizing the text format for vectors and
> labeled data: https://github.com/apache/spark/pull/685
>
> Once it gets merged, saveAsTextFile and loading should be consistent.
> I didn't choose LibSVM as the default format for two reasons:
>
> 1) It doesn't contain feature dimension info in the record. We need to
> scan the dataset to get that info.
> 2) It saves index:value tuples. Putting indices together can help data
> compression. Same for values if there are many binary features.
>
> Best,
> Xiangrui
>
> On Wed, May 7, 2014 at 10:25 PM, Debasish Das  
> wrote:
>> Hi,
>>
>> I see ALS is still using Array[Int], but for other mllib algorithms we moved
>> to Vector[Double] so that they can support either dense or sparse formats...
>>
>> ALS can stay with Array[Int] due to the Netflix format for input datasets,
>> which is well defined, but it would help if we moved ALS to Vector[Double]
>> as well... that way all algorithms will be consistent...
>>
>> The second issue is that toString on SparseVector does not write libsvm
>> format but something not very generic... can we change
>> SparseVector.toString to write libsvm output? I am dumping a sample of the
>> dataset to see how mllib glm compares with the glmnet-R package for QoR...
>>
>> Thanks.
>> Deb
>>
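
For what it's worth, a rough sketch of the kind of helper Deb is asking
for; the helper name is hypothetical, not an existing MLlib method:

    import org.apache.spark.mllib.linalg.{SparseVector, Vectors}

    // Hypothetical: render a labeled SparseVector as a 1-based libsvm line.
    def toLibSVMLine(label: Double, v: SparseVector): String = {
      val feats = v.indices.zip(v.values).map { case (i, x) => s"${i + 1}:$x" }
      (label.toString +: feats).mkString(" ")
    }

    val v = Vectors.sparse(5, Array(1, 3), Array(2.0, 3.0)).asInstanceOf[SparseVector]
    toLibSVMLine(1.0, v)   // "1.0 2:2.0 4:3.0"
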
>> On Mon, May 5, 2014 at 4:05 PM, David Hall  wrote:
>>>
 On Mon, May 5, 2014 at 3:40 PM, DB Tsai  wrote:

 > David,
 >
 > Could we use Int, Long, Float as the data feature spaces, and Double for
 > optimizer?
 >

 Yes. Breeze doesn't allow operations on mixed types, so you'd need to
 convert the double vectors to Floats if you wanted, e.g. dot product with
 the weights vector.

 You might also be interested in FeatureVector, which is just a wrapper
 around Array[Int] that emulates an indicator vector. It supports dot
 products, axpy, etc.

 -- David
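
A rough sketch of the FeatureVector usage David describes (exact
constructor and operator support may differ across Breeze versions):

    import breeze.features.FeatureVector
    import breeze.linalg.DenseVector

    val w  = DenseVector(0.1, 0.2, 0.3, 0.4, 0.5)   // Double weights
    val fv = new FeatureVector(Array(1, 3))         // indicator vector: 1.0 at indices 1 and 3
    val score = fv dot w                            // 0.2 + 0.4 = 0.6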


 >
 >
 > Sincerely,
 >
 > DB Tsai
 > ---
 > My Blog: https://www.dbtsai.com
 > LinkedIn: https://www.linkedin.com/in/dbtsai
 >
 >
 > On Mon, May 5, 2014 at 3:06 PM, David Hall  wrote:
 >
 > > Lbfgs and other optimizers would not work immediately, as they require
 > > vector spaces over double. Otherwise it should work.
 > > On May 5, 2014 3:03 PM, "DB Tsai"  wrote:
 > >
 > > > Breeze could take any type (Int, Long, Double, and Float) in the
 > > > matrix template.
 > > >
 > > >
 > > > Sincerely,
 > > >
 > > > DB Tsai
 > > > ---
 > > > My Blog: https://www.dbtsai.com
 > > > LinkedIn: https://www.linkedin.com/in/dbtsai
 > > >
 > > >
 > > > On Mon, May 5, 2014 at 2:56 PM, Debasish Das <debasish.da...@gmail.com> wrote:
 > > >
 > > > > Is this a breeze issue, or can breeze take templates on float / double?
 > > > >
 > > > > If breeze can take templates then it is a minor fix for Vectors.scala, right?
 > > > >
 > > > > Thanks.
 > > > > Deb
 > > > >
 > > > >
 > > > > On Mon, May 5, 2014 at 2:45 PM, DB Tsai  wrote:
 > > > >
 > > > > > +1  Would be nice that we can use different type in Vector.
 > > > > >
 > > > > >
 > > > > > Sincerely,
 > > > > >
 > > > > > DB Tsai
 > > > > > ---
 > > > > > My Blog: https://www.dbtsai.com
 > > > > > LinkedIn: https://www.linkedin.com/in/dbtsai
 > > > > >
 > > > > >
 > > > > > On Mon, May 5, 2014 at 2:41 PM, Debasish Das <debasish.da...@gmail.com> wrote:
 > > > > >
 > > > > > > Hi,
 > > > > > >
 > > > > > > Why mllib vector is using double as default ?
 > > > > > >
 > > > > > > /**
 > > > > > >  * Represents a numeric vector, whose index type is Int and
 > > > > > >  * value type is Double.
 > > > > > >  */
 > > > > > > trait Vector extends Serializable {
 > > > > > >
 > > > > > >   /**
 > > > > > >    * Size of the vector.
 > > > > > >    */
 > > > > > >   def size: Int
 > > > > > >
 > > > > > >   /**
 > > > > > >    * Converts the instance to a double array.
 > > > > > >    */
 > > > > > >   def toArray: Array[Double]
 > > > > > >
 > > > > > > Don't we need a template on float/double? This will give us
 > > > > > > memory savings...
 > > > > > >
 > > > > > > Thanks.
 > > > > > >
 > > > > > > Deb
 > > > > > >
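
For illustration only, the template Deb is asking about might look like the
hypothetical sketch below; it is not the MLlib API, which fixes the value
type to Double. Float would halve per-element storage from 8 bytes to 4.

    // Hypothetical: parameterize the value type instead of fixing Double.
    trait GenericVector[@specialized(Float, Double) V] extends Serializable {
      def size: Int
      def toArray: Array[V]   // Array[Float] needs half the memory of Array[Double]
    }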

Re: [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Mark Hamstra
+1


On Fri, May 16, 2014 at 2:16 AM, Patrick Wendell  wrote:

> [Due to the ASF e-mail outage, I'm not sure if anyone will actually receive this.]
>
> Please vote on releasing the following candidate as Apache Spark version
> 1.0.0!
> This has only minor changes on top of rc7.
>
> The tag to be voted on is v1.0.0-rc8 (commit 80eea0f):
>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=80eea0f111c06260ffaa780d2f3f7facd09c17bc
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc8/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1016/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/
>
> Please vote on releasing this package as Apache Spark 1.0.0!
>
> The vote is open until Monday, May 19, at 10:15 UTC and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == API Changes ==
> We welcome users to compile Spark applications against 1.0. There are
> a few API changes in this release. Here are links to the associated
> upgrade guides - user facing changes have been kept as small as
> possible.
>
> changes to ML vector specification:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/mllib-guide.html#from-09-to-10
>
> changes to the Java API:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>
> changes to the streaming API:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>
> changes to the GraphX API:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>
> coGroup and related functions now return Iterable[T] instead of Seq[T]
> ==> Call toSeq on the result to restore the old behavior
>
> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> ==> Call toSeq on the result to restore old behavior
>


Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Mridul Muralidharan
Effectively this is persist without fault tolerance.
Failure of any node means complete lack of fault tolerance.
I would be very skeptical of truncating lineage if it is not reliable.
 On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:

> Xiangrui Meng created SPARK-1855:
> 
>
>  Summary: Provide memory-and-local-disk RDD checkpointing
>  Key: SPARK-1855
>  URL: https://issues.apache.org/jira/browse/SPARK-1855
>  Project: Spark
>   Issue Type: New Feature
>   Components: MLlib, Spark Core
> Affects Versions: 1.0.0
> Reporter: Xiangrui Meng
>
>
> Checkpointing is used to cut long lineage while maintaining fault
> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
> can create in-memory-and-local-disk (with replication) checkpoints that are
> not as reliable as the HDFS-based solution, but faster.
>
> It can help applications that require many iterations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>


Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Henry Saputra
Ah ok, thanks Aaron

Just to make sure we VOTE the right RC.

Thanks,

Henry

On Fri, May 16, 2014 at 11:37 AM, Aaron Davidson  wrote:
> It was, but due to the apache infra issues, some may not have received the
> email yet...
>
> On Fri, May 16, 2014 at 10:48 AM, Henry Saputra 
> wrote:
>>
>> Hi Patrick,
>>
>> Just want to make sure that VOTE for rc6 also cancelled?
>>
>>
>> Thanks,
>>
>> Henry
>>
>> On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell 
>> wrote:
>> > I'll start the voting with a +1.
>> >
>> > On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell 
>> > wrote:
>> >> Please vote on releasing the following candidate as Apache Spark
>> >> version 1.0.0!
>> >>
>> >> This patch has minor documentation changes and fixes on top of rc6.
>> >>
>> >> The tag to be voted on is v1.0.0-rc7 (commit 9212b3e):
>> >>
>> >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108e6fb1c464
>> >>
>> >> The release files, including signatures, digests, etc. can be found at:
>> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7/
>> >>
>> >> Release artifacts are signed with the following key:
>> >> https://people.apache.org/keys/committer/pwendell.asc
>> >>
>> >> The staging repository for this release can be found at:
>> >> https://repository.apache.org/content/repositories/orgapachespark-1015
>> >>
>> >> The documentation corresponding to this release can be found at:
>> >> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/
>> >>
>> >> Please vote on releasing this package as Apache Spark 1.0.0!
>> >>
>> >> The vote is open until Sunday, May 18, at 09:12 UTC and passes if a
>> >> majority of at least 3 +1 PMC votes are cast.
>> >>
>> >> [ ] +1 Release this package as Apache Spark 1.0.0
>> >> [ ] -1 Do not release this package because ...
>> >>
>> >> To learn more about Apache Spark, please see
>> >> http://spark.apache.org/
>> >>
>> >> == API Changes ==
>> >> We welcome users to compile Spark applications against 1.0. There are
>> >> a few API changes in this release. Here are links to the associated
>> >> upgrade guides - user facing changes have been kept as small as
>> >> possible.
>> >>
>> >> changes to ML vector specification:
>> >>
>> >> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>> >>
>> >> changes to the Java API:
>> >>
>> >> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>> >>
>> >> changes to the streaming API:
>> >>
>> >> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>> >>
>> >> changes to the GraphX API:
>> >>
>> >> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>> >>
>> >> coGroup and related functions now return Iterable[T] instead of Seq[T]
>> >> ==> Call toSeq on the result to restore the old behavior
>> >>
>> >> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
>> >> ==> Call toSeq on the result to restore old behavior
>
>


Re: [VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Mridul Muralidharan
So was rc5 cancelled? I did not see a note indicating that or why... [1]

- Mridul


[1] could have easily missed it in the email storm though !

On Thu, May 15, 2014 at 1:32 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.0.0!
>
> This patch has a few minor fixes on top of rc5. I've also built the
> binary artifacts with Hive support enabled so people can test this
> configuration. When we release 1.0 we might just release both vanilla
> and Hive-enabled binaries.
>
> The tag to be voted on is v1.0.0-rc6 (commit 54133a):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=54133abdce0246f6643a1112a5204afb2c4caa82
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc6/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachestratos-1011
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.0.0-rc6-docs/
>
> Please vote on releasing this package as Apache Spark 1.0.0!
>
> The vote is open until Saturday, May 17, at 20:58 UTC and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == API Changes ==
> We welcome users to compile Spark applications against 1.0. There are
> a few API changes in this release. Here are links to the associated
> upgrade guides - user facing changes have been kept as small as
> possible.
>
> changes to ML vector specification:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/mllib-guide.html#from-09-to-10
>
> changes to the Java API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
>
> changes to the streaming API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
>
> changes to the GraphX API:
> http://people.apache.org/~pwendell/spark-1.0.0-rc5-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
>
> coGroup and related functions now return Iterable[T] instead of Seq[T]
> ==> Call toSeq on the result to restore the old behavior
>
> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> ==> Call toSeq on the result to restore old behavior


Re: [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Michael Armbrust
-1

We found a regression in the way configuration is passed to executors.

https://issues.apache.org/jira/browse/SPARK-1864
https://github.com/apache/spark/pull/808

Michael


On Fri, May 16, 2014 at 3:57 PM, Mark Hamstra wrote:

> +1
>
>
> On Fri, May 16, 2014 at 2:16 AM, Patrick Wendell 
> wrote:
>
> > [Due to the ASF e-mail outage, I'm not sure if anyone will actually receive this.]
> >
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.0.0!
> > This has only minor changes on top of rc7.
> >
> > The tag to be voted on is v1.0.0-rc8 (commit 80eea0f):
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=80eea0f111c06260ffaa780d2f3f7facd09c17bc
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-1.0.0-rc8/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1016/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/
> >
> > Please vote on releasing this package as Apache Spark 1.0.0!
> >
> > The vote is open until Monday, May 19, at 10:15 UTC and passes if a
> > majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.0.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> > http://spark.apache.org/
> >
> > == API Changes ==
> > We welcome users to compile Spark applications against 1.0. There are
> > a few API changes in this release. Here are links to the associated
> > upgrade guides - user facing changes have been kept as small as
> > possible.
> >
> > changes to ML vector specification:
> >
> >
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/mllib-guide.html#from-09-to-10
> >
> > changes to the Java API:
> >
> >
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
> >
> > changes to the streaming API:
> >
> >
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x
> >
> > changes to the GraphX API:
> >
> >
> http://people.apache.org/~pwendell/spark-1.0.0-rc8-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091
> >
> > coGroup and related functions now return Iterable[T] instead of Seq[T]
> > ==> Call toSeq on the result to restore the old behavior
> >
> > SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> > ==> Call toSeq on the result to restore old behavior
> >
>


Re: (test)

2014-05-16 Thread Aaron Davidson
No. Only 3 of the responses.


On Fri, May 16, 2014 at 10:38 AM, Nishkam Ravi  wrote:

> Yes.
>
>
> On Fri, May 16, 2014 at 8:40 AM, DB Tsai  wrote:
>
> > Yes.
> > On May 16, 2014 8:39 AM, "Andrew Or"  wrote:
> >
> > > Apache has been having some problems lately. Do you guys see this
> > message?
> > >
> >
>


Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Xiangrui Meng
With 3x replication, we should be able to achieve fault tolerance.
This checkPointed RDD can be cleared if we have another in-memory
checkPointed RDD down the line. It can avoid hitting disk if we have
enough memory to use. We need to investigate more to find a good
solution. -Xiangrui
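
A sketch of the idea under discussion expressed with the existing persist
API; the custom storage level below is illustrative, not the proposed
implementation:

    import org.apache.spark.storage.StorageLevel

    // Replicated memory-and-local-disk persistence: cheaper than an HDFS
    // checkpoint, and blocks survive as long as one replica of each remains.
    val level = StorageLevel(useDisk = true, useMemory = true,
                             deserialized = false, replication = 3)
    rdd.persist(level)   // `rdd` is a hypothetical RDD in a long iterative job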

On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan  wrote:
> Effectively this is persist without fault tolerance.
> Failure of any node means complete lack of fault tolerance.
> I would be very skeptical of truncating lineage if it is not reliable.
>  On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
>
>> Xiangrui Meng created SPARK-1855:
>> 
>>
>>  Summary: Provide memory-and-local-disk RDD checkpointing
>>  Key: SPARK-1855
>>  URL: https://issues.apache.org/jira/browse/SPARK-1855
>>  Project: Spark
>>   Issue Type: New Feature
>>   Components: MLlib, Spark Core
>> Affects Versions: 1.0.0
>> Reporter: Xiangrui Meng
>>
>>
>> Checkpointing is used to cut long lineage while maintaining fault
>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
>> can create in-memory-and-local-disk (with replication) checkpoints that are
>> not as reliable as the HDFS-based solution, but faster.
>>
>> It can help applications that require many iterations.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.2#6252)
>>