Re: Spark 1.0.0 rc3
I ran SPARK_HADOOP_VERSION=2.3.0 sbt/sbt assembly and copied the generated jar to the lib/ directory of my application, but it seems that sbt cannot find the dependencies in the jar. Everything works, though, with the pre-built jar files downloaded from the link provided by Patrick.

Best,

--
Nan Zhu

On Thursday, May 1, 2014 at 11:16 PM, Madhu wrote:

I'm guessing EC2 support is not there yet? I was able to build using the binary download on both Windows 7 and RHEL 6 without issues. I tried to create an EC2 cluster, but saw this:

  ~/spark-ec2 Initializing spark
  ~ ~/spark-ec2 ERROR: Unknown Spark version
  Initializing shark
  ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version

The spark dir on the EC2 master has only a conf dir, so it didn't deploy properly.
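An alternative to copying the assembly jar into lib/ is to resolve Spark as a managed dependency from the staging repository given in Patrick's announcement. The following build.sbt fragment is only a sketch: the staged version string (1.0.0) and Scala binary version (2.10) are assumptions, and the hadoop-client pin just mirrors the SPARK_HADOOP_VERSION=2.3.0 assembly above.

  // build.sbt (sketch): resolve the RC from the Apache staging repository
  resolvers += "Apache Spark RC staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1012/"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.0.0",      // assumed staged version
    "org.apache.hadoop" % "hadoop-client" % "2.3.0"    // match the assembly's Hadoop version
  )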
Re: Spark 1.0.0 rc3
Hi,

I tried to build the 1.0.0 rc3 version with Java 8 and got this error:

  java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded

I am building on a Core i7 (quad core) Windows laptop with 8 GB RAM. Earlier I had tried to build Spark 0.9.1 with Java 8 and had gotten an error about the Comparator class not being found, which was mentioned today on another thread, so I am not getting that error now. I have successfully built Spark 0.9.0 with Java 1.7.

Thanks,
Manu

On Tue, Apr 29, 2014 at 10:43 PM, Patrick Wendell pwend...@gmail.com wrote:

That suggestion got lost along the way, and IIRC the patch didn't have that. It's a good idea though, if nothing else to provide a simple means for backwards compatibility. I created a JIRA for this. It's very straightforward, so maybe someone can pick it up quickly:

https://issues.apache.org/jira/browse/SPARK-1677

On Tue, Apr 29, 2014 at 2:20 PM, Dean Wampler deanwamp...@gmail.com wrote:

Thanks. I'm fine with the logic change, although I was a bit surprised to see Hadoop used for file I/O.

Anyway, the JIRA issue and pull request discussions mention a flag to enable overwrites. That would be very convenient for a tutorial I'm writing, although I wouldn't recommend it for normal use, of course. However, I can't figure out if this actually exists. I found the spark.files.overwrite property, but that doesn't apply. Does this override flag, method call, or method argument actually exist?

Thanks,
Dean

On Tue, Apr 29, 2014 at 1:54 PM, Patrick Wendell pwend...@gmail.com wrote:

Hi Dean,

We have always used the Hadoop libraries here to read and write local files. In Spark 1.0 we started enforcing the rule that you can't overwrite an existing directory, because it can cause confusing/undefined behavior if multiple jobs output to the same directory (they partially clobber each other's output).

https://issues.apache.org/jira/browse/SPARK-1100
https://github.com/apache/spark/pull/11

In the JIRA I actually proposed slightly deviating from Hadoop semantics and allowing the directory to exist if it is empty, but I think in the end we decided to just go with the exact same semantics as Hadoop (i.e. even empty directories are a problem).

- Patrick

On Tue, Apr 29, 2014 at 9:43 AM, Dean Wampler deanwamp...@gmail.com wrote:

I'm observing one anomalous behavior. With the 1.0.0 libraries, it's using HDFS classes for file I/O, while the same script compiled and running with 0.9.1 uses only the local-mode file I/O. The script is a variation of the Word Count script. Here are the guts:

  object WordCount2 {
    def main(args: Array[String]) = {
      val sc = new SparkContext("local", "Word Count (2)")
      val input = sc.textFile(".../some/local/file").map(line => line.toLowerCase)
      input.cache
      val wc2 = input
        .flatMap(line => line.split("\\W+"))
        .map(word => (word, 1))
        .reduceByKey((count1, count2) => count1 + count2)
      wc2.saveAsTextFile("output/some/directory")
      sc.stop()
    }
  }

It works fine compiled and executed with 0.9.1. If I recompile and run with 1.0.0-RC1, where the same output directory still exists, I get this familiar Hadoop-ish exception:

  [error] (run-main-0) org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
  org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:749)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:662)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:581)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1057)
    at spark.activator.WordCount2$.main(WordCount2.scala:42)
    at spark.activator.WordCount2.main(WordCount2.scala)
    ...

Thoughts?

On Tue, Apr 29, 2014 at 3:05 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an e-mail to the Spark dev list and/or file a JIRA.

Commit: d636772 (v1.0.0-rc3)
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221

Binaries: http://people.apache.org/~pwendell/spark-1.0.0-rc3
Re: Spark 1.0.0 rc3
I'm guessing EC2 support is not there yet? I was able to build using the binary download on both Windows 7 and RHEL 6 without issues. I tried to create an EC2 cluster, but saw this:

  ~/spark-ec2 Initializing spark
  ~ ~/spark-ec2 ERROR: Unknown Spark version
  Initializing shark
  ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version

The spark dir on the EC2 master has only a conf dir, so it didn't deploy properly.
Spark 1.0.0 rc3
Hey All,

This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an e-mail to the Spark dev list and/or file a JIRA.

Commit: d636772 (v1.0.0-rc3)
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221

Binaries: http://people.apache.org/~pwendell/spark-1.0.0-rc3/
Docs: http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
Repository: https://repository.apache.org/content/repositories/orgapachespark-1012/

== API Changes ==

If you want to test building against Spark there are some minor API changes. We'll get these written up for the final release, but I'm noting a few here (not comprehensive):

- changes to ML vector specification:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/mllib-guide.html#from-09-to-10
- changes to the Java API:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
- coGroup and related functions now return Iterable[T] instead of Seq[T]
  ==> Call toSeq on the result to restore the old behavior
- SparkContext.jarOfClass returns Option[String] instead of Seq[String]
  ==> Call toSeq on the result to restore the old behavior
- Streaming classes have been renamed: NetworkReceiver -> Receiver
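To make the coGroup and jarOfClass changes above concrete, here is a small Scala sketch of 0.9-style code adapted to 1.0. The RDD names and the class passed to jarOfClass are made up for illustration; the point is the toSeq calls the announcement recommends.

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._

  object ApiMigrationSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext("local", "api-migration-sketch")

      val a = sc.parallelize(Seq((1, "x"), (2, "y")))
      val b = sc.parallelize(Seq((1, "u"), (3, "v")))

      // In 1.0, cogroup values are Iterable[T] rather than Seq[T];
      // call toSeq where downstream code still expects a Seq.
      val grouped: Array[(Int, (Seq[String], Seq[String]))] =
        a.cogroup(b).map { case (k, (xs, ys)) => (k, (xs.toSeq, ys.toSeq)) }.collect()

      // In 1.0, SparkContext.jarOfClass returns Option[String] instead of Seq[String];
      // toSeq restores the old shape.
      val jars: Seq[String] = SparkContext.jarOfClass(getClass).toSeq

      grouped.foreach(println)
      sc.stop()
    }
  }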
Re: Spark 1.0.0 rc3
Hi Patrick,

What are the expectations / guarantees on binary compatibility between 0.9 and 1.0? You mention some API changes, which kinda hint that binary compatibility has already been broken, but I just wanted to point out there are other cases. E.g.:

  Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:236)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  Caused by: java.lang.NoSuchMethodError: org.apache.spark.SparkContext$.rddToOrderedRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/Function1;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/OrderedRDDFunctions;

(Compiled against 0.9, run against 1.0.) Offending code:

  val top10 = counts.sortByKey(false).take(10)

Recompiling fixes the problem.

On Tue, Apr 29, 2014 at 1:05 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an e-mail to the Spark dev list and/or file a JIRA.

Commit: d636772 (v1.0.0-rc3)
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221

Binaries: http://people.apache.org/~pwendell/spark-1.0.0-rc3/
Docs: http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
Repository: https://repository.apache.org/content/repositories/orgapachespark-1012/

== API Changes ==

If you want to test building against Spark there are some minor API changes. We'll get these written up for the final release, but I'm noting a few here (not comprehensive):

- changes to ML vector specification:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/mllib-guide.html#from-09-to-10
- changes to the Java API:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
- coGroup and related functions now return Iterable[T] instead of Seq[T]
  ==> Call toSeq on the result to restore the old behavior
- SparkContext.jarOfClass returns Option[String] instead of Seq[String]
  ==> Call toSeq on the result to restore the old behavior
- Streaming classes have been renamed: NetworkReceiver -> Receiver

--
Marcelo
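The NoSuchMethodError above is the classic symptom of running bytecode compiled against one Spark version on another version where an implicit conversion's signature changed; as Marcelo notes, recompiling against the runtime version resolves it. A minimal illustrative sbt change, assuming the application declares Spark as a managed dependency:

  // build.sbt (sketch): compile against the same Spark version you run against
  // before:  libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

  // The offending call itself needs no source change; it just has to be rebuilt
  // so the new rddToOrderedRDDFunctions signature is linked:
  //   val top10 = counts.sortByKey(false).take(10)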
Re: Spark 1.0.0 rc3
What are the expectations / guarantees on binary compatibility between 0.9 and 1.0?

There are no guarantees.
Re: Spark 1.0.0 rc3
Hi Dean,

We have always used the Hadoop libraries here to read and write local files. In Spark 1.0 we started enforcing the rule that you can't overwrite an existing directory, because it can cause confusing/undefined behavior if multiple jobs output to the same directory (they partially clobber each other's output).

https://issues.apache.org/jira/browse/SPARK-1100
https://github.com/apache/spark/pull/11

In the JIRA I actually proposed slightly deviating from Hadoop semantics and allowing the directory to exist if it is empty, but I think in the end we decided to just go with the exact same semantics as Hadoop (i.e. even empty directories are a problem).

- Patrick

On Tue, Apr 29, 2014 at 9:43 AM, Dean Wampler deanwamp...@gmail.com wrote:

I'm observing one anomalous behavior. With the 1.0.0 libraries, it's using HDFS classes for file I/O, while the same script compiled and running with 0.9.1 uses only the local-mode file I/O. The script is a variation of the Word Count script. Here are the guts:

  object WordCount2 {
    def main(args: Array[String]) = {
      val sc = new SparkContext("local", "Word Count (2)")
      val input = sc.textFile(".../some/local/file").map(line => line.toLowerCase)
      input.cache
      val wc2 = input
        .flatMap(line => line.split("\\W+"))
        .map(word => (word, 1))
        .reduceByKey((count1, count2) => count1 + count2)
      wc2.saveAsTextFile("output/some/directory")
      sc.stop()
    }
  }

It works fine compiled and executed with 0.9.1. If I recompile and run with 1.0.0-RC1, where the same output directory still exists, I get this familiar Hadoop-ish exception:

  [error] (run-main-0) org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
  org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:749)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:662)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:581)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1057)
    at spark.activator.WordCount2$.main(WordCount2.scala:42)
    at spark.activator.WordCount2.main(WordCount2.scala)
    ...

Thoughts?

On Tue, Apr 29, 2014 at 3:05 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an e-mail to the Spark dev list and/or file a JIRA.

Commit: d636772 (v1.0.0-rc3)
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221

Binaries: http://people.apache.org/~pwendell/spark-1.0.0-rc3/
Docs: http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
Repository: https://repository.apache.org/content/repositories/orgapachespark-1012/

== API Changes ==

If you want to test building against Spark there are some minor API changes. We'll get these written up for the final release, but I'm noting a few here (not comprehensive):

- changes to ML vector specification:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/mllib-guide.html#from-09-to-10
- changes to the Java API:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
- coGroup and related functions now return Iterable[T] instead of Seq[T]
  ==> Call toSeq on the result to restore the old behavior
- SparkContext.jarOfClass returns Option[String] instead of Seq[String]
  ==> Call toSeq on the result to restore the old behavior
- Streaming classes have been renamed: NetworkReceiver -> Receiver

--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com
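Since 1.0 refuses to write into an existing output directory, a common workaround for tutorials and local runs is to delete the old output with the Hadoop FileSystem API before saving. The following is only a sketch of that workaround, not a Spark feature; the path handling assumes a local file: URI like the one in Dean's example.

  import org.apache.hadoop.fs.{FileSystem, Path}
  import org.apache.spark.SparkContext

  // Delete a previous run's output so saveAsTextFile can recreate it.
  def deleteIfExists(sc: SparkContext, dir: String): Unit = {
    val path = new Path(dir)
    val fs = path.getFileSystem(sc.hadoopConfiguration)
    if (fs.exists(path)) {
      fs.delete(path, true)  // recursive delete
    }
  }

  // Usage, following the WordCount2 example above:
  //   deleteIfExists(sc, "output/some/directory")
  //   wc2.saveAsTextFile("output/some/directory")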
Re: Spark 1.0.0 rc3
Thanks. I'm fine with the logic change, although I was a bit surprised to see Hadoop used for file I/O.

Anyway, the JIRA issue and pull request discussions mention a flag to enable overwrites. That would be very convenient for a tutorial I'm writing, although I wouldn't recommend it for normal use, of course. However, I can't figure out if this actually exists. I found the spark.files.overwrite property, but that doesn't apply. Does this override flag, method call, or method argument actually exist?

Thanks,
Dean

On Tue, Apr 29, 2014 at 1:54 PM, Patrick Wendell pwend...@gmail.com wrote:

Hi Dean,

We have always used the Hadoop libraries here to read and write local files. In Spark 1.0 we started enforcing the rule that you can't overwrite an existing directory, because it can cause confusing/undefined behavior if multiple jobs output to the same directory (they partially clobber each other's output).

https://issues.apache.org/jira/browse/SPARK-1100
https://github.com/apache/spark/pull/11

In the JIRA I actually proposed slightly deviating from Hadoop semantics and allowing the directory to exist if it is empty, but I think in the end we decided to just go with the exact same semantics as Hadoop (i.e. even empty directories are a problem).

- Patrick

On Tue, Apr 29, 2014 at 9:43 AM, Dean Wampler deanwamp...@gmail.com wrote:

I'm observing one anomalous behavior. With the 1.0.0 libraries, it's using HDFS classes for file I/O, while the same script compiled and running with 0.9.1 uses only the local-mode file I/O. The script is a variation of the Word Count script. Here are the guts:

  object WordCount2 {
    def main(args: Array[String]) = {
      val sc = new SparkContext("local", "Word Count (2)")
      val input = sc.textFile(".../some/local/file").map(line => line.toLowerCase)
      input.cache
      val wc2 = input
        .flatMap(line => line.split("\\W+"))
        .map(word => (word, 1))
        .reduceByKey((count1, count2) => count1 + count2)
      wc2.saveAsTextFile("output/some/directory")
      sc.stop()
    }
  }

It works fine compiled and executed with 0.9.1. If I recompile and run with 1.0.0-RC1, where the same output directory still exists, I get this familiar Hadoop-ish exception:

  [error] (run-main-0) org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
  org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:749)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:662)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:581)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1057)
    at spark.activator.WordCount2$.main(WordCount2.scala:42)
    at spark.activator.WordCount2.main(WordCount2.scala)
    ...

Thoughts?

On Tue, Apr 29, 2014 at 3:05 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an e-mail to the Spark dev list and/or file a JIRA.

Commit: d636772 (v1.0.0-rc3)
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221

Binaries: http://people.apache.org/~pwendell/spark-1.0.0-rc3/
Docs: http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
Repository: https://repository.apache.org/content/repositories/orgapachespark-1012/

== API Changes ==

If you want to test building against Spark there are some minor API changes. We'll get these written up for the final release, but I'm noting a few here (not comprehensive):

- changes to ML vector specification:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/mllib-guide.html#from-09-to-10
- changes to the Java API:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
- coGroup and related functions now return Iterable[T] instead of Seq[T]
  ==> Call toSeq on the result to restore the old behavior
- SparkContext.jarOfClass returns Option[String] instead of Seq[String]
  ==> Call toSeq on the result to restore the old behavior
- Streaming classes have been renamed: NetworkReceiver -> Receiver

--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com

--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
Re: Spark 1.0.0 rc3
That suggestion got lost along the way, and IIRC the patch didn't have that. It's a good idea though, if nothing else to provide a simple means for backwards compatibility. I created a JIRA for this. It's very straightforward, so maybe someone can pick it up quickly:

https://issues.apache.org/jira/browse/SPARK-1677

On Tue, Apr 29, 2014 at 2:20 PM, Dean Wampler deanwamp...@gmail.com wrote:

Thanks. I'm fine with the logic change, although I was a bit surprised to see Hadoop used for file I/O.

Anyway, the JIRA issue and pull request discussions mention a flag to enable overwrites. That would be very convenient for a tutorial I'm writing, although I wouldn't recommend it for normal use, of course. However, I can't figure out if this actually exists. I found the spark.files.overwrite property, but that doesn't apply. Does this override flag, method call, or method argument actually exist?

Thanks,
Dean

On Tue, Apr 29, 2014 at 1:54 PM, Patrick Wendell pwend...@gmail.com wrote:

Hi Dean,

We have always used the Hadoop libraries here to read and write local files. In Spark 1.0 we started enforcing the rule that you can't overwrite an existing directory, because it can cause confusing/undefined behavior if multiple jobs output to the same directory (they partially clobber each other's output).

https://issues.apache.org/jira/browse/SPARK-1100
https://github.com/apache/spark/pull/11

In the JIRA I actually proposed slightly deviating from Hadoop semantics and allowing the directory to exist if it is empty, but I think in the end we decided to just go with the exact same semantics as Hadoop (i.e. even empty directories are a problem).

- Patrick

On Tue, Apr 29, 2014 at 9:43 AM, Dean Wampler deanwamp...@gmail.com wrote:

I'm observing one anomalous behavior. With the 1.0.0 libraries, it's using HDFS classes for file I/O, while the same script compiled and running with 0.9.1 uses only the local-mode file I/O. The script is a variation of the Word Count script. Here are the guts:

  object WordCount2 {
    def main(args: Array[String]) = {
      val sc = new SparkContext("local", "Word Count (2)")
      val input = sc.textFile(".../some/local/file").map(line => line.toLowerCase)
      input.cache
      val wc2 = input
        .flatMap(line => line.split("\\W+"))
        .map(word => (word, 1))
        .reduceByKey((count1, count2) => count1 + count2)
      wc2.saveAsTextFile("output/some/directory")
      sc.stop()
    }
  }

It works fine compiled and executed with 0.9.1. If I recompile and run with 1.0.0-RC1, where the same output directory still exists, I get this familiar Hadoop-ish exception:

  [error] (run-main-0) org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
  org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:749)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:662)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:581)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1057)
    at spark.activator.WordCount2$.main(WordCount2.scala:42)
    at spark.activator.WordCount2.main(WordCount2.scala)
    ...

Thoughts?

On Tue, Apr 29, 2014 at 3:05 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an e-mail to the Spark dev list and/or file a JIRA.

Commit: d636772 (v1.0.0-rc3)
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221

Binaries: http://people.apache.org/~pwendell/spark-1.0.0-rc3/
Docs: http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
Repository: https://repository.apache.org/content/repositories/orgapachespark-1012/

== API Changes ==

If you want to test building against Spark there are some minor API changes. We'll get these written up for the final release, but I'm noting a few here (not comprehensive):

- changes to ML vector specification:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/mllib-guide.html#from-09-to-10
- changes to the Java API:
  http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
- coGroup and related functions now return Iterable[T] instead of Seq[T]
  ==> Call toSeq