Hi,

I tried to build the 1.0.0 rc3 version with Java 8, and I got this error:

    java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded

I am building on a Core i7 (quad-core) Windows laptop with 8 GB RAM.
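(A note on the error above: "GC overhead limit exceeded" means the JVM running the build ran out of heap; it does not indicate a problem in the sources. The usual fix, assuming a Maven build, is to give the build JVM more memory before compiling, e.g. export MAVEN_OPTS="-Xmx2g" (on Windows: set MAVEN_OPTS=-Xmx2g); for an sbt build, the -Xmx value passed by the sbt launcher script plays the same role.)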
Earlier I had tried to build Spark 0.9.1 with Java 8 and had gotten an error about Comparator.class not being found, which was mentioned today on another thread, so I am not getting that error now. I have successfully built Spark 0.9.0 with Java 1.7.

Thanks,
Manu

On Tue, Apr 29, 2014 at 10:43 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> That suggestion got lost along the way and IIRC the patch didn't have
> that. It's a good idea, though, if nothing else to provide a simple
> means for backwards compatibility.
>
> I created a JIRA for this. It's very straightforward, so maybe someone
> can pick it up quickly:
> https://issues.apache.org/jira/browse/SPARK-1677
>
> On Tue, Apr 29, 2014 at 2:20 PM, Dean Wampler <deanwamp...@gmail.com> wrote:
> > Thanks. I'm fine with the logic change, although I was a bit surprised
> > to see Hadoop used for file I/O.
> >
> > Anyway, the JIRA issue and pull request discussions mention a flag to
> > enable overwrites. That would be very convenient for a tutorial I'm
> > writing, although I wouldn't recommend it for normal use, of course.
> > However, I can't figure out whether this actually exists. I found the
> > spark.files.overwrite property, but that doesn't apply. Does this
> > override flag, method call, or method argument actually exist?
> >
> > Thanks,
> > Dean
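(On the override question above: at the time of this thread the flag did not yet exist; SPARK-1677, linked above, tracked adding one, and the setting that eventually came out of that work is spark.hadoop.validateOutputSpecs. In the meantime, a workaround for a tutorial setting is to remove the output directory before saving. A minimal sketch in Scala, reusing sc and wc2 from the WordCount2 example quoted below; the helper name deleteIfExists is made up:

    import org.apache.hadoop.fs.Path

    // Delete the output directory (if it exists) before saving, so that
    // repeated runs don't fail with FileAlreadyExistsException.
    def deleteIfExists(sc: org.apache.spark.SparkContext, dir: String): Unit = {
      val path = new Path(dir)
      val fs = path.getFileSystem(sc.hadoopConfiguration)
      if (fs.exists(path)) fs.delete(path, true) // true = recursive
    }

    deleteIfExists(sc, "output/some/directory")
    wc2.saveAsTextFile("output/some/directory")

Because this goes through the Hadoop FileSystem API, the same call works for local file: paths and for HDFS.)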
> > On Tue, Apr 29, 2014 at 1:54 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >> Hi Dean,
> >>
> >> We have always used the Hadoop libraries here to read and write local
> >> files. In Spark 1.0 we started enforcing the rule that you can't
> >> overwrite an existing directory, because that can cause
> >> confusing/undefined behavior if multiple jobs output to the same
> >> directory (they partially clobber each other's output).
> >>
> >> https://issues.apache.org/jira/browse/SPARK-1100
> >> https://github.com/apache/spark/pull/11
> >>
> >> In the JIRA I actually proposed deviating slightly from Hadoop
> >> semantics and allowing the directory to exist if it is empty, but I
> >> think in the end we decided to go with exactly the same semantics as
> >> Hadoop (i.e. even empty directories are a problem).
> >>
> >> - Patrick
> >>
> >> On Tue, Apr 29, 2014 at 9:43 AM, Dean Wampler <deanwamp...@gmail.com> wrote:
> >> > I'm observing one anomalous behavior. With the 1.0.0 libraries, it is
> >> > using the HDFS classes for file I/O, while the same script compiled
> >> > and run with 0.9.1 uses only local-mode file I/O.
> >> >
> >> > The script is a variation of the Word Count script. Here are the "guts":
> >> >
> >> > import org.apache.spark.SparkContext
> >> >
> >> > object WordCount2 {
> >> >   def main(args: Array[String]) = {
> >> >     val sc = new SparkContext("local", "Word Count (2)")
> >> >
> >> >     val input = sc.textFile(".../some/local/file")
> >> >       .map(line => line.toLowerCase)
> >> >     input.cache
> >> >
> >> >     val wc2 = input
> >> >       .flatMap(line => line.split("""\W+"""))
> >> >       .map(word => (word, 1))
> >> >       .reduceByKey((count1, count2) => count1 + count2)
> >> >
> >> >     wc2.saveAsTextFile("output/some/directory")
> >> >
> >> >     sc.stop()
> >> >   }
> >> > }
> >> >
> >> > It works fine compiled and executed with 0.9.1. If I recompile and
> >> > run with 1.0.0-RC1, where the same output directory still exists, I
> >> > get this familiar Hadoop-ish exception:
> >> >
> >> > [error] (run-main-0) org.apache.hadoop.mapred.FileAlreadyExistsException:
> >> > Output directory
> >> > file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc
> >> > already exists
> >> > org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
> >> > file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc
> >> > already exists
> >> >   at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
> >> >   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:749)
> >> >   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:662)
> >> >   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:581)
> >> >   at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1057)
> >> >   at spark.activator.WordCount2$.main(WordCount2.scala:42)
> >> >   at spark.activator.WordCount2.main(WordCount2.scala)
> >> >   ...
> >> >
> >> > Thoughts?
> >> >
> >> > On Tue, Apr 29, 2014 at 3:05 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> >> >> Hey All,
> >> >>
> >> >> This is not an official vote, but I wanted to cut an RC so that
> >> >> people can test against the Maven artifacts, test building with
> >> >> their configuration, etc. We are still chasing down a few issues
> >> >> and updating the docs.
> >> >>
> >> >> If you have issues or bug reports for this release, please send an
> >> >> e-mail to the Spark dev list and/or file a JIRA.
> >> >>
> >> >> Commit: d636772 (v1.0.0-rc3)
> >> >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221
> >> >>
> >> >> Binaries:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3/
> >> >>
> >> >> Docs:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
> >> >>
> >> >> Repository:
> >> >> https://repository.apache.org/content/repositories/orgapachespark-1012/
> >> >>
> >> >> == API Changes ==
> >> >> If you want to test building against Spark, there are some minor
> >> >> API changes. We'll get these written up for the final release, but
> >> >> I'm noting a few here (not comprehensive):
> >> >>
> >> >> Changes to the MLlib vector specification:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/mllib-guide.html#from-09-to-10
> >> >>
> >> >> Changes to the Java API:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
> >> >>
> >> >> cogroup and related functions now return Iterable[T] instead of Seq[T]
> >> >> ==> Call toSeq on the result to restore the old behavior
> >> >>
> >> >> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> >> >> ==> Call toSeq on the result to restore the old behavior
> >> >>
> >> >> Streaming classes have been renamed:
> >> >> NetworkReceiver -> Receiver
> >> >
> >> > --
> >> > Dean Wampler, Ph.D.
> >> > Typesafe
> >> > @deanwampler
> >> > http://typesafe.com
> >> > http://polyglotprogramming.com
> >
> > --
> > Dean Wampler, Ph.D.
> > Typesafe
> > @deanwampler
> > http://typesafe.com
> > http://polyglotprogramming.com

--
Manu Suryavansh
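(A sketch of the two "call toSeq" migration notes from the RC announcement above, assuming an existing SparkContext named sc; the sample data and value names are made up:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // brings in the pair-RDD functions (cogroup etc.)

    // In 1.0, cogroup produces Iterable values rather than Seq values;
    // calling toSeq on each side restores the old shape.
    val left  = sc.parallelize(Seq(("a", 1), ("a", 2)))
    val right = sc.parallelize(Seq(("a", "x")))
    val grouped = left.cogroup(right) // RDD[(String, (Iterable[Int], Iterable[String]))]
      .mapValues { case (ints, strs) => (ints.toSeq, strs.toSeq) }

    // In 1.0, SparkContext.jarOfClass returns Option[String] rather than
    // Seq[String]; Option.toSeq yields the old zero- or one-element Seq.
    val jars: Seq[String] = SparkContext.jarOfClass(classOf[SparkContext]).toSeq
)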