Hi,

I tried to build the 1.0.0 rc3 version with Java 8, and I got this error:

    java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded

I am building on a Core i7 (quad-core) Windows laptop with 8 GB RAM.
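(A note on the error above: "GC overhead limit exceeded" means the JVM running the build ran out of heap; it does not indicate a problem in the sources. The usual fix, assuming a Maven build, is to give the build JVM more memory before compiling, e.g. export MAVEN_OPTS="-Xmx2g" (on Windows: set MAVEN_OPTS=-Xmx2g); for an sbt build, the -Xmx value passed by the sbt launcher script plays the same role.)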
Earlier I had tried to build Spark 0.9.1 with Java 8 and had gotten an error about Comparator.class not being found, which was mentioned today on another thread, so I am not getting that error now. I have successfully built Spark 0.9.0 with Java 1.7.

Thanks,
Manu

On Tue, Apr 29, 2014 at 10:43 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> That suggestion got lost along the way and IIRC the patch didn't have
> that. It's a good idea, though, if nothing else to provide a simple
> means for backwards compatibility.
>
> I created a JIRA for this. It's very straightforward, so maybe someone
> can pick it up quickly:
> https://issues.apache.org/jira/browse/SPARK-1677
>
> On Tue, Apr 29, 2014 at 2:20 PM, Dean Wampler <deanwamp...@gmail.com> wrote:
> > Thanks. I'm fine with the logic change, although I was a bit surprised
> > to see Hadoop used for file I/O.
> >
> > Anyway, the JIRA issue and pull request discussions mention a flag to
> > enable overwrites. That would be very convenient for a tutorial I'm
> > writing, although I wouldn't recommend it for normal use, of course.
> > However, I can't figure out whether this actually exists. I found the
> > spark.files.overwrite property, but that doesn't apply. Does this
> > override flag, method call, or method argument actually exist?
> >
> > Thanks,
> > Dean
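(On the override question above: at the time of this thread the flag did not yet exist; SPARK-1677, linked above, tracked adding one, and the setting that eventually came out of that work is spark.hadoop.validateOutputSpecs. In the meantime, a workaround for a tutorial setting is to remove the output directory before saving. A minimal sketch in Scala, reusing sc and wc2 from the WordCount2 example quoted below; the helper name deleteIfExists is made up:

    import org.apache.hadoop.fs.Path

    // Delete the output directory (if it exists) before saving, so that
    // repeated runs don't fail with FileAlreadyExistsException.
    def deleteIfExists(sc: org.apache.spark.SparkContext, dir: String): Unit = {
      val path = new Path(dir)
      val fs = path.getFileSystem(sc.hadoopConfiguration)
      if (fs.exists(path)) fs.delete(path, true) // true = recursive
    }

    deleteIfExists(sc, "output/some/directory")
    wc2.saveAsTextFile("output/some/directory")

Because this goes through the Hadoop FileSystem API, the same call works for local file: paths and for HDFS.)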
> > On Tue, Apr 29, 2014 at 1:54 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >> Hi Dean,
> >>
> >> We have always used the Hadoop libraries here to read and write local
> >> files. In Spark 1.0 we started enforcing the rule that you can't
> >> overwrite an existing directory, because that can cause
> >> confusing/undefined behavior if multiple jobs output to the same
> >> directory (they partially clobber each other's output).
> >>
> >> https://issues.apache.org/jira/browse/SPARK-1100
> >> https://github.com/apache/spark/pull/11
> >>
> >> In the JIRA I actually proposed deviating slightly from Hadoop
> >> semantics and allowing the directory to exist if it is empty, but I
> >> think in the end we decided to go with exactly the same semantics as
> >> Hadoop (i.e. even empty directories are a problem).
> >>
> >> - Patrick
> >>
> >> On Tue, Apr 29, 2014 at 9:43 AM, Dean Wampler <deanwamp...@gmail.com> wrote:
> >> > I'm observing one anomalous behavior. With the 1.0.0 libraries, it is
> >> > using the HDFS classes for file I/O, while the same script compiled
> >> > and run with 0.9.1 uses only local-mode file I/O.
> >> >
> >> > The script is a variation of the Word Count script. Here are the "guts":
> >> >
> >> > import org.apache.spark.SparkContext
> >> >
> >> > object WordCount2 {
> >> >   def main(args: Array[String]) = {
> >> >     val sc = new SparkContext("local", "Word Count (2)")
> >> >
> >> >     val input = sc.textFile(".../some/local/file")
> >> >       .map(line => line.toLowerCase)
> >> >     input.cache
> >> >
> >> >     val wc2 = input
> >> >       .flatMap(line => line.split("""\W+"""))
> >> >       .map(word => (word, 1))
> >> >       .reduceByKey((count1, count2) => count1 + count2)
> >> >
> >> >     wc2.saveAsTextFile("output/some/directory")
> >> >
> >> >     sc.stop()
> >> >   }
> >> > }
> >> >
> >> > It works fine compiled and executed with 0.9.1. If I recompile and
> >> > run with 1.0.0-RC1, where the same output directory still exists, I
> >> > get this familiar Hadoop-ish exception:
> >> >
> >> > [error] (run-main-0) org.apache.hadoop.mapred.FileAlreadyExistsException:
> >> > Output directory
> >> > file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc
> >> > already exists
> >> > org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
> >> > file:/Users/deanwampler/projects/typesafe/activator/activator-spark/output/kjv-wc
> >> > already exists
> >> >   at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
> >> >   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:749)
> >> >   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:662)
> >> >   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:581)
> >> >   at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1057)
> >> >   at spark.activator.WordCount2$.main(WordCount2.scala:42)
> >> >   at spark.activator.WordCount2.main(WordCount2.scala)
> >> >   ...
> >> >
> >> > Thoughts?
> >> >
> >> > On Tue, Apr 29, 2014 at 3:05 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> >> >> Hey All,
> >> >>
> >> >> This is not an official vote, but I wanted to cut an RC so that
> >> >> people can test against the Maven artifacts, test building with
> >> >> their configuration, etc. We are still chasing down a few issues
> >> >> and updating the docs.
> >> >>
> >> >> If you have issues or bug reports for this release, please send an
> >> >> e-mail to the Spark dev list and/or file a JIRA.
> >> >>
> >> >> Commit: d636772 (v1.0.0-rc3)
> >> >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d636772ea9f98e449a038567b7975b1a07de3221
> >> >>
> >> >> Binaries:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3/
> >> >>
> >> >> Docs:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
> >> >>
> >> >> Repository:
> >> >> https://repository.apache.org/content/repositories/orgapachespark-1012/
> >> >>
> >> >> == API Changes ==
> >> >> If you want to test building against Spark, there are some minor
> >> >> API changes. We'll get these written up for the final release, but
> >> >> I'm noting a few here (not comprehensive):
> >> >>
> >> >> Changes to the MLlib vector specification:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/mllib-guide.html#from-09-to-10
> >> >>
> >> >> Changes to the Java API:
> >> >> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark
> >> >>
> >> >> cogroup and related functions now return Iterable[T] instead of Seq[T]
> >> >> ==> Call toSeq on the result to restore the old behavior
> >> >>
> >> >> SparkContext.jarOfClass returns Option[String] instead of Seq[String]
> >> >> ==> Call toSeq on the result to restore the old behavior
> >> >>
> >> >> Streaming classes have been renamed:
> >> >> NetworkReceiver -> Receiver
> >> >
> >> > --
> >> > Dean Wampler, Ph.D.
> >> > Typesafe
> >> > @deanwampler
> >> > http://typesafe.com
> >> > http://polyglotprogramming.com
> >
> > --
> > Dean Wampler, Ph.D.
> > Typesafe
> > @deanwampler
> > http://typesafe.com
> > http://polyglotprogramming.com

--
Manu Suryavansh
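(A sketch of the two "call toSeq" migration notes from the RC announcement above, assuming an existing SparkContext named sc; the sample data and value names are made up:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // brings in the pair-RDD functions (cogroup etc.)

    // In 1.0, cogroup produces Iterable values rather than Seq values;
    // calling toSeq on each side restores the old shape.
    val left  = sc.parallelize(Seq(("a", 1), ("a", 2)))
    val right = sc.parallelize(Seq(("a", "x")))
    val grouped = left.cogroup(right) // RDD[(String, (Iterable[Int], Iterable[String]))]
      .mapValues { case (ints, strs) => (ints.toSeq, strs.toSeq) }

    // In 1.0, SparkContext.jarOfClass returns Option[String] rather than
    // Seq[String]; Option.toSeq yields the old zero- or one-element Seq.
    val jars: Seq[String] = SparkContext.jarOfClass(classOf[SparkContext]).toSeq
)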