Eclipse Scala IDE/Scala test and Wiki
I was able to set up Spark in Eclipse using the Scala IDE plugin. I also got unit tests running with ScalaTest, which makes development quick and easy. I wanted to document the setup steps in this wiki page:

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup

I can't seem to edit that page. Confluence usually has an Edit button in the upper right, but it does not appear for me, even though I am logged in. Am I missing something?

--
Madhu
https://www.linkedin.com/in/msiddalingaiah
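For anyone curious what that setup buys you, here is a minimal, hypothetical ScalaTest suite of the sort that runs nicely inside Eclipse; the class and test names are made up for illustration and are not from the wiki page:

    // Hypothetical example: a ScalaTest FunSuite driving a local-mode
    // SparkContext, runnable directly from Eclipse's ScalaTest runner.
    import org.apache.spark.SparkContext
    import org.scalatest.FunSuite

    class SimpleSparkSuite extends FunSuite {
      test("parallelize and sum") {
        val sc = new SparkContext("local", "SimpleSparkSuite")
        try {
          // Sum 1..10 on the local "cluster"; 55 is the expected result.
          assert(sc.parallelize(1 to 10).reduce(_ + _) === 55)
        } finally {
          sc.stop() // release the context so other suites can start one
        }
      }
    }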
Re: [VOTE] Release Apache Spark 1.0.0 (rc5)
Hi Patrick,

Thanks for all the explanations, that makes sense.

@DeveloperApi worries me a little bit, especially because of the things Colin mentions - it's sort of hard to make people move off of APIs, or to support different versions of the same API. But maybe if expectations (or the lack thereof) are set up front, there will be fewer issues.

You mentioned something in your shading argument that reminded me of something else. Spark currently depends on slf4j implementations and log4j with compile scope. I'd argue that's the wrong approach if we're talking about Spark being used embedded inside applications; Spark should only depend on the slf4j API package, and let the application provide the underlying implementation. The assembly jars could include an implementation (since I assume those are currently targeted at cluster deployment and not embedding). That way there are fewer sources of conflict at runtime (i.e. the "multiple implementation jars" messages you can see when running some Spark programs).

On Fri, May 30, 2014 at 10:54 PM, Patrick Wendell pwend...@gmail.com wrote:
> 2. Many libraries like logging subsystems, configuration systems, etc
> rely on static state and initialization. I'm not totally sure how e.g.
> slf4j initializes itself if you have both a shaded and non-shaded copy
> of slf4j present.

--
Marcelo
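For concreteness, here is a sketch of the dependency split Marcelo describes, written as sbt settings; the module layout and versions are illustrative, not Spark's actual build:

    // Core module: compile against the slf4j API only, so an embedding
    // application is free to supply its own binding.
    libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.5"

    // Assembly module (targeted at cluster deployment, not embedding):
    // bundle a concrete binding so there is a default logger.
    libraryDependencies ++= Seq(
      "org.slf4j" % "slf4j-log4j12" % "1.7.5",
      "log4j" % "log4j" % "1.2.17"
    )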
Re: [VOTE] Release Apache Spark 1.0.0 (rc5)
On Mon, Jun 2, 2014 at 6:05 PM, Marcelo Vanzin van...@cloudera.com wrote:
> You mentioned something in your shading argument that reminded me of
> something else. Spark currently depends on slf4j implementations and
> log4j with compile scope. I'd argue that's the wrong approach if we're
> talking about Spark being used embedded inside applications; Spark
> should only depend on the slf4j API package, and let the application
> provide the underlying implementation.

Good idea in general; in practice, the drawback is that you can't do things like set log levels if you depend only on the SLF4J API. There are a few cases where it's nice to control that, and it's only possible if you bind to a particular logger as well. You typically bundle an SLF4J binding anyway, to give a default, or else the end user has to know to also bind some SLF4J logger to get output. Of course, it does make for a bit more surgery if you want to override the binding this way.

Shading can bring a whole new level of confusion; I myself would only use it where essential, as a workaround. The same goes for trying to make more elaborate custom classloading schemes -- never in my darkest nightmares have I imagined the failure modes that probably pop up when that goes wrong.

I think the library collisions will get better over time as only later versions of Hadoop are in scope, for example, and/or one build system is in play. I like tackling complexity along those lines first.
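To make the log-level point concrete, a small hypothetical Scala example; the level-setting line compiles only because it reaches past the SLF4J API into the log4j 1.2 binding:

    import org.slf4j.LoggerFactory
    import org.apache.log4j.{Level, Logger => Log4jLogger}

    object LogLevelExample {
      def main(args: Array[String]): Unit = {
        // The SLF4J API lets you log...
        val log = LoggerFactory.getLogger(getClass)
        log.info("shown or hidden depending on the binding's configuration")

        // ...but org.slf4j.Logger has no setLevel, so quieting a chatty
        // logger requires the binding's own classes (log4j 1.2 here).
        Log4jLogger.getLogger("org.apache.spark").setLevel(Level.WARN)
      }
    }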
Which version does the binary compatibility test run against by default?
Is there a way to specify the target version?

-Xiangrui
Re: Eclipse Scala IDE/Scala test and Wiki
Madhu, can you send me your Wiki username? (Sending it just to me is fine.) I can add you to the list to edit it.

Matei

On Jun 2, 2014, at 6:27 PM, Reynold Xin r...@databricks.com wrote:
> I tried but didn't find where I could add you. You probably need Matei to
> help out with this.
>
> On Mon, Jun 2, 2014 at 7:43 AM, Madhu ma...@madhu.com wrote:
>> I was able to set up Spark in Eclipse using the Scala IDE plugin. I also
>> got unit tests running with ScalaTest, which makes development quick and
>> easy. I wanted to document the setup steps in this wiki page:
>>
>> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup
>>
>> I can't seem to edit that page. Confluence usually has an Edit button in
>> the upper right, but it does not appear for me, even though I am logged
>> in. Am I missing something?
>>
>> --
>> Madhu
>> https://www.linkedin.com/in/msiddalingaiah
Re: Which version does the binary compatibility test run against by default?
Yeah - check out sparkPreviousArtifact in the build:

https://github.com/apache/spark/blob/master/project/SparkBuild.scala#L325

- Patrick

On Mon, Jun 2, 2014 at 5:30 PM, Xiangrui Meng men...@gmail.com wrote:
> Is there a way to specify the target version? -Xiangrui
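That setting feeds MiMa (the Migration Manager sbt plugin). A rough sketch of pinning the comparison version yourself, with illustrative coordinates; the exact key and helper in SparkBuild.scala may differ:

    import com.typesafe.tools.mima.plugin.MimaKeys.previousArtifact

    // Compare the current build against the 1.0.0 artifact; swap in
    // whatever released version you want to check compatibility with.
    previousArtifact := Some("org.apache.spark" % "spark-core_2.10" % "1.0.0")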
Spark 1.1-snapshot: java.io.FileNotFoundException from ShuffleMapTask
Quite often I notice that a shuffle file is missing, and thus a FileNotFoundException is thrown. Any idea why a shuffle file would be missing? Am I running low on memory? (I am using the latest code from the master branch on yarn-hadoop-2.2.)

java.io.FileNotFoundException: /var/storage/sda3/nm-local/usercache/npanj/appcache/application_1401394632504_0131/spark-local-20140603050956-6728/20/shuffle_0_2_97 (No such file or directory)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:158)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)