Re: Checkpointed RDD still causing StackOverflow

2014-06-23 Thread Xiangrui Meng
Calling checkpoint() alone doesn't cut the lineage. It only marks the
RDD to be checkpointed. The lineage is cut after the first time
this RDD is materialized. You see the StackOverflowError because the
lineage is still there. -Xiangrui
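
For example, a minimal sketch of the pattern described above (the app name,
checkpoint directory, and loop are illustrative, not from this thread):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(
    new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]"))
  sc.setCheckpointDir("/tmp/spark-checkpoints")  // hypothetical directory

  var rdd = sc.parallelize(1 to 1000)
  for (i <- 1 to 100) {
    rdd = rdd.map(_ + 1)          // lineage grows with every iteration
    if (i % 10 == 0) {
      rdd.checkpoint()            // only marks the RDD to be checkpointed
      rdd.count()                 // an action materializes it; the lineage is cut afterwards
    }
  }

Without an action such as count() after checkpoint(), the long lineage remains
and can still overflow the stack.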

On Sun, Jun 22, 2014 at 6:37 PM, dash b...@nd.edu wrote:
 Hi Xiangrui,

 To my knowledge, calling count materializes the RDD; does collect do the
 same thing, since it is also an action? I cannot call count because, for a
 Graph object, count does not materialize the RDD. I have already filed an
 issue about that.

 My question is: why is there still a stack overflow even if `isCheckpointed`
 is true?



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Checkpointed-RDD-still-causing-StackOverflow-tp7066p7068.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


RFC: [SPARK-529] Create constants for known config variables.

2014-06-23 Thread Marcelo Vanzin
I started with some code to implement an idea I had for SPARK-529, and
before going much further (since it's a large and kinda boring change)
I'd like to get some feedback from people.

Current code is at:
https://github.com/vanzin/spark/tree/SPARK-529

There are still some parts I haven't fully fleshed out yet (see TODO
list in the commit message), but that's the basic idea. Let me know if
you have any feedback or different ideas.

Thanks!


-- 
Marcelo


Re: RFC: [SPARK-529] Create constants for known config variables.

2014-06-23 Thread Matei Zaharia
Hey Marcelo,

When we did the configuration pull request, we actually avoided having a big 
list of defaults in one class file, because this creates a file that all the 
components in the project depend on. For example, since we have some settings 
specific to streaming and the REPL, do we want those settings to appear in a 
file that’s in “core”? It might be better to just make sure we use constants 
for defaults in the code, or maybe have a separate class per project.

The other problem with this kind of change is that it’s disruptive to all the 
other ongoing patches, so I wouldn’t consider it high-priority right now. We 
haven’t had a ton of problems with settings being mistyped.

If you do want to do something like this though, apart from the comment above 
about modules, please make sure this is not a public API. As soon as we add it 
to the API, it means we can’t change or remove those config settings. I’d also 
suggest giving each config setting a single name instead of having “ui”, 
“shuffle”, etc objects, since the chained calls to conf.ui.port.value look 
somewhat confusing.
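
As a rough sketch of the flat, non-public style suggested above (the names
and keys here are illustrative, not an actual Spark API; inside Spark the
object would be marked private[spark] to keep it out of the public API):

  import org.apache.spark.SparkConf

  // One plain constant per setting, instead of nested "ui"/"shuffle" objects.
  object ConfigKeys {
    val UI_PORT = "spark.ui.port"
    val SHUFFLE_COMPRESS = "spark.shuffle.compress"
  }

  // Call sites use the constant with the existing SparkConf accessors,
  // rather than chained calls like conf.ui.port.value.
  val conf = new SparkConf()
  val uiPort = conf.getInt(ConfigKeys.UI_PORT, 4040)
  val shuffleCompress = conf.getBoolean(ConfigKeys.SHUFFLE_COMPRESS, true)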

Matei

On Jun 23, 2014, at 3:57 PM, Marcelo Vanzin van...@cloudera.com wrote:

 I started with some code to implement an idea I had for SPARK-529, and
 before going much further (since it's a large and kinda boring change)
 I'd like to get some feedback from people.
 
 Current code is at:
 https://github.com/vanzin/spark/tree/SPARK-529
 
 There are still some parts I haven't fully fleshed out yet (see TODO
 list in the commit message), but that's the basic idea. Let me know if
 you have any feedback or different ideas.
 
 Thanks!
 
 
 -- 
 Marcelo



Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-06-23 Thread Reynold Xin
Mridul,

Can you comment a little bit more on this issue? We are running into the
same stack trace but not sure whether it is just different Spark versions
on each cluster (doesn't seem likely) or a bug in Spark.

Thanks.



On Sat, May 17, 2014 at 4:41 AM, Mridul Muralidharan mri...@gmail.com
wrote:

 I suspect this is an issue we have fixed internally here as part of a
 larger change - the issue we fixed was not a config issue but bugs in
 spark.

 Unfortunately we plan to contribute this as part of 1.1

 Regards,
 Mridul
 On 17-May-2014 4:09 pm, sam (JIRA) j...@apache.org wrote:

  sam created SPARK-1867:
  --
 
   Summary: Spark Documentation Error causes
  java.lang.IllegalStateException: unread block data
   Key: SPARK-1867
   URL: https://issues.apache.org/jira/browse/SPARK-1867
   Project: Spark
Issue Type: Bug
  Reporter: sam
 
 
  I've employed two System Administrators on a contract basis (for quite a
  bit of money), and both contractors have independently hit the following
  exception.  What we are doing is:
 
  1. Installing Spark 0.9.1 according to the documentation on the website,
  along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
  2. Building a fat jar with a Spark app with sbt then trying to run it on
  the cluster
 
  I've also included code snippets, and sbt deps at the bottom.
 
  When I've Googled this, there seem to be two somewhat vague responses:
  a) Mismatching spark versions on nodes/user code
  b) Need to add more jars to the SparkConf
 
  Now I know that (b) is not the problem having successfully run the same
  code on other clusters while only including one jar (it's a fat jar).
 
  But I have no idea how to check for (a) - it appears Spark doesn't have
  any version checks or anything - it would be nice if it checked versions
  and threw a mismatching version exception: you have user code using
  version X and node Y has version Z.
 
  I would be very grateful for advice on this.
 
  The exception:
 
  Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
  14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]
 
  My code snippet:
 
  val conf = new SparkConf()
    .setMaster(clusterMaster)
    .setAppName(appName)
    .setSparkHome(sparkHome)
    .setJars(SparkContext.jarOfClass(this.getClass))

  println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
 
  My SBT dependencies:
 
  // relevant
  "org.apache.spark" % "spark-core_2.10" % "0.9.1",
  "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",

  // standard, probably unrelated
  "com.github.seratch" %% "awscala" % "[0.2,)",
  "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
  "org.specs2" %% "specs2" % "1.14" % "test",
  "org.scala-lang" % "scala-reflect" % "2.10.3",
  "org.scalaz" %% "scalaz-core" % "7.0.5",
  

Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-06-23 Thread Mridul Muralidharan
There are a few interacting issues here, and unfortunately I don't
recall all of the details (since this was fixed a few months back).
From memory, though:

a) With shuffle consolidation, data sent to a remote node incorrectly
includes data from partially constructed blocks, not just the requested
blocks.
Actually, with shuffle consolidation (with and without failures), quite
a few things broke.

b) There might have been a few other bugs in DiskBlockObjectWriter too.

c) We also suspected buffers overlapping when using the cached Kryo
serializer (we never proved this; for now we just disabled caching across
the board and always create a new instance).

The way we debugged it was by introducing input/output streams that add a
checksum to the data stream and validating it on each side of compression,
serialization, etc.
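
As a rough illustration of that debugging technique (a generic
checksum-wrapped stream built on the JDK's CheckedInputStream and
CheckedOutputStream, not the actual internal patch):

  import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
  import java.util.zip.{CRC32, CheckedInputStream, CheckedOutputStream}

  // Writer side: compute a CRC32 over every byte written to the underlying stream.
  val bytesOut = new ByteArrayOutputStream()
  val checkedOut = new CheckedOutputStream(bytesOut, new CRC32())
  checkedOut.write("shuffle block payload".getBytes("UTF-8"))
  checkedOut.flush()
  val writtenChecksum = checkedOut.getChecksum.getValue

  // Reader side: recompute the checksum while reading and compare with the writer's value.
  val checkedIn = new CheckedInputStream(
    new ByteArrayInputStream(bytesOut.toByteArray), new CRC32())
  val buf = new Array[Byte](4096)
  while (checkedIn.read(buf) != -1) {}   // drain the stream
  require(checkedIn.getChecksum.getValue == writtenChecksum,
    "checksum mismatch: data corrupted in transit")

Wrapping each compression/serialization boundary with such a pair narrows
down which layer corrupts the data.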

Apologies for being non-specific ... I really don't have the details
right now, and our internal branch is in flux due to the effort to merge
our local changes into master.
Hopefully we will be able to submit PRs as soon as this is done and
test cases are added to validate the fixes.


Regards,
Mridul




On Tue, Jun 24, 2014 at 10:21 AM, Reynold Xin r...@databricks.com wrote:
 Mridul,

 Can you comment a little bit more on this issue? We are running into the
 same stack trace but not sure whether it is just different Spark versions
 on each cluster (doesn't seem likely) or a bug in Spark.

 Thanks.



 On Sat, May 17, 2014 at 4:41 AM, Mridul Muralidharan mri...@gmail.com
 wrote:

 I suspect this is an issue we have fixed internally here as part of a
 larger change - the issue we fixed was not a config issue but bugs in
 spark.

 Unfortunately we plan to contribute this as part of 1.1

 Regards,
 Mridul
 On 17-May-2014 4:09 pm, sam (JIRA) j...@apache.org wrote:

  sam created SPARK-1867:
  --
 
   Summary: Spark Documentation Error causes
  java.lang.IllegalStateException: unread block data
   Key: SPARK-1867
   URL: https://issues.apache.org/jira/browse/SPARK-1867
   Project: Spark
Issue Type: Bug
  Reporter: sam
 
 
  I've employed two System Administrators on a contract basis (for quite a
  bit of money), and both contractors have independently hit the following
  exception.  What we are doing is:
 
  1. Installing Spark 0.9.1 according to the documentation on the website,
  along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
  2. Building a fat jar with a Spark app with sbt then trying to run it on
  the cluster
 
  I've also included code snippets, and sbt deps at the bottom.
 
  When I've Googled this, there seem to be two somewhat vague responses:
  a) Mismatching spark versions on nodes/user code
  b) Need to add more jars to the SparkConf
 
  Now I know that (b) is not the problem having successfully run the same
  code on other clusters while only including one jar (it's a fat jar).
 
  But I have no idea how to check for (a) - it appears Spark doesn't have
  any version checks or anything - it would be nice if it checked versions
  and threw a mismatching version exception: you have user code using
  version X and node Y has version Z.
 
  I would be very grateful for advice on this.
 
  The exception:
 
  Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at