Unsubscribe

2023-12-16 Thread Andrew Milkowski



unsubscribe

2021-01-24 Thread Andrew Milkowski



unsubscribe

2019-01-30 Thread Andrew Milkowski
unsubscribe


Re: freeing up memory occupied by processed Stream Blocks

2017-01-25 Thread Andrew Milkowski
Hi Takeshi, thanks for the answer. It looks like Spark should free up old RDDs;
however, in the admin UI we see, for example:

 Block ID, it corresponds with each receiver and a timestamp.
For example, block input-0-1485275695898 is from receiver 0 and it was
created at 1485275695898 (1/24/2017, 11:34:55 AM GMT-5:00).
That corresponds with the start time.

That block is still not being released even after the app runs for a whole day!
The RDDs in our scenario are Strings coming from a Kinesis stream.

Is there a way to explicitly purge an RDD after the last step in the map/reduce
process, once and for all?

thanks much!

On Fri, Jan 20, 2017 at 2:35 AM, Takeshi Yamamuro <linguin@gmail.com>
wrote:

> Hi,
>
> AFAIK, the blocks of minibatch RDDs are checked every job finished, and
> older blocks automatically removed (See: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463).
>
> You can control this behaviour by StreamingContext#remember to some extent.
>
> // maropu
>
>
> On Fri, Jan 20, 2017 at 3:17 AM, Andrew Milkowski <amgm2...@gmail.com>
> wrote:
>
>> hello
>>
>> using spark 2.0.2  and while running sample streaming app with kinesis
>> noticed (in admin ui Storage tab)  "Stream Blocks" for each worker keeps
>> climbing up
>>
>> then also (on same ui page) in Blocks section I see blocks such as below
>>
>> input-0-1484753367056
>>
>> that are marked as Memory Serialized
>>
>> that do not seem to be "released"
>>
>> above eventually consumes executor memories leading to out of memory
>> exception on some
>>
>> is there a way to "release" these blocks free them up , app is sample m/r
>>
>> I attempted rdd.unpersist(false) in the code but that did not lead to
>> memory free up
>>
>> thanks much in advance!
>>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
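
(For reference, a minimal sketch of the StreamingContext#remember approach mentioned in the reply above; the app name, master, batch interval, and remember window are illustrative, not taken from the original app.)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// illustrative values only; a local master keeps the sketch self-contained
val conf = new SparkConf().setAppName("kinesis-stream-blocks-demo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

// keep generated RDDs (and their input blocks) for roughly this long; blocks from
// batches older than this window become eligible for cleanup after their jobs finish
ssc.remember(Minutes(2))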


freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Andrew Milkowski
Hello,

We are using Spark 2.0.2, and while running a sample streaming app with Kinesis
we noticed (in the admin UI Storage tab) that "Stream Blocks" for each worker
keeps climbing.

On the same UI page, in the Blocks section, I also see blocks such as

input-0-1484753367056

that are marked as Memory Serialized and do not seem to be "released".

This eventually consumes executor memory, leading to out-of-memory exceptions
on some executors.

Is there a way to "release" these blocks and free them up? The app is a simple
map/reduce.

I attempted rdd.unpersist(false) in the code, but that did not free up memory.

Thanks much in advance!
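
(For reference, a hypothetical reconstruction of the unpersist attempt described above; a socket source stands in for the Kinesis receiver and all names are illustrative.)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// illustrative values only; local master and a socket source keep the sketch self-contained
val conf = new SparkConf().setAppName("unpersist-demo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)

lines.foreachRDD { rdd =>
  val counts = rdd.map(line => (line, 1L)).reduceByKey(_ + _)
  counts.collect().foreach(println)
  // drops this batch RDD's own cached (rdd_*) blocks, if any, but not the receiver's
  // input-* blocks shown in the Storage tab; those are normally removed by Spark
  // Streaming's own cleanup (spark.streaming.unpersist, on by default) once old
  // batches fall outside the remember window
  rdd.unpersist(blocking = false)
}

ssc.start()
ssc.awaitTermination()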


Futures timed out after [120 seconds]

2016-02-08 Thread Andrew Milkowski
Hello, we have a question. We are seeing the exceptions below and are currently
enabling a JVM profiler to look into GC activity on the workers; if you have any
other suggestions, please let us know. We don't want to simply increase the RPC
timeout (from 120 to, say, 600 seconds) but rather get to the reason why the
workers time out.

The other question is that this timeout appears to cause data loss: data in the
mappers does not make it to the reducers and back to the driver when collecting
the data.

So resiliency seems to be lost, because Spark does not appear to retry the data
at a later time; it gives up and the input record is lost.

-Andrew


16/02/08 08:33:37 ERROR YarnScheduler: Lost executor 265 on
ip-172-20-35-115.ec2.internal: remote Rpc client disassociated
[Stage 4313813:>  (0 + 94)
/ 95]16/02/08 08:35:35 ERROR ContextCleaner: Error cleaning broadcast
4311376
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120
seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org
$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242)
at
org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:136)
at
org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
at
org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
at
org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:67)
at
org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
at
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
at
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1136)
at org.apache.spark.ContextCleaner.org
$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
[2/8/16, 9:25 AM] Chad Ladensack (c...@revcontent.com):I'll still send over
the log file
[2/8/16, 9:25 AM] Chad Ladensack (c...@revcontent.com):Some guy online was
asking about garbage collection and how it can throw off executors
[2/8/16, 9:25 AM] Chad Ladensack (c...@revcontent.com):
http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning

[2/8/16, 9:26 AM] Chad Ladensack (c...@revcontent.com):He referenced the
tuning guide and it shows how to measure it
[2/8/16, 9:26 AM] Chad Ladensack (c...@revcontent.com):idk, maybe worth
looking into?
[2/8/16, 9:26 AM] Chad Ladensack (c...@revcontent.com):I'll get the log and
email it now
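
(For reference, a hedged sketch of the two knobs discussed above: the askTimeout named in the stack trace, and executor GC logging per the tuning guide. The values are illustrative, and as noted above, raising the timeout alone does not explain why executors are lost.)

import org.apache.spark.SparkConf

// illustrative values only, not recommendations
val conf = new SparkConf()
  .setAppName("rpc-timeout-demo")
  // the timeout reported in the stack trace (default 120s); the thread's point is to
  // find the root cause rather than simply raise it
  .set("spark.rpc.askTimeout", "600s")
  // surface GC activity on executors, per the tuning guide's garbage-collection section
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")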


akka.tcp://spark@localhost:7077/user/MapOutputTracker akka.actor.ActorNotFound

2014-07-28 Thread Andrew Milkowski
Hello community

Using following distros:

spark:
http://archive.cloudera.com/cdh5/cdh/5/spark-1.0.0-cdh5.1.0-src.tar.gz
mesos: http://archive.apache.org/dist/mesos/0.19.0/mesos-0.19.0.tar.gz

both assembled with Scala 2.10.4 and Java 7

My spark-env.sh looks as follows:

#!/usr/bin/env bash

export SCALA_HOME=/opt/local/src/scala/scala-2.10.4
export
MESOS_NATIVE_LIBRARY=/opt/local/src/mesos/mesos-0.19.0/dist/lib/libmesos.so
export
SPARK_EXECUTOR_URI=hdfs://localhost:8020/spark/spark-1.0.0-cdh5.1.0-bin-2.3.0-cdh5.0.3.tgz
export
HADOOP_CONF_DIR=/opt/local/cloudera/hadoop/cdh5/hadoop-2.3.0-cdh5.0.3/etc/hadoop
export STANDALONE_SPARK_MASTER_HOST=192.168.122.1

export MASTER=mesos://192.168.122.1
export SPARK_MASTER_IP=192.168.122.1
export SPARK_LOCAL_IP=192.168.122.1
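
(As an aside, a hedged sketch of the equivalent application-side settings; the :5050 Mesos master port and the app name are assumptions, not taken from the environment above.)

import org.apache.spark.{SparkConf, SparkContext}

// illustrative only; mirrors MASTER and SPARK_EXECUTOR_URI from spark-env.sh above
val conf = new SparkConf()
  .setAppName("mesos-demo")
  .setMaster("mesos://192.168.122.1:5050")
  .set("spark.executor.uri",
    "hdfs://localhost:8020/spark/spark-1.0.0-cdh5.1.0-bin-2.3.0-cdh5.0.3.tgz")
val sc = new SparkContext(conf)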

When I run a sample Spark job I get the output below.

Thanks in advance for an explanation of, or fix for, the exception.

Note: if I run the Spark job on Spark standalone (or on Hadoop YARN), the job
runs without any problem.


WARNING: Logging before InitGoogleLogging() is written to STDERR
I0728 14:33:52.421203 19678 fetcher.cpp:73] Fetching URI
'hdfs://localhost:8020/spark/spark-1.0.0-cdh5.1.0-bin-2.3.0-cdh5.0.3.tgz'
I0728 14:33:52.421346 19678 fetcher.cpp:102] Downloading resource from
'hdfs://localhost:8020/spark/spark-1.0.0-cdh5.1.0-bin-2.3.0-cdh5.0.3.tgz'
to
'/tmp/mesos/slaves/20140724-134606-16777343-5050-25095-0/frameworks/20140728-143300-24815808-5050-19059-/executors/20140724-134606-16777343-5050-25095-0/runs/c9c9eaa2-b722-4215-a35a-dc1c353963b9/spark-1.0.0-cdh5.1.0-bin-2.3.0-cdh5.0.3.tgz'
I0728 14:33:58.201438 19678 fetcher.cpp:61] Extracted resource
'/tmp/mesos/slaves/20140724-134606-16777343-5050-25095-0/frameworks/20140728-143300-24815808-5050-19059-/executors/20140724-134606-16777343-5050-25095-0/runs/c9c9eaa2-b722-4215-a35a-dc1c353963b9/spark-1.0.0-cdh5.1.0-bin-2.3.0-cdh5.0.3.tgz'
into
'/tmp/mesos/slaves/20140724-134606-16777343-5050-25095-0/frameworks/20140728-143300-24815808-5050-19059-/executors/20140724-134606-16777343-5050-25095-0/runs/c9c9eaa2-b722-4215-a35a-dc1c353963b9'
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
14/07/28 14:33:59 INFO SparkHadoopUtil: Using Spark's default log4j
profile: org/apache/spark/log4j-defaults.properties
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0728 14:33:59.896520 19785 exec.cpp:131] Version: 0.19.0
I0728 14:33:59.899474 19805 exec.cpp:205] Executor registered on slave
20140724-134606-16777343-5050-25095-0
14/07/28 14:33:59 INFO MesosExecutorBackend: Registered with Mesos as
executor ID 20140724-134606-16777343-5050-25095-0
14/07/28 14:34:00 INFO SecurityManager: Changing view acls to: amilkowski
14/07/28 14:34:00 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(amilkowski)
14/07/28 14:34:00 INFO Slf4jLogger: Slf4jLogger started
14/07/28 14:34:00 INFO Remoting: Starting remoting
14/07/28 14:34:01 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@localhost:40412]
14/07/28 14:34:01 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@localhost:40412]
14/07/28 14:34:01 INFO SparkEnv: Connecting to MapOutputTracker:
akka.tcp://spark@localhost:7077/user/MapOutputTracker
akka.actor.ActorNotFound: Actor not found for:
ActorSelection[Actor[akka.tcp://spark@localhost
:7077/]/user/MapOutputTracker]
at
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:66)
at
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:64)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
at
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:269)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:512)
at 

running Spark App on Yarn produces: Exception in thread main java.lang.NoSuchFieldException: DEFAULT_YARN_APPLICATION_CLASSPATH

2014-07-16 Thread Andrew Milkowski
Hello community,

I tried to run a Spark app on YARN, using the Cloudera Hadoop and Spark distros
(from http://archive.cloudera.com/cdh5/cdh/5).

hadoop version: hadoop-2.3.0-cdh5.0.3.tar.gz
spark version: spark-0.9.0-cdh5.0.3.tar.gz

DEFAULT_YARN_APPLICATION_CLASSPATH is part of the hadoop-yarn-api jar ...

thanks for any replies!

[amilkowski@localhost spark-streaming]$ ./test-yarn.sh
14/07/16 12:47:17 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/16 12:47:17 INFO client.RMProxy: Connecting to ResourceManager at /
0.0.0.0:8032
14/07/16 12:47:17 INFO yarn.Client: Got Cluster metric info from
ApplicationsManager (ASM), number of NodeManagers: 1
14/07/16 12:47:17 INFO yarn.Client: Queue info ... queueName: root.default,
queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0,
  queueApplicationCount = 0, queueChildQueueCount = 0
14/07/16 12:47:17 INFO yarn.Client: Max mem capabililty of a single
resource in this cluster 8192
14/07/16 12:47:17 INFO yarn.Client: Preparing Local resources
14/07/16 12:47:18 INFO yarn.Client: Uploading
file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/examples/target/scala-2.10/spark-examples-assembly-0.9.0-cdh5.0.3.jar
to
hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-examples-assembly-0.9.0-cdh5.0.3.jar
14/07/16 12:47:19 INFO yarn.Client: Uploading
file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/assembly/target/scala-2.10/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
to
hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
14/07/16 12:47:19 INFO yarn.Client: Setting up the launch environment
Exception in thread main java.lang.NoSuchFieldException:
DEFAULT_YARN_APPLICATION_CLASSPATH
at java.lang.Class.getField(Class.java:1579)
at
org.apache.spark.deploy.yarn.ClientBase$.getDefaultYarnApplicationClasspath(ClientBase.scala:403)
at
org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
at
org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
at scala.Option.getOrElse(Option.scala:120)
at
org.apache.spark.deploy.yarn.ClientBase$.populateHadoopClasspath(ClientBase.scala:385)
at
org.apache.spark.deploy.yarn.ClientBase$.populateClasspath(ClientBase.scala:444)
at
org.apache.spark.deploy.yarn.ClientBase$class.setupLaunchEnv(ClientBase.scala:274)
at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:41)
at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:77)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:98)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:183)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
[amilkowski@localhost spark-streaming]$
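
(For context, a minimal sketch of the reflective lookup that fails here, per the stack trace from ClientBase.getDefaultYarnApplicationClasspath; if the Hadoop jars actually on the client classpath do not expose this field, the same NoSuchFieldException is thrown.)

import org.apache.hadoop.yarn.conf.YarnConfiguration

// in Hadoop 2.x YarnConfiguration this is a public static String[] field;
// the lookup fails when an older or mismatched Hadoop API is on the classpath
val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
val defaults = field.get(null).asInstanceOf[Array[String]]
println(defaults.mkString(","))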


Re: running Spark App on Yarn produces: Exception in thread main java.lang.NoSuchFieldException: DEFAULT_YARN_APPLICATION_CLASSPATH

2014-07-16 Thread Andrew Milkowski
Thanks Sandy. No, it's not a CM-managed cluster, it's straight from the Cloudera
tar (http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.3.tar.gz).

Trying your suggestion immediately! Thanks so much for taking the time.


On Wed, Jul 16, 2014 at 1:10 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Andrew,

 Are you running on a CM-managed cluster?  I just checked, and there is a
 bug here (fixed in 1.0), but it's avoided by having
 yarn.application.classpath defined in your yarn-site.xml.

 -Sandy


 On Wed, Jul 16, 2014 at 10:02 AM, Sean Owen so...@cloudera.com wrote:

 Somewhere in here, you are not actually running vs Hadoop 2 binaries.
 Your cluster is certainly Hadoop 2, but your client is not using the
 Hadoop libs you think it is (or your compiled binary is linking
 against Hadoop 1, which is the default for Spark -- did you change
 it?)

 On Wed, Jul 16, 2014 at 5:45 PM, Andrew Milkowski amgm2...@gmail.com
 wrote:
  Hello community,
 
  tried to run storm app on yarn, using cloudera hadoop and spark distro
 (from
  http://archive.cloudera.com/cdh5/cdh/5)
 
  hadoop version: hadoop-2.3.0-cdh5.0.3.tar.gz
  spark version: spark-0.9.0-cdh5.0.3.tar.gz
 
  DEFAULT_YARN_APPLICATION_CLASSPATH is part of hadoop-api-yarn jar ...
 
  thanks for any replies!
 
  [amilkowski@localhost spark-streaming]$ ./test-yarn.sh
  14/07/16 12:47:17 WARN util.NativeCodeLoader: Unable to load
 native-hadoop
  library for your platform... using builtin-java classes where applicable
  14/07/16 12:47:17 INFO client.RMProxy: Connecting to ResourceManager at
  /0.0.0.0:8032
  14/07/16 12:47:17 INFO yarn.Client: Got Cluster metric info from
  ApplicationsManager (ASM), number of NodeManagers: 1
  14/07/16 12:47:17 INFO yarn.Client: Queue info ... queueName:
 root.default,
  queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0,
queueApplicationCount = 0, queueChildQueueCount = 0
  14/07/16 12:47:17 INFO yarn.Client: Max mem capabililty of a single
 resource
  in this cluster 8192
  14/07/16 12:47:17 INFO yarn.Client: Preparing Local resources
  14/07/16 12:47:18 INFO yarn.Client: Uploading
 
 file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/examples/target/scala-2.10/spark-examples-assembly-0.9.0-cdh5.0.3.jar
  to
 
 hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-examples-assembly-0.9.0-cdh5.0.3.jar
  14/07/16 12:47:19 INFO yarn.Client: Uploading
 
 file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/assembly/target/scala-2.10/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
  to
 
 hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
  14/07/16 12:47:19 INFO yarn.Client: Setting up the launch environment
  Exception in thread main java.lang.NoSuchFieldException:
  DEFAULT_YARN_APPLICATION_CLASSPATH
  at java.lang.Class.getField(Class.java:1579)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.getDefaultYarnApplicationClasspath(ClientBase.scala:403)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
  at scala.Option.getOrElse(Option.scala:120)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.populateHadoopClasspath(ClientBase.scala:385)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.populateClasspath(ClientBase.scala:444)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$class.setupLaunchEnv(ClientBase.scala:274)
  at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:41)
  at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:77)
  at org.apache.spark.deploy.yarn.Client.run(Client.scala:98)
  at org.apache.spark.deploy.yarn.Client$.main(Client.scala:183)
  at org.apache.spark.deploy.yarn.Client.main(Client.scala)
  [amilkowski@localhost spark-streaming]$
 





Re: running Spark App on Yarn produces: Exception in thread main java.lang.NoSuchFieldException: DEFAULT_YARN_APPLICATION_CLASSPATH

2014-07-16 Thread Andrew Milkowski
Sandy, perfect! You saved me tons of time. I added this to yarn-site.xml and the
job ran to completion.

Can you do me (us) a favor and push the newest, patched Spark/Hadoop tarballs to
CDH5 if possible?

And thanks again for this (huge time saver).


On Wed, Jul 16, 2014 at 1:10 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Andrew,

 Are you running on a CM-managed cluster?  I just checked, and there is a
 bug here (fixed in 1.0), but it's avoided by having
 yarn.application.classpath defined in your yarn-site.xml.

 -Sandy


 On Wed, Jul 16, 2014 at 10:02 AM, Sean Owen so...@cloudera.com wrote:

 Somewhere in here, you are not actually running vs Hadoop 2 binaries.
 Your cluster is certainly Hadoop 2, but your client is not using the
 Hadoop libs you think it is (or your compiled binary is linking
 against Hadoop 1, which is the default for Spark -- did you change
 it?)

 On Wed, Jul 16, 2014 at 5:45 PM, Andrew Milkowski amgm2...@gmail.com
 wrote:
  Hello community,
 
  tried to run storm app on yarn, using cloudera hadoop and spark distro
 (from
  http://archive.cloudera.com/cdh5/cdh/5)
 
  hadoop version: hadoop-2.3.0-cdh5.0.3.tar.gz
  spark version: spark-0.9.0-cdh5.0.3.tar.gz
 
  DEFAULT_YARN_APPLICATION_CLASSPATH is part of hadoop-api-yarn jar ...
 
  thanks for any replies!
 
  [amilkowski@localhost spark-streaming]$ ./test-yarn.sh
  14/07/16 12:47:17 WARN util.NativeCodeLoader: Unable to load
 native-hadoop
  library for your platform... using builtin-java classes where applicable
  14/07/16 12:47:17 INFO client.RMProxy: Connecting to ResourceManager at
  /0.0.0.0:8032
  14/07/16 12:47:17 INFO yarn.Client: Got Cluster metric info from
  ApplicationsManager (ASM), number of NodeManagers: 1
  14/07/16 12:47:17 INFO yarn.Client: Queue info ... queueName:
 root.default,
  queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0,
queueApplicationCount = 0, queueChildQueueCount = 0
  14/07/16 12:47:17 INFO yarn.Client: Max mem capabililty of a single
 resource
  in this cluster 8192
  14/07/16 12:47:17 INFO yarn.Client: Preparing Local resources
  14/07/16 12:47:18 INFO yarn.Client: Uploading
 
 file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/examples/target/scala-2.10/spark-examples-assembly-0.9.0-cdh5.0.3.jar
  to
 
 hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-examples-assembly-0.9.0-cdh5.0.3.jar
  14/07/16 12:47:19 INFO yarn.Client: Uploading
 
 file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/assembly/target/scala-2.10/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
  to
 
 hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
  14/07/16 12:47:19 INFO yarn.Client: Setting up the launch environment
  Exception in thread main java.lang.NoSuchFieldException:
  DEFAULT_YARN_APPLICATION_CLASSPATH
  at java.lang.Class.getField(Class.java:1579)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.getDefaultYarnApplicationClasspath(ClientBase.scala:403)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
  at scala.Option.getOrElse(Option.scala:120)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.populateHadoopClasspath(ClientBase.scala:385)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.populateClasspath(ClientBase.scala:444)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$class.setupLaunchEnv(ClientBase.scala:274)
  at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:41)
  at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:77)
  at org.apache.spark.deploy.yarn.Client.run(Client.scala:98)
  at org.apache.spark.deploy.yarn.Client$.main(Client.scala:183)
  at org.apache.spark.deploy.yarn.Client.main(Client.scala)
  [amilkowski@localhost spark-streaming]$
 





Re: running Spark App on Yarn produces: Exception in thread main java.lang.NoSuchFieldException: DEFAULT_YARN_APPLICATION_CLASSPATH

2014-07-16 Thread Andrew Milkowski
For others, to solve the problem in this topic, add the following to yarn-site.xml:

<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>



On Wed, Jul 16, 2014 at 1:47 PM, Andrew Milkowski amgm2...@gmail.com
wrote:

 Sandy, perfect! you saved me tons of time! added this in yarn-site.xml job
 ran to completion

 Can you do me (us) a favor and push newest and patched spark/hadoop to
 cdh5 (tar's) if possible

 and thanks again for this (huge time saver)


 On Wed, Jul 16, 2014 at 1:10 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

 Andrew,

 Are you running on a CM-managed cluster?  I just checked, and there is a
 bug here (fixed in 1.0), but it's avoided by having
 yarn.application.classpath defined in your yarn-site.xml.

 -Sandy


 On Wed, Jul 16, 2014 at 10:02 AM, Sean Owen so...@cloudera.com wrote:

 Somewhere in here, you are not actually running vs Hadoop 2 binaries.
 Your cluster is certainly Hadoop 2, but your client is not using the
 Hadoop libs you think it is (or your compiled binary is linking
 against Hadoop 1, which is the default for Spark -- did you change
 it?)

 On Wed, Jul 16, 2014 at 5:45 PM, Andrew Milkowski amgm2...@gmail.com
 wrote:
  Hello community,
 
  tried to run storm app on yarn, using cloudera hadoop and spark distro
 (from
  http://archive.cloudera.com/cdh5/cdh/5)
 
  hadoop version: hadoop-2.3.0-cdh5.0.3.tar.gz
  spark version: spark-0.9.0-cdh5.0.3.tar.gz
 
  DEFAULT_YARN_APPLICATION_CLASSPATH is part of hadoop-api-yarn jar ...
 
  thanks for any replies!
 
  [amilkowski@localhost spark-streaming]$ ./test-yarn.sh
  14/07/16 12:47:17 WARN util.NativeCodeLoader: Unable to load
 native-hadoop
  library for your platform... using builtin-java classes where
 applicable
  14/07/16 12:47:17 INFO client.RMProxy: Connecting to ResourceManager at
  /0.0.0.0:8032
  14/07/16 12:47:17 INFO yarn.Client: Got Cluster metric info from
  ApplicationsManager (ASM), number of NodeManagers: 1
  14/07/16 12:47:17 INFO yarn.Client: Queue info ... queueName:
 root.default,
  queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0,
queueApplicationCount = 0, queueChildQueueCount = 0
  14/07/16 12:47:17 INFO yarn.Client: Max mem capabililty of a single
 resource
  in this cluster 8192
  14/07/16 12:47:17 INFO yarn.Client: Preparing Local resources
  14/07/16 12:47:18 INFO yarn.Client: Uploading
 
 file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/examples/target/scala-2.10/spark-examples-assembly-0.9.0-cdh5.0.3.jar
  to
 
 hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-examples-assembly-0.9.0-cdh5.0.3.jar
  14/07/16 12:47:19 INFO yarn.Client: Uploading
 
 file:/opt/local/cloudera/spark/cdh5/spark-0.9.0-cdh5.0.3/assembly/target/scala-2.10/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
  to
 
 hdfs://localhost:8020/user/amilkowski/.sparkStaging/application_1405528355264_0004/spark-assembly-0.9.0-cdh5.0.3-hadoop2.3.0-cdh5.0.3.jar
  14/07/16 12:47:19 INFO yarn.Client: Setting up the launch environment
  Exception in thread main java.lang.NoSuchFieldException:
  DEFAULT_YARN_APPLICATION_CLASSPATH
  at java.lang.Class.getField(Class.java:1579)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.getDefaultYarnApplicationClasspath(ClientBase.scala:403)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$$anonfun$5.apply(ClientBase.scala:386)
  at scala.Option.getOrElse(Option.scala:120)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.populateHadoopClasspath(ClientBase.scala:385)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$.populateClasspath(ClientBase.scala:444)
  at
 
 org.apache.spark.deploy.yarn.ClientBase$class.setupLaunchEnv(ClientBase.scala:274)
  at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:41)
  at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:77)
  at org.apache.spark.deploy.yarn.Client.run(Client.scala:98)
  at org.apache.spark.deploy.yarn.Client$.main(Client.scala:183)
  at org.apache.spark.deploy.yarn.Client.main(Client.scala)
  [amilkowski@localhost spark-streaming]$