RE: Unable to load additional JARs in yarn-client mode

2013-12-24 Thread Karavany, Ido
Hi,

Thanks for your responses.
We already tried the one-JAR approach and it worked - but it is a real pain to 
recompile ~15 projects every time we need to make a small change in one of them.

Just to make sure I understand you correctly - below is what we've tried to 
pass in our test constructor:

  JavaSparkContext sc = new JavaSparkContext(
      "yarn-client",
      "SPARK YARN TEST",
      "/app/spark/",
      new String[] {"/app/iot/test/test_kafka.jar"}
  );

Despite the above, the executor / mapper functions still can't find the classes 
inside the above jar (test_kafka.jar).

Are we doing something wrong in the constructor?
Is there a code change / fix which we can quickly apply?
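
(One hedged thing to try, assuming JavaSparkContext in 0.8.x exposes the same 
addJar call as the Scala SparkContext: register the jar explicitly on the 
context after construction, in addition to the constructor argument. A minimal 
sketch:)

    // Hedged sketch: same context as above, with the jar also registered
    // explicitly so that it is shipped to the executors.
    JavaSparkContext sc = new JavaSparkContext(
        "yarn-client",
        "SPARK YARN TEST",
        "/app/spark/",
        new String[] {"/app/iot/test/test_kafka.jar"}
    );
    sc.addJar("/app/iot/test/test_kafka.jar");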

Thanks,
Ido


From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Tuesday, December 24, 2013 07:53
To: user@spark.incubator.apache.org
Subject: RE: Unable to load additional JARs in yarn-client mode

Ido, when you say adding external JARs, do you mean -addJars, which adds some 
jars for the SparkContext to use in the AM env?

If so, I think you don't need it for yarn-client mode at all; in yarn-client 
mode the SparkContext runs locally, so I think you just need to make sure those 
jars are on the java classpath.
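
(A hedged illustration of that distinction, using a hypothetical class name 
standing in for something inside test_kafka.jar: the check below only proves 
the jar is visible to the local driver JVM; executors still need the jar 
shipped to them separately.)

    // Hedged sketch: checks only the driver-side (local) classpath.
    // "com.example.iot.KafkaHelper" is a hypothetical class name, not from the thread.
    try {
        Class.forName("com.example.iot.KafkaHelper");
        System.out.println("jar is visible on the local driver classpath");
    } catch (ClassNotFoundException e) {
        System.out.println("jar is missing from the local driver classpath");
    }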

And for those needed by executors / tasks, I think you can package them as Matei 
said. Or maybe we can expose some env setting for yarn-client mode to allow 
adding multiple jars as needed.

Best Regards,
Raymond Liu

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Tuesday, December 24, 2013 1:17 PM
To: user@spark.incubator.apache.org
Subject: Re: Unable to load additional JARs in yarn-client mode

I'm surprised by this, but one way that will definitely work is to assemble 
your application into a single JAR. If passing them to the constructor doesn't 
work, that's probably a bug.
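
(A hedged sketch of how the single assembly JAR looks from the Java side; the 
assembly path below is illustrative, and building the assembly itself with the 
project's build tool is not shown.)

    // Hedged sketch: distribute one assembled ("uber") JAR that already bundles
    // the ~15 project jars, so only a single artifact has to reach the executors.
    // The assembly path below is illustrative, not from the thread.
    JavaSparkContext sc = new JavaSparkContext(
        "yarn-client",
        "SPARK YARN TEST",
        "/app/spark/",
        new String[] {"/app/iot/test/iot-assembly-0.0.1.jar"}
    );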

Matei

On Dec 23, 2013, at 12:03 PM, Karavany, Ido <ido.karav...@intel.com> wrote:

Hi All,

For our application we need to use the yarn-client mode featured in 0.8.1 
(YARN 2.0.5).
We've successfully executed our Java applications in both yarn-client and 
yarn-standalone modes.

While in yarn-standalone there is a way to add external JARs, we couldn't find 
a way to add those in yarn-client.

Adding jars in the SparkContext constructor or setting SPARK_CLASSPATH didn't 
work either.

Are we missing something?
Can you please advise?
If it is currently impossible - can you advise a patch / workaround?

It is crucial for us to get it working with external dependencies.

Many Thanks,
Ido





Unable to load additional JARs in yarn-client mode

2013-12-23 Thread Karavany, Ido
Hi All,

For our application we need to use the yarn-client mode featured in 0.8.1 
(YARN 2.0.5).
We've successfully executed our Java applications in both yarn-client and 
yarn-standalone modes.

While in yarn-standalone there is a way to add external JARs, we couldn't find 
a way to add those in yarn-client.

Adding jars in the SparkContext constructor or setting SPARK_CLASSPATH didn't 
work either.

Are we missing something?
Can you please advise?
If it is currently impossible - can you advise a patch / workaround?

It is crucial for us to get it working with external dependencies.

Many Thanks,
Ido




Run Spark on Yarn Remotely

2013-12-16 Thread Karavany, Ido
 Hi All,

We've started deploying Spark on Hadoop 2 and YARN. Our previous configuration 
(still not a production cluster) was Spark on Mesos.

We're running a Java application (which runs from a Tomcat server). The 
application builds a singleton Java SparkContext when it is first launched, and 
then all users' requests are executed using this same SparkContext.
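
(A hedged sketch of that singleton pattern, with illustrative class, path, and 
master values that are not from this thread.)

    import org.apache.spark.api.java.JavaSparkContext;

    // Hedged sketch: a lazily initialized JavaSparkContext shared by all
    // requests in the Tomcat application. Names and paths are illustrative.
    public final class SharedSparkContext {
        private static volatile JavaSparkContext instance;

        private SharedSparkContext() {}

        /** Returns the single shared context, creating it on first use. */
        public static JavaSparkContext get() {
            if (instance == null) {
                synchronized (SharedSparkContext.class) {
                    if (instance == null) {
                        instance = new JavaSparkContext(
                            "yarn-standalone",   // master; which value works from a remote JVM is the open question here
                            "SPARK YARN TEST",   // application name
                            "/app/spark-0.8.0-incubating",   // Spark home (illustrative)
                            new String[] {"/app/iot/test/test3-0.0.1-SNAPSHOT.jar"});
                    }
                }
            }
            return instance;
        }
    }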

With Mesos, creating the context involved a few simple operations and was 
possible from within the Java application.

I successfully executed the Spark on YARN example and even my own example 
(although I was unable to find the output logs).
I noticed that it is being done using org.apache.spark.deploy.yarn.Client, but 
I have no example of how that can be done from code (see the sketch after the 
command below).

Successful command:

SPARK_JAR=/app/spark-0.8.0-incubating/assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.4-Intel.jar \
  ./spark-class org.apache.spark.deploy.yarn.Client \
  --jar /app/iot/test/test3-0.0.1-SNAPSHOT.jar \
  --class test3.yarntest \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores
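
(For reference, a hedged sketch of reaching the same entry point from Java 
rather than through the shell wrapper. Assumptions: Client.main accepts the 
same arguments spark-class passes it, SPARK_JAR is already set in the JVM's 
environment, and Client.main may block until the YARN application finishes and 
may call System.exit, so it is likely not usable as-is inside Tomcat.)

    import org.apache.spark.deploy.yarn.Client;

    // Hedged sketch: invoke the YARN client entry point programmatically with
    // the same arguments as the shell command above. SPARK_JAR must already be
    // set in this JVM's environment; Client.main may block and/or call System.exit.
    public class YarnClientLauncher {
        public static void main(String[] args) throws Exception {
            Client.main(new String[] {
                "--jar", "/app/iot/test/test3-0.0.1-SNAPSHOT.jar",
                "--class", "test3.yarntest",
                "--args", "yarn-standalone",
                "--num-workers", "3",
                "--master-memory", "4g",
                "--worker-memory", "2g"
            });
        }
    }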


When I try to emulate the previous method we used and simply execute my test 
jar, the execution hangs.

Our main goal is to be able to create the SparkContext on YARN from Java code 
(and not a shell script) and to keep it as a singleton.
In addition, the application should be executed against a remote YARN cluster 
and not a local one.

Can you please advise?

Thanks,
Ido





Problematic Command:

/usr/java/latest/bin/java -cp 
/usr/lib/hbase/hbase-0.94.7-Intel.jar:/usr/lib/hadoop/hadoop-auth-2.0.4-Intel.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/app/spark-0.8.0-incubating/conf:/app/spark-0.8.0-incubating/assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.4-Intel.jar:/etc/hadoop/conf:/etc/hbase/conf:/etc/hadoop/conf:/app/iot/test/test3-0.0.1-SNAPSHOT.jar
 -Djava.library.path=/usr/lib/hadoop/lib/native -Xms512m -Xmx512m test3.yarntest

Spark Context code piece:

JavaSparkContext sc = new JavaSparkContext(
    "yarn-standalone",
    "SPARK YARN TEST"
);


Log:

13/12/12 17:30:36 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/12/12 17:30:36 INFO spark.SparkEnv: Registering BlockManagerMaster
13/12/12 17:30:36 INFO storage.MemoryStore: MemoryStore started with capacity 
323.9 MB.
13/12/12 17:30:36 INFO storage.DiskStore: Created local directory at 
/tmp/spark-local-20131212173036-09c0
13/12/12 17:30:36 INFO network.ConnectionManager: Bound socket to port 39426 
with id = 
ConnectionManagerId(ip-172-31-43-121.eu-west-1.compute.internal,39426)
13/12/12 17:30:36 INFO storage.BlockManagerMaster: Trying to register 
BlockManager
13/12/12 17:30:36 INFO storage.BlockManagerMaster: Registered BlockManager
13/12/12 17:30:37 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/12 17:30:37 INFO server.AbstractConnector: Started 
SocketConnector@0.0.0.0:43438
13/12/12 17:30:37 INFO broadcast.HttpBroadcast: Broadcast server started at 
http://172.31.43.121:43438
13/12/12 17:30:37 INFO spark.SparkEnv: Registering MapOutputTracker
13/12/12 17:30:37 INFO spark.HttpFileServer: HTTP File server directory is 
/tmp/spark-b48abc5a-53c6-4af1-9c3c-725e1cd7fbb9
13/12/12 17:30:37 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/12 17:30:37 INFO server.AbstractConnector: Started 
SocketConnector@0.0.0.0:60476
13/12/12 17:30:37 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/storage/rdd,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/storage,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/stages/stage,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/stages/pool,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/stages,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/environment,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/executors,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/metrics/json,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/static,null}
13/12/12 17:30:37 INFO handler.ContextHandler: started 
o.e.j.s.h.ContextHandler{/,null}
13/12/12 17:30:37 INFO server.AbstractConnector: Started 
SelectChannelConnector@0.0.0.0:4040
13/12/12 17:30:37 INFO ui.SparkUI: Started Spark Web UI at 
http://ip-172-31-43-121.eu-west-1.compute.internal:4040
13/12/12 17:30:37 INFO cluster.YarnClust

Spark is unable to read from HDFS

2013-09-28 Thread Karavany, Ido
Hi All,

We're new Spark users, trying to install it on the Intel Distribution for Hadoop.
IDH (Intel Distribution for Hadoop) has a customized Hadoop with its own core 
jar (Hadoop-1.0.3-Intel.jar).

What was done?

1.  Download Scala 2.9.3
2.  Download Spark 0.7.3
3.  Change ./project/SparkBuild.scala and set HADOOP_VERSION=1.0.3
4.  Compile by using sbt/sbt package
5.  Create ./conf/spark-env.sh and set SCALA_HOME in it
6.  Update slaves file
7.  Started a standalone cluster
8.  Successfully tested spark with: ./run spark.examples.SparkPi 
spark://ip-172-31-34-49:7077

9.  Started spark-shell
10. Defining a text file and executing the filter with count()
  val myf = sc.textFile("hdfs://ip-172-31-34-49:8020/iot/test.txt")
  myf.filter(line => line.contains("aa")).count()
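
(For reference, a hedged Java equivalent of the spark-shell test above. 
Assumptions: the pre-0.8 "spark.api.java" package namespace used by 0.7.3, and 
the same master and HDFS URLs as elsewhere in this message.)

    import spark.api.java.JavaRDD;
    import spark.api.java.JavaSparkContext;
    import spark.api.java.function.Function;

    // Hedged sketch: the same textFile / filter / count test, written against
    // the Java API instead of the shell.
    public class HdfsReadTest {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                "spark://ip-172-31-34-49:7077", "HDFS read test");
            JavaRDD<String> myf = sc.textFile("hdfs://ip-172-31-34-49:8020/iot/test.txt");
            long count = myf.filter(new Function<String, Boolean>() {
                public Boolean call(String line) {
                    return line.contains("aa");   // same predicate as the shell example
                }
            }).count();
            System.out.println("lines containing \"aa\": " + count);
        }
    }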


*   The file and HDFS are accessible (hadoop fs -cat works, as does creating an 
external Hive table)
*   The spark-shell count() above fails with the result below
*   One option I can think of is that Spark should be compiled against the 
Intel Hadoop core jar - but I don't know how that can be done...

Any help would be great, as we've been stuck with this issue for ~1 month now...

Thanks,
Ido

below is the output log:

scala> myf.filter(line => line.contains("aa")).count()
13/09/28 13:14:45 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
13/09/28 13:14:45 WARN snappy.LoadSnappy: Snappy native library not loaded
13/09/28 13:14:45 INFO mapred.FileInputFormat: Total input paths to process : 1
13/09/28 13:14:45 INFO spark.SparkContext: Starting job: count at <console>:15
13/09/28 13:14:45 INFO scheduler.DAGScheduler: Got job 0 (count at 
<console>:15) with 1 output partitions (allowLocal=false)
13/09/28 13:14:45 INFO scheduler.DAGScheduler: Final stage: Stage 0 (filter at 
<console>:15)
13/09/28 13:14:45 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/09/28 13:14:45 INFO scheduler.DAGScheduler: Missing parents: List()
13/09/28 13:14:45 INFO scheduler.DAGScheduler: Submitting Stage 0 
(FilteredRDD[3] at filter at <console>:15), which has no missing parents
13/09/28 13:14:45 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from 
Stage 0 (FilteredRDD[3] at filter at <console>:15)
13/09/28 13:14:45 INFO local.LocalScheduler: Running ResultTask(0, 0)
13/09/28 13:14:45 INFO local.LocalScheduler: Size of task 0 is 1543 bytes
13/09/28 13:15:45 WARN hdfs.DFSClient: Failed to connect to 
/172.31.34.49:50010, add to deadNodes and 
continuejava.net.SocketTimeoutException: 6 millis timeout while waiting for 
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.31.34.49:44040 remote=/172.31.34.49:50010]
13/09/28 13:16:46 WARN hdfs.DFSClient: Failed to connect to 
/172.31.34.50:50010, add to deadNodes and 
continuejava.net.SocketTimeoutException: 6 millis timeout while waiting for 
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.31.34.49:59724 remote=/172.31.34.50:50010]
13/09/28 13:16:46 INFO hdfs.DFSClient: Could not obtain block 
blk_-1057940606378039494_1013 from any node: java.io.IOException: No live nodes 
contain current block. Will get new block locations from namenode and retry...
13/09/28 13:17:49 WARN hdfs.DFSClient: Failed to connect to 
/172.31.34.49:50010, add to deadNodes and 
continuejava.net.SocketTimeoutException: 6 millis timeout while waiting for 
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.31.34.49:44826 remote=/172.31.34.49:50010]
13/09/28 13:18:49 WARN hdfs.DFSClient: Failed to connect to 
/172.31.34.50:50010, add to deadNodes and 
continuejava.net.SocketTimeoutException: 6 millis timeout while waiting for 
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.31.34.49:60514 remote=/172.31.34.50:50010]
13/09/28 13:18:49 INFO hdfs.DFSClient: Could not obtain block 
blk_-1057940606378039494_1013 from any node: java.io.IOException: No live nodes 
contain current block. Will get new block locations from namenode and retry...
13/09/28 13:19:52 WARN hdfs.DFSClient: Failed to connect to 
/172.31.34.49:50010, add to deadNodes and 
continuejava.net.SocketTimeoutException: 6 millis timeout while waiting for 
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.31.34.49:45621 remote=/172.31.34.49:50010]
13/09/28 13:20:52 WARN hdfs.DFSClient: Failed to connect to 
/172.31.34.50:50010, add to deadNodes and 
continuejava.net.SocketTimeoutException: 6 millis timeout while waiting for 
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.31.34.49:33081 remote=/172.31.34.50:50010]
13/09/28 13:20:52 INFO hdfs.DFSClient: Could not obtain block 
blk_-1057940606378039494_1013 from any node: java.io.IOException: No live nodes 
contain current block. Will get new block locations from namenode and retry...
13/09/28 13:21:55 WARN h