Re: Spark Streaming RDD to Shark table

2014-07-11 Thread patwhite
Hi,
I'm running into an identical issue running Spark 1.0.0 on Mesos 0.19. Were
you able to get it sorted? There's no real documentation for the
spark.httpBroadcast.uri except what's in the code - is this config setting
required for running on a Mesos cluster?

I'm running this in a dev environment with a simple 2 machine setup - the
driver is running on dev-1, and dev-2 (10.0.0.5 in the below stack trace)
has a mesos master, zookeeper, and mesos slave.  

Stack Trace:

14/07/11 18:00:05 INFO SparkEnv: Connecting to MapOutputTracker:
akka.tcp://spark@dev-1:58136/user/MapOutputTracker
14/07/11 18:00:06 INFO SparkEnv: Connecting to BlockManagerMaster:
akka.tcp://spark@dev-1:58136/user/BlockManagerMaster
14/07/11 18:00:06 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20140711180006-dea8
14/07/11 18:00:06 INFO MemoryStore: MemoryStore started with capacity 589.2
MB.
14/07/11 18:00:06 INFO ConnectionManager: Bound socket to port 60708 with id
= ConnectionManagerId(10.0.0.5,60708)
14/07/11 18:00:06 INFO BlockManagerMaster: Trying to register BlockManager
14/07/11 18:00:06 INFO BlockManagerMaster: Registered BlockManager
java.util.NoSuchElementException: spark.httpBroadcast.uri
at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:149)
at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:149)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:58)
at org.apache.spark.SparkConf.get(SparkConf.scala:149)
at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:130)
at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
at org.apache.spark.broadcast.BroadcastManager.&lt;init&gt;(BroadcastManager.scala:35)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
at org.apache.spark.executor.Executor.&lt;init&gt;(Executor.scala:85)
at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)
Exception in thread "Thread-2" I0711 18:00:06.454962 14037 exec.cpp:412]
Deactivating the executor libprocess

If I manually set spark.httpBroadcast.uri to http://dev-1, I get the following
error, presumably because I'm not setting the port correctly (which I don't
think I have any way of knowing?)
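(Editor's note: spark.httpBroadcast.uri is normally set by the driver's SparkEnv when it starts the HTTP broadcast server on an ephemeral port, and the resulting URI is propagated to executors, so there is usually no correct value to set by hand. One possible workaround, a sketch rather than a confirmed fix, is to switch to the torrent broadcast implementation that ships with Spark 1.0.x so executors never read this property; the app name below is made up.)

```scala
// Sketch, assuming Spark 1.0.x: select TorrentBroadcastFactory so executors
// never look up spark.httpBroadcast.uri. "MesosBroadcastDemo" is hypothetical.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MesosBroadcastDemo")
  .set("spark.broadcast.factory",
       "org.apache.spark.broadcast.TorrentBroadcastFactory")
// Pass this conf to the SparkContext constructor as usual.
```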

14/07/11 18:31:27 ERROR Executor: Exception in task ID 4
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.&lt;init&gt;(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1300)
at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:196)
at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:89)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at

Re: Spark Streaming RDD to Shark table

2014-05-28 Thread Chang Lim
OK...I needed to set the JVM class path for the worker to find the fb303 class:
env.put("SPARK_JAVA_OPTS",
"-Djava.class.path=/home/myInc/hive-0.9.0-bin/lib/libfb303.jar");
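(Editor's note: an alternative to baking the jar into SPARK_JAVA_OPTS, sketched here under Spark 0.9.x assumptions, is to ship it per-job with SparkContext.addJar, which distributes the file to executors and adds it to their classpath; the app name is hypothetical and the path is the one from the message above.)

```scala
// Sketch, assuming Spark 0.9.x; "SharkStreamingJob" is a made-up app name.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("SharkStreamingJob")
val sc = new SparkContext(conf)
// Distribute libfb303.jar to every executor for this job:
sc.addJar("/home/myInc/hive-0.9.0-bin/lib/libfb303.jar")
```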

Now I am seeing the following spark.httpBroadcast.uri error.  What am I
missing?

java.util.NoSuchElementException: spark.httpBroadcast.uri
at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:58)
at org.apache.spark.SparkConf.get(SparkConf.scala:151)
at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:104)
at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcast.scala:70)
at org.apache.spark.broadcast.BroadcastManager.initialize(Broadcast.scala:81)
at org.apache.spark.broadcast.BroadcastManager.&lt;init&gt;(Broadcast.scala:68)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:175)
at org.apache.spark.executor.Executor.&lt;init&gt;(Executor.scala:110)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:56)
. . .
14/05/27 15:26:45 INFO CoarseGrainedExecutorBackend: Connecting to driver:
akka.tcp://sp...@clim2-dsv.myinc.ad.myinccorp.com:3694/user/CoarseGrainedScheduler
14/05/27 15:26:46 ERROR CoarseGrainedExecutorBackend: Slave registration
failed: Duplicate executor ID: 8

===
Full Stack:
===
Spark Executor Command: /usr/lib/jvm/java-7-openjdk-i386/bin/java -cp
:/home/myInc/spark-0.9.1-bin-hadoop1/conf:/home/myInc/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar
-Djava.library.path=/home/myInc/hive-0.9.0-bin/lib/libfb303.jar
-Djava.library.path=/home/myInc/hive-0.9.0-bin/lib/libfb303.jar -Xms512M
-Xmx512M org.apache.spark.executor.CoarseGrainedExecutorBackend
akka.tcp://sp...@clim2-dsv.myinc.ad.myinccorp.com:3694/user/CoarseGrainedScheduler
8 tahiti-ins.myInc.ad.myInccorp.com 1
akka.tcp://sparkwor...@tahiti-ins.myinc.ad.myinccorp.com:37841/user/Worker
app-20140527152556-0029


log4j:WARN No appenders could be found for logger
(akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
14/05/27 15:26:44 INFO CoarseGrainedExecutorBackend: Using Spark's default
log4j profile: org/apache/spark/log4j-defaults.properties
14/05/27 15:26:44 INFO WorkerWatcher: Connecting to worker
akka.tcp://sparkwor...@tahiti-ins.myinc.ad.myinccorp.com:37841/user/Worker
14/05/27 15:26:44 INFO CoarseGrainedExecutorBackend: Connecting to driver:
akka.tcp://sp...@clim2-dsv.myinc.ad.myinccorp.com:3694/user/CoarseGrainedScheduler
14/05/27 15:26:45 INFO WorkerWatcher: Successfully connected to
akka.tcp://sparkwor...@tahiti-ins.myinc.ad.myinccorp.com:37841/user/Worker
14/05/27 15:26:45 INFO CoarseGrainedExecutorBackend: Successfully registered
with driver
14/05/27 15:26:45 INFO Slf4jLogger: Slf4jLogger started
14/05/27 15:26:45 INFO Remoting: Starting remoting
14/05/27 15:26:45 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sp...@tahiti-ins.myinc.ad.myinccorp.com:43488]
14/05/27 15:26:45 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sp...@tahiti-ins.myinc.ad.myinccorp.com:43488]
14/05/27 15:26:45 INFO SparkEnv: Connecting to BlockManagerMaster:
akka.tcp://sp...@clim2-dsv.myinc.ad.myinccorp.com:3694/user/BlockManagerMaster
14/05/27 15:26:45 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20140527152645-b13b
14/05/27 15:26:45 INFO MemoryStore: MemoryStore started with capacity 297.0
MB.
14/05/27 15:26:45 INFO ConnectionManager: Bound socket to port 55853 with id
= ConnectionManagerId(tahiti-ins.myInc.ad.myInccorp.com,55853)
14/05/27 15:26:45 INFO BlockManagerMaster: Trying to register BlockManager
14/05/27 15:26:45 INFO BlockManagerMaster: Registered BlockManager
14/05/27 15:26:45 ERROR OneForOneStrategy: spark.httpBroadcast.uri
java.util.NoSuchElementException: spark.httpBroadcast.uri
at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:58)
at org.apache.spark.SparkConf.get(SparkConf.scala:151)
at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:104)
at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcast.scala:70)
at org.apache.spark.broadcast.BroadcastManager.initialize(Broadcast.scala:81)
at