Spark and Shark

2014-09-01 Thread arthur.hk.c...@gmail.com
Hi,

I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from 
source).

spark: 1.0.2
shark: 0.9.2
hadoop: 2.4.1
java: java version “1.7.0_67”
protobuf: 2.5.0


I have tried the smoke test in Shark but got a “java.util.NoSuchElementException” error. Can you please advise how to fix this?

shark> create table x1 (a INT);
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
14/09/01 23:04:24 [main]: ERROR shark.SharkDriver: FAILED: Hive Internal Error: 
java.util.NoSuchElementException(null)
java.util.NoSuchElementException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:925)
at java.util.HashMap$ValueIterator.next(HashMap.java:950)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8117)
at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:150)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at shark.SharkDriver.compile(SharkDriver.scala:215)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:340)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:237)
at shark.SharkCliDriver.main(SharkCliDriver.scala)


spark-env.sh
#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000

spark-defaults.conf
spark.executor.memory   2048m
spark.shuffle.spill.compress   false

shark-env.sh
#!/usr/bin/env bash
export SPARK_MEM=2g
export SHARK_MASTER_MEM=2g
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR=$SCALA_HOME/assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.4.1.jar
export SHARK_ASSEMBLY_JAR=target/scala-2.10/shark_2.10-0.9.2.jar
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_LIBPATH=$HADOOP_HOME/lib/native/
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/
export SPARK_CLASSPATH=$SHARK_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar:$SHARK_HOME/lib/protobuf-java-2.5.0.jar


Regards
Arthur



Re: Spark and Shark

2014-09-01 Thread Michael Armbrust
I don't believe that Shark works with Spark > 1.0. Have you considered
trying Spark SQL?
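
For reference, a minimal sketch of the same smoke test through Spark SQL's HiveContext, assuming a Spark 1.0.2 build compiled with Hive support (hql was the 1.0.x API and was deprecated in later releases):

// In spark-shell, sc is the shell's existing SparkContext.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._

// Equivalent of the failing Shark smoke test:
hql("CREATE TABLE IF NOT EXISTS x1 (a INT)")
hql("SHOW TABLES").collect().foreach(println)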





RE: Spark and Shark

2014-09-01 Thread Paolo Platter
We tried to connect the old Simba Shark ODBC driver to the Thrift JDBC Server 
with Spark 1.1 RC2 and it works fine.
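
For anyone reproducing this, a hedged JDBC sketch against that setup, assuming the server was started with ./sbin/start-thriftserver.sh from a Hive-enabled Spark 1.1 build and the Hive JDBC driver (org.apache.hive:hive-jdbc) is on the client classpath; the host, port, and empty credentials are illustrative defaults:

// Plain Hive JDBC client connecting to the Spark Thrift JDBC server.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = java.sql.DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) println(rs.getString(1))
conn.close()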



Best



Paolo



Paolo Platter
Agile Lab CTO

From: Michael Armbrust mich...@databricks.com
Sent: Monday, 1 September 2014 19:43
To: arthur.hk.c...@gmail.com
Cc: user@spark.apache.org
Subject: Re: Spark and Shark





Spark and Shark Node: RAM Allocation

2014-08-30 Thread arthur.hk.c...@gmail.com
Hi,

Is there any formula to calculate proper RAM allocation for Spark and Shark, given a node's physical RAM and the RAM already used by Hadoop and HBase?
e.g. if a node has 32 GB of physical RAM:


spark-defaults.conf
spark.executor.memory   ?g

spark-env.sh
export SPARK_WORKER_MEMORY=?
export HADOOP_HEAPSIZE=?


shark-env.sh
export SPARK_MEM=?g
export SHARK_MASTER_MEM=?g

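
For concreteness, a sketch of the arithmetic such a formula would do for a 32 GB node; all reserve figures below are illustrative assumptions, not measurements:

// Back-of-the-envelope split for one 32 GB node co-hosting Hadoop and HBase.
val physicalGb  = 32
val osReserveGb = 4   // assumption: OS, page cache, misc daemons
val hadoopGb    = 4   // assumption: DataNode + NodeManager heaps
val hbaseGb     = 8   // assumption: RegionServer heap
val sparkGb     = physicalGb - osReserveGb - hadoopGb - hbaseGb  // 16 GB left for Spark
println(s"SPARK_WORKER_MEMORY=${sparkGb}g, spark.executor.memory=${sparkGb / 2}g")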


Regards
Arthur




java.lang.ClassNotFoundException in spark 0.9.0, shark 0.9.0 (pre-release) and hadoop 2.2.0

2014-03-07 Thread pradeeps8
Hi,

We are currently trying to migrate to Hadoop 2.2.0, and hence we have
installed Spark 0.9.0 and the pre-release version of Shark 0.9.0.
When we execute the script (script.txt:
http://apache-spark-user-list.1001560.n3.nabble.com/file/n2401/script.txt)
we get the following error:
org.apache.spark.SparkException: Job aborted: Task 1.0:3 failed 4 times
(most recent failure: Exception failure: java.lang.ClassNotFoundException:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Has anyone seen this error?
If so, could you please help me get it corrected?
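
One hedged guess at the cause, since script.txt isn't reproduced here: the $iwC...$$anonfun$1 class name is what the Spark REPL generates for closures typed into the shell, and a ClassNotFoundException for it on the executors usually means those REPL-generated classes never reached them. A sketch of the usual workaround, compiling the logic into a jar and shipping it explicitly (MyFunctions and the jar path are hypothetical):

// Compiled into my-app.jar rather than typed into the REPL:
object MyFunctions {
  def addOne(x: Int): Int = x + 1
}

// In the driver/shell: ship the jar to every executor, then use the
// compiled function instead of an anonymous shell closure.
sc.addJar("/path/to/my-app.jar")   // illustrative path
val result = sc.parallelize(1 to 10).map(MyFunctions.addOne).collect()
result.foreach(println)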

Thanks,
Pradeep



