Spark and Shark
Hi,

I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source).

spark: 1.0.2
shark: 0.9.2
hadoop: 2.4.1
java: 1.7.0_67
protobuf: 2.5.0

I have tried the smoke test in Shark but got a "java.util.NoSuchElementException" error. Can you please advise how to fix this?

shark> create table x1 (a INT);
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
14/09/01 23:04:24 [main]: ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
java.util.NoSuchElementException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:925)
        at java.util.HashMap$ValueIterator.next(HashMap.java:950)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8117)
        at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:150)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
        at shark.SharkDriver.compile(SharkDriver.scala:215)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
        at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:340)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:237)
        at shark.SharkCliDriver.main(SharkCliDriver.scala)

spark-env.sh:

#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000

spark-defaults.conf:

spark.executor.memory 2048m
spark.shuffle.spill.compress false

shark-env.sh:

#!/usr/bin/env bash
export SPARK_MEM=2g
export SHARK_MASTER_MEM=2g
SPARK_JAVA_OPTS="-Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR=$SCALA_HOME/assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.4.1.jar
export SHARK_ASSEMBLY_JAR=target/scala-2.10/shark_2.10-0.9.2.jar
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_LIBPATH=$HADOOP_HOME/lib/native/
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/
export SPARK_CLASSPATH=$SHARK_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar:$SHARK_HOME/lib/protobuf-java-2.5.0.jar

Regards,
Arthur
Re: Spark and Shark
I don't believe that Shark works with Spark 1.0. Have you considered trying Spark SQL?

On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com wrote:

> Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). […]
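As a hedged sketch of the suggested alternative: the same DDL that fails in Shark can be run through Spark SQL's HiveContext from the spark-shell that ships with Spark 1.0.2. This assumes Spark was built with Hive support (the `-Phive` build profile) and that `SPARK_HOME` is set; it is an illustration, not a tested migration recipe.

```shell
# Run the failing statement via Spark SQL's HiveContext instead of Shark.
# Assumes a Spark 1.0.x build with Hive support and SPARK_HOME set.
$SPARK_HOME/bin/spark-shell <<'EOF'
val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)
hiveCtx.hql("CREATE TABLE x1 (a INT)")
EOF
```

In Spark 1.0.x, `HiveContext.hql` executes HiveQL against the same metastore configuration Shark would use (via `HIVE_CONF_DIR`), so existing tables remain visible.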
RE: Spark and Shark
We tried to connect the old Simba Shark ODBC driver to the Thrift JDBC server with Spark 1.1 RC2, and it works fine.

Best,
Paolo

Paolo Platter
CTO, Agile Lab

From: Michael Armbrust mich...@databricks.com
Sent: Monday, 1 September 2014 19:43
To: arthur.hk.c...@gmail.com
Cc: user@spark.apache.org
Subject: Re: Spark and Shark

> I don't believe that Shark works with Spark 1.0. Have you considered trying Spark SQL? […]
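For reference, a minimal sketch of the Spark 1.1 setup Paolo describes: start the Thrift JDBC/ODBC server and connect to it, here with the bundled beeline client (an ODBC driver such as Simba's would point at the same host and port). Paths and the master URL are assumptions based on a default Spark layout.

```shell
# Start the Thrift JDBC/ODBC server shipped with Spark 1.1 (reads
# hive-site.xml from the usual conf directory) and connect with beeline.
$SPARK_HOME/sbin/start-thriftserver.sh --master yarn
$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000
```

Port 10000 is the server's default listening port; JDBC and ODBC clients alike connect to it with the HiveServer2 protocol.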
Spark and Shark Node: RAM Allocation
Hi,

Is there any formula to calculate proper RAM allocation values for Spark and Shark based on the physical RAM and the RAM usage of Hadoop and HBase? For example, if a node has 32 GB of physical RAM:

spark-defaults.conf:
spark.executor.memory ?g

spark-env.sh:
export SPARK_WORKER_MEMORY=?
export HADOOP_HEAPSIZE=?

shark-env.sh:
export SPARK_MEM=?g
export SHARK_MASTER_MEM=?g

Regards,
Arthur
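There is no official formula, but a common rule of thumb is: reserve memory for the OS and for the co-located Hadoop and HBase daemons, give the remainder to the Spark worker, and size the executor somewhat below the worker limit to leave headroom. The sketch below computes such a split for the 32 GB example; every reservation number in it is an assumption to tune for the actual cluster, not a recommendation from this thread.

```shell
#!/usr/bin/env bash
# Rule-of-thumb RAM split for a 32 GB node co-hosting Hadoop and HBase.
# All reservation sizes are assumptions; adjust to the daemons actually running.
total_gb=32
os_reserve_gb=4    # assumed: OS and page cache
hadoop_gb=4        # assumed: DataNode + NodeManager heaps
hbase_gb=8         # assumed: RegionServer heap

# Remainder goes to the Spark worker; executor gets a little less for overhead.
worker_gb=$(( total_gb - os_reserve_gb - hadoop_gb - hbase_gb ))
executor_gb=$(( worker_gb - 2 ))

echo "SPARK_WORKER_MEMORY=${worker_gb}g"          # -> 16g
echo "spark.executor.memory ${executor_gb}g"      # -> 14g
```

With these assumptions, `SPARK_WORKER_MEMORY=16g` and `spark.executor.memory 14g`; `SPARK_MEM`/`SHARK_MASTER_MEM` would typically be set at or below the executor size.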
java.lang.ClassNotFoundException in spark 0.9.0, shark 0.9.0 (pre-release) and hadoop 2.2.0
Hi,

We are currently trying to migrate to Hadoop 2.2.0, and hence we have installed Spark 0.9.0 and the pre-release version of Shark 0.9.0. When we execute the script (script.txt: http://apache-spark-user-list.1001560.n3.nabble.com/file/n2401/script.txt) we get the following error:

org.apache.spark.SparkException: Job aborted: Task 1.0:3 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Has anyone seen this error? If so, could you please help me get it corrected?

Thanks,
Pradeep

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-in-spark-0-9-0-shark-0-9-0-pre-release-and-hadoop-2-2-0-tp2401.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
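One hedged observation on the failing class name: `$iwC$$iwC$...$anonfun$1` is the pattern the Scala REPL uses for closures typed at the prompt, which executors normally fetch from the driver's class server at runtime. If that fetch fails, a common workaround in Spark 0.9 is to compile the logic into a jar and ship it explicitly via `ADD_JARS` rather than relying on REPL-generated classes. The jar path below is a placeholder for illustration, not a file from the original post.

```shell
# Workaround sketch for REPL ClassNotFoundException on $iwC$... classes:
# package the closure logic into a jar and ship it to executors explicitly.
export ADD_JARS=/path/to/my-logic.jar   # placeholder jar with the compiled classes
$SPARK_HOME/bin/spark-shell
```

Whether this applies here depends on why the executors could not reach the driver's class server (firewalling or hostname resolution between workers and driver are frequent culprits), which the original post does not show.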