Re: Spark Hive Snappy Error
Hi,

Removed "export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar". It works, THANK YOU!!

Regards
Arthur

On 23 Oct, 2014, at 1:00 pm, Shao, Saisai <saisai.s...@intel.com> wrote:

Seems you just added the snappy library into your classpath:

export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar

But Spark itself depends on snappy-0.2.jar. Is there any possibility that this problem is caused by a different version of snappy?

Thanks
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Thursday, October 23, 2014 11:32 AM
To: Shao, Saisai
Cc: arthur.hk.c...@gmail.com; user
Subject: Re: Spark Hive Snappy Error

Hi,

Please find the attached file.

my spark-defaults.conf:

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
#
# Example:
# spark.master                  spark://master:7077
# spark.eventLog.enabled        true
# spark.eventLog.dir            hdfs://namenode:8021/directory
# spark.serializer              org.apache.spark.serializer.KryoSerializer
# spark.executor.memory         2048m
spark.shuffle.spill.compress    false
spark.io.compression.codec     org.apache.spark.io.SnappyCompressionCodec

my spark-env.sh:

#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_WORKER_DIR=/edh/hadoop_data/spark_work/
export SPARK_LOG_DIR=/edh/hadoop_logs/spark
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HBASE_HOME/lib/*:$HIVE_HOME/csv-serde-1.1.2-0.11.0-all.jar:
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=m35:2181,m33:2181,m37:2181"
export SPARK_JAVA_OPTS="-XX:+UseConcMarkSweepGC"

ll $HADOOP_HOME/lib/native/Linux-amd64-64
-rw-rw-r--. 1 tester tester   50523 Aug 27 14:12 hadoop-auth-2.4.1.jar
-rw-rw-r--. 1 tester tester 1062640 Aug 27 12:19 libhadoop.a
-rw-rw-r--. 1 tester tester 1487564 Aug 27 11:14 libhadooppipes.a
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
-rwxr-xr-x. 1 tester tester   54961 Aug 27 07:08 libhadoopsnappy.so.0.0.1
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so.1.0.0
-rw-rw-r--. 1 tester tester  582472 Aug 27 11:14 libhadooputils.a
-rw-rw-r--. 1 tester tester  298626 Aug 27 11:14 libhdfs.a
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so.0.0.0
lrwxrwxrwx. 1 tester tester      55 Aug 27 07:08 libjvm.so -> /usr/lib/jvm/jdk1.6.0_45/jre/lib/amd64/server/libjvm.so
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so -> libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so.8 -> libprotobuf-lite.so.8.0.0
-rwxr-xr-x. 1 tester tester  964689 Aug 27 07:08 libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so -> libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so.8 -> libprotobuf.so.8.0.0
-rwxr-xr-x. 1 tester tester 8300050 Aug 27 07:08 libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so -> libprotoc.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so.8 -> libprotoc.so.8.0.0
-rwxr-xr-x. 1 tester tester 9935810 Aug 27 07:08 libprotoc.so.8.0.0
-rw-r--r--. 1 tester tester  233554 Aug 27 15:19 libsnappy.a
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:32 libsnappy.so -> /usr/lib64/libsnappy.so
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:33 libsnappy.so.1 -> /usr/lib64/libsnappy.so
-rwxr-xr-x. 1 tester tester  147726 Aug 27 07:08 libsnappy.so.1.2.0
drwxr-xr-x. 2 tester tester    4096 Aug 27 07:08 pkgconfig

Regards
Arthur

On 23 Oct, 2014, at 10:57 am, Shao, Saisai <saisai.s...@intel.com> wrote:

Hi Arthur,

I think your problem might be different from the one SPARK-3958 (https://issues.apache.org/jira/browse/SPARK-3958) mentions; yours looks more like a library link problem. Would you mind checking your Spark runtime to see whether snappy.so is loaded or not (through lsof -p)? I guess your problem is more likely a library-not-found problem.

Thanks
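For anyone hitting the same symptom, a quick way to confirm which jar actually supplies the org.xerial.snappy classes at runtime is to ask the classloader from spark-shell. This is a plain-Java-reflection sketch, not something from the thread; if it prints the hadoop-snappy jar rather than the snappy-java jar bundled with Spark, the classpath conflict described above is present:

import org.xerial.snappy.Snappy

// Ask the classloader where the Snappy class was actually loaded from.
val src = classOf[Snappy].getProtectionDomain.getCodeSource
println(s"org.xerial.snappy.Snappy loaded from: ${if (src == null) "bootstrap/unknown" else src.getLocation}")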
Re: Spark Hive Snappy Error
Hi,

Yes, I can always reproduce the issue. About my workload, Spark configuration, JDK version and OS version: I ran SparkPi 1000.

java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

cat /etc/centos-release
CentOS release 6.5 (Final)

My Spark's hive-site.xml contains the following:

<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

e.g.
MASTER=spark://m1:7077,m2:7077 ./bin/run-example SparkPi 1000

2014-10-22 20:23:17,033 ERROR [sparkDriver-akka.actor.default-dispatcher-18] actor.ActorSystemImpl (Slf4jLogger.scala:apply$mcV$sp(66)) - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
    at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
    at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:829)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:769)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:753)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1360)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-10-22 20:23:17,036 INFO [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Failed to run reduce at SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
    at akka.actor.ActorCell.terminate(ActorCell.scala:338)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:240)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at ...
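The failing call can also be exercised in isolation, without running a full job. A minimal spark-shell sketch that invokes the same native method the stack trace above dies in:

import org.xerial.snappy.Snappy

// Calls straight into the snappy-java native binding; if libsnappyjava
// cannot be linked, this throws the same UnsatisfiedLinkError as above.
val n = Snappy.maxCompressedLength(1024)
println(s"snappy native binding OK: maxCompressedLength(1024) = $n")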
RE: Spark Hive Snappy Error
Thanks a lot. I will try to reproduce this in my local settings and dig into the details. Thanks for your information.

BR
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Wednesday, October 22, 2014 8:35 PM
To: Shao, Saisai
Cc: arthur.hk.c...@gmail.com; user
Subject: Re: Spark Hive Snappy Error

Hi,

Yes, I can always reproduce the issue. About my workload, Spark configuration, JDK version and OS version: I ran SparkPi 1000.

java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

cat /etc/centos-release
CentOS release 6.5 (Final)

My Spark's hive-site.xml contains the following:

<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

e.g.
MASTER=spark://m1:7077,m2:7077 ./bin/run-example SparkPi 1000

2014-10-22 20:23:17,033 ERROR [sparkDriver-akka.actor.default-dispatcher-18] actor.ActorSystemImpl (Slf4jLogger.scala:apply$mcV$sp(66)) - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
    at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
    at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:829)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:769)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:753)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1360)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-10-22 20:23:17,036 INFO [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Failed to run reduce at SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
    at akka.actor.ActorCell.terminate(ActorCell.scala:338)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:240)
    at akka.dispatch.Mailbox.run ...
Re: Spark Hive Snappy Error
Hi,

FYI, I use snappy-java-1.0.4.1.jar

Regards
Arthur

On 22 Oct, 2014, at 8:59 pm, Shao, Saisai <saisai.s...@intel.com> wrote:

Thanks a lot. I will try to reproduce this in my local settings and dig into the details. Thanks for your information.

BR
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Wednesday, October 22, 2014 8:35 PM
To: Shao, Saisai
Cc: arthur.hk.c...@gmail.com; user
Subject: Re: Spark Hive Snappy Error

Hi,

Yes, I can always reproduce the issue. About my workload, Spark configuration, JDK version and OS version: I ran SparkPi 1000.

java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

cat /etc/centos-release
CentOS release 6.5 (Final)

My Spark's hive-site.xml contains the following:

<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

e.g.
MASTER=spark://m1:7077,m2:7077 ./bin/run-example SparkPi 1000

2014-10-22 20:23:17,033 ERROR [sparkDriver-akka.actor.default-dispatcher-18] actor.ActorSystemImpl (Slf4jLogger.scala:apply$mcV$sp(66)) - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
    at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
    at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:829)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:769)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:753)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1360)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-10-22 20:23:17,036 INFO [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Failed to run reduce at SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
    at akka.actor.ActorCell.terminate(ActorCell.scala:338)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431 ...
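Since the exact snappy-java build matters here, a small spark-shell sketch to print what is actually loaded (the version comes from the jar manifest, so it may print null for jars built without one):

import org.xerial.snappy.Snappy

// Which snappy-java build is really on the driver's classpath?
println(s"snappy-java package version: ${classOf[Snappy].getPackage.getImplementationVersion}")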
Re: Spark Hive Snappy Error
Hi,

Please find the attached file.

lsof -p 16459 (Master)
COMMAND   PID  USER   FD   TYPE DEVICE  SIZE/OFF    NODE NAME
java    16459 tester  cwd   DIR  253,2      4096 6039786 /hadoop/spark-1.1.0_patched
java    16459 tester  rtd   DIR  253,0      4096       2 /
java    16459 tester  txt   REG  253,0     12150 2780995 /usr/lib/jvm/jdk1.7.0_67/bin/java
java    16459 tester  mem   REG  253,0    156928 2228230 /lib64/ld-2.12.so
java    16459 tester  mem   REG  253,0   1926680 2228250 /lib64/libc-2.12.so
java    16459 tester  mem   REG  253,0    145896 2228251 /lib64/libpthread-2.12.so
java    16459 tester  mem   REG  253,0     22536 2228254 /lib64/libdl-2.12.so
java    16459 tester  mem   REG  253,0    109006 2759278 /usr/lib/jvm/jdk1.7.0_67/lib/amd64/jli/libjli.so
java    16459 tester  mem   REG  253,0    599384 2228264 /lib64/libm-2.12.so
java    16459 tester  mem   REG  253,0     47064 2228295 /lib64/librt-2.12.so
java    16459 tester  mem   REG  253,0    113952 2228328 /lib64/libresolv-2.12.so
java    16459 tester  mem   REG  253,0  99158576 2388225 /usr/lib/locale/locale-archive
java    16459 tester  mem   REG  253,0     27424 2228249 /lib64/libnss_dns-2.12.so
java    16459 tester  mem   REG  253,2 138832345 6555616 /hadoop/spark-1.1.0_patched/assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop2.4.1.jar
java    16459 tester  mem   REG  253,0    580624 2893171 /usr/lib/jvm/jdk1.7.0_67/jre/lib/jsse.jar
java    16459 tester  mem   REG  253,0    114742 2893221 /usr/lib/jvm/jdk1.7.0_67/jre/lib/amd64/libnet.so
java    16459 tester  mem   REG  253,0     91178 2893222 /usr/lib/jvm/jdk1.7.0_67/jre/lib/amd64/libnio.so
java    16459 tester  mem   REG  253,2   1769726 6816963 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-rdbms-3.2.1.jar
java    16459 tester  mem   REG  253,2    337012 6816961 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-api-jdo-3.2.1.jar
java    16459 tester  mem   REG  253,2   1801810 6816962 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-core-3.2.2.jar
java    16459 tester  mem   REG  253,2     25153 7079998 /hadoop/hive-0.12.0-bin/csv-serde-1.1.2-0.11.0-all.jar
java    16459 tester  mem   REG  253,2     21817 6032989 /hadoop/hbase-0.98.5-hadoop2/lib/gmbal-api-only-3.0.0-b023.jar
java    16459 tester  mem   REG  253,2    177131 6032940 /hadoop/hbase-0.98.5-hadoop2/lib/jetty-util-6.1.26.jar
java    16459 tester  mem   REG  253,2     32677 6032915 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-hadoop-compat-0.98.5-hadoop2.jar
java    16459 tester  mem   REG  253,2    143602 6032959 /hadoop/hbase-0.98.5-hadoop2/lib/commons-digester-1.8.jar
java    16459 tester  mem   REG  253,2     97738 6032917 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-prefix-tree-0.98.5-hadoop2.jar
java    16459 tester  mem   REG  253,2     17884 6032949 /hadoop/hbase-0.98.5-hadoop2/lib/jackson-jaxrs-1.8.8.jar
java    16459 tester  mem   REG  253,2    253086 6032987 /hadoop/hbase-0.98.5-hadoop2/lib/grizzly-http-2.1.2.jar
java    16459 tester  mem   REG  253,2     73778 6032916 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-hadoop2-compat-0.98.5-hadoop2.jar
java    16459 tester  mem   REG  253,2    336904 6032993 /hadoop/hbase-0.98.5-hadoop2/lib/grizzly-http-servlet-2.1.2.jar
java    16459 tester  mem   REG  253,2    927415 6032914 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-client-0.98.5-hadoop2.jar
java    16459 tester  mem   REG  253,2    125740 6033008 /hadoop/hbase-0.98.5-hadoop2/lib/hadoop-yarn-server-applicationhistoryservice-2.4.1.jar
java    16459 tester  mem   REG  253,2     15010 6032936 /hadoop/hbase-0.98.5-hadoop2/lib/xmlenc-0.52.jar
java    16459 tester  mem   REG  253,2     60686 6032926 /hadoop/hbase-0.98.5-hadoop2/lib/commons-logging-1.1.1.jar
java    16459 tester  mem   REG  253,2    259600 6032927 /hadoop/hbase-0.98.5-hadoop2/lib/commons-codec-1.7.jar
java    16459 tester  mem   REG  253,2    321806 6032957 /hadoop/hbase-0.98.5-hadoop2/lib/jets3t-0.6.1.jar
java    16459 tester  mem   REG  253,2     85353 6032982 /hadoop/hbase-0.98.5-hadoop2/lib/javax.servlet-api-3.0.1.jar
java    16459 tester  mem   ...
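For completeness, a rough in-process analogue of lsof -p can be run from inside the JVM itself by scanning /proc/self/maps (a Linux-specific sketch; run it in the spark-shell whose process you want to inspect):

import scala.io.Source

// Lists every memory-mapped file whose path mentions snappy; if nothing
// prints, no snappy native library has been linked into this JVM yet.
val src = Source.fromFile("/proc/self/maps")
try src.getLines().filter(_.toLowerCase.contains("snappy")).foreach(println)
finally src.close()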
Re: Spark Hive Snappy Error
Hi,

May I know where to configure Spark to load libhadoop.so?

Regards
Arthur

On 23 Oct, 2014, at 11:31 am, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote:

Hi,

Please find the attached file: lsof.rtf

my spark-defaults.conf:

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
#
# Example:
# spark.master                  spark://master:7077
# spark.eventLog.enabled        true
# spark.eventLog.dir            hdfs://namenode:8021/directory
# spark.serializer              org.apache.spark.serializer.KryoSerializer
# spark.executor.memory         2048m
spark.shuffle.spill.compress    false
spark.io.compression.codec     org.apache.spark.io.SnappyCompressionCodec

my spark-env.sh:

#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_WORKER_DIR=/edh/hadoop_data/spark_work/
export SPARK_LOG_DIR=/edh/hadoop_logs/spark
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HBASE_HOME/lib/*:$HIVE_HOME/csv-serde-1.1.2-0.11.0-all.jar:
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=m35:2181,m33:2181,m37:2181"
export SPARK_JAVA_OPTS="-XX:+UseConcMarkSweepGC"

ll $HADOOP_HOME/lib/native/Linux-amd64-64
-rw-rw-r--. 1 tester tester   50523 Aug 27 14:12 hadoop-auth-2.4.1.jar
-rw-rw-r--. 1 tester tester 1062640 Aug 27 12:19 libhadoop.a
-rw-rw-r--. 1 tester tester 1487564 Aug 27 11:14 libhadooppipes.a
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
-rwxr-xr-x. 1 tester tester   54961 Aug 27 07:08 libhadoopsnappy.so.0.0.1
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so.1.0.0
-rw-rw-r--. 1 tester tester  582472 Aug 27 11:14 libhadooputils.a
-rw-rw-r--. 1 tester tester  298626 Aug 27 11:14 libhdfs.a
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so.0.0.0
lrwxrwxrwx. 1 tester tester      55 Aug 27 07:08 libjvm.so -> /usr/lib/jvm/jdk1.6.0_45/jre/lib/amd64/server/libjvm.so
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so -> libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so.8 -> libprotobuf-lite.so.8.0.0
-rwxr-xr-x. 1 tester tester  964689 Aug 27 07:08 libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so -> libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so.8 -> libprotobuf.so.8.0.0
-rwxr-xr-x. 1 tester tester 8300050 Aug 27 07:08 libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so -> libprotoc.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so.8 -> libprotoc.so.8.0.0
-rwxr-xr-x. 1 tester tester 9935810 Aug 27 07:08 libprotoc.so.8.0.0
-rw-r--r--. 1 tester tester  233554 Aug 27 15:19 libsnappy.a
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:32 libsnappy.so -> /usr/lib64/libsnappy.so
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:33 libsnappy.so.1 -> /usr/lib64/libsnappy.so
-rwxr-xr-x. 1 tester tester  147726 Aug 27 07:08 libsnappy.so.1.2.0
drwxr-xr-x. 2 tester tester    4096 Aug 27 07:08 pkgconfig

Regards
Arthur

On 23 Oct, 2014, at 10:57 am, Shao, Saisai <saisai.s...@intel.com> wrote:

Hi Arthur,

I think your problem might be different from the one SPARK-3958 (https://issues.apache.org/jira/browse/SPARK-3958) mentions; yours looks more like a library link problem. Would you mind checking your Spark runtime to see whether snappy.so is loaded or not (through lsof -p)? I guess your problem is more likely a library-not-found problem.

Thanks
Jerry
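On the question above: to my understanding, the JVMs Spark launches resolve native libraries such as libhadoop.so through java.library.path, which the spark-env.sh shown here feeds via JAVA_LIBRARY_PATH and SPARK_LIBRARY_PATH (later Spark releases prefer spark.driver.extraLibraryPath and spark.executor.extraLibraryPath). A small sketch to probe the current JVM:

// Print the native-library search path, then try to resolve libhadoop.so
// from it; System.loadLibrary throws UnsatisfiedLinkError if it is not found.
println(System.getProperty("java.library.path"))
System.loadLibrary("hadoop")
println("libhadoop.so resolved and loaded")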
RE: Spark Hive Snappy Error
Seems you just added the snappy library into your classpath:

export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar

But Spark itself depends on snappy-0.2.jar. Is there any possibility that this problem is caused by a different version of snappy?

Thanks
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Thursday, October 23, 2014 11:32 AM
To: Shao, Saisai
Cc: arthur.hk.c...@gmail.com; user
Subject: Re: Spark Hive Snappy Error

Hi,

Please find the attached file.

my spark-defaults.conf:

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
#
# Example:
# spark.master                  spark://master:7077
# spark.eventLog.enabled        true
# spark.eventLog.dir            hdfs://namenode:8021/directory
# spark.serializer              org.apache.spark.serializer.KryoSerializer
# spark.executor.memory         2048m
spark.shuffle.spill.compress    false
spark.io.compression.codec     org.apache.spark.io.SnappyCompressionCodec

my spark-env.sh:

#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_WORKER_DIR=/edh/hadoop_data/spark_work/
export SPARK_LOG_DIR=/edh/hadoop_logs/spark
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HBASE_HOME/lib/*:$HIVE_HOME/csv-serde-1.1.2-0.11.0-all.jar:
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=m35:2181,m33:2181,m37:2181"
export SPARK_JAVA_OPTS="-XX:+UseConcMarkSweepGC"

ll $HADOOP_HOME/lib/native/Linux-amd64-64
-rw-rw-r--. 1 tester tester   50523 Aug 27 14:12 hadoop-auth-2.4.1.jar
-rw-rw-r--. 1 tester tester 1062640 Aug 27 12:19 libhadoop.a
-rw-rw-r--. 1 tester tester 1487564 Aug 27 11:14 libhadooppipes.a
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
-rwxr-xr-x. 1 tester tester   54961 Aug 27 07:08 libhadoopsnappy.so.0.0.1
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so.1.0.0
-rw-rw-r--. 1 tester tester  582472 Aug 27 11:14 libhadooputils.a
-rw-rw-r--. 1 tester tester  298626 Aug 27 11:14 libhdfs.a
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so.0.0.0
lrwxrwxrwx. 1 tester tester      55 Aug 27 07:08 libjvm.so -> /usr/lib/jvm/jdk1.6.0_45/jre/lib/amd64/server/libjvm.so
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so -> libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so.8 -> libprotobuf-lite.so.8.0.0
-rwxr-xr-x. 1 tester tester  964689 Aug 27 07:08 libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so -> libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so.8 -> libprotobuf.so.8.0.0
-rwxr-xr-x. 1 tester tester 8300050 Aug 27 07:08 libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so -> libprotoc.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so.8 -> libprotoc.so.8.0.0
-rwxr-xr-x. 1 tester tester 9935810 Aug 27 07:08 libprotoc.so.8.0.0
-rw-r--r--. 1 tester tester  233554 Aug 27 15:19 libsnappy.a
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:32 libsnappy.so -> /usr/lib64/libsnappy.so
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:33 libsnappy.so.1 -> /usr/lib64/libsnappy.so
-rwxr-xr-x. 1 tester tester  147726 Aug 27 07:08 libsnappy.so.1.2.0
drwxr-xr-x. 2 tester tester    4096 Aug 27 07:08 pkgconfig

Regards
Arthur

On 23 Oct, 2014, at 10:57 am, Shao, Saisai <saisai.s...@intel.com> wrote:

Hi Arthur,

I think your problem might be different from the one SPARK-3958 (https://issues.apache.org/jira/browse/SPARK-3958) mentions; yours looks more like a library link problem. Would you mind checking your Spark runtime to see whether snappy.so is loaded or not (through lsof -p)? I guess your problem is more likely a library-not-found problem.

Thanks
Jerry
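One quick way to spot such a conflict is to list every classpath entry that mentions snappy; entries added through the CLASSPATH environment variable show up here too (a plain-Java sketch, run from spark-shell):

import java.io.File

// Every snappy-related entry visible to this JVM's application classloader.
System.getProperty("java.class.path")
  .split(File.pathSeparator)
  .filter(_.toLowerCase.contains("snappy"))
  .foreach(println)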
RE: Spark Hive Snappy Error
Hi Arthur,

I think this is a known issue in Spark; you can check https://issues.apache.org/jira/browse/SPARK-3958. I'm curious about it: can you always reproduce this issue? Is it related to some specific data sets? Would you mind giving me some information about your workload, Spark configuration, JDK version and OS version?

Thanks
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Friday, October 17, 2014 7:13 AM
To: user
Cc: arthur.hk.c...@gmail.com
Subject: Spark Hive Snappy Error

Hi,

When trying Spark with a Hive table, I got the "java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I" error:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(1) from q8_national_market_share").collect().foreach(println)

java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
    at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
    at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
    at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:68)
    at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:68)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
    at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:146)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
    at $iwC$$iwC$$iwC.<init>(<console>:20)
    at $iwC$$iwC.<init>(<console>:22)
    at $iwC.<init>(<console>:24)
    at <init>(<console>:26)
    at .<init>(<console>:30)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
    at ...
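One possible stopgap while the native linkage is being investigated (an untested assumption on my part, not something verified in this thread) is to point spark.io.compression.codec at the pure-JVM LZF codec, so that broadcasts like the one failing above avoid native snappy entirely:

import org.apache.spark.{SparkConf, SparkContext}

// Use the pure-JVM LZF codec instead of snappy for Spark's internal
// compression (broadcast, shuffle spill, etc.); class name as of Spark 1.x.
val conf = new SparkConf()
  .setAppName("snappy-workaround-check")
  .set("spark.io.compression.codec", "org.apache.spark.io.LZFCompressionCodec")
val sc = new SparkContext(conf)
println(sc.parallelize(1 to 1000).reduce(_ + _))  // any job exercises the broadcast path
sc.stop()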