Spark Hive Snappy Error

2014-10-16 Thread arthur.hk.c...@gmail.com
Hi,

When trying Spark with a Hive table, I get the following error: "java.lang.UnsatisfiedLinkError:
org.xerial.snappy.SnappyNative.maxCompressedLength(I)I".


val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(1) from q8_national_market_share").collect().foreach(println)
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
 at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
 at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
 at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
 at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
 at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
 at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
 at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
 at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
 at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
 at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:68)
 at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:68)
 at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
 at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
 at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
 at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
 at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:146)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
 at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
 at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
 at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
 at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
 at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
 at $iwC$$iwC$$iwC.<init>(<console>:20)
 at $iwC$$iwC.<init>(<console>:22)
 at $iwC.<init>(<console>:24)
 at <init>(<console>:26)
 at .<init>(<console>:30)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
 at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
 at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
 at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
 at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
 at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
 at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
 at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
 at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
 at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
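
A minimal check that exercises the same snappy-java call outside of Hive can help narrow this down (a sketch to paste into spark-shell; the 1024-byte input size is arbitrary):

// Minimal reproduction outside of Hive: call the same snappy-java method that the
// stack trace above fails in, and print which jar the class was actually loaded from.
// Sketch for diagnosis only.
import org.xerial.snappy.Snappy

// Forces snappy-java to bind its native library; throws the same
// UnsatisfiedLinkError if an incompatible snappy build is picked up first.
println(Snappy.maxCompressedLength(1024))

// Shows which classpath entry provided the Snappy class.
println(classOf[Snappy].getProtectionDomain.getCodeSource.getLocation)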

RE: Spark Hive Snappy Error

2014-10-16 Thread Shao, Saisai
Hi Arthur,

I think this is a known issue in Spark; you can check SPARK-3958 
(https://issues.apache.org/jira/browse/SPARK-3958). I'm curious about it: can 
you always reproduce this issue? Is it related to some specific data 
sets? Would you mind giving me some information about your workload, Spark 
configuration, JDK version and OS version?

Thanks
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Friday, October 17, 2014 7:13 AM
To: user
Cc: arthur.hk.c...@gmail.com
Subject: Spark Hive Snappy Error

Hi,

When trying Spark with a Hive table, I get the following error: "java.lang.UnsatisfiedLinkError:
org.xerial.snappy.SnappyNative.maxCompressedLength(I)I".


val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(1) from q8_national_market_share").collect().foreach(println)
[snip]

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
 at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-10-22 20:23:17,038 INFO  [sparkDriver-akka.actor.default-dispatcher-14] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Shutting down remote daemon.
2014-10-22 20:23:17,039 INFO  [sparkDriver-akka.actor.default-dispatcher-14] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remote daemon shut down; proceeding with flushing remote transports.

 
Regards
Arthur

On 17 Oct, 2014, at 9:33 am, Shao, Saisai  wrote:

> Hi Arthur,
>  
> I think this is a known issue in Spark; you can check SPARK-3958 
> (https://issues.apache.org/jira/browse/SPARK-3958). I'm curious about it: can 
> you always reproduce this issue? Is it related to some specific data 
> sets? Would you mind giving me some information about your workload, Spark 
> configuration, JDK version and OS version?
>  
> Thanks
> Jerry
>  
> From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] 
> Sent: Friday, October 17, 2014 7:13 AM
> To: user
> Cc: arthur.hk.c...@gmail.com
> Subject: Spark Hive Snappy Error
>  
> Hi,
>  
> When trying Spark with a Hive table, I get the following error: "java.lang.UnsatisfiedLinkError:
> org.xerial.snappy.SnappyNative.maxCompressedLength(I)I".
>  
>  
> [snip]

RE: Spark Hive Snappy Error

2014-10-22 Thread Shao, Saisai
Thanks a lot. I will try to reproduce this in my local settings and dig into 
the details. Thanks for your information.


BR
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Wednesday, October 22, 2014 8:35 PM
To: Shao, Saisai
Cc: arthur.hk.c...@gmail.com; user
Subject: Re: Spark Hive Snappy Error

Hi,

Yes, I can always reproduce the issue:

about your workload, Spark configuration, JDK version and OS version?

I ran SparkPi 1000

java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

cat /etc/centos-release
CentOS release 6.5 (Final)

My Spark's hive-site.xml contains the following:

<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>

<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

e.g.
MASTER=spark://m1:7077,m2:7077 ./bin/run-example SparkPi 1000
2014-10-22 20:23:17,033 ERROR [sparkDriver-akka.actor.default-dispatcher-18] actor.ActorSystemImpl (Slf4jLogger.scala:apply$mcV$sp(66)) - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
 at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
 at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
 at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
 at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
 at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
 at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
 at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
 at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
 at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:829)
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:769)
 at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:753)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1360)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
 at akka.actor.ActorCell.invoke(ActorCell.scala:456)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219)
 at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-10-22 20:23:17,036 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Failed to run reduce at SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
 at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
 at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
 at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
 at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
 at akka.actor.ActorCell.terminate(ActorCell.scala:338)
 at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
 at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
 at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:240)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219)
 at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi,

FYI, I use snappy-java-1.0.4.1.jar

Regards
Arthur


On 22 Oct, 2014, at 8:59 pm, Shao, Saisai  wrote:

> Thanks a lot. I will try to reproduce this in my local settings and dig into 
> the details. Thanks for your information.
>  
>  
> BR
> Jerry
>  
> From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] 
> Sent: Wednesday, October 22, 2014 8:35 PM
> To: Shao, Saisai
> Cc: arthur.hk.c...@gmail.com; user
> Subject: Re: Spark Hive Snappy Error
>  
> Hi,
>  
> Yes, I can always reproduce the issue:
>  
> [snip]

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi,

Please find the attached file.

lsof -p 16459 (Master)
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 16459 tester cwd DIR 253,2 4096 6039786 /hadoop/spark-1.1.0_patched
java 16459 tester rtd DIR 253,0 4096 2 /
java 16459 tester txt REG 253,0 12150 2780995 /usr/lib/jvm/jdk1.7.0_67/bin/java
java 16459 tester mem REG 253,0 156928 2228230 /lib64/ld-2.12.so
java 16459 tester mem REG 253,0 1926680 2228250 /lib64/libc-2.12.so
java 16459 tester mem REG 253,0 145896 2228251 /lib64/libpthread-2.12.so
java 16459 tester mem REG 253,0 22536 2228254 /lib64/libdl-2.12.so
java 16459 tester mem REG 253,0 109006 2759278 /usr/lib/jvm/jdk1.7.0_67/lib/amd64/jli/libjli.so
java 16459 tester mem REG 253,0 599384 2228264 /lib64/libm-2.12.so
java 16459 tester mem REG 253,0 47064 2228295 /lib64/librt-2.12.so
java 16459 tester mem REG 253,0 113952 2228328 /lib64/libresolv-2.12.so
java 16459 tester mem REG 253,0 99158576 2388225 /usr/lib/locale/locale-archive
java 16459 tester mem REG 253,0 27424 2228249 /lib64/libnss_dns-2.12.so
java 16459 tester mem REG 253,2 138832345 6555616 /hadoop/spark-1.1.0_patched/assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop2.4.1.jar
java 16459 tester mem REG 253,0 580624 2893171 /usr/lib/jvm/jdk1.7.0_67/jre/lib/jsse.jar
java 16459 tester mem REG 253,0 114742 2893221 /usr/lib/jvm/jdk1.7.0_67/jre/lib/amd64/libnet.so
java 16459 tester mem REG 253,0 91178 2893222 /usr/lib/jvm/jdk1.7.0_67/jre/lib/amd64/libnio.so
java 16459 tester mem REG 253,2 1769726 6816963 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-rdbms-3.2.1.jar
java 16459 tester mem REG 253,2 337012 6816961 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-api-jdo-3.2.1.jar
java 16459 tester mem REG 253,2 1801810 6816962 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-core-3.2.2.jar
java 16459 tester mem REG 253,2 25153 7079998 /hadoop/hive-0.12.0-bin/csv-serde-1.1.2-0.11.0-all.jar
java 16459 tester mem REG 253,2 21817 6032989 /hadoop/hbase-0.98.5-hadoop2/lib/gmbal-api-only-3.0.0-b023.jar
java 16459 tester mem REG 253,2 177131 6032940 /hadoop/hbase-0.98.5-hadoop2/lib/jetty-util-6.1.26.jar
java 16459 tester mem REG 253,2 32677 6032915 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-hadoop-compat-0.98.5-hadoop2.jar
java 16459 tester mem REG 253,2 143602 6032959 /hadoop/hbase-0.98.5-hadoop2/lib/commons-digester-1.8.jar
java 16459 tester mem REG 253,2 97738 6032917 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-prefix-tree-0.98.5-hadoop2.jar
java 16459 tester mem REG 253,2 17884 6032949 /hadoop/hbase-0.98.5-hadoop2/lib/jackson-jaxrs-1.8.8.jar
java 16459 tester mem REG 253,2 253086 6032987 /hadoop/hbase-0.98.5-hadoop2/lib/grizzly-http-2.1.2.jar
java 16459 tester mem REG 253,2 73778 6032916 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-hadoop2-compat-0.98.5-hadoop2.jar
java 16459 tester mem REG 253,2 336904 6032993 /hadoop/hbase-0.98.5-hadoop2/lib/grizzly-http-servlet-2.1.2.jar
java 16459 tester mem REG 253,2 927415 6032914 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-client-0.98.5-hadoop2.jar
java 16459 tester mem REG 253,2 125740 6033008 /hadoop/hbase-0.98.5-hadoop2/lib/hadoop-yarn-server-applicationhistoryservice-2.4.1.jar
java 16459 tester mem REG 253,2 15010 6032936 /hadoop/hbase-0.98.5-hadoop2/lib/xmlenc-0.52.jar
java 16459 tester mem REG 253,2 60686 6032926 /hadoop/hbase-0.98.5-hadoop2/lib/commons-logging-1.1.1.jar
java 16459 tester mem REG 253,2 259600 6032927 /hadoop/hbase-0.98.5-hadoop2/lib/commons-codec-1.7.jar
java 16459 tester mem REG 253,2 321806 6032957 /hadoop/hbase-0.98.5-hadoop2/lib/jets3t-0.6.1.jar
java 16459 tester mem REG 253,2 85353 6032982 /hadoop/hbase-0.98.5-hadoop2/lib/javax.servlet-api-3.0.1.jar
java 16459 tester mem REG
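
The same check can also be run from inside the driver JVM (a sketch, assuming a spark-shell session on Linux, as on the CentOS hosts in this thread): /proc/self/maps lists every file mapped into the process, including native .so libraries, so a loaded snappy library shows up there just as in the lsof output.

// In-process analogue of the `lsof -p <pid>` check: print mapped files whose
// path mentions snappy. Sketch only; assumes Linux /proc is available.
import scala.io.Source

Source.fromFile("/proc/self/maps")
  .getLines()
  .filter(_.contains("snappy"))
  .foreach(line => println(line))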

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi

May I know where to configure Spark to load libhadoop.so?
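
(The native search path the JVM consults is whatever java.library.path resolves to; in the spark-env.sh quoted below it is fed through JAVA_LIBRARY_PATH / SPARK_LIBRARY_PATH. A quick way to see what the running driver actually picked up, as a sketch assuming a spark-shell session:)

// Sketch: print the native library search path of the running driver and check
// whether libhadoop.so is visible on it.
import java.io.File

val libPath = System.getProperty("java.library.path")
println("java.library.path = " + libPath)

libPath.split(File.pathSeparator).foreach { dir =>
  val f = new File(dir, "libhadoop.so")
  if (f.exists) println("found " + f.getAbsolutePath)
}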

Regards
Arthur

On 23 Oct, 2014, at 11:31 am, arthur.hk.c...@gmail.com wrote:

> Hi,
> 
> Please find the attached file.
> 
> 
> 
> 
> my spark-default.xml
> # Default system properties included when running spark-submit.
> # This is useful for setting default environmental settings.
> #
> # Example:
> # spark.master            spark://master:7077
> # spark.eventLog.enabled  true
> # spark.eventLog.dir      hdfs://namenode:8021/directory
> # spark.serializer        org.apache.spark.serializer.KryoSerializer
> #
> spark.executor.memory         2048m
> spark.shuffle.spill.compress  false
> spark.io.compression.codec    org.apache.spark.io.SnappyCompressionCodec
> 
> 
> 
> my spark-env.sh
> #!/usr/bin/env bash
> export CLASSPATH="$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar"
> export CLASSPATH="$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar"
> export JAVA_LIBRARY_PATH="$HADOOP_HOME/lib/native/Linux-amd64-64"
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
> export SPARK_WORKER_DIR="/edh/hadoop_data/spark_work/"
> export SPARK_LOG_DIR="/edh/hadoop_logs/spark"
> export SPARK_LIBRARY_PATH="$HADOOP_HOME/lib/native/Linux-amd64-64"
> export SPARK_CLASSPATH="$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar"
> export SPARK_CLASSPATH="$SPARK_CLASSPATH:$HBASE_HOME/lib/*:$HIVE_HOME/csv-serde-1.1.2-0.11.0-all.jar:"
> export SPARK_WORKER_MEMORY=2g
> export HADOOP_HEAPSIZE=2000
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=m35:2181,m33:2181,m37:2181"
> export SPARK_JAVA_OPTS=" -XX:+UseConcMarkSweepGC"
> 
> 
> ll $HADOOP_HOME/lib/native/Linux-amd64-64
> -rw-rw-r--. 1 tester tester    50523 Aug 27 14:12 hadoop-auth-2.4.1.jar
> -rw-rw-r--. 1 tester tester  1062640 Aug 27 12:19 libhadoop.a
> -rw-rw-r--. 1 tester tester  1487564 Aug 27 11:14 libhadooppipes.a
> lrwxrwxrwx. 1 tester tester       24 Aug 27 07:08 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
> lrwxrwxrwx. 1 tester tester       24 Aug 27 07:08 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
> -rwxr-xr-x. 1 tester tester    54961 Aug 27 07:08 libhadoopsnappy.so.0.0.1
> -rwxrwxr-x. 1 tester tester   630328 Aug 27 12:19 libhadoop.so
> -rwxrwxr-x. 1 tester tester   630328 Aug 27 12:19 libhadoop.so.1.0.0
> -rw-rw-r--. 1 tester tester   582472 Aug 27 11:14 libhadooputils.a
> -rw-rw-r--. 1 tester tester   298626 Aug 27 11:14 libhdfs.a
> -rwxrwxr-x. 1 tester tester   200370 Aug 27 11:14 libhdfs.so
> -rwxrwxr-x. 1 tester tester   200370 Aug 27 11:14 libhdfs.so.0.0.0
> lrwxrwxrwx. 1 tester tester       55 Aug 27 07:08 libjvm.so -> /usr/lib/jvm/jdk1.6.0_45/jre/lib/amd64/server/libjvm.so
> lrwxrwxrwx. 1 tester tester       25 Aug 27 07:08 libprotobuf-lite.so -> libprotobuf-lite.so.8.0.0
> lrwxrwxrwx. 1 tester tester       25 Aug 27 07:08 libprotobuf-lite.so.8 -> libprotobuf-lite.so.8.0.0
> -rwxr-xr-x. 1 tester tester   964689 Aug 27 07:08 libprotobuf-lite.so.8.0.0
> lrwxrwxrwx. 1 tester tester       20 Aug 27 07:08 libprotobuf.so -> libprotobuf.so.8.0.0
> lrwxrwxrwx. 1 tester tester       20 Aug 27 07:08 libprotobuf.so.8 -> libprotobuf.so.8.0.0
> -rwxr-xr-x. 1 tester tester  8300050 Aug 27 07:08 libprotobuf.so.8.0.0
> lrwxrwxrwx. 1 tester tester       18 Aug 27 07:08 libprotoc.so -> libprotoc.so.8.0.0
> lrwxrwxrwx. 1 tester tester       18 Aug 27 07:08 libprotoc.so.8 -> libprotoc.so.8.0.0
> -rwxr-xr-x. 1 tester tester  9935810 Aug 27 07:08 libprotoc.so.8.0.0
> -rw-r--r--. 1 tester tester   233554 Aug 27 15:19 libsnappy.a
> lrwxrwxrwx. 1 tester tester       23 Aug 27 11:32 libsnappy.so -> /usr/lib64/libsnappy.so
> lrwxrwxrwx. 1 tester tester       23 Aug 27 11:33 libsnappy.so.1 -> /usr/lib64/libsnappy.so
> -rwxr-xr-x. 1 tester tester   147726 Aug 27 07:08 libsnappy.so.1.2.0
> drwxr-xr-x. 2 tester tester     4096 Aug 27 07:08 pkgconfig
> 
> 
> Regards
> Arthur
> 
> 
> On 23 Oct, 2014, at 10:57 am, Shao, Saisai  wrote:
> 
>> Hi Arthur,
>>  
>> I think your problem might be different from what 
>> SPARK-3958 (https://issues.apache.org/jira/browse/SPARK-3958) describes; it 
>> seems more likely to be a library link problem. Would you mind checking your 
>> Spark runtime to see whether snappy.so is loaded or not (through lsof -p)?
>>  
>> I guess your problem is more likely to be a library-not-found problem.
>>  
>>  
>> Thanks
>> Jerry
>>  
>> 



RE: Spark Hive Snappy Error

2014-10-22 Thread Shao, Saisai
Seems you just added the snappy library into your classpath:

export CLASSPATH="$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar"

But Spark itself depends on snappy-0.2.jar. Is there any possibility that this 
problem is caused by a different version of snappy?
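
One way to check for such a clash from the running shell is to list every classpath entry that provides the snappy binding class (a sketch; if more than one entry shows up, the native methods may have been bound against a mismatched version):

// Sketch: enumerate all classpath entries that contain the snappy native binding.
// More than one result would point at the kind of version clash described above.
val urls = getClass.getClassLoader.getResources("org/xerial/snappy/SnappyNative.class")
while (urls.hasMoreElements) println(urls.nextElement())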

Thanks
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Thursday, October 23, 2014 11:32 AM
To: Shao, Saisai
Cc: arthur.hk.c...@gmail.com; user
Subject: Re: Spark Hive Snappy Error

[snip]




Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi,

Removed export CLASSPATH="$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar"

It works, THANK YOU!!
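
For anyone hitting the same thing, a quick way to verify after restarting the shell (a sketch; the query is the one from the first message in this thread):

// After removing the hadoop-snappy jar from CLASSPATH and restarting spark-shell,
// Snappy should resolve from the snappy-java jar shipped with Spark and the
// original query should complete without the UnsatisfiedLinkError.
println(classOf[org.xerial.snappy.Snappy].getProtectionDomain.getCodeSource.getLocation)

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(1) from q8_national_market_share").collect().foreach(println)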

Regards 
Arthur
 

On 23 Oct, 2014, at 1:00 pm, Shao, Saisai  wrote:

> Seems you just added the snappy library into your classpath:
>  
> export CLASSPATH="$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar"
>  
> But Spark itself depends on snappy-0.2.jar. Is there any possibility that this 
> problem is caused by a different version of snappy?
>  
> Thanks
> Jerry
>  
> From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] 
> Sent: Thursday, October 23, 2014 11:32 AM
> To: Shao, Saisai
> Cc: arthur.hk.c...@gmail.com; user
> Subject: Re: Spark Hive Snappy Error
>  
> [snip]