As the warning itself says, SPARK_CLASSPATH is deprecated in Spark 1.0+. Use --driver-class-path instead, and don't use a glob (*): specify the JARs one by one, separated by colons:

./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar
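If you have a whole directory of JARs, a shell expansion can build that colon-separated list for you. A minimal sketch, assuming the placeholder path from the quoted message below:

[biadmin@impl41 spark]$ JARS=$(ls /path-to-proprietary-hadoop-lib/lib/*.jar | tr '\n' ':')
[biadmin@impl41 spark]$ ./bin/spark-shell --driver-class-path "$JARS"

Note that --driver-class-path only augments the driver's classpath; as the deprecation warning in your log points out, the executor side is configured separately through spark.executor.extraClassPath (for example in conf/spark-defaults.conf).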
Date: Wed, 9 Jul 2014 13:45:07 -0700
From: kat...@cs.pitt.edu
Subject: SPARK_CLASSPATH Warning
To: user@spark.apache.org

Hello,

I have installed Apache Spark v1.0.0 on a machine with a proprietary Hadoop distribution (v2.2.0, without YARN). Because the Hadoop distribution I am using ships its own list of JARs, I made the following changes to conf/spark-env.sh:

#!/usr/bin/env bash
export HADOOP_CONF_DIR=/path-to-hadoop-conf/hadoop-conf
export SPARK_LOCAL_IP=impl41
export SPARK_CLASSPATH="/path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*"
...

Also, to make sure that everything is working, I start the Spark shell as follows:

[biadmin@impl41 spark]$ ./bin/spark-shell --jars /path-to-proprietary-hadoop-lib/lib/*.jar
14/07/09 13:37:28 INFO spark.SecurityManager: Changing view acls to: biadmin
14/07/09 13:37:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(biadmin)
14/07/09 13:37:28 INFO spark.HttpServer: Starting HTTP Server
14/07/09 13:37:29 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:29 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44292
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/

Using Scala version 2.10.4 (IBM J9 VM, Java 1.7.0)
Type in expressions to have them evaluated.
Type :help for more information.
14/07/09 13:37:36 WARN spark.SparkConf: SPARK_CLASSPATH was detected (set to 'path-to-proprietary-hadoop-lib/*:/path-to-proprietary-hadoop-lib/lib/*').
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
14/07/09 13:37:36 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*' as a work-around.
14/07/09 13:37:36 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*' as a work-around.
14/07/09 13:37:36 INFO spark.SecurityManager: Changing view acls to: biadmin
14/07/09 13:37:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(biadmin)
14/07/09 13:37:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/07/09 13:37:37 INFO Remoting: Starting remoting
14/07/09 13:37:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@impl41:46081]
14/07/09 13:37:37 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@impl41:46081]
14/07/09 13:37:37 INFO spark.SparkEnv: Registering MapOutputTracker
14/07/09 13:37:37 INFO spark.SparkEnv: Registering BlockManagerMaster
14/07/09 13:37:37 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140709133737-798b
14/07/09 13:37:37 INFO storage.MemoryStore: MemoryStore started with capacity 307.2 MB.
14/07/09 13:37:38 INFO network.ConnectionManager: Bound socket to port 16685 with id = ConnectionManagerId(impl41,16685)
14/07/09 13:37:38 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/07/09 13:37:38 INFO storage.BlockManagerInfo: Registering block manager impl41:16685 with 307.2 MB RAM
14/07/09 13:37:38 INFO storage.BlockManagerMaster: Registered BlockManager
14/07/09 13:37:38 INFO spark.HttpServer: Starting HTTP Server
14/07/09 13:37:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:38 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:21938
14/07/09 13:37:38 INFO broadcast.HttpBroadcast: Broadcast server started at http://impl41:21938
14/07/09 13:37:38 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-91e8e040-f2ca-43dd-b574-805033f476c7
14/07/09 13:37:38 INFO spark.HttpServer: Starting HTTP Server
14/07/09 13:37:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:38 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:52678
14/07/09 13:37:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:38 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/07/09 13:37:38 INFO ui.SparkUI: Started SparkUI at http://impl41:4040
14/07/09 13:37:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/09 13:37:39 INFO spark.SparkContext: Added JAR file:/opt/ibm/biginsights/IHC/lib/adaptive-mr.jar at http://impl41:52678/jars/adaptive-mr.jar with timestamp 1404938259526
14/07/09 13:37:39 INFO executor.Executor: Using REPL class URI: http://impl41:44292
14/07/09 13:37:39 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala>

So, my question is the following: Am I including my libraries correctly? Why do I get the message that the SPARK_CLASSPATH method is deprecated?

Also, when I execute the following example:

scala> val file = sc.textFile("hdfs://lpsa.dat")
14/07/09 13:41:43 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
14/07/09 13:41:43 INFO storage.MemoryStore: ensureFreeSpace(102907) called with curMem=0, maxMem=322122547
14/07/09 13:41:43 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 100.5 KB, free 307.1 MB)
file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val errors = file.filter(line => line.contains("ERROR"))
errors: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at <console>:14

scala> errors.count()
14/07/09 13:42:11 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
java.lang.IllegalArgumentException: java.net.UnknownHostException: lpsa.dat
  at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
  at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:231)
  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2442)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2476)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2458)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:376)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:172)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
  at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
  at org.apache.spark.rdd.FilteredRDD.getPartitions(FilteredRDD.scala:29)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1094)
  at org.apache.spark.rdd.RDD.count(RDD.scala:847)
  at $iwC$$iwC$$iwC$$iwC.<init>(<console>:17)
  at $iwC$$iwC$$iwC.<init>(<console>:22)
  at $iwC$$iwC.<init>(<console>:24)
  at $iwC.<init>(<console>:26)
  at <init>(<console>:28)
  at .<init>(<console>:32)
  at .<clinit>(<console>)
  at .<init>(<console>:7)
  at .<clinit>(<console>)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
  at java.lang.reflect.Method.invoke(Method.java:619)
  at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
  at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
  at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
  at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
  at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
  at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
  at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
  at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
  at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
  at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
  at org.apache.spark.repl.Main$.main(Main.scala:31)
  at org.apache.spark.repl.Main.main(Main.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
  at java.lang.reflect.Method.invoke(Method.java:619)
  at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: lpsa.dat
  ... 71 more

scala>

Why do I get this UnknownHostException for the file, and what does the following message mean: "hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded"?

I apologize for the long message, but I searched previous messages and I cannot figure out what I am doing wrong.

Thank you,
Nick
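As for the other two questions: the UnknownHostException comes from the URI itself. In "hdfs://lpsa.dat", Hadoop parses "lpsa.dat" as the NameNode host name, not as a file name, and then fails to resolve that host. Either give the full URI including the NameNode address, or pass a bare path and let fs.defaultFS from your Hadoop configuration supply the authority. A sketch (host, port, and path below are placeholders for your cluster's values):

scala> val file = sc.textFile("hdfs://namenode-host:8020/user/biadmin/lpsa.dat")
scala> // or, with HADOOP_CONF_DIR pointing at your cluster config:
scala> val file = sc.textFile("/user/biadmin/lpsa.dat")

The BlockReaderLocal warning is unrelated and harmless: it means the native libhadoop library could not be loaded, so HDFS falls back to reading blocks over the DataNode socket instead of using short-circuit local reads. That costs some performance but does not cause errors.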