[ https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265310#comment-15265310 ]
Martin Hall commented on SPARK-14162:
-------------------------------------

I got the same error when I had forgotten to copy the Oracle JDBC jar file (ojdbc6.jar) to one of the Spark worker nodes.
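If copying the jar to every node by hand is impractical, a workaround often suggested for this issue is to name the driver class explicitly in the reader options, so each executor registers the class by name instead of relying on the URL-based lookup that fails in the trace below. A minimal sketch, assuming the same table as the report; the connection URL is a hypothetical placeholder, and the jar still has to reach the executors (e.g. via --jars):

{quote}
# Hypothetical host/port/service -- substitute your own Oracle connection details.
df = sqlContext.read.format('jdbc').options(
    url='jdbc:oracle:thin:@//dbhost:1521/service',
    dbtable='bi.contact',
    driver='oracle.jdbc.OracleDriver'  # register the driver class explicitly on each executor
).load()
print(df.count())
{quote}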
> java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
> ---------------------------------------------------------------------------------------------------
>
>                  Key: SPARK-14162
>                  URL: https://issues.apache.org/jira/browse/SPARK-14162
>              Project: Spark
>           Issue Type: Bug
>     Affects Versions: 1.6.1
>             Reporter: Zoltan Fedor
>
> This is an interesting one.
> We are using JupyterHub with Python to connect to a Hadoop cluster to run Spark jobs, and as new Spark versions come out I compile them and add them as new kernels to JupyterHub.
> There are also some libraries we are using, like ojdbc to connect to an Oracle database.
> Now the interesting thing: ojdbc worked fine in Spark 1.6.0, but suddenly "it cannot be found" in 1.6.1.
> All settings are the same when starting pyspark 1.6.1 and 1.6.0, so there is no reason for it not to work in 1.6.1 if it works in 1.6.0.
> This is the pyspark code I am running in both 1.6.1 and 1.6.0:
> {quote}
> df = sqlContext.read.format('jdbc').options(url='jdbc:oracle:thin:'+connection_script+'', dbtable='bi.contact').load()
> print(df.count())
> {quote}
> And it throws this error in 1.6.1 only:
> {quote}
> java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {quote}
> I know that this usually means that the ojdbc driver is not available on the executor, but it is. Spark is being started the exact same way in 1.6.1 as in 1.6.0, and it does find it on 1.6.0.
> I can reliably reproduce this, so the only conclusion is that something must have changed between 1.6.0 and 1.6.1 to cause this, but I have seen no deprecation notice of anything that could cause it.
> Environment variables set when starting pyspark 1.6.1:
> {quote}
> "SPARK_HOME": "/usr/lib/spark-1.6.1-hive",
> "SCALA_HOME": "/usr/lib/scala",
> "HADOOP_CONF_DIR": "/etc/hadoop/venus-hadoop-conf",
> "HADOOP_HOME": "/usr/bin/hadoop",
> "HIVE_HOME": "/usr/bin/hive",
> "LD_LIBRARY_PATH": "/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH",
> "YARN_HOME": "",
> "SPARK_DIST_CLASSPATH": "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*",
> "SPARK_LIBRARY_PATH": "/usr/lib/hadoop/lib",
> "PATH": "/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/apps/home/zoltan.fedor/.local/bin:/apps/home/zoltan.fedor/bin:/usr/bin/hadoop/bin",
> "PYTHONPATH": "/usr/lib/spark-1.6.1-hive/python/:/usr/lib/spark-1.6.1-hive/python/lib/py4j-0.9-src.zip",
> "PYTHONSTARTUP": "/usr/lib/spark-1.6.1-hive/python/pyspark/shell.py",
> "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --name JupyterHub --executor-memory 2G --driver-memory 2G --queue root.frbdusers --num-executors 10 --executor-cores 2 --conf spark.executor.extraClassPath=/usr/lib/hadoop/lib,/apps/bin/oracle_ojdbc/ojdbc6.jar --driver-class-path /apps/bin/oracle_ojdbc/ojdbc6.jar --files /usr/lib/spark-1.6.1-hive/conf/hive-site.xml --jars /usr/lib/avro/avro-mapred.jar,/usr/lib/spark-1.6.1-hive/lib/spark-examples-1.6.1-hadoop2.5.0-cdh5.3.3.jar,/apps/bin/oracle_ojdbc/ojdbc6.jar,/usr/lib/spark-1.6.1-hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/spark-1.6.1-hive/lib/datanucleus-core-3.2.10.jar,/usr/lib/spark-1.6.1-hive/lib/datanucleus-rdbms-3.2.9.jar pyspark-shell",
> "PYSPARK_PYTHON": "/hadoop/cloudera/parcels/Anaconda/bin/python",
> "PYTHON_DRIVER_PYTHON": "/apps/bin/anaconda/anaconda2/bin/ipython",
> "HIVE_CP": "/hadoop/coudera/parcels/CDH/lib/hive/lib/",
> "SPARK_YARN_USER_ENV": "PYTHONPATH=/usr/lib/spark-1.6.1-hive/python/:/usr/lib/spark-1.6.1-hive/python/lib/py4j-0.9-src.zip"
> {quote}
> And in 1.6.0:
> {quote}
> "SPARK_HOME": "/usr/lib/spark-1.6.0-hive",
> "SCALA_HOME": "/usr/lib/scala",
> "HADOOP_CONF_DIR": "/etc/hadoop/venus-hadoop-conf",
> "HADOOP_HOME": "/usr/bin/hadoop",
> "HIVE_HOME": "/usr/bin/hive",
> "LD_LIBRARY_PATH": "/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH",
> "YARN_HOME": "",
> "SPARK_DIST_CLASSPATH": "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*",
> "SPARK_LIBRARY_PATH": "/usr/lib/hadoop/lib",
> "PATH": "/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/apps/home/zoltan.fedor/.local/bin:/apps/home/zoltan.fedor/bin:/usr/bin/hadoop/bin",
> "PYTHONPATH": "/usr/lib/spark-1.6.0-hive/python/:/usr/lib/spark-1.6.0-hive/python/lib/py4j-0.9-src.zip",
> "PYTHONSTARTUP": "/usr/lib/spark-1.6.0-hive/python/pyspark/shell.py",
> "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --name JupyterHub --executor-memory 2G --driver-memory 2G --queue root.frbdusers --num-executors 10 --executor-cores 2 --conf spark.executor.extraClassPath=/usr/lib/hadoop/lib,/apps/bin/oracle_ojdbc/ojdbc6.jar --driver-class-path /apps/bin/oracle_ojdbc/ojdbc6.jar --files /usr/lib/spark-1.6.0-hive/conf/hive-site.xml --jars /usr/lib/avro/avro-mapred.jar,/usr/lib/spark-1.6.0-hive/lib/spark-examples-1.6.0-hadoop2.5.0-cdh5.3.3.jar,/apps/bin/oracle_ojdbc/ojdbc6.jar,/usr/lib/spark-1.6.0-hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/spark-1.6.0-hive/lib/datanucleus-core-3.2.10.jar,/usr/lib/spark-1.6.0-hive/lib/datanucleus-rdbms-3.2.9.jar pyspark-shell",
> "PYSPARK_PYTHON": "/hadoop/cloudera/parcels/Anaconda/bin/python",
> "PYTHON_DRIVER_PYTHON": "/apps/bin/anaconda/anaconda2/bin/ipython",
> "HIVE_CP": "/hadoop/coudera/parcels/CDH/lib/hive/lib/",
> "SPARK_YARN_USER_ENV": "PYTHONPATH=/usr/lib/spark-1.6.0-hive/python/:/usr/lib/spark-1.6.0-hive/python/lib/py4j-0.8.2.1-src.zip"
> {quote}
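One detail worth checking in the PYSPARK_SUBMIT_ARGS above, independent of the 1.6.0/1.6.1 difference (the same value was used with both versions, so it cannot be the cause of the regression by itself): spark.executor.extraClassPath is a JVM classpath string, so on Linux its entries are normally separated by ":" rather than by "," (the comma-separated form is what --jars expects). A sketch of just that flag with the separator changed, reusing the paths from the report:

{quote}
--conf spark.executor.extraClassPath=/usr/lib/hadoop/lib:/apps/bin/oracle_ojdbc/ojdbc6.jar
{quote}

Since --jars already ships ojdbc6.jar to the executors, the extraClassPath entry may be redundant on YARN; but if it is kept, the separator matters.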