[jira] [Commented] (SPARK-27623) Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
[ https://issues.apache.org/jira/browse/SPARK-27623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831684#comment-16831684 ]

Alexandru Barbulescu commented on SPARK-27623:
----------------------------------------------

The problem might be related to the fact that the Spark 2.4.2 pre-built convenience binaries are compiled against Scala 2.12, while the spark-cassandra-connector that I also included currently supports only Scala 2.11.

> Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-27623
>                 URL: https://issues.apache.org/jira/browse/SPARK-27623
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.2
>            Reporter: Alexandru Barbulescu
>            Priority: Critical
>
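A sketch of how the suspected Scala-version mismatch could be checked and worked around from the command line. The artifact versions below are assumptions based on the 2.4.x timeframe, not taken from this ticket; note that the `FileFormat$class` name in the trace is the Scala 2.11 trait-implementation class, which does not exist in Scala 2.12 builds, consistent with a 2.11-built jar running on a 2.12 Spark.

```shell
# Check which Scala version this Spark build was compiled against;
# the version banner includes a line like "Using Scala version 2.12.x".
spark-submit --version

# Every --packages coordinate must share one Scala suffix (_2.11 or _2.12)
# that matches the Spark build itself. Assuming a Scala 2.11 build of Spark
# 2.4.x is used (2.4.2 convenience binaries were the 2.12 exception), both
# dependencies can be pulled with matching _2.11 suffixes:
pyspark \
  --packages org.apache.spark:spark-avro_2.11:2.4.2,com.datastax.spark:spark-cassandra-connector_2.11:2.4.0
```

The same coordinates work with `spark-submit --packages` for a scripted job; the key point is only that the `_2.11`/`_2.12` suffixes agree with the Scala version reported by `spark-submit --version`.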
[jira] [Created] (SPARK-27623) Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
Alexandru Barbulescu created SPARK-27623:
-----------------------------------------

             Summary: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
                 Key: SPARK-27623
                 URL: https://issues.apache.org/jira/browse/SPARK-27623
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.2
            Reporter: Alexandru Barbulescu

After updating to Spark 2.4.2, when using the

{code:java}
spark.read.format().options().load()
{code}

chain of methods, regardless of what parameter is passed to "format", we get the following error related to Avro:

{code:java}
.options(**load_options)
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
    at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
    at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
    at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
    at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
    at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 36 more
{code}

The code we run looks like this:

{code:java}
spark_session = (
    SparkSession.builder
    .appName(APPLICATION_NAME)
    .master(MASTER_URL)
    .config('spark.cassandra.connection.host', SERVER_IP_ADDRESS)
    .config('spark.cassandra.auth.username', CASSANDRA_USERNAME)
    .config('spark.cassandra.auth.password', CASSANDRA_PASSWORD)
    .config('spark.sql.shuffle.partitions', 16)
    .config('parquet.enable.summary-metadata', 'true')