[jira] [Commented] (SPARK-27623) Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated

2019-05-02 Thread Alexandru Barbulescu (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-27623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831684#comment-16831684 ]

Alexandru Barbulescu commented on SPARK-27623:
--

The problem might be related to the fact that the Spark 2.4.2 pre-built 
convenience binaries are compiled against Scala 2.12, while the 
spark-cassandra-connector that I also included currently supports only Scala 2.11. 
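If that diagnosis is right, the fix is to keep the Scala suffix of every `--packages` coordinate aligned with the Scala version Spark itself was built against. A rough sketch (version numbers are illustrative, not verified against this cluster):

```shell
# Spark 2.4.2's pre-built binaries use Scala 2.12, so mixing in a _2.11
# artifact such as spark-cassandra-connector_2.11 breaks class loading.

# Consistent alternative: run a Scala 2.11 build of Spark (e.g. 2.4.1 or a
# 2.4.2 build compiled for 2.11) and use only _2.11 artifacts:
spark-submit \
  --packages org.apache.spark:spark-avro_2.11:2.4.1,com.datastax.spark:spark-cassandra-connector_2.11:2.4.1 \
  app.py
```

The key point is simply that all `_2.1x` suffixes must match each other and the running Spark distribution; this is an invocation sketch, not a tested command.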

> Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
> ---
>
> Key: SPARK-27623
> URL: https://issues.apache.org/jira/browse/SPARK-27623
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.2
>Reporter: Alexandru Barbulescu
>Priority: Critical
>

[jira] [Created] (SPARK-27623) Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated

2019-05-02 Thread Alexandru Barbulescu (JIRA)
Alexandru Barbulescu created SPARK-27623:


 Summary: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
 Key: SPARK-27623
 URL: https://issues.apache.org/jira/browse/SPARK-27623
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.2
Reporter: Alexandru Barbulescu


After updating to Spark 2.4.2, when using the
{code:java}
spark.read.format().options().load()
{code}
chain of methods, we get the following Avro-related error regardless of the 
parameter passed to "format":

 
{code:java}
.options(**load_options)
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
    at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
    at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
    at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
    at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
    at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 36 more

{code}
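The stack trace shows why every "format" value fails, not just avro: java.util.ServiceLoader eagerly instantiates each DataSourceRegister provider listed on the classpath during lookupDataSource, so a single provider whose constructor references a missing class (here the Scala 2.11-only FileFormat$class) aborts the whole scan. A minimal Python sketch of that failure mode, with invented names standing in for the Java machinery:

```python
# Sketch of ServiceLoader-style provider discovery (hypothetical names).
# The registry instantiates every listed provider eagerly, so one provider
# compiled against a missing base class breaks lookup for all formats.

class ServiceConfigurationError(Exception):
    """Stands in for java.util.ServiceConfigurationError."""


def cassandra_provider():
    # A provider whose dependencies are all present.
    return "cassandra"


def avro_provider():
    # Stands in for AvroFileFormat.<init> referencing the Scala 2.11-only
    # FileFormat$class, which is absent from a Scala 2.12 build.
    raise ImportError("org/apache/spark/sql/execution/datasources/FileFormat$class")


def lookup_data_source(providers):
    # Analogous to DataSource.lookupDataSource iterating the ServiceLoader:
    # every factory is called before any filtering by short name happens.
    instances = []
    for factory in providers:
        try:
            instances.append(factory())
        except ImportError as exc:
            # Mirrors ServiceLoader.fail(): one bad provider aborts the scan.
            raise ServiceConfigurationError(
                f"Provider could not be instantiated: {exc}"
            ) from exc
    return instances
```

With only well-formed providers registered, `lookup_data_source([cassandra_provider])` returns normally; as soon as the broken provider is on the "classpath", every lookup raises, matching the behavior reported above.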
 

The code we run looks like this:

 
{code:java}
spark_session = (
 SparkSession.builder
 .appName(APPLICATION_NAME)
 .master(MASTER_URL)
 .config('spark.cassandra.connection.host', SERVER_IP_ADDRESS)
 .config('spark.cassandra.auth.username', CASSANDRA_USERNAME)
 .config('spark.cassandra.auth.password', CASSANDRA_PASSWORD)
 .config('spark.sql.shuffle.partitions', 16)
 .config('parquet.enable.summary-metadata', 'true')