Dhruve Ashar created SPARK-26801:
------------------------------------

Summary: Spark unable to read valid avro types
Key: SPARK-26801
URL: https://issues.apache.org/jira/browse/SPARK-26801
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Dhruve Ashar
Currently, the external Avro package can only read files whose top-level Avro schema is a record. This is probably because of how InternalRow is represented in Spark SQL. As a result, if an Avro file contains anything other than a sequence of records, Spark fails to read it. We hit this issue earlier when trying to read primitive types, and again when trying to read an array of records. Below are examples of reading valid Avro data, with the resulting stack traces.

{code:java}
spark.read.format("avro").load("avroTypes/randomInt.avro").show
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:

"int"

  at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  ... 49 elided

======================================================================

scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:

{
  "type" : "enum",
  "name" : "Suit",
  "symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ]
}

  at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  ... 49 elided
{code}
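Both traces fail in the same place: schema inference converts the Avro schema to a Spark SQL type and rejects anything that is not a struct, since a DataFrame schema is a StructType. The following is a minimal, stdlib-only sketch of that check under simplifying assumptions. AvroKind, toSqlTypeName, inferSchema and inferSchemaWrapped are hypothetical names, not Spark or Avro APIs, and the type mapping is a reduced version of Spark's Avro conversion (int to integer, enum to string, array to array, record to struct). The "wrapped" variant, which places a non-record top-level value in a single column named "value", is only one possible approach and is not what Spark 2.4 does.

```java
/**
 * Hypothetical, simplified model of the check that produces
 * "Avro schema cannot be converted to a Spark SQL StructType".
 * None of these names are real Spark or Avro APIs.
 */
public class AvroSchemaCheck {

    // Hypothetical subset of Avro top-level schema kinds, for illustration only.
    public enum AvroKind { INT, STRING, ENUM, ARRAY, RECORD }

    // Simplified mapping from an Avro kind to the name of the Spark SQL type
    // it would convert to (an Avro enum becomes a string column).
    public static String toSqlTypeName(AvroKind kind) {
        switch (kind) {
            case INT:    return "integer";
            case STRING: return "string";
            case ENUM:   return "string";
            case ARRAY:  return "array";
            case RECORD: return "struct";
            default:     throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    // Models the behaviour reported in this issue: the converted top-level
    // type must be a struct, otherwise schema inference throws.
    public static String inferSchema(AvroKind topLevel) {
        String sqlType = toSqlTypeName(topLevel);
        if (!sqlType.equals("struct")) {
            throw new RuntimeException(
                "Avro schema cannot be converted to a Spark SQL StructType: " + topLevel);
        }
        return sqlType;
    }

    // Hypothetical alternative (assumption, not Spark 2.4 behaviour): wrap a
    // non-record top-level type in a single-column struct named "value".
    public static String inferSchemaWrapped(AvroKind topLevel) {
        String sqlType = toSqlTypeName(topLevel);
        return sqlType.equals("struct") ? sqlType : "struct<value:" + sqlType + ">";
    }

    public static void main(String[] args) {
        for (AvroKind kind : AvroKind.values()) {
            String strict;
            try {
                strict = inferSchema(kind);
            } catch (RuntimeException e) {
                strict = "error: " + e.getMessage();
            }
            System.out.println(kind + " -> " + strict
                + " | wrapped: " + inferSchemaWrapped(kind));
        }
    }
}
```

Running main prints one line per kind: only RECORD passes the strict check, matching the failures above for "int" and the Suit enum, while the hypothetical wrapped variant yields a one-column schema for every kind.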