[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570880#comment-16570880 ]
Dongjoon Hyun edited comment on SPARK-24924 at 8/6/18 10:50 PM:
----------------------------------------------------------------

Yep. It will work once those 3rd-party packages are rebuilt against Apache Spark 2.4, so the mapping applies to their next releases, not to the currently existing ones.

Spark hides Spark-generated metadata. You can see it via the `hive` CLI as follows.

1. Run the Apache Hive 1.2.2 CLI and check the tables; this also initializes the metastore.
{code:java}
hive> show tables;
OK
Time taken: 1.163 seconds
{code}

2. Apache Spark 2.3.1 result (see the `Provider` field):
{code:java}
scala> spark.version
res1: String = 2.3.1

scala> spark.range(10).write.format("com.databricks.spark.avro").saveAsTable("t")

scala> sql("desc formatted t").show(false)
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|id                          |bigint                                                   |null   |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Database                    |default                                                  |       |
|Table                       |t                                                        |       |
|Owner                       |dongjoon                                                 |       |
|Created Time                |Mon Aug 06 15:41:40 PDT 2018                             |       |
|Last Access                 |Wed Dec 31 16:00:00 PST 1969                             |       |
|Created By                  |Spark 2.3.1                                              |       |
|Type                        |MANAGED                                                  |       |
|Provider                    |com.databricks.spark.avro                                |       |
|Table Properties            |[transient_lastDdlTime=1533595300]                       |       |
|Location                    |file:/user/hive/warehouse/t                              |       |
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       |       |
|InputFormat                 |org.apache.hadoop.mapred.SequenceFileInputFormat         |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat|       |
|Storage Properties          |[serialization.format=1]                                 |       |
+----------------------------+---------------------------------------------------------+-------+
{code}
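The `Provider` field in the output above holds the literal string `com.databricks.spark.avro`, and SPARK-24924 adds an alias so that Spark 2.4 resolves that name to the built-in Avro source, just as `com.databricks.spark.csv` already resolves to the built-in CSV source. A minimal plain-Scala sketch of such an alias lookup; the object and method names here are illustrative stand-ins, not Spark's actual `DataSource.lookupDataSource` code:

```scala
// Illustrative sketch of a provider-name alias table. Spark's real
// resolution happens inside DataSource.lookupDataSource; the names
// below are hypothetical stand-ins for that logic.
object ProviderAliases {
  private val aliases: Map[String, String] = Map(
    "com.databricks.spark.csv"  -> "csv",  // pre-existing mapping
    "com.databricks.spark.avro" -> "avro"  // mapping proposed in SPARK-24924
  )

  // Fall back to the requested provider name when no alias exists.
  def resolve(provider: String): String =
    aliases.getOrElse(provider, provider)
}
```

With such a table in place, existing tables whose metadata records the old class name keep working against the built-in source without being rewritten.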
3. Apache Hive 1.2.2 CLI result (see the `Table Parameters` section):
{code:java}
hive> describe formatted t;
OK
# col_name              data_type               comment

col                     array<string>           from deserializer

# Detailed Table Information
Database:               default
Owner:                  dongjoon
CreateTime:             Mon Aug 06 15:41:40 PDT 2018
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               file:/Users/dongjoon/spark-release/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t
Table Type:             MANAGED_TABLE
Table Parameters:
        spark.sql.create.version        2.3.1
        spark.sql.sources.provider      com.databricks.spark.avro
        spark.sql.sources.schema.numParts       1
        spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}
        transient_lastDdlTime   1533595300

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        path                    file:/user/hive/warehouse/t
        serialization.format    1
Time taken: 1.373 seconds, Fetched: 31 row(s)
{code}
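As the `Table Parameters` above show, Spark persists the table schema as JSON split across numbered `spark.sql.sources.schema.part.N` properties, with `spark.sql.sources.schema.numParts` giving the part count (splitting keeps each value under the metastore's property-length limit; this table is small enough to fit in one part). A sketch of how such split properties can be reassembled; the property keys match the ones visible above, but the reassembly code itself is an illustration, not Spark's actual implementation:

```scala
// Reassemble a JSON schema that was split across numbered table
// properties, as in the Hive `describe formatted` output above.
// Property keys mirror the real ones; the logic is illustrative.
object SchemaParts {
  def reassemble(props: Map[String, String]): Option[String] =
    props.get("spark.sql.sources.schema.numParts").map { n =>
      (0 until n.toInt)
        .map(i => props(s"spark.sql.sources.schema.part.$i"))
        .mkString  // concatenate the parts back into one JSON string
    }
}
```

This is why the schema survives a round-trip through the Hive metastore even though Hive itself only sees an opaque `array<string>` column.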
> Add mapping for built-in Avro data source
> -----------------------------------------
>
>                 Key: SPARK-24924
>                 URL: https://issues.apache.org/jira/browse/SPARK-24924
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> This issue aims to do the following:
> # Like the `com.databricks.spark.csv` mapping, we had better map `com.databricks.spark.avro` to the built-in Avro data source.
> # Remove the incorrect error message, `Please find an Avro package at ...`.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)