[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

liancheng Thu, 23 Jun 2016 21:25:29 -0700

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13865#discussion_r68352258
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
    @@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
           sql("DROP TABLE IF EXISTS createAndInsertTest")
         }
       }
    +
    +  test("SPARK-13709: reading partitioned Avro table with nested schema") {
    +    withTempDir { dir =>
    +      val path = dir.getCanonicalPath
    +      val tableName = "spark_13709"
    +      val tempTableName = "spark_13709_temp"
    +
    +      new File(path, tableName).mkdir()
    +      new File(path, tempTableName).mkdir()
    +
    +      val avroSchema =
    +        """{
    +          |  "name": "test_record",
    +          |  "type": "record",
    +          |  "fields": [ {
    +          |    "name": "f0",
    +          |    "type": "int"
    +          |  }, {
    +          |    "name": "f1",
    +          |    "type": {
    +          |      "type": "record",
    +          |      "name": "inner",
    +          |      "fields": [ {
    +          |        "name": "f10",
    +          |        "type": "int"
    +          |      }, {
    +          |        "name": "f11",
    +          |        "type": "double"
    +          |      } ]
    +          |    }
    +          |  } ]
    +          |}
    +        """.stripMargin
    +
    +      withTable(tableName, tempTableName) {
    +        // Creates the external partitioned Avro table to be tested.
    +        sql(
    +          s"""CREATE EXTERNAL TABLE $tableName
    +             |PARTITIONED BY (ds STRING)
    +             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    +             |STORED AS
    +             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    +             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    +             |LOCATION '$path/$tableName'
    +             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
    +           """.stripMargin
    +        )
    +
    +        // Creates a temporary Avro table used to prepare the Avro file for testing.
    +        sql(
    +          s"""CREATE EXTERNAL TABLE $tempTableName
    +             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    +             |STORED AS
    +             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    +             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    +             |LOCATION '$path/$tempTableName'
    +             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
    +           """.stripMargin
    +        )
    +
    +        // Generates Avro data.
    +        sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 2.5)")
    +
    +        // Adds generated Avro data as a new partition to the testing table.
    +        sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION '$path/$tempTableName'")
    +
    +        checkAnswer(
    +          sql(s"SELECT * FROM $tableName"),
    --- End diff --
    
    Yeah, when reading data from a partition, the Avro deserializer needs to 
know the Avro schema defined in the table properties (`avro.schema.literal`). 
However, we originally initialized the deserializer using only the partition 
properties, which don't contain `avro.schema.literal`. This PR fixes it by 
merging the two sets of properties.
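
To illustrate the idea, here is a minimal sketch of merging two sets of 
properties with plain `java.util.Properties`. It is not the code from this PR: 
the object name `SerDePropertiesSketch`, the `mergeProperties` helper, and the 
sample keys and values are hypothetical, and letting partition properties 
override table properties on a key conflict is an assumption made for this 
example.

```scala
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical helper, for illustration only (not taken from the Spark code base).
object SerDePropertiesSketch {

  // Copies every string entry of `from` into `into`, overwriting existing keys.
  private def copyInto(from: Properties, into: Properties): Unit =
    from.stringPropertyNames().asScala.foreach { key =>
      into.setProperty(key, from.getProperty(key))
    }

  // Returns a new Properties holding the table-level entries plus the
  // partition-level entries. Assumption: partition entries win on conflict.
  def mergeProperties(tableProps: Properties, partitionProps: Properties): Properties = {
    val merged = new Properties()
    copyInto(tableProps, merged)      // table-level entries first, e.g. avro.schema.literal
    copyInto(partitionProps, merged)  // partition-level entries override on conflict
    merged
  }

  def main(args: Array[String]): Unit = {
    val tableProps = new Properties()
    tableProps.setProperty("avro.schema.literal", """{"type": "record", "name": "r", "fields": []}""")
    tableProps.setProperty("location", "/warehouse/t")

    val partitionProps = new Properties()
    partitionProps.setProperty("location", "/warehouse/t/ds=foo")

    val merged = mergeProperties(tableProps, partitionProps)
    // The schema is visible even though the partition properties lack it.
    println(merged.getProperty("avro.schema.literal"))
    println(merged.getProperty("location"))
  }
}
```

A deserializer initialized with the merged properties can then see the 
table-level schema while still picking up partition-specific settings.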



