[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13865


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352383
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
       sql("DROP TABLE IF EXISTS createAndInsertTest")
     }
   }
+
+  test("SPARK-13709: reading partitioned Avro table with nested schema") {
+    withTempDir { dir =>
+      val path = dir.getCanonicalPath
+      val tableName = "spark_13709"
+      val tempTableName = "spark_13709_temp"
+
+      new File(path, tableName).mkdir()
+      new File(path, tempTableName).mkdir()
+
+      val avroSchema =
+        """{
+          |  "name": "test_record",
+          |  "type": "record",
+          |  "fields": [ {
+          |    "name": "f0",
+          |    "type": "int"
+          |  }, {
+          |    "name": "f1",
+          |    "type": {
+          |      "type": "record",
+          |      "name": "inner",
+          |      "fields": [ {
+          |        "name": "f10",
+          |        "type": "int"
+          |      }, {
+          |        "name": "f11",
+          |        "type": "double"
+          |      } ]
+          |    }
+          |  } ]
+          |}
+        """.stripMargin
+
+      withTable(tableName, tempTableName) {
+        // Creates the external partitioned Avro table to be tested.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tableName
+             |PARTITIONED BY (ds STRING)
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin
+        )
+
+        // Creates a temporary Avro table used to prepare the testing Avro file.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tempTableName
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tempTableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin
+        )
+
+        // Generates Avro data.
+        sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 2.5)")
+
+        // Adds the generated Avro data as a new partition to the testing table.
+        sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION '$path/$tempTableName'")
+
+        checkAnswer(
+          sql(s"SELECT * FROM $tableName"),
--- End diff --

It's inside `withTable`; the tables will be dropped automatically.
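For readers unfamiliar with the test utilities: `withTable` (from `SQLTestUtils`) runs the test body and then drops the named tables, even when the body throws. A simplified sketch of the pattern, with the drop action injected as a callback so the snippet needs no Spark dependency (the parameter shape is simplified from the real helper):

```scala
// Simplified sketch of the SQLTestUtils.withTable pattern: the body runs
// first, and every named table is dropped in a `finally` block so cleanup
// happens even if an assertion inside the body fails.
def withTable(tableNames: String*)(body: => Unit)(drop: String => Unit): Unit = {
  try body
  finally tableNames.foreach(drop)
}
```

The real helper issues `sql(s"DROP TABLE IF EXISTS $name")` for each table instead of taking a `drop` callback.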





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352349
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
[same diff context as quoted above]
--- End diff --

Yea, sure.





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352336
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
[same diff context as quoted above]
--- End diff --

oh, nvm. We have withTable.





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352322
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
[same diff context as quoted above]
--- End diff --

drop the table at the end of this test?





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352282
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
[same diff context as quoted above]
--- End diff --

yea, it is a good idea to add comments to explain why this one failed.





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352258
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
[same diff context as quoted above]
--- End diff --

Yea, when reading data from a partition, the Avro deserializer needs to know the Avro schema defined in the table properties (`avro.schema.literal`). However, we originally initialized the deserializer using only the partition properties, which don't contain `avro.schema.literal`. This PR fixes it by merging the two sets of properties.
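The merge described here can be sketched as a plain `java.util.Properties` overlay (an illustration of the idea, not the exact code in `TableReader`; the method name is hypothetical): table-level entries such as `avro.schema.literal` are copied in first, then partition-level entries are applied on top so they win on conflicts.

```scala
import java.util.Properties

// Hypothetical illustration of the fix: build the properties used to
// initialize the deserializer from BOTH the table and the partition.
def mergedTableAndPartitionProperties(
    tableProps: Properties,
    partitionProps: Properties): Properties = {
  val merged = new Properties()
  merged.putAll(tableProps)      // table-level entries, e.g. avro.schema.literal
  merged.putAll(partitionProps)  // partition-level entries override on conflict
  merged
}
```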





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68343609
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
[same diff context as quoted above]
--- End diff --

can you explain it a bit more how this query fails without your patch?





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68335825
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -230,10 +234,21 @@ class HadoopTableReader(
       // Fill all partition keys to the given MutableRow object
       fillPartitionKeys(partValues, mutableRow)
 
+      val tableProperties = relation.tableDesc.getProperties
--- End diff --

Local variable for serialization.
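This is the common Spark idiom of copying a member field into a local `val` before referencing it in a closure: a closure that reads `relation.tableDesc.getProperties` directly would capture the enclosing `HadoopTableReader` (via `this`) and drag it into task serialization. A minimal, Spark-free illustration (class and field names hypothetical):

```scala
// The closure below captures only `localProps` (a local val), not `this`,
// so the enclosing object never needs to be serialized with the task.
class ReaderLike(props: Map[String, String]) {
  def makeTask(): () => Int = {
    val localProps = props // copy the field into a local variable first
    () => localProps.size  // references the local, not `this.props`
  }
}
```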





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-22 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/13865

[SPARK-13709][SQL] Initialize deserializer with both table and partition 
properties when reading partitioned tables

## What changes were proposed in this pull request?

When reading partitions of a partitioned Hive SerDe table, we only initialize the deserializer using partition properties. However, for SerDes like `AvroSerDe`, essential properties (e.g. Avro schema information) may be defined in table properties. We should merge both table properties and partition properties before initializing the deserializer.

## How was this patch tested?

New test case added in `QueryPartitionSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-13709-partitioned-avro-table

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13865


commit 0901218c1f9b5e496d0964b83cfd37c552b97f88
Author: Cheng Lian 
Date:   2016-06-22T23:51:27Z

Fixes SPARK-13709



