[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19779

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152474528

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging {
     }
   }
 
+  test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+    withTempDir { dir =>
+      val path = dir.getAbsolutePath
+      val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+      new File(schemaPath).mkdir()
+      val avroSchema =
+        """{
+          |  "name": "test_record",
+          |  "type": "record",
+          |  "fields": [ {
+          |    "name": "f0",
+          |    "type": [
+          |      "null",
+          |      {
+          |        "precision": 38,
+          |        "scale": 2,
+          |        "type": "bytes",
+          |        "logicalType": "decimal"
+          |      }
+          |    ]
+          |  } ]
+          |}
+        """.stripMargin
+      val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+      val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+      val writer = new PrintWriter(schemaFile)
+      writer.write(avroSchema)
+      writer.close()
+
+      val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+      val srcLocation = new File(url.getFile)
+      val destTableName = "tab1"
+      val srcTableName = "tab2"
+
+      withTable(srcTableName, destTableName) {
+        versionSpark.sql(
+          s"""
+             |CREATE EXTERNAL TABLE $srcTableName
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$srcLocation'
+             |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+           """.stripMargin
+        )
+
+        versionSpark.sql(
+          s"""
+             |CREATE TABLE $destTableName
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+           """.stripMargin
+        )
+        versionSpark.sql(
+          s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM $srcTableName""".stripMargin)
+        val result = versionSpark.table(srcTableName).collect()
+        assert(versionSpark.table(destTableName).collect() === result)
+        versionSpark.sql(
+          s"""INSERT INTO TABLE $destTableName SELECT * FROM $srcTableName""".stripMargin)
--- End diff --

Updated
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152473900

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
...
+        versionSpark.sql(
+          s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM $srcTableName""".stripMargin)
--- End diff --

Sure, I'll update it
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152473845

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
...
+        versionSpark.sql(
+          s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM $srcTableName""".stripMargin)
--- End diff --

@gatorsmile, I tried to remove `stripMargin`, but I get org.apache.spark.sql.catalyst.parser.ParseException: extraneous input '|' expecting {'(', 'SELECT', 'FROM', 'ADD',..}
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152465384

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
...
+        versionSpark.sql(
+          s"""INSERT INTO TABLE $destTableName SELECT * FROM $srcTableName""".stripMargin)
--- End diff --

The same here.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152465374

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
...
+        versionSpark.sql(
+          s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM $srcTableName""".stripMargin)
--- End diff --

`stripMargin` is useless
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152464029

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
...
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$destLocation'
--- End diff --

Thanks, I've updated the test case to test only managed tables and avoided creating a temp directory.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152448746

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
...
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$destLocation'
--- End diff --

We can just test the managed table, to avoid creating a temp directory for the external table.
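For context, the suggestion rests on the difference in Hive DDL semantics: a managed table needs no LOCATION clause because Hive places its data under the metastore warehouse directory (and deletes it on DROP TABLE), while an external table points at pre-existing data that survives DROP TABLE. A hedged sketch of the two shapes, with the serde properties from the test and a hypothetical path:

```sql
-- Managed table: no LOCATION; data lives under the warehouse directory
-- and is removed together with the table.
CREATE TABLE tab1
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

-- External table: LOCATION points at existing data; DROP TABLE only
-- removes the metadata, never the files. (Path is illustrative.)
CREATE EXTERNAL TABLE tab2
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/path/to/existing/avro/data';
```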
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152446382

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
...
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$destLocation'
--- End diff --

@cloud-fan, this bug affects both external and managed tables. I've added a new test case for the managed table too. However, to avoid code duplication, should I include both tests inside the same test method? Please suggest.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152383898

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
...
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$destLocation'
--- End diff --

Is this bug for the external table only? What about the managed table?
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152349156

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
...
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$destLocation'
--- End diff --

Will change to 'CREATE EXTERNAL TABLE'
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152336476

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
...
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$destLocation'
--- End diff --

Do we have to provide a location for an empty table?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152335633

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
+    withTempDir { dir =>
+      val path = dir.getAbsolutePath
+      val schemaPath = s"""$path${File.separator}avroschemadir"""
--- End diff --

nit:
```
val schemaFile = new File(dir, "avroDecimal.avsc")
val writer = new PrintWriter(schemaFile)
writer.println(avroSchema)
writer.close()
...
```
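The nit contrasts the anonymous-subclass idiom `new java.io.PrintWriter(path) { write(avroSchema); close() }` with an explicit writer, which is easier to read and to close reliably. A minimal standalone sketch of the suggested shape (using a temp file and a stub schema string rather than the suite's schema directory):

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object SchemaFileDemo {
  def main(args: Array[String]): Unit = {
    // Stub schema text for illustration; the real suite writes the Avro decimal schema.
    val avroSchema = """{"name": "test_record", "type": "record", "fields": []}"""

    // Explicit writer instead of the anonymous-subclass one-liner.
    val schemaFile = File.createTempFile("avroDecimal", ".avsc")
    val writer = new PrintWriter(schemaFile)
    try writer.write(avroSchema) // write() keeps the contents byte-identical; println() would append a newline
    finally writer.close()       // close even if write throws

    // Read back and verify round-trip.
    val source = Source.fromFile(schemaFile)
    try assert(source.mkString == avroSchema)
    finally source.close()
    schemaFile.delete()
  }
}
```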
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152286208

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -800,7 +800,7 @@ class VersionsSuite extends SparkFunSuite with Logging {
     }
   }
 
-  test(s"$version: read avro file containing decimal") {
+  test(s"$version: SPARK-17920: read avro file containing decimal") {
--- End diff --

@cloud-fan , Yes
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152259957

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -800,7 +800,7 @@ class VersionsSuite extends SparkFunSuite with Logging {
-  test(s"$version: read avro file containing decimal") {
+  test(s"$version: SPARK-17920: read avro file containing decimal") {
--- End diff --

Do you mean SPARK-17920 is already fixed because this test passes?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152198714

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: Insert into/overwrite external avro table") {
--- End diff --

I am fine to keep it in `VersionsSuite`, since it is related to Hive.
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152067290

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging {
+  test(s"$version: Insert into/overwrite external avro table") {
...
+        versionSpark.sql(
+          s"""insert overwrite table $destTableName select * from $srcTableName""".stripMargin)
--- End diff --

Thank you for your review comments; I'll address them all.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19779#discussion_r152065236

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
+          versionSpark.sql(
+            s"""insert overwrite table $destTableName select * from $srcTableName""".stripMargin)
+          assert(versionSpark.table(destTableName).count() ===
+            versionSpark.table(srcTableName).count())
+          versionSpark.sql(
+            s"""insert into table $destTableName select * from $srcTableName""".stripMargin)
+          assert(versionSpark.table(destTableName).count() / 2 ===
+            versionSpark.table(srcTableName).count())
--- End diff --

If possible, can we check values instead of count?
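Picking up on the suggestion above, a value-based assertion could look roughly like the sketch below. This is only an illustration, not the suite's actual code: it assumes a plain local SparkSession named `spark` and uses the literal table names `tab1`/`tab2` in place of `$destTableName`/`$srcTableName`; the real test would issue these calls through `versionSpark`.

```scala
import org.apache.spark.sql.SparkSession

// Sketch (assumed setup, not the suite's code): compare actual row
// contents rather than row counts, so a value corrupted on write --
// e.g. a mangled decimal -- still fails the assertion even when the
// number of rows happens to match.
val spark = SparkSession.builder().master("local[1]").getOrCreate()

// Collect both tables and compare them as order-insensitive multisets.
def rowsOf(table: String): Map[String, Int] =
  spark.table(table).collect().groupBy(_.toString).map { case (k, v) => (k, v.length) }

assert(rowsOf("tab1") == rowsOf("tab2"),
  "destination table should hold exactly the rows copied from the source")
```

Comparing multisets (rather than sorted sequences) keeps the check robust when neither insert imposes an output ordering.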
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19779#discussion_r152064637

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
+          versionSpark.sql(
+            s"""insert overwrite table $destTableName select * from $srcTableName""".stripMargin)
--- End diff --

nit.
```
INSERT OVERWRITE TABLE $destTableName SELECT * FROM $srcTableName
```
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19779#discussion_r152064670

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
+          versionSpark.sql(
+            s"""insert into table $destTableName select * from $srcTableName""".stripMargin)
--- End diff --

ditto.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19779#discussion_r152064200

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
+    test(s"$version: Insert into/overwrite external avro table") {
--- End diff --

Could you add `SPARK-19878` to the test case name?
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19779#discussion_r152062291

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
+    test(s"$version: Insert into/overwrite external avro table") {
--- End diff --

Is there a reason to have this in `VersionsSuite`?
Github user vinodkc commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19779#discussion_r151902033

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala ---
@@ -89,6 +90,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
     val fileSinkConfSer = fileSinkConf
     new OutputWriterFactory {
       private val jobConf = new SerializableJobConf(new JobConf(conf))
+      private val broadcastHadoopConf = sparkSession.sparkContext.broadcast(
--- End diff --

Thanks for the comment; I'll change the code to use jobConf.
Github user Huamei-17 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19779#discussion_r151890487

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala ---
@@ -89,6 +90,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
     val fileSinkConfSer = fileSinkConf
     new OutputWriterFactory {
       private val jobConf = new SerializableJobConf(new JobConf(conf))
+      private val broadcastHadoopConf = sparkSession.sparkContext.broadcast(
--- End diff --

Is it possible to use jobConf directly as the Hive serde initialization parameter?
GitHub user vinodkc opened a pull request:

    https://github.com/apache/spark/pull/19779

    [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table which uses Avro schema url 'avro.schema.url'

## What changes were proposed in this pull request?

Support writing to a Hive table that uses the Avro schema URL table property 'avro.schema.url'. For example:

    create external table avro_in (a string) stored as avro location '/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

    create external table avro_out (a string) stored as avro location '/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

    insert overwrite table avro_out select * from avro_in;  -- fails with java.lang.NullPointerException

    WARN AvroSerDe: Encountered exception determining schema. Returning signal schema to indicate problem
    java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)

## Changes proposed in this fix

Currently a null value is passed to the serializer, which causes the NPE during the insert operation; instead, pass the Hadoop configuration object.

## How was this patch tested?

Added a new test case in VersionsSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vinodkc/spark br_Fix_SPARK-17920

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19779.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19779

commit 034b2466d073c008b71eae072ee98353df56cbf2
Author: vinodkc
Date: 2017-11-18T07:52:59Z

    pass hadoopConfiguration to Serializer
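The idea behind the proposed fix can be sketched as follows. This is a simplified illustration, not the actual patch: the `tableDesc` and `jobConf` names are taken from the quoted `HiveFileFormat` diff context, the cast and method signatures are assumptions based on that context, and the surrounding writer class is elided.

```scala
// Sketch only (assumed names/signatures, not the merged change).
// AvroSerDe resolves the 'avro.schema.url' table property through a
// Hadoop FileSystem, so it needs a real Configuration at
// initialization time; initializing it with null is what made
// FileSystem.getDefaultUri(null) throw the NPE shown above.
import org.apache.hadoop.hive.ql.plan.TableDesc
import org.apache.hadoop.hive.serde2.Serializer
import org.apache.spark.util.SerializableJobConf

def newSerializer(tableDesc: TableDesc, jobConf: SerializableJobConf): Serializer = {
  val serializer = tableDesc.getDeserializerClass.newInstance().asInstanceOf[Serializer]
  // Before: serializer.initialize(null, tableDesc.getProperties)  // NPE on avro.schema.url
  // After: hand the serde the job's Hadoop configuration instead of null.
  serializer.initialize(jobConf.value, tableDesc.getProperties)
  serializer
}
```

Passing the job configuration also means any filesystem settings needed to fetch a remote schema URL (e.g. an HDFS default FS) reach the serde on the executors.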