[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19779


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152474528
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
+  val result = versionSpark.table(srcTableName).collect()
+  assert(versionSpark.table(destTableName).collect() === result)
+  versionSpark.sql(
+s"""INSERT INTO TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

Updated


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152473900
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

Sure, I'll update it


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152473845
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

@gatorsmile, I tried removing `stripMargin`, but I get org.apache.spark.sql.catalyst.parser.ParseException: extraneous input '|' expecting {'(', 'SELECT', 'FROM', 'ADD', ..}
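
For reference, a minimal sketch (illustration only, not code from the PR) of the behaviour: `stripMargin` removes a leading `|` from each line, so multi-line SQL written with `|` margins needs it, while a single-line string has no margin and can simply drop it.

```scala
import org.apache.spark.sql.SparkSession

// Illustration only: why dropping stripMargin breaks multi-line SQL but is
// harmless for single-line SQL. The local SparkSession here is an assumption.
val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()

// Multi-line form: without .stripMargin the leading '|' characters reach the
// parser and produce "extraneous input '|'".
spark.sql(
  """
    |SELECT 1 AS f0
  """.stripMargin)

// Single-line form: there is no '|' margin, so .stripMargin would be a no-op
// and can be removed safely.
spark.sql("SELECT 1 AS f0")
```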


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152465384
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
+  val result = versionSpark.table(srcTableName).collect()
+  assert(versionSpark.table(destTableName).collect() === result)
+  versionSpark.sql(
+s"""INSERT INTO TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

The same here.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152465374
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

`stripMargin` is useless here.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152464029
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

Thanks, I've updated the test case to test only managed tables and avoided 
creating a temp directory.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152448746
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

We can just test the managed table, to avoid creating a temp directory for the external table.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152446382
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

@cloud-fan, this bug affects both external and managed tables.
I've added a new test case for the managed table too. However, to avoid code duplication, should I include both tests inside the same test method? Please suggest.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152383898
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

Is this bug for external tables only? How about managed tables?


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152349156
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

I'll change it to 'CREATE EXTERNAL TABLE'.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152336476
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

do we have to provide a location for an empty table?


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152335633
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
--- End diff --

nit:
```
val schemaFile = new File(dir, "avroDecimal.avsc")
val writer = new PrintWriter(schemaFile)
writer.println(avroSchema)
writer.close()
...
```


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152286208
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -800,7 +800,7 @@ class VersionsSuite extends SparkFunSuite with Logging {
   }
 }
 
-test(s"$version: read avro file containing decimal") {
+test(s"$version: SPARK-17920: read avro file containing decimal") {
--- End diff --

@cloud-fan , Yes


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152259957
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -800,7 +800,7 @@ class VersionsSuite extends SparkFunSuite with Logging {
   }
 }
 
-test(s"$version: read avro file containing decimal") {
+test(s"$version: SPARK-17920: read avro file containing decimal") {
--- End diff --

do you mean SPARK-17920 is already fixed because this test passes?


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152198714
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
--- End diff --

I am fine with keeping it in `VersionsSuite`, since it is related to Hive.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152067290
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""insert overwrite table $destTableName select * from 
$srcTableName""".stripMargin)
--- End diff --

Thank you for your review comments; I'll address them all.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152065236
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""insert overwrite table $destTableName select * from 
$srcTableName""".stripMargin)
+  assert(versionSpark.table(destTableName).count() ===
+versionSpark.table(srcTableName).count())
+  versionSpark.sql(
+s"""insert into table $destTableName select * from 
$srcTableName""".stripMargin)
+  assert(versionSpark.table(destTableName).count()/2 ===
+versionSpark.table(srcTableName).count())
--- End diff --

If possible, can we check values instead of count?
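
A rough sketch of one way to do that, reusing the names from the quoted test (the exact assertions here are an assumption, not the merged code):

```scala
// Compare row values rather than counts (sketch; assumes the small,
// single-file source table is read back in a stable order).
val expected = versionSpark.table(srcTableName).collect()

versionSpark.sql(s"INSERT OVERWRITE TABLE $destTableName SELECT * FROM $srcTableName")
assert(versionSpark.table(destTableName).collect() === expected)

versionSpark.sql(s"INSERT INTO TABLE $destTableName SELECT * FROM $srcTableName")
// After the append, the destination should contain the source rows twice.
assert(versionSpark.table(destTableName).collect() === (expected ++ expected))
```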


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152064637
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""insert overwrite table $destTableName select * from 
$srcTableName""".stripMargin)
--- End diff --

nit.
```
INSERT OVERWRITE TABLE $destTableName SELECT * FROM $srcTableName
```


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152064670
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""insert overwrite table $destTableName select * from 
$srcTableName""".stripMargin)
+  assert(versionSpark.table(destTableName).count() ===
+versionSpark.table(srcTableName).count())
+  versionSpark.sql(
+s"""insert into table $destTableName select * from 
$srcTableName""".stripMargin)
--- End diff --

ditto.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152064200
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
--- End diff --

Could you add `SPARK-19878` to the test case name?


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152062291
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
--- End diff --

Is there a reason to have this in `VersionsSuite`?


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-19 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r151902033
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
 ---
@@ -89,6 +90,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
 val fileSinkConfSer = fileSinkConf
 new OutputWriterFactory {
   private val jobConf = new SerializableJobConf(new JobConf(conf))
+  private val broadcastHadoopConf = 
sparkSession.sparkContext.broadcast(
--- End diff --

Thanks for the comment; I'll change the code to use jobConf.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-19 Thread Huamei-17
Github user Huamei-17 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r151890487
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
 ---
@@ -89,6 +90,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
 val fileSinkConfSer = fileSinkConf
 new OutputWriterFactory {
   private val jobConf = new SerializableJobConf(new JobConf(conf))
+  private val broadcastHadoopConf = 
sparkSession.sparkContext.broadcast(
--- End diff --

Is it possible to use jobConf directly as the Hive SerDe initialization parameter?


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-18 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/19779

[SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table 
which uses Avro schema url 'avro.schema.url'

## What changes were proposed in this pull request?
Support writing to Hive tables that use an Avro schema URL ('avro.schema.url').
For example:

create external table avro_in (a string) stored as avro location '/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

create external table avro_out (a string) stored as avro location '/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

insert overwrite table avro_out select * from avro_in;  // fails with java.lang.NullPointerException

WARN AvroSerDe: Encountered exception determining schema. Returning signal schema to indicate problem
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)

## Changes proposed in this fix
Currently a null value is passed to the serializer, which causes the NPE during the insert operation; instead, pass the Hadoop configuration object.
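
A minimal, self-contained sketch of why the configuration matters (illustration under assumed table properties, not the PR's code): Hive's AvroSerDe resolves 'avro.schema.url' through the Hadoop FileSystem API, which needs a non-null Configuration.

```scala
import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.serde2.avro.AvroSerDe

// Table properties as in the example above; the schema path is assumed to exist.
val tableProps = new Properties()
tableProps.setProperty("avro.schema.url", "/avro-schema/avro.avsc")

val serde = new AvroSerDe()

// Initializing with a real Configuration lets the SerDe open the schema URL.
serde.initialize(new Configuration(), tableProps)

// Initializing with null is what produced the stack trace above:
// FileSystem.getDefaultUri(null) throws java.lang.NullPointerException.
// serde.initialize(null, tableProps)
```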
## How was this patch tested?
Added a new test case in VersionsSuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_Fix_SPARK-17920

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19779


commit 034b2466d073c008b71eae072ee98353df56cbf2
Author: vinodkc 
Date:   2017-11-18T07:52:59Z

pass hadoopConfiguration to Serializer




---
