[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23215


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23215#discussion_r239460682
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -272,9 +279,14 @@ object PartitioningUtils {
   val literal = if (userSpecifiedDataTypes.contains(columnName)) {
 // SPARK-26188: if user provides corresponding column schema, get 
the column value without
 //  inference, and then cast it as user specified data 
type.
-val columnValue = inferPartitionColumnValue(rawColumnValue, false, 
timeZone)
-val castedValue =
-  Cast(columnValue, userSpecifiedDataTypes(columnName), 
Option(timeZone.getID)).eval()
+val dataType = userSpecifiedDataTypes(columnName)
+val columnValueLiteral = inferPartitionColumnValue(rawColumnValue, 
false, timeZone)
+val columnValue = columnValueLiteral.eval()
+val castedValue = Cast(columnValueLiteral, dataType, 
Option(timeZone.getID)).eval()
+if (validatePartitionColumns && columnValue != null && castedValue 
== null) {
+  throw new RuntimeException(s"Failed to cast value `$columnValue` 
to `$dataType` " +
+s"for partition column `$columnName`")
+}
 Literal.create(castedValue, userSpecifiedDataTypes(columnName))
--- End diff --

`Literal.create(castedValue, dataType)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23215#discussion_r239453312
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1396,6 +1396,16 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val VALIDATE_PARTITION_COLUMNS =
+buildConf("spark.sql.sources.validatePartitionColumns")
+  .internal()
+  .doc("When this option is set to true, partition column values will 
be validated with " +
+"provided schema. If the validation fails, a runtime exception is 
thrown." +
+"When this option is set to false, the partition column value will 
be converted to null " +
+"if it can not be converted to corresponding provided schema.")
--- End diff --

`... can not be casted to ...`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23215#discussion_r239453026
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala
 ---
@@ -95,6 +95,31 @@ class FileIndexSuite extends SharedSQLContext {
 }
   }
 
+  test("SPARK-26263: Throw exception when partition value can't be 
converted to specific type") {
--- End diff --

`can't be casted to user-specified type`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/23215#discussion_r239423651
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -272,9 +279,13 @@ object PartitioningUtils {
   val literal = if (userSpecifiedDataTypes.contains(columnName)) {
 // SPARK-26188: if user provides corresponding column schema, get 
the column value without
 //  inference, and then cast it as user specified data 
type.
-val columnValue = inferPartitionColumnValue(rawColumnValue, false, 
timeZone)
-val castedValue =
-  Cast(columnValue, userSpecifiedDataTypes(columnName), 
Option(timeZone.getID)).eval()
+val dataType = userSpecifiedDataTypes(columnName)
+val columnValueLiteral = inferPartitionColumnValue(rawColumnValue, 
false, timeZone)
+val columnValue = columnValueLiteral.eval()
+val castedValue = Cast(columnValueLiteral, dataType, 
Option(timeZone.getID)).eval()
+if (validatePartitionColumns && columnValue != null && castedValue 
== null) {
+  throw new RuntimeException(s"Failed to cast partition value 
`$columnValue` to $dataType")
--- End diff --

Good suggestion. Thanks. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23215#discussion_r239388954
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -272,9 +279,13 @@ object PartitioningUtils {
   val literal = if (userSpecifiedDataTypes.contains(columnName)) {
 // SPARK-26188: if user provides corresponding column schema, get 
the column value without
 //  inference, and then cast it as user specified data 
type.
-val columnValue = inferPartitionColumnValue(rawColumnValue, false, 
timeZone)
-val castedValue =
-  Cast(columnValue, userSpecifiedDataTypes(columnName), 
Option(timeZone.getID)).eval()
+val dataType = userSpecifiedDataTypes(columnName)
+val columnValueLiteral = inferPartitionColumnValue(rawColumnValue, 
false, timeZone)
+val columnValue = columnValueLiteral.eval()
+val castedValue = Cast(columnValueLiteral, dataType, 
Option(timeZone.getID)).eval()
+if (validatePartitionColumns && columnValue != null && castedValue 
== null) {
+  throw new RuntimeException(s"Failed to cast partition value 
`$columnValue` to $dataType")
--- End diff --

Can we also show `columnName` in this exception message? So it is easier to 
know which partition column has such error.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org