[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459189586



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
##
@@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors {
   case DateType => dateTypeInfo
   case TimestampType => timestampTypeInfo
   case NullType => voidTypeInfo
+  case dt =>
+    throw new AnalysisException("HiveInspectors does not support convert " +

Review comment:
   > nit: `s"${dt.catalogString} cannot be converted to Hive TypeInfo"`
   
   Same reason as in
https://github.com/apache/spark/pull/29085#discussion_r459189448
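For readers following this nit: Spark's `DataType.catalogString` renders a type the way SQL DDL spells it, while `getClass.getSimpleName` only yields the Scala class name and drops parameters such as precision and scale. A minimal sketch with local stand-in types (not the real Spark classes) illustrates why reviewers prefer `catalogString` in user-facing errors:

```scala
// Local stand-ins modeling the relevant part of Spark's DataType hierarchy.
sealed trait DataType { def catalogString: String }
case object IntegerType extends DataType { def catalogString: String = "int" }
case class DecimalType(precision: Int, scale: Int) extends DataType {
  def catalogString: String = s"decimal($precision,$scale)"
}

object CatalogStringDemo {
  // The error-message style suggested in the review comment.
  def errorMessage(dt: DataType): String =
    s"${dt.catalogString} cannot be converted to Hive TypeInfo"

  def main(args: Array[String]): Unit = {
    val dt: DataType = DecimalType(10, 2)
    // Class name loses precision/scale; catalogString matches SQL DDL.
    println(dt.getClass.getSimpleName) // DecimalType
    println(errorMessage(dt))          // decimal(10,2) cannot be converted to Hive TypeInfo
  }
}
```

With the real Spark types the effect is the same: `DecimalType(10, 2).catalogString` renders as `decimal(10,2)`, which is far more useful in an `AnalysisException` than the bare class name.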
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [spark] maropu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


maropu commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662794540


   Looks almost okay now, so could you split this PR into pieces? I think it's somewhat big for reviews. For example:
1. A further refactoring PR for `HiveScriptTransformationExec` and `BaseScriptTransformationExec`, just like SPARK-32105
2. A PR to improve the test coverage of `HiveScriptTransformationExec`
3. Then, a PR to implement the Spark-native TRANSFORM
   
   WDYT?






[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


maropu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459188140



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
##
@@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors {
   case DateType => dateTypeInfo
   case TimestampType => timestampTypeInfo
   case NullType => voidTypeInfo
+  case dt =>
+    throw new AnalysisException("HiveInspectors does not support convert " +

Review comment:
   nit: `s"${dt.catalogString} cannot be converted to Hive TypeInfo"`








[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


maropu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459188225



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +
+          s"${dt.getClass.getSimpleName} as output data type")

Review comment:
   `dt.getClass.getSimpleName` -> `dt.catalogString`








[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p

2020-07-22 Thread GitBox


Ngone51 commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459187090



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,79 @@ class Analyzer(
      if (conflictPlans.isEmpty) {
        right
      } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflicts with the left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite the child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't filter the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return the rewrite attribute [kind#220 -> kind#228]
+          // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node
+          // `SubqueryAlias c` will return the rewrite attribute [kind#220 -> kind#229] (which
+          // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top
+          // Join will have duplicated rewrite attributes.
+          //
+          // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning
+          // the rewrite attribute [kind#220 -> kind#229] to its parent node `Project [id#227]` as
+          // it doesn't really need it.
+          //
+          // Join Inner, (id#218 = id#227)
+          // :- SubqueryAlias b
+          // :  +- Project [id#218, foo AS kind#228]
+          // :     +- SubqueryAlias a
+          // :        +- Project [1 AS id#218]
+          // :           +- OneRowRelation
+          // +- SubqueryAlias c
+          //    +- Project [id#227]
+          //       +- Join Inner, (kind#229 = kind#223)
+          //          :- SubqueryAlias l
+          //          :  +- SubqueryAlias b
+          //          :     +- Project [id#227, foo AS kind#229]
+          //          :        +- SubqueryAlias a
+          //          :           +- Project [1 AS id#227]
+          //          :              +- OneRowRelation
+          //          +- SubqueryAlias r
+          //             +- SubqueryAlias b
+          //                +- Project [id#224, foo AS kind#223]
+          //                   +- SubqueryAlias a
+          //                      +- Project [1 AS id#224]
+          //                         +- OneRowRelation
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            (plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr)

Review comment:
   Yes, they won't. I mean, you probably have a typo above:
   
   > Seems we need plan.outputSet too.
   
   IIUC, you actually mean `plan.references` according to the context.
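The filtering this thread is debating can be reduced to a small, Spark-free sketch. Here `Attr` and `parentUsed` are local stand-ins for Spark's `Attribute` and for `plan.references ++ plan.outputSet ++ plan.producedAttributes`; only the filtering idea itself comes from the quoted diff:

```scala
object AttrMappingFilter {
  final case class Attr(name: String, id: Int)

  // Keep only the child rewrite pairs whose old attribute the parent plan
  // actually uses; otherwise sibling subtrees can push up duplicate
  // mappings such as kind#220 -> kind#228 and kind#220 -> kind#229.
  def filterMapping(
      childAttrMapping: Seq[(Attr, Attr)],
      parentUsed: Set[Attr]): Seq[(Attr, Attr)] =
    childAttrMapping.filter { case (oldAttr, _) => parentUsed.contains(oldAttr) }

  def main(args: Array[String]): Unit = {
    val mapping = Seq(Attr("kind", 220) -> Attr("kind", 229))
    // A parent that only needs id#227 drops the kind mapping entirely.
    println(filterMapping(mapping, Set(Attr("id", 227))))   // List()
    println(filterMapping(mapping, Set(Attr("kind", 220)))) // the pair survives
  }
}
```

The design point under review is exactly which attribute sets belong in `parentUsed`: too narrow and a needed rewrite is lost, too wide and duplicate mappings propagate to the top of the plan.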








[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


maropu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459186587



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +

Review comment:
   nit: `TRANSFORM` -> `s"$nodeName...`
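The quoted diff builds one `String => Any` writer per output column, each parsing the script's text output and routing failures through a common wrapper. As a rough, self-contained sketch of that pattern: the `wrapConvert` helper below is a hypothetical stand-in for the PR's `wrapperConvertException`, and its failure handling (null on parse error) is an assumption, not necessarily the PR's behavior:

```scala
import scala.util.control.NonFatal

object FieldWriterSketch {
  // Run the parse function, then the Catalyst-style converter; here any
  // non-fatal parse failure maps to null (the real helper may differ).
  def wrapConvert(parse: String => Any, converter: Any => Any): String => Any =
    (data: String) =>
      try converter(parse(data))
      catch { case NonFatal(_) => null }

  // One writer per output column, chosen by the column's data type
  // (identity stands in for CatalystTypeConverters' converter).
  val intWriter: String => Any = wrapConvert(_.toInt, identity)
  val boolWriter: String => Any = wrapConvert(_.toBoolean, identity)

  def main(args: Array[String]): Unit = {
    println(intWriter("42"))    // 42
    println(boolWriter("true")) // true
    println(intWriter("oops"))  // null
  }
}
```

The per-type `match` in the diff exists because each column arrives from the external script as a delimited string and must be parsed back into its declared Catalyst type before the row can be emitted.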








[GitHub] [spark] viirya commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict pl

2020-07-22 Thread GitBox


viirya commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459186345



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,79 @@ class Analyzer(
      if (conflictPlans.isEmpty) {
        right
      } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflicts with the left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite the child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't filter the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return the rewrite attribute [kind#220 -> kind#228]
+          // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node
+          // `SubqueryAlias c` will return the rewrite attribute [kind#220 -> kind#229] (which
+          // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top
+          // Join will have duplicated rewrite attributes.
+          //
+          // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning
+          // the rewrite attribute [kind#220 -> kind#229] to its parent node `Project [id#227]` as
+          // it doesn't really need it.
+          //
+          // Join Inner, (id#218 = id#227)
+          // :- SubqueryAlias b
+          // :  +- Project [id#218, foo AS kind#228]
+          // :     +- SubqueryAlias a
+          // :        +- Project [1 AS id#218]
+          // :           +- OneRowRelation
+          // +- SubqueryAlias c
+          //    +- Project [id#227]
+          //       +- Join Inner, (kind#229 = kind#223)
+          //          :- SubqueryAlias l
+          //          :  +- SubqueryAlias b
+          //          :     +- Project [id#227, foo AS kind#229]
+          //          :        +- SubqueryAlias a
+          //          :           +- Project [1 AS id#227]
+          //          :              +- OneRowRelation
+          //          +- SubqueryAlias r
+          //             +- SubqueryAlias b
+          //                +- Project [id#224, foo AS kind#223]
+          //                   +- SubqueryAlias a
+          //                      +- Project [1 AS id#224]
+          //                         +- OneRowRelation
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            (plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr)

Review comment:
   `plan.references` won't be in the output, right? I guess they won't conflict with others?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29166:
URL: https://github.com/apache/spark/pull/29166#issuecomment-662791663










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662791661










[GitHub] [spark] AmplabJenkins commented on pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29166:
URL: https://github.com/apache/spark/pull/29166#issuecomment-662791663










[GitHub] [spark] AmplabJenkins commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662791661










[GitHub] [spark] SparkQA commented on pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan

2020-07-22 Thread GitBox


SparkQA commented on pull request #29166:
URL: https://github.com/apache/spark/pull/29166#issuecomment-662791379


   **[Test build #126375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126375/testReport)** for PR 29166 at commit [`f73f186`](https://github.com/apache/spark/commit/f73f1861b4e76182cedaaf94ddd5061c1cf48dc6).






[GitHub] [spark] SparkQA commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


SparkQA commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662791349


   **[Test build #126374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126374/testReport)** for PR 29192 at commit [`1000377`](https://github.com/apache/spark/commit/100037d580766ec2db4ffc230dae045838e0).






[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p

2020-07-22 Thread GitBox


Ngone51 commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459185548



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,79 @@ class Analyzer(
      if (conflictPlans.isEmpty) {
        right
      } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflicts with the left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite the child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't filter the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return the rewrite attribute [kind#220 -> kind#228]
+          // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node
+          // `SubqueryAlias c` will return the rewrite attribute [kind#220 -> kind#229] (which
+          // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top
+          // Join will have duplicated rewrite attributes.
+          //
+          // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning
+          // the rewrite attribute [kind#220 -> kind#229] to its parent node `Project [id#227]` as
+          // it doesn't really need it.
+          //
+          // Join Inner, (id#218 = id#227)
+          // :- SubqueryAlias b
+          // :  +- Project [id#218, foo AS kind#228]
+          // :     +- SubqueryAlias a
+          // :        +- Project [1 AS id#218]
+          // :           +- OneRowRelation
+          // +- SubqueryAlias c
+          //    +- Project [id#227]
+          //       +- Join Inner, (kind#229 = kind#223)
+          //          :- SubqueryAlias l
+          //          :  +- SubqueryAlias b
+          //          :     +- Project [id#227, foo AS kind#229]
+          //          :        +- SubqueryAlias a
+          //          :           +- Project [1 AS id#227]
+          //          :              +- OneRowRelation
+          //          +- SubqueryAlias r
+          //             +- SubqueryAlias b
+          //                +- Project [id#224, foo AS kind#223]
+          //                   +- SubqueryAlias a
+          //                      +- Project [1 AS id#224]
+          //                         +- OneRowRelation
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            (plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr)

Review comment:
   > Seems we need plan.outputSet too. I think it is similar to the case of aliases in projection you added above.
   
   Yeah, "aliases in projection" is another case. And I assume you were referring to `plan.references` too. @viirya 








[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p

2020-07-22 Thread GitBox


Ngone51 commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459185098



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,79 @@ class Analyzer(
   if (conflictPlans.isEmpty) {
 right
   } else {
-val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-  case (oldRelation, newRelation) => 
oldRelation.output.zip(newRelation.output)})
-val conflictPlanMap = conflictPlans.toMap
-// transformDown so that we can replace all the old Relations in one 
turn due to
-// the reason that `conflictPlans` are also collected in pre-order.
-right transformDown {
-  case r => conflictPlanMap.getOrElse(r, r)
-} transformUp {
-  case other => other transformExpressions {
+rewritePlan(right, conflictPlans.toMap)._1
+  }
+}
+
+private def rewritePlan(plan: LogicalPlan, conflictPlanMap: 
Map[LogicalPlan, LogicalPlan])
+  : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+  if (conflictPlanMap.contains(plan)) {
+// If the plan is the one that conflict the with left one, we'd
+// just replace it with the new plan and collect the rewrite
+// attributes for the parent node.
+val newRelation = conflictPlanMap(plan)
+newRelation -> plan.output.zip(newRelation.output)
+  } else {
+val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+val newPlan = plan.mapChildren { child =>
+  // If not, we'd rewrite child plan recursively until we find the
+  // conflict node or reach the leaf node.
+  val (newChild, childAttrMapping) = rewritePlan(child, 
conflictPlanMap)
+  // Only return rewrite attributes which could be used by the parent 
node.
+  // Otherwise, it could introduce duplicate rewrite attributes. For 
example,
+  // for the following plan, if we don't do filter for the 
`childAttrMapping`,
+  // the node `SubqueryAlias b` will return rewrite attribute of 
[kind#220 -> kind#228]

Review comment:
   I've changed it to the short comment you mentioned above.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base

2020-07-22 Thread GitBox


HyukjinKwon commented on a change in pull request #29188:
URL: https://github.com/apache/spark/pull/29188#discussion_r459184480



##
File path: dev/requirements.txt
##
@@ -3,3 +3,4 @@ jira==1.0.3
 PyGithub==1.26.0
 Unidecode==0.04.19
 sphinx
+pydata_sphinx_theme

Review comment:
   I thought about this for a while, actually, especially given the discussion we had before.
   `pydata_sphinx_theme` is relatively new and has [frequent releases](https://github.com/pandas-dev/pydata-sphinx-theme/tags). I would like to leave the version unpinned for now to encourage developers to keep it up to date; maybe we could see how it goes, if that sounds fine to you as well.








[GitHub] [spark] maropu edited a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


maropu edited a comment on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662779093


   `bpchar` is internally used in PostgreSQL [1][2], but users can also use it in DDL:
- [1] https://www.postgresql.org/docs/current/datatype-character.html
- [2] https://www.postgresql.org/docs/current/typeconv-query.html
   ```
   postgres=# create table t (v bpchar);
   CREATE TABLE
   ```
   So, I think it's okay to add it for better interoperability.
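For context on what such a fix usually touches: Spark's JDBC dialects map the driver-reported type name to a Catalyst type via `JdbcDialect.getCatalystType`. The sketch below mimics that shape with local stand-in types; it illustrates the idea of treating `bpchar` like the other character types, and is not the PR's actual change:

```scala
object DialectSketch {
  sealed trait CatalystType
  case object StringType extends CatalystType

  // Map a PostgreSQL type name to a Catalyst type; `bpchar` joins the
  // other character types. Stand-in types, not Spark's real API.
  def getCatalystType(typeName: String): Option[CatalystType] = typeName match {
    case "bpchar" | "char" | "varchar" | "text" => Some(StringType)
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(getCatalystType("bpchar"))   // Some(StringType)
    println(getCatalystType("geometry")) // None
  }
}
```

Returning `None` for an unknown name mirrors how a dialect defers to the generic JDBC type mapping when it has no specific rule.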






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662557759


   Can one of the admins verify this patch?






[GitHub] [spark] holdenk commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-22 Thread GitBox


holdenk commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-662790280


   Cool, I’ll try to merge this after dinner. Thanks, everyone, for working on 
and reviewing this!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662790096










[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


maropu commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662790228


   ok to test






[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662790096










[GitHub] [spark] HyukjinKwon commented on a change in pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base

2020-07-22 Thread GitBox


HyukjinKwon commented on a change in pull request #29188:
URL: https://github.com/apache/spark/pull/29188#discussion_r459184480



##
File path: dev/requirements.txt
##
@@ -3,3 +3,4 @@ jira==1.0.3
 PyGithub==1.26.0
 Unidecode==0.04.19
 sphinx
+pydata_sphinx_theme

Review comment:
   I thought about this for a while actually especially given the 
discussion we had before.
   `pydata_sphinx_theme` is actually relatively new and it has [frequent 
releases](https://github.com/pandas-dev/pydata-sphinx-theme/tags). I would like 
to not pin the version for now and see how it goes, if that sounds fine to you 
as well.








[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


SparkQA commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662789794


   **[Test build #126373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126373/testReport)**
 for PR 29085 at commit 
[`7f3cff8`](https://github.com/apache/spark/commit/7f3cff81d6238d8de6ea2c57dc94eed5abe1ff75).






[GitHub] [spark] agrawaldevesh commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-22 Thread GitBox


agrawaldevesh commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-662789786


   @holdenk, this is ready to go in. Thanks!






[GitHub] [spark] HyukjinKwon commented on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base

2020-07-22 Thread GitBox


HyukjinKwon commented on pull request #29188:
URL: https://github.com/apache/spark/pull/29188#issuecomment-662789431


   > I wonder if there is some non-hacky way to organize functions into logical 
groups, similarly to what ScalaDoc does.
   
   I tried hard, but it looked difficult to do. I will take one more look.






[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662788634










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662788634










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459182797



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
##
@@ -1031,4 +1031,96 @@ class PlanParserSuite extends AnalysisTest {
 assertEqual("select a, b from db.c;;;", table("db", "c").select('a, 'b))
 assertEqual("select a, b from db.c; ;;  ;", table("db", "c").select('a, 
'b))
   }
+
+  test("SPARK-32106: TRANSFORM without serde") {

Review comment:
   > Also, could you check `ROW FORMAT SERDE`, too?
   
   Add UT








[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


SparkQA commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662788394


   **[Test build #126372 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126372/testReport)**
 for PR 29085 at commit 
[`be80c27`](https://github.com/apache/spark/commit/be80c27557f614074ac50e2674180a778c0208ab).






[GitHub] [spark] AngersZhuuuu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662787840


   All done






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459182038



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala
##
@@ -54,87 +52,110 @@ case class HiveScriptTransformationExec(
 script: String,
 output: Seq[Attribute],
 child: SparkPlan,
-ioschema: HiveScriptIOSchema)
-  extends BaseScriptTransformationExec {
+ioschema: ScriptTransformationIOSchema)
+  extends BaseScriptTransformationExec with HiveInspectors {
 
-  override def processIterator(
-  inputIterator: Iterator[InternalRow],
-  hadoopConf: Configuration): Iterator[InternalRow] = {
-val cmd = List("/bin/bash", "-c", script)
-val builder = new ProcessBuilder(cmd.asJava)
-
-val proc = builder.start()
-val inputStream = proc.getInputStream
-val outputStream = proc.getOutputStream
-val errorStream = proc.getErrorStream
-
-// In order to avoid deadlocks, we need to consume the error output of the 
child process.
-// To avoid issues caused by large error output, we use a circular buffer 
to limit the amount
-// of error output that we retain. See SPARK-7862 for more discussion of 
the deadlock / hang
-// that motivates this.
-val stderrBuffer = new CircularBuffer(2048)
-new RedirectThread(
-  errorStream,
-  stderrBuffer,
-  "Thread-ScriptTransformation-STDERR-Consumer").start()
+  private def initInputSerDe(
+  input: Seq[Expression]): Option[(AbstractSerDe, StructObjectInspector)] 
= {
+ioschema.inputSerdeClass.map { serdeClass =>
+  val (columns, columnTypes) = parseAttrs(input)
+  val serde = initSerDe(serdeClass, columns, columnTypes, 
ioschema.inputSerdeProps)
+  val fieldObjectInspectors = columnTypes.map(toInspector)
+  val objectInspector = ObjectInspectorFactory
+.getStandardStructObjectInspector(columns.asJava, 
fieldObjectInspectors.asJava)
+  (serde, objectInspector)
+}
+  }
 
-val outputProjection = new InterpretedProjection(input, child.output)
+  private def initOutputSerDe(
+  output: Seq[Attribute]): Option[(AbstractSerDe, StructObjectInspector)] 
= {
+ioschema.outputSerdeClass.map { serdeClass =>
+  val (columns, columnTypes) = parseAttrs(output)
+  val serde = initSerDe(serdeClass, columns, columnTypes, 
ioschema.outputSerdeProps)
+  val structObjectInspector = 
serde.getObjectInspector().asInstanceOf[StructObjectInspector]
+  (serde, structObjectInspector)
+}
+  }
 
-// This nullability is a performance optimization in order to avoid an 
Option.foreach() call
-// inside of a loop
-@Nullable val (inputSerde, inputSoi) = 
ioschema.initInputSerDe(input).getOrElse((null, null))
+  private def parseAttrs(attrs: Seq[Expression]): (Seq[String], Seq[DataType]) 
= {
+val columns = attrs.zipWithIndex.map(e => s"${e._1.prettyName}_${e._2}")
+val columnTypes = attrs.map(_.dataType)
+(columns, columnTypes)
+  }
 
-// This new thread will consume the ScriptTransformation's input rows and 
write them to the
-// external process. That process's output will be read by this current 
thread.
-val writerThread = new HiveScriptTransformationWriterThread(
-  inputIterator.map(outputProjection),
-  input.map(_.dataType),
-  inputSerde,
-  inputSoi,
-  ioschema,
-  outputStream,
-  proc,
-  stderrBuffer,
-  TaskContext.get(),
-  hadoopConf
-)
+  private def initSerDe(

Review comment:
   > Sorry for the confusion, but, on second thought, its better to pull 
out hive-serde related functions from `HiveScriptTransformationExec` then 
create a companion object having them for readability 
[maropu@972775b](https://github.com/maropu/spark/commit/972775b821406d81d3c1ba1c718de3037a0ca068).
 WDTY?
   
   Agreed, make `ScriptTransformationExec` only handle data processing.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29194:
URL: https://github.com/apache/spark/pull/29194#issuecomment-662787285










[GitHub] [spark] AmplabJenkins commented on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29194:
URL: https://github.com/apache/spark/pull/29194#issuecomment-662787285










[GitHub] [spark] SparkQA removed a comment on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29194:
URL: https://github.com/apache/spark/pull/29194#issuecomment-662702681


   **[Test build #126361 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126361/testReport)**
 for PR 29194 at commit 
[`f9b8691`](https://github.com/apache/spark/commit/f9b8691aee331f214392fdcf108e6ad2cfa6799e).






[GitHub] [spark] SparkQA commented on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.

2020-07-22 Thread GitBox


SparkQA commented on pull request #29194:
URL: https://github.com/apache/spark/pull/29194#issuecomment-662786999


   **[Test build #126361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126361/testReport)**
 for PR 29194 at commit 
[`f9b8691`](https://github.com/apache/spark/commit/f9b8691aee331f214392fdcf108e6ad2cfa6799e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662785780










[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662785780










[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p

2020-07-22 Thread GitBox


Ngone51 commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459179732



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,79 @@ class Analyzer(
   if (conflictPlans.isEmpty) {
 right
   } else {
-val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-  case (oldRelation, newRelation) => 
oldRelation.output.zip(newRelation.output)})
-val conflictPlanMap = conflictPlans.toMap
-// transformDown so that we can replace all the old Relations in one 
turn due to
-// the reason that `conflictPlans` are also collected in pre-order.
-right transformDown {
-  case r => conflictPlanMap.getOrElse(r, r)
-} transformUp {
-  case other => other transformExpressions {
+rewritePlan(right, conflictPlans.toMap)._1
+  }
+}
+
+private def rewritePlan(plan: LogicalPlan, conflictPlanMap: 
Map[LogicalPlan, LogicalPlan])
+  : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+  if (conflictPlanMap.contains(plan)) {
+// If the plan is the one that conflicts with the left one, we'd
+// just replace it with the new plan and collect the rewrite
+// attributes for the parent node.
+val newRelation = conflictPlanMap(plan)
+newRelation -> plan.output.zip(newRelation.output)
+  } else {
+val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+val newPlan = plan.mapChildren { child =>
+  // If not, we'd rewrite child plan recursively until we find the
+  // conflict node or reach the leaf node.
+  val (newChild, childAttrMapping) = rewritePlan(child, 
conflictPlanMap)
+  // Only return rewrite attributes which could be used by the parent 
node.
+  // Otherwise, it could introduce duplicate rewrite attributes. For 
example,
+  // for the following plan, if we don't do filter for the 
`childAttrMapping`,
+  // the node `SubqueryAlias b` will return rewrite attribute of 
[kind#220 -> kind#228]
+  // (which is from the conflict plan `Project [id#218, foo AS 
kind#228]`), and the node
+  // `SubqueryAlias c` will return rewrite attribute of [kind#220 -> 
kind#229] (which
+  // is from the conflict plan `Project [id#227, foo AS kind#229]`). 
As a result, the top
+  // Join will have duplicated rewrite attribute.
+  //
+  // The problem is, the plan `Join Inner, (kind#229 = kind#223)` 
shouldn't keep returning
+  // rewrite attribute of [kind#220 -> kind#229] to its parent node 
`Project [id#227]` as
+  // it doesn't really need it.
+  //
+  // Join Inner, (id#218 = id#227)
+  // :- SubqueryAlias b
+  // :  +- Project [id#218, foo AS kind#228]
+  // : +- SubqueryAlias a
+  // :+- Project [1 AS id#218]
+  // :   +- OneRowRelation
+  // +- SubqueryAlias c
+  //+- Project [id#227]
+  //   +- Join Inner, (kind#229 = kind#223)
+  //  :- SubqueryAlias l
+  //  :  +- SubqueryAlias b
+  //  : +- Project [id#227, foo AS kind#229]
+  //  :+- SubqueryAlias a
+  //  :   +- Project [1 AS id#227]
+  //  :  +- OneRowRelation
+  //  +- SubqueryAlias r
+  // +- SubqueryAlias b
+  //+- Project [id#224, foo AS kind#223]
+  //   +- SubqueryAlias a
+  //  +- Project [1 AS id#224]
+  // +- OneRowRelation
+  attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+(plan.references ++ plan.outputSet ++ 
plan.producedAttributes).contains(oldAttr)

Review comment:
   Is `plan.references` also necessary? For example, a Join's condition 
references attributes that its output doesn't contain.
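
   The filtering idea in the diff above — a parent node only keeps rewrite 
pairs whose old attribute it actually uses — can be illustrated with a toy 
model. This is a simplified sketch with hypothetical names, not Spark's real 
`AttributeMap`/`QueryPlan` machinery:

   ```python
   # Toy illustration of the attrMapping filter in dedupRight: a parent node
   # keeps only rewrite pairs (old -> new) for attributes it references,
   # outputs, or produces; the rest would just leak duplicates upward.
   def prune_mapping(child_mapping, parent_used_attrs):
       """Drop rewrites whose old attribute the parent never uses."""
       return [(old, new) for (old, new) in child_mapping
               if old in parent_used_attrs]

   # E.g. a parent `Project [id#218]` uses id#218 but not kind#220, so the
   # kind#220 -> kind#228 rewrite from the child is not propagated.
   child_mapping = [("kind#220", "kind#228"), ("id#218", "id#230")]
   parent_used = {"id#218"}
   pruned = prune_mapping(child_mapping, parent_used)
   ```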








[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


SparkQA commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662785553


   **[Test build #126371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126371/testReport)**
 for PR 29196 at commit 
[`9c12088`](https://github.com/apache/spark/commit/9c120880b35f1ddeaec6161481380c0b3e601eea).






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459178460



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##
@@ -532,6 +532,21 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 }
   }
 
+  object SparkScripts extends Strategy {
+def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+  case logical.ScriptTransformation(input, script, output, child, ioschema)
+if ioschema.inputSerdeClass.isEmpty && 
ioschema.outputSerdeClass.isEmpty =>

Review comment:
   > We need to check this here? Seems like it has been checked in 
https://github.com/apache/spark/pull/29085/files#diff-9847f5cef7cf7fbc5830fbc6b779ee10R783-R784
 ?
   
   Yeah, it's not needed now.








[GitHub] [spark] maropu edited a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


maropu edited a comment on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662784282


   btw, thanks for your first contribution, @kujon!






[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


maropu commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662784282


   btw, thanks for your contribution, @kujon!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29159:
URL: https://github.com/apache/spark/pull/29159#issuecomment-662782378










[GitHub] [spark] AmplabJenkins commented on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29159:
URL: https://github.com/apache/spark/pull/29159#issuecomment-662782378










[GitHub] [spark] SparkQA removed a comment on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29159:
URL: https://github.com/apache/spark/pull/29159#issuecomment-662767092


   **[Test build #126367 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126367/testReport)**
 for PR 29159 at commit 
[`52dd9b4`](https://github.com/apache/spark/commit/52dd9b4cf03381134a181df817b98daeb30e4616).






[GitHub] [spark] SparkQA commented on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity

2020-07-22 Thread GitBox


SparkQA commented on pull request #29159:
URL: https://github.com/apache/spark/pull/29159#issuecomment-662782139


   **[Test build #126367 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126367/testReport)**
 for PR 29159 at commit 
[`52dd9b4`](https://github.com/apache/spark/commit/52dd9b4cf03381134a181df817b98daeb30e4616).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662781417


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126370/
   Test FAILed.






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459175410



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
##
@@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors {
   case DateType => dateTypeInfo
   case TimestampType => timestampTypeInfo
   case NullType => voidTypeInfo
+  case dt =>
+throw new AnalysisException("TRANSFORM with hive serde does not support " +

Review comment:
   Done








[GitHub] [spark] AmplabJenkins commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662781414










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662781211


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126369/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lo

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662780801


   **[Test build #126370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126370/testReport)** for PR 29193 at commit [`e54f221`](https://github.com/apache/spark/commit/e54f22186a1f82ac9704f6644e057dadfc285b86).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662781414


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-22 Thread GitBox


SparkQA commented on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662781403


   **[Test build #126370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126370/testReport)** for PR 29193 at commit [`e54f221`](https://github.com/apache/spark/commit/e54f22186a1f82ac9704f6644e057dadfc285b86).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662780783


   **[Test build #126369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126369/testReport)** for PR 29196 at commit [`5b20b65`](https://github.com/apache/spark/commit/5b20b650bf974f0d47822930d0c386038e495ccc).






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459175002



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkScriptTransformationSuite.scala
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.{SparkException, TestUtils}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
+import org.apache.spark.sql.catalyst.parser.ParseException
+import org.apache.spark.sql.test.SharedSparkSession
+
+class SparkScriptTransformationSuite extends BaseScriptTransformationSuite with SharedSparkSession {
+  import testImplicits._
+
+  override def isHive23OrSpark: Boolean = true
+
+  override def createScriptTransformationExec(
+  input: Seq[Expression],
+  script: String,
+  output: Seq[Attribute],
+  child: SparkPlan,
+  ioschema: ScriptTransformationIOSchema): BaseScriptTransformationExec = {
+SparkScriptTransformationExec(
+  input = input,
+  script = script,
+  output = output,
+  child = child,
+  ioschema = ioschema
+)
+  }
+
+  test("SPARK-32106: TRANSFORM with serde without hive should throw exception") {
+assume(TestUtils.testCommandAvailable("/bin/bash"))
+withTempView("v") {
+  val df = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+  df.createTempView("v")
+
+  val e = intercept[ParseException] {
+sql(
+  """
+|SELECT TRANSFORM (a)
+|ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
+|USING 'cat' AS (a)
+|ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
+|FROM v
+  """.stripMargin)
+  }.getMessage
+  assert(e.contains("TRANSFORM with serde is only supported in hive mode"))
+}
+  }
+
+  test("TRANSFORM doesn't support ArrayType/MapType/StructType as output data type (no serde)") {

Review comment:
   Done








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662781049










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662781036










[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662781205










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459174831



##
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala
##
@@ -206,75 +169,147 @@ class HiveScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with
 
   val query = sql(
 s"""
-  |SELECT
-  |TRANSFORM(a, b, c, d, e)
-  |USING 'python $scriptFilePath' AS (a, b, c, d, e)
-  |FROM v
+   |SELECT TRANSFORM(a, b, c, d, e)
+   |USING 'python ${scriptFilePath}'
+   |FROM v
 """.stripMargin)
 
-  // In Hive 1.2, the string representation of a decimal omits trailing zeroes.
-  // But in Hive 2.3, it is always padded to 18 digits with trailing zeroes if necessary.
-  val decimalToString: Column => Column = if (HiveUtils.isHive23) {
-c => c.cast("string")
-  } else {
-c => c.cast("decimal(1, 0)").cast("string")
-  }
-  checkAnswer(query, identity, df.select(
-'a.cast("string"),
-'b.cast("string"),
-'c.cast("string"),
-decimalToString('d),
-'e.cast("string")).collect())
+  // In hive default serde mode, if we don't define output schema, it will choose first
+  // two column as output schema (key: String, value: String)
+  checkAnswer(
+query,
+identity,
+df.select(
+  'a.cast("string").as("key"),
+  'b.cast("string").as("value")).collect())
 }
   }
 
-  test("SPARK-30973: TRANSFORM should wait for the termination of the script (no serde)") {
+  testBasicInputDataTypesWith(hiveIOSchema, "hive serde")
+
+  test("SPARK-32106: TRANSFORM supports complex data types type (hive serde)") {
 assume(TestUtils.testCommandAvailable("/bin/bash"))
+withTempView("v") {
+  val df = Seq(
+(1, "1", Array(0, 1, 2), Map("a" -> 1)),
+(2, "2", Array(3, 4, 5), Map("b" -> 2)))
+.toDF("a", "b", "c", "d")
+  .select('a, 'b, 'c, 'd, struct('a, 'b).as("e"))
+  df.createTempView("v")
 
-val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
-val e = intercept[SparkException] {
-  val plan =
-new HiveScriptTransformationExec(
-  input = Seq(rowsDf.col("a").expr),
-  script = "some_non_existent_command",
-  output = Seq(AttributeReference("a", StringType)()),
-  child = rowsDf.queryExecution.sparkPlan,
-  ioschema = noSerdeIOSchema)
-  SparkPlanTest.executePlan(plan, hiveContext)
+  // Hive serde support ArrayType/MapType/StructType as input and output data type
+  checkAnswer(
+df,
+(child: SparkPlan) => createScriptTransformationExec(
+  input = Seq(
+df.col("c").expr,
+df.col("d").expr,
+df.col("e").expr),
+  script = "cat",
+  output = Seq(
+AttributeReference("c", ArrayType(IntegerType))(),
+AttributeReference("d", MapType(StringType, IntegerType))(),
+AttributeReference("e", StructType(
+  Seq(
+StructField("col1", IntegerType, false),
+StructField("col2", StringType, true()),
+  child = child,
+  ioschema = hiveIOSchema
+),
+df.select('c, 'd, 'e).collect())
 }
-assert(e.getMessage.contains("Subprocess exited with status"))
-assert(uncaughtExceptionHandler.exception.isEmpty)
   }
 
-  test("SPARK-30973: TRANSFORM should wait for the termination of the script (with serde)") {
+  test("SPARK-32106: TRANSFORM supports complex data types end to end (hive serde) ") {

Review comment:
   Done








[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


SparkQA commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662781194


   **[Test build #126369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126369/testReport)** for PR 29196 at commit [`5b20b65`](https://github.com/apache/spark/commit/5b20b650bf974f0d47822930d0c386038e495ccc).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662781036










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459174654



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala
##
@@ -33,14 +32,13 @@ import org.apache.hadoop.hive.serde2.objectinspector._
 import org.apache.hadoop.io.Writable
 
 import org.apache.spark.TaskContext
-import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.hive.HiveInspectors
 import org.apache.spark.sql.hive.HiveShim._
-import org.apache.spark.sql.types.DataType
-import org.apache.spark.util.{CircularBuffer, RedirectThread, Utils}
+import org.apache.spark.sql.types.{DataType, StringType}

Review comment:
   Done








[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662781049










[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


SparkQA commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662780783


   **[Test build #126369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126369/testReport)** for PR 29196 at commit [`5b20b65`](https://github.com/apache/spark/commit/5b20b650bf974f0d47822930d0c386038e495ccc).






[GitHub] [spark] SparkQA commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-22 Thread GitBox


SparkQA commented on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662780801


   **[Test build #126370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126370/testReport)** for PR 29193 at commit [`e54f221`](https://github.com/apache/spark/commit/e54f22186a1f82ac9704f6644e057dadfc285b86).






[GitHub] [spark] srowen commented on a change in pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


srowen commented on a change in pull request #29196:
URL: https://github.com/apache/spark/pull/29196#discussion_r459174022



##
File path: core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala
##
@@ -309,14 +311,18 @@ class HistoryServerSuite extends SparkFunSuite with BeforeAndAfter with Matchers
 
 val urlsThroughKnox = responseThroughKnox \\ "@href" map (_.toString)
 val siteRelativeLinksThroughKnox = urlsThroughKnox filter (_.startsWith("/"))
-all (siteRelativeLinksThroughKnox) should startWith (knoxBaseUrl)
+for (link <- siteRelativeLinksThroughKnox) {
+  link should startWith (knoxBaseUrl)
+}
 
 val directRequest = mock[HttpServletRequest]
 val directResponse = page.render(directRequest)
 
 val directUrls = directResponse \\ "@href" map (_.toString)
 val directSiteRelativeLinks = directUrls filter (_.startsWith("/"))
-all (directSiteRelativeLinks) should not startWith (knoxBaseUrl)
+for (link <- directSiteRelativeLinks) {

Review comment:
   Had to change up the scalatest syntax in some `all(...) should ...` 
cases to get it to keep working in 3.2.0+ ; should be equivalent
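   For context, a minimal plain-Scala sketch of the property the rewritten loop asserts. The base URL and link values here are hypothetical, not taken from the PR; the point is only that the scalatest inspector form `all (links) should startWith (base)` and the explicit loop check the same thing:

```scala
// Sketch: every site-relative link rendered through Knox must start with
// the Knox base URL. The scalatest inspector `all (links) should startWith`
// and the per-element loop assert the same property over the collection.
object KnoxLinkCheckSketch {
  def main(args: Array[String]): Unit = {
    val knoxBaseUrl = "https://knox.example.com:8443/gateway/default" // hypothetical
    val links = Seq(s"$knoxBaseUrl/history", s"$knoxBaseUrl/api/v1/applications")
    // loop form, as used in the PR (with plain assert instead of Matchers)
    for (link <- links) assert(link.startsWith(knoxBaseUrl))
    println("all links start with the Knox base URL")
  }
}
```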

##
File path: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
##
@@ -727,7 +727,7 @@ class JavaSparkContext(val sc: SparkContext) extends Closeable {
* @note This does not necessarily mean the caching or computation was successful.
*/
   def getPersistentRDDs: JMap[java.lang.Integer, JavaRDD[_]] = {
-sc.getPersistentRDDs.mapValues(s => JavaRDD.fromRDD(s))
+sc.getPersistentRDDs.mapValues(s => JavaRDD.fromRDD(s)).toMap

Review comment:
   This was another test fix for 2.13 I picked up along the way
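   A minimal sketch (plain Scala, not Spark code) of why the explicit `.toMap` matters on Scala 2.13: there, `mapValues` on a map view yields a lazy `MapView` rather than a strict `Map`, so callers expecting a strict map need to materialize it. The names below are illustrative stand-ins:

```scala
// Sketch: in Scala 2.13, Map#view#mapValues returns a lazy MapView.
// Calling .toMap materializes it into a strict immutable Map, which is
// what the surrounding code (here a stand-in for sc.getPersistentRDDs
// being wrapped into JavaRDDs) expects.
object MapValuesSketch {
  def main(args: Array[String]): Unit = {
    val rdds = Map(1 -> "rdd-a", 2 -> "rdd-b")           // stand-in input
    val wrapped = rdds.view.mapValues(n => s"Java($n)")  // lazy view in 2.13
    val strict: Map[Int, String] = wrapped.toMap         // materialize, as the PR does
    assert(strict(1) == "Java(rdd-a)")
    println(strict.toSeq.sorted.mkString(","))
  }
}
```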

##
File path: pom.xml
##
@@ -892,7 +907,25 @@
   
 org.scalatest
 scalatest_${scala.binary.version}
-3.0.8
+3.2.0
+test
+  
+  
+org.scalatestplus
+
scalatestplus-scalacheck_${scala.binary.version}
+3.1.0.0-RC2

Review comment:
   These lib-specific integrations are apparently separate now, so are 
added to the test build.








[GitHub] [spark] wypoon commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-22 Thread GitBox


wypoon commented on pull request #29193:
URL: https://github.com/apache/spark/pull/29193#issuecomment-662780377


   retest this please






[GitHub] [spark] srowen opened a new pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


srowen opened a new pull request #29196:
URL: https://github.com/apache/spark/pull/29196


   ### What changes were proposed in this pull request?
   
   Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the 
new location of scalatest classes.
   
   ### Why are the changes needed?
   
   3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, only affects tests.
   
   ### How was this patch tested?
   
   Existing tests.






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459173720



##
File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql
##
@@ -0,0 +1,146 @@
+-- Test data.
+CREATE OR REPLACE TEMPORARY VIEW t AS SELECT * FROM VALUES
+('1', true, unhex('537061726B2053514C'), tinyint(1), 1, smallint(100), bigint(1), float(1.0), 1.0, Decimal(1.0), timestamp('1997-01-02'), date('2000-04-01')),
+('2', false, unhex('537061726B2053514C'), tinyint(2), 2,  smallint(200), bigint(2), float(2.0), 2.0, Decimal(2.0), timestamp('1997-01-02 03:04:05'), date('2000-04-02')),
+('3', true, unhex('537061726B2053514C'), tinyint(3), 3, smallint(300), bigint(3), float(3.0), 3.0, Decimal(3.0), timestamp('1997-02-10 17:32:01-08'), date('2000-04-03'))
+AS t(a, b, c, d, e, f, g, h, i, j, k, l);
+
+SELECT TRANSFORM(a)
+USING 'cat' AS (a)
+FROM t;
+
+-- with non-exist command
+SELECT TRANSFORM(a)
+USING 'some_non_existent_command' AS (a)
+FROM t;
+
+-- with non-exist file
+SELECT TRANSFORM(a)
+USING 'python some_non_existent_file' AS (a)
+FROM t;
+
+-- common supported data types between no serde and serde transform
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+USING 'cat' AS (
+a string,
+b boolean,
+c binary,
+d tinyint,
+e int,
+f smallint,
+g long,
+h float,
+i double,
+j decimal(38, 18),
+k timestamp,
+l date)
+FROM t
+) tmp;
+
+-- common supported data types between no serde and serde transform
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+USING 'cat' AS (
+a string,
+b string,
+c string,
+d string,
+e string,
+f string,
+g string,
+h string,
+i string,
+j string,
+k string,
+l string)
+FROM t
+) tmp;
+
+-- SPARK-32388 handle schema less
+SELECT TRANSFORM(a)
+USING 'cat'
+FROM t;
+
+SELECT TRANSFORM(a, b)
+USING 'cat'
+FROM t;
+
+SELECT TRANSFORM(a, b, c)
+USING 'cat'
+FROM t;
+
+-- return null when return string incompatible (no serde)
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i)
+USING 'cat' AS (a int, b short, c long, d byte, e float, f double, g decimal(38, 18), h date, i timestamp)
+FROM VALUES
+('a','','1231a','a','213.21a','213.21a','0a.21d','2000-04-01123','1997-0102 00:00:') tmp(a, b, c, d, e, f, g, h, i);
+
+-- SPARK-28227: transform can't run with aggregation
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b;
+
+-- transform use MAP
+MAP a, b USING 'cat' AS (a, b) FROM t;
+
+-- transform use REDUCE
+REDUCE a, b USING 'cat' AS (a, b) FROM t;
+
+-- transform with defined row format delimit
+SELECT TRANSFORM(a, b, c, null)
+ROW FORMAT DELIMITED
+FIELDS TERMINATED BY '|'
+LINES TERMINATED BY '\n'
+NULL DEFINED AS 'NULL'
+USING 'cat' AS (a, b, c, d)
+ROW FORMAT DELIMITED
+FIELDS TERMINATED BY '|'
+LINES TERMINATED BY '\n'
+NULL DEFINED AS 'NULL'
+FROM t;
+
+
+SELECT TRANSFORM(a, b, c, null)
+ROW FORMAT DELIMITED
+FIELDS TERMINATED BY '|'
+LINES TERMINATED BY '\n'
+NULL DEFINED AS 'NULL'
+USING 'cat' AS (d)
+ROW FORMAT DELIMITED
+FIELDS TERMINATED BY '||'
+LINES TERMINATED BY '\n'
+NULL DEFINED AS 'NULL'
+FROM t;
+
+-- SPARK-31937 transform with defined row format delimit

Review comment:
   > This JIRA is related to this query? I read it though, I'm not sure 
about the relationship. What kind of exceptions does this query throws?
   
   Tests for Array/Map/Struct support.
   Removed for now; will add them back in that PR:








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459173424



##
File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql
##
@@ -0,0 +1,146 @@
+-- Test data.
+CREATE OR REPLACE TEMPORARY VIEW t AS SELECT * FROM VALUES
+('1', true, unhex('537061726B2053514C'), tinyint(1), 1, smallint(100), bigint(1), float(1.0), 1.0, Decimal(1.0), timestamp('1997-01-02'), date('2000-04-01')),
+('2', false, unhex('537061726B2053514C'), tinyint(2), 2,  smallint(200), bigint(2), float(2.0), 2.0, Decimal(2.0), timestamp('1997-01-02 03:04:05'), date('2000-04-02')),
+('3', true, unhex('537061726B2053514C'), tinyint(3), 3, smallint(300), bigint(3), float(3.0), 3.0, Decimal(3.0), timestamp('1997-02-10 17:32:01-08'), date('2000-04-03'))
+AS t(a, b, c, d, e, f, g, h, i, j, k, l);
+
+SELECT TRANSFORM(a)
+USING 'cat' AS (a)
+FROM t;
+
+-- with non-exist command
+SELECT TRANSFORM(a)
+USING 'some_non_existent_command' AS (a)
+FROM t;
+
+-- with non-exist file
+SELECT TRANSFORM(a)
+USING 'python some_non_existent_file' AS (a)
+FROM t;
+
+-- common supported data types between no serde and serde transform
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+USING 'cat' AS (
+a string,
+b boolean,
+c binary,
+d tinyint,
+e int,
+f smallint,
+g long,
+h float,
+i double,
+j decimal(38, 18),
+k timestamp,
+l date)
+FROM t
+) tmp;
+
+-- common supported data types between no serde and serde transform
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+USING 'cat' AS (
+a string,
+b string,
+c string,
+d string,
+e string,
+f string,
+g string,
+h string,
+i string,
+j string,
+k string,
+l string)
+FROM t
+) tmp;
+
+-- SPARK-32388 handle schema less
+SELECT TRANSFORM(a)
+USING 'cat'
+FROM t;
+
+SELECT TRANSFORM(a, b)
+USING 'cat'
+FROM t;
+
+SELECT TRANSFORM(a, b, c)
+USING 'cat'
+FROM t;
+
+-- return null when return string incompatible (no serde)
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i)
+USING 'cat' AS (a int, b short, c long, d byte, e float, f double, g decimal(38, 18), h date, i timestamp)
+FROM VALUES
+('a','','1231a','a','213.21a','213.21a','0a.21d','2000-04-01123','1997-0102 00:00:') tmp(a, b, c, d, e, f, g, h, i);
+
+-- SPARK-28227: transform can't run with aggregation
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b;
+
+-- transform use MAP
+MAP a, b USING 'cat' AS (a, b) FROM t;
+
+-- transform use REDUCE
+REDUCE a, b USING 'cat' AS (a, b) FROM t;
+
+-- transform with defined row format delimit
+SELECT TRANSFORM(a, b, c, null)
+ROW FORMAT DELIMITED
+FIELDS TERMINATED BY '|'
+LINES TERMINATED BY '\n'
+NULL DEFINED AS 'NULL'
+USING 'cat' AS (a, b, c, d)
+ROW FORMAT DELIMITED
+FIELDS TERMINATED BY '|'
+LINES TERMINATED BY '\n'
+NULL DEFINED AS 'NULL'
+FROM t;
+
+

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


maropu commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662779093


   `bpchar` is internally used in PostgreSQL (https://www.postgresql.org/docs/current/typeconv-query.html), but users can use it in DDL;
   ```
   postgres=# create table t (v bpchar);
   CREATE TABLE
   ```
   So, I think it's okay to add it for better interoperability.
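   As a hedged sketch of what such interoperability looks like on the dialect side (names here are hypothetical and do not mirror Spark's actual `PostgresDialect` code), a JDBC type mapping could simply treat the internal `bpchar` alias like the other character types:
   ```scala
   // Sketch only, under the assumption that the dialect sees raw Postgres type
   // names from JDBC metadata; "bpchar" (blank-padded char) is mapped like
   // "char"/"varchar" so columns and array elements of that type can resolve.
   object PgTypeNames {
     def toCatalystName(pgType: String): Option[String] = pgType match {
       case "text" | "varchar" | "char" | "bpchar" => Some("StringType")
       case "int4"                                 => Some("IntegerType")
       case "int8"                                 => Some("LongType")
       case "float8"                               => Some("DoubleType")
       case _                                      => None // fall back to the generic JDBC mapping
     }
   }
   ```
   With a mapping like this, `PgTypeNames.toCatalystName("bpchar")` yields `Some("StringType")`, so a `bpchar[]` column can resolve its element type instead of failing.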






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459172282



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -744,8 +744,29 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 selectClause.hints.asScala.foldRight(withWindow)(withHints)
   }
 
+  // Decode and input/output format.
+  type Format = (Seq[(String, String)], Option[String], Seq[(String, String)], Option[String])

Review comment:
   Done








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-662778384










[GitHub] [spark] AmplabJenkins commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-662778384










[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


maropu commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662778415


   Could you add tests in `PostgresIntegrationSuite` and then check that they pass in your local env? (Note: our testing framework, Jenkins, does not run them.)






[GitHub] [spark] SparkQA removed a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-662677438


   **[Test build #126359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126359/testReport)** for PR 29067 at commit [`e58783e`](https://github.com/apache/spark/commit/e58783ec28fc839e5559ca8c2888aa54d3f1f5de).






[GitHub] [spark] SparkQA commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-22 Thread GitBox


SparkQA commented on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-662778108


   **[Test build #126359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126359/testReport)** for PR 29067 at commit [`e58783e`](https://github.com/apache/spark/commit/e58783ec28fc839e5559ca8c2888aa54d3f1f5de).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class ColumnarToRowExec(child: SparkPlan) extends ColumnarToRowTransition with CodegenSupport `






[GitHub] [spark] maropu commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


maropu commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459169944



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##
@@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) 
{
   val provider = maybeV2Provider.get
   val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
 provider, df.sparkSession.sessionState.conf)
-  val options = sessionOptions ++ extraOptions
+  val options = sessionOptions ++ extraOptions.originalMap

Review comment:
   not `toMap` but `originalMap`?








[GitHub] [spark] maropu closed pull request #29189: [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server

2020-07-22 Thread GitBox


maropu closed pull request #29189:
URL: https://github.com/apache/spark/pull/29189


   






[GitHub] [spark] maropu commented on pull request #29189: [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server

2020-07-22 Thread GitBox


maropu commented on pull request #29189:
URL: https://github.com/apache/spark/pull/29189#issuecomment-662773741


   Thanks! Merged to master.






[GitHub] [spark] maropu commented on a change in pull request #29189: [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server

2020-07-22 Thread GitBox


maropu commented on a change in pull request #29189:
URL: https://github.com/apache/spark/pull/29189#discussion_r459166990



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
##
@@ -218,9 +218,7 @@ private[hive] class SparkExecuteStatementOperation(
   execute()
 }
   } catch {
-case e: HiveSQLException =>
-  setOperationException(e)
-  log.error("Error running hive query: ", e)

Review comment:
   Ah, I see. Nice catch.








[GitHub] [spark] SparkQA commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


SparkQA commented on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662772347


   **[Test build #126368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126368/testReport)** for PR 29056 at commit [`ffa0603`](https://github.com/apache/spark/commit/ffa0603b8690863a6b83bcade5b6d7a1a29680ab).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662772419










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662772419










[GitHub] [spark] SparkQA removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662770223


   **[Test build #126368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126368/testReport)** for PR 29056 at commit [`ffa0603`](https://github.com/apache/spark/commit/ffa0603b8690863a6b83bcade5b6d7a1a29680ab).






[GitHub] [spark] AmplabJenkins commented on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29168:
URL: https://github.com/apache/spark/pull/29168#issuecomment-662771853










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29168:
URL: https://github.com/apache/spark/pull/29168#issuecomment-662771853










[GitHub] [spark] SparkQA removed a comment on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29168:
URL: https://github.com/apache/spark/pull/29168#issuecomment-662662506


   **[Test build #126356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126356/testReport)** for PR 29168 at commit [`5a64c63`](https://github.com/apache/spark/commit/5a64c638cf5bda64796bb7dc3c3b571100ef4060).






[GitHub] [spark] SparkQA commented on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC

2020-07-22 Thread GitBox


SparkQA commented on pull request #29168:
URL: https://github.com/apache/spark/pull/29168#issuecomment-662771464


   **[Test build #126356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126356/testReport)** for PR 29168 at commit [`5a64c63`](https://github.com/apache/spark/commit/5a64c638cf5bda64796bb7dc3c3b571100ef4060).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


SparkQA commented on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662770223


   **[Test build #126368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126368/testReport)** for PR 29056 at commit [`ffa0603`](https://github.com/apache/spark/commit/ffa0603b8690863a6b83bcade5b6d7a1a29680ab).






[GitHub] [spark] dongjoon-hyun commented on pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-22 Thread GitBox


dongjoon-hyun commented on pull request #29062:
URL: https://github.com/apache/spark/pull/29062#issuecomment-662769652


   cc @peter-toth , too.






[GitHub] [spark] maropu commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


maropu commented on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662769125


   Note: if nobody has comments, I will merge this in a few days, since it is just a doc improvement.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662768893










[GitHub] [spark] AmplabJenkins commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662768893










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-656144859


   Can one of the admins verify this patch?






[GitHub] [spark] maropu commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-22 Thread GitBox


maropu commented on pull request #29056:
URL: https://github.com/apache/spark/pull/29056#issuecomment-662768621


   ok to test






[GitHub] [spark] maropu commented on pull request #29152: [SPARK-32356][SQL] Forbid create view with null type

2020-07-22 Thread GitBox


maropu commented on pull request #29152:
URL: https://github.com/apache/spark/pull/29152#issuecomment-662767785


   Looks okay to me, but (though this is trivial) we might need to take one of these actions;
- change the jira type from `improvement` to `bug`, then backport this to `branch-3.0`/`branch-2.4`, or
- update the migration guide for the behaviour change
   
   cc: @dongjoon-hyun 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29159:
URL: https://github.com/apache/spark/pull/29159#issuecomment-662767378










[GitHub] [spark] AmplabJenkins commented on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29159:
URL: https://github.com/apache/spark/pull/29159#issuecomment-662767378









