[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZhuuuu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459189586

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala

@@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors {
     case DateType => dateTypeInfo
     case TimestampType => timestampTypeInfo
     case NullType => voidTypeInfo
+    case dt =>
+      throw new AnalysisException("HiveInspectors does not support convert " +

Review comment:
> nit: `s"${dt.catalogString} cannot be converted to Hive TypeInfo"`

Same reason as https://github.com/apache/spark/pull/29085#discussion_r459189448

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
maropu commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662794540

Looks almost okay now, so could you split this PR into pieces? I think it's somewhat big for reviews. For example:
1. More refactoring PR for `HiveScriptTransformationExec` and `BaseScriptTransformationExec`, just like SPARK-32105
2. PR to improve test coverage of `HiveScriptTransformationExec`
3. Then, PR to implement Spark-native TRANSFORM

WDYT?
[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459188140

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala

@@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors {
     case DateType => dateTypeInfo
     case TimestampType => timestampTypeInfo
     case NullType => voidTypeInfo
+    case dt =>
+      throw new AnalysisException("HiveInspectors does not support convert " +

Review comment:
nit: `s"${dt.catalogString} cannot be converted to Hive TypeInfo"`
[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459188225

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala

@@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
       }
     }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +
+          s"${dt.getClass.getSimpleName} as output data type")

Review comment:
`dt.getClass.getSimpleName` -> `dt.catalogString`
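The pattern under review — pairing a per-type string parser with a Catalyst converter, and hedging parse failures — can be sketched in isolation. This is a rough standalone approximation, not the actual Spark internals: `wrap` below stands in for `wrapperConvertException`, assuming (as the PR's no-serde TRANSFORM path does for malformed fields) that a failed parse yields null.

```scala
import scala.util.control.NonFatal

object FieldWriterSketch {
  // Approximation of wrapperConvertException: parse the raw string field,
  // then run the Catalyst converter; return null when parsing fails.
  def wrap(parse: String => Any, convert: Any => Any): String => Any =
    data =>
      try convert(parse(data))
      catch { case NonFatal(_) => null }

  def main(args: Array[String]): Unit = {
    // identity stands in for the real CatalystTypeConverters converter here.
    val toInt = wrap(_.toInt, identity)
    println(toInt("42"))      // 42
    println(toInt("not-int")) // null
  }
}
```

The per-`DataType` match in the diff then just selects which `parse` function is paired with the converter built from `attr.dataType`.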
[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p
Ngone51 commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r459187090

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1237,20 +1250,79 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflicts with the left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite the child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't filter the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return the rewrite attribute [kind#220 -> kind#228]
+          // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node
+          // `SubqueryAlias c` will return the rewrite attribute [kind#220 -> kind#229] (which
+          // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top
+          // Join will have duplicated rewrite attributes.
+          //
+          // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning
+          // the rewrite attribute [kind#220 -> kind#229] to its parent node `Project [id#227]` as
+          // it doesn't really need it.
+          //
+          // Join Inner, (id#218 = id#227)
+          // :- SubqueryAlias b
+          // :  +- Project [id#218, foo AS kind#228]
+          // :     +- SubqueryAlias a
+          // :        +- Project [1 AS id#218]
+          // :           +- OneRowRelation
+          // +- SubqueryAlias c
+          //    +- Project [id#227]
+          //       +- Join Inner, (kind#229 = kind#223)
+          //          :- SubqueryAlias l
+          //          :  +- SubqueryAlias b
+          //          :     +- Project [id#227, foo AS kind#229]
+          //          :        +- SubqueryAlias a
+          //          :           +- Project [1 AS id#227]
+          //          :              +- OneRowRelation
+          //          +- SubqueryAlias r
+          //             +- SubqueryAlias b
+          //                +- Project [id#224, foo AS kind#223]
+          //                   +- SubqueryAlias a
+          //                      +- Project [1 AS id#224]
+          //                         +- OneRowRelation
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            (plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr)

Review comment:
Yes, they won't. I mean, you probably have a typo above:

> Seems we need plan.outputSet too.

IIUC, it sounds like you're actually meaning `plan.references` according to the context.
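The propagation rule being debated — only forward attribute rewrites that the parent node can actually see — can be illustrated on a toy tree. This is plain standalone Scala with Spark's `LogicalPlan`/`Attribute` machinery replaced by minimal stand-ins; every name below is illustrative, not the PR's actual code.

```scala
// Toy stand-ins for Attribute and LogicalPlan, just enough to show the filter.
case class Attr(name: String, id: Int)
case class Node(label: String, output: List[Attr], references: Set[Attr], children: List[Node])

object RewriteSketch {
  // Recursively rewrite children; keep only the child mappings whose old
  // attribute is visible to this node (referenced or output by it),
  // mirroring the filter on `childAttrMapping` in the PR.
  def rewrite(plan: Node, conflicts: Map[Node, Node]): (Node, Seq[(Attr, Attr)]) =
    conflicts.get(plan) match {
      case Some(newPlan) =>
        // Conflict node: swap in the deduplicated plan and report the
        // old-to-new attribute pairs upward.
        (newPlan, plan.output.zip(newPlan.output))
      case None =>
        val results = plan.children.map(rewrite(_, conflicts))
        val kept = results.flatMap(_._2).filter { case (oldAttr, _) =>
          (plan.references ++ plan.output).contains(oldAttr)
        }
        (plan.copy(children = results.map(_._1)), kept)
    }
}
```

The `filter` is what stops a deep conflict node (like the inner `Project [id#227, foo AS kind#229]` in the example plan) from leaking a `kind#220 -> kind#229` mapping past a `Project` that no longer exposes `kind` at all.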
[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459186587

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala

@@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
       }
     }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +

Review comment:
nit: `TRANSFORM` -> `s"$nodeName...`
[GitHub] [spark] viirya commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict pl
viirya commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r459186345

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1237,20 +1250,79 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflicts with the left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite the child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't filter the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return the rewrite attribute [kind#220 -> kind#228]
+          // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node
+          // `SubqueryAlias c` will return the rewrite attribute [kind#220 -> kind#229] (which
+          // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top
+          // Join will have duplicated rewrite attributes.
+          //
+          // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning
+          // the rewrite attribute [kind#220 -> kind#229] to its parent node `Project [id#227]` as
+          // it doesn't really need it.
+          //
+          // Join Inner, (id#218 = id#227)
+          // :- SubqueryAlias b
+          // :  +- Project [id#218, foo AS kind#228]
+          // :     +- SubqueryAlias a
+          // :        +- Project [1 AS id#218]
+          // :           +- OneRowRelation
+          // +- SubqueryAlias c
+          //    +- Project [id#227]
+          //       +- Join Inner, (kind#229 = kind#223)
+          //          :- SubqueryAlias l
+          //          :  +- SubqueryAlias b
+          //          :     +- Project [id#227, foo AS kind#229]
+          //          :        +- SubqueryAlias a
+          //          :           +- Project [1 AS id#227]
+          //          :              +- OneRowRelation
+          //          +- SubqueryAlias r
+          //             +- SubqueryAlias b
+          //                +- Project [id#224, foo AS kind#223]
+          //                   +- SubqueryAlias a
+          //                      +- Project [1 AS id#224]
+          //                         +- OneRowRelation
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            (plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr)

Review comment:
`plan.references` won't be the output, right? I guess they won't conflict with others?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict
AmplabJenkins removed a comment on pull request #29166: URL: https://github.com/apache/spark/pull/29166#issuecomment-662791663
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
AmplabJenkins removed a comment on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662791661
[GitHub] [spark] AmplabJenkins commented on pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
AmplabJenkins commented on pull request #29166: URL: https://github.com/apache/spark/pull/29166#issuecomment-662791663
[GitHub] [spark] AmplabJenkins commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
AmplabJenkins commented on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662791661
[GitHub] [spark] SparkQA commented on pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
SparkQA commented on pull request #29166: URL: https://github.com/apache/spark/pull/29166#issuecomment-662791379 **[Test build #126375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126375/testReport)** for PR 29166 at commit [`f73f186`](https://github.com/apache/spark/commit/f73f1861b4e76182cedaaf94ddd5061c1cf48dc6).
[GitHub] [spark] SparkQA commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
SparkQA commented on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662791349 **[Test build #126374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126374/testReport)** for PR 29192 at commit [`1000377`](https://github.com/apache/spark/commit/100037d580766ec2db4ffc230dae045838e0).
[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p
Ngone51 commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r459185548

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1237,20 +1250,79 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflicts with the left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite the child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't filter the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return the rewrite attribute [kind#220 -> kind#228]
+          // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node
+          // `SubqueryAlias c` will return the rewrite attribute [kind#220 -> kind#229] (which
+          // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top
+          // Join will have duplicated rewrite attributes.
+          //
+          // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning
+          // the rewrite attribute [kind#220 -> kind#229] to its parent node `Project [id#227]` as
+          // it doesn't really need it.
+          //
+          // Join Inner, (id#218 = id#227)
+          // :- SubqueryAlias b
+          // :  +- Project [id#218, foo AS kind#228]
+          // :     +- SubqueryAlias a
+          // :        +- Project [1 AS id#218]
+          // :           +- OneRowRelation
+          // +- SubqueryAlias c
+          //    +- Project [id#227]
+          //       +- Join Inner, (kind#229 = kind#223)
+          //          :- SubqueryAlias l
+          //          :  +- SubqueryAlias b
+          //          :     +- Project [id#227, foo AS kind#229]
+          //          :        +- SubqueryAlias a
+          //          :           +- Project [1 AS id#227]
+          //          :              +- OneRowRelation
+          //          +- SubqueryAlias r
+          //             +- SubqueryAlias b
+          //                +- Project [id#224, foo AS kind#223]
+          //                   +- SubqueryAlias a
+          //                      +- Project [1 AS id#224]
+          //                         +- OneRowRelation
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            (plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr)

Review comment:
> Seems we need plan.outputSet too. I think it is similar to case of aliases in projection above you added.

Yeah, "aliases in projection" is another case. And I assume you were referring to `plan.references` too. @viirya
[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p
Ngone51 commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r459185098

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1237,20 +1250,79 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflicts with the left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite the child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't filter the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return the rewrite attribute [kind#220 -> kind#228]

Review comment:
I've changed to the short comment as you mentioned above.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
HyukjinKwon commented on a change in pull request #29188: URL: https://github.com/apache/spark/pull/29188#discussion_r459184480

## File path: dev/requirements.txt

@@ -3,3 +3,4 @@ jira==1.0.3
 PyGithub==1.26.0
 Unidecode==0.04.19
 sphinx
+pydata_sphinx_theme

Review comment:
I thought about this for a while, actually, especially given the discussion we had before. `pydata_sphinx_theme` is relatively new and has [frequent releases](https://github.com/pandas-dev/pydata-sphinx-theme/tags). I would like to not pin the version for now, to encourage devs to keep it up to date. Maybe we could see how it goes, if that sounds fine to you as well.
[GitHub] [spark] maropu edited a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
maropu edited a comment on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662779093

`bpchar` is internally used in PostgreSQL [1][2], but users can also use it in DDL:
- [1] https://www.postgresql.org/docs/current/datatype-character.html
- [2] https://www.postgresql.org/docs/current/typeconv-query.html

```
postgres=# create table t (v bpchar);
CREATE TABLE
```

So, I think it's okay to add it for better interoperability.
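For context, type-name mappings like this plug into Spark's `JdbcDialect.getCatalystType` hook. The sketch below is illustrative only — it is not the PR's actual patch (the real change lives in `PostgresDialect`), and the `_bpchar` array naming convention shown is PostgreSQL's, where array types are reported with a leading underscore:

```scala
import java.sql.Types
import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types._

// Hypothetical standalone dialect showing where a bpchar mapping would plug in.
object BpcharDialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    typeName match {
      // blank-padded char maps naturally onto Spark's StringType
      case "bpchar" => Some(StringType)
      // postgres reports array types with a leading underscore, e.g. _bpchar
      case "_bpchar" if sqlType == Types.ARRAY => Some(ArrayType(StringType))
      case _ => None // fall back to the default JDBC type mapping
    }
}
```

Returning `None` defers to Spark's default JDBC-to-Catalyst mapping, so a dialect only needs to enumerate the vendor-specific names it recognizes.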
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
AmplabJenkins removed a comment on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662557759

Can one of the admins verify this patch?
[GitHub] [spark] holdenk commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
holdenk commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-662790280

Cool, I'll try to merge this after dinner. Thanks, everyone, for working on and reviewing this.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662790096
[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
maropu commented on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662790228 ok to test
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662790096
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
HyukjinKwon commented on a change in pull request #29188: URL: https://github.com/apache/spark/pull/29188#discussion_r459184480 ## File path: dev/requirements.txt ## @@ -3,3 +3,4 @@ jira==1.0.3 PyGithub==1.26.0 Unidecode==0.04.19 sphinx +pydata_sphinx_theme Review comment: I thought about this for a while, actually, especially given the discussion we had before. `pydata_sphinx_theme` is relatively new and has [frequent releases](https://github.com/pandas-dev/pydata-sphinx-theme/tags). I would like to not pin the version for now and see how it goes, if that sounds fine to you as well.
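If the unpinned theme ever breaks the docs build, pinning it back down is a one-line change in `dev/requirements.txt`; the version number below is illustrative only:

```
# dev/requirements.txt
sphinx
pydata_sphinx_theme          # unpinned: always picks up the latest release
# pydata_sphinx_theme==0.1.1 # pinned alternative (illustrative version)
```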
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662789794 **[Test build #126373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126373/testReport)** for PR 29085 at commit [`7f3cff8`](https://github.com/apache/spark/commit/7f3cff81d6238d8de6ea2c57dc94eed5abe1ff75).
[GitHub] [spark] agrawaldevesh commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
agrawaldevesh commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-662789786 @holdenk... this is ready to go in. Thanks!
[GitHub] [spark] HyukjinKwon commented on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
HyukjinKwon commented on pull request #29188: URL: https://github.com/apache/spark/pull/29188#issuecomment-662789431 > I wonder if there is some non-hacky way to organize functions into logical groups, similarly to what ScalaDoc does. I tried hard, but it looked difficult to do. I will take a look one more time.
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662788634
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662788634
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459182797 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ## @@ -1031,4 +1031,96 @@ class PlanParserSuite extends AnalysisTest { assertEqual("select a, b from db.c;;;", table("db", "c").select('a, 'b)) assertEqual("select a, b from db.c; ;; ;", table("db", "c").select('a, 'b)) } + + test("SPARK-32106: TRANSFORM without serde") { Review comment: > Also, could you check `ROW FORMAT SERDE`, too? Added a UT.
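For reference, the two syntax forms the parser suite needs to cover can be sketched as follows; the script (`'cat'`) and the serde class are illustrative choices, and the clause ordering follows Hive's `TRANSFORM` grammar (input row format before `USING`, output row format after `AS`):

```sql
-- TRANSFORM without a serde: fields are serialized with the default
-- delimiters and fed to the script on stdin.
SELECT TRANSFORM(a, b)
USING 'cat' AS (a, b)
FROM t;

-- TRANSFORM with an explicit Hive serde on both input and output:
SELECT TRANSFORM(a, b)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
USING 'cat' AS (a, b)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
FROM t;
```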
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662788394 **[Test build #126372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126372/testReport)** for PR 29085 at commit [`be80c27`](https://github.com/apache/spark/commit/be80c27557f614074ac50e2674180a778c0208ab).
[GitHub] [spark] AngersZhuuuu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662787840 All done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459182038 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala ## @@ -54,87 +52,110 @@ case class HiveScriptTransformationExec( script: String, output: Seq[Attribute], child: SparkPlan, -ioschema: HiveScriptIOSchema) - extends BaseScriptTransformationExec { +ioschema: ScriptTransformationIOSchema) + extends BaseScriptTransformationExec with HiveInspectors { - override def processIterator( - inputIterator: Iterator[InternalRow], - hadoopConf: Configuration): Iterator[InternalRow] = { -val cmd = List("/bin/bash", "-c", script) -val builder = new ProcessBuilder(cmd.asJava) - -val proc = builder.start() -val inputStream = proc.getInputStream -val outputStream = proc.getOutputStream -val errorStream = proc.getErrorStream - -// In order to avoid deadlocks, we need to consume the error output of the child process. -// To avoid issues caused by large error output, we use a circular buffer to limit the amount -// of error output that we retain. See SPARK-7862 for more discussion of the deadlock / hang -// that motivates this. 
-val stderrBuffer = new CircularBuffer(2048) -new RedirectThread( - errorStream, - stderrBuffer, - "Thread-ScriptTransformation-STDERR-Consumer").start() + private def initInputSerDe( + input: Seq[Expression]): Option[(AbstractSerDe, StructObjectInspector)] = { +ioschema.inputSerdeClass.map { serdeClass => + val (columns, columnTypes) = parseAttrs(input) + val serde = initSerDe(serdeClass, columns, columnTypes, ioschema.inputSerdeProps) + val fieldObjectInspectors = columnTypes.map(toInspector) + val objectInspector = ObjectInspectorFactory +.getStandardStructObjectInspector(columns.asJava, fieldObjectInspectors.asJava) + (serde, objectInspector) +} + } -val outputProjection = new InterpretedProjection(input, child.output) + private def initOutputSerDe( + output: Seq[Attribute]): Option[(AbstractSerDe, StructObjectInspector)] = { +ioschema.outputSerdeClass.map { serdeClass => + val (columns, columnTypes) = parseAttrs(output) + val serde = initSerDe(serdeClass, columns, columnTypes, ioschema.outputSerdeProps) + val structObjectInspector = serde.getObjectInspector().asInstanceOf[StructObjectInspector] + (serde, structObjectInspector) +} + } -// This nullability is a performance optimization in order to avoid an Option.foreach() call -// inside of a loop -@Nullable val (inputSerde, inputSoi) = ioschema.initInputSerDe(input).getOrElse((null, null)) + private def parseAttrs(attrs: Seq[Expression]): (Seq[String], Seq[DataType]) = { +val columns = attrs.zipWithIndex.map(e => s"${e._1.prettyName}_${e._2}") +val columnTypes = attrs.map(_.dataType) +(columns, columnTypes) + } -// This new thread will consume the ScriptTransformation's input rows and write them to the -// external process. That process's output will be read by this current thread. 
-val writerThread = new HiveScriptTransformationWriterThread( - inputIterator.map(outputProjection), - input.map(_.dataType), - inputSerde, - inputSoi, - ioschema, - outputStream, - proc, - stderrBuffer, - TaskContext.get(), - hadoopConf -) + private def initSerDe( Review comment: > Sorry for the confusion, but, on second thought, it's better to pull out the hive-serde related functions from `HiveScriptTransformationExec` and create a companion object holding them, for readability: [maropu@972775b](https://github.com/maropu/spark/commit/972775b821406d81d3c1ba1c718de3037a0ca068). WDYT? Agreed, making `HiveScriptTransformationExec` only handle data processing.
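The companion-object suggestion above can be illustrated with a minimal self-contained sketch. All names here are invented stand-ins (no actual Spark or Hive classes): the point is only the shape, i.e. serde setup lives in the companion object while the class body stays focused on the row-processing loop:

```scala
// Structural sketch of the refactoring suggested above: helper setup in
// the companion object, the "exec" class only drives data processing.
// Names are illustrative, not real Spark/Hive classes.
final case class ScriptExecSketch(script: String, input: Seq[String]) {
  def process(rows: Iterator[String]): Iterator[String] = {
    val serde = ScriptExecSketch.initSerde(input) // setup delegated out
    rows.map(r => s"$serde:$r")                   // "data processing" only
  }
}

object ScriptExecSketch {
  // The companion object owns serde-related helpers, keeping the class
  // body free of initialization details.
  def initSerde(columns: Seq[String]): String =
    columns.mkString("serde(", ",", ")")
}
```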
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.
AmplabJenkins removed a comment on pull request #29194: URL: https://github.com/apache/spark/pull/29194#issuecomment-662787285
[GitHub] [spark] AmplabJenkins commented on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.
AmplabJenkins commented on pull request #29194: URL: https://github.com/apache/spark/pull/29194#issuecomment-662787285
[GitHub] [spark] SparkQA removed a comment on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.
SparkQA removed a comment on pull request #29194: URL: https://github.com/apache/spark/pull/29194#issuecomment-662702681 **[Test build #126361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126361/testReport)** for PR 29194 at commit [`f9b8691`](https://github.com/apache/spark/commit/f9b8691aee331f214392fdcf108e6ad2cfa6799e).
[GitHub] [spark] SparkQA commented on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.
SparkQA commented on pull request #29194: URL: https://github.com/apache/spark/pull/29194#issuecomment-662786999 **[Test build #126361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126361/testReport)** for PR 29194 at commit [`f9b8691`](https://github.com/apache/spark/commit/f9b8691aee331f214392fdcf108e6ad2cfa6799e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins removed a comment on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662785780
[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662785780
[GitHub] [spark] Ngone51 commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict p
Ngone51 commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r459179732 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1237,20 +1250,79 @@ class Analyzer( if (conflictPlans.isEmpty) { right } else { -val attributeRewrites = AttributeMap(conflictPlans.flatMap { - case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)}) -val conflictPlanMap = conflictPlans.toMap -// transformDown so that we can replace all the old Relations in one turn due to -// the reason that `conflictPlans` are also collected in pre-order. -right transformDown { - case r => conflictPlanMap.getOrElse(r, r) -} transformUp { - case other => other transformExpressions { +rewritePlan(right, conflictPlans.toMap)._1 + } +} + +private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan]) + : (LogicalPlan, Seq[(Attribute, Attribute)]) = { + if (conflictPlanMap.contains(plan)) { +// If the plan is the one that conflict the with left one, we'd +// just replace it with the new plan and collect the rewrite +// attributes for the parent node. +val newRelation = conflictPlanMap(plan) +newRelation -> plan.output.zip(newRelation.output) + } else { +val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]() +val newPlan = plan.mapChildren { child => + // If not, we'd rewrite child plan recursively until we find the + // conflict node or reach the leaf node. + val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap) + // Only return rewrite attributes which could be used by the parent node. + // Otherwise, it could introduce duplicate rewrite attributes. 
For example, + // for the following plan, if we don't do filter for the `childAttrMapping`, + // the node `SubqueryAlias b` will return rewrite attribute of [kind#220 -> kind#228] + // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node + // `SubqueryAlias c` will return rewrite attribute of [kind#220 -> kind#229] (which + // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top + // Join will have duplicated rewrite attribute. + // + // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning + // rewrite attribute of [kind#220 -> kind#229] to its parent node `Project [id#227]` as + // it doesn't really need it. + // + // Join Inner, (id#218 = id#227) + // :- SubqueryAlias b + // : +- Project [id#218, foo AS kind#228] + // : +- SubqueryAlias a + // :+- Project [1 AS id#218] + // : +- OneRowRelation + // +- SubqueryAlias c + //+- Project [id#227] + // +- Join Inner, (kind#229 = kind#223) + // :- SubqueryAlias l + // : +- SubqueryAlias b + // : +- Project [id#227, foo AS kind#229] + // :+- SubqueryAlias a + // : +- Project [1 AS id#227] + // : +- OneRowRelation + // +- SubqueryAlias r + // +- SubqueryAlias b + //+- Project [id#224, foo AS kind#223] + // +- SubqueryAlias a + // +- Project [1 AS id#224] + // +- OneRowRelation + attrMapping ++= childAttrMapping.filter { case (oldAttr, _) => +(plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr) Review comment: `plan.references` is also necessary? For example, Join's condition references the attributes but its output doesn't contain the attribute. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
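The filtering of `childAttrMapping` discussed above can be modeled on a toy tree: while rewriting bottom-up, each node propagates only the attribute rewrites it can actually see in its own references or output, so rewrites from unrelated subtrees do not bubble up and produce duplicates. The sketch below is self-contained and purely illustrative: `Node` stands in for `LogicalPlan`, attributes are plain strings, and conflict plans are keyed by node name.

```scala
// Toy model of the attrMapping filtering: nodes keep only the rewrites
// they reference or output. Not Spark's Analyzer; types are stand-ins.
final case class Node(name: String, output: Set[String],
    references: Set[String], children: Seq[Node] = Nil)

def rewritePlan(plan: Node, conflicts: Map[String, (Node, Map[String, String])])
    : (Node, Map[String, String]) =
  conflicts.get(plan.name) match {
    // A conflicting node is replaced wholesale, reporting old -> new attrs.
    case Some((newPlan, mapping)) => (newPlan, mapping)
    case None =>
      var attrMapping = Map.empty[String, String]
      val newChildren = plan.children.map { child =>
        val (newChild, childMapping) = rewritePlan(child, conflicts)
        // Only keep rewrites the current node actually uses; the rest
        // would otherwise bubble up and collide at a common ancestor.
        attrMapping ++= childMapping.filter { case (oldAttr, _) =>
          plan.references.contains(oldAttr) || plan.output.contains(oldAttr)
        }
        newChild
      }
      (plan.copy(children = newChildren), attrMapping)
  }
```

In the example from the review thread, a `Project [id#227]` above the inner join would filter out the `kind#220 -> kind#229` rewrite because it neither references nor outputs `kind#220`, while a node whose join condition references it would keep it — which is why `plan.references` must participate in the filter.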
[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
SparkQA commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662785553 **[Test build #126371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126371/testReport)** for PR 29196 at commit [`9c12088`](https://github.com/apache/spark/commit/9c120880b35f1ddeaec6161481380c0b3e601eea).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459178460 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ## @@ -532,6 +532,21 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] { } } + object SparkScripts extends Strategy { +def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { + case logical.ScriptTransformation(input, script, output, child, ioschema) +if ioschema.inputSerdeClass.isEmpty && ioschema.outputSerdeClass.isEmpty => Review comment: > We need to check this here? Seems like it has been checked in https://github.com/apache/spark/pull/29085/files#diff-9847f5cef7cf7fbc5830fbc6b779ee10R783-R784 ? Yeah, it's not needed now.
[GitHub] [spark] maropu edited a comment on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
maropu edited a comment on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662784282 btw, thanks for your first contribution, @kujon !
[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
maropu commented on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662784282 btw, thanks for your contribution, @kujon !
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity
AmplabJenkins removed a comment on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-662782378
[GitHub] [spark] AmplabJenkins commented on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity
AmplabJenkins commented on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-662782378
[GitHub] [spark] SparkQA removed a comment on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity
SparkQA removed a comment on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-662767092 **[Test build #126367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126367/testReport)** for PR 29159 at commit [`52dd9b4`](https://github.com/apache/spark/commit/52dd9b4cf03381134a181df817b98daeb30e4616).
[GitHub] [spark] SparkQA commented on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity
SparkQA commented on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-662782139 **[Test build #126367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126367/testReport)** for PR 29159 at commit [`52dd9b4`](https://github.com/apache/spark/commit/52dd9b4cf03381134a181df817b98daeb30e4616). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor
AmplabJenkins removed a comment on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662781417 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126370/ Test FAILed.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459175410 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ## @@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors { case DateType => dateTypeInfo case TimestampType => timestampTypeInfo case NullType => voidTypeInfo + case dt => +throw new AnalysisException("TRANSFORM with hive serde does not support " + Review comment: Done
[GitHub] [spark] AmplabJenkins commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins commented on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662781414
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins removed a comment on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662781211 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126369/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lo
SparkQA removed a comment on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662780801 **[Test build #126370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126370/testReport)** for PR 29193 at commit [`e54f221`](https://github.com/apache/spark/commit/e54f22186a1f82ac9704f6644e057dadfc285b86).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins removed a comment on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662781414 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
SparkQA commented on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662781403 **[Test build #126370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126370/testReport)** for PR 29193 at commit [`e54f221`](https://github.com/apache/spark/commit/e54f22186a1f82ac9704f6644e057dadfc285b86). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
SparkQA removed a comment on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662780783 **[Test build #126369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126369/testReport)** for PR 29196 at commit [`5b20b65`](https://github.com/apache/spark/commit/5b20b650bf974f0d47822930d0c386038e495ccc).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459175002 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkScriptTransformationSuite.scala ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution + +import org.apache.spark.{SparkException, TestUtils} +import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} +import org.apache.spark.sql.catalyst.parser.ParseException +import org.apache.spark.sql.test.SharedSparkSession + +class SparkScriptTransformationSuite extends BaseScriptTransformationSuite with SharedSparkSession { + import testImplicits._ + + override def isHive23OrSpark: Boolean = true + + override def createScriptTransformationExec( + input: Seq[Expression], + script: String, + output: Seq[Attribute], + child: SparkPlan, + ioschema: ScriptTransformationIOSchema): BaseScriptTransformationExec = { +SparkScriptTransformationExec( + input = input, + script = script, + output = output, + child = child, + ioschema = ioschema +) + } + + test("SPARK-32106: TRANSFORM with serde without hive should throw exception") { +assume(TestUtils.testCommandAvailable("/bin/bash")) +withTempView("v") { + val df = Seq("a", "b", "c").map(Tuple1.apply).toDF("a") + df.createTempView("v") + + val e = intercept[ParseException] { +sql( + """ +|SELECT TRANSFORM (a) +|ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' +|USING 'cat' AS (a) +|ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' +|FROM v + """.stripMargin) + }.getMessage + assert(e.contains("TRANSFORM with serde is only supported in hive mode")) +} + } + + test("TRANSFORM doesn't support ArrayType/MapType/StructType as output data type (no serde)") { Review comment: Done
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins removed a comment on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662781049
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins removed a comment on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662781036
[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662781205
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459174831 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala ## @@ -206,75 +169,147 @@ class HiveScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with val query = sql( s""" - |SELECT - |TRANSFORM(a, b, c, d, e) - |USING 'python $scriptFilePath' AS (a, b, c, d, e) - |FROM v + |SELECT TRANSFORM(a, b, c, d, e) + |USING 'python ${scriptFilePath}' + |FROM v """.stripMargin) - // In Hive 1.2, the string representation of a decimal omits trailing zeroes. - // But in Hive 2.3, it is always padded to 18 digits with trailing zeroes if necessary. - val decimalToString: Column => Column = if (HiveUtils.isHive23) { -c => c.cast("string") - } else { -c => c.cast("decimal(1, 0)").cast("string") - } - checkAnswer(query, identity, df.select( -'a.cast("string"), -'b.cast("string"), -'c.cast("string"), -decimalToString('d), -'e.cast("string")).collect()) + // In hive default serde mode, if we don't define output schema, it will choose first + // two column as output schema (key: String, value: String) + checkAnswer( +query, +identity, +df.select( + 'a.cast("string").as("key"), + 'b.cast("string").as("value")).collect()) } } - test("SPARK-30973: TRANSFORM should wait for the termination of the script (no serde)") { + testBasicInputDataTypesWith(hiveIOSchema, "hive serde") + + test("SPARK-32106: TRANSFORM supports complex data types type (hive serde)") { assume(TestUtils.testCommandAvailable("/bin/bash")) +withTempView("v") { + val df = Seq( +(1, "1", Array(0, 1, 2), Map("a" -> 1)), +(2, "2", Array(3, 4, 5), Map("b" -> 2))) +.toDF("a", "b", "c", "d") + .select('a, 'b, 'c, 'd, struct('a, 'b).as("e")) + df.createTempView("v") -val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a") -val e = intercept[SparkException] { - val plan = -new HiveScriptTransformationExec( - input = 
Seq(rowsDf.col("a").expr), - script = "some_non_existent_command", - output = Seq(AttributeReference("a", StringType)()), - child = rowsDf.queryExecution.sparkPlan, - ioschema = noSerdeIOSchema) - SparkPlanTest.executePlan(plan, hiveContext) + // Hive serde support ArrayType/MapType/StructType as input and output data type + checkAnswer( +df, +(child: SparkPlan) => createScriptTransformationExec( + input = Seq( +df.col("c").expr, +df.col("d").expr, +df.col("e").expr), + script = "cat", + output = Seq( +AttributeReference("c", ArrayType(IntegerType))(), +AttributeReference("d", MapType(StringType, IntegerType))(), +AttributeReference("e", StructType( + Seq( +StructField("col1", IntegerType, false), +StructField("col2", StringType, true()), + child = child, + ioschema = hiveIOSchema +), +df.select('c, 'd, 'e).collect()) } -assert(e.getMessage.contains("Subprocess exited with status")) -assert(uncaughtExceptionHandler.exception.isEmpty) } - test("SPARK-30973: TRANSFORM should wait for the termination of the script (with serde)") { + test("SPARK-32106: TRANSFORM supports complex data types end to end (hive serde) ") { Review comment: Done
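The review above notes that in Hive's default serde mode, a TRANSFORM without an explicit output schema is projected to two columns, `key: String` and `value: String`. A rough Python sketch of that schema-less projection follows; this is an illustration of the convention the comment describes, not Spark's actual implementation, and the function name is made up:

```python
def schema_less_project(script_output_line: str):
    """Split one tab-delimited line of script output the way Hive's
    schema-less TRANSFORM does: the first field becomes `key`, and the
    remainder of the line (including any further tabs) becomes `value`."""
    key, sep, value = script_output_line.partition("\t")
    # A line with no tab yields the whole line as key and a NULL value
    # (a reasonable reading of the convention, stated here as an assumption).
    return (key, value if sep else None)

# 'cat' echoes its input, so a three-column input row comes back with
# column 1 as the key and columns 2..n collapsed into the value.
schema_less_project("1\ttrue\tSpark SQL")
```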
[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
SparkQA commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662781194 **[Test build #126369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126369/testReport)** for PR 29196 at commit [`5b20b65`](https://github.com/apache/spark/commit/5b20b650bf974f0d47822930d0c386038e495ccc). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins commented on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662781036
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459174654 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala ## @@ -33,14 +32,13 @@ import org.apache.hadoop.hive.serde2.objectinspector._ import org.apache.hadoop.io.Writable import org.apache.spark.TaskContext -import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema import org.apache.spark.sql.execution._ import org.apache.spark.sql.hive.HiveInspectors import org.apache.spark.sql.hive.HiveShim._ -import org.apache.spark.sql.types.DataType -import org.apache.spark.util.{CircularBuffer, RedirectThread, Utils} +import org.apache.spark.sql.types.{DataType, StringType} Review comment: Done
[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662781049
[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
SparkQA commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662780783 **[Test build #126369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126369/testReport)** for PR 29196 at commit [`5b20b65`](https://github.com/apache/spark/commit/5b20b650bf974f0d47822930d0c386038e495ccc).
[GitHub] [spark] SparkQA commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
SparkQA commented on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662780801 **[Test build #126370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126370/testReport)** for PR 29193 at commit [`e54f221`](https://github.com/apache/spark/commit/e54f22186a1f82ac9704f6644e057dadfc285b86).
[GitHub] [spark] srowen commented on a change in pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
srowen commented on a change in pull request #29196: URL: https://github.com/apache/spark/pull/29196#discussion_r459174022 ## File path: core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala ## @@ -309,14 +311,18 @@ class HistoryServerSuite extends SparkFunSuite with BeforeAndAfter with Matchers val urlsThroughKnox = responseThroughKnox \\ "@href" map (_.toString) val siteRelativeLinksThroughKnox = urlsThroughKnox filter (_.startsWith("/")) -all (siteRelativeLinksThroughKnox) should startWith (knoxBaseUrl) +for (link <- siteRelativeLinksThroughKnox) { + link should startWith (knoxBaseUrl) +} val directRequest = mock[HttpServletRequest] val directResponse = page.render(directRequest) val directUrls = directResponse \\ "@href" map (_.toString) val directSiteRelativeLinks = directUrls filter (_.startsWith("/")) -all (directSiteRelativeLinks) should not startWith (knoxBaseUrl) +for (link <- directSiteRelativeLinks) { Review comment: Had to change up the scalatest syntax in some `all(...) should ...` cases to get it to keep working in 3.2.0+ ; should be equivalent ## File path: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ## @@ -727,7 +727,7 @@ class JavaSparkContext(val sc: SparkContext) extends Closeable { * @note This does not necessarily mean the caching or computation was successful. */ def getPersistentRDDs: JMap[java.lang.Integer, JavaRDD[_]] = { -sc.getPersistentRDDs.mapValues(s => JavaRDD.fromRDD(s)) +sc.getPersistentRDDs.mapValues(s => JavaRDD.fromRDD(s)).toMap Review comment: This was another test fix for 2.13 I picked up along the way ## File path: pom.xml ## @@ -892,7 +907,25 @@ org.scalatest scalatest_${scala.binary.version} -3.0.8 +3.2.0 +test + + +org.scalatestplus + scalatestplus-scalacheck_${scala.binary.version} +3.1.0.0-RC2 Review comment: These lib-specific integrations are apparently separate now, so are added to the test build.
[GitHub] [spark] wypoon commented on pull request #29193: [SPARK-32003][CORE][3.0] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
wypoon commented on pull request #29193: URL: https://github.com/apache/spark/pull/29193#issuecomment-662780377 retest this please
[GitHub] [spark] srowen opened a new pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
srowen opened a new pull request #29196: URL: https://github.com/apache/spark/pull/29196 ### What changes were proposed in this pull request? Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the new location of scalatest classes. ### Why are the changes needed? 3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility. ### Does this PR introduce _any_ user-facing change? No, only affects tests. ### How was this patch tested? Existing tests.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459173720 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -0,0 +1,146 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW t AS SELECT * FROM VALUES +('1', true, unhex('537061726B2053514C'), tinyint(1), 1, smallint(100), bigint(1), float(1.0), 1.0, Decimal(1.0), timestamp('1997-01-02'), date('2000-04-01')), +('2', false, unhex('537061726B2053514C'), tinyint(2), 2, smallint(200), bigint(2), float(2.0), 2.0, Decimal(2.0), timestamp('1997-01-02 03:04:05'), date('2000-04-02')), +('3', true, unhex('537061726B2053514C'), tinyint(3), 3, smallint(300), bigint(3), float(3.0), 3.0, Decimal(3.0), timestamp('1997-02-10 17:32:01-08'), date('2000-04-03')) +AS t(a, b, c, d, e, f, g, h, i, j, k, l); + +SELECT TRANSFORM(a) +USING 'cat' AS (a) +FROM t; + +-- with non-exist command +SELECT TRANSFORM(a) +USING 'some_non_existent_command' AS (a) +FROM t; + +-- with non-exist file +SELECT TRANSFORM(a) +USING 'python some_non_existent_file' AS (a) +FROM t; + +-- common supported data types between no serde and serde transform +SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( +SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l) +USING 'cat' AS ( +a string, +b boolean, +c binary, +d tinyint, +e int, +f smallint, +g long, +h float, +i double, +j decimal(38, 18), +k timestamp, +l date) +FROM t +) tmp; + +-- common supported data types between no serde and serde transform +SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( +SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l) +USING 'cat' AS ( +a string, +b string, +c string, +d string, +e string, +f string, +g string, +h string, +i string, +j string, +k string, +l string) +FROM t +) tmp; + +-- SPARK-32388 handle schema less +SELECT TRANSFORM(a) +USING 'cat' +FROM t; + +SELECT TRANSFORM(a, b) +USING 'cat' +FROM t; + +SELECT TRANSFORM(a, b, c) +USING 'cat' +FROM t; + 
+-- return null when return string incompatible (no serde) +SELECT TRANSFORM(a, b, c, d, e, f, g, h, i) +USING 'cat' AS (a int, b short, c long, d byte, e float, f double, g decimal(38, 18), h date, i timestamp) +FROM VALUES +('a','','1231a','a','213.21a','213.21a','0a.21d','2000-04-01123','1997-0102 00:00:') tmp(a, b, c, d, e, f, g, h, i); + +-- SPARK-28227: transform can't run with aggregation +SELECT TRANSFORM(b, max(a), sum(f)) +USING 'cat' AS (a, b) +FROM t +GROUP BY b; + +-- transform use MAP +MAP a, b USING 'cat' AS (a, b) FROM t; + +-- transform use REDUCE +REDUCE a, b USING 'cat' AS (a, b) FROM t; + +-- transform with defined row format delimit +SELECT TRANSFORM(a, b, c, null) +ROW FORMAT DELIMITED +FIELDS TERMINATED BY '|' +LINES TERMINATED BY '\n' +NULL DEFINED AS 'NULL' +USING 'cat' AS (a, b, c, d) +ROW FORMAT DELIMITED +FIELDS TERMINATED BY '|' +LINES TERMINATED BY '\n' +NULL DEFINED AS 'NULL' +FROM t; + + +SELECT TRANSFORM(a, b, c, null) +ROW FORMAT DELIMITED +FIELDS TERMINATED BY '|' +LINES TERMINATED BY '\n' +NULL DEFINED AS 'NULL' +USING 'cat' AS (d) +ROW FORMAT DELIMITED +FIELDS TERMINATED BY '||' +LINES TERMINATED BY '\n' +NULL DEFINED AS 'NULL' +FROM t; + +-- SPARK-31937 transform with defined row format delimit Review comment: > This JIRA is related to this query? I read it though, I'm not sure about the relationship. What kind of exceptions does this query throws? Test for support Array/Map/Struct Remove now and add it in that pr:
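The test file quoted above exercises the "return null when return string incompatible (no serde)" case: a script output field that cannot be converted to the declared column type becomes NULL rather than failing the query. A minimal Python sketch of that lenient-cast behavior, using a made-up helper name and shown only as an illustration of the semantics the test checks:

```python
def lenient_cast_int(field: str):
    """Mimic the lenient no-serde TRANSFORM conversion described above:
    a field that cannot be parsed as the declared type (int here)
    yields NULL (None) instead of raising an error."""
    try:
        return int(field)
    except ValueError:
        return None

# 'a', '1231a', and '' all fail the int conversion and so become NULL.
[lenient_cast_int(s) for s in ["42", "1231a", ""]]
```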
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459173424 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -0,0 +1,146 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW t AS SELECT * FROM VALUES +('1', true, unhex('537061726B2053514C'), tinyint(1), 1, smallint(100), bigint(1), float(1.0), 1.0, Decimal(1.0), timestamp('1997-01-02'), date('2000-04-01')), +('2', false, unhex('537061726B2053514C'), tinyint(2), 2, smallint(200), bigint(2), float(2.0), 2.0, Decimal(2.0), timestamp('1997-01-02 03:04:05'), date('2000-04-02')), +('3', true, unhex('537061726B2053514C'), tinyint(3), 3, smallint(300), bigint(3), float(3.0), 3.0, Decimal(3.0), timestamp('1997-02-10 17:32:01-08'), date('2000-04-03')) +AS t(a, b, c, d, e, f, g, h, i, j, k, l); + +SELECT TRANSFORM(a) +USING 'cat' AS (a) +FROM t; + +-- with non-exist command +SELECT TRANSFORM(a) +USING 'some_non_existent_command' AS (a) +FROM t; + +-- with non-exist file +SELECT TRANSFORM(a) +USING 'python some_non_existent_file' AS (a) +FROM t; + +-- common supported data types between no serde and serde transform +SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( +SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l) +USING 'cat' AS ( +a string, +b boolean, +c binary, +d tinyint, +e int, +f smallint, +g long, +h float, +i double, +j decimal(38, 18), +k timestamp, +l date) +FROM t +) tmp; + +-- common supported data types between no serde and serde transform +SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( +SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l) +USING 'cat' AS ( +a string, +b string, +c string, +d string, +e string, +f string, +g string, +h string, +i string, +j string, +k string, +l string) +FROM t +) tmp; + +-- SPARK-32388 handle schema less +SELECT TRANSFORM(a) +USING 'cat' +FROM t; + +SELECT TRANSFORM(a, b) +USING 'cat' +FROM t; + +SELECT TRANSFORM(a, b, c) +USING 'cat' +FROM t; + 
+-- return null when return string incompatible (no serde) +SELECT TRANSFORM(a, b, c, d, e, f, g, h, i) +USING 'cat' AS (a int, b short, c long, d byte, e float, f double, g decimal(38, 18), h date, i timestamp) +FROM VALUES +('a','','1231a','a','213.21a','213.21a','0a.21d','2000-04-01123','1997-0102 00:00:') tmp(a, b, c, d, e, f, g, h, i); + +-- SPARK-28227: transform can't run with aggregation +SELECT TRANSFORM(b, max(a), sum(f)) +USING 'cat' AS (a, b) +FROM t +GROUP BY b; + +-- transform use MAP +MAP a, b USING 'cat' AS (a, b) FROM t; + +-- transform use REDUCE +REDUCE a, b USING 'cat' AS (a, b) FROM t; + +-- transform with defined row format delimit +SELECT TRANSFORM(a, b, c, null) +ROW FORMAT DELIMITED +FIELDS TERMINATED BY '|' +LINES TERMINATED BY '\n' +NULL DEFINED AS 'NULL' +USING 'cat' AS (a, b, c, d) +ROW FORMAT DELIMITED +FIELDS TERMINATED BY '|' +LINES TERMINATED BY '\n' +NULL DEFINED AS 'NULL' +FROM t; + + Review comment: Done
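The `ROW FORMAT DELIMITED ... FIELDS TERMINATED BY '|' ... NULL DEFINED AS 'NULL'` clauses in the test above control how rows are serialized to the script's stdin and parsed back from its stdout. A minimal Python sketch of that wire format; the helper names are hypothetical, and the real Spark/Hive serializers additionally handle escaping, collection and map-key delimiters, and more:

```python
FIELD_DELIM = "|"   # FIELDS TERMINATED BY '|'
NULL_TOKEN = "NULL"  # NULL DEFINED AS 'NULL'

def serialize_row(fields):
    """Render one row as the script sees it: fields joined by the field
    terminator, with None written as the configured NULL token."""
    return FIELD_DELIM.join(NULL_TOKEN if f is None else str(f) for f in fields)

def deserialize_row(line):
    """Parse one line of script output back into fields, mapping the
    NULL token back to None."""
    return [None if f == NULL_TOKEN else f for f in line.split(FIELD_DELIM)]

# With a pass-through script like 'cat', serialize -> deserialize round-trips.
line = serialize_row(["1", "true", "Spark SQL", None])
print(line)                   # 1|true|Spark SQL|NULL
print(deserialize_row(line))  # ['1', 'true', 'Spark SQL', None]
```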
[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
maropu commented on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662779093 `bpchar` is internally used in postgresql (https://www.postgresql.org/docs/current/typeconv-query.html), but users can use it in DDL; ``` postgres=# create table t (v bpchar); CREATE TABLE ``` So, I think it's okay to add it for better interoperability.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459172282 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -744,8 +744,29 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging selectClause.hints.asScala.foldRight(withWindow)(withHints) } + // Decode and input/output format. + type Format = (Seq[(String, String)], Option[String], Seq[(String, String)], Option[String]) Review comment: Done
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable
AmplabJenkins removed a comment on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-662778384
[GitHub] [spark] AmplabJenkins commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable
AmplabJenkins commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-662778384
[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
maropu commented on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662778415 Could you add tests in `PostgresIntegrationSuite` and then check that they pass on your local env? (Note: our testing framework, Jenkins, does not run it.)
[GitHub] [spark] SparkQA removed a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable
SparkQA removed a comment on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-662677438 **[Test build #126359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126359/testReport)** for PR 29067 at commit [`e58783e`](https://github.com/apache/spark/commit/e58783ec28fc839e5559ca8c2888aa54d3f1f5de).
[GitHub] [spark] SparkQA commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable
SparkQA commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-662778108 **[Test build #126359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126359/testReport)** for PR 29067 at commit [`e58783e`](https://github.com/apache/spark/commit/e58783ec28fc839e5559ca8c2888aa54d3f1f5de).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
 * `case class ColumnarToRowExec(child: SparkPlan) extends ColumnarToRowTransition with CodegenSupport `
[GitHub] [spark] maropu commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
maropu commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459169944 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ## @@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { val provider = maybeV2Provider.get val sessionOptions = DataSourceV2Utils.extractSessionConfigs( provider, df.sparkSession.sessionState.conf) - val options = sessionOptions ++ extraOptions + val options = sessionOptions ++ extraOptions.originalMap Review comment: not `toMap` but `originalMap`?
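Editor's note on the diff above: Scala's `++` on immutable maps is right-biased, so on duplicate keys the right operand wins. That is why the order `sessionOptions ++ extraOptions.originalMap` lets user-supplied writer options override session-derived defaults. A minimal sketch of that behavior (the option names below are hypothetical, not taken from the PR):

```scala
object MapMergeSketch {
  def main(args: Array[String]): Unit = {
    // Session-level defaults, e.g. extracted from the session config.
    val sessionOptions = Map("fetchsize" -> "100", "isolationLevel" -> "READ_COMMITTED")
    // Options the user passed explicitly to the writer.
    val extraOptions = Map("fetchsize" -> "500")

    // `++` is right-biased: the right operand's value wins on key collision.
    val merged = sessionOptions ++ extraOptions

    println(merged("fetchsize"))      // the user-supplied value, "500"
    println(merged("isolationLevel")) // the session default survives
  }
}
```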
[GitHub] [spark] maropu closed pull request #29189: [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server
maropu closed pull request #29189: URL: https://github.com/apache/spark/pull/29189
[GitHub] [spark] maropu commented on pull request #29189: [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server
maropu commented on pull request #29189: URL: https://github.com/apache/spark/pull/29189#issuecomment-662773741 Thanks! Merged to master.
[GitHub] [spark] maropu commented on a change in pull request #29189: [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server
maropu commented on a change in pull request #29189: URL: https://github.com/apache/spark/pull/29189#discussion_r459166990 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -218,9 +218,7 @@ private[hive] class SparkExecuteStatementOperation( execute() } } catch { -case e: HiveSQLException => - setOperationException(e) - log.error("Error running hive query: ", e) Review comment: Ah, I see. Nice catch.
[GitHub] [spark] SparkQA commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
SparkQA commented on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662772347 **[Test build #126368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126368/testReport)** for PR 29056 at commit [`ffa0603`](https://github.com/apache/spark/commit/ffa0603b8690863a6b83bcade5b6d7a1a29680ab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
AmplabJenkins commented on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662772419
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
AmplabJenkins removed a comment on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662772419
[GitHub] [spark] SparkQA removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
SparkQA removed a comment on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662770223 **[Test build #126368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126368/testReport)** for PR 29056 at commit [`ffa0603`](https://github.com/apache/spark/commit/ffa0603b8690863a6b83bcade5b6d7a1a29680ab).
[GitHub] [spark] AmplabJenkins commented on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC
AmplabJenkins commented on pull request #29168: URL: https://github.com/apache/spark/pull/29168#issuecomment-662771853
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC
AmplabJenkins removed a comment on pull request #29168: URL: https://github.com/apache/spark/pull/29168#issuecomment-662771853
[GitHub] [spark] SparkQA removed a comment on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC
SparkQA removed a comment on pull request #29168: URL: https://github.com/apache/spark/pull/29168#issuecomment-662662506 **[Test build #126356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126356/testReport)** for PR 29168 at commit [`5a64c63`](https://github.com/apache/spark/commit/5a64c638cf5bda64796bb7dc3c3b571100ef4060).
[GitHub] [spark] SparkQA commented on pull request #29168: [WIP][SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC
SparkQA commented on pull request #29168: URL: https://github.com/apache/spark/pull/29168#issuecomment-662771464 **[Test build #126356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126356/testReport)** for PR 29168 at commit [`5a64c63`](https://github.com/apache/spark/commit/5a64c638cf5bda64796bb7dc3c3b571100ef4060).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
SparkQA commented on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662770223 **[Test build #126368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126368/testReport)** for PR 29056 at commit [`ffa0603`](https://github.com/apache/spark/commit/ffa0603b8690863a6b83bcade5b6d7a1a29680ab).
[GitHub] [spark] dongjoon-hyun commented on pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
dongjoon-hyun commented on pull request #29062: URL: https://github.com/apache/spark/pull/29062#issuecomment-662769652 cc @peter-toth, too.
[GitHub] [spark] maropu commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
maropu commented on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662769125 Note: If nobody has comments, since this is just a doc improvement, I will merge it in a few days.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
AmplabJenkins removed a comment on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662768893
[GitHub] [spark] AmplabJenkins commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
AmplabJenkins commented on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662768893
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
AmplabJenkins removed a comment on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-656144859 Can one of the admins verify this patch?
[GitHub] [spark] maropu commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
maropu commented on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-662768621 ok to test
[GitHub] [spark] maropu commented on pull request #29152: [SPARK-32356][SQL] Forbid create view with null type
maropu commented on pull request #29152: URL: https://github.com/apache/spark/pull/29152#issuecomment-662767785 Looks okay to me, but (this is trivial, though) we might need to choose one of these actions:
 - change the JIRA issue type from `improvement` to `bug`, then backport this to `branch-3.0`/`branch-2.4`, or
 - update the migration guide for the behaviour change

cc: @dongjoon-hyun
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity
AmplabJenkins removed a comment on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-662767378
[GitHub] [spark] AmplabJenkins commented on pull request #29159: [SPARK-32310][ML][PySpark][3.0] ML params default value parity
AmplabJenkins commented on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-662767378