[GitHub] [spark] SparkQA commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
URL: https://github.com/apache/spark/pull/26533#issuecomment-554254541

**[Test build #113859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113859/testReport)** for PR 26533 at commit [`a95e8bf`](https://github.com/apache/spark/commit/a95e8bf68981fd182dc442f323093d845604f659).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26102: [SPARK-29448][SQL] Support the `INTERVAL` type by Parquet datasource
URL: https://github.com/apache/spark/pull/26102#discussion_r346698272

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala

```diff
@@ -207,7 +208,16 @@ class ParquetWriteSupport extends WriteSupport[InternalRow] with Logging {
       case t: UserDefinedType[_] => makeWriter(t.sqlType)

-      // TODO Adds IntervalType support
+      case CalendarIntervalType =>
+        (row: SpecializedGetters, ordinal: Int) =>
+          val interval = row.getInterval(ordinal)
+          val buf = ByteBuffer.wrap(reusableBuffer)
+          buf.order(ByteOrder.LITTLE_ENDIAN)
+            .putInt((interval.milliseconds()).toInt)
```

Review comment: @MaxGekk the doc says:

> three little-endian unsigned integers

What happens if we set negative values for some parts of the interval and those negative values are written here?
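The concern raised above can be illustrated outside Spark entirely; this is a hypothetical sketch using Python's `struct` module, not Spark or Parquet code. `putInt` (like `struct.pack` with a signed format) happily encodes a negative value, but a reader that interprets the same four little-endian bytes as unsigned recovers a very different number.

```python
import struct

# Hypothetical sketch (not Spark code) of the concern above: Parquet's
# INTERVAL physical type stores three little-endian *unsigned* 32-bit
# integers, yet the writer packs a signed Int.  Writing a negative value
# and re-reading the same four bytes as unsigned changes its meaning.
raw = struct.pack("<i", -1)           # signed little-endian, e.g. a negative milliseconds part
reread = struct.unpack("<I", raw)[0]  # what an unsigned-integer reader sees
print(reread)                         # 4294967295, not -1
```

So unless the writer rejects or normalizes negative interval parts, readers following the spec will silently decode them as huge positive values.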
[GitHub] [spark] BryanCutler commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
URL: https://github.com/apache/spark/pull/26533#issuecomment-554254028

Oh, I saw you just added it as a property in the pom. Hopefully that should do the trick.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554253786

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113845/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554253786

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113845/ Test FAILed.
[GitHub] [spark] BryanCutler commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
URL: https://github.com/apache/spark/pull/26533#issuecomment-554253576

Right, I think for JDK >= 9 this needs to be set to "-Dio.netty.tryReflectionSetAccessible=true". This was due to some Netty API that Arrow started using. I can look into it and try to figure out what we should do for Spark.
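For context, a hedged sketch of how that system property is usually passed to Spark JVMs; where Spark should ultimately set it (a pom property, test JVM args, or the launch scripts) is exactly what this PR is still working out, and `your-app.jar` below is a placeholder.

```shell
# Sketch only: enable Netty's reflective access on JDK 9+ for the
# Arrow code paths.  The flag name comes from the comment above.
export SPARK_SUBMIT_OPTS="-Dio.netty.tryReflectionSetAccessible=true"

# Or per application, on both driver and executors:
./bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  --conf "spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  your-app.jar
```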
[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554253778

Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554253778

Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554253366

**[Test build #113845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113845/testReport)** for PR 26416 at commit [`a3a0c4a`](https://github.com/apache/spark/commit/a3a0c4acfdd602ad0a3566cc49ac38b707603fe7).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554213947

**[Test build #113845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113845/testReport)** for PR 26416 at commit [`a3a0c4a`](https://github.com/apache/spark/commit/a3a0c4acfdd602ad0a3566cc49ac38b707603fe7).
[GitHub] [spark] AmplabJenkins commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
URL: https://github.com/apache/spark/pull/26533#issuecomment-554252945

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
URL: https://github.com/apache/spark/pull/26533#issuecomment-554252950

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18728/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
URL: https://github.com/apache/spark/pull/26533#issuecomment-554252945

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
URL: https://github.com/apache/spark/pull/26533#issuecomment-554252950

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18728/ Test PASSed.
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
URL: https://github.com/apache/spark/pull/26420#discussion_r346696563

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala

```diff
@@ -157,31 +159,61 @@ abstract class AggregationIterator(
       inputAttributes: Seq[Attribute]): (InternalRow, InternalRow) => Unit = {
     val joinedRow = new JoinedRow
     if (expressions.nonEmpty) {
-      val mergeExpressions = functions.zip(expressions).flatMap {
-        case (ae: DeclarativeAggregate, expression) =>
-          expression.mode match {
+      val filterExpressions = expressions.map(_.filter)
+      val notExistsFilter = !filterExpressions.exists(_ != None)
+      var isFinalOrMerge = false
+      val mergeExpressions = functions.zipWithIndex.collect {
+        case (ae: DeclarativeAggregate, i) =>
+          expressions(i).mode match {
             case Partial | Complete => ae.updateExpressions
-            case PartialMerge | Final => ae.mergeExpressions
+            case PartialMerge | Final =>
+              isFinalOrMerge = true
+              ae.mergeExpressions
           }
         case (agg: AggregateFunction, _) => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
       }
       val updateFunctions = functions.zipWithIndex.collect {
         case (ae: ImperativeAggregate, i) =>
           expressions(i).mode match {
-            case Partial | Complete =>
+            case Partial | Complete if filterExpressions(i).isDefined =>
+              (buffer: InternalRow, row: InternalRow) =>
+                if (predicates(i).eval(row)) { ae.update(buffer, row) }
+            case Partial | Complete if filterExpressions(i).isEmpty =>
               (buffer: InternalRow, row: InternalRow) => ae.update(buffer, row)
             case PartialMerge | Final =>
               (buffer: InternalRow, row: InternalRow) => ae.merge(buffer, row)
           }
       }.toArray

       // This projection is used to merge buffer values for all expression-based aggregates.
       val aggregationBufferSchema = functions.flatMap(_.aggBufferAttributes)
-      val updateProjection =
-        newMutableProjection(mergeExpressions, aggregationBufferSchema ++ inputAttributes)
+      val updateProjection = newMutableProjection(
+        mergeExpressions.flatMap(_.seq), aggregationBufferSchema ++ inputAttributes)

       (currentBuffer: InternalRow, row: InternalRow) => {
         // Process all expression-based aggregate functions.
-        updateProjection.target(currentBuffer)(joinedRow(currentBuffer, row))
+        if (notExistsFilter || isFinalOrMerge) {
```

Review comment: Good idea!
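The semantics the patch above implements can be sketched in plain Python (this is an analogy, not Spark internals): in Partial/Complete mode each input row reaches an aggregate's `update` only if its FILTER predicate passes, while PartialMerge/Final merges pre-aggregated buffers, where the filter no longer applies.

```python
# Minimal sketch of ANSI SQL's filtered aggregation, e.g.
#   SELECT SUM(x) FILTER (WHERE x > 0) FROM t
rows = [1, -2, 3, -4]
predicate = lambda r: r > 0   # the FILTER clause's predicate

buffer = 0                    # the aggregation buffer
for r in rows:
    if predicate(r):          # rows rejected by the filter are skipped
        buffer += r           # update() only for rows that pass

print(buffer)                 # 4
```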
[GitHub] [spark] cloud-fan commented on issue #26409: [SPARK-29655][SQL] Read bucketed tables obeys spark.sql.shuffle.partitions
URL: https://github.com/apache/spark/pull/26409#issuecomment-554252095

thanks, merging to master!
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26102: [SPARK-29448][SQL] Support the `INTERVAL` type by Parquet datasource
URL: https://github.com/apache/spark/pull/26102#discussion_r346695819

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala

```diff
@@ -498,10 +498,8 @@ case class DataSource(
       outputColumnNames: Seq[String],
       physicalPlan: SparkPlan): BaseRelation = {
     val outputColumns = DataWritingCommand.logicalPlanOutputWithNames(data, outputColumnNames)
-    if (outputColumns.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
```

Review comment: and Python and R need a proper conversion for both reading and writing as well.
[GitHub] [spark] cloud-fan closed pull request #26409: [SPARK-29655][SQL] Read bucketed tables obeys spark.sql.shuffle.partitions
URL: https://github.com/apache/spark/pull/26409
[GitHub] [spark] cloud-fan commented on a change in pull request #26102: [SPARK-29448][SQL] Support the `INTERVAL` type by Parquet datasource
URL: https://github.com/apache/spark/pull/26102#discussion_r344585132

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala

```diff
@@ -498,10 +498,8 @@ case class DataSource(
       outputColumnNames: Seq[String],
      physicalPlan: SparkPlan): BaseRelation = {
     val outputColumns = DataWritingCommand.logicalPlanOutputWithNames(data, outputColumnNames)
-    if (outputColumns.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
```

Review comment: interval type is kind of an internal type for now. It's a big decision whether we can read/write it from/to data sources.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26534: [SPARK-29343][SQL][FOLLOW-UP] Remove AVG from order-insensitive aggregates in EliminateSorts
URL: https://github.com/apache/spark/pull/26534#issuecomment-554251312

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18727/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26534: [SPARK-29343][SQL][FOLLOW-UP] Remove AVG from order-insensitive aggregates in EliminateSorts
URL: https://github.com/apache/spark/pull/26534#issuecomment-554251307

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26534: [SPARK-29343][SQL][FOLLOW-UP] Remove AVG from order-insensitive aggregates in EliminateSorts
URL: https://github.com/apache/spark/pull/26534#issuecomment-554251307

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26534: [SPARK-29343][SQL][FOLLOW-UP] Remove AVG from order-insensitive aggregates in EliminateSorts
URL: https://github.com/apache/spark/pull/26534#issuecomment-554251312

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18727/ Test PASSed.
[GitHub] [spark] yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
URL: https://github.com/apache/spark/pull/26491#discussion_r346694871

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala

```diff
@@ -547,14 +478,16 @@ object IntervalUtils {
         case ' ' =>
           fraction /= NANOS_PER_MICROS.toInt
           state = TRIM_BEFORE_UNIT
-        case _ => return null
+        case _ if '0' <= b && b <= '9' =>
+          throwIAE(s"invalid value fractional part '$fraction$nextWord' out of range")
```

Review comment: I'd improve this.
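The behavior change under discussion, raising on an out-of-range fractional part instead of returning null, can be sketched in plain Python (a hypothetical analogue, not the Spark parser; the helper name `parse_fraction` is invented for illustration):

```python
# Sketch of the check above: a fractional-seconds part with more than
# 9 digits cannot fit in nanosecond precision, so the parser should
# raise a descriptive error rather than silently return null.
def parse_fraction(digits: str) -> int:
    if len(digits) > 9:
        raise ValueError(f"invalid fractional part '{digits}' out of range")
    # right-pad to nanoseconds: ".5" seconds == 500_000_000 ns
    return int(digits) * 10 ** (9 - len(digits))

print(parse_fraction("5"))   # 500000000
```

Raising here gives the user an actionable message, which is what the "I'd improve this" follow-up is about.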
[GitHub] [spark] SparkQA commented on issue #26534: [SPARK-29343][SQL][FOLLOW-UP] Remove AVG from order-insensitive aggregates in EliminateSorts
URL: https://github.com/apache/spark/pull/26534#issuecomment-554250511

**[Test build #113858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113858/testReport)** for PR 26534 at commit [`a44f218`](https://github.com/apache/spark/commit/a44f21890f6f52e80a80f31a0437e0ac99fe6b98).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest
URL: https://github.com/apache/spark/pull/26279#discussion_r346694150

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala

```diff
@@ -343,6 +344,87 @@ abstract class ExplodeBase extends UnaryExpression with CollectionGenerator with
   }
 }

+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Separates the elements of array `expr` into multiple rows recursively.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(10, 20));
+       10
+       20
+      > SELECT _FUNC_(a) FROM VALUES (array(1,2)), (array(3,4)) AS v1(a);
+       1
+       2
+       3
+       4
+      > SELECT _FUNC_(a) FROM VALUES (array(array(1,2), array(3,4))) AS v1(a);
+       1
+       2
+       3
+       4
+  """,
+  since = "3.0.0")
+case class UnNest(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def prettyName: String = "unnest"
+
+  override def elementSchema: StructType = {
+    new StructType().add(prettyName, getLeafDataType(child.dataType), true)
+  }
+
+  // TODO: multidimensional arrays must have array expressions with matching dimensions
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(_: NullType, _) => TypeCheckResult.TypeCheckSuccess
+    case _: ArrayType =>
+      var currentType = child.dataType
+      while (currentType.isInstanceOf[ArrayType]) {
+        val arrayType = currentType.asInstanceOf[ArrayType]
+        if (arrayType.containsNull && !arrayType.elementType.isInstanceOf[AtomicType]) {
+          return TypeCheckResult.TypeCheckFailure("multidimensional arrays must not contains nulls")
+        }
+        currentType = getElementTypeOrItself(currentType)
+      }
+      TypeUtils.checkForSameTypeInputExpr(child.children.map(_.dataType), s"function $prettyName")
+
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function unnest should be array type not ${child.dataType.catalogString}")
+  }
+
+  @scala.annotation.tailrec
+  private def getLeafDataType(typ: DataType): DataType = typ match {
+    case ArrayType(et, _) => getLeafDataType(et)
+    case at => at
+  }
+
+  private def getElementTypeOrItself(typ: DataType): DataType = typ match {
+    case ArrayType(et, _) => et
+    case _ => typ
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
+    var inputArray = child.eval(input).asInstanceOf[ArrayData]
+    if (inputArray == null) {
+      Nil
+    } else {
+      val rows = ArrayBuffer[InternalRow]()
+      var currentType = child.dataType
+      while (currentType.isInstanceOf[ArrayType]) {
+        val currentElementType = getElementTypeOrItself(currentType)
+        if (!currentElementType.isInstanceOf[ArrayType]) {
+          inputArray.foreach(currentElementType, (_, e) => rows += InternalRow(e))
```

Review comment: Can we avoid Scala's `foreach` in such a critical path (see https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex)? We can at least switch to a regular for loop instead.
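What `unnest` computes can be sketched as a few lines of plain Python (a hypothetical analogue, not the Spark implementation): recursively descend into nested arrays and emit one output row per leaf element.

```python
# Recursive flattening, the essence of the proposed unnest generator:
# each leaf of an arbitrarily nested list becomes its own output row.
def unnest(value):
    if isinstance(value, list):
        for element in value:
            yield from unnest(element)
    else:
        yield value

print(list(unnest([[1, 2], [3, 4]])))  # [1, 2, 3, 4]
```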
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest
URL: https://github.com/apache/spark/pull/26279#discussion_r346693625

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala

```diff
@@ -343,6 +344,87 @@ abstract class ExplodeBase extends UnaryExpression with CollectionGenerator with
   }
 }

+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Separates the elements of array `expr` into multiple rows recursively.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(10, 20));
+       10
+       20
+      > SELECT _FUNC_(a) FROM VALUES (array(1,2)), (array(3,4)) AS v1(a);
+       1
+       2
+       3
+       4
+      > SELECT _FUNC_(a) FROM VALUES (array(array(1,2), array(3,4))) AS v1(a);
+       1
+       2
+       3
+       4
+  """,
+  since = "3.0.0")
+case class UnNest(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
```

Review comment: Can we implement codegen as well?
[GitHub] [spark] cloud-fan commented on issue #26534: [SPARK-29343][SQL][FOLLOW-UP] Remove AVG from order-insensitive aggregates in EliminateSorts
URL: https://github.com/apache/spark/pull/26534#issuecomment-554249765

retest this please
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
URL: https://github.com/apache/spark/pull/26420#discussion_r346693286

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala

```diff
@@ -157,31 +159,61 @@ abstract class AggregationIterator(
       inputAttributes: Seq[Attribute]): (InternalRow, InternalRow) => Unit = {
     val joinedRow = new JoinedRow
     if (expressions.nonEmpty) {
-      val mergeExpressions = functions.zip(expressions).flatMap {
-        case (ae: DeclarativeAggregate, expression) =>
-          expression.mode match {
+      val filterExpressions = expressions.map(_.filter)
+      val notExistsFilter = !filterExpressions.exists(_ != None)
+      var isFinalOrMerge = false
```

Review comment: `isFinalOrMerge` is related to `expressions`.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest
HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest URL: https://github.com/apache/spark/pull/26279#discussion_r346693230 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ## @@ -343,6 +344,87 @@ abstract class ExplodeBase extends UnaryExpression with CollectionGenerator with } } +@ExpressionDescription( + usage = "_FUNC_(expr) - Separates the elements of array `expr` into multiple rows recursively.", + examples = """ +Examples: + > SELECT _FUNC_(array(10, 20)); + 10 + 20 + > SELECT _FUNC_(a) FROM VALUES (array(1,2)), (array(3,4)) AS v1(a); + 1 + 2 + 3 + 4 + > SELECT _FUNC_(a) FROM VALUES (array(array(1,2), array(3,4))) AS v1(a); + 1 + 2 + 3 + 4 + """, + since = "3.0.0") +case class UnNest(child: Expression) extends UnaryExpression with Generator with CodegenFallback { + + override def prettyName: String = "unnest" + + override def elementSchema: StructType = { +new StructType().add(prettyName, getLeafDataType(child.dataType), true) + } + + // TODO: multidimensional arrays must have array expressions with matching dimensions Review comment: what does this mean?
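The doc string's "separates the elements of array `expr` into multiple rows recursively" can be illustrated with a plain-Scala analogue, using lists in place of Spark arrays and rows (this is a sketch of the documented behavior, not the PR's implementation):

```scala
// Recursive flattening over plain Scala lists, mirroring the documented
// unnest behavior: nested arrays are unnested all the way down to leaves.
def flattenDeep(xs: List[Any]): List[Any] = xs.flatMap {
  case nested: List[_] => flattenDeep(nested.asInstanceOf[List[Any]])
  case leaf            => List(leaf)
}

// Matches the third doc example: array(array(1,2), array(3,4)) yields 1, 2, 3, 4.
assert(flattenDeep(List(List(1, 2), List(3, 4))) == List(1, 2, 3, 4))
```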
[GitHub] [spark] cloud-fan commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
cloud-fan commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#discussion_r346692869 ## File path: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ## @@ -129,6 +132,32 @@ class RelationalGroupedDataset protected[sql]( (inputExpr: Expression) => exprToFunc(inputExpr) } + /** + * Returns a `KeyValueGroupedDataset` where the data is grouped by the grouping expressions + * of current `RelationalGroupedDataset`. + * + * @since 3.0.0 + */ + def keyAs[U : Encoder]: KeyValueGroupedDataset[U, T] = { +val keyEncoder = encoderFor[U] +val aliasedGrps = groupingExprs.map { g => + g.transformDown { + case u: UnresolvedAttribute => df.resolve(u.name) + } +}.map(alias) +val additionalCols = aliasedGrps.filter(g => !df.logicalPlan.outputSet.contains(g.toAttribute)) +val qe = Dataset.ofRows( + df.sparkSession, + Project(df.logicalPlan.output ++ additionalCols, df.logicalPlan)).queryExecution Review comment: `groupingAttributes` is private in `KeyValueGroupedDataset`
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest
HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest URL: https://github.com/apache/spark/pull/26279#discussion_r346692879 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ## @@ -343,6 +344,87 @@ abstract class ExplodeBase extends UnaryExpression with CollectionGenerator with } } +@ExpressionDescription( + usage = "_FUNC_(expr) - Separates the elements of array `expr` into multiple rows recursively.", + examples = """ +Examples: + > SELECT _FUNC_(array(10, 20)); + 10 + 20 + > SELECT _FUNC_(a) FROM VALUES (array(1,2)), (array(3,4)) AS v1(a); + 1 + 2 + 3 + 4 + > SELECT _FUNC_(a) FROM VALUES (array(array(1,2), array(3,4))) AS v1(a); + 1 + 2 + 3 + 4 + """, + since = "3.0.0") +case class UnNest(child: Expression) extends UnaryExpression with Generator with CodegenFallback { + + override def prettyName: String = "unnest" Review comment: I think we don't need to override this since it matches the class name
[GitHub] [spark] cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder URL: https://github.com/apache/spark/pull/25990#discussion_r346692667 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/streaming/StreamingWrite.java ## @@ -48,8 +49,10 @@ * * If this method fails (by throwing an exception), the action will fail and no Spark job will be * submitted. + * + * @param writeInfo Information about the RDD that will be written to this data writer */ - StreamingDataWriterFactory createStreamingWriterFactory(); + StreamingDataWriterFactory createStreamingWriterFactory(PhysicalWriteInfo writeInfo); Review comment: for simplicity, we can just call the parameter "info"
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest
HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest URL: https://github.com/apache/spark/pull/26279#discussion_r346692072 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/aggregates_part4.sql ## @@ -404,12 +403,8 @@ -- [SPARK-28382] Array Functions: unnest -- check collation-sensitive matching between grouping expressions --- select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*) --- from unnest(array['a','b']) u(v) --- group by v||'a' order by 1; --- select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*) --- from unnest(array['a','b']) u(v) --- group by v||'a' order by 1; +select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*) from (select unnest(array('a','b')) as v) u group by v||'a' order by 1; +select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*) from (select unnest(array('a','b')) as v) u group by v||'a' order by 1; Review comment: Why should we inline them? Can't we just uncomment the tests?
[GitHub] [spark] cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder URL: https://github.com/apache/spark/pull/25990#discussion_r346691872 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/write/WriteInfoImpl.scala ## @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.connector.write + +import org.apache.spark.sql.types.StructType + +private[sql] case class WriteInfoImpl( +queryId: String, +schema: StructType) extends WriteInfo Review comment: how about `LogicalWriteInfo`
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest
HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest URL: https://github.com/apache/spark/pull/26279#discussion_r346691887 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/join.sql ## @@ -1089,10 +1089,9 @@ order by fault; -- left join unnest(v1ys) as u1(u1y) on u1y = v2y; -- [SPARK-28382] Array Functions: unnest --- select * from --- (values (1, array(10,20)), (2, array(20,30))) as v1(v1x,v1ys) --- left join (values (1, 10), (2, 20)) as v2(v2x,v2y) on v2x = v1x --- left join unnest(v1ys) as u1(u1y) on u1y = v2y; +create or replace temporary view v1 as select * from values (1, array(10,20)), (2, array(20,30)) as v1(v1x,v1ys); +create or replace temporary view v2 as select * from values (1, 10), (2, 20) as v2(v2x,v2y); Review comment: Hm, why is it different from the commented-out version? Does it match PostgreSQL's?
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r346691763 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ## @@ -40,6 +41,7 @@ abstract class AggregationIterator( aggregateAttributes: Seq[Attribute], initialInputBufferOffset: Int, resultExpressions: Seq[NamedExpression], +predicates: HashMap[Int, GenPredicate], Review comment: OK.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest
HyukjinKwon commented on a change in pull request #26279: [SPARK-28382][SQL] Add array function - unnest URL: https://github.com/apache/spark/pull/26279#discussion_r346691561 ## File path: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ## @@ -344,6 +344,7 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSession { } } + Review comment: I would remove unnecessary changes.
[GitHub] [spark] cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder URL: https://github.com/apache/spark/pull/25990#discussion_r346691710 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/PhysicalWriteInfo.java ## @@ -0,0 +1,17 @@ +package org.apache.spark.sql.connector.write; + +import org.apache.spark.annotation.Experimental; +import org.apache.spark.sql.connector.write.streaming.StreamingDataWriterFactory; + +/** + * :: Experimental :: + * This interface contains physical (i.e. RDD) write information that data sources can use when generating a + * {@link DataWriterFactory} or a {@link StreamingDataWriterFactory}. + */ +@Experimental Review comment: we shall use `@Evolving`
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r346691731 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ## @@ -157,31 +159,61 @@ abstract class AggregationIterator( inputAttributes: Seq[Attribute]): (InternalRow, InternalRow) => Unit = { val joinedRow = new JoinedRow if (expressions.nonEmpty) { - val mergeExpressions = functions.zip(expressions).flatMap { -case (ae: DeclarativeAggregate, expression) => - expression.mode match { + val filterExpressions = expressions.map(_.filter) + val notExistsFilter = !filterExpressions.exists(_ != None) Review comment: Good suggestion!
[GitHub] [spark] HyukjinKwon commented on issue #26458: [SPARK-29821] Allow calling non-aggregate SQL functions with column name
HyukjinKwon commented on issue #26458: [SPARK-29821] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26458#issuecomment-554247661 haha actually we left a note https://github.com/apache/spark/blob/6f687691ef6212c125f4023fcd79839ee9244049/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L58-L60
[GitHub] [spark] wangyum commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
wangyum commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-554247101 @HyukjinKwon @BryanCutler It seems the failed tests are related to https://issues.apache.org/jira/browse/ARROW-5412?
[GitHub] [spark] cloud-fan commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
cloud-fan commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346690817 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -598,26 +531,26 @@ object IntervalUtils { } else if (s.matchAt(microsStr, i)) { microseconds = Math.addExact(microseconds, currentValue) i += microsStr.numBytes() -} else return null - case _ => return null +} else throwIAE(s"invalid unit '$nextWord'") + case _ => throwIAE(s"invalid unit '$nextWord'") } } catch { -case _: ArithmeticException => return null +case e: ArithmeticException => throwIAE(e.getMessage, e) } state = UNIT_SUFFIX case UNIT_SUFFIX => b match { case 's' => state = UNIT_END case ' ' => state = TRIM_BEFORE_SIGN -case _ => return null +case _ => throwIAE(s"invalid unit suffix '$nextWord'") Review comment: for error reporting, usually backtracing is necessary. For example, it's better to tell users that `123a` is not valid, instead of just saying `a` is not valid.
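The backtracking cloud-fan describes (report `123a` whole, not just the bad suffix `a`) can be sketched with a hypothetical helper that widens the error span to the surrounding whitespace-delimited token; none of these names come from the PR, and the real parser works over `UTF8String` bytes rather than `String`:

```scala
// Hypothetical helper: given the input and the offset of the first offending
// character, report the entire surrounding token in the error message.
def invalidUnitMessage(input: String, errorPos: Int): String = {
  val start = input.lastIndexOf(' ', errorPos) + 1        // back up to token start
  val end = input.indexOf(' ', errorPos) match {          // forward to token end
    case -1 => input.length
    case e  => e
  }
  s"invalid unit '${input.substring(start, end)}'"
}

// The whole token "123a" is reported, not just the character that failed.
assert(invalidUnitMessage("1 123a", 5) == "invalid unit '123a'")
```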
[GitHub] [spark] viirya commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
viirya commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#discussion_r346690836 ## File path: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ## @@ -129,6 +132,32 @@ class RelationalGroupedDataset protected[sql]( (inputExpr: Expression) => exprToFunc(inputExpr) } + /** + * Returns a `KeyValueGroupedDataset` where the data is grouped by the grouping expressions + * of current `RelationalGroupedDataset`. + * + * @since 3.0.0 + */ + def keyAs[U : Encoder]: KeyValueGroupedDataset[U, T] = { +val keyEncoder = encoderFor[U] +val aliasedGrps = groupingExprs.map { g => + g.transformDown { + case u: UnresolvedAttribute => df.resolve(u.name) + } +}.map(alias) +val additionalCols = aliasedGrps.filter(g => !df.logicalPlan.outputSet.contains(g.toAttribute)) +val qe = Dataset.ofRows( + df.sparkSession, + Project(df.logicalPlan.output ++ additionalCols, df.logicalPlan)).queryExecution Review comment: Will it also break source compatibility?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
AmplabJenkins removed a comment on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab URL: https://github.com/apache/spark/pull/26519#issuecomment-554246861 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113838/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
AmplabJenkins removed a comment on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab URL: https://github.com/apache/spark/pull/26519#issuecomment-554246855 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
SparkQA commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab URL: https://github.com/apache/spark/pull/26519#issuecomment-554246674 **[Test build #113838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113838/testReport)** for PR 26519 at commit [`85f8d69`](https://github.com/apache/spark/commit/85f8d693de63a28642f1f05d2b3930c93ebcbf5c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346690471 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -598,26 +531,26 @@ object IntervalUtils { } else if (s.matchAt(microsStr, i)) { microseconds = Math.addExact(microseconds, currentValue) i += microsStr.numBytes() -} else return null - case _ => return null +} else throwIAE(s"invalid unit '$nextWord'") + case _ => throwIAE(s"invalid unit '$nextWord'") } } catch { -case _: ArithmeticException => return null +case e: ArithmeticException => throwIAE(e.getMessage, e) } state = UNIT_SUFFIX case UNIT_SUFFIX => b match { case 's' => state = UNIT_END case ' ' => state = TRIM_BEFORE_SIGN -case _ => return null +case _ => throwIAE(s"invalid unit suffix '$nextWord'") Review comment: Or, if we change the logic of unit parsing, we can extract the whole token, e.g. `case _ if b >= 'A' && b <= 'z' => unit = s.substring(i, s.numBytes()).subStringIndex(UTF8String.blankString(1), 1)`, then do `unit` case matching and error capture.
[GitHub] [spark] SparkQA removed a comment on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
SparkQA removed a comment on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab URL: https://github.com/apache/spark/pull/26519#issuecomment-554204749 **[Test build #113838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113838/testReport)** for PR 26519 at commit [`85f8d69`](https://github.com/apache/spark/commit/85f8d693de63a28642f1f05d2b3930c93ebcbf5c).
[GitHub] [spark] AmplabJenkins commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
AmplabJenkins commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab URL: https://github.com/apache/spark/pull/26519#issuecomment-554246861 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113838/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
AmplabJenkins commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab URL: https://github.com/apache/spark/pull/26519#issuecomment-554246855 Merged build finished. Test FAILed.
[GitHub] [spark] edrevo commented on a change in pull request #25945: [SPARK-29248][SQL] Pass in number of partitions to WriteBuilder
edrevo commented on a change in pull request #25945: [SPARK-29248][SQL] Pass in number of partitions to WriteBuilder URL: https://github.com/apache/spark/pull/25945#discussion_r346689861 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/MicroBatchWrite.scala ## @@ -36,8 +36,8 @@ class MicroBatchWrite(epochId: Long, val writeSupport: StreamingWrite) extends B writeSupport.abort(epochId, messages) } - override def createBatchWriterFactory(): DataWriterFactory = { -new MicroBatchWriterFactory(epochId, writeSupport.createStreamingWriterFactory()) + override def createBatchWriterFactory(numPartitions: Int): DataWriterFactory = { Review comment: No problem! Since we still want to move forward with the interface-based approach, I've decided to evolve https://github.com/apache/spark/pull/25990/ to include both `PhysicalWriteInfo` and `WriteInfo`
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r346689827 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ## @@ -130,7 +135,8 @@ case class AggregateExpression( @transient override lazy val references: AttributeSet = { mode match { - case Partial | Complete => aggregateFunction.references + case Partial | Complete if filter == None => aggregateFunction.references Review comment: Yeah, you said right.
[GitHub] [spark] edrevo commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
edrevo commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder URL: https://github.com/apache/spark/pull/25990#discussion_r346689656 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/PhysicalWriteInfo.java ## @@ -0,0 +1,17 @@ +package org.apache.spark.sql.connector.write; + +import org.apache.spark.annotation.Experimental; +import org.apache.spark.sql.connector.write.streaming.StreamingDataWriterFactory; + +/** + * :: Experimental :: + * This interface contains physical (i.e. RDD) write information that data sources can use when generating a + * {@link DataWriterFactory} or a {@link StreamingDataWriterFactory}. + */ +@Experimental Review comment: Is this the correct annotation for the new interfaces I'm adding?
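The split the two PRs converge on — a logical write info fixed at planning time plus a physical one that carries `numPartitions` once the RDD exists — can be sketched as plain traits. The names follow the review suggestions (`LogicalWriteInfo`, `PhysicalWriteInfo`); the impl classes and field set here are assumptions, not Spark's final API:

```scala
// Sketch of the two-level write-info design discussed in the review.
// LogicalWriteInfo: known when the write is planned.
trait LogicalWriteInfo {
  def queryId: String
}

// PhysicalWriteInfo: known only once the physical RDD has been produced.
trait PhysicalWriteInfo {
  def numPartitions: Int
}

// Hypothetical concrete carriers, mirroring WriteInfoImpl from the diff.
case class LogicalWriteInfoImpl(queryId: String) extends LogicalWriteInfo
case class PhysicalWriteInfoImpl(numPartitions: Int) extends PhysicalWriteInfo

// A writer factory would receive the physical info at creation time,
// per the proposed createBatchWriterFactory(info: PhysicalWriteInfo).
val info: PhysicalWriteInfo = PhysicalWriteInfoImpl(numPartitions = 8)
assert(info.numPartitions == 8)
```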
[GitHub] [spark] AmplabJenkins removed a comment on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources
AmplabJenkins removed a comment on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources URL: https://github.com/apache/spark/pull/26507#issuecomment-554245237 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18726/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources
AmplabJenkins removed a comment on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources URL: https://github.com/apache/spark/pull/26507#issuecomment-554245229 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources
AmplabJenkins commented on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources URL: https://github.com/apache/spark/pull/26507#issuecomment-554245229 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26537: [SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal
cloud-fan commented on a change in pull request #26537: [SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal URL: https://github.com/apache/spark/pull/26537#discussion_r346688874 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2154,17 +2154,17 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case ("smallint" | "short", Nil) => ShortType case ("int" | "integer", Nil) => IntegerType case ("bigint" | "long", Nil) => LongType - case ("float", Nil) => FloatType + case ("float" | "real", Nil) => FloatType Review comment: I took a look at the SQL standard; float and real are different: ``` FLOAT specifies the data type approximate numeric, with binary precision equal to or greater than the value of the specified <precision>. The maximum value of <precision> is implementation-defined. <precision> shall not be greater than this value. REAL specifies the data type approximate numeric, with implementation-defined precision. ``` We can say that now Spark supports the real type, and its precision is the same as Java float. But it needs to be proposed explicitly, e.g. by sending an email to the dev list.
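A hedged sketch of the name-to-type dispatch being changed in the diff, showing `real` resolving to the same type as `float`; strings stand in for Spark's `DataType` objects, so this is an illustration, not Spark's actual `AstBuilder` code:

```java
import java.util.Locale;

// Simplified stand-in for the primitive-type dispatch above: "real" is
// treated as an alias of "float". Strings stand in for Spark's DataType
// classes; this is an illustration, not Spark's actual code.
class TypeNames {
    static String resolve(String name) {
        switch (name.toLowerCase(Locale.ROOT)) {
            case "smallint": case "short":   return "ShortType";
            case "int":      case "integer": return "IntegerType";
            case "bigint":   case "long":    return "LongType";
            case "float":    case "real":    return "FloatType";
            default: throw new IllegalArgumentException("unsupported type: " + name);
        }
    }
}
```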
[GitHub] [spark] AmplabJenkins commented on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources
AmplabJenkins commented on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources URL: https://github.com/apache/spark/pull/26507#issuecomment-554245237 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18726/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-554244844 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-554244851 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18725/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-554244844 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-554244851 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18725/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed
SparkQA commented on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed URL: https://github.com/apache/spark/pull/26543#issuecomment-554244513 **[Test build #113855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113855/testReport)** for PR 26543 at commit [`e6bb796`](https://github.com/apache/spark/commit/e6bb79623f097b03d3edfef7a6cfebcfb06e33fd).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed
AmplabJenkins removed a comment on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed URL: https://github.com/apache/spark/pull/26543#issuecomment-554242752 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed
AmplabJenkins removed a comment on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed URL: https://github.com/apache/spark/pull/26543#issuecomment-554242761 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18724/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources
SparkQA commented on issue #26507: [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources URL: https://github.com/apache/spark/pull/26507#issuecomment-554244491 **[Test build #113857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113857/testReport)** for PR 26507 at commit [`1852511`](https://github.com/apache/spark/commit/18525117051143a4306bdcfcbfdd878f40983799).
[GitHub] [spark] yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346688075 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -598,26 +531,26 @@ object IntervalUtils { } else if (s.matchAt(microsStr, i)) { microseconds = Math.addExact(microseconds, currentValue) i += microsStr.numBytes() -} else return null - case _ => return null +} else throwIAE(s"invalid unit '$nextWord'") + case _ => throwIAE(s"invalid unit '$nextWord'") } } catch { -case _: ArithmeticException => return null +case e: ArithmeticException => throwIAE(e.getMessage, e) } state = UNIT_SUFFIX case UNIT_SUFFIX => b match { case 's' => state = UNIT_END case ' ' => state = TRIM_BEFORE_SIGN -case _ => return null +case _ => throwIAE(s"invalid unit suffix '$nextWord'") Review comment: The `nextWord` indicates the character at which the error occurs. The only special case is nanoseconds out of range, where the fraction plus `nextWord` shows exactly the out-of-range number.
[GitHub] [spark] SparkQA commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
SparkQA commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-554244415 **[Test build #113856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113856/testReport)** for PR 26509 at commit [`576558d`](https://github.com/apache/spark/commit/576558d02c8994ee408be48615f777018c809957).
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r346687485 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1084,6 +1084,11 @@ class Analyzer( result case UnresolvedExtractValue(child, fieldExpr) if child.resolved => ExtractValue(child, fieldExpr, resolver) +case f @ UnresolvedFunction(_, children, _, filter) if filter.isDefined => + val newChildren = children.map(resolveExpressionTopDown(_, q)) + val newFilter = filter.map{ expr => expr.mapChildren(resolveExpressionTopDown(_, q))} Review comment: Due to the above modifications, this problem does not exist.
[GitHub] [spark] SparkQA commented on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation
SparkQA commented on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation URL: https://github.com/apache/spark/pull/26529#issuecomment-554243400 **[Test build #113853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113853/testReport)** for PR 26529 at commit [`76180fa`](https://github.com/apache/spark/commit/76180fac61fb8f4c4fd4e97dfa5382df2df60dad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation
AmplabJenkins commented on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation URL: https://github.com/apache/spark/pull/26529#issuecomment-554243485 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113853/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation
AmplabJenkins removed a comment on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation URL: https://github.com/apache/spark/pull/26529#issuecomment-554243485 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113853/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
AmplabJenkins removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-554243174 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113847/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation
AmplabJenkins commented on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation URL: https://github.com/apache/spark/pull/26529#issuecomment-554243481 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation
SparkQA removed a comment on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation URL: https://github.com/apache/spark/pull/26529#issuecomment-554240295 **[Test build #113853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113853/testReport)** for PR 26529 at commit [`76180fa`](https://github.com/apache/spark/commit/76180fac61fb8f4c4fd4e97dfa5382df2df60dad).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation
AmplabJenkins removed a comment on issue #26529: [SPARK-29902][Doc][Minor]Add listener event queue capacity configuration to documentation URL: https://github.com/apache/spark/pull/26529#issuecomment-554243481 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
AmplabJenkins commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-554243167 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
AmplabJenkins commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-554243174 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113847/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
AmplabJenkins removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-554243167 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
SparkQA commented on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-554242942 **[Test build #113847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113847/testReport)** for PR 26533 at commit [`275390f`](https://github.com/apache/spark/commit/275390f6c6d57346cd359630fd4d4e022f0a3fa0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11
SparkQA removed a comment on issue #26533: [WIP][test-hadoop3.2][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-554216800 **[Test build #113847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113847/testReport)** for PR 26533 at commit [`275390f`](https://github.com/apache/spark/commit/275390f6c6d57346cd359630fd4d4e022f0a3fa0).
[GitHub] [spark] yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346686779 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -547,14 +478,16 @@ object IntervalUtils { case ' ' => fraction /= NANOS_PER_MICROS.toInt state = TRIM_BEFORE_UNIT -case _ => return null +case _ if '0' <= b && b <= '9' => + throwIAE(s"invalid value fractional part '$fraction$nextWord' out of range") Review comment: https://github.com/apache/spark/pull/26491/files#diff-105a430951a95bd0c9c899d2a24e3badR107-R115 You can check some test cases here.
[GitHub] [spark] AmplabJenkins commented on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed
AmplabJenkins commented on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed URL: https://github.com/apache/spark/pull/26543#issuecomment-554242761 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18724/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed
AmplabJenkins commented on issue #26543: [SPARK-29911][SQL] Fix cache table leak when session closed URL: https://github.com/apache/spark/pull/26543#issuecomment-554242752 Merged build finished. Test PASSed.
[GitHub] [spark] LantaoJin opened a new pull request #26543: [SPARK-29911][SQL] Fix cache table leak when session closed
LantaoJin opened a new pull request #26543: [SPARK-29911][SQL] Fix cache table leak when session closed URL: https://github.com/apache/spark/pull/26543 ### What changes were proposed in this pull request? How to reproduce: 1. create a local temporary view v1 2. cache it in memory 3. close session without drop v1. The application will hold the memory forever. In a long running thrift server scenario. It's worse. ```shell 0: jdbc:hive2://localhost:1> CACHE TABLE testCacheTable AS SELECT 1; CACHE TABLE testCacheTable AS SELECT 1; +-+--+ | Result | +-+--+ +-+--+ No rows selected (1.498 seconds) 0: jdbc:hive2://localhost:1> !close !close Closing: 0: jdbc:hive2://localhost:1 0: jdbc:hive2://localhost:1 (closed)> !connect 'jdbc:hive2://localhost:1' !connect 'jdbc:hive2://localhost:1' Connecting to jdbc:hive2://localhost:1 Enter username for jdbc:hive2://localhost:1: lajin Enter password for jdbc:hive2://localhost:1: *** Connected to: Spark SQL (version 3.0.0-SNAPSHOT) Driver: Hive JDBC (version 1.2.1.spark2) Transaction isolation: TRANSACTION_REPEATABLE_READ 1: jdbc:hive2://localhost:1> select * from testCacheTable; select * from testCacheTable; Error: Error running query: org.apache.spark.sql.AnalysisException: Table or view not found: testCacheTable; line 1 pos 14; 'Project [*] +- 'UnresolvedRelation [testCacheTable] (state=,code=0) ``` https://user-images.githubusercontent.com/1853780/68923527-7ca8c180-07b9-11ea-9cc7-74f276c46840.png";> ### Why are the changes needed? Resolve memory leak for thrift server ### Does this PR introduce any user-facing change? No ### How was this patch tested? Manual test in UI storage tab This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
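The fix amounts to dropping a session's cached views when that session closes. A minimal, hypothetical sketch of the bookkeeping involved (names such as `SessionCacheTracker`, `onCacheTable`, and `onSessionClosed` are illustrative, not Spark's actual API):

```scala
import scala.collection.mutable

// Track which cached views each session owns, so they can be uncached when
// the session goes away instead of leaking for the server's lifetime.
class SessionCacheTracker {
  private val cachedViews = mutable.Map.empty[String, mutable.Set[String]]

  def onCacheTable(sessionId: String, view: String): Unit =
    cachedViews.getOrElseUpdate(sessionId, mutable.Set.empty[String]) += view

  // On close, invoke the uncache callback for every view the session registered.
  def onSessionClosed(sessionId: String)(uncache: String => Unit): Unit =
    cachedViews.remove(sessionId).foreach(_.foreach(uncache))
}

val tracker = new SessionCacheTracker
tracker.onCacheTable("session-1", "testCacheTable")
var dropped = List.empty[String]
tracker.onSessionClosed("session-1")(v => dropped = v :: dropped)
assert(dropped == List("testCacheTable"))
```

In the real patch the uncache callback would go through Spark's shared cache manager; the sketch only shows the per-session ownership bookkeeping.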
[GitHub] [spark] yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346686470 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -547,14 +478,16 @@ object IntervalUtils { case ' ' => fraction /= NANOS_PER_MICROS.toInt state = TRIM_BEFORE_UNIT -case _ => return null +case _ if '0' <= b && b <= '9' => + throwIAE(s"invalid value fractional part '$fraction$nextWord' out of range") Review comment: > how about `interval can only support nanosecond precision, '$nextWord' is out of range` That is not suitable: for `0.99`, `nextWord` will only be `9`, but `'$fraction$nextWord'` is the exact `99`.
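The point is easier to see with a tiny runnable sketch of the fractional-part loop (the `NANOS_PER_SECOND` constant and the message format come from the diff; the standalone `consumeDigit` helper is illustrative):

```scala
// Each fractional digit consumes one decimal place of nanosecond precision,
// so a tenth fractional digit exhausts fractionScale and must fail.
val NANOS_PER_SECOND = 1000000000L
var fraction = 0
var fractionScale = (NANOS_PER_SECOND / 10).toInt // 100000000

def consumeDigit(d: Char): Unit = {
  if (fractionScale == 0) {
    // `fraction` already holds every digit seen so far, while the current
    // "word" is just the offending digit -- hence '$fraction$nextWord' is
    // needed to print the full digit run, not the single trailing digit.
    throw new IllegalArgumentException(
      s"invalid value fractional part '$fraction$d' out of range")
  }
  fraction += (d - '0') * fractionScale
  fractionScale /= 10
}

"123456789".foreach(consumeDigit) // nine digits: ok, fraction == 123456789
val err =
  try { consumeDigit('9'); "no error" }
  catch { case e: IllegalArgumentException => e.getMessage }
println(err) // invalid value fractional part '1234567899' out of range
```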
[GitHub] [spark] AmplabJenkins commented on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi.
AmplabJenkins commented on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi. URL: https://github.com/apache/spark/pull/26516#issuecomment-554242215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113842/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi.
AmplabJenkins removed a comment on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi. URL: https://github.com/apache/spark/pull/26516#issuecomment-554242211 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi.
AmplabJenkins commented on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi. URL: https://github.com/apache/spark/pull/26516#issuecomment-554242211 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi.
AmplabJenkins removed a comment on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi. URL: https://github.com/apache/spark/pull/26516#issuecomment-554242215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113842/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi.
SparkQA removed a comment on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi. URL: https://github.com/apache/spark/pull/26516#issuecomment-554207880 **[Test build #113842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113842/testReport)** for PR 26516 at commit [`beca952`](https://github.com/apache/spark/commit/beca952a3a977814203936de99e059ecbef2bafa).
[GitHub] [spark] yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346685776 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -518,10 +449,10 @@ object IntervalUtils { isNegative = false case '.' => isNegative = false - fractionScale = (NANOS_PER_SECOND / 10).toInt i += 1 + fractionScale = (NANOS_PER_SECOND / 10).toInt state = VALUE_FRACTIONAL_PART -case _ => return null +case _ => throwIAE( s"unrecognized sign '$nextWord'") Review comment: number seems ok to me
[GitHub] [spark] SparkQA commented on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi.
SparkQA commented on issue #26516: [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi. URL: https://github.com/apache/spark/pull/26516#issuecomment-554241724 **[Test build #113842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113842/testReport)** for PR 26516 at commit [`beca952`](https://github.com/apache/spark/commit/beca952a3a977814203936de99e059ecbef2bafa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
yaooqinn commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346685433 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -518,10 +449,10 @@ object IntervalUtils { isNegative = false case '.' => isNegative = false - fractionScale = (NANOS_PER_SECOND / 10).toInt i += 1 + fractionScale = (NANOS_PER_SECOND / 10).toInt state = VALUE_FRACTIONAL_PART -case _ => return null +case _ => throwIAE( s"unrecognized sign '$nextWord'") Review comment: `checkFromInvalidString("~1 hour", "unrecognized sign")`
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r346685495 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1084,6 +1084,11 @@ class Analyzer( result case UnresolvedExtractValue(child, fieldExpr) if child.resolved => ExtractValue(child, fieldExpr, resolver) +case f @ UnresolvedFunction(_, children, _, filter) if filter.isDefined => Review comment: OK. I will try like this.
[GitHub] [spark] cloud-fan commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
cloud-fan commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346685302 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -598,26 +531,26 @@ object IntervalUtils { } else if (s.matchAt(microsStr, i)) { microseconds = Math.addExact(microseconds, currentValue) i += microsStr.numBytes() -} else return null - case _ => return null +} else throwIAE(s"invalid unit '$nextWord'") + case _ => throwIAE(s"invalid unit '$nextWord'") } } catch { -case _: ArithmeticException => return null +case e: ArithmeticException => throwIAE(e.getMessage, e) } state = UNIT_SUFFIX case UNIT_SUFFIX => b match { case 's' => state = UNIT_END case ' ' => state = TRIM_BEFORE_SIGN -case _ => throwIAE(s"invalid unit suffix '$nextWord'") Review comment: `invalid unit '$nextWord'` is better if we can implement `nextWord` correctly. Otherwise we should introduce `currentWord`.
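One hedged way to realize the `currentWord` idea (purely illustrative, not the PR's implementation): recover the whole space-delimited token around the scan cursor, so the error message can quote it in full rather than only the remainder after the cursor.

```scala
// Recover the full token that contains position `cursor` in `s`, assuming
// tokens are separated by single spaces (a simplification of the byte-level
// scan in IntervalUtils).
def currentWord(s: String, cursor: Int): String = {
  val start = s.lastIndexOf(' ', cursor - 1) + 1 // walk back to previous space
  val end = s.indexOf(' ', cursor) match {
    case -1 => s.length
    case e  => e
  }
  s.substring(start, end)
}

// Cursor 6 sits on the final 'r' of the bad unit; the whole word is reported.
println(currentWord("1 houur 2 minutes", 6)) // houur
```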
[GitHub] [spark] cloud-fan commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
cloud-fan commented on a change in pull request #26491: [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval URL: https://github.com/apache/spark/pull/26491#discussion_r346685009 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -547,14 +478,16 @@ object IntervalUtils { case ' ' => fraction /= NANOS_PER_MICROS.toInt state = TRIM_BEFORE_UNIT -case _ => return null +case _ if '0' <= b && b <= '9' => + throwIAE(s"invalid value fractional part '$fraction$nextWord' out of range") +case _ => throwIAE(s"invalid value '$nextWord' in fractional part") } i += 1 case TRIM_BEFORE_UNIT => trimToNextState(b, UNIT_BEGIN) case UNIT_BEGIN => // Checks that only seconds can have the fractional part if (b != 's' && fractionScale >= 0) { -return null +throwIAE(s"'$nextWord' with fractional part is unsupported") Review comment: nit: `'$nextWord' cannot have fractional part`
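The check being discussed can be sketched in isolation (names mirror the diff; the standalone function and sample values are illustrative): a pending fractional part, signalled by `fractionScale >= 0`, is only legal when the unit's first byte is `'s'`, i.e. seconds.

```scala
// UNIT_BEGIN check: only seconds may carry a fractional part, so
// "1.5 seconds" parses but "1.5 hours" must be rejected.
def checkUnitBegin(firstByte: Char, fractionScale: Int, nextWord: String): Unit =
  if (firstByte != 's' && fractionScale >= 0)
    throw new IllegalArgumentException(s"'$nextWord' cannot have fractional part")

checkUnitBegin('s', 100000000, "seconds") // ok: "1.5 seconds"
val msg =
  try { checkUnitBegin('h', 100000000, "hours"); "no error" }
  catch { case e: IllegalArgumentException => e.getMessage }
println(msg) // 'hours' cannot have fractional part
```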