[GitHub] [spark] HyukjinKwon commented on a change in pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
HyukjinKwon commented on a change in pull request #32161:
URL: https://github.com/apache/spark/pull/32161#discussion_r635788033
## File path: docs/sql-data-sources-parquet.md ##
@@ -260,6 +260,12 @@ Data source options of Parquet can be set via:
 Property Name Default Meaning Scope
+
+maxFilesPerTrigger
Review comment: This is not a Parquet-specific option; it is an option for Structured Streaming only. We should probably take it out.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
AmplabJenkins removed a comment on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844699781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138722/
[GitHub] [spark] AmplabJenkins commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
AmplabJenkins commented on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844724760 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins removed a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844699823
[GitHub] [spark] ulysses-you opened a new pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin
ulysses-you opened a new pull request #32602:
URL: https://github.com/apache/spark/pull/32602

### What changes were proposed in this pull request?
- Add semi / anti join elimination: check if the left side is empty.
- Add outer join elimination: check if the left side is empty for left/right outer join; check if both sides are empty for full outer join.
- Add multi-join elimination if there is no shuffle node inside: change `transformDown` to `transformUp` and add a `LocalRelation` check.

### Why are the changes needed?
Make `EliminateUnnecessaryJoin` apply to more cases.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add test.
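The emptiness checks listed in the description above rest on ordinary relational join semantics. A minimal sketch of those semantics follows, assuming a toy model: `JoinKind`, `EmptyJoin`, and `resultIsEmpty` are hypothetical names, not Catalyst classes, and the conditions the actual rule checks may differ from this textbook table.

```scala
// Toy model only: these names are hypothetical and unrelated to Catalyst's classes.
sealed trait JoinKind
case object Inner extends JoinKind
case object LeftSemi extends JoinKind
case object LeftAnti extends JoinKind
case object LeftOuter extends JoinKind
case object RightOuter extends JoinKind
case object FullOuter extends JoinKind

object EmptyJoin {
  // True when the join output is guaranteed to be empty,
  // given which of the two inputs are known to be empty.
  def resultIsEmpty(kind: JoinKind, leftEmpty: Boolean, rightEmpty: Boolean): Boolean =
    kind match {
      case Inner      => leftEmpty || rightEmpty
      case LeftSemi   => leftEmpty || rightEmpty
      case LeftAnti   => leftEmpty               // an empty right side keeps every left row
      case LeftOuter  => leftEmpty               // left rows are preserved
      case RightOuter => rightEmpty              // right rows are preserved
      case FullOuter  => leftEmpty && rightEmpty // both sides are preserved
    }
}
```

Under this model a join whose result is known to be empty could be replaced by an empty relation without executing either side.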
[GitHub] [spark] SparkQA commented on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message
SparkQA commented on pull request #32600: URL: https://github.com/apache/spark/pull/32600#issuecomment-844723643 **[Test build #138729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138729/testReport)** for PR 32600 at commit [`309fa9f`](https://github.com/apache/spark/commit/309fa9f3e095b12226574ce069683edcb90bd162).
[GitHub] [spark] AmplabJenkins commented on pull request #32601: [SPARK-35130][SQL] Add make_duration function to construct DayTimeIntervalType value
AmplabJenkins commented on pull request #32601: URL: https://github.com/apache/spark/pull/32601#issuecomment-844721318 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844721237 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138726/
[GitHub] [spark] HyukjinKwon commented on pull request #30404: [SPARK-33475][BUILD] Bump ANTLR runtime version to 4.8-1
HyukjinKwon commented on pull request #30404: URL: https://github.com/apache/spark/pull/30404#issuecomment-844716174 Maybe let's just use the 4.8 that's matched with the official releases there ... Thanks for pointing this out, @bozhang2820. Mind making a quick followup to match this to 4.8?
[GitHub] [spark] HyukjinKwon commented on pull request #30404: [SPARK-33475][BUILD] Bump ANTLR runtime version to 4.8-1
HyukjinKwon commented on pull request #30404: URL: https://github.com/apache/spark/pull/30404#issuecomment-844713306 Just quick question: Do you guys know the diff between 4.8-1 vs 4.8? The official release seems to be 4.8 https://www.antlr.org/download/index.html - I checked the pom files and jars too.
[GitHub] [spark] dongjoon-hyun commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
dongjoon-hyun commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844713151 +1, LGTM. Thanks, @ueshin and @HyukjinKwon .
[GitHub] [spark] allisonwang-db commented on a change in pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
allisonwang-db commented on a change in pull request #32478:
URL: https://github.com/apache/spark/pull/32478#discussion_r635773423
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala ##
@@ -1451,4 +1454,146 @@ private[spark] object QueryCompilationErrors {
       "outer and local references, which is not supported: " + funcStr
     new AnalysisException(msg)
   }
+
+  def lookupFunctionInNonFunctionCatalogError(
+      ident: Identifier, catalog: CatalogPlugin): Throwable = {
+    new AnalysisException(s"Trying to lookup function '$ident' in " +
+      s"catalog '${catalog.name()}', but it is not a FunctionCatalog.")
+  }
+
+  def functionCannotProcessInputError(
+      unbound: UnboundFunction,
+      arguments: Seq[Expression],
+      unsupported: UnsupportedOperationException): Throwable = {
+    new AnalysisException(s"Function '${unbound.name}' cannot process " +
+      s"input: (${arguments.map(_.dataType.simpleString).mkString(", ")}): " +
+      unsupported.getMessage, cause = Some(unsupported))
+  }
+
+  def ambiguousRelationAliasNameInNestedCTEError(name: String): Throwable = {
+    new AnalysisException(s"Name $name is ambiguous in nested CTE. " +
+      s"Please set ${LEGACY_CTE_PRECEDENCE_POLICY.key} to CORRECTED so that name " +
+      "defined in inner CTE takes precedence. If set it to LEGACY, outer CTE " +
+      "definitions will take precedence. See more details in SPARK-28228.")
+  }
+
+  def commandUnsupportedInV2TableError(name: String): Throwable = {
+    new AnalysisException(s"$name is not supported for v2 tables.")
+  }
+
+  def cannotResolveColumnNameAmongAttributesError(
+      lattr: Attribute, rightOutputAttrs: Seq[Attribute]): Throwable = {
+    new AnalysisException(
+      s"""
+         |Cannot resolve column name "${lattr.name}" among
+         |(${rightOutputAttrs.map(_.name).mkString(", ")})
+       """.stripMargin.replaceAll("\n", " "))
+  }
+
+  def cannotWriteTooManyColumnsToTableError(
+      tableName: String, expected: Seq[Attribute], query: LogicalPlan): Throwable = {
+    new AnalysisException(
+      s"""
+         |Cannot write to '$tableName', too many data columns:
+         |Table columns: ${expected.map(c => s"'${c.name}'").mkString(", ")}
+         |Data columns: ${query.output.map(c => s"'${c.name}'").mkString(", ")}
+       """.stripMargin)
+  }
+
+  def cannotWriteNotEnoughColumnsToTableError(
+      tableName: String, expected: Seq[Attribute], query: LogicalPlan): Throwable = {
+    new AnalysisException(
+      s"""Cannot write to '$tableName', not enough data columns:
+         |Table columns: ${expected.map(c => s"'${c.name}'").mkString(", ")}
+         |Data columns: ${query.output.map(c => s"'${c.name}'").mkString(", ")}"""
+        .stripMargin)
+  }
+
+  def cannotWriteIncompatibleDataToTableError(tableName: String, errors: Seq[String]): Throwable = {
+    new AnalysisException(
+      s"Cannot write incompatible data to table '$tableName':\n- ${errors.mkString("\n- ")}")
+  }
+
+  def secondArgumentOfFunctionIsNotIntegerError(
+      function: String, e: NumberFormatException): Throwable = {
+    new AnalysisException(
+      s"The second argument of '$function' function needs to be an integer.", cause = Some(e))
+  }
+
+  def nonPartitionPruningPredicatesNotExpectedError(
+      nonPartitionPruningPredicates: Seq[Expression]): Throwable = {
+    new AnalysisException(
+      s"Expected only partition pruning predicates: $nonPartitionPruningPredicates")
+  }
+
+  def columnNotDefinedInTableError(
+      colType: String, colName: String, tableName: String, tableCols: Seq[String]): Throwable = {
+    new AnalysisException(s"$colType column $colName is not defined in table $tableName, " +
+      s"defined table columns are: ${tableCols.mkString(", ")}")
+  }
+
+  def invalidLiteralForWindowDurationError(): Throwable = {
+    new AnalysisException("The duration and time inputs to window must be " +
+      "an integer, long or string literal.")
+  }
+
+  def noSuchStructFieldInGivenFieldsError(
+      fieldName: String, fields: Array[StructField]): Throwable = {
+    new AnalysisException(
+      s"No such struct field $fieldName in ${fields.map(_.name).mkString(", ")}")
+  }
+
+  def ambiguousReferenceToFieldsError(fields: String): Throwable = {
+    new AnalysisException(s"Ambiguous reference to fields $fields")
+  }
+
+  def secondArgumentInFunctionIsNotBooleanLiteralError(funcName: String): Throwable = {
+    new AnalysisException(s"The second argument in $funcName should be a boolean literal.")
+  }
+
+  def joinConditionMissingOrTrivialError(
+      join: Join, left: LogicalPlan, right: LogicalPlan): Throwable = {
+    new AnalysisException(
+      s"""Detected implicit cartesian product for ${join.joinType.sql} join between logical plans
+         |${left.treeString(false).trim}
+         |and
+         |${right.treeString(false).trim}
+         |Join
[GitHub] [spark] xuanyuanking commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark
xuanyuanking commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-844712368 After reviewing the SPIP doc and the WIP implementation, I think it's a reasonable requirement and scenario for SS. Since the API will be added to Spark Core, we might need extra confirmation from the others to make sure there's no side effect. cc @Ngone51 and @jiangxb1987
[GitHub] [spark] linhongliu-db commented on a change in pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF
linhongliu-db commented on a change in pull request #32587:
URL: https://github.com/apache/spark/pull/32587#discussion_r635776715
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java ##
@@ -144,6 +155,11 @@ public ExpressionInfo(
                 this.name + "]. It should be a value in " + validGroups + "; however, " +
                 "got [" + group + "].");
         }
+        if (!language.isEmpty() && !validLanguages.contains(language)) {
Review comment: How about we change this to `functionType` and use "scala_udf", "java_udf", "python_udf", "built-in", "hive"? Using an empty string to indicate a built-in function is a somewhat risky assumption.
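The suggestion in the review comment above amounts to validating an explicit tag against a closed set instead of treating the empty string as "built-in". A minimal Scala sketch of that idea follows. It is not the `ExpressionInfo` Java class: `FunctionTypes`, `validFunctionTypes`, and `validate` are hypothetical names, and only the five tag values come from the comment.

```scala
// Hypothetical sketch: an explicit function-type tag with no "empty means built-in" case.
object FunctionTypes {
  // The five values are the ones proposed in the review comment; the field name is illustrative.
  val validFunctionTypes: Set[String] =
    Set("scala_udf", "java_udf", "python_udf", "built-in", "hive")

  // Returns the tag unchanged when it is valid; otherwise rejects it, empty string included.
  def validate(name: String, functionType: String): String = {
    if (!validFunctionTypes.contains(functionType)) {
      val allowed = validFunctionTypes.toSeq.sorted.mkString(", ")
      throw new IllegalArgumentException(
        s"Malformed function type of expression [$name]. It should be a value in " +
          s"($allowed); however, got [$functionType].")
    }
    functionType
  }
}
```

With this shape a caller has to state "built-in" explicitly, so a forgotten argument fails fast instead of being read as a built-in function.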
[GitHub] [spark] copperybean opened a new pull request #32601: [SPARK-35130][SQL] Add make_duration function to construct DayTimeIntervalType value
copperybean opened a new pull request #32601:
URL: https://github.com/apache/spark/pull/32601

### What changes were proposed in this pull request?
Providing a new function make_duration to construct DayTimeIntervalType value

### Why are the changes needed?
As the JIRA described, we should provide a function to construct DayTimeIntervalType value

### Does this PR introduce _any_ user-facing change?
Yes, a new make_duration function provided

### How was this patch tested?
Updated UTs, manual testing
[GitHub] [spark] SparkQA removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA removed a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844644163 **[Test build #138726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138726/testReport)** for PR 32516 at commit [`a000940`](https://github.com/apache/spark/commit/a0009405cf4250e4e557fdc8505f0983118e072d).
[GitHub] [spark] yaooqinn opened a new pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message
yaooqinn opened a new pull request #32600:
URL: https://github.com/apache/spark/pull/32600

### What changes were proposed in this pull request?
Print the invalid value in the config validation error message for `checkValue`, just like `checkValues`.

### Why are the changes needed?
Invalid configuration values can arrive in many ways. This PR helps users and developers identify which config the error is related to.

### Does this PR introduce _any_ user-facing change?
Yes, but only the error message.

### How was this patch tested?
Yes, modified tests.
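To illustrate the idea in the description above, here is a minimal self-contained sketch of a validator whose error message carries the offending value. It is not Spark's `ConfigBuilder` API: `ConfEntry`, `check`, the key `demo.maxRetries`, and the exact message wording are all assumptions made for the example.

```scala
// Hypothetical sketch, not Spark's config API: a config entry whose validation
// error names both the key and the rejected value.
final case class ConfEntry[T](key: String, validator: T => Boolean, hint: String) {
  // Returns the value when it passes validation; otherwise throws
  // IllegalArgumentException (via `require`) with the value in the message.
  def check(value: T): T = {
    require(validator(value), s"'$value' in $key is invalid. $hint")
    value
  }
}

object ConfEntryDemo {
  def main(args: Array[String]): Unit = {
    val maxRetries =
      ConfEntry[Int]("demo.maxRetries", _ >= 0, "The value must be non-negative.")
    println(maxRetries.check(3))
    try {
      maxRetries.check(-1)
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

Including the value matters most when the setting comes from a file, an environment variable, or a default the user never typed, because the message alone then shows what was actually read.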
[GitHub] [spark] viirya commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
viirya commented on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844705907 cc @maropu @cloud-fan too
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844705593 **[Test build #138726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138726/testReport)** for PR 32516 at commit [`a000940`](https://github.com/apache/spark/commit/a0009405cf4250e4e557fdc8505f0983118e072d).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] [spark] wangyum commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
wangyum commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r635773394
## File path: sql/core/src/test/resources/sql-tests/results/show-tables.sql.out ##
@@ -120,19 +120,9 @@ show_t3
 -- !query
 SHOW TABLE EXTENDED LIKE 'show_t*'
 -- !query schema
-struct
+struct>
 -- !query output
- show_t3 trueTable: show_t3
Review comment: The output only contains the information column. This is the change of [HiveResult](https://github.com/apache/spark/pull/32563/files#diff-275ec2fd7ca81e5482c959562236c22007d3c2274e456f4e44b78695e37814a4R68-R72). Example of Hive output:
```
hive> SHOW TABLE EXTENDED LIKE '*';
OK
tableName:spark_32976
owner:yumwang
location:file:/tmp/spark/spark_32976
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id, string name}
partitioned:true
partitionColumns:struct partition_columns { string part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown

tableName:t1
owner:yumwang
location:file:/tmp/hive/t1
inputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
outputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
columns:struct columns { string id}
partitioned:true
partitionColumns:struct partition_columns { date part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown
```
[GitHub] [spark] wangyum commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
wangyum commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r635771768
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ##
@@ -314,21 +315,27 @@ case class DropNamespace(
  */
 case class DescribeNamespace(
     namespace: LogicalPlan,
-    extended: Boolean,
-    override val output: Seq[Attribute] = DescribeNamespace.getOutputAttrs) extends UnaryCommand {
+    extended: Boolean) extends UnaryCommand {
+  override val output: Seq[Attribute] = {
Review comment: OK.
[GitHub] [spark] Ngone51 commented on pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
Ngone51 commented on pull request #32287: URL: https://github.com/apache/spark/pull/32287#issuecomment-844702300 Thank you, everyone!!
[GitHub] [spark] AmplabJenkins commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844699823 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43250/
[GitHub] [spark] AmplabJenkins commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
AmplabJenkins commented on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844699781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138722/
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844699807 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43250/
[GitHub] [spark] dongjoon-hyun commented on pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
dongjoon-hyun commented on pull request #32287: URL: https://github.com/apache/spark/pull/32287#issuecomment-844699111 Sorry for missing ping here. +1, late LGTM.
[GitHub] [spark] SparkQA removed a comment on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
SparkQA removed a comment on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844598185 **[Test build #138722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138722/testReport)** for PR 32595 at commit [`abc6150`](https://github.com/apache/spark/commit/abc6150ecb7db3fdfe0edb2e5f13d1257a1ae337).
[GitHub] [spark] SparkQA commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
SparkQA commented on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844698658 **[Test build #138722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138722/testReport)** for PR 32595 at commit [`abc6150`](https://github.com/apache/spark/commit/abc6150ecb7db3fdfe0edb2e5f13d1257a1ae337).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins removed a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844691139
[GitHub] [spark] viirya commented on pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata
viirya commented on pull request #32272: URL: https://github.com/apache/spark/pull/32272#issuecomment-844693655 Thanks @HeartSaVioR. I will take another look with #32582 tomorrow.
[GitHub] [spark] AmplabJenkins commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844691139
[GitHub] [spark] cloud-fan commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
cloud-fan commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r635761625
## File path: sql/core/src/test/resources/sql-tests/results/show-tables.sql.out ##
@@ -120,19 +120,9 @@ show_t3
 -- !query
 SHOW TABLE EXTENDED LIKE 'show_t*'
 -- !query schema
-struct
+struct>
 -- !query output
- show_t3 trueTable: show_t3
Review comment: which is the information column?
[GitHub] [spark] cloud-fan commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
cloud-fan commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r635760696

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ##
@@ -314,21 +315,27 @@ case class DropNamespace(
  */
 case class DescribeNamespace(
     namespace: LogicalPlan,
-    extended: Boolean,
-    override val output: Seq[Attribute] = DescribeNamespace.getOutputAttrs) extends UnaryCommand {
+    extended: Boolean) extends UnaryCommand {
+  override val output: Seq[Attribute] = {

Review comment:
This is the reason why we created `object DescribeNamespace` and others. We shouldn't revert it.
[GitHub] [spark] maropu commented on a change in pull request #32590: [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations
maropu commented on a change in pull request #32590:
URL: https://github.com/apache/spark/pull/32590#discussion_r635670349

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala ##
@@ -67,61 +81,78 @@ object DeduplicateRelations extends Rule[LogicalPlan] {
    * all relations of the new LogicalPlan )
    */
   private def renewDuplicatedRelations(
-      existingRelations: Seq[MultiInstanceRelation],
-      plan: LogicalPlan): (LogicalPlan, Seq[MultiInstanceRelation]) = plan match {
-    case p: LogicalPlan if p.isStreaming => (plan, Nil)
+      existingRelations: mutable.HashSet[ReferenceEqualPlanWrapper],
+      plan: LogicalPlan)
+    : (LogicalPlan, mutable.HashSet[ReferenceEqualPlanWrapper], Boolean) = plan match {
+    case p: LogicalPlan if p.isStreaming => (plan, mutable.HashSet.empty, false)
     case m: MultiInstanceRelation =>
-      if (isDuplicated(existingRelations, m)) {
+      val planWrapper = ReferenceEqualPlanWrapper(m)
+      if (isDuplicated(existingRelations, planWrapper)) {
         val newNode = m.newInstance()
         newNode.copyTagsFrom(m)
-        (newNode, Nil)
+        (newNode, mutable.HashSet.empty, true)
       } else {
-        (m, Seq(m))
+        val mWrapper = new mutable.HashSet[ReferenceEqualPlanWrapper]()
+        mWrapper.add(planWrapper)
+        (m, mWrapper, false)
       }
     case plan: LogicalPlan =>
-      val relations = ArrayBuffer.empty[MultiInstanceRelation]
+      val relations = new mutable.HashSet[ReferenceEqualPlanWrapper]()
+      var planChanged = false
       val newPlan = if (plan.children.nonEmpty) {
-        val newChildren = ArrayBuffer.empty[LogicalPlan]
+        val newChildren = mutable.ArrayBuffer.empty[LogicalPlan]
         for (c <- plan.children) {
-          val (renewed, collected) = renewDuplicatedRelations(existingRelations ++ relations, c)
+          val (renewed, collected, changed) =
+            renewDuplicatedRelations(existingRelations ++ relations, c)
           newChildren += renewed
           relations ++= collected
+          if (changed) {
+            planChanged = true
+          }
         }
-        if (plan.childrenResolved) {
-          val attrMap = AttributeMap(
-            plan
-              .children
-              .flatMap(_.output).zip(newChildren.flatMap(_.output))
-              .filter { case (a1, a2) => a1.exprId != a2.exprId }
-          )
-          plan.withNewChildren(newChildren.toSeq).rewriteAttrs(attrMap)
+        if (planChanged) {
+          if (plan.childrenResolved) {
+            val planWithNewChildren = plan.withNewChildren(newChildren.toSeq)
+            val attrMap = AttributeMap(
+              plan
+                .children
+                .flatMap(_.output).zip(newChildren.flatMap(_.output))
+                .filter { case (a1, a2) => a1.exprId != a2.exprId }
+            )
+            if (attrMap.isEmpty) {
+              planWithNewChildren
+            } else {
+              planWithNewChildren.rewriteAttrs(attrMap)
+            }
+          } else {
+            plan.withNewChildren(newChildren.toSeq)
+          }
         } else {
-          plan.withNewChildren(newChildren.toSeq)
+          plan
         }
       } else {
         plan
       }
       val planWithNewSubquery = newPlan.transformExpressions {
         case subquery: SubqueryExpression =>
-          val (renewed, collected) = renewDuplicatedRelations(
+          val (renewed, collected, changed) = renewDuplicatedRelations(
             existingRelations ++ relations, subquery.plan)
           relations ++= collected
+          if (changed) planChanged = true
           subquery.withNewPlan(renewed)
       }
-      (planWithNewSubquery, relations.toSeq)
+      (planWithNewSubquery, relations, planChanged)
   }
+  @inline

Review comment:
Does this annotation also reduce the running time? (The PR description does not mention it.)
[GitHub] [spark] cloud-fan commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
cloud-fan commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r635760348

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ##
@@ -314,21 +315,27 @@ case class DropNamespace(
  */
 case class DescribeNamespace(
     namespace: LogicalPlan,
-    extended: Boolean,
-    override val output: Seq[Attribute] = DescribeNamespace.getOutputAttrs) extends UnaryCommand {
+    extended: Boolean) extends UnaryCommand {
+  override val output: Seq[Attribute] = {

Review comment:
We shouldn't do this, as the output expr ID becomes unstable and it changes after the plan is copied.
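The stability concern in the review comment above can be reproduced outside Spark. The following is a minimal sketch using toy stand-ins: `Attr`, `newAttr`, `StableCmd`, and `UnstableCmd` are hypothetical names invented for illustration and are not Catalyst classes. It assumes only that attribute IDs come from a global counter, as expression IDs do.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy stand-ins for illustration only; these are NOT the real Catalyst classes.
object ExprIdSketch {
  private val nextId = new AtomicLong(0)

  final case class Attr(name: String, exprId: Long)

  // Every call hands out a fresh ID, mimicking how new attributes get new expr IDs.
  def newAttr(name: String): Attr = Attr(name, nextId.getAndIncrement())

  // Pattern A: output is a constructor parameter with a default value,
  // so copy() passes the existing value along unchanged.
  final case class StableCmd(extended: Boolean, output: Seq[Attr] = Seq(newAttr("info")))

  // Pattern B: output is computed in the class body,
  // so every new instance (including one made by copy()) gets fresh IDs.
  final case class UnstableCmd(extended: Boolean) {
    val output: Seq[Attr] = Seq(newAttr("info"))
  }

  def main(args: Array[String]): Unit = {
    val a = StableCmd(extended = false)
    val b = UnstableCmd(extended = false)
    println(a.copy(extended = true).output == a.output) // true: IDs survive the copy
    println(b.copy(extended = true).output == b.output) // false: IDs were regenerated
  }
}
```

Pattern A corresponds to the removed lines in the diff (output as a defaulted constructor parameter) and Pattern B to the added lines (output defined in the body).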
[GitHub] [spark] cloud-fan commented on a change in pull request #32590: [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations
cloud-fan commented on a change in pull request #32590:
URL: https://github.com/apache/spark/pull/32590#discussion_r635759716

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala ##
@@ -67,61 +81,78 @@ object DeduplicateRelations extends Rule[LogicalPlan] {
    * all relations of the new LogicalPlan )
    */
   private def renewDuplicatedRelations(
-      existingRelations: Seq[MultiInstanceRelation],
-      plan: LogicalPlan): (LogicalPlan, Seq[MultiInstanceRelation]) = plan match {
-    case p: LogicalPlan if p.isStreaming => (plan, Nil)
+      existingRelations: mutable.HashSet[ReferenceEqualPlanWrapper],
+      plan: LogicalPlan)
+    : (LogicalPlan, mutable.HashSet[ReferenceEqualPlanWrapper], Boolean) = plan match {
+    case p: LogicalPlan if p.isStreaming => (plan, mutable.HashSet.empty, false)
     case m: MultiInstanceRelation =>
-      if (isDuplicated(existingRelations, m)) {
+      val planWrapper = ReferenceEqualPlanWrapper(m)
+      if (isDuplicated(existingRelations, planWrapper)) {
         val newNode = m.newInstance()
         newNode.copyTagsFrom(m)
-        (newNode, Nil)
+        (newNode, mutable.HashSet.empty, true)
       } else {
-        (m, Seq(m))
+        val mWrapper = new mutable.HashSet[ReferenceEqualPlanWrapper]()
+        mWrapper.add(planWrapper)
+        (m, mWrapper, false)
       }
     case plan: LogicalPlan =>
-      val relations = ArrayBuffer.empty[MultiInstanceRelation]
+      val relations = new mutable.HashSet[ReferenceEqualPlanWrapper]()
+      var planChanged = false
       val newPlan = if (plan.children.nonEmpty) {
-        val newChildren = ArrayBuffer.empty[LogicalPlan]
+        val newChildren = mutable.ArrayBuffer.empty[LogicalPlan]
         for (c <- plan.children) {
-          val (renewed, collected) = renewDuplicatedRelations(existingRelations ++ relations, c)
+          val (renewed, collected, changed) =
+            renewDuplicatedRelations(existingRelations ++ relations, c)
           newChildren += renewed
           relations ++= collected
+          if (changed) {
+            planChanged = true
+          }
         }
-        if (plan.childrenResolved) {
-          val attrMap = AttributeMap(
-            plan
-              .children
-              .flatMap(_.output).zip(newChildren.flatMap(_.output))
-              .filter { case (a1, a2) => a1.exprId != a2.exprId }
-          )
-          plan.withNewChildren(newChildren.toSeq).rewriteAttrs(attrMap)
+        if (planChanged) {
+          if (plan.childrenResolved) {
+            val planWithNewChildren = plan.withNewChildren(newChildren.toSeq)
+            val attrMap = AttributeMap(
+              plan
+                .children
+                .flatMap(_.output).zip(newChildren.flatMap(_.output))
+                .filter { case (a1, a2) => a1.exprId != a2.exprId }
+            )
+            if (attrMap.isEmpty) {
+              planWithNewChildren
+            } else {
+              planWithNewChildren.rewriteAttrs(attrMap)
+            }
+          } else {
+            plan.withNewChildren(newChildren.toSeq)
+          }
         } else {
-          plan.withNewChildren(newChildren.toSeq)
+          plan
         }
       } else {
         plan
       }
       val planWithNewSubquery = newPlan.transformExpressions {
         case subquery: SubqueryExpression =>
-          val (renewed, collected) = renewDuplicatedRelations(
+          val (renewed, collected, changed) = renewDuplicatedRelations(
             existingRelations ++ relations, subquery.plan)
           relations ++= collected
+          if (changed) planChanged = true
           subquery.withNewPlan(renewed)
       }
-      (planWithNewSubquery, relations.toSeq)
+      (planWithNewSubquery, relations, planChanged)
   }
+  @inline

Review comment:
Instead of calling this function, can we manually inline it and write `existingRelations.contains(planWrapper)`?
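As a rough illustration of the lookup suggested in the review comment above, here is a self-contained sketch of a reference-equality wrapper stored in a `mutable.HashSet`. The definition of `ReferenceEqualPlanWrapper` is not shown in the quoted hunk, so `RefEq` and `Relation` below are hypothetical stand-ins, and the sketch assumes that the wrapper compares by object identity, as its name suggests.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; the real wrapper in the PR is not shown in the quoted hunk.
object RefEqSketch {
  // Wraps any object and compares by reference identity instead of structural equality.
  final class RefEq(val value: AnyRef) {
    override def equals(other: Any): Boolean = other match {
      case that: RefEq => value eq that.value
      case _ => false
    }
    // identityHashCode is consistent with `eq`, so hash lookups stay correct.
    override def hashCode(): Int = System.identityHashCode(value)
  }

  final case class Relation(table: String)

  def main(args: Array[String]): Unit = {
    val r1 = Relation("t")
    val r2 = Relation("t") // structurally equal to r1, but a different instance
    val seen = mutable.HashSet(new RefEq(r1))

    println(r1 == r2)                      // true: case-class equality
    println(seen.contains(new RefEq(r1))) // true: same instance
    println(seen.contains(new RefEq(r2))) // false: different instance
  }
}
```

With a wrapper like this, the duplicate check is a single hash lookup, which is why a direct `contains` call can stand in for a separate helper.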
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844681278 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43250/
[GitHub] [spark] cloud-fan commented on pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
cloud-fan commented on pull request #32287: URL: https://github.com/apache/spark/pull/32287#issuecomment-844676584 thanks, merging to master!
[GitHub] [spark] dongjoon-hyun commented on pull request #32588: [SPARK-35443][K8S] Mark K8s ConfigMaps and Secrets created by Spark as immutable
dongjoon-hyun commented on pull request #32588: URL: https://github.com/apache/spark/pull/32588#issuecomment-844676646 Thank you for your first contribution. I added you to the Apache Spark contributor group and assigned SPARK-35443 to you. Welcome! @ashrayjain
[GitHub] [spark] cloud-fan closed pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
cloud-fan closed pull request #32287: URL: https://github.com/apache/spark/pull/32287
[GitHub] [spark] dongjoon-hyun closed pull request #32588: [SPARK-35443][K8S] Mark K8s ConfigMaps and Secrets created by Spark as immutable
dongjoon-hyun closed pull request #32588: URL: https://github.com/apache/spark/pull/32588
[GitHub] [spark] SparkQA removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA removed a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844666931 **[Test build #138728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138728/testReport)** for PR 32516 at commit [`36c0234`](https://github.com/apache/spark/commit/36c023415ce345cb3494827aad1ab297f3b2d8a7).
[GitHub] [spark] dongjoon-hyun commented on pull request #32588: [SPARK-35443][K8S] Mark K8s ConfigMaps and Secrets created by Spark as immutable
dongjoon-hyun commented on pull request #32588: URL: https://github.com/apache/spark/pull/32588#issuecomment-844674916 Thank you for making a contribution, @ashrayjain .
[GitHub] [spark] dongjoon-hyun commented on pull request #32588: [SPARK-35443][K8S] Mark K8s ConfigMaps and Secrets created by Spark as immutable
dongjoon-hyun commented on pull request #32588: URL: https://github.com/apache/spark/pull/32588#issuecomment-844674819 ok to test
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-844674783

**[Test build #138728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138728/testReport)** for PR 32516 at commit [`36c0234`](https://github.com/apache/spark/commit/36c023415ce345cb3494827aad1ab297f3b2d8a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class DataTypeOps(object, metaclass=ABCMeta):`
   * `"""The base class for binary operations of pandas-on-Spark objects (of different data types)."""`
   * `class BooleanOps(DataTypeOps):`
   * `class CategoricalOps(DataTypeOps):`
   * `class DateOps(DataTypeOps):`
   * `class DatetimeOps(DataTypeOps):`
   * `class NumericOps(DataTypeOps):`
   * `class IntegralOps(NumericOps):`
   * `class FractionalOps(NumericOps):`
   * `class StringOps(DataTypeOps):`
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844672135 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43248/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute Command so that query command with CTE
AmplabJenkins removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-844670814 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43249/
[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute Command so that query command with CTE
AmplabJenkins commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-844670814 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43249/
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute Command so that query command with CTE
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-844670801 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43249/
[GitHub] [spark] cfmcgrady commented on a change in pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In predicate
cfmcgrady commented on a change in pull request #32488:
URL: https://github.com/apache/spark/pull/32488#discussion_r635736572

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala ##
@@ -89,10 +89,11 @@ import org.apache.spark.sql.types._
  */
 object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
-    _.containsPattern(BINARY_COMPARISON), ruleId) {
+    _.containsAnyPattern(BINARY_COMPARISON, IN), ruleId) {

Review comment:
Not really! For instance:

```scala
spark.range(50)
  .write
  .mode("overwrite")
  .parquet("/tmp/parquet/t1")

val condition = InSet($"id".expr, Set(1, 2, "4"))
val df = spark.read.parquet("/tmp/parquet/t1")
  .filter(Column(condition))

df.queryExecution.optimizedPlan foreach {
  case f: Filter =>
    val inset = f.condition.asInstanceOf[InSet]
    println(s"InSet.value.dataType: [ ${inset.child.dataType} ]")
    println("InSet.hset.Type: " + inset.hset.toArray.map(_.getClass.getCanonicalName).mkString("[ ", ",", " ]"))
  case _ =>
}
```

Output:

```
InSet.value.dataType: [ LongType ]
InSet.hset.Type: [ java.lang.Integer,java.lang.Integer,java.lang.String ]
```

I also found that:
1. Spark SQL has no syntax for the `InSet` predicate, and `org.apache.spark.sql.catalyst.dsl.scala` does not have one either.
2. The answer to this query is incorrect.

```
actual:     expected:
+---+       +---+
| id|       | id|
+---+       +---+
|  1|       |  1|
|  2|       |  2|
+---+       |  4|
            +---+
```
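The missing row in the comment above can be mimicked with plain Scala collections. This is only an analogy under stated assumptions: no Spark is involved, `MixedSetSketch` and its members are hypothetical names, and Spark's actual `InSet` evaluation may differ in detail. It shows that a `Long` probe never matches a `String` element in a `Set[Any]`, which is the mismatch the review comment points at.

```scala
// Plain-Scala analogy only; this does not run any Spark code.
object MixedSetSketch {
  // Same mixed-type contents as the set in the review comment: two Ints and one String.
  val values: Set[Any] = Set(1, 2, "4")

  // Probe the set with Long ids, mirroring the LongType `id` column.
  val matched: Seq[Long] = (0L until 50L).filter(id => values.contains(id))

  def main(args: Array[String]): Unit = {
    // 1 and 2 match because Scala compares boxed Int and Long numerically;
    // 4 is absent because the set holds the String "4", not a number.
    println(matched)
  }
}
```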
[GitHub] [spark] SparkQA removed a comment on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA removed a comment on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844603936 **[Test build #138725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138725/testReport)** for PR 32597 at commit [`ef1501a`](https://github.com/apache/spark/commit/ef1501a1a30e87b9c42e8b6615f05dbe7f2fad86).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
AmplabJenkins removed a comment on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844665541
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins removed a comment on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844665543 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138721/
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844666931 **[Test build #138728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138728/testReport)** for PR 32516 at commit [`36c0234`](https://github.com/apache/spark/commit/36c023415ce345cb3494827aad1ab297f3b2d8a7).
[GitHub] [spark] AmplabJenkins commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
AmplabJenkins commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-84485 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138725/
[GitHub] [spark] SparkQA commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA commented on pull request #32597:
URL: https://github.com/apache/spark/pull/32597#issuecomment-844666111

**[Test build #138725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138725/testReport)** for PR 32597 at commit [`ef1501a`](https://github.com/apache/spark/commit/ef1501a1a30e87b9c42e8b6615f05dbe7f2fad86).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844665543 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138721/
[GitHub] [spark] AmplabJenkins commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
AmplabJenkins commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844665542
[GitHub] [spark] SparkQA removed a comment on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA removed a comment on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844602397 **[Test build #138723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138723/testReport)** for PR 32597 at commit [`b4c2267`](https://github.com/apache/spark/commit/b4c226749aaa891ff1a4dfc1b7cdbca456352439).
[GitHub] [spark] SparkQA commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA commented on pull request #32597:
URL: https://github.com/apache/spark/pull/32597#issuecomment-844663190

**[Test build #138723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138723/testReport)** for PR 32597 at commit [`b4c2267`](https://github.com/apache/spark/commit/b4c226749aaa891ff1a4dfc1b7cdbca456352439).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844661572 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43248/
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute Command so that query command with CTE
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-844660785 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43249/
[GitHub] [spark] YuzhouSun commented on pull request #32530: [SPARK-35106][Core][SQL] Avoid failing rename caused by destination directory not exist
YuzhouSun commented on pull request #32530:
URL: https://github.com/apache/spark/pull/32530#issuecomment-844660497

> @YuzhouSun can you leave a comment in the JIRA ticket so that I can assign it to you? Thank you.

Done.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata
HeartSaVioR commented on a change in pull request #32272:
URL: https://github.com/apache/spark/pull/32272#discussion_r635722361
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala ##
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+import java.nio.charset.StandardCharsets.UTF_8
+import java.nio.file.Files
+
+import scala.collection.Seq
+
+import com.fasterxml.jackson.annotation.JsonInclude.Include
+import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
+import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper}
+import org.json4s.NoTypeHints
+import org.json4s.jackson.Serialization
+
+/**
+ * Classes to represent metadata of checkpoints saved to DFS. Since this is converted to JSON, any
+ * changes to this MUST be backward-compatible.
+ */
+case class RocksDBCheckpointMetadata(
+    sstFiles: Seq[RocksDBSstFile],
+    logFiles: Seq[RocksDBLogFile],
+    numKeys: Long) {
+  import RocksDBCheckpointMetadata._
+
+  def json: String = {
+    // We turn this field into a null to avoid write a empty logFiles field in the json.
+    val nullified = if (logFiles.isEmpty) this.copy(logFiles = null) else this
+    mapper.writeValueAsString(nullified)
+  }
+
+  def prettyJson: String = Serialization.writePretty(this)(RocksDBCheckpointMetadata.format)
Review comment: OK I see where it is used. Just for logging - got it.
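The `json` method in the hunk above replaces an empty `logFiles` with `null` before serializing, so the field is left out of the output. The following is a minimal Java sketch of that idea only. It assumes the serializer skips null fields (the `mapper` configuration is not shown in this thread), and it uses a hand-rolled writer and plain file-name strings instead of Jackson and the `RocksDBSstFile`/`RocksDBLogFile` classes. The class name `CheckpointMetadataSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RocksDBCheckpointMetadata; not the class from the PR.
final class CheckpointMetadataSketch {
    final List<String> sstFiles;
    final List<String> logFiles; // null means "omit this field when writing JSON"
    final long numKeys;

    CheckpointMetadataSketch(List<String> sstFiles, List<String> logFiles, long numKeys) {
        this.sstFiles = sstFiles;
        this.logFiles = logFiles;
        this.numKeys = numKeys;
    }

    // Mirrors the pattern in the diff: nullify an empty logFiles, then serialize.
    String json() {
        CheckpointMetadataSketch nullified =
            (logFiles != null && logFiles.isEmpty())
                ? new CheckpointMetadataSketch(sstFiles, null, numKeys)
                : this;
        return write(nullified);
    }

    // Assumption: the writer skips null fields, as a mapper configured for non-null output would.
    private static String write(CheckpointMetadataSketch m) {
        List<String> fields = new ArrayList<>();
        fields.add("\"sstFiles\":" + array(m.sstFiles));
        if (m.logFiles != null) {
            fields.add("\"logFiles\":" + array(m.logFiles));
        }
        fields.add("\"numKeys\":" + m.numKeys);
        return "{" + String.join(",", fields) + "}";
    }

    // File names are assumed to need no JSON escaping in this sketch.
    private static String array(List<String> names) {
        List<String> quoted = new ArrayList<>();
        for (String name : names) {
            quoted.add("\"" + name + "\"");
        }
        return "[" + String.join(",", quoted) + "]";
    }
}
```

With an empty `logFiles` the output carries only `sstFiles` and `numKeys`. A non-empty list is written as usual.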
[GitHub] [spark] SparkQA removed a comment on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA removed a comment on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844598119 **[Test build #138720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138720/testReport)** for PR 32597 at commit [`71aa6e8`](https://github.com/apache/spark/commit/71aa6e83c04e7ad1b0dcdc68c9570f450bf32fde).
[GitHub] [spark] SparkQA commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844652948 **[Test build #138720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138720/testReport)** for PR 32597 at commit [`71aa6e8`](https://github.com/apache/spark/commit/71aa6e83c04e7ad1b0dcdc68c9570f450bf32fde).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
SparkQA removed a comment on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844598150 **[Test build #138721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138721/testReport)** for PR 32596 at commit [`a583c50`](https://github.com/apache/spark/commit/a583c50556ed40bac652cadf682abbb3b97bf611).
[GitHub] [spark] SparkQA commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
SparkQA commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844652391 **[Test build #138721 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138721/testReport)** for PR 32596 at commit [`a583c50`](https://github.com/apache/spark/commit/a583c50556ed40bac652cadf682abbb3b97bf611).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
AmplabJenkins removed a comment on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844621608
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins removed a comment on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844643740 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43244/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
AmplabJenkins removed a comment on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844643742
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844644163 **[Test build #138726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138726/testReport)** for PR 32516 at commit [`a000940`](https://github.com/apache/spark/commit/a0009405cf4250e4e557fdc8505f0983118e072d).
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute Command so that query command with CTE
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-844644199 **[Test build #138727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138727/testReport)** for PR 32513 at commit [`309fb0f`](https://github.com/apache/spark/commit/309fb0f7e95fc4a67d9015fbdaad702bc42c4b85).
[GitHub] [spark] AmplabJenkins commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
AmplabJenkins commented on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844643743 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43245/
[GitHub] [spark] AmplabJenkins commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844643740 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43244/
[GitHub] [spark] AmplabJenkins commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
AmplabJenkins commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844643744
[GitHub] [spark] ueshin closed pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
ueshin closed pull request #32596: URL: https://github.com/apache/spark/pull/32596
[GitHub] [spark] cloud-fan commented on a change in pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF
cloud-fan commented on a change in pull request #32587:
URL: https://github.com/apache/spark/pull/32587#discussion_r635709224
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java ##
@@ -144,6 +155,11 @@ public ExpressionInfo(
                 this.name + "]. It should be a value in " + validGroups + "; however, " +
                 "got [" + group + "].");
         }
+        if (!language.isEmpty() && !validLanguages.contains(language)) {
Review comment: if `language` can be empty, how about we use empty to indicate builtin? strictly builtin is not a language.
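The suggestion above can be sketched as follows: an empty `language` is read as "builtin", and any non-empty value still has to be in the valid set. This is only an illustration of the reviewer's idea, not the code in the PR. The class `LanguageCheck`, the method `describe`, and the contents of `VALID_LANGUAGES` are hypothetical, since the real `validLanguages` set is not shown in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the validation under review; not Spark's ExpressionInfo.
final class LanguageCheck {
    // Assumed set of accepted languages; the actual list in the PR is not shown here.
    private static final Set<String> VALID_LANGUAGES =
        new HashSet<>(Arrays.asList("scala", "java", "python", "hive"));

    // Treats an empty language as "builtin" and rejects unknown non-empty values.
    static String describe(String language) {
        if (language.isEmpty()) {
            return "builtin";
        }
        if (!VALID_LANGUAGES.contains(language)) {
            throw new IllegalArgumentException(
                "Invalid language: it should be a value in " + VALID_LANGUAGES
                    + "; however, got [" + language + "].");
        }
        return language;
    }

    public static void main(String[] args) {
        System.out.println(describe(""));
        System.out.println(describe("python"));
    }
}
```

With this shape, "builtin" never has to appear in the valid-language set, which matches the point that builtin is not a language.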
[GitHub] [spark] ueshin commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
ueshin commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844641911 Thanks! merging to master.
[GitHub] [spark] ueshin commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
ueshin commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844641598 I also confirmed the `lint-python` passes with the latest master branch merged.
[GitHub] [spark] itholic commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
itholic commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844637725 Thanks for the review, @ueshin , @xinrong-databricks ! > What about `koalas` accessor? Maybe in the following PR? Yeah, I just created new JIRA for this : https://issues.apache.org/jira/browse/SPARK-35453 Let me fix this in the following PR.
[GitHub] [spark] itholic edited a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
itholic edited a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-844637725 Thanks for the review, @ueshin , @xinrong-databricks ! > What about `koalas` accessor? Maybe in the following PR? Yeah, I just created new JIRA for this : https://issues.apache.org/jira/browse/SPARK-35453 Let me fix this in the following PR.
[GitHub] [spark] beliefer commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
beliefer commented on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-844636986 cc @cloud-fan
[GitHub] [spark] SparkQA commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844636592 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43246/
[GitHub] [spark] itholic commented on a change in pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
itholic commented on a change in pull request #32516:
URL: https://github.com/apache/spark/pull/32516#discussion_r635704467
## File path: python/pyspark/pandas/tests/test_dataframe_conversion.py ##
@@ -101,69 +101,69 @@ def test_to_excel(self):
             koalas_location = dirpath + "/" + "output2.xlsx"
Review comment: Thanks!! Just addressed them
[GitHub] [spark] SparkQA commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
SparkQA commented on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844634213 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43245/
[GitHub] [spark] SparkQA commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844629172 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43243/
[GitHub] [spark] SparkQA commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
SparkQA commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844629074 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43244/
[GitHub] [spark] linhongliu-db commented on pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF
linhongliu-db commented on pull request #32587: URL: https://github.com/apache/spark/pull/32587#issuecomment-844613482
[GitHub] [spark] Ngone51 commented on pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
Ngone51 commented on pull request #32287: URL: https://github.com/apache/spark/pull/32287#issuecomment-844624889 @mridulm I already rebased against that PR :)
[GitHub] [spark] SparkQA commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844622897 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43246/
[GitHub] [spark] SparkQA removed a comment on pull request #32598: [SPARK-35408][PYTHON][FOLLOW-UP] Avoid unnecessary f-string format
SparkQA removed a comment on pull request #32598: URL: https://github.com/apache/spark/pull/32598#issuecomment-844603941 **[Test build #138724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138724/testReport)** for PR 32598 at commit [`7a46478`](https://github.com/apache/spark/commit/7a464783b314557152feee9ea5a530d7cdbab4df).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins removed a comment on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844620229 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43242/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32598: [SPARK-35408][PYTHON][FOLLOW-UP] Avoid unnecessary f-string format
AmplabJenkins removed a comment on pull request #32598: URL: https://github.com/apache/spark/pull/32598#issuecomment-844620225
[GitHub] [spark] SparkQA commented on pull request #32597: [SPARK-35450][INFRA] Follow checkout-merge way to use the latest commit for linter, or other workflows.
SparkQA commented on pull request #32597: URL: https://github.com/apache/spark/pull/32597#issuecomment-844615958 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43243/
[GitHub] [spark] AmplabJenkins commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set
AmplabJenkins commented on pull request #32595: URL: https://github.com/apache/spark/pull/32595#issuecomment-844621608 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #32598: [SPARK-35408][PYTHON][FOLLOW-UP] Avoid unnecessary f-string format
SparkQA commented on pull request #32598: URL: https://github.com/apache/spark/pull/32598#issuecomment-844614658 **[Test build #138724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138724/testReport)** for PR 32598 at commit [`7a46478`](https://github.com/apache/spark/commit/7a464783b314557152feee9ea5a530d7cdbab4df).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
SparkQA commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844615375 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43244/
[GitHub] [spark] AmplabJenkins commented on pull request #32598: [SPARK-35408][PYTHON][FOLLOW-UP] Avoid unnecessary f-string format
AmplabJenkins commented on pull request #32598: URL: https://github.com/apache/spark/pull/32598#issuecomment-844620226
[GitHub] [spark] AmplabJenkins commented on pull request #32596: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins commented on pull request #32596: URL: https://github.com/apache/spark/pull/32596#issuecomment-844620229 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43242/