[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21193#discussion_r185995523

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala ---
@@ -112,6 +112,112 @@ object JavaCode {
   def isNullExpression(code: String): SimpleExprValue = {
     expression(code, BooleanType)
   }
+
+  def block(code: String): Block = {
+    CodeBlock(codeParts = Seq(code), blockInputs = Seq.empty)
+  }
+}
+
+/**
+ * A trait representing a block of java code.
+ */
+trait Block extends JavaCode {
+
+  // The expressions to be evaluated inside this block.
+  def exprValues: Seq[ExprValue]
+
+  // This will be called during string interpolation.
+  override def toString: String = _marginChar match {
+    case Some(c) => code.stripMargin(c)
+    case _ => code
+  }
+
+  var _marginChar: Option[Char] = None
+
+  def stripMargin(c: Char): this.type = {
+    _marginChar = Some(c)
+    this
+  }
+
+  def stripMargin: this.type = {
+    _marginChar = Some('|')
+    this
+  }
+
+  def + (other: Block): Block
+}
+
+object Block {
+  implicit def blockToString(block: Block): String = block.toString
+
+  implicit def blocksToBlock(blocks: Seq[Block]): Block = Blocks(blocks)
+
+  implicit class BlockHelper(val sc: StringContext) extends AnyVal {
+    def code(args: Any*): Block = {
+      sc.checkLengths(args)
+      if (sc.parts.length == 0) {
+        EmptyBlock
+      } else {
+        args.foreach {
+          case _: ExprValue =>
+          case _: Int | _: Long | _: Float | _: Double | _: String =>
+          case _: Block =>
+          case other => throw new IllegalArgumentException(
+            s"Can not interpolate ${other.getClass.getName} into code block.")
+        }
+        CodeBlock(sc.parts, args)
+      }
+    }
+  }
+}
+
+/**
+ * A block of java code. Including a sequence of code parts and some inputs to this block.
+ * The actual java code is generated by embedding the inputs into the code parts.
+ */
+case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[Any]) extends Block {
+  override def exprValues: Seq[ExprValue] = {
+    blockInputs.flatMap {
+      case b: Block => b.exprValues
+      case e: ExprValue => Seq(e)
+      case _ => Seq.empty
+    }
+  }
+
+  override def code: String = {
+    val strings = codeParts.iterator
+    val inputs = blockInputs.iterator
+    val buf = new StringBuffer(strings.next)
+    while (strings.hasNext) {
+      buf append inputs.next
+      buf append strings.next
+    }
+    buf.toString
+  }
+
+  override def + (other: Block): Block = other match {
+    case c: CodeBlock => Blocks(Seq(this, c))
+    case b: Blocks => Blocks(Seq(this) ++ b.blocks)
+    case EmptyBlock => this
+  }
+}
+
+case class Blocks(blocks: Seq[Block]) extends Block {
+  override def exprValues: Seq[ExprValue] = blocks.flatMap(_.exprValues)
+  override def code: String = blocks.map(_.toString).mkString
--- End diff --

I am just curious whether `mkString` may lead to the 64KB method limit issue.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
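For readers unfamiliar with the `BlockHelper` pattern in the diff above, here is a minimal, self-contained sketch (not Spark's actual code; `SimpleBlock`, `CodeInterpolation`, and `Demo` are illustrative names) of how an implicit class wrapping `StringContext` provides a custom `code"..."` interpolator that keeps the literal parts and the interpolated arguments separate:

```scala
// Sketch only: a StringContext-based interpolator in the style of the patch.
// `parts` is always one element longer than `inputs`.
case class SimpleBlock(parts: Seq[String], inputs: Seq[Any]) {
  // Interleave the literal parts with the interpolated inputs.
  def render: String =
    parts.zipAll(inputs.map(_.toString), "", "").map { case (p, i) => p + i }.mkString
}

object CodeInterpolation {
  implicit class CodeHelper(val sc: StringContext) extends AnyVal {
    def code(args: Any*): SimpleBlock = {
      sc.checkLengths(args) // require parts.length == args.length + 1
      SimpleBlock(sc.parts, args)
    }
  }
}

object Demo extends App {
  import CodeInterpolation._
  val name = "value_0"
  println(code"int $name = 0;".render) // prints: int value_0 = 0;
}
```

Because the interpolator receives the arguments unevaluated as values rather than a pre-concatenated string, a real implementation (like the one in this patch) can inspect them, e.g. to collect `ExprValue`s or to reject unsupported argument types.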
[GitHub] spark issue #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalog...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21232 Wow. Thank you for letting me know, @HyukjinKwon!
[GitHub] spark pull request #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternal...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/21232
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2899/ Test PASSed.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Merged build finished. Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90171/ Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #90171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90171/testReport)** for PR 21061 at commit [`bc2e4c7`](https://github.com/apache/spark/commit/bc2e4c7b781bbdcad4fe338b4d235887302da791).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21190 **[Test build #90182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90182/testReport)** for PR 21190 at commit [`c27267d`](https://github.com/apache/spark/commit/c27267d49679fefb161ba4884bb26f7eed3fc8f1).
[GitHub] spark issue #21220: [SPARK-24157][SS] Enabled no-data batches in MicroBatchE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90172/ Test PASSed.
[GitHub] spark issue #21220: [SPARK-24157][SS] Enabled no-data batches in MicroBatchE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21220 Merged build finished. Test PASSed.
[GitHub] spark issue #21220: [SPARK-24157][SS] Enabled no-data batches in MicroBatchE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21220 **[Test build #90172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90172/testReport)** for PR 21220 at commit [`da3fd2f`](https://github.com/apache/spark/commit/da3fd2f8510482e3e71cc37a9da2207e3aef1ef0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Merged build finished. Test PASSed.
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21229 Merged build finished. Test PASSed.
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2897/ Test PASSed.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2898/ Test PASSed.
[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21230#discussion_r185991248

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
@@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
 import org.apache.spark.sql.catalyst.rules.Rule

 object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
-  override def apply(
-      plan: LogicalPlan): LogicalPlan = plan transformUp {
-    // PhysicalOperation guarantees that filters are deterministic; no need to check
-    case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) =>
-      // merge the filters
-      val filters = relation.filters match {
-        case Some(existing) =>
-          existing ++ newFilters
-        case _ =>
-          newFilters
-      }
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    var pushed = false
+    plan transformDown {
+      // PhysicalOperation guarantees that filters are deterministic; no need to check
+      case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed =>
--- End diff --

`PhysicalOperation` just accumulates the projects and filters above a specific node. If we transform down the tree and only transform once, we will never hit `PhysicalOperation` more than once.
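The once-only guard described here can be sketched on a toy tree (illustrative types, not Catalyst's actual `TreeNode` API): a mutable flag captured by the rule's closure makes the partial function undefined after the first match, so a single top-down traversal fires the rule at most once.

```scala
// Toy pre-order (top-down) transform with a one-shot rule. Names are
// illustrative; Catalyst's real transformDown works analogously.
sealed trait Plan {
  def transformDown(rule: PartialFunction[Plan, Plan]): Plan = {
    // Apply the rule to this node first, then recurse into children.
    val afterRule = rule.applyOrElse(this, identity[Plan])
    afterRule match {
      case Node(name, children) => Node(name, children.map(_.transformDown(rule)))
      case leaf                 => leaf
    }
  }
}
case class Node(name: String, children: Seq[Plan]) extends Plan
case class Leaf(name: String) extends Plan

object Demo extends App {
  var pushed = false
  val plan: Plan = Node("Project", Seq(Node("Filter", Seq(Leaf("Relation")))))
  val result = plan.transformDown {
    case n: Node if !pushed =>
      pushed = true // the guard fails for every later node in this traversal
      n.copy(name = n.name + "*")
  }
  println(result) // only the topmost Node gets the "*" mark
}
```

Because the guard flips on the first match, the inner `Filter` node is left untouched even though it would otherwise match, which is exactly why the rule cannot hit `PhysicalOperation` twice in one pass.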
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Merged build finished. Test PASSed.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2896/ Test PASSed.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Merged build finished. Test FAILed.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90181/ Test FAILed.
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21190 **[Test build #90181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90181/testReport)** for PR 21190 at commit [`c0b1095`](https://github.com/apache/spark/commit/c0b109529b97163edd224bbf770a8caa20246b0a).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WidenSetOperationTypes(conf: SQLConf) extends Rule[LogicalPlan]`
  * `case class FunctionArgumentConversion(conf: SQLConf) extends TypeCoercionRule`
  * `case class CaseWhenCoercion(conf: SQLConf) extends TypeCoercionRule`
  * `case class IfCoercion(conf: SQLConf) extends TypeCoercionRule`
  * `case class ImplicitTypeCasts(conf: SQLConf) extends TypeCoercionRule`
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21190 **[Test build #90181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90181/testReport)** for PR 21190 at commit [`c0b1095`](https://github.com/apache/spark/commit/c0b109529b97163edd224bbf770a8caa20246b0a).
[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21224 I realized #21086 is only in master, so this bug doesn't exist in 2.3
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21229 **[Test build #90180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90180/testReport)** for PR 21229 at commit [`a2d81df`](https://github.com/apache/spark/commit/a2d81dfa5f5b2cf58413f3e39717bf436c956d70).
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21229 retest this please
[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21190 retest this please
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90168/ Test PASSed.
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21222 Merged build finished. Test PASSed.
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21229 Merged build finished. Test FAILed.
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21222 **[Test build #90168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90168/testReport)** for PR 21222 at commit [`c1ad1c5`](https://github.com/apache/spark/commit/c1ad1c557e6165455457adb6f148d6d9616548a1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `implicit class DebugStreamQuery(query: StreamingQuery) extends Logging`
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21229 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90170/ Test FAILed.
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21229 **[Test build #90170 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90170/testReport)** for PR 21229 at commit [`a2d81df`](https://github.com/apache/spark/commit/a2d81dfa5f5b2cf58413f3e39717bf436c956d70).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalog...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21232 and actually I think he already merged this into branch-2.2 - https://github.com/apache/spark/commits/branch-2.2 :-)
[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21224#discussion_r185988219

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -342,6 +342,7 @@ class ParquetFileFormat
       sparkSession.sessionState.conf.parquetFilterPushDown
     // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
     val returningBatch = supportBatch(sparkSession, resultSchema)
+    val pushDownDate = sqlConf.parquetFilterPushDownDate
--- End diff --

Ah, I see. Thank you, @cloud-fan !
[GitHub] spark pull request #21186: [SPARK-22279][SPARK-24112] Enable `convertMetasto...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21186#discussion_r185988150

--- Diff: docs/sql-programming-guide.md ---
@@ -1812,6 +1812,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. Setting `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` to `true` restores the previous behavior. This option will be removed in Spark 3.0.
   - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments are ordered. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
   - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains a timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behavior to keep their queries unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
+ - Since Spark 2.4, Spark uses its own ORC support by default instead of Hive SerDe for better performance during Hive metastore table access. Setting `spark.sql.hive.convertMetastoreOrc` to `false` restores the previous behavior.
+ - Since Spark 2.4, Spark supports table properties while converting Parquet/ORC Hive tables. Setting `spark.sql.hive.convertMetastoreTableProperty` to `false` restores the previous behavior.
--- End diff --

Sure!
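As a small illustration of the `from_utc_timestamp` migration-guide entry quoted above, the two cases it describes could be tried from a Spark shell like this (a sketch only; it requires a running Spark 2.3 or 2.4 session, and the results noted in the comments are the ones the guide itself states):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("tz-demo").getOrCreate()

// No timezone in the input string: the guide says both 2.3 and 2.4
// return 2000-10-10 01:00:00.
spark.sql("SELECT from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')").show(false)

// Timezone embedded in the input string: per the guide, Spark 2.3 returns a
// value (2000-10-10 09:00:00 with a GMT+8 session timezone) while Spark 2.4
// returns NULL, unless spark.sql.function.rejectTimezoneInString is false.
spark.sql("SELECT from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')").show(false)
```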
[GitHub] spark issue #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalog...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21232 Thank you for the review, @gatorsmile and @HyukjinKwon !
[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21158 The default value introduced by SPARK-22479 was already part of Apache Spark 2.3. It is hard to call this a regression, since URIs could contain access keys and usernames could carry personally identifiable information. Thus, it also makes sense to block them. Although the original JIRA is about JDBC, this makes other sources more secure as well. If users do not want these values redacted, they can change the default. I am not comfortable with not redacting them in the current situation, since we already did so in Spark 2.3. Any security issue could be serious, so I think we should leave the default unchanged.
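The kind of key-name-driven redaction under discussion can be sketched as follows. This is purely illustrative: the regex, the replacement text, and the helper are hypothetical stand-ins, not Spark's actual configuration values or implementation.

```scala
import scala.util.matching.Regex

// Hypothetical pattern: any option whose *key* matches is considered sensitive.
val redactionPattern: Regex = "(?i)secret|password|token|url|user".r

// Replace the values of matching keys; leave everything else untouched.
def redactOptions(options: Map[String, String]): Map[String, String] =
  options.map { case (key, value) =>
    if (redactionPattern.findFirstIn(key).isDefined) key -> "*********(redacted)"
    else key -> value
  }

object Demo extends App {
  val opts = Map(
    "url"       -> "jdbc:postgresql://host/db?user=admin", // matched: redacted
    "password"  -> "hunter2",                              // matched: redacted
    "fetchsize" -> "1000")                                 // unmatched: kept
  redactOptions(opts).foreach(println)
}
```

Matching on key names (rather than inspecting values) is the design trade-off the comment alludes to: it errs on the side of redacting anything that might carry credentials, and users who disagree adjust the pattern rather than the mechanism.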
[GitHub] spark issue #21217: [SPARK-24151][SQL] Fix CURRENT_DATE, CURRENT_TIMESTAMP t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21217 @jamesthomp Could you document the behavior change in the migration guide? https://github.com/apache/spark/blame/master/docs/sql-programming-guide.md#L1802
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18447 **[Test build #90179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90179/testReport)** for PR 18447 at commit [`d18d14f`](https://github.com/apache/spark/commit/d18d14fe5ee19161c9d56284e051d5da88414094).
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21135 @pwoody Could you show me how to reproduce the issue?
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18447 Merged build finished. Test FAILed.
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18447 **[Test build #90178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90178/testReport)** for PR 18447 at commit [`704ad80`](https://github.com/apache/spark/commit/704ad80cb19f692ce44d2c509dbdf915e858d79f).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18447 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90178/ Test FAILed.
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18447 **[Test build #90178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90178/testReport)** for PR 18447 at commit [`704ad80`](https://github.com/apache/spark/commit/704ad80cb19f692ce44d2c509dbdf915e858d79f).
[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21230#discussion_r185986217

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
@@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
 import org.apache.spark.sql.catalyst.rules.Rule

 object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
-  override def apply(
-      plan: LogicalPlan): LogicalPlan = plan transformUp {
-    // PhysicalOperation guarantees that filters are deterministic; no need to check
-    case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) =>
-      // merge the filters
-      val filters = relation.filters match {
-        case Some(existing) =>
-          existing ++ newFilters
-        case _ =>
-          newFilters
-      }
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    var pushed = false
+    plan transformDown {
+      // PhysicalOperation guarantees that filters are deterministic; no need to check
+      case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed =>
--- End diff --

Is it possible for one plan to have multiple `PhysicalOperation` matches?
[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21207 Merged build finished. Test PASSed.
[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90165/ Test PASSed.
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90166/ Test PASSed.
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21135 Merged build finished. Test PASSed.
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21135 **[Test build #90166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90166/testReport)** for PR 21135 at commit [`dbaa00c`](https://github.com/apache/spark/commit/dbaa00ca99161cd2820b9a74aebac9665c626581).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21207 **[Test build #90165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90165/testReport)** for PR 21207 at commit [`4daa548`](https://github.com/apache/spark/commit/4daa548c470a6e5d5de477a10661c4d838c520f5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90164/ Test PASSed.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21208 Merged build finished. Test PASSed.
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18447 **[Test build #90177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90177/testReport)** for PR 18447 at commit [`a1bb258`](https://github.com/apache/spark/commit/a1bb2586f9db2e82167140d7565a73629b3b996b).
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21208 **[Test build #90164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90164/testReport)** for PR 21208 at commit [`7a8610f`](https://github.com/apache/spark/commit/7a8610fa25ed1cec4b5b385694a3e24d66e0ccdb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/21195#discussion_r185984894 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -323,4 +324,44 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead assert(model.getOptimizer === optimizer) } } + + test("LDA with Array input") { +val featuresColNameD = "array_double_features" +val featuresColNameF = "array_float_features" +val doubleUDF = udf { (features: Vector) => + val featureArray = Array.fill[Double](features.size)(0.0) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val floatUDF = udf { (features: Vector) => + val featureArray = Array.fill[Float](features.size)(0.0f) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features"))) + .drop("features") +val newdatasetF = dataset.withColumn(featuresColNameF, floatUDF(col("features"))) + .drop("features") +assert(newdatasetD.schema(featuresColNameD).dataType.equals(new ArrayType(DoubleType, false))) +assert(newdatasetF.schema(featuresColNameF).dataType.equals(new ArrayType(FloatType, false))) + +val ldaD = new LDA().setK(k).setOptimizer("online") + .setMaxIter(1).setFeaturesCol(featuresColNameD).setSeed(1) +val ldaF = new LDA().setK(k).setOptimizer("online"). 
+ setMaxIter(1).setFeaturesCol(featuresColNameF).setSeed(1) +val modelD = ldaD.fit(newdatasetD) +val modelF = ldaF.fit(newdatasetF) + +// logLikelihood, logPerplexity +val llD = modelD.logLikelihood(newdatasetD) +val llF = modelF.logLikelihood(newdatasetF) +// assert(llD == llF) +assert(llD <= 0.0 && llD != Double.NegativeInfinity) +assert(llF <= 0.0 && llF != Double.NegativeInfinity) +val lpD = modelD.logPerplexity(newdatasetD) +val lpF = modelF.logPerplexity(newdatasetF) +// assert(lpD == lpF) +assert(lpD >= 0.0 && lpD != Double.NegativeInfinity) +assert(lpF >= 0.0 && lpF != Double.NegativeInfinity) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Merged build finished. Test PASSed.
[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/21195#discussion_r185983646 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -182,6 +184,40 @@ class BisectingKMeansSuite model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0) } + + test("BisectingKMeans with Array input") { +val featuresColNameD = "array_double_features" +val featuresColNameF = "array_float_features" +val doubleUDF = udf { (features: Vector) => + val featureArray = Array.fill[Double](features.size)(0.0) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val floatUDF = udf { (features: Vector) => + val featureArray = Array.fill[Float](features.size)(0.0f) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features"))) + .drop("features") --- End diff -- * Unnecessary to drop `features`. Or you can simply replace the features column: ~~~scala val newdatasetD = dataset.withColumn(FEATURES, doubleUDF(col(FEATURES))) ~~~ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/21195#discussion_r185984527 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -256,6 +258,42 @@ class GaussianMixtureSuite extends SparkFunSuite with MLlibTestSparkContext val expectedMatrix = GaussianMixture.unpackUpperTriangularMatrix(4, triangularValues) assert(symmetricMatrix === expectedMatrix) } + + test("GaussianMixture with Array input") { +val featuresColNameD = "array_double_features" +val featuresColNameF = "array_float_features" +val doubleUDF = udf { (features: Vector) => + val featureArray = Array.fill[Double](features.size)(0.0) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val floatUDF = udf { (features: Vector) => + val featureArray = Array.fill[Float](features.size)(0.0f) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features"))) + .drop("features") +val newdatasetF = dataset.withColumn(featuresColNameF, floatUDF(col("features"))) + .drop("features") +assert(newdatasetD.schema(featuresColNameD).dataType.equals(new ArrayType(DoubleType, false))) +assert(newdatasetF.schema(featuresColNameF).dataType.equals(new ArrayType(FloatType, false))) + +val gmD = new GaussianMixture().setK(k).setMaxIter(1) + .setFeaturesCol(featuresColNameD).setSeed(1) +val gmF = new GaussianMixture().setK(k).setMaxIter(1) + .setFeaturesCol(featuresColNameF).setSeed(1) +val modelD = gmD.fit(newdatasetD) +val modelF = gmF.fit(newdatasetF) +val transformedD = modelD.transform(newdatasetD) +val transformedF = modelF.transform(newdatasetF) +val predictDifference = transformedD.select("prediction") + .except(transformedF.select("prediction")) +assert(predictDifference.count() == 0) +val probabilityDifference = transformedD.select("probability") + 
.except(transformedF.select("probability")) +assert(probabilityDifference.count() == 0) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/21195#discussion_r185971647 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -182,6 +184,40 @@ class BisectingKMeansSuite model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0) } + + test("BisectingKMeans with Array input") { +val featuresColNameD = "array_double_features" +val featuresColNameF = "array_float_features" +val doubleUDF = udf { (features: Vector) => + val featureArray = Array.fill[Double](features.size)(0.0) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) --- End diff -- * If `.toFloat` is to keep the same precision, we should leave an inline comment. * `features.toArray.map(_.toFloat.toDouble)` should do the work. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
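The simplification mengxr suggests above (`features.toArray.map(_.toFloat.toDouble)`) can be checked outside of Spark. The sketch below is plain Scala with `Array[Double]` standing in for `Vector`: both the original `foreachActive`-style loop and the one-line `map` round every value through `Float` precision, so for a dense vector they produce the same array. The object and method names here are invented for the illustration.

```scala
// Plain-Scala sketch: the suggested one-liner matches the original loop for
// dense input, since both round each element through Float precision.
object FloatRoundTripSketch {
  // Mirrors the original test's loop: fill an array, assigning the
  // Float-rounded value of each element (widened back to Double).
  def loopStyle(features: Array[Double]): Array[Double] = {
    val out = Array.fill[Double](features.length)(0.0)
    features.indices.foreach(i => out(i) = features(i).toFloat)
    out
  }

  // The reviewer's suggested one-liner.
  def mapStyle(features: Array[Double]): Array[Double] =
    features.map(_.toFloat.toDouble)
}
```

Note this equivalence holds for dense vectors; for sparse vectors `foreachActive` visits only non-zero entries, which is why the original loop pre-fills the array with zeros.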
[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/21195#discussion_r185984500 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -182,6 +184,40 @@ class BisectingKMeansSuite model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0) } + + test("BisectingKMeans with Array input") { +val featuresColNameD = "array_double_features" +val featuresColNameF = "array_float_features" +val doubleUDF = udf { (features: Vector) => + val featureArray = Array.fill[Double](features.size)(0.0) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val floatUDF = udf { (features: Vector) => + val featureArray = Array.fill[Float](features.size)(0.0f) + features.foreachActive((idx, value) => featureArray(idx) = value.toFloat) + featureArray +} +val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features"))) + .drop("features") +val newdatasetF = dataset.withColumn(featuresColNameF, floatUDF(col("features"))) + .drop("features") +assert(newdatasetD.schema(featuresColNameD).dataType.equals(new ArrayType(DoubleType, false))) +assert(newdatasetF.schema(featuresColNameF).dataType.equals(new ArrayType(FloatType, false))) + +val bkmD = new BisectingKMeans() + .setK(k).setMaxIter(1).setFeaturesCol(featuresColNameD).setSeed(1) +val bkmF = new BisectingKMeans() + .setK(k).setMaxIter(1).setFeaturesCol(featuresColNameF).setSeed(1) +val modelD = bkmD.fit(newdatasetD) +val modelF = bkmF.fit(newdatasetF) +val transformedD = modelD.transform(newdatasetD) +val transformedF = modelF.transform(newdatasetF) +val predictDifference = transformedD.select("prediction") + .except(transformedF.select("prediction")) +assert(predictDifference.count() == 0) --- End diff -- This only verifies it handles `Array[Double]` and `Array[Float]` the same way. But it doesn't guarantee that the result is correct. 
We can define a method that takes a dataset, apply one iteration, and return the cost. ~~~scala def trainAndComputeCost(dataset: DataFrame): Double = { val model = new BisectingKMeans() .setK(k).setMaxIter(1).setSeed(1) .fit(dataset) model.computeCost(dataset) } val trueCost = trainAndComputeCost(dataset) val floatArrayCost = trainAndComputeCost(newDatasetF) assert(floatArrayCost === trueCost) val doubleArrayCost = trainAndComputeCost(newDatasetD) assert(doubleArrayCost === trueCost) ~~~ We can map the original dataset to single precision to have exact match. Or we can test equality with a threshold. See https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
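The "equality with a threshold" option mentioned above can be sketched as a minimal relative-tolerance check, in the spirit of the `~==` matcher in mllib's `TestingUtils` (the name `approxEqual` and its exact semantics here are made up for the sketch, not the real matcher):

```scala
// Minimal relative-tolerance comparison: two doubles are "approximately
// equal" when their absolute difference is within relTol of the larger
// magnitude. Illustration only, not Spark's TestingUtils implementation.
object RelTolSketch {
  def approxEqual(a: Double, b: Double, relTol: Double): Boolean = {
    val diff = math.abs(a - b)
    diff <= relTol * math.max(math.abs(a), math.abs(b))
  }
}
```

With a helper like this, the float-array and double-array costs can be compared without requiring bit-exact equality.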
[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/21195#discussion_r185971385 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala --- @@ -101,4 +102,17 @@ private[spark] object SchemaUtils { require(!schema.fieldNames.contains(col.name), s"Column ${col.name} already exists.") StructType(schema.fields :+ col) } + + /** + * Check whether the given column in the schema is one of the supporting vector type: Vector, + * Array[Dloat]. Array[Double] --- End diff -- nit: Float --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90163/ Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21021 **[Test build #90163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90163/testReport)** for PR 21021 at commit [`e3fcaaa`](https://github.com/apache/spark/commit/e3fcaaa7e44c09ead1d395bd5be78de29305c4b2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90169/ Test FAILed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Merged build finished. Test FAILed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21193 **[Test build #90169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90169/testReport)** for PR 21193 at commit [`53daaf3`](https://github.com/apache/spark/commit/53daaf36587b29b542e10a473f6909c1f7a20b63). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21173 Merged build finished. Test FAILed.
[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21173 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90173/ Test FAILed.
[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21173 **[Test build #90173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90173/testReport)** for PR 21173 at commit [`35759ca`](https://github.com/apache/spark/commit/35759ca256df2bd757cb8c85f96ebe4ae9126227). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21193 **[Test build #90176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90176/testReport)** for PR 21193 at commit [`d138ee0`](https://github.com/apache/spark/commit/d138ee05541bc1976442c5f1170ba7a7f3f5114c).
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Merged build finished. Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2895/ Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21193 retest this please.
[GitHub] spark pull request #21222: [SPARK-24161][SS] Enable debug package feature on...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21222#discussion_r185981329 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala --- @@ -88,23 +100,62 @@ package object debug { } } + /** + * Get WholeStageCodegenExec subtrees and the codegen in a query plan into one String + * + * @param query the streaming query for codegen + * @return single String containing all WholeStageCodegen subtrees and corresponding codegen + */ + def codegenString(query: StreamingQuery): String = { +val msg = query match { + case w: StreamExecution if w.lastExecution != null => --- End diff -- I think this can be rewritten as if `case w: StreamExecution => if (w.lastExecution != null) ... else ...` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
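The two match shapes being discussed above (a guarded `case w: StreamExecution if w.lastExecution != null` versus matching on the type and branching inside) can be modeled with toy types. The real types (`StreamingQuery`, `StreamExecution`) are replaced with stand-ins here; this is only a sketch of the refactoring, not the PR's code.

```scala
// Toy model of the two equivalent match shapes from the review comment.
sealed trait Query
final case class Exec(lastExecution: String) extends Query
case object OtherQuery extends Query

object MatchShapes {
  // Shape in the PR: a guard on the case clause; a null lastExecution
  // falls through to the default case.
  def withGuard(q: Query): String = q match {
    case Exec(last) if last != null => s"codegen for $last"
    case _ => "no physical plan"
  }

  // Reviewer's suggested shape: match on the type, branch inside the case.
  def withBranch(q: Query): String = q match {
    case Exec(last) =>
      if (last != null) s"codegen for $last" else "no physical plan"
    case _ => "no physical plan"
  }
}
```

The suggested shape keeps the null handling local to the `StreamExecution` case instead of relying on fall-through, at the cost of one more level of nesting.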
[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21212 > do you mean optimize space usage for MapStatus when there are lots of consecutive empty-blocks ? Yea, something like doing an RLE for the size array in `CompressedMapStatus`. But this can be done as a followup.
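The run-length-encoding idea sketched above exploits the long runs of zero-sized blocks in a map status size array. A minimal illustration (this is not Spark's actual `CompressedMapStatus` format, which uses per-size log compression; names and representation here are invented):

```scala
import scala.collection.mutable.ArrayBuffer

// Run-length encode a block-size array as (value, runLength) pairs, so a
// long run of empty blocks costs one pair instead of one entry per block.
object RleSketch {
  def encode(sizes: Array[Long]): Seq[(Long, Int)] = {
    if (sizes.isEmpty) return Seq.empty
    val runs = ArrayBuffer.empty[(Long, Int)]
    var current = sizes(0)
    var count = 1
    for (i <- 1 until sizes.length) {
      if (sizes(i) == current) count += 1
      else { runs += ((current, count)); current = sizes(i); count = 1 }
    }
    runs += ((current, count))
    runs.toSeq
  }

  // Inverse: expand the runs back into the original size array.
  def decode(runs: Seq[(Long, Int)]): Array[Long] =
    runs.flatMap { case (v, n) => Array.fill(n)(v) }.toArray
}
```

For a mostly empty size array the encoded form is a handful of pairs, which is the space saving the follow-up would target.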
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Merged build finished. Test FAILed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90174/ Test FAILed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21193 **[Test build #90174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90174/testReport)** for PR 21193 at commit [`d138ee0`](https://github.com/apache/spark/commit/d138ee05541bc1976442c5f1170ba7a7f3f5114c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21212: [SPARK-24143] filter empty blocks when convert ma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21212#discussion_r185980680 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -267,28 +269,28 @@ final class ShuffleBlockFetcherIterator( // at most maxBytesInFlight in order to limit the amount of data in flight. val remoteRequests = new ArrayBuffer[FetchRequest] -// Tracks total number of blocks (including zero sized blocks) -var totalBlocks = 0 for ((address, blockInfos) <- blocksByAddress) { - totalBlocks += blockInfos.size if (address.executorId == blockManager.blockManagerId.executorId) { -// Filter out zero-sized blocks -localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1) +blockInfos.find(_._2 < 0) match { --- End diff -- shall we use `_._2 <= 0` to make sure the 0-size blocks are filtered?
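The filtering question above boils down to which predicate keeps a block: only blocks with a strictly positive size should enter the fetch lists. A pure-Scala sketch, with `BlockId` replaced by a plain `String` (this is not the real `ShuffleBlockFetcherIterator` code; names are illustrative):

```scala
// Keep only non-empty blocks: a block survives the filter iff its size > 0,
// which drops zero-sized blocks as well as any (invalid) negative sizes.
object BlockFilterSketch {
  def nonEmptyBlocks(blockInfos: Seq[(String, Long)]): Seq[String] =
    blockInfos.filter(_._2 > 0).map(_._1)
}
```

This is the complement of checking `_._2 <= 0` for bad entries: `find(_._2 <= 0)` locates a block the filter would drop, whereas `find(_._2 < 0)` would miss zero-sized ones.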
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185961170 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s + +import scala.collection.mutable.HashMap + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ + +private[spark] object KubernetesVolumeUtils { + + /** + * Given hostPath volume specs, add volume to pod and volume mount to container. 
+ * + * @param pod original specification of the pod + * @param container original specification of the container + * @param sparkConf Spark configuration + * @param prefix the prefix for volume configuration + * @return a tuple of (pod with the volume(s) added, container with mount(s) added) + */ + def addVolumes( + pod: Pod, + container: Container, + sparkConf: SparkConf, + prefix : String): (Pod, Container) = { +val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, prefix) +addHostPathVolumes(pod, container, hostPathVolumeSpecs) + } + + /** + * Extract Spark volume configuration properties with a given name prefix. + * + * @param sparkConf Spark configuration + * @param prefix the given property name prefix + * @param volumeTypeKey the given property name prefix + * @return a Map storing with volume name as key and spec as value + */ + def parseVolumesWithPrefix( + sparkConf: SparkConf, + prefix: String, + volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = { +val volumes = HashMap[String, KubernetesVolumeSpec]() +val properties = sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList +// Extract volume names +properties.foreach { + k => +val keys = k._1.split("\\.") +if (keys.nonEmpty && !volumes.contains(keys(0))) { + volumes.update(keys(0), KubernetesVolumeSpec.emptySpec()) +} +} +// Populate spec +volumes.foreach { + case (name, spec) => +properties.foreach { + k => +k._1.split("\\.") match { + case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, KUBERNETES_VOLUMES_PATH_KEY) => +spec.mountPath = Some(k._2) + case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, KUBERNETES_VOLUMES_READONLY_KEY) => +spec.mountReadOnly = Some(k._2.toBoolean) + case Array(`name`, KUBERNETES_VOLUMES_OPTIONS_KEY, option) => +spec.optionsSpec.update(option, k._2) + case _ => +None +} +} +} +volumes.toMap + } + + /** + * Extract Spark hostPath volume configuration properties with a given name prefix and + * return the result as a Map. 
+ * + * @param sparkConf Spark configuration + * @param prefix the given property name prefix + * @return a Map storing with volume name as key and spec as value + */ + def parseHostPathVolumesWithPrefix( + sparkConf: SparkConf, + prefix: String): Map[String, KubernetesVolumeSpec] = { +parseVolumesWithPrefix(sparkConf, prefix, KUBERNETES_VOLUMES_HOSTPATH_KEY) + } + + /** + * Given hostPath volume specs, add volume to pod and volume mount to container. + * + * @param pod original specification of the pod + * @param container original specification of the container + * @param volumes list of named volume specs + * @return a tuple of (pod with the volume(s) added, container with mount(s) added) + */ + def addHostPathVolumes( + pod: Pod, +
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185961495 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala ---
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185960781 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala ---
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185980118 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s + +import scala.collection.mutable.HashMap + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ + +private[spark] object KubernetesVolumeUtils { + + /** + * Given hostPath volume specs, add volume to pod and volume mount to container. 
+ * + * @param pod original specification of the pod + * @param container original specification of the container + * @param sparkConf Spark configuration + * @param prefix the prefix for volume configuration + * @return a tuple of (pod with the volume(s) added, container with mount(s) added) + */ + def addVolumes( + pod: Pod, + container: Container, + sparkConf: SparkConf, + prefix : String): (Pod, Container) = { +val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, prefix) +addHostPathVolumes(pod, container, hostPathVolumeSpecs) + } + + /** + * Extract Spark volume configuration properties with a given name prefix. + * + * @param sparkConf Spark configuration + * @param prefix the given property name prefix + * @param volumeTypeKey the given property name prefix + * @return a Map storing with volume name as key and spec as value + */ + def parseVolumesWithPrefix( + sparkConf: SparkConf, + prefix: String, + volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = { +val volumes = HashMap[String, KubernetesVolumeSpec]() +val properties = sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList +// Extract volume names +properties.foreach { + k => +val keys = k._1.split("\\.") +if (keys.nonEmpty && !volumes.contains(keys(0))) { + volumes.update(keys(0), KubernetesVolumeSpec.emptySpec()) +} +} +// Populate spec +volumes.foreach { + case (name, spec) => +properties.foreach { + k => +k._1.split("\\.") match { + case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, KUBERNETES_VOLUMES_PATH_KEY) => +spec.mountPath = Some(k._2) + case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, KUBERNETES_VOLUMES_READONLY_KEY) => +spec.mountReadOnly = Some(k._2.toBoolean) + case Array(`name`, KUBERNETES_VOLUMES_OPTIONS_KEY, option) => +spec.optionsSpec.update(option, k._2) + case _ => +None +} +} +} +volumes.toMap + } + + /** + * Extract Spark hostPath volume configuration properties with a given name prefix and + * return the result as a Map. 
+ * + * @param sparkConf Spark configuration + * @param prefix the given property name prefix + * @return a Map storing with volume name as key and spec as value + */ + def parseHostPathVolumesWithPrefix( --- End diff -- Looks like you don't really need this function as it's just a wrapper of `parseVolumesWithPrefix `. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2894/ Test PASSed.
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21229 Merged build finished. Test PASSed.
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185980514 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala --- @@ -109,7 +109,15 @@ private[spark] class BasicDriverFeatureStep( .addToImagePullSecrets(conf.imagePullSecrets(): _*) .endSpec() .build() -SparkPod(driverPod, driverContainer) + +val (driverPodWithVolumes, driverContainerVolumes) = --- End diff -- Instead of putting the logic of volume mounting in `BasicDriverFeatureStep` and `BasicExecutorFeatureStep`, we should add a new step for mounting volumes, similarly to how we handle secrets, e.g., `MountVolumesFeatureStep` where the logic of `addVolumes` should be. This feature step can be used for both the driver and executors.
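A rough sketch of the suggested shape, shared by the driver and executors. All types here (`SparkPod`, `Volume`, `VolumeMount`, and the step trait) are deliberately simplified stand-ins for Spark's and fabric8's real model classes, so this illustrates the structure only, not the actual implementation:

```scala
// Toy model classes standing in for the fabric8 Kubernetes model (illustrative).
case class Volume(name: String, hostPath: String)
case class VolumeMount(name: String, mountPath: String, readOnly: Boolean)
case class SparkPod(volumes: Seq[Volume], mounts: Seq[VolumeMount])

// Simplified analogue of a Kubernetes feature step: each step transforms the pod.
trait KubernetesFeatureConfigStep {
  def configurePod(pod: SparkPod): SparkPod
}

// One step owns all volume-mounting logic, so the basic driver/executor steps
// need no volume handling of their own; specs map a volume name to
// (hostPath, mountPath, readOnly).
class MountVolumesFeatureStep(specs: Map[String, (String, String, Boolean)])
    extends KubernetesFeatureConfigStep {
  override def configurePod(pod: SparkPod): SparkPod = {
    val added = specs.toSeq.map { case (name, (hostPath, mountPath, readOnly)) =>
      (Volume(name, hostPath), VolumeMount(name, mountPath, readOnly))
    }
    SparkPod(pod.volumes ++ added.map(_._1), pod.mounts ++ added.map(_._2))
  }
}

// The same step instance can be applied to a driver pod or an executor pod.
val step = new MountVolumesFeatureStep(Map("data" -> ("/mnt/data", "/data", true)))
val pod = step.configurePod(SparkPod(Nil, Nil))
```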
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185960630 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s + +import scala.collection.mutable.HashMap + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ + +private[spark] object KubernetesVolumeUtils { + + /** + * Given hostPath volume specs, add volume to pod and volume mount to container. 
+ * + * @param pod original specification of the pod + * @param container original specification of the container + * @param sparkConf Spark configuration + * @param prefix the prefix for volume configuration + * @return a tuple of (pod with the volume(s) added, container with mount(s) added) + */ + def addVolumes( + pod: Pod, + container: Container, + sparkConf: SparkConf, + prefix : String): (Pod, Container) = { +val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, prefix) +addHostPathVolumes(pod, container, hostPathVolumeSpecs) + } + + /** + * Extract Spark volume configuration properties with a given name prefix. + * + * @param sparkConf Spark configuration + * @param prefix the given property name prefix + * @param volumeTypeKey the given property name prefix + * @return a Map storing with volume name as key and spec as value + */ + def parseVolumesWithPrefix( + sparkConf: SparkConf, + prefix: String, + volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = { +val volumes = HashMap[String, KubernetesVolumeSpec]() +val properties = sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList +// Extract volume names +properties.foreach { + k => +val keys = k._1.split("\\.") +if (keys.nonEmpty && !volumes.contains(keys(0))) { + volumes.update(keys(0), KubernetesVolumeSpec.emptySpec()) +} +} +// Populate spec +volumes.foreach { + case (name, spec) => --- End diff -- The `case` line should be merged into the previous line according to the Spark code convention, e.g., `volumes.foreach { case (name, spec) =>`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185960149 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s + +import scala.collection.mutable.HashMap + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ + +private[spark] object KubernetesVolumeUtils { + + /** + * Given hostPath volume specs, add volume to pod and volume mount to container. + * + * @param pod original specification of the pod + * @param container original specification of the container + * @param sparkConf Spark configuration + * @param prefix the prefix for volume configuration + * @return a tuple of (pod with the volume(s) added, container with mount(s) added) + */ + def addVolumes( --- End diff -- The Scaladoc should not mention hostPath as this function is not hostPath exclusively. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21095#discussion_r185960658 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s + +import scala.collection.mutable.HashMap + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ + +private[spark] object KubernetesVolumeUtils { + + /** + * Given hostPath volume specs, add volume to pod and volume mount to container. 
+ * + * @param pod original specification of the pod + * @param container original specification of the container + * @param sparkConf Spark configuration + * @param prefix the prefix for volume configuration + * @return a tuple of (pod with the volume(s) added, container with mount(s) added) + */ + def addVolumes( + pod: Pod, + container: Container, + sparkConf: SparkConf, + prefix : String): (Pod, Container) = { +val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, prefix) +addHostPathVolumes(pod, container, hostPathVolumeSpecs) + } + + /** + * Extract Spark volume configuration properties with a given name prefix. + * + * @param sparkConf Spark configuration + * @param prefix the given property name prefix + * @param volumeTypeKey the given property name prefix + * @return a Map storing with volume name as key and spec as value + */ + def parseVolumesWithPrefix( + sparkConf: SparkConf, + prefix: String, + volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = { +val volumes = HashMap[String, KubernetesVolumeSpec]() +val properties = sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList +// Extract volume names +properties.foreach { + k => +val keys = k._1.split("\\.") +if (keys.nonEmpty && !volumes.contains(keys(0))) { + volumes.update(keys(0), KubernetesVolumeSpec.emptySpec()) +} +} +// Populate spec +volumes.foreach { + case (name, spec) => +properties.foreach { + k => --- End diff -- Ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21186: [SPARK-22279][SPARK-24112] Enable `convertMetasto...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21186#discussion_r185980350 --- Diff: docs/sql-programming-guide.md --- @@ -1812,6 +1812,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set `true` to `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` restores the previous behavior. This option will be removed in Spark 3.0. - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception. - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behavior to keep their query unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
+ - Since Spark 2.4, Spark uses its own ORC support by default instead of Hive SerDe for better performance during Hive metastore table access. To set `false` to `spark.sql.hive.convertMetastoreOrc` restores the previous behavior. + - Since Spark 2.4, Spark supports table properties while converting Parquet/ORC Hive tables. To set `false` to `spark.sql.hive.convertMetastoreTableProperty` restores the previous behavior. --- End diff -- please polish the migration guide w.r.t. https://issues.apache.org/jira/browse/SPARK-24175
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21229 **[Test build #90175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90175/testReport)** for PR 21229 at commit [`a2d81df`](https://github.com/apache/spark/commit/a2d81dfa5f5b2cf58413f3e39717bf436c956d70).
[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21229 retest this please
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118 All the places that use `GenerateUnsafeRowJoiner` assume the input row is an unsafe row. `ShuffleExchangeExec` assumes its input is an unsafe row, because its serializer is `UnsafeRowSerializer`. Note that we don't enforce this at the API level, i.e. `SparkPlan.execute` still returns `RDD[InternalRow]`. This is because we have exceptions: the object-related operators can return safe rows, but object-related operators always appear in a group, and the last operator in the group will output unsafe rows. That said, you may not be able to see `UnsafeRow` in the generated code, but you will get a `ClassCastException` if you don't follow this rule and output a safe row.
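The failure mode described above can be illustrated with a toy analogy — these are deliberately not Spark's real classes, just a minimal model of an unchecked contract between operators:

```scala
// Toy stand-ins for Spark's row hierarchy (illustrative only).
trait InternalRow
class GenericInternalRow extends InternalRow  // "safe" row
class UnsafeRow extends InternalRow           // binary-format row

// Downstream code relies on the unchecked contract via a cast, much like an
// operator that assumes its InternalRow input is always an UnsafeRow:
def consume(row: InternalRow): UnsafeRow = row.asInstanceOf[UnsafeRow]

// Honoring the contract works; violating it fails at runtime, not compile time.
val ok = consume(new UnsafeRow)
val failed =
  try { consume(new GenericInternalRow); false }
  catch { case _: ClassCastException => true }
```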
[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21219 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90161/ Test PASSed.
[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21219 Merged build finished. Test PASSed.
[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21219 **[Test build #90161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90161/testReport)** for PR 21219 at commit [`3ecd7da`](https://github.com/apache/spark/commit/3ecd7da67e372f2cbb25c61b269030b2115885dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21207 @HyukjinKwon thanks for noticing, updated the title.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2893/ Test PASSed.