[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220985607 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -50,7 +52,26 @@ case class AnalyzeColumnCommand( val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, tableMeta) // Compute stats for each column -val (rowCount, newColStats) = computeColumnStats(sparkSession, tableIdentWithDB, columnNames) +val conf = sparkSession.sessionState.conf +val relation = sparkSession.table(tableIdent).logicalPlan +val attributesToAnalyze = if (allColumns) { + relation.output +} else { + columnNames.get.map { col => +val exprOption = relation.output.find(attr => conf.resolver(attr.name, col)) +exprOption.getOrElse(throw new AnalysisException(s"Column $col does not exist.")) + } +} +// Make sure the column types are supported for stats gathering. +attributesToAnalyze.foreach { attr => + if (!supportsType(attr.dataType)) { +throw new AnalysisException( + s"Column ${attr.name} in table $tableIdent is of type ${attr.dataType}, " + +"and Spark does not support statistics collection on this column type.") + } +} --- End diff -- @gatorsmile OK --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220985491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -33,11 +33,13 @@ import org.apache.spark.sql.types._ /** * Analyzes the given columns of the given table to generate statistics, which will be used in - * query optimizations. + * query optimizations. Parameter `allColumns` may be specified to generate statistics of all the + * columns of a given table. */ case class AnalyzeColumnCommand( tableIdent: TableIdentifier, -columnNames: Seq[String]) extends RunnableCommand { +columnNames: Option[Seq[String]], +allColumns: Boolean = false ) extends RunnableCommand { --- End diff -- @gatorsmile ok.
[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22531 @wangyum, are you interested in submitting a PR that checks whether we can add a rule enforcing `.toLowerCase(Locale.ROOT)` and `.toUpperCase(Locale.ROOT)`, and adds it if so?
[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22165#discussion_r220984492 --- Diff: core/src/test/scala/org/apache/spark/scheduler/BarrierCoordinatorSuite.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.TimeoutException + +import scala.concurrent.duration._ +import scala.language.postfixOps + +import org.scalatest.concurrent.Eventually + +import org.apache.spark._ +import org.apache.spark.rpc.RpcTimeout + +class BarrierCoordinatorSuite extends SparkFunSuite with LocalSparkContext with Eventually { + + /** + * Get the current ContextBarrierState from barrierCoordinator.states by ContextBarrierId. 
+ */ + private def getBarrierState( + stageId: Int, + stageAttemptId: Int, + barrierCoordinator: BarrierCoordinator) = { +val barrierId = ContextBarrierId(stageId, stageAttemptId) +barrierCoordinator.states.get(barrierId) + } + + test("normal test for single task") { +sc = new SparkContext("local", "test") +val barrierCoordinator = new BarrierCoordinator(5, sc.listenerBus, sc.env.rpcEnv) +val rpcEndpointRef = sc.env.rpcEnv.setupEndpoint("barrierCoordinator", barrierCoordinator) +val stageId = 0 +val stageAttemptNumber = 0 +rpcEndpointRef.askSync[Unit]( --- End diff -- Sorry for missing this, done in 8cd78a9.
[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22165#discussion_r220984340 --- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala --- @@ -187,6 +191,12 @@ private[spark] class BarrierCoordinator( requesters.clear() cancelTimerTask() } + +// Check for clearing internal data, visible for test only. +private[spark] def cleanCheck(): Boolean = requesters.isEmpty && timerTask == null + +// Get currently barrier epoch, visible for test only. +private[spark] def getBarrierEpoch(): Int = barrierEpoch --- End diff -- Per the comment at https://github.com/apache/spark/pull/22165#discussion_r218093991, should this be reverted?
[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22165#discussion_r220983212 --- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala --- @@ -141,7 +145,7 @@ private[spark] class BarrierCoordinator( logInfo(s"Current barrier epoch for $barrierId is $barrierEpoch.") if (epoch != barrierEpoch) { requester.sendFailure(new SparkException(s"The request to sync of $barrierId with " + - s"barrier epoch $barrierEpoch has already finished. Maybe task $taskId is not " + + s"barrier epoch $epoch has already finished. Maybe task $taskId is not " + --- End diff -- While writing the UT for ContextBarrierState, I noticed what I think is a small bug in this message: it reports `barrierEpoch` where it should report the request's `epoch`. @jiangxb1987 Please check whether I'm wrong.
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22165 **[Test build #96702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96702/testReport)** for PR 22165 at commit [`8cd78a9`](https://github.com/apache/spark/commit/8cd78a95a0e0649fed81fe6217790943855b7417).
[GitHub] spark issue #22558: [SPARK-25546][core] Don't cache value of EVENT_LOG_CALLS...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22558 LGTM
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22165 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3542/
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22165 Merged build finished. Test PASSed.
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22237

Hi @MaxGekk, I just reviewed this PR and noticed one behavior change. For a corrupt record, the column value of `from_json(corrupt_record...)` becomes `Row(null, null, ...)` instead of `null`.

```
val df = Seq("""{"a" 1, "b": 2}""").toDS()
val schema = new StructType().add("a", IntegerType).add("b", IntegerType)
```

Before the code change:

```
scala> df.select(from_json($"value", schema).as("col")).where("col is null").show()
+----+
| col|
+----+
|null|
+----+

scala> df.select(from_json($"value", schema).as("col")).where("col.a is null").show()
+----+
| col|
+----+
|null|
+----+
```

After the code change:

```
scala> df.select(from_json($"value", schema).as("col")).where("col is null").show()
+---+
|col|
+---+
+---+

scala> df.select(from_json($"value", schema).as("col")).where("col.a is null").show()
+---+
|col|
+---+
|[,]|
+---+
```

The main difference is that we can no longer filter on a null `col` in the result. Is there any reason for changing this?
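The difference between the two outputs comes down to a null struct versus a struct whose fields are all null. The following is a plain-Python analogy, not Spark code: `Col`, `is_null`, and `a_is_null` are hypothetical names, and `None` stands in for SQL `null`. It only models why `col is null` stops matching while `col.a is null` still matches.

```python
from typing import NamedTuple, Optional


class Col(NamedTuple):
    """Stand-in for the struct produced by from_json (fields a and b)."""
    a: Optional[int]
    b: Optional[int]


def is_null(col: Optional[Col]) -> bool:
    # Models the predicate `col is null`.
    return col is None


def a_is_null(col: Optional[Col]) -> bool:
    # Models `col.a is null`: a field of a null struct is itself null.
    return col is None or col.a is None


before = [None]             # corrupt record parsed to a null struct
after = [Col(None, None)]   # corrupt record parsed to a struct of nulls

print(len([c for c in before if is_null(c)]), len([c for c in before if a_is_null(c)]))
print(len([c for c in after if is_null(c)]), len([c for c in after if a_is_null(c)]))
```

In this model the first predicate matches only the "before" value, while the second matches both, which lines up with the row counts in the `show()` output quoted in the comment.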
[GitHub] spark issue #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANALYZE TA...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22566 @dilipbiswal Thanks for working on this! Also cc @juliuszsompolski
[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220979068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -33,11 +33,13 @@ import org.apache.spark.sql.types._ /** * Analyzes the given columns of the given table to generate statistics, which will be used in - * query optimizations. + * query optimizations. Parameter `allColumns` may be specified to generate statistics of all the + * columns of a given table. */ case class AnalyzeColumnCommand( tableIdent: TableIdentifier, -columnNames: Seq[String]) extends RunnableCommand { +columnNames: Option[Seq[String]], +allColumns: Boolean = false ) extends RunnableCommand { --- End diff -- Let's not use a default value here.
[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220978815 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -50,7 +52,26 @@ case class AnalyzeColumnCommand( val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, tableMeta) // Compute stats for each column -val (rowCount, newColStats) = computeColumnStats(sparkSession, tableIdentWithDB, columnNames) +val conf = sparkSession.sessionState.conf +val relation = sparkSession.table(tableIdent).logicalPlan +val attributesToAnalyze = if (allColumns) { + relation.output --- End diff -- Are we still able to create a table with zero columns, for example by using DataFrameWriter?
[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220978327 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -70,25 +91,9 @@ case class AnalyzeColumnCommand( */ private def computeColumnStats( sparkSession: SparkSession, - tableIdent: TableIdentifier, - columnNames: Seq[String]): (Long, Map[String, CatalogColumnStat]) = { - + relation: LogicalPlan, + attributesToAnalyze: Seq[Attribute]): (Long, Map[String, CatalogColumnStat]) = { --- End diff -- `attributesToAnalyze` -> `columns`
[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220978087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -50,7 +52,26 @@ case class AnalyzeColumnCommand( val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, tableMeta) // Compute stats for each column -val (rowCount, newColStats) = computeColumnStats(sparkSession, tableIdentWithDB, columnNames) +val conf = sparkSession.sessionState.conf +val relation = sparkSession.table(tableIdent).logicalPlan +val attributesToAnalyze = if (allColumns) { + relation.output +} else { + columnNames.get.map { col => +val exprOption = relation.output.find(attr => conf.resolver(attr.name, col)) +exprOption.getOrElse(throw new AnalysisException(s"Column $col does not exist.")) + } +} +// Make sure the column types are supported for stats gathering. +attributesToAnalyze.foreach { attr => + if (!supportsType(attr.dataType)) { +throw new AnalysisException( + s"Column ${attr.name} in table $tableIdent is of type ${attr.dataType}, " + +"and Spark does not support statistics collection on this column type.") + } +} --- End diff -- Also throw an exception when `allColumns` is set to true but `columnNames` is not empty.
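The requested check, together with the column resolution in the diff, can be sketched as follows. This is a language-neutral illustration written in Python rather than the Scala of the PR; `columns_to_analyze` is a hypothetical name, the case-insensitive match is an assumed stand-in for `conf.resolver`, and the error for a missing column list is an added assumption.

```python
from typing import List, Optional


def columns_to_analyze(
    table_columns: List[str],
    column_names: Optional[List[str]],
    all_columns: bool,
) -> List[str]:
    """Pick the columns to analyze, rejecting contradictory arguments."""
    # The check requested in the review: both options at once is an error.
    if all_columns and column_names:
        raise ValueError("Specify either all columns or a column list, not both.")
    if all_columns:
        return list(table_columns)
    if not column_names:
        # Assumption: an absent list without all_columns is also rejected.
        raise ValueError("No columns were specified.")
    resolved = []
    for name in column_names:
        # Assumption: case-insensitive resolution, one possible resolver.
        match = next((c for c in table_columns if c.lower() == name.lower()), None)
        if match is None:
            raise ValueError(f"Column {name} does not exist.")
        resolved.append(match)
    return resolved


print(columns_to_analyze(["id", "Name"], None, True))
print(columns_to_analyze(["id", "Name"], ["name"], False))
```

Validating the argument combination up front keeps the later branches simple, since each one can assume exactly one of the two options is in effect.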
[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220977575 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -50,7 +52,26 @@ case class AnalyzeColumnCommand( val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, tableMeta) // Compute stats for each column -val (rowCount, newColStats) = computeColumnStats(sparkSession, tableIdentWithDB, columnNames) +val conf = sparkSession.sessionState.conf +val relation = sparkSession.table(tableIdent).logicalPlan +val attributesToAnalyze = if (allColumns) { + relation.output +} else { + columnNames.get.map { col => +val exprOption = relation.output.find(attr => conf.resolver(attr.name, col)) +exprOption.getOrElse(throw new AnalysisException(s"Column $col does not exist.")) + } +} +// Make sure the column types are supported for stats gathering. +attributesToAnalyze.foreach { attr => + if (!supportsType(attr.dataType)) { +throw new AnalysisException( + s"Column ${attr.name} in table $tableIdent is of type ${attr.dataType}, " + +"and Spark does not support statistics collection on this column type.") + } +} --- End diff -- How about creating a new private function for the code between lines 55 and 72?
[GitHub] spark pull request #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(numb...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22419#discussion_r220973430 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -1245,3 +1245,27 @@ case class BRound(child: Expression, scale: Expression) with Serializable with ImplicitCastInputTypes { def this(child: Expression) = this(child, Literal(0)) } + +/** + * The number truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(number, scale) - Returns number truncated to scale decimal places. " + +"If scale is omitted, then number is truncated to 0 places. " + +"scale can be negative to truncate (make zero) scale digits left of the decimal point.", + examples = """ +Examples: + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 1234560000 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Truncate(child: Expression, scale: Expression) --- End diff -- I still prefer extending `trunc`. The difference between `truncate` and `trunc` is not obvious.
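The semantics in the usage string (truncate toward zero at `scale` decimal places, with a negative `scale` zeroing digits left of the decimal point) can be sketched with Python's standard `decimal` module. This illustrates the arithmetic only and is not Spark's implementation; `truncate_number` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_number(number: str, scale: int = 0) -> Decimal:
    """Truncate `number` toward zero, keeping `scale` decimal places.

    A negative scale zeroes out digits to the left of the decimal point.
    Hypothetical helper for illustration; not part of Spark.
    """
    quantum = Decimal(1).scaleb(-scale)  # 10 ** -scale, e.g. 0.0001 for scale=4
    return Decimal(number).quantize(quantum, rounding=ROUND_DOWN)


value = "1234567891.1234567891"
print(truncate_number(value, 4))        # four digits kept after the decimal point
print(int(truncate_number(value, -4)))  # last four integer digits zeroed
print(truncate_number(value))           # no fractional digits kept
```

`ROUND_DOWN` rounds toward zero, so negative inputs lose digits without moving further from zero, which is what distinguishes truncation from flooring.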
[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22566#discussion_r220962782 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -33,11 +33,13 @@ import org.apache.spark.sql.types._ /** * Analyzes the given columns of the given table to generate statistics, which will be used in - * query optimizations. + * query optimizations. Parameter `allColumns` may be specified to generate statistics of all the + * columns of a given table. */ case class AnalyzeColumnCommand( tableIdent: TableIdentifier, -columnNames: Seq[String]) extends RunnableCommand { +columnNames: Option[Seq[String]], +allColumns: Boolean = false ) extends RunnableCommand { --- End diff -- nit: `false )` -> `false)`.
[GitHub] spark issue #22558: [SPARK-25546][core] Don't cache value of EVENT_LOG_CALLS...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22558 Also cc @michaelmior and @cloud-fan.
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220959695 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap /** * Benchmark to measure performance for aggregate primitives. - * To run this: - * build/sbt "sql/test-only *benchmark.AggregateBenchmark" - * - * Benchmarks in this file are skipped in normal builds. + * To run this benchmark: + * {{{ + * 1. without sbt: bin/spark-submit --class + * 2. build/sbt "sql/test:runMain " + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/AggregateBenchmark-results.txt". + * }}} */ -class AggregateBenchmark extends BenchmarkWithCodegen { +object AggregateBenchmark extends SqlBasedBenchmark { - ignore("aggregate without grouping") { -val N = 500L << 22 -val benchmark = new Benchmark("agg without grouping", N) -runBenchmark("agg w/o group", N) { - sparkSession.range(N).selectExpr("sum(id)").collect() + override def benchmark(): Unit = { +runBenchmark("aggregate without grouping") { + val N = 500L << 22 + runBenchmarkWithCodegen("agg w/o group", N) { +spark.range(N).selectExpr("sum(id)").collect() + } } -/* -agg w/o group: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative - -agg w/o group wholestage off30136 / 31885 69.6 14.4 1.0X -agg w/o group wholestage on 1851 / 1860 1132.9 0.9 16.3X - */ - } - ignore("stat functions") { -val N = 100L << 20 +runBenchmark("stat functions") { + val N = 100L << 20 -runBenchmark("stddev", N) { - sparkSession.range(N).groupBy().agg("id" -> "stddev").collect() -} + runBenchmarkWithCodegen("stddev", N) { +spark.range(N).groupBy().agg("id" -> "stddev").collect() + } -runBenchmark("kurtosis", N) { - sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect() + runBenchmarkWithCodegen("kurtosis", N) { 
+spark.range(N).groupBy().agg("id" -> "kurtosis").collect() + } } -/* -Using ImperativeAggregate (as implemented in Spark 1.6): - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev:Avg Time(ms)Avg Rate(M/s) Relative Rate - --- - stddev w/o codegen 2019.0410.39 1.00 X - stddev w codegen2097.2910.00 0.96 X - kurtosis w/o codegen2108.99 9.94 0.96 X - kurtosis w codegen 2090.6910.03 0.97 X - - Using DeclarativeAggregate: - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - stddev codegen=false 5630 / 5776 18.0 55.6 1.0X - stddev codegen=true 1259 / 1314 83.0 12.0 4.5X - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - kurtosis: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - kurtosis codegen=false 14847 / 15084 7.0 142.9 1.0X - kurtosis codegen=true1652 / 2124 63.0 15.9 9.0X -*/ - } - - ignore("aggregate with linear keys") { -val N = 20 << 22 +runBenchmark("aggregate with linear keys") { + val N = 20 << 22 -val benchmark = new Benchmark("Aggregate w keys", N) -def f(): Unit = { - sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() -} + val benchmark = new Benchmark("Aggregate w keys", N, output = output) -benchmark.addCase(s"codegen = F", numIters = 2) { iter => - sparkSession.conf.set("spark.sql.codegen.wholeStage", "false") - f() -} +
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22569 **[Test build #96701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96701/testReport)** for PR 22569 at commit [`dca4e5c`](https://github.com/apache/spark/commit/dca4e5c991c94b5cfbf64bb7b661518ed069329f).
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22569 Merged build finished. Test PASSed.
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3541/
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/21990 I'm +1 on switching to the builder and not using the private interface.
[GitHub] spark pull request #22425: [SPARK-23367][Build] Include python document styl...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22425#discussion_r220956727 --- Diff: dev/tox.ini --- @@ -14,6 +14,8 @@ # limitations under the License. [pycodestyle] -ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504 +ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504,W605 max-line-length=100 exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/* +[pydocstyle] +ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414 --- End diff -- I only asked for a line break at the end of the file, and the current style looks good to me.
[GitHub] spark pull request #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenH...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22569#discussion_r220955679 --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala --- @@ -255,4 +255,16 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers { val set = new OpenHashSet[Long](0) assert(set.size === 0) } + + test("support for more than 12M items") { +val cnt = 12000000 // 12M +val set = new OpenHashSet[Int](cnt) +for (i <- 0 until cnt) { + set.add(i) + + val pos1 = set.addWithoutResize(i) & OpenHashSet.POSITION_MASK + val pos2 = set.getPos(i) + assert(pos1 == pos2) +} --- End diff -- If we want to add the check, we can add it inside the existing loop. Another loop seems unnecessary to me.
[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20503 I _think_ this could be good to backport into 2.4, assuming the current RC fails, if @ashashwat has the chance to update it and no one sees any issues with including it in a backport to that branch.
[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20503 Sure, let's add a test with a unicode string if there's concern about that, and make sure the existing repr with named fields is covered in the same test case, since I don't see an existing explicit test for that (although it's probably covered implicitly elsewhere).
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22569 LGTM except for one minor comment.
[GitHub] spark pull request #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenH...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22569#discussion_r220954056 --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala --- @@ -255,4 +255,16 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers { val set = new OpenHashSet[Long](0) assert(set.size === 0) } + + test("support for more than 12M items") { +val cnt = 12000000 // 12M +val set = new OpenHashSet[Int](cnt) +for (i <- 0 until cnt) { + set.add(i) + + val pos1 = set.addWithoutResize(i) & OpenHashSet.POSITION_MASK + val pos2 = set.getPos(i) + assert(pos1 == pos2) +} --- End diff -- nit: Would it be better to also check each value after all of them have been added?

```
for (i <- 0 until cnt) {
  assert(set.contains(i))
}
```
[GitHub] spark pull request #22425: [SPARK-23367][Build] Include python document styl...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22425#discussion_r220950524 --- Diff: dev/tox.ini --- @@ -14,6 +14,8 @@ # limitations under the License. [pycodestyle] -ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504 +ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504,W605 max-line-length=100 exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/* +[pydocstyle] +ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414 --- End diff -- I don't think that's what @ueshin was asking for; I think it was a blank line after the `ignore=...`. If @ueshin is around, we can see what they say. It's also relatively minor, provided everything functions.
[GitHub] spark pull request #22425: [SPARK-23367][Build] Include python document styl...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22425#discussion_r220950740 --- Diff: dev/tox.ini --- @@ -14,6 +14,8 @@ # limitations under the License. [pycodestyle] -ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504 +ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504,W605 --- End diff -- I'm just confused about why this would need to be changed in this PR -- hopefully it's just a holdover from the previous PR?
[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22568#discussion_r220948793 --- Diff: python/pyspark/sql/tests.py --- @@ -5714,24 +5755,31 @@ def test_wrong_args(self): pandas_udf(lambda x, y: x, DoubleType(), PandasUDFType.SCALAR)) def test_unsupported_types(self): +from distutils.version import LooseVersion +import pyarrow as pa from pyspark.sql.functions import pandas_udf, PandasUDFType -schema = StructType( -[StructField("id", LongType(), True), - StructField("map", MapType(StringType(), IntegerType()), True)]) -with QuietTest(self.sc): -with self.assertRaisesRegexp( -NotImplementedError, -'Invalid returnType.*grouped map Pandas UDF.*MapType'): -pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP) -schema = StructType( -[StructField("id", LongType(), True), - StructField("arr_ts", ArrayType(TimestampType()), True)]) -with QuietTest(self.sc): -with self.assertRaisesRegexp( -NotImplementedError, -'Invalid returnType.*grouped map Pandas UDF.*ArrayType.*TimestampType'): -pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP) +common_err_msg = 'Invalid returnType.*grouped map Pandas UDF.*' +unsupported_types = [ +StructField('map', MapType(StringType(), IntegerType())), +StructField('arr_ts', ArrayType(TimestampType())), +StructField('null', NullType()), +] + +# TODO: Remove this if-statement once minimum pyarrow version is 0.10.0 +if LooseVersion(pa.__version__) < LooseVersion("0.10.0"): +unsupported_types.append( +StructField('bin', BinaryType()) --- End diff -- Likewise, let's just write this on one line:

```
unsupported_types.append(StructField('bin', BinaryType()))
```

or

```
unsupported_types += [StructField('bin', BinaryType())]
```
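The version gate in this diff relies on comparing versions component by component rather than as strings. A minimal sketch of that idea follows, using plain tuples instead of `LooseVersion`; `parse_version` and `unsupported_type_names` are hypothetical names, the type names are placeholder strings rather than `StructField` objects, and the parser assumes purely numeric dotted versions.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '0.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def unsupported_type_names(pyarrow_version: str) -> list:
    """List the type names treated as unsupported for a given version."""
    names = ["map", "arr_ts", "null"]
    # Mirrors the gate in the diff: binary is unsupported before 0.10.0.
    if parse_version(pyarrow_version) < parse_version("0.10.0"):
        names.append("bin")
    return names


print(unsupported_type_names("0.8.0"))
print(unsupported_type_names("0.10.0"))
```

A plain string comparison would get this wrong, because `"0.8.0" < "0.10.0"` is false when compared character by character, which is why the versions are parsed before comparing.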
[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22568#discussion_r220948302 --- Diff: python/pyspark/sql/tests.py --- @@ -5525,32 +5525,73 @@ def data(self): .withColumn("v", explode(col('vs'))).drop('vs') def test_supported_types(self): -from pyspark.sql.functions import pandas_udf, PandasUDFType, array, col -df = self.data.withColumn("arr", array(col("id"))) +from decimal import Decimal +from distutils.version import LooseVersion +import pyarrow as pa +from pyspark.sql.functions import pandas_udf, PandasUDFType -# Different forms of group map pandas UDF, results of these are the same +input_values_with_schema = [ +(1, StructField('id', IntegerType())), +(2, StructField('byte', ByteType())), +(3, StructField('short', ShortType())), +(4, StructField('int', IntegerType())), +(5, StructField('long', LongType())), +(1.1, StructField('float', FloatType())), +(2.2, StructField('double', DoubleType())), +(Decimal(1.123), StructField('decim', DecimalType(10, 3))), +([1, 2, 3], StructField('array', ArrayType(IntegerType()))), +(True, StructField('bool', BooleanType())), +('hello', StructField('str', StringType())), +] --- End diff -- I understand why you did this, but I think we can just do something like: ```python values = [ 1, 2, 3, 4, 5, 1.1, ... ] output_schema = StructType([ StructField('id', IntegerType()), StructField('byte', ByteType()), StructField('short', ShortType()), StructField('int', IntegerType()), StructField('long', LongType()), StructField('float', FloatType()), ... ]) ``` Let's just keep the original way and keep it simple.
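A minimal sketch of the trade-off being discussed, assuming string stand-ins such as `"id:int"` in place of real `StructField` objects (these names are hypothetical and exist only to keep the example runnable without pyspark). Two parallel lists read more simply, but they have to stay aligned position by position; the paired form can still be recovered with `zip` if it is needed.

```python
# Hypothetical stand-ins: "name:type" strings instead of pyspark StructField objects.
values = [1, 2, 1.1, "hello"]
fields = ["id:int", "byte:byte", "float:float", "str:string"]

# The two lists must stay aligned, one value per field.
assert len(values) == len(fields)

# The (value, field) pairs can be rebuilt on demand.
pairs = list(zip(values, fields))
print(pairs[0])
```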
[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22568#discussion_r220943714 --- Diff: python/pyspark/sql/tests.py --- @@ -5525,32 +5525,73 @@ def data(self): .withColumn("v", explode(col('vs'))).drop('vs') def test_supported_types(self): -from pyspark.sql.functions import pandas_udf, PandasUDFType, array, col -df = self.data.withColumn("arr", array(col("id"))) +from decimal import Decimal +from distutils.version import LooseVersion +import pyarrow as pa +from pyspark.sql.functions import pandas_udf, PandasUDFType -# Different forms of group map pandas UDF, results of these are the same +input_values_with_schema = [ +(1, StructField('id', IntegerType())), +(2, StructField('byte', ByteType())), +(3, StructField('short', ShortType())), +(4, StructField('int', IntegerType())), +(5, StructField('long', LongType())), +(1.1, StructField('float', FloatType())), +(2.2, StructField('double', DoubleType())), +(Decimal(1.123), StructField('decim', DecimalType(10, 3))), +([1, 2, 3], StructField('array', ArrayType(IntegerType()))), +(True, StructField('bool', BooleanType())), +('hello', StructField('str', StringType())), +] -output_schema = StructType( -[StructField('id', LongType()), - StructField('v', IntegerType()), - StructField('arr', ArrayType(LongType())), - StructField('v1', DoubleType()), - StructField('v2', LongType())]) +# TODO: Add BinaryType to 'input_values_with_schema' once minimum pyarrow version is 0.10.0 +if LooseVersion(pa.__version__) >= LooseVersion("0.10.0"): +input_values_with_schema.append( +(bytearray([0x01, 0x02]), StructField('bin', BinaryType())) --- End diff -- tiny nit: I would just ```python input_values_with_schema += [(bytearray([0x01, 0x02]), StructField('bin', BinaryType()))] ```
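When a `+=` form is used for paired entries like these, the tuple needs its own parentheses inside the list literal, because `+=` extends a list element by element. The sketch below uses stand-in values (bytes and a plain string, not the pyspark types from the diff) to show the difference.

```python
pairs = [(1, "id")]

flat = list(pairs)
flat += [b"\x01\x02", "bin"]            # extends with two separate elements

nested = list(pairs)
nested += [(b"\x01\x02", "bin")]        # extends with one (value, field) tuple

appended = list(pairs)
appended.append((b"\x01\x02", "bin"))   # same result as `nested`

print(len(flat), len(nested), nested == appended)
```

Only the parenthesized form keeps every entry a `(value, field)` pair, which matters if later code indexes each entry with `x[0]` and `x[1]`.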
[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22568#discussion_r220948939 --- Diff: python/pyspark/sql/tests.py --- @@ -5714,24 +5755,31 @@ def test_wrong_args(self): pandas_udf(lambda x, y: x, DoubleType(), PandasUDFType.SCALAR)) def test_unsupported_types(self): +from distutils.version import LooseVersion +import pyarrow as pa from pyspark.sql.functions import pandas_udf, PandasUDFType -schema = StructType( -[StructField("id", LongType(), True), - StructField("map", MapType(StringType(), IntegerType()), True)]) -with QuietTest(self.sc): -with self.assertRaisesRegexp( -NotImplementedError, -'Invalid returnType.*grouped map Pandas UDF.*MapType'): -pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP) -schema = StructType( -[StructField("id", LongType(), True), - StructField("arr_ts", ArrayType(TimestampType()), True)]) -with QuietTest(self.sc): -with self.assertRaisesRegexp( -NotImplementedError, -'Invalid returnType.*grouped map Pandas UDF.*ArrayType.*TimestampType'): -pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP) +common_err_msg = 'Invalid returnType.*grouped map Pandas UDF.*' +unsupported_types = [ +StructField('map', MapType(StringType(), IntegerType())), +StructField('arr_ts', ArrayType(TimestampType())), +StructField('null', NullType()), +] + +# TODO: Remove this if-statement once minimum pyarrow version is 0.10.0 +if LooseVersion(pa.__version__) < LooseVersion("0.10.0"): +unsupported_types.append( +StructField('bin', BinaryType()) +) + +for unsupported_type in unsupported_types: +schema = StructType([ +StructField('id', LongType(), True), +unsupported_type +]) --- End diff -- I think we can make this inlined as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22568#discussion_r220944429 --- Diff: python/pyspark/sql/tests.py --- @@ -5525,32 +5525,73 @@ def data(self): .withColumn("v", explode(col('vs'))).drop('vs') def test_supported_types(self): -from pyspark.sql.functions import pandas_udf, PandasUDFType, array, col -df = self.data.withColumn("arr", array(col("id"))) +from decimal import Decimal +from distutils.version import LooseVersion +import pyarrow as pa +from pyspark.sql.functions import pandas_udf, PandasUDFType -# Different forms of group map pandas UDF, results of these are the same +input_values_with_schema = [ +(1, StructField('id', IntegerType())), +(2, StructField('byte', ByteType())), +(3, StructField('short', ShortType())), +(4, StructField('int', IntegerType())), +(5, StructField('long', LongType())), +(1.1, StructField('float', FloatType())), +(2.2, StructField('double', DoubleType())), +(Decimal(1.123), StructField('decim', DecimalType(10, 3))), +([1, 2, 3], StructField('array', ArrayType(IntegerType(, +(True, StructField('bool', BooleanType())), +('hello', StructField('str', StringType())), +] -output_schema = StructType( -[StructField('id', LongType()), - StructField('v', IntegerType()), - StructField('arr', ArrayType(LongType())), - StructField('v1', DoubleType()), - StructField('v2', LongType())]) +# TODO: Add BinaryType to 'input_values_with_schema' once minimum pyarrow version is 0.10.0 +if LooseVersion(pa.__version__) >= LooseVersion("0.10.0"): +input_values_with_schema.append( +(bytearray([0x01, 0x02]), StructField('bin', BinaryType())) +) + +values = [[x[0] for x in input_values_with_schema]] +output_schema = StructType([x[1] for x in input_values_with_schema]) +df = self.spark.createDataFrame(values, schema=output_schema) + +# Different forms of group map pandas UDF, results of these are the same udf1 = pandas_udf( -lambda pdf: pdf.assign(v1=pdf.v * pdf.id * 1.0, v2=pdf.v + pdf.id), +lambda 
pdf: pdf.assign( +decim=pdf.decim + pdf.decim, +double=pdf.double + pdf.float, +byte=pdf.byte + 1, +long=pdf.byte + pdf.int + pdf.long + pdf.short, --- End diff -- I would get rid of those calculations with different types to make it easier to read. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/22295 nvm, the merge script only triggers the edits if we have conflicts. If you can update 3.0 to 2.5 I'd be happy to merge. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/22295 LGTM except for the 3.0 to 2.5 change; I'll make that during the merge.
[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21522 Build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21522 **[Test build #96700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96700/testReport)** for PR 21522 at commit [`8e3aa44`](https://github.com/apache/spark/commit/8e3aa44c3937d60d5aa35dd03604e57ef218ebb4). * This patch **fails Java style tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class VectorAssemblerEstimator(override val uid: String)` * `class VectorAssemblerModel(override val uid: String, val vectorColsLengths: Map[String, Int])` * ` class VectorAssemblerModelWriter(instance: VectorAssemblerModel) extends MLWriter ` * ` class VectorAssemblerModelReader extends MLReader[VectorAssemblerModel] ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21522 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96700/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary to repla...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17654 Thanks for working on this; removing duplicated code is great. I'm curious why we couldn't remove some of the function calls to super and instead depend on inheritance. If it's the types on the setters, could we add another type parameter for the model?
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22570 I have exactly the same opinion as Sean.
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 Gentle ping @cloud-fan @gatorsmile @HyukjinKwon @ueshin
[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21522 **[Test build #96700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96700/testReport)** for PR 21522 at commit [`8e3aa44`](https://github.com/apache/spark/commit/8e3aa44c3937d60d5aa35dd03604e57ef218ebb4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22379 **[Test build #96699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96699/testReport)** for PR 22379 at commit [`4c2fcea`](https://github.com/apache/spark/commit/4c2fcea15549f0338b4c7f58aa2f8968ba660aff). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/21522 cc @jkbradley as the reporter of this issue you might want to take a look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/21522 Jenkins ok to test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3540/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22570 **[Test build #96698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96698/testReport)** for PR 22570 at commit [`68f83a3`](https://github.com/apache/spark/commit/68f83a366497c1b263d4f4bf67ad46bcc5c65c6d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringC...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22570#discussion_r220933694 --- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala --- @@ -91,7 +91,7 @@ private[rest] class StandaloneStatusRequestServlet(masterEndpoint: RpcEndpointRe protected def handleStatus(submissionId: String): SubmissionStatusResponse = { val response = masterEndpoint.askSync[DeployMessages.DriverStatusResponse]( DeployMessages.RequestDriverStatus(submissionId)) -val message = response.exception.map { s"Exception from the cluster:\n" + formatException(_) } +val message = response.exception.map { "Exception from the cluster:\n" + formatException(_) } --- End diff -- In cases like this, we should use interpolation in place of concatenation, if anything. Likewise the previous instance, for example. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3539/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22569 **[Test build #96697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96697/testReport)** for PR 22569 at commit [`39a77d6`](https://github.com/apache/spark/commit/39a77d6596779044617912f2d6a5c2aeeb1a32f8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22569 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22569 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22569 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22569 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96688/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22569 **[Test build #96688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96688/testReport)** for PR 22569 at commit [`39a77d6`](https://github.com/apache/spark/commit/39a77d6596779044617912f2d6a5c2aeeb1a32f8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22570 **[Test build #96696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96696/testReport)** for PR 22570 at commit [`4adfa46`](https://github.com/apache/spark/commit/4adfa46dfda68ede43f68485e6df36b50b8dfa96). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96696/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3538/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22570 **[Test build #96696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96696/testReport)** for PR 22570 at commit [`4adfa46`](https://github.com/apache/spark/commit/4adfa46dfda68ede43f68485e6df36b50b8dfa96). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138 Just rebased. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22138 **[Test build #96695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96695/testReport)** for PR 22138 at commit [`d3f097b`](https://github.com/apache/spark/commit/d3f097bcd2c808a543326a6990b217774f86). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22570 **[Test build #96694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96694/testReport)** for PR 22570 at commit [`0a01aa3`](https://github.com/apache/spark/commit/0a01aa311f038771706cb029cf89607390f0cd57). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `println(\"Per-class example fractions, counts:\")` * ` instr.logWarning(\"All labels belong to a single class and fitIntercept=false. It's a \" +` * ` require(className == expectedClassName, \"Error loading metadata: Expected class name\" +` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22563 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96694/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22563 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22570 **[Test build #96694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96694/testReport)** for PR 22570 at commit [`0a01aa3`](https://github.com/apache/spark/commit/0a01aa311f038771706cb029cf89607390f0cd57). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3537/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22570 cc @dongjoon-hyun @HyukjinKwon @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringC...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/22570 [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker to scalastyle config ## What changes were proposed in this pull request? [EmptyInterpolatedStringChecker](http://www.scalastyle.org/rules-dev.html#org_scalastyle_scalariform_EmptyInterpolatedStringChecker) checks for empty string interpolations, that is, interpolated strings that contain no interpolated expression. This check is very useful to us. This PR adds it to `scalastyle-config.xml` and fixes all existing empty interpolated string issues. ## How was this patch tested? manual tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-25553 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22570.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22570 commit 0a01aa311f038771706cb029cf89607390f0cd57 Author: Yuming Wang Date: 2018-09-27T12:43:15Z Add EmptyInterpolatedStringChecker to scalastyle-config.xml
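To make the kind of string this rule targets more concrete, here is a rough regex-based approximation in Python. It is only a sketch and not scalastyle's implementation; the helper name `empty_interpolations` and the pattern are assumptions introduced for illustration. It flags `s"..."` literals whose body contains no `$` placeholder.

```python
import re

# An s"..." literal: an `s` not preceded by an identifier character, then a
# double-quoted body. Triple-quoted strings, `$$` escapes, and other
# interpolators (f"", raw"") are not handled by this sketch.
S_LITERAL = re.compile(r'(?<![A-Za-z0-9_])s"((?:[^"\\]|\\.)*)"')


def empty_interpolations(line):
    """Return the s"..." literals on a line whose body has no `$` placeholder."""
    return [m.group(0) for m in S_LITERAL.finditer(line) if "$" not in m.group(1)]


print(empty_interpolations(r'val message = s"Exception from the cluster:\n" + formatException(e)'))
print(empty_interpolations(r'throw new AnalysisException(s"Column $col does not exist.")'))
```

The first line is flagged because the `s` prefix does nothing there; the second is left alone because `$col` is a real interpolation.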
[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22568 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96692/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22568 **[Test build #96692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96692/testReport)** for PR 22568 at commit [`53ff750`](https://github.com/apache/spark/commit/53ff750456057098b029a801266818fc7204cf79). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 I am running the same test case with Hive version **1.2.1.spark2** and it passes. Which Hive version does CI run with, and how are org.apache.hive.jdbc.HiveStatement and the external catalog linked? I don't see any such code. cc @cloud-fan @gatorsmile
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96685/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21588 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21588 **[Test build #96685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96685/testReport)** for PR 21588 at commit [`a011e50`](https://github.com/apache/spark/commit/a011e50e57537589b23099b4c7b6e2e893c86f9e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwrite a p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21257 Can one of the admins verify this patch?
[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22563 Merged build finished. Test PASSed.
[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22528

> Another concern here is, we have another place to control the compression codec (where we usually delegate to HDFS libraries).

I was considering using the Compressor API, but its streaming nature conflicts with the structure of a zip archive, where the meta-info is located at the end of the file, so you cannot read/uncompress it sequentially, block by block.

> It just sounds like a bandaid fix to allow one zipped file case in multi line mode.

I believe it is better to return the correct result in a case where a wrong result is currently returned (an attempt to read zipped CSV), or to force users to use this workaround only to read zip archives via the RDD API: https://docs.databricks.com/spark/latest/data-sources/zip-files.html#zip-files . Especially for compressed, non-splittable CSV, there is no big difference in how it is read with multiLine enabled or disabled.
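For readers unfamiliar with the zip layout mentioned in the comment above, the following is a minimal sketch, not code from the PR, of why a zip archive is awkward to consume as a pure stream: the index of entries (the central directory) and the record that points to it sit at the end of the file. Python's standard `zipfile` and `struct` modules are used here purely for illustration; the helper names `build_sample_zip` and `read_eocd` and the sample file names are hypothetical, and the sketch assumes a small archive with no archive comment and no zip64 records.

```python
import io
import struct
import zipfile

# Signature and fixed-size layout of the end-of-central-directory (EOCD) record.
EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4s4H2LH"  # signature, 4 x uint16, 2 x uint32, comment length
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def build_sample_zip() -> bytes:
    """Build a tiny in-memory archive with two CSV entries (illustrative data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.csv", "id,name\n1,foo\n")
        zf.writestr("b.csv", "id,name\n2,bar\n")
    return buf.getvalue()


def read_eocd(data: bytes) -> dict:
    """Locate the EOCD record by searching backwards from the end of the data."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack(EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
    # Unused fields: signature, disk numbers, per-disk entry count, comment length.
    _, _, _, _, total_entries, cd_size, cd_offset, _ = fields
    return {
        "eocd_offset": pos,
        "entries": total_entries,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
    }


archive = build_sample_zip()
info = read_eocd(archive)
print(len(archive), info)
```

In this sketch the reader has to see the tail of the archive before it knows how many entries exist and where the central directory starts, which is the property that conflicts with a block-by-block streaming decompressor interface.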
[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96687/ Test PASSed.
[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22563 **[Test build #96687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96687/testReport)** for PR 22563 at commit [`d810e0d`](https://github.com/apache/spark/commit/d810e0dfec9bc8f182509ccf75623f2e92e95290). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22010
[GitHub] spark issue #9168: [SPARK-11182] HDFS Delegation Token will be expired when ...
Github user Tianny commented on the issue: https://github.com/apache/spark/pull/9168 @jackiehff Have you solved the problem? I ran into the same error as you.
[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22010 thanks, merging to master!
[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r220908825
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala ---
@@ -15,50 +15,51 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.execution.datasources
+package org.apache.spark.sql.catalyst.util
 
 import org.apache.spark.SparkException
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
-import org.apache.spark.sql.catalyst.util._
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.types.{DataType, StructType}
 import org.apache.spark.unsafe.types.UTF8String
 
 class FailureSafeParser[IN](
     rawParser: IN => Seq[InternalRow],
     mode: ParseMode,
-    schema: StructType,
+    dataType: DataType,
     columnNameOfCorruptRecord: String,
     isMultiLine: Boolean) {
-
-  private val corruptFieldIndex = schema.getFieldIndex(columnNameOfCorruptRecord)
-  private val actualSchema = StructType(schema.filterNot(_.name == columnNameOfCorruptRecord))
-  private val resultRow = new GenericInternalRow(schema.length)
-  private val nullResult = new GenericInternalRow(schema.length)
-
   // This function takes 2 parameters: an optional partial result, and the bad record. If the given
   // schema doesn't contain a field for corrupted record, we just return the partial result or a
   // row with all fields null. If the given schema contains a field for corrupted record, we will
   // set the bad record to this field, and set other fields according to the partial result or null.
-  private val toResultRow: (Option[InternalRow], () => UTF8String) => InternalRow = {
--- End diff --
Just to make the review easier, backporting easier, and keep the original author of the codes.
[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22568 **[Test build #96692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96692/testReport)** for PR 22568 at commit [`53ff750`](https://github.com/apache/spark/commit/53ff750456057098b029a801266818fc7204cf79).
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22237 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96684/ Test PASSed.
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22237 **[Test build #96693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96693/testReport)** for PR 22237 at commit [`939e220`](https://github.com/apache/spark/commit/939e220475b2c33ab63df117ee6aec1ed4afdd2b).
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22237 Merged build finished. Test PASSed.
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22237 **[Test build #96684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96684/testReport)** for PR 22237 at commit [`4e196f6`](https://github.com/apache/spark/commit/4e196f654b3b921787ca5a936f0a5635d26d75d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.