[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95618/ Test PASSed.
[GitHub] spark issue #22226: [SPARK-25252][SQL] Support arrays of any types by to_jso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/6 **[Test build #95622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95622/testReport)** for PR 6 at commit [`72d2628`](https://github.com/apache/spark/commit/72d2628323af4e44da1083c99c0d4996c34e4c8c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22288: [SPARK-22148][Scheduler] Acquire new executors to...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/22288#discussion_r214719743
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -414,9 +425,54 @@ private[spark] class TaskSchedulerImpl(
             launchedAnyTask |= launchedTaskAtCurrentMaxLocality
           } while (launchedTaskAtCurrentMaxLocality)
         }
+
         if (!launchedAnyTask) {
-          taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
-        }
+          taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match {
+            case taskIndex: Some[Int] => // Returns the taskIndex which was unschedulable
+              if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
+                // If the taskSet is unschedulable we kill the existing blacklisted executor/s and
+                // kick off an abortTimer which after waiting will abort the taskSet if we were
+                // unable to get new executors and couldn't schedule a task from the taskSet.
+                // Note: We keep a track of schedulability on a per taskSet basis rather than on a
+                // per task basis.
+                if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) {
+                  hostToExecutors.valuesIterator.foreach(executors => executors.foreach({
+                    executor =>
+                      logDebug("Killing executor because of task unschedulability: " + executor)
+                      blacklistTrackerOpt.foreach(blt => blt.killBlacklistedExecutor(executor))
--- End diff --
Seriously? You killed all executors? What if other taskSets' tasks are running on them? BTW, if you want to refresh executors, you also have to enable `spark.blacklist.killBlacklistedExecutors`.
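The reviewer's objection is that iterating over `hostToExecutors.valuesIterator` kills every executor, including ones still running tasks from other task sets. A narrower selection criterion can be sketched in plain Scala: only executors that are blacklisted for the blocked task set and run no tasks for any other task set are candidates. This is a hypothetical model; `ExecutorState`, `KillSelection.executorsSafeToKill` and their fields are illustrative names, not part of `TaskSchedulerImpl` or `BlacklistTracker`, and the real scheduler tracks this state differently.

```scala
// Hypothetical model of scheduler state for illustration only; these are not
// Spark's TaskSchedulerImpl or BlacklistTracker types.
final case class ExecutorState(id: String, host: String, runningTaskSetIds: Set[Int])

object KillSelection {
  // Picks executors that are blacklisted for the blocked task set AND are not
  // running tasks for any other task set, so killing them cannot interrupt
  // unrelated work.
  def executorsSafeToKill(
      executors: Seq[ExecutorState],
      blacklistedIds: Set[String],
      blockedTaskSetId: Int): Seq[String] =
    executors
      .filter(e => blacklistedIds.contains(e.id))
      .filter(e => (e.runningTaskSetIds - blockedTaskSetId).isEmpty)
      .map(_.id)
}
```

For example, if `exec-1` is idle, `exec-2` is running a task from another task set, and both are blacklisted, only `exec-1` is selected. Whether such a policy fits the PR is a design question for the PR itself; the sketch only makes the filtering criterion concrete.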
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...
Github user peter-toth commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r214732767
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -817,7 +819,7 @@ class Analyzer(
               case s: SubqueryExpression =>
                 s.withNewPlan(dedupOuterReferencesInSubquery(s.plan, attributeRewrites))
             }
-          }
+          }, attributeRewrites)
--- End diff --
done
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...
Github user peter-toth commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r214732751
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -805,10 +807,10 @@ class Analyzer(
            * that this rule cannot handle. When that is the case, there must be another rule
            * that resolves these conflicts. Otherwise, the analysis will fail.
            */
-          right
+          (right, AttributeMap.empty[Attribute])
         case Some((oldRelation, newRelation)) =>
           val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
-          right transformUp {
+          (right transformUp {
--- End diff --
done
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214735437
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,47 @@ class HiveDDLSuite
     }
   }
+  test("Insert overwrite Hive table should output correct schema") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        spark.sql("CREATE TABLE tbl(id long)")
--- End diff --
I am not familiar with Hive, but looking at the debug message of this logical plan, the top level is ``InsertIntoHiveTable `default`.`tbl2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, true, false, [ID]``. It should not be related to this configuration, right?
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22320 **[Test build #95633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95633/testReport)** for PR 22320 at commit [`98bf027`](https://github.com/apache/spark/commit/98bf027df9c4467adc2673097c6762a3ec5210ce).
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22319 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95630/ Test FAILed.
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22319 Merged build finished. Test FAILed.
[GitHub] spark issue #22306: [SPARK-25300][CORE]Unified the configuration parameter `...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22306 **[Test build #95621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95621/testReport)** for PR 22306 at commit [`8d7baee`](https://github.com/apache/spark/commit/8d7baee91199141f5999f0e49ab3092fb121cc41).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22306: [SPARK-25300][CORE]Unified the configuration parameter `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95621/ Test PASSed.
[GitHub] spark pull request #22313: [SPARK-25306][SQL] Use cache to speed up `createF...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22313#discussion_r214743988
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
@@ -55,19 +59,52 @@ import org.apache.spark.sql.types._
  * known to be convertible.
  */
 private[orc] object OrcFilters extends Logging {
+  case class FilterWithTypeMap(filter: Filter, typeMap: Map[String, DataType])
+
+  private lazy val cacheExpireTimeout =
+    org.apache.spark.sql.execution.datasources.orc.OrcFilters.cacheExpireTimeout
+
+  private lazy val searchArgumentCache = CacheBuilder.newBuilder()
+    .expireAfterAccess(cacheExpireTimeout, TimeUnit.SECONDS)
+    .build(
+      new CacheLoader[FilterWithTypeMap, Option[Builder]]() {
+        override def load(typeMapAndFilter: FilterWithTypeMap): Option[Builder] = {
+          buildSearchArgument(
+            typeMapAndFilter.typeMap, typeMapAndFilter.filter, SearchArgumentFactory.newBuilder())
+        }
+      })
+
+  private def getOrBuildSearchArgumentWithNewBuilder(
+      dataTypeMap: Map[String, DataType],
+      expression: Filter): Option[Builder] = {
+    // When `spark.sql.orc.cache.sarg.timeout` is 0, cache is disabled.
+    if (cacheExpireTimeout > 0) {
+      searchArgumentCache.get(FilterWithTypeMap(expression, dataTypeMap))
+    } else {
+      buildSearchArgument(dataTypeMap, expression, SearchArgumentFactory.newBuilder())
--- End diff --
Ya. It's possible. But if we create a Guava loading cache and pass everything through Guava's cache management logic, that means more overhead than this PR. In this PR, `spark.sql.orc.cache.sarg.timeout=0` means the loading cache is not created at all.
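The design point in the reply, that a timeout of 0 should bypass the cache entirely rather than create a cache whose entries expire immediately, can be sketched without Guava. In the minimal sketch below, `ExpiringCacheSketch`, its `build` function and the `builds` counter are hypothetical stand-ins for the PR's `searchArgumentCache`, `buildSearchArgument` and `FilterWithTypeMap` key. It assumes expire-after-access semantics over a synchronized `mutable.HashMap` and, unlike Guava, never evicts stale entries in the background; they are only rebuilt when next accessed.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical sketch, not Spark's OrcFilters nor Guava's CacheBuilder: an
// expire-after-access cache whose backing map is never allocated when the
// timeout is 0, mirroring "timeout=0 means not creating the loading cache".
final class ExpiringCacheSketch[K, V](timeoutSeconds: Long, build: K => V) {
  private val ttlNanos = TimeUnit.SECONDS.toNanos(timeoutSeconds)
  private val buildCount = new AtomicInteger(0)

  // A cached value together with the time it was last read or written.
  private case class Entry(value: V, lastAccessNanos: Long)

  // None when caching is disabled, so the disabled path does no cache bookkeeping.
  private val entries: Option[mutable.HashMap[K, Entry]] =
    if (timeoutSeconds > 0) Some(mutable.HashMap.empty[K, Entry]) else None

  /** Number of times `build` has been invoked (useful to observe caching). */
  def builds: Int = buildCount.get()

  private def compute(key: K): V = {
    buildCount.incrementAndGet()
    build(key)
  }

  def get(key: K): V = entries match {
    case None => compute(key) // cache disabled: rebuild on every call
    case Some(map) =>
      map.synchronized {
        val now = System.nanoTime()
        val value = map.get(key) match {
          case Some(e) if now - e.lastAccessNanos < ttlNanos => e.value // fresh hit
          case _ => compute(key) // miss or expired entry: rebuild
        }
        map.update(key, Entry(value, now)) // refresh the access time
        value
      }
  }
}
```

With `timeoutSeconds = 0`, every `get` calls `build`; with a positive timeout, a second `get` for the same key within the timeout reuses the cached value.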
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...
Github user peter-toth commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r214793247
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -295,4 +295,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
       df.join(df, df("id") <=> df("id")).queryExecution.optimizedPlan
     }
   }
+
+  test("SPARK-25150: Attribute deduplication handles attributes in join condition properly") {
+    val a = spark.range(1, 5)
+    val b = spark.range(10)
+    val c = b.filter($"id" % 2 === 0)
+
+    val r = a.join(b, a("id") === b("id"), "inner").join(c, a("id") === c("id"), "inner")
--- End diff --
That simpler join doesn't hit the issue; it is handled by a different rule, `ResolveNaturalAndUsingJoin`.
[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22324 **[Test build #95645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95645/testReport)** for PR 22324 at commit [`510d729`](https://github.com/apache/spark/commit/510d729b0ed6f83b05a3b0f06c2631163d62ef1a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FileSourceSuite extends SharedSQLContext `
[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95645/ Test PASSed.
[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @mgaido91, 2.2 also suffered from this.
[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22314 @ueshin Just verified in 2.3: this problem does not exist there, because the implementation of `nullSafeCodeGen` in 2.3 differs from the one in master. However, 2.3 is missing the test cases we added in these PRs. Should we check the test cases into the branch? I am afraid that if we ever backport the PR that changed `nullSafeCodeGen`, we may introduce this bug. Please advise.
[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22324 Merged build finished. Test PASSed.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r214754379
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
--- End diff --
I don't see any advantage to this. It is longer and slower.
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22315 Merged build finished. Test FAILed.
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22315 **[Test build #95636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95636/testReport)** for PR 22315 at commit [`712542c`](https://github.com/apache/spark/commit/712542c0480bfb51317a6d4905bbaa0349c940e8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22240 **[Test build #95639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95639/testReport)** for PR 22240 at commit [`b6a3c5b`](https://github.com/apache/spark/commit/b6a3c5b3de3ef145805542511770da4f59886858).
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2808/ Test PASSed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Merged build finished. Test PASSed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22240 **[Test build #95641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95641/testReport)** for PR 22240 at commit [`47ebd08`](https://github.com/apache/spark/commit/47ebd0849ec3344f05eb8eb74df36d7bfda7e130).
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214762021
--- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -412,6 +412,26 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext {
     assert(!ser2.getAutoReset)
   }
+  test("ClassCastException when writing a Map after previously " +
--- End diff --
Since this is a bug-fix test case, could you add `SPARK-25176`, as in `SPARK-25176 ClassCastException ...`?
[GitHub] spark issue #22317: [SPARK-25310][SQL] ArraysOverlap may throw a Compilation...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22317 **[Test build #95629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95629/testReport)** for PR 22317 at commit [`7c5b656`](https://github.com/apache/spark/commit/7c5b65657f6e58534ff2ad897f1dfa0618634287).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22315 **[Test build #95636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95636/testReport)** for PR 22315 at commit [`712542c`](https://github.com/apache/spark/commit/712542c0480bfb51317a6d4905bbaa0349c940e8).
[GitHub] spark issue #22145: [SPARK-25152][K8S] Enable SparkR Integration Tests for K...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22145 what's the latest on this, btw?
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214750815
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     }
   }
+  test("Insert overwrite table command should output correct schema: basic") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        val df = spark.range(10).toDF("id")
--- End diff --
Why is `toDF("id")` required? Why not `spark.range(10)` alone?
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214751930
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,47 @@ class HiveDDLSuite
     }
   }
+  test("Insert overwrite Hive table should output correct schema") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        spark.sql("CREATE TABLE tbl(id long)")
+        spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
+        spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+        spark.sql("CREATE TABLE tbl2(ID long)")
+        spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+        checkAnswer(spark.table("tbl2"), Seq(Row(4)))
+      }
+    }
+  }
+
+  test("Insert into Hive directory should output correct schema") {
+    withTable("tbl") {
+      withView("view1") {
+        withTempPath { path =>
+          spark.sql("CREATE TABLE tbl(id long)")
+          spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
--- End diff --
`s/SELECT/VALUES`, as it could be a bit more Spark-idiomatic?
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214751219
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     }
   }
+  test("Insert overwrite table command should output correct schema: basic") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        val df = spark.range(10).toDF("id")
+        df.write.format("parquet").saveAsTable("tbl")
+        spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+        spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+        spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+        val identifier = TableIdentifier("tbl2", Some("default"))
+        val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+        val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+        assert(spark.read.parquet(location).schema == expectedSchema)
+        checkAnswer(spark.table("tbl2"), df)
+      }
+    }
+  }
+
+  test("Insert overwrite table command should output correct schema: complex") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+        df.write.format("parquet").saveAsTable("tbl")
+        spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+        spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+          "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+        spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 FROM view1")
+        val identifier = TableIdentifier("tbl2", Some("default"))
+        val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+        val expectedSchema = StructType(Seq(
+          StructField("COL1", LongType, true),
--- End diff --
`nullable` is `true` by default.
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214751023
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     }
   }
+  test("Insert overwrite table command should output correct schema: basic") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        val df = spark.range(10).toDF("id")
+        df.write.format("parquet").saveAsTable("tbl")
+        spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+        spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+        spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+        val identifier = TableIdentifier("tbl2", Some("default"))
--- End diff --
`default` is the default database name, isn't it? I'd remove it from the test or use `spark.catalog.currentDatabase`.
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214751748
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -63,7 +63,7 @@ case class CreateHiveTableAsSelectCommand(
           query,
           overwrite = false,
           ifPartitionNotExists = false,
-          outputColumns = outputColumns).run(sparkSession, child)
+          outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --
Can you remove one `outputColumnNames`?
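Scala has no shorthand for a named argument whose value has the same name as the parameter, so the way to drop the repetition is to pass the value positionally. Scala 2 allows a positional argument after named ones as long as each earlier named argument sits at its declared position. A minimal sketch with a hypothetical `insertInto` method, whose parameter order is assumed to match the call site in the diff:

```scala
// Hypothetical method; only the shape of the call matters here.
object NamedArgsSketch {
  def insertInto(
      query: String,
      overwrite: Boolean,
      ifPartitionNotExists: Boolean,
      outputColumnNames: Seq[String]): String =
    s"$query overwrite=$overwrite ifNotExists=$ifPartitionNotExists cols=${outputColumnNames.mkString(",")}"

  val outputColumnNames = Seq("ID")

  // Named form, as in the diff: the name appears on both sides of `=`.
  val named = insertInto(
    "q", overwrite = false, ifPartitionNotExists = false,
    outputColumnNames = outputColumnNames)

  // Positional form: legal because every preceding named argument
  // is already at its declared position.
  val positional = insertInto(
    "q", overwrite = false, ifPartitionNotExists = false, outputColumnNames)
}
```

Both calls bind the same values; keeping the name is a readability choice when the argument list is long.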
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214751169
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     }
   }
+  test("Insert overwrite table command should output correct schema: basic") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        val df = spark.range(10).toDF("id")
+        df.write.format("parquet").saveAsTable("tbl")
+        spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+        spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+        spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+        val identifier = TableIdentifier("tbl2", Some("default"))
+        val location = spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+        val expectedSchema = StructType(Seq(StructField("ID", LongType, true)))
+        assert(spark.read.parquet(location).schema == expectedSchema)
+        checkAnswer(spark.table("tbl2"), df)
+      }
+    }
+  }
+
+  test("Insert overwrite table command should output correct schema: complex") {
+    withTable("tbl", "tbl2") {
+      withView("view1") {
+        val df = spark.range(10).map(x => (x, x.toInt, x.toInt)).toDF("col1", "col2", "col3")
+        df.write.format("parquet").saveAsTable("tbl")
+        spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+        spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING parquet PARTITIONED " +
+          "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+        spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 FROM view1")
+        val identifier = TableIdentifier("tbl2", Some("default"))
--- End diff --
Same as above.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2807/ Test PASSed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22240 **[Test build #95640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95640/testReport)** for PR 22240 at commit [`ae4a8e6`](https://github.com/apache/spark/commit/ae4a8e6b784519a2f2a237be258ed1059e91be64).
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Merged build finished. Test FAILed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22240 **[Test build #95639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95639/testReport)** for PR 22240 at commit [`b6a3c5b`](https://github.com/apache/spark/commit/b6a3c5b3de3ef145805542511770da4f59886858).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95639/ Test FAILed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22240 **[Test build #95641 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95641/testReport)** for PR 22240 at commit [`47ebd08`](https://github.com/apache/spark/commit/47ebd0849ec3344f05eb8eb74df36d7bfda7e130).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22179 Merged build finished. Test PASSed.
[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22179 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2810/ Test PASSed.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r214761811 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala --- @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext { assert(exception.getMessage.contains("aggregate functions are not allowed")) } + + test("pivoting column list with values") { +val expected = Row(2012, 1.0, null) :: Row(2013, 48000.0, 3.0) :: Nil +val df = trainingSales + .groupBy($"sales.year") + .pivot(struct(lower($"sales.course"), $"training"), Seq( +struct(lit("dotnet"), lit("Experts")), +struct(lit("java"), lit("Dummies"))) + ).agg(sum($"sales.earnings")) + +checkAnswer(df, expected) + } + + test("pivoting column list") { +val exception = intercept[RuntimeException] { + trainingSales +.groupBy($"sales.year") +.pivot(struct(lower($"sales.course"), $"training")) +.agg(sum($"sales.earnings")) +.collect() --- End diff -- I tried it on your branch:
```
scala> df.show
+--------+--------------------+
|training|               sales|
+--------+--------------------+
| Experts|[dotNET, 2012, 10...|
| Experts|[JAVA, 2012, 2000...|
| Dummies|[dotNet, 2012, 50...|
| Experts|[dotNET, 2013, 48...|
| Dummies|[Java, 2013, 3000...|
+--------+--------------------+

scala> df.groupBy($"sales.year").pivot(struct(lower($"sales.course"), $"training")).agg(sum($"sales.earnings"))
java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema [dotnet,Dummies]
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
  at scala.util.Try.getOrElse(Try.scala:79)
  at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:163)
  at org.apache.spark.sql.functions$.typedLit(functions.scala:127)
```
Am I missing something?
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r214761843 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -69,7 +69,7 @@ case class InsertIntoHiveTable( query: LogicalPlan, overwrite: Boolean, ifPartitionNotExists: Boolean, -outputColumns: Seq[Attribute]) extends SaveAsHiveFile { +outputColumnNames: Seq[String]) extends SaveAsHiveFile { --- End diff -- thanks!
[GitHub] spark pull request #22313: [SPARK-25306][SQL] Use cache to speed up `createF...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22313#discussion_r214744306 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala --- @@ -55,19 +59,52 @@ import org.apache.spark.sql.types._ * known to be convertible. */ private[orc] object OrcFilters extends Logging { + case class FilterWithTypeMap(filter: Filter, typeMap: Map[String, DataType]) + + private lazy val cacheExpireTimeout = + org.apache.spark.sql.execution.datasources.orc.OrcFilters.cacheExpireTimeout + + private lazy val searchArgumentCache = CacheBuilder.newBuilder() +.expireAfterAccess(cacheExpireTimeout, TimeUnit.SECONDS) +.build( + new CacheLoader[FilterWithTypeMap, Option[Builder]]() { +override def load(typeMapAndFilter: FilterWithTypeMap): Option[Builder] = { + buildSearchArgument( +typeMapAndFilter.typeMap, typeMapAndFilter.filter, SearchArgumentFactory.newBuilder()) +} + }) + + private def getOrBuildSearchArgumentWithNewBuilder( + dataTypeMap: Map[String, DataType], + expression: Filter): Option[Builder] = { +// When `spark.sql.orc.cache.sarg.timeout` is 0, cache is disabled. +if (cacheExpireTimeout > 0) { + searchArgumentCache.get(FilterWithTypeMap(expression, dataTypeMap)) +} else { + buildSearchArgument(dataTypeMap, expression, SearchArgumentFactory.newBuilder()) +} + } + def createFilter(schema: StructType, filters: Array[Filter]): Option[SearchArgument] = { val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap // First, tries to convert each filter individually to see whether it's convertible, and then // collect all convertible ones to build the final `SearchArgument`. 
val convertibleFilters = for { filter <- filters - _ <- buildSearchArgument(dataTypeMap, filter, SearchArgumentFactory.newBuilder()) + _ <- getOrBuildSearchArgumentWithNewBuilder(dataTypeMap, filter) } yield filter for { // Combines all convertible filters using `And` to produce a single conjunction - conjunction <- convertibleFilters.reduceOption(And) + conjunction <- convertibleFilters.reduceOption { (x, y) => +val newFilter = org.apache.spark.sql.sources.And(x, y) +if (cacheExpireTimeout > 0) { + // Build in a bottom-up manner + getOrBuildSearchArgumentWithNewBuilder(dataTypeMap, newFilter) +} --- End diff -- Final conjunction? All sub function results will be cached in the end.
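The caching idea in the diff can be illustrated without Guava or ORC. This is a minimal sketch under stated assumptions: `Pred`, `Leaf`, `And`, `CachingConverter` and `CacheDemo` are hypothetical, and a plain `mutable.HashMap` with no expiry stands in for the `CacheBuilder`-based `searchArgumentCache`. Because case-class equality is structural, every sub-conjunction becomes its own cache key, so converting a left-deep tree bottom-up computes each distinct subtree only once. That is one reading of the remark that all sub-function results end up cached.

```scala
import scala.collection.mutable

// Hypothetical, simplified predicate model (not Spark's Filter classes).
sealed trait Pred
final case class Leaf(name: String) extends Pred
final case class And(left: Pred, right: Pred) extends Pred

// Converts predicates with the same "try both children, then convert again"
// shape as the quoted OrcFilters code, optionally memoizing results keyed by
// the structurally compared predicate.
final class CachingConverter(useCache: Boolean) {
  private val cache = mutable.HashMap.empty[Pred, Option[String]]
  var computations: Long = 0L // how many times the conversion body actually runs

  def build(p: Pred): Option[String] =
    if (useCache) {
      cache.get(p) match {
        case Some(hit) => hit
        case None =>
          val result = compute(p) // may recursively fill the cache for subtrees
          cache.update(p, result)
          result
      }
    } else compute(p)

  private def compute(p: Pred): Option[String] = {
    computations += 1
    p match {
      case Leaf(name) => Some(name)
      case And(left, right) =>
        val tryLeft = build(left)
        val tryRight = build(right)
        for {
          _ <- tryLeft
          _ <- tryRight
          lhs <- build(left)
          rhs <- build(right)
        } yield s"($lhs AND $rhs)"
    }
  }
}

object CacheDemo {
  def run(useCache: Boolean, n: Int): Long = {
    val leaves: Seq[Pred] = (1 to n).map(i => Leaf(s"c$i"))
    // Left-deep tree, the same fold order as reduceOption(And).
    val tree = leaves.reduceLeft[Pred]((l, r) => And(l, r))
    val converter = new CachingConverter(useCache)
    converter.build(tree)
    converter.computations
  }

  def main(args: Array[String]): Unit = {
    println(s"without cache: ${run(useCache = false, n = 16)}")
    println(s"with cache:    ${run(useCache = true, n = 16)}")
  }
}
```

With 16 filters the uncached body runs 131069 times, while the cached one runs 31 times (16 leaves plus 15 conjunctions). A production cache also needs eviction, which the diff handles with `expireAfterAccess`.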
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22315 Merged build finished. Test FAILed.
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22315 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95634/
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22315 **[Test build #95634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95634/testReport)** for PR 22315 at commit [`712542c`](https://github.com/apache/spark/commit/712542c0480bfb51317a6d4905bbaa0349c940e8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22317: [SPARK-25310][SQL] ArraysOverlap may throw a Compilation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95629/
[GitHub] spark issue #22317: [SPARK-25310][SQL] ArraysOverlap may throw a Compilation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22317 Merged build finished. Test PASSed.
[GitHub] spark issue #22313: [SPARK-25306][SQL] Use cache to speed up `createFilter` ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22313 Thank you for the review and advice, @cloud-fan. It turns out that my initial assessment was incomplete. First of all, from the beginning, [SPARK-2883](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-6cac9bc2656e3782b0312dceb8c55d47R75) was designed as a recursive function like the following. Please see `tryLeft` and `tryRight`: they are pure computations that only check whether the conversion succeeds, and their results are never reused. So I tried to cache the first two operations, `tryLeft` and `tryRight`, since they can be reused.
```scala
val tryLeft = buildSearchArgument(left, newBuilder)
val tryRight = buildSearchArgument(right, newBuilder)

val conjunction = for {
  _ <- tryLeft
  _ <- tryRight
  lhs <- buildSearchArgument(left, builder.startAnd())
  rhs <- buildSearchArgument(right, lhs)
} yield rhs.end()
```
However, before that, `createFilter` builds the target tree with [reduceOption(And)](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-6cac9bc2656e3782b0312dceb8c55d47R35), which produces a deeply skewed tree. That was the root cause. I'll update this PR.
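To make the root cause concrete, here is a minimal, self-contained Scala sketch. It assumes a simplified model: `Pred`, `Leaf`, `And`, `Converter` and `SkewDemo` are hypothetical stand-ins, not Spark's `Filter` classes or the real `OrcFilters.buildSearchArgument`. The converter copies only the shape of the snippet quoted in the comment: each `And` node converts both children once to check convertibility (`tryLeft`, `tryRight`) and then converts them again, so the call count obeys C(And(l, r)) = 1 + 2C(l) + 2C(r). On the left-deep tree that `reduceOption` builds, this grows exponentially with the number of filters, while a balanced tree over the same leaves stays polynomial.

```scala
// Hypothetical, simplified model of the recursive filter conversion;
// these are not Spark's classes.
sealed trait Pred
final case class Leaf(name: String) extends Pred
final case class And(left: Pred, right: Pred) extends Pred

final class Converter {
  var calls: Long = 0L

  // Same shape as the quoted snippet: try both children, then convert them again.
  def build(p: Pred): Option[String] = {
    calls += 1
    p match {
      case Leaf(name) => Some(name)
      case And(left, right) =>
        val tryLeft = build(left)
        val tryRight = build(right)
        for {
          _ <- tryLeft
          _ <- tryRight
          lhs <- build(left)
          rhs <- build(right)
        } yield s"($lhs AND $rhs)"
    }
  }
}

object SkewDemo {
  // Folds from the left like reduceOption(And): And(And(And(c1, c2), c3), c4)...
  def leftDeep(ps: Seq[Pred]): Pred = ps.reduceLeft[Pred]((l, r) => And(l, r))

  // Splits the list in half recursively, giving a tree of depth about log2(n).
  def balanced(ps: Seq[Pred]): Pred = {
    require(ps.nonEmpty, "need at least one predicate")
    if (ps.length == 1) ps.head
    else {
      val (l, r) = ps.splitAt(ps.length / 2)
      And(balanced(l), balanced(r))
    }
  }

  def countCalls(tree: Pred): Long = {
    val c = new Converter
    c.build(tree)
    c.calls
  }

  def main(args: Array[String]): Unit = {
    val leaves: Seq[Pred] = (1 to 16).map(i => Leaf(s"c$i"))
    println(s"left-deep: ${countCalls(leftDeep(leaves))} calls")
    println(s"balanced:  ${countCalls(balanced(leaves))} calls")
  }
}
```

With 16 filters the recurrence gives 2^17 - 3 = 131069 calls for the left-deep tree versus 341 for the balanced one. Rebalancing the tree and caching sub-results are two independent ways to avoid the blow-up; the sketch does not claim which one the updated PR uses.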
[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95632/testReport)** for PR 22318 at commit [`d6e316a`](https://github.com/apache/spark/commit/d6e316a92cc4283f52f9cf141fe57bcece2cdf6b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22320 **[Test build #95633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95633/testReport)** for PR 22320 at commit [`98bf027`](https://github.com/apache/spark/commit/98bf027df9c4467adc2673097c6762a3ec5210ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22320 Merged build finished. Test PASSed.
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95633/
[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22314 Merged build finished. Test PASSed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Merged build finished. Test PASSed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2806/
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Merged build finished. Test FAILed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95640/
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22240 **[Test build #95640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95640/testReport)** for PR 22240 at commit [`ae4a8e6`](https://github.com/apache/spark/commit/ae4a8e6b784519a2f2a237be258ed1059e91be64). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22315 retest this please
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22315 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2803/
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22315 Merged build finished. Test PASSed.
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22316 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95631/
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22316 Merged build finished. Test PASSed.
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22316 **[Test build #95631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95631/testReport)** for PR 22316 at commit [`673ef00`](https://github.com/apache/spark/commit/673ef001adf9b64d644c782eed2aefecc029ed81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test PASSed.
[GitHub] spark issue #22313: [SPARK-25306][SQL] Use cache to speed up `createFilter` ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22313 **[Test build #95637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95637/testReport)** for PR 22313 at commit [`4acbaf8`](https://github.com/apache/spark/commit/4acbaf8be9e572c5cdbc61c49b488e8aef9e646b).
[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95632/
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r214752480 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala --- @@ -295,4 +295,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext { df.join(df, df("id") <=> df("id")).queryExecution.optimizedPlan } } + + test("SPARK-25150: Attribute deduplication handles attributes in join condition properly") { +val a = spark.range(1, 5) +val b = spark.range(10) +val c = b.filter($"id" % 2 === 0) + +val r = a.join(b, a("id") === b("id"), "inner").join(c, a("id") === c("id"), "inner") --- End diff -- Why not use the simpler `a.join(b, "id").join(c, "id")`?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r214752855 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql]( new RelationalGroupedDataset( df, groupingExprs, - RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply))) + RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr))) --- End diff -- What do you think about `map(lit).map(_.expr)` instead?
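The two spellings discussed here should yield the same expressions. Below is a minimal sketch comparing them. It assumes hypothetical stand-ins `Expr`, `Col` and `lit` in place of Spark's real `Expression`, `Column` and `functions.lit`, and `LitMapDemo` is likewise invented for illustration.

```scala
// Hypothetical stand-ins for Spark's Expression, Column and functions.lit,
// used only to compare the two mapping styles.
final case class Expr(value: Any)
final case class Col(expr: Expr)

object LitMapDemo {
  def lit(v: Any): Col = Col(Expr(v))

  val values: Seq[Any] = Seq("dotnet", "java", 2012)

  // Form used in the diff: a single traversal with a placeholder lambda.
  val oneStep: Seq[Expr] = values.map(lit(_).expr)

  // Reviewer's suggestion: eta-expanded `lit`, then project `.expr`.
  // It reads as two named steps but builds an intermediate Seq[Col].
  val twoStep: Seq[Expr] = values.map(lit).map(_.expr)
}
```

The choice is mostly stylistic. `map(lit(_).expr)` walks the values once, while `map(lit).map(_.expr)` separates the two steps at the cost of one extra intermediate collection, which is negligible for a short list of pivot values.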
[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22314 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95635/
[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22314 **[Test build #95635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95635/testReport)** for PR 22314 at commit [`d27256e`](https://github.com/apache/spark/commit/d27256ec70868f3fc66901abec97b4ccd75977ad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22315 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95636/
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2809/
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Merged build finished. Test PASSed.
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22240 **[Test build #95642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95642/testReport)** for PR 22240 at commit [`c61eec3`](https://github.com/apache/spark/commit/c61eec363f78d586070c673e44e9120eb10b83b5).
[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22240 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95641/
[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22179 **[Test build #95643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95643/testReport)** for PR 22179 at commit [`f2fb28d`](https://github.com/apache/spark/commit/f2fb28da3eb272651530b77dbd4ea33511f0727d).
[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22179 retest this please