[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22813 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97960/ Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #97960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97960/testReport)** for PR 22512 at commit [`31d75ba`](https://github.com/apache/spark/commit/31d75baeccfd811aaec2d099e3e994cf59a676c8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22813: [SPARK-25818][CORE] WorkDirCleanup should only re...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/22813 [SPARK-25818][CORE] WorkDirCleanup should only remove the directory at the beginning of t… ## What changes were proposed in this pull request? The cleanup mechanism clears all eligible directories under SPARK_WORK_DIR. If another configured path is the same as SPARK_WORK_DIR, the directories belonging to that other setting are deleted by mistake, for example when SPARK_LOCAL_DIRS and SPARK_WORK_DIR are set to the same location. We should add another condition, that the directory name starts with 'app-', when removing the app-* directories in SPARK_WORK_DIR. ## How was this patch tested? manual test You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark SPARK-25818 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22813 commit cf8b30f64f8855ea1574fa47955a7ce42c7d0703 Author: ouyangxiaochen Date: 2018-10-24T06:48:24Z WorkDirCleanup should only remove the directory at the beginning of the app-
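The prefix condition proposed in this pull request can be illustrated with a minimal sketch. It is not the patch itself: the object name `WorkDirCleanupSketch`, the helper `cleanupCandidates`, and the age check based on `lastModified` are assumptions made for illustration, and only the `app-` prefix test comes from the description above.

```scala
import java.io.File
import java.nio.file.Files

object WorkDirCleanupSketch {
  // Hypothetical helper: pick the directories a cleanup pass may delete.
  // Only directories whose name starts with "app-" qualify, so unrelated
  // directories that share the work dir (for example local dirs) are skipped.
  def cleanupCandidates(workDir: File, cutoffMillis: Long): Seq[File] = {
    // listFiles() returns null when workDir is missing or unreadable.
    val children = Option(workDir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    children.filter { f =>
      f.isDirectory &&
      f.getName.startsWith("app-") &&
      f.lastModified() < cutoffMillis // assumed age check, for illustration only
    }
  }

  def main(args: Array[String]): Unit = {
    val workDir = Files.createTempDirectory("work").toFile
    new File(workDir, "app-20181024064824-0000").mkdir()
    new File(workDir, "blockmgr-1234").mkdir()
    // A cutoff in the future makes every directory "old enough" for this demo.
    val cutoff = System.currentTimeMillis() + 60000L
    println(cleanupCandidates(workDir, cutoff).map(_.getName).mkString(", "))
  }
}
```

With the prefix test in place, a directory such as `blockmgr-1234` is never a candidate, even when SPARK_LOCAL_DIRS and SPARK_WORK_DIR point to the same location.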
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97961/ Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #97961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97961/testReport)** for PR 22512 at commit [`932d6f5`](https://github.com/apache/spark/commit/932d6f5e30bc9352582ac098c719b47af6bf41fb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97964/testReport)** for PR 22749 at commit [`ed4f4c9`](https://github.com/apache/spark/commit/ed4f4c90ec4c162f56373950089eec4632787817).
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test PASSed.
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4434/ Test PASSed.
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22749 Let me rebase again.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r227655503 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -45,6 +46,11 @@ case class CreateHiveTableAsSelectCommand( override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = { val catalog = sparkSession.sessionState.catalog +val metastoreCatalog = catalog.asInstanceOf[HiveSessionCatalog].metastoreCatalog + +// Whether this table is convertible to data source relation. +val isConvertible = metastoreCatalog.isConvertible(tableDesc) --- End diff -- That is an interesting idea. Let me try it.
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22749 hmm, it still has conflicts...
[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21860#discussion_r227655432 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -831,7 +832,14 @@ case class HashAggregateExec( ctx.currentVars = new Array[ExprCode](aggregateBufferAttributes.length) ++ input val updateRowInRegularHashMap: String = { - ctx.INPUT_ROW = unsafeRowBuffer + val updatedTmpAggBuffer = +if (isFastHashMapEnabled && !isVectorizedHashMapEnabled) { + updatedAggBuffer --- End diff -- This also simplifies the generated code. We don't need an if-else to assign a value to this new variable.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r227654755 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -45,6 +46,11 @@ case class CreateHiveTableAsSelectCommand( override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = { val catalog = sparkSession.sessionState.catalog +val metastoreCatalog = catalog.asInstanceOf[HiveSessionCatalog].metastoreCatalog + +// Whether this table is convertible to data source relation. +val isConvertible = metastoreCatalog.isConvertible(tableDesc) --- End diff -- I feel `CreateHiveTableAsSelectCommand` is not useful. It simply creates the table first and then calls `InsertIntoHiveTable.run`. Maybe we should just remove it and implement Hive table CTAS as `Union(CreateTable, InsertIntoTable)`.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r227654240 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -45,6 +46,11 @@ case class CreateHiveTableAsSelectCommand( override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = { val catalog = sparkSession.sessionState.catalog +val metastoreCatalog = catalog.asInstanceOf[HiveSessionCatalog].metastoreCatalog + +// Whether this table is convertible to data source relation. +val isConvertible = metastoreCatalog.isConvertible(tableDesc) --- End diff -- Another idea: can we move this logic to the `RelationConversions` rule? e.g. ``` case CreateTable(tbl, mode, Some(query)) if DDLUtils.isHiveTable(tbl) && isConvertible(tbl) => Union(CreateTable(tbl, mode, None), InsertIntoTable ...) ```
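The rewrite suggested in this comment can be sketched on a toy plan tree. Everything below is hypothetical and is not Spark's Catalyst API: the case classes only borrow the names used in the comment, `mode` is left out for brevity, and the `convertible` flag stands in for whatever `isConvertible` would compute.

```scala
object CtasRewriteSketch {
  // Toy stand-ins for catalog metadata and logical plan nodes (not Catalyst classes).
  final case class Table(name: String, provider: String, convertible: Boolean)

  sealed trait Plan
  final case class Query(sql: String) extends Plan
  final case class CreateTable(table: Table, query: Option[Query]) extends Plan
  final case class InsertIntoTable(table: Table, query: Query) extends Plan
  final case class Union(children: Seq[Plan]) extends Plan

  def isHiveTable(t: Table): Boolean = t.provider == "hive"

  // Split a convertible Hive CTAS into "create an empty table" followed by
  // "insert the query result", mirroring the shape of the suggested rule.
  def rewrite(plan: Plan): Plan = plan match {
    case CreateTable(tbl, Some(query)) if isHiveTable(tbl) && tbl.convertible =>
      Union(Seq(CreateTable(tbl, None), InsertIntoTable(tbl, query)))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val ctas = CreateTable(Table("t", "hive", convertible = true), Some(Query("SELECT 1")))
    println(rewrite(ctas))
  }
}
```

The guard keeps the rule narrow: a plan that is not a convertible Hive CTAS falls through to the second case and is returned unchanged.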
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r227653464 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala --- @@ -92,4 +92,18 @@ class HiveParquetSuite extends QueryTest with ParquetTest with TestHiveSingleton } } } + + test("SPARK-25271: write empty map into hive parquet table") { +val testData = hiveContext.getHiveFile("data/files/empty_map.dat").getCanonicalFile() +val sourceTable = "sourceTable" +val targetTable = "targetTable" +withTable(sourceTable, targetTable) { + sql(s"CREATE TABLE $sourceTable (i int,m map) ROW FORMAT DELIMITED FIELDS " + +"TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '$'") + sql(s"LOAD DATA LOCAL INPATH '${testData.toURI}' INTO TABLE $sourceTable") --- End diff -- Can we generate the input data with a temp view? e.g. create a DataFrame with literals and register it as a temp view.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Merged build finished. Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97955/ Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22754 **[Test build #97955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97955/testReport)** for PR 22754 at commit [`6f8404b`](https://github.com/apache/spark/commit/6f8404b474539a989e08459949f54395bcd7ed10). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test PASSed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97956/ Test PASSed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #97956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97956/testReport)** for PR 21860 at commit [`ccfb5a6`](https://github.com/apache/spark/commit/ccfb5a6db420aa0de3823db0e99696021f160767). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21860#discussion_r227650326 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -831,7 +832,14 @@ case class HashAggregateExec( ctx.currentVars = new Array[ExprCode](aggregateBufferAttributes.length) ++ input val updateRowInRegularHashMap: String = { - ctx.INPUT_ROW = unsafeRowBuffer + val updatedTmpAggBuffer = +if (isFastHashMapEnabled && !isVectorizedHashMapEnabled) { + updatedAggBuffer --- End diff -- Just realized it. Do we create the `updatedAggBuffer` variable only to improve the readability of the generated code? It looks to me like we don't need this variable. Here we can write ``` ctx.INPUT_ROW = if (isFastHashMapEnabled && !isVectorizedHashMapEnabled) fastRowBuffer else unsafeRowBuffer ```
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22144 I think this is different from the blocker tickets we opened before. We should try our best to avoid accidentally dropping the existing support. Please encourage more people in the community to try our RCs and find out all the new regressions or bugs. During the RC stage, for new regressions, we can either fix them if the fix is very safe/tiny, or revert the PRs that introduced the regressions. This is how we handle regressions during the RC stage. For this specific case, I do not think the root cause has been found. If we revert the previous PRs https://issues.apache.org/jira/browse/SPARK-18186 that were merged in the 2.2 release, it could easily introduce new regressions. Thus, we normally do not revert PRs that were merged in previous releases. Please add it as a known issue in the release notes.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #97963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97963/testReport)** for PR 22468 at commit [`2b25d09`](https://github.com/apache/spark/commit/2b25d09613b85ef065236bdf9ae4611b94500b88).
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4433/ Test PASSed.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test PASSed.
[GitHub] spark issue #22204: [SPARK-25196][SQL] Extends Analyze commands for cached t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97954/ Test PASSed.
[GitHub] spark issue #22204: [SPARK-25196][SQL] Extends Analyze commands for cached t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22204 Merged build finished. Test PASSed.
[GitHub] spark issue #22204: [SPARK-25196][SQL] Extends Analyze commands for cached t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22204 **[Test build #97954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97954/testReport)** for PR 22204 at commit [`b72a7fc`](https://github.com/apache/spark/commit/b72a7fce3bf42654e31bfcfb54bd7a21659dba8f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #97962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97962/testReport)** for PR 22512 at commit [`56698bb`](https://github.com/apache/spark/commit/56698bb3dc428474e032915185fab936b64d4984).
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4432/ Test PASSed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test PASSed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #97961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97961/testReport)** for PR 22512 at commit [`932d6f5`](https://github.com/apache/spark/commit/932d6f5e30bc9352582ac098c719b47af6bf41fb).
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4431/ Test PASSed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test PASSed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #97960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97960/testReport)** for PR 22512 at commit [`31d75ba`](https://github.com/apache/spark/commit/31d75baeccfd811aaec2d099e3e994cf59a676c8).
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4430/ Test PASSed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test PASSed.
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 Unfortunately, we didn't drop it mistakenly. It's a mistake and we should fix it. What I am trying to avoid is adding back the `supportsPartial` flag. We should look into the root cause and see how to fix it better. I don't know if this policy is written down officially, but I do remember we followed it many times in previous releases. Please correct me if I am wrong. I'll list it as a known issue in the 2.4.0 release notes. It would be great if someone could investigate the root cause and propose a fix (with a test).
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r227640284 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection { "cannot be used as field name\n" + walkedTypePath.mkString("\n")) } + // as a field, value class is represented by its underlying type + val trueFieldType = +if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType + val fieldValue = Invoke( -AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType), -returnNullable = !fieldType.typeSymbol.asClass.isPrimitive) - val clsName = getClassNameFromType(fieldType) +AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType), +returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive) + val clsName = getClassNameFromType(trueFieldType) --- End diff -- Why do we need such special handling? New serialization handling for value classes is added above; can't we simply get the value class object here and let the recursive call to `serializerFor` handle it?
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22237 LGTM, pending jenkins.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r227638126 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -128,6 +128,16 @@ object ScalaReflection extends ScalaReflection { case _ => false } + def isValueClass(tpe: `Type`): Boolean = { +tpe.typeSymbol.asClass.isDerivedValueClass + } + + /** Returns the underlying type of value class `cls`. */ + def getUnderlyingTypeOf(cls: `Type`): `Type` = { --- End diff -- nit: a `Type` parameter is usually named `tpe` in ScalaReflection.
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22812 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4429/ Test PASSed.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r227638782 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -622,6 +654,14 @@ object ScalaReflection extends ScalaReflection { dataType = ObjectType(udt.getClass)) Invoke(obj, "serialize", udt, inputObject :: Nil) + case t if isValueClass(t) => +val (name, underlyingType) = getConstructorParameters(t).head --- End diff -- Can we use `getUnderlyingTypeOf` consistently? Let `getUnderlyingTypeOf` return both parameter name and type. Or just use `getConstructorParameters` and get rid of `getUnderlyingTypeOf`?
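For readers following the value class discussion, here is a small sketch of detecting a value class and reading the name and type of its single wrapped field in one call, which is one way to read the suggestion above. It assumes Scala 2 with `scala-reflect` on the classpath and compilation with scalac. `UserId`, `ValueClassSketch`, and `underlyingField` are hypothetical names; only the `isDerivedValueClass` check is taken from the quoted diff.

```scala
import scala.reflect.runtime.universe._

// Example value class; value classes have to be top-level or members of an object.
final case class UserId(value: Long) extends AnyVal

object ValueClassSketch {
  // Same check as in the quoted diff, guarded so non-class symbols return false.
  def isValueClass(tpe: Type): Boolean =
    tpe.typeSymbol.isClass && tpe.typeSymbol.asClass.isDerivedValueClass

  // Hypothetical helper: name and type of the primary constructor's first parameter,
  // which for a value class is its single underlying field.
  def underlyingField(tpe: Type): (String, Type) = {
    val ctor = tpe.decl(termNames.CONSTRUCTOR).asTerm.alternatives
      .collectFirst { case m: MethodSymbol if m.isPrimaryConstructor => m }
      .getOrElse(throw new IllegalArgumentException(s"$tpe has no primary constructor"))
    val param = ctor.paramLists.head.head
    (param.name.toString, param.typeSignature)
  }

  def main(args: Array[String]): Unit = {
    val tpe = typeOf[UserId]
    println(isValueClass(tpe))
    println(underlyingField(tpe))
  }
}
```

Returning the pair from one helper avoids having two lookups (one for the type, one for the name) that could drift apart.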
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22812 Merged build finished. Test PASSed.
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22812 cc @michalsenkyr @vofque @viirya
[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r227638941 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1837,8 +1837,6 @@ case class GetArrayFromMap private( arrayGetter: MapData => ArrayData, elementTypeGetter: MapType => DataType) extends UnaryExpression with NonSQLExpression { - private lazy val encodedFunctionName: String = TermName(functionName).encodedName.toString --- End diff -- this is to address https://github.com/apache/spark/pull/22745#discussion_r227407344
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22812 **[Test build #97959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97959/testReport)** for PR 22812 at commit [`da31d26`](https://github.com/apache/spark/commit/da31d2602b8e12eb8949336cf14b903c0df731cf).
[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r227638902 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1090,15 +1096,9 @@ case class CatalystToExternalMap private( val tupleLoopValue = ctx.freshName("tupleLoopValue") val builderValue = ctx.freshName("builderValue") -val getLength = s"${genInputData.value}.numElements()" --- End diff -- These changes are unrelated, but they are a follow-up to https://github.com/apache/spark/pull/16986 to address the remaining comments.
[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22812 [SPARK-25817][SQL] Dataset encoder should support combination of map and product type ## What changes were proposed in this pull request? After https://github.com/apache/spark/pull/22745 , the Dataset encoder supports the combination of Java bean and map types. This PR fixes the Scala side. It didn't work before because `CatalystToExternalMap` tries to get the data type of the input map expression, which can be unresolved, in which case its data type is unknown. To fix it, we can follow `UnresolvedMapObjects` and create an `UnresolvedCatalystToExternalMap`, creating `CatalystToExternalMap` only when the input map expression is resolved and its data type is known. ## How was this patch tested? Enabled an old test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark map Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22812.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22812 commit da31d2602b8e12eb8949336cf14b903c0df731cf Author: Wenchen Fan Date: 2018-10-24T04:21:44Z Dataset encoder should support combination of map and product type
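The deferral described in this PR can be illustrated with a toy model. This is only a sketch under stated assumptions: every name below (`UnresolvedAttr`, `Attr`, `UnresolvedToExternalMap`, `ToExternalMap`, `resolve`) is a hypothetical stand-in and not a Catalyst class, and the actual patch may be structured differently. The idea shown is that a placeholder node holds the child without asking for its data type, and the concrete node is built only once the child is resolved.

```scala
// Toy model only: these types are hypothetical stand-ins, not Catalyst classes.
object DeferredResolutionSketch {
  sealed trait DataType
  case object IntType extends DataType
  case class MapOf(key: DataType, value: DataType) extends DataType

  sealed trait Expr { def resolved: Boolean }

  // An attribute whose type is unknown until analysis binds it.
  case class UnresolvedAttr(name: String) extends Expr { val resolved = false }
  case class Attr(name: String, dataType: DataType) extends Expr { val resolved = true }

  // Placeholder: holds the child without asking for its data type.
  case class UnresolvedToExternalMap(child: Expr) extends Expr { val resolved = false }

  // Concrete node: needs key and value types, so it is built only from a resolved child.
  case class ToExternalMap(child: Expr, key: DataType, value: DataType) extends Expr {
    val resolved = true
  }

  // One analysis step: bind attributes, then swap the placeholder once the type is known.
  def resolve(e: Expr, schema: Map[String, DataType]): Expr = e match {
    case UnresolvedAttr(n) => schema.get(n).map(Attr(n, _)).getOrElse(e)
    case UnresolvedToExternalMap(child) =>
      resolve(child, schema) match {
        case a @ Attr(_, MapOf(k, v)) => ToExternalMap(a, k, v)
        case other => UnresolvedToExternalMap(other)
      }
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val plan = UnresolvedToExternalMap(UnresolvedAttr("m"))
    println(resolve(plan, Map("m" -> MapOf(IntType, IntType))))
  }
}
```

If the attribute is missing from the schema, `resolve` leaves the placeholder in place instead of failing, which is the behavior the placeholder exists to provide in this sketch.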
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test PASSed.
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97953/ Test PASSed.
[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97953/testReport)** for PR 22749 at commit [`552e8dd`](https://github.com/apache/spark/commit/552e8dd3f5031b97cf5158ed07c77ff923233c79). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaPrefixSpanExample ` * `trait ScalaReflection extends Logging ` * `// TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()``
[GitHub] spark issue #22730: [SPARK-16775][CORE] Remove deprecated accumulator v1 API...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22730 **[Test build #4389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4389/testReport)** for PR 22730 at commit [`41f02f4`](https://github.com/apache/spark/commit/41f02f461d0f632606adb68a36d03a7ed9f044c4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22729: [SPARK-25737][CORE] Remove JavaSparkContextVarargsWorkar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22729 **[Test build #4391 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4391/testReport)** for PR 22729 at commit [`0860d27`](https://github.com/apache/spark/commit/0860d27a205d3dd3d94e6bbe2c9db49b7e432ef4).
[GitHub] spark pull request #22795: [SPARK-25798][PYTHON] Internally document type co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22795#discussion_r227634476 --- Diff: python/pyspark/sql/functions.py --- @@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None, functionType=None): conversion on returned data. The conversion is not guaranteed to be correct and results should be checked for accuracy by users. """ + +# The following table shows most of Pandas data and SQL type conversions in Pandas UDFs that +# are not yet visible to the user. Some of behaviors are buggy and might be changed in the near +# future. The table might have to be eventually documented externally. +# Please see SPARK-25798's PR to see the codes in order to generate the table below. +# +# +-+--+--+---+++++-+-+-++++---+-+-++-+-+-+--+---++ # noqa +# |SQL Type \ Pandas Value(Type)|None(object(NoneType))|True(bool)|1(int8)|1(int16)| 1(int32)| 1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|1.0(float16)|1.0(float32)|1.0(float64)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|a(object(string))| 1(object(Decimal))|[1 2 3](object(array[int32]))|1.0(float128)|(1+0j)(complex64)|(1+0j)(complex128)|A(category)|1 days 00:00:00(timedelta64[ns])| # noqa +# +-+--+--+---+++++-+-+-++++---+-+-++-+-+-+--+---++ # noqa +# | boolean| None| True| True|True|True|True|True| True| True| True| False| False| False| False|False| X| X|X|False| False| False| X| False| # noqa +# | tinyint| None| 1| 1| 1| 1| 1| X|X| X|X| 1| 1| 1| X|X| X| X|X|X| X| X| 0| X| # noqa +# | smallint| None| 1| 1| 1| 1| 1| 1|X| X|X| 1| 1| 1| X|X| X| X|X|X| X| X| X| X| # noqa +# | int| None| 1| 1| 1| 1| 1| 1|1| X|X| 1| 1| 1| X|X| X| X|X|X| X| X| X| X| # noqa +# | bigint| None| 1| 1| 1| 1| 1| 1|1| 1|X| 1| 1| 1| 0| 18| X| X|X|X| X| X| X| X| # noqa +# |float| None| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| X|
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97949/ Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97949/testReport)** for PR 22608 at commit [`51959b2`](https://github.com/apache/spark/commit/51959b22cfdb4606260aa516c41e0d3f6eba56ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4428/ Test PASSed.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22309 Merged build finished. Test PASSed.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22309 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97950/ Test PASSed.
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 https://github.com/apache/spark/pull/22237/files#r223707899 makes sense to me. Addressed. LGTM from my side as well
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22237 **[Test build #97958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97958/testReport)** for PR 22237 at commit [`b2988c7`](https://github.com/apache/spark/commit/b2988c76456627e245bd8e157c76197fe4cc0ade).
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test PASSed.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22309 **[Test build #97950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97950/testReport)** for PR 22309 at commit [`ca98663`](https://github.com/apache/spark/commit/ca98663a1ca56bfbf68de716e453a518df452354). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `// (e.g. as its parent trait or generic), the compiler keeps the class`
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #97957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97957/testReport)** for PR 22514 at commit [`5780a5e`](https://github.com/apache/spark/commit/5780a5ecaf671e4a7475cc7ac8fc345308368fcf).
[GitHub] spark issue #22729: [SPARK-25737][CORE] Remove JavaSparkContextVarargsWorkar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22729 **[Test build #4390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4390/testReport)** for PR 22729 at commit [`0860d27`](https://github.com/apache/spark/commit/0860d27a205d3dd3d94e6bbe2c9db49b7e432ef4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22811: Branch 2.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22811 Can one of the admins verify this patch?
[GitHub] spark pull request #22571: [SPARK-25392][Spark Job History]Inconsistent beha...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22571#discussion_r227630523 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2434,8 +2434,15 @@ class SparkContext(config: SparkConf) extends Logging { val schedulingMode = getSchedulingMode.toString val addedJarPaths = addedJars.keys.toSeq val addedFilePaths = addedFiles.keys.toSeq + // SPARK-25392 pool Information should be stored in the event + val poolInformation = getAllPools.map { it => +val xmlString = ("
[GitHub] spark issue #22811: Branch 2.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22811 Hi, @un-knower . Could you close this?
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21688 Merged build finished. Test PASSed.
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97947/ Test PASSed.
[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/21632 @holdenk @sethah @HyukjinKwon @jkbradley ping... could you please take another look at this PR? I've updated it to latest master again. Thank you!
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21688 **[Test build #97947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97947/testReport)** for PR 21688 at commit [`052f706`](https://github.com/apache/spark/commit/052f70683db032783901fcce1e981a5392c92ea1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22811: Branch 2.4
GitHub user un-knower opened a pull request: https://github.com/apache/spark/pull/22811 Branch 2.4 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22811.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22811 commit b632e775cc057492ebba6b65647d90908aa00421 Author: Marco Gaido Date: 2018-09-06T07:27:59Z [SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String ## What changes were proposed in this pull request? SPARK-10399 introduced a performance regression on the hash computation for UTF8String. The regression can be evaluated with the code attached in the JIRA. That code runs in about 120 us per method on my laptop (MacBook Pro 2.5 GHz Intel Core i7, RAM 16 GB 1600 MHz DDR3) while the code from branch 2.3 takes on the same machine about 45 us for me. After the PR, the code takes about 45 us on the master branch too. ## How was this patch tested? running the perf test from the JIRA Closes #22338 from mgaido91/SPARK-25317. Authored-by: Marco Gaido Signed-off-by: Wenchen Fan (cherry picked from commit 64c314e22fecca1ca3fe32378fc9374d8485deec) Signed-off-by: Wenchen Fan commit 085f731adb9b8c82a2bf4bbcae6d889a967fbd53 Author: Shahid Date: 2018-09-06T16:52:58Z [SPARK-25268][GRAPHX] run Parallel Personalized PageRank throws serialization Exception ## What changes were proposed in this pull request? 
mapValues in scala is currently not serializable. To avoid the serialization issue while running pageRank, we need to use map instead of mapValues. Please review http://spark.apache.org/contributing.html before opening a pull request. Closes #22271 from shahidki31/master_latest. Authored-by: Shahid Signed-off-by: Joseph K. Bradley (cherry picked from commit 3b6591b0b064b13a411e5b8f8ee4883a69c39e2d) Signed-off-by: Joseph K. Bradley commit f2d5022233b637eb50567f7945042b3a8c9c6b25 Author: hyukjinkwon Date: 2018-09-06T15:18:49Z [SPARK-25328][PYTHON] Add an example for having two columns as the grouping key in group aggregate pandas UDF ## What changes were proposed in this pull request? This PR proposes to add another example for multiple grouping key in group aggregate pandas UDF since this feature could make users still confused. ## How was this patch tested? Manually tested and documentation built. Closes #22329 from HyukjinKwon/SPARK-25328. Authored-by: hyukjinkwon Signed-off-by: Bryan Cutler (cherry picked from commit 7ef6d1daf858cc9a2c390074f92aaf56c219518a) Signed-off-by: Bryan Cutler commit 3682d29f45870031d9dc4e812accbfbb583cc52a Author: liyuanjian Date: 2018-09-06T17:17:29Z [SPARK-25072][PYSPARK] Forbid extra value for custom Row ## What changes were proposed in this pull request? Add value length check in `_create_row`, forbid extra value for custom Row in PySpark. ## How was this patch tested? New UT in pyspark-sql Closes #22140 from xuanyuanking/SPARK-25072. Lead-authored-by: liyuanjian Co-authored-by: Yuanjian Li Signed-off-by: Bryan Cutler (cherry picked from commit c84bc40d7f33c71eca1c08f122cd60517f34c1f8) Signed-off-by: Bryan Cutler commit a7cfe5158f5c25ae5f774e1fb45d63a67a4bb89c Author: xuejianbest <384329882@...> Date: 2018-09-06T14:17:37Z [SPARK-25108][SQL] Fix the show method to display the wide character alignment problem This is not a perfect solution. It is designed to minimize complexity on the basis of solving problems. 
It is effective for English, Chinese characters, Japanese, Korean and so on. ```scala before: +---+---+-+ |id |ä¸å½ |s2 | +---+---+-+ |1 |ab |[a] | |2 |null |[ä¸å½, abc]| |3 |ab1|[hello world]| |4 |ãè¡ ãã(kya) ãã (kyu) ãã(kyo) |[âä¸å½]| |5 |ä¸å½ï¼ä½ 好ï¼a|[âä¸ï¼å½ï¼, 312] | |6 |ä¸å
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 > According to the policy, we don't have to block the current release because of i @cloud-fan, BTW, would you mind sharing what you read? I want to be aware of the policy as well. Do you perhaps mean the correctness and data loss issues mentioned at https://spark.apache.org/contributing.html? Could they be considered blockers? Blocker means: pointless to release without this change as the release would be unusable to a large minority of users. Correctness and data loss issues should be considered Blockers. It's difficult to say, but it looks like this affects more than a few users.
[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Fix OOM of KryoBenchmark due to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97945/ Test PASSed.
[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Fix OOM of KryoBenchmark due to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22663 Merged build finished. Test PASSed.
[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Fix OOM of KryoBenchmark due to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22663 **[Test build #97945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97945/testReport)** for PR 22663 at commit [`e2ca55e`](https://github.com/apache/spark/commit/e2ca55e81e4e395bb16711db63eb23f07ab9ec9f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r227626505 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MutableProjectionSuite.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.CalendarInterval + +class MutableProjectionSuite extends SparkFunSuite with ExpressionEvalHelper { + + private def createMutableProjection(dataTypes: Array[DataType]): MutableProjection = { +MutableProjection.create(dataTypes.zipWithIndex.map(x => BoundReference(x._2, x._1, true))) + } + + testBothCodegenAndInterpreted("fixed-length types") { +val fixedLengthTypes = Array[DataType]( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, + DateType, TimestampType) +val proj = createMutableProjection(fixedLengthTypes) +val inputRow = InternalRow.fromSeq( + Seq(false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 5.0, 100, 200L)) +assert(proj(inputRow) === inputRow) + +// Use UnsafeRow as buffer +val numBytes = UnsafeRow.calculateBitSetWidthInBytes(fixedLengthTypes.length) +val unsafeBuffer = UnsafeRow.createFromByteArray(numBytes, fixedLengthTypes.length) +val projUnsafeRow = proj.target(unsafeBuffer)(inputRow) +assert(FromUnsafeProjection(fixedLengthTypes)(projUnsafeRow) === inputRow) + } + + testBothCodegenAndInterpreted("variable-length types") { +val variableLengthTypes = Array( + StringType, DecimalType.defaultConcreteType, CalendarIntervalType, BinaryType, + ArrayType(StringType), MapType(IntegerType, StringType), + StructType.fromDDL("a INT, b STRING"), ObjectType(classOf[java.lang.Integer])) +val proj = createMutableProjection(variableLengthTypes) --- End diff -- sure.
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r227626456 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala --- @@ -143,4 +144,25 @@ object InternalRow { case u: UserDefinedType[_] => getAccessor(u.sqlType) case _ => (input, ordinal) => input.get(ordinal, dataType) } + + /** + * Returns a writer for an `InternalRow` with given data type. + */ + def getWriter(ordinal: Int, dt: DataType): (InternalRow, Any) => Unit = dt match { +case BooleanType => (input, v) => input.setBoolean(ordinal, v.asInstanceOf[Boolean]) +case ByteType => (input, v) => input.setByte(ordinal, v.asInstanceOf[Byte]) +case ShortType => (input, v) => input.setShort(ordinal, v.asInstanceOf[Short]) +case IntegerType | DateType => (input, v) => input.setInt(ordinal, v.asInstanceOf[Int]) +case LongType | TimestampType => (input, v) => input.setLong(ordinal, v.asInstanceOf[Long]) +case FloatType => (input, v) => input.setFloat(ordinal, v.asInstanceOf[Float]) +case DoubleType => (input, v) => input.setDouble(ordinal, v.asInstanceOf[Double]) +case DecimalType.Fixed(precision, _) => + (input, v) => input.setDecimal(ordinal, v.asInstanceOf[Decimal], precision) +case CalendarIntervalType | BinaryType | _: ArrayType | StringType | _: StructType | + _: MapType | _: ObjectType => + (input, v) => input.update(ordinal, v) +case udt: UserDefinedType[_] => getWriter(ordinal, udt.sqlType) +case NullType => (input, _) => input.setNullAt(ordinal) +case _ => throw new SparkException(s"Unsupported data type $dt") --- End diff -- ok
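The shape of `getWriter` in the diff above (dispatch on the data type once, return a closure bound to an ordinal) can be tried out without Spark. The following is a minimal sketch under stated assumptions: `Row`, the three data types, and `project` are hypothetical stand-ins written for illustration and are not the `InternalRow` API. It shows the intended usage pattern, where writers are built once per column and then applied to each value.

```scala
// Toy model only: Row and the data types here are hypothetical, not Spark classes.
object WriterSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case object NullType extends DataType

  // Hypothetical stand-in for a mutable row with typed setters.
  final class Row(n: Int) {
    private val values = new Array[Any](n)
    def update(i: Int, v: Any): Unit = { values(i) = v }
    def setInt(i: Int, v: Int): Unit = { values(i) = v }
    def setNullAt(i: Int): Unit = { values(i) = null }
    def toSeq: Seq[Any] = values.toSeq
  }

  // The type match runs once here; the returned closure does no further dispatch.
  def getWriter(ordinal: Int, dt: DataType): (Row, Any) => Unit = dt match {
    case IntegerType => (row, v) => row.setInt(ordinal, v.asInstanceOf[Int])
    case StringType => (row, v) => row.update(ordinal, v)
    case NullType => (row, _) => row.setNullAt(ordinal)
  }

  // Build one writer per column, then write each input value into a fresh row.
  def project(types: Seq[DataType], input: Seq[Any]): Seq[Any] = {
    val writers = types.zipWithIndex.map { case (dt, i) => getWriter(i, dt) }
    val out = new Row(types.length)
    writers.zip(input).foreach { case (write, v) => write(out, v) }
    out.toSeq
  }
}
```

In this sketch the `NullType` writer ignores its value argument and always stores null, mirroring the `NullType` case in the diff.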
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97952/ Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #97952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97952/testReport)** for PR 22512 at commit [`a014145`](https://github.com/apache/spark/commit/a014145f94c3c771b42e7a2e73bd67d596ea802e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #97956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97956/testReport)** for PR 21860 at commit [`ccfb5a6`](https://github.com/apache/spark/commit/ccfb5a6db420aa0de3823db0e99696021f160767).
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21860 LGTM, pending Jenkins
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21860 retest this please
[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22455 **[Test build #4388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4388/testReport)** for PR 22455 at commit [`5b39d73`](https://github.com/apache/spark/commit/5b39d737aea4b4a680e63e25f5df81993ced1c70). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaPrefixSpanExample ` * `trait ScalaReflection extends Logging ` * `// TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()``
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16478 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97946/ Test FAILed.
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16478 Merged build finished. Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21860 ok to test
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16478 **[Test build #97946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97946/testReport)** for PR 16478 at commit [`8b83ec7`](https://github.com/apache/spark/commit/8b83ec7242fe44847485c0591c90bc41dbdfea4a). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22755: [SPARK-25755][SQL][Test] Supplementation of non-CodeGen ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22755 Improving the test coverage looks good. However, it seems hard to wrap all the test cases with `withSQLConf`. So, how about adding a helper function for turning codegen off/on (`WHOLESTAGE_CODEGEN_ENABLED` and `CODEGEN_FACTORY_MODE`), like `testBothCodegenAndInterpreted`? @cloud-fan @gatorsmile https://github.com/apache/spark/blob/584e767d372d41071c3436f9ad4bf77a820f12b4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala#L38
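One possible shape for the helper suggested in this comment is sketched below. It is an illustration only and makes several assumptions: `conf`, `withConf`, and `runBothCodegenAndInterpreted` are hypothetical names, the configuration is modeled as a plain mutable map and not Spark's `SQLConf`, and the key string is a placeholder for whatever `WHOLESTAGE_CODEGEN_ENABLED` resolves to. The idea is to set the flag, run the test body, restore the previous value, and repeat for each mode.

```scala
import scala.collection.mutable

// Toy model only: `conf` stands in for a session configuration; it is not SQLConf.
object CodegenToggleSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set the given keys for the duration of `body`, then restore the previous values.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body finally saved.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None) => conf.remove(k)
    }
  }

  // Run the same body once per mode; the key name is an illustrative placeholder.
  def runBothCodegenAndInterpreted[T](body: => T): Seq[T] =
    Seq("true", "false").map { enabled =>
      withConf("wholeStageCodegen.enabled" -> enabled)(body)
    }
}
```

With a helper of this kind, each test body is written once and the wrapper takes care of both modes, so individual test cases would not each need their own configuration block.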