[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22497 I see. I will wait in other PRs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22331: [SPARK-25331][SS] Make FileStreamSink ignore part...
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/22331#discussion_r219399313
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StagingFileCommitProtocol.scala ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import org.apache.hadoop.fs.{FileAlreadyExistsException, FileContext, Path}
+import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
+
+class StagingFileCommitProtocol(jobId: String, path: String)
+  extends FileCommitProtocol with Serializable with Logging
+  with ManifestCommitProtocol {
+  private var stagingDir: Option[Path] = None
--- End diff --
It looks like you're using `Option` but always calling `.get` without any check. In `setupTask` that is fine, since the assignment happens there, but in `newTaskTempFile` it would be better to guard with `require`, which fails fast and guarantees that the later `.get` always succeeds.
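The guard suggested in this comment could look roughly like the sketch below. It is only an illustration: `StagingProtocolSketch`, its `String` paths and the error message are hypothetical stand-ins, not the code from the pull request; only the names `stagingDir`, `setupTask` and `newTaskTempFile` come from the review.

```scala
// Hypothetical, simplified stand-in for the class under review.
// Only stagingDir, setupTask and newTaskTempFile are names from the review;
// everything else is an assumption made for illustration.
object StagingDirGuardSketch {
  final class StagingProtocolSketch(basePath: String) {
    // Assigned in setupTask, read in newTaskTempFile.
    private var stagingDir: Option[String] = None

    def setupTask(taskId: String): Unit = {
      stagingDir = Some(s"$basePath/_staging/$taskId")
    }

    def newTaskTempFile(fileName: String): String = {
      // Fail fast with a descriptive IllegalArgumentException instead of
      // letting .get throw a bare NoSuchElementException.
      require(stagingDir.isDefined, "setupTask must be called before newTaskTempFile")
      s"${stagingDir.get}/$fileName"
    }
  }

  def main(args: Array[String]): Unit = {
    val protocol = new StagingProtocolSketch("/tmp/out")
    val failedFast =
      try { protocol.newTaskTempFile("part-0"); false }
      catch { case _: IllegalArgumentException => true }
    println(s"failed fast before setupTask: $failedFast")
    protocol.setupTask("task-1")
    println(protocol.newTaskTempFile("part-0"))
  }
}
```

With the `require` in place, a call made before `setupTask` fails with a message that names the violated precondition, and every `.get` after the guard is known to be safe.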
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22316 **[Test build #96404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96404/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22497
Congratulations, @kiszk! I am working on https://github.com/apache/spark/pull/22513. If possible, could you wait until we reach an agreement in that PR before merging other benchmarks? This PR has the same problem:
```
import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase}
```
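For readers unfamiliar with the syntax, the import above uses a Scala rename selector, which is needed when two classes with the same simple name must be visible in one file. The sketch below illustrates that situation; `utilpkg` and `sqlpkg` are hypothetical placeholders, not Spark's actual packages.

```scala
object ImportRenameSketch {
  // Two hypothetical namespaces that both define a class named BenchmarkBase.
  object utilpkg { class BenchmarkBase { def kind: String = "file-based" } }
  object sqlpkg { class BenchmarkBase { def kind: String = "sql" } }

  // One class keeps its simple name; the other is renamed at the import site
  // so that both can be referenced without ambiguity.
  import sqlpkg.BenchmarkBase
  import utilpkg.{BenchmarkBase => FileBenchmarkBase}

  def kinds: (String, String) =
    (new BenchmarkBase().kind, new FileBenchmarkBase().kind)

  def main(args: Array[String]): Unit = println(kinds)
}
```

The rename works, but every file that needs both classes has to repeat it, which is one reason a single shared base class can be preferable.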
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22513 **[Test build #96403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96403/testReport)** for PR 22513 at commit [`1c3c0f6`](https://github.com/apache/spark/commit/1c3c0f692d38b361f35017df3e999f7838e28e48).
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3331/ Test PASSed.
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Merged build finished. Test PASSed.
[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22513#discussion_r219397646
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
@@ -27,7 +27,7 @@ import org.apache.spark.sql.functions.monotonically_increasing_id
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
 import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType, TimestampType}
-import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils}
+import org.apache.spark.util.Utils
 
 /**
  * Benchmark to measure read performance with Filter pushdown.
--- End diff --
Thanks, I have updated the doc.
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22375#discussion_r219397495
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala ---
@@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper
     val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) }
     assert(e.getMessage.contains("some_variable"))
   }
+
+  test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") {
+    val e = intercept[RuntimeException] {
+      checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null))
--- End diff --
Your first point is correct, since this patch addresses only the codegen-on case. We can add more code to address the codegen-off case.
Regarding your second point, have we ever distinguished a wrong output from a badly written UT when we detect a difference between `expression` and `expected`? I think the distinction is nice to have, but not mandatory.
I have one question about your approach:
```
assert(containsNull(expected) && isNullable(expression.dataType))
```
Since the two conditions above evaluate `expected` and `expression` independently, how does this work for the following case? I think the assertion would pass.
```
expression: dataType = StructType(ArrayType(IntegerType, false), ArrayType(IntegerType, true))
            Struct(Array(0, null), Array(1, 0))
expected:   Struct(Array(0, 0), Array(1, null))
```
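The concern in this comment can be reproduced with a small model. The sketch below does not use Spark's classes: `DType`, `ArrT`, `StructT`, the `Seq`-based values and the `conforms` helper are all simplified assumptions introduced here, and the position-wise `conforms` check is just one possible alternative, not something proposed in the thread.

```scala
object NullabilityCheckSketch {
  // Simplified stand-ins for Spark's DataType hierarchy (assumptions, not Spark classes).
  sealed trait DType
  case object IntT extends DType
  final case class ArrT(element: DType, containsNull: Boolean) extends DType
  final case class StructT(fields: Seq[DType]) extends DType

  // True if the value holds a null anywhere inside it.
  def containsNull(value: Any): Boolean = value match {
    case null      => true
    case s: Seq[_] => s.exists(v => containsNull(v))
    case _         => false
  }

  // True if the type permits a null anywhere inside it.
  def isNullable(dt: DType): Boolean = dt match {
    case IntT         => false
    case ArrT(e, cn)  => cn || isNullable(e)
    case StructT(fs)  => fs.exists(f => isNullable(f))
  }

  // Position-wise check: walks the value and its declared type together,
  // so a null is accepted only where that exact position allows it.
  def conforms(value: Any, dt: DType): Boolean = (value, dt) match {
    case (null, _)                => false
    case (_: Int, IntT)           => true
    case (s: Seq[_], ArrT(e, cn)) => s.forall(v => if (v == null) cn else conforms(v, e))
    case (s: Seq[_], StructT(fs)) =>
      s.length == fs.length && s.zip(fs).forall { case (v, f) => conforms(v, f) }
    case _                        => false
  }

  // The case from the comment: the first array is declared non-nullable.
  val dataType: DType = StructT(Seq(ArrT(IntT, containsNull = false), ArrT(IntT, containsNull = true)))
  val actual: Seq[Any] = Seq(Seq(0, null), Seq(1, 0))   // what the expression produced
  val expected: Seq[Any] = Seq(Seq(0, 0), Seq(1, null)) // what the test expected

  def main(args: Array[String]): Unit = {
    println(containsNull(expected) && isNullable(dataType)) // independent check
    println(conforms(actual, dataType))                     // position-wise check
  }
}
```

In this model the independent check is satisfied because `expected` has a null somewhere and the type is nullable somewhere, even though the two locations differ. Only the position-wise check notices that the actual value has a null inside the array declared with `containsNull = false`.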
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22515 **[Test build #96402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96402/testReport)** for PR 22515 at commit [`0f32b01`](https://github.com/apache/spark/commit/0f32b0170fe6295bfef604b5a679f9391b5ec78f).
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22515 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3330/ Test PASSed.
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22515 Merged build finished. Test PASSed.
[GitHub] spark pull request #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22508
[GitHub] spark pull request #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsing...
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/22515
[SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix
One more legacy config to go ...
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rxin/spark allowCreatingManagedTableUsingNonemptyLocation
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22515.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #22515
commit f7c372e6f803c86e189e984fa6c1dd81f84454e9
Author: Reynold Xin
Date: 2018-09-21T02:10:10Z
[SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix
[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22508 LGTM, merging to master/2.4!
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3329/ Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #96401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96401/testReport)** for PR 22514 at commit [`5debc60`](https://github.com/apache/spark/commit/5debc6096ae6e505d3386fd7eb569d154f158d55).
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22513 > KryoBenchmark is in core, and UnsafeProjectionBenchmark, HashByteArrayBenchmark and HashBenchmark are in catalyst. If we move the benchmark base class to sql, benchmarks mentioned above would not be able to inherit from the benchmark base class. What do you think? @wangyum The cases you mentioned are currently using the `org.apache.spark.sql.execution.benchmark.BenchmarkBase` (the old one), so it seems fine.
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test PASSed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/7 Can one of the admins verify this patch?
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96389/ Test PASSed.
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #96389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96389/testReport)** for PR 22163 at commit [`2dc94a2`](https://github.com/apache/spark/commit/2dc94a24ab06141768413dc2bf6f9c5e29ce7249). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22497 @kiszk @wangyum Thank you!
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18544 Can you explain how we fix the problem?
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/22514
[SPARK-25271][SQL] Hive ctas commands should use data source if it is convertible
## What changes were proposed in this pull request?
We have a [regression](https://github.com/apache/spark/pull/20521/files#r217254430) since 2.3.1: the Hive CTAS command now uses only the Hive SerDe to write data. Previously, the Hive CTAS command used the Parquet/ORC data source to write data when the table was convertible. As a result, the regression reported in this JIRA appears when writing an empty map into Hive using CTAS: the write hits a known Hive issue and throws an exception.
## How was this patch tested?
Added test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-25271-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22514.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #22514
commit 5debc6096ae6e505d3386fd7eb569d154f158d55
Author: Liang-Chi Hsieh
Date: 2018-09-12T10:33:53Z
Hive ctas commands should use data source format if it is convertible.
[GitHub] spark pull request #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22509
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22497 Congratulations, @kiszk
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22509 lgtm, merging to master/2.4!
[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22461#discussion_r219392779
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --
Do we need this line?
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Merged build finished. Test FAILed.
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96392/ Test FAILed.
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22497 Thanks! merging to master.
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22494 **[Test build #96392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96392/testReport)** for PR 22494 at commit [`1ee9f02`](https://github.com/apache/spark/commit/1ee9f0208a3cb6de373e05366c19bf69967eecd8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayB...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22497
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22509 Merged build finished. Test PASSed.
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96391/ Test PASSed.
[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22508 Merged build finished. Test PASSed.
[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22508 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96390/ Test PASSed.
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22509 **[Test build #96391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96391/testReport)** for PR 22509 at commit [`8ad50d5`](https://github.com/apache/spark/commit/8ad50d5433ac5a0f888fb5909893317002d5aa51). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22508 **[Test build #96390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96390/testReport)** for PR 22508 at commit [`f29dd89`](https://github.com/apache/spark/commit/f29dd8905f0b14c937a47d7abe291828c7de48b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22513 `KryoBenchmark` is in core, and `UnsafeProjectionBenchmark`, `HashByteArrayBenchmark` and `HashBenchmark` are in `catalyst`. If we move the benchmark base class to sql, benchmarks mentioned above would not be able to inherit from the benchmark base class. What do you think?
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Merged build finished. Test PASSed.
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3328/ Test PASSed.
[GitHub] spark issue #22471: [SPARK-25469][SQL][Performance] Eval methods of Concat, ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22471 @maropu Do you want to merge this as your first work as a committer? I think this can be merged into master/2.4 because this is a performance regression fix.
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22497 LGTM
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22467 **[Test build #96400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96400/testReport)** for PR 22467 at commit [`813d19c`](https://github.com/apache/spark/commit/813d19c63477b82a76bdd0d1da73cf3cb1d38564).
[GitHub] spark issue #22458: [SPARK-25459] Add viewOriginalText back to CatalogTable
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22458 **[Test build #96399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96399/testReport)** for PR 22458 at commit [`f3d3100`](https://github.com/apache/spark/commit/f3d3100399be442da9fd5e417aeefb9662903c49).
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19045 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96398/ Test FAILed.
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19045 **[Test build #96398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96398/testReport)** for PR 19045 at commit [`42a29ab`](https://github.com/apache/spark/commit/42a29abf4d4479f5195eee6324efd181f118535b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19045 Merged build finished. Test FAILed.
[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22407 @ueshin Wenchen thought it may be risky to backport the fix to tightestCommonType. Given this, can this be looked at now?
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19045 **[Test build #96398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96398/testReport)** for PR 19045 at commit [`42a29ab`](https://github.com/apache/spark/commit/42a29abf4d4479f5195eee6324efd181f118535b).
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3327/ Test PASSed.
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19045 Merged build finished. Test PASSed.
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22455#discussion_r219388335
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            cols <- lapply(dtypes(object), function(l) {
-              paste(l, collapse = ":")
-            })
-            s <- paste(cols, collapse = ", ")
-            cat(paste(class(object), "[", s, "]\n", sep = ""))
+            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --
@adrian555 Thanks for the explanation.
> However, my second point is that I don't think these two configs matter much or that important/necessary. Since the eager execution is just to show a snippet data of the SparkDataFrame, our default numRows = 20 and truncate = TRUE are good enough IMO. If users want to see more or less number of rows, they should call showDF().
I just wanted to check whether it is possible to have parity with how this works for Python. It seems to me that in Python we just get the two configs and call the showString method.
> And if we think that showDF() can ignore the eager execution setting and still want the show() to observe eager execution config, we can certainly just grab the maxNumRows and truncate setting and pass to showDF() call.
What will happen if we grab these configs in show() when eager execution is enabled and then call showDF(), passing them as parameters?
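The option raised in the last question, reading the eager-evaluation settings in `show()` and forwarding them to `showDF()`, can be sketched as follows. This is a plain-Scala model rather than SparkR code: `Frame`, the string rendering and the `Map`-based configuration are hypothetical, and the `maxNumRows` and `truncate` keys are assumed to sit next to `spark.sql.repl.eagerEval.enabled`, with a default of 20 for each.

```scala
object EagerEvalShowSketch {
  // Hypothetical stand-in for a SparkDataFrame: "name:type" columns plus string rows.
  final case class Frame(schema: Seq[String], rows: Seq[Seq[String]])

  // Stand-in for showDF: renders at most numRows rows and cuts each cell
  // to `truncate` characters (0 or less means no truncation).
  def showDF(df: Frame, numRows: Int = 20, truncate: Int = 20): String =
    df.rows
      .take(numRows)
      .map(_.map(c => if (truncate > 0) c.take(truncate) else c).mkString(" | "))
      .mkString("\n")

  // Stand-in for show: with eager evaluation enabled it forwards the configured
  // limits to showDF; otherwise it prints only the class name and schema.
  def show(df: Frame, conf: Map[String, String]): String =
    if (conf.getOrElse("spark.sql.repl.eagerEval.enabled", "false") == "true") {
      // Assumed config keys and defaults; see the note above.
      val maxNumRows = conf.getOrElse("spark.sql.repl.eagerEval.maxNumRows", "20").toInt
      val truncate = conf.getOrElse("spark.sql.repl.eagerEval.truncate", "20").toInt
      showDF(df, maxNumRows, truncate)
    } else {
      s"SparkDataFrame[${df.schema.mkString(", ")}]"
    }

  def main(args: Array[String]): Unit = {
    val df = Frame(Seq("id:int"), Seq(Seq("1"), Seq("2"), Seq("3")))
    println(show(df, Map.empty))
    println(show(df, Map(
      "spark.sql.repl.eagerEval.enabled" -> "true",
      "spark.sql.repl.eagerEval.maxNumRows" -> "2")))
  }
}
```

In this arrangement `showDF()` itself stays unaware of the eager-evaluation settings; only `show()` consults them, which matches the second quoted suggestion.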
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22513 **[Test build #96397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96397/testReport)** for PR 22513 at commit [`9288933`](https://github.com/apache/spark/commit/9288933b4a71e646e67f551dcfd80f9ff9a470da). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22513#discussion_r219388085 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -27,7 +27,7 @@ import org.apache.spark.sql.functions.monotonically_increasing_id import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType, TimestampType} -import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils} +import org.apache.spark.util.Utils /** * Benchmark to measure read performance with Filter pushdown. --- End diff -- How about changing the Scaladoc to the following to fix the **fails to generate documentation** error? ```scala * To run this benchmark: * {{{ * 1. without sbt: bin/spark-submit --class * 2. build/sbt "sql/test:runMain " * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " * Results will be written to "benchmarks/FilterPushdownBenchmark-results.txt". * }}} ``` The error message from the failed documentation build: ```java /home/jenkins/workspace/SparkPullRequestBuilder@2/target/javaunidoc/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.html... [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/mllib/target/java/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.java:5: error: unknown tag: this [error] * 1. without sbt: bin/spark-submit --class [error] ^ [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/mllib/target/java/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.java:5: error: unknown tag: spark [error] * 1. without sbt: bin/spark-submit --class [error] ^ [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/mllib/target/java/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.java:6: error: unknown tag: this [error] * 2. build/sbt "mllib/test:runMain " [error] ^ [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/mllib/target/java/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.java:7: error: unknown tag: this [error] * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "mllib/test:runMain " [error] ```
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3326/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96395/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/22456 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #96395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96395/testReport)** for PR 22512 at commit [`39c5e92`](https://github.com/apache/spark/commit/39c5e92713b86f342e756591235f9cbe25126f90). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` s\"but $` * `case class Literal(value: Any, dataType: DataType) extends LeafExpression ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219386919 --- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala --- @@ -442,6 +442,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo .option("lowerBound", "2018-07-06") .option("upperBound", "2018-07-20") .option("numPartitions", 3) + .option("oracle.jdbc.mapDateToTimestamp", "false") --- End diff -- ok. I will add notes to http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases, and will also add comments to the code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22471: [SPARK-25469][SQL][Performance] Eval methods of Concat, ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22471 LGTM. Btw, we probably don't need `[Performance]` in the title.
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96396/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22513 **[Test build #96396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96396/testReport)** for PR 22513 at commit [`89bd830`](https://github.com/apache/spark/commit/89bd8300405a6c7f2ed4d756db66b2d1cc3f7389). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22512 I thought we currently had few tests for interpreted projections, so I was checking whether these projections cause any bugs. I then noticed these two issues when interpreted mode is enabled in `SQLQueryTestSuite`. I'm still digging to see whether there are other bugs in interpreted projections, so I marked this `WIP`. Btw, it would probably be better to split this PR into multiple ones, but I'd like to identify all the related bugs first.
[GitHub] spark pull request #22506: [SPARK-25494][SQL] Upgrade Spark's use of Janino ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22506 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22513 **[Test build #96396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96396/testReport)** for PR 22513 at commit [`89bd830`](https://github.com/apache/spark/commit/89bd8300405a6c7f2ed4d756db66b2d1cc3f7389). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3325/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22513 @wangyum @yucai @dongjoon-hyun @cloud-fan @gatorsmile Let's focus on this before we merge other benchmark PRs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/22513 [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark ## What changes were proposed in this pull request? Currently there are two classes with the same name, `BenchmarkBase`: 1. `org.apache.spark.util.BenchmarkBase` 2. `org.apache.spark.sql.execution.benchmark.BenchmarkBase` This is very confusing. The benchmark object `org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark` uses `org.apache.spark.util.BenchmarkBase`, even though another class named `BenchmarkBase` lives in its own package. Here I propose: 1. `org.apache.spark.util.BenchmarkBase` belongs in a test package; move it to `org.apache.spark.sql.execution.benchmark`. 2. Rename `org.apache.spark.sql.execution.benchmark.BenchmarkBase` to `BenchmarkWithCodegen`. 3. Move `org.apache.spark.util.Benchmark` to the test package `org.apache.spark.sql.execution.benchmark`. ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark refactorBenchmarkBase Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22513.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22513 commit 89bd8300405a6c7f2ed4d756db66b2d1cc3f7389 Author: Gengliang Wang Date: 2018-09-21T05:07:01Z refactor BenchmarkBase
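Editor's sketch of the name clash described in this pull request. When two classes share a simple name and one of them lives in the current package, the other can only be used unqualified through an import rename, which is what the `BenchmarkBase => FileBenchmarkBase` import in the `FilterPushdownBenchmark` diff does. The nested objects below stand in for the two packages; `utilPkg`, `benchmarkPkg`, the `kind` method, and the two example objects are hypothetical and do not mirror the real classes.

```scala
object NameClashSketch {
  // Stand-in for the util package holding the first BenchmarkBase.
  object utilPkg {
    abstract class BenchmarkBase { def kind: String = "file-output base" }
  }

  // Stand-in for the benchmark package holding the second BenchmarkBase.
  object benchmarkPkg {
    abstract class BenchmarkBase { def kind: String = "codegen base" }

    // Here the bare name BenchmarkBase means benchmarkPkg.BenchmarkBase,
    // so the other class has to come in under an alias.
    import utilPkg.{BenchmarkBase => FileBenchmarkBase}

    object FilterPushdownLike extends FileBenchmarkBase
    object CodegenLike extends BenchmarkBase
  }
}
```

Renaming one of the two classes, as the proposal does with `BenchmarkWithCodegen`, removes the need for the alias.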
[GitHub] spark issue #22506: [SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0....
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22506 Thanks! Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #96395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96395/testReport)** for PR 22512 at commit [`39c5e92`](https://github.com/apache/spark/commit/39c5e92713b86f342e756591235f9cbe25126f90). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3324/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite fai...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/22512 [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures when the interpreter mode enabled ## What changes were proposed in this pull request? This PR fixes test failures in `SQLQueryTestSuite` when interpreter mode is enabled. It addresses the two cases below: - The current `InterpretedMutableProjection` can't handle `UnsafeRow` in the internal buffer `mutableRow`. `AggregationIterator` uses `MutableProjection` in that manner, and `GenerateMutableProjection` can handle `UnsafeRow` as its internal buffer. - `Literal` returns a differently typed value in codegen and interpreter modes in some cases, e.g., `Literal(1, LongType)` returns a long value in codegen mode and an int value in interpreter mode. So, `InterpretedUnsafeProjection` fails when running `SQLQueryTestSuite`. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark InterpreterTest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22512.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22512 commit 39c5e92713b86f342e756591235f9cbe25126f90 Author: Takeshi Yamamuro Date: 2018-09-21T04:25:53Z Fix test failures with the interpreter mode enabled
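Editor's sketch of the second case above, the `Literal(1, LongType)` mismatch. It is a Spark-free simplification: the `DataType` objects, the `Literal` class, and both method names are hypothetical stand-ins, and the real evaluation paths are more involved. The point it shows is only that a literal storing an untyped value next to a declared type can yield an `Int` when the stored object is returned as-is, and a `Long` when the declared type drives the result.

```scala
object LiteralSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object LongType extends DataType

  // Simplified stand-in: an untyped value paired with a declared type.
  final case class Literal(value: Any, dataType: DataType) {
    // Interpreter-style evaluation: hand back the stored object unchanged.
    def evalAsStored: Any = value

    // Evaluation that honours the declared type, roughly what generated
    // code does when it emits a typed constant.
    def evalAsDeclared: Any = (value, dataType) match {
      case (i: Int, LongType) => i.toLong
      case _                  => value
    }
  }
}
```

A consumer that writes the result into a long-typed slot works with `evalAsDeclared` but would trip over the boxed `Integer` from `evalAsStored`, which is in line with the failure the description attributes to `InterpretedUnsafeProjection`.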
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219384936 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- I had thought about this. First, I consider it out of scope for this JIRA, because I think these configs conflict with the current `showDF()` behavior. Some details: `showDF()` already takes `numRows` and `truncate` arguments. So if we are going to respect those two configs as well, we have to decide which behavior best suits `showDF()`: for example, whether `showDF()` should just ignore eager execution, or pick up the `maxNumRows` and `truncate` set through eager execution, like the following: ``` setMethod("showDF", signature(x = "SparkDataFrame"), function(x, numRows = 20, truncate = TRUE, vertical = FALSE) { eagerNumRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]]) numRows <- ifelse(eagerNumRows == 0, numRows, eagerNumRows) eagerTruncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]]) truncate <- ifelse(eagerTruncate == 0, truncate, eagerTruncate) if (is.logical(truncate) && truncate) { s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(20), vertical) } else { truncate2 <- as.numeric(truncate) s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(truncate2), vertical) } cat(s) }) ``` And if we think that `showDF()` can ignore the eager execution setting but still want `show()` to observe the eager execution config, we can certainly just grab the `maxNumRows` and `truncate` settings and pass them to the `showDF()` call. However, my second point is that I don't think these two configs matter much or are that important/necessary. Since eager execution just shows a snippet of the SparkDataFrame's data, our defaults `numRows = 20` and `truncate = TRUE` are good enough IMO. If users want to see more or fewer rows, they should call `showDF()`. @felixcheung, your thoughts?
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3323/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22511 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22511 **[Test build #96394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96394/testReport)** for PR 22511 at commit [`aee82ab`](https://github.com/apache/spark/commit/aee82abe4cd9fbefa14fb280644276fe491bcf9a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22511: [SPARK-25422][CORE] Don't memory map blocks strea...
GitHub user squito opened a pull request: https://github.com/apache/spark/pull/22511 [SPARK-25422][CORE] Don't memory map blocks streamed to disk. After data has been streamed to disk, the buffers are inserted into the memory store in some cases (e.g., with broadcast blocks). But broadcast code also disposes of those buffers when the data has been read, to ensure that we don't leave mapped buffers using up memory, which then leads to garbage data in the memory store. ## How was this patch tested? Ran the old failing test in a loop. Full tests on Jenkins. You can merge this pull request into a Git repository by running: $ git pull https://github.com/squito/spark SPARK-25422 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22511.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22511 commit aee82abe4cd9fbefa14fb280644276fe491bcf9a Author: Imran Rashid Date: 2018-09-20T19:50:06Z [SPARK-25422][CORE] Don't memory map blocks streamed to disk. After data has been streamed to disk, the buffers are inserted into the memory store in some cases (e.g., with broadcast blocks). But broadcast code also disposes of those buffers when the data has been read, to ensure that we don't leave mapped buffers using up memory, which then leads to garbage data in the memory store.
[GitHub] spark issue #22510: [SPARK-25321][ML] Fix local LDA model constructor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22510 **[Test build #96393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96393/testReport)** for PR 22510 at commit [`2b2fdaf`](https://github.com/apache/spark/commit/2b2fdaf3f7598fe31161fdd4401728d6b314bbfe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22510: [SPARK-25321][ML] Fix local LDA model constructor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96393/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22510: [SPARK-25321][ML] Fix local LDA model constructor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22510 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/22325 Flaky again. retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22506: [SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22506 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22506: [SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22506 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96386/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22506: [SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22506 **[Test build #96386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96386/testReport)** for PR 22506 at commit [`c3f8a6b`](https://github.com/apache/spark/commit/c3f8a6b41c9409339bf62fb17cd4cd905853d97f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22510: [SPARK-25321][ML] Fix local LDA model constructor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22510 **[Test build #96393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96393/testReport)** for PR 22510 at commit [`2b2fdaf`](https://github.com/apache/spark/commit/2b2fdaf3f7598fe31161fdd4401728d6b314bbfe). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22510: [SPARK-25321][ML] Fix local LDA model constructor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3322/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22510: [SPARK-25321][ML] Fix local LDA model constructor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22510 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22510: [SPARK-25321][ML] Fix local LDA model constructor
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22510 [SPARK-25321][ML] Fix local LDA model constructor ## What changes were proposed in this pull request? Change the constructor back to: ``` class LocalLDAModel private[ml] ( uid: String, vocabSize: Int, private[clustering] val oldLocalModel : OldLocalLDAModel, sparkSession: SparkSession) ``` Although it is marked `private[ml]`, it is used in `mleap`, and the change on master breaks the `mleap` build. ## How was this patch tested? Manual. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WeichenXu123/spark LDA_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22510.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22510 commit 2b2fdaf3f7598fe31161fdd4401728d6b314bbfe Author: WeichenXu Date: 2018-09-21T03:03:30Z init pr
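Editor's sketch of why a `private[ml]` constructor can still have outside callers. A qualified-private member is accessible from any code nested under the named scope, so a project that places its own sources under the same package name can call it and will stop compiling when the signature changes. Whether `mleap` reaches the constructor exactly this way is an assumption here. The nested objects below stand in for packages, and `LocalModel`, `bundle`, and `rebuild` are hypothetical names.

```scala
object VisibilitySketch {
  // Stand-in for the `ml` package.
  object ml {
    object clustering {
      // Constructor visible only to code nested inside `ml`.
      class LocalModel private[ml] (val uid: String, val vocabSize: Int)
    }

    // A different sub-scope of `ml`, standing in for third-party code
    // that declares itself under the same package name.
    object bundle {
      def rebuild(uid: String, vocabSize: Int): clustering.LocalModel =
        new clustering.LocalModel(uid, vocabSize)
    }
  }
}
```

Adding or removing a constructor parameter on `LocalModel` would break `rebuild` at compile time, the kind of breakage this pull request undoes by restoring the earlier signature.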
[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19868 Basically, we need to introduce this new `spark.sql.files.ignoreMissingFiles` config in detail, and then explain how we can use it to replace `spark.sql.hive.verifyPartitionPath`.
[GitHub] spark issue #22458: [SPARK-25459] Add viewOriginalText back to CatalogTable
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22458 LGTM. If there are more properties like `originalViewText` which are useless to Spark and only need to be displayed, I'd suggest we create a map for them, instead of adding more and more fields into `CatalogTable`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
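Editor's sketch of the suggestion above: keep display-only properties in a single map rather than adding one field per property. The class below is a heavily reduced, hypothetical stand-in for `CatalogTable`; the field names, the `describe` method, and its labels are invented for illustration.

```scala
object CatalogSketch {
  // Hypothetical, reduced stand-in for a catalog table record.
  final case class TableSketch(
      name: String,
      viewText: Option[String] = None,
      // Display-only metadata: one map instead of one field per property.
      displayOnly: Map[String, String] = Map.empty) {

    // Rows for a DESCRIBE-style listing; map entries are sorted by key
    // so the output order is stable.
    def describe: Seq[(String, String)] =
      Seq("Name" -> name) ++
        viewText.map(t => "View Text" -> t).toSeq ++
        displayOnly.toSeq.sortBy(_._1)
  }
}
```

With this shape, adding another display-only property means adding a map entry, and the case class signature stays unchanged.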
[GitHub] spark issue #22451: [SPARK-24777][SQL] Add write benchmark for AVRO
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22451 Oh, I was creating a PR to refactor BenchmarkBase and had planned to merge this one after it. Since this one is already merged, I will create a PR to refactor both.