[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20937#discussion_r184851052

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.datasources.json
    +
    +import java.io.File
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.types.{LongType, StringType, StructType}
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +/**
    + * Benchmark to measure JSON read/write performance.
    + * To run this:
    + *  spark-submit --class --jars
    + */
    +object JSONBenchmarks {
    +  val conf = new SparkConf()
    +
    +  val spark = SparkSession.builder
    +    .master("local[1]")
    +    .appName("benchmark-json-datasource")
    +    .config(conf)
    +    .getOrCreate()
    +  import spark.implicits._
    +
    +  def withTempPath(f: File => Unit): Unit = {
    +    val path = Utils.createTempDir()
    +    path.delete()
    +    try f(path) finally Utils.deleteRecursively(path)
    +  }
    +
    +
    +  def schemaInferring(rowsNum: Int): Unit = {
    +    val benchmark = new Benchmark("JSON schema inferring", rowsNum)
    +
    +    withTempPath { path =>
    +      // scalastyle:off
    --- End diff --

    ```
    // scalastyle:off println
    ...
    // scalastyle:on println
    ```

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21177 Merged build finished. Test PASSed.
[GitHub] spark issue #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2740/ Test PASSed.
[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21173 Merged build finished. Test PASSed.
[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21173 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2739/ Test PASSed.
[GitHub] spark issue #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21177 **[Test build #89957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89957/testReport)** for PR 21177 at commit [`99ecd12`](https://github.com/apache/spark/commit/99ecd123a8c5971f80fecb39f44d039be513a27b).
[GitHub] spark pull request #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) qu...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21177#discussion_r184851009

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala ---
    @@ -78,7 +81,7 @@ object TPCDSQueryBenchmark extends Logging {
           }
           val numRows = queryRelations.map(tableSizes.getOrElse(_, 0L)).sum
           val benchmark = new Benchmark(s"TPCDS Snappy", numRows, 5)
    -      benchmark.addCase(name) { i =>
    +      benchmark.addCase(s"$name$nameSuffix") { _ =>
    --- End diff --

    Yes and no; I feel both are OK.
[GitHub] spark pull request #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) qu...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21177#discussion_r184850972

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala ---
    @@ -87,10 +90,20 @@ object TPCDSQueryBenchmark extends Logging {
         }
       }

    +  def filterQueries(
    +      origQueries: Seq[String],
    +      args: TPCDSQueryBenchmarkArguments): Seq[String] = {
    +    if (args.queryFilter.nonEmpty) {
    +      origQueries.filter { case queryName => args.queryFilter.contains(queryName) }
    --- End diff --

    Thanks, fixed.
[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21173 **[Test build #89956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89956/testReport)** for PR 21173 at commit [`3d3f84e`](https://github.com/apache/spark/commit/3d3f84e64dc7cd3c84d1fe0d93d39ca277fcb681).
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89953/ Test PASSed.
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21182 Merged build finished. Test PASSed.
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21182 **[Test build #89953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89953/testReport)** for PR 21182 at commit [`1fa871d`](https://github.com/apache/spark/commit/1fa871dd4cad9a04cf8617031ab47844a82bb56e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21189: [SPARK-24117][SQL] Unified the getSizePerRow
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2738/ Test PASSed.
[GitHub] spark issue #21189: [SPARK-24117][SQL] Unified the getSizePerRow
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21189 Merged build finished. Test PASSed.
[GitHub] spark issue #21189: [SPARK-24117][SQL] Unified the getSizePerRow
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21189 **[Test build #89955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89955/testReport)** for PR 21189 at commit [`cd41538`](https://github.com/apache/spark/commit/cd415381386f0ac5c29cd6dab57ceafc86e96adf).
[GitHub] spark pull request #21189: [SPARK-24117][SQL] Unified the getSizePerRow
GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/21189

    [SPARK-24117][SQL] Unified the getSizePerRow

    ## What changes were proposed in this pull request?

    This PR unifies `getSizePerRow`, because it is used in many places. For example:

    1. [LocalRelation.scala#L80](https://github.com/wangyum/spark/blob/f70f46d1e5bc503e9071707d837df618b7696d32/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala#L80)
    2. [SizeInBytesOnlyStatsPlanVisitor.scala#L36](https://github.com/apache/spark/blob/76b8b840ddc951ee6203f9cccd2c2b9671c1b5e8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L36)

    ## How was this patch tested?

    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-24117

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21189

----
commit cd415381386f0ac5c29cd6dab57ceafc86e96adf
Author: Yuming Wang
Date: 2018-04-28T11:10:33Z

    Unified the getSizePerRow
[GitHub] spark issue #21188: [SPARK-24046][SS] Fix rate source rowsPerSecond <= rampU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89952/ Test PASSed.
[GitHub] spark issue #21188: [SPARK-24046][SS] Fix rate source rowsPerSecond <= rampU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21188 Merged build finished. Test PASSed.
[GitHub] spark issue #21188: [SPARK-24046][SS] Fix rate source rowsPerSecond <= rampU...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21188 **[Test build #89952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89952/testReport)** for PR 21188 at commit [`bf62aed`](https://github.com/apache/spark/commit/bf62aed080c9a2b6b46e8ee656c70b5ae76c0d45). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21170: [SPARK-22732][SS][FOLLOW-UP] Fix MemorySinkV2 toS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21170
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89954/ Test PASSed.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Merged build finished. Test PASSed.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21178 **[Test build #89954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89954/testReport)** for PR 21178 at commit [`dfdb4a8`](https://github.com/apache/spark/commit/dfdb4a82834ff218e162906bb069d39d05b13761). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Merged build finished. Test PASSed.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2737/ Test PASSed.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21178 **[Test build #89954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89954/testReport)** for PR 21178 at commit [`dfdb4a8`](https://github.com/apache/spark/commit/dfdb4a82834ff218e162906bb069d39d05b13761).
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21182 **[Test build #89953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89953/testReport)** for PR 21182 at commit [`1fa871d`](https://github.com/apache/spark/commit/1fa871dd4cad9a04cf8617031ab47844a82bb56e).
[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21178#discussion_r184844613

    --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java ---
    @@ -362,4 +371,34 @@ public static void verifyProxyAccess(String realUser, String proxyUser, String i
         }
       }

    +  public static boolean needUgiLogin(UserGroupInformation ugi, String principal, String keytab) {
    +    return null == ugi || !ugi.hasKerberosCredentials() || !ugi.getUserName().equals(principal) ||
    +      !keytab.equals(getKeytabFromUgi());
    +  }
    +
    +  private static String getKeytabFromUgi() {
    +    Class clz = UserGroupInformation.class;
    +    try {
    +      synchronized (clz) {
    +        Field field = clz.getDeclaredField("keytabFile");
    +        field.setAccessible(true);
    +        return (String) field.get(null);
    +      }
    +    } catch (NoSuchFieldException e) {
    +      try {
    +        synchronized (clz) {
    +          // In Hadoop 3 we don't have "keytabFile" field, instead we should use private method
    +          // getKeytab().
    +          Method method = clz.getDeclaredMethod("getKeytab");
    +          method.setAccessible(true);
    +          return (String) method.invoke(UserGroupInformation.getCurrentUser());
    --- End diff --

    Sure, I will change it.
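The diff above reads a private value reflectively and falls back from a static field to a private method when the field does not exist. A minimal, self-contained sketch of that fallback pattern follows. It deliberately avoids Hadoop: `OldLayout`, `NewLayout`, `readKeytab`, and the keytab paths are hypothetical stand-ins invented for illustration, and returning `null` on failure is an assumption of this sketch, not behavior taken from the pull request.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectionFallbackSketch {
    // Hypothetical stand-in for a class version that keeps the value in a private static field.
    static class OldLayout {
        private static String keytabFile = "/etc/old.keytab";
    }

    // Hypothetical stand-in for a class version that only exposes a private instance method.
    static class NewLayout {
        private String getKeytab() {
            return "/etc/new.keytab";
        }
    }

    // Try the static field first; if it is missing, fall back to the private method.
    // Returns null when neither lookup succeeds (a choice made for this sketch).
    static String readKeytab(Class<?> clz, Object instance) {
        try {
            Field field = clz.getDeclaredField("keytabFile");
            field.setAccessible(true);
            return (String) field.get(null); // null receiver: the field is static
        } catch (NoSuchFieldException e) {
            try {
                Method method = clz.getDeclaredMethod("getKeytab");
                method.setAccessible(true);
                return (String) method.invoke(instance);
            } catch (ReflectiveOperationException inner) {
                return null;
            }
        } catch (IllegalAccessException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(readKeytab(OldLayout.class, null));
        System.out.println(readKeytab(NewLayout.class, new NewLayout()));
    }
}
```

The two lookups need different receivers: the static field is read with a `null` receiver, while the instance method needs an object to invoke on, which is why the fallback branch in the diff has to supply a `UserGroupInformation` instance.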
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21028#discussion_r184844407

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -19,14 +19,41 @@ package org.apache.spark.sql.catalyst.expressions
     import java.util.Comparator

     import org.apache.spark.sql.catalyst.InternalRow
    -import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion}
     import org.apache.spark.sql.catalyst.expressions.codegen._
     import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData, TypeUtils}
     import org.apache.spark.sql.types._
     import org.apache.spark.unsafe.Platform
     import org.apache.spark.unsafe.array.ByteArrayMethods
     import org.apache.spark.unsafe.types.{ByteArray, UTF8String}

    +/**
    + * Base trait for [[BinaryExpression]]s with two arrays of the same element type and implicit
    + * casting.
    + */
    +trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
    --- End diff --

    @kiszk you are not wrong, but `Concat` is a very specific case, since it also supports `String`s and `Binary`s, so it would require a specific implementation anyway.
[GitHub] spark issue #21188: [SPARK-24046][SS] Fix rate source rowsPerSecond <= rampU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2736/ Test PASSed.
[GitHub] spark issue #21188: [SPARK-24046][SS] Fix rate source rowsPerSecond <= rampU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21188 Merged build finished. Test PASSed.
[GitHub] spark issue #21188: [SPARK-24046][SS] Fix rate source rowsPerSecond <= rampU...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21188 **[Test build #89952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89952/testReport)** for PR 21188 at commit [`bf62aed`](https://github.com/apache/spark/commit/bf62aed080c9a2b6b46e8ee656c70b5ae76c0d45).
[GitHub] spark pull request #21188: [SPARK-24046][SS] Fix rate source rowsPerSecond <...
GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/21188

    [SPARK-24046][SS] Fix rate source rowsPerSecond <= rampUpTime corner case

    ## What changes were proposed in this pull request?

    The current rate source has an issue when calculating `valueAtSecond`: if `rowsPerSecond` <= `rampUpTime`, the value is not increased gradually. Details can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-24046). This PR proposes a fix for this issue.

    ## How was this patch tested?

    Added unit test coverage.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-24046

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21188

----
commit bf62aed080c9a2b6b46e8ee656c70b5ae76c0d45
Author: jerryshao
Date: 2018-04-28T07:29:17Z

    Fix rate source rowsPerSecond <= rampUpTime corner case
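One way such a corner case can arise is integer truncation. The sketch below is a simplified model, written in Java for illustration, and is not Spark's code: it assumes the per-second speed increment is computed as `rowsPerSecond / (rampUpTime + 1)` with integer division, and the names `speedDeltaPerSecond` and `valueDuringRampUp` are hypothetical. Under that assumption the increment truncates to zero whenever `rowsPerSecond` <= `rampUpTime`, so the value stays flat for the whole ramp-up period.

```java
public class RateRampUpSketch {
    // Assumed model: the emission speed grows by a fixed integer step each second of ramp-up.
    static long speedDeltaPerSecond(long rowsPerSecond, long rampUpTimeSeconds) {
        return rowsPerSecond / (rampUpTimeSeconds + 1); // integer division truncates toward zero
    }

    // Cumulative sum of the per-second speeds 0, delta, 2 * delta, ..., seconds * delta,
    // meaningful for seconds <= rampUpTimeSeconds:
    // delta * (0 + 1 + ... + seconds) = delta * seconds * (seconds + 1) / 2.
    static long valueDuringRampUp(long seconds, long rowsPerSecond, long rampUpTimeSeconds) {
        long delta = speedDeltaPerSecond(rowsPerSecond, rampUpTimeSeconds);
        return delta * seconds * (seconds + 1) / 2;
    }

    public static void main(String[] args) {
        // rowsPerSecond = 10, rampUpTime = 4: the step is 2, so the value ramps 0, 2, 6, 12, 20.
        for (long s = 0; s <= 4; s++) {
            System.out.println("rows=10 ramp=4 second=" + s + " value=" + valueDuringRampUp(s, 10, 4));
        }
        // rowsPerSecond = 3, rampUpTime = 5: the step truncates to 0, so the value stays 0.
        for (long s = 0; s <= 5; s++) {
            System.out.println("rows=3 ramp=5 second=" + s + " value=" + valueDuringRampUp(s, 3, 5));
        }
    }
}
```

The second loop shows the symptom described in the pull request: with a zero step, nothing is emitted until the ramp-up period ends. The linked JIRA issue describes the actual behavior and the proposed fix.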