[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99883/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23142 **[Test build #99883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99883/testReport)** for PR 23142 at commit [`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23258#discussion_r240026727

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
    @@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
       }

       test("Sort metrics") {
    -    // Assume the execution plan is
    -    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
    -    val ds = spark.range(10).sort('id)
    -    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
    +    // Assume the execution plan with node id is
    +    // Sort(nodeId = 0)
    +    //   Exchange(nodeId = 1)
    +    //     LocalTableScan(nodeId = 2)
    +    val df = Seq(1, 3, 2).toDF("id").sort('id)
    +    testSparkPlanMetrics(df, 2, Map.empty)
    --- End diff --

    can we check the metrics of `SortExec` here?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23249#discussion_r240026485

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
    @@ -118,10 +115,12 @@ case class HashClusteredDistribution(
     /**
      * Represents data where tuples have been ordered according to the `ordering`
    - * [[Expression Expressions]]. This is a strictly stronger guarantee than
    - * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the
    - * same value for the ordering expressions are contiguous and will never be split across
    - * partitions.
    + * [[Expression Expressions]]. Its requirement is defined as the following:
    + *   - Given any 2 adjacent partitions, all the rows of the second partition must be larger than or
    + *     equal to any row in the first partition, according to the `ordering` expressions.
    --- End diff --

    Note that, only sort requires `OrderedDistribution`, and global sort doesn't care if there are equal-rows across partitions. Here is a definition of the requirement. When designing protocols, it's important to make the requirement as weak as possible, and make guarantees as strong as possible.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
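The requirement quoted in the diff above can be made concrete with a small check. The following is only a sketch under stated assumptions: partitions are modeled as plain in-memory `Seq`s, the `ordering` expressions are stood in for by an implicit `Ordering[T]`, and `OrderedDistributionCheck.satisfies` is a hypothetical helper that does not exist in Spark. Empty partitions are skipped so that only non-empty neighbors are compared, which is a simplification chosen for this sketch.

```scala
object OrderedDistributionCheck {
  // Hypothetical helper, not a Spark API. Returns true when, for every pair of
  // adjacent non-empty partitions, all rows of the second partition are larger
  // than or equal to any row of the first partition.
  def satisfies[T](partitions: Seq[Seq[T]])(implicit ord: Ordering[T]): Boolean = {
    val nonEmpty = partitions.filter(_.nonEmpty)
    nonEmpty.zip(nonEmpty.drop(1)).forall { case (first, second) =>
      // Comparing the largest row of `first` with the smallest row of `second`
      // is equivalent to comparing every pair of rows across the two partitions.
      ord.lteq(first.max(ord), second.min(ord))
    }
  }
}
```

Note that equal rows may appear on both sides of a partition boundary (for example `Seq(1, 2)` followed by `Seq(2, 3)`), and rows inside a partition need not be sorted. Both are allowed by the weaker requirement described in the comment.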
[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23255 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partit...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23255#discussion_r240026441

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
    @@ -752,6 +752,17 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
         }
       }

    +  test("CTAS: INSERT a partitioned table using Hive serde") {
    --- End diff --

    +1

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23262: [SPARK-26312][SQL]Converting converters in RDDCon...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23262#discussion_r240026394

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -53,7 +53,7 @@ object RDDConversions {
         data.mapPartitions { iterator =>
           val numColumns = outputTypes.length
           val mutableRow = new GenericInternalRow(numColumns)
    -      val converters = outputTypes.map(CatalystTypeConverters.createToCatalystConverter)
    +      val converters = outputTypes.map(CatalystTypeConverters.createToCatalystConverter).toArray
    --- End diff --

    shall we use `RowEncoder` here?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23262: [SPARK-26312][SQL]Converting converters in RDDCon...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23262#discussion_r240026388

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -33,7 +33,7 @@ object RDDConversions {
         data.mapPartitions { iterator =>
           val numColumns = outputTypes.length
           val mutableRow = new GenericInternalRow(numColumns)
    -      val converters = outputTypes.map(CatalystTypeConverters.createToCatalystConverter)
    +      val converters = outputTypes.map(CatalystTypeConverters.createToCatalystConverter).toArray
    --- End diff --

    shall we use `ExpressionEncoder` here?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23248#discussion_r240026330

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -131,8 +131,20 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
         expressions.flatMap(collectEvaluableUDFs)
       }

    -  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    case plan: LogicalPlan => extract(plan)
    +  def apply(plan: LogicalPlan): LogicalPlan = plan match {
    +    // SPARK-26293: A subquery will be rewritten into join later, and will go through this rule
    +    // eventually. Here we skip subquery, as Python UDF only needs to be extracted once.
    +    case _: Subquery => plan
    --- End diff --

    I think you have a point here. If subquery will be converted to join, why do we need to optimize subquery ahead? Anyway, that's something we need to discuss later. cc @dilipbiswal for the subquery question.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23253#discussion_r240026245

    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -35,7 +35,9 @@ displayTitle: Spark SQL Upgrading Guide

       - Since Spark 3.0, CSV datasource uses java.time API for parsing and generating CSV content. New formatting implementation supports date/timestamp patterns conformed to ISO 8601. To switch back to the implementation used in Spark 2.4 and earlier, set `spark.sql.legacy.timeParser.enabled` to `true`.

    -  - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, returned row can contain non-`null` fields if some of CSV column values were parsed and converted to desired types successfully.
    +  - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, the returned row can contain non-`null` fields if some of CSV column values were parsed and converted to desired types successfully.
    +
    +  - In Spark version 2.4 and earlier, JSON datasource and JSON functions like `from_json` convert a bad JSON record to a row with all `null`s in the PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, the returned row can contain non-`null` fields if some of JSON column values were parsed and converted to desired types successfully.
    --- End diff --

    does `from_csv` support it?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23253#discussion_r240026237

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala ---
    @@ -229,6 +229,11 @@ private[json] trait TestJsonData {
           """{"date": "27/10/2014 18:30"}""" ::
           """{"date": "28/01/2016 20:00"}""" :: Nil))(Encoders.STRING)

    +  def badRecords: Dataset[String] =
    --- End diff --

    if it's only used in one test, let's move it to that test

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5900/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23265 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23265 **[Test build #99884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99884/testReport)** for PR 23265 at commit [`6a837c0`](https://github.com/apache/spark/commit/6a837c019eaf7bc9907715a54778bfbb339f3342). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with ...
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/23265

    [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and -0.0 in UnsafeWriter

    backport https://github.com/apache/spark/pull/23239 to 2.4

    ---

    ## What changes were proposed in this pull request?

    A followup of https://github.com/apache/spark/pull/23043

    There are 4 places we need to deal with NaN and -0.0:
    1. comparison expressions. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
    2. Join keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
    3. grouping keys. `-0.0` and `0.0` should be assigned to the same group. Different NaNs should be assigned to the same group.
    4. window partition keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.

    The case 1 is OK. Our comparison already handles NaN and -0.0, and for struct/array/map, we will recursively compare the fields/elements.

    Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different NaNs have different binary representation, and the same thing happens for -0.0 and 0.0.

    To fix it, a simple solution is: normalize float/double when building unsafe data (`UnsafeRow`, `UnsafeArrayData`, `UnsafeMapData`). Then we don't need to worry about it anymore.

    Following this direction, this PR moves the handling of NaN and -0.0 from `Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not handle them, which reduces the perf overhead. It's also easier to add comments explaining why we do it in `UnsafeWriter`.

    ## How was this patch tested?

    existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23265.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23265

commit 6a837c019eaf7bc9907715a54778bfbb339f3342
Author: Wenchen Fan
Date: 2018-12-08T19:18:09Z

    [SPARK-26021][SQL][FOLLOWUP] only deal with NaN and -0.0 in UnsafeWriter

    A followup of https://github.com/apache/spark/pull/23043

    There are 4 places we need to deal with NaN and -0.0:
    1. comparison expressions. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
    2. Join keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
    3. grouping keys. `-0.0` and `0.0` should be assigned to the same group. Different NaNs should be assigned to the same group.
    4. window partition keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.

    The case 1 is OK. Our comparison already handles NaN and -0.0, and for struct/array/map, we will recursively compare the fields/elements.

    Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different NaNs have different binary representation, and the same thing happens for -0.0 and 0.0.

    To fix it, a simple solution is: normalize float/double when building unsafe data (`UnsafeRow`, `UnsafeArrayData`, `UnsafeMapData`). Then we don't need to worry about it anymore.

    Following this direction, this PR moves the handling of NaN and -0.0 from `Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not handle them, which reduces the perf overhead. It's also easier to add comments explaining why we do it in `UnsafeWriter`.

    existing tests

    Closes #23239 from cloud-fan/minor.

    Authored-by: Wenchen Fan
    Signed-off-by: Dongjoon Hyun

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
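The binary-comparison problem described in the pull request above can be illustrated outside Spark. The sketch below is an assumption-laden illustration, not the code in `UnsafeWriter`: `FloatNormalization`, `normalize`, `bits`, `otherNaN`, and `negativeZero` are hypothetical names introduced here, and only the standard `java.lang.Double` bit-conversion methods are used. It shows that two NaNs (or `-0.0` and `0.0`) can differ in their raw bits, and that normalizing before writing makes the bits equal.

```scala
object FloatNormalization {
  // Raw IEEE 754 bit pattern of a double, without collapsing NaN payloads.
  def bits(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)

  // A quiet NaN whose payload differs from the canonical Double.NaN bits.
  val otherNaN: Double = java.lang.Double.longBitsToDouble(0x7ff8000000000001L)

  // Negative zero: only the sign bit is set.
  val negativeZero: Double = java.lang.Double.longBitsToDouble(Long.MinValue)

  // Hypothetical normalization step: every NaN becomes the canonical NaN and
  // -0.0 becomes 0.0, so equal values always have equal bits.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // true for both 0.0 and -0.0
    else d
}
```

A byte-wise comparison of un-normalized values would treat `otherNaN` and `Double.NaN` as different keys, which is the situation cases 2, 3 and 4 describe. Applying `normalize` once, at the point where the value is written, removes the need for each consumer to handle these cases.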
[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23265 cc @dongjoon-hyun --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...
Github user seancxmao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23258#discussion_r240024723

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
    @@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
       }

       test("Sort metrics") {
    -    // Assume the execution plan is
    -    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
    -    val ds = spark.range(10).sort('id)
    -    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
    +    // Assume the execution plan with node id is
    +    // Sort(nodeId = 0)
    +    //   Exchange(nodeId = 1)
    +    //     LocalTableScan(nodeId = 2)
    +    val df = Seq(1, 3, 2).toDF("id").sort('id)
    +    testSparkPlanMetrics(df, 2, Map.empty)
    --- End diff --

    @mgaido91 This case tries to check `Sort` (nodeId=0) metrics, rather than `LocalTableScan`. The second parameter (`2`) of `testSparkPlanMetrics(df, 2, Map.empty)` means `expectedNumOfJobs` rather than `nodeId`. The third parameter `expectedMetrics` will pass `nodeId` together with corresponding expected metrics. Because metrics of Sort node (including `sortTime`, `peakMemory`, `spillSize`) may change during each execution, unlike metrics like `numOutputRows`, we have no way to check these values.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5899/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan and @JkSelf . For the partial revert, we had better create a new Apache JIRA issue. That will be a cleaner way to backport. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23142 **[Test build #99883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99883/testReport)** for PR 23142 at commit [`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23142 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23142 **[Test build #99882 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99882/testReport)** for PR 23142 at commit [`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99882/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration doc for ...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/23238 Thank you! @dongjoon-hyun --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22683 Fortunately the syntax is "100m", which has always meant "100 * 1024 * 1024" or "100 MiB" --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
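The arithmetic in the comment above can be spelled out in a few lines. This is a minimal sketch, not Spark's own size-string parsing: `SizeStrings.mebibytesToBytes` is a hypothetical helper that accepts only the `m` suffix, interpreted as MiB as the comment states.

```scala
object SizeStrings {
  // Hypothetical helper for illustration only: converts a string such as
  // "100m" to a byte count, reading the "m" suffix as mebibytes (1024 * 1024).
  def mebibytesToBytes(s: String): Long = {
    require(s.endsWith("m"), s"expected a size string ending in 'm', got: $s")
    s.dropRight(1).trim.toLong * 1024L * 1024L
  }
}
```

Under this reading, `SizeStrings.mebibytesToBytes("100m")` is 100 * 1024 * 1024 = 104857600 bytes, rather than the 100000000 bytes a decimal "MB" interpretation would give.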
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan OK, I will do the revert as described in your comments later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204 The result of all queries in tpcds with 1TB data scale is in [tpcds result](https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23252#discussion_r240022921

    --- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala ---
    @@ -440,12 +473,27 @@ class SecurityManagerSuite extends SparkFunSuite with ResetSystemProperties {
                   intercept[IllegalArgumentException] {
                     mgr.getSecretKey()
                   }
    +            case FILE =>
    +              val secretFile = createTempSecretFile()
    +              conf.set(AUTH_SECRET_FILE, secretFile.getAbsolutePath)
    +              mgr.initializeAuth()
    +              assert(encodeFileAsBase64(secretFile) === mgr.getSecretKey())
                 }
               }
             }
           )
         }
       }
    +  private def encodeFileAsBase64(secretFile: File) = {
    +    Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))
    +  }
    +
    +  private def createTempSecretFile(contents: String = "test-secret"): File = {
    +    val secretDir = Utils.createTempDir("temp-secrets")
    +    val secretFile = new File(secretDir, "temp-secret.txt")
    +    Files.write(secretFile.toPath, contents.getBytes(StandardCharsets.UTF_8))
    +    secretFile
    --- End diff --

    ah it's fine

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metrics shoul...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/23224 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23256: [SPARK-24207][R] follow-up PR for SPARK-24207 to fix cod...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/23256 ideally, but really not for this PR --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240022552

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
               DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --

    If we switch the order here, we don't need the length check [here](https://github.com/apache/spark/pull/23201/files#diff-e925de14239f40430d05f9ffd0360f10R130), right?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22088: [SPARK-24931][CORE]CoarseGrainedExecutorBackend send wro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22088 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23207 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23108#discussion_r240022467

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala ---
    @@ -222,4 +222,61 @@ class HiveParquetSourceSuite extends ParquetPartitioningTest {
           assert(df4.columns === Array("str", "max_int"))
         }
       }
    +
    +  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
    --- End diff --

    Please fix this first for the first and second review comments.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23207 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 According to https://github.com/apache/spark/pull/23214#issuecomment-443999282 , the hash join metrics are wrongly implemented. I think it's fine to revert them and re-implement them later. @JkSelf can you address the comments and only revert the hash join part? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23142 **[Test build #99882 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99882/testReport)** for PR 23142 at commit [`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5898/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23142 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23142 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23204 Hi, @LuciferYang . If we are not going to revert this, could you close this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration d...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23238 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration doc for ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23238 Thanks, @seancxmao . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration doc for ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23238 Merged to master and branch-2.4. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23253 cc @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99881/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23257 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23257 **[Test build #99881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99881/testReport)** for PR 23257 at commit [`9ee62ac`](https://github.com/apache/spark/commit/9ee62acc50bd5fb72a84a620e0fe50e27f7df515). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23259 Merged build finished. Test PASSed.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23259 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99880/ Test PASSed.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23259 **[Test build #99880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99880/testReport)** for PR 23259 at commit [`01bc383`](https://github.com/apache/spark/commit/01bc38347496a6194b46ace0feb7d2cd1adb614e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99879/ Test PASSed.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23253 Merged build finished. Test PASSed.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23253 **[Test build #99879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99879/testReport)** for PR 23253 at commit [`fc088df`](https://github.com/apache/spark/commit/fc088df2e98133e11d8fe60f899d3212e4cbee35). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23242 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99877/ Test PASSed.
[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23242 Merged build finished. Test PASSed.
[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23242 **[Test build #99877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99877/testReport)** for PR 23242 at commit [`de531c8`](https://github.com/apache/spark/commit/de531c81f3274145a1de839c80d93e3c37264da0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99878/ Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99878/testReport)** for PR 23169 at commit [`855f540`](https://github.com/apache/spark/commit/855f5404c53eba51ed373fa9e7be4eaafd60bb30). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23257 **[Test build #99881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99881/testReport)** for PR 23257 at commit [`9ee62ac`](https://github.com/apache/spark/commit/9ee62acc50bd5fb72a84a620e0fe50e27f7df515).
[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23257 Merged build finished. Test PASSed.
[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23257 @cloud-fan What do you think of the PR?
[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5897/ Test PASSed.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23259 cc @hvanhovell , @mgaido91 , too.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23259 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5896/ Test PASSed.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23259 Merged build finished. Test PASSed.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23259 **[Test build #99880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99880/testReport)** for PR 23259 at commit [`01bc383`](https://github.com/apache/spark/commit/01bc38347496a6194b46ace0feb7d2cd1adb614e).
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23259 Retest this please.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23253 **[Test build #99879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99879/testReport)** for PR 23253 at commit [`fc088df`](https://github.com/apache/spark/commit/fc088df2e98133e11d8fe60f899d3212e4cbee35).
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23253 Merged build finished. Test PASSed.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5895/ Test PASSed.
[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23253#discussion_r240014440 --- Diff: docs/sql-migration-guide-upgrade.md --- @@ -37,6 +37,8 @@ displayTitle: Spark SQL Upgrading Guide - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, returned row can contain non-`null` fields if some of CSV column values were parsed and converted to desired types successfully. + - In Spark version 2.4 and earlier, JSON datasource and JSON functions like `from_json` convert a bad JSON record to a row with all `null`s in the PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, returned row can contain non-`null` fields if some of JSON column values were parsed and converted to desired types successfully. --- End diff -- Fixed for CSV too
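[Editor's sketch, not part of the thread] The migration note quoted in the diff above can be tried out with a small Scala program. This is a minimal sketch that assumes Spark SQL is on the classpath; the object name `PartialJsonSketch`, the column name `value`, the schema fields `a` and `b`, and the sample record are all hypothetical and do not come from PR 23253. It parses one record in which `a` is a valid integer and `b` cannot be converted to `IntegerType`. According to the note, Spark 2.4 and earlier would return a row with all `null`s in PERMISSIVE mode, while Spark 3.0 may keep the fields that were converted successfully. The exact output depends on the Spark version, so none is shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical example object; the name is not taken from the pull request.
object PartialJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("partial-json-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative StructType schema with two integer fields.
    val schema = StructType(Seq(
      StructField("a", IntegerType),
      StructField("b", IntegerType)))

    // "a" holds a valid integer; "b" holds a string that is not an integer.
    val df = Seq("""{"a": 1, "b": "not a number"}""").toDF("value")

    // from_json uses the PERMISSIVE mode unless another mode is requested.
    df.select(from_json($"value", schema).as("parsed")).show(truncate = false)

    spark.stop()
  }
}
```

Running the same program against a 2.4 build and a 3.0 build is one way to compare the two behaviours the note describes.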
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r240014393 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -190,4 +192,103 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { } } } + + test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") { +Seq(true, false).foreach { convertMetastore => + withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> s"$convertMetastore") { +withTempDir { dir => + val dataDir = new File(s"${dir.getCanonicalPath}/l3/l2/l1/").toURI + val parentDir = s"${dir.getCanonicalPath}/l3/l2/" + val l3Dir = s"${dir.getCanonicalPath}/l3/" + val wildcardParentDir = new File(s"${dir}/l3/l2/*").toURI + val wildcardL3Dir = new File(s"${dir}/l3/*").toURI + + try { +hiveClient.runSqlHive("USE default") +hiveClient.runSqlHive( + """ +|CREATE EXTERNAL TABLE hive_orc( +| C1 INT, +| C2 INT, +| C3 STRING) +|STORED AS orc""".stripMargin) +// Hive throws an exception if I assign the location in the create table statement. 
+hiveClient.runSqlHive( + s"ALTER TABLE hive_orc SET LOCATION '$dataDir'") +hiveClient.runSqlHive( + """ +|INSERT INTO TABLE hive_orc +|VALUES (1, 1, 'orc1'), (2, 2, 'orc2')""".stripMargin) + +withTable("tbl1", "tbl2", "tbl3", "tbl4") { + val parentDirStatement = +s""" + |CREATE EXTERNAL TABLE tbl1( + | c1 int, + | c2 int, + | c3 string) + |STORED AS orc + |LOCATION '${parentDir}'""".stripMargin + sql(parentDirStatement) + val parentDirSqlStatement = s"select * from tbl1" + if (convertMetastore) { +checkAnswer(sql(parentDirSqlStatement), Nil) + } else { +checkAnswer(sql(parentDirSqlStatement), + (1 to 2).map(i => Row(i, i, s"orc$i"))) + } + + val l3DirStatement = +s""" + |CREATE EXTERNAL TABLE tbl2( + | c1 int, + | c2 int, + | c3 string) + |STORED AS orc + |LOCATION '${l3Dir}'""".stripMargin + sql(l3DirStatement) + val l3DirSqlStatement = s"select * from tbl2" + if (convertMetastore) { +checkAnswer(sql(l3DirSqlStatement), Nil) + } else { +checkAnswer(sql(l3DirSqlStatement), + (1 to 2).map(i => Row(i, i, s"orc$i"))) + } + + val wildcardStatement = +s""" + |CREATE EXTERNAL TABLE tbl3( + | c1 int, + | c2 int, + | c3 string) + |STORED AS orc + |LOCATION '$wildcardParentDir'""".stripMargin + sql(wildcardStatement) + val wildcardSqlStatement = s"select * from tbl3" + if (convertMetastore) { +checkAnswer(sql(wildcardSqlStatement), + (1 to 2).map(i => Row(i, i, s"orc$i"))) + } else { +checkAnswer(sql(wildcardSqlStatement), Nil) + } + + val wildcardL3Statement = +s""" + |CREATE EXTERNAL TABLE tbl4( + | c1 int, + | c2 int, + | c3 string) + |STORED AS orc + |LOCATION '$wildcardL3Dir'""".stripMargin + sql(wildcardL3Statement) + val wildcardL3SqlStatement = s"select * from tbl4" + checkAnswer(sql(wildcardL3SqlStatement), Nil) +} + } finally { +hiveClient.runSqlHive("DROP TABLE IF EXISTS hive_orc") --- End diff -- Got it. I missed that.
[GitHub] spark pull request #23264: [SPARK-26266][BUILD] Update to Scala 2.12.8 (bran...
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/23264
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r240014126 --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java --- @@ -235,11 +235,8 @@ public void writeEmptyIterator() throws Exception { final Option<MapStatus> mapStatus = writer.stop(true); assertTrue(mapStatus.isDefined()); assertTrue(mergedOutputFile.exists()); +assertEquals(0, spillFilesCreated.size()); assertArrayEquals(new long[NUM_PARTITITONS], partitionSizesInMergedFile); -assertEquals(0, taskMetrics.shuffleWriteMetrics().recordsWritten()); -assertEquals(0, taskMetrics.shuffleWriteMetrics().bytesWritten()); -assertEquals(0, taskMetrics.diskBytesSpilled()); -assertEquals(0, taskMetrics.memoryBytesSpilled()); --- End diff -- We need to keep this test coverage, which guarantees that the task metrics remain untouched. Please revert this removal.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5894/ Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test PASSed.
[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23239
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99878/testReport)** for PR 23169 at commit [`855f540`](https://github.com/apache/spark/commit/855f5404c53eba51ed373fa9e7be4eaafd60bb30).
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23239 Hi, @cloud-fan . Please make another PR for `branch-2.4`. There is a conflict on `branch-2.4`.
[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23253#discussion_r240013729 --- Diff: docs/sql-migration-guide-upgrade.md --- @@ -37,6 +37,8 @@ displayTitle: Spark SQL Upgrading Guide - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, returned row can contain non-`null` fields if some of CSV column values were parsed and converted to desired types successfully. + - In Spark version 2.4 and earlier, JSON datasource and JSON functions like `from_json` convert a bad JSON record to a row with all `null`s in the PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, returned row can contain non-`null` fields if some of JSON column values were parsed and converted to desired types successfully. --- End diff -- `returned row` -> `the returned row`
[GitHub] spark issue #23264: [SPARK-26266][BUILD] Update to Scala 2.12.8 (branch-2.4)
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23264 This is merged now. Please close the PR, @srowen . :)
[GitHub] spark issue #23264: [SPARK-26266][BUILD] Update to Scala 2.12.8 (branch-2.4)
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23264 Merged to branch-2.4.
[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23242 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5893/ Test PASSed.
[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23242 Merged build finished. Test PASSed.
[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23242 **[Test build #99877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99877/testReport)** for PR 23242 at commit [`de531c8`](https://github.com/apache/spark/commit/de531c81f3274145a1de839c80d93e3c37264da0).
[GitHub] spark pull request #23242: [SPARK-26285][CORE] accumulator metrics sources f...
Github user abellina commented on a diff in the pull request: https://github.com/apache/spark/pull/23242#discussion_r240013209 --- Diff: examples/src/main/scala/org/apache/spark/examples/AccumulatorMetricsTest.scala --- @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples + +import org.apache.spark.metrics.source.{DoubleAccumulatorSource, LongAccumulatorSource} +import org.apache.spark.sql.SparkSession + +/** + * Usage: AccumulatorMetricsTest [partitions] [numElem] [blockSize] + */ +object AccumulatorMetricsTest { --- End diff -- @redsanket yes, at first I thought that, but there are many other "examples" here with the suffix Test.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23253 Merged build finished. Test PASSed.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99875/ Test PASSed.
[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23253 **[Test build #99875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99875/testReport)** for PR 23253 at commit [`b19b3e1`](https://github.com/apache/spark/commit/b19b3e1ce25836a419b056a63bc320f1b82dc1b1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99876/ Test FAILed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99876/testReport)** for PR 23169 at commit [`f6d0efc`](https://github.com/apache/spark/commit/f6d0efc7c0e1d461e5854c6e04f3347f174bf13a). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class GBTClassifierParams(GBTParams, HasVarianceImpurity):` * `class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,` * `class HasDistanceMeasure(Params):` * `class HasValidationIndicatorCol(Params):` * `class HasVarianceImpurity(Params):` * `class TreeRegressorParams(HasVarianceImpurity):` * `class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):` * `class GBTRegressorParams(GBTParams, TreeRegressorParams):` * `class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,` * `class ArrowCollectSerializer(Serializer):` * `class CSVInferSchema(val options: CSVOptions) extends Serializable ` * `class InterpretedSafeProjection(expressions: Seq[Expression]) extends Projection ` * `sealed trait DateTimeFormatter ` * `class Iso8601DateTimeFormatter(` * `class LegacyDateTimeFormatter(` * `class LegacyFallbackDateTimeFormatter(` * `sealed trait DateFormatter ` * `class Iso8601DateFormatter(` * `class LegacyDateFormatter(` * `class LegacyFallbackDateFormatter(`
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test FAILed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5892/ Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test PASSed.