[spark] branch master updated: [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7b7f16f [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions 7b7f16f is described below commit 7b7f16f2a7a6a6685a8917a9b5ba403fff76 Author: Yesheng Ma AuthorDate: Tue Jun 18 21:51:15 2019 -0700 [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions ## What changes were proposed in this pull request? Current SQL parser's error message for hyphen-connected identifiers without surrounding backquotes(e.g. hyphen-table) is confusing for end users. A possible approach to tackle this is to explicitly capture these wrong usages in the SQL parser. In this way, the end users can fix these errors more quickly. For example, for a simple query such as `SELECT * FROM test-table`, the original error message is ``` Error in SQL statement: ParseException: mismatched input '-' expecting (line 1, pos 18) ``` which can be confusing in a large query. After the fix, the error message is: ``` Error in query: Possibly unquoted identifier test-table detected. Please consider quoting it with back-quotes as `test-table`(line 1, pos 14) == SQL == SELECT * FROM test-table --^^^ ``` which is easier for end users to identify the issue and fix. We safely augmented the current grammar rule to explicitly capture these error cases. The error handling logic is implemented in the SQL parsing listener `PostProcessor`. However, note that for cases such as `a - my-func(b)`, the parser can't actually tell whether this should be ``a -`my-func`(b) `` or `a - my - func(b)`. Therefore for these cases, we leave the parser as is. Also, in this patch we only provide better error messages for character-only identifiers. ## How was this patch tested? Adding new unit tests. Closes #24749 from yeshengm/hyphen-ident. Authored-by: Yesheng Ma Signed-off-by: gatorsmile --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 60 ++- .../spark/sql/catalyst/parser/AstBuilder.scala | 16 +-- .../spark/sql/catalyst/parser/ParseDriver.scala| 8 ++ .../sql/catalyst/parser/ErrorParserSuite.scala | 110 + .../spark/sql/execution/SparkSqlParser.scala | 10 +- 5 files changed, 169 insertions(+), 35 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index dcb7939..f57a659 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -82,13 +82,15 @@ singleTableSchema statement : query #statementDefault | ctes? dmlStatementNoWith #dmlStatement -| USE db=identifier#use -| CREATE database (IF NOT EXISTS)? identifier +| USE db=errorCapturingIdentifier #use +| CREATE database (IF NOT EXISTS)? db=errorCapturingIdentifier ((COMMENT comment=STRING) | locationSpec | (WITH DBPROPERTIES tablePropertyList))* #createDatabase -| ALTER database identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties -| DROP database (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase +| ALTER database db=errorCapturingIdentifier +SET DBPROPERTIES tablePropertyList #setDatabaseProperties +| DROP database (IF EXISTS)? 
db=errorCapturingIdentifier +(RESTRICT | CASCADE)? #dropDatabase | SHOW DATABASES (LIKE? pattern=STRING)? #showDatabases | createTableHeader ('(' colTypeList ')')? tableProvider ((OPTIONS options=tablePropertyList) | @@ -135,7 +137,8 @@ statement (ALTER | CHANGE) COLUMN? qualifiedName (TYPE dataType)? (COMMENT comment=STRING)? colPosition? #alterTableColumn | ALTER TABLE tableIdentifier partitionSpec? -CHANGE COLUMN? identifier colType colPosition? #changeColumn +CHANGE COLUMN? +colName=errorCapturingIdentifier colType colPosition? #changeColumn | ALTER TABLE tableIdentifier (partitionSpec)? SET SERDE STRING (WITH SERDEPROPERTIES tablePropertyList)? #setTableSerDe | ALTER TABLE tableIdentifier (pa
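For end users who hit this, a minimal sketch of the behavior and the back-quote fix (assumes a local `SparkSession` named `spark`; the table name `test-table` is only illustrative and does not need to exist, since the failure happens at parse time):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParseException

val spark = SparkSession.builder().master("local[*]").appName("hyphen-ident-demo").getOrCreate()

try {
  // Unquoted hyphenated identifier: rejected by the parser. After this patch the
  // ParseException names `test-table` and suggests back-quoting it, instead of
  // the generic "mismatched input '-'" message.
  spark.sql("SELECT * FROM test-table")
} catch {
  case e: ParseException => println(e.getMessage)
}

// The user-side fix: back-quote the identifier so the hyphen is part of the name
// rather than a minus operator (the table must exist for this to return rows).
// spark.sql("SELECT * FROM `test-table`").show()
```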
[spark] branch master updated (a5dcb82 -> 15de6d0)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a5dcb82 [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion add 15de6d0 [SPARK-28096][SQL] Convert defs to lazy vals to avoid expensive reference computation in QueryPlan and Expression No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Expression.scala | 9 - .../catalyst/expressions/aggregate/interfaces.scala | 3 ++- .../spark/sql/catalyst/expressions/grouping.scala | 8 ++-- .../sql/catalyst/expressions/namedExpressions.scala | 3 ++- .../apache/spark/sql/catalyst/plans/QueryPlan.scala | 7 +-- .../sql/catalyst/plans/logical/LogicalPlan.scala | 2 +- .../catalyst/plans/logical/QueryPlanConstraints.scala | 2 +- .../catalyst/plans/logical/ScriptTransformation.scala | 3 ++- .../plans/logical/basicLogicalOperators.scala | 19 ++- .../spark/sql/catalyst/plans/logical/object.scala | 9 ++--- .../plans/logical/pythonLogicalOperators.scala| 3 ++- .../org/apache/spark/sql/execution/ExpandExec.scala | 3 ++- .../org/apache/spark/sql/execution/objects.scala | 3 ++- .../spark/sql/execution/python/EvalPythonExec.scala | 3 ++- 14 files changed, 51 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
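The summary above is only a file inventory; the pattern being applied is the def-to-lazy-val conversion for derived values that are expensive to compute and read many times. A simplified sketch, not the actual QueryPlan/Expression code:

```scala
// Simplified sketch of the def -> lazy val pattern: query plan nodes are
// immutable, so a derived collection such as `references` can be computed once
// per node and cached instead of being rebuilt on every access.
abstract class TreeNodeLike {
  def children: Seq[TreeNodeLike]

  // Before: a def, recomputed on every reference during analysis/optimization.
  // def references: Set[String] = children.flatMap(_.references).toSet

  // After: a lazy val, computed at most once per node.
  lazy val references: Set[String] = children.flatMap(_.references).toSet
}
```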
[spark] branch master updated: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a5dcb82 [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion a5dcb82 is described below commit a5dcb82b5a6b08ebfe168e735f6edb40b80420fd Author: Ivan Vergiliev AuthorDate: Wed Jun 19 10:44:58 2019 +0800 [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion ## What changes were proposed in this pull request? `OrcFilters.createBuilder` has exponential complexity in the height of the filter tree due to the way the check-and-build pattern is implemented. We've hit this in production by passing a `Column` filter to Spark directly, with a job taking multiple hours for a simple set of ~30 filters. This PR changes the checking logic so that the conversion has linear complexity in the size of the tree instead of exponential in its height. Right now, due to the way ORC `SearchArgument` works, the code is forced to do two separate phases when converting a given Spark filter to an ORC filter: 1. Check if the filter is convertible. 2. Only if the check in 1. succeeds, perform the actual conversion into the resulting ORC filter. However, there's one detail which is the culprit in the exponential complexity: phases 1. and 2. are both done using the exact same method. The resulting exponential complexity is easiest to see in the `NOT` case - consider the following code: ``` val f1 = col("id") === lit(5) val f2 = !f1 val f3 = !f2 val f4 = !f3 val f5 = !f4 ``` Now, when we run `createBuilder` on `f5`, we get the following behaviour: 1. call `createBuilder(f4)` to check if the child `f4` is convertible 2. call `createBuilder(f4)` to actually convert it This seems fine when looking at a single level, but what actually ends up happening is: - `createBuilder(f3)` will then recursively be called 4 times - 2 times in step 1., and two times in step 2. - `createBuilder(f2)` will be called 8 times - 4 times in each top-level step, 2 times in each sub-step. - `createBuilder(f1)` will be called 16 times. As a result, having a tree of height > 30 leads to billions of calls to `createBuilder`, heap allocations, and so on and can take multiple hours. The way this PR solves this problem is by separating the `check` and `convert` functionalities into separate functions. This way, the call to `createBuilder` on `f5` above would look like this: 1. call `isConvertible(f4)` to check if the child `f4` is convertible - amortized constant complexity 2. call `createBuilder(f4)` to actually convert it - linear complexity in the size of the subtree. This way, we get an overall complexity that's linear in the size of the filter tree, allowing us to convert tree with 10s of thousands of nodes in milliseconds. The reason this split (`check` and `build`) is possible is that the checking never actually depends on the actual building of the filter. The `check` part of `createBuilder` depends mainly on: - `isSearchableType` for leaf nodes, and - `check`-ing the child filters for composite nodes like NOT, AND and OR. Situations like the `SearchArgumentBuilder` throwing an exception while building the resulting ORC filter are not handled right now - they just get thrown out of the class, and this change preserves this behaviour. 
This PR extracts this part of the code to a separate class which allows the conversion to make very efficient checks to confirm that a given child is convertible before actually converting it. Results: Before: - converting a skewed tree with a height of ~35 took about 6-7 hours. - converting a skewed tree with hundreds or thousands of nodes would be completely impossible. Now: - filtering against a skewed tree with a height of 1500 in the benchmark suite finishes in less than 10 seconds. ## Steps to reproduce ```scala val schema = StructType.fromDDL("col INT") (20 to 30).foreach { width => val whereFilter = (1 to width).map(i => EqualTo("col", i)).reduceLeft(Or) val start = System.currentTimeMillis() OrcFilters.createFilter(schema, Seq(whereFilter)) println(s"With $width filters, conversion takes ${System.currentTimeMillis() - start} ms") } ``` ### Before this PR ``` With 20 filters, conversion takes 363 ms With 21 filters, conversion takes 496 ms With 22 filters, conversion takes 939 ms With 23 filters, conversion takes 1871 ms With 24 filters, conversion takes 3756 ms With 25 filters, conversion takes 7452 ms With 26 filters, conversion takes 14978 ms With 27 filters, conversion tak
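A condensed sketch of the shape of the fix described above, using a toy filter type and plain strings rather than the real `OrcFilters`/`SearchArgument` API:

```scala
// Toy stand-ins for Spark's data source filters; the real code works against
// org.apache.spark.sql.sources.Filter and ORC's SearchArgument builder.
sealed trait Filter
case class EqualTo(col: String, value: Any) extends Filter
case class Not(child: Filter) extends Filter
case class Or(left: Filter, right: Filter) extends Filter

// Before: the convertibility "check" is the conversion itself, so every Not/Or
// converts each child twice (once to check, once to build) -> exponential in height.
def buildNaive(f: Filter): Option[String] = f match {
  case EqualTo(c, v) => Some(s"$c = $v")
  case Not(child) =>
    if (buildNaive(child).isDefined) buildNaive(child).map(s => s"NOT ($s)") else None
  case Or(l, r) =>
    if (buildNaive(l).isDefined && buildNaive(r).isDefined) {
      for (a <- buildNaive(l); b <- buildNaive(r)) yield s"($a OR $b)"
    } else None
}

// After: a cheap check pass plus a single build pass -> linear in the tree size.
// (The real check also rejects unsupported leaf types via isSearchableType.)
def isConvertible(f: Filter): Boolean = f match {
  case EqualTo(_, _) => true
  case Not(child)    => isConvertible(child)
  case Or(l, r)      => isConvertible(l) && isConvertible(r)
}

def build(f: Filter): String = f match {
  case EqualTo(c, v) => s"$c = $v"
  case Not(child)    => s"NOT (${build(child)})"
  case Or(l, r)      => s"(${build(l)} OR ${build(r)})"
}

def convert(f: Filter): Option[String] =
  if (isConvertible(f)) Some(build(f)) else None
```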
[spark] branch branch-2.3 updated: [SPARK-28081][ML] Handle large vocab counts in word2vec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new 220f29a [SPARK-28081][ML] Handle large vocab counts in word2vec 220f29a is described below commit 220f29a6f5b681a67a7e9a9351f25389c303b956 Author: Sean Owen AuthorDate: Tue Jun 18 20:27:43 2019 -0500 [SPARK-28081][ML] Handle large vocab counts in word2vec ## What changes were proposed in this pull request? The word2vec logic fails if a corpora has a word with count > 1e9. We should be able to handle very large counts generally better here by using longs to count. This takes over https://github.com/apache/spark/pull/24814 ## How was this patch tested? Existing tests. Closes #24893 from srowen/SPARK-28081. Authored-by: Sean Owen Signed-off-by: Sean Owen (cherry picked from commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5) Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala index b8c306d..d5b91df 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala @@ -45,7 +45,7 @@ import org.apache.spark.util.random.XORShiftRandom */ private case class VocabWord( var word: String, - var cn: Int, + var cn: Long, var point: Array[Int], var code: Array[Int], var codeLen: Int @@ -194,7 +194,7 @@ class Word2Vec extends Serializable with Logging { new Array[Int](MAX_CODE_LENGTH), 0)) .collect() - .sortWith((a, b) => a.cn > b.cn) + .sortBy(_.cn)(Ordering[Long].reverse) vocabSize = vocab.length require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " + @@ -232,7 +232,7 @@ class Word2Vec extends Serializable with Logging { a += 1 } while (a < 2 * vocabSize) { - count(a) = 1e9.toInt + count(a) = Long.MaxValue a += 1 } var pos1 = vocabSize - 1 @@ -267,6 +267,8 @@ class Word2Vec extends Serializable with Logging { min2i = pos2 pos2 += 1 } + assert(count(min1i) < Long.MaxValue) + assert(count(min2i) < Long.MaxValue) count(vocabSize + a) = count(min1i) + count(min2i) parentNode(min1i) = vocabSize + a parentNode(min2i) = vocabSize + a - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-28081][ML] Handle large vocab counts in word2vec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new e4f5d84 [SPARK-28081][ML] Handle large vocab counts in word2vec e4f5d84 is described below commit e4f5d84874bb0ad30fdf19aeaf2a7ac756830dbf Author: Sean Owen AuthorDate: Tue Jun 18 20:27:43 2019 -0500 [SPARK-28081][ML] Handle large vocab counts in word2vec ## What changes were proposed in this pull request? The word2vec logic fails if a corpora has a word with count > 1e9. We should be able to handle very large counts generally better here by using longs to count. This takes over https://github.com/apache/spark/pull/24814 ## How was this patch tested? Existing tests. Closes #24893 from srowen/SPARK-28081. Authored-by: Sean Owen Signed-off-by: Sean Owen (cherry picked from commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5) Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala index b8c306d..d5b91df 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala @@ -45,7 +45,7 @@ import org.apache.spark.util.random.XORShiftRandom */ private case class VocabWord( var word: String, - var cn: Int, + var cn: Long, var point: Array[Int], var code: Array[Int], var codeLen: Int @@ -194,7 +194,7 @@ class Word2Vec extends Serializable with Logging { new Array[Int](MAX_CODE_LENGTH), 0)) .collect() - .sortWith((a, b) => a.cn > b.cn) + .sortBy(_.cn)(Ordering[Long].reverse) vocabSize = vocab.length require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " + @@ -232,7 +232,7 @@ class Word2Vec extends Serializable with Logging { a += 1 } while (a < 2 * vocabSize) { - count(a) = 1e9.toInt + count(a) = Long.MaxValue a += 1 } var pos1 = vocabSize - 1 @@ -267,6 +267,8 @@ class Word2Vec extends Serializable with Logging { min2i = pos2 pos2 += 1 } + assert(count(min1i) < Long.MaxValue) + assert(count(min2i) < Long.MaxValue) count(vocabSize + a) = count(min1i) + count(min2i) parentNode(min1i) = vocabSize + a parentNode(min2i) = vocabSize + a - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28081][ML] Handle large vocab counts in word2vec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e96dd82 [SPARK-28081][ML] Handle large vocab counts in word2vec e96dd82 is described below commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5 Author: Sean Owen AuthorDate: Tue Jun 18 20:27:43 2019 -0500 [SPARK-28081][ML] Handle large vocab counts in word2vec ## What changes were proposed in this pull request? The word2vec logic fails if a corpora has a word with count > 1e9. We should be able to handle very large counts generally better here by using longs to count. This takes over https://github.com/apache/spark/pull/24814 ## How was this patch tested? Existing tests. Closes #24893 from srowen/SPARK-28081. Authored-by: Sean Owen Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala index 9e19ff2..7888a80 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala @@ -46,7 +46,7 @@ import org.apache.spark.util.random.XORShiftRandom */ private case class VocabWord( var word: String, - var cn: Int, + var cn: Long, var point: Array[Int], var code: Array[Int], var codeLen: Int @@ -195,7 +195,7 @@ class Word2Vec extends Serializable with Logging { new Array[Int](MAX_CODE_LENGTH), 0)) .collect() - .sortWith((a, b) => a.cn > b.cn) + .sortBy(_.cn)(Ordering[Long].reverse) vocabSize = vocab.length require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " + @@ -233,7 +233,7 @@ class Word2Vec extends Serializable with Logging { a += 1 } while (a < 2 * vocabSize) { - count(a) = 1e9.toInt + count(a) = Long.MaxValue a += 1 } var pos1 = vocabSize - 1 @@ -268,6 +268,8 @@ class Word2Vec extends Serializable with Logging { min2i = pos2 pos2 += 1 } + assert(count(min1i) < Long.MaxValue) + assert(count(min2i) < Long.MaxValue) count(vocabSize + a) = count(min1i) + count(min2i) parentNode(min1i) = vocabSize + a parentNode(min2i) = vocabSize + a - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
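This is the same change as the branch-2.3 and branch-2.4 backports above, landing on master. A tiny standalone illustration (plain Scala, unrelated to the Word2Vec API) of why the counts move from `Int` to `Long`: summing two counts near the old `1e9.toInt` sentinel already overflows `Int`, and a legitimate word count can reach that sentinel value.

```scala
// Counting with Int silently overflows once totals pass Int.MaxValue (~2.1e9).
val bigCount = 1500000000                        // a word seen 1.5 billion times
val asInt    = bigCount + bigCount               // wraps around: -1294967296
val asLong   = bigCount.toLong + bigCount.toLong // 3000000000, as intended

println(s"Int sum: $asInt, Long sum: $asLong")
```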
[spark] branch master updated: [SPARK-27823][CORE] Refactor resource handling code
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7056e00 [SPARK-27823][CORE] Refactor resource handling code 7056e00 is described below commit 7056e004ee566fabbb9b22ddee2de55ef03260db Author: Xiangrui Meng AuthorDate: Tue Jun 18 17:18:17 2019 -0700 [SPARK-27823][CORE] Refactor resource handling code ## What changes were proposed in this pull request? Continue the work from https://github.com/apache/spark/pull/24821. Refactor resource handling code to make the code more readable. Major changes: * Moved resource-related classes to `spark.resource` from `spark`. * Added ResourceUtils and helper classes so we don't need to directly deal with Spark conf. * ResourceID: resource identifier and it provides conf keys * ResourceRequest/Allocation: abstraction for requested and allocated resources * Added `TestResourceIDs` to reference commonly used resource IDs in tests like `spark.executor.resource.gpu`. cc: tgravescs jiangxb1987 Ngone51 ## How was this patch tested? Unit tests for added utils and existing unit tests. Closes #24856 from mengxr/SPARK-27823. Lead-authored-by: Xiangrui Meng Co-authored-by: Thomas Graves Signed-off-by: Xingbo Jiang --- .../org/apache/spark/BarrierTaskContext.scala | 1 + .../org/apache/spark/ResourceDiscoverer.scala | 151 .../org/apache/spark/ResourceInformation.scala | 37 --- .../main/scala/org/apache/spark/SparkConf.scala| 45 .../main/scala/org/apache/spark/SparkContext.scala | 94 +++- .../main/scala/org/apache/spark/TaskContext.scala | 4 +- .../scala/org/apache/spark/TaskContextImpl.scala | 1 + .../main/scala/org/apache/spark/TestUtils.scala| 30 ++- .../executor/CoarseGrainedExecutorBackend.scala| 52 ++--- .../org/apache/spark/internal/config/package.scala | 10 +- .../spark/resource/ResourceInformation.scala | 87 +++ .../org/apache/spark/resource/ResourceUtils.scala | 191 +++ .../scala/org/apache/spark/scheduler/Task.scala| 1 + .../apache/spark/scheduler/TaskDescription.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 10 +- .../apache/spark/scheduler/TaskSetManager.scala| 11 +- .../cluster/CoarseGrainedClusterMessage.scala | 2 +- .../org/apache/spark/ResourceDiscovererSuite.scala | 236 --- .../scala/org/apache/spark/SparkConfSuite.scala| 53 + .../scala/org/apache/spark/SparkContextSuite.scala | 93 +++- .../CoarseGrainedExecutorBackendSuite.scala| 159 +++-- .../org/apache/spark/executor/ExecutorSuite.scala | 1 + .../spark/resource/ResourceInformationSuite.scala | 64 + .../apache/spark/resource/ResourceUtilsSuite.scala | 259 + .../TestResourceIDs.scala} | 17 +- .../CoarseGrainedSchedulerBackendSuite.scala | 8 +- .../scheduler/ExecutorResourceInfoSuite.scala | 2 +- .../spark/scheduler/TaskDescriptionSuite.scala | 4 +- .../spark/scheduler/TaskSchedulerImplSuite.scala | 10 +- .../spark/scheduler/TaskSetManagerSuite.scala | 5 +- .../apache/spark/deploy/k8s/KubernetesUtils.scala | 20 +- .../k8s/features/BasicDriverFeatureStep.scala | 2 +- .../k8s/features/BasicExecutorFeatureStep.scala| 2 +- .../k8s/features/BasicDriverFeatureStepSuite.scala | 14 +- .../features/BasicExecutorFeatureStepSuite.scala | 37 ++- .../k8s/features/KubernetesFeaturesTestUtils.scala | 3 +- .../MesosFineGrainedSchedulerBackendSuite.scala| 3 +- .../org/apache/spark/deploy/yarn/Client.scala | 2 +- .../spark/deploy/yarn/ResourceRequestHelper.scala | 9 +- 
.../apache/spark/deploy/yarn/YarnAllocator.scala | 2 +- .../spark/deploy/yarn/YarnSparkHadoopUtil.scala| 8 +- .../YarnCoarseGrainedExecutorBackend.scala | 3 +- .../org/apache/spark/deploy/yarn/ClientSuite.scala | 8 +- .../spark/deploy/yarn/YarnAllocatorSuite.scala | 6 +- 44 files changed, 908 insertions(+), 851 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala b/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala index cf957ff..c393df8 100644 --- a/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala +++ b/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala @@ -26,6 +26,7 @@ import org.apache.spark.executor.TaskMetrics import org.apache.spark.internal.Logging import org.apache.spark.memory.TaskMemoryManager import org.apache.spark.metrics.source.Source +import org.apache.spark.resource.ResourceInformation import org.apache.spark.rpc.{RpcEndpointRef, RpcTimeout} import org.apache.spark.shuffle.FetchFailedException import org
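The description above is mostly a file inventory. Purely as a hypothetical, heavily simplified sketch of the "ResourceID provides conf keys" idea (the names `confPrefix`, `amountConf` and `discoveryScriptConf` below are illustrative, not the actual `org.apache.spark.resource` API):

```scala
// Hypothetical sketch: a resource identifier that derives its configuration keys,
// so callers never assemble "spark.executor.resource.gpu.*" strings by hand.
case class SketchResourceID(componentName: String, resourceName: String) {
  def confPrefix: String = s"spark.$componentName.resource.$resourceName."
  def amountConf: String = confPrefix + "amount"
  def discoveryScriptConf: String = confPrefix + "discoveryScript"
}

case class SketchResourceRequest(id: SketchResourceID, amount: Int)

// SketchResourceID("executor", "gpu").amountConf == "spark.executor.resource.gpu.amount"
```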
[spark] branch master updated: [SPARK-28039][SQL][TEST] Port float4.sql
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2e3ae97 [SPARK-28039][SQL][TEST] Port float4.sql 2e3ae97 is described below commit 2e3ae97668f9170c820ec5564edc50dff8347915 Author: Yuming Wang AuthorDate: Tue Jun 18 16:22:30 2019 -0700 [SPARK-28039][SQL][TEST] Port float4.sql ## What changes were proposed in this pull request? This PR is to port float4.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out When porting the test cases, found three PostgreSQL specific features that do not exist in Spark SQL: [SPARK-28060](https://issues.apache.org/jira/browse/SPARK-28060): Float type can not accept some special inputs [SPARK-28027](https://issues.apache.org/jira/browse/SPARK-28027): Spark SQL does not support prefix operator `` [SPARK-28061](https://issues.apache.org/jira/browse/SPARK-28061): Support for converting float to binary format Also, found a bug: [SPARK-28024](https://issues.apache.org/jira/browse/SPARK-28024): Incorrect value when out of range Also, found three inconsistent behavior: [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL insert there bad inputs to NULL [SPARK-28028](https://issues.apache.org/jira/browse/SPARK-28028): Cast numeric to integral type need round [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL returns NULL when dividing by zero ## How was this patch tested? N/A Closes #24887 from wangyum/SPARK-28039. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../resources/sql-tests/inputs/pgSQL/float4.sql| 363 .../sql-tests/results/pgSQL/float4.sql.out | 379 + 2 files changed, 742 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql new file mode 100644 index 000..9e684d1 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql @@ -0,0 +1,363 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- FLOAT4 +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql + +CREATE TABLE FLOAT4_TBL (f1 float) USING parquet; + +INSERT INTO FLOAT4_TBL VALUES ('0.0'); +INSERT INTO FLOAT4_TBL VALUES ('1004.30 '); +INSERT INTO FLOAT4_TBL VALUES (' -34.84'); +INSERT INTO FLOAT4_TBL VALUES ('1.2345678901234e+20'); +INSERT INTO FLOAT4_TBL VALUES ('1.2345678901234e-20'); + +-- [SPARK-28024] Incorrect numeric values when out of range +-- test for over and under flow +-- INSERT INTO FLOAT4_TBL VALUES ('10e70'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e70'); +-- INSERT INTO FLOAT4_TBL VALUES ('10e-70'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e-70'); + +-- INSERT INTO FLOAT4_TBL VALUES ('10e400'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e400'); +-- INSERT INTO FLOAT4_TBL VALUES ('10e-400'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e-400'); + +-- [SPARK-27923] Spark SQL insert there bad inputs to NULL +-- bad input +-- INSERT INTO FLOAT4_TBL VALUES (''); +-- INSERT INTO FLOAT4_TBL VALUES (' '); +-- INSERT INTO FLOAT4_TBL VALUES ('xyz'); +-- INSERT INTO FLOAT4_TBL VALUES ('5.0.0'); +-- INSERT INTO FLOAT4_TBL VALUES ('5 . 
0'); +-- INSERT INTO FLOAT4_TBL VALUES ('5. 0'); +-- INSERT INTO FLOAT4_TBL VALUES (' - 3.0'); +-- INSERT INTO FLOAT4_TBL VALUES ('1235'); + +-- special inputs +SELECT float('NaN'); +-- [SPARK-28060] Float type can not accept some special inputs +SELECT float('nan'); +SELECT float(' NAN '); +SELECT float('infinity'); +SELECT float(' -INFINiTY '); +-- [SPARK-27923] Spark SQL insert there bad special inputs to NULL +-- bad special inputs +SELECT float('N A N'); +SELECT float('NaN x'); +SELECT float(' INFINITYx'); + +-- [SPARK-28060] Float type can not accept some special inputs +SELECT float('Infinity') + 100.0; +SELECT float('Infinity') / float('Infinity'); +SELECT float('nan') / float('nan'); +SELECT float(decimal('nan')); + +SELECT '' AS five, * FROM FLOAT4_TBL; + +SELECT '' AS four, f.* FROM FLOAT4_TBL f WHERE f.f1 <> '1004.3'; + +SELECT '' AS one, f.* FROM FLOAT4_TBL f WHERE f.f1 = '1004.3'; + +SELECT '' AS three, f.* FROM FLOAT4_TBL f WHERE '1004.3' > f.f1; + +SELECT '' AS three, f.* FROM FLOAT4_TBL f WHERE f.f1 < '1004.3'; + +SELECT '' AS four, f.* FROM FLOAT4_TBL f WHERE '1004.3' >= f.f1; + +SELECT '' AS four, f.* FROM FLOAT4_TBL f WHERE f.f1 <= '1004.3'; + +SELECT '' AS three, f.
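One of the divergences called out above (Spark SQL returning NULL when dividing by zero, where PostgreSQL raises an error) is easy to observe directly; a small check assuming a `SparkSession` named `spark`:

```scala
// Division by zero on float input: Spark SQL (at the time of this port) yields
// NULL rather than raising an error the way PostgreSQL does.
val row = spark.sql("SELECT cast(1 as float) / cast(0 as float) AS q").head()
println(row.isNullAt(0))   // expected: true
```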
[spark] branch master updated: [SPARK-28088][SQL] Enhance LPAD/RPAD function
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c7f0301 [SPARK-28088][SQL] Enhance LPAD/RPAD function c7f0301 is described below commit c7f0301477da19b41380cef218da447dc8f85a0e Author: Yuming Wang AuthorDate: Tue Jun 18 14:08:18 2019 -0700 [SPARK-28088][SQL] Enhance LPAD/RPAD function ## What changes were proposed in this pull request? This pr enhances `LPAD`/`RPAD` function to make `pad` parameter optional. PostgreSQL, Vertica, Teradata, Oracle and DB2 support make `pad` parameter optional. MySQL, Hive and Presto does not support make `pad` parameter optional. SQL Server does not have `lapd`/`rpad` function. **PostgreSQL**: ``` postgres=# select substr(version(), 0, 16), lpad('hi', 5), rpad('hi', 5); substr | lpad | rpad -+---+--- PostgreSQL 11.3 |hi | hi (1 row) ``` **Vertica**: ``` dbadmin=> select version(), lpad('hi', 5), rpad('hi', 5); version | lpad | rpad +---+--- Vertica Analytic Database v9.1.1-0 |hi | hi (1 row) ``` **Teradata**: ![image](https://user-images.githubusercontent.com/5399861/59656550-89a49300-91d0-11e9-9f26-ed554f49ea34.png) **Oracle**: ![image](https://user-images.githubusercontent.com/5399861/59656591-a9d45200-91d0-11e9-8b0e-3e1f75983099.png) **DB2**: ![image](https://user-images.githubusercontent.com/5399861/59656468-3e8a8000-91d0-11e9-8826-0d854ed7f397.png) More details: https://www.postgresql.org/docs/11/functions-string.html https://docs.teradata.com/reader/kmuOwjp1zEYg98JsB8fu_A/e5w8LujIQDlVmRSww2E27A ## How was this patch tested? unit tests Closes #24899 from wangyum/SPARK-28088. Authored-by: Yuming Wang Signed-off-by: Dongjoon Hyun --- .../catalyst/expressions/stringExpressions.scala | 22 ++ .../expressions/StringExpressionsSuite.scala | 4 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 576eaec..a49b9bf 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1088,8 +1088,9 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression) */ @ExpressionDescription( usage = """ -_FUNC_(str, len, pad) - Returns `str`, left-padded with `pad` to a length of `len`. +_FUNC_(str, len[, pad]) - Returns `str`, left-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters. + If `pad` is not specified, `str` will be padded to the left with space characters. 
""", examples = """ Examples: @@ -1097,11 +1098,17 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression) ???hi > SELECT _FUNC_('hi', 1, '??'); h + > SELECT _FUNC_('hi', 5); + hi """, since = "1.5.0") -case class StringLPad(str: Expression, len: Expression, pad: Expression) +case class StringLPad(str: Expression, len: Expression, pad: Expression = Literal(" ")) extends TernaryExpression with ImplicitCastInputTypes { + def this(str: Expression, len: Expression) = { +this(str, len, Literal(" ")) + } + override def children: Seq[Expression] = str :: len :: pad :: Nil override def dataType: DataType = StringType override def inputTypes: Seq[DataType] = Seq(StringType, IntegerType, StringType) @@ -1122,8 +1129,9 @@ case class StringLPad(str: Expression, len: Expression, pad: Expression) */ @ExpressionDescription( usage = """ -_FUNC_(str, len, pad) - Returns `str`, right-padded with `pad` to a length of `len`. +_FUNC_(str, len[, pad]) - Returns `str`, right-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters. + If `pad` is not specified, `str` will be padded to the right with space characters. """, examples = """ Examples: @@ -1131,11 +1139,17 @@ case class StringLPad(str: Expression, len: Expression, pad: Expression) hi??? > SELECT _FUNC_('hi', 1, '??'); h + > SELECT _FUNC_('hi', 5); + hi """, since = "1.5.0") -case class StringRPad(str: Expression, len: Expression, pad: Expression) +case class StringRPad(str: Expression, len: Expression, pad: Expression = Literal(" "))
[spark] branch master updated: [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bef5d9d [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue bef5d9d is described below commit bef5d9d6c348e390f99b2cd781a2471d635e55f8 Author: Yuming Wang AuthorDate: Tue Jun 18 13:28:29 2019 -0700 [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue ## What changes were proposed in this pull request? This pr fix `TRIM`/`LTRIM`/`RTRIM` function parameter order issue, otherwise: ```sql spark-sql> SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x'); z spark-sql> SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy'); xyz spark-sql> SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy'); xy spark-sql> ``` ```sql postgres=# SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x'); btrim | btrim ---+--- Tom | bar (1 row) postgres=# SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy'); ltrim |ltrim ---+-- test | XxyLAST WORD (1 row) postgres=# SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy'); rtrim | rtrim ---+--- test | TURNERyxX (1 row) ``` ## How was this patch tested? unit tests Closes #24902 from wangyum/SPARK-28093. Authored-by: Yuming Wang Signed-off-by: Dongjoon Hyun --- .../catalyst/expressions/stringExpressions.scala | 6 +- .../expressions/StringExpressionsSuite.scala | 11 .../sql-tests/inputs/string-functions.sql | 10 .../sql-tests/results/string-functions.sql.out | 66 +- 4 files changed, 89 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 2752dd7..576eaec 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -653,7 +653,7 @@ case class StringTrim( trimStr: Option[Expression] = None) extends String2TrimExpression { - def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr)) + def this(srcStr: Expression, trimStr: Expression) = this(srcStr, Option(trimStr)) def this(srcStr: Expression) = this(srcStr, None) @@ -753,7 +753,7 @@ case class StringTrimLeft( trimStr: Option[Expression] = None) extends String2TrimExpression { - def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr)) + def this(srcStr: Expression, trimStr: Expression) = this(srcStr, Option(trimStr)) def this(srcStr: Expression) = this(srcStr, None) @@ -856,7 +856,7 @@ case class StringTrimRight( trimStr: Option[Expression] = None) extends String2TrimExpression { - def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr)) + def this(srcStr: Expression, trimStr: Expression) = this(srcStr, Option(trimStr)) def this(srcStr: Expression) = this(srcStr, None) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala index 1e7737b..08f42fc 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala @@ 
-465,6 +465,9 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { // scalastyle:on checkEvaluation(StringTrim(Literal("a"), Literal.create(null, StringType)), null) checkEvaluation(StringTrim(Literal.create(null, StringType), Literal("a")), null) + +checkEvaluation(StringTrim(Literal("yxTomxx"), Literal("xyz")), "Tom") +checkEvaluation(StringTrim(Literal("xxxbarxxx"), Literal("x")), "bar") } test("LTRIM") { @@ -489,6 +492,10 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { // scalastyle:on checkEvaluation(StringTrimLeft(Literal.create(null, StringType), Literal("a")), null) checkEvaluation(StringTrimLeft(Literal("a"), Literal.create(null, StringType)), null) + +checkEvaluation(StringTrimLeft(Literal("zzzytest"), Literal("xyz")), "test") +checkEvaluation(StringTrimLeft(Literal("zzzytestxyz"), Literal("xyz")), "testxyz") +checkEvaluation(StringTrimLeft(Literal("xyxXxyLAST WORD"), Literal("xy")), "XxyLAST WORD") } te
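The corrected argument order can be exercised the same way the new unit tests do; the snippet below is meant to run inside a suite that mixes in `ExpressionEvalHelper` (as `StringExpressionsSuite` does), with expected values taken from the commit description:

```scala
import org.apache.spark.sql.catalyst.expressions.{Literal, StringTrim, StringTrimLeft, StringTrimRight}

// trim(srcStr, trimStr): strip any of "xyz" from both ends of the source string.
checkEvaluation(StringTrim(Literal("yxTomxx"), Literal("xyz")), "Tom")
// ltrim / rtrim: strip only from the left / right end.
checkEvaluation(StringTrimLeft(Literal("zzzytest"), Literal("xyz")), "test")
checkEvaluation(StringTrimRight(Literal("testxxzx"), Literal("xyz")), "test")
```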
[spark] branch master updated (ed280c2 -> 1ada36b)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ed280c2 [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+ add 1ada36b [SPARK-27783][SQL] Add customizable hint error handler No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../sql/catalyst/analysis/HintErrorLogger.scala| 55 ++ .../spark/sql/catalyst/analysis/ResolveHints.scala | 22 - .../catalyst/optimizer/EliminateResolvedHint.scala | 16 +++ .../spark/sql/catalyst/plans/logical/hints.scala | 43 +++-- .../org/apache/spark/sql/internal/SQLConf.scala| 8 +++- .../scala/org/apache/spark/sql/JoinHintSuite.scala | 4 +- 7 files changed, 120 insertions(+), 30 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HintErrorLogger.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
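The summary above is just a file list. Purely as a hypothetical sketch of what a pluggable hint error handler can look like (the trait and method names below are invented for illustration and do not match the actual `HintErrorLogger` API):

```scala
// Hypothetical sketch: a callback interface the analyzer could invoke when a
// query hint cannot be applied, with a default implementation that only logs.
trait SketchHintErrorHandler {
  def hintNotRecognized(name: String): Unit
  def hintRelationNotFound(name: String, relation: String): Unit
}

object LoggingSketchHintErrorHandler extends SketchHintErrorHandler {
  override def hintNotRecognized(name: String): Unit =
    println(s"WARN Unrecognized hint: $name")
  override def hintRelationNotFound(name: String, relation: String): Unit =
    println(s"WARN Could not find relation '$relation' referenced by hint '$name'")
}
```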
[GitHub] [spark-website] dongjoon-hyun commented on issue #207: Fix source download link
dongjoon-hyun commented on issue #207: Fix source download link URL: https://github.com/apache/spark-website/pull/207#issuecomment-503199475 +1, late LGTM.
[spark-website] branch asf-site updated: Fix source download link
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 92a Fix source download link 92a is described below commit 92a312dc70845c3525c9a8921ecd6f3567d6 Author: Sean Owen AuthorDate: Tue Jun 18 10:56:10 2019 -0500 Fix source download link The download javascript wasn't correctly generating the source file name. Also, minor, it wasn't correctly respecting whether it should be obtained from the mirror network or archive server. Author: Sean Owen Closes #207 from srowen/DownloadJS. --- js/downloads.js | 12 ++-- site/js/downloads.js | 12 ++-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/js/downloads.js b/js/downloads.js index bf56d75..f8de69e 100644 --- a/js/downloads.js +++ b/js/downloads.js @@ -12,10 +12,10 @@ function addRelease(version, releaseDate, packages, mirrored) { } var sources = {pretty: "Source Code", tag: "sources"}; -var hadoopFree = {pretty: "Pre-build with user-provided Apache Hadoop", tag: "without-hadoop"}; +var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7 and later", tag: "hadoop2.7"}; -var scala2p12_hadoopFree = {pretty: "Pre-build with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; +var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; // 2.2.0+ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources]; @@ -86,10 +86,10 @@ function onVersionSelect() { } // Populate releases - updateDownloadLink(releases[version].mirrored); + updateDownloadLink(); } -function updateDownloadLink(isMirrored) { +function updateDownloadLink() { var versionSelect = document.getElementById("sparkVersionSelect"); var packageSelect = document.getElementById("sparkPackageSelect"); var downloadLink = document.getElementById("spanDownloadLink"); @@ -102,10 +102,10 @@ function updateDownloadLink(isMirrored) { var pkg = getSelectedValue(packageSelect); var artifactName = "spark-" + version + "-bin-" + pkg + ".tgz" -.replace(/-bin-sources/, ""); // special case for source packages + artifactName = artifactName.replace(/-bin-sources/, ""); // special case for source packages var downloadHref = ""; - if (isMirrored) { + if (releases[version].mirrored) { downloadHref = "https://www.apache.org/dyn/closer.lua/spark/spark-"; + version + "/" + artifactName; } else { downloadHref = "https://archive.apache.org/dist/spark/spark-"; + version + "/" + artifactName; diff --git a/site/js/downloads.js b/site/js/downloads.js index bf56d75..f8de69e 100644 --- a/site/js/downloads.js +++ b/site/js/downloads.js @@ -12,10 +12,10 @@ function addRelease(version, releaseDate, packages, mirrored) { } var sources = {pretty: "Source Code", tag: "sources"}; -var hadoopFree = {pretty: "Pre-build with user-provided Apache Hadoop", tag: "without-hadoop"}; +var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7 and later", tag: "hadoop2.7"}; -var scala2p12_hadoopFree = {pretty: "Pre-build with Scala 2.12 and user-provided Apache Hadoop", tag: 
"without-hadoop-scala-2.12"}; +var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; // 2.2.0+ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources]; @@ -86,10 +86,10 @@ function onVersionSelect() { } // Populate releases - updateDownloadLink(releases[version].mirrored); + updateDownloadLink(); } -function updateDownloadLink(isMirrored) { +function updateDownloadLink() { var versionSelect = document.getElementById("sparkVersionSelect"); var packageSelect = document.getElementById("sparkPackageSelect"); var downloadLink = document.getElementById("spanDownloadLink"); @@ -102,10 +102,10 @@ function updateDownloadLink(isMirrored) { var pkg = getSelectedValue(packageSelect); var artifactName = "spark-" + version + "-bin-" + pkg + ".tgz" -.replace(/-bin-sources/, ""); // special case for source packages + artifactName = artifactName.replace(/-bin-sources/, ""); // special case for source packages var downloadHref = ""; - if (isMirrored) { + if (releases[version].mirrored) { downloadHref = "https://www.apache.org/dyn/closer.lua/spark/spark-"; + version + "/" + artifactName; } else { downloadHref = "https://archive.apache.org/dist/spa
[GitHub] [spark-website] srowen closed pull request #207: Fix source download link
srowen closed pull request #207: Fix source download link URL: https://github.com/apache/spark-website/pull/207
[GitHub] [spark-website] srowen opened a new pull request #207: Fix source download link
srowen opened a new pull request #207: Fix source download link URL: https://github.com/apache/spark-website/pull/207 The download javascript wasn't correctly generating the source file name. Also, minor, it wasn't correctly respecting whether it should be obtained from the mirror network or archive server.
[spark] branch master updated: [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ed280c2 [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+ ed280c2 is described below commit ed280c23ca396087fc62d5a6412179f6a0103245 Author: Dongjoon Hyun AuthorDate: Tue Jun 18 00:08:37 2019 -0700 [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+ ## What changes were proposed in this pull request? With JDK9+, the generate **bytecode** of `FromUnixTime` raise `java.lang.IncompatibleClassChangeError` due to [JDK-8145148](https://bugs.openjdk.java.net/browse/JDK-8145148) . This is a blocker in [Apache Spark JDK11 Jenkins job](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-jdk-11-ubuntu-testing/). Locally, this is reproducible by the following unit test suite with JDK9+. ``` $ build/sbt "catalyst/testOnly *.DateExpressionsSuite" ... [info] org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite *** ABORTED *** (23 seconds, 75 milliseconds) [info] java.lang.IncompatibleClassChangeError: Method org.apache.spark.sql.catalyst.util.TimestampFormatter.apply(Ljava/lang/String;Ljava/time/ZoneId;Ljava/util/Locale;)Lorg/apache/spark/sql/catalyst/util/TimestampFormatter; must be InterfaceMeth ``` This bytecode issue is generated by `Janino` , so we replace `.apply` to `.MODULE$$.apply` and adds test coverage for similar codes. ## How was this patch tested? Manually with the existing UTs by doing the following with JDK9+. ``` build/sbt "catalyst/testOnly *.DateExpressionsSuite" ``` Actually, this is the last JDK11 error in `catalyst` module. So, we can verify with the following, too. ``` $ build/sbt "project catalyst" test ... [info] Total number of tests run: 3552 [info] Suites: completed 210, aborted 0 [info] Tests: succeeded 3552, failed 0, canceled 0, ignored 2, pending 0 [info] All tests passed. [info] Passed: Total 3583, Failed 0, Errors 0, Passed 3583, Ignored 2 [success] Total time: 294 s, completed Jun 16, 2019, 10:15:08 PM ``` Closes #24889 from dongjoon-hyun/SPARK-28072. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../catalyst/expressions/datetimeExpressions.scala | 2 +- .../expressions/DateExpressionsSuite.scala | 27 ++ 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 1e6a3aa..ccf6b36 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -863,7 +863,7 @@ case class FromUnixTime(sec: Expression, format: Expression, timeZoneId: Option[ nullSafeCodeGen(ctx, ev, (seconds, f) => { s""" try { - ${ev.value} = UTF8String.fromString($tf.apply($f.toString(), $zid, $locale). + ${ev.value} = UTF8String.fromString($tf$$.MODULE$$.apply($f.toString(), $zid, $locale). 
format($seconds * 100L)); } catch (java.lang.IllegalArgumentException e) { ${ev.isNull} = true; diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala index 88607d1..04bb61a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala @@ -268,6 +268,15 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(DateFormatClass(Cast(Literal(d), TimestampType, jstId), Literal("H"), jstId), "0") checkEvaluation(DateFormatClass(Literal(ts), Literal("H"), jstId), "22") + +// SPARK-28072 The codegen path should work +checkEvaluation( + expression = DateFormatClass( +BoundReference(ordinal = 0, dataType = TimestampType, nullable = true), +BoundReference(ordinal = 1, dataType = StringType, nullable = true), +jstId), + expected = "22", + inputRow = InternalRow(DateTimeUtils.fromJavaTimestamp(ts), UTF8String.fromString("H"))) } test("Hour") { @@ -683,14 +692,14 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation( FromUnixTime(Lite
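The interop detail behind the fix above, as a small standalone sketch (independent of `TimestampFormatter` itself): a Scala `object` compiles to a class whose single instance is exposed through a static `MODULE$` field, and the Java source that Catalyst emits through Janino has to reference it that way rather than calling `apply` as if it were a static method.

```scala
// Sketch: how generated Java reaches a Scala object's apply method.
object Greeter {                 // compiles to class Greeter$ with a static MODULE$ field
  def apply(name: String): String = s"Hello, $name"
}

// From Scala, both forms are equivalent:
Greeter("spark")                 // sugar for Greeter.apply("spark")

// Catalyst codegen assembles Java source as interpolated strings; writing the
// call through MODULE$ (note the escaped $$) mirrors what the patch does:
val generatedJava = s"""String s = Greeter$$.MODULE$$.apply("spark");"""
```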