[spark] branch master updated: [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7b7f16f [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions 7b7f16f is described below commit 7b7f16f2a7a6a6685a8917a9b5ba403fff76 Author: Yesheng Ma AuthorDate: Tue Jun 18 21:51:15 2019 -0700 [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions ## What changes were proposed in this pull request? Current SQL parser's error message for hyphen-connected identifiers without surrounding backquotes(e.g. hyphen-table) is confusing for end users. A possible approach to tackle this is to explicitly capture these wrong usages in the SQL parser. In this way, the end users can fix these errors more quickly. For example, for a simple query such as `SELECT * FROM test-table`, the original error message is ``` Error in SQL statement: ParseException: mismatched input '-' expecting (line 1, pos 18) ``` which can be confusing in a large query. After the fix, the error message is: ``` Error in query: Possibly unquoted identifier test-table detected. Please consider quoting it with back-quotes as `test-table`(line 1, pos 14) == SQL == SELECT * FROM test-table --^^^ ``` which is easier for end users to identify the issue and fix. We safely augmented the current grammar rule to explicitly capture these error cases. The error handling logic is implemented in the SQL parsing listener `PostProcessor`. However, note that for cases such as `a - my-func(b)`, the parser can't actually tell whether this should be ``a -`my-func`(b) `` or `a - my - func(b)`. Therefore for these cases, we leave the parser as is. Also, in this patch we only provide better error messages for character-only identifiers. ## How was this patch tested? Adding new unit tests. Closes #24749 from yeshengm/hyphen-ident. Authored-by: Yesheng Ma Signed-off-by: gatorsmile --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 60 ++- .../spark/sql/catalyst/parser/AstBuilder.scala | 16 +-- .../spark/sql/catalyst/parser/ParseDriver.scala| 8 ++ .../sql/catalyst/parser/ErrorParserSuite.scala | 110 + .../spark/sql/execution/SparkSqlParser.scala | 10 +- 5 files changed, 169 insertions(+), 35 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index dcb7939..f57a659 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -82,13 +82,15 @@ singleTableSchema statement : query #statementDefault | ctes? dmlStatementNoWith #dmlStatement -| USE db=identifier#use -| CREATE database (IF NOT EXISTS)? identifier +| USE db=errorCapturingIdentifier #use +| CREATE database (IF NOT EXISTS)? db=errorCapturingIdentifier ((COMMENT comment=STRING) | locationSpec | (WITH DBPROPERTIES tablePropertyList))* #createDatabase -| ALTER database identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties -| DROP database (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase +| ALTER database db=errorCapturingIdentifier +SET DBPROPERTIES tablePropertyList #setDatabaseProperties +| DROP database (IF EXISTS)? 
db=errorCapturingIdentifier +(RESTRICT | CASCADE)? #dropDatabase | SHOW DATABASES (LIKE? pattern=STRING)? #showDatabases | createTableHeader ('(' colTypeList ')')? tableProvider ((OPTIONS options=tablePropertyList) | @@ -135,7 +137,8 @@ statement (ALTER | CHANGE) COLUMN? qualifiedName (TYPE dataType)? (COMMENT comment=STRING)? colPosition? #alterTableColumn | ALTER TABLE tableIdentifier partitionSpec? -CHANGE COLUMN? identifier colType colPosition? #changeColumn +CHANGE COLUMN? +colName=errorCapturingIdentifier colType colPosition? #changeColumn | ALTER TABLE tableIdentifier (partitionSpec)? SET SERDE STRING (WITH SERDEPROPERTIES tablePropertyList)? #setTableSerDe | ALTER TABLE tableIdentifier (pa
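For end users who hit this, a minimal sketch of the behavior and the back-quote fix (assumes a local `SparkSession` named `spark`; the table name `test-table` is only illustrative and does not need to exist, since the failure happens at parse time):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParseException

val spark = SparkSession.builder().master("local[*]").appName("hyphen-ident-demo").getOrCreate()

try {
  // Unquoted hyphenated identifier: rejected by the parser. After this patch the
  // ParseException names `test-table` and suggests back-quoting it, instead of
  // the generic "mismatched input '-'" message.
  spark.sql("SELECT * FROM test-table")
} catch {
  case e: ParseException => println(e.getMessage)
}

// The user-side fix: back-quote the identifier so the hyphen is part of the name
// rather than a minus operator (the table must exist for this to return rows).
// spark.sql("SELECT * FROM `test-table`").show()
```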
[spark] branch master updated (a5dcb82 -> 15de6d0)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a5dcb82 [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion add 15de6d0 [SPARK-28096][SQL] Convert defs to lazy vals to avoid expensive reference computation in QueryPlan and Expression No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Expression.scala | 9 - .../catalyst/expressions/aggregate/interfaces.scala | 3 ++- .../spark/sql/catalyst/expressions/grouping.scala | 8 ++-- .../sql/catalyst/expressions/namedExpressions.scala | 3 ++- .../apache/spark/sql/catalyst/plans/QueryPlan.scala | 7 +-- .../sql/catalyst/plans/logical/LogicalPlan.scala | 2 +- .../catalyst/plans/logical/QueryPlanConstraints.scala | 2 +- .../catalyst/plans/logical/ScriptTransformation.scala | 3 ++- .../plans/logical/basicLogicalOperators.scala | 19 ++- .../spark/sql/catalyst/plans/logical/object.scala | 9 ++--- .../plans/logical/pythonLogicalOperators.scala| 3 ++- .../org/apache/spark/sql/execution/ExpandExec.scala | 3 ++- .../org/apache/spark/sql/execution/objects.scala | 3 ++- .../spark/sql/execution/python/EvalPythonExec.scala | 3 ++- 14 files changed, 51 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
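The summary above is only a file inventory; the pattern being applied is the def-to-lazy-val conversion for derived values that are expensive to compute and read many times. A simplified sketch, not the actual QueryPlan/Expression code:

```scala
// Simplified sketch of the def -> lazy val pattern: query plan nodes are
// immutable, so a derived collection such as `references` can be computed once
// per node and cached instead of being rebuilt on every access.
abstract class TreeNodeLike {
  def children: Seq[TreeNodeLike]

  // Before: a def, recomputed on every reference during analysis/optimization.
  // def references: Set[String] = children.flatMap(_.references).toSet

  // After: a lazy val, computed at most once per node.
  lazy val references: Set[String] = children.flatMap(_.references).toSet
}
```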
[spark] branch master updated: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a5dcb82 [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion a5dcb82 is described below commit a5dcb82b5a6b08ebfe168e735f6edb40b80420fd Author: Ivan Vergiliev AuthorDate: Wed Jun 19 10:44:58 2019 +0800 [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion ## What changes were proposed in this pull request? `OrcFilters.createBuilder` has exponential complexity in the height of the filter tree due to the way the check-and-build pattern is implemented. We've hit this in production by passing a `Column` filter to Spark directly, with a job taking multiple hours for a simple set of ~30 filters. This PR changes the checking logic so that the conversion has linear complexity in the size of the tree instead of exponential in its height. Right now, due to the way ORC `SearchArgument` works, the code is forced to do two separate phases when converting a given Spark filter to an ORC filter: 1. Check if the filter is convertible. 2. Only if the check in 1. succeeds, perform the actual conversion into the resulting ORC filter. However, there's one detail which is the culprit in the exponential complexity: phases 1. and 2. are both done using the exact same method. The resulting exponential complexity is easiest to see in the `NOT` case - consider the following code: ``` val f1 = col("id") === lit(5) val f2 = !f1 val f3 = !f2 val f4 = !f3 val f5 = !f4 ``` Now, when we run `createBuilder` on `f5`, we get the following behaviour: 1. call `createBuilder(f4)` to check if the child `f4` is convertible 2. call `createBuilder(f4)` to actually convert it This seems fine when looking at a single level, but what actually ends up happening is: - `createBuilder(f3)` will then recursively be called 4 times - 2 times in step 1., and two times in step 2. - `createBuilder(f2)` will be called 8 times - 4 times in each top-level step, 2 times in each sub-step. - `createBuilder(f1)` will be called 16 times. As a result, having a tree of height > 30 leads to billions of calls to `createBuilder`, heap allocations, and so on and can take multiple hours. The way this PR solves this problem is by separating the `check` and `convert` functionalities into separate functions. This way, the call to `createBuilder` on `f5` above would look like this: 1. call `isConvertible(f4)` to check if the child `f4` is convertible - amortized constant complexity 2. call `createBuilder(f4)` to actually convert it - linear complexity in the size of the subtree. This way, we get an overall complexity that's linear in the size of the filter tree, allowing us to convert tree with 10s of thousands of nodes in milliseconds. The reason this split (`check` and `build`) is possible is that the checking never actually depends on the actual building of the filter. The `check` part of `createBuilder` depends mainly on: - `isSearchableType` for leaf nodes, and - `check`-ing the child filters for composite nodes like NOT, AND and OR. Situations like the `SearchArgumentBuilder` throwing an exception while building the resulting ORC filter are not handled right now - they just get thrown out of the class, and this change preserves this behaviour. 
This PR extracts this part of the code to a separate class which allows the conversion to make very efficient checks to confirm that a given child is convertible before actually converting it. Results: Before: - converting a skewed tree with a height of ~35 took about 6-7 hours. - converting a skewed tree with hundreds or thousands of nodes would be completely impossible. Now: - filtering against a skewed tree with a height of 1500 in the benchmark suite finishes in less than 10 seconds. ## Steps to reproduce ```scala val schema = StructType.fromDDL("col INT") (20 to 30).foreach { width => val whereFilter = (1 to width).map(i => EqualTo("col", i)).reduceLeft(Or) val start = System.currentTimeMillis() OrcFilters.createFilter(schema, Seq(whereFilter)) println(s"With $width filters, conversion takes ${System.currentTimeMillis() - start} ms") } ``` ### Before this PR ``` With 20 filters, conversion takes 363 ms With 21 filters, conversion takes 496 ms With 22 filters, conversion takes 939 ms With 23 filters, conversion takes 1871 ms With 24 filters, conversion takes 3756 ms With 25 filters, conversion takes 7452 ms With 26 filters, conversion takes 14978 ms With 27 filters, conversion tak
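A condensed sketch of the shape of the fix described above, using a toy filter type and plain strings rather than the real `OrcFilters`/`SearchArgument` API:

```scala
// Toy stand-ins for Spark's data source filters; the real code works against
// org.apache.spark.sql.sources.Filter and ORC's SearchArgument builder.
sealed trait Filter
case class EqualTo(col: String, value: Any) extends Filter
case class Not(child: Filter) extends Filter
case class Or(left: Filter, right: Filter) extends Filter

// Before: the convertibility "check" is the conversion itself, so every Not/Or
// converts each child twice (once to check, once to build) -> exponential in height.
def buildNaive(f: Filter): Option[String] = f match {
  case EqualTo(c, v) => Some(s"$c = $v")
  case Not(child) =>
    if (buildNaive(child).isDefined) buildNaive(child).map(s => s"NOT ($s)") else None
  case Or(l, r) =>
    if (buildNaive(l).isDefined && buildNaive(r).isDefined) {
      for (a <- buildNaive(l); b <- buildNaive(r)) yield s"($a OR $b)"
    } else None
}

// After: a cheap check pass plus a single build pass -> linear in the tree size.
// (The real check also rejects unsupported leaf types via isSearchableType.)
def isConvertible(f: Filter): Boolean = f match {
  case EqualTo(_, _) => true
  case Not(child)    => isConvertible(child)
  case Or(l, r)      => isConvertible(l) && isConvertible(r)
}

def build(f: Filter): String = f match {
  case EqualTo(c, v) => s"$c = $v"
  case Not(child)    => s"NOT (${build(child)})"
  case Or(l, r)      => s"(${build(l)} OR ${build(r)})"
}

def convert(f: Filter): Option[String] =
  if (isConvertible(f)) Some(build(f)) else None
```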
[spark] branch branch-2.3 updated: [SPARK-28081][ML] Handle large vocab counts in word2vec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new 220f29a [SPARK-28081][ML] Handle large vocab counts in word2vec 220f29a is described below commit 220f29a6f5b681a67a7e9a9351f25389c303b956 Author: Sean Owen AuthorDate: Tue Jun 18 20:27:43 2019 -0500 [SPARK-28081][ML] Handle large vocab counts in word2vec ## What changes were proposed in this pull request? The word2vec logic fails if a corpora has a word with count > 1e9. We should be able to handle very large counts generally better here by using longs to count. This takes over https://github.com/apache/spark/pull/24814 ## How was this patch tested? Existing tests. Closes #24893 from srowen/SPARK-28081. Authored-by: Sean Owen Signed-off-by: Sean Owen (cherry picked from commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5) Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala index b8c306d..d5b91df 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala @@ -45,7 +45,7 @@ import org.apache.spark.util.random.XORShiftRandom */ private case class VocabWord( var word: String, - var cn: Int, + var cn: Long, var point: Array[Int], var code: Array[Int], var codeLen: Int @@ -194,7 +194,7 @@ class Word2Vec extends Serializable with Logging { new Array[Int](MAX_CODE_LENGTH), 0)) .collect() - .sortWith((a, b) => a.cn > b.cn) + .sortBy(_.cn)(Ordering[Long].reverse) vocabSize = vocab.length require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " + @@ -232,7 +232,7 @@ class Word2Vec extends Serializable with Logging { a += 1 } while (a < 2 * vocabSize) { - count(a) = 1e9.toInt + count(a) = Long.MaxValue a += 1 } var pos1 = vocabSize - 1 @@ -267,6 +267,8 @@ class Word2Vec extends Serializable with Logging { min2i = pos2 pos2 += 1 } + assert(count(min1i) < Long.MaxValue) + assert(count(min2i) < Long.MaxValue) count(vocabSize + a) = count(min1i) + count(min2i) parentNode(min1i) = vocabSize + a parentNode(min2i) = vocabSize + a - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-28081][ML] Handle large vocab counts in word2vec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new e4f5d84 [SPARK-28081][ML] Handle large vocab counts in word2vec e4f5d84 is described below commit e4f5d84874bb0ad30fdf19aeaf2a7ac756830dbf Author: Sean Owen AuthorDate: Tue Jun 18 20:27:43 2019 -0500 [SPARK-28081][ML] Handle large vocab counts in word2vec ## What changes were proposed in this pull request? The word2vec logic fails if a corpora has a word with count > 1e9. We should be able to handle very large counts generally better here by using longs to count. This takes over https://github.com/apache/spark/pull/24814 ## How was this patch tested? Existing tests. Closes #24893 from srowen/SPARK-28081. Authored-by: Sean Owen Signed-off-by: Sean Owen (cherry picked from commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5) Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala index b8c306d..d5b91df 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala @@ -45,7 +45,7 @@ import org.apache.spark.util.random.XORShiftRandom */ private case class VocabWord( var word: String, - var cn: Int, + var cn: Long, var point: Array[Int], var code: Array[Int], var codeLen: Int @@ -194,7 +194,7 @@ class Word2Vec extends Serializable with Logging { new Array[Int](MAX_CODE_LENGTH), 0)) .collect() - .sortWith((a, b) => a.cn > b.cn) + .sortBy(_.cn)(Ordering[Long].reverse) vocabSize = vocab.length require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " + @@ -232,7 +232,7 @@ class Word2Vec extends Serializable with Logging { a += 1 } while (a < 2 * vocabSize) { - count(a) = 1e9.toInt + count(a) = Long.MaxValue a += 1 } var pos1 = vocabSize - 1 @@ -267,6 +267,8 @@ class Word2Vec extends Serializable with Logging { min2i = pos2 pos2 += 1 } + assert(count(min1i) < Long.MaxValue) + assert(count(min2i) < Long.MaxValue) count(vocabSize + a) = count(min1i) + count(min2i) parentNode(min1i) = vocabSize + a parentNode(min2i) = vocabSize + a - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28081][ML] Handle large vocab counts in word2vec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e96dd82 [SPARK-28081][ML] Handle large vocab counts in word2vec e96dd82 is described below commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5 Author: Sean Owen AuthorDate: Tue Jun 18 20:27:43 2019 -0500 [SPARK-28081][ML] Handle large vocab counts in word2vec ## What changes were proposed in this pull request? The word2vec logic fails if a corpora has a word with count > 1e9. We should be able to handle very large counts generally better here by using longs to count. This takes over https://github.com/apache/spark/pull/24814 ## How was this patch tested? Existing tests. Closes #24893 from srowen/SPARK-28081. Authored-by: Sean Owen Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala index 9e19ff2..7888a80 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala @@ -46,7 +46,7 @@ import org.apache.spark.util.random.XORShiftRandom */ private case class VocabWord( var word: String, - var cn: Int, + var cn: Long, var point: Array[Int], var code: Array[Int], var codeLen: Int @@ -195,7 +195,7 @@ class Word2Vec extends Serializable with Logging { new Array[Int](MAX_CODE_LENGTH), 0)) .collect() - .sortWith((a, b) => a.cn > b.cn) + .sortBy(_.cn)(Ordering[Long].reverse) vocabSize = vocab.length require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " + @@ -233,7 +233,7 @@ class Word2Vec extends Serializable with Logging { a += 1 } while (a < 2 * vocabSize) { - count(a) = 1e9.toInt + count(a) = Long.MaxValue a += 1 } var pos1 = vocabSize - 1 @@ -268,6 +268,8 @@ class Word2Vec extends Serializable with Logging { min2i = pos2 pos2 += 1 } + assert(count(min1i) < Long.MaxValue) + assert(count(min2i) < Long.MaxValue) count(vocabSize + a) = count(min1i) + count(min2i) parentNode(min1i) = vocabSize + a parentNode(min2i) = vocabSize + a - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
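This is the same change as the branch-2.3 and branch-2.4 backports above, landing on master. A tiny standalone illustration (plain Scala, unrelated to the Word2Vec API) of why the counts move from `Int` to `Long`: summing two counts near the old `1e9.toInt` sentinel already overflows `Int`, and a legitimate word count can reach that sentinel value.

```scala
// Counting with Int silently overflows once totals pass Int.MaxValue (~2.1e9).
val bigCount = 1500000000                        // a word seen 1.5 billion times
val asInt    = bigCount + bigCount               // wraps around: -1294967296
val asLong   = bigCount.toLong + bigCount.toLong // 3000000000, as intended

println(s"Int sum: $asInt, Long sum: $asLong")
```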
[spark] branch master updated: [SPARK-27823][CORE] Refactor resource handling code
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7056e00 [SPARK-27823][CORE] Refactor resource handling code 7056e00 is described below commit 7056e004ee566fabbb9b22ddee2de55ef03260db Author: Xiangrui Meng AuthorDate: Tue Jun 18 17:18:17 2019 -0700 [SPARK-27823][CORE] Refactor resource handling code ## What changes were proposed in this pull request? Continue the work from https://github.com/apache/spark/pull/24821. Refactor resource handling code to make the code more readable. Major changes: * Moved resource-related classes to `spark.resource` from `spark`. * Added ResourceUtils and helper classes so we don't need to directly deal with Spark conf. * ResourceID: resource identifier and it provides conf keys * ResourceRequest/Allocation: abstraction for requested and allocated resources * Added `TestResourceIDs` to reference commonly used resource IDs in tests like `spark.executor.resource.gpu`. cc: tgravescs jiangxb1987 Ngone51 ## How was this patch tested? Unit tests for added utils and existing unit tests. Closes #24856 from mengxr/SPARK-27823. Lead-authored-by: Xiangrui Meng Co-authored-by: Thomas Graves Signed-off-by: Xingbo Jiang --- .../org/apache/spark/BarrierTaskContext.scala | 1 + .../org/apache/spark/ResourceDiscoverer.scala | 151 .../org/apache/spark/ResourceInformation.scala | 37 --- .../main/scala/org/apache/spark/SparkConf.scala| 45 .../main/scala/org/apache/spark/SparkContext.scala | 94 +++- .../main/scala/org/apache/spark/TaskContext.scala | 4 +- .../scala/org/apache/spark/TaskContextImpl.scala | 1 + .../main/scala/org/apache/spark/TestUtils.scala| 30 ++- .../executor/CoarseGrainedExecutorBackend.scala| 52 ++--- .../org/apache/spark/internal/config/package.scala | 10 +- .../spark/resource/ResourceInformation.scala | 87 +++ .../org/apache/spark/resource/ResourceUtils.scala | 191 +++ .../scala/org/apache/spark/scheduler/Task.scala| 1 + .../apache/spark/scheduler/TaskDescription.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 10 +- .../apache/spark/scheduler/TaskSetManager.scala| 11 +- .../cluster/CoarseGrainedClusterMessage.scala | 2 +- .../org/apache/spark/ResourceDiscovererSuite.scala | 236 --- .../scala/org/apache/spark/SparkConfSuite.scala| 53 + .../scala/org/apache/spark/SparkContextSuite.scala | 93 +++- .../CoarseGrainedExecutorBackendSuite.scala| 159 +++-- .../org/apache/spark/executor/ExecutorSuite.scala | 1 + .../spark/resource/ResourceInformationSuite.scala | 64 + .../apache/spark/resource/ResourceUtilsSuite.scala | 259 + .../TestResourceIDs.scala} | 17 +- .../CoarseGrainedSchedulerBackendSuite.scala | 8 +- .../scheduler/ExecutorResourceInfoSuite.scala | 2 +- .../spark/scheduler/TaskDescriptionSuite.scala | 4 +- .../spark/scheduler/TaskSchedulerImplSuite.scala | 10 +- .../spark/scheduler/TaskSetManagerSuite.scala | 5 +- .../apache/spark/deploy/k8s/KubernetesUtils.scala | 20 +- .../k8s/features/BasicDriverFeatureStep.scala | 2 +- .../k8s/features/BasicExecutorFeatureStep.scala| 2 +- .../k8s/features/BasicDriverFeatureStepSuite.scala | 14 +- .../features/BasicExecutorFeatureStepSuite.scala | 37 ++- .../k8s/features/KubernetesFeaturesTestUtils.scala | 3 +- .../MesosFineGrainedSchedulerBackendSuite.scala| 3 +- .../org/apache/spark/deploy/yarn/Client.scala | 2 +- .../spark/deploy/yarn/ResourceRequestHelper.scala | 9 +- 
.../apache/spark/deploy/yarn/YarnAllocator.scala | 2 +- .../spark/deploy/yarn/YarnSparkHadoopUtil.scala| 8 +- .../YarnCoarseGrainedExecutorBackend.scala | 3 +- .../org/apache/spark/deploy/yarn/ClientSuite.scala | 8 +- .../spark/deploy/yarn/YarnAllocatorSuite.scala | 6 +- 44 files changed, 908 insertions(+), 851 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala b/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala index cf957ff..c393df8 100644 --- a/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala +++ b/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala @@ -26,6 +26,7 @@ import org.apache.spark.executor.TaskMetrics import org.apache.spark.internal.Logging import org.apache.spark.memory.TaskMemoryManager import org.apache.spark.metrics.source.Source +import org.apache.spark.resource.ResourceInformation import org.apache.spark.rpc.{RpcEndpointRef, RpcTimeout} import org.apache.spark.shuffle.FetchFailedException import org
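The description above is mostly a file inventory. Purely as a hypothetical, heavily simplified sketch of the "ResourceID provides conf keys" idea (the names `confPrefix`, `amountConf` and `discoveryScriptConf` below are illustrative, not the actual `org.apache.spark.resource` API):

```scala
// Hypothetical sketch: a resource identifier that derives its configuration keys,
// so callers never assemble "spark.executor.resource.gpu.*" strings by hand.
case class SketchResourceID(componentName: String, resourceName: String) {
  def confPrefix: String = s"spark.$componentName.resource.$resourceName."
  def amountConf: String = confPrefix + "amount"
  def discoveryScriptConf: String = confPrefix + "discoveryScript"
}

case class SketchResourceRequest(id: SketchResourceID, amount: Int)

// SketchResourceID("executor", "gpu").amountConf == "spark.executor.resource.gpu.amount"
```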
[spark] branch master updated: [SPARK-28039][SQL][TEST] Port float4.sql
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2e3ae97 [SPARK-28039][SQL][TEST] Port float4.sql 2e3ae97 is described below commit 2e3ae97668f9170c820ec5564edc50dff8347915 Author: Yuming Wang AuthorDate: Tue Jun 18 16:22:30 2019 -0700 [SPARK-28039][SQL][TEST] Port float4.sql ## What changes were proposed in this pull request? This PR is to port float4.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out When porting the test cases, found three PostgreSQL specific features that do not exist in Spark SQL: [SPARK-28060](https://issues.apache.org/jira/browse/SPARK-28060): Float type can not accept some special inputs [SPARK-28027](https://issues.apache.org/jira/browse/SPARK-28027): Spark SQL does not support prefix operator `` [SPARK-28061](https://issues.apache.org/jira/browse/SPARK-28061): Support for converting float to binary format Also, found a bug: [SPARK-28024](https://issues.apache.org/jira/browse/SPARK-28024): Incorrect value when out of range Also, found three inconsistent behavior: [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL insert there bad inputs to NULL [SPARK-28028](https://issues.apache.org/jira/browse/SPARK-28028): Cast numeric to integral type need round [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL returns NULL when dividing by zero ## How was this patch tested? N/A Closes #24887 from wangyum/SPARK-28039. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../resources/sql-tests/inputs/pgSQL/float4.sql| 363 .../sql-tests/results/pgSQL/float4.sql.out | 379 + 2 files changed, 742 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql new file mode 100644 index 000..9e684d1 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql @@ -0,0 +1,363 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- FLOAT4 +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql + +CREATE TABLE FLOAT4_TBL (f1 float) USING parquet; + +INSERT INTO FLOAT4_TBL VALUES ('0.0'); +INSERT INTO FLOAT4_TBL VALUES ('1004.30 '); +INSERT INTO FLOAT4_TBL VALUES (' -34.84'); +INSERT INTO FLOAT4_TBL VALUES ('1.2345678901234e+20'); +INSERT INTO FLOAT4_TBL VALUES ('1.2345678901234e-20'); + +-- [SPARK-28024] Incorrect numeric values when out of range +-- test for over and under flow +-- INSERT INTO FLOAT4_TBL VALUES ('10e70'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e70'); +-- INSERT INTO FLOAT4_TBL VALUES ('10e-70'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e-70'); + +-- INSERT INTO FLOAT4_TBL VALUES ('10e400'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e400'); +-- INSERT INTO FLOAT4_TBL VALUES ('10e-400'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e-400'); + +-- [SPARK-27923] Spark SQL insert there bad inputs to NULL +-- bad input +-- INSERT INTO FLOAT4_TBL VALUES (''); +-- INSERT INTO FLOAT4_TBL VALUES (' '); +-- INSERT INTO FLOAT4_TBL VALUES ('xyz'); +-- INSERT INTO FLOAT4_TBL VALUES ('5.0.0'); +-- INSERT INTO FLOAT4_TBL VALUES ('5 . 
0'); +-- INSERT INTO FLOAT4_TBL VALUES ('5. 0'); +-- INSERT INTO FLOAT4_TBL VALUES (' - 3.0'); +-- INSERT INTO FLOAT4_TBL VALUES ('1235'); + +-- special inputs +SELECT float('NaN'); +-- [SPARK-28060] Float type can not accept some special inputs +SELECT float('nan'); +SELECT float(' NAN '); +SELECT float('infinity'); +SELECT float(' -INFINiTY '); +-- [SPARK-27923] Spark SQL insert there bad special inputs to NULL +-- bad special inputs +SELECT float('N A N'); +SELECT float('NaN x'); +SELECT float(' INFINITYx'); + +-- [SPARK-28060] Float type can not accept some special inputs +SELECT float('Infinity') + 100.0; +SELECT float('Infinity') / float('Infinity'); +SELECT float('nan') / float('nan'); +SELECT float(decimal('nan')); + +SELECT '' AS five, * FROM FLOAT4_TBL; + +SELECT '' AS four, f.* FROM FLOAT4_TBL f WHERE f.f1 <> '1004.3'; + +SELECT '' AS one, f.* FROM FLOAT4_TBL f WHERE f.f1 = '1004.3'; + +SELECT '' AS three, f.* FROM FLOAT4_TBL f WHERE '1004.3' > f.f1; + +SELECT '' AS three, f.* FROM FLOAT4_TBL f WHERE f.f1 < '1004.3'; + +SELECT '' AS four, f.* FROM FLOAT4_TBL f WHERE '1004.3' >= f.f1; + +SELECT '' AS four, f.* FROM FLOAT4_TBL f WHERE f.f1 <= '1004.3'; + +SELECT '' AS three, f.
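One of the divergences called out above (Spark SQL returning NULL when dividing by zero, where PostgreSQL raises an error) is easy to observe directly; a small check assuming a `SparkSession` named `spark`:

```scala
// Division by zero on float input: Spark SQL (at the time of this port) yields
// NULL rather than raising an error the way PostgreSQL does.
val row = spark.sql("SELECT cast(1 as float) / cast(0 as float) AS q").head()
println(row.isNullAt(0))   // expected: true
```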
[spark] branch master updated: [SPARK-28088][SQL] Enhance LPAD/RPAD function
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c7f0301 [SPARK-28088][SQL] Enhance LPAD/RPAD function c7f0301 is described below commit c7f0301477da19b41380cef218da447dc8f85a0e Author: Yuming Wang AuthorDate: Tue Jun 18 14:08:18 2019 -0700 [SPARK-28088][SQL] Enhance LPAD/RPAD function ## What changes were proposed in this pull request? This pr enhances `LPAD`/`RPAD` function to make `pad` parameter optional. PostgreSQL, Vertica, Teradata, Oracle and DB2 support make `pad` parameter optional. MySQL, Hive and Presto does not support make `pad` parameter optional. SQL Server does not have `lapd`/`rpad` function. **PostgreSQL**: ``` postgres=# select substr(version(), 0, 16), lpad('hi', 5), rpad('hi', 5); substr | lpad | rpad -+---+--- PostgreSQL 11.3 |hi | hi (1 row) ``` **Vertica**: ``` dbadmin=> select version(), lpad('hi', 5), rpad('hi', 5); version | lpad | rpad +---+--- Vertica Analytic Database v9.1.1-0 |hi | hi (1 row) ``` **Teradata**: ![image](https://user-images.githubusercontent.com/5399861/59656550-89a49300-91d0-11e9-9f26-ed554f49ea34.png) **Oracle**: ![image](https://user-images.githubusercontent.com/5399861/59656591-a9d45200-91d0-11e9-8b0e-3e1f75983099.png) **DB2**: ![image](https://user-images.githubusercontent.com/5399861/59656468-3e8a8000-91d0-11e9-8826-0d854ed7f397.png) More details: https://www.postgresql.org/docs/11/functions-string.html https://docs.teradata.com/reader/kmuOwjp1zEYg98JsB8fu_A/e5w8LujIQDlVmRSww2E27A ## How was this patch tested? unit tests Closes #24899 from wangyum/SPARK-28088. Authored-by: Yuming Wang Signed-off-by: Dongjoon Hyun --- .../catalyst/expressions/stringExpressions.scala | 22 ++ .../expressions/StringExpressionsSuite.scala | 4 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 576eaec..a49b9bf 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1088,8 +1088,9 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression) */ @ExpressionDescription( usage = """ -_FUNC_(str, len, pad) - Returns `str`, left-padded with `pad` to a length of `len`. +_FUNC_(str, len[, pad]) - Returns `str`, left-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters. + If `pad` is not specified, `str` will be padded to the left with space characters. 
""", examples = """ Examples: @@ -1097,11 +1098,17 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression) ???hi > SELECT _FUNC_('hi', 1, '??'); h + > SELECT _FUNC_('hi', 5); + hi """, since = "1.5.0") -case class StringLPad(str: Expression, len: Expression, pad: Expression) +case class StringLPad(str: Expression, len: Expression, pad: Expression = Literal(" ")) extends TernaryExpression with ImplicitCastInputTypes { + def this(str: Expression, len: Expression) = { +this(str, len, Literal(" ")) + } + override def children: Seq[Expression] = str :: len :: pad :: Nil override def dataType: DataType = StringType override def inputTypes: Seq[DataType] = Seq(StringType, IntegerType, StringType) @@ -1122,8 +1129,9 @@ case class StringLPad(str: Expression, len: Expression, pad: Expression) */ @ExpressionDescription( usage = """ -_FUNC_(str, len, pad) - Returns `str`, right-padded with `pad` to a length of `len`. +_FUNC_(str, len[, pad]) - Returns `str`, right-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters. + If `pad` is not specified, `str` will be padded to the right with space characters. """, examples = """ Examples: @@ -1131,11 +1139,17 @@ case class StringLPad(str: Expression, len: Expression, pad: Expression) hi??? > SELECT _FUNC_('hi', 1, '??'); h + > SELECT _FUNC_('hi', 5); + hi """, since = "1.5.0") -case class StringRPad(str: Expression, len: Expression, pad: Expression) +case class StringRPad(str: Expression, len: Expression, pad: Expression = Literal(" "))
[spark] branch master updated: [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bef5d9d [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue bef5d9d is described below commit bef5d9d6c348e390f99b2cd781a2471d635e55f8 Author: Yuming Wang AuthorDate: Tue Jun 18 13:28:29 2019 -0700 [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue ## What changes were proposed in this pull request? This pr fix `TRIM`/`LTRIM`/`RTRIM` function parameter order issue, otherwise: ```sql spark-sql> SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x'); z spark-sql> SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy'); xyz spark-sql> SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy'); xy spark-sql> ``` ```sql postgres=# SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x'); btrim | btrim ---+--- Tom | bar (1 row) postgres=# SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy'); ltrim |ltrim ---+-- test | XxyLAST WORD (1 row) postgres=# SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy'); rtrim | rtrim ---+--- test | TURNERyxX (1 row) ``` ## How was this patch tested? unit tests Closes #24902 from wangyum/SPARK-28093. Authored-by: Yuming Wang Signed-off-by: Dongjoon Hyun --- .../catalyst/expressions/stringExpressions.scala | 6 +- .../expressions/StringExpressionsSuite.scala | 11 .../sql-tests/inputs/string-functions.sql | 10 .../sql-tests/results/string-functions.sql.out | 66 +- 4 files changed, 89 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 2752dd7..576eaec 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -653,7 +653,7 @@ case class StringTrim( trimStr: Option[Expression] = None) extends String2TrimExpression { - def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr)) + def this(srcStr: Expression, trimStr: Expression) = this(srcStr, Option(trimStr)) def this(srcStr: Expression) = this(srcStr, None) @@ -753,7 +753,7 @@ case class StringTrimLeft( trimStr: Option[Expression] = None) extends String2TrimExpression { - def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr)) + def this(srcStr: Expression, trimStr: Expression) = this(srcStr, Option(trimStr)) def this(srcStr: Expression) = this(srcStr, None) @@ -856,7 +856,7 @@ case class StringTrimRight( trimStr: Option[Expression] = None) extends String2TrimExpression { - def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr)) + def this(srcStr: Expression, trimStr: Expression) = this(srcStr, Option(trimStr)) def this(srcStr: Expression) = this(srcStr, None) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala index 1e7737b..08f42fc 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala @@ 
-465,6 +465,9 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { // scalastyle:on checkEvaluation(StringTrim(Literal("a"), Literal.create(null, StringType)), null) checkEvaluation(StringTrim(Literal.create(null, StringType), Literal("a")), null) + +checkEvaluation(StringTrim(Literal("yxTomxx"), Literal("xyz")), "Tom") +checkEvaluation(StringTrim(Literal("xxxbarxxx"), Literal("x")), "bar") } test("LTRIM") { @@ -489,6 +492,10 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { // scalastyle:on checkEvaluation(StringTrimLeft(Literal.create(null, StringType), Literal("a")), null) checkEvaluation(StringTrimLeft(Literal("a"), Literal.create(null, StringType)), null) + +checkEvaluation(StringTrimLeft(Literal("zzzytest"), Literal("xyz")), "test") +checkEvaluation(StringTrimLeft(Literal("zzzytestxyz"), Literal("xyz")), "testxyz") +checkEvaluation(StringTrimLeft(Literal("xyxXxyLAST WORD"), Literal("xy")), "XxyLAST WORD") } te
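The corrected argument order can be exercised the same way the new unit tests do; the snippet below is meant to run inside a suite that mixes in `ExpressionEvalHelper` (as `StringExpressionsSuite` does), with expected values taken from the commit description:

```scala
import org.apache.spark.sql.catalyst.expressions.{Literal, StringTrim, StringTrimLeft, StringTrimRight}

// trim(srcStr, trimStr): strip any of "xyz" from both ends of the source string.
checkEvaluation(StringTrim(Literal("yxTomxx"), Literal("xyz")), "Tom")
// ltrim / rtrim: strip only from the left / right end.
checkEvaluation(StringTrimLeft(Literal("zzzytest"), Literal("xyz")), "test")
checkEvaluation(StringTrimRight(Literal("testxxzx"), Literal("xyz")), "test")
```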
[spark] branch master updated (ed280c2 -> 1ada36b)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ed280c2 [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+ add 1ada36b [SPARK-27783][SQL] Add customizable hint error handler No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../sql/catalyst/analysis/HintErrorLogger.scala| 55 ++ .../spark/sql/catalyst/analysis/ResolveHints.scala | 22 - .../catalyst/optimizer/EliminateResolvedHint.scala | 16 +++ .../spark/sql/catalyst/plans/logical/hints.scala | 43 +++-- .../org/apache/spark/sql/internal/SQLConf.scala| 8 +++- .../scala/org/apache/spark/sql/JoinHintSuite.scala | 4 +- 7 files changed, 120 insertions(+), 30 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HintErrorLogger.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
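The summary above is just a file list. Purely as a hypothetical sketch of what a pluggable hint error handler can look like (the trait and method names below are invented for illustration and do not match the actual `HintErrorLogger` API):

```scala
// Hypothetical sketch: a callback interface the analyzer could invoke when a
// query hint cannot be applied, with a default implementation that only logs.
trait SketchHintErrorHandler {
  def hintNotRecognized(name: String): Unit
  def hintRelationNotFound(name: String, relation: String): Unit
}

object LoggingSketchHintErrorHandler extends SketchHintErrorHandler {
  override def hintNotRecognized(name: String): Unit =
    println(s"WARN Unrecognized hint: $name")
  override def hintRelationNotFound(name: String, relation: String): Unit =
    println(s"WARN Could not find relation '$relation' referenced by hint '$name'")
}
```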
[GitHub] [spark-website] dongjoon-hyun commented on issue #207: Fix source download link
dongjoon-hyun commented on issue #207: Fix source download link URL: https://github.com/apache/spark-website/pull/207#issuecomment-503199475 +1, late LGTM.
[spark-website] branch asf-site updated: Fix source download link
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 92a Fix source download link 92a is described below commit 92a312dc70845c3525c9a8921ecd6f3567d6 Author: Sean Owen AuthorDate: Tue Jun 18 10:56:10 2019 -0500 Fix source download link The download javascript wasn't correctly generating the source file name. Also, minor, it wasn't correctly respecting whether it should be obtained from the mirror network or archive server. Author: Sean Owen Closes #207 from srowen/DownloadJS. --- js/downloads.js | 12 ++-- site/js/downloads.js | 12 ++-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/js/downloads.js b/js/downloads.js index bf56d75..f8de69e 100644 --- a/js/downloads.js +++ b/js/downloads.js @@ -12,10 +12,10 @@ function addRelease(version, releaseDate, packages, mirrored) { } var sources = {pretty: "Source Code", tag: "sources"}; -var hadoopFree = {pretty: "Pre-build with user-provided Apache Hadoop", tag: "without-hadoop"}; +var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7 and later", tag: "hadoop2.7"}; -var scala2p12_hadoopFree = {pretty: "Pre-build with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; +var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; // 2.2.0+ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources]; @@ -86,10 +86,10 @@ function onVersionSelect() { } // Populate releases - updateDownloadLink(releases[version].mirrored); + updateDownloadLink(); } -function updateDownloadLink(isMirrored) { +function updateDownloadLink() { var versionSelect = document.getElementById("sparkVersionSelect"); var packageSelect = document.getElementById("sparkPackageSelect"); var downloadLink = document.getElementById("spanDownloadLink"); @@ -102,10 +102,10 @@ function updateDownloadLink(isMirrored) { var pkg = getSelectedValue(packageSelect); var artifactName = "spark-" + version + "-bin-" + pkg + ".tgz" -.replace(/-bin-sources/, ""); // special case for source packages + artifactName = artifactName.replace(/-bin-sources/, ""); // special case for source packages var downloadHref = ""; - if (isMirrored) { + if (releases[version].mirrored) { downloadHref = "https://www.apache.org/dyn/closer.lua/spark/spark-"; + version + "/" + artifactName; } else { downloadHref = "https://archive.apache.org/dist/spark/spark-"; + version + "/" + artifactName; diff --git a/site/js/downloads.js b/site/js/downloads.js index bf56d75..f8de69e 100644 --- a/site/js/downloads.js +++ b/site/js/downloads.js @@ -12,10 +12,10 @@ function addRelease(version, releaseDate, packages, mirrored) { } var sources = {pretty: "Source Code", tag: "sources"}; -var hadoopFree = {pretty: "Pre-build with user-provided Apache Hadoop", tag: "without-hadoop"}; +var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7 and later", tag: "hadoop2.7"}; -var scala2p12_hadoopFree = {pretty: "Pre-build with Scala 2.12 and user-provided Apache Hadoop", tag: 
"without-hadoop-scala-2.12"}; +var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; // 2.2.0+ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources]; @@ -86,10 +86,10 @@ function onVersionSelect() { } // Populate releases - updateDownloadLink(releases[version].mirrored); + updateDownloadLink(); } -function updateDownloadLink(isMirrored) { +function updateDownloadLink() { var versionSelect = document.getElementById("sparkVersionSelect"); var packageSelect = document.getElementById("sparkPackageSelect"); var downloadLink = document.getElementById("spanDownloadLink"); @@ -102,10 +102,10 @@ function updateDownloadLink(isMirrored) { var pkg = getSelectedValue(packageSelect); var artifactName = "spark-" + version + "-bin-" + pkg + ".tgz" -.replace(/-bin-sources/, ""); // special case for source packages + artifactName = artifactName.replace(/-bin-sources/, ""); // special case for source packages var downloadHref = ""; - if (isMirrored) { + if (releases[version].mirrored) { downloadHref = "https://www.apache.org/dyn/closer.lua/spark/spark-"; + version + "/" + artifactName; } else { downloadHref = "https://archive.apache.org/dist/spa
[GitHub] [spark-website] srowen closed pull request #207: Fix source download link
srowen closed pull request #207: Fix source download link URL: https://github.com/apache/spark-website/pull/207
[GitHub] [spark-website] srowen opened a new pull request #207: Fix source download link
srowen opened a new pull request #207: Fix source download link URL: https://github.com/apache/spark-website/pull/207 The download javascript wasn't correctly generating the source file name. Also, minor, it wasn't correctly respecting whether it should be obtained from the mirror network or archive server.
[spark] branch master updated: [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ed280c2 [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+ ed280c2 is described below commit ed280c23ca396087fc62d5a6412179f6a0103245 Author: Dongjoon Hyun AuthorDate: Tue Jun 18 00:08:37 2019 -0700 [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+ ## What changes were proposed in this pull request? With JDK9+, the generate **bytecode** of `FromUnixTime` raise `java.lang.IncompatibleClassChangeError` due to [JDK-8145148](https://bugs.openjdk.java.net/browse/JDK-8145148) . This is a blocker in [Apache Spark JDK11 Jenkins job](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-jdk-11-ubuntu-testing/). Locally, this is reproducible by the following unit test suite with JDK9+. ``` $ build/sbt "catalyst/testOnly *.DateExpressionsSuite" ... [info] org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite *** ABORTED *** (23 seconds, 75 milliseconds) [info] java.lang.IncompatibleClassChangeError: Method org.apache.spark.sql.catalyst.util.TimestampFormatter.apply(Ljava/lang/String;Ljava/time/ZoneId;Ljava/util/Locale;)Lorg/apache/spark/sql/catalyst/util/TimestampFormatter; must be InterfaceMeth ``` This bytecode issue is generated by `Janino` , so we replace `.apply` to `.MODULE$$.apply` and adds test coverage for similar codes. ## How was this patch tested? Manually with the existing UTs by doing the following with JDK9+. ``` build/sbt "catalyst/testOnly *.DateExpressionsSuite" ``` Actually, this is the last JDK11 error in `catalyst` module. So, we can verify with the following, too. ``` $ build/sbt "project catalyst" test ... [info] Total number of tests run: 3552 [info] Suites: completed 210, aborted 0 [info] Tests: succeeded 3552, failed 0, canceled 0, ignored 2, pending 0 [info] All tests passed. [info] Passed: Total 3583, Failed 0, Errors 0, Passed 3583, Ignored 2 [success] Total time: 294 s, completed Jun 16, 2019, 10:15:08 PM ``` Closes #24889 from dongjoon-hyun/SPARK-28072. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../catalyst/expressions/datetimeExpressions.scala | 2 +- .../expressions/DateExpressionsSuite.scala | 27 ++ 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 1e6a3aa..ccf6b36 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -863,7 +863,7 @@ case class FromUnixTime(sec: Expression, format: Expression, timeZoneId: Option[ nullSafeCodeGen(ctx, ev, (seconds, f) => { s""" try { - ${ev.value} = UTF8String.fromString($tf.apply($f.toString(), $zid, $locale). + ${ev.value} = UTF8String.fromString($tf$$.MODULE$$.apply($f.toString(), $zid, $locale). 
format($seconds * 100L)); } catch (java.lang.IllegalArgumentException e) { ${ev.isNull} = true; diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala index 88607d1..04bb61a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala @@ -268,6 +268,15 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(DateFormatClass(Cast(Literal(d), TimestampType, jstId), Literal("H"), jstId), "0") checkEvaluation(DateFormatClass(Literal(ts), Literal("H"), jstId), "22") + +// SPARK-28072 The codegen path should work +checkEvaluation( + expression = DateFormatClass( +BoundReference(ordinal = 0, dataType = TimestampType, nullable = true), +BoundReference(ordinal = 1, dataType = StringType, nullable = true), +jstId), + expected = "22", + inputRow = InternalRow(DateTimeUtils.fromJavaTimestamp(ts), UTF8String.fromString("H"))) } test("Hour") { @@ -683,14 +692,14 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation( FromUnixTime(Lite
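The interop detail behind the fix above, as a small standalone sketch (independent of `TimestampFormatter` itself): a Scala `object` compiles to a class whose single instance is exposed through a static `MODULE$` field, and the Java source that Catalyst emits through Janino has to reference it that way rather than calling `apply` as if it were a static method.

```scala
// Sketch: how generated Java reaches a Scala object's apply method.
object Greeter {                 // compiles to class Greeter$ with a static MODULE$ field
  def apply(name: String): String = s"Hello, $name"
}

// From Scala, both forms are equivalent:
Greeter("spark")                 // sugar for Greeter.apply("spark")

// Catalyst codegen assembles Java source as interpolated strings; writing the
// call through MODULE$ (note the escaped $$) mirrors what the patch does:
val generatedJava = s"""String s = Greeter$$.MODULE$$.apply("spark");"""
```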