[spark] branch master updated (b9eafcb -> 5945d46)

2020-03-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b9eafcb  [SPARK-31088][SQL] Add back HiveContext and createExternalTable
 add 5945d46  [SPARK-31225][SQL] Override sql method of OuterReference

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/namedExpressions.scala  | 1 +
 .../resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out   | 4 ++--
 .../results/subquery/negative-cases/invalid-correlation.sql.out   | 2 +-
 .../sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out | 4 ++--
 4 files changed, 6 insertions(+), 5 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31225][SQL] Override sql method of OuterReference

2020-03-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4868b4d  [SPARK-31225][SQL] Override sql method of OuterReference
4868b4d is described below

commit 4868b4d119e5fd7099d9f33f23a4164007360712
Author: Kent Yao 
AuthorDate: Fri Mar 27 15:21:19 2020 +0800

[SPARK-31225][SQL] Override sql method of OuterReference

### What changes were proposed in this pull request?

OuterReference is a LeafExpression, so its children are Nil, which makes its SQL representation always render as `outer()`. This makes our explain command and error messages unclear whenever an OuterReference is present, e.g.

```scala
org.apache.spark.sql.AnalysisException:
Aggregate/Window/Generate expressions are not valid in where clause of the query.
Expression in where clause: [(in.`value` = max(outer()))]
Invalid expressions: [max(outer())];;
```

This PR overrides its `sql` method to combine its `prettyName` with the `sql` of its single argument `e`.
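
For illustration, a minimal sketch of the idea: the `Expr`, `Col`, and `Outer` types below are hypothetical stand-ins rather than the real Catalyst classes, but they show why a leaf wrapper renders as `outer()` by default and how overriding `sql` fixes it.

```scala
// Hypothetical stand-ins for illustration only; not the real Catalyst classes.
trait Expr {
  def children: Seq[Expr] = Nil
  def prettyName: String = getClass.getSimpleName.toLowerCase
  // Default rendering: the name plus the SQL of the children.
  def sql: String = s"$prettyName(${children.map(_.sql).mkString(", ")})"
}

case class Col(name: String) extends Expr {
  override def sql: String = s"`$name`"
}

// A leaf wrapper: `e` is not a child, so the default `sql` prints "outer()".
case class Outer(e: Expr) extends Expr {
  override def prettyName: String = "outer"
  // The fix: render the wrapped expression explicitly.
  override def sql: String = s"$prettyName(${e.sql})"
}

object OuterSqlDemo extends App {
  println(Outer(Col("t2a")).sql)  // prints outer(`t2a`) instead of outer()
}
```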

### Why are the changes needed?

Improve the error message.

### Does this PR introduce any user-facing change?

Yes, the error message produced for an OuterReference has changed.

### How was this patch tested?

Modified unit test results.

Closes #27985 from yaooqinn/SPARK-31225.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 5945d46c11a86fd85f9e65f24c2e88f368eee01f)
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/catalyst/expressions/namedExpressions.scala  | 1 +
 .../resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out   | 4 ++--
 .../results/subquery/negative-cases/invalid-correlation.sql.out   | 2 +-
 .../sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out | 4 ++--
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
index 02e90f8..77b4cec 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
@@ -382,6 +382,7 @@ case class OuterReference(e: NamedExpression)
   override def nullable: Boolean = e.nullable
   override def prettyName: String = "outer"
 
+  override def sql: String = s"$prettyName(${e.sql})"
   override def name: String = e.name
   override def qualifier: Seq[String] = e.qualifier
   override def exprId: ExprId = e.exprId
diff --git a/sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out b/sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out
index 5efb58c..f7bba96 100644
--- a/sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out
@@ -381,8 +381,8 @@ struct<>
 org.apache.spark.sql.AnalysisException
 
 Aggregate/Window/Generate expressions are not valid in where clause of the query.
-Expression in where clause: [(sum(DISTINCT CAST((outer() + b.`four`) AS BIGINT)) = CAST(b.`four` AS BIGINT))]
-Invalid expressions: [sum(DISTINCT CAST((outer() + b.`four`) AS BIGINT))];
+Expression in where clause: [(sum(DISTINCT CAST((outer(a.`four`) + b.`four`) AS BIGINT)) = CAST(b.`four` AS BIGINT))]
+Invalid expressions: [sum(DISTINCT CAST((outer(a.`four`) + b.`four`) AS BIGINT))];
 
 
 -- !query
diff --git a/sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out b/sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out
index ec7ecf2..d703d4e 100644
--- a/sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out
@@ -109,7 +109,7 @@ struct<>
 -- !query output
 org.apache.spark.sql.AnalysisException
 Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses:
-Aggregate [min(outer(t2a#x)) AS min(outer())#x]
+Aggregate [min(outer(t2a#x)) AS min(outer(t2.`t2a`))#x]
 +- SubqueryAlias t3
    +- Project [t3a#x, t3b#x, t3c#x]
       +- SubqueryAlias t3
diff --git a/sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out b/sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out
index adf434b..76637bf 100644
--- a/sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out
@@ -37

[spark] branch branch-3.0 updated (4868b4d -> f94d13f)

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4868b4d  [SPARK-31225][SQL] Override sql method of OuterReference
 add f94d13f  [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][FOLLOWUP][3.0] Fix build error due to conf version

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 1 -
 1 file changed, 1 deletion(-)





[spark] branch master updated (5945d46 -> 9f0c010)

2020-03-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5945d46  [SPARK-31225][SQL] Override sql method of OuterReference
 add 9f0c010  [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  4 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  9 ++--
 .../expressions/CollectionExpressionsSuite.scala   |  4 +-
 .../catalyst/expressions/CsvExpressionsSuite.scala | 10 ++--
 .../expressions/DateExpressionsSuite.scala | 63 +++---
 .../expressions/JsonExpressionsSuite.scala |  8 +--
 .../catalyst/parser/ExpressionParserSuite.scala|  6 +--
 .../sql/catalyst/util/DateTimeTestUtils.scala  | 15 +++---
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 54 +--
 .../apache/spark/sql/DataFrameFunctionsSuite.scala |  2 +-
 .../spark/sql/StatisticsCollectionSuite.scala  |  3 +-
 11 files changed, 91 insertions(+), 87 deletions(-)






[spark] branch branch-3.0 updated: [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`

2020-03-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new c2f79a6  [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`
c2f79a6 is described below

commit c2f79a61de2af52ff9ef8321094f2ea7ea9b
Author: Maxim Gekk 
AuthorDate: Fri Mar 27 21:14:25 2020 +0800

[SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`

In the PR, I propose to change the types of `DateTimeTestUtils` values and functions by replacing `java.util.TimeZone` with `java.time.ZoneId`. In particular:
1. The type of `ALL_TIMEZONES` is changed to `Seq[ZoneId]`.
2. Remove `val outstandingTimezones: Seq[TimeZone]`.
3. Change the type of the time zone parameter in `withDefaultTimeZone` to `ZoneId`.
4. Modify affected test suites.

Currently, Spark SQL's date-time expressions and functions have already been ported to the Java 8 time API, but the tests still use the old time APIs. In particular, `DateTimeTestUtils` exposes functions that accept only TimeZone instances. This is inconvenient and CPU-consuming, because TimeZone instances need to be converted to ZoneId instances via strings (zone ids).
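
As a rough sketch of the conversion paths involved (the bodies below are illustrative only, not the actual `DateTimeTestUtils` code):

```scala
import java.time.ZoneId
import java.util.TimeZone

// Old style: tests pass TimeZone around and convert to ZoneId through the string id,
// which is the extra cost described above.
def zoneIdOf(tz: TimeZone): ZoneId = ZoneId.of(tz.getID)

// New style: accept a ZoneId directly and only build a TimeZone at the boundary
// that still needs one. Illustrative body, not the real test util.
def withDefaultTimeZone[T](zoneId: ZoneId)(block: => T): T = {
  val saved = TimeZone.getDefault
  try {
    TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
    block
  } finally {
    TimeZone.setDefault(saved)
  }
}

// withDefaultTimeZone(ZoneId.of("UTC")) { /* run a date-time test */ }
```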

No

By affected test suites executed by Jenkins builds.

Closes #28033 from MaxGekk/with-default-time-zone.

Authored-by: Maxim Gekk 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 9f0c010a5c9b5c10401f8e16fa7a151714b6dbb0)
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  4 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  9 ++--
 .../expressions/CollectionExpressionsSuite.scala   |  4 +-
 .../catalyst/expressions/CsvExpressionsSuite.scala | 10 ++--
 .../expressions/DateExpressionsSuite.scala | 63 +++---
 .../expressions/JsonExpressionsSuite.scala |  8 +--
 .../catalyst/parser/ExpressionParserSuite.scala|  6 +--
 .../sql/catalyst/util/DateTimeTestUtils.scala  | 15 +++---
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 54 +--
 .../apache/spark/sql/DataFrameFunctionsSuite.scala |  2 +-
 .../spark/sql/StatisticsCollectionSuite.scala  |  3 +-
 11 files changed, 91 insertions(+), 87 deletions(-)

diff --git a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
index 34a0e2b..a04037c 100644
--- a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
+++ b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
@@ -38,7 +38,7 @@ import org.apache.spark.sql._
 import org.apache.spark.sql.TestingUDT.IntervalData
 import org.apache.spark.sql.catalyst.expressions.AttributeReference
 import org.apache.spark.sql.catalyst.plans.logical.Filter
-import org.apache.spark.sql.catalyst.util.{DateTimeTestUtils, DateTimeUtils}
+import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.{withDefaultTimeZone, UTC}
 import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.{DataSource, FilePartition}
 import org.apache.spark.sql.execution.datasources.v2.BatchScanExec
@@ -408,7 +408,7 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession {
       StructField("float", FloatType, true),
       StructField("date", DateType, true)
     ))
-    DateTimeTestUtils.withDefaultTimeZone(DateTimeUtils.TimeZoneUTC) {
+    withDefaultTimeZone(UTC) {
       val rdd = spark.sparkContext.parallelize(Seq(
         Row(1f, null),
         Row(2f, new Date(145194840L)),
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index e25d805..5c57843 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -112,13 +112,14 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
   }
 
   test("cast string to timestamp") {
-    new ParVector(ALL_TIMEZONES.toVector).foreach { tz =>
+    new ParVector(ALL_TIMEZONES.toVector).foreach { zid =>
       def checkCastStringToTimestamp(str: String, expected: Timestamp): Unit = {
-        checkEvaluation(cast(Literal(str), TimestampType, Option(tz.getID)), expected)
+        checkEvaluation(cast(Literal(str), TimestampType, Option(zid.getId)), expected)
       }
 
       checkCastStringToTimestamp("123", null)
 
+      val tz = TimeZone.getTimeZone(zid)
       var c = Calendar.getInstance(tz)
       c.set(2015, 0, 1, 0, 0, 0)
       c.set(Calendar.MILLISECOND, 0)
@@ -263,10 +264,10 @@ abstract class CastSuiteBase extends SparkFun

[spark] branch master updated (9f0c010 -> fc2a974)

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9f0c010  [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`
 add fc2a974  [SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource

No new revisions were added by this update.

Summary of changes:
 .../test-data/before_1582_ts_v2_4.snappy.orc   | Bin 0 -> 251 bytes
 .../execution/datasources/orc/OrcSourceSuite.scala |  28 +
 2 files changed, 28 insertions(+)
 create mode 100644 sql/core/src/test/resources/test-data/before_1582_ts_v2_4.snappy.orc





[spark] branch branch-3.0 updated: [SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b6e8f64  [SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource
b6e8f64 is described below

commit b6e8f64d49caf1f0a1f1b910d603e8e000270d01
Author: Maxim Gekk 
AuthorDate: Fri Mar 27 09:06:59 2020 -0700

[SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource

### What changes were proposed in this pull request?
In the PR, I propose 2 tests to check that rebasing of timestamps between the hybrid calendar (Julian + Gregorian) and the Proleptic Gregorian calendar works correctly.
1. The test `compatibility with Spark 2.4 in reading timestamps` loads an ORC file saved by Spark 2.4.5 via:
```shell
$ export TZ="America/Los_Angeles"
```
```scala
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> val df = Seq("1001-01-01 01:02:03.123456").toDF("tsS").select($"tsS".cast("timestamp").as("ts"))
df: org.apache.spark.sql.DataFrame = [ts: timestamp]

scala> df.write.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc")

scala> spark.read.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc").show(false)
+--------------------------+
|ts                        |
+--------------------------+
|1001-01-01 01:02:03.123456|
+--------------------------+
```
2. The test `rebasing timestamps in write` is a round-trip test. Since the previous test confirms correct rebasing of timestamps in read, this test should pass only if rebasing also works correctly in write.
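
A self-contained sketch of the same round-trip idea (assuming a local `SparkSession`; the real tests live in `OrcSourceSuite`, shown in the diff below):

```scala
import org.apache.spark.sql.SparkSession

object OrcRebaseRoundTrip extends App {
  val spark = SparkSession.builder().master("local[1]").appName("orc-rebase").getOrCreate()
  import spark.implicits._

  spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
  val path = java.nio.file.Files.createTempDirectory("orc-rebase").resolve("ts").toString

  // Write a pre-1582 timestamp, then read it back: the value survives only if
  // rebasing is applied consistently on both the write and the read path.
  Seq("1001-01-01 01:02:03.123456").toDF("tsS")
    .select($"tsS".cast("timestamp").as("ts"))
    .write.orc(path)

  spark.read.orc(path).show(false)  // expect 1001-01-01 01:02:03.123456
  spark.stop()
}
```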

### Why are the changes needed?
To guarantee that rebasing works correctly for timestamps in the ORC datasource.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By running `OrcSourceSuite` for Hive 1.2 and 2.3 via the commands:
```
$ build/sbt -Phive-2.3 "test:testOnly *OrcSourceSuite"
```
and
```
$ build/sbt -Phive-1.2 "test:testOnly *OrcSourceSuite"
```

Closes #28047 from MaxGekk/rebase-ts-orc-test.

Authored-by: Maxim Gekk 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit fc2a974e030c82bf500a81c3908f853c3eeb761d)
Signed-off-by: Dongjoon Hyun 
---
 .../test-data/before_1582_ts_v2_4.snappy.orc   | Bin 0 -> 251 bytes
 .../execution/datasources/orc/OrcSourceSuite.scala |  28 +
 2 files changed, 28 insertions(+)

diff --git a/sql/core/src/test/resources/test-data/before_1582_ts_v2_4.snappy.orc b/sql/core/src/test/resources/test-data/before_1582_ts_v2_4.snappy.orc
new file mode 100644
index 000..af9ef04
Binary files /dev/null and b/sql/core/src/test/resources/test-data/before_1582_ts_v2_4.snappy.orc differ
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
index b5e002f..0b7500c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
@@ -508,6 +508,34 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
       }
     }
   }
+
+  test("SPARK-31284: compatibility with Spark 2.4 in reading timestamps") {
+    Seq(false, true).foreach { vectorized =>
+      withSQLConf(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
+        checkAnswer(
+          readResourceOrcFile("test-data/before_1582_ts_v2_4.snappy.orc"),
+          Row(java.sql.Timestamp.valueOf("1001-01-01 01:02:03.123456")))
+      }
+    }
+  }
+
+  test("SPARK-31284: rebasing timestamps in write") {
+    withTempPath { dir =>
+      val path = dir.getAbsolutePath
+      Seq("1001-01-01 01:02:03.123456").toDF("tsS")
+        .select($"tsS".cast("timestamp").as("ts"))
+        .write
+        .orc(path)
+
+      Seq(false, true).foreach { vectorized =>
+        withSQLConf(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
+          checkAnswer(
+            spark.read.orc(path),
+            Row(java.sql.Timestamp.valueOf("1001-01-01 01:02:03.123456")))
+        }
+      }
+    }
+  }
 }
 
 class OrcSourceSuite extends OrcSuite with SharedSparkSession {





[spark] branch master updated (fc2a974 -> f879573)

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fc2a974  [SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource
 add f879573  [SPARK-31200][K8S] Enforce to use `https` in /etc/apt/sources.list

No new revisions were added by this update.

Summary of changes:
 .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile  | 1 +
 1 file changed, 1 insertion(+)





[spark] branch branch-3.0 updated: [SPARK-31200][K8S] Enforce to use `https` in /etc/apt/sources.list

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new c1a03b2  [SPARK-31200][K8S] Enforce to use `https` in /etc/apt/sources.list
c1a03b2 is described below

commit c1a03b2233aee77f957dbc94180d2334c61ac088
Author: Prashant Sharma 
AuthorDate: Fri Mar 27 09:13:55 2020 -0700

[SPARK-31200][K8S] Enforce to use `https` in /etc/apt/sources.list

…n progress errors.

### What changes were proposed in this pull request?
Switching to `https` instead of `http` in the Debian mirror URLs.

### Why are the changes needed?
My ISP was trying to intercept (or serve from cache) the `http` traffic, and this was causing very confusing errors while building the Spark image. I am posting this in the hope of saving someone else's time and energy if they encounter the same issue.
```
bash-3.2$ bin/docker-image-tool.sh -r scrapcodes -t v3.1.0-f1cc86 build
Sending build context to Docker daemon  203.4MB
Step 1/18 : ARG java_image_tag=8-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
 ---> 381b20190cf7
Step 3/18 : ARG spark_uid=185
 ---> Using cache
 ---> 65c06f86753c
Step 4/18 : RUN set -ex && apt-get update && ln -s /lib /lib64 &&   
  apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
/opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln 
-sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
/etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
rm -rf /var/cache/apt/*
 ---> Running in 96bcbe927d35
+ apt-get update
Get:1 http://deb.debian.org/debian buster InRelease [122 kB]
Get:2 http://deb.debian.org/debian buster-updates InRelease [49.3 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Err:3 http://deb.debian.org/debian buster/main amd64 Packages
  File has unexpected size (13217 != 7906744). Mirror sync in progress? 
[IP: 151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7906744 [weak]
   - SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
   - MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
  Release file created at: Sat, 08 Feb 2020 10:57:10 +
Get:5 http://deb.debian.org/debian buster-updates/main amd64 Packages [7380 
B]
Err:5 http://deb.debian.org/debian buster-updates/main amd64 Packages
  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 
151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7380 [weak]
   - SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
  Release file created at: Fri, 20 Mar 2020 02:28:11 +
Get:4 http://security-cdn.debian.org/debian-security buster/updates 
InRelease [65.4 kB]
Get:6 http://security-cdn.debian.org/debian-security buster/updates/main 
amd64 Packages [183 kB]
Fetched 419 kB in 1s (327 kB/s)
Reading package lists...
E: Failed to fetch 
http://deb.debian.org/debian/dists/buster/main/binary-amd64/by-hash/SHA256/80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
  File has unexpected size (13217 != 7906744). Mirror sync in progress? [IP: 
151.101.10.133 80]
   Hashes of expected file:
- Filesize:7906744 [weak]
- 
SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
- MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
   Release file created at: Sat, 08 Feb 2020 10:57:10 +
E: Failed to fetch 
http://deb.debian.org/debian/dists/buster-updates/main/binary-amd64/by-hash/SHA256/6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 
151.101.10.133 80]
   Hashes of expected file:
- Filesize:7380 [weak]
- 
SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
   Release file created at: Fri, 20 Mar 2020 02:28:11 +
E: Some index files failed to download. They have been ignored, or old ones 
used instead.
The command '/bin/sh -c set -ex && apt-get update && ln -s /lib 
/lib64 && apt install -y bash tini libc6 libpam-modules krb5-user libnss3 
procps && mkdir -p /opt/spark && mkdir -p /opt/spark/examples && 
mkdir -p /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh 
&& ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so 
use_uid" >> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw 
/etc/passwd && rm -rf /var/cache/apt [...]
Failed to build Spark JVM Docker image, please refer to Docker build output 
for details.
```
### Does thi

[spark] branch master updated (f879573 -> 8a5d496)

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f879573  [SPARK-31200][K8S] Enforce to use `https` in /etc/apt/sources.list
 add 8a5d496  [MINOR][DOC] Refine comments of QueryPlan regarding subquery

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/plans/QueryPlan.scala   | 32 ++
 1 file changed, 21 insertions(+), 11 deletions(-)





[spark] branch branch-3.0 updated: [MINOR][DOC] Refine comments of QueryPlan regarding subquery

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 7435f45  [MINOR][DOC] Refine comments of QueryPlan regarding subquery
7435f45 is described below

commit 7435f4543ea6f2b927da6055c1cfb75f4a62f19d
Author: Wenchen Fan 
AuthorDate: Fri Mar 27 09:35:35 2020 -0700

[MINOR][DOC] Refine comments of QueryPlan regarding subquery

### What changes were proposed in this pull request?

The query plan of Spark SQL is a mutually recursive structure: QueryPlan -> Expression (PlanExpression) -> QueryPlan, but the transformations do not take this into account.

This PR refines the comments of `QueryPlan` to highlight this fact.

### Why are the changes needed?

Better documentation.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #28050 from cloud-fan/comment.

Authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8a5d49610d875c473114781e92300c79e24a53cc)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/plans/QueryPlan.scala   | 32 ++
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index 1248266..9f86fb2 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
@@ -23,6 +23,16 @@ import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, TreeNode, TreeNodeTag
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.{DataType, StructType}
 
+/**
+ * An abstraction of the Spark SQL query plan tree, which can be logical or physical. This class
+ * defines some basic properties of a query plan node, as well as some new transform APIs to
+ * transform the expressions of the plan node.
+ *
+ * Note that, the query plan is a mutually recursive structure:
+ *   QueryPlan -> Expression (subquery) -> QueryPlan
+ * The tree traverse APIs like `transform`, `foreach`, `collect`, etc. that are
+ * inherited from `TreeNode`, do not traverse into query plans inside subqueries.
+ */
 abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] {
   self: PlanType =>
 
@@ -133,7 +143,7 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
 
   /**
    * Returns the result of running [[transformExpressions]] on this node
-   * and all its children.
+   * and all its children. Note that this method skips expressions inside subqueries.
    */
   def transformAllExpressions(rule: PartialFunction[Expression, Expression]): this.type = {
     transform {
@@ -204,7 +214,7 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
-   * All the subqueries of current plan.
+   * All the top-level subqueries of the current plan node. Nested subqueries are not included.
    */
   def subqueries: Seq[PlanType] = {
     expressions.flatMap(_.collect {
@@ -213,21 +223,21 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
-   * Returns a sequence containing the result of applying a partial function to all elements in this
-   * plan, also considering all the plans in its (nested) subqueries
-   */
-  def collectInPlanAndSubqueries[B](f: PartialFunction[PlanType, B]): Seq[B] =
-    (this +: subqueriesAll).flatMap(_.collect(f))
-
-  /**
-   * Returns a sequence containing the subqueries in this plan, also including the (nested)
-   * subquries in its children
+   * All the subqueries of the current plan node and all its children. Nested subqueries are also
+   * included.
    */
   def subqueriesAll: Seq[PlanType] = {
     val subqueries = this.flatMap(_.subqueries)
     subqueries ++ subqueries.flatMap(_.subqueriesAll)
   }
 
+  /**
+   * Returns a sequence containing the result of applying a partial function to all elements in this
+   * plan, also considering all the plans in its (nested) subqueries
+   */
+  def collectInPlanAndSubqueries[B](f: PartialFunction[PlanType, B]): Seq[B] =
+    (this +: subqueriesAll).flatMap(_.collect(f))
+
   override def innerChildren: Seq[QueryPlan[_]] = subqueries
 
   /**





[spark] branch master updated (8a5d496 -> aa8776b)

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8a5d496  [MINOR][DOC] Refine comments of QueryPlan regarding subquery
 add aa8776b  [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project

No new revisions were added by this update.

Summary of changes:
 .../catalyst/optimizer/NestedColumnAliasing.scala  | 53 ++
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 20 +-
 .../optimizer/NestedColumnAliasingSuite.scala  | 80 ++
 .../execution/datasources/SchemaPruningSuite.scala | 37 ++
 4 files changed, 172 insertions(+), 18 deletions(-)






[spark] branch master updated: [SPARK-31271][UI] fix web ui for driver side SQL metrics

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c4e98c0  [SPARK-31271][UI] fix web ui for driver side SQL metrics
c4e98c0 is described below

commit c4e98c065c99d2cf840e6006ee5414fbaaba9937
Author: Wenchen Fan 
AuthorDate: Fri Mar 27 15:45:35 2020 -0700

[SPARK-31271][UI] fix web ui for driver side SQL metrics

### What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/23551, we changed the metrics type of driver-side SQL metrics to size/time etc., which comes with max/min/median info.

This doesn't make sense for driver-side SQL metrics as they have only one value. It makes the web UI hard to read:

![image](https://user-images.githubusercontent.com/3182036/77653892-42db9900-6fab-11ea-8e7f-92f763fa32ff.png)

This PR updates the SQL metrics UI to only display max/min/median if there is more than one metrics value:

![image](https://user-images.githubusercontent.com/3182036/77653975-5f77d100-6fab-11ea-849e-64c935377c8e.png)
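
Conceptually, the display rule looks like the following sketch (illustrative only, not the actual `SQLMetrics.stringValue` implementation):

```scala
// Illustrative sketch of the rule described above: show the (min, med, max)
// breakdown only when the metric was collected from more than one task.
def renderMetric(name: String, values: Seq[Long]): String = {
  val sorted = values.sorted
  val total = sorted.sum
  if (sorted.size <= 1) {
    // Driver-side metrics have a single value, so min/med/max would be noise.
    s"$name: $total"
  } else {
    s"$name total (min, med, max): $total (${sorted.head}, ${sorted(sorted.size / 2)}, ${sorted.last})"
  }
}

// renderMetric("duration", Seq(42L))
//   -> "duration: 42"
// renderMetric("duration", Seq(800L, 1000L, 2000L))
//   -> "duration total (min, med, max): 3800 (800, 1000, 2000)"
```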

### Why are the changes needed?

Makes the UI easier to read

### Does this PR introduce any user-facing change?

no

### How was this patch tested?
manual test

Closes #28037 from cloud-fan/ui.

Authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/execution/metric/SQLMetrics.scala| 60 +++---
 .../spark/sql/execution/ui/SparkPlanGraph.scala|  7 ++-
 .../sql/execution/metric/SQLMetricsSuite.scala | 33 +++-
 .../sql/execution/metric/SQLMetricsTestUtils.scala | 12 ++---
 .../execution/ui/SQLAppStatusListenerSuite.scala   |  9 ++--
 5 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
index 1394e0f..92d2179 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
@@ -116,26 +116,23 @@ object SQLMetrics {
     // data size total (min, med, max):
     // 100GB (100MB, 1GB, 10GB)
     val acc = new SQLMetric(SIZE_METRIC, -1)
-    acc.register(sc, name = Some(s"$name total (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
   def createTimingMetric(sc: SparkContext, name: String): SQLMetric = {
     // The final result of this metric in physical operator UI may looks like:
-    // duration(min, med, max):
+    // duration total (min, med, max):
     // 5s (800ms, 1s, 2s)
     val acc = new SQLMetric(TIMING_METRIC, -1)
-    acc.register(sc, name = Some(s"$name total (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
   def createNanoTimingMetric(sc: SparkContext, name: String): SQLMetric = {
     // Same with createTimingMetric, just normalize the unit of time to millisecond.
     val acc = new SQLMetric(NS_TIMING_METRIC, -1)
-    acc.register(sc, name = Some(s"$name total (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
@@ -150,8 +147,7 @@ object SQLMetrics {
     // probe avg (min, med, max):
     // (1.2, 2.2, 6.3)
     val acc = new SQLMetric(AVERAGE_METRIC)
-    acc.register(sc, name = Some(s"$name (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
@@ -164,13 +160,15 @@ object SQLMetrics {
     metricsType != SUM_METRIC
   }
 
+  private val METRICS_NAME_SUFFIX = "(min, med, max (stageId: taskId))"
+
   /**
    * A function that defines how we aggregate the final accumulator results among all tasks,
    * and represent it in string for a SQL physical operator.
    */
   def stringValue(metricsType: String, values: Array[Long], maxMetrics: Array[Long]): String = {
-    // stringMetric = "(driver)" OR (stage ${stageId}.${attemptId}: task $taskId)
-    val stringMetric = if (maxMetrics.isEmpty) {
+    // taskInfo = "(driver)" OR (stage ${stageId}.${attemptId}: task $taskId)
+    val taskInfo = if (maxMetrics.isEmpty) {
       "(driver)"
     } else {
       s"(stage ${maxMetrics(1)}.${maxMetrics(2)}: task ${maxMetrics(3)})"
@@ -180,18 +178,20 @@ object SQLMetrics {
       numberFormat.format(values.sum)
     } else if (metricsType == AVERAGE_METRIC) {
       val validValues = values.filter(_ > 0)
-      val Seq(min, med, max) = {
-        val metric = if (validValues.isEmpty) {

[spark] branch branch-3.0 updated: [SPARK-31271][UI] fix web ui for driver side SQL metrics

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 7c90ec0  [SPARK-31271][UI] fix web ui for driver side SQL metrics
7c90ec0 is described below

commit 7c90ec065f81c3933eef1f0dd172f1a518b1232b
Author: Wenchen Fan 
AuthorDate: Fri Mar 27 15:45:35 2020 -0700

[SPARK-31271][UI] fix web ui for driver side SQL metrics

### What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/23551, we changed the metrics type of driver-side SQL metrics to size/time etc., which comes with max/min/median info.

This doesn't make sense for driver-side SQL metrics as they have only one value. It makes the web UI hard to read:

![image](https://user-images.githubusercontent.com/3182036/77653892-42db9900-6fab-11ea-8e7f-92f763fa32ff.png)

This PR updates the SQL metrics UI to only display max/min/median if there is more than one metrics value:

![image](https://user-images.githubusercontent.com/3182036/77653975-5f77d100-6fab-11ea-849e-64c935377c8e.png)

### Why are the changes needed?

Makes the UI easier to read

### Does this PR introduce any user-facing change?

no

### How was this patch tested?
manual test

Closes #28037 from cloud-fan/ui.

Authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit c4e98c065c99d2cf840e6006ee5414fbaaba9937)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/execution/metric/SQLMetrics.scala| 60 +++---
 .../spark/sql/execution/ui/SparkPlanGraph.scala|  7 ++-
 .../sql/execution/metric/SQLMetricsSuite.scala | 33 +++-
 .../sql/execution/metric/SQLMetricsTestUtils.scala | 12 ++---
 .../execution/ui/SQLAppStatusListenerSuite.scala   |  9 ++--
 5 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
index 1394e0f..92d2179 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
@@ -116,26 +116,23 @@ object SQLMetrics {
     // data size total (min, med, max):
     // 100GB (100MB, 1GB, 10GB)
     val acc = new SQLMetric(SIZE_METRIC, -1)
-    acc.register(sc, name = Some(s"$name total (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
   def createTimingMetric(sc: SparkContext, name: String): SQLMetric = {
     // The final result of this metric in physical operator UI may looks like:
-    // duration(min, med, max):
+    // duration total (min, med, max):
     // 5s (800ms, 1s, 2s)
     val acc = new SQLMetric(TIMING_METRIC, -1)
-    acc.register(sc, name = Some(s"$name total (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
   def createNanoTimingMetric(sc: SparkContext, name: String): SQLMetric = {
     // Same with createTimingMetric, just normalize the unit of time to millisecond.
     val acc = new SQLMetric(NS_TIMING_METRIC, -1)
-    acc.register(sc, name = Some(s"$name total (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
@@ -150,8 +147,7 @@ object SQLMetrics {
     // probe avg (min, med, max):
     // (1.2, 2.2, 6.3)
     val acc = new SQLMetric(AVERAGE_METRIC)
-    acc.register(sc, name = Some(s"$name (min, med, max (stageId: taskId))"),
-      countFailedValues = false)
+    acc.register(sc, name = Some(name), countFailedValues = false)
     acc
   }
 
@@ -164,13 +160,15 @@ object SQLMetrics {
     metricsType != SUM_METRIC
   }
 
+  private val METRICS_NAME_SUFFIX = "(min, med, max (stageId: taskId))"
+
   /**
    * A function that defines how we aggregate the final accumulator results among all tasks,
    * and represent it in string for a SQL physical operator.
    */
   def stringValue(metricsType: String, values: Array[Long], maxMetrics: Array[Long]): String = {
-    // stringMetric = "(driver)" OR (stage ${stageId}.${attemptId}: task $taskId)
-    val stringMetric = if (maxMetrics.isEmpty) {
+    // taskInfo = "(driver)" OR (stage ${stageId}.${attemptId}: task $taskId)
+    val taskInfo = if (maxMetrics.isEmpty) {
       "(driver)"
     } else {
       s"(stage ${maxMetrics(1)}.${maxMetrics(2)}: task ${maxMetrics(3)})"
@@ -180,18 +178,20 @@ object SQLMetrics {
      numberFormat.format(values.sum)
    } else if (metricsType == AVERAGE_METRIC) {
      val validVa

[spark] branch branch-3.0 updated: [SPARK-31238][SPARK-31284][TEST][FOLLOWUP] Fix readResourceOrcFile to create a local file from resource

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4e13ba9  [SPARK-31238][SPARK-31284][TEST][FOLLOWUP] Fix readResourceOrcFile to create a local file from resource
4e13ba9 is described below

commit 4e13ba90446745fc5a9f46ed1f80c6eefb738795
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 27 18:44:53 2020 -0700

[SPARK-31238][SPARK-31284][TEST][FOLLOWUP] Fix readResourceOrcFile to create a local file from resource

### What changes were proposed in this pull request?

This PR aims to copy a test resource file to a local file in `OrcTest` suite before reading it.
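
A minimal sketch of that approach (the helper name below is made up for illustration; the actual change is in the `OrcTest` diff further down):

```scala
import java.io.File
import org.apache.commons.io.FileUtils

// Illustrative helper (hypothetical name): materialise a classpath resource as a
// local file so that a jar:file: URL never reaches the ORC reader.
def resourceAsLocalOrcPath(resource: String): String = {
  val url = Thread.currentThread().getContextClassLoader.getResource(resource)
  val tmp = File.createTempFile("orc-test", ".orc")
  tmp.deleteOnExit()
  // Works whether the resource is a plain file or packed inside a tests jar.
  FileUtils.copyURLToFile(url, tmp)
  tmp.getAbsolutePath
}

// spark.read.orc(resourceAsLocalOrcPath("test-data/before_1582_ts_v2_4.snappy.orc"))
```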

### Why are the changes needed?

SPARK-31238 and SPARK-31284 added test cases that access a resource file in the `sql/core` module from the `sql/hive` module. In the **Maven** test environment, this causes a failure.
```
- SPARK-31238: compatibility with Spark 2.4 in reading dates *** FAILED ***
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI:
jar:file:/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/sql/core/target/spark-sql_2.12-3.1.0-SNAPSHOT-tests.jar!/test-data/before_1582_date_v2_4.snappy.orc
```

```
- SPARK-31284: compatibility with Spark 2.4 in reading timestamps *** FAILED ***
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI:
jar:file:/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2-hive-2.3/sql/core/target/spark-sql_2.12-3.1.0-SNAPSHOT-tests.jar!/test-data/before_1582_ts_v2_4.snappy.orc
```

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins with Maven.

Closes #28059 from dongjoon-hyun/SPARK-31238.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit d025ddbaa7e7b9746d8e47aeed61ed39d2f09f0e)
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/execution/datasources/orc/OrcTest.scala   | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcTest.scala
index 16772fe..e929f90 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcTest.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcTest.scala
@@ -22,6 +22,7 @@ import java.io.File
 import scala.reflect.ClassTag
 import scala.reflect.runtime.universe.TypeTag
 
+import org.apache.commons.io.FileUtils
 import org.scalatest.BeforeAndAfterAll
 
 import org.apache.spark.sql._
@@ -136,6 +137,10 @@ abstract class OrcTest extends QueryTest with FileBasedDataSourceTest with Befor
 
   protected def readResourceOrcFile(name: String): DataFrame = {
     val url = Thread.currentThread().getContextClassLoader.getResource(name)
-    spark.read.orc(url.toString)
+    // Copy to avoid URISyntaxException when `sql/hive` accesses the resources in `sql/core`
+    val file = File.createTempFile("orc-test", ".orc")
+    file.deleteOnExit();
+    FileUtils.copyURLToFile(url, file)
+    spark.read.orc(file.getAbsolutePath)
   }
 }





[spark] branch master updated (c4e98c0 -> d025ddba)

2020-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c4e98c0  [SPARK-31271][UI] fix web ui for driver side SQL metrics
 add d025ddba [SPARK-31238][SPARK-31284][TEST][FOLLOWUP] Fix readResourceOrcFile to create a local file from resource

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/datasources/orc/OrcTest.scala   | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

