[spark] branch master updated: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a745381  [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
a745381 is described below

commit a745381b9d3dd290057ef3089de7fdb9264f1f8b
Author: WeichenXu
AuthorDate: Wed Jul 31 14:26:18 2019 +0900

    [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0

    ## What changes were proposed in this pull request?

    I removed the deprecated `ImageSchema.readImages` and moved some useful methods from class `ImageSchema` into class `ImageFileFormat`. In PySpark, I renamed the `ImageSchema` class to `ImageUtils` and kept some useful Python methods in it.

    ## How was this patch tested?

    UT.

    Closes #25245 from WeichenXu123/remove_image_schema.

    Authored-by: WeichenXu
    Signed-off-by: HyukjinKwon
---
 .../org/apache/spark/ml/image/ImageSchema.scala    |  72 -
 .../apache/spark/ml/image/ImageSchemaSuite.scala   | 171 -
 .../ml/source/image/ImageFileFormatSuite.scala     |  18 +++
 project/MimaExcludes.scala                         |   6 +-
 python/pyspark/ml/image.py                         |  38 -
 python/pyspark/ml/tests/test_image.py              |  29 ++--
 6 files changed, 42 insertions(+), 292 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala b/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
index a7ddf2f..0313626 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
@@ -191,76 +191,4 @@ object ImageSchema {
       Some(Row(Row(origin, height, width, nChannels, mode, decoded)))
     }
   }
-
-  /**
-   * Read the directory of images from the local or remote source
-   *
-   * @note If multiple jobs are run in parallel with different sampleRatio or recursive flag,
-   * there may be a race condition where one job overwrites the hadoop configs of another.
-   * @note If sample ratio is less than 1, sampling uses a PathFilter that is efficient but
-   * potentially non-deterministic.
-   *
-   * @param path Path to the image directory
-   * @return DataFrame with a single column "image" of images;
-   *         see ImageSchema for the details
-   */
-  @deprecated("use `spark.read.format(\"image\").load(path)` and this `readImages` will be " +
-    "removed in 3.0.0.", "2.4.0")
-  def readImages(path: String): DataFrame = readImages(path, null, false, -1, false, 1.0, 0)
-
-  /**
-   * Read the directory of images from the local or remote source
-   *
-   * @note If multiple jobs are run in parallel with different sampleRatio or recursive flag,
-   * there may be a race condition where one job overwrites the hadoop configs of another.
-   * @note If sample ratio is less than 1, sampling uses a PathFilter that is efficient but
-   * potentially non-deterministic.
-   *
-   * @param path Path to the image directory
-   * @param sparkSession Spark Session, if omitted gets or creates the session
-   * @param recursive Recursive path search flag
-   * @param numPartitions Number of the DataFrame partitions,
-   *                      if omitted uses defaultParallelism instead
-   * @param dropImageFailures Drop the files that are not valid images from the result
-   * @param sampleRatio Fraction of the files loaded
-   * @return DataFrame with a single column "image" of images;
-   *         see ImageSchema for the details
-   */
-  @deprecated("use `spark.read.format(\"image\").load(path)` and this `readImages` will be " +
-    "removed in 3.0.0.", "2.4.0")
-  def readImages(
-      path: String,
-      sparkSession: SparkSession,
-      recursive: Boolean,
-      numPartitions: Int,
-      dropImageFailures: Boolean,
-      sampleRatio: Double,
-      seed: Long): DataFrame = {
-    require(sampleRatio <= 1.0 && sampleRatio >= 0, "sampleRatio should be between 0 and 1")
-
-    val session = if (sparkSession != null) sparkSession else SparkSession.builder().getOrCreate
-    val partitions =
-      if (numPartitions > 0) {
-        numPartitions
-      } else {
-        session.sparkContext.defaultParallelism
-      }
-
-    RecursiveFlag.withRecursiveFlag(recursive, session) {
-      SamplePathFilter.withPathFilter(sampleRatio, session, seed) {
-        val binResult = session.sparkContext.binaryFiles(path, partitions)
-        val streams = if (numPartitions == -1) binResult else binResult.repartition(partitions)
-        val convert = (origin: String, bytes: PortableDataStream) =>
-          decode(origin, bytes.toArray())
-        val images = if (dropImageFailures) {
-
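For readers migrating off `readImages`, a minimal PySpark sketch of the replacement data source path; the directory path is a placeholder, and `dropInvalid` is the image source's analogue of the removed `dropImageFailures` flag:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Replacement for ImageSchema.readImages(path): the built-in "image" source.
images = (spark.read.format("image")
          .option("dropInvalid", True)   # drop files that are not valid images
          .load("/path/to/images"))      # hypothetical directory

images.select("image.origin", "image.height", "image.width").show()
```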
[spark] branch branch-2.3 updated: [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.3 by this push:
     new 78d1bb1  [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress
78d1bb1 is described below

commit 78d1bb188efa55038f63ece70aaa0a5ebaa75f5f
Author: gengjiaan
AuthorDate: Wed Jul 31 12:17:44 2019 +0900

    [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress

    ## What changes were proposed in this pull request?

    The latest docs http://spark.apache.org/docs/latest/configuration.html contain this description:

    spark.ui.showConsoleProgress | true | Show the progress bar in the console. The progress bar shows the progress of stages that run for longer than 500ms. If multiple stages run at the same time, multiple progress bars will be displayed on the same line.
    -- | -- | --

    But the class `org.apache.spark.internal.config.UI` defines the config `spark.ui.showConsoleProgress` as below:

    ```
    val UI_SHOW_CONSOLE_PROGRESS = ConfigBuilder("spark.ui.showConsoleProgress")
      .doc("When true, show the progress bar in the console.")
      .booleanConf
      .createWithDefault(false)
    ```

    So there is a small mistake in the docs that could confuse readers.

    ## How was this patch tested?

    No UT needed.

    Closes #25297 from beliefer/inconsistent-desc-showConsoleProgress.

    Authored-by: gengjiaan
    Signed-off-by: HyukjinKwon
    (cherry picked from commit dba4375359a2dfed1f009edc3b1bcf6b3253fe02)
    Signed-off-by: HyukjinKwon
---
 docs/configuration.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 42e6d87..822f903 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -867,11 +867,13 @@ Apart from these, the following properties are also available, and may be useful
   spark.ui.showConsoleProgress
-  true
+  false
 
   Show the progress bar in the console. The progress bar shows the progress of stages
   that run for longer than 500ms. If multiple stages run at the same time, multiple
   progress bars will be displayed on the same line.
+
+  Note: In shell environment, the default value of spark.ui.showConsoleProgress is true.
-
[spark] branch branch-2.4 updated: [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 992b1bb  [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress
992b1bb is described below

commit 992b1bb3697f6ec9fb5fc2853c33b591715dfd76
Author: gengjiaan
AuthorDate: Wed Jul 31 12:17:44 2019 +0900

    [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress

    ## What changes were proposed in this pull request?

    The latest docs http://spark.apache.org/docs/latest/configuration.html contain this description:

    spark.ui.showConsoleProgress | true | Show the progress bar in the console. The progress bar shows the progress of stages that run for longer than 500ms. If multiple stages run at the same time, multiple progress bars will be displayed on the same line.
    -- | -- | --

    But the class `org.apache.spark.internal.config.UI` defines the config `spark.ui.showConsoleProgress` as below:

    ```
    val UI_SHOW_CONSOLE_PROGRESS = ConfigBuilder("spark.ui.showConsoleProgress")
      .doc("When true, show the progress bar in the console.")
      .booleanConf
      .createWithDefault(false)
    ```

    So there is a small mistake in the docs that could confuse readers.

    ## How was this patch tested?

    No UT needed.

    Closes #25297 from beliefer/inconsistent-desc-showConsoleProgress.

    Authored-by: gengjiaan
    Signed-off-by: HyukjinKwon
    (cherry picked from commit dba4375359a2dfed1f009edc3b1bcf6b3253fe02)
    Signed-off-by: HyukjinKwon
---
 docs/configuration.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 73bae77..d481986 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -891,11 +891,13 @@ Apart from these, the following properties are also available, and may be useful
   spark.ui.showConsoleProgress
-  true
+  false
 
   Show the progress bar in the console. The progress bar shows the progress of stages
   that run for longer than 500ms. If multiple stages run at the same time, multiple
   progress bars will be displayed on the same line.
+
+  Note: In shell environment, the default value of spark.ui.showConsoleProgress is true.
-
[spark] branch master updated: [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new dba4375  [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress
dba4375 is described below

commit dba4375359a2dfed1f009edc3b1bcf6b3253fe02
Author: gengjiaan
AuthorDate: Wed Jul 31 12:17:44 2019 +0900

    [MINOR][CORE][DOCS] Fix inconsistent description of showConsoleProgress

    ## What changes were proposed in this pull request?

    The latest docs http://spark.apache.org/docs/latest/configuration.html contain this description:

    spark.ui.showConsoleProgress | true | Show the progress bar in the console. The progress bar shows the progress of stages that run for longer than 500ms. If multiple stages run at the same time, multiple progress bars will be displayed on the same line.
    -- | -- | --

    But the class `org.apache.spark.internal.config.UI` defines the config `spark.ui.showConsoleProgress` as below:

    ```
    val UI_SHOW_CONSOLE_PROGRESS = ConfigBuilder("spark.ui.showConsoleProgress")
      .doc("When true, show the progress bar in the console.")
      .booleanConf
      .createWithDefault(false)
    ```

    So there is a small mistake in the docs that could confuse readers.

    ## How was this patch tested?

    No UT needed.

    Closes #25297 from beliefer/inconsistent-desc-showConsoleProgress.

    Authored-by: gengjiaan
    Signed-off-by: HyukjinKwon
---
 docs/configuration.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 10886241..93a4fcc 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1062,11 +1062,13 @@ Apart from these, the following properties are also available, and may be useful
   spark.ui.showConsoleProgress
-  true
+  false
 
   Show the progress bar in the console. The progress bar shows the progress of stages
   that run for longer than 500ms. If multiple stages run at the same time, multiple
   progress bars will be displayed on the same line.
+
+  Note: In shell environment, the default value of spark.ui.showConsoleProgress is true.
-
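Since the effective default now differs between plain applications (false) and the shells (true), a short PySpark sketch of enabling the progress bar explicitly outside a shell:

```python
from pyspark.sql import SparkSession

# Opt in to the console progress bar for a non-shell application, where the
# documented (and actual) default is false.
spark = (SparkSession.builder
         .config("spark.ui.showConsoleProgress", "true")
         .getOrCreate())
```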
[spark] branch master updated: [SPARK-28038][SQL][TEST] Port text.sql
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 261e113  [SPARK-28038][SQL][TEST] Port text.sql
261e113 is described below

commit 261e113449cf1ae84feb073caff01cc27cb5d10f
Author: Yuming Wang
AuthorDate: Wed Jul 31 11:36:26 2019 +0900

    [SPARK-28038][SQL][TEST] Port text.sql

    ## What changes were proposed in this pull request?

    This PR ports text.sql from the PostgreSQL regression tests:
    https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/text.sql

    The expected results can be found at:
    https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/text.out

    When porting the test cases, we found one PostgreSQL-specific feature that does not exist in Spark SQL:

    [SPARK-28037](https://issues.apache.org/jira/browse/SPARK-28037): Add built-in String Functions: quote_literal

    We also found three inconsistent behaviors:

    [SPARK-27930](https://issues.apache.org/jira/browse/SPARK-27930): Spark SQL's format_string can not fully support PostgreSQL's format
    [SPARK-28036](https://issues.apache.org/jira/browse/SPARK-28036): Built-in udf left/right has inconsistent behavior
    [SPARK-28033](https://issues.apache.org/jira/browse/SPARK-28033): String concatenation should have lower priority than other operators

    ## How was this patch tested?

    N/A

    Closes #24862 from wangyum/SPARK-28038.

    Authored-by: Yuming Wang
    Signed-off-by: HyukjinKwon
---
 .../test/resources/sql-tests/inputs/pgSQL/text.sql | 137 ++++
 .../resources/sql-tests/results/pgSQL/text.sql.out | 375 +
 2 files changed, 512 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/text.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/text.sql
new file mode 100644
index 000..04d3acc
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/text.sql
@@ -0,0 +1,137 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- TEXT
+-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/text.sql
+
+SELECT string('this is a text string') = string('this is a text string') AS true;
+
+SELECT string('this is a text string') = string('this is a text strin') AS `false`;
+
+CREATE TABLE TEXT_TBL (f1 string) USING parquet;
+
+INSERT INTO TEXT_TBL VALUES ('doh!');
+INSERT INTO TEXT_TBL VALUES ('hi de ho neighbor');
+
+SELECT '' AS two, * FROM TEXT_TBL;
+
+-- As of 8.3 we have removed most implicit casts to text, so that for example
+-- this no longer works:
+-- Spark SQL implicit cast integer to string
+select length(42);
+
+-- But as a special exception for usability's sake, we still allow implicit
+-- casting to text in concatenations, so long as the other input is text or
+-- an unknown literal. So these work:
+-- [SPARK-28033] String concatenation low priority than other arithmeticBinary
+select string('four: ') || 2+2;
+select 'four: ' || 2+2;
+
+-- but not this:
+-- Spark SQL implicit cast both side to string
+select 3 || 4.0;
+
+/*
+ * various string functions
+ */
+select concat('one');
+-- Spark SQL does not support YYYYMMDD, we replace it to yyyyMMdd
+select concat(1,2,3,'hello',true, false, to_date('20100309','yyyyMMdd'));
+select concat_ws('#','one');
+select concat_ws('#',1,2,3,'hello',true, false, to_date('20100309','yyyyMMdd'));
+select concat_ws(',',10,20,null,30);
+select concat_ws('',10,20,null,30);
+select concat_ws(NULL,10,20,null,30) is null;
+select reverse('abcde');
+-- [SPARK-28036] Built-in udf left/right has inconsistent behavior
+-- [SPARK-28479] Parser error when enabling ANSI mode
+set spark.sql.parser.ansi.enabled=false;
+select i, left('ahoj', i), right('ahoj', i) from range(-5, 6) t(i) order by i;
+set spark.sql.parser.ansi.enabled=true;
+-- [SPARK-28037] Add built-in String Functions: quote_literal
+-- select quote_literal('');
+-- select quote_literal('abc''');
+-- select quote_literal(e'\\');
+
+-- Skip these tests because Spark does not support variadic labeled argument
+-- check variadic labeled argument
+-- select concat(variadic array[1,2,3]);
+-- select concat_ws(',', variadic array[1,2,3]);
+-- select concat_ws(',', variadic NULL::int[]);
+-- select concat(variadic NULL::int[]) is NULL;
+-- select concat(variadic '{}'::int[]) = '';
+--should fail
+-- select concat_ws(',', variadic 10);
+
+-- [SPARK-27930] Replace format to format_string
+/*
+ * format
+ */
+select format_string(NULL);
+select format_string('Hello');
+select format_string('Hello %s', 'World');
+select format_string('Hello %%');
+select format_string('Hello ');
+-- should fail
+select format_string('Hello %s %s', 'World');
+select format_string('Hello %s');
+select format_string('Hello %x', 20);
+--
[spark] branch master updated: [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3b14088  [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon
3b14088 is described below

commit 3b140885410362fced9d98fca61d6a357de604af
Author: WeichenXu
AuthorDate: Wed Jul 31 09:10:24 2019 +0900

    [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon

    ## What changes were proposed in this pull request?

    The PySpark worker daemon reads from stdin the worker PIDs to kill:
    https://github.com/apache/spark/blob/1bb60ab8392adf8b896cc04fb1d060620cf09d8a/python/pyspark/daemon.py#L127

    However, the worker process is forked from the worker daemon process and we didn't close stdin on the child after the fork. This means the child and the user program can read stdin as well, which blocks the daemon from receiving the PID to kill. This can cause issues because the task reaper might detect that the task was not terminated and eventually kill the JVM.

    This PR fixes this by redirecting the standard input of the forked child to devnull.

    ## How was this patch tested?

    Manually tested. In `pyspark`, run:

    ```
    import subprocess
    def task(_):
        subprocess.check_output(["cat"])

    sc.parallelize(range(1), 1).mapPartitions(task).count()
    ```

    Before: the job gets stuck; pressing Ctrl+C exits the job, but the Python worker process does not exit.

    After: the job finishes correctly, the "cat" prints nothing (because the dummy stdin is "/dev/null"), and the Python worker process exits normally.

    Closes #25138 from WeichenXu123/SPARK-26175.

    Authored-by: WeichenXu
    Signed-off-by: HyukjinKwon
---
 python/pyspark/daemon.py             | 15 +++++++++++++++
 python/pyspark/sql/tests/test_udf.py | 12 ++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/python/pyspark/daemon.py b/python/pyspark/daemon.py
index 6f42ad3..97b6b25 100644
--- a/python/pyspark/daemon.py
+++ b/python/pyspark/daemon.py
@@ -160,6 +160,21 @@ def manager():
             if pid == 0:
                 # in child process
                 listen_sock.close()
+
+                # It should close the standard input in the child process so that
+                # Python native function executions stay intact.
+                #
+                # Note that if we just close the standard input (file descriptor 0),
+                # the lowest file descriptor (file descriptor 0) will be allocated,
+                # later when other file descriptors should happen to open.
+                #
+                # Therefore, here we redirects it to '/dev/null' by duplicating
+                # another file descriptor for '/dev/null' to the standard input (0).
+                # See SPARK-26175.
+                devnull = open(os.devnull, 'r')
+                os.dup2(devnull.fileno(), 0)
+                devnull.close()
+
                 try:
                     # Acknowledge that the fork was successful
                     outfile = sock.makefile(mode="wb")
diff --git a/python/pyspark/sql/tests/test_udf.py b/python/pyspark/sql/tests/test_udf.py
index 803d471..1999311 100644
--- a/python/pyspark/sql/tests/test_udf.py
+++ b/python/pyspark/sql/tests/test_udf.py
@@ -616,6 +616,18 @@ class UDFTests(ReusedSQLTestCase):
         self.spark.range(1).select(f()).collect()
 
+    def test_worker_original_stdin_closed(self):
+        # Test if it closes the original standard input of worker inherited from the daemon,
+        # and replaces it with '/dev/null'. See SPARK-26175.
+        def task(iterator):
+            import sys
+            res = sys.stdin.read()
+            # Because the standard input is '/dev/null', it reaches to EOF.
+            assert res == '', "Expect read EOF from stdin."
+            return iterator
+
+        self.sc.parallelize(range(1), 1).mapPartitions(task).count()
+
 
 class UDFInitializationTests(unittest.TestCase):
     def tearDown(self):
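To make the fd-0 subtlety above concrete, here is a small standalone Python sketch (not Spark code) of the hazard and of the dup2-based fix the daemon uses:

```python
import os

# Hazard sketch: after closing stdin, descriptor 0 becomes the lowest free
# fd, so an unrelated open() silently turns into the process's "stdin".
os.close(0)
probe = open(os.devnull, "r")
assert probe.fileno() == 0  # the unrelated file now occupies fd 0

# The daemon's fix: duplicate /dev/null onto fd 0 so it stays occupied and
# anything reading stdin -- like the "cat" above -- just sees EOF.
devnull = open(os.devnull, "r")
os.dup2(devnull.fileno(), 0)  # closes the old fd 0 and points it at /dev/null
devnull.close()
```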
[spark] branch branch-2.4 updated (5f4feeb -> 9d9c5a5)
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5f4feeb  [SPARK-25474][SQL][2.4] Support `spark.sql.statistics.fallBackToHdfs` in data source tables
 add 9d9c5a5  [SPARK-26152][CORE][2.4] Synchronize Worker Cleanup with Worker Shutdown

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/worker/Worker.scala    | 68 +-
 1 file changed, 39 insertions(+), 29 deletions(-)
[spark] branch master updated: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new abef84a  [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
abef84a is described below

commit abef84a868e9e15f346eea315bbab0ec8ac8e389
Author: mcheah
AuthorDate: Tue Jul 30 14:17:30 2019 -0700

    [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

    ## What changes were proposed in this pull request?

    As part of the shuffle storage API proposed in SPARK-25299, this introduces an API for persisting shuffle data in arbitrary storage systems. This patch introduces several concepts:

    * `ShuffleDataIO`, which is the root of the entire plugin tree that will be proposed over the course of the shuffle API project.
    * `ShuffleExecutorComponents` - the subset of plugins for managing shuffle-related components for each executor. This will in turn instantiate shuffle readers and writers.
    * `ShuffleMapOutputWriter` interface - instantiated once per map task. This provides child `ShufflePartitionWriter` instances for persisting the bytes for each partition in the map task.

    The default implementation of these plugins exactly mirrors what was done by the existing shuffle writing code - namely, writing the data to local disk and writing an index file. We leverage the APIs in the `BypassMergeSortShuffleWriter` only. Follow-up PRs will use the APIs in `SortShuffleWriter` and `UnsafeShuffleWriter`, but are left as future work to minimize the review surface area.

    ## How was this patch tested?

    New unit tests were added. Micro-benchmarks indicate there's no slowdown in the affected code paths.

    Closes #25007 from mccheah/spark-shuffle-writer-refactor.

    Lead-authored-by: mcheah
    Co-authored-by: mccheah
    Signed-off-by: Marcelo Vanzin
---
 .../apache/spark/shuffle/api/ShuffleDataIO.java    |  49
 .../shuffle/api/ShuffleExecutorComponents.java     |  55 +
 .../spark/shuffle/api/ShuffleMapOutputWriter.java  |  71 ++
 .../spark/shuffle/api/ShufflePartitionWriter.java  |  98
 .../shuffle/api/WritableByteChannelWrapper.java    |  42
 .../shuffle/sort/BypassMergeSortShuffleWriter.java | 173 +-
 .../shuffle/sort/io/LocalDiskShuffleDataIO.java    |  40
 .../io/LocalDiskShuffleExecutorComponents.java     |  71 ++
 .../sort/io/LocalDiskShuffleMapOutputWriter.java   | 261 +
 .../org/apache/spark/internal/config/package.scala |   7 +
 .../spark/shuffle/sort/SortShuffleManager.scala    |  25 +-
 .../main/scala/org/apache/spark/util/Utils.scala   |  30 ++-
 .../test/scala/org/apache/spark/ShuffleSuite.scala |  16 +-
 .../sort/BypassMergeSortShuffleWriterSuite.scala   | 149 +++-
 .../io/LocalDiskShuffleMapOutputWriterSuite.scala  | 147
 15 files changed, 1087 insertions(+), 147 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
new file mode 100644
index 000..e9e50ec
--- /dev/null
+++ b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle.api;
+
+import org.apache.spark.annotation.Private;
+
+/**
+ * :: Private ::
+ * An interface for plugging in modules for storing and reading temporary shuffle data.
+ *
+ * This is the root of a plugin system for storing shuffle bytes to arbitrary storage
+ * backends in the sort-based shuffle algorithm implemented by the
+ * {@link org.apache.spark.shuffle.sort.SortShuffleManager}. If another shuffle algorithm is
+ * needed instead of sort-based shuffle, one should implement
+ * {@link org.apache.spark.shuffle.ShuffleManager} instead.
+ *
+ * A single instance of this module is loaded per process in the Spark application.
+ * The default implementation reads and writes shuffle data from the local disks of
+ * the executor, and is the implementation of shuffle file
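For context, a hedged PySpark sketch of the plugin point this commit wires up; I am assuming the config key added in `internal/config/package.scala` (visible in the diffstat) is the standard one for this feature, and the class named is the default local-disk implementation added here:

```python
from pyspark.sql import SparkSession

# Select the shuffle IO plugin. LocalDiskShuffleDataIO is the default
# local-disk implementation; a custom ShuffleDataIO implementation could be
# substituted here to persist shuffle bytes in an external storage backend.
spark = (SparkSession.builder
         .config("spark.shuffle.sort.io.plugin.class",
                 "org.apache.spark.shuffle.sort.io.LocalDiskShuffleDataIO")
         .getOrCreate())
```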
[spark] branch master updated (44c28d7 -> 121f933)
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 44c28d7  [SPARK-28399][ML][PYTHON] implement RobustScaler
 add 121f933  [SPARK-28525][DEPLOY] Allow Launcher to be applied Java options

No new revisions were added by this update.

Summary of changes:
 bin/spark-class                                        | 18 +++---
 conf/spark-env.sh.template                             |  3 +++
 .../src/main/java/org/apache/spark/launcher/Main.java  |  3 +++
 3 files changed, 21 insertions(+), 3 deletions(-)
[spark] branch master updated (70910e6 -> 44c28d7)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 70910e6  [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…
 add 44c28d7  [SPARK-28399][ML][PYTHON] implement RobustScaler

No new revisions were added by this update.

Summary of changes:
 docs/ml-features.md                                |  45 ++
 ...erExample.java => JavaRobustScalerExample.java} |  22 +-
 ..._scaler_example.py => robust_scaler_example.py} |  13 +-
 ...alerExample.scala => RobustScalerExample.scala} |  18 +-
 .../org/apache/spark/ml/feature/RobustScaler.scala | 288 +++++
 .../apache/spark/ml/feature/StandardScaler.scala   | 131 +++---
 .../spark/mllib/feature/StandardScaler.scala       | 116 -
 .../spark/ml/feature/RobustScalerSuite.scala       | 209 +++
 .../spark/ml/feature/StandardScalerSuite.scala     |   4 +-
 python/pyspark/ml/feature.py                       | 162
 10 files changed, 878 insertions(+), 130 deletions(-)
 copy examples/src/main/java/org/apache/spark/examples/ml/{JavaStandardScalerExample.java => JavaRobustScalerExample.java} (73%)
 copy examples/src/main/python/ml/{standard_scaler_example.py => robust_scaler_example.py} (75%)
 copy examples/src/main/scala/org/apache/spark/examples/ml/{StandardScalerExample.scala => RobustScalerExample.scala} (79%)
 create mode 100644 mllib/src/main/scala/org/apache/spark/ml/feature/RobustScaler.scala
 create mode 100644 mllib/src/test/scala/org/apache/spark/ml/feature/RobustScalerSuite.scala
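A minimal PySpark sketch of the new scaler, assuming the Python API added in this change follows the usual Estimator fit/transform pattern (column names and data are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import RobustScaler
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(Vectors.dense([1.0, 100.0]),),
     (Vectors.dense([2.0, 200.0]),),
     (Vectors.dense([3.0, 10000.0]),)],  # second feature has an outlier
    ["features"])

# RobustScaler scales with quantile-range statistics (median/IQR) rather
# than mean/stddev, so the outlier barely affects the fitted scaling.
scaler = RobustScaler(inputCol="features", outputCol="scaled")
model = scaler.fit(df)
model.transform(df).show(truncate=False)
```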
[spark] branch master updated: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…
This is an automated email from the ASF dual-hosted git repository.

irashid pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 70910e6  [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…
70910e6 is described below

commit 70910e6ad00f0de4075217d5305d87a477ff1dc4
Author: pgandhi
AuthorDate: Tue Jul 30 09:54:51 2019 -0500

    [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks… more efficiently

    This PR improves the performance of scheduling speculative tasks to be O(1) instead of O(numSpeculativeTasks), using the same approach used for scheduling regular tasks. The performance of this method is particularly important because a lock is held on the TaskSchedulerImpl, which is a bottleneck for all scheduling operations. We ran a join query on a large dataset with speculation enabled, and out of 10 tasks for the ShuffleMapStage, the maximum number of speculatable tasks that wa [...]

    In particular, this works by storing a separate stack of tasks by executor, node, and rack locality preferences. Then, when trying to schedule a speculative task, rather than scanning all speculative tasks to find ones which match the given executor (or node, or rack) preference, we can jump to a quick check of tasks matching the resource offer. This technique was already used for regular tasks -- this change refactors the code to allow sharing between regular and speculative task execution.

    ## What changes were proposed in this pull request?

    Split the main queue "speculatableTasks" into 5 separate queues based on locality preference, similar to how normal tasks are enqueued. Thus, the "dequeueSpeculativeTask" method avoids performing locality checks for each task at runtime and simply returns the preferable task to be executed.

    ## How was this patch tested?

    We ran a Spark job that performed a join on a 10 TB dataset to test the code change.

    Original Code:
    https://user-images.githubusercontent.com/8190/51873321-572df280-2322-11e9-9149-0aae08d5edc6.png

    Optimized Code:
    https://user-images.githubusercontent.com/8190/51873343-6745d200-2322-11e9-947b-2cfd0f06bcab.png

    As you can see, the run time of the ShuffleMapStage came down from roughly 40 min to 6 min, reducing the overall running time of the Spark job by a significant amount.

    Another example for the same job:

    Original Code:
    https://user-images.githubusercontent.com/8190/51873355-70cf3a00-2322-11e9-9c3a-af035449a306.png

    Optimized Code:
    https://user-images.githubusercontent.com/8190/51873367-7dec2900-2322-11e9-8d07-1b1b49285f71.png

    Closes #23677 from pgandhi999/SPARK-26755.

    Lead-authored-by: pgandhi
    Co-authored-by: pgandhi
    Signed-off-by: Imran Rashid
---
 .../apache/spark/scheduler/TaskSetManager.scala    | 292 -
 .../scheduler/OutputCommitCoordinatorSuite.scala   |  12 +-
 .../spark/scheduler/TaskSetManagerSuite.scala      | 122 -
 3 files changed, 242 insertions(+), 184 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
index e7645fc..79a1afc 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
@@ -131,37 +131,17 @@ private[spark] class TaskSetManager(
   // same time for a barrier stage.
   private[scheduler] def isBarrier = taskSet.tasks.nonEmpty && taskSet.tasks(0).isBarrier
 
-  // Set of pending tasks for each executor. These collections are actually
-  // treated as stacks, in which new tasks are added to the end of the
-  // ArrayBuffer and removed from the end. This makes it faster to detect
-  // tasks that repeatedly fail because whenever a task failed, it is put
-  // back at the head of the stack. These collections may contain duplicates
-  // for two reasons:
-  // (1): Tasks are only removed lazily; when a task is launched, it remains
-  // in all the pending lists except the one that it was launched from.
-  // (2): Tasks may be re-added to these lists multiple times as a result
-  // of failures.
-  // Duplicates are handled in dequeueTaskFromList, which ensures that a
-  // task hasn't already started running before launching it.
-  private val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]
-
-  // Set of pending tasks for each host. Similar to pendingTasksForExecutor,
-  // but at host level.
-  private val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
-
-  // Set of pending tasks for each rack -- similar to the above.
-  private val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]
-
-
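A hedged Python sketch of the data-structure idea behind the refactor; the names are illustrative rather than Spark's (the real code keeps Scala HashMaps of ArrayBuffers inside TaskSetManager), and `is_running` stands in for the lazy-removal check in `dequeueTaskFromList`:

```python
from collections import defaultdict

class PendingTasks:
    """Per-locality stacks of pending task indexes; one structure now
    serves both regular and speculative tasks."""
    def __init__(self):
        self.for_executor = defaultdict(list)  # executorId -> task indexes
        self.for_host = defaultdict(list)      # host -> task indexes
        self.no_prefs = []                     # tasks without locality prefs
        self.for_rack = defaultdict(list)      # rack -> task indexes
        self.all = []                          # every pending task

def dequeue_for_executor(pending, executor_id, is_running):
    """Amortized O(1) dequeue for a resource offer on one executor.
    Entries may be stale duplicates, so skip tasks that already started."""
    stack = pending.for_executor[executor_id]
    while stack:
        index = stack.pop()
        if not is_running(index):
            return index
    return None
```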
[spark] branch master updated: [SPARK-28071][SQL][TEST] Port strings.sql
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2656c9d  [SPARK-28071][SQL][TEST] Port strings.sql
2656c9d is described below

commit 2656c9d304b59584c331b923e8536e4093d83f81
Author: Yuming Wang
AuthorDate: Tue Jul 30 18:54:14 2019 +0900

    [SPARK-28071][SQL][TEST] Port strings.sql

    ## What changes were proposed in this pull request?

    This PR ports strings.sql from the PostgreSQL regression tests:
    https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/strings.sql

    The expected results can be found at:
    https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/strings.out

    When porting the test cases, we found nine PostgreSQL-specific features that do not exist in Spark SQL:

    [SPARK-28076](https://issues.apache.org/jira/browse/SPARK-28076): Support regular expression substring
    [SPARK-28078](https://issues.apache.org/jira/browse/SPARK-28078): Add support for the other 4 REGEXP functions
    [SPARK-28412](https://issues.apache.org/jira/browse/SPARK-28412): OVERLAY function support byte array
    [SPARK-28083](https://issues.apache.org/jira/browse/SPARK-28083): ANSI SQL: LIKE predicate: ESCAPE clause
    [SPARK-28087](https://issues.apache.org/jira/browse/SPARK-28087): Add support for split_part
    [SPARK-28122](https://issues.apache.org/jira/browse/SPARK-28122): Missing `sha224`/`sha256`/`sha384`/`sha512` functions
    [SPARK-28123](https://issues.apache.org/jira/browse/SPARK-28123): Add support for string functions: btrim
    [SPARK-28448](https://issues.apache.org/jira/browse/SPARK-28448): Implement ILIKE operator
    [SPARK-28449](https://issues.apache.org/jira/browse/SPARK-28449): Missing escape_string_warning and standard_conforming_strings config

    We also found five inconsistent behaviors:

    [SPARK-27952](https://issues.apache.org/jira/browse/SPARK-27952): String Functions: regexp_replace is not compatible
    [SPARK-28121](https://issues.apache.org/jira/browse/SPARK-28121): decode cannot accept 'escape' as charset
    [SPARK-27930](https://issues.apache.org/jira/browse/SPARK-27930): Replace `strpos` with `locate` or `position` in Spark SQL
    [SPARK-27930](https://issues.apache.org/jira/browse/SPARK-27930): Replace `to_hex` with `hex` in Spark SQL
    [SPARK-28451](https://issues.apache.org/jira/browse/SPARK-28451): `substr` returns different values

    ## How was this patch tested?

    N/A

    Closes #24923 from wangyum/SPARK-28071.

    Authored-by: Yuming Wang
    Signed-off-by: Takeshi Yamamuro
---
 .../resources/sql-tests/inputs/pgSQL/strings.sql   | 660 +++
 .../sql-tests/results/pgSQL/strings.sql.out        | 718 +
 2 files changed, 1378 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/strings.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/strings.sql
new file mode 100644
index 000..a684428
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/strings.sql
@@ -0,0 +1,660 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+-- STRINGS
+-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/strings.sql
+-- Test various data entry syntaxes.
+--
+
+-- SQL string continuation syntax
+-- E021-03 character string literals
+SELECT 'first line'
+' - next line'
+ ' - third line'
+ AS `Three lines to one`;
+
+-- Spark SQL support this string continuation syntax
+-- illegal string continuation syntax
+SELECT 'first line'
+' - next line' /* this comment is not allowed here */
+' - third line'
+ AS `Illegal comment within continuation`;
+
+-- [SPARK-28447] ANSI SQL: Unicode escapes in literals
+-- Unicode escapes
+-- SET standard_conforming_strings TO on;
+
+-- SELECT U&'d\0061t\+61' AS U&"d\0061t\+61";
+-- SELECT U&'d!0061t\+61' UESCAPE '!' AS U&"d*0061t\+61" UESCAPE '*';
+
+-- SELECT U&' \' UESCAPE '!' AS "tricky";
+-- SELECT 'tricky' AS U&"\" UESCAPE '!';
+
+-- SELECT U&'wrong: \061';
+-- SELECT U&'wrong: \+0061';
+-- SELECT U&'wrong: +0061' UESCAPE '+';
+
+-- SET standard_conforming_strings TO off;
+
+-- SELECT U&'d\0061t\+61' AS U&"d\0061t\+61";
+-- SELECT U&'d!0061t\+61' UESCAPE '!' AS U&"d*0061t\+61" UESCAPE '*';
+
+-- SELECT U&' \' UESCAPE '!' AS "tricky";
+-- SELECT 'tricky' AS U&"\" UESCAPE '!';
+
+-- SELECT U&'wrong: \061';
+-- SELECT U&'wrong: \+0061';
+-- SELECT U&'wrong: +0061' UESCAPE '+';
+
+-- RESET standard_conforming_strings;
+
+-- Spark SQL only support escape mode
+-- bytea
+-- SET bytea_output TO hex;
+-- SELECT E'\\xDeAdBeEf'::bytea;
+-- SELECT E'\\x De Ad Be Ef '::bytea;
+-- SELECT E'\\xDeAdBeE'::bytea;
+-- SELECT E'\\xDeAdBeEx'::bytea;
+-- SELECT
[spark] branch master updated: [SPARK-28178][SQL] DataSourceV2: DataFrameWriter.insertInto
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 749b1d3  [SPARK-28178][SQL] DataSourceV2: DataFrameWriter.insertInto
749b1d3 is described below

commit 749b1d3a45584554a2fd8c47e232f5095316e10c
Author: John Zhuge
AuthorDate: Tue Jul 30 17:22:33 2019 +0800

    [SPARK-28178][SQL] DataSourceV2: DataFrameWriter.insertInto

    ## What changes were proposed in this pull request?

    Support multiple catalogs in the following InsertInto use cases:

    - DataFrameWriter.insertInto("catalog.db.tbl")

    Support matrix:

    SaveMode  | Partitioned Table | Partition Overwrite Mode | Action
    --------- | ----------------- | ------------------------ | ------
    Append    | *                 | *                        | AppendData
    Overwrite | no                | *                        | OverwriteByExpression(true)
    Overwrite | yes               | STATIC                   | OverwriteByExpression(true)
    Overwrite | yes               | DYNAMIC                  | OverwritePartitionsDynamic

    ## How was this patch tested?

    New tests. All existing catalyst and sql/core tests.

    Closes #24980 from jzhuge/SPARK-28178-pr.

    Authored-by: John Zhuge
    Signed-off-by: Wenchen Fan
---
 .../org/apache/spark/sql/DataFrameWriter.scala     |  48 -
 .../sources/v2/DataSourceV2DataFrameSuite.scala    | 107 +
 2 files changed, 151 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index b7b1390..549c54f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -22,16 +22,19 @@ import java.util.{Locale, Properties, UUID}
 import scala.collection.JavaConverters._
 
 import org.apache.spark.annotation.Stable
+import org.apache.spark.sql.catalog.v2.{CatalogPlugin, Identifier}
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.{EliminateSubqueryAliases, UnresolvedRelation}
 import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.expressions.Literal
-import org.apache.spark.sql.catalyst.plans.logical.{AppendData, InsertIntoTable, LogicalPlan, OverwriteByExpression}
+import org.apache.spark.sql.catalyst.plans.logical.{AppendData, InsertIntoTable, LogicalPlan, OverwriteByExpression, OverwritePartitionsDynamic}
 import org.apache.spark.sql.execution.SQLExecution
 import org.apache.spark.sql.execution.command.DDLUtils
 import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource, DataSourceUtils, LogicalRelation}
 import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.internal.SQLConf.PartitionOverwriteMode
 import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister}
+import org.apache.spark.sql.sources.BaseRelation
 import org.apache.spark.sql.sources.v2._
 import org.apache.spark.sql.sources.v2.TableCapability._
 import org.apache.spark.sql.types.StructType
@@ -356,10 +359,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
    * @since 1.4.0
    */
   def insertInto(tableName: String): Unit = {
-    insertInto(df.sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName))
-  }
+    import df.sparkSession.sessionState.analyzer.{AsTableIdentifier, CatalogObjectIdentifier}
 
-  private def insertInto(tableIdent: TableIdentifier): Unit = {
     assertNotBucketed("insertInto")
 
     if (partitioningColumns.isDefined) {
@@ -370,6 +371,45 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       )
     }
 
+    df.sparkSession.sessionState.sqlParser.parseMultipartIdentifier(tableName) match {
+      case CatalogObjectIdentifier(Some(catalog), ident) =>
+        insertInto(catalog, ident)
+      case AsTableIdentifier(tableIdentifier) =>
+        insertInto(tableIdentifier)
+    }
+  }
+
+  private def insertInto(catalog: CatalogPlugin, ident: Identifier): Unit = {
+    import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
+
+    val table = DataSourceV2Relation.create(catalog.asTableCatalog.loadTable(ident))
+
+    val command = modeForDSV2 match {
+      case SaveMode.Append =>
+        AppendData.byName(table, df.logicalPlan)
+
+      case SaveMode.Overwrite =>
+        val conf = df.sparkSession.sessionState.conf
+        val dynamicPartitionOverwrite = table.table.partitioning.size > 0 &&
+          conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
+
+        if (dynamicPartitionOverwrite) {
+          OverwritePartitionsDynamic.byName(table, df.logicalPlan)
+        } else {
+          OverwriteByExpression.byName(table, df.logicalPlan, Literal(true))
+        }
+
+      case other =>
+        throw new AnalysisException(s"insertInto does not support $other mode, " +
+          s"please use Append or
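A short PySpark sketch of the use case this enables, assuming a v2 catalog has been registered as `testcat` (e.g. via a `spark.sql.catalog.testcat` config pointing at an implementation class); names are illustrative:

```python
# Append into a v2 catalog table through a multi-part identifier.
df.write.insertInto("testcat.ns.tbl")

# Overwrite: per the support matrix above, a partitioned table with
# dynamic partition-overwrite mode yields OverwritePartitionsDynamic,
# otherwise OverwriteByExpression(true).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df.write.mode("overwrite").insertInto("testcat.ns.tbl")
```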
[spark] branch branch-2.3 updated (416aba4 -> 998ac04)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 416aba4  [SPARK-25474][SQL][2.3] Support `spark.sql.statistics.fallBackToHdfs` in data source tables
 add 998ac04  [SPARK-28156][SQL][BACKPORT-2.3] Self-join should not miss cached view

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  2 -
 .../sql/catalyst/analysis/CheckAnalysis.scala      | 43
 .../apache/spark/sql/catalyst/analysis/view.scala  | 80 +++---
 .../plans/logical/basicLogicalOperators.scala      |  3 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 31 +
 5 files changed, 103 insertions(+), 56 deletions(-)
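A hedged PySpark sketch of the scenario this backport addresses (illustrative only; the actual fix lives in the analyzer's view resolution):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(10).createOrReplaceTempView("v")
spark.catalog.cacheTable("v")

# A self-join on a cached view: before SPARK-28156, one side of the join
# could miss the cache after view resolution; both scans should now hit it.
spark.sql("SELECT a.id FROM v a JOIN v b ON a.id = b.id").collect()
```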
[spark] branch master updated (d530d86 -> df84bfe)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from d530d86  [SPARK-28326][SQL][TEST] Port join.sql
 add df84bfe  [SPARK-28406][SQL][TEST] Port union.sql

No new revisions were added by this update.

Summary of changes:
 .../resources/sql-tests/inputs/pgSQL/union.sql     | 472 +
 .../sql-tests/results/pgSQL/union.sql.out          | 771 +
 2 files changed, 1243 insertions(+)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/pgSQL/union.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/results/pgSQL/union.sql.out
[spark] branch master updated (196a4d7 -> d530d86)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 196a4d7  [SPARK-28556][SQL] QueryExecutionListener should also notify Error
 add d530d86  [SPARK-28326][SQL][TEST] Port join.sql

No new revisions were added by this update.

Summary of changes:
 .../test/resources/sql-tests/inputs/pgSQL/join.sql | 2079 ++++
 .../resources/sql-tests/results/pgSQL/join.sql.out | 3440 ++++
 2 files changed, 5519 insertions(+)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/pgSQL/join.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/results/pgSQL/join.sql.out