[spark] branch master updated (a5ccbce -> b86d4bb)

2019-12-05 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a5ccbce  [SPARK-30067][CORE] Fix fragment offset comparison in getBlockHosts
 add b86d4bb  [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 167 +++--
 .../sql/catalyst/catalog/SessionCatalog.scala  |  36 +++--
 .../sql/connector/catalog/LookupCatalog.scala  |  22 +++
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  12 ++
 .../execution/command/PlanResolutionSuite.scala|   3 +-
 5 files changed, 143 insertions(+), 97 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (da27f91 -> a5ccbce)

2019-12-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from da27f91  [SPARK-29957][TEST] Reset MiniKDC's default enctypes to fit jdk8/jdk11
 add a5ccbce  [SPARK-30067][CORE] Fix fragment offset comparison in getBlockHosts

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (25431d7 -> da27f91)

2019-12-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 25431d7  [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
 add da27f91  [SPARK-29957][TEST] Reset MiniKDC's default enctypes to fit jdk8/jdk11

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 38 ++
 1 file changed, 38 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-05 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 25431d7  [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
25431d7 is described below

commit 25431d79f7daf2a68298701154eb505c2a4add80
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Thu Dec 5 21:46:28 2019 -0800

[SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

### What changes were proposed in this pull request?

This patch prevents the cleanup operation in FileStreamSource when the source files belong to a FileStreamSink. This is needed because the output of FileStreamSink can be read by multiple Spark queries, and those queries read the files based on the metadata log, which would not reflect the cleanup.

To keep the logic simple, the patch only handles the case where the source path (without a glob pattern) refers to the output directory of a FileStreamSink, detected by checking whether FileStreamSource uses the metadata directory to list the source files.
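
As a rough illustration of the guarded scenario (the paths, formats and checkpoint locations below are made up, and a SparkSession named `spark` is assumed, e.g. in spark-shell):

```scala
// Query 1 writes with FileStreamSink, so /data/out carries a _spark_metadata log.
val writer = spark.readStream.format("rate").load()
  .selectExpr("CAST(value AS STRING) AS value")
  .writeStream
  .format("text")
  .option("checkpointLocation", "/tmp/chk-writer")
  .start("/data/out")

// Query 2 reads the sink's output as a FileStreamSource and asks for cleanup.
// With this patch it fails fast with UnsupportedOperationException instead of
// deleting files that the sink's metadata log still references.
val reader = spark.readStream
  .format("text")
  .option("cleanSource", "delete") // or "archive" together with sourceArchiveDir
  .load("/data/out")
  .writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/chk-reader")
  .start()
```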

### Why are the changes needed?

Without this patch, if end users turn on the cleanup option for a path that is the output of a FileStreamSink, the metadata log and the available files can get out of sync, which may break other queries reading the path.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added UT.

Closes #26590 from HeartSaVioR/SPARK-29953.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Shixiong Zhu 
---
 docs/structured-streaming-programming-guide.md |  2 +-
 .../sql/execution/streaming/FileStreamSource.scala | 17 -
 .../sql/streaming/FileStreamSourceSuite.scala  | 83 +-
 3 files changed, 81 insertions(+), 21 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 01679e5..b91b930 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -551,7 +551,7 @@ Here are the details of all the sources in Spark.
 When "archive" is provided, additional option 
sourceArchiveDir must be provided as well. The value of 
"sourceArchiveDir" must have 2 subdirectories (so depth of directory is greater 
than 2). e.g. /archived/here. This will ensure archived files are 
never included as new source files.
 Spark will move source files respecting their own path. For example, 
if the path of source file is /a/b/dataset.txt and the path of 
archive directory is /archived/here, file will be moved to 
/archived/here/a/b/dataset.txt.
 NOTE: Both archiving (via moving) or deleting completed files will 
introduce overhead (slow down) in each micro-batch, so you need to understand 
the cost for each operation in your file system before enabling this option. On 
the other hand, enabling this option will reduce the cost to list source files 
which can be an expensive operation.
-NOTE 2: The source path should not be used from multiple sources or 
queries when enabling this option.
+NOTE 2: The source path should not be used from multiple sources or 
queries when enabling this option. Similarly, you must ensure the source path 
doesn't match to any files in output directory of file stream sink.
 NOTE 3: Both delete and move actions are best effort. Failing to 
delete or move files will not fail the streaming query.
 
 For file-format-specific options, see the related methods in 
DataStreamReader
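
For reference, a minimal sketch of the archiving configuration described above (paths are illustrative, and a SparkSession named `spark` is assumed):

```scala
// Completed source files are moved under sourceArchiveDir, keeping their own path:
// a finished /a/b/dataset.txt would end up at /archived/here/a/b/dataset.txt.
val lines = spark.readStream
  .format("text")
  .option("cleanSource", "archive")
  .option("sourceArchiveDir", "/archived/here") // must be at least 2 directories deep
  .load("/a/b")
```
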
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
index 35d486c..f31fb32 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
@@ -206,6 +206,17 @@ class FileStreamSource(
   CaseInsensitiveMap(options), None).allFiles()
   }
 
+  private def setSourceHasMetadata(newValue: Option[Boolean]): Unit = newValue match {
+    case Some(true) =>
+      if (sourceCleaner.isDefined) {
+        throw new UnsupportedOperationException("Clean up source files is not supported when" +
+          " reading from the output directory of FileStreamSink.")
+      }
+      sourceHasMetadata = Some(true)
+    case _ =>
+      sourceHasMetadata = newValue
+  }
+
   /**
    * Returns a list of files found, sorted by their timestamp.
    */
@@ -216,7 +227,7 @@ class FileStreamSource(
 sourceHasMetadata match {
   case None =>
 

[spark] branch branch-2.4 updated: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-05 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 6dff114  [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
6dff114 is described below

commit 6dff114ddb3de7877625b79bea818ba724ccd22d
Author: Liang-Chi Hsieh 
AuthorDate: Thu Dec 5 16:32:33 2019 -0800

[SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

### What changes were proposed in this pull request?

This patch adds normalization to word vectors when fitting a dataset in Word2Vec.
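
A minimal sketch of the idea only, not the code from this patch: keep word-vector magnitudes bounded by rescaling them, so additive updates accumulated over many iterations cannot drift toward +/-Infinity.

```scala
// Sketch only; the actual fix lives in mllib's Word2Vec.scala.
def normalize(v: Array[Float]): Array[Float] = {
  val norm = math.sqrt(v.map(x => x.toDouble * x).sum).toFloat
  if (norm == 0.0f) v else v.map(_ / norm)
}
```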

### Why are the changes needed?

Running Word2Vec on some datasets, when numIterations is large, can produce 
infinity word vectors.

### Does this PR introduce any user-facing change?

Yes. After this patch, Word2Vec won't produce infinity word vectors.

### How was this patch tested?

Manually. This issue is not always reproducible on any dataset. The dataset 
known to reproduce it is too large (925M) to upload.

```scala
case class Sentences(name: String, words: Array[String])
val dataset = spark.read
  .option("header", "true").option("sep", "\t")
  .option("quote", "").option("nullValue", "\\N")
  .csv("/tmp/title.akas.tsv")
  .filter("region = 'US' or language = 'en'")
  .select("title")
  .as[String]
  .map(s => Sentences(s, s.split(' ')))
  .persist()

println("Training model...")
val word2Vec = new Word2Vec()
  .setInputCol("words")
  .setOutputCol("vector")
  .setVectorSize(64)
  .setWindowSize(4)
  .setNumPartitions(50)
  .setMinCount(5)
  .setMaxIter(30)
val model = word2Vec.fit(dataset)
model.getVectors.show()
```

Before:
```
Training model...
+-------------+--------------------+
|         word|              vector|
+-------------+--------------------+
|     Unspoken|[-Infinity,-Infin...|
|       Talent|[-Infinity,Infini...|
|    Hourglass|[2.02805806500023...|
|Nickelodeon's|[-4.2918617120906...|
|      Priests|[-1.3570403355926...|
|    Religion:|[-6.7049072282803...|
|           Bu|[5.05591774315586...|
|      Totoro:|[-1.0539840178632...|
|     Trouble,|[-3.5363592836003...|
|       Hatter|[4.90413981352826...|
|          '79|[7.50436471285412...|
|         Vile|[-2.9147142985312...|
|         9/11|[-Infinity,Infini...|
|      Santino|[1.30005911270850...|
|      Motives|[-1.2538958306253...|
|          '13|[-4.5040152427657...|
|       Fierce|[Infinity,Infinit...|
|       Stover|[-2.6326895394029...|
|          'It|[1.66574533864436...|
|        Butts|[Infinity,Infinit...|
+-------------+--------------------+
only showing top 20 rows
```

After:
```
Training model...
+-------------+--------------------+
|         word|              vector|
+-------------+--------------------+
|     Unspoken|[-0.0454501919448...|
|       Talent|[-0.2657704949378...|
|    Hourglass|[-0.1399687677621...|
|Nickelodeon's|[-0.1767119318246...|
|      Priests|[-0.0047509293071...|
|    Religion:|[-0.0411605164408...|
|           Bu|[0.11837736517190...|
|      Totoro:|[0.05258282646536...|
|     Trouble,|[0.09482011198997...|
|       Hatter|[0.06040831282734...|
|          '79|[0.04783720895648...|
|         Vile|[-0.0017210749210...|
|         9/11|[-0.0713915303349...|
|      Santino|[-0.0412711687386...|
|      Motives|[-0.0492418706417...|
|          '13|[-0.0073119504377...|
|       Fierce|[-0.0565455369651...|
|       Stover|[0.06938160210847...|
|          'It|[0.01117012929171...|
|        Butts|[0.05374567210674...|
+-------------+--------------------+
only showing top 20 rows
```

Closes #26722 from viirya/SPARK-24666-2.

Lead-authored-by: Liang-Chi Hsieh 
Co-authored-by: Liang-Chi Hsieh 
Signed-off-by: Liang-Chi Hsieh 
(cherry picked from commit 755d8894485396b0a21304568c8ec5a55030f2fd)
Signed-off-by: Liang-Chi Hsieh 
---
 .../scala/org/apache/spark/mllib/feature/Word2Vec.scala | 17 ++---
 .../org/apache/spark/ml/feature/Word2VecSuite.scala |  8 
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
index d5b91df..bb5d02e 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
@@ -438,9 +438,20 @@ class Word2Vec extends Serializable with Logging {
   }
 }.flatten
   }
- 

[spark] branch master updated (7782b61 -> 755d889)

2019-12-05 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7782b61  [SPARK-29392][CORE][SQL][FOLLOWUP] Avoid deprecated (in 2.13) Symbol syntax 'foo in favor of simpler expression, where it generated deprecation warnings
 add 755d889  [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/mllib/feature/Word2Vec.scala | 17 ++---
 .../org/apache/spark/ml/feature/Word2VecSuite.scala |  8 
 2 files changed, 14 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (5892bbf -> 7782b61)

2019-12-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5892bbf  [SPARK-30124][MLLIB] unnecessary persist in PythonMLLibAPI.scala
 add 7782b61  [SPARK-29392][CORE][SQL][FOLLOWUP] Avoid deprecated (in 2.13) Symbol syntax 'foo in favor of simpler expression, where it generated deprecation warnings

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/DistributionSuite.scala |   4 +-
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |  56 +--
 .../sql/catalyst/analysis/AnalysisSuite.scala  |  80 
 .../apache/spark/sql/DataFrameAggregateSuite.scala |   2 +-
 .../apache/spark/sql/DataFrameFunctionsSuite.scala |  50 +-
 .../org/apache/spark/sql/DataFrameJoinSuite.scala  |   8 +-
 .../org/apache/spark/sql/DataFramePivotSuite.scala |   4 +-
 .../apache/spark/sql/DataFrameSelfJoinSuite.scala  |   8 +-
 .../spark/sql/DataFrameSetOperationsSuite.scala|   8 +-
 .../org/apache/spark/sql/DataFrameStatSuite.scala  |   4 +-
 .../org/apache/spark/sql/DataFrameSuite.scala  | 105 +++--
 .../spark/sql/sources/HadoopFsRelationTest.scala   |  26 ++---
 .../sql/sources/ParquetHadoopFsRelationSuite.scala |   4 +-
 13 files changed, 183 insertions(+), 176 deletions(-)
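
For context, an illustrative (not taken from this change) before/after of the deprecated symbol-literal column syntax that the follow-up removes:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("symbol-syntax").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("value")

df.select('value)       // symbol literal: deprecated in Scala 2.13
df.select($"value")     // preferred replacement
df.select(col("value")) // also fine, explicit col()
```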


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (35bab33 -> 5892bbf)

2019-12-05 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 35bab33  [SPARK-30121][BUILD] Fix memory usage in sbt build script
 add 5892bbf  [SPARK-30124][MLLIB] unnecessary persist in PythonMLLibAPI.scala

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala| 6 +-
 .../scala/org/apache/spark/mllib/clustering/GaussianMixture.scala   | 1 +
 2 files changed, 2 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b9cae37 -> 35bab33)

2019-12-05 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b9cae37  [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
 add 35bab33  [SPARK-30121][BUILD] Fix memory usage in sbt build script

No new revisions were added by this update.

Summary of changes:
 build/sbt |  2 +-
 build/sbt-launch-lib.bash | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth

2019-12-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 3b64e2f  [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth
3b64e2f is described below

commit 3b64e2f35657ef0a4001a4f0926ead8cd9226a28
Author: Marcelo Vanzin 
AuthorDate: Thu Dec 5 09:02:10 2019 -0800

[SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth

The new auth code was missing this bit, so it was not possible to know which
app a client belonged to when auth was on.

I also refactored the SASL test that checks for this so it also checks the
new protocol (test failed before the fix, passes now).

Closes #26764 from vanzin/SPARK-30129-2.4.

Authored-by: Marcelo Vanzin 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/network/crypto/AuthClientBootstrap.java  |   1 +
 .../spark/network/crypto/AuthRpcHandler.java   |   1 +
 .../spark/network/sasl/SaslIntegrationSuite.java   | 117 -
 .../spark/network/shuffle/AppIsolationSuite.java   | 184 +
 4 files changed, 186 insertions(+), 117 deletions(-)

diff --git a/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthClientBootstrap.java b/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthClientBootstrap.java
index 3c26378..737e187 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthClientBootstrap.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthClientBootstrap.java
@@ -77,6 +77,7 @@ public class AuthClientBootstrap implements TransportClientBootstrap {
 
 try {
   doSparkAuth(client, channel);
+  client.setClientId(appId);
 } catch (GeneralSecurityException | IOException e) {
   throw Throwables.propagate(e);
 } catch (RuntimeException e) {
diff --git a/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java b/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java
index fb44dbb..821cc7a 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java
@@ -125,6 +125,7 @@ class AuthRpcHandler extends RpcHandler {
   response.encode(responseData);
   callback.onSuccess(responseData.nioBuffer());
   engine.sessionCipher().addToChannel(channel);
+  client.setClientId(challenge.appId);
 } catch (Exception e) {
   // This is a fatal error: authentication has failed. Close the channel explicitly.
   LOG.debug("Authentication failed for client {}, closing channel.", channel.remoteAddress());
diff --git a/common/network-shuffle/src/test/java/org/apache/spark/network/sasl/SaslIntegrationSuite.java b/common/network-shuffle/src/test/java/org/apache/spark/network/sasl/SaslIntegrationSuite.java
index 02e6eb3..0ef01ea 100644
--- a/common/network-shuffle/src/test/java/org/apache/spark/network/sasl/SaslIntegrationSuite.java
+++ b/common/network-shuffle/src/test/java/org/apache/spark/network/sasl/SaslIntegrationSuite.java
@@ -21,8 +21,6 @@ import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.Arrays;
-import java.util.concurrent.CountDownLatch;
-import java.util.concurrent.atomic.AtomicReference;
 
 import org.junit.After;
 import org.junit.AfterClass;
@@ -34,8 +32,6 @@ import static org.mockito.Mockito.*;
 
 import org.apache.spark.network.TestUtils;
 import org.apache.spark.network.TransportContext;
-import org.apache.spark.network.buffer.ManagedBuffer;
-import org.apache.spark.network.client.ChunkReceivedCallback;
 import org.apache.spark.network.client.RpcResponseCallback;
 import org.apache.spark.network.client.TransportClient;
 import org.apache.spark.network.client.TransportClientFactory;
@@ -44,15 +40,6 @@ import org.apache.spark.network.server.RpcHandler;
 import org.apache.spark.network.server.StreamManager;
 import org.apache.spark.network.server.TransportServer;
 import org.apache.spark.network.server.TransportServerBootstrap;
-import org.apache.spark.network.shuffle.BlockFetchingListener;
-import org.apache.spark.network.shuffle.ExternalShuffleBlockHandler;
-import org.apache.spark.network.shuffle.ExternalShuffleBlockResolver;
-import org.apache.spark.network.shuffle.OneForOneBlockFetcher;
-import org.apache.spark.network.shuffle.protocol.BlockTransferMessage;
-import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo;
-import org.apache.spark.network.shuffle.protocol.OpenBlocks;
-import org.apache.spark.network.shuffle.protocol.RegisterExecutor;
-import org.apache.spark.network.shuffle.protocol.StreamHandle;
 import 

[spark] branch master updated (332e252 -> b9cae37)

2019-12-05 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 332e252  [SPARK-29425][SQL] The ownership of a database should be respected
 add b9cae37  [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  60 +
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |  61 ++---
 .../catalyst/expressions/datetimeExpressions.scala |  20 +-
 .../sql/catalyst/analysis/TypeCoercionSuite.scala  |  59 -
 .../expressions/DateExpressionsSuite.scala |  16 ++
 .../test/resources/sql-tests/inputs/datetime.sql   |  48 +++-
 .../resources/sql-tests/results/datetime.sql.out   | 280 +++--
 .../typeCoercion/native/dateTimeOperations.sql.out |  54 ++--
 .../typeCoercion/native/decimalPrecision.sql.out   |  48 ++--
 .../typeCoercion/native/promoteStrings.sql.out |  15 +-
 10 files changed, 457 insertions(+), 204 deletions(-)
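
A hedged illustration of the behaviour named in the title (the queries are mine, not from the patch): with this change, adding or subtracting NULL to a date or timestamp evaluates to NULL, following PostgreSQL.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("datetime-null").getOrCreate()

spark.sql("SELECT date '2019-12-05' + null").show()               // null
spark.sql("SELECT timestamp '2019-12-05 10:11:12' - null").show() // null
```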


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0ab922c -> 332e252)

2019-12-05 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0ab922c  [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery
 add 332e252  [SPARK-29425][SQL] The ownership of a database should be respected

No new revisions were added by this update.

Summary of changes:
 .../sql/connector/catalog/SupportsNamespaces.java  | 10 
 .../catalyst/catalog/ExternalCatalogSuite.scala|  6 ++-
 .../sql/catalyst/catalog/SessionCatalogSuite.scala |  8 +--
 .../apache/spark/sql/execution/command/ddl.scala   | 16 --
 .../spark/sql/execution/command/DDLSuite.scala | 19 +---
 .../spark/sql/hive/client/HiveClientImpl.scala | 43 ++--
 .../apache/spark/sql/hive/client/HiveShim.scala| 57 +-
 .../spark/sql/hive/client/VersionsSuite.scala  | 29 +++
 .../spark/sql/hive/execution/HiveDDLSuite.scala| 40 ++-
 9 files changed, 191 insertions(+), 37 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0bd8b99 -> 0ab922c)

2019-12-05 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0bd8b99  [SPARK-30093][SQL] Improve error message for creating view
 add 0ab922c  [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/TypeCoercion.scala  |  3 +--
 .../negative-cases/subq-input-typecheck.sql |  2 +-
 .../negative-cases/subq-input-typecheck.sql.out |  6 +++---
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala  | 21 +
 4 files changed, 26 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org