svn commit: r31831 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_08_23_01-6277a9f-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Jan 9 07:16:32 2019 New Revision: 31831 Log: Apache Spark 2.4.1-SNAPSHOT-2019_01_08_23_01-6277a9f docs [This commit notification would consist of 1476 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r31829 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_08_20_55-dbbba80-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Jan 9 05:08:50 2019 New Revision: 31829 Log: Apache Spark 3.0.0-SNAPSHOT-2019_01_08_20_55-dbbba80 docs [This commit notification would consist of 1775 parts, which exceeds the limit of 50, so it was shortened to this summary.]
[spark] branch master updated: [SPARK-26549][PYSPARK] Fix Python worker reuse having no effect when parallelizing a lazy iterable range
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dbbba80 [SPARK-26549][PYSPARK] Fix Python worker reuse having no effect when parallelizing a lazy iterable range dbbba80 is described below commit dbbba80b3cb319b147dcf82a69963eee662e289f Author: Yuanjian Li AuthorDate: Wed Jan 9 11:55:12 2019 +0800 [SPARK-26549][PYSPARK] Fix Python worker reuse having no effect when parallelizing a lazy iterable range ## What changes were proposed in this pull request? During the follow-up work (#23435) on the PySpark worker reuse scenario, we found that worker reuse has no effect for `sc.parallelize(xrange(...))`. This happens because the specialized rdd.parallelize logic for xrange (introduced in #3264) generates its data from a lazy iterable range and never consumes the passed-in iterator. That breaks the end-of-stream check in the Python worker and ultimately prevents worker reuse from taking effect. See more details in [SPARK-26549](https://issue [...] We fix this by explicitly consuming the passed-in iterator. ## How was this patch tested? New unit test in test_worker.py. Closes #23470 from xuanyuanking/SPARK-26549. Authored-by: Yuanjian Li Signed-off-by: Hyukjin Kwon --- python/pyspark/context.py | 8 python/pyspark/tests/test_worker.py | 12 +++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/python/pyspark/context.py b/python/pyspark/context.py index 64178eb..316fbc8 100644 --- a/python/pyspark/context.py +++ b/python/pyspark/context.py @@ -498,6 +498,14 @@ class SparkContext(object): return start0 + int((split * size / numSlices)) * step def f(split, iterator): +# it's an empty iterator here but we need this line for triggering the +# logic of signal handling in FramedSerializer.load_stream, for instance, +# SpecialLengths.END_OF_DATA_SECTION in _read_with_length. Since +# FramedSerializer.load_stream produces a generator, the control should +# at least be in that function once. Here we do it by explicitly converting +# the empty iterator to a list, thus make sure worker reuse takes effect. +# See more details in SPARK-26549. +assert len(list(iterator)) == 0 return xrange(getStart(split), getStart(split + 1), step) return self.parallelize([], numSlices).mapPartitionsWithIndex(f) diff --git a/python/pyspark/tests/test_worker.py b/python/pyspark/tests/test_worker.py index a33b77d..a4f108f 100644 --- a/python/pyspark/tests/test_worker.py +++ b/python/pyspark/tests/test_worker.py @@ -22,7 +22,7 @@ import time from py4j.protocol import Py4JJavaError -from pyspark.testing.utils import ReusedPySparkTestCase, QuietTest +from pyspark.testing.utils import ReusedPySparkTestCase, PySparkTestCase, QuietTest if sys.version_info[0] >= 3: xrange = range @@ -145,6 +145,16 @@ class WorkerTests(ReusedPySparkTestCase): self.sc.pythonVer = version +class WorkerReuseTest(PySparkTestCase): + +def test_reuse_worker_of_parallelize_xrange(self): +rdd = self.sc.parallelize(xrange(20), 8) +previous_pids = rdd.map(lambda x: os.getpid()).collect() +current_pids = rdd.map(lambda x: os.getpid()).collect() +for pid in current_pids: +self.assertTrue(pid in previous_pids) + + if __name__ == "__main__": import unittest from pyspark.tests.test_worker import *
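A quick way to see the effect of this fix from the driver side, mirroring the new `WorkerReuseTest` above: run two jobs over the same parallelized range and compare the worker PIDs. This is a minimal sketch, assuming a local master and the default `spark.python.worker.reuse=true`; the app name is just a placeholder.

```
import os

from pyspark import SparkContext

sc = SparkContext("local[4]", "worker-reuse-check")
try:
    # Parallelizing a (lazy) range goes through the specialized logic fixed here.
    rdd = sc.parallelize(range(20), 8)
    first_pids = rdd.map(lambda _: os.getpid()).collect()
    second_pids = rdd.map(lambda _: os.getpid()).collect()
    # With worker reuse taking effect, the second job should run in the
    # workers started for the first job, so no new PIDs should appear.
    assert set(second_pids) <= set(first_pids)
finally:
    sc.stop()
```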
[spark] branch branch-2.4 updated: [SPARK-26571][SQL] Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 6277a9f [SPARK-26571][SQL] Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat 6277a9f is described below commit 6277a9f8f9e8f024110056c8d12eb7d205d6d1f4 Author: Gengliang Wang AuthorDate: Wed Jan 9 10:18:33 2019 +0800 [SPARK-26571][SQL] Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat ## What changes were proposed in this pull request? Currently, a Spark table maintains its Hive catalog storage format so that the Hive client can read it. In `HiveSerDe.scala`, Spark uses a mapping from its data sources to Hive SerDes. The mapping is outdated and needs to be updated with the latest canonical names of the Parquet and ORC FileFormats. Otherwise, the following queries will result in the wrong SerDe value in the Hive table (default value `org.apache.hadoop.mapred.SequenceFileInputFormat`), and the Hive client will fail to read the output table: ``` df.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").saveAsTable(..) ``` ``` df.write.format("org.apache.spark.sql.execution.datasources.orc.OrcFileFormat").saveAsTable(..) ``` This minor PR fixes the mapping. ## How was this patch tested? Unit test. Closes #23491 from gengliangwang/fixHiveSerdeMap. Authored-by: Gengliang Wang Signed-off-by: Wenchen Fan (cherry picked from commit 311f32f37fbeaebe9dfa0b8dc2a111ee99b583b7) Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/internal/HiveSerDe.scala | 2 ++ .../spark/sql/hive/HiveMetastoreCatalogSuite.scala | 18 ++ .../spark/sql/hive/orc/HiveOrcSourceSuite.scala| 29 -- 3 files changed, 20 insertions(+), 29 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala index eca612f..bd25a64 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala @@ -74,8 +74,10 @@ object HiveSerDe { def sourceToSerDe(source: String): Option[HiveSerDe] = { val key = source.toLowerCase(Locale.ROOT) match { case s if s.startsWith("org.apache.spark.sql.parquet") => "parquet" + case s if s.startsWith("org.apache.spark.sql.execution.datasources.parquet") => "parquet" case s if s.startsWith("org.apache.spark.sql.orc") => "orc" case s if s.startsWith("org.apache.spark.sql.hive.orc") => "orc" + case s if s.startsWith("org.apache.spark.sql.execution.datasources.orc") => "orc" case s if s.equals("orcfile") => "orc" case s if s.equals("parquetfile") => "parquet" case s if s.equals("avrofile") => "avro" diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala index 688b619..5c9261c 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala @@ -159,10 +159,28 @@ class DataSourceWithHiveMetastoreCatalogSuite "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe" )), +"org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat" -> (( + "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", + "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", + 
"org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe" +)), + "orc" -> (( "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat", "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat", "org.apache.hadoop.hive.ql.io.orc.OrcSerde" +)), + +"org.apache.spark.sql.hive.orc" -> (( + "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcSerde" +)), + +"org.apache.spark.sql.execution.datasources.orc.OrcFileFormat" -> (( + "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcSerde" )) ).foreach { case (provider, (inputFormat, outputFormat, serde)) => test(s"Persist non-partitioned $provider relation into metastore as managed table") { diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala index c1ae2f6..c0bf181 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/or
[spark] branch master updated: [SPARK-26529] Add debug logs for confArchive when preparing local resource
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eb42bb49 [SPARK-26529] Add debug logs for confArchive when preparing local resource eb42bb49 is described below commit eb42bb493b1d7c79e9516660b71aec66bdde5d51 Author: Liupengcheng AuthorDate: Wed Jan 9 10:39:25 2019 +0800 [SPARK-26529] Add debug logs for confArchive when preparing local resource ## What changes were proposed in this pull request? Currently, `Client#createConfArchive` does not handle IOException, and some detail is missing from the logs. This can delay locating the root cause of an I/O error. This PR adds debug logging for the confArchive when preparing local resources. ## How was this patch tested? Unit test. Closes #23444 from liupc/Add-logs-for-IOException-when-preparing-local-resource. Authored-by: Liupengcheng Signed-off-by: Hyukjin Kwon --- .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala| 1 + 1 file changed, 1 insertion(+) diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala index 9f09dc0..8492180 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala @@ -713,6 +713,7 @@ private[spark] class Client( new File(Utils.getLocalDir(sparkConf))) val confStream = new ZipOutputStream(new FileOutputStream(confArchive)) +logDebug(s"Creating an archive with the config files for distribution at $confArchive.") try { confStream.setLevel(0)
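Note that the new message is emitted at DEBUG level, so it only appears if the YARN `Client` logger is turned up. A minimal sketch, assuming the log4j 1.x setup used by Spark's `conf/log4j.properties` template at the time of this commit:

```
# conf/log4j.properties (excerpt)
log4j.logger.org.apache.spark.deploy.yarn.Client=DEBUG
```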
[spark] branch master updated: [SPARK-26571][SQL] Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 311f32f [SPARK-26571][SQL] Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat 311f32f is described below commit 311f32f37fbeaebe9dfa0b8dc2a111ee99b583b7 Author: Gengliang Wang AuthorDate: Wed Jan 9 10:18:33 2019 +0800 [SPARK-26571][SQL] Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat ## What changes were proposed in this pull request? Currently, a Spark table maintains its Hive catalog storage format so that the Hive client can read it. In `HiveSerDe.scala`, Spark uses a mapping from its data sources to Hive SerDes. The mapping is outdated and needs to be updated with the latest canonical names of the Parquet and ORC FileFormats. Otherwise, the following queries will result in the wrong SerDe value in the Hive table (default value `org.apache.hadoop.mapred.SequenceFileInputFormat`), and the Hive client will fail to read the output table: ``` df.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").saveAsTable(..) ``` ``` df.write.format("org.apache.spark.sql.execution.datasources.orc.OrcFileFormat").saveAsTable(..) ``` This minor PR fixes the mapping. ## How was this patch tested? Unit test. Closes #23491 from gengliangwang/fixHiveSerdeMap. Authored-by: Gengliang Wang Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/internal/HiveSerDe.scala | 2 ++ .../spark/sql/hive/HiveMetastoreCatalogSuite.scala | 18 ++ .../spark/sql/hive/orc/HiveOrcSourceSuite.scala| 29 -- 3 files changed, 20 insertions(+), 29 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala index eca612f..bd25a64 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala @@ -74,8 +74,10 @@ object HiveSerDe { def sourceToSerDe(source: String): Option[HiveSerDe] = { val key = source.toLowerCase(Locale.ROOT) match { case s if s.startsWith("org.apache.spark.sql.parquet") => "parquet" + case s if s.startsWith("org.apache.spark.sql.execution.datasources.parquet") => "parquet" case s if s.startsWith("org.apache.spark.sql.orc") => "orc" case s if s.startsWith("org.apache.spark.sql.hive.orc") => "orc" + case s if s.startsWith("org.apache.spark.sql.execution.datasources.orc") => "orc" case s if s.equals("orcfile") => "orc" case s if s.equals("parquetfile") => "parquet" case s if s.equals("avrofile") => "avro" diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala index 688b619..5c9261c 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala @@ -159,10 +159,28 @@ class DataSourceWithHiveMetastoreCatalogSuite "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe" )), +"org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat" -> (( + "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", + "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", + "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe" +)), + "orc" -> (( 
"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat", "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat", "org.apache.hadoop.hive.ql.io.orc.OrcSerde" +)), + +"org.apache.spark.sql.hive.orc" -> (( + "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcSerde" +)), + +"org.apache.spark.sql.execution.datasources.orc.OrcFileFormat" -> (( + "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat", + "org.apache.hadoop.hive.ql.io.orc.OrcSerde" )) ).foreach { case (provider, (inputFormat, outputFormat, serde)) => test(s"Persist non-partitioned $provider relation into metastore as managed table") { diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala index 7fefaf5..c46512b 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala @@
svn commit: r31820 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_08_12_33-32515d2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 8 20:46:06 2019 New Revision: 31820 Log: Apache Spark 3.0.0-SNAPSHOT-2019_01_08_12_33-32515d2 docs [This commit notification would consist of 1775 parts, which exceeds the limit of 50, so it was shortened to this summary.]
[spark] branch master updated: [SPARK-26349][PYSPARK] Forbid insecure py4j gateways
This is an automated email from the ASF dual-hosted git repository. cutlerb pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32515d2 [SPARK-26349][PYSPARK] Forbid insecure py4j gateways 32515d2 is described below commit 32515d205a4de4d8838226fa5e5c4e4f66935193 Author: Imran Rashid AuthorDate: Tue Jan 8 11:26:36 2019 -0800 [SPARK-26349][PYSPARK] Forbid insecure py4j gateways Spark always creates secure Py4J connections between the Java and Python processes, but it also allows users to pass in their own connection. This change ensures that even passed-in connections are secure. Test cases were added to verify the failure with a (mocked) insecure gateway. This is closely related to SPARK-26019, but this change forbids the insecure connection entirely rather than creating an "escape hatch". Closes #23441 from squito/SPARK-26349. Authored-by: Imran Rashid Signed-off-by: Bryan Cutler --- python/pyspark/context.py| 5 + python/pyspark/tests/test_context.py | 10 ++ 2 files changed, 15 insertions(+) diff --git a/python/pyspark/context.py b/python/pyspark/context.py index 6137ed2..64178eb 100644 --- a/python/pyspark/context.py +++ b/python/pyspark/context.py @@ -115,6 +115,11 @@ class SparkContext(object): ValueError:... """ self._callsite = first_spark_call() or CallSite(None, None, None) +if gateway is not None and gateway.gateway_parameters.auth_token is None: +raise ValueError( +"You are trying to pass an insecure Py4j gateway to Spark. This" +" is not allowed as it is a security risk.") + SparkContext._ensure_initialized(self, gateway=gateway, conf=conf) try: self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer, diff --git a/python/pyspark/tests/test_context.py b/python/pyspark/tests/test_context.py index 201baf4..18d9cd4 100644 --- a/python/pyspark/tests/test_context.py +++ b/python/pyspark/tests/test_context.py @@ -20,6 +20,7 @@ import tempfile import threading import time import unittest +from collections import namedtuple from pyspark import SparkFiles, SparkContext from pyspark.testing.utils import ReusedPySparkTestCase, PySparkTestCase, QuietTest, SPARK_HOME @@ -246,6 +247,15 @@ class ContextTests(unittest.TestCase): with SparkContext() as sc: self.assertGreater(sc.startTime, 0) +def test_forbid_insecure_gateway(self): +# Fail immediately if you try to create a SparkContext +# with an insecure gateway +parameters = namedtuple('MockGatewayParameters', 'auth_token')(None) +mock_insecure_gateway = namedtuple('MockJavaGateway', 'gateway_parameters')(parameters) +with self.assertRaises(ValueError) as context: +SparkContext(gateway=mock_insecure_gateway) +self.assertIn("insecure Py4j gateway", str(context.exception)) + if __name__ == "__main__": from pyspark.tests.test_context import *
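The new behaviour can be illustrated directly from the added test: a gateway whose `gateway_parameters` carry no `auth_token` is rejected before the context is initialized. A minimal sketch; the namedtuple objects are mocks standing in for a real Py4J gateway.

```
from collections import namedtuple

from pyspark import SparkContext

MockGatewayParameters = namedtuple("MockGatewayParameters", "auth_token")
MockJavaGateway = namedtuple("MockJavaGateway", "gateway_parameters")

insecure_gateway = MockJavaGateway(MockGatewayParameters(auth_token=None))
try:
    SparkContext(gateway=insecure_gateway)
except ValueError as e:
    # Raised immediately: "You are trying to pass an insecure Py4j gateway..."
    print(e)
```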
[spark] branch master updated: [SPARK-24920][CORE] Allow sharing Netty's memory pool allocators
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e103c4a [SPARK-24920][CORE] Allow sharing Netty's memory pool allocators e103c4a is described below commit e103c4a5e72bab8862ff49d6d4c1e62e642fc412 Author: “attilapiros” AuthorDate: Tue Jan 8 13:11:11 2019 -0600 [SPARK-24920][CORE] Allow sharing Netty's memory pool allocators ## What changes were proposed in this pull request? This introduces shared pooled ByteBuf allocators. The feature can be enabled via the "spark.network.sharedByteBufAllocators.enabled" configuration. When it is on, only two pooled ByteBuf allocators are created: one for transport servers, where caching is allowed, and one for transport clients, where caching is disabled. This way the caching behaviour remains as before. Both shareable pools are created with the numCores parameter set to 0 (which defaults to the available processors), because conf.serverThreads() and conf.clientThreads() are module-dependent and lazy creation of these allocators would lead to unpredictable behaviour. When "spark.network.sharedByteBufAllocators.enabled" is false, a new allocator is created for every transport client and server separately, as before this PR. ## How was this patch tested? Existing unit tests. Closes #23278 from attilapiros/SPARK-24920. Authored-by: “attilapiros” Signed-off-by: Sean Owen --- .../network/client/TransportClientFactory.java | 11 +++-- .../spark/network/server/TransportServer.java | 17 +--- .../org/apache/spark/network/util/NettyUtils.java | 48 ++ .../apache/spark/network/util/TransportConf.java | 18 .../spark/network/netty/SparkTransportConf.scala | 25 +-- docs/configuration.md | 10 + 6 files changed, 97 insertions(+), 32 deletions(-) diff --git a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java index 16d242d..a8e2715 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java +++ b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java @@ -84,7 +84,7 @@ public class TransportClientFactory implements Closeable { private final Class socketChannelClass; private EventLoopGroup workerGroup; - private PooledByteBufAllocator pooledAllocator; + private final PooledByteBufAllocator pooledAllocator; private final NettyMemoryMetrics metrics; public TransportClientFactory( @@ -103,8 +103,13 @@ public class TransportClientFactory implements Closeable { ioMode, conf.clientThreads(), conf.getModuleName() + "-client"); -this.pooledAllocator = NettyUtils.createPooledByteBufAllocator( - conf.preferDirectBufs(), false /* allowCache */, conf.clientThreads()); +if (conf.sharedByteBufAllocators()) { + this.pooledAllocator = NettyUtils.getSharedPooledByteBufAllocator( + conf.preferDirectBufsForSharedByteBufAllocators(), false /* allowCache */); +} else { + this.pooledAllocator = NettyUtils.createPooledByteBufAllocator( + conf.preferDirectBufs(), false /* allowCache */, conf.clientThreads()); +} this.metrics = new NettyMemoryMetrics( this.pooledAllocator, conf.getModuleName() + "-client", conf); } diff --git a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java 
b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java index eb5f10a..a0ecde2 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java +++ b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java @@ -54,6 +54,7 @@ public class TransportServer implements Closeable { private ServerBootstrap bootstrap; private ChannelFuture channelFuture; private int port = -1; + private final PooledByteBufAllocator pooledAllocator; private NettyMemoryMetrics metrics; /** @@ -69,6 +70,13 @@ public class TransportServer implements Closeable { this.context = context; this.conf = context.getConf(); this.appRpcHandler = appRpcHandler; +if (conf.sharedByteBufAllocators()) { + this.pooledAllocator = NettyUtils.getSharedPooledByteBufAllocator( + conf.preferDirectBufsForSharedByteBufAllocators(), true /* allowCache */); +} else { + this.pooledAllocator = NettyUtils.createPooledByteBufAllocator( + conf.preferDirectBufs(), true /* allowCache */, conf.serverThreads()); +} this.bootstraps = Li
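The feature is opt-in. A minimal sketch of enabling it from PySpark; only the configuration key comes from this commit, the master and app name are placeholders.

```
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("shared-allocators-demo")
        # Share one pooled ByteBuf allocator across transport servers and
        # one across transport clients instead of one per instance.
        .set("spark.network.sharedByteBufAllocators.enabled", "true"))
sc = SparkContext(conf=conf)
```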
[spark] branch master updated: [SPARK-24522][UI] Create filter to apply HTTP security checks consistently.
This is an automated email from the ASF dual-hosted git repository. irashid pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2783e4c [SPARK-24522][UI] Create filter to apply HTTP security checks consistently. 2783e4c is described below commit 2783e4c45f55f4fc87748d1c4a454bfdf3024156 Author: Marcelo Vanzin AuthorDate: Tue Jan 8 11:25:33 2019 -0600 [SPARK-24522][UI] Create filter to apply HTTP security checks consistently. Currently there is code scattered in a bunch of places to do different things related to HTTP security, such as access control, setting security-related headers, and filtering out bad content. This makes it really easy to miss these things when writing new UI code. This change creates a new filter that does all of those things, and makes sure that all servlet handlers that are attached to the UI get the new filter and any user-defined filters consistently. The extent of the actual features should be the same as before. The new filter is added at the end of the filter chain, because authentication is done by custom filters and thus needs to happen first. This means that custom filters see unfiltered HTTP requests - which is actually the current behavior anyway. As a side-effect of some of the code refactoring, handlers added after the initial set also get wrapped with a GzipHandler, which didn't happen before. Tested with added unit tests and in a history server with SPNEGO auth configured. Closes #23302 from vanzin/SPARK-24522. Authored-by: Marcelo Vanzin Signed-off-by: Imran Rashid --- .../apache/spark/deploy/history/HistoryPage.scala | 5 +- .../spark/deploy/history/HistoryServer.scala | 8 +- .../spark/deploy/master/ui/ApplicationPage.scala | 3 +- .../apache/spark/deploy/master/ui/MasterPage.scala | 6 +- .../apache/spark/deploy/worker/ui/LogPage.scala| 28 ++-- .../spark/deploy/worker/ui/WorkerWebUI.scala | 1 - .../apache/spark/metrics/sink/MetricsServlet.scala | 2 +- .../spark/status/api/v1/SecurityFilter.scala | 36 - .../org/apache/spark/ui/HttpSecurityFilter.scala | 116 +++ .../scala/org/apache/spark/ui/JettyUtils.scala | 154 +--- .../main/scala/org/apache/spark/ui/UIUtils.scala | 21 --- .../src/main/scala/org/apache/spark/ui/WebUI.scala | 15 +- .../spark/ui/exec/ExecutorThreadDumpPage.scala | 4 +- .../org/apache/spark/ui/jobs/AllJobsPage.scala | 16 +-- .../scala/org/apache/spark/ui/jobs/JobPage.scala | 3 +- .../scala/org/apache/spark/ui/jobs/JobsTab.scala | 4 +- .../scala/org/apache/spark/ui/jobs/PoolPage.scala | 3 +- .../scala/org/apache/spark/ui/jobs/StagePage.scala | 19 ++- .../org/apache/spark/ui/jobs/StageTable.scala | 15 +- .../scala/org/apache/spark/ui/jobs/StagesTab.scala | 4 +- .../org/apache/spark/ui/storage/RDDPage.scala | 11 +- .../apache/spark/ui/HttpSecurityFilterSuite.scala | 157 + .../test/scala/org/apache/spark/ui/UISuite.scala | 147 +-- .../scala/org/apache/spark/ui/UIUtilsSuite.scala | 39 - .../apache/spark/deploy/mesos/ui/DriverPage.scala | 3 +- .../scheduler/cluster/YarnSchedulerBackend.scala | 35 - .../cluster/YarnSchedulerBackendSuite.scala| 59 +++- .../spark/sql/execution/ui/AllExecutionsPage.scala | 19 +-- .../spark/sql/execution/ui/ExecutionPage.scala | 3 +- .../thriftserver/ui/ThriftServerSessionPage.scala | 3 +- .../org/apache/spark/streaming/ui/BatchPage.scala | 8 +- 31 files changed, 609 insertions(+), 338 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala 
b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala index 00ca4ef..7a8ab7f 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala @@ -27,9 +27,8 @@ import org.apache.spark.ui.{UIUtils, WebUIPage} private[history] class HistoryPage(parent: HistoryServer) extends WebUIPage("") { def render(request: HttpServletRequest): Seq[Node] = { -// stripXSS is called first to remove suspicious characters used in XSS attacks -val requestedIncomplete = - Option(UIUtils.stripXSS(request.getParameter("showIncomplete"))).getOrElse("false").toBoolean +val requestedIncomplete = Option(request.getParameter("showIncomplete")) + .getOrElse("false").toBoolean val displayApplications = parent.getApplicationList() .exists(isApplicationCompleted(_) != requestedIncomplete) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala index b930338
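User-defined filters continue to be supplied through the existing `spark.ui.filters` setting; with this change they are applied to every UI handler consistently, with the new security filter appended at the end of the chain. A hedged sketch; the filter class below is hypothetical and must be available on the driver's classpath.

```
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("ui-filter-demo")
        # Hypothetical servlet filter class performing authentication.
        .set("spark.ui.filters", "com.example.auth.MyAuthFilter"))
sc = SparkContext(conf=conf)
```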
[spark] branch master updated (b711382 -> c101182b)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b711382 [MINOR][WEBUI] Modify the name of the column named "shuffle spill" in the StagePage add c101182b [SPARK-26002][SQL] Fix day of year calculation for Julian calendar days No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/DateTimeUtils.scala| 56 +- .../sql/catalyst/util/DateTimeUtilsSuite.scala | 30 .../test/resources/sql-tests/inputs/datetime.sql | 2 + .../resources/sql-tests/results/datetime.sql.out | 10 +++- 4 files changed, 86 insertions(+), 12 deletions(-)
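The affected code path can be exercised through SQL with a date that falls in the Julian calendar range (before the 1582 Gregorian cutover). A minimal sketch; the date is only an illustration, and no particular output value is implied here.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dayofyear-julian-check").getOrCreate()
# dayofyear() delegates to DateTimeUtils, whose Julian-calendar handling
# this commit fixes.
spark.sql("SELECT dayofyear(DATE '1500-03-01') AS doy").show()
```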
[spark] branch master updated (72a572f -> b711382)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 72a572f [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any add b711382 [MINOR][WEBUI] Modify the name of the column named "shuffle spill" in the StagePage No new revisions were added by this update. Summary of changes: core/src/main/resources/org/apache/spark/ui/static/stagepage.js | 8 .../resources/org/apache/spark/ui/static/stagespage-template.html | 8 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 8 3 files changed, 12 insertions(+), 12 deletions(-)
svn commit: r31815 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_08_08_12-72a572f-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 8 16:25:31 2019 New Revision: 31815 Log: Apache Spark 3.0.0-SNAPSHOT-2019_01_08_08_12-72a572f docs [This commit notification would consist of 1774 parts, which exceeds the limit of 50, so it was shortened to this summary.]
[spark] branch master updated: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 72a572f [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any 72a572f is described below commit 72a572ffd6e156243b13f9243ed296f6d77b4241 Author: Wenchen Fan AuthorDate: Tue Jan 8 22:44:33 2019 +0800 [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any ## What changes were proposed in this pull request? For Scala UDF, when checking input nullability, we will skip inputs with type `Any`, and only check the inputs that provide nullability info. We should do the same for checking input types. ## How was this patch tested? new tests Closes #23275 from cloud-fan/udf. Authored-by: Wenchen Fan Signed-off-by: Hyukjin Kwon --- .../spark/sql/catalyst/analysis/TypeCoercion.scala | 13 +- .../spark/sql/catalyst/expressions/ScalaUDF.scala | 4 +- .../apache/spark/sql/types/AbstractDataType.scala | 2 +- .../org/apache/spark/sql/UDFRegistration.scala | 216 + .../sql/expressions/UserDefinedFunction.scala | 57 +++--- .../scala/org/apache/spark/sql/functions.scala | 52 +++-- .../test/scala/org/apache/spark/sql/UDFSuite.scala | 15 ++ 7 files changed, 175 insertions(+), 184 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala index b19aa50..13cc9b9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala @@ -882,7 +882,18 @@ object TypeCoercion { case udf: ScalaUDF if udf.inputTypes.nonEmpty => val children = udf.children.zip(udf.inputTypes).map { case (in, expected) => - implicitCast(in, udfInputToCastType(in.dataType, expected)).getOrElse(in) + // Currently Scala UDF will only expect `AnyDataType` at top level, so this trick works. + // In the future we should create types like `AbstractArrayType`, so that Scala UDF can + // accept inputs of array type of arbitrary element type. + if (expected == AnyDataType) { +in + } else { +implicitCast( + in, + udfInputToCastType(in.dataType, expected.asInstanceOf[DataType]) +).getOrElse(in) + } + } udf.withNewChildren(children) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala index a23aaa3..fae1119 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala @@ -21,7 +21,7 @@ import org.apache.spark.SparkException import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, ScalaReflection} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ -import org.apache.spark.sql.types.DataType +import org.apache.spark.sql.types.{AbstractDataType, DataType} /** * User-defined function. 
@@ -48,7 +48,7 @@ case class ScalaUDF( dataType: DataType, children: Seq[Expression], inputsNullSafe: Seq[Boolean], -inputTypes: Seq[DataType] = Nil, +inputTypes: Seq[AbstractDataType] = Nil, udfName: Option[String] = None, nullable: Boolean = true, udfDeterministic: Boolean = true) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala index 5367ce2..d2ef088 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala @@ -96,7 +96,7 @@ private[sql] object TypeCollection { /** * An `AbstractDataType` that matches any concrete data types. */ -protected[sql] object AnyDataType extends AbstractDataType { +protected[sql] object AnyDataType extends AbstractDataType with Serializable { // Note that since AnyDataType matches any concrete types, defaultConcreteType should never // be invoked. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala index 5a3f556..fe5d1af 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala +++ b/sql/core/src/main/scala/org/apa