[spark] branch master updated: [SPARK-27470][PYSPARK] Update pyrolite to 4.23

2019-04-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8718367  [SPARK-27470][PYSPARK] Update pyrolite to 4.23
8718367 is described below

commit 8718367e2e739f1ed82997b9f4a1298b7a1c4e49
Author: Sean Owen 
AuthorDate: Tue Apr 16 19:41:40 2019 +0900

[SPARK-27470][PYSPARK] Update pyrolite to 4.23

## What changes were proposed in this pull request?

 Update pyrolite to 4.23 to pick up bug and security fixes.
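
 For context, Pyrolite (`net.razorvine.pickle`) is the library Spark's JVM side
 uses to pickle and unpickle data exchanged with Python workers. A minimal
 round-trip sketch of its Java API, assuming the pyrolite 4.23 jar is on the
 classpath (object and value names here are illustrative, not part of the change):

```
import net.razorvine.pickle.{Pickler, Unpickler}

object PyroliteRoundTrip {
  def main(args: Array[String]): Unit = {
    // Pickle a JVM value into the Python pickle wire format.
    val pickled: Array[Byte] = new Pickler().dumps(java.util.Arrays.asList(1, 2, 3))

    // Unpickle it back into a JVM object (a java.util.ArrayList by default).
    val restored = new Unpickler().loads(pickled)
    println(restored)  // [1, 2, 3]
  }
}
```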

## How was this patch tested?

Existing tests.

Closes #24381 from srowen/SPARK-27470.

Authored-by: Sean Owen 
Signed-off-by: HyukjinKwon 
---
 core/pom.xml   | 2 +-
 dev/deps/spark-deps-hadoop-2.7 | 2 +-
 dev/deps/spark-deps-hadoop-3.2 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/core/pom.xml b/core/pom.xml
index 45bda44..9d57028 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -347,7 +347,7 @@
 
   net.razorvine
   pyrolite
-  4.13
+  4.23
   
 
   net.razorvine
diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index 00dc2ce..8ae59cc 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -170,7 +170,7 @@ parquet-hadoop-bundle-1.6.0.jar
 parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.8.1.jar
-pyrolite-4.13.jar
+pyrolite-4.23.jar
 scala-compiler-2.12.8.jar
 scala-library-2.12.8.jar
 scala-parser-combinators_2.12-1.1.0.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2 b/dev/deps/spark-deps-hadoop-3.2
index 97085d6..bbb0d73 100644
--- a/dev/deps/spark-deps-hadoop-3.2
+++ b/dev/deps/spark-deps-hadoop-3.2
@@ -191,7 +191,7 @@ parquet-hadoop-bundle-1.6.0.jar
 parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.8.1.jar
-pyrolite-4.13.jar
+pyrolite-4.23.jar
 re2j-1.1.jar
 scala-compiler-2.12.8.jar
 scala-library-2.12.8.jar


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links

2019-04-16 Thread GitBox
srowen commented on a change in pull request #194: Remove links to dead orgs / 
meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275795086
 
 

 ##
 File path: powered-by.md
 ##
 @@ -170,9 +160,8 @@ across all screens
   - PanTera is a tool for exploring large datasets. It uses Spark to create XY 
and geographic 
   scatterplots from millions to billions of datapoints.
   - Components we are using: Spark Core (Scala API), Spark SQL, and GraphX
-- http://www.peerialism.com";>Peerialism
 - http://www.planbmedia.com";>PlanBMedia
-- http://prediction.io/";>PredicitionIo
+- http://predictionio.apache.org/index.html/";>Apache PredicitionIo
 
 Review comment:
   There may not be much point, yeah, in SNAPSHOT artifacts. I don't know how 
to shut them off -- @shaneknapp is that just a Jenkins job? The argument for 
turning it off is that it has been somewhat controversial in the past to 
provide any binaries from the project that aren't PMC releases. It's allowed if 
heavily signposted. But, maybe little point in dumping 1GB of artifacts per 
night into the repo.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-27397][CORE] Take care of OpenJ9 JVM in Spark

2019-04-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 257d01a  [SPARK-27397][CORE] Take care of OpenJ9 JVM in Spark
257d01a is described below

commit 257d01a6b8241c244b932104d2e7c6817d530fd9
Author: Kazuaki Ishizaki 
AuthorDate: Tue Apr 16 09:11:47 2019 -0500

[SPARK-27397][CORE] Take care of OpenJ9 JVM in Spark

## What changes were proposed in this pull request?

This PR supports `OpenJ9` in addition to `IBM JDK` and `OpenJDK` in Spark 
by handling `System.getProperty("java.vendor") = "Eclipse OpenJ9"`.

In `inferDefaultMemory()` and `getKrb5LoginModuleName()`, this PR uses the 
non-`IBM` code path.

```
$ ~/jdk-11.0.2+9_openj9-0.12.1/bin/jshell
|  Welcome to JShell -- Version 11.0.2
|  For an introduction type: /help intro

jshell> System.out.println(System.getProperty("java.vendor"))
Eclipse OpenJ9

jshell> System.out.println(System.getProperty("java.vm.info"))
JRE 11 Linux amd64-64-Bit Compressed References 20190204_127 (JIT enabled, 
AOT enabled)
OpenJ9   - 90dd8cb40
OMR  - d2f4534b
JCL  - 289c70b6844 based on jdk-11.0.2+9

jshell> 
System.out.println(Class.forName("com.ibm.lang.management.OperatingSystemMXBean").getDeclaredMethod("getTotalPhysicalMemory"))
public abstract long 
com.ibm.lang.management.OperatingSystemMXBean.getTotalPhysicalMemory()

jshell> 
System.out.println(Class.forName("com.sun.management.OperatingSystemMXBean").getDeclaredMethod("getTotalPhysicalMemorySize"))
public abstract long 
com.sun.management.OperatingSystemMXBean.getTotalPhysicalMemorySize()

jshell> 
System.out.println(Class.forName("com.ibm.security.auth.module.Krb5LoginModule"))
|  Exception java.lang.ClassNotFoundException: 
com.ibm.security.auth.module.Krb5LoginModule
|at Class.forNameImpl (Native Method)
|at Class.forName (Class.java:339)
|at (#1:1)

jshell> 
System.out.println(Class.forName("com.sun.security.auth.module.Krb5LoginModule"))
class com.sun.security.auth.module.Krb5LoginModule
```
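
A rough Scala sketch of the vendor check this change introduces (illustrative
only; it mirrors the idea of the patch rather than Spark's exact code):

```
// Illustrative sketch: detect IBM/OpenJ9 JVMs and read compressed-reference info.
object JvmVendorCheck {
  private val javaVendor = System.getProperty("java.vendor", "")

  // IBM JDK and Eclipse OpenJ9 expose compressed-reference info via java.vm.info.
  def usesCompressedRefs: Boolean =
    if (javaVendor.contains("IBM") || javaVendor.contains("OpenJ9")) {
      System.getProperty("java.vm.info", "").contains("Compressed Ref")
    } else {
      // HotSpot/OpenJDK would be queried via HotSpotDiagnosticMXBean instead (omitted here).
      false
    }
}
```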

## How was this patch tested?

Existing test suites
Manual testing with OpenJ9.

Closes #24308 from kiszk/SPARK-27397.

Authored-by: Kazuaki Ishizaki 
Signed-off-by: Sean Owen 
---
 .../scala/org/apache/spark/util/SizeEstimator.scala  |  5 +++--
 .../apache/spark/launcher/CommandBuilderUtils.java   | 20 
 2 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala 
b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
index 4837b01..e09f1fc 100644
--- a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
+++ b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
@@ -131,8 +131,9 @@ object SizeEstimator extends Logging {
   return System.getProperty(TEST_USE_COMPRESSED_OOPS_KEY).toBoolean
 }
 
-// java.vm.info provides compressed ref info for IBM JDKs
-if (System.getProperty("java.vendor").contains("IBM")) {
+// java.vm.info provides compressed ref info for IBM and OpenJ9 JDKs
+val javaVendor = System.getProperty("java.vendor")
+if (javaVendor.contains("IBM") || javaVendor.contains("OpenJ9")) {
   return System.getProperty("java.vm.info").contains("Compressed Ref")
 }
 
diff --git 
a/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java 
b/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java
index 47d2f8e..172fb8c 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java
@@ -31,11 +31,6 @@ class CommandBuilderUtils {
   static final String DEFAULT_PROPERTIES_FILE = "spark-defaults.conf";
   static final String ENV_SPARK_HOME = "SPARK_HOME";
 
-  /** The set of known JVM vendors. */
-  enum JavaVendor {
-Oracle, IBM, OpenJDK, Unknown
-  }
-
   /** Returns whether the given string is null or empty. */
   static boolean isEmpty(String s) {
 return s == null || s.isEmpty();
@@ -112,21 +107,6 @@ class CommandBuilderUtils {
 return os.startsWith("Windows");
   }
 
-  /** Returns an enum value indicating whose JVM is being used. */
-  static JavaVendor getJavaVendor() {
-String vendorString = System.getProperty("java.vendor");
-if (vendorString.contains("Oracle")) {
-  return JavaVendor.Oracle;
-}
-if (vendorString.contains("IBM")) {
-  return JavaVendor.IBM;
-}
-if (vendorString.contains("OpenJDK")) {
-  return JavaVendor.OpenJDK;
-}
-return JavaVendor.Unknown;
-  }
-
   /**
* Updates the user environment, appending the given pathList to the 
existing value of the given
* e

[spark-website] branch asf-site updated: Remove links to dead orgs / meetups; fix some broken links

2019-04-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 814b3d0  Remove links to dead orgs / meetups; fix some broken links
814b3d0 is described below

commit 814b3d05f6f3379a749ae78bd450024d594fb385
Author: Sean Owen 
AuthorDate: Tue Apr 16 09:16:37 2019 -0500

Remove links to dead orgs / meetups; fix some broken links

Author: Sean Owen 

Closes #194 from srowen/BrokenLinks.
---
 community.md   | 21 -
 developer-tools.md | 17 -
 documentation.md   |  2 +-
 downloads.md   |  4 ++--
 examples.md|  2 +-
 index.md   |  2 +-
 powered-by.md  | 25 -
 release-process.md |  4 ++--
 site/community.html| 21 -
 site/developer-tools.html  | 19 ---
 site/documentation.html|  2 +-
 site/downloads.html|  4 ++--
 site/examples.html |  2 +-
 site/index.html|  2 +-
 site/powered-by.html   | 31 ---
 site/release-process.html  |  4 ++--
 site/third-party-projects.html |  5 ++---
 site/trademarks.html   |  2 +-
 third-party-projects.md|  5 ++---
 trademarks.md  |  2 +-
 20 files changed, 36 insertions(+), 140 deletions(-)

diff --git a/community.md b/community.md
index 39e1a73..58c1ee2 100644
--- a/community.md
+++ b/community.md
@@ -139,33 +139,18 @@ Spark Meetups are grass-roots events organized and hosted 
by individuals in the
 https://www.meetup.com/Spark_big_data_analytics/";>Bangalore Spark 
Meetup
   
   
-https://www.meetup.com/Berlin-Apache-Spark-Meetup/";>Berlin Spark 
Meetup
-  
-  
-https://www.meetup.com/spark-user-beijing-Meetup/";>Beijing Spark 
Meetup
-  
-  
 https://www.meetup.com/Boston-Apache-Spark-User-Group/";>Boston 
Spark Meetup
   
   
 https://www.meetup.com/Boulder-Denver-Spark-Meetup/";>Boulder/Denver Spark 
Meetup
   
   
-https://www.meetup.com/Chicago-Spark-Users/";>Chicago Spark 
Users
-  
-  
 https://www.meetup.com/Christchurch-Apache-Spark-Meetup/";>Christchurch 
Apache Spark Meetup
   
   
-https://www.meetup.com/Cincinnati-Apache-Spark-Meetup/";>Cincinanati 
Apache Spark Meetup
-  
-  
 https://www.meetup.com/Hangzhou-Apache-Spark-Meetup/";>Hangzhou 
Spark Meetup
   
   
-https://www.meetup.com/Spark-User-Group-Hyderabad/";>Hyderabad 
Spark Meetup
-  
-  
 https://www.meetup.com/israel-spark-users/";>Israel Spark Users
   
   
@@ -196,9 +181,6 @@ Spark Meetups are grass-roots events organized and hosted 
by individuals in the
 https://www.meetup.com/Shenzhen-Apache-Spark-Meetup/";>Shenzhen 
Spark Meetup
   
   
-https://www.meetup.com/Toronto-Apache-Spark";>Toronto Apache 
Spark
-  
-  
 https://www.meetup.com/Tokyo-Spark-Meetup/";>Tokyo Spark Meetup
   
   
@@ -207,9 +189,6 @@ Spark Meetups are grass-roots events organized and hosted 
by individuals in the
   
 https://www.meetup.com/Washington-DC-Area-Spark-Interactive/";>Washington 
DC Area Spark Meetup
   
-  
-https://www.meetup.com/Apache-Spark-Zagreb-Meetup/";>Zagreb Spark 
Meetup
-  
 
 
 If you'd like your meetup or conference added, please email mailto:u...@spark.apache.org";>u...@spark.apache.org.
diff --git a/developer-tools.md b/developer-tools.md
index 00d57cd..29a9f92 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -110,7 +110,7 @@ If you'd prefer, you can run all of these commands on the 
command line (but this
 $ build/sbt "core/testOnly *DAGSchedulerSuite -- -z SPARK-12345"
 ```
 
-For more about how to run individual tests with sbt, see the [sbt 
documentation](http://www.scala-sbt.org/0.13/docs/Testing.html).
+For more about how to run individual tests with sbt, see the [sbt 
documentation](https://www.scala-sbt.org/0.13/docs/Testing.html).
 
 Testing with Maven
 
@@ -463,16 +463,7 @@ in the Eclipse install directory. Increase the following 
setting as needed:
 
 Nightly Builds
 
-Packages are built regularly off of Spark's master branch and release 
branches. These provide 
-Spark developers access to the bleeding-edge of Spark master or the most 
recent fixes not yet 
-incorporated into a maintenance release. These should only be used by Spark 
developers, as they 
-may have bugs and have not undergone the same level of testing as releases. 
Spark nightly packages 
-are available at:
-
-- Latest master build: https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest";>https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest
-- All nightly builds: https://people.apache.org/~pwendell/spark-nightly/";>https://people.apache.org/~pwendell/spark

[GitHub] [spark-website] srowen closed pull request #194: Remove links to dead orgs / meetups; fix some broken links

2019-04-16 Thread GitBox
srowen closed pull request #194: Remove links to dead orgs / meetups; fix some 
broken links
URL: https://github.com/apache/spark-website/pull/194
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-27464][CORE] Added Constant instead of referring string literal used from many places

2019-04-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 88d9de2  [SPARK-27464][CORE] Added Constant instead of referring 
string literal used from many places
88d9de2 is described below

commit 88d9de26dda1c91132fd909b0995492388dc5fac
Author: shivusondur 
AuthorDate: Tue Apr 16 09:30:46 2019 -0500

[SPARK-27464][CORE] Added Constant instead of referring string literal used 
from many places

## What changes were proposed in this pull request?

Added a constant instead of referring to the same string literal 
"spark.buffer.pageSize" from many places.
## How was this patch tested?
Ran the corresponding unit test cases manually.

Closes #24368 from shivusondur/Constant.

Authored-by: shivusondur 
Signed-off-by: Sean Owen 
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 6 ++
 core/src/main/scala/org/apache/spark/memory/MemoryManager.scala| 2 +-
 .../org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java| 2 +-
 .../util/collection/unsafe/sort/UnsafeExternalSorterSuite.java | 3 ++-
 .../org/apache/spark/sql/execution/joins/HashedRelation.scala  | 7 +++
 5 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 0bd46be..8e59ce7 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1303,4 +1303,10 @@ package object config {
 .doc("Staging directory used while submitting applications.")
 .stringConf
 .createOptional
+
+  private[spark] val BUFFER_PAGESIZE = ConfigBuilder("spark.buffer.pageSize")
+.doc("The amount of memory used per page in bytes")
+.bytesConf(ByteUnit.BYTE)
+.createOptional
+
 }
diff --git a/core/src/main/scala/org/apache/spark/memory/MemoryManager.scala 
b/core/src/main/scala/org/apache/spark/memory/MemoryManager.scala
index ff6d84b..c08b47f 100644
--- a/core/src/main/scala/org/apache/spark/memory/MemoryManager.scala
+++ b/core/src/main/scala/org/apache/spark/memory/MemoryManager.scala
@@ -255,7 +255,7 @@ private[spark] abstract class MemoryManager(
 }
 val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / 
safetyFactor)
 val default = math.min(maxPageSize, math.max(minPageSize, size))
-conf.getSizeAsBytes("spark.buffer.pageSize", default)
+conf.get(BUFFER_PAGESIZE).getOrElse(default)
   }
 
   /**
diff --git 
a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
 
b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
index 9bf707f..88125a6 100644
--- 
a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
+++ 
b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
@@ -101,7 +101,7 @@ public class UnsafeShuffleWriterSuite {
 partitionSizesInMergedFile = null;
 spillFilesCreated.clear();
 conf = new SparkConf()
-  .set("spark.buffer.pageSize", "1m")
+  .set(package$.MODULE$.BUFFER_PAGESIZE().key(), "1m")
   .set(package$.MODULE$.MEMORY_OFFHEAP_ENABLED(), false);
 taskMetrics = new TaskMetrics();
 memoryManager = new TestMemoryManager(conf);
diff --git 
a/core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
 
b/core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
index dd71d32..c6aa623 100644
--- 
a/core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
+++ 
b/core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
@@ -88,7 +88,8 @@ public class UnsafeExternalSorterSuite {
 
   protected boolean shouldUseRadixSort() { return false; }
 
-  private final long pageSizeBytes = 
conf.getSizeAsBytes("spark.buffer.pageSize", "4m");
+  private final long pageSizeBytes = conf.getSizeAsBytes(
+  package$.MODULE$.BUFFER_PAGESIZE().key(), "4m");
 
   private final int spillThreshold =
 (int) 
conf.get(package$.MODULE$.SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD());
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
index b03e8f5..9d8063d 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
@@ -23,7 +23,7 @@ import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
 import com.esotericsoftware.kryo.io.{Input, Output}
 
 import org.apache.spark.{SparkCo

[spark] branch master updated: [SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9

2019-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a8f20c9  [SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9
a8f20c9 is described below

commit a8f20c95ab602fcda0ac68d21d7cc112bdfbdadf
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 16 08:54:16 2019 -0700

[SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9

## What changes were proposed in this pull request?

This PR aims to update `zstd-jni` from 1.3.2-2 to 1.3.8-9 so that Apache Spark 
3.0.0 is aligned with the latest Zstd 1.3.8. Currently, Apache Spark is still 
aligned with the old Zstd used in the first PR, and there have been many bugfix 
and improvement updates in `zstd-jni` since then.
- https://github.com/facebook/zstd/releases/tag/v1.3.8
- https://github.com/facebook/zstd/releases/tag/v1.3.7
- https://github.com/facebook/zstd/releases/tag/v1.3.6
- https://github.com/facebook/zstd/releases/tag/v1.3.4
- https://github.com/facebook/zstd/releases/tag/v1.3.3
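
For readers wondering where `zstd-jni` is exercised: Spark can be told to use
Zstd for its internal block and shuffle compression via
`spark.io.compression.codec`. A minimal sketch (e.g. in spark-shell); the
application name is arbitrary:

```
import org.apache.spark.sql.SparkSession

// Select the Zstd codec (backed by zstd-jni) for Spark's internal compression.
val spark = SparkSession.builder()
  .appName("zstd-codec-sketch")
  .master("local[*]")
  .config("spark.io.compression.codec", "zstd")
  .getOrCreate()
```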

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #24364 from dongjoon-hyun/SPARK-ZSTD.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-2.7 | 2 +-
 dev/deps/spark-deps-hadoop-3.2 | 2 +-
 pom.xml| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index 8ae59cc..1386349 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -198,4 +198,4 @@ xmlenc-0.52.jar
 xz-1.5.jar
 zjsonpatch-0.3.0.jar
 zookeeper-3.4.6.jar
-zstd-jni-1.3.2-2.jar
+zstd-jni-1.3.8-9.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2 b/dev/deps/spark-deps-hadoop-3.2
index bbb0d73..961f65d 100644
--- a/dev/deps/spark-deps-hadoop-3.2
+++ b/dev/deps/spark-deps-hadoop-3.2
@@ -220,4 +220,4 @@ xbean-asm7-shaded-4.12.jar
 xz-1.5.jar
 zjsonpatch-0.3.0.jar
 zookeeper-3.4.13.jar
-zstd-jni-1.3.2-2.jar
+zstd-jni-1.3.8-9.jar
diff --git a/pom.xml b/pom.xml
index 449b426..6a0651e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -584,7 +584,7 @@
   
 com.github.luben
 zstd-jni
-1.3.2-2
+1.3.8-9
   
   
 com.clearspring.analytics


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-27467][BUILD][TEST-MAVEN] Upgrade Maven to 3.6.1

2019-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7c4a643  [SPARK-27467][BUILD][TEST-MAVEN] Upgrade Maven to 3.6.1
7c4a643 is described below

commit 7c4a6439d6fc8ec0b47914603b68cff4ce6d0cfc
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 16 08:55:27 2019 -0700

[SPARK-27467][BUILD][TEST-MAVEN] Upgrade Maven to 3.6.1

## What changes were proposed in this pull request?

This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like 
[MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full 
release note, please see the following.
- https://maven.apache.org/docs/3.6.1/release-notes.html

## How was this patch tested?

Pass the Jenkins with `[test-maven]` tag.

Closes #24377 from dongjoon-hyun/SPARK-27467.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 6a0651e..55f9e56 100644
--- a/pom.xml
+++ b/pom.xml
@@ -115,7 +115,7 @@
 1.8
 ${java.version}
 ${java.version}
-3.6.0
+3.6.1
 spark
 1.7.16
 1.2.17


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #195: 
[SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275874116
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,18 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- Maven bundled in IntelliJ may not meet the minimum version requirement of 
the Spark. If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. If you saw error like
+``` 
+2019-04-14 16:05:24,796 [ 314609]   INFO -  #org.jetbrains.idea.maven - 
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed 
with message:
+Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
 
 Review comment:
   Hi, @William1104 . Could you remove this verbose error message dump? 
   In general, it's not a good idea to include the whole error message. This 
message is already outdated because Apache Spark already moves to Maven 3.6.1 
today. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #195: 
[SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275874492
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,18 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- Maven bundled in IntelliJ may not meet the minimum version requirement of 
the Spark. If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. If you saw error like
+``` 
+2019-04-14 16:05:24,796 [ 314609]   INFO -  #org.jetbrains.idea.maven - 
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed 
with message:
+Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
 
 Review comment:
   Oops. It's already updated. Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] William1104 commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
William1104 commented on a change in pull request #195: 
[SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275874545
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,18 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- Maven bundled in IntelliJ may not meet the minimum version requirement of 
the Spark. If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. If you saw error like
+``` 
+2019-04-14 16:05:24,796 [ 314609]   INFO -  #org.jetbrains.idea.maven - 
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed 
with message:
+Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
 
 Review comment:
   Yes, agreed. I am going to remove it. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] William1104 commented on a change in pull request #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
William1104 commented on a change in pull request #195: [SPARK-27458][DOC] 
remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275874698
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,18 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- Maven bundled in IntelliJ may not meet the minimum version requirement of 
the Spark. If that happens,
 
 Review comment:
   Thanks for the review. Yes. I am going to change it. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] William1104 commented on a change in pull request #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
William1104 commented on a change in pull request #195: [SPARK-27458][DOC] 
remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275875016
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,18 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- Maven bundled in IntelliJ may not meet the minimum version requirement of 
the Spark. If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. If you saw error like
+``` 
+2019-04-14 16:05:24,796 [ 314609]   INFO -  #org.jetbrains.idea.maven - 
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed 
with message:
+Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
+2019-04-14 16:05:24,813 [ 314626]   INFO -  #org.jetbrains.idea.maven - 
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
(enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
failed. Look above for specific messages explaining why the rule failed.
+java.lang.RuntimeException: 
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
(enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
failed. Look above for specific messages explaining why the rule failed.
+``` 
+in IntelliJ log file (`Help -> Show Log in Finder/Explorer`), you should reset 
the maven home directory 
 
 Review comment:
   Yes. I am going to remove the verbose logs. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] William1104 commented on issue #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
William1104 commented on issue #195: [SPARK-27458][DOC] remind developers to 
reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#issuecomment-483725696
 
 
   Hi @dongjoon-hyun and @srowen, thanks for those suggestions. I just updated 
the PR accordingly. I hope it looks better. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
srowen commented on a change in pull request #195: [SPARK-27458][DOC] remind 
developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275882487
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,12 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- The version of Maven bundled with IntelliJ may not be new enough for Spark. 
If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. 
+Please remember to reset the Maven home directory 
+(`Preference -> Build, Execution, Deployment -> Maven -> Maven home 
directory`) of your project to the 
+version is new enough.
 
 Review comment:
   "the version is new enough" -> "to point to a newer installation of Maven".
   
   I might also tell people to install the latest version of Maven locally, or 
use the copy that Spark downloads into `build/`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] William1104 commented on a change in pull request #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
William1104 commented on a change in pull request #195: [SPARK-27458][DOC] 
remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275901625
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,12 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- The version of Maven bundled with IntelliJ may not be new enough for Spark. 
If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. 
+Please remember to reset the Maven home directory 
+(`Preference -> Build, Execution, Deployment -> Maven -> Maven home 
directory`) of your project to the 
+version is new enough.
 
 Review comment:
   Thanks. Just updated. I prefer suggesting people use the copy Spark downloads 
at `build/`. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] William1104 commented on a change in pull request #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
William1104 commented on a change in pull request #195: [SPARK-27458][DOC] 
remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275901625
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,12 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- The version of Maven bundled with IntelliJ may not be new enough for Spark. 
If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. 
+Please remember to reset the Maven home directory 
+(`Preference -> Build, Execution, Deployment -> Maven -> Maven home 
directory`) of your project to the 
+version is new enough.
 
 Review comment:
   Thanks. Just updated. I prefer suggesting people use the copy Spark downloads 
at `build/`. I worry that the latest version may not work all the time; for 
example, Maven 4 may not be backward compatible with Maven 3. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] shaneknapp commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links

2019-04-16 Thread GitBox
shaneknapp commented on a change in pull request #194: Remove links to dead 
orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275910333
 
 

 ##
 File path: powered-by.md
 ##
 @@ -170,9 +160,8 @@ across all screens
   - PanTera is a tool for exploring large datasets. It uses Spark to create XY 
and geographic 
   scatterplots from millions to billions of datapoints.
   - Components we are using: Spark Core (Scala API), Spark SQL, and GraphX
-- http://www.peerialism.com";>Peerialism
 - http://www.planbmedia.com";>PlanBMedia
-- http://prediction.io/";>PredicitionIo
+- http://predictionio.apache.org/index.html/";>Apache PredicitionIo
 
 Review comment:
   we are currently not publishing any snapshot builds except for docs...  i 
can easily shut these builds off.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links

2019-04-16 Thread GitBox
srowen commented on a change in pull request #194: Remove links to dead orgs / 
meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275913516
 
 

 ##
 File path: powered-by.md
 ##
 @@ -170,9 +160,8 @@ across all screens
   - PanTera is a tool for exploring large datasets. It uses Spark to create XY 
and geographic 
   scatterplots from millions to billions of datapoints.
   - Components we are using: Spark Core (Scala API), Spark SQL, and GraphX
-- http://www.peerialism.com";>Peerialism
 - http://www.planbmedia.com";>PlanBMedia
-- http://prediction.io/";>PredicitionIo
+- http://predictionio.apache.org/index.html/";>Apache PredicitionIo
 
 Review comment:
   The docs build probably isn't useful and just filling up dist, though that's 
not what I was referencing here. It's: 
https://repository.apache.org/snapshots/org/apache/spark and while that seems 
to be down now, I did see recent snapshots of the artifacts. However @gatorsmile 
and others note that they may be useful, so let's keep whatever it is.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] shaneknapp commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links

2019-04-16 Thread GitBox
shaneknapp commented on a change in pull request #194: Remove links to dead 
orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275914668
 
 

 ##
 File path: powered-by.md
 ##
 @@ -170,9 +160,8 @@ across all screens
   - PanTera is a tool for exploring large datasets. It uses Spark to create XY 
and geographic 
   scatterplots from millions to billions of datapoints.
   - Components we are using: Spark Core (Scala API), Spark SQL, and GraphX
-- http://www.peerialism.com";>Peerialism
 - http://www.planbmedia.com";>PlanBMedia
-- http://prediction.io/";>PredicitionIo
+- http://predictionio.apache.org/index.html/";>Apache PredicitionIo
 
 Review comment:
   kk


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
srowen commented on a change in pull request #195: [SPARK-27458][DOC] remind 
developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275914710
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,13 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generate source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- The version of Maven bundled with IntelliJ may not be new enough for Spark. 
If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. 
+Please remember to reset the Maven home directory 
+(`Preference -> Build, Execution, Deployment -> Maven -> Maven home 
directory`) of your project to 
+point to a newer installation of Maven. You may also build Spark with script 
`build/mvn` first.
 
 Review comment:
   with script -> with the script
   install a proper Maven -> install a recent version of Maven
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-27476][SQL] Refactoring SchemaPruning rule to remove duplicate code

2019-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b404e02  [SPARK-27476][SQL] Refactoring SchemaPruning rule to remove 
duplicate code
b404e02 is described below

commit b404e02574084c5ab550ce8716d4177464e7ce8c
Author: Liang-Chi Hsieh 
AuthorDate: Tue Apr 16 14:50:37 2019 -0700

[SPARK-27476][SQL] Refactoring SchemaPruning rule to remove duplicate code

## What changes were proposed in this pull request?

In the SchemaPruning rule, there is duplicate code for data source v1 and v2. 
The logic is the same, so we can refactor the rule to remove the duplication.
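
The shape of the refactoring, as the diff below shows, is to move the shared
logic into one helper and pass the source-specific relation construction in as
a function. A simplified, self-contained sketch of that pattern with
hypothetical names:

```
// Illustrative pattern only: shared pruning decision in one helper, per-source
// behavior supplied as a callback by each caller.
object PruneSketch {
  def pruneIfBeneficial[R](totalLeaves: Int, prunedLeaves: Int)(build: () => R): Option[R] =
    if (prunedLeaves < totalLeaves) Some(build()) else None

  def main(args: Array[String]): Unit = {
    println(pruneIfBeneficial(10, 4)(() => "pruned v1 relation"))   // Some(...)
    println(pruneIfBeneficial(10, 10)(() => "pruned v2 relation"))  // None
  }
}
```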

## How was this patch tested?

Existing tests.

Closes #24383 from viirya/SPARK-27476.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/internal/SQLConf.scala|   2 +-
 .../sql/execution/datasources/SchemaPruning.scala  | 100 ++---
 2 files changed, 47 insertions(+), 55 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index f33cc86..3f59fa1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1551,7 +1551,7 @@ object SQLConf {
   .internal()
   .doc("Prune nested fields from a logical relation's output which are 
unnecessary in " +
 "satisfying a query. This optimization allows columnar file format 
readers to avoid " +
-"reading unnecessary nested column data. Currently Parquet and ORC v1 
are the " +
+"reading unnecessary nested column data. Currently Parquet and ORC are 
the " +
 "data sources that implement this optimization.")
   .booleanConf
   .createWithDefault(false)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala
index 15fdf65..463ee9a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala
@@ -50,73 +50,65 @@ object SchemaPruning extends Rule[LogicalPlan] {
   case op @ PhysicalOperation(projects, filters,
   l @ LogicalRelation(hadoopFsRelation: HadoopFsRelation, _, _, _))
 if canPruneRelation(hadoopFsRelation) =>
-val (normalizedProjects, normalizedFilters) =
-  normalizeAttributeRefNames(l.output, projects, filters)
-val requestedRootFields = identifyRootFields(normalizedProjects, 
normalizedFilters)
-
-// If requestedRootFields includes a nested field, continue. Otherwise,
-// return op
-if (requestedRootFields.exists { root: RootField => 
!root.derivedFromAtt }) {
-  val dataSchema = hadoopFsRelation.dataSchema
-  val prunedDataSchema = pruneDataSchema(dataSchema, 
requestedRootFields)
-
-  // If the data schema is different from the pruned data schema, 
continue. Otherwise,
-  // return op. We effect this comparison by counting the number of 
"leaf" fields in
-  // each schemata, assuming the fields in prunedDataSchema are a 
subset of the fields
-  // in dataSchema.
-  if (countLeaves(dataSchema) > countLeaves(prunedDataSchema)) {
+
+prunePhysicalColumns(l.output, projects, filters, 
hadoopFsRelation.dataSchema,
+  prunedDataSchema => {
 val prunedHadoopRelation =
   hadoopFsRelation.copy(dataSchema = 
prunedDataSchema)(hadoopFsRelation.sparkSession)
-
-val prunedRelation = buildPrunedRelation(l, prunedHadoopRelation)
-val projectionOverSchema = ProjectionOverSchema(prunedDataSchema)
-
-buildNewProjection(normalizedProjects, normalizedFilters, 
prunedRelation,
-  projectionOverSchema)
-  } else {
-op
-  }
-} else {
-  op
-}
+buildPrunedRelation(l, prunedHadoopRelation)
+  }).getOrElse(op)
 
   case op @ PhysicalOperation(projects, filters,
   d @ DataSourceV2Relation(table: FileTable, output, _)) if 
canPruneTable(table) =>
-val (normalizedProjects, normalizedFilters) =
-  normalizeAttributeRefNames(output, projects, filters)
-val requestedRootFields = identifyRootFields(normalizedProjects, 
normalizedFilters)
-
-// If requestedRootFields includes a nested field, continue. Otherwise,
-// return op
-if (requestedRootFields.exists { root: RootField => 
!root.derivedFromAtt }) {
-  val dataSchema = table.dataSchema
-

[spark] branch master updated: [SPARK-27453] Pass partitionBy as options in DataFrameWriter

2019-04-16 Thread tdas
This is an automated email from the ASF dual-hosted git repository.

tdas pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 26ed65f  [SPARK-27453] Pass partitionBy as options in DataFrameWriter
26ed65f is described below

commit 26ed65f4150db1fa37f8bfab24ac0873d2e42936
Author: liwensun 
AuthorDate: Tue Apr 16 15:03:16 2019 -0700

[SPARK-27453] Pass partitionBy as options in DataFrameWriter

## What changes were proposed in this pull request?

Pass partitionBy columns as options and feature-flag this behavior.
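
A usage sketch of the new flag; `myV1Source` is a hypothetical non-file V1
source, and the conf key and `__partition_columns` option name come from the
diff below:

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Opt in to forwarding partitionBy columns to V1 sources as a write option.
spark.conf.set("spark.sql.legacy.sources.write.passPartitionByAsOptions", "true")

// With the flag on, a registered non-file source would now also receive the
// partition columns via the "__partition_columns" option.
spark.range(10)
  .write
  .format("myV1Source")   // hypothetical CreatableRelationProvider
  .partitionBy("id")
  .save()
```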

## How was this patch tested?

A new unit test.

Closes #24365 from liwensun/partitionby.

Authored-by: liwensun 
Signed-off-by: Tathagata Das 
---
 .../org/apache/spark/sql/internal/SQLConf.scala  |  9 +
 .../scala/org/apache/spark/sql/DataFrameWriter.scala | 11 ++-
 .../sql/execution/datasources/DataSourceUtils.scala  | 20 
 .../spark/sql/test/DataFrameReaderWriterSuite.scala  | 19 +++
 4 files changed, 58 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 3f59fa1..b223a48 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1687,6 +1687,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_PASS_PARTITION_BY_AS_OPTIONS =
+buildConf("spark.sql.legacy.sources.write.passPartitionByAsOptions")
+  .internal()
+  .doc("Whether to pass the partitionBy columns as options in 
DataFrameWriter. " +
+"Data source V1 now silently drops partitionBy columns for 
non-file-format sources; " +
+"turning the flag on provides a way for these sources to see these 
partitionBy columns.")
+  .booleanConf
+  .createWithDefault(false)
+
   val NAME_NON_STRUCT_GROUPING_KEY_AS_VALUE =
 buildConf("spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue")
   .internal()
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index 9371936..3b84151 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -29,8 +29,9 @@ import org.apache.spark.sql.catalyst.expressions.Literal
 import org.apache.spark.sql.catalyst.plans.logical.{AppendData, 
InsertIntoTable, LogicalPlan, OverwriteByExpression}
 import org.apache.spark.sql.execution.SQLExecution
 import org.apache.spark.sql.execution.command.DDLUtils
-import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource, 
LogicalRelation}
+import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource, 
DataSourceUtils, LogicalRelation}
 import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Relation, 
DataSourceV2Utils, FileDataSourceV2, WriteToDataSourceV2}
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.sources.BaseRelation
 import org.apache.spark.sql.sources.v2._
 import org.apache.spark.sql.sources.v2.TableCapability._
@@ -313,6 +314,14 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   }
 
   private def saveToV1Source(): Unit = {
+if (SparkSession.active.sessionState.conf.getConf(
+  SQLConf.LEGACY_PASS_PARTITION_BY_AS_OPTIONS)) {
+  partitioningColumns.foreach { columns =>
+extraOptions += (DataSourceUtils.PARTITIONING_COLUMNS_KEY ->
+  DataSourceUtils.encodePartitioningColumns(columns))
+  }
+}
+
 // Code path for data source v1.
 runCommand(df.sparkSession, "save") {
   DataSource(
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
index 74eae94..0ad914e 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
@@ -18,6 +18,8 @@
 package org.apache.spark.sql.execution.datasources
 
 import org.apache.hadoop.fs.Path
+import org.json4s.NoTypeHints
+import org.json4s.jackson.Serialization
 
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.types._
@@ -25,6 +27,24 @@ import org.apache.spark.sql.types._
 
 object DataSourceUtils {
   /**
+   * The key to use for storing partitionBy columns as options.
+   */
+  val PARTITIONING_COLUMNS_KEY = "__partition_columns"
+
+  /**
+   * Utility methods for converting partitionBy columns to options and back.
+   */
+  private implicit val formats = Seri

[spark] branch branch-2.4 updated: [SPARK-27453] Pass partitionBy as options in DataFrameWriter

2019-04-16 Thread tdas
This is an automated email from the ASF dual-hosted git repository.

tdas pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new df9a506  [SPARK-27453] Pass partitionBy as options in DataFrameWriter
df9a506 is described below

commit df9a50637e2622a15e9af7d837986a0e868878b1
Author: liwensun 
AuthorDate: Tue Apr 16 15:03:16 2019 -0700

[SPARK-27453] Pass partitionBy as options in DataFrameWriter

Pass partitionBy columns as options and feature-flag this behavior.

A new unit test.

Closes #24365 from liwensun/partitionby.

Authored-by: liwensun 
Signed-off-by: Tathagata Das 
(cherry picked from commit 26ed65f4150db1fa37f8bfab24ac0873d2e42936)
Signed-off-by: Tathagata Das 
---
 .../org/apache/spark/sql/internal/SQLConf.scala  |  9 +
 .../scala/org/apache/spark/sql/DataFrameWriter.scala | 11 ++-
 .../sql/execution/datasources/DataSourceUtils.scala  | 20 
 .../spark/sql/test/DataFrameReaderWriterSuite.scala  | 19 +++
 4 files changed, 58 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 29bd356..c9ee60e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1550,6 +1550,15 @@ object SQLConf {
 "WHERE, which does not follow SQL standard.")
   .booleanConf
   .createWithDefault(false)
+
+  val LEGACY_PASS_PARTITION_BY_AS_OPTIONS =
+buildConf("spark.sql.legacy.sources.write.passPartitionByAsOptions")
+  .internal()
+  .doc("Whether to pass the partitionBy columns as options in 
DataFrameWriter. " +
+"Data source V1 now silently drops partitionBy columns for 
non-file-format sources; " +
+"turning the flag on provides a way for these sources to see these 
partitionBy columns.")
+  .booleanConf
+  .createWithDefault(false)
 }
 
 /**
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index a2586cc..f90d353 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -28,8 +28,9 @@ import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.plans.logical.{AppendData, 
InsertIntoTable, LogicalPlan}
 import org.apache.spark.sql.execution.SQLExecution
 import org.apache.spark.sql.execution.command.DDLUtils
-import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource, 
LogicalRelation}
+import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource, 
DataSourceUtils, LogicalRelation}
 import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Relation, 
DataSourceV2Utils, WriteToDataSourceV2}
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.sources.BaseRelation
 import org.apache.spark.sql.sources.v2._
 import org.apache.spark.sql.types.StructType
@@ -272,6 +273,14 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   }
 
   private def saveToV1Source(): Unit = {
+if (SparkSession.active.sessionState.conf.getConf(
+  SQLConf.LEGACY_PASS_PARTITION_BY_AS_OPTIONS)) {
+  partitioningColumns.foreach { columns =>
+extraOptions += (DataSourceUtils.PARTITIONING_COLUMNS_KEY ->
+  DataSourceUtils.encodePartitioningColumns(columns))
+  }
+}
+
 // Code path for data source v1.
 runCommand(df.sparkSession, "save") {
   DataSource(
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
index 90cec5e..1cb69d7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
@@ -18,6 +18,8 @@
 package org.apache.spark.sql.execution.datasources
 
 import org.apache.hadoop.fs.Path
+import org.json4s.NoTypeHints
+import org.json4s.jackson.Serialization
 
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.types._
@@ -40,6 +42,24 @@ object DataSourceUtils {
   }
 
   /**
+   * The key to use for storing partitionBy columns as options.
+   */
+  val PARTITIONING_COLUMNS_KEY = "__partition_columns"
+
+  /**
+   * Utility methods for converting partitionBy columns to options and back.
+   */
+  private implicit val formats = Serialization.formats(NoTypeHints)
+
+  def encodePartitioningColumns(columns: Seq[String]): String = {
+Serialization.write(columns)
+  }
+
+  def decodePart

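As a concrete illustration of the SPARK-27453 change above, here is a minimal sketch, assuming a local SparkSession and a build that includes this patch. The source name com.example.CustomV1Source is a placeholder for any non-file-format V1 source; the option key "__partition_columns" and the JSON encoding come from the DataSourceUtils changes in the diff.

import org.json4s.NoTypeHints
import org.json4s.jackson.Serialization

import org.apache.spark.sql.SparkSession

object PartitionByAsOptionsSketch {
  def main(args: Array[String]): Unit = {
    // The option value is just a JSON-encoded list of the partition column names,
    // matching the json4s helpers added to DataSourceUtils above.
    implicit val formats = Serialization.formats(NoTypeHints)
    val encoded = Serialization.write(Seq("year", "month"))
    // encoded == """["year","month"]"""

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partitionBy-as-options")
      // Legacy flag introduced by this commit; it defaults to false.
      .config("spark.sql.legacy.sources.write.passPartitionByAsOptions", "true")
      .getOrCreate()
    import spark.implicits._

    // With the flag enabled, saveToV1Source() forwards the partitionBy columns to
    // the source's options map under the key "__partition_columns".
    Seq(("2019", "04", 1L), ("2019", "05", 2L)).toDF("year", "month", "cnt")
      .write
      .format("com.example.CustomV1Source") // placeholder: any non-file-format V1 source
      .partitionBy("year", "month")
      .save()

    spark.stop()
  }
}
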
[spark] branch master updated: [SPARK-25348][SQL] Data source for binary files

2019-04-16 Thread meng
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1bb0c8e  [SPARK-25348][SQL] Data source for binary files
1bb0c8e is described below

commit 1bb0c8e407e0fcd1283f0eb2f742ba2567eda87e
Author: WeichenXu 
AuthorDate: Tue Apr 16 15:41:32 2019 -0700

[SPARK-25348][SQL] Data source for binary files

## What changes were proposed in this pull request?

Implement binary file data source in Spark.

Format name: "binaryFile" (case-insensitive)

Schema:
- content: BinaryType
- status: StructType
  - path: StringType
  - modificationTime: TimestampType
  - length: LongType

Options:
* pathGlobFilter (instead of pathFilterRegex) to rely on GlobFilter behavior
* maxBytesPerPartition is not implemented since it is controlled by two SQL 
confs: maxPartitionBytes and openCostInBytes.

## How was this patch tested?

Unit test added.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

Closes #24354 from WeichenXu123/binary_file_datasource.

Lead-authored-by: WeichenXu 
Co-authored-by: Xiangrui Meng 
Signed-off-by: Xiangrui Meng 
---
 ...org.apache.spark.sql.sources.DataSourceRegister |   1 +
 .../datasources/binaryfile/BinaryFileFormat.scala  | 177 +
 .../binaryfile/BinaryFileFormatSuite.scala | 143 +
 3 files changed, 321 insertions(+)

diff --git a/sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister b/sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
index be9cb81..d988287 100644
--- a/sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
+++ b/sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
@@ -8,3 +8,4 @@ org.apache.spark.sql.execution.datasources.v2.text.TextDataSourceV2
 org.apache.spark.sql.execution.streaming.ConsoleSinkProvider
 org.apache.spark.sql.execution.streaming.sources.RateStreamProvider
 org.apache.spark.sql.execution.streaming.sources.TextSocketSourceProvider
+org.apache.spark.sql.execution.datasources.binaryfile.BinaryFileFormat
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
new file mode 100644
index 000..ad9292a
--- /dev/null
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.binaryfile
+
+import com.google.common.io.{ByteStreams, Closeables}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, GlobFilter, Path}
+import org.apache.hadoop.mapreduce.Job
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
+import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils}
+import org.apache.spark.sql.execution.datasources.{FileFormat, 
OutputWriterFactory, PartitionedFile}
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.SerializableConfiguration
+
+
+/**
+ * The binary file data source.
+ *
+ * It reads binary files and converts each file into a single record that 
contains the raw content
+ * and metadata of the file.
+ *
+ * Example:
+ * {{{
+ *   // Scala
+ *   val df = spark.read.format("binaryFile")
+ * .option("pathGlobFilter", "*.png")
+ * .load("/path/to/fileDir")
+ *
+ *   // Java
+ *   Dataset<Row> df = spark.read().format("binaryFile")
+ * .option("pat

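To make the new source concrete, here is a short usage sketch based only on the format name, options, and schema listed in the commit message above; the input directory is a placeholder and an existing SparkSession named spark is assumed.

// One row per matching file; the file metadata lives in the nested "status" struct.
val binaryDF = spark.read.format("binaryFile")
  .option("pathGlobFilter", "*.png")  // only option documented by this commit
  .load("/path/to/fileDir")           // placeholder directory

// Schema (as listed in the commit message): content: binary,
// status: struct<path: string, modificationTime: timestamp, length: long>
binaryDF.printSchema()

binaryDF
  .select("status.path", "status.length", "status.modificationTime")
  .show(truncate = false)
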
[spark] branch master updated: [SPARK-27479][BUILD] Hide API docs for org.apache.spark.util.kvstore

2019-04-16 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 61feb16  [SPARK-27479][BUILD] Hide API docs for 
org.apache.spark.util.kvstore
61feb16 is described below

commit 61feb1635217ef1d4ebceebc1e7c8829c5c11994
Author: gatorsmile 
AuthorDate: Tue Apr 16 19:53:01 2019 -0700

[SPARK-27479][BUILD] Hide API docs for org.apache.spark.util.kvstore

## What changes were proposed in this pull request?

The API docs should not include the "org.apache.spark.util.kvstore" package 
because it contains internal, private APIs. See the doc link: 
https://spark.apache.org/docs/latest/api/java/org/apache/spark/util/kvstore/LevelDB.html

## How was this patch tested?
N/A

Closes #24386 from gatorsmile/rmDoc.

Authored-by: gatorsmile 
Signed-off-by: gatorsmile 
---
 project/SparkBuild.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 2036dc0..94f014a 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -740,6 +740,7 @@ object Unidoc {
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/unsafe")))
   .map(_.filterNot(_.getCanonicalPath.contains("python")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/util/collection")))
+  .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/util/kvstore")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/catalyst")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/execution")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/internal")))


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



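The change itself is a single filterNot call in the Unidoc settings. In isolation, the predicate works as in the following sketch (plain Scala, made-up paths): any source file whose canonical path contains one of the excluded package fragments is dropped from the generated API docs.

// A standalone model of the path filter used in SparkBuild's Unidoc block.
val excludedFragments = Seq(
  "org/apache/spark/util/collection",
  "org/apache/spark/util/kvstore", // newly excluded by this commit
  "org/apache/spark/sql/catalyst"
)

def keptByUnidoc(canonicalPath: String): Boolean =
  !excludedFragments.exists(fragment => canonicalPath.contains(fragment))

// Made-up example paths:
assert(!keptByUnidoc("/spark/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java"))
assert(keptByUnidoc("/spark/core/src/main/scala/org/apache/spark/SparkContext.scala"))
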
[spark] branch branch-2.4 updated: [SPARK-27479][BUILD] Hide API docs for org.apache.spark.util.kvstore

2019-04-16 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new fb47b9b  [SPARK-27479][BUILD] Hide API docs for 
org.apache.spark.util.kvstore
fb47b9b is described below

commit fb47b9b399f24da6464c29922a0e40bc8b553805
Author: gatorsmile 
AuthorDate: Tue Apr 16 19:53:01 2019 -0700

[SPARK-27479][BUILD] Hide API docs for org.apache.spark.util.kvstore

## What changes were proposed in this pull request?

The API docs should not include the "org.apache.spark.util.kvstore" package 
because it contains internal, private APIs. See the doc link: 
https://spark.apache.org/docs/latest/api/java/org/apache/spark/util/kvstore/LevelDB.html

## How was this patch tested?
N/A

Closes #24386 from gatorsmile/rmDoc.

Authored-by: gatorsmile 
Signed-off-by: gatorsmile 
(cherry picked from commit 61feb1635217ef1d4ebceebc1e7c8829c5c11994)
Signed-off-by: gatorsmile 
---
 project/SparkBuild.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index a5ed908..341f046 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -676,6 +676,7 @@ object Unidoc {
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/unsafe")))
   .map(_.filterNot(_.getCanonicalPath.contains("python")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/util/collection")))
+  .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/util/kvstore")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/catalyst")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/execution")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/internal")))


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] William1104 commented on a change in pull request #195: [SPARK-27458][DOC] remind developers to reset maven home in IntelliJ

2019-04-16 Thread GitBox
William1104 commented on a change in pull request #195: [SPARK-27458][DOC] 
remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r276076078
 
 

 ##
 File path: developer-tools.md
 ##
 @@ -397,6 +397,13 @@ Other tips:
 - "Rebuild Project" can fail the first time the project is compiled, because 
generated source files 
 are not automatically generated. Try clicking the "Generate Sources and Update 
Folders For All 
 Projects" button in the "Maven Projects" tool window to manually generate 
these sources.
+- The version of Maven bundled with IntelliJ may not be new enough for Spark. 
If that happens,
+the action "Generate Sources and Update Folders For All Projects" could fail 
silently. 
+Please remember to reset the Maven home directory 
+(`Preference -> Build, Execution, Deployment -> Maven -> Maven home 
directory`) of your project to 
+point to a newer installation of Maven. You may also build Spark with script 
`build/mvn` first.
 
 Review comment:
    Thanks. I updated developer-tools.md accordingly, but I have not yet run the 
build to regenerate the corresponding HTML. I will do that tonight when I can 
access a desktop.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-27416][SQL] UnsafeMapData & UnsafeArrayData Kryo serialization …

2019-04-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 54b0d1e  [SPARK-27416][SQL] UnsafeMapData & UnsafeArrayData Kryo 
serialization …
54b0d1e is described below

commit 54b0d1e0efda33065e7e1053d26fab63653619ec
Author: pengbo 
AuthorDate: Wed Apr 17 13:03:00 2019 +0800

[SPARK-27416][SQL] UnsafeMapData & UnsafeArrayData Kryo serialization …

## What changes were proposed in this pull request?
Finish the remaining work of https://github.com/apache/spark/pull/24317 and 
https://github.com/apache/spark/pull/9030:
a. Implement Kryo serialization for UnsafeArrayData.
b. Fix the UnsafeMapData Java/Kryo serialization issue when two machines have 
different Oops sizes.
c. Move the duplicated "getBytes()" code to a shared utility class.

## How was this patch tested?
Corresponding unit tests have been added and run.

Closes #24357 from pengbo/SPARK-27416_new.

Authored-by: pengbo 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/serializer/KryoSerializer.scala   |  4 ++
 .../sql/catalyst/expressions/UnsafeArrayData.java  | 40 +-
 .../sql/catalyst/expressions/UnsafeDataUtils.java  | 40 ++
 .../sql/catalyst/expressions/UnsafeMapData.java| 45 ++-
 .../spark/sql/catalyst/expressions/UnsafeRow.java  |  9 +--
 .../spark/sql/catalyst/util/UnsafeArraySuite.scala | 27 ++---
 .../spark/sql/catalyst/util/UnsafeMapSuite.scala   | 64 ++
 7 files changed, 197 insertions(+), 32 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
index eef1997..c426095 100644
--- a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
@@ -213,6 +213,10 @@ class KryoSerializer(conf: SparkConf)
 // We can't load those class directly in order to avoid unnecessary jar 
dependencies.
 // We load them safely, ignore it if the class not found.
 Seq(
+  "org.apache.spark.sql.catalyst.expressions.UnsafeRow",
+  "org.apache.spark.sql.catalyst.expressions.UnsafeArrayData",
+  "org.apache.spark.sql.catalyst.expressions.UnsafeMapData",
+
   "org.apache.spark.ml.attribute.Attribute",
   "org.apache.spark.ml.attribute.AttributeGroup",
   "org.apache.spark.ml.attribute.BinaryAttribute",
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
index 4ff0838..db6401b 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
@@ -25,6 +25,11 @@ import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
 
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
 import org.apache.spark.sql.catalyst.util.ArrayData;
 import org.apache.spark.sql.types.*;
 import org.apache.spark.unsafe.Platform;
@@ -58,7 +63,7 @@ import static 
org.apache.spark.unsafe.Platform.BYTE_ARRAY_OFFSET;
  * Instances of `UnsafeArrayData` act as pointers to row data stored in this 
format.
  */
 
-public final class UnsafeArrayData extends ArrayData implements Externalizable {
+public final class UnsafeArrayData extends ArrayData implements Externalizable, KryoSerializable {
   public static int calculateHeaderPortionInBytes(int numFields) {
 return (int)calculateHeaderPortionInBytes((long)numFields);
   }
@@ -492,22 +497,9 @@ public final class UnsafeArrayData extends ArrayData 
implements Externalizable {
 return fromPrimitiveArray(arr, Platform.DOUBLE_ARRAY_OFFSET, arr.length, 
8);
   }
 
-
-  public byte[] getBytes() {
-if (baseObject instanceof byte[]
-&& baseOffset == Platform.BYTE_ARRAY_OFFSET
-&& (((byte[]) baseObject).length == sizeInBytes)) {
-  return (byte[]) baseObject;
-} else {
-  byte[] bytes = new byte[sizeInBytes];
-  Platform.copyMemory(baseObject, baseOffset, bytes, 
Platform.BYTE_ARRAY_OFFSET, sizeInBytes);
-  return bytes;
-}
-  }
-
   @Override
   public void writeExternal(ObjectOutput out) throws IOException {
-byte[] bytes = getBytes();
+byte[] bytes = UnsafeDataUtils.getBytes(baseObject, baseOffset, 
sizeInBytes);
 out.writeInt(bytes.length);
 out.writeInt(this.numElements);
 out.write(bytes);
@@ -522,4 +514,22 @@ public final class UnsafeArrayData extends ArrayData 
implements Externalizable {
 this.baseObject = new byte[sizeInBytes];
 in.
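The diff is cut off above, but the effect of the change can be sketched as a simple round trip through Spark's KryoSerializer; this is not the commit's own test (those live in UnsafeArraySuite and UnsafeMapSuite), just a minimal illustration of the new KryoSerializable support, assuming a build that includes this patch.

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData

// Build an UnsafeArrayData from a primitive array, serialize it with Kryo,
// and read it back; before this patch UnsafeArrayData only supported Java
// serialization via Externalizable.
val kryo = new KryoSerializer(new SparkConf()).newInstance()

val original = UnsafeArrayData.fromPrimitiveArray(Array(1, 2, 3))
val restored = kryo.deserialize[UnsafeArrayData](kryo.serialize(original))

assert(restored.numElements() == 3)
assert(restored.toIntArray().sameElements(Array(1, 2, 3)))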