spark git commit: [SPARK-9104][CORE] Expose Netty memory metrics in Spark
Repository: spark Updated Branches: refs/heads/master 6a2325448 -> 445f1790a [SPARK-9104][CORE] Expose Netty memory metrics in Spark ## What changes were proposed in this pull request? This PR exposes Netty memory usage for Spark's `TransportClientFactory` and `TransportServer`, including the details of each direct arena and heap arena metrics, as well as aggregated metrics. The purpose of adding the Netty metrics is to better understand the memory usage of Netty in Spark shuffle, RPC, and other network communication, and to guide better configuration of executor memory sizes. This PR doesn't expose these metrics to any sink; to leverage this feature, they still need to be connected to a MetricsSystem or collected back to the driver for display. ## How was this patch tested? Added a unit test to verify it, and also manually verified it in a real cluster. Author: jerryshao Closes #18935 from jerryshao/SPARK-9104. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/445f1790 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/445f1790 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/445f1790 Branch: refs/heads/master Commit: 445f1790ade1c53cf7eee1f282395648e4d0992c Parents: 6a23254 Author: jerryshao Authored: Tue Sep 5 21:28:54 2017 -0700 Committer: Shixiong Zhu Committed: Tue Sep 5 21:28:54 2017 -0700 -- common/network-common/pom.xml | 5 + .../network/client/TransportClientFactory.java | 13 +- .../spark/network/server/TransportServer.java | 14 +- .../spark/network/util/NettyMemoryMetrics.java | 145 .../spark/network/util/TransportConf.java | 10 ++ .../network/util/NettyMemoryMetricsSuite.java | 171 +++ dev/deps/spark-deps-hadoop-2.6 | 2 +- dev/deps/spark-deps-hadoop-2.7 | 2 +- pom.xml | 2 +- 9 files changed, 353 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/445f1790/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index ccd8504..18cbdad 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -61,6 +61,11 @@ jackson-annotations + + io.dropwizard.metrics + metrics-core + + org.slf4j http://git-wip-us.apache.org/repos/asf/spark/blob/445f1790/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java -- diff --git a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java index 8add4e1..16d242d 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java +++ b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java @@ -26,6 +26,7 @@ import java.util.Random; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.atomic.AtomicReference; +import com.codahale.metrics.MetricSet; import com.google.common.base.Preconditions; import com.google.common.base.Throwables; import com.google.common.collect.Lists; @@ -42,10 +43,7 @@ import org.slf4j.LoggerFactory; import org.apache.spark.network.TransportContext; import org.apache.spark.network.server.TransportChannelHandler; -import org.apache.spark.network.util.IOMode; -import org.apache.spark.network.util.JavaUtils; -import org.apache.spark.network.util.NettyUtils; -import org.apache.spark.network.util.TransportConf; +import org.apache.spark.network.util.*; /** * Factory for creating {@link
TransportClient}s by using createClient. @@ -87,6 +85,7 @@ public class TransportClientFactory implements Closeable { private final Class socketChannelClass; private EventLoopGroup workerGroup; private PooledByteBufAllocator pooledAllocator; + private final NettyMemoryMetrics metrics; public TransportClientFactory( TransportContext context, @@ -106,6 +105,12 @@ public class TransportClientFactory implements Closeable { conf.getModuleName() + "-client"); this.pooledAllocator = NettyUtils.createPooledByteBufAllocator( conf.preferDirectBufs(), false /* allowCache */, conf.clientThreads()); +this.metrics = new NettyMemoryMetrics( + this.pooledAllocator, conf.getModuleName() + "-client", conf); + } + + public MetricSet getAllMetrics() { +return metrics; } /** http://git-wip-us.apache.org/repos/asf/spark/blob/445f1790/common/network-common/src/main/jav
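Since the patch does not wire the new metrics to any sink, the following is a minimal sketch (not part of the commit) of how a caller could register the returned `MetricSet` with a Dropwizard registry; how `clientFactory` (a `TransportClientFactory`) is obtained is assumed, and only `getAllMetrics()` comes from this change.

```scala
// Minimal sketch, not part of the commit: publish the new Netty allocator metrics.
// `clientFactory` is an existing TransportClientFactory instance (assumed here);
// only getAllMetrics() is introduced by this change.
import java.util.concurrent.TimeUnit
import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

val registry = new MetricRegistry()
registry.registerAll(clientFactory.getAllMetrics())

// Periodically dump the per-arena and aggregated direct/heap memory gauges.
val reporter = ConsoleReporter.forRegistry(registry).build()
reporter.start(30, TimeUnit.SECONDS)
```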
spark git commit: [SPARK-18061][THRIFTSERVER] Add spnego auth support for ThriftServer thrift/http protocol
Repository: spark Updated Branches: refs/heads/master 9e451bcf3 -> 6a2325448 [SPARK-18061][THRIFTSERVER] Add spnego auth support for ThriftServer thrift/http protocol Spark ThriftServer doesn't support SPNEGO auth for the thrift/http protocol, which is mainly needed for the Knox + ThriftServer scenario. HiveServer2's CLIService already has code to support this, so it is copied here into Spark ThriftServer. Related Hive JIRA: HIVE-6697. Manual verification. Author: jerryshao Closes #18628 from jerryshao/SPARK-21407. Change-Id: I61ef0c09f6972bba982475084a6b0ae3a74e385e Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a232544 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a232544 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a232544 Branch: refs/heads/master Commit: 6a2325448000ba431ba3b982d181c017559abfe3 Parents: 9e451bc Author: jerryshao Authored: Wed Sep 6 09:39:39 2017 +0800 Committer: jerryshao Committed: Wed Sep 6 09:39:39 2017 +0800 -- .../sql/hive/thriftserver/SparkSQLCLIService.scala | 16 1 file changed, 16 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6a232544/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala index 1b17a9a..ad1f5eb 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala @@ -25,6 +25,7 @@ import scala.collection.JavaConverters._ import org.apache.commons.logging.Log import org.apache.hadoop.hive.conf.HiveConf +import org.apache.hadoop.hive.conf.HiveConf.ConfVars import org.apache.hadoop.hive.shims.Utils import org.apache.hadoop.security.UserGroupInformation import org.apache.hive.service.{AbstractService, Service, ServiceException} @@ -47,6 +48,7 @@ private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLC setSuperField(this, "sessionManager", sparkSqlSessionManager) addService(sparkSqlSessionManager) var sparkServiceUGI: UserGroupInformation = null +var httpUGI: UserGroupInformation = null if (UserGroupInformation.isSecurityEnabled) { try { @@ -57,6 +59,20 @@ private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLC case e @ (_: IOException | _: LoginException) => throw new ServiceException("Unable to login to kerberos with given principal/keytab", e) } + + // Try creating spnego UGI if it is configured. + val principal = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL).trim + val keyTabFile = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB).trim + if (principal.nonEmpty && keyTabFile.nonEmpty) { +try { + httpUGI = HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI(hiveConf) + setSuperField(this, "httpUGI", httpUGI) +} catch { + case e: IOException => +throw new ServiceException("Unable to login to spnego with given principal " + + s"$principal and keytab $keyTabFile: $e", e) +} + } } initCompositeService(hiveConf) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
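The new code path in the diff above only runs when the SPNEGO principal and keytab are configured. A hedged sketch of the relevant settings follows; the principal and keytab values are placeholders, and enabling the HTTP transport mode is shown only as an assumption about when SPNEGO applies.

```scala
// Sketch only: the SPNEGO login above is attempted only when both values are non-empty.
// The principal and keytab path below are placeholders for illustration.
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.conf.HiveConf.ConfVars

val hiveConf = new HiveConf()
hiveConf.setVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL, "HTTP/_HOST@EXAMPLE.COM")
hiveConf.setVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB, "/etc/security/keytabs/spnego.service.keytab")
// SPNEGO is relevant for the thrift/http transport, so HTTP mode is assumed here.
hiveConf.setVar(ConfVars.HIVE_SERVER2_TRANSPORT_MODE, "http")
```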
spark git commit: [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources
Repository: spark Updated Branches: refs/heads/branch-2.2 1f7c4869b -> 7da8fbf08 [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources ## What changes were proposed in this pull request? All built-in data sources support `Partition Discovery`. The documentation should be updated so that users can clearly see this benefit. **AFTER** https://user-images.githubusercontent.com/9700541/30083628-14278908-9244-11e7-98dc-9ad45fe233a9.png ## How was this patch tested? ``` SKIP_API=1 jekyll serve --watch ``` Author: Dongjoon Hyun Closes #19139 from dongjoon-hyun/partitiondiscovery. (cherry picked from commit 9e451bcf36151bf401f72dcd66001b9ceb079738) Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7da8fbf0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7da8fbf0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7da8fbf0 Branch: refs/heads/branch-2.2 Commit: 7da8fbf08b492ae899bef5ea5a08e2bcf4c6db93 Parents: 1f7c486 Author: Dongjoon Hyun Authored: Tue Sep 5 14:35:09 2017 -0700 Committer: gatorsmile Committed: Tue Sep 5 14:35:27 2017 -0700 -- docs/sql-programming-guide.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7da8fbf0/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index b5eca76..9a54adc 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -733,8 +733,9 @@ SELECT * FROM parquetTable Table partitioning is a common optimization approach used in systems like Hive. In a partitioned table, data are usually stored in different directories, with partitioning column values encoded in -the path of each partition directory. The Parquet data source is now able to discover and infer -partitioning information automatically. For example, we can store all our previously used +the path of each partition directory. All built-in file sources (including Text/CSV/JSON/ORC/Parquet) +are able to discover and infer partitioning information automatically. +For example, we can store all our previously used population data into a partitioned table using the following directory structure, with two extra columns, `gender` and `country` as partitioning columns: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources
Repository: spark Updated Branches: refs/heads/master fd60d4fa6 -> 9e451bcf3 [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources ## What changes were proposed in this pull request? All built-in data sources support `Partition Discovery`. The documentation should be updated so that users can clearly see this benefit. **AFTER** https://user-images.githubusercontent.com/9700541/30083628-14278908-9244-11e7-98dc-9ad45fe233a9.png ## How was this patch tested? ``` SKIP_API=1 jekyll serve --watch ``` Author: Dongjoon Hyun Closes #19139 from dongjoon-hyun/partitiondiscovery. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9e451bcf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9e451bcf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9e451bcf Branch: refs/heads/master Commit: 9e451bcf36151bf401f72dcd66001b9ceb079738 Parents: fd60d4f Author: Dongjoon Hyun Authored: Tue Sep 5 14:35:09 2017 -0700 Committer: gatorsmile Committed: Tue Sep 5 14:35:09 2017 -0700 -- docs/sql-programming-guide.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9e451bcf/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index ee231a9..032073b 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -733,8 +733,9 @@ SELECT * FROM parquetTable Table partitioning is a common optimization approach used in systems like Hive. In a partitioned table, data are usually stored in different directories, with partitioning column values encoded in -the path of each partition directory. The Parquet data source is now able to discover and infer -partitioning information automatically. For example, we can store all our previously used +the path of each partition directory. All built-in file sources (including Text/CSV/JSON/ORC/Parquet) +are able to discover and infer partitioning information automatically. +For example, we can store all our previously used population data into a partitioned table using the following directory structure, with two extra columns, `gender` and `country` as partitioning columns: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
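As a concrete illustration of the updated paragraph, the short sketch below (paths and column names are made up) reads a Hive-style partitioned directory with the CSV source and relies on partition discovery:

```scala
// Illustrative sketch with made-up paths: any built-in file source, not only Parquet,
// infers partition columns from a directory layout such as
//   /data/population/gender=male/country=US/part-00000.csv
val df = spark.read
  .option("header", "true")
  .csv("/data/population")

df.printSchema()  // includes `gender` and `country` as inferred partition columns
```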
spark git commit: [SPARK-21652][SQL] Fix rule conflict between InferFiltersFromConstraints and ConstantPropagation
Repository: spark Updated Branches: refs/heads/master 8c954d2cd -> fd60d4fa6 [SPARK-21652][SQL] Fix rule conflict between InferFiltersFromConstraints and ConstantPropagation ## What changes were proposed in this pull request? For the example below, the predicate added by `InferFiltersFromConstraints` is later folded away by `ConstantPropagation`, which leads to an unconverged optimizer iteration: ``` Seq((1, 1)).toDF("col1", "col2").createOrReplaceTempView("t1") Seq(1, 2).toDF("col").createOrReplaceTempView("t2") sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col") ``` We can fix this by adjusting the order of the optimizer rules. ## How was this patch tested? Added a test case to `SQLQuerySuite` that would have failed before this change. Author: Xingbo Jiang Closes #19099 from jiangxb1987/unconverge-optimization. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fd60d4fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fd60d4fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fd60d4fa Branch: refs/heads/master Commit: fd60d4fa6c516496a60d6979edd1b4630bf721bd Parents: 8c954d2 Author: Xingbo Jiang Authored: Tue Sep 5 13:12:39 2017 -0700 Committer: gatorsmile Committed: Tue Sep 5 13:12:39 2017 -0700 -- .../spark/sql/catalyst/optimizer/Optimizer.scala | 3 ++- .../scala/org/apache/spark/sql/SQLQuerySuite.scala| 14 ++ 2 files changed, 16 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fd60d4fa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index b73f70a..d7e5906 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -79,11 +79,12 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) PushProjectionThroughUnion, ReorderJoin, EliminateOuterJoin, + InferFiltersFromConstraints, + BooleanSimplification, PushPredicateThroughJoin, PushDownPredicate, LimitPushDown, ColumnPruning, - InferFiltersFromConstraints, // Operator combine CollapseRepartition, CollapseProject, http://git-wip-us.apache.org/repos/asf/spark/blob/fd60d4fa/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 923c6d8..93a 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -2663,4 +2663,18 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { // In unit test, Spark will fail the query if memory leak detected.
spark.range(100).groupBy("id").count().limit(1).collect() } + + test("SPARK-21652: rule confliction of InferFiltersFromConstraints and ConstantPropagation") { +withTempView("t1", "t2") { + Seq((1, 1)).toDF("col1", "col2").createOrReplaceTempView("t1") + Seq(1, 2).toDF("col").createOrReplaceTempView("t2") + val df = sql( +""" + |SELECT * + |FROM t1, t2 + |WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col +""".stripMargin) + checkAnswer(df, Row(1, 1, 1)) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
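For anyone who wants to see the effect of the new rule order, here is a small sketch (reusing the reproduction from the description above) that prints the optimized plan so the inferred and propagated predicates are visible; an active `SparkSession` named `spark` is assumed.

```scala
// Sketch: with the adjusted rule order the optimizer converges, and the resulting
// predicates can be inspected on the optimized plan. Assumes a SparkSession `spark`.
import spark.implicits._

Seq((1, 1)).toDF("col1", "col2").createOrReplaceTempView("t1")
Seq(1, 2).toDF("col").createOrReplaceTempView("t2")

val df = spark.sql(
  "SELECT * FROM t1, t2 " +
  "WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col")
println(df.queryExecution.optimizedPlan.numberedTreeString)
```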
spark git commit: [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2
Repository: spark Updated Branches: refs/heads/branch-2.2 fb1b5f08a -> 1f7c4869b [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2 Forgot to update docs with behavior change. Author: Burak Yavuz Closes #19138 from brkyvz/trigger-doc-fix. (cherry picked from commit 8c954d2cd10a2cf729d2971fbeb19b2dd751a178) Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1f7c4869 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1f7c4869 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1f7c4869 Branch: refs/heads/branch-2.2 Commit: 1f7c4869b811f9a05cd1fb54e168e739cde7933f Parents: fb1b5f0 Author: Burak Yavuz Authored: Tue Sep 5 13:10:32 2017 -0700 Committer: Tathagata Das Committed: Tue Sep 5 13:10:47 2017 -0700 -- docs/structured-streaming-programming-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1f7c4869/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 8367f5a..13a6a82 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -1168,7 +1168,7 @@ returned through `Dataset.writeStream()`. You will have to specify one or more o - *Query name:* Optionally, specify a unique name of the query for identification. -- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has completed. If a trigger time is missed because the previous processing has not completed, then the system will attempt to trigger at the next trigger point, not immediately after the processing has completed. +- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has completed. If a trigger time is missed because the previous processing has not completed, then the system will trigger processing immediately. - *Checkpoint location:* For some output sinks where the end-to-end fault-tolerance can be guaranteed, specify the location where the system will write all the checkpoint information. This should be a directory in an HDFS-compatible fault-tolerant file system. The semantics of checkpointing is discussed in more detail in the next section. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2
Repository: spark Updated Branches: refs/heads/master 2974406d1 -> 8c954d2cd [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2 Forgot to update docs with behavior change. Author: Burak Yavuz Closes #19138 from brkyvz/trigger-doc-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c954d2c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8c954d2c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8c954d2c Branch: refs/heads/master Commit: 8c954d2cd10a2cf729d2971fbeb19b2dd751a178 Parents: 2974406 Author: Burak Yavuz Authored: Tue Sep 5 13:10:32 2017 -0700 Committer: Tathagata Das Committed: Tue Sep 5 13:10:32 2017 -0700 -- docs/structured-streaming-programming-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8c954d2c/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 8367f5a..13a6a82 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -1168,7 +1168,7 @@ returned through `Dataset.writeStream()`. You will have to specify one or more o - *Query name:* Optionally, specify a unique name of the query for identification. -- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has completed. If a trigger time is missed because the previous processing has not completed, then the system will attempt to trigger at the next trigger point, not immediately after the processing has completed. +- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has completed. If a trigger time is missed because the previous processing has not completed, then the system will trigger processing immediately. - *Checkpoint location:* For some output sinks where the end-to-end fault-tolerance can be guaranteed, specify the location where the system will write all the checkpoint information. This should be a directory in an HDFS-compatible fault-tolerant file system. The semantics of checkpointing is discussed in more detail in the next section. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
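For reference, a short sketch of how the trigger interval mentioned in the updated paragraph is specified; the streaming DataFrame `streamingDF` is an assumption, not part of the commit.

```scala
// Sketch: explicit trigger interval vs. the default "as soon as the previous batch
// finishes" behavior. `streamingDF` is an assumed, already-defined streaming DataFrame.
import org.apache.spark.sql.streaming.Trigger

val query = streamingDF.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds"))  // omit to process as soon as data is available
  .start()
```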
spark git commit: [SPARK-21845][SQL][TEST-MAVEN] Make codegen fallback of expressions configurable
Repository: spark Updated Branches: refs/heads/master 02a4386ae -> 2974406d1 [SPARK-21845][SQL][TEST-MAVEN] Make codegen fallback of expressions configurable ## What changes were proposed in this pull request? We should make codegen fallback of expressions configurable. So far, it is always on. We might hide it when our codegen have compilation bugs. Thus, we should also disable the codegen fallback when running test cases. ## How was this patch tested? Added test cases Author: gatorsmile Closes #19119 from gatorsmile/fallbackCodegen. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2974406d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2974406d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2974406d Branch: refs/heads/master Commit: 2974406d17a3831c1897b8d99261419592f8042f Parents: 02a4386 Author: gatorsmile Authored: Tue Sep 5 09:04:03 2017 -0700 Committer: gatorsmile Committed: Tue Sep 5 09:04:03 2017 -0700 -- .../scala/org/apache/spark/sql/internal/SQLConf.scala | 6 +++--- .../org/apache/spark/sql/execution/SparkPlan.scala | 11 ++- .../spark/sql/execution/WholeStageCodegenExec.scala | 2 +- .../org/apache/spark/sql/DataFrameFunctionsSuite.scala | 2 +- .../scala/org/apache/spark/sql/DataFrameSuite.scala | 12 +++- .../org/apache/spark/sql/test/SharedSQLContext.scala| 2 ++ .../scala/org/apache/spark/sql/hive/test/TestHive.scala | 1 + 7 files changed, 25 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2974406d/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index c407874..db5d65c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -559,9 +559,9 @@ object SQLConf { .intConf .createWithDefault(100) - val WHOLESTAGE_FALLBACK = buildConf("spark.sql.codegen.fallback") + val CODEGEN_FALLBACK = buildConf("spark.sql.codegen.fallback") .internal() -.doc("When true, whole stage codegen could be temporary disabled for the part of query that" + +.doc("When true, (whole stage) codegen could be temporary disabled for the part of query that" + " fail to compile generated code") .booleanConf .createWithDefault(true) @@ -1051,7 +1051,7 @@ class SQLConf extends Serializable with Logging { def wholeStageMaxNumFields: Int = getConf(WHOLESTAGE_MAX_NUM_FIELDS) - def wholeStageFallback: Boolean = getConf(WHOLESTAGE_FALLBACK) + def codegenFallback: Boolean = getConf(CODEGEN_FALLBACK) def maxCaseBranchesForCodegen: Int = getConf(MAX_CASES_BRANCHES) http://git-wip-us.apache.org/repos/asf/spark/blob/2974406d/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala index c7277c2..b263f10 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala @@ -56,15 +56,17 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ protected def sparkContext = sqlContext.sparkContext - // sqlContext will be null when we are being deserialized on the slaves. 
In this instance - // the value of subexpressionEliminationEnabled will be set by the deserializer after the - // constructor has run. + // sqlContext will be null when SparkPlan nodes are created without the active sessions. + // So far, this only happens in the test cases. val subexpressionEliminationEnabled: Boolean = if (sqlContext != null) { sqlContext.conf.subexpressionEliminationEnabled } else { false } + // whether we should fallback when hitting compilation errors caused by codegen + private val codeGenFallBack = (sqlContext == null) || sqlContext.conf.codegenFallback + /** Overridden make copy also propagates sqlContext to copied plan. */ override def makeCopy(newArgs: Array[AnyRef]): SparkPlan = { SparkSession.setActiveSession(sqlContext.sparkSession) @@ -370,8 +372,7 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ try { GeneratePredicate.generate(expression, inputSchema) } catch { - case e @ (_: JaninoRuntimeException | _: CompileException) - if sqlCon
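A hedged sketch of how the renamed flag can be toggled at runtime; the configuration key keeps its old name, `spark.sql.codegen.fallback`, as shown in the diff above.

```scala
// Sketch: disabling codegen fallback surfaces generated-code compilation errors instead of
// silently falling back to the non-codegen path; re-enable it to restore the default.
spark.conf.set("spark.sql.codegen.fallback", "false")
// ... run the query under investigation ...
spark.conf.set("spark.sql.codegen.fallback", "true")
```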
spark git commit: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Repository: spark Updated Branches: refs/heads/master 7f3c6ff4f -> 02a4386ae [SPARK-20978][SQL] Bump up Univocity version to 2.5.4 ## What changes were proposed in this pull request? There was a bug in Univocity Parser that causes the issue in SPARK-20978. This was fixed as below: ```scala val df = spark.read.schema("a string, b string, unparsed string").option("columnNameOfCorruptRecord", "unparsed").csv(Seq("a").toDS()) df.show() ``` **Before** ``` java.lang.NullPointerException at scala.collection.immutable.StringLike$class.stripLineEnd(StringLike.scala:89) at scala.collection.immutable.StringOps.stripLineEnd(StringOps.scala:29) at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$getCurrentInput(UnivocityParser.scala:56) at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert$1.apply(UnivocityParser.scala:207) at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert$1.apply(UnivocityParser.scala:207) ... ``` **After** ``` +---+++ | a| b|unparsed| +---+++ | a|null| a| +---+++ ``` It was fixed in 2.5.0 and 2.5.4 was released. I guess it'd be safe to upgrade this. ## How was this patch tested? Unit test added in `CSVSuite.scala`. Author: hyukjinkwon Closes #19113 from HyukjinKwon/bump-up-univocity. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/02a4386a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/02a4386a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/02a4386a Branch: refs/heads/master Commit: 02a4386aec5f83f41ca1abc5f56e223b6fae015c Parents: 7f3c6ff Author: hyukjinkwon Authored: Tue Sep 5 23:21:43 2017 +0800 Committer: Wenchen Fan Committed: Tue Sep 5 23:21:43 2017 +0800 -- dev/deps/spark-deps-hadoop-2.6 | 2 +- dev/deps/spark-deps-hadoop-2.7 | 2 +- sql/core/pom.xml | 2 +- .../spark/sql/execution/datasources/csv/CSVSuite.scala | 8 4 files changed, 11 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/02a4386a/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 1535103..e3b9ce0 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -182,7 +182,7 @@ stax-api-1.0.1.jar stream-2.7.0.jar stringtemplate-3.2.1.jar super-csv-2.2.0.jar -univocity-parsers-2.2.1.jar +univocity-parsers-2.5.4.jar validation-api-1.1.0.Final.jar xbean-asm5-shaded-4.4.jar xercesImpl-2.9.1.jar http://git-wip-us.apache.org/repos/asf/spark/blob/02a4386a/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index deaa288..a3f3f32 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -183,7 +183,7 @@ stax-api-1.0.1.jar stream-2.7.0.jar stringtemplate-3.2.1.jar super-csv-2.2.0.jar -univocity-parsers-2.2.1.jar +univocity-parsers-2.5.4.jar validation-api-1.1.0.Final.jar xbean-asm5-shaded-4.4.jar xercesImpl-2.9.1.jar http://git-wip-us.apache.org/repos/asf/spark/blob/02a4386a/sql/core/pom.xml -- diff --git a/sql/core/pom.xml b/sql/core/pom.xml index 9a3cacb..7ee002e 100644 --- a/sql/core/pom.xml +++ b/sql/core/pom.xml @@ -38,7 +38,7 @@ com.univocity univocity-parsers - 2.2.1 + 2.5.4 jar 
http://git-wip-us.apache.org/repos/asf/spark/blob/02a4386a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index 243a55c..be89141 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -1195,4 +1195,12 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils { .csv(Seq("10u12").toDS()) checkAnswer(results, Row(null)) } + + test("SPARK-20978: Fill the malformed column when the number of tokens is less than schem
spark git commit: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
Repository: spark Updated Branches: refs/heads/master 4e7a29efd -> 7f3c6ff4f [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0. ## What changes were proposed in this pull request? 1.0.0 fixes an issue with import order, explicit type for public methods, line length limitation and comment validation: ``` [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala:50:16: Are you sure you want to println? If yes, wrap the code block with [error] // scalastyle:off println [error] println(...) [error] // scalastyle:on println [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala:49: File line length exceeds 100 characters [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala:22:21: Are you sure you want to println? If yes, wrap the code block with [error] // scalastyle:off println [error] println(...) [error] // scalastyle:on println [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:35:6: Public method must have explicit type [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:51:6: Public method must have explicit type [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:93:15: Public method must have explicit type [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:98:15: Public method must have explicit type [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:47:2: Insert a space after the start of the comment [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:26:43: JavaDStream should come before JavaDStreamLike. ``` This PR also fixes the workaround added in SPARK-16877 for `org.scalastyle.scalariform.OverrideJavaChecker` feature, added from 0.9.0. ## How was this patch tested? Manually tested. Author: hyukjinkwon Closes #19116 from HyukjinKwon/scalastyle-1.0.0. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7f3c6ff4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7f3c6ff4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7f3c6ff4 Branch: refs/heads/master Commit: 7f3c6ff4ff0a501cc7f1fb53a90ea7b5787f68e1 Parents: 4e7a29e Author: hyukjinkwon Authored: Tue Sep 5 19:40:05 2017 +0900 Committer: hyukjinkwon Committed: Tue Sep 5 19:40:05 2017 +0900 -- project/SparkBuild.scala| 5 +++-- project/plugins.sbt | 3 +-- .../src/main/scala/org/apache/spark/repl/Main.scala | 2 ++ .../main/scala/org/apache/spark/repl/SparkILoop.scala | 5 - scalastyle-config.xml | 5 + .../java/org/apache/spark/streaming/JavaTestUtils.scala | 12 ++-- 6 files changed, 17 insertions(+), 15 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7f3c6ff4/project/SparkBuild.scala -- diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 9d903ed..20848f0 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -163,14 +163,15 @@ object SparkBuild extends PomBuild { val configUrlV = scalastyleConfigUrl.in(config).value val streamsV = streams.in(config).value val failOnErrorV = true +val failOnWarningV = false val scalastyleTargetV = scalastyleTarget.in(config).value val configRefreshHoursV = scalastyleConfigRefreshHours.in(config).value val targetV = target.in(config).value val configCacheFileV = scalastyleConfigUrlCacheFile.in(config).value logger.info(s"Running scalastyle on ${name.value} in ${config.name}") -Tasks.doScalastyle(args, configV, configUrlV, failOnErrorV, scalaSourceV, scalastyleTargetV, - streamsV, configRefreshHoursV, targetV, configCacheFileV) +Tasks.doScalastyle(args, configV, configUrlV, failOnErrorV, failOnWarningV, scalaSourceV, + scalastyleTargetV, streamsV, configRefreshHoursV, targetV, configCacheFileV) Set.empty } http://git-wip-us.apache.org/repos/asf/spark/blob/7f3c6ff4/project/plugins.sbt -- diff --git a/project/plugins.sbt b/project/plugins.sbt index f67e0a1..3c5442b 100644 --- a/project/plugins.sbt +++ b/project/plugins.sbt @@ -7,8 +7,7 @@ addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.1.0") // sbt 1.0.0 support: https://github.com/jrudolph/sbt-dependency-graph/issues/134 addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2") -// need to
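The `println` errors quoted above refer to the usual suppression pattern, shown here for completeness:

```scala
// Suppression pattern scalastyle expects for intentional console output.
// scalastyle:off println
println("intentional console output")
// scalastyle:on println
```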
spark git commit: [SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE
Repository: spark Updated Branches: refs/heads/master ca59445ad -> 4e7a29efd [SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE ## What changes were proposed in this pull request? Currently, `withDatabase` fails if the database is not empty. It would be better to drop the database cleanly with CASCADE. ## How was this patch tested? This is a change to a test utility; it passes the existing Jenkins tests. Author: Dongjoon Hyun Closes #19125 from dongjoon-hyun/SPARK-21913. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4e7a29ef Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4e7a29ef Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4e7a29ef Branch: refs/heads/master Commit: 4e7a29efdba6972a4713a62dfccb495504a25ab9 Parents: ca59445 Author: Dongjoon Hyun Authored: Tue Sep 5 00:20:16 2017 -0700 Committer: gatorsmile Committed: Tue Sep 5 00:20:16 2017 -0700 -- .../src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4e7a29ef/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala b/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala index e68db3b..a14a144 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala @@ -247,7 +247,7 @@ private[sql] trait SQLTestUtils protected def withDatabase(dbNames: String*)(f: => Unit): Unit = { try f finally { dbNames.foreach { name => -spark.sql(s"DROP DATABASE IF EXISTS $name") +spark.sql(s"DROP DATABASE IF EXISTS $name CASCADE") } spark.sql(s"USE $DEFAULT_DATABASE") } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
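A small sketch (database and table names are made up) of the failure mode that the added CASCADE avoids:

```scala
// Sketch: dropping a non-empty database fails without CASCADE.
spark.sql("CREATE DATABASE IF NOT EXISTS db1")
spark.sql("CREATE TABLE db1.t1(id INT) USING parquet")
// spark.sql("DROP DATABASE db1")        // would fail because db1 still contains a table
spark.sql("DROP DATABASE db1 CASCADE")   // drops db1 together with its tables
```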