[spark] branch master updated: [MINOR][SQL] Locality does not need to be implemented

2018-12-20 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 98ecda3  [MINOR][SQL] Locality does not need to be implemented
98ecda3 is described below

commit 98ecda3e8ef9db5e21b5b9605df09d1653094b9c
Author: liuxian 
AuthorDate: Fri Dec 21 13:01:14 2018 +0800

[MINOR][SQL] Locality does not need to be implemented

## What changes were proposed in this pull request?
`HadoopFileWholeTextReader` and `HadoopFileLinesReader` are eventually invoked
from `FileSourceScanExec`. Locality is already handled by `FileScanRDD`, so
implementing it again in `HadoopFileWholeTextReader` and `HadoopFileLinesReader`
would have no effect.
So these `TODO` comments can be removed.
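
For context, locality for file-based scans is reported to the scheduler by the RDD
rather than by the per-file readers. The following is a minimal, hypothetical sketch
(the class and field names are invented for illustration; this is not the actual
`FileScanRDD` code) of how an RDD exposes preferred locations via
`getPreferredLocations`, which is why a locality implementation inside the readers
would never be consulted:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type carrying the hosts that hold the file blocks.
case class FilePartitionStub(index: Int, preferredHosts: Seq[String]) extends Partition

// Minimal RDD showing where locality actually lives: the scheduler asks the RDD
// for preferred locations per partition; the per-file readers never see this.
class LocalityAwareRDD(sc: SparkContext, parts: Seq[FilePartitionStub])
  extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = parts.toArray

  override protected def getPreferredLocations(split: Partition): Seq[String] =
    split.asInstanceOf[FilePartitionStub].preferredHosts

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    Iterator.empty // reading file contents is orthogonal to locality reporting
}
```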

## How was this patch tested?
N/A

Closes #23339 from 10110346/noneededtodo.

Authored-by: liuxian 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala  | 2 +-
 .../spark/sql/execution/datasources/HadoopFileWholeTextReader.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
index 00a78f7..57082b4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
@@ -51,7 +51,7 @@ class HadoopFileLinesReader(
   new Path(new URI(file.filePath)),
   file.start,
   file.length,
-  // TODO: Implement Locality
+  // The locality is decided by `getPreferredLocations` in `FileScanRDD`.
   Array.empty)
 val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 
0), 0)
 val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala
index c61a89e..f5724f7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala
@@ -40,7 +40,7 @@ class HadoopFileWholeTextReader(file: PartitionedFile, conf: 
Configuration)
   Array(new Path(new URI(file.filePath))),
   Array(file.start),
   Array(file.length),
-  // TODO: Implement Locality
+  // The locality is decided by `getPreferredLocations` in `FileScanRDD`.
   Array.empty[String])
 val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 
0), 0)
 val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-26409][SQL][TESTS] SQLConf should be serializable in test sessions

2018-12-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new c29f7e2  [SPARK-26409][SQL][TESTS] SQLConf should be serializable in 
test sessions
c29f7e2 is described below

commit c29f7e20def505d3fdc2ee14a6d696a677828a4d
Author: Gengliang Wang 
AuthorDate: Thu Dec 20 10:05:56 2018 -0800

[SPARK-26409][SQL][TESTS] SQLConf should be serializable in test sessions

## What changes were proposed in this pull request?

`SQLConf` is supposed to be serializable. However, it is currently not
serializable in `WithTestConf`: `WithTestConf` references the method
`overrideConfs` inside a closure, so the closure captures the enclosing builder,
and the classes that implement it (`TestHiveSessionStateBuilder` and
`TestSQLSessionStateBuilder`) are not serializable.

This PR fixes the issue by copying `overrideConfs` into a local variable before
the closure is created.
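
To illustrate the mechanism with a simplified, self-contained sketch (the class and
method names below are invented for the example and are not Spark code): referencing
a method of the enclosing instance inside a closure captures the whole instance,
while copying the value into a local variable first keeps the closure free of that
reference.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

class NotSerializableBuilder {            // stands in for a test session state builder
  def overrideConfs: Map[String, String] = Map("spark.some.conf" -> "value")

  // Captures `this`, because `overrideConfs` is a method on the enclosing instance.
  def capturingClosure: () => Map[String, String] = () => overrideConfs

  // Captures only the immutable Map, so the closure is serializable on its own.
  def safeClosure: () => Map[String, String] = {
    val confs = overrideConfs               // local copy, as in the fix
    () => confs
  }
}

object ClosureCaptureDemo {
  private def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: java.io.NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val b = new NotSerializableBuilder
    println(isSerializable(b.capturingClosure)) // false: drags in the builder
    println(isSerializable(b.safeClosure))      // true: only the Map is captured
  }
}
```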

## How was this patch tested?

Add unit test.

Closes #23352 from gengliangwang/serializableSQLConf.

Authored-by: Gengliang Wang 
Signed-off-by: gatorsmile 
(cherry picked from commit 6692bacf3e74e7a17d8e676e8a06ab198f85d328)
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/internal/BaseSessionStateBuilder.scala  | 3 ++-
 .../src/test/scala/org/apache/spark/sql/SerializationSuite.scala | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
index 3a0db7e..9c1a15c 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
@@ -308,13 +308,14 @@ private[sql] trait WithTestConf { self: 
BaseSessionStateBuilder =>
   def overrideConfs: Map[String, String]
 
   override protected lazy val conf: SQLConf = {
+val overrideConfigurations = overrideConfs
 val conf = parentState.map(_.conf.clone()).getOrElse {
   new SQLConf {
 clear()
 override def clear(): Unit = {
   super.clear()
   // Make sure we start with the default test configs even after clear
-  overrideConfs.foreach { case (key, value) => setConfString(key, 
value) }
+  overrideConfigurations.foreach { case (key, value) => 
setConfString(key, value) }
 }
   }
 }
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala
index cd6b264..1a1c956 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala
@@ -27,4 +27,9 @@ class SerializationSuite extends SparkFunSuite with 
SharedSQLContext {
 val spark = SparkSession.builder.getOrCreate()
 new JavaSerializer(new 
SparkConf()).newInstance().serialize(spark.sqlContext)
   }
+
+  test("[SPARK-26409] SQLConf should be serializable") {
+val spark = SparkSession.builder.getOrCreate()
+new JavaSerializer(new 
SparkConf()).newInstance().serialize(spark.sessionState.conf)
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-25970][ML] Add Instrumentation to PrefixSpan

2018-12-20 Thread meng
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aa0d4ca  [SPARK-25970][ML] Add Instrumentation to PrefixSpan
aa0d4ca is described below

commit aa0d4ca8bab08a467645080a5b8a28bf6dd8a042
Author: zhengruifeng 
AuthorDate: Thu Dec 20 11:22:49 2018 -0800

[SPARK-25970][ML] Add Instrumentation to PrefixSpan

## What changes were proposed in this pull request?
Add Instrumentation to PrefixSpan

## How was this patch tested?
existing tests

Closes #22971 from zhengruifeng/log_PrefixSpan.

Authored-by: zhengruifeng 
Signed-off-by: Xiangrui Meng 
---
 mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala 
b/mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala
index 2a34135..b0006a8 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala
@@ -20,6 +20,7 @@ package org.apache.spark.ml.fpm
 import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.ml.param._
 import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.fpm.{PrefixSpan => mllibPrefixSpan}
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions.col
@@ -135,7 +136,10 @@ final class PrefixSpan(@Since("2.4.0") override val uid: 
String) extends Params
*  - `freq: Long`
*/
   @Since("2.4.0")
-  def findFrequentSequentialPatterns(dataset: Dataset[_]): DataFrame = {
+  def findFrequentSequentialPatterns(dataset: Dataset[_]): DataFrame = 
instrumented { instr =>
+instr.logDataset(dataset)
+instr.logParams(this, params: _*)
+
 val sequenceColParam = $(sequenceCol)
 val inputType = dataset.schema(sequenceColParam).dataType
 require(inputType.isInstanceOf[ArrayType] &&
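
The change above adopts a wrap-the-method-body pattern: `instrumented { instr => ... }`
hands the body an instrumentation handle on which the dataset and parameters are
logged. The following self-contained sketch illustrates the same loan pattern in
plain Scala; it uses invented names (`Instr`, `findPatterns`) rather than Spark's
internal `Instrumentation` class.

```scala
object InstrumentationDemo {
  // A tiny stand-in for an instrumentation handle.
  final class Instr(name: String) {
    def logInput(desc: String): Unit = println(s"[$name] input: $desc")
    def logParams(params: (String, Any)*): Unit =
      params.foreach { case (k, v) => println(s"[$name] param $k = $v") }
    def logFailure(e: Throwable): Unit = println(s"[$name] failed: ${e.getMessage}")
  }

  // Loan pattern: run `body` with a handle, logging failures before rethrowing.
  def instrumented[T](name: String)(body: Instr => T): T = {
    val instr = new Instr(name)
    try body(instr)
    catch { case e: Throwable => instr.logFailure(e); throw e }
  }

  def findPatterns(sequences: Seq[Seq[String]], minSupport: Double): Seq[Seq[String]] =
    instrumented("findPatterns") { instr =>
      instr.logInput(s"${sequences.size} sequences")
      instr.logParams("minSupport" -> minSupport)
      // stand-in for the actual mining logic
      sequences.distinct
    }

  def main(args: Array[String]): Unit =
    println(findPatterns(Seq(Seq("a", "b"), Seq("a", "b")), 0.5))
}
```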


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-26392][YARN] Cancel pending allocate requests by taking locality preference into account

2018-12-20 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new daeb081  [SPARK-26392][YARN] Cancel pending allocate requests by 
taking locality preference into account
daeb081 is described below

commit daeb0811058c76e2d6cecb6de5ebe287c3be3a94
Author: Ngone51 
AuthorDate: Thu Dec 20 10:25:52 2018 -0800

[SPARK-26392][YARN] Cancel pending allocate requests by taking locality 
preference into account

## What changes were proposed in this pull request?

Right now, we cancel pending allocate requests in their sending order. We can
take locality preference into account when doing this, to minimize the impact
on task locality.
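
A rough, self-contained sketch of the idea (types and names invented here; the real
logic lives in `YarnAllocator`): when fewer executors are needed, pending requests
are cancelled in an order that gives up locality information last, i.e. stale
requests first, then requests with no locality preference, then locality-matched
ones, mirroring `(staleRequests ++ anyHostRequests ++ localRequests).take(numToCancel)`
from the diff below.

```scala
object CancelByLocalityDemo {
  sealed trait PendingRequest
  case class Stale(host: String) extends PendingRequest       // preferred host no longer needed
  case object AnyHost extends PendingRequest                  // no locality preference
  case class LocalityMatched(host: String) extends PendingRequest

  // Cancel up to `numToCancel` requests, preferring the ones whose loss hurts
  // task locality the least: stale first, then any-host, then locality-matched.
  def selectForCancellation(
      stale: Seq[PendingRequest],
      anyHost: Seq[PendingRequest],
      localityMatched: Seq[PendingRequest],
      numToCancel: Int): Seq[PendingRequest] =
    (stale ++ anyHost ++ localityMatched).take(numToCancel)

  def main(args: Array[String]): Unit = {
    val toCancel = selectForCancellation(
      stale = Seq(Stale("host1")),
      anyHost = Seq(AnyHost, AnyHost),
      localityMatched = Seq(LocalityMatched("host2")),
      numToCancel = 2)
    println(toCancel) // List(Stale(host1), AnyHost) -- the locality-matched request survives
  }
}
```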

## How was this patch tested?

N.A.

Closes #23344 from 
Ngone51/dev-cancel-pending-allocate-requests-by-taking-locality-preference-into-account.

Authored-by: Ngone51 
Signed-off-by: Marcelo Vanzin 
(cherry picked from commit 3d6b44d9ea92dc1eabb8f211176861e51240bf93)
Signed-off-by: Marcelo Vanzin 
---
 .../apache/spark/deploy/yarn/YarnAllocator.scala   | 29 +-
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index 8a7551d..f4dc80a 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -287,20 +287,20 @@ private[yarn] class YarnAllocator(
   s"pending: $numPendingAllocate, running: ${runningExecutors.size}, " +
   s"executorsStarting: ${numExecutorsStarting.get}")
 
+// Split the pending container request into three groups: locality matched 
list, locality
+// unmatched list and non-locality list. Take the locality matched 
container request into
+// consideration of container placement, treat as allocated containers.
+// For locality unmatched and locality free container requests, cancel 
these container
+// requests, since required locality preference has been changed, 
recalculating using
+// container placement strategy.
+val (localRequests, staleRequests, anyHostRequests) = 
splitPendingAllocationsByLocality(
+  hostToLocalTaskCounts, pendingAllocate)
+
 if (missing > 0) {
   logInfo(s"Will request $missing executor container(s), each with " +
 s"${resource.getVirtualCores} core(s) and " +
 s"${resource.getMemory} MB memory (including $memoryOverhead MB of 
overhead)")
 
-  // Split the pending container request into three groups: locality 
matched list, locality
-  // unmatched list and non-locality list. Take the locality matched 
container request into
-  // consideration of container placement, treat as allocated containers.
-  // For locality unmatched and locality free container requests, cancel 
these container
-  // requests, since required locality preference has been changed, 
recalculating using
-  // container placement strategy.
-  val (localRequests, staleRequests, anyHostRequests) = 
splitPendingAllocationsByLocality(
-hostToLocalTaskCounts, pendingAllocate)
-
   // cancel "stale" requests for locations that are no longer needed
   staleRequests.foreach { stale =>
 amClient.removeContainerRequest(stale)
@@ -360,14 +360,9 @@ private[yarn] class YarnAllocator(
   val numToCancel = math.min(numPendingAllocate, -missing)
   logInfo(s"Canceling requests for $numToCancel executor container(s) to 
have a new desired " +
 s"total $targetNumExecutors executors.")
-
-  val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, 
ANY_HOST, resource)
-  if (!matchingRequests.isEmpty) {
-matchingRequests.iterator().next().asScala
-  .take(numToCancel).foreach(amClient.removeContainerRequest)
-  } else {
-logWarning("Expected to find pending requests, but found none.")
-  }
+  // cancel pending allocate requests by taking locality preference into 
account
+  val cancelRequests = (staleRequests ++ anyHostRequests ++ 
localRequests).take(numToCancel)
+  cancelRequests.foreach(amClient.removeContainerRequest)
 }
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26392][YARN] Cancel pending allocate requests by taking locality preference into account

2018-12-20 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3d6b44d  [SPARK-26392][YARN] Cancel pending allocate requests by 
taking locality preference into account
3d6b44d is described below

commit 3d6b44d9ea92dc1eabb8f211176861e51240bf93
Author: Ngone51 
AuthorDate: Thu Dec 20 10:25:52 2018 -0800

[SPARK-26392][YARN] Cancel pending allocate requests by taking locality 
preference into account

## What changes were proposed in this pull request?

Right now, we cancel pending allocate requests in their sending order. We can
take locality preference into account when doing this, to minimize the impact
on task locality.

## How was this patch tested?

N.A.

Closes #23344 from 
Ngone51/dev-cancel-pending-allocate-requests-by-taking-locality-preference-into-account.

Authored-by: Ngone51 
Signed-off-by: Marcelo Vanzin 
---
 .../apache/spark/deploy/yarn/YarnAllocator.scala   | 29 +-
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index d37d0d6..54b1ec2 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -294,6 +294,15 @@ private[yarn] class YarnAllocator(
   s"pending: $numPendingAllocate, running: ${runningExecutors.size}, " +
   s"executorsStarting: ${numExecutorsStarting.get}")
 
+// Split the pending container request into three groups: locality matched 
list, locality
+// unmatched list and non-locality list. Take the locality matched 
container request into
+// consideration of container placement, treat as allocated containers.
+// For locality unmatched and locality free container requests, cancel 
these container
+// requests, since required locality preference has been changed, 
recalculating using
+// container placement strategy.
+val (localRequests, staleRequests, anyHostRequests) = 
splitPendingAllocationsByLocality(
+  hostToLocalTaskCounts, pendingAllocate)
+
 if (missing > 0) {
   if (log.isInfoEnabled()) {
 var requestContainerMessage = s"Will request $missing executor 
container(s), each with " +
@@ -306,15 +315,6 @@ private[yarn] class YarnAllocator(
 logInfo(requestContainerMessage)
   }
 
-  // Split the pending container request into three groups: locality 
matched list, locality
-  // unmatched list and non-locality list. Take the locality matched 
container request into
-  // consideration of container placement, treat as allocated containers.
-  // For locality unmatched and locality free container requests, cancel 
these container
-  // requests, since required locality preference has been changed, 
recalculating using
-  // container placement strategy.
-  val (localRequests, staleRequests, anyHostRequests) = 
splitPendingAllocationsByLocality(
-hostToLocalTaskCounts, pendingAllocate)
-
   // cancel "stale" requests for locations that are no longer needed
   staleRequests.foreach { stale =>
 amClient.removeContainerRequest(stale)
@@ -374,14 +374,9 @@ private[yarn] class YarnAllocator(
   val numToCancel = math.min(numPendingAllocate, -missing)
   logInfo(s"Canceling requests for $numToCancel executor container(s) to 
have a new desired " +
 s"total $targetNumExecutors executors.")
-
-  val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, 
ANY_HOST, resource)
-  if (!matchingRequests.isEmpty) {
-matchingRequests.iterator().next().asScala
-  .take(numToCancel).foreach(amClient.removeContainerRequest)
-  } else {
-logWarning("Expected to find pending requests, but found none.")
-  }
+  // cancel pending allocate requests by taking locality preference into 
account
+  val cancelRequests = (staleRequests ++ anyHostRequests ++ 
localRequests).take(numToCancel)
+  cancelRequests.foreach(amClient.removeContainerRequest)
 }
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26409][SQL][TESTS] SQLConf should be serializable in test sessions

2018-12-20 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6692bac  [SPARK-26409][SQL][TESTS] SQLConf should be serializable in 
test sessions
6692bac is described below

commit 6692bacf3e74e7a17d8e676e8a06ab198f85d328
Author: Gengliang Wang 
AuthorDate: Thu Dec 20 10:05:56 2018 -0800

[SPARK-26409][SQL][TESTS] SQLConf should be serializable in test sessions

## What changes were proposed in this pull request?

`SQLConf` is supposed to be serializable. However, it is currently not
serializable in `WithTestConf`: `WithTestConf` references the method
`overrideConfs` inside a closure, so the closure captures the enclosing builder,
and the classes that implement it (`TestHiveSessionStateBuilder` and
`TestSQLSessionStateBuilder`) are not serializable.

This PR fixes the issue by copying `overrideConfs` into a local variable before
the closure is created.

## How was this patch tested?

Add unit test.

Closes #23352 from gengliangwang/serializableSQLConf.

Authored-by: Gengliang Wang 
Signed-off-by: gatorsmile 
---
 .../org/apache/spark/sql/internal/BaseSessionStateBuilder.scala  | 3 ++-
 .../src/test/scala/org/apache/spark/sql/SerializationSuite.scala | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
index ac07e1f..319c264 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
@@ -309,13 +309,14 @@ private[sql] trait WithTestConf { self: 
BaseSessionStateBuilder =>
   def overrideConfs: Map[String, String]
 
   override protected lazy val conf: SQLConf = {
+val overrideConfigurations = overrideConfs
 val conf = parentState.map(_.conf.clone()).getOrElse {
   new SQLConf {
 clear()
 override def clear(): Unit = {
   super.clear()
   // Make sure we start with the default test configs even after clear
-  overrideConfs.foreach { case (key, value) => setConfString(key, 
value) }
+  overrideConfigurations.foreach { case (key, value) => 
setConfString(key, value) }
 }
   }
 }
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala
index cd6b264..1a1c956 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SerializationSuite.scala
@@ -27,4 +27,9 @@ class SerializationSuite extends SparkFunSuite with 
SharedSQLContext {
 val spark = SparkSession.builder.getOrCreate()
 new JavaSerializer(new 
SparkConf()).newInstance().serialize(spark.sqlContext)
   }
+
+  test("[SPARK-26409] SQLConf should be serializable") {
+val spark = SparkSession.builder.getOrCreate()
+new JavaSerializer(new 
SparkConf()).newInstance().serialize(spark.sessionState.conf)
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7c8f475 -> a888d20)

2018-12-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7c8f475  [SPARK-24687][CORE] Avoid job hanging when generate task 
binary causes fatal error
 add a888d20  [SPARK-26324][DOCS] Add Spark docs for Running in Mesos with 
SSL

No new revisions were added by this update.

Summary of changes:
 docs/running-on-mesos.md | 13 +
 1 file changed, 13 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.3 updated: [SPARK-24687][CORE] Avoid job hanging when generate task binary causes fatal error

2018-12-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new a22a11b  [SPARK-24687][CORE] Avoid job hanging when generate task 
binary causes fatal error
a22a11b is described below

commit a22a11b3a1160b6564e5c39571a4b13e29b14936
Author: zhoukang 
AuthorDate: Thu Dec 20 08:26:25 2018 -0600

[SPARK-24687][CORE] Avoid job hanging when generate task binary causes 
fatal error

## What changes were proposed in this pull request?
When a NoClassDefFoundError is thrown, it causes the job to hang.
`Exception in thread "dag-scheduler-event-loop" 
java.lang.NoClassDefFoundError: 
Lcom/xxx/data/recommend/aggregator/queue/QueueName;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
at java.lang.Class.getDeclaredField(Class.java:1946)
at 
java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectOutputStream.writeClass(ObjectOutputStream.java:1212)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1119)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)`

This happens because a NoClassDefFoundError is not caught during task
serialization.
`var taskBinary: Broadcast[Array[Byte]] = null
try {
  // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
  // For ResultTask, serialize and broadcast (rdd, func).
  val taskBinaryBytes: Array[Byte] = stage match {
case stage: ShuffleMapStage =>
  JavaUtils.bufferToArray(
closureSerializer.serialize((stage.rdd, stage.shuffleDep): 
AnyRef))
case stage: ResultStage =>
  JavaUtils.bufferToArray(closureSerializer.serialize((stage.rdd, 
stage.func): AnyRef))
  }

  taskBinary = sc.broadcast(taskBinaryBytes)
} catch {
  // In the case of a failure during serialization, abort the stage.
  case e: NotSerializableException =>
abortStage(stage, "Task not serializable: " + e.toString, Some(e))
runningStages -= stage


[spark] branch branch-2.4 updated: [SPARK-24687][CORE] Avoid job hanging when generate task binary causes fatal error

2018-12-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 74c1cd1  [SPARK-24687][CORE] Avoid job hanging when generate task 
binary causes fatal error
74c1cd1 is described below

commit 74c1cd15ce49d16e4e6e3c605359bec5f39e9712
Author: zhoukang 
AuthorDate: Thu Dec 20 08:26:25 2018 -0600

[SPARK-24687][CORE] Avoid job hanging when generate task binary causes 
fatal error

## What changes were proposed in this pull request?
When a NoClassDefFoundError is thrown, it causes the job to hang.
`Exception in thread "dag-scheduler-event-loop" 
java.lang.NoClassDefFoundError: 
Lcom/xxx/data/recommend/aggregator/queue/QueueName;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
at java.lang.Class.getDeclaredField(Class.java:1946)
at 
java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectOutputStream.writeClass(ObjectOutputStream.java:1212)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1119)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)`

This happens because a NoClassDefFoundError is not caught during task
serialization.
`var taskBinary: Broadcast[Array[Byte]] = null
try {
  // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
  // For ResultTask, serialize and broadcast (rdd, func).
  val taskBinaryBytes: Array[Byte] = stage match {
case stage: ShuffleMapStage =>
  JavaUtils.bufferToArray(
closureSerializer.serialize((stage.rdd, stage.shuffleDep): 
AnyRef))
case stage: ResultStage =>
  JavaUtils.bufferToArray(closureSerializer.serialize((stage.rdd, 
stage.func): AnyRef))
  }

  taskBinary = sc.broadcast(taskBinaryBytes)
} catch {
  // In the case of a failure during serialization, abort the stage.
  case e: NotSerializableException =>
abortStage(stage, "Task not serializable: " + e.toString, Some(e))
runningStages -= stage


[spark] branch master updated: [SPARK-24687][CORE] Avoid job hanging when generate task binary causes fatal error

2018-12-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7c8f475  [SPARK-24687][CORE] Avoid job hanging when generate task 
binary causes fatal error
7c8f475 is described below

commit 7c8f4756c34a0b00931c2987c827a18d989e6c08
Author: zhoukang 
AuthorDate: Thu Dec 20 08:26:25 2018 -0600

[SPARK-24687][CORE] Avoid job hanging when generate task binary causes 
fatal error

## What changes were proposed in this pull request?
When a NoClassDefFoundError is thrown, it causes the job to hang.
`Exception in thread "dag-scheduler-event-loop" 
java.lang.NoClassDefFoundError: 
Lcom/xxx/data/recommend/aggregator/queue/QueueName;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
at java.lang.Class.getDeclaredField(Class.java:1946)
at 
java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectOutputStream.writeClass(ObjectOutputStream.java:1212)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1119)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)`

This happens because a NoClassDefFoundError is not caught during task
serialization.
`var taskBinary: Broadcast[Array[Byte]] = null
try {
  // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
  // For ResultTask, serialize and broadcast (rdd, func).
  val taskBinaryBytes: Array[Byte] = stage match {
case stage: ShuffleMapStage =>
  JavaUtils.bufferToArray(
closureSerializer.serialize((stage.rdd, stage.shuffleDep): 
AnyRef))
case stage: ResultStage =>
  JavaUtils.bufferToArray(closureSerializer.serialize((stage.rdd, 
stage.func): AnyRef))
  }

  taskBinary = sc.broadcast(taskBinaryBytes)
} catch {
  // In the case of a failure during serialization, abort the stage.
  case e: NotSerializableException =>
abortStage(stage, "Task not serializable: " + e.toString, Some(e))
runningStages -= stage


[spark-website] branch asf-site updated: Hotfix site links in sitemap.xml

2018-12-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 63d  Hotfix site links in sitemap.xml
63d is described below

commit 63db35119aba73cb28ababaa72ea609bb835
Author: Sean Owen 
AuthorDate: Thu Dec 20 03:46:01 2018 -0600

Hotfix site links in sitemap.xml
---
 site/sitemap.xml | 328 +++
 1 file changed, 164 insertions(+), 164 deletions(-)

diff --git a/site/sitemap.xml b/site/sitemap.xml
index c7f17db..80a4845 100644
--- a/site/sitemap.xml
+++ b/site/sitemap.xml
@@ -143,658 +143,658 @@
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-4-0.html
+  https://spark.apache.org/releases/spark-release-2-4-0.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-4-0-released.html
+  https://spark.apache.org/news/spark-2-4-0-released.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-3-2.html
+  https://spark.apache.org/releases/spark-release-2-3-2.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-3-2-released.html
+  https://spark.apache.org/news/spark-2-3-2-released.html
   weekly
 
 
-  
http://localhost:4000/news/spark-summit-oct-2018-agenda-posted.html
+  
https://spark.apache.org/news/spark-summit-oct-2018-agenda-posted.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-2-2.html
+  https://spark.apache.org/releases/spark-release-2-2-2.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-2-2-released.html
+  https://spark.apache.org/news/spark-2-2-2-released.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-1-3.html
+  https://spark.apache.org/releases/spark-release-2-1-3.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-1-3-released.html
+  https://spark.apache.org/news/spark-2-1-3-released.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-3-1.html
+  https://spark.apache.org/releases/spark-release-2-3-1.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-3-1-released.html
+  https://spark.apache.org/news/spark-2-3-1-released.html
   weekly
 
 
-  
http://localhost:4000/news/spark-summit-june-2018-agenda-posted.html
+  
https://spark.apache.org/news/spark-summit-june-2018-agenda-posted.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-3-0.html
+  https://spark.apache.org/releases/spark-release-2-3-0.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-3-0-released.html
+  https://spark.apache.org/news/spark-2-3-0-released.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-2-1.html
+  https://spark.apache.org/releases/spark-release-2-2-1.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-2-1-released.html
+  https://spark.apache.org/news/spark-2-2-1-released.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-1-2.html
+  https://spark.apache.org/releases/spark-release-2-1-2.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-1-2-released.html
+  https://spark.apache.org/news/spark-2-1-2-released.html
   weekly
 
 
-  http://localhost:4000/news/spark-summit-eu-2017-agenda-posted.html
+  
https://spark.apache.org/news/spark-summit-eu-2017-agenda-posted.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-2-0.html
+  https://spark.apache.org/releases/spark-release-2-2-0.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-2-0-released.html
+  https://spark.apache.org/news/spark-2-2-0-released.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-1-1.html
+  https://spark.apache.org/releases/spark-release-2-1-1.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-1-1-released.html
+  https://spark.apache.org/news/spark-2-1-1-released.html
   weekly
 
 
-  
http://localhost:4000/news/spark-summit-june-2017-agenda-posted.html
+  
https://spark.apache.org/news/spark-summit-june-2017-agenda-posted.html
   weekly
 
 
-  
http://localhost:4000/news/spark-summit-east-2017-agenda-posted.html
+  
https://spark.apache.org/news/spark-summit-east-2017-agenda-posted.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-1-0.html
+  https://spark.apache.org/releases/spark-release-2-1-0.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-1-0-released.html
+  https://spark.apache.org/news/spark-2-1-0-released.html
   weekly
 
 
-  
http://localhost:4000/news/spark-wins-cloudsort-100tb-benchmark.html
+  
https://spark.apache.org/news/spark-wins-cloudsort-100tb-benchmark.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-2-0-2.html
+  https://spark.apache.org/releases/spark-release-2-0-2.html
   weekly
 
 
-  http://localhost:4000/news/spark-2-0-2-released.html
+  https://spark.apache.org/news/spark-2-0-2-released.html
   weekly
 
 
-  http://localhost:4000/releases/spark-release-1-6-3.html
+