[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19953
  
**[Test build #84832 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84832/testReport)**
 for PR 19953 at commit 
[`a3aca2e`](https://github.com/apache/spark/commit/a3aca2ef98bf2116f90565282bf24730f264b6b3).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] add init-container bootstrappi...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19954
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84836 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84836/testReport)**
 for PR 19862 at commit 
[`57550fb`](https://github.com/apache/spark/commit/57550fbd0c42c1616dee0197af6dedbd57a8da89).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] add init-container bootstrappi...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19954
  
**[Test build #84838 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84838/testReport)**
 for PR 19954 at commit 
[`1a74521`](https://github.com/apache/spark/commit/1a74521c3f114a9774598738daef5489c6fa8bae).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84840/
Test FAILed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] add init-container bootstrappi...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84838/
Test FAILed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84840 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84840/testReport)**
 for PR 19862 at commit 
[`80231ab`](https://github.com/apache/spark/commit/80231ab670d5bf1640fad3a9741b6315dba9d1bb).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19962
  
**[Test build #84837 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84837/testReport)**
 for PR 19962 at commit 
[`3922ff4`](https://github.com/apache/spark/commit/3922ff4625aba951884c3f780782c8a4675aff06).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19953
  
**[Test build #84833 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84833/testReport)**
 for PR 19953 at commit 
[`84a3ed3`](https://github.com/apache/spark/commit/84a3ed3e0f69485645bc92c471c35cfbfab7ffa2).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19953
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19953
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84833/
Test FAILed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84836/
Test FAILed.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19962
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19953
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19953
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84832/
Test FAILed.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19962
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84837/
Test FAILed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84841 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84841/testReport)**
 for PR 19811 at commit 
[`c9a790c`](https://github.com/apache/spark/commit/c9a790c4d0ed7c88b7217a2d6ee13741fad5a9a0).


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84842 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84842/testReport)**
 for PR 19862 at commit 
[`4571b08`](https://github.com/apache/spark/commit/4571b08678d180fefacfffbf0de2e5289066dd73).


---




[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19932


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19953
  
retest this please


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19962
  
retest this please


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19953
  
**[Test build #84844 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84844/testReport)**
 for PR 19953 at commit 
[`84a3ed3`](https://github.com/apache/spark/commit/84a3ed3e0f69485645bc92c471c35cfbfab7ffa2).


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19962
  
**[Test build #84843 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84843/testReport)**
 for PR 19962 at commit 
[`3922ff4`](https://github.com/apache/spark/commit/3922ff4625aba951884c3f780782c8a4675aff06).


---




[GitHub] spark issue #19793: [SPARK-22574] [Mesos] [Submit] Check submission request ...

2017-12-13 Thread Gschiavon
Github user Gschiavon commented on the issue:

https://github.com/apache/spark/pull/19793
  
Hi @vanzin! I just fixed it and pushed it to my branch, but it's not opening the PR. Maybe I need to open a new one?



---




[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
@aokolnychyi After thinking about it again, we might need to revert this PR. 
Although it converts a cross join to an inner join, it does not improve 
performance. What do you think?
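
For context, a minimal sketch (table and column names assumed, not taken from the PR) of the kind of rewrite being discussed: constraint propagation lets the optimizer infer an equi-join condition from the constant filters, but each side is already filtered down, so the converted inner join does roughly the same work as the cross join.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("infer-join-conditions").getOrCreate()
spark.range(10).toDF("a").createOrReplaceTempView("t1")
spark.range(10).toDF("b").createOrReplaceTempView("t2")

// t1.a = 1 AND t2.b = 1 implies t1.a = t2.b, so the cross join can be
// rewritten as an inner join on the inferred condition -- yet each side is
// already reduced to a single matching value, so little work is saved.
spark.sql("SELECT * FROM t1 CROSS JOIN t2 WHERE t1.a = 1 AND t2.b = 1").explain(true)
```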


---




[GitHub] spark pull request #19350: [SPARK-22126][ML] Fix model-specific optimization...

2017-12-13 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/19350#discussion_r156599955
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -82,5 +86,49 @@ abstract class Estimator[M <: Model[M]] extends 
PipelineStage {
 paramMaps.map(fit(dataset, _))
   }
 
+  /**
+   * (Java-specific)
+   */
+  @Since("2.3.0")
+  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap],
+unpersistDatasetAfterFitting: Boolean, executionContext: 
ExecutionContext,
+modelCallback: VoidFunction2[Model[_], Int]): Unit = {
+// Fit models in a Future for training in parallel
+val modelFutures = paramMaps.map { paramMap =>
+  Future[Model[_]] {
+fit(dataset, paramMap).asInstanceOf[Model[_]]
--- End diff --

How will this work in a pipeline?

If the `Estimator` in CV is a `Pipeline`, then here it will call 
`fit(dataset, paramMap)` on the `Pipeline`, which will in turn fit each stage 
with that `paramMap`. This is what the current parallel CV does.

But if we have a stage with model-specific optimization (let's say, for 
argument's sake, a `LinearRegression` that can internally optimize `maxIter`), 
then its `fit` will be called with only a single `paramMap` argument.

So doesn't pushing the parallel fit into `Estimator` nullify any benefit 
from model-specific optimizations?
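
A rough, self-contained Scala sketch of that concern (hypothetical stand-ins, not Spark's actual `Estimator` API): once the parallel loop only ever hands the estimator a single setting, the single-pass optimization over the whole array can no longer kick in.

```scala
// Hypothetical stand-ins just to show the shape of the problem.
final case class Setting(maxIter: Int)
final case class FittedModel(maxIter: Int)

class OptimizingEstimator {
  // Model-specific optimization: one training run can yield the models for
  // *all* requested maxIter values (e.g. by emitting intermediate iterates).
  def fit(data: Seq[Double], settings: Array[Setting]): Array[FittedModel] = {
    // ... single shared training pass ...
    settings.map(s => FittedModel(s.maxIter))
  }

  // What a per-paramMap parallel loop ends up calling: one independent fit
  // per setting, so the shared work is repeated settings.length times.
  def fit(data: Seq[Double], setting: Setting): FittedModel =
    FittedModel(setting.maxIter)
}
```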


---




[GitHub] spark pull request #19350: [SPARK-22126][ML][WIP] Fix model-specific optimiz...

2017-12-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19350#discussion_r156603289
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -82,5 +86,49 @@ abstract class Estimator[M <: Model[M]] extends 
PipelineStage {
 paramMaps.map(fit(dataset, _))
   }
 
+  /**
+   * (Java-specific)
+   */
+  @Since("2.3.0")
+  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap],
+unpersistDatasetAfterFitting: Boolean, executionContext: 
ExecutionContext,
+modelCallback: VoidFunction2[Model[_], Int]): Unit = {
+// Fit models in a Future for training in parallel
+val modelFutures = paramMaps.map { paramMap =>
+  Future[Model[_]] {
+fit(dataset, paramMap).asInstanceOf[Model[_]]
--- End diff --

@MLnick Oh, the design is still under discussion on JIRA and will be 
changed, I think. I should mark this as WIP. Thanks!


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r156609277
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -80,6 +80,14 @@ class CacheManager extends Logging {
 cachedData.isEmpty
   }
 
+  private def extractStatsOfPlanForCache(plan: LogicalPlan): 
Option[Statistics] = {
+if (plan.stats.rowCount.isDefined) {
--- End diff --

We could also collect the size as part of building the cache.


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r156609849
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 ---
@@ -60,7 +62,8 @@ case class InMemoryRelation(
 @transient child: SparkPlan,
 tableName: Option[String])(
 @transient var _cachedColumnBuffers: RDD[CachedBatch] = null,
-val batchStats: LongAccumulator = 
child.sqlContext.sparkContext.longAccumulator)
+val batchStats: LongAccumulator = 
child.sqlContext.sparkContext.longAccumulator,
+statsOfPlanToCache: Option[Statistics] = None)
--- End diff --

Yeah, the secondary argument list seems like a better place. I don't think we 
should incorporate the stats in the hashCode/equals methods.
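
A small self-contained illustration of why the secondary (curried) argument list gives that for free: Scala case-class equality and hashCode only consider the first parameter list.

```scala
// Stand-in for the InMemoryRelation pattern: payload in the first list,
// bookkeeping (the stats) in the second.
case class Relation(name: String)(val stats: Option[Long] = None)

val a = Relation("t")(Some(100L))
val b = Relation("t")(None)

// Differing stats do not affect equality or hashing, so lookups keyed on
// the relation still match.
assert(a == b)
assert(a.hashCode == b.hashCode)
```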


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r156610279
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 ---
@@ -71,9 +74,10 @@ case class InMemoryRelation(
 
   override def computeStats(): Statistics = {
 if (batchStats.value == 0L) {
-  // Underlying columnar RDD hasn't been materialized, no useful 
statistics information
-  // available, return the default statistics.
-  Statistics(sizeInBytes = child.sqlContext.conf.defaultSizeInBytes)
+  // Underlying columnar RDD hasn't been materialized, use the stats 
from the plan to cache when
+  // applicable
+  statsOfPlanToCache.getOrElse(Statistics(sizeInBytes =
+child.sqlContext.conf.defaultSizeInBytes))
--- End diff --

Mweh - this seems very arbitrary.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84841 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84841/testReport)**
 for PR 19811 at commit 
[`c9a790c`](https://github.com/apache/spark/commit/c9a790c4d0ed7c88b7217a2d6ee13741fad5a9a0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84841/
Test FAILed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-13 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19599#discussion_r156605360
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: 
String) // No need for isV
   }
 }
 
+/**
+ * :: DeveloperApi ::
+ * Specialized version of `Param[String]` for Java.
+ */
+@DeveloperApi
+class StringParam(parent: Params, name: String, doc: String, isValid: 
String => Boolean)
+  extends Param[String](parent, name, doc, isValid) {
+
+  private var options: Option[Array[String]] = None
--- End diff --

What about this?
```
class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean,
    options: Option[Array[String]] = None)
  extends Param[String](parent, name, doc, isValid) {
  ...
  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
      s => options.exists(s.equalsIgnoreCase), Some(options))
  }
```

This solves the options-as-val problem, but highlights another one: 
why do we need the ability to pass an explicit isValid at all? Why not always 
expect only options?

I agree with @attilapiros that these params are enum-like. If so, the only 
reasonable validation is to check whether the value is one of the acceptable 
options (ignoring case). I don't remember ever seeing a custom validator do anything 
else. Removing these custom validators would reduce complexity and code 
duplication.
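
A minimal, self-contained sketch of that enum-like validation (plain Scala, not the proposed `StringParam` API itself):

```scala
// The only check an options-based string param really needs:
def isValidOption(options: Array[String])(value: String): Boolean =
  options.exists(value.equalsIgnoreCase)

val solverOptions = Array("l-bfgs", "normal")
assert(isValidOption(solverOptions)("L-BFGS"))  // matches, ignoring case
assert(!isValidOption(solverOptions)("newton")) // rejected: not a listed option
```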


---




[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-13 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19599#discussion_r156609625
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: 
String) // No need for isV
   }
 }
 
+/**
+ * :: DeveloperApi ::
+ * Specialized version of `Param[String]` for Java.
+ */
+@DeveloperApi
+class StringParam(parent: Params, name: String, doc: String, isValid: 
String => Boolean)
+  extends Param[String](parent, name, doc, isValid) {
+
+  private var options: Option[Array[String]] = None
+
+  def this(parent: Params, name: String, doc: String) =
+this(parent, name, doc, ParamValidators.alwaysTrue)
+
+  /** construct a StringParam with limited options (case-insensitive) */
+  def this(parent: Params, name: String, doc: String, options: 
Array[String]) = {
+this(parent, name, doc + s" Supported options (case-insensitive): 
${options.mkString(", ")}.",
+  s => options.exists(s.equalsIgnoreCase))
+this.options = Some(options)
+  }
+
+  private[spark] def getOptions: Option[Array[String]] = options
+
+  /** Creates a param pair with given value (for Java). */
+  override def w(value: String): ParamPair[String] = super.w(value)
+
+  override def validate(value: String): Unit = {
--- End diff --

should be private[param] 


---




[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-13 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19599#discussion_r156609525
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: 
String) // No need for isV
   }
 }
 
+/**
+ * :: DeveloperApi ::
+ * Specialized version of `Param[String]` for Java.
+ */
+@DeveloperApi
+class StringParam(parent: Params, name: String, doc: String, isValid: 
String => Boolean)
+  extends Param[String](parent, name, doc, isValid) {
+
+  private var options: Option[Array[String]] = None
+
+  def this(parent: Params, name: String, doc: String) =
+this(parent, name, doc, ParamValidators.alwaysTrue)
+
+  /** construct a StringParam with limited options (case-insensitive) */
+  def this(parent: Params, name: String, doc: String, options: 
Array[String]) = {
+this(parent, name, doc + s" Supported options (case-insensitive): 
${options.mkString(", ")}.",
--- End diff --

I missed one additional line when highlighting, sorry for that :)
I meant to test the case-insensitive validation in the next line.

The doc could be tested too, but I think that's more of a nice-to-have than a must.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19962
  
LGTM, left only one minor comment


---




[GitHub] spark pull request #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InS...

2017-12-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19962#discussion_r156617815
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -981,80 +976,52 @@ case class ScalaUDF(
   }
 
   // scalastyle:on line.size.limit
-
-  private val converterClassName = classOf[Any => Any].getName
-  private val scalaUDFClassName = classOf[ScalaUDF].getName
-  private val typeConvertersClassName = 
CatalystTypeConverters.getClass.getName + ".MODULE$"
-
-  // Generate codes used to convert the arguments to Scala type for 
user-defined functions
-  private[this] def genCodeForConverter(ctx: CodegenContext, index: Int): 
(String, String) = {
-val converterTerm = ctx.freshName("converter")
-val expressionIdx = ctx.references.size - 1
-(converterTerm,
-  s"$converterClassName $converterTerm = 
($converterClassName)$typeConvertersClassName" +
-s".createToScalaConverter(((Expression)((($scalaUDFClassName)" +
-
s"references[$expressionIdx]).getChildren().apply($index))).dataType());")
-  }
-
   override def doGenCode(
   ctx: CodegenContext,
   ev: ExprCode): ExprCode = {
-val scalaUDF = ctx.freshName("scalaUDF")
-val scalaUDFRef = ctx.addReferenceObj("scalaUDFRef", this, 
scalaUDFClassName)
-
-// Object to convert the returned value of user-defined functions to 
Catalyst type
-val catalystConverterTerm = ctx.freshName("catalystConverter")
+val converterClassName = classOf[Any => Any].getName
 
+// The type converters for inputs and the result.
+val converters: Array[Any => Any] = children.map { c =>
+  CatalystTypeConverters.createToScalaConverter(c.dataType)
+}.toArray :+ CatalystTypeConverters.createToCatalystConverter(dataType)
+val convertersTerm = ctx.addReferenceObj("converters", converters, 
s"$converterClassName[]")
+val errorMsgTerm = ctx.addReferenceObj("errMsg", udfErrorMessage)
 val resultTerm = ctx.freshName("result")
 
-// This must be called before children expressions' codegen
-// because ctx.references is used in genCodeForConverter
-val converterTerms = children.indices.map(genCodeForConverter(ctx, _))
-
-// Initialize user-defined function
-val funcClassName = s"scala.Function${children.size}"
-
-val funcTerm = ctx.freshName("udf")
-
 // codegen for children expressions
 val evals = children.map(_.genCode(ctx))
 
 // Generate the codes for expressions and calling user-defined function
 // We need to get the boxedType of dataType's javaType here. Because 
for the dataType
 // such as IntegerType, its javaType is `int` and the returned type of 
user-defined
 // function is Object. Trying to convert an Object to `int` will cause 
casting exception.
-val evalCode = evals.map(_.code).mkString
-val (converters, funcArguments) = converterTerms.zipWithIndex.map {
-  case ((convName, convInit), i) =>
-val eval = evals(i)
-val argTerm = ctx.freshName("arg")
-val convert =
-  s"""
- |$convInit
- |Object $argTerm = ${eval.isNull} ? null : 
$convName.apply(${eval.value});
-   """.stripMargin
-(convert, argTerm)
-}.unzip
+val evalCode = evals.map(_.code).mkString("\n")
+val initFuncArgs = scala.collection.mutable.ListBuffer.empty[String]
+val funcArguments = evals.zipWithIndex.map { case (eval, i) =>
--- End diff --

very nit: can we do something like:
```
val (funcArguments, initFuncArgs) = evals.zipWithIndex.map { case (eval, i) =>
  val argTerm = ctx.freshName("arg")
  val initFuncArgs =
    s"Object $argTerm = ${eval.isNull} ? null : $convertersTerm[$i].apply(${eval.value});"
  (argTerm, initFuncArgs)
}.unzip
```
Just to avoid side effects and be more in line with functional 
programming...


---




[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-13 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19940
  
I don't think so; it would be very risky, since `splitExpressions` might 
put the initialization outside the constructor.
What about 1? Any thoughts?


---




[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19156
  
**[Test build #84845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84845/testReport)**
 for PR 19156 at commit 
[`5647a49`](https://github.com/apache/spark/commit/5647a4950db7c95a913f5fa8da66bfbe87f65d64).


---




[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19940
  
**[Test build #84846 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84846/testReport)**
 for PR 19940 at commit 
[`90f517c`](https://github.com/apache/spark/commit/90f517c38364eda0e905ea0f23b413806307ec10).


---




[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/19964

[SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs to split codes in 
elt

## What changes were proposed in this pull request?

In SPARK-22550, which fixes the 64KB JVM bytecode limit problem with `elt`, 
`buildCodeBlocks` is used to split the generated code. However, we should use 
`splitExpressionsWithCurrentInputs` because it considers both normal and 
whole-stage codegen (splitting under whole-stage codegen is not supported yet, 
so in that case it simply doesn't split the code).

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22772

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19964


commit c40488ef33bb81caa7a2d99fc4d1b4de6c462ee3
Author: Liang-Chi Hsieh 
Date:   2017-12-13T10:52:55Z

Use splitExpressionsWithCurrentInputs to split codes in elt.




---




[GitHub] spark pull request #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit pr...

2017-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19778#discussion_r156624676
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -224,22 +224,52 @@ case class Elt(children: Seq[Expression])
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val index = indexExpr.genCode(ctx)
 val strings = stringExprs.map(_.genCode(ctx))
+val indexVal = ctx.freshName("index")
+val stringVal = ctx.freshName("stringVal")
 val assignStringValue = strings.zipWithIndex.map { case (eval, index) 
=>
   s"""
 case ${index + 1}:
-  ${ev.value} = ${eval.isNull} ? null : ${eval.value};
+  ${eval.code}
+  $stringVal = ${eval.isNull} ? null : ${eval.value};
   break;
   """
-}.mkString("\n")
-val indexVal = ctx.freshName("index")
-val stringArray = ctx.freshName("strings");
+}
 
-ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
-  final int $indexVal = ${index.value};
-  UTF8String ${ev.value} = null;
-  switch ($indexVal) {
-$assignStringValue
+val cases = ctx.buildCodeBlocks(assignStringValue)
+val codes = if (cases.length == 1) {
+  s"""
+UTF8String $stringVal = null;
+switch ($indexVal) {
+  ${cases.head}
+}
+   """
+} else {
+  var prevFunc = "null"
+  for (c <- cases.reverse) {
+val funcName = ctx.freshName("eltFunc")
+val funcBody = s"""
+ private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
--- End diff --

Proposes a fix in #19964.


---




[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19964#discussion_r156625612
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -299,33 +299,35 @@ case class Elt(children: Seq[Expression])
   """
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
+var prevFunc = "null"
+var codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = assignStringValue,
+  funcName = "eltFunc",
+  extraArguments = ("int", indexVal) :: Nil,
+  returnType = "UTF8String",
+  makeSplitFunction = body =>
+s"""
+   |UTF8String $stringVal = null;
+   |switch ($indexVal) {
+   |  $body
+   |  default:
+   |return $prevFunc;
+   |}
+   |return $stringVal;
+""".stripMargin,
+  foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+  makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, 
$indexVal)",
+  mergeSplit = false)
--- End diff --

We don't need to, and can't, merge split functions in inner classes.

We don't need to do it because the split functions are not called in a 
sequence like this:
```
eltFunc_1(...)
eltFunc_2(...)
...
```

The calls are embedded in the default branch of each split function, so we 
won't call all the split inner functions from the outer class.
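
A plain-Scala stand-in for that chaining (the real code is generated Java over `InternalRow`/`UTF8String`; names and values here are assumptions): each split function handles its own cases and delegates to the previously generated one from its default branch, so the outer code only calls the head of the chain.

```scala
def eltFunc_2(index: Int): String = index match {
  case 3 => "c"
  case 4 => "d"
  case _ => null               // innermost split: nothing left to fall back to
}

def eltFunc_1(index: Int): String = index match {
  case 1 => "a"
  case 2 => "b"
  case _ => eltFunc_2(index)   // default branch chains to the next split function
}

// Only this last generated function is invoked from the outer code:
val stringVal = eltFunc_1(3)   // "c", resolved through the chain
```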



---




[GitHub] spark issue #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19964
  
**[Test build #84847 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84847/testReport)**
 for PR 19964 at commit 
[`c40488e`](https://github.com/apache/spark/commit/c40488ef33bb81caa7a2d99fc4d1b4de6c462ee3).


---




[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19964#discussion_r156626902
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -299,33 +299,35 @@ case class Elt(children: Seq[Expression])
   """
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
+var prevFunc = "null"
+var codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = assignStringValue,
+  funcName = "eltFunc",
+  extraArguments = ("int", indexVal) :: Nil,
+  returnType = "UTF8String",
+  makeSplitFunction = body =>
+s"""
+   |UTF8String $stringVal = null;
+   |switch ($indexVal) {
+   |  $body
+   |  default:
+   |return $prevFunc;
+   |}
+   |return $stringVal;
+""".stripMargin,
+  foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+  makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, 
$indexVal)",
+  mergeSplit = false)
--- End diff --

We can't merge them, because `makeSplitFunction` will create an invalid 
merged function if used with the given `foldFunctions`:

```java
private UTF8String eltFunc(InternalRow i, int index) { 

  UTF8String stringVal = null;
  switch (index) {
UTF8String stringVal = eltFunc_999(i, index);
default:
  return nestedClassInstance.eltFunc_999(i, index);
  }
  return stringVal;
}
```



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs...

2017-12-13 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19964
  
cc @cloud-fan @kiszk @mgaido91



---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156627486
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || 
get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling 
encryption.")
+
+val executorTimeoutThreshold = 
Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+// If spark.executor.heartbeatInterval is bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
--- End diff --

Done
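
For reference, a minimal sketch of the check being discussed in this diff (the comparison direction and message text are assumptions, not the exact patch):

```scala
// Stand-ins for the values read a few lines above in the diff.
val executorTimeoutThreshold = 120L  // seconds, from spark.network.timeout
val executorHeartbeatInterval = 10L  // seconds, from spark.executor.heartbeatInterval

// Fail fast at SparkConf validation time instead of losing executors later.
require(executorHeartbeatInterval < executorTimeoutThreshold,
  "spark.executor.heartbeatInterval should be less than spark.network.timeout; " +
    "otherwise executors will almost always be reported lost (SPARK-22754).")
```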


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156627509
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || 
get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling 
encryption.")
+
+val executorTimeoutThreshold = 
Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
--- End diff --

Done


---




[GitHub] spark issue #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19964
  
**[Test build #84848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84848/testReport)**
 for PR 19964 at commit 
[`c677aed`](https://github.com/apache/spark/commit/c677aede0fe20f19ef66be3d4be4de19ceb27ad6).


---




[GitHub] spark issue #19942: [SPARK-22754][DEPLOY] Check whether spark.executor.heart...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19942
  
**[Test build #84849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84849/testReport)**
 for PR 19942 at commit 
[`9127e68`](https://github.com/apache/spark/commit/9127e682a6a2bee561c63ef315501f434a73e8b3).


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84842 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84842/testReport)**
 for PR 19862 at commit 
[`4571b08`](https://github.com/apache/spark/commit/4571b08678d180fefacfffbf0de2e5289066dd73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84842/
Test PASSed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...

2017-12-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19156#discussion_r156629043
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -197,14 +240,14 @@ private[ml] object SummaryBuilderImpl extends Logging 
{
* metrics that need to de computed internally to get the final result.
*/
   private val allMetrics: Seq[(String, Metric, DataType, 
Seq[ComputeMetric])] = Seq(
-("mean", Mean, arrayDType, Seq(ComputeMean, ComputeWeightSum)),
-("variance", Variance, arrayDType, Seq(ComputeWeightSum, ComputeMean, 
ComputeM2n)),
+("mean", Mean, vectorUDT, Seq(ComputeMean, ComputeWeightSum)),
+("variance", Variance, vectorUDT, Seq(ComputeWeightSum, ComputeMean, 
ComputeM2n)),
 ("count", Count, LongType, Seq()),
-("numNonZeros", NumNonZeros, arrayLType, Seq(ComputeNNZ)),
-("max", Max, arrayDType, Seq(ComputeMax, ComputeNNZ)),
-("min", Min, arrayDType, Seq(ComputeMin, ComputeNNZ)),
-("normL2", NormL2, arrayDType, Seq(ComputeM2)),
-("normL1", NormL1, arrayDType, Seq(ComputeL1))
+("numNonZeros", NumNonZeros, vectorUDT, Seq(ComputeNNZ)),
--- End diff --

Internally it still uses `Array[Long]` to do the computation; only when 
returning the result is it converted to a vector.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19953
  
**[Test build #84844 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84844/testReport)**
 for PR 19953 at commit 
[`84a3ed3`](https://github.com/apache/spark/commit/84a3ed3e0f69485645bc92c471c35cfbfab7ffa2).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19953
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84844/
Test FAILed.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19953
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19964#discussion_r156630612
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -299,33 +299,35 @@ case class Elt(children: Seq[Expression])
   """
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
+var prevFunc = "null"
+var codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = assignStringValue,
+  funcName = "eltFunc",
+  extraArguments = ("int", indexVal) :: Nil,
+  returnType = "UTF8String",
+  makeSplitFunction = body =>
+s"""
+   |UTF8String $stringVal = null;
+   |switch ($indexVal) {
+   |  $body
+   |  default:
+   |return $prevFunc;
+   |}
+   |return $stringVal;
+""".stripMargin,
+  foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+  makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, 
$indexVal)",
+  mergeSplit = false)
--- End diff --

Yes, but this way we can hit the 64KB limit. Moreover, I think the 
current implementation is quite complex. What about making it similar to the 
other implementations, using a `while` loop instead of a `switch`? WDYT?
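
A plain-Scala stand-in for the `while`-loop shape being suggested (the real thing would be generated Java over the child expressions; this is only an assumption about the intent):

```scala
// elt(index, s1, s2, ...) returns the index-th string (1-based),
// or null if the index is out of range.
def elt(index: Int, strings: String*): String = {
  var i = 0
  var result: String = null
  while (i < strings.length && result == null) {
    if (i + 1 == index) {
      result = strings(i)  // stop once the requested 1-based position is found
    }
    i += 1
  }
  result
}

assert(elt(2, "a", "b", "c") == "b")
assert(elt(5, "a", "b", "c") == null)
```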


---




[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...

2017-12-13 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19946
  
ok to test


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19962
  
**[Test build #84843 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84843/testReport)**
 for PR 19962 at commit 
[`3922ff4`](https://github.com/apache/spark/commit/3922ff4625aba951884c3f780782c8a4675aff06).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19962
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19962
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84843/
Test PASSed.


---




[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19946
  
**[Test build #84850 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84850/testReport)**
 for PR 19946 at commit 
[`5f24de1`](https://github.com/apache/spark/commit/5f24de19f038ac55015f9da54cfc681b18508d24).


---




[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19156
  
**[Test build #84845 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84845/testReport)**
 for PR 19156 at commit 
[`5647a49`](https://github.com/apache/spark/commit/5647a4950db7c95a913f5fa8da66bfbe87f65d64).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19156
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19156
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84845/
Test PASSed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-13 Thread gczsjdy
Github user gczsjdy commented on the issue:

https://github.com/apache/spark/pull/19862
  
This is actually a small change, but it can provide a non-trivial optimization 
for users who don't use `WholeStageCodegen`; for example, there are still users 
on Spark versions below 2.0.

Also, the codegen and non-codegen code paths are supposed to behave the 
same, so in that sense this is a 'bug'.


---




[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156643797
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i <list|create|edit|delete> pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
+./sbin/build-push-docker-images.sh -r <repo> -t my-tag push
+
+Docker files are under the `dockerfiles/` directory and can be customized further before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image=<driver-image> \
+--conf spark.kubernetes.executor.docker.image=<executor-image> \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://<api_server_url>`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. I

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156633323
  
--- Diff: docs/index.md ---
@@ -112,7 +113,7 @@ options for deployment:
   * [Mesos](running-on-mesos.html): deploy a private cluster using
   [Apache Mesos](http://mesos.apache.org)
   * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen 
(YARN)
-  * [Kubernetes 
(experimental)](https://github.com/apache-spark-on-k8s/spark): deploy Spark on 
top of Kubernetes
+  * [Kubernetes (experimental)](running-on-kubernetes.html): deploy Spark 
on top of Kubernetes
--- End diff --

We can remove `(experimental)` here as well?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156641748
  
--- Diff: docs/running-on-kubernetes.md ---

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156639540
  
--- Diff: docs/running-on-kubernetes.md ---

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156640797
  
--- Diff: docs/running-on-kubernetes.md ---

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156641676
  
--- Diff: docs/running-on-kubernetes.md ---

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156639188
  
--- Diff: docs/running-on-kubernetes.md ---

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156641285
  
--- Diff: docs/running-on-kubernetes.md ---

[GitHub] spark pull request #19965: [SPARK-22769][CORE] When driver stopping, there ...

2017-12-13 Thread KaiXinXiaoLei
GitHub user KaiXinXiaoLei opened a pull request:

https://github.com/apache/spark/pull/19965

[SPARK-22769][CORE] When driver stopping, there is an error: RpcEnv already stopped

## What changes were proposed in this pull request?
When the driver is stopping, there is an error: 
org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped:

17/12/12 18:20:44 INFO MemoryStore: MemoryStore cleared
17/12/12 18:20:44 INFO BlockManager: BlockManager stopped
17/12/12 18:20:44 INFO BlockManagerMaster: BlockManagerMaster stopped
17/12/12 18:20:44 ERROR TransportRequestHandler: Error while invoking 
RpcHandler#receive() for one-way message.
org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
at 
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
at 
org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)

I think the log level should be warning, not error.
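A minimal sketch of the kind of change described above, assuming it lives inside the 
Spark code base (so `Logging` and `RpcEnvStoppedException` are visible); the object and 
method names are illustrative, not the actual patch:

```
package org.apache.spark.rpc.netty

import org.apache.spark.internal.Logging
import org.apache.spark.rpc.RpcEnvStoppedException

// Illustrative only: a message arriving after the RpcEnv has been stopped is expected
// during driver shutdown, so log it at WARN instead of ERROR.
private[netty] object SafeOneWayDelivery extends Logging {
  def deliver(post: () => Unit): Unit = {
    try {
      post()
    } catch {
      case e: RpcEnvStoppedException => logWarning(e.getMessage)
    }
  }
}
```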

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/KaiXinXiaoLei/spark rpc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19965.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19965


commit 4d95a7a3bbcec7272c01938e8f99b8f6df3ed2ed
Author: hanghang <584620...@qq.com>
Date:   2017-12-13T11:53:18Z

change code




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19965: [SPARK-22769][CORE] When driver stopping, there is erro...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19965
  
**[Test build #84851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84851/testReport)**
 for PR 19965 at commit 
[`4d95a7a`](https://github.com/apache/spark/commit/4d95a7a3bbcec7272c01938e8f99b8f6df3ed2ed).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18692
  
Yea, I have the same feeling. If the left side has an `a = 1` constraint and the 
right side has a `b = 1` constraint, adding an `a = b` join condition does not 
help, as it always evaluates to true.
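A small illustration of that point (a sketch assuming a `SparkSession` named `spark`, 
not code from this PR):

```
// Both sides are already constrained to a single constant value.
val left  = spark.range(10).selectExpr("id AS a").where("a = 1")
val right = spark.range(10).selectExpr("id AS b").where("b = 1")

// For the surviving rows an inferred `a = b` condition is always true, so it cannot
// prune anything compared to the plain cross join.
left.crossJoin(right).explain(true)
```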


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19793: [SPARK-22574] [Mesos] [Submit] Check submission request ...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19793
  
yea please open a new PR


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19964#discussion_r156649930
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -299,33 +299,35 @@ case class Elt(children: Seq[Expression])
   """
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
+var prevFunc = "null"
+var codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = assignStringValue,
+  funcName = "eltFunc",
+  extraArguments = ("int", indexVal) :: Nil,
+  returnType = "UTF8String",
+  makeSplitFunction = body =>
+s"""
+   |UTF8String $stringVal = null;
+   |switch ($indexVal) {
+   |  $body
+   |  default:
+   |return $prevFunc;
+   |}
+   |return $stringVal;
+""".stripMargin,
+  foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+  makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, 
$indexVal)",
+  mergeSplit = false)
--- End diff --

Why can we hit the 64KB limit?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19964#discussion_r156650539
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -299,33 +299,35 @@ case class Elt(children: Seq[Expression])
   """
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
+var prevFunc = "null"
+var codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = assignStringValue,
+  funcName = "eltFunc",
+  extraArguments = ("int", indexVal) :: Nil,
+  returnType = "UTF8String",
+  makeSplitFunction = body =>
+s"""
+   |UTF8String $stringVal = null;
+   |switch ($indexVal) {
+   |  $body
+   |  default:
+   |return $prevFunc;
+   |}
+   |return $stringVal;
+""".stripMargin,
+  foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+  makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, 
$indexVal)",
+  mergeSplit = false)
--- End diff --

I have thought about it. Other implementations need to introduce at least one 
global variable, such as in the case-when case. If we can tolerate that, it is OK 
for me. Let's see what other reviewers think about it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19962
  
**[Test build #84852 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84852/testReport)**
 for PR 19962 at commit 
[`2caef80`](https://github.com/apache/spark/commit/2caef802f15801a99e226034f1c470fabca050ca).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19953
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19964#discussion_r156652342
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -299,33 +299,35 @@ case class Elt(children: Seq[Expression])
   """
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
+var prevFunc = "null"
+var codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = assignStringValue,
+  funcName = "eltFunc",
+  extraArguments = ("int", indexVal) :: Nil,
+  returnType = "UTF8String",
+  makeSplitFunction = body =>
+s"""
+   |UTF8String $stringVal = null;
+   |switch ($indexVal) {
+   |  $body
+   |  default:
+   |return $prevFunc;
+   |}
+   |return $stringVal;
+""".stripMargin,
+  foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+  makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, 
$indexVal)",
+  mergeSplit = false)
--- End diff --

let's not complicate the already-complex `splitExpressions`; I'm OK with using 
some global variables to simplify the code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-13 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19962
  
LGTM, thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19861#discussion_r156652932
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/ConfigSupport.java ---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * A mix-in interface for {@link DataSourceV2}. Data sources can implement 
this interface to
+ * propagate session configs with chosen key-prefixes to the particular 
data source.
+ */
+@InterfaceStability.Evolving
+public interface ConfigSupport {
--- End diff --

nit: `SessionConfigSupport`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19966: Fix submission request

2017-12-13 Thread Gschiavon
GitHub user Gschiavon opened a pull request:

https://github.com/apache/spark/pull/19966

Fix submission request

## What changes were proposed in this pull request?

PR closed with all the comments -> 
https://github.com/apache/spark/pull/19793

It solves the problem where submitting a malformed CreateSubmissionRequest to 
the Spark Dispatcher caused a bad state in the Dispatcher, making it inactive 
as a Mesos framework.

https://issues.apache.org/jira/browse/SPARK-22574

## How was this patch tested?

All Spark tests passed successfully.

It was tested by sending a malformed request (without appArgs) before and after 
the change. The fix is simple: check whether the value is null before it is accessed.
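
A hedged sketch of the kind of guard described above; the helper and the call sites 
are illustrative, not the actual `MesosClusterScheduler` change:

```
object SubmissionValidation {
  // Illustrative only: reject a malformed submission up front instead of letting a
  // null field surface later as a NullPointerException that wedges the dispatcher.
  def requireField(value: AnyRef, name: String): Unit = {
    if (value == null) {
      throw new IllegalArgumentException(s"Malformed request: missing required field '$name'")
    }
  }
}

// Hypothetical call sites, before building the driver command:
// SubmissionValidation.requireField(request.appArgs, "appArgs")
// SubmissionValidation.requireField(request.sparkProperties, "sparkProperties")
```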

This was before the change, leaving the dispatcher inactive:

```
Exception in thread "Thread-22" java.lang.NullPointerException
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.getDriverCommandValue(MesosClusterScheduler.scala:444)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.buildDriverCommand(MesosClusterScheduler.scala:451)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.org$apache$spark$scheduler$cluster$mesos$MesosClusterScheduler$$createTaskInfo(MesosClusterScheduler.scala:538)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:570)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:555)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:555)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:621)
```

And after:

```
  "message" : "Malformed request: 
org.apache.spark.deploy.rest.SubmitRestProtocolException: Validation of message 
CreateSubmissionRequest 
failed!\n\torg.apache.spark.deploy.rest.SubmitRestProtocolMessage.validate(SubmitRestProtocolMessage.scala:70)\n\torg.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:272)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:707)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\torg.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)\n\torg.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)\n\torg.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\torg.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\torg.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\torg.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\torg.s
 
park_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\torg.spark_project.jetty.server.Server.handle(Server.java:524)\n\torg.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)\n\torg.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)\n\torg.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\torg.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\torg.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.
 
java:671)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tjava.lang.Thread.run(Thread.java:745)"
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Gschiavon/spark fix-submission-request

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19966.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19966


commit 44fd5d3921299f93d1aab7fe971078a6bce835a2
Author: German Schiavon 
Date:   2017-11-21T15:32:04Z

Check submission request parameters

commit 14d64172500483f9e984ac28ba5f3b52db33ad9e
Author: German Schiavon 
Date:   2017-11-27T08:47:41Z

change test env var name

commit 57d52c4917b8cd08e8e73bce9729f8a59afa6ffd
Author: Germa

[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19953
  
**[Test build #84853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84853/testReport)**
 for PR 19953 at commit 
[`84a3ed3`](https://github.com/apache/spark/commit/84a3ed3e0f69485645bc92c471c35cfbfab7ffa2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19861#discussion_r156653418
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/ConfigSupport.java ---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * A mix-in interface for {@link DataSourceV2}. Data sources can implement 
this interface to
+ * propagate session configs with chosen key-prefixes to the particular 
data source.
--- End diff --

`propagate session configs with the specified key-prefix to all the 
read/write operations under this session`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19861#discussion_r156653492
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/ConfigSupport.java ---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * A mix-in interface for {@link DataSourceV2}. Data sources can implement 
this interface to
+ * propagate session configs with chosen key-prefixes to the particular 
data source.
+ */
+@InterfaceStability.Evolving
+public interface ConfigSupport {
+
+/**
+ * Name for the specified data source; Spark will extract all session configs 
that start with
+ * `spark.datasource.$name`, turn `spark.datasource.$name.xxx -> yyy` into
+ * `xxx -> yyy`, and propagate them to all data source operations in this session.
+ */
+String name();
--- End diff --

how about `keyPrefix`
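
A rough sketch of the prefix handling described in the javadoc above; the helper 
below is illustrative, not the actual Spark API:

```
object SessionConfigExtraction {
  // Illustrative only: turn `spark.datasource.<prefix>.xxx -> yyy` session configs
  // into `xxx -> yyy` entries for the data source identified by `keyPrefix`.
  def extract(keyPrefix: String, sessionConfs: Map[String, String]): Map[String, String] = {
    val fullPrefix = s"spark.datasource.$keyPrefix."
    sessionConfs.collect {
      case (k, v) if k.startsWith(fullPrefix) => k.stripPrefix(fullPrefix) -> v
    }
  }
}

// e.g. SessionConfigExtraction.extract("kafka",
//   Map("spark.datasource.kafka.bootstrap.servers" -> "host:9092"))
// returns Map("bootstrap.servers" -> "host:9092")
```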


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19966: Fix submission request

2017-12-13 Thread Gschiavon
Github user Gschiavon commented on the issue:

https://github.com/apache/spark/pull/19966
  
@felixcheung @cloud-fan @vanzin @susanxhuynh @gatorsmile @ArtRand Here is 
the new PR


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19964: [SPARK-22772][SQL] Use splitExpressionsWithCurren...

2017-12-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19964#discussion_r156653613
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -299,33 +299,35 @@ case class Elt(children: Seq[Expression])
   """
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
+var prevFunc = "null"
+var codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = assignStringValue,
+  funcName = "eltFunc",
+  extraArguments = ("int", indexVal) :: Nil,
+  returnType = "UTF8String",
+  makeSplitFunction = body =>
+s"""
+   |UTF8String $stringVal = null;
+   |switch ($indexVal) {
+   |  $body
+   |  default:
+   |return $prevFunc;
+   |}
+   |return $stringVal;
+""".stripMargin,
+  foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+  makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, 
$indexVal)",
+  mergeSplit = false)
--- End diff --

@viirya I think we can hit it with a very large number of parameters to the 
function. I am not saying that it is likely to happen, but IMHO it is 
feasible to make it happen.
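
For what it's worth, an illustrative way to build such an expression (a sketch 
assuming a `SparkSession` named `spark`, not from this PR): a single `elt` call 
with thousands of arguments is the kind of input whose generated code has to be 
split to stay under the JVM's 64KB-per-method bytecode limit.

```
// Illustrative only: a very wide `elt` expression; without splitting, the generated
// Java switch over all these branches could exceed 64KB in a single method.
val args = (1 to 5000).map(i => s"'v$i'").mkString(", ")
val wide = spark.range(1).selectExpr(s"elt(1, $args) AS picked")
wide.explain()
```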


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19861#discussion_r156653981
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import java.util.regex.Pattern
+
+import scala.collection.JavaConverters._
+import scala.collection.immutable
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.v2.ConfigSupport
+
+private[sql] object DataSourceV2ConfigSupport extends Logging {
--- End diff --

how about just naming it `Util`? We may add more functions to it for other purposes in the future.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19966: Fix submission request

2017-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19966
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19861#discussion_r156654124
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import java.util.regex.Pattern
+
+import scala.collection.JavaConverters._
+import scala.collection.immutable
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.v2.ConfigSupport
+
+private[sql] object DataSourceV2ConfigSupport extends Logging {
+
+  /**
+   * Helper method that turns session configs with config keys that start with
+   * `spark.datasource.$name` into k/v pairs, the k/v pairs will be used to create data source
+   * options.
+   * A session config `spark.datasource.$name.xxx -> yyy` will be transformed into
+   * `xxx -> yyy`.
+   *
+   * @param name the data source name
+   * @param conf the session conf
+   * @return an immutable map that contains all the extracted and transformed k/v pairs.
+   */
+  def withSessionConfig(
+  name: String,
+  conf: SQLConf): immutable.Map[String, String] = {
--- End diff --

in Scala `Map` by default refers to `immutable.Map`
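
A quick check of that claim (a standalone snippet, not code from the PR): `Predef` aliases `Map` to `scala.collection.immutable.Map`, so the explicit `immutable.Map[String, String]` return type above can be written as plain `Map[String, String]`.

```scala
// Verifies that the default Map alias is the immutable one.
object MapAliasCheck {
  def main(args: Array[String]): Unit = {
    val m: Map[String, String] = Map("xxx" -> "yyy")
    println(m.isInstanceOf[scala.collection.immutable.Map[_, _]]) // true
  }
}
```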


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19861#discussion_r156654259
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import java.util.regex.Pattern
+
+import scala.collection.JavaConverters._
+import scala.collection.immutable
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.v2.ConfigSupport
+
+private[sql] object DataSourceV2ConfigSupport extends Logging {
+
+  /**
+   * Helper method that turns session configs with config keys that start with
+   * `spark.datasource.$name` into k/v pairs, the k/v pairs will be used to create data source
+   * options.
+   * A session config `spark.datasource.$name.xxx -> yyy` will be transformed into
+   * `xxx -> yyy`.
+   *
+   * @param name the data source name
+   * @param conf the session conf
+   * @return an immutable map that contains all the extracted and transformed k/v pairs.
+   */
+  def withSessionConfig(
+  name: String,
+  conf: SQLConf): immutable.Map[String, String] = {
+require(name != null, "The data source name can't be null.")
+
+val pattern = Pattern.compile(s"spark\\.datasource\\.$name\\.(.*)")
--- End diff --

this can be a member variable to avoid re-compiling it every time.
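
One way to act on that suggestion, sketched under assumptions: the regex in the PR embeds the per-source `$name`, so a single member variable only works if the name-specific part is matched separately. Below, the fixed prefix is compiled once and the name is checked with a plain string comparison; the object name, config keys, and values are hypothetical, and this is not the PR's implementation.

```scala
import java.util.regex.Pattern

object SessionConfigPrefix {
  // Compiled once and reused for every data source name.
  private val prefixPattern = Pattern.compile("^spark\\.datasource\\.([^.]+)\\.(.+)")

  // Keeps only configs under spark.datasource.<name>. and strips the prefix.
  def extract(name: String, configs: Map[String, String]): Map[String, String] =
    configs.flatMap { case (key, value) =>
      val m = prefixPattern.matcher(key)
      if (m.matches() && m.group(1) == name) Some(m.group(2) -> value) else None
    }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.datasource.foo.path" -> "/tmp/x",
      "spark.sql.shuffle.partitions" -> "10")
    println(extract("foo", conf)) // Map(path -> /tmp/x)
  }
}
```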


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19861#discussion_r156654638
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import java.util.regex.Pattern
+
+import scala.collection.JavaConverters._
+import scala.collection.immutable
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.v2.ConfigSupport
+
+private[sql] object DataSourceV2ConfigSupport extends Logging {
+
+  /**
+   * Helper method that turns session configs with config keys that start with
+   * `spark.datasource.$name` into k/v pairs, the k/v pairs will be used to create data source
+   * options.
+   * A session config `spark.datasource.$name.xxx -> yyy` will be transformed into
+   * `xxx -> yyy`.
+   *
+   * @param name the data source name
+   * @param conf the session conf
+   * @return an immutable map that contains all the extracted and transformed k/v pairs.
+   */
+  def withSessionConfig(
+  name: String,
+  conf: SQLConf): immutable.Map[String, String] = {
+require(name != null, "The data source name can't be null.")
+
+val pattern = Pattern.compile(s"spark\\.datasource\\.$name\\.(.*)")
--- End diff --

nit: s"^spark..." to make sure the matched string starts with `spark...`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


