date:20180403

[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues

2018-04-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20928#discussion_r179038732
  
--- Diff: docs/mllib-feature-extraction.md ---
@@ -105,7 +105,7 @@ p(w_i | w_j ) = 
\frac{\exp(u_{w_i}^{\top}v_{w_j})}{\sum_{l=1}^{V} \exp(u_l^{\top
 \]`
 where $V$ is the vocabulary size. 
 
-The skip-gram model with softmax is expensive because the cost of 
computing $\log p(w_i | w_j)$ 
+The skip-gram model with softmax is expensive because of the cost of 
computing $\log p(w_i | w_j)$ 
--- End diff --

This seems to be a mistake.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20961: [SPARK-23823][SQL] Keep origin in transformExpression

2018-04-03 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20961
  
LGTM, too


---


[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues

2018-04-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20928#discussion_r179040483
  
--- Diff: sql/README.md ---
@@ -6,7 +6,7 @@ This module provides support for executing relational 
queries expressed in eithe
 Spark SQL is broken up into four subprojects:
  - Catalyst (sql/catalyst) - An implementation-agnostic framework for 
manipulating trees of relational operators and expressions.
  - Execution (sql/core) - A query planner / execution engine for 
translating Catalyst's logical query plans into Spark RDDs.  This component 
also includes a new public interface, SQLContext, that allows users to execute 
SQL or LINQ statements against existing RDDs and Parquet files.
- - Hive Support (sql/hive) - Includes an extension of SQLContext called 
HiveContext that allows users to write queries using a subset of HiveQL and 
access data from a Hive Metastore using Hive SerDes.  There are also wrappers 
that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs.
+ - Hive Support (sql/hive) - Includes an extension of SQLContext called 
HiveContext that allows users to write queries using a subset of HiveQL and 
access data from a Hive Metastore using Hive SerDes. There are also wrappers 
that allow  users to run queries that include Hive UDFs, UDAFs, and UDTFs.
--- End diff --

There seems to be an extra space after `allow`.


---


[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues

2018-04-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20928#discussion_r179038446
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -92,7 +92,7 @@ above) and "drop". Further strategies may be supported in 
future.
 
 
 
-In the following example, we load ratings data from the
+In the following example, we load rating data from the
--- End diff --

@dsakuma, "ratings" seems fine (also given the link 
http://grouplens.org/datasets/movielens/).


---


[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues

2018-04-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20928#discussion_r179038867
  
--- Diff: docs/mllib-naive-bayes.md ---
@@ -19,7 +19,7 @@ These models are typically used for [document 
classification](http://nlp.stanfor
 Within that context, each observation is a document and each
 feature represents a term whose value is the frequency of the term (in 
multinomial naive Bayes) or
 a zero or one indicating whether the term was found in the document (in 
Bernoulli naive Bayes).
-Feature values must be nonnegative. The model type is selected with an 
optional parameter
+Feature values must be non-negative. The model type is selected with an 
optional parameter
--- End diff --

It seems the previous one is also correct.


---


[GitHub] spark pull request #18576: [SPARK-21351][SQL] Update nullability based on ch...

2018-04-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18576


---


[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...

2018-04-03 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18576
  
thanks, merging to master!


---


[GitHub] spark pull request #20969: [SPARK-23826] [TEST] TestHiveSparkSession should ...

2018-04-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20969


---


[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...

2018-04-03 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20969
  
thanks, merging to master!


---


[GitHub] spark pull request #20913: [SPARK-23799] FilterEstimation.evaluateInSet prod...

2018-04-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20913#discussion_r179037665
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -427,7 +427,11 @@ case class FilterEstimation(plan: Filter) extends 
Logging {
 
 // return the filter selectivity.  Without advanced statistics such as 
histograms,
 // we have to assume uniform distribution.
-Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
+if (ndv.toDouble != 0) {
--- End diff --

What's the concrete case where `ndv.toDouble == 0`?
Also, is this the only place where we need this check? 
For example, we don't check here:

https://github.com/apache/spark/blob/5cfd5fabcdbd77a806b98a6dd59b02772d2f6dee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala#L166
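
The question above turns on how `Double` division behaves when `ndv` is zero. A minimal sketch, assuming `newNdv` and `ndv` are `BigInt` values as the `.toDouble` calls in the diff suggest. `SelectivitySketch`, `inSetSelectivity`, and the choice to return `0.0` for an empty column are illustrative assumptions, not the code in the PR:

```scala
object SelectivitySketch {
  // Hypothetical helper, not Spark's FilterEstimation code.
  // newNdv: distinct values that survive the filter; ndv: distinct values in the column.
  def inSetSelectivity(newNdv: BigInt, ndv: BigInt): Double = {
    if (ndv == BigInt(0)) {
      // Assumption: a column with no distinct values matches nothing.
      0.0
    } else {
      // Same expression as the removed line, now only reached when ndv is nonzero.
      math.min(newNdv.toDouble / ndv.toDouble, 1.0)
    }
  }

  // The unguarded expression from the removed line, kept for comparison.
  def unguarded(newNdv: BigInt, ndv: BigInt): Double =
    math.min(newNdv.toDouble / ndv.toDouble, 1.0)
}
```

With both counts at zero, the unguarded expression evaluates to NaN, because `0.0 / 0.0` is NaN and `math.min` propagates it. With a nonzero `newNdv` the quotient is infinite and the result is clamped to 1.0.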


---


[GitHub] spark pull request #20931: [SPARK-23815][Core]Spark writer dynamic partition...

2018-04-03 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20931#discussion_r179036894
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -186,7 +186,9 @@ class HadoopMapReduceCommitProtocol(
 logDebug(s"Clean up default partition directories for overwriting: 
$partitionPaths")
 for (part <- partitionPaths) {
   val finalPartPath = new Path(path, part)
-  fs.delete(finalPartPath, true)
+  if (!fs.delete(finalPartPath, true) && 
!fs.exists(finalPartPath.getParent)) {
+fs.mkdirs(finalPartPath.getParent)
--- End diff --

Do you have an official HDFS document that supports this change?


---


[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20969
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88872/
Test PASSed.


---


[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20969
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20969
  
**[Test build #88872 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88872/testReport)**
 for PR 20969 at commit 
[`f7e0b03`](https://github.com/apache/spark/commit/f7e0b034026691872c905ab4d5d09c381c56b7b0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20931: [SPARK-23815][Core]Spark writer dynamic partition overwr...

2018-04-03 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20931
  
ok to test



---


[GitHub] spark issue #20913: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20913
  
**[Test build #88877 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88877/testReport)**
 for PR 20913 at commit 
[`67597fd`](https://github.com/apache/spark/commit/67597fdcb703c7fa3fa189a456944693727d5754).


---


[GitHub] spark issue #20913: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20913
  
retest this please


---


[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...

2018-04-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18576
  
ping


---


[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20965
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20965
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1946/
Test PASSed.


---


[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20965
  
**[Test build #88876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88876/testReport)**
 for PR 20965 at commit 
[`696ba17`](https://github.com/apache/spark/commit/696ba171e2f42ceb6028eec56f8422715ca40a99).


---


[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20965
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1945/
Test PASSed.


---


[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20965
  
**[Test build #88875 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88875/testReport)**
 for PR 20965 at commit 
[`9623765`](https://github.com/apache/spark/commit/962376552a9cfbd4a110ceb9294caeffb3032ecc).


---


[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20965
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88873/
Test PASSed.


---


[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20973
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20973
  
**[Test build #88873 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88873/testReport)**
 for PR 20973 at commit 
[`d563c8f`](https://github.com/apache/spark/commit/d563c8fab0cb718b511ac78bc38e712a65148d17).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20953
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88871/
Test FAILed.


---


[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20953
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20953
  
**[Test build #88871 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88871/testReport)**
 for PR 20953 at commit 
[`a06ad5e`](https://github.com/apache/spark/commit/a06ad5e0451c3ff8bf7104512f32161bf66ed696).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88874 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88874/testReport)**
 for PR 20886 at commit 
[`2b2973a`](https://github.com/apache/spark/commit/2b2973a9db7a8fa228bfc939604feca4cc2c6a59).


---


[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1944/
Test PASSed.


---


[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88870/
Test FAILed.


---


[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20971
  
**[Test build #88870 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88870/testReport)**
 for PR 20971 at commit 
[`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-03 Thread sujith71955

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20611#discussion_r179030611
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,7 +385,9 @@ case class LoadDataCommand(
 val hadoopConf = sparkSession.sessionState.newHadoopConf()
 val srcPath = new Path(hdfsUri)
 val fs = srcPath.getFileSystem(hadoopConf)
-if (!fs.exists(srcPath)) {
+// Check if the path exists or there are matched paths if it's a 
path with wildcard.
+// For HDFS path, we support wildcard in directory name and file 
name.
+if (null == fs.globStatus(srcPath) || 
fs.globStatus(srcPath).isEmpty) {
--- End diff --

I will update the PR so that we can use the fs.globStatus() API for both 
local and HDFS file paths. Thanks.
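
The condition in the diff treats both a null result and an empty result from `fs.globStatus(srcPath)` as "no matching path", and it calls `globStatus` twice to do so. A minimal sketch of the same check over a single result, where `GlobCheckSketch` and `hasMatches` are hypothetical names and the array parameter stands in for the value returned by `fs.globStatus(srcPath)`:

```scala
object GlobCheckSketch {
  // Hypothetical helper: `matches` stands in for the array returned by
  // fs.globStatus(srcPath), which the diff treats as possibly null or empty.
  def hasMatches[T](matches: Array[T]): Boolean =
    // Option(null) is None, so the null case and the empty case both yield false.
    Option(matches).exists(_.length > 0)
}
```

Binding the glob result to a value once and passing it to a helper like this would avoid the second filesystem call that the condition in the diff makes.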


---


[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-03 Thread sujith71955

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20611#discussion_r179030399
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,7 +385,9 @@ case class LoadDataCommand(
 val hadoopConf = sparkSession.sessionState.newHadoopConf()
 val srcPath = new Path(hdfsUri)
 val fs = srcPath.getFileSystem(hadoopConf)
-if (!fs.exists(srcPath)) {
+// Check if the path exists or there are matched paths if it's a 
path with wildcard.
+// For HDFS path, we support wildcard in directory name and file 
name.
+if (null == fs.globStatus(srcPath) || 
fs.globStatus(srcPath).isEmpty) {
--- End diff --

@wzhfy @HyukjinKwon @dongjoon-hyun I verified the scenario by updating the 
code to use the fs.globStatus() API for both local and HDFS paths. For local paths 
it works fine.


---


[GitHub] spark issue #20974: [SPARK-23862][SQL] Spark ExpressionEncoder should suppor...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20974
  
Can one of the admins verify this patch?


---



[GitHub] spark pull request #20974: [SPARK-23862][SQL] Spark ExpressionEncoder should...

2018-04-03 Thread fangshil

GitHub user fangshil opened a pull request:

https://github.com/apache/spark/pull/20974

[SPARK-23862][SQL] Spark ExpressionEncoder should support java enum type in 
scala


## What changes were proposed in this pull request?

SPARK-21255 added support for creating encoders for Java enum types, but only 
in the Java API (for enums used within Java Beans). Because Java enums can come 
from third-party Java libraries, we have use cases that require:
1. using Java enum types as fields of a Scala case class
2. using a Java enum as the type T in Dataset[T]

Spark's ExpressionEncoder already supports serializing and deserializing many 
Java types in ScalaReflection, so we propose adding support for Java enums as 
well, as a follow-up to SPARK-21255.
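
The first use case can be illustrated without Spark. This is a minimal sketch that uses the JDK enum `java.util.concurrent.TimeUnit` as a stand-in for a third-party enum; `JavaEnumSketch`, `Timeout`, `toName`, and `fromName` are hypothetical names, and storing the enum by its constant name is an assumption about one plausible representation, not a statement of what the patch does:

```scala
import java.util.concurrent.TimeUnit

object JavaEnumSketch {
  // Use case 1: a Java enum as a field of a Scala case class.
  final case class Timeout(amount: Long, unit: TimeUnit)

  // Assumption: the enum value is stored as the name of its constant.
  def toName[E <: java.lang.Enum[E]](value: E): String = value.name()

  // Restores the constant through java.lang.Enum.valueOf, the generic
  // counterpart of the static valueOf that the Java compiler adds to each enum.
  def fromName[E <: java.lang.Enum[E]](cls: Class[E], name: String): E =
    java.lang.Enum.valueOf(cls, name)
}
```

A round trip such as `fromName(classOf[TimeUnit], toName(TimeUnit.SECONDS))` returns the original constant, which is the property an encoder for such a field would need to preserve.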


## How was this patch tested?

Tested the patch in our production cluster and added a unit test. Two 
constraints shaped the test:
1. It is not possible to define a Java enum directly in Scala, because an 
enum class defined in Scala lacks methods such as valueOf, which the Java 
compiler adds.
2. It is not possible to define a test enum as a Java class and use it in a 
Scala test, because running a single Scala 
test (-DwildcardSuites=org.apache.spark.sql.DatasetSuite) does not compile the 
test Java class first.

As a result, the test uses a public Java enum from the Spark SQL API 
(SaveMode.java). Please advise if there is a better way to test this.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fangshil/spark SPARK-23862

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20974.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20974


commit 90effb21375a2ec0e93426efcaae092ad3f59e26
Author: Fangshi Li 
Date:   2018-04-04T04:52:36Z

SPARK-23862: Spark ExpressionEncoder should support java enum type in scala




---


[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20973
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1943/
Test PASSed.


---


[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20973
  
**[Test build #88873 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88873/testReport)**
 for PR 20973 at commit 
[`d563c8f`](https://github.com/apache/spark/commit/d563c8fab0cb718b511ac78bc38e712a65148d17).


---


[GitHub] spark pull request #20810: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-03 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at:

https://github.com/apache/spark/pull/20810


---


[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/20810
  
Per @jkbradley's suggestion, I created a new PR that uses only a 
static method.


---


[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-03 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/20973

[SPARK-20114][ML] spark.ml parity for sequential pattern mining - PrefixSpan

## What changes were proposed in this pull request?

PrefixSpan API for spark.ml. This is a new implementation that replaces #20810.

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark prefixSpan2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20973.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20973


commit d563c8fab0cb718b511ac78bc38e712a65148d17
Author: WeichenXu 
Date:   2018-04-04T04:42:05Z

init pr




---


[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20786
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88868/
Test FAILed.


---


[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20786
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20786
  
**[Test build #88868 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88868/testReport)**
 for PR 20786 at commit 
[`48c17d4`](https://github.com/apache/spark/commit/48c17d4dff6a4e82b86d70f3845e6d524b4807e5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `sealed trait ClassificationNode extends Node `
  * `sealed trait RegressionNode extends Node `
  * `sealed trait LeafNode extends Node `
  * `sealed trait InternalNode extends Node `


---


[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20969
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1942/
Test PASSed.


---


[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20969
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20969
  
**[Test build #88872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88872/testReport)**
 for PR 20969 at commit 
[`f7e0b03`](https://github.com/apache/spark/commit/f7e0b034026691872c905ab4d5d09c381c56b7b0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20969: [SPARK-23826] [TEST] TestHiveSparkSession should ...

2018-04-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20969#discussion_r179020152
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -159,9 +159,10 @@ private[hive] class TestHiveSparkSession(
 private val loadTestTables: Boolean)
   extends SparkSession(sc) with Logging { self =>
 
-  // TODO(SPARK-23826): TestHiveSparkSession should set default session the same way as
-  // TestSparkSession, but doing this the same way breaks many tests in the package. We need
-  // to investigate and find a different strategy.
+  // The base spark session does this in getOrCreate(), here we emulate that behavior for tests.
+  if (SparkSession.getDefaultSession.isEmpty) {
+SparkSession.setDefaultSession(this)
+  }
--- End diff --

This is not needed after we merge https://github.com/apache/spark/pull/20927
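
For readers following the diff, the "set the default session only if none exists" behavior it emulates can be sketched roughly as below. This is a standalone illustration, not Spark's code: `Session` and `SessionRegistry` are hypothetical stand-ins for `SparkSession` and its companion object.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for SparkSession; not Spark's actual class.
final class Session(val name: String)

// Hypothetical registry mimicking default-session bookkeeping.
object SessionRegistry {
  private val default = new AtomicReference[Session](null)

  def getDefaultSession: Option[Session] = Option(default.get())
  def setDefaultSession(s: Session): Unit = default.set(s)
  def clearDefaultSession(): Unit = default.set(null)

  // Mirrors the check-then-set in the diff: register `s` as the default
  // only when no default session has been set yet.
  def registerIfAbsent(s: Session): Unit =
    if (getDefaultSession.isEmpty) setDefaultSession(s)
}
```

With this shape, the first session created wins and later sessions leave the default untouched.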


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1941/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20971
  
**[Test build #88870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88870/testReport)**
 for PR 20971 at commit 
[`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20953
  
**[Test build #88871 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88871/testReport)**
 for PR 20953 at commit 
[`a06ad5e`](https://github.com/apache/spark/commit/a06ad5e0451c3ff8bf7104512f32161bf66ed696).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20953
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1940/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20971
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20797
  
**[Test build #88869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88869/testReport)**
 for PR 20797 at commit 
[`c568944`](https://github.com/apache/spark/commit/c568944a98ce35c79809283a68ec95454029d0ea).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88867/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20971
  
**[Test build #88867 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88867/testReport)**
 for PR 20971 at commit 
[`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20953
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88866/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20953
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20953
  
**[Test build #88866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88866/testReport)**
 for PR 20953 at commit 
[`a06ad5e`](https://github.com/apache/spark/commit/a06ad5e0451c3ff8bf7104512f32161bf66ed696).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20972: Fixes misspelling in configuration.md

2018-04-03 Thread bradurani

Github user bradurani closed the pull request at:

https://github.com/apache/spark/pull/20972


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19968: [SPARK-22769][CORE] When driver stopping, there i...

2018-04-03 Thread KaiXinXiaoLei

Github user KaiXinXiaoLei closed the pull request at:

https://github.com/apache/spark/pull/19968


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88860/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19968: [SPARK-22769][CORE] When driver stopping, there is error...

2018-04-03 Thread KaiXinXiaoLei

Github user KaiXinXiaoLei commented on the issue:

https://github.com/apache/spark/pull/19968
  
I'm not working on this problem now, so I'm closing it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88860/testReport)**
 for PR 20886 at commit 
[`7a3311c`](https://github.com/apache/spark/commit/7a3311c2cbd3d9f7399abb38bd877bbd23ca836e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20928: [MINOR][DOC] Fix some typos and grammar issues

2018-04-03 Thread dsakuma

Github user dsakuma commented on the issue:

https://github.com/apache/spark/pull/20928
  
@HyukjinKwon I've fixed the title format :D


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

2018-04-03 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20640#discussion_r179013270
  
--- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala ---
@@ -108,6 +108,28 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
 verifyTaskLaunched(driver, "o2")
   }
 
+  test("mesos declines offers from blacklisted slave") {
+setBackend()
+
+// launches a task on a valid offer on slave s1
+val minMem = backend.executorMemory(sc) + 1024
+val minCpu = 4
+val offer1 = Resources(minMem, minCpu)
+offerResources(List(offer1))
+verifyTaskLaunched(driver, "o1")
+
+// for any reason executor(aka mesos task) failed on s1
+val status = createTaskStatus("0", "s1", TaskState.TASK_FAILED)
+backend.statusUpdate(driver, status)
+when(taskScheduler.nodeBlacklist()).thenReturn(Set("hosts1"))
--- End diff --

just to re-iterate my point above -- in many cases, having an executor fail 
will *not* lead to `taskScheduler.nodeBlacklist()` changing as you're doing 
here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

2018-04-03 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20640#discussion_r179012299
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -648,14 +645,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   totalGpusAcquired -= gpus
   gpusByTaskId -= taskId
 }
-// If it was a failure, mark the slave as failed for blacklisting purposes
 if (TaskState.isFailed(state)) {
-  slave.taskFailures += 1
-
-  if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
-logInfo(s"Blacklisting Mesos slave $slaveId due to too many failures; " +
-"is Spark installed on it?")
-  }
+  logError(s"Task $taskId failed on Mesos slave $slaveId.")
--- End diff --

minor: I think it would be nice to say "Mesos task $taskId...".  Maybe it's 
obvious for those spending more time with mesos, but I find I'm easily confused 
by the difference between a mesos task and a spark task.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

2018-04-03 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20640#discussion_r179012891
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -571,7 +568,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   cpus + totalCoresAcquired <= maxCores &&
   mem <= offerMem &&
   numExecutors < executorLimit &&
-  slaves.get(slaveId).map(_.taskFailures).getOrElse(0) < MAX_SLAVE_FAILURES &&
+  !scheduler.nodeBlacklist().contains(offerHostname) &&
--- End diff --

I just want to make really sure everybody understands the big change in 
behavior here -- `nodeBlacklist()` currently *only* gets updated based on 
failures in *spark* tasks.  If a mesos task fails to even start -- that is, if 
a spark executor fails to launch on a node -- `nodeBlacklist` does not get 
updated.  So you could have a node that is misconfigured somehow, and after 
this change you might end up repeatedly trying to launch executors on it, with 
the executor failing to start every time.  That is the case even if you have 
blacklisting on.

This is SPARK-16630 for the non-mesos case.  That is being actively worked 
on now -- however the work there will probably have to be yarn-specific, so 
there will still be followup work to get the same thing for mesos after that is 
in.
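
A rough model of the concern, for illustration only: the names below (`Offer`, `BlacklistModel`, `OfferCheck`) and the failure threshold are hypothetical and are not Spark's scheduler API. The sketch assumes the blacklist is fed solely by Spark task failures, so an executor that never launches leaves the node eligible for offers.

```scala
// Hypothetical model of the behavior described above; not Spark's API.
final case class Offer(hostname: String, cpus: Int, memMb: Int)

final class BlacklistModel(maxTaskFailures: Int) {
  private var taskFailures = Map.empty[String, Int]

  // Only failures of Spark tasks are recorded.
  def sparkTaskFailed(host: String): Unit =
    taskFailures = taskFailures.updated(host, taskFailures.getOrElse(host, 0) + 1)

  // An executor that fails to launch runs no Spark tasks, so nothing is recorded.
  def executorFailedToLaunch(host: String): Unit = ()

  def nodeBlacklist: Set[String] =
    taskFailures.collect { case (host, n) if n >= maxTaskFailures => host }.toSet
}

object OfferCheck {
  // Accept an offer when it has enough resources and its host is not blacklisted.
  def canLaunch(offer: Offer, cpus: Int, memMb: Int, bl: BlacklistModel): Boolean =
    offer.cpus >= cpus && offer.memMb >= memMb && !bl.nodeBlacklist.contains(offer.hostname)
}
```

In this model, any number of `executorFailedToLaunch` calls for a host still leaves `canLaunch` true for offers from that host, which is the gap described above.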


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20933
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88859/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20933
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20933
  
**[Test build #88859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88859/testReport)**
 for PR 20933 at commit 
[`ffbf2f8`](https://github.com/apache/spark/commit/ffbf2f88c224fcafce003121695ab91774db0776).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...

2018-04-03 Thread xubo245

Github user xubo245 commented on the issue:

https://github.com/apache/spark/pull/20249
  
It belongs to the TODO work @tgravescs 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...

2018-04-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20886#discussion_r179008019
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -298,15 +299,32 @@ class SessionCatalog(
 makeQualifiedPath(tableDefinition.storage.locationUri.get)
   tableDefinition.copy(
 storage = tableDefinition.storage.copy(locationUri = Some(qualifiedTableLocation)),
-identifier = TableIdentifier(table, Some(db)))
+identifier = tableIdentifier)
 } else {
-  tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+  tableDefinition.copy(identifier = tableIdentifier)
 }
 
 requireDbExists(db)
+if (!ignoreIfExists) {
+  validateTableLocation(newTableDefinition)
+}
 externalCatalog.createTable(newTableDefinition, ignoreIfExists)
   }
 
+  def validateTableLocation(table: CatalogTable): Unit = {
+// SPARK-19724: the default location of a managed table should be non-existent or empty.
+if (table.tableType == CatalogTableType.MANAGED && !conf.allowNonemptyManagedTableLocation) {
+  val tableLocation =
+new Path(table.storage.locationUri.getOrElse(defaultTablePath(table.identifier)))
+  val fs = tableLocation.getFileSystem(hadoopConf)
+
+  if (fs.exists(tableLocation) && fs.listStatus(tableLocation).nonEmpty) {
+throw new AnalysisException(s"Can not create the managed table('${table.identifier}')" +
--- End diff --

`Can not` -> `Not allowed to` 
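
As a standalone illustration of the check in this diff (location must be non-existent or empty), here is a minimal sketch that uses `java.nio.file` instead of the Hadoop `FileSystem` API so it runs without Spark. `ManagedTableLocationCheck`, its method names, the exception type, and the message suffix are assumptions made for the example.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper; the real check lives in SessionCatalog and uses Hadoop's FileSystem.
object ManagedTableLocationCheck {
  // True when `location` does not exist or is an empty directory.
  def isUsable(location: Path): Boolean = {
    if (!Files.exists(location)) true
    else if (!Files.isDirectory(location)) false
    else {
      val entries = Files.list(location)
      try !entries.iterator().hasNext finally entries.close()
    }
  }

  // Throws unless the location is usable or the escape-hatch flag is set.
  def validate(tableName: String, location: Path, allowNonEmpty: Boolean): Unit =
    if (!allowNonEmpty && !isUsable(location)) {
      throw new IllegalStateException(
        s"Not allowed to create the managed table('$tableName'): location $location is not empty.")
    }
}
```

The `allowNonEmpty` flag plays the role of the configuration switch in the diff: when it is true, the check is skipped entirely.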


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20945: [SPARK-23790][Mesos] fix metastore connection iss...

2018-04-03 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20945#discussion_r179007924
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
   options ++= Seq("--class", desc.command.mainClass)
 }
 
+desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
+  options ++= Seq("--proxy-user", v)
--- End diff --

> Yes because the assumption was client mode was safe. There is no warning 
about this 

Could probably use something in the documentation - warnings printed to 
logs are easily ignored. Still, there are legitimate uses for client mode + 
proxy user, but I don't think this is one of them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...

2018-04-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20886#discussion_r179007898
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1152,6 +1152,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val ALLOW_NONEMPTY_MANAGED_TABLE_LOCATION =
+buildConf("spark.sql.allowNonemptyManagedTableLocation")
--- End diff --

`spark.sql.allowCreateManagedTableUsingNonemptyLocation`

Also this should be an internal conf


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20786
  
**[Test build #88868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88868/testReport)**
 for PR 20786 at commit 
[`48c17d4`](https://github.com/apache/spark/commit/48c17d4dff6a4e82b86d70f3845e6d524b4807e5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20786
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1939/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20786
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/20786
  
@jkbradley Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20928: Fix small typo in configuration doc

2018-04-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20928
  
@dsakuma, mind if I ask you to fix the PR title to `[MINOR][DOC] ...`, just 
to be consistent with other PRs? It's not a small typo anymore :). Thanks for your 
effort.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20972: Fixes misspelling in configuration.md

2018-04-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20972
  
We can close this just for clarification.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20972: Fixes misspelling in configuration.md

2018-04-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20972
  
Please do a quick search before opening a PR. There are two duplicate PRs: 
https://github.com/apache/spark/pull/20948 and 
https://github.com/apache/spark/pull/20928


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20972: Fixes misspelling in configuration.md

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20972
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-03 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20923
  
Hi @steveloughran , I think you missed this comment. You need to create a 
deps file under dev/deps and change the related script.

> Also I think we need to create a related spark-deps-hadoop-3.x under 
dev/deps and make dependency check work for Hadoop 3.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20971
  
**[Test build #88867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88867/testReport)**
 for PR 20971 at commit 
[`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...

2018-04-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20971
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1938/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20972: Fixes misspelling in configuration.md

2018-04-03 Thread bradurani

GitHub user bradurani opened a pull request:

https://github.com/apache/spark/pull/20972

Fixes misspelling in configuration.md



## What changes were proposed in this pull request?

Fixes a misspelling in configuration.md. Changes `spark-defalut.conf` to 
`spark-default.conf`

## How was this patch tested?

Viewed the new markdown in Github


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bradurani/spark bu/fix_docs_misspelling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20972.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20972


commit e346e677cd2b783b4fa39e7bf6a59eee0a40eb1a
Author: Brad Urani 
Date:   2018-04-04T00:44:23Z

Fixes misspelling in configuration.md

spark-defalut.conf -> spark-default.conf




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
