[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20900
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88566/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20900
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20900
  
**[Test build #88566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88566/testReport)** for PR 20900 at commit [`bc49c3c`](https://github.com/apache/spark/commit/bc49c3cc5ae2e23da5cc7b6d7e1a779e9d012c8c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20900
  
**[Test build #88566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88566/testReport)** for PR 20900 at commit [`bc49c3c`](https://github.com/apache/spark/commit/bc49c3cc5ae2e23da5cc7b6d7e1a779e9d012c8c).


---




[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20798
  
That's fine :). Let's close this one.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20900
  
ok to test


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88565/
Test FAILed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88565/testReport)** for PR 20886 at commit [`3b882fa`](https://github.com/apache/spark/commit/3b882fa61e94c7025ac035dcc2b483bec59e1ddf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88564/
Test FAILed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88564/testReport)** for PR 20886 at commit [`ff57889`](https://github.com/apache/spark/commit/ff57889c4070e3f47e3f078e28d9d1c40fc8c338).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...

2018-03-24 Thread tengpeng
Github user tengpeng commented on the issue:

https://github.com/apache/spark/pull/20842
  
Looks good! Thanks!


---




[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20896
  
Also merged to 2.2.


---




[GitHub] spark pull request #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20896


---




[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20896
  
LGTM. Merging to master and 2.3. Thanks!


---




[GitHub] spark pull request #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/20896#discussion_r176925878
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala ---
@@ -550,22 +550,22 @@ class StreamingQuerySuite extends StreamTest with 
BeforeAndAfter with Logging wi
 .start()
 }
 
-val input = MemoryStream[Int]
-val q1 = startQuery(input.toDS, "stream_serializable_test_1")
-val q2 = startQuery(input.toDS.map { i =>
+val input = MemoryStream[Int] :: MemoryStream[Int] :: 
MemoryStream[Int] :: Nil
--- End diff --

I think this is just to save several lines.


---




[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...

2018-03-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20787#discussion_r176925831
  
--- Diff: R/pkg/R/functions.R ---
@@ -1957,8 +1958,12 @@ setMethod("levenshtein", signature(y = "Column"),
   })
 
 #' @details
-#' \code{months_between}: Returns number of months between dates \code{y} 
and \code{x}.
-#'
+#' \code{months_between}: Returns number of months between dates \code{y} 
and \code{x}. 
+#' If \code{y} is later than \code{x}, then the result is positive.
+#' If \code{y} and \code{x} are on the same day of month, or both are the 
last day of month,
+#' time of day will be ignored.
+#' Otherwise, the difference is calculated based on 31 days per month, and 
rounded to
+#' 8 digits.  
 #' @rdname column_datetime_diff_functions
--- End diff --

You should leave a `#'` line before this.
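As an aside, the rule documented in the diff above (positive when `y` is later, time of day ignored when the days of month match or both are last days, otherwise 31 days per month rounded to 8 digits) can be sketched in plain Python. This is a hypothetical stand-alone re-implementation for illustration only, working on dates rather than timestamps; it is not Spark's actual code:

```python
import calendar
from datetime import date

def months_between(y: date, x: date) -> float:
    """Illustrative re-implementation of the documented months_between rule."""
    months = (y.year - x.year) * 12 + (y.month - x.month)
    y_last = y.day == calendar.monthrange(y.year, y.month)[1]
    x_last = x.day == calendar.monthrange(x.year, x.month)[1]
    # Same day of month, or both are the last day of their month:
    # the difference is a whole number of months.
    if y.day == x.day or (y_last and x_last):
        return float(months)
    # Otherwise: fractional part based on 31 days per month, rounded to 8 digits.
    return round(months + (y.day - x.day) / 31.0, 8)

months_between(date(2018, 3, 31), date(2018, 2, 28))  # 1.0 (both last days)
months_between(date(2018, 4, 10), date(2018, 3, 1))   # 1.29032258
```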


---




[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...

2018-03-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20787#discussion_r176925842
  
--- Diff: R/pkg/R/functions.R ---
@@ -1957,8 +1958,12 @@ setMethod("levenshtein", signature(y = "Column"),
   })
 
 #' @details
-#' \code{months_between}: Returns number of months between dates \code{y} 
and \code{x}.
-#'
+#' \code{months_between}: Returns number of months between dates \code{y} 
and \code{x}. 
+#' If \code{y} is later than \code{x}, then the result is positive.
+#' If \code{y} and \code{x} are on the same day of month, or both are the 
last day of month,
+#' time of day will be ignored.
+#' Otherwise, the difference is calculated based on 31 days per month, and 
rounded to
+#' 8 digits.  
 #' @rdname column_datetime_diff_functions
--- End diff --

Also, just as a reference: the whitespace/newline will be stripped

```
time of day will be ignored.
Otherwise
```


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1739/
Test PASSed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88565/testReport)** for PR 20886 at commit [`3b882fa`](https://github.com/apache/spark/commit/3b882fa61e94c7025ac035dcc2b483bec59e1ddf).


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88564/testReport)** for PR 20886 at commit [`ff57889`](https://github.com/apache/spark/commit/ff57889c4070e3f47e3f078e28d9d1c40fc8c338).


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1738/
Test PASSed.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176922300
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = 
buildConf("spark.sql.parquet.filterPushdown.date")
+.doc("If true, enables Parquet filter push-down optimization for Date. 
" +
+  "This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' is enabled.")
+.internal()
+.booleanConf
+.createWithDefault(false)
--- End diff --

If you think so, +1. :)

BTW, based on the Apache Spark way, I assume that this will not land on `branch-2.3` with `spark.sql.parquet.filterPushdown.date=true`.


---




[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20798
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88563/
Test PASSed.


---




[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20798
  
Build finished. Test PASSed.


---




[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20798
  
**[Test build #88563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88563/testReport)** for PR 20798 at commit [`b4f7056`](https://github.com/apache/spark/commit/b4f7056187474a2bad16c81e79798214980d7b80).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...

2018-03-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20866#discussion_r176919674
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala ---
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
 s"$principal at $metastoreUri")
 
   doAsRealUser {
-val hive = Hive.get(conf, classOf[HiveConf])
--- End diff --

Ya. What I asked is the following.
> Could you be more specific about the scope of this PR in the title and description?


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread mstewart141
Github user mstewart141 commented on the issue:

https://github.com/apache/spark/pull/20900
  
@HyukjinKwon the old PR, https://github.com/apache/spark/pull/20798, was a disaster from a git-cleanliness perspective, so I've updated here.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20900
  
Can one of the admins verify this patch?


---







[GitHub] spark pull request #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `p...

2018-03-24 Thread mstewart141
GitHub user mstewart141 opened a pull request:

https://github.com/apache/spark/pull/20900

 [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_udf` with keyword args

## What changes were proposed in this pull request?

Add documentation about the limitations of `pandas_udf` with keyword arguments and related concepts, like `functools.partial` fn objects.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mstewart141/spark udfkw2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20900


commit 048570f7e5f421288b7c297e4d2e3873626a6adc
Author: Michael (Stu) Stewart 
Date:   2018-03-11T20:38:29Z

[SPARK-23645][PYTHON] Allow python udfs to be called with keyword arguments

commit 9ea2595f0cecb0cd05e0e6b99baf538679332e8b
Author: Michael (Stu) Stewart 
Date:   2018-03-18T18:04:21Z

Incomplete / Show issue with partial fn in pandas_udf

commit acd1cbe53dc7d1bf83b1022a7e36652cd9530b58
Author: Michael (Stu) Stewart 
Date:   2018-03-18T18:13:53Z

Add note RE no keyword args in python UDFs

commit bc49c3cc5ae2e23da5cc7b6d7e1a779e9d012c8c
Author: Michael (Stu) Stewart 
Date:   2018-03-24T17:30:15Z

Address comments




---




[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20798
  
**[Test build #88563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88563/testReport)** for PR 20798 at commit [`b4f7056`](https://github.com/apache/spark/commit/b4f7056187474a2bad16c81e79798214980d7b80).


---




[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88562/
Test PASSed.


---




[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20866
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20866
  
**[Test build #88562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88562/testReport)** for PR 20866 at commit [`f51bf63`](https://github.com/apache/spark/commit/f51bf634d89aaa0a2e1077903e6831a9284aea10).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19876: [ML][SPARK-23783][SPARK-11239] Add PMML export to...

2018-03-24 Thread goungoun
Github user goungoun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19876#discussion_r176911201
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -710,15 +711,58 @@ class LinearRegressionModel private[ml] (
   }
 
   /**
-   * Returns a [[org.apache.spark.ml.util.MLWriter]] instance for this ML 
instance.
+   * Returns a [[org.apache.spark.ml.util.GeneralMLWriter]] instance for 
this ML instance.
*
* For [[LinearRegressionModel]], this does NOT currently save the 
training [[summary]].
* An option to save [[summary]] may be added in the future.
*
* This also does not save the [[parent]] currently.
*/
   @Since("1.6.0")
-  override def write: MLWriter = new 
LinearRegressionModel.LinearRegressionModelWriter(this)
+  override def write: GeneralMLWriter = new GeneralMLWriter(this)
+}
+
+/** A writer for LinearRegression that handles the "internal" (or default) 
format */
+private class InternalLinearRegressionModelWriter
+  extends MLWriterFormat with MLFormatRegister {
+
+  override def format(): String = "internal"
+  override def stageName(): String = 
"org.apache.spark.ml.regression.LinearRegressionModel"
+
+  private case class Data(intercept: Double, coefficients: Vector, scale: 
Double)
+
+  override def write(path: String, sparkSession: SparkSession,
+optionMap: mutable.Map[String, String], stage: PipelineStage): Unit = {
+val instance = stage.asInstanceOf[LinearRegressionModel]
+val sc = sparkSession.sparkContext
+// Save metadata and Params
+DefaultParamsWriter.saveMetadata(instance, path, sc)
+// Save model data: intercept, coefficients, scale
+val data = Data(instance.intercept, instance.coefficients, 
instance.scale)
+val dataPath = new Path(path, "data").toString
+
sparkSession.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
+  }
+}
+
+/** A writer for LinearRegression that handles the "pmml" format */
+private class PMMLLinearRegressionModelWriter
+extends MLWriterFormat with MLFormatRegister {
--- End diff --

Should be two-space indentation:
`extends MLWriterFormat with MLFormatRegister {`



---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Build finished. Test FAILed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88560/
Test FAILed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20891
  
**[Test build #88560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88560/testReport)** for PR 20891 at commit [`126e6a8`](https://github.com/apache/spark/commit/126e6a8e7d333ecf99c26b374698d7cd0e1a9d19).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88561/
Test FAILed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20891
  
**[Test build #88561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88561/testReport)** for PR 20891 at commit [`cefd672`](https://github.com/apache/spark/commit/cefd672e79b508e995382ce146cd70a4d130af01).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20899: Bug fix in sendMessage() of pregel implementation in Pag...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20899
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20899: fix bug in sendMessage() of pregel implementation

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20899
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20899: fix bug in sendMessage() of pregel implementation

2018-03-24 Thread mangolzy
GitHub user mangolzy opened a pull request:

https://github.com/apache/spark/pull/20899

fix bug in sendMessage() of pregel implementation

## What changes were proposed in this pull request?
Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
->
Iterator((edge.dstId, edge.srcAttr._1 * edge.attr))

## How was this patch tested?

Since edge.srcAttr._2 is compared against tol, it holds the (newPR - oldPR) delta; but in sendMessage, the original code sends it as part of the message, when it should instead send the newPR, which is edge.srcAttr._1.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mangolzy/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20899.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20899


commit be567d83c4069a0592fb61464c795d123bd34116
Author: mangolzy 
Date:   2018-03-24T13:32:52Z

fix bug in sendMessage() of pregel implementation

Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
->
Iterator((edge.dstId, edge.srcAttr._1 * edge.attr))

Since edge.srcAttr._2 is compared against tol, it holds the (newPR - oldPR) delta; but in sendMessage, the original code sends it as part of the message, when it should instead send the newPR, which is edge.srcAttr._1.




---




[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...

2018-03-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20866#discussion_r176906868
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala ---
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
 s"$principal at $metastoreUri")
 
   doAsRealUser {
-val hive = Hive.get(conf, classOf[HiveConf])
--- End diff --

1. This [`Hive.get()`](https://github.com/apache/spark/blob/v2.3.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L239) is different from the others; it is loaded by `IsolatedClientLoader`.
2. I cannot start a `HiveThriftServer2` in a kerberized cluster, so I'm not sure whether `CLIService.java` should be updated. How about updating it later?


---




[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...

2018-03-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20866#discussion_r176906522
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala ---
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
 s"$principal at $metastoreUri")
 
   doAsRealUser {
-val hive = Hive.get(conf, classOf[HiveConf])
-val tokenStr = hive.getDelegationToken(currentUser.getUserName(), 
principal)
+metaStoreClient = new 
HiveMetaStoreClient(conf.asInstanceOf[HiveConf])
+val tokenStr = 
metaStoreClient.getDelegationToken(currentUser.getUserName, principal)
--- End diff --

Yes, both HMS 1.x and 2.x


---




[GitHub] spark pull request #20889: [MINOR][DOC] Fix ml-guide markdown typos

2018-03-24 Thread Lemonjing
Github user Lemonjing closed the pull request at:

https://github.com/apache/spark/pull/20889


---




[GitHub] spark issue #20889: [MINOR][DOC] Fix ml-guide markdown typos

2018-03-24 Thread Lemonjing
Github user Lemonjing commented on the issue:

https://github.com/apache/spark/pull/20889
  
@felixcheung thanks a lot, and I closed it.


---




[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...

2018-03-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20866#discussion_r176906474
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala
 ---
@@ -92,8 +94,9 @@ private[security] class HiveDelegationTokenProvider
 s"$principal at $metastoreUri")
 
   doAsRealUser {
-val hive = Hive.get(conf, classOf[HiveConf])
-val tokenStr = hive.getDelegationToken(currentUser.getUserName(), 
principal)
+metastoreClient = 
RetryingMetaStoreClient.getProxy(conf.asInstanceOf[HiveConf], null,
--- End diff --

HiveMetaStoreClient -> RetryingMetaStoreClient. In fact, `Hive.get` also 
uses `RetryingMetaStoreClient`:
```
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
```


---




[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20866
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1737/
Test PASSed.


---




[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20866
  
**[Test build #88562 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88562/testReport)**
 for PR 20866 at commit 
[`f51bf63`](https://github.com/apache/spark/commit/f51bf634d89aaa0a2e1077903e6831a9284aea10).


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1736/
Test PASSed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20891
  
**[Test build #88561 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88561/testReport)**
 for PR 20891 at commit 
[`cefd672`](https://github.com/apache/spark/commit/cefd672e79b508e995382ce146cd70a4d130af01).


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-24 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20701
  
any more comments @holdenk ?


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Build finished. Test PASSed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20891
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1735/
Test PASSed.


---




[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20891
  
**[Test build #88560 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88560/testReport)**
 for PR 20891 at commit 
[`126e6a8`](https://github.com/apache/spark/commit/126e6a8e7d333ecf99c26b374698d7cd0e1a9d19).


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20898
  
Yes, it's a proxy user:
```
export HADOOP_PROXY_USER=user
spark-sql --master yarn
```


---




[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r176904365
  
--- Diff: python/pyspark/cloudpickle.py ---
@@ -802,9 +802,8 @@ def save_not_implemented(self, obj):
 self.save_reduce(_gen_not_implemented, ())
 
 if PY3:
-dispatch[io.TextIOWrapper] = save_file
-else:
-dispatch[file] = save_file
+file = io.TextIOWrapper
+dispatch[file] = save_file
--- End diff --

I think this one is actually related to cloudpickle's PR.

I was trying to (exactly) match this file to a specific version of 
cloudpickle (which is currently 0.4.3 - 
https://github.com/cloudpipe/cloudpickle/releases/tag/v0.4.3). So, I thought we 
could wait for more feedback there. 

At least, I was thinking that we should match it to 
https://github.com/cloudpipe/cloudpickle/tree/0.4.x. If that one is merged, I 
could backport that change into the 0.4.x branch.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-24 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20779
  
Sorry for the late comment. This PR itself is LGTM. I'd just like to make 
some side comments on why this bug is happening.

Janino uses a peculiar encoding when implementing bridge methods that extend 
accessibility from an inner class to members of its enclosing class. Here we're 
actually hitting a bug in Janino where it misses creating bridge methods on the 
enclosing class (`GeneratedClass...`) for `protected` members inherited 
from a base class (`BufferedRowIterator`).
I've seen this bug in Janino before, and I plan to fix it in Janino soon. 
Once it's fixed in Janino, we can safely use `protected` members such as 
`append` and `stopEarly` in nested classes within `GeneratedClass...` again. 
Would anybody be interested in switching these methods back to `protected` once 
it's fixed in Janino and Spark bumps the Janino dependency to that new version?

Thanks!
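The pattern involved can be sketched in plain Scala (hypothetical names; the real classes are `BufferedRowIterator` and the Janino-generated `GeneratedClass`). Here scalac, like javac, silently emits the accessor that Janino failed to generate:

```scala
abstract class BufferedRowIteratorLike {
  // Protected member inherited by the enclosing class below.
  protected def stopEarly(): Boolean = false
}

class GeneratedClassLike extends BufferedRowIteratorLike {
  // The inner class reaches an inherited protected member of its enclosing
  // class; the compiler must emit a bridge/accessor on GeneratedClassLike
  // for this call to link. Janino missed exactly this case, hence the bug.
  class InnerIterator {
    def shouldStop: Boolean = stopEarly()
  }
  def run(): Boolean = (new InnerIterator).shouldStop
}
```

Once Janino generates these bridges correctly, widening `append`/`stopEarly` to `public` would no longer be necessary.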


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/20898
  
Proxy or not, I have only seen such an issue with a proxy user: 
https://github.com/apache/spark/pull/20784


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20898
  
cc @yaooqinn @cloud-fan


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20898
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88559/
Test PASSed.


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20898
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20898
  
**[Test build #88559 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88559/testReport)**
 for PR 20898 at commit 
[`a382aab`](https://github.com/apache/spark/commit/a382aab4f3a9cabda10ab2aedbbb8d663737348f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20898
  
**[Test build #88559 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88559/testReport)**
 for PR 20898 at commit 
[`a382aab`](https://github.com/apache/spark/commit/a382aab4f3a9cabda10ab2aedbbb8d663737348f).


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20898
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20898
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1734/
Test PASSed.


---




[GitHub] spark pull request #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.u...

2018-03-24 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20898

[SPARK-23789][SQL] Shouldn't set hive.metastore.uris before invoking 
HiveDelegationTokenProvider

## What changes were proposed in this pull request?

`spark-sql` can't connect to the metastore on a secured Hadoop cluster after 
[SPARK-21428](https://issues.apache.org/jira/browse/SPARK-21428).

Before SPARK-21428, `hive.metastore.uris` was 
`HiveConf.ConfVars.METASTOREURIS.defaultStrVal` at this point. 

This PR reverts `hive.metastore.uris` to 
`HiveConf.ConfVars.METASTOREURIS.defaultStrVal`.


## How was this patch tested?

Manual tests on a secured Hadoop cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23789

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20898


commit a382aab4f3a9cabda10ab2aedbbb8d663737348f
Author: Yuming Wang 
Date:   2018-03-24T07:19:25Z

Shouldn't set hive.metastore.uris before invoking 
HiveDelegationTokenProvider




---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20897
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20897
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88558/
Test PASSed.


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20897
  
**[Test build #88558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88558/testReport)**
 for PR 20897 at commit 
[`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...

2018-03-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20866#discussion_r176902474
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala
 ---
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
 s"$principal at $metastoreUri")
 
   doAsRealUser {
-val hive = Hive.get(conf, classOf[HiveConf])
--- End diff --

Thanks @dongjoon-hyun, it seems `RetryingMetaStoreClient` is a better choice, 
and I will try it.


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901542
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+   [1,2,3,4,5,6]
+  """)
+case class ConcatArrays(children: Seq[Expression]) extends Expression with 
NullSafeEvaluation {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val arrayCheck = checkInputDataTypesAreArrays
+if(arrayCheck.isFailure) arrayCheck
--- End diff --

Style issue:
```scala
if (...) {
  ...
} else {
  ...
}
```


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901684
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
--- End diff --

Do we need to bring in `NullSafeEvaluation`? If only `ConcatArray` uses it, 
we may not need to add this.


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901843
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], 
probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
+val values = children.map(_.eval(input))
+if (values.contains(null)) null
+else nullSafeEval(values)
+  }
+
+  /**
+   * Called by default [[eval]] implementation. If a class utilizing 
NullSaveEvaluation keep
+   * the default nullability, they can override this method to save 
null-check code.  If we need
+   * full control of evaluation process, we should override [[eval]].
+   */
+  protected def nullSafeEval(inputs: Seq[Any]): Any =
+sys.error(s"The class utilizing NullSaveEvaluation must override 
either eval or nullSafeEval")
+
+  /**
+   * Short hand for generating of null save evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f accepts a sequence of variable names and returns Java code 
to compute the output.
+   */
+  protected def defineCodeGen(
+ctx: CodegenContext,
+ev: ExprCode,
+f: Seq[String] => String): ExprCode = {
+nullSafeCodeGen(ctx, ev, values => {
+  s"${ev.value} = ${f(values)};"
+})
+  }
+
+  /**
+   * Called by expressions to generate null safe evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f a function that accepts a sequence of non-null evaluation 
result names of children
+   *  and returns Java code to compute the output.
+   */
+  protected def nullSafeCodeGen(
+   ctx: CodegenContext,
+   ev: ExprCode,
+   f: Seq[String] => String): ExprCode = {
+val gens = children.map(_.genCode(ctx))
+val resultCode = f(gens.map(_.value))
+
+if (nullable) {
+  val nullSafeEval =
+(s"""
+  ${ev.isNull} = false; // resultCode could change nullability.
+  $resultCode
+""" /: children.zip(gens)) {
--- End diff --

Use `foldLeft` for readability.
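For reference, the symbolic `(z /: xs)(f)` form used in the diff is exactly `xs.foldLeft(z)(f)`; a tiny sketch of the equivalence:

```scala
val xs = List("a", "b", "c")
// ("z" /: xs)(_ + _) means the same thing, but hides the fold direction;
// foldLeft makes it explicit that the accumulator sits on the left.
val result = xs.foldLeft("z")(_ + _)  // "z" + "a" + "b" + "c"
```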


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176902162
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+   [1,2,3,4,5,6]
+  """)
+case class ConcatArrays(children: Seq[Expression]) extends Expression with 
NullSafeEvaluation {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val arrayCheck = checkInputDataTypesAreArrays
+if(arrayCheck.isFailure) arrayCheck
+else TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), 
s"function $prettyName")
+  }
+
+  private def checkInputDataTypesAreArrays(): TypeCheckResult =
+  {
+val mismatches = children.zipWithIndex.collect {
+  case (child, idx) if !ArrayType.acceptsType(child.dataType) =>
+s"argument ${idx + 1} has to be ${ArrayType.simpleString} type, " +
+  s"however, '${child.sql}' is of ${child.dataType.simpleString} 
type."
+}
+
+if (mismatches.isEmpty) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(mismatches.mkString(" "))
+}
+  }
+
+  override def dataType: ArrayType =
+children
+  .headOption.map(_.dataType.asInstanceOf[ArrayType])
+  .getOrElse(ArrayType.defaultConcreteType.asInstanceOf[ArrayType])
+
+
+  override protected def nullSafeEval(inputs: Seq[Any]): Any = {
+val elements = 
inputs.flatMap(_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType))
+new GenericArrayData(elements)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, arrays => {
+  val elementType = dataType.elementType
+  if (CodeGenerator.isPrimitiveType(elementType)) {
+genCodeForConcatOfPrimitiveElements(ctx, elementType, arrays, 
ev.value)
+  } else {
+genCodeForConcatOfComplexElements(ctx, arrays, ev.value)
+  }
+})
+  }
+
+  private def genCodeForNumberOfElements(
+ctx: CodegenContext,
+elements: Seq[String]
+  ) : (String, String) = {
+val variableName = ctx.freshName("numElements")
+val code = elements
+  .map(el => s"$variableName += $el.numElements();")
+  .foldLeft( s"int $variableName = 0;")((acc, s) => acc + "\n" + s)
+(code, variableName)
+  }
+
+  private def genCodeForConcatOfPrimitiveElements(
+ctx: CodegenContext,
+elementType: DataType,
+elements: Seq[String],
+arrayDataName: String
+  ): String = {
+val arrayName = ctx.freshName("array")
+val arraySizeName = ctx.freshName("size")
+val counter = ctx.freshName("counter")
+val tempArrayDataName = ctx.freshName("tempArrayData")
+
+val (numElemCode, numElemName) = genCodeForNumberOfElements(ctx, 
elements)
+
+val unsafeArraySizeInBytes = s"""
+  |int $arraySizeName = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElemName) +
+  
|${classOf[ByteArrayMethods].getName}.roundNumberOfBytesToNearestWord(
+  |${elementType.defaultSize} * $numElemName
+  |);
+  """.stripMargin
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+
+val primitiveValueTypeName = 
CodeGenerator.primitiveTypeName(elementType)
+val assignments = elements.map { el =>
+  s"""
+|for(int z = 0; z < $el.numElements(); z++) {
+| if($el.isNullAt(z)) {
--- End diff --

Style: `if ()`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901957
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], 
probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
+val values = children.map(_.eval(input))
+if (values.contains(null)) null
+else nullSafeEval(values)
+  }
+
+  /**
+   * Called by default [[eval]] implementation. If a class utilizing 
NullSaveEvaluation keep
+   * the default nullability, they can override this method to save 
null-check code.  If we need
+   * full control of evaluation process, we should override [[eval]].
+   */
+  protected def nullSafeEval(inputs: Seq[Any]): Any =
+sys.error(s"The class utilizing NullSaveEvaluation must override 
either eval or nullSafeEval")
+
+  /**
+   * Short hand for generating of null save evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f accepts a sequence of variable names and returns Java code 
to compute the output.
+   */
+  protected def defineCodeGen(
+ctx: CodegenContext,
+ev: ExprCode,
+f: Seq[String] => String): ExprCode = {
+nullSafeCodeGen(ctx, ev, values => {
+  s"${ev.value} = ${f(values)};"
+})
+  }
+
+  /**
+   * Called by expressions to generate null safe evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f a function that accepts a sequence of non-null evaluation 
result names of children
+   *  and returns Java code to compute the output.
+   */
+  protected def nullSafeCodeGen(
+   ctx: CodegenContext,
+   ev: ExprCode,
+   f: Seq[String] => String): ExprCode = {
+val gens = children.map(_.genCode(ctx))
+val resultCode = f(gens.map(_.value))
+
+if (nullable) {
+  val nullSafeEval =
+(s"""
+  ${ev.isNull} = false; // resultCode could change nullability.
+  $resultCode
+""" /: children.zip(gens)) {
+  case (acc, (child, gen)) =>
+gen.code + ctx.nullSafeExec(child.nullable, gen.isNull)(acc)
--- End diff --

For example, for a binary expression, doesn't this generate code like:
```scala
rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
  leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) {
${ev.isNull} = false; // resultCode could change nullability.
$resultCode
  }
}
```

For deterministic expressions the evaluation order doesn't matter, but for 
non-deterministic ones I'm a little concerned that this may cause an 
unexpected change.
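The ordering concern can be illustrated with a small string-building sketch of the fold (hypothetical code templates, not the actual generated Java): folding left over the children nests each previous accumulator inside the next child's null guard, so the last child's evaluation code ends up outermost and runs first:

```scala
val seed = "RESULT"
val children = Seq("left", "right")

// Mimics each fold step in nullSafeCodeGen: gen.code + nullSafeExec(...) { acc }
val generated = children.foldLeft(seed) { (acc, child) =>
  s"eval($child); if (!${child}IsNull) { $acc }"
}
// The "right" child's eval ends up outermost, i.e. it would run before "left".
```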


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901989
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
--- End diff --

The usage description should also state that the element types of the arrays must be the same.
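The intended semantics can be modeled with plain Scala collections (a sketch, not the actual expression code): all inputs share one element type, and the result concatenates them in order, with the usual SQL null-in/null-out behavior:

```scala
// Plain-collection model of concat_arrays: null in, null out; otherwise flatten.
def concatArraysModel[T](arrays: Seq[T]*): Seq[T] =
  if (arrays.contains(null)) null else arrays.flatten

val out = concatArraysModel(Seq(1, 2, 3), Seq(4, 5), Seq(6))  // mirrors the SQL example
```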


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901348
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], 
probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
--- End diff --

Spark usually uses a style like:

```scala
override def eval(input: InternalRow): Any = {
  val values = children.map(_.eval(input))
  if (values.contains(null)) {
null
  } else {
nullSafeEval(values)
  }
}
```

You could follow the style of the existing code.


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901429
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], 
probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
--- End diff --

There are other places where the brace `{}` style doesn't follow the Spark 
codebase. We should keep the code style consistent.


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176902327
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+   [1,2,3,4,5,6]
+  """)
+case class ConcatArrays(children: Seq[Expression]) extends Expression with 
NullSafeEvaluation {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val arrayCheck = checkInputDataTypesAreArrays
+if(arrayCheck.isFailure) arrayCheck
+else TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), 
s"function $prettyName")
+  }
+
+  private def checkInputDataTypesAreArrays(): TypeCheckResult =
+  {
+val mismatches = children.zipWithIndex.collect {
+  case (child, idx) if !ArrayType.acceptsType(child.dataType) =>
+s"argument ${idx + 1} has to be ${ArrayType.simpleString} type, " +
+  s"however, '${child.sql}' is of ${child.dataType.simpleString} 
type."
+}
+
+if (mismatches.isEmpty) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(mismatches.mkString(" "))
+}
+  }
+
+  override def dataType: ArrayType =
+children
+  .headOption.map(_.dataType.asInstanceOf[ArrayType])
+  .getOrElse(ArrayType.defaultConcreteType.asInstanceOf[ArrayType])
+
+
+  override protected def nullSafeEval(inputs: Seq[Any]): Any = {
+val elements = 
inputs.flatMap(_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType))
+new GenericArrayData(elements)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, arrays => {
+  val elementType = dataType.elementType
+  if (CodeGenerator.isPrimitiveType(elementType)) {
+genCodeForConcatOfPrimitiveElements(ctx, elementType, arrays, 
ev.value)
+  } else {
+genCodeForConcatOfComplexElements(ctx, arrays, ev.value)
+  }
+})
+  }
+
+  private def genCodeForNumberOfElements(
+ctx: CodegenContext,
+elements: Seq[String]
+  ) : (String, String) = {
+val variableName = ctx.freshName("numElements")
+val code = elements
+  .map(el => s"$variableName += $el.numElements();")
+  .foldLeft( s"int $variableName = 0;")((acc, s) => acc + "\n" + s)
+(code, variableName)
+  }
+
+  private def genCodeForConcatOfPrimitiveElements(
+ctx: CodegenContext,
+elementType: DataType,
+elements: Seq[String],
+arrayDataName: String
+  ): String = {
+val arrayName = ctx.freshName("array")
+val arraySizeName = ctx.freshName("size")
+val counter = ctx.freshName("counter")
+val tempArrayDataName = ctx.freshName("tempArrayData")
+
+val (numElemCode, numElemName) = genCodeForNumberOfElements(ctx, 
elements)
+
+val unsafeArraySizeInBytes = s"""
+  |int $arraySizeName = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElemName) +
+  
|${classOf[ByteArrayMethods].getName}.roundNumberOfBytesToNearestWord(
+  |${elementType.defaultSize} * $numElemName
+  |);
+  """.stripMargin
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+
+val primitiveValueTypeName = 
CodeGenerator.primitiveTypeName(elementType)
+val assignments = elements.map { el =>
+  s"""
+|for(int z = 0; z < $el.numElements(); z++) {
+| if($el.isNullAt(z)) {
+|   $tempArrayDataName.setNullAt($counter);
+| } else {
+|   $tempArrayDataName.set$primitiveValueTypeName(
+| $counter,
+| $el.get$primitiveValueTypeName(z)
+|   );
+| }
+| $counter++;
+|}
+""".stripMargin
+}.mkString("\n")
+
+s"""
+  |$numElemCode
+  |$unsafeArraySizeInBytes
+  |byte[] $arrayName = new byte[$arraySizeName];
+  |UnsafeArrayData $tempArrayDataName = new UnsafeArrayData();
+  |Platform.putLong($arrayName, $baseOffset, $numElemName);
+  |$tempArrayDataName.pointTo($arrayName, $baseOffset, $arraySizeName);
+  |int $counter = 0;
+  |$assignments
+  |$arrayDataName = $tempArrayDataName;
+""".stripMargin
+

[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901467
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], 
probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
+val values = children.map(_.eval(input))
--- End diff --

We probably don't need to evaluate all children here. As soon as any child 
expression evaluates to null, we can return null immediately.
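A hedged sketch of the suggested short-circuiting (names and the `Any => Any` child signature are illustrative, not Spark's actual `Expression.eval`):

```scala
// Stop evaluating the remaining children as soon as one yields null,
// instead of materializing all child values first.
def evalShortCircuit(children: Seq[Any => Any], input: Any)
                    (nullSafeEval: Seq[Any] => Any): Any = {
  val values = new Array[Any](children.length)
  var i = 0
  while (i < children.length) {
    val v = children(i)(input)
    if (v == null) return null // skip the remaining children entirely
    values(i) = v
    i += 1
  }
  nullSafeEval(values)
}
```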


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176901317
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
--- End diff --

typo: null safe.


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176902161
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+   [1,2,3,4,5,6]
+  """)
+case class ConcatArrays(children: Seq[Expression]) extends Expression with 
NullSafeEvaluation {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val arrayCheck = checkInputDataTypesAreArrays
+if(arrayCheck.isFailure) arrayCheck
+else TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), 
s"function $prettyName")
+  }
+
+  private def checkInputDataTypesAreArrays(): TypeCheckResult =
+  {
+val mismatches = children.zipWithIndex.collect {
+  case (child, idx) if !ArrayType.acceptsType(child.dataType) =>
+s"argument ${idx + 1} has to be ${ArrayType.simpleString} type, " +
+  s"however, '${child.sql}' is of ${child.dataType.simpleString} 
type."
+}
+
+if (mismatches.isEmpty) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(mismatches.mkString(" "))
+}
+  }
+
+  override def dataType: ArrayType =
+children
+  .headOption.map(_.dataType.asInstanceOf[ArrayType])
+  .getOrElse(ArrayType.defaultConcreteType.asInstanceOf[ArrayType])
+
+
+  override protected def nullSafeEval(inputs: Seq[Any]): Any = {
+val elements = 
inputs.flatMap(_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType))
+new GenericArrayData(elements)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, arrays => {
+  val elementType = dataType.elementType
+  if (CodeGenerator.isPrimitiveType(elementType)) {
+genCodeForConcatOfPrimitiveElements(ctx, elementType, arrays, 
ev.value)
+  } else {
+genCodeForConcatOfComplexElements(ctx, arrays, ev.value)
+  }
+})
+  }
+
+  private def genCodeForNumberOfElements(
+ctx: CodegenContext,
+elements: Seq[String]
+  ) : (String, String) = {
+val variableName = ctx.freshName("numElements")
+val code = elements
+  .map(el => s"$variableName += $el.numElements();")
+  .foldLeft( s"int $variableName = 0;")((acc, s) => acc + "\n" + s)
+(code, variableName)
+  }
+
+  private def genCodeForConcatOfPrimitiveElements(
+ctx: CodegenContext,
+elementType: DataType,
+elements: Seq[String],
+arrayDataName: String
+  ): String = {
+val arrayName = ctx.freshName("array")
+val arraySizeName = ctx.freshName("size")
+val counter = ctx.freshName("counter")
+val tempArrayDataName = ctx.freshName("tempArrayData")
+
+val (numElemCode, numElemName) = genCodeForNumberOfElements(ctx, 
elements)
+
+val unsafeArraySizeInBytes = s"""
+  |int $arraySizeName = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElemName) +
+  
|${classOf[ByteArrayMethods].getName}.roundNumberOfBytesToNearestWord(
+  |${elementType.defaultSize} * $numElemName
+  |);
+  """.stripMargin
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+
+val primitiveValueTypeName = 
CodeGenerator.primitiveTypeName(elementType)
+val assignments = elements.map { el =>
+  s"""
+|for(int z = 0; z < $el.numElements(); z++) {
--- End diff --

Style: `for (`


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20897
  
**[Test build #88558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88558/testReport)**
 for PR 20897 at commit 
[`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7).


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20897
  
jenkins, retest this please


---




[GitHub] spark pull request #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20896#discussion_r176902181
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
 ---
@@ -550,22 +550,22 @@ class StreamingQuerySuite extends StreamTest with 
BeforeAndAfter with Logging wi
 .start()
 }
 
-val input = MemoryStream[Int]
-val q1 = startQuery(input.toDS, "stream_serializable_test_1")
-val q2 = startQuery(input.toDS.map { i =>
+val input = MemoryStream[Int] :: MemoryStream[Int] :: 
MemoryStream[Int] :: Nil
--- End diff --

why build a list and not use 3 separate variables?


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20897
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20897
  
**[Test build #88557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88557/testReport)**
 for PR 20897 at commit 
[`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20897
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88557/
Test FAILed.


---




[GitHub] spark issue #20003: [SPARK-22817][R] Use fixed testthat version for SparkR t...

2018-03-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20003
  
yea, I started doing some work but was staled, let me check..


---




[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20897
  
**[Test build #88557 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88557/testReport)**
 for PR 20897 at commit 
[`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7).


---




[GitHub] spark issue #20889: [MINOR][DOC] Fix ml-guide markdown typos

2018-03-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20889
  
@Lemonjing you need to close the PR from github.com - we don't have access 
to close it


---



