[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88442/
[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20851 **[Test build #88454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88454/testReport)** for PR 20851 at commit [`7946bea`](https://github.com/apache/spark/commit/7946bea7c0eed08808696f34732c434e2c8ab4ea).
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #88442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88442/testReport)** for PR 20345 at commit [`895b6a1`](https://github.com/apache/spark/commit/895b6a1595fef31f2028180a36869c6d344e5ac7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37696015-b1250bae-2c90-11e8-8ad1-515661487b94.png
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37695954-5aacaa2a-2c90-11e8-9f73-f57d0e1b27f6.png
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #88446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88446/testReport)** for PR 19381 at commit [`20b245a`](https://github.com/apache/spark/commit/20b245ad49124d8d8b42c6835859759cd6af7964).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88446/
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test PASSed.
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20745 **[Test build #88453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88453/testReport)** for PR 20745 at commit [`214cddc`](https://github.com/apache/spark/commit/214cddc242fbfa9a217d544ec695b062c148cd85).
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Merged build finished. Test PASSed.
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1667/
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18982 **[Test build #88452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88452/testReport)** for PR 18982 at commit [`9162944`](https://github.com/apache/spark/commit/9162944cf39a61db8060ee83829e7537dc979663).
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #88451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88451/testReport)** for PR 20727 at commit [`f1c951f`](https://github.com/apache/spark/commit/f1c951f0c84e334e185a0bcc810c08d48ca726e8).
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1666/
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1665/
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88450/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed.
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20851#discussion_r175987083

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }

+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

Yup, I think this one is better than the current one.
[GitHub] spark pull request #20870: [SPARK-23760][SQL] CodegenContext.withSubExprElim...
Github user rednaxelafx commented on a diff in the pull request:
https://github.com/apache/spark/pull/20870#discussion_r175986986

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -942,7 +940,7 @@ class CodegenContext {
   def subexpressionEliminationForWholeStageCodegen(expressions: Seq[Expression]): SubExprCodes = {
     // Create a clear EquivalentExpressions and SubExprEliminationState mapping
     val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
-    val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState]
+    val localSubExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState]
--- End diff --

This renaming isn't necessary for the fix per se, but I'd like to piggyback it on this change so that it's clearer that we're not interfering with the current CSE state of this CodegenContext here.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20579 retest this please
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20851#discussion_r175986514

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }

+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

I think `"2017-08-19".d` is at least better than `1.d`.
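The helper under discussion converts test literals into `java.sql.Date` values for the pushdown tests. As a rough sketch of the string-based variant `"2017-08-19".d` that the reviewer prefers (the object and class names below are illustrative assumptions, not Spark's actual test code):

```scala
import java.sql.Date

// Hypothetical sketch of a ".d" date-literal helper for tests, in the spirit
// of the review comment above. `DateTestLiterals` and `StringToDate` are
// made-up names, not Spark's actual helper.
object DateTestLiterals {
  implicit class StringToDate(s: String) {
    // Parse a yyyy-MM-dd string into a java.sql.Date.
    def d: Date = Date.valueOf(s)
  }
}
```

With this in scope, a test can write `"2017-08-19".d` instead of constructing a `Date` by hand, which reads more clearly than an integer-based literal like `1.d`.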
[GitHub] spark issue #20870: [SPARK-23760][SQL] CodegenContext.withSubExprElimination...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1664/
[GitHub] spark issue #20870: [SPARK-23760][SQL] CodegenContext.withSubExprElimination...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Merged build finished. Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test FAILed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88440/
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88440/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1663/
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20870 **[Test build #88449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88449/testReport)** for PR 20870 at commit [`df45286`](https://github.com/apache/spark/commit/df452861d16eb36c9982f6c438ea1dc2f8d9d1fc).
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Merged build finished. Test PASSed.
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20870 **[Test build #88448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88448/testReport)** for PR 20870 at commit [`8635969`](https://github.com/apache/spark/commit/863596956ced94c49289c9eaeebf544b2de68f15).
[GitHub] spark pull request #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEli...
GitHub user rednaxelafx opened a pull request:
https://github.com/apache/spark/pull/20870

[SPARK-23760][SQL]: CodegenContext.withSubExprEliminationExprs should save/restore CSE state correctly

## What changes were proposed in this pull request?

Fixed `CodegenContext.withSubExprEliminationExprs()` so that it saves/restores CSE state correctly.

## How was this patch tested?

Added a new unit test to verify that the old CSE state is indeed saved and restored around the `withSubExprEliminationExprs()` call. Manually verified that this test fails without this patch.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark codegen-subexpr-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20870.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20870
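The fix described above is an instance of the classic save/restore-around-a-block pattern: stash the caller's state, install the temporary mappings, run the body, and put the old state back even if the body throws. A generic sketch of that pattern (this is not Spark's actual `CodegenContext` code; the class and field names are illustrative):

```scala
import scala.collection.mutable

// Illustrative sketch of save/restore around a scoped block, in the spirit of
// withSubExprEliminationExprs. `SubExprContext`, `state`, and `withTempState`
// are hypothetical names, not Spark's API.
class SubExprContext {
  // Mutable mapping standing in for the context's CSE state.
  val state = mutable.HashMap.empty[String, String]

  def withTempState[T](temp: Map[String, String])(body: => T): T = {
    val saved = state.toMap   // save the caller's current state
    state.clear()
    state ++= temp            // install the temporary mappings
    try body
    finally {
      state.clear()
      state ++= saved         // restore the old state, even on exception
    }
  }
}
```

The `try`/`finally` is the important part: without it, a body that throws would leave the context holding the temporary mappings, which is the kind of state leakage the PR title describes.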
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175985253

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1115,13 +1115,17 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
   override def prettyName: String = "add_months"
 }

-/**
- * Returns number of months between dates date1 and date2.
- */
+ * Returns number of months between dates `timestamp1` and `timestamp2`.
+ * If `timestamp` is later than `timestamp2`, then the result is positive.
+ * If `timestamp1` and `timestamp2` are on the same day of month, or both
+ * are the last day of month, returns an integer (time of day will be ignored).
+ * Otherwise, the difference is calculated based on 31 days per month, and
+ * rounded to 8 digits.
+*/
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(timestamp1, timestamp2) - Returns number of months between `timestamp1` and `timestamp2`.",
+  usage = "_FUNC_(timestamp1, timestamp2) - Returns number of months between `timestamp1` and `timestamp2`. Positive if `timestamp1` is later than `timestamp2`",
--- End diff --

You could do either

```scala
@ExpressionDescription(
  usage = """
    _FUNC_(timestamp1, timestamp2) - blablabla
      blabla blabla
  """,
  ...
```

Let's add the description here too.
[GitHub] spark issue #20787: Documenting months_between direction
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20787 Seems fine otherwise.
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175983804

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -881,10 +881,10 @@ object DateTimeUtils {
    * Returns number of months between time1 and time2. time1 and time2 are expressed in
    * microseconds since 1.1.1970.
    *
-   * If time1 and time2 having the same day of month, or both are the last day of month,
-   * it returns an integer (time under a day will be ignored).
+   * If time1 and time2 are on the same day of month, or both are the last day of month,
+   * returns an integer (time under a day will be ignored).
--- End diff --

It seems a bit awkward because it actually returns a double. Shall we fix this like .. `returns an integer (time under a day will be ignored)` -> `time under a day will be ignored.`?
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175983334

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1115,13 +1115,17 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
   override def prettyName: String = "add_months"
 }

-
--- End diff --

Let's revert this change back. Seems unrelated.
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175982564

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1115,13 +1115,17 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
   override def prettyName: String = "add_months"
 }

-/**
- * Returns number of months between dates date1 and date2.
- */
+ * Returns number of months between dates `timestamp1` and `timestamp2`.
--- End diff --

Hm, this should have been caught by the Scala linter because we follow Java-style comments. See "Code documentation style" in http://spark.apache.org/contributing.html
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user liufengdb commented on the issue: https://github.com/apache/spark/pull/18666 I asked the following question in https://github.com/apache/spark/pull/20864: is it necessary to create these temp directories when the hive thrift server starts? It sounds like some legacy from Hive, and we can skip creating them in the first place.
[GitHub] spark issue #20864: [SPARK-23745][SQL]Remove the directories of the “hive....
Github user liufengdb commented on the issue: https://github.com/apache/spark/pull/20864 @samartinucci @zuotingbing a high-level question: is it necessary to create these temp directories when the hive thrift server starts? It sounds like some legacy from Hive, and we can skip creating them in the first place.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88447/testReport)** for PR 20795 at commit [`17f7e74`](https://github.com/apache/spark/commit/17f7e741632548f263a933b29a66f67b59af6725).
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test PASSed.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1662/
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #88446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88446/testReport)** for PR 19381 at commit [`20b245a`](https://github.com/apache/spark/commit/20b245ad49124d8d8b42c6835859759cd6af7964).
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 Jenkins, retest this please.
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20727#discussion_r175982059

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -30,9 +30,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 /**
  * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines
  * in that file.
+ *
+ * @param file A part (i.e. "block") of a single file that should be read line by line.
+ * @param lineSeparator A line separator that should be used for each line. If the value is `None`,
+ *                      it covers `\r`, `\r\n` and `\n`.
--- End diff --

Sure.
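The doc comment above says that when no explicit line separator is given, the reader treats `\r`, `\r\n`, and `\n` all as line boundaries. A minimal sketch of that splitting behavior (this is not `HadoopFileLinesReader`'s implementation; the object and method names are assumptions for illustration):

```scala
import java.util.regex.Pattern

// Illustrative sketch of "covers \r, \r\n and \n": with no explicit
// separator, any of the three common line endings delimits a line.
// `LineSplitter` and `splitLines` are hypothetical names.
object LineSplitter {
  def splitLines(text: String, sep: Option[String]): Seq[String] = sep match {
    // Explicit separator: split on it literally (quoted so regex
    // metacharacters in the separator are not interpreted).
    case Some(s) => text.split(Pattern.quote(s), -1).toSeq
    // Default: \r\n must come first in the alternation so it is not
    // consumed as a lone \r followed by a lone \n.
    case None    => text.split("\r\n|\r|\n", -1).toSeq
  }
}
```

Note the ordering of the alternation: putting `\r\n` before `\r` is what keeps a Windows-style ending from producing a spurious empty line.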
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88439/
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Merged build finished. Test PASSed.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20827 **[Test build #88439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88439/testReport)** for PR 20827 at commit [`043d6c1`](https://github.com/apache/spark/commit/043d6c1a888fbe3593dcb98a84c2c8aec4b35a28).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/20028 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19599 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark pull request #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange w...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20831#discussion_r175980451

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ---
@@ -68,6 +69,15 @@ case class InMemoryRelation(

   override protected def innerChildren: Seq[SparkPlan] = Seq(child)

+  override def doCanonicalize(): logical.LogicalPlan =
+    copy(output = output.map(QueryPlan.normalizeExprId(_, child.output)),
+      storageLevel = StorageLevel.NONE,
--- End diff --

It is followed. I just ignored `useCompression` and `batchSize` as they are just primitives and don't need to be canonicalized here.
[GitHub] spark pull request #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange w...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20831#discussion_r175980243

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -169,7 +174,10 @@ case class InMemoryTableScanExec(
   override def outputOrdering: Seq[SortOrder] =
     relation.child.outputOrdering.map(updateAttribute(_).asInstanceOf[SortOrder])

-  private def statsFor(a: Attribute) = relation.partitionStatistics.forAttribute(a)
+  // When we make canonicalized plan, we can't find a normalized attribute in this map.
+  // We return a `ColumnStatisticsSchema` for normalized attribute in this case.
--- End diff --

I tried that at the beginning. However, `partitionFilters` uses `buildFilter`. Making `partitionFilters` lazy doesn't work, because on `copy` the initialization of `InMemoryTableScanExec` will try to materialize `partitionFilters` to copy its value. Making `partitionFilters` and `buildFilter` plain methods is not enough either; we would also need to remove `@transient` from `relation` and `InMemoryRelation.partitionStatistics`. So I don't think it is worth it, and I left it as is.
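The `doCanonicalize` override discussed in this thread follows a general idea: normalize identifiers and blank out fields that do not affect query results, so that two semantically equal plans compare equal. A generic sketch of that idea (hypothetical names throughout; this is not Spark's `QueryPlan` machinery):

```scala
// Illustrative sketch of plan canonicalization for sameResult-style
// comparison: expression ids are normalized to their position, and a
// non-semantic field (here a storage hint, standing in for storageLevel)
// is reset to a fixed value. `CachedRelation` is a made-up class.
case class CachedRelation(outputIds: Seq[Long], storageHint: String) {
  def canonicalized: CachedRelation =
    copy(
      outputIds = outputIds.indices.map(_.toLong), // normalize ids by position
      storageHint = ""                             // ignore non-semantic field
    )

  // Two relations produce the same result iff their canonical forms match.
  def sameResult(other: CachedRelation): Boolean =
    this.canonicalized == other.canonicalized
}
```

This also mirrors the reviewer's point above: plain primitive fields that already compare by value (like `useCompression` or `batchSize`) need no normalization, only ids and incidental fields do.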
[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17280 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1661/
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786 **[Test build #88445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88445/testReport)** for PR 20786 at commit [`9707fe5`](https://github.com/apache/spark/commit/9707fe5db5b23f071282dc897adea337a2796c8d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20864: [SPARK-23745][SQL]Remove the directories of the “hive....
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/20864 I took a look at [https://github.com/apache/spark/pull/18666] and found it cannot clean all the `*_resources` directories, because when we start HiveThriftServer2, two resource directories are created: `18/03/21 11:23:33 INFO **SessionState: Created local directory: /data1/zdh/spark/hive/tmp/616f66c9-fa4e-4a0c-a63a-10ff97e5019c_resources** 18/03/21 11:23:33 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/616f66c9-fa4e-4a0c-a63a-10ff97e5019c 18/03/21 11:23:33 INFO SessionState: Created local directory: /data1/zdh/spark/hive/tmp/616f66c9-fa4e-4a0c-a63a-10ff97e5019c 18/03/21 11:23:33 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/616f66c9-fa4e-4a0c-a63a-10ff97e5019c/_tmp_space.db 18/03/21 11:23:33 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/media/A/gitspace/spark/dist/sbin/spark-warehouse 18/03/21 11:23:33 INFO HiveMetaStore: 0: get_database: default 18/03/21 11:23:33 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default 18/03/21 11:23:33 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint 18/03/21 11:23:33 INFO HiveUtils: Initializing execution hive, version 1.2.1 18/03/21 11:23:34 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 18/03/21 11:23:34 INFO ObjectStore: ObjectStore, initialize called 18/03/21 11:23:34 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 18/03/21 11:23:34 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored 18/03/21 11:23:36 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 18/03/21 11:23:36 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" 
so does not have its own datastore table. 18/03/21 11:23:36 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 18/03/21 11:23:37 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 18/03/21 11:23:37 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 18/03/21 11:23:37 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 18/03/21 11:23:37 INFO ObjectStore: Initialized ObjectStore 18/03/21 11:23:37 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 18/03/21 11:23:38 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 18/03/21 11:23:38 INFO HiveMetaStore: Added admin role in metastore 18/03/21 11:23:38 INFO HiveMetaStore: Added public role in metastore 18/03/21 11:23:38 INFO HiveMetaStore: No user is added in admin role, since config is empty 18/03/21 11:23:38 INFO HiveMetaStore: 0: get_all_databases 18/03/21 11:23:38 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases 18/03/21 11:23:38 INFO HiveMetaStore: 0: get_functions: db=default pat=* 18/03/21 11:23:38 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=* 18/03/21 11:23:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table. 
18/03/21 11:23:38 INFO **SessionState: Created local directory: /data1/zdh/spark/hive/tmp/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e_resources** 18/03/21 11:23:38 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e 18/03/21 11:23:38 INFO SessionState: Created local directory: /data1/zdh/spark/hive/tmp/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e 18/03/21 11:23:38 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e/_tmp_space.db 18/03/21 11:23:38 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/media/A/gitspace/spark/dist/sbin/spark-warehouse` but on stop, only the current session's resource directory is removed: `public void close() throws IOException { registry.clear(); if (txnMgr != null) txnMgr.closeTxnManager(); JavaUtils.closeClassLoadersTo(conf.getClassLoader(), parentLoader); **File resourceDir = new File(getConf().getVar(HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR))**; LOG.debug("Removing resource dir "
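The fix being discussed, sweeping every leftover `*_resources` directory instead of only the current session's, can be sketched as follows. This is an illustrative Python sketch, not Hive's actual cleanup code; the helper name and temp-dir layout are assumptions.

```python
import glob
import os
import shutil

def clean_resource_dirs(hive_tmp_dir):
    """Remove every '<uuid>_resources' directory under the Hive local tmp dir.

    Hive's SessionState.close() removes only the *current* session's resource
    dir; when two sessions are created (as in the log above), the first
    '*_resources' directory is left behind. Sweeping the whole pattern
    avoids that leak.
    """
    removed = []
    for path in glob.glob(os.path.join(hive_tmp_dir, "*_resources")):
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```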
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20745 LGTM, can you also attach a web UI SQL tab screenshot? thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175978136 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,53 @@ class FileStreamSinkSuite extends StreamTest { } } } + + test("SPARK-23288 writing and checking output metrics") { +Seq("parquet", "orc", "text", "json").foreach { format => + val inputData = MemoryStream[String] + val df = inputData.toDF() + + val outputDir = Utils.createTempDir(namePrefix = "stream.output").getCanonicalPath --- End diff -- we should use `withTempDir` to clean up the temp directory at the end --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
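The `withTempDir` test helper requested above is a loan pattern: create the directory, run the test body, and delete the directory even if the body throws. A minimal Python sketch of the same idea (the helper name and prefix are illustrative, not Spark's API):

```python
import tempfile

def with_temp_dir(body):
    """Loan pattern analogous to Spark's test helper `withTempDir`:
    create a temp dir, run `body(path)`, and always clean up afterwards,
    even if `body` raises."""
    with tempfile.TemporaryDirectory(prefix="stream.output") as path:
        return body(path)
```

Compare with the diff above, where `Utils.createTempDir` leaves the directory behind unless something deletes it explicitly.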
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88438/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88438/testReport)** for PR 20433 at commit [`5ee6f89`](https://github.com/apache/spark/commit/5ee6f897bc71eac24e086f39549ef3a396059b4d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20727 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1660/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786 **[Test build #88444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88444/testReport)** for PR 20786 at commit [`2ee7e72`](https://github.com/apache/spark/commit/2ee7e7227bd18ccffbf415e83588a3dde2c8fd3a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r175977344 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -30,9 +30,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl /** * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines * in that file. + * + * @param file A part (i.e. "block") of a single file that should be read line by line. + * @param lineSeparator A line separator that should be used for each line. If the value is `None`, + * it covers `\r`, `\r\n` and `\n`. --- End diff -- We should mention that this default rule is not defined by us, but by hadoop. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
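The default rule discussed above comes from Hadoop's line reader: when no custom separator is set, a line ends at `\n`, `\r`, or `\r\n`, with `\r\n` counting as a single terminator. A small Python sketch of that rule, written from the description above rather than from Hadoop's source:

```python
def split_lines_default(text):
    """Split text the way the default (no custom separator) rule does:
    a line ends at '\\n', '\\r', or '\\r\\n', where the '\\r\\n' pair
    counts as one terminator, not two."""
    lines = []
    buf = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            lines.append("".join(buf))
            buf = []
        elif ch == "\r":
            lines.append("".join(buf))
            buf = []
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1  # swallow the '\n' half of a '\r\n' pair
        else:
            buf.append(ch)
        i += 1
    if buf:
        lines.append("".join(buf))
    return lines
```

For inputs containing only these three terminators, Python's `str.splitlines` agrees with this sketch (though `splitlines` also splits on additional characters such as `\v` and `\f`).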
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1659/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786 **[Test build #88443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88443/testReport)** for PR 20786 at commit [`3fac42e`](https://github.com/apache/spark/commit/3fac42e4d7713d156b691ffcacaa0519e3e85b77). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/20803#discussion_r175975380 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -166,20 +168,28 @@ private[sql] object Dataset { class Dataset[T] private[sql]( @transient val sparkSession: SparkSession, @DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution, -encoder: Encoder[T]) +encoder: Encoder[T], +val sqlText: String = "") --- End diff -- Your speculation is almost right. First we call `val df = spark.sql()`, then the sql text is separated with pattern matching into three types: count, limit, and other. If count, we invoke `df.showString(2, 20)`; if limit, we just invoke `df.limit(1).foreach`; the last type, other, will do nothing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175975025 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -313,6 +315,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex } } + test("filter pushdown - date") { +implicit class IntToDate(int: Int) { + def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1)) +} + +withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df => + checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row]) + checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], (1 to 4).map(i => Row.apply(i.d))) + + checkFilterPredicate('_1 === 1.d, classOf[Eq[_]], 1.d) --- End diff -- Got it, thanks very much for explanation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #4142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4142/testReport)** for PR 19381 at commit [`20b245a`](https://github.com/apache/spark/commit/20b245ad49124d8d8b42c6835859759cd6af7964). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #88442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88442/testReport)** for PR 20345 at commit [`895b6a1`](https://github.com/apache/spark/commit/895b6a1595fef31f2028180a36869c6d344e5ac7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1658/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20868 **[Test build #88441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88441/testReport)** for PR 20868 at commit [`0d189ab`](https://github.com/apache/spark/commit/0d189ab49b2dcb748b51f875f1a04e6b2fb9f69b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1657/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20868 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88437/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20868 **[Test build #88437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88437/testReport)** for PR 20868 at commit [`0d189ab`](https://github.com/apache/spark/commit/0d189ab49b2dcb748b51f875f1a04e6b2fb9f69b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait CatalogMetadata ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20767 @tdas @zsxwing @koeninger @tedyu do you think it makes sense to make similar step in the DStream area like this and then later follow with the mentioned Apache Common Pool? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20869: Improve implicitNotFound message for Encoder
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20869 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20869: Improve implicitNotFound message for Encoder
GitHub user ceedubs opened a pull request: https://github.com/apache/spark/pull/20869 Improve implicitNotFound message for Encoder The `implicitNotFound` message for `Encoder` doesn't mention the name of the type for which it can't find an encoder. Furthermore, it covers up the fact that `Encoder` is the name of the relevant type class. Hopefully this new message provides a little more specific type detail while still giving the general message about which types are supported. ## What changes were proposed in this pull request? Augment the existing message to mention that it's looking for an `Encoder` and what the type of the encoder is. For example instead of: ``` Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. ``` return this message: ``` Unable to find encoder for type Exception. An implicit Encoder[Exception] is needed to store Exception instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. ``` ## How was this patch tested? It was tested manually in the Scala REPL, since triggering this in a test would cause a compilation error. ``` scala> implicitly[Encoder[Exception]] :51: error: Unable to find encoder for type Exception. An implicit Encoder[Exception] is needed to store Exception instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. 
implicitly[Encoder[Exception]] ^ ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/ceedubs/spark encoder-implicit-msg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20869.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20869 commit 588dffc51df53bcbb885305e8ecd5bf39aa2e465 Author: Cody Allen Date: 2018-03-21T01:05:02Z Improve implicitNotFound message for Encoder The `implicitNotFound` message for `Encoder` doesn't mention the name of the type for which it can't find an encoder. Furthermore, it covers up the fact that `Encoder` is the name of the relevant type class. Hopefully this new message provides a little more specific type detail while still giving the general message about which types are supported. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88436/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88436/testReport)** for PR 20579 at commit [`ecf0865`](https://github.com/apache/spark/commit/ecf08654d4c7b50eb498481011d3c6f856419207). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r175971741 --- Diff: python/pyspark/ml/stat.py --- @@ -132,6 +134,172 @@ def corr(dataset, column, method="pearson"): return _java2py(sc, javaCorrObj.corr(*args)) +class Summarizer(object): +""" +.. note:: Experimental + +Tools for vectorized statistics on MLlib Vectors. +The methods in this package provide various statistics for Vectors contained inside DataFrames. +This class lets users pick the statistics they would like to extract for a given column. + +>>> from pyspark.ml.stat import Summarizer +>>> from pyspark.sql import Row +>>> from pyspark.ml.linalg import Vectors +>>> summarizer = Summarizer.metrics("mean", "count") +>>> df = sc.parallelize([Row(weight=1.0, features=Vectors.dense(1.0, 1.0, 1.0)), +... Row(weight=0.0, features=Vectors.dense(1.0, 2.0, 3.0))]).toDF() +>>> df.select(summarizer.summary(df.features, df.weight)).show(truncate=False) ++---+ +|aggregate_metrics(features, weight)| ++---+ +|[[1.0,1.0,1.0], 1] | ++---+ + +>>> df.select(summarizer.summary(df.features)).show(truncate=False) +++ +|aggregate_metrics(features, 1.0)| +++ +|[[1.0,1.5,2.0], 2] | +++ + +>>> df.select(Summarizer.mean(df.features, df.weight)).show(truncate=False) ++--+ +|mean(features)| ++--+ +|[1.0,1.0,1.0] | ++--+ + +>>> df.select(Summarizer.mean(df.features)).show(truncate=False) ++--+ +|mean(features)| ++--+ +|[1.0,1.5,2.0] | ++--+ + + +.. 
versionadded:: 2.4.0 + +""" +def __init__(self, js): +self._js = js + +@staticmethod +@since("2.4.0") +def mean(col, weightCol=None): +""" +return a column of mean summary +""" +return Summarizer._get_single_metric(col, weightCol, "mean") + +@staticmethod +@since("2.4.0") +def variance(col, weightCol=None): +""" +return a column of variance summary +""" +return Summarizer._get_single_metric(col, weightCol, "variance") + +@staticmethod +@since("2.4.0") +def count(col, weightCol=None): +""" +return a column of count summary +""" +return Summarizer._get_single_metric(col, weightCol, "count") + +@staticmethod +@since("2.4.0") +def numNonZeros(col, weightCol=None): +""" +return a column of numNonZero summary +""" +return Summarizer._get_single_metric(col, weightCol, "numNonZeros") + +@staticmethod +@since("2.4.0") +def max(col, weightCol=None): +""" +return a column of max summary +""" +return Summarizer._get_single_metric(col, weightCol, "max") + +@staticmethod +@since("2.4.0") +def min(col, weightCol=None): +""" +return a column of min summary +""" +return Summarizer._get_single_metric(col, weightCol, "min") + +@staticmethod +@since("2.4.0") +def normL1(col, weightCol=None): +""" +return a column of normL1 summary +""" +return Summarizer._get_single_metric(col, weightCol, "normL1") + +@staticmethod +@since("2.4.0") +def normL2(col, weightCol=None): +""" +return a column of normL2 summary +""" +return Summarizer._get_single_metric(col, weightCol, "normL2") + +@staticmethod +def _check_param(featureCol, weightCol): +if weightCol is None: +weightCol = lit(1.0) +if not isinstance(featureCol, Column) or not isinstance(weightCol, Column): +raise TypeError("featureCol and weightCol should be a Column") +return featureCol, weightCol + +@staticmethod +def _get_single_metric(col, weightCol, metric): +col, weightCol = Summarizer._check_param(col, weightCol) +return Column(JavaWrapper._new_java_obj("org.apache.spark.ml.stat.Summarizer." + metric, +col._jc, weightCol._jc)) + +
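The `_check_param` helper in the diff above defaults a missing weight column to `lit(1.0)` and rejects non-`Column` arguments before dispatching to the JVM. Stripped of the Spark/Py4J machinery, the shape of that check looks like this; the `Column` and `lit` stand-ins below are illustrative placeholders, not pyspark's real classes:

```python
class Column:
    """Minimal stand-in for pyspark.sql.Column, for illustration only."""
    def __init__(self, expr):
        self.expr = expr

def lit(value):
    """Stand-in for pyspark.sql.functions.lit: wrap a literal in a Column."""
    return Column(("lit", value))

def check_param(feature_col, weight_col):
    """Mirror the shape of Summarizer._check_param: default the weight
    to lit(1.0) and reject anything that is not a Column."""
    if weight_col is None:
        weight_col = lit(1.0)
    if not isinstance(feature_col, Column) or not isinstance(weight_col, Column):
        raise TypeError("featureCol and weightCol should be a Column")
    return feature_col, weight_col
```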
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175971417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -84,19 +84,49 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper { } } + // Extract a list of logical plans to be joined for join-order comparisons. + // Since `ExtractFiltersAndInnerJoins` handles left-deep trees only, this function has + // the same strategy to extract the plan list. + private def extractLeftDeepInnerJoins(plan: LogicalPlan): Seq[LogicalPlan] = plan match { +case j @ Join(left, right, _: InnerLike, _) => right +: extractLeftDeepInnerJoins(left) +case p @ Project(_, j @ Join(_, _, _: InnerLike, _)) => extractLeftDeepInnerJoins(j) +case _ => Seq(plan) + } + + private def checkSameJoinOrder(plan1: LogicalPlan, plan2: LogicalPlan): Boolean = { +extractLeftDeepInnerJoins(plan1) == extractLeftDeepInnerJoins(plan2) + } + + private def mayCreateOrderedJoin( + originalPlan: LogicalPlan, + input: Seq[(LogicalPlan, InnerLike)], + conditions: Seq[Expression]): LogicalPlan = { +val orderedJoins = createOrderedJoin(input, conditions) +if (!checkSameJoinOrder(orderedJoins, originalPlan)) { --- End diff -- If we don't have this check, the `operatorOptimizationRuleSet` batch cannot reach `fixedPoint`, because `ReorderJoin` is re-applied to the same join trees every time the optimization rule batch is invoked. This case does not happen in the master because reordered joins have `Project` in internal nodes (`Project` is added by subsequent optimization rules, e.g., `ColumnPruning`), and this plan structure guards against this case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
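The `extractLeftDeepInnerJoins` recursion in the diff walks only the left spine of the join tree, collects join inputs right child first, and steps through a `Project` sitting directly on top of an inner join. A toy Python model of that traversal, using nested tuples in place of Catalyst plan nodes:

```python
def extract_left_deep_inner_joins(plan):
    """Collect the leaves of a left-deep inner-join tree, right child first,
    skipping Project nodes that sit directly on a join.

    A toy plan is ("join", left, right), ("project", child), or a leaf string.
    """
    if isinstance(plan, tuple) and plan[0] == "join":
        _, left, right = plan
        return [right] + extract_left_deep_inner_joins(left)
    if (isinstance(plan, tuple) and plan[0] == "project"
            and isinstance(plan[1], tuple) and plan[1][0] == "join"):
        return extract_left_deep_inner_joins(plan[1])
    return [plan]

def same_join_order(plan1, plan2):
    """Toy version of checkSameJoinOrder: two plans join in the same
    order iff their extracted input lists match."""
    return extract_left_deep_inner_joins(plan1) == extract_left_deep_inner_joins(plan2)
```

This makes the comment above concrete: a `Project` interposed between joins no longer changes the extracted list, so the rule can recognize that a "reordered" plan is actually the same order and stop rewriting, which is what lets the batch converge.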
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175971428 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala --- @@ -145,4 +159,15 @@ class JoinOptimizationSuite extends PlanTest { } assert(broadcastChildren.size == 1) } + + test("SPARK-23172 skip projections when flattening joins") { --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175971439 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -141,14 +141,16 @@ object ExtractEquiJoinKeys extends Logging with PredicateHelper { } /** - * A pattern that collects the filter and inner joins. + * A pattern that collects the filter and inner joins (and skip projections in plan sub-trees). --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1656/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88440/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org