[GitHub] spark issue #18136: [SPARK-20910][SQL] Add build-in SQL function - UUID

2018-02-01 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18136
  
I just came across this expression and I have a few concerns:

1. A row will not get the same UUID assigned when a task fails and is retried. 
This might cause some really weird problems when the UUID column is used later on.
2. `UUID.randomUUID()` uses a lot of synchronization in the background. 
This might make it pretty slow.

Shall I file a ticket?
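
To make the first concern concrete, here is a minimal Python sketch (not Spark's implementation) of the difference between drawing fresh randomness on every attempt and deriving UUIDs from a seeded generator. The helper `seeded_uuids` and the seed value are hypothetical; the only assumption is that a retried task could be handed the same seed as the failed attempt.

```python
import random
import uuid


def seeded_uuids(seed, count):
    """Return `count` version-4-style UUID strings derived from a seeded PRNG.

    Hypothetical helper for illustration only; it is not Spark's code.
    """
    rng = random.Random(seed)  # private generator, no shared global state
    return [str(uuid.UUID(int=rng.getrandbits(128), version=4))
            for _ in range(count)]


# Simulate a task attempt and its retry for the same partition.
# The seed 42 is arbitrary and stands in for a per-partition seed.
first_attempt = seeded_uuids(seed=42, count=3)
retry_attempt = seeded_uuids(seed=42, count=3)

# uuid.uuid4() draws fresh randomness on every call, so a retried task
# would assign different values to the same rows.
unseeded_attempt = [str(uuid.uuid4()) for _ in range(3)]
```

With a seeded generator the retry reproduces the values of the first attempt, which is the property the first concern says is missing. A private generator per task would also sidestep contention on a shared random source, though whether that matters in practice would need measuring.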


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-02-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20450
  
I noticed that this change doesn't enable the `getMap` API in 
`MutableColumnarRow`. Should we do that? If so, I can make a small follow-up PR for it.


---




[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-02-01 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20450
  
@viirya Thanks, but I'm already working on it. I'll submit it soon.


---




[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-02-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20450
  
@ueshin Ok. No problem. :)


---




[GitHub] spark pull request #20470: [SPARK-23296][YARN] Include stacktrace in YARN-ap...

2018-02-01 Thread gerashegalov
GitHub user gerashegalov opened a pull request:

https://github.com/apache/spark/pull/20470

[SPARK-23296][YARN] Include stacktrace in YARN-app diagnostic

## What changes were proposed in this pull request?

Include the stack trace in the diagnostics message upon abnormal unregistration 
from the RM.

## How was this patch tested?
Tested with a failing job, and confirmed a stacktrace in the client output 
and YARN webUI.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gerashegalov/spark gera/stacktrace-diagnostics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20470.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20470


commit b96216730adb0f07e0e1b56c584af944f83e5c48
Author: Gera Shegalov 
Date:   2018-02-01T07:50:15Z

Include stacktrace in YARN-app diagnostic




---




[GitHub] spark issue #20470: [SPARK-23296][YARN] Include stacktrace in YARN-app diagn...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20470
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20400
  
**[Test build #86919 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86919/testReport)**
 for PR 20400 at commit 
[`25fee39`](https://github.com/apache/spark/commit/25fee3901cfba3599330da394e437c91a9783368).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20400
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20470: [SPARK-23296][YARN] Include stacktrace in YARN-app diagn...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20470
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20400
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86919/
Test PASSed.


---




[GitHub] spark issue #19219: [SPARK-21993][SQL] Close sessionState when finish

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19219
  
**[Test build #86911 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86911/testReport)**
 for PR 19219 at commit 
[`e421113`](https://github.com/apache/spark/commit/e4211137bdc72c3e94d7bce2944d108e5cb70b55).
 * This patch **fails PySpark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #19219: [SPARK-21993][SQL] Close sessionState when finish

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19219
  
Build finished. Test FAILed.


---




[GitHub] spark issue #19219: [SPARK-21993][SQL] Close sessionState when finish

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19219
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86911/
Test FAILed.


---




[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-02-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20400#discussion_r165284238
  
--- Diff: python/pyspark/sql/window.py ---
@@ -120,20 +122,46 @@ def rangeBetween(start, end):
 and "5" means the five off after the current row.
 
 We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
-and ``Window.currentRow`` to specify special boundary values, rather than using integral
-values directly.
+``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
+``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
+to specify special boundary values, rather than using integral values directly.
 
 :param start: boundary start, inclusive.
-  The frame is unbounded if this is ``Window.unboundedPreceding``, or
+  The frame is unbounded if this is ``Window.unboundedPreceding``,
+  a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
   any value less than or equal to max(-sys.maxsize, -9223372036854775808).
 :param end: boundary end, inclusive.
-The frame is unbounded if this is ``Window.unboundedFollowing``, or
+The frame is unbounded if this is ``Window.unboundedFollowing``,
+a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
 any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
+>>> from pyspark.sql import functions as F, SparkSession, Window
+>>> spark = SparkSession.builder.getOrCreate()
+>>> df = spark.createDataFrame(
+... [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
+>>> window = Window.orderBy("id").partitionBy("category").rangeBetween(
+... F.currentRow(), F.lit(1))
+>>> df.withColumn("sum", F.sum("id").over(window)).show()
++---+--------+---+
+| id|category|sum|
++---+--------+---+
+|  1|       b|  3|
+|  2|       b|  5|
+|  3|       b|  3|
+|  1|       a|  4|
+|  1|       a|  4|
+|  2|       a|  2|
++---+--------+---+
+
 """
-if start <= Window._PRECEDING_THRESHOLD:
-    start = Window.unboundedPreceding
-if end >= Window._FOLLOWING_THRESHOLD:
-    end = Window.unboundedFollowing
+if isinstance(start, (int, long)) and isinstance(end, (int, long)):
--- End diff --

Is it possible that we mix int and Column in the parameters?
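
A small, Spark-free sketch of why the mixed case matters. It assumes that the body of the new `isinstance` guard performs the same clamping as the removed lines; the hunk is cut off before the body, so that is an assumption. `FakeColumn`, `clamp_both`, `clamp_each` and the module-level constants are hypothetical stand-ins, and `long` is omitted because it does not exist on Python 3.

```python
import sys

# Stand-ins for illustration; the real values live on pyspark.sql.window.Window.
PRECEDING_THRESHOLD = max(-sys.maxsize, -9223372036854775808)
FOLLOWING_THRESHOLD = min(sys.maxsize, 9223372036854775807)
UNBOUNDED_PRECEDING = -9223372036854775808
UNBOUNDED_FOLLOWING = 9223372036854775807


class FakeColumn(object):
    """Hypothetical placeholder for a pyspark.sql.Column boundary."""


def clamp_both(start, end):
    """Same shape as the guard in the diff: clamp only if both bounds are ints."""
    if isinstance(start, int) and isinstance(end, int):
        if start <= PRECEDING_THRESHOLD:
            start = UNBOUNDED_PRECEDING
        if end >= FOLLOWING_THRESHOLD:
            end = UNBOUNDED_FOLLOWING
    return start, end


def clamp_each(start, end):
    """Alternative: check each bound on its own, so a mixed call still clamps."""
    if isinstance(start, int) and start <= PRECEDING_THRESHOLD:
        start = UNBOUNDED_PRECEDING
    if isinstance(end, int) and end >= FOLLOWING_THRESHOLD:
        end = UNBOUNDED_FOLLOWING
    return start, end


column_end = FakeColumn()
mixed_both = clamp_both(-sys.maxsize, column_end)  # int start, Column end
mixed_each = clamp_each(-sys.maxsize, column_end)
```

Under these assumptions, a call that mixes an integer with a column skips the clamping entirely in `clamp_both`, while `clamp_each` still normalizes the integer side. Whether mixed arguments should be supported at all, or rejected with an error, is a separate design decision.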


---




[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-02-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20400#discussion_r165284328
  
--- Diff: python/pyspark/sql/window.py ---
@@ -208,20 +236,27 @@ def rangeBetween(self, start, end):
 and "5" means the five off after the current row.
 
 We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
-and ``Window.currentRow`` to specify special boundary values, rather than using integral
-values directly.
+``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
+``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
+to specify special boundary values, rather than using integral values directly.
 
 :param start: boundary start, inclusive.
-  The frame is unbounded if this is ``Window.unboundedPreceding``, or
+  The frame is unbounded if this is ``Window.unboundedPreceding``,
+  a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
   any value less than or equal to max(-sys.maxsize, -9223372036854775808).
 :param end: boundary end, inclusive.
-The frame is unbounded if this is ``Window.unboundedFollowing``, or
+The frame is unbounded if this is ``Window.unboundedFollowing``,
+a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
 any value greater than or equal to min(sys.maxsize, 9223372036854775807).
 """
-if start <= Window._PRECEDING_THRESHOLD:
-    start = Window.unboundedPreceding
-if end >= Window._FOLLOWING_THRESHOLD:
-    end = Window.unboundedFollowing
+if isinstance(start, (int, long)) and isinstance(end, (int, long)):
--- End diff --

ditto.


---




[GitHub] spark pull request #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColum...

2018-02-01 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/20471

[SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow.getMap()`.

## What changes were proposed in this pull request?

This is a follow-up PR to #20450.
We should've enabled `MutableColumnarRow.getMap()` as well.

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-23280/fup2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20471.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20471


commit af757ef04626df632b47b39c49ec91bdec177051
Author: Takuya UESHIN 
Date:   2018-02-01T08:19:56Z

Enable `MutableColumnarRow.getMap()`.




---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20471
  
cc @cloud-fan @viirya 


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20471
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20471
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/471/
Test PASSed.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20471
  
**[Test build #86922 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86922/testReport)**
 for PR 20471 at commit 
[`af757ef`](https://github.com/apache/spark/commit/af757ef04626df632b47b39c49ec91bdec177051).


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20455
  
Do we also need to add `if (isNullAt(rowId)) return null;` to 
`WritableColumnVector.getMap()`?
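
For readers unfamiliar with the pattern, a toy Python model of the guard being discussed. `TinyMapVector` and its methods are hypothetical and do not mirror the real Java classes; the sketch only shows why a getter that skips the null check can hide a null slot from the caller.

```python
class TinyMapVector(object):
    """Hypothetical stand-in for a column vector that stores map values."""

    def __init__(self, values):
        # A None entry marks a null slot; every other entry is a dict.
        self._values = list(values)

    def is_null_at(self, row_id):
        return self._values[row_id] is None

    def get_map_unchecked(self, row_id):
        # Without a guard the caller gets whatever backs the slot.
        # That is modeled here as an empty dict, which hides the null.
        value = self._values[row_id]
        return dict(value) if value is not None else {}

    def get_map(self, row_id):
        # Python analogue of `if (isNullAt(rowId)) return null;`.
        if self.is_null_at(row_id):
            return None
        return dict(self._values[row_id])


vector = TinyMapVector([{"a": 1}, None])
checked = [vector.get_map(i) for i in range(2)]
unchecked = [vector.get_map_unchecked(i) for i in range(2)]
```

In this model the guarded getter reports the null slot as `None`, while the unguarded one returns an empty map that is indistinguishable from a real empty value.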


---




[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20400
  
Yup, we could also accept a string as a column, but I was thinking of matching
the signature with the Scala one for now, just for consistency.

On 1 Feb 2018 5:24 pm, "Liang-Chi Hsieh"  wrote:

*@viirya* commented on this pull request.
--

In python/pyspark/sql/window.py:

>  """
-if start <= Window._PRECEDING_THRESHOLD:
-    start = Window.unboundedPreceding
-if end >= Window._FOLLOWING_THRESHOLD:
-    end = Window.unboundedFollowing
+if isinstance(start, (int, long)) and isinstance(end, (int, long)):

Is it possible that we mix int and Column in the parameters?
--

In python/pyspark/sql/window.py:

>  any value greater than or equal to min(sys.maxsize, 9223372036854775807).
 """
-if start <= Window._PRECEDING_THRESHOLD:
-    start = Window.unboundedPreceding
-if end >= Window._FOLLOWING_THRESHOLD:
-    end = Window.unboundedFollowing
+if isinstance(start, (int, long)) and isinstance(end, (int, long)):

ditto.




---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20455
  
@ueshin Yes, I missed it. Thanks, I'll add it.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/472/
Test PASSed.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20468
  
LGTM


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86923 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86923/testReport)**
 for PR 20455 at commit 
[`35548e6`](https://github.com/apache/spark/commit/35548e6d30211cf155a366da2ad736d1281367bf).


---




[GitHub] spark pull request #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColum...

2018-02-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20471#discussion_r165288359
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java ---
@@ -162,8 +162,9 @@ public ColumnarArray getArray(int ordinal) {
   }
 
   @Override
-  public MapData getMap(int ordinal) {
-    throw new UnsupportedOperationException();
+  public ColumnarMap getMap(int ordinal) {
+    if (columns[ordinal].isNullAt(rowId)) return null;
--- End diff --

I will remove this null check in #20455 later after this gets merged.


---




[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...

2018-02-01 Thread ssaavedra
Github user ssaavedra commented on the issue:

https://github.com/apache/spark/pull/20383
  
Yes, sorry for the misunderstanding; I was probably also too eager with 
this. However, if the changes I described above don't work, I am not sure 
what I'm missing. I'll take a further look at it tomorrow. If any of you are 
at FOSDEM this weekend, we could look at it together there.


---




[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20454
  
**[Test build #86914 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86914/testReport)**
 for PR 20454 at commit 
[`4ae9b5e`](https://github.com/apache/spark/commit/4ae9b5e4da575066fc36753793fa6437f18a1ddf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20454
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20454
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86914/
Test PASSed.


---




[GitHub] spark pull request #20455: [SPARK-23284][SQL] Document the behavior of sever...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20455#discussion_r165292391
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1261,140 @@ class ColumnarBatchSuite extends SparkFunSuite {
     batch.close()
     allocator.close()
   }
+
+  testVector("getUTF8String should return null for null slot", 4, StringType) {
--- End diff --

We already have test cases for each type. Can we just change the existing 
test cases a little to add this null check?


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/473/
Test PASSed.


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20464
  
**[Test build #86924 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86924/testReport)**
 for PR 20464 at commit 
[`95c8a4e`](https://github.com/apache/spark/commit/95c8a4e48e8f760bb9ca0df844136d19452521d7).


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20468
  
**[Test build #86916 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86916/testReport)**
 for PR 20468 at commit 
[`c44c477`](https://github.com/apache/spark/commit/c44c47701d337328493080a83d012abb35065ac2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20468
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20468
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86916/
Test FAILed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20468
  
Jenkins, retest this please.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20468
  
**[Test build #86925 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86925/testReport)**
 for PR 20468 at commit 
[`c44c477`](https://github.com/apache/spark/commit/c44c47701d337328493080a83d012abb35065ac2).


---




[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20460
  
**[Test build #86915 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86915/testReport)**
 for PR 20460 at commit 
[`d9805c3`](https://github.com/apache/spark/commit/d9805c3e4d4795f866e72f3c30f8ca29db90761d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20460
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20460
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86915/
Test FAILed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/474/
Test PASSed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20468
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-02-01 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/20459
  
Merged to master / branch-2.3. Thanks @yanboliang !


---




[GitHub] spark pull request #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs...

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20459


---




[GitHub] spark pull request #20455: [SPARK-23284][SQL] Document the behavior of sever...

2018-02-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20455#discussion_r165298199
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1261,140 @@ class ColumnarBatchSuite extends SparkFunSuite {
     batch.close()
     allocator.close()
   }
+
+  testVector("getUTF8String should return null for null slot", 4, StringType) {
--- End diff --

That sounds ok to me. I'll commit the change by tonight.


---




[GitHub] spark pull request #20455: [SPARK-23284][SQL] Document the behavior of sever...

2018-02-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20455#discussion_r165300767
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1261,140 @@ class ColumnarBatchSuite extends SparkFunSuite {
     batch.close()
     allocator.close()
   }
+
+  testVector("getUTF8String should return null for null slot", 4, StringType) {
--- End diff --

It seems we don't have individual tests for the binary and decimal types?


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86926 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86926/testReport)**
 for PR 20455 at commit 
[`923d0fe`](https://github.com/apache/spark/commit/923d0fe042befe722905791fd8dfcb42003f5e15).


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/475/
Test PASSed.


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20464
  
**[Test build #86924 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86924/testReport)**
 for PR 20464 at commit 
[`95c8a4e`](https://github.com/apache/spark/commit/95c8a4e48e8f760bb9ca0df844136d19452521d7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86924/
Test PASSed.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165308205
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -653,6 +666,7 @@ class ALSSuite
   test("ALS cold start user/item prediction strategy") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

nit: no need for empty line here


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165312057
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -662,28 +676,32 @@ class ALSSuite
 val knownItem = data.select(max("item")).as[Int].first()
 val unknownItem = knownItem + 20
 val test = Seq(
-  (unknownUser, unknownItem),
-  (knownUser, unknownItem),
-  (unknownUser, knownItem),
-  (knownUser, knownItem)
-).toDF("user", "item")
+  (unknownUser, unknownItem, true),
+  (knownUser, unknownItem, true),
+  (unknownUser, knownItem, true),
+  (knownUser, knownItem, false)
+).toDF("user", "item", "expectedIsNaN")
 
 val als = new ALS().setMaxIter(1).setRank(1)
 // default is 'nan'
 val defaultModel = als.fit(data)
-val defaultPredictions = 
defaultModel.transform(test).select("prediction").as[Float].collect()
-assert(defaultPredictions.length == 4)
-assert(defaultPredictions.slice(0, 3).forall(_.isNaN))
-assert(!defaultPredictions.last.isNaN)
+var defaultPredictionNotNaN = Float.NaN
--- End diff --

I would get rid of this variable.
In `testTransformer` it just adds overhead:
`assert(!defaultPredictionNotNaN.isNaN)` asserts something that was already
checked in `testTransformer`, so its only use is in
`testTransformerByGlobalCheckFunc`.
Producing it is a bit convoluted, and it's not easy to understand why it's
needed.
I would make it clearer by doing a plain old transform using the `test` DF
(or a smaller one containing only the knownUser, knownItem pair) and selecting
the value.
An alternative solution could be to use real expected values in the `test`
DF instead of "isNaN" flags.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165312157
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +711,9 @@ class ALSSuite
 val data = ratings.toDF
 val model = new ALS().fit(data)
 Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
-  model.setColdStartStrategy(s).transform(data)
+  testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), 
"prediction") {
+case _ =>
--- End diff --

Just like above, no need for partial function.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165308129
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -628,18 +635,24 @@ class ALSSuite
 }
 withClue("transform should fail when ids exceed integer range. ") {
   val model = als.fit(df)
-  assert(intercept[SparkException] {
-model.transform(df.select(df("user_big").as("user"), 
df("item"))).first
-  }.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-model.transform(df.select(df("user_small").as("user"), 
df("item"))).first
-  }.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-model.transform(df.select(df("item_big").as("item"), 
df("user"))).first
-  }.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-model.transform(df.select(df("item_small").as("item"), 
df("user"))).first
-  }.getMessage.contains(msg))
+  def testTransformIdExceedsIntRange[A : Encoder](dataFrame: 
DataFrame): Unit = {
+assert(intercept[SparkException] {
+  model.transform(dataFrame).first
+}.getMessage.contains(msg))
+assert(intercept[StreamingQueryException] {
+  testTransformer[A](dataFrame, model, "prediction") {
+case _ =>
--- End diff --

No need for a partial function here, you can simplify it to `{ _ => }`. 
I would also add a small comment to make it explicit that we intentionally 
do not check anything.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165305989
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +599,15 @@ class ALSSuite
   (ex, act) =>
 ex.userFactors.first().getSeq[Float](1) === 
act.userFactors.first.getSeq[Float](1)
 } { (ex, act, _) =>
-  ex.transform(_: 
DataFrame).select("prediction").first.getDouble(0) ~==
-act.transform(_: 
DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+  testTransformerByGlobalCheckFunc[Float](_: DataFrame, ex, 
"prediction") {
+case exRows: Seq[Row] =>
--- End diff --

I think it's ok to keep ex.transform here. This way the code will be a bit 
simpler.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165304423
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -566,6 +565,7 @@ class ALSSuite
   test("read/write") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

nit: new line is not needed


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Yup, explicit logging sounds fine for now so that we can easily check.

>  I do prefer to have these conditional skips removed because sometimes it 
is hard to tell if everything passed or was just skipped

To be clear, I think that's more because our own testing script doesn't show
the skipped-test output from unittest in the console.

Also, I think the problem is that we couldn't make sure Pandas and Arrow were
installed properly in the Jenkins testing env, not that we skip tests
related to extra dependencies when they are not installed. Making them
required dependencies is a big deal IMHO.

FYI, I tried to install PyArrow with PyPy last time and failed. I wonder
if we can easily install it.
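
The concern about skipped tests being invisible in the console can be illustrated with plain `unittest`. This is only a sketch, not Spark's test runner: the package name `hypothetical_optional_pkg` and the helper `run_and_summarize` are made up for illustration. It shows that a skipped test is still counted in `testsRun`, so a summary has to inspect `result.skipped` explicitly to tell "passed" apart from "skipped".

```python
import importlib.util
import io
import unittest

# Assumption: this package name is hypothetical and is not installed anywhere.
HAVE_OPTIONAL_PKG = importlib.util.find_spec("hypothetical_optional_pkg") is not None
SKIP_REASON = "hypothetical_optional_pkg not installed"


class OptionalDependencyTests(unittest.TestCase):
    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)

    @unittest.skipIf(not HAVE_OPTIONAL_PKG, SKIP_REASON)
    def test_needs_optional_pkg(self):
        self.assertTrue(True)


def run_and_summarize():
    """Run the suite quietly and return (tests started, list of skip reasons)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalDependencyTests)
    # Capture the runner's own output so that only our summary is reported.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    result = runner.run(suite)
    return result.testsRun, [reason for _test, reason in result.skipped]


if __name__ == "__main__":
    started, skipped = run_and_summarize()
    print("started=%d skipped=%d" % (started, len(skipped)))
    for reason in skipped:
        print("SKIPPED: " + reason)
```

Printing the skip reasons at the end of a run is one way a test script could make the difference visible without turning the optional packages into hard requirements.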


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86918/testReport)**
 for PR 20455 at commit 
[`7a1fd57`](https://github.com/apache/spark/commit/7a1fd57925a080116c288ca1793af86258019494).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86918/
Test PASSed.


---




[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20469
  
**[Test build #86920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86920/testReport)**
 for PR 20469 at commit 
[`15d67ee`](https://github.com/apache/spark/commit/15d67eee9baa87a8fa08a265549000386fd476a6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Also, if we go this way, I think we should enable some tests with
PyPy too, if I understood correctly and there isn't another problem I may have
missed:


https://github.com/apache/spark/blob/9623a98248837da302ba4ec240335d1c4268ee21/dev/sparktestsupport/modules.py#L457

At least, PyPy in my local has `numpy`.



---




[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20469
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20469
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86920/
Test FAILed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20468
  
LGTM


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
I agree that pandas and pyarrow should not be a hard requirement for users,
and that is how it is today: PySpark only throws an exception when users try to
use pandas-related functions without pandas/pyarrow installed.

My proposal is, pandas and pyarrow should be a hard requirement for our 
jenkins, to make sure the features are well tested.

If there is a way to prove that py3 tests run well, and the environment
issue is hard to fix, then maybe we can deal with it later, after 2.3.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86921 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86921/testReport)**
 for PR 20455 at commit 
[`5246fcc`](https://github.com/apache/spark/commit/5246fcc5bb5936d64991fe7eb6acdd4cbdc25e05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86921/
Test PASSed.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Thank you for bearing with me @cloud-fan. I agree with it.

BTW, are you working on the logging thing? I was thinking the simplest
way to check is to just print out once whether PyArrow / Pandas are installed or not.
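
A minimal sketch of such a one-time report, assuming it would be called once at the start of a test run. The function name `optional_dependency_report` is hypothetical and nothing here is taken from Spark's test scripts. It uses `importlib.util.find_spec`, which looks a package up without importing it.

```python
import importlib.util


def optional_dependency_report(names=("pandas", "pyarrow")):
    """Return one status line per package, without importing the package."""
    lines = []
    for name in names:
        # find_spec returns None when a top-level package cannot be located.
        found = importlib.util.find_spec(name) is not None
        lines.append("%s: %s" % (name, "installed" if found else "NOT installed"))
    return lines


if __name__ == "__main__":
    # Print the report once, before any tests are scheduled.
    for line in optional_dependency_report():
        print(line)
```

Because the lookup does not import anything, the report stays cheap and cannot fail on a broken installation of the package itself; whether that is the desired behaviour is a judgment call.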


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
> My proposal is, pandas and pyarrow should be a hard requirement for our 
jenkins, to make sure the features are well tested.

If this is the goal, I think another simple way, in the future, is to use an
env variable set in Jenkins and throw an exception if PyArrow or Pandas is not
installed.
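
One way this could look, as a sketch under stated assumptions: the environment variable name `SPARK_TEST_REQUIRE_OPTIONAL_DEPS` and the function `check_required_test_deps` are hypothetical and do not exist in Spark. When the variable is set to `1` (as a CI job might do), missing packages raise an error; otherwise they are only reported back to the caller.

```python
import importlib.util
import os

# Hypothetical variable name, assumed to be exported only by the CI job.
STRICT_ENV_VAR = "SPARK_TEST_REQUIRE_OPTIONAL_DEPS"


def check_required_test_deps(environ=None, names=("pandas", "pyarrow")):
    """Return the missing packages; raise instead when strict mode is on."""
    environ = os.environ if environ is None else environ
    missing = [n for n in names if importlib.util.find_spec(n) is None]
    strict = environ.get(STRICT_ENV_VAR) == "1"
    if strict and missing:
        raise RuntimeError(
            "%s=1 but these packages are not installed: %s"
            % (STRICT_ENV_VAR, ", ".join(missing))
        )
    return missing


if __name__ == "__main__":
    print("missing optional packages:", check_required_test_deps())
```

Local developers who never set the variable keep today's behaviour, where tests that need the missing packages are skipped.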


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20471
  
LGTM


---




[GitHub] spark pull request #20455: [SPARK-23284][SQL] Document the behavior of sever...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20455#discussion_r165324640
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
 ---
@@ -1261,4 +1269,38 @@ class ColumnarBatchSuite extends SparkFunSuite {
 batch.close()
 allocator.close()
   }
+
+  testVector("getDecimal should return null for null slot", 4, 
DecimalType.IntDecimal) {
--- End diff --

Shall we make it a normal test case for decimal type? We can follow the
other tests, e.g. create a decimal array and check the value of the column vector
at the same index.


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20466
  
retest this please


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20466
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/476/
Test PASSed.


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20466
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20466
  
**[Test build #86927 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86927/testReport)**
 for PR 20466 at commit 
[`6e55d10`](https://github.com/apache/spark/commit/6e55d1000c62a86c14ad993d3699b0ed99f53cbb).


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
I've not worked on the logging stuff yet, feel free to take it, thanks!


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
@cloud-fan, will try it. Thank you sincerely.


---




[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread lucio-yz
GitHub user lucio-yz opened a pull request:

https://github.com/apache/spark/pull/20472

[SPARK-22751][ML]Improve ML RandomForest shuffle performance

## What changes were proposed in this pull request?

As I mentioned in 
[SPARK-22751](https://issues.apache.org/jira/browse/SPARK-22751?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20ML%20AND%20text%20~%20randomforest),
 there is a shuffle performance problem in ML RandomForest when training an RF on
high-dimensional data.

The reason is that, in org.apache.spark.ml.tree.impl.RandomForest, the
function findSplitsBySorting actually flatMaps a sparse vector into a dense
vector, so the subsequent groupByKey produces a huge shuffle write size.

To avoid this, we can add a filter after the flatMap to filter out zero values.
In findSplitsForContinuousFeature, we can then infer the number of zero
values by passing a parameter numInput, which is the number of samples.

In addition, if a feature contains only zero values, continuousSplits will
not have a key for that feature id, so I added a check when using continuousSplits.
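
The idea can be sketched outside Spark in a few lines of Python. This is an illustration under assumptions, not the code in the patch: sparse rows are modelled as plain dicts, and the names `nonzero_values_by_feature` and `value_counts_with_zeros` are hypothetical. Zeros are dropped before grouping, and their count per feature is recovered afterwards from the total number of samples.

```python
from collections import defaultdict


def nonzero_values_by_feature(rows):
    """Group values by feature index, skipping zeros (the added filter step).

    Each row is a dict {feature_index: value}; absent indices mean zero.
    """
    grouped = defaultdict(list)
    for row in rows:
        for index, value in row.items():
            if value != 0.0:
                grouped[index].append(value)
    return dict(grouped)


def value_counts_with_zeros(nonzero_values, num_input):
    """Rebuild (value, count) pairs, inferring the zero count from num_input."""
    counts = defaultdict(int)
    for value in nonzero_values:
        counts[value] += 1
    num_zeros = num_input - len(nonzero_values)
    if num_zeros > 0:
        counts[0.0] += num_zeros
    return sorted(counts.items())


if __name__ == "__main__":
    rows = [{0: 1.0, 2: 0.0}, {0: 1.0}, {1: 3.0}]
    grouped = nonzero_values_by_feature(rows)
    for feature in range(3):
        # A feature with only zeros has no key, hence the .get() default.
        print(feature, value_counts_with_zeros(grouped.get(feature, []), len(rows)))
```

The `.get(feature, [])` lookup mirrors the extra check mentioned above: a feature whose values are all zero never appears as a key after filtering.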

## How was this patch tested?
Ran model locally using spark-submit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lucio-yz/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20472


commit 50cb173dd34dc353c243b97f2686a8c545a03909
Author: lucio <576632108@...>
Date:   2018-02-01T09:47:52Z

fix mllib randomforest shuffle issue




---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20472
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20472
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20461
  
**[Test build #86917 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86917/testReport)**
 for PR 20461 at commit 
[`fed6dc2`](https://github.com/apache/spark/commit/fed6dc25c6293cad08e6759bc0a1cf414b91dfd0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20461
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20461
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86917/
Test PASSed.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20471
  
**[Test build #86922 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86922/testReport)**
 for PR 20471 at commit 
[`af757ef`](https://github.com/apache/spark/commit/af757ef04626df632b47b39c49ec91bdec177051).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20471
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20471
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86922/
Test PASSed.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86923 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86923/testReport)**
 for PR 20455 at commit 
[`35548e6`](https://github.com/apache/spark/commit/35548e6d30211cf155a366da2ad736d1281367bf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86923/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/20461
  
@cloud-fan thanks a lot for the ping. LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20454
  
Adding a default method to a Java interface is binary compatible. I'm
merging this to master only, following @rxin's suggestion, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20468
  
**[Test build #86925 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86925/testReport)**
 for PR 20468 at commit 
[`c44c477`](https://github.com/apache/spark/commit/c44c47701d337328493080a83d012abb35065ac2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



