[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87295/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20571
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20571
  
**[Test build #87295 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87295/testReport)**
 for PR 20571 at commit 
[`81c1b24`](https://github.com/apache/spark/commit/81c1b2407ceb478d6795438de82ac6afe65024c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20575: [SPARK-23386][DEPLOY] enable direct application links in...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20575
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20575: [SPARK-23386][DEPLOY] enable direct application links in...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20575
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20575: [SPARK-23386][DEPLOY] enable direct application l...

2018-02-10 Thread gerashegalov
GitHub user gerashegalov opened a pull request:

https://github.com/apache/spark/pull/20575

[SPARK-23386][DEPLOY] enable direct application links in SHS before replay

## What changes were proposed in this pull request?
Make direct job links available from the scan thread, before the full replay. Otherwise, direct job links might not be available for hours.

## How was this patch tested?
Tested with a deployment of multiple 10k apps. This is currently a prototype for YARN, but it should be generalizable.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gerashegalov/spark gera/logs-events-from-listing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20575


commit e27880263f36a7b8beee62c902389c293bb2a17e
Author: Gera Shegalov 
Date:   2018-02-09T15:05:12Z

List-driven bootstrap replay

(cherry picked from commit 0d4e2a2215bb9e102ce449c52bcf7c3d44fc6d44)




---




[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...

2018-02-10 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/20574
  
cc @jiangxb1987 


---




[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20574
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20574
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...

2018-02-10 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/20574
  
@jerryshao Could you find some time to help review?


---




[GitHub] spark pull request #20574: [SPARK-23385][CORE] Allow SparkUITab to be custom...

2018-02-10 Thread LantaoJin
GitHub user LantaoJin opened a pull request:

https://github.com/apache/spark/pull/20574

[SPARK-23385][CORE] Allow SparkUITab to be customized adding in SparkConf and loaded when creating SparkUI

## What changes were proposed in this pull request?

It would be nice to have a mechanism that allows customized SparkUITabs (like the embedded Jobs, Stages, Storage, Environment, Executors, ... tabs) to be registered through SparkConf settings. This would be more flexible when we need to display some special information in the UI, rather than adding embedded tabs one by one and waiting for the community to merge them.

I propose to introduce a new configuration option, spark.extraUITabs, that allows customized WebUITabs to be specified in SparkConf and registered when SparkUI is created. Here is the proposed documentation for the new option:

> A comma-separated list of classes that implement SparkUITab; when initializing SparkUI, instances of these classes will be created and registered to the tabs array in SparkUI. If a class has a two-argument constructor that accepts a SparkUI and an AppStatusStore, that constructor will be called; if a class has a single-argument constructor that accepts a SparkUI, that constructor will be called; otherwise, a zero-argument constructor will be called. If no valid constructor can be found, the SparkUI creation will fail with an exception.
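The constructor fallback order in the proposed `spark.extraUITabs` documentation can be sketched in Python (the real feature would live in Spark's JVM code and use reflection). Everything below is hypothetical: `SparkUI` and `AppStatusStore` are empty stand-ins, `registry` replaces loading classes by name, and `inspect.signature(...).bind` plays the role of looking up a constructor with a matching parameter list.

```python
import inspect


class SparkUI:
    """Hypothetical stand-in for Spark's SparkUI."""


class AppStatusStore:
    """Hypothetical stand-in for Spark's AppStatusStore."""


def _accepts(cls, args):
    """True if cls's constructor signature can bind exactly these positional args."""
    try:
        inspect.signature(cls).bind(*args)
        return True
    except (TypeError, ValueError):
        return False


def create_extra_tab(cls, ui, store):
    """Instantiate cls, trying (ui, store), then (ui,), then () in that order."""
    for args in ((ui, store), (ui,), ()):
        if _accepts(cls, args):
            return cls(*args)
    # No shape matched: mirrors "SparkUI creation will fail with an exception".
    raise ValueError(f"No valid constructor found for {cls.__name__}")


def load_extra_tabs(conf_value, registry, ui, store):
    """Build one tab per name in a comma-separated list (hypothetical registry lookup)."""
    names = [name.strip() for name in conf_value.split(",") if name.strip()]
    return [create_extra_tab(registry[name], ui, store) for name in names]


# Example tab classes, one per constructor shape (all hypothetical).
class FullTab:
    def __init__(self, ui, store):
        self.ui, self.store = ui, store


class UiOnlyTab:
    def __init__(self, ui):
        self.ui = ui


class PlainTab:
    def __init__(self):
        self.name = "plain"


class BrokenTab:
    def __init__(self, a, b, c):  # matches none of the three shapes
        pass
```

Trying the richest shape first means a class with a default argument, e.g. `__init__(self, ui, store=None)`, receives both objects.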

## How was this patch tested?
1. Added a unit test.
2. Checked the WebUI for a new tab called "Test", started via
`bin/spark-shell --master local --driver-class-path /path/spark-core_2.11-*-tests.jar --conf spark.extraUITabs=org.apache.spark.ui.TestUITab`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LantaoJin/spark SPARK-23385

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20574


commit fb9a8a1be7fc515848b0906af8af31c4c8081807
Author: LantaoJin 
Date:   2018-02-11T06:56:01Z

[SPARK-23385][CORE] Allow SparkUITab to be customized adding in SparkConf 
and loaded when creating SparkUI




---




[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fall back to ...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20567#discussion_r167423077
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1941,12 +1941,24 @@ def toPandas(self):
 timezone = None
 
 if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true":
+should_fall_back = False
 try:
-from pyspark.sql.types import _check_dataframe_convert_date, \
-_check_dataframe_localize_timestamps
+from pyspark.sql.types import to_arrow_schema
 from pyspark.sql.utils import require_minimum_pyarrow_version
-import pyarrow
 require_minimum_pyarrow_version()
+# Check if its schema is convertible in Arrow format.
+to_arrow_schema(self.schema)
+except Exception as e:
+# Fallback to convert to Pandas DataFrame without arrow if raise some exception
--- End diff --

Yup. It does fall back for an unsupported schema, a PyArrow version mismatch, and missing PyArrow. Will add a note in the PR description.
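A minimal sketch of the fall-back pattern described above: run every pre-flight check (PyArrow present, version new enough, schema convertible) inside one `try`, and take the non-Arrow path if any of them raises. The version threshold, the set of unsupported types, and all function names are placeholders for illustration, not PySpark's API; the actual conversion is replaced by a trivial column builder.

```python
import warnings

# Hypothetical set of column types this sketch treats as not convertible to Arrow.
UNSUPPORTED_ARROW_TYPES = {"map"}
# Placeholder minimum PyArrow version, not taken from the PR.
MINIMUM_PYARROW = (0, 8, 0)


def check_arrow_ready(schema, pyarrow_version):
    """Stand-in for the version and schema checks: raise if Arrow cannot be used."""
    if pyarrow_version is None:
        raise ImportError("PyArrow is not installed")
    if pyarrow_version < MINIMUM_PYARROW:
        raise ImportError(f"PyArrow {pyarrow_version} is older than {MINIMUM_PYARROW}")
    for name, dtype in schema:
        if dtype in UNSUPPORTED_ARROW_TYPES:
            raise TypeError(f"Column {name!r} has unsupported type {dtype!r}")


def _to_columns(rows, schema):
    """Shared conversion: row tuples -> {column name: list of values}."""
    return {name: [row[i] for row in rows] for i, (name, _) in enumerate(schema)}


def to_pandas_like(rows, schema, pyarrow_version, arrow_enabled=True):
    """Return (path, columns), where path is "arrow" or "plain"."""
    should_fall_back = False
    if arrow_enabled:
        try:
            check_arrow_ready(schema, pyarrow_version)
        except Exception as e:
            # Any exception during the checks triggers the fallback, as in the diff.
            warnings.warn(f"Arrow optimization unavailable, falling back: {e}")
            should_fall_back = True
        if not should_fall_back:
            # A real implementation would convert through Arrow here.
            return "arrow", _to_columns(rows, schema)
    return "plain", _to_columns(rows, schema)
```

Doing all checks before any conversion starts keeps the fallback cheap: nothing has to be undone when the Arrow path is rejected.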


---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20566
  
**[Test build #87301 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)**
 for PR 20566 at commit 
[`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).


---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20566
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20573: [SPARK-23384][WEB-UI]When it has no incomplete(completed...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20573
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20573: [SPARK-23384][WEB-UI]When it has no incomplete(completed...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20573
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20566
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/787/


---




[GitHub] spark pull request #20573: [SPARK-23384][WEB-UI]When it has no incomplete(co...

2018-02-10 Thread guoxiaolongzte
GitHub user guoxiaolongzte opened a pull request:

https://github.com/apache/spark/pull/20573

[SPARK-23384][WEB-UI]When it has no incomplete(completed) applications 
found, the last updated time is not formatted and client local time zone is not 
show in history server web ui.

## What changes were proposed in this pull request?

When no incomplete (or completed) applications are found, the last updated time is not formatted and the client's local time zone is not shown in the history server web UI. This is a bug.
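For reference, "formatted, with the time zone shown" means something like the following. This Python sketch only illustrates the intended display, not the web UI code the PR changes; the function name and the `YYYY-MM-DD HH:MM:SS <zone>` format are assumptions.

```python
from datetime import datetime, timedelta, timezone


def format_last_updated(epoch_millis, tz):
    """Render an epoch-millisecond timestamp as 'YYYY-MM-DD HH:MM:SS <zone>' in zone tz."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz).strftime("%Y-%m-%d %H:%M:%S %Z")
```

Passing the zone explicitly stands in for "the client's local time zone"; the same instant then renders differently per viewer, which is why the zone label matters.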

Before the fix:

![1](https://user-images.githubusercontent.com/26266482/36070635-264d7cf0-0f3a-11e8-8426-14135ffedb16.png)

After the fix:

![2](https://user-images.githubusercontent.com/26266482/36070651-8ec3800e-0f3a-11e8-991c-6122cc9539fe.png)


## How was this patch tested?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-23384

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20573


commit 0575d5eb402edcca0c67a5fa9001fd5e5183e34e
Author: guoxiaolong 
Date:   2018-02-11T06:43:20Z

[SPARK-23384][WEB-UI]When it has no incomplete(completed) applications 
found, the last updated time is not formatted and client local time zone is not 
show in history server web ui.




---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20566
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87299/


---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20566
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20566
  
**[Test build #87299 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)**
 for PR 20566 at commit 
[`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20561
  
**[Test build #87300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87300/testReport)**
 for PR 20561 at commit 
[`2e7a5ad`](https://github.com/apache/spark/commit/2e7a5ad9063d51116c1180b1c8285631edb8ce65).


---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20566
  
**[Test build #87299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)**
 for PR 20566 at commit 
[`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68).


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/786/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20566
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/785/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20566
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87294/


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20561
  
**[Test build #87298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87298/testReport)**
 for PR 20561 at commit 
[`5e93313`](https://github.com/apache/spark/commit/5e93313548f87351f58d3217ccedceafcef7083b).


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3

2018-02-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20511
  
First of all, ORC 1.4.2 was very safe because it contains only ORC-235, which removes redundant dependencies.

For ORC 1.4.3, the following five patches are included. 

1. ORC-298 Move the benchmark code base to non-Apache repository
2. ORC-240 Fix warnings from Maven
3. ORC-217 Duplicate rat plugins in pom.xml

The above three are trivial.

4. ORC-285 Empty vector batches of floats or doubles get java.io.EOFException
5. ORC-296 Work around HADOOP-15171; also fix stream contract

(4) only adds a workaround for `batchSize=0`. (5) may cause a performance difference.

In general, the patches look necessary, but I haven't run a full test against ORC 1.4.3.



Only ORC-296 might cause some performance difference.





---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20561
  
**[Test build #87294 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87294/testReport)**
 for PR 20561 at commit 
[`151a92d`](https://github.com/apache/spark/commit/151a92dff074bff26ad179bedbdd4b49f345ec93).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/784/


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3

2018-02-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20511
  
@dongjoon-hyun Could you go over the list of the resolved JIRAs in ORC 
1.4.2 and 1.4.3 that could cause the regressions?

We need to know the impact and the risk. If possible, could you also add a test case in Spark to ensure the issue is resolved after the upgrade?



---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87297/testReport)**
 for PR 20511 at commit 
[`5e45129`](https://github.com/apache/spark/commit/5e451294a1465f64739dda5d892ca3bdd808e6cf).


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/783/


---




[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20572
  
**[Test build #87296 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87296/testReport)**
 for PR 20572 at commit 
[`2ed51f1`](https://github.com/apache/spark/commit/2ed51f1f73ee75ffd08355265a72e68e83ef592d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87296/


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20511
  
Sure. No problem. BTW, is it applicable for Apache Spark 2.3?


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20511
  
+1 for 1.4.3


---




[GitHub] spark pull request #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorte...

2018-02-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20561#discussion_r167421526
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java ---
@@ -98,10 +99,22 @@ public UnsafeKVExternalSorter(
 numElementsForSpillThreshold,
 canUseRadixSort);
 } else {
-  // The array will be used to do in-place sort, which require half of the space to be empty.
-  // Note: each record in the map takes two entries in the array, one is record pointer,
-  // another is the key prefix.
-  assert(map.numKeys() * 2 <= map.getArray().size() / 2);
+  LongArray pointArray = map.getArray();
+  // `BytesToBytesMap`'s point array is only guaranteed to hold all the distinct keys, but
+  // `UnsafeInMemorySorter`'s point array need to hold all the entries. Since `BytesToBytesMap`
+  // can have duplicated keys, here we need a check to make sure the point array can hold
+  // all the entries in `BytesToBytesMap`.
+  // The point array will be used to do in-place sort, which requires half of the space to be
+  // empty. Note: each record in the map takes two entries in the point array, one is record
+  // pointer, another is key prefix. So the required size of point array is `numRecords * 4`.
+  // TODO: It's possible to change UnsafeInMemorySorter to have multiple entries with same key,
+  // so that we can always reuse the point array.
+  if (map.numValues() > pointArray.size() / 4) {
+// Here we ask the map to allocate memory, so that the memory manager won't ask the map
+// to spill, if the memory is not enough.
+pointArray = map.allocateArray(map.numValues() * 4L);
+  }
+
   // During spilling, the array in map will not be used, so we can borrow that and use it
   // as the underlying array for in-memory sorter (it's always large enough).
--- End diff --

Shall we update the comment here too?
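The sizing rule in the new comments works out to 4 point-array entries per record: 2 per record (pointer and key prefix), doubled because the in-place sort needs half the array empty. A small Python sketch of the old and new checks, with made-up numbers and hypothetical function names, shows why checking distinct keys is not enough when the map holds duplicated keys:

```python
def old_assert_holds(num_keys, array_size):
    """Original check: map.numKeys() * 2 <= map.getArray().size() / 2."""
    return num_keys * 2 <= array_size // 2


def sorter_array_size(num_values, map_array_size):
    """New logic: reuse the map's array if it has 4 entries per record,
    otherwise allocate num_values * 4 entries."""
    if num_values > map_array_size // 4:
        return num_values * 4  # corresponds to map.allocateArray(map.numValues() * 4L)
    return map_array_size


# Made-up example: a 16-entry array, 4 distinct keys, but 6 values (duplicated keys).
num_keys, num_values, array_size = 4, 6, 16
```

Here the old assertion passes (4 * 2 <= 16 / 2), yet 6 records need 6 * 4 = 24 entries, so reusing the 16-entry array would be too small.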


---




[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20572
  
**[Test build #87296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87296/testReport)**
 for PR 20572 at commit 
[`2ed51f1`](https://github.com/apache/spark/commit/2ed51f1f73ee75ffd08355265a72e68e83ef592d).


---




[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/782/


---




[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-10 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/20572

[SPARK-17147][STREAMING][KAFKA] Allow non-consecutive offsets

## What changes were proposed in this pull request?

Add a configuration spark.streaming.kafka.allowNonConsecutiveOffsets to 
allow streaming jobs to proceed on compacted topics (or other situations 
involving gaps between offsets in the log).
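To illustrate what a gap means for a reader, here is a hypothetical Python sketch (not the spark-streaming-kafka API) of consuming an offset range from a compacted topic, where some offsets no longer exist. A strict reader that requires each offset to be the previous one plus 1 fails on the gap; the `allow_non_consecutive` flag, loosely modeled on the proposed `spark.streaming.kafka.allowNonConsecutiveOffsets` setting, skips it instead.

```python
class OffsetGapError(Exception):
    """Raised by the strict reader when the next offset is not previous + 1."""


def read_offsets(records, from_offset, until_offset, allow_non_consecutive=False):
    """Collect values for offsets in [from_offset, until_offset).

    `records` is a list of (offset, value) pairs in ascending offset order,
    as a compacted topic might return them: some offsets are simply missing.
    """
    values = []
    expected = from_offset
    for offset, value in records:
        if offset < from_offset:
            continue
        if offset >= until_offset:
            break
        if offset != expected and not allow_non_consecutive:
            raise OffsetGapError(f"expected offset {expected}, got {offset}")
        values.append(value)
        expected = offset + 1
    return values
```

One consequence in this sketch: with gaps allowed, a range `[from, until)` can contain fewer than `until - from` records (4 instead of 5 for offsets 0, 1, 3, 4).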

## How was this patch tested?

Added a new unit test.

@justinrmiller has been testing this branch in production for a few weeks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 SPARK-17147

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20572


commit 3082de7e43e8c381dc2227005d1e0fc5bd2c3d29
Author: cody koeninger 
Date:   2016-10-08T21:21:48Z

[SPARK-17147][STREAMING][KAFKA] failing test for compacted topics

commit e8ea89ea10527c6723df4af2685004ea67d872cd
Author: cody koeninger 
Date:   2016-10-09T04:59:39Z

[SPARK-17147][STREAMING][KAFKA] test passing for compacted topics

commit 182943e36f596d0cb5841a9c63471bea1dd9047b
Author: cody koeninger 
Date:   2018-02-11T04:09:38Z

spark.streaming.kafka.allowNonConsecutiveOffsets

commit 89f4bc5f4de78cdcc22b5c9b26a27ee9263048c8
Author: cody koeninger 
Date:   2018-02-11T04:13:49Z

[SPARK-17147][STREAMING][KAFKA] remove stray param doc

commit 12e65bedddbcd2407598e69fa3c6fcbcdfc67e5d
Author: cody koeninger 
Date:   2018-02-11T04:28:22Z

[SPARK-17147][STREAMING][KAFKA] prepare for merge of master

commit 2ed51f1f73ee75ffd08355265a72e68e83ef592d
Author: cody koeninger 
Date:   2018-02-11T05:19:31Z

Merge branch 'master' into SPARK-17147




---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20511
  
@omalley Thanks for your quick reply!

@dongjoon-hyun Maybe we should directly bump to 1.4.3?


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87293/


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87293 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87293/testReport)**
 for PR 20511 at commit 
[`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20568#discussion_r167419965
  
--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
@@ -51,6 +51,22 @@ public void testKnownLongInputs() {
 Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
   }
 
+  @Test
+  public void testKnownBytesInputs() {
+byte[] test = "test".getBytes(StandardCharsets.UTF_8);
+Assert.assertEquals(-1167338989,
--- End diff --

Would it be better to compare against the murmur3 hash value computed by a Scala library?
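One way to follow this suggestion is to check the expected constant against an independent implementation. Below is a pure-Python reference version of standard MurmurHash3 x86_32 that could serve as that oracle. It is a sketch for cross-checking only, not Spark's `Murmur3_x86_32` code, and it returns the hash as a signed 32-bit value so it compares directly with a Java `int`. For `"test"` with seed 0 it yields 0xBA6BD213, which is -1167338989 as a signed int, matching the constant in the new test (assuming the suite's hasher is seeded with 0).

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Reference MurmurHash3 x86_32; returns a signed 32-bit int like Java."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    # Body: 4-byte little-endian blocks.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= mix_k(k)
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask
    # Tail: the 1-3 leftover bytes are packed into a single block and mixed
    # once, without the rotate/multiply step applied to full blocks.
    tail = data[4 * n_blocks:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))
    # Finalization ("fmix32").
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h
```

Inputs whose length is not a multiple of 4 exercise the tail handling, which is where an implementation is most likely to diverge from other libraries.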


---




[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20568#discussion_r167419960
  
--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
@@ -51,6 +51,22 @@ public void testKnownLongInputs() {
 Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
   }
 
+  @Test
--- End diff --

It would be good to add JIRA number with a short description as a comment 
(e.g. `SPARK-23381 ...`)


---




[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/781/


---




[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20571
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...

2018-02-10 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/20557#discussion_r167419765
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -539,15 +539,15 @@ case class DescribeTableCommand(
 throw new AnalysisException(
  s"DESC PARTITION is not allowed on a temporary view: ${table.identifier}")
   }
-  describeSchema(catalog.lookupRelation(table).schema, result, header = false)
+  describeSchema(catalog.lookupRelation(table).schema, result, header = true)
--- End diff --

The screenshot shows the effect of the fix: the statistics rows do not 
contain the header.

![2](https://user-images.githubusercontent.com/26266482/36069344-ba833c56-0f22-11e8-9ab6-26f0ae6285b7.png)


---




[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20571
  
**[Test build #87295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87295/testReport)**
 for PR 20571 at commit 
[`81c1b24`](https://github.com/apache/spark/commit/81c1b2407ceb478d6795438de82ac6afe65024c8).


---




[GitHub] spark pull request #20571: [SPARK-23383][Build][Minor]Make a distribution sh...

2018-02-10 Thread yaooqinn
GitHub user yaooqinn opened a pull request:

https://github.com/apache/spark/pull/20571

[SPARK-23383][Build][Minor]Make a distribution should exit with usage while 
detecting wrong options

## What changes were proposed in this pull request?
```shell
./dev/make-distribution.sh --name ne-1.0.0-SNAPSHOT xyz --tgz  -Phadoop-2.7
+++ dirname ./dev/make-distribution.sh
++ cd ./dev/..
++ pwd
+ SPARK_HOME=/Users/Kent/Documents/spark
+ DISTDIR=/Users/Kent/Documents/spark/dist
+ MAKE_TGZ=false
+ MAKE_PIP=false
+ MAKE_R=false
+ NAME=none
+ MVN=/Users/Kent/Documents/spark/build/mvn
+ ((  5  ))
+ case $1 in
+ NAME=ne-1.0.0-SNAPSHOT
+ shift
+ shift
+ ((  3  ))
+ case $1 in
+ break
+ '[' -z /Users/Kent/.jenv/candidates/java/current ']'
+ '[' -z /Users/Kent/.jenv/candidates/java/current ']'
++ command -v git
+ '[' /usr/local/bin/git ']'
++ git rev-parse --short HEAD
+ GITREV=98ea6a7
+ '[' '!' -z 98ea6a7 ']'
+ GITREVSTRING=' (git revision 98ea6a7)'
+ unset GITREV
++ command -v /Users/Kent/Documents/spark/build/mvn
+ '[' '!' /Users/Kent/Documents/spark/build/mvn ']'
++ /Users/Kent/Documents/spark/build/mvn help:evaluate 
-Dexpression=project.version xyz --tgz -Phadoop-2.7
++ grep -v INFO
++ tail -n 1
+ VERSION=' -X,--debug Produce execution debug 
output'
```
It is better to report the mistake and exit with the usage message than to `break`.
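A minimal Python sketch of the proposed control flow (the real script is bash; this only mirrors its option loop). The option names are inferred from the variables in the trace above (`NAME`, `MAKE_TGZ`, `MAKE_PIP`, `MAKE_R`, `MVN`), the `USAGE` text is hypothetical, and the rule that arguments starting with `-` are handed on to Maven is an assumption rather than necessarily what the patch does.

```python
from typing import Dict, List, Tuple

# Hypothetical usage text, for illustration only.
USAGE = ("usage: make-distribution.sh [--name NAME] [--tgz] [--pip] [--r] "
         "[--mvn MVN] <maven build options>")


def parse_args(argv: List[str]) -> Tuple[Dict[str, object], List[str]]:
    """Return (script options, remaining args for Maven); exit on bad input."""
    opts: Dict[str, object] = {"name": "none", "tgz": False, "pip": False,
                               "r": False, "mvn": None}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("--tgz", "--pip", "--r"):
            opts[arg[2:]] = True
            i += 1
        elif arg in ("--name", "--mvn"):
            if i + 1 >= len(argv):
                raise SystemExit(f"Error: {arg} requires a value\n{USAGE}")
            opts[arg[2:]] = argv[i + 1]
            i += 2
        elif arg.startswith("-"):
            # e.g. -Phadoop-2.7: stop parsing and pass the rest to Maven.
            break
        else:
            # Instead of silently breaking out and feeding "xyz" to Maven.
            raise SystemExit(f"Error: unrecognized argument {arg!r}\n{USAGE}")
    return opts, argv[i:]
```

With this rule, `--name ne-1.0.0-SNAPSHOT xyz --tgz -Phadoop-2.7` stops at `xyz` with the usage message instead of passing `xyz --tgz` to `mvn help:evaluate`.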

## How was this patch tested?

manually 

cc @srowen 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yaooqinn/spark SPARK-23383

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20571


commit 81c1b2407ceb478d6795438de82ac6afe65024c8
Author: Kent Yao 
Date:   2018-02-11T03:48:30Z

exit with usage




---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20561
  
**[Test build #87294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87294/testReport)**
 for PR 20561 at commit 
[`151a92d`](https://github.com/apache/spark/commit/151a92dff074bff26ad179bedbdd4b49f345ec93).


---




[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/780/


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread omalley
Github user omalley commented on the issue:

https://github.com/apache/spark/pull/20511
  
Sorry, I forgot to transition the JIRA issues for ORC 1.4.3, so they 
didn't show up in the search from the release notes.

The list of jiras closed by the 1.4.3 release is: https://s.apache.org/Fll8

There was an issue with the reader if you had an empty column of 
floats/doubles (ORC-285) and a compression issue that only seemed to hit LLAP 
(ORC-296).

We are about to start the ORC 1.5 release, but the ORC 1.4 release has been 
very stable.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20511
  
I am wondering what the difference is between ORC 1.4.3 and ORC 1.4.2. 
Their release notes are the SAME:  
https://orc.apache.org/news/2018/02/09/ORC-1.4.3/ and 
https://orc.apache.org/news/2018/02/09/ORC-1.4.2/
Could you help us figure out the exact list of JIRAs that differ between 
these two releases? 

Should we directly upgrade to 1.4.3? What is the release schedule for 
Apache ORC? Our Spark 2.4 will not be released until the second half of 2018. 
Which version of ORC is stable for production? Should we always upgrade to 
the latest version of ORC, or wait for more user feedback from the ORC 
community to know whether a version is stable?

cc @dongjoon-hyun @omalley 



---




[GitHub] spark pull request #20518: [SPARK-22119][FOLLOWUP][ML] Use spherical KMeans ...

2018-02-10 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/20518#discussion_r167417459
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -745,4 +763,27 @@ private[spark] class CosineDistanceMeasure extends DistanceMeasure {
   override def distance(v1: VectorWithNorm, v2: VectorWithNorm): Double = {
 1 - dot(v1.vector, v2.vector) / v1.norm / v2.norm
   }
+
+  /**
+   * Updates the value of `sum` adding the `point` vector.
+   * @param point a `VectorWithNorm` to be added to `sum` of a cluster
+   * @param sum the `sum` for a cluster to be updated
+   */
+  override def updateClusterSum(point: VectorWithNorm, sum: Vector): Unit = {
+axpy(1.0 / point.norm, point.vector, sum)
--- End diff --

In Scala, `1.0 / 0.0` evaluates to `Infinity`. What about throwing an 
exception directly instead?
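A minimal Python sketch of the explicit check being suggested. `update_cluster_sum` is a hypothetical stand-in for `CosineDistanceMeasure.updateClusterSum`, with vectors as plain lists; it performs the same `sum += point / norm` update as the `axpy(1.0 / point.norm, ...)` call. Note that Python already raises `ZeroDivisionError` for `1.0 / 0.0`, so the explicit check here mainly gives a clearer error message, whereas in Scala it is what prevents `Infinity` from silently entering the cluster sum.

```python
from typing import List


def update_cluster_sum(point: List[float], norm: float,
                       cluster_sum: List[float]) -> None:
    """In place: cluster_sum += point / norm (normalize, then accumulate)."""
    if norm == 0.0:
        # Cosine distance is undefined for a zero vector; fail loudly
        # instead of letting 1/0 turn the sum into Infinity/NaN.
        raise ValueError("cosine distance is undefined for a zero-norm vector")
    scale = 1.0 / norm
    for i, x in enumerate(point):
        cluster_sum[i] += scale * x
```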


---




[GitHub] spark issue #20570: [spark-23382][WEB-UI]Spark Streaming ui about the conten...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20570
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20570: [spark-23382][WEB-UI]Spark Streaming ui about the conten...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20570
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20570: [spark-23382][WEB-UI]Spark Streaming ui about the...

2018-02-10 Thread guoxiaolongzte
GitHub user guoxiaolongzte opened a pull request:

https://github.com/apache/spark/pull/20570

[spark-23382][WEB-UI]Spark Streaming ui about the contents of the for need 
to have hidden and show features, when the table records very much.


## What changes were proposed in this pull request?
The tables in the Spark Streaming UI need hide and show features when a 
table contains very many records.
Please refer to https://github.com/apache/spark/pull/20216

After the fix:

![1](https://user-images.githubusercontent.com/26266482/36068644-df029328-0f14-11e8-8350-cfdde9733ffc.png)




## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-23382

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20570


commit c6ffe3025af5129a807885f9d757d2ddad641b62
Author: guoxiaolong 
Date:   2018-02-11T02:13:05Z

[spark-23382][WEB-UI]Spark Streaming ui about the contents of the form need 
to have hidden and show features, when the table records very much.




---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20566
  
Not only `threshold`: the default params of `NaiveBayes`, 
`LogisticRegression` (maybe more, I'm looking them up now) are all set in the 
estimator, not in their models. The models receive the default values at 
the end of `fit`.




---




[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...

2018-02-10 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/20516
  
@srowen, I think this is a feature Spark provides for choosing ports, and it 
covers two cases.
First, the Spark user only needs to specify a start port and the number of 
additional ports to try (the spark.port.maxRetries setting); Spark then binds 
a port automatically.
Second, the Spark user must bind exactly the specified port (by setting 
spark.port.maxRetries = 0). In that case, if the specified port is already 
bound on the system, Spark throws a bind exception.
Thanks.
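The two cases can be sketched in Python with plain sockets. `bind_with_retries` is a hypothetical helper, not Spark's port-retry code; it only models "try the start port plus up to `max_retries` following ports", so `max_retries = 0` means "bind exactly this port or fail". Any port wrap-around or special handling of port 0 that Spark may do is not modeled here.

```python
import socket


def bind_with_retries(start_port: int, max_retries: int,
                      host: str = "127.0.0.1") -> socket.socket:
    """Try start_port, start_port + 1, ..., start_port + max_retries."""
    last_error = None
    for offset in range(max_retries + 1):
        port = start_port + offset
        if port > 65535:
            break  # no wrap-around handling in this sketch
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError as e:  # e.g. EADDRINUSE
            sock.close()
            last_error = e
    raise OSError(
        f"could not bind to {host} on ports {start_port}..{start_port + max_retries}"
    ) from last_error
```

With an already-taken port, `bind_with_retries(taken, 0)` raises (the second case), while `bind_with_retries(taken, 20)` moves on to a nearby free port (the first case).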


---




[GitHub] spark issue #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate events t...

2018-02-10 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/20532
  
Thanks, everyone. Should I just close it? Or simply leave an enable switch 
like blockUpdated does? Either is OK with me.


---




[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...

2018-02-10 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/20557#discussion_r167416457
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -539,15 +539,15 @@ case class DescribeTableCommand(
 throw new AnalysisException(
  s"DESC PARTITION is not allowed on a temporary view: ${table.identifier}")
   }
-  describeSchema(catalog.lookupRelation(table).schema, result, header = false)
+  describeSchema(catalog.lookupRelation(table).schema, result, header = true)
--- End diff --

# Partition Information
# col_name  data_type   comment

Partition information also takes up two rows.
I tried to keep the header in this case so that the row count is displayed correctly.


---




[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fall back to ...

2018-02-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20567#discussion_r167415761
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1941,12 +1941,24 @@ def toPandas(self):
 timezone = None
 
 if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true":
+should_fall_back = False
 try:
-from pyspark.sql.types import _check_dataframe_convert_date, \
-_check_dataframe_localize_timestamps
+from pyspark.sql.types import to_arrow_schema
 from pyspark.sql.utils import require_minimum_pyarrow_version
-import pyarrow
 require_minimum_pyarrow_version()
+# Check if its schema is convertible in Arrow format.
+to_arrow_schema(self.schema)
+except Exception as e:
+# Fallback to convert to Pandas DataFrame without arrow if raise some exception
--- End diff --

Does this PR fall back to the original path if any exception occurs, e.g. 
an `ImportError`, in a case where the current code throws an exception with 
a message? If so, would it be good to note this change, too?
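To make the question concrete, here is a minimal Python sketch of a narrower fallback. `convert_with_fallback` is hypothetical and is not the code in `dataframe.py`, and which exception types should trigger the fallback (`ImportError` and `TypeError` below) is an assumption. Only the listed types switch to the non-Arrow path with a warning; anything else, such as a genuine bug, still propagates.

```python
import warnings
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def convert_with_fallback(
    check_arrow: Callable[[], None],
    with_arrow: Callable[[], T],
    without_arrow: Callable[[], T],
    fallback_on: Tuple[Type[BaseException], ...] = (ImportError, TypeError),
) -> T:
    """Try the Arrow path; fall back only for the listed exception types."""
    try:
        # e.g. a minimum-pyarrow-version check plus a schema-convertibility check
        check_arrow()
    except fallback_on as e:
        warnings.warn(f"Arrow optimization is disabled, falling back: {e}")
        return without_arrow()
    # Errors raised during the Arrow conversion itself are not swallowed here.
    return with_arrow()
```

Catching a bare `Exception` instead would also hide unexpected failures behind the slower path, which is the behavior change the comment suggests documenting.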


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20511
  
Thank you, @kiszk .


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87293/testReport)**
 for PR 20511 at commit 
[`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb).


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/779/


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20511
  
retest this please


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87292/


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #87292 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87292/testReport)**
 for PR 20208 at commit 
[`29c281d`](https://github.com/apache/spark/commit/29c281dbe3c6f63614d9abc286c68e283786649b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread MrBago
Github user MrBago commented on the issue:

https://github.com/apache/spark/pull/20566
  
I believe this will break persistence for LogisticRegression. I believe the 
issue is that the `threshold` param on LogisticRegressionModel doesn't get a 
default directly, but only gets it during the call to `fit` on 
LogisticRegression. This is currently fine because the Model can only be 
created by fitting or by being read from disk, and in both cases some value gets 
set for threshold. With this change that's no longer the case. Here's a test to 
confirm, 
https://github.com/apache/spark/commit/5db2108224accdf848b41ef0d8d1c312b49f49c6.

I believe LinearRegression may have a similar issue.

Our current tests don't seem to cover this kind of thing so I think we 
should improve test coverage if we want to make this kind of change.
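To make the persistence concern concrete, here is a toy Python model of the pattern viirya and MrBago describe. The default lives on the estimator and reaches the model only inside `fit`. All class and function names are hypothetical; this is not Spark ML's `Params` API. A model rebuilt from saved metadata that lacks the value (the `load_model({})` case) has no `threshold` at all.

```python
from typing import Any, Dict


class Params:
    """Toy param holder: explicitly set values win over defaults."""

    def __init__(self) -> None:
        self.defaults: Dict[str, Any] = {}
        self.explicit: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name in self.explicit:
            return self.explicit[name]
        if name in self.defaults:
            return self.defaults[name]
        raise KeyError(f"param {name!r} is neither set nor has a default")


class ToyModel(Params):
    """Declares `threshold` but registers no default of its own."""


class ToyEstimator(Params):
    def __init__(self) -> None:
        super().__init__()
        self.defaults["threshold"] = 0.5  # the default lives on the estimator

    def fit(self) -> ToyModel:
        model = ToyModel()
        # The value reaches the model only here, at the end of fit.
        model.explicit["threshold"] = self.get("threshold")
        return model


def load_model(saved_explicit: Dict[str, Any]) -> ToyModel:
    """Restore only the params that were saved as explicitly set."""
    model = ToyModel()
    model.explicit.update(saved_explicit)
    return model
```

A round-trip test in this shape (fit, save, load, then read every param) is the kind of coverage the comment asks for.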


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87291/


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87291 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87291/testReport)**
 for PR 20511 at commit 
[`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87291 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87291/testReport)**
 for PR 20511 at commit 
[`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb).


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #87292 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87292/testReport)**
 for PR 20208 at commit 
[`29c281d`](https://github.com/apache/spark/commit/29c281dbe3c6f63614d9abc286c68e283786649b).


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/778/


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/777/


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20208
  
Retest this please.


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20511
  
Retest this please.


---




[GitHub] spark pull request #20565: SPAR[SPARK-23379][SQL] remove redundant metastore...

2018-02-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20565#discussion_r167411254
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -292,10 +292,12 @@ private[hive] class HiveClientImpl(
   }
 
   override def setCurrentDatabase(databaseName: String): Unit = withHiveState {
-if (databaseExists(databaseName)) {
-  state.setCurrentDatabase(databaseName)
-} else {
-  throw new NoSuchDatabaseException(databaseName)
+if (state.getCurrentDatabase != databaseName) {
+  if (databaseExists(databaseName)) {
--- End diff --

This PR uses an additional `getCurrentDatabase` call to avoid `databaseExists`. 
Can we make the title more specific?
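A minimal Python sketch of the idea in this diff, using an in-memory stand-in for the metastore. `FakeMetastoreClient` and its methods are hypothetical, not `HiveClientImpl`. The existence check, which would be a metastore round trip, is skipped when the requested database is already current, and a counter makes the saved calls visible.

```python
class FakeMetastoreClient:
    """In-memory stand-in; exists_calls counts simulated metastore round trips."""

    def __init__(self, databases):
        self.databases = set(databases)
        self.current = "default"
        self.exists_calls = 0

    def database_exists(self, name: str) -> bool:
        self.exists_calls += 1
        return name in self.databases

    def set_current_database(self, name: str) -> None:
        if self.current != name:  # skip the round trip when nothing changes
            if not self.database_exists(name):
                raise LookupError(f"database {name!r} not found")
            self.current = name
```

One trade-off in this sketch: if the current database were dropped by another client, re-selecting it would no longer raise.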


---




[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...

2018-02-10 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/20519
  
>yea but we can't simply flush and ignore the stdout specifically from 
sitecustomize unless we define a kind of an additional protocol like this 
because we can't simply distinguish if the output

We might be able to distinguish between sitecustomize.py output and 
daemon.py output. Assuming the code in the sitecustomize.py is not 
multi-threaded, we can assume all output from sitecustomize.py comes *before* 
any output from daemon.py. Therefore, if daemon.py first prints a "magic 
number" or some other string that is unlikely to show up in sitecustomize.py 
output, PythonWorkerFactory.startDaemon() will know when daemon.py output 
starts. daemon.py would print the port number only after printing this magic 
value. For example:


daemon port: ^@^@\325


Once the scala code sees "daemon port: " in the launched process's stdout, 
it knows the next 4 bytes are the port number.

However, if sitecustomize.py starts multi-threaded code (and if that's even 
possible, that's a corner-corner-corner case), its output could potentially be 
interleaved with the daemon's output. Also, I am not sure sitecustomize.py 
output is guaranteed to show up first in stdout, but it seems reasonable that 
it would.
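The marker idea can be sketched on the Scala side roughly as follows. This is a minimal sketch, not Spark's `PythonWorkerFactory`: it assumes the hypothetical marker `daemon port: ` is written as ASCII immediately before the port, that the port follows as a 4-byte big-endian int (what `DataInputStream.readInt` expects), and that sitecustomize.py output never interleaves with the daemon's, per the single-threaded assumption above. `PortMarkerReader` and `readPort` are illustrative names.

```scala
import java.io.{DataInputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets
import scala.collection.mutable

object PortMarkerReader {
  // Hypothetical marker, assumed to be written by daemon.py right before the port.
  val Marker: Array[Byte] = "daemon port: ".getBytes(StandardCharsets.US_ASCII)

  /** Discards bytes (e.g. sitecustomize.py output) until the marker has been seen,
    * then reads the next 4 bytes as a big-endian Int port number. */
  def readPort(in: InputStream): Int = {
    // Sliding window holding the most recent Marker.length bytes read.
    val window = mutable.Queue.empty[Byte]
    while (!window.sameElements(Marker.toSeq)) {
      val b = in.read()
      if (b == -1) throw new EOFException("stream ended before the port marker was seen")
      window.enqueue(b.toByte)
      if (window.length > Marker.length) window.dequeue()
    }
    // DataInputStream.readInt reads 4 bytes, high byte first.
    new DataInputStream(in).readInt()
  }
}
```

Comparing a sliding window of the last `Marker.length` bytes keeps the scan correct even when stray output ends with a partial prefix of the marker (for example `daemon po` followed by the real marker); the cost is one short comparison per byte, which is negligible for startup output.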




---




[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20560
  
Thank you @gatorsmile for taking a look at this. Let me know if there is 
anything I can or should improve. Thanks.


---




[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20560
  
@mgaido91 Yeah, we definitely should include this rule. We just need more 
careful review and comprehensive test cases. Thanks for your work!


---




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20537
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87290/


---




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20537
  
**[Test build #87290 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87290/testReport)**
 for PR 20537 at commit 
[`23abfb0`](https://github.com/apache/spark/commit/23abfb0e01f98dc4bfbd3fb9f04e487ec9af052c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20537
  
Merged build finished. Test PASSed.


---



