[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87295/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20571 Merged build finished. Test PASSed.
[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20571 **[Test build #87295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87295/testReport)** for PR 20571 at commit [`81c1b24`](https://github.com/apache/spark/commit/81c1b2407ceb478d6795438de82ac6afe65024c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20575: [SPARK-23386][DEPLOY] enable direct application links in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20575 Can one of the admins verify this patch?
[GitHub] spark pull request #20575: [SPARK-23386][DEPLOY] enable direct application l...
GitHub user gerashegalov opened a pull request: https://github.com/apache/spark/pull/20575 [SPARK-23386][DEPLOY] enable direct application links in SHS before replay ## What changes were proposed in this pull request? Enable direct job links already in the scan thread, before the full replay. Otherwise, direct job links might not be available for hours. ## How was this patch tested? Tested with a deployment of multiple 10k-app instances. This is currently a prototype for YARN, but it should be generalizable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gerashegalov/spark gera/logs-events-from-listing Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20575.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20575 commit e27880263f36a7b8beee62c902389c293bb2a17e Author: Gera Shegalov Date: 2018-02-09T15:05:12Z List-driven bootstrap replay (cherry picked from commit 0d4e2a2215bb9e102ce449c52bcf7c3d44fc6d44)
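The idea of registering a direct link during the scan pass, before the slow full replay completes, can be sketched as follows. This is an illustrative Python sketch, not Spark's actual implementation (which is Scala); the file-name suffix conventions and helper names here are assumptions.

```python
# Illustrative sketch: derive an application ID from an event-log file name so
# a history server could register a direct /history/<appId> link during the
# cheap scan pass, before the (much slower) full event replay finishes.
# The suffix list below is an assumption for illustration.
import os

KNOWN_SUFFIXES = (".inprogress", ".lz4", ".snappy", ".gz")

def app_id_from_log(path: str) -> str:
    """Strip compression / in-progress suffixes from an event-log file name."""
    name = os.path.basename(path)
    changed = True
    while changed:
        changed = False
        for suffix in KNOWN_SUFFIXES:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                changed = True
    return name

class HistoryListing:
    """Minimal stand-in for the SHS listing: links register before replay."""
    def __init__(self):
        self.links = {}

    def scan(self, log_paths):
        for path in log_paths:
            app_id = app_id_from_log(path)
            # Register the direct link immediately; replay fills details later.
            self.links[app_id] = f"/history/{app_id}"
```

With this, a link such as `/history/application_1518000000000_0001` resolves as soon as the log file has been seen, rather than after the full replay.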
[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20574 cc @jiangxb1987
[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20574 Can one of the admins verify this patch?
[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20574 @jerryshao Could you find some time to help review?
[GitHub] spark pull request #20574: [SPARK-23385][CORE] Allow SparkUITab to be custom...
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/20574 [SPARK-23385][CORE] Allow SparkUITab to be customized, added in SparkConf and loaded when creating SparkUI ## What changes were proposed in this pull request? It would be nice if there were a mechanism for registering customized SparkUITab implementations (alongside the built-in Jobs, Stages, Storage, Environment, and Executors tabs) through SparkConf settings. This would be more flexible when we need to display some special information in the UI, rather than adding built-in tabs one by one and waiting for the community to merge them. I propose to introduce a new configuration option, spark.extraUITabs, that allows customized WebUITab classes to be specified in SparkConf and registered when SparkUI is created. Here is the proposed documentation for the new option: > A comma-separated list of classes that implement SparkUITab; when initializing SparkUI, instances of these classes will be created and registered in the tabs array of SparkUI. If a class has a two-argument constructor that accepts a SparkUI and an AppStatusStore, that constructor will be called; otherwise, a single-argument constructor that accepts a SparkUI will be tried; failing that, a zero-argument constructor will be called. If no valid constructor can be found, the SparkUI creation will fail with an exception. ## How was this patch tested? 1. Added a unit test. 2. Checked the WebUI to see a new tab called "Test" via `bin/spark-shell --master local --driver-class-path /path/spark-core_2.11-*-tests.jar --conf spark.extraUITabs=org.apache.spark.ui.TestUITab` You can merge this pull request into a Git repository by running: $ git pull https://github.com/LantaoJin/spark SPARK-23385 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20574.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20574 commit fb9a8a1be7fc515848b0906af8af31c4c8081807 Author: LantaoJin Date: 2018-02-11T06:56:01Z [SPARK-23385][CORE] Allow SparkUITab to be customized adding in SparkConf and loaded when creating SparkUI
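The constructor-resolution order this PR proposes (two-argument, then one-argument, then zero-argument) can be sketched in Python; Spark's actual implementation would use Java reflection on Scala classes, so the names below are illustrative stand-ins.

```python
# Illustrative sketch of the proposed resolution order for spark.extraUITabs:
# prefer a (ui, store) constructor, then (ui), then a no-arg constructor.
def instantiate_tab(cls, ui, store):
    """Try each candidate argument list in order; fail if none fits."""
    for args in ((ui, store), (ui,), ()):
        try:
            return cls(*args)
        except TypeError:
            continue
    raise ValueError(f"no suitable constructor for {cls.__name__}")

# Dummy classes standing in for user-provided SparkUITab subclasses.
class TwoArgTab:
    def __init__(self, ui, store):
        self.args = (ui, store)

class OneArgTab:
    def __init__(self, ui):
        self.args = (ui,)

class ZeroArgTab:
    def __init__(self):
        self.args = ()
```

One design caveat the real implementation would have to handle: a constructor that raises a TypeError internally would be misread as "wrong arity" here, which is why reflective arity inspection is the safer route.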
[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fall back to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20567#discussion_r167423077 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1941,12 +1941,24 @@ def toPandas(self): timezone = None if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true": +should_fall_back = False try: -from pyspark.sql.types import _check_dataframe_convert_date, \ -_check_dataframe_localize_timestamps +from pyspark.sql.types import to_arrow_schema from pyspark.sql.utils import require_minimum_pyarrow_version -import pyarrow require_minimum_pyarrow_version() +# Check if its schema is convertible in Arrow format. +to_arrow_schema(self.schema) +except Exception as e: +# Fallback to convert to Pandas DataFrame without arrow if raise some exception --- End diff -- Yup. It does fall back for an unsupported schema, a PyArrow version mismatch, and PyArrow missing. Will add a note in the PR description.
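The fallback logic discussed in the diff — try the Arrow path, and on any failure warn and take the plain conversion path — follows a general pattern that can be sketched independently of pyspark (this is a generic sketch, not the actual `toPandas` code):

```python
import warnings

def convert_with_fallback(fast_path, slow_path):
    """Try an optimized conversion (e.g. Arrow-based); on any failure, warn
    and fall back to the plain path, mirroring the toPandas behavior above."""
    try:
        return fast_path()
    except Exception as exc:  # unsupported schema, version mismatch, missing dep, ...
        warnings.warn(f"optimized path failed ({exc}); falling back")
        return slow_path()
```

The broad `except Exception` is deliberate here: the point of the PR is that any Arrow-side problem should degrade to the slower path rather than fail the call.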
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
[GitHub] spark issue #20573: [SPARK-23384][WEB-UI]When it has no incomplete(completed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20573 Can one of the admins verify this patch?
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/787/ Test PASSed.
[GitHub] spark pull request #20573: [SPARK-23384][WEB-UI]When it has no incomplete(co...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/20573 [SPARK-23384][WEB-UI]When it has no incomplete(completed) applications found, the last updated time is not formatted and client local time zone is not shown in history server web ui. ## What changes were proposed in this pull request? When no incomplete (or completed) applications are found, the last updated time is not formatted and the client's local time zone is not shown in the history server web UI. It is a bug. fix before: ![1](https://user-images.githubusercontent.com/26266482/36070635-264d7cf0-0f3a-11e8-8426-14135ffedb16.png) fix after: ![2](https://user-images.githubusercontent.com/26266482/36070651-8ec3800e-0f3a-11e8-991c-6122cc9539fe.png) ## How was this patch tested? Tested manually; screenshots of the UI before and after the fix are attached above. You can merge this pull request into a Git repository by running: $ git pull https://github.com/guoxiaolongzte/spark SPARK-23384 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20573.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20573 commit 0575d5eb402edcca0c67a5fa9001fd5e5183e34e Author: guoxiaolong Date: 2018-02-11T06:43:20Z [SPARK-23384][WEB-UI]When it has no incomplete(completed) applications found, the last updated time is not formatted and client local time zone is not show in history server web ui.
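The underlying idea of the fix is to render the raw last-updated epoch value as a readable timestamp in a stated time zone, instead of printing it unformatted. Spark's history server does this client-side in JavaScript; the Python below is only an illustrative sketch of the same transformation.

```python
from datetime import datetime, timezone

def format_last_updated(epoch_ms: int, tz=timezone.utc) -> str:
    """Render a raw epoch-milliseconds value as a readable timestamp in the
    given zone (a real UI would substitute the client's local zone)."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz).strftime("%Y-%m-%d %H:%M:%S")
```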
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87299/ Test FAILed.
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit [`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20561 **[Test build #87300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87300/testReport)** for PR 20561 at commit [`2e7a5ad`](https://github.com/apache/spark/commit/2e7a5ad9063d51116c1180b1c8285631edb8ce65).
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit [`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68).
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/786/ Test PASSed.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Merged build finished. Test PASSed.
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/785/ Test PASSed.
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87294/ Test FAILed.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20561 **[Test build #87298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87298/testReport)** for PR 20561 at commit [`5e93313`](https://github.com/apache/spark/commit/5e93313548f87351f58d3217ccedceafcef7083b).
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Merged build finished. Test FAILed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 First of all, ORC 1.4.2 was very safe because it contains only ORC-235, which removes redundant dependencies. For ORC 1.4.3, the following five patches are included: 1. ORC-298 Move the benchmark code base to a non-Apache repository 2. ORC-240 Fix warnings from Maven 3. ORC-217 Duplicate rat plugins in pom.xml The above three are trivial. 4. ORC-285 Empty vector batches of floats or doubles get java.io.EOFException 5. ORC-296 Work around HADOOP-15171; also fix stream contract (4) only adds a workaround for `batchSize=0`. (5) may cause a performance difference. In general, the patches look required, but I didn't run a full test against ORC 1.4.3; only ORC-296 might cause some performance difference.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Merged build finished. Test PASSed.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20561 **[Test build #87294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87294/testReport)** for PR 20561 at commit [`151a92d`](https://github.com/apache/spark/commit/151a92dff074bff26ad179bedbdd4b49f345ec93). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/784/ Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20511 @dongjoon-hyun Could you go over the list of resolved JIRAs in ORC 1.4.2 and 1.4.3 that could cause regressions? We need to know the impact and the risk. If possible, could you also add a test case in Spark to ensure the issue has been resolved after the upgrade?
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20511 **[Test build #87297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87297/testReport)** for PR 20511 at commit [`5e45129`](https://github.com/apache/spark/commit/5e451294a1465f64739dda5d892ca3bdd808e6cf).
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test PASSed.
[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20572 Merged build finished. Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/783/ Test PASSed.
[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20572 **[Test build #87296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87296/testReport)** for PR 20572 at commit [`2ed51f1`](https://github.com/apache/spark/commit/2ed51f1f73ee75ffd08355265a72e68e83ef592d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87296/ Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 Sure. No problem. BTW, is it applicable for Apache Spark 2.3?
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20511 +1 for 1.4.3
[GitHub] spark pull request #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorte...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20561#discussion_r167421526 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -98,10 +99,22 @@ public UnsafeKVExternalSorter( numElementsForSpillThreshold, canUseRadixSort); } else { - // The array will be used to do in-place sort, which require half of the space to be empty. - // Note: each record in the map takes two entries in the array, one is record pointer, - // another is the key prefix. - assert(map.numKeys() * 2 <= map.getArray().size() / 2); + LongArray pointArray = map.getArray(); + // `BytesToBytesMap`'s point array is only guaranteed to hold all the distinct keys, but + // `UnsafeInMemorySorter`'s point array need to hold all the entries. Since `BytesToBytesMap` + // can have duplicated keys, here we need a check to make sure the point array can hold + // all the entries in `BytesToBytesMap`. + // The point array will be used to do in-place sort, which requires half of the space to be + // empty. Note: each record in the map takes two entries in the point array, one is record + // pointer, another is key prefix. So the required size of point array is `numRecords * 4`. + // TODO: It's possible to change UnsafeInMemorySorter to have multiple entries with same key, + // so that we can always reuse the point array. + if (map.numValues() > pointArray.size() / 4) { +// Here we ask the map to allocate memory, so that the memory manager won't ask the map +// to spill, if the memory is not enough. +pointArray = map.allocateArray(map.numValues() * 4L); + } + // During spilling, the array in map will not be used, so we can borrow that and use it // as the underlying array for in-memory sorter (it's always large enough). --- End diff -- Shall we update the comment here too?
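The sizing rule in the diff — two point-array entries per record (pointer plus key prefix), and half the array kept empty for the in-place sort, hence `numRecords * 4` — can be checked with a small arithmetic sketch (illustrative Python, not Spark's Java code):

```python
def point_array_big_enough(num_values: int, array_size: int) -> bool:
    """Each record needs 2 entries (record pointer + key prefix), and the
    in-place sort needs half the array free, so usable capacity is
    array_size // 4 records."""
    return num_values <= array_size // 4

def required_array_size(num_values: int) -> int:
    """Size to request when the existing array is too small: numRecords * 4."""
    return num_values * 4
```

This is why the old assertion `map.numKeys() * 2 <= map.getArray().size() / 2` can fail with duplicated keys: the map's array is sized by distinct keys, but the sorter must hold every value.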
[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20572 **[Test build #87296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87296/testReport)** for PR 20572 at commit [`2ed51f1`](https://github.com/apache/spark/commit/2ed51f1f73ee75ffd08355265a72e68e83ef592d).
[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20572 Merged build finished. Test PASSed.
[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/782/ Test PASSed.
[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/20572 [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive offsets ## What changes were proposed in this pull request? Add a configuration spark.streaming.kafka.allowNonConsecutiveOffsets to allow streaming jobs to proceed on compacted topics (or other situations involving gaps between offsets in the log). ## How was this patch tested? Added a new unit test. @justinrmiller has been testing this branch in production for a few weeks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-17147 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20572.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20572 commit 3082de7e43e8c381dc2227005d1e0fc5bd2c3d29 Author: cody koeninger Date: 2016-10-08T21:21:48Z [SPARK-17147][STREAMING][KAFKA] failing test for compacted topics commit e8ea89ea10527c6723df4af2685004ea67d872cd Author: cody koeninger Date: 2016-10-09T04:59:39Z [SPARK-17147][STREAMING][KAFKA] test passing for compacted topics commit 182943e36f596d0cb5841a9c63471bea1dd9047b Author: cody koeninger Date: 2018-02-11T04:09:38Z spark.streaming.kafka.allowNonConsecutiveOffsets commit 89f4bc5f4de78cdcc22b5c9b26a27ee9263048c8 Author: cody koeninger Date: 2018-02-11T04:13:49Z [SPARK-17147][STREAMING][KAFKA] remove stray param doc commit 12e65bedddbcd2407598e69fa3c6fcbcdfc67e5d Author: cody koeninger Date: 2018-02-11T04:28:22Z [SPARK-17147][STREAMING][KAFKA] prepare for merge of master commit 2ed51f1f73ee75ffd08355265a72e68e83ef592d Author: cody koeninger Date: 2018-02-11T05:19:31Z Merge branch 'master' into SPARK-17147
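The behavior the new flag controls — treating a gap between consecutive offsets as an error unless spark.streaming.kafka.allowNonConsecutiveOffsets is set — can be sketched like this. Compacted topics legitimately contain such gaps, because compaction deletes superseded records but offsets are never reassigned. This is an illustrative Python sketch, not the Kafka connector's actual code.

```python
def validate_offsets(offsets, allow_non_consecutive=False):
    """Check a batch of record offsets read from a partition.
    Without the flag, any gap is rejected (the previous behavior); with it,
    gaps such as those left by log compaction are accepted."""
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + 1 and not allow_non_consecutive:
            raise ValueError(f"non-consecutive offsets: {prev} -> {cur}")
    return len(offsets)
```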
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20511 @omalley Thanks for your quick reply! @dongjoon-hyun Maybe we should directly bump to 1.4.3?
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87293/ Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20511 **[Test build #87293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87293/testReport)** for PR 20511 at commit [`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20568#discussion_r167419965
--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
@@ -51,6 +51,22 @@ public void testKnownLongInputs() {
     Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
   }

+  @Test
+  public void testKnownBytesInputs() {
+    byte[] test = "test".getBytes(StandardCharsets.UTF_8);
+    Assert.assertEquals(-1167338989,
--- End diff --
Would it be better to compare against the murmur3 hash value produced by the Scala library?
[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20568#discussion_r167419960
--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
@@ -51,6 +51,22 @@ public void testKnownLongInputs() {
     Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
   }

+  @Test
--- End diff --
It would be good to add the JIRA number with a short description as a comment (e.g. `SPARK-23381 ...`)
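The cross-checking idea in this review — validating hard-coded expected hashes against an independent implementation — can be illustrated with a reference Murmur3 x86_32 in plain Python. Note this is the standard published algorithm; it is not guaranteed to be bit-for-bit identical to Spark's `Murmur3_x86_32` (whose handling of trailing bytes is exactly what SPARK-23381 is about), so treat it as a sketch for comparison, not as Spark's code.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Reference MurmurHash3 x86_32, returning a signed 32-bit int like Java."""
    c1, c2 = 0xcc9e2d51, 0x1b873593
    h = seed & 0xffffffff
    length = len(data)
    rounded = length - (length % 4)
    # Body: mix each 4-byte little-endian block into h.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff
        k = (k * c2) & 0xffffffff
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xffffffff
        h = (h * 5 + 0xe6546b64) & 0xffffffff
    # Tail: up to 3 trailing bytes, mixed without the h-rotation step.
    tail = data[rounded:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff
        k = (k * c2) & 0xffffffff
        h ^= k
    # Finalization (fmix32 avalanche).
    h ^= length
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xffffffff
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xffffffff
    h ^= h >> 16
    return h - (1 << 32) if h >= (1 << 31) else h
```

Every step of the pipeline is a bijection of `h` for fixed input, so distinct seeds necessarily produce distinct hashes for the same bytes — a cheap sanity property to assert alongside known-answer tests.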
[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/781/
[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20571 Merged build finished. Test PASSed.
[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...
Github user guoxiaolongzte commented on a diff in the pull request: https://github.com/apache/spark/pull/20557#discussion_r167419765
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -539,15 +539,15 @@ case class DescribeTableCommand(
         throw new AnalysisException(
           s"DESC PARTITION is not allowed on a temporary view: ${table.identifier}")
       }
-      describeSchema(catalog.lookupRelation(table).schema, result, header = false)
+      describeSchema(catalog.lookupRelation(table).schema, result, header = true)
--- End diff --
The screenshot shows the effect of the fixed code; the row-count statistics do not include the header. ![2](https://user-images.githubusercontent.com/26266482/36069344-ba833c56-0f22-11e8-9ab6-26f0ae6285b7.png)
[GitHub] spark issue #20571: [SPARK-23383][Build][Minor]Make a distribution should ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20571 **[Test build #87295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87295/testReport)** for PR 20571 at commit [`81c1b24`](https://github.com/apache/spark/commit/81c1b2407ceb478d6795438de82ac6afe65024c8).
[GitHub] spark pull request #20571: [SPARK-23383][Build][Minor]Make a distribution sh...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/20571

[SPARK-23383][Build][Minor] Make a distribution should exit with usage while detecting wrong options

## What changes were proposed in this pull request?

```shell
./dev/make-distribution.sh --name ne-1.0.0-SNAPSHOT xyz --tgz -Phadoop-2.7
+++ dirname ./dev/make-distribution.sh
++ cd ./dev/..
++ pwd
+ SPARK_HOME=/Users/Kent/Documents/spark
+ DISTDIR=/Users/Kent/Documents/spark/dist
+ MAKE_TGZ=false
+ MAKE_PIP=false
+ MAKE_R=false
+ NAME=none
+ MVN=/Users/Kent/Documents/spark/build/mvn
+ (( 5 ))
+ case $1 in
+ NAME=ne-1.0.0-SNAPSHOT
+ shift
+ shift
+ (( 3 ))
+ case $1 in
+ break
+ '[' -z /Users/Kent/.jenv/candidates/java/current ']'
+ '[' -z /Users/Kent/.jenv/candidates/java/current ']'
++ command -v git
+ '[' /usr/local/bin/git ']'
++ git rev-parse --short HEAD
+ GITREV=98ea6a7
+ '[' '!' -z 98ea6a7 ']'
+ GITREVSTRING=' (git revision 98ea6a7)'
+ unset GITREV
++ command -v /Users/Kent/Documents/spark/build/mvn
+ '[' '!' /Users/Kent/Documents/spark/build/mvn ']'
++ /Users/Kent/Documents/spark/build/mvn help:evaluate -Dexpression=project.version xyz --tgz -Phadoop-2.7
++ grep -v INFO
++ tail -n 1
+ VERSION=' -X,--debug Produce execution debug output'
```

It is better to report the invalid option and exit with usage information than to `break`.

## How was this patch tested?

Manually. cc @srowen

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yaooqinn/spark SPARK-23383

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20571.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20571

commit 81c1b2407ceb478d6795438de82ac6afe65024c8
Author: Kent Yao
Date: 2018-02-11T03:48:30Z
exit with usage
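The fix this PR proposes — fail loudly with usage instead of silently `break`ing out of the option loop on an unrecognized argument — can be sketched as follows. The option names mirror make-distribution.sh, but `parse_args` and its return shape are hypothetical, written here in Python for illustration.

```python
import sys

USAGE = "usage: make-distribution.sh [--name NAME] [--tgz] [--pip] [--r] [MAVEN_OPTIONS...]"

def parse_args(argv):
    """Parse distribution options; unknown positional arguments abort with usage."""
    opts = {"name": "none", "tgz": False, "pip": False, "r": False}
    maven_opts = []
    args = list(argv)
    while args:
        arg = args.pop(0)
        if arg == "--name":
            opts["name"] = args.pop(0)
        elif arg == "--tgz":
            opts["tgz"] = True
        elif arg == "--pip":
            opts["pip"] = True
        elif arg == "--r":
            opts["r"] = True
        elif arg.startswith("-"):
            # Profiles like -Phadoop-2.7 pass through to Maven untouched.
            maven_opts.append(arg)
        else:
            # Previously this case fell through via `break`, and the stray word
            # corrupted the later mvn invocation; fail loudly instead.
            sys.stderr.write("Error: unknown option %r\n%s\n" % (arg, USAGE))
            raise SystemExit(1)
    return opts, maven_opts
```

With the buggy invocation from the trace above, the stray `xyz` now terminates parsing with an error rather than ending up inside the `mvn help:evaluate` command line.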
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Merged build finished. Test PASSed.
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20561 **[Test build #87294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87294/testReport)** for PR 20561 at commit [`151a92d`](https://github.com/apache/spark/commit/151a92dff074bff26ad179bedbdd4b49f345ec93).
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/780/
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user omalley commented on the issue: https://github.com/apache/spark/pull/20511 Sorry, I forgot to transition the JIRA issues for ORC 1.4.3, so they didn't show up in the search from the notes. The list of JIRAs closed by the 1.4.3 release is: https://s.apache.org/Fll8 There was an issue with the reader if you had an empty column of floats/doubles (ORC-285) and a compression issue that only seemed to hit LLAP (ORC-296). We are about to start the ORC 1.5 release, but the ORC 1.4 line has been very stable.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20511 I am wondering what the difference is between ORC 1.4.3 and ORC 1.4.2. Their release notes are the SAME: https://orc.apache.org/news/2018/02/09/ORC-1.4.3/ and https://orc.apache.org/news/2018/02/09/ORC-1.4.2/ Could you help us figure out the exact lists of JIRA changes in these two releases? Should we directly upgrade to 1.4.3? What is the release schedule for Apache ORC? Our Spark 2.4 will not be released until the second half of 2018. Which version of ORC is stable for production? I am wondering if we should always upgrade to the latest version of ORC, or wait for more user feedback from the ORC community to know whether a version is stable. cc @dongjoon-hyun @omalley
[GitHub] spark pull request #20518: [SPARK-22119][FOLLOWUP][ML] Use spherical KMeans ...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/20518#discussion_r167417459
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -745,4 +763,27 @@ private[spark] class CosineDistanceMeasure extends DistanceMeasure {
   override def distance(v1: VectorWithNorm, v2: VectorWithNorm): Double = {
     1 - dot(v1.vector, v2.vector) / v1.norm / v2.norm
   }
+
+  /**
+   * Updates the value of `sum` adding the `point` vector.
+   * @param point a `VectorWithNorm` to be added to `sum` of a cluster
+   * @param sum the `sum` for a cluster to be updated
+   */
+  override def updateClusterSum(point: VectorWithNorm, sum: Vector): Unit = {
+    axpy(1.0 / point.norm, point.vector, sum)
--- End diff --
In Scala, `1.0 / 0.0` generates `Infinity`; what about directly throwing an exception instead?
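The review point above — a zero-norm point fed into `1.0 / point.norm` silently poisons the cluster sum with `Infinity` — can be sketched with an explicit guard. This is a hedged illustration in Python; the function name and signature are stand-ins, not Spark MLlib's API.

```python
def update_cluster_sum(point, norm, cluster_sum):
    """Add point / norm into cluster_sum in place (axpy with alpha = 1 / norm).

    Raises instead of letting 1.0 / 0.0 produce inf, per the review suggestion.
    """
    if norm == 0.0:
        raise ValueError("Cosine distance is undefined for zero-norm vectors")
    alpha = 1.0 / norm
    for i, x in enumerate(point):
        cluster_sum[i] += alpha * x
```

Normalizing each point by its norm before summing is what makes the centroid update "spherical": every point contributes a unit-length direction regardless of magnitude.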
[GitHub] spark issue #20570: [spark-23382][WEB-UI]Spark Streaming ui about the conten...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20570 Can one of the admins verify this patch?
[GitHub] spark pull request #20570: [spark-23382][WEB-UI]Spark Streaming ui about the...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/20570

[spark-23382][WEB-UI] The Spark Streaming UI's form contents need hide and show features when the table has very many records.

## What changes were proposed in this pull request?

The Spark Streaming UI's form contents need hide and show features when the table has very many records; please refer to https://github.com/apache/spark/pull/20216

fix after:

![1](https://user-images.githubusercontent.com/26266482/36068644-df029328-0f14-11e8-8350-cfdde9733ffc.png)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guoxiaolongzte/spark SPARK-23382

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20570.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20570

commit c6ffe3025af5129a807885f9d757d2ddad641b62
Author: guoxiaolong
Date: 2018-02-11T02:13:05Z
[spark-23382][WEB-UI]Spark Streaming ui about the contents of the form need to have hidden and show features, when the table records very much.
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 Not only `threshold`: the default params of `NaiveBayes` and `LogisticRegression` (maybe more; I'm looking now) are all set in the estimator, not in their model. The models receive the default values at the end of `fit`.
[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20516 @srowen, I think this is functionality Spark provides for port use. First, a Spark user only needs to specify the start port and the number of retries (the `spark.port.maxRetries` setting); the port binding is then handled automatically by Spark. Second, when a Spark user must bind exactly the specified port (by setting `spark.port.maxRetries=0`) and that port is already bound on the system, Spark will throw a bind exception. Thanks.
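The two cases described above — retry over a range of ports from a start port, or fail immediately when retries are zero — can be sketched as a simple bind loop. This is an illustrative Python sketch of the described semantics, not Spark's `Utils.startServiceOnPort` implementation.

```python
import socket

def bind_with_retries(start_port, max_retries):
    """Try to bind start_port, start_port + 1, ... start_port + max_retries.

    Returns (bound_socket, port). With max_retries == 0 this means
    "this exact port or fail", matching spark.port.maxRetries=0.
    """
    for offset in range(max_retries + 1):
        port = start_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            return sock, port
        except OSError:
            sock.close()  # port in use; try the next one if retries remain
    raise OSError("could not bind any port in [%d, %d]"
                  % (start_port, start_port + max_retries))
```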
[GitHub] spark issue #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate events t...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20532 Thanks everyone. So should I just close it, or leave an enable switch like `blockUpdated` does? Either is fine with me.
[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...
Github user guoxiaolongzte commented on a diff in the pull request: https://github.com/apache/spark/pull/20557#discussion_r167416457
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -539,15 +539,15 @@ case class DescribeTableCommand(
         throw new AnalysisException(
           s"DESC PARTITION is not allowed on a temporary view: ${table.identifier}")
       }
-      describeSchema(catalog.lookupRelation(table).schema, result, header = false)
+      describeSchema(catalog.lookupRelation(table).schema, result, header = true)
--- End diff --
`# Partition Information` and `# col_name data_type comment`: the partition information also takes up two rows. I try to keep the header in this case so that the number of rows is displayed correctly.
[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fall back to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20567#discussion_r167415761
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1941,12 +1941,24 @@ def toPandas(self):
         timezone = None
         if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true":
+            should_fall_back = False
             try:
-                from pyspark.sql.types import _check_dataframe_convert_date, \
-                    _check_dataframe_localize_timestamps
+                from pyspark.sql.types import to_arrow_schema
                 from pyspark.sql.utils import require_minimum_pyarrow_version
-                import pyarrow
                 require_minimum_pyarrow_version()
+                # Check if its schema is convertible in Arrow format.
+                to_arrow_schema(self.schema)
+            except Exception as e:
+                # Fallback to convert to Pandas DataFrame without arrow if raise some exception
--- End diff --
Does this PR fall back to the original path if any exception occurs, e.g. an `ImportError`, whereas the current code throws an exception with a message? Would it be good to note this change, too?
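The fallback pattern under discussion — run the Arrow capability checks up front, and on any exception (an `ImportError`, an unsupported schema, an old pyarrow) warn and take the plain conversion path — can be sketched generically. The names `check_arrow_supported`, `arrow_convert`, and `plain_convert` below are illustrative stand-ins, not pyspark functions.

```python
def to_pandas(df, check_arrow_supported, arrow_convert, plain_convert, warn=print):
    """Convert df via the Arrow path when its preconditions hold, else fall back."""
    try:
        # e.g. import pyarrow, verify its version, validate the schema.
        check_arrow_supported(df)
    except Exception as e:
        warn("Arrow optimization disabled, falling back: %s" % e)
        return plain_convert(df)
    return arrow_convert(df)
```

Catching only during the *checks* (rather than wrapping the whole conversion) keeps real conversion bugs visible instead of silently masking them behind the fallback.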
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 Thank you, @kiszk.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20511 **[Test build #87293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87293/testReport)** for PR 20511 at commit [`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb).
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/779/
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20511 retest this please
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87292/
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #87292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87292/testReport)** for PR 20208 at commit [`29c281d`](https://github.com/apache/spark/commit/29c281dbe3c6f63614d9abc286c68e283786649b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20566 I believe this will break persistence for LogisticRegression. The issue is that the `threshold` param on `LogisticRegressionModel` doesn't get a default directly; it only gets one during the call to `fit` on `LogisticRegression`. This is currently fine because the model can only be created by fitting or by being read from disk, and in both cases some value gets set for `threshold`. With this change that's no longer the case. Here's a test to confirm: https://github.com/apache/spark/commit/5db2108224accdf848b41ef0d8d1c312b49f49c6. I believe `LinearRegression` may have a similar issue. Our current tests don't seem to cover this kind of thing, so I think we should improve test coverage if we want to make this kind of change.
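The persistence hazard described above can be made concrete with a toy estimator/model pair: if a model's param default is only installed during `fit()`, then a model reconstructed by a reader that never runs `fit()` has no value for the param at all. The classes below are minimal stand-ins for the Estimator/Model pattern, not Spark ML's implementation.

```python
class Model:
    def __init__(self):
        self.params = {}    # explicitly set values (e.g. restored from disk)
        self.defaults = {}  # default values

    def get(self, name):
        if name in self.params:
            return self.params[name]
        if name in self.defaults:
            return self.defaults[name]
        raise KeyError("param %r has no value or default" % name)

class Estimator:
    def fit(self, data):
        model = Model()
        # The default is copied onto the model only on the fit() path.
        model.defaults["threshold"] = 0.5
        return model

def load_model_from_disk(saved_params):
    # A reader constructs the model directly; fit() never runs here, so any
    # default installed only by fit() is missing on the loaded model.
    model = Model()
    model.params.update(saved_params)
    return model
```

If an old save file predates `threshold` being written out, `load_model_from_disk` yields a model where `get("threshold")` raises, which is exactly the breakage the comment warns about.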
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test FAILed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87291/
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20511 **[Test build #87291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87291/testReport)** for PR 20511 at commit [`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20511 **[Test build #87291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87291/testReport)** for PR 20511 at commit [`eeef040`](https://github.com/apache/spark/commit/eeef04073dcf6092e319547be8960651f6fcd9cb).
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #87292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87292/testReport)** for PR 20208 at commit [`29c281d`](https://github.com/apache/spark/commit/29c281dbe3c6f63614d9abc286c68e283786649b).
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/778/
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test PASSed.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/777/
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Retest this please.
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 Retest this please.
[GitHub] spark pull request #20565: SPAR[SPARK-23379][SQL] remove redundant metastore...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20565#discussion_r167411254
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -292,10 +292,12 @@ private[hive] class HiveClientImpl(
   }

   override def setCurrentDatabase(databaseName: String): Unit = withHiveState {
-    if (databaseExists(databaseName)) {
-      state.setCurrentDatabase(databaseName)
-    } else {
-      throw new NoSuchDatabaseException(databaseName)
+    if (state.getCurrentDatabase != databaseName) {
+      if (databaseExists(databaseName)) {
--- End diff --
This PR uses an additional `getCurrentDatabase` call to avoid `databaseExists`. Can we have something more specific in the title?
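The optimization in the diff above — only pay for the existence check (a metastore round trip) when the requested database differs from the current one — can be sketched as follows. The `Client` class is a hypothetical stand-in for `HiveClientImpl`, instrumented to count existence checks.

```python
class NoSuchDatabaseException(Exception):
    pass

class Client:
    def __init__(self, databases, current):
        self.databases = set(databases)
        self.current = current
        self.existence_checks = 0  # instrumentation for this sketch only

    def database_exists(self, name):
        self.existence_checks += 1  # stands in for a metastore round trip
        return name in self.databases

    def set_current_database(self, name):
        if self.current == name:
            return  # already current: skip the metastore call entirely
        if not self.database_exists(name):
            raise NoSuchDatabaseException(name)
        self.current = name
```

The trade-off the reviewer's title request hints at: repeated `setCurrentDatabase` calls with the same name (common when every query re-asserts the database) now cost nothing, while switching to a new database still validates it.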
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/20519

> yea but we can't simply flush and ignore the stdout specifically from sitecustomize unless we define a kind of an additional protocol like this because we can't simply distinguish if the output

We might be able to distinguish between sitecustomize.py output and daemon.py output. Assuming the code in sitecustomize.py is not multi-threaded, we can assume all output from sitecustomize.py comes *before* any output from daemon.py. Therefore, if daemon.py first prints a "magic number" or some other string that is unlikely to show up in sitecustomize.py output, PythonWorkerFactory.startDaemon() will know where daemon.py output starts. daemon.py would print the port number only after printing this magic value. For example:

daemon port: ^@^@\325

Once the Scala code sees "daemon port: " in the launched process's stdout, it knows the next 4 bytes are the port number. However, if sitecustomize.py starts multi-threaded code (and if that's even possible, that's a corner-corner-corner case), its output could potentially be interleaved with the daemon's output. Also, I am not sure sitecustomize.py output is guaranteed to show up first in stdout, but it seems reasonable that it would.
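The handshake described above can be sketched concretely: the daemon writes a sentinel string followed by the port as 4 big-endian bytes, and the launcher scans the stream for the sentinel so any site-customization noise printed earlier on stdout is skipped. The sentinel value and function names below are made up for illustration; this is not pyspark's actual daemon protocol.

```python
import io
import struct

SENTINEL = b"__SPARK_DAEMON_PORT__:"

def write_port(stream, port):
    """Daemon side: announce the bound port after the sentinel."""
    stream.write(SENTINEL)
    stream.write(struct.pack(">i", port))  # 4 bytes, big-endian, like Java's DataInputStream

def read_port(stream):
    """Launcher side: skip arbitrary leading bytes, then read the port after SENTINEL."""
    window = b""
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("sentinel not found before end of stream")
        # Keep a sliding window the length of the sentinel.
        window = (window + b)[-len(SENTINEL):]
        if window == SENTINEL:
            return struct.unpack(">i", stream.read(4))[0]
```

As the comment notes, this only holds if the noisy output strictly precedes the daemon's output; interleaved writers could split the sentinel across other bytes.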
[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20560 thank you @gatorsmile for taking a look at this. Let me know if there is something I can/should improve. Thanks.
[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20560 @mgaido91 Yeah, we definitely should include this rule. We just need more careful review and comprehensive test cases. Thanks for your work!
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87290/
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537 **[Test build #87290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87290/testReport)** for PR 20537 at commit [`23abfb0`](https://github.com/apache/spark/commit/23abfb0e01f98dc4bfbd3fb9f04e487ec9af052c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test PASSed.