[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17300 Thanks to both of you for the review; I have addressed the comments and modified the test case. Please help trigger Jenkins for testing, because I can't trigger it myself. Thanks again. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17868: [CORE]Add new unit tests to ShuffleSuite
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17868 Can one of the admins verify this patch?
[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r114935050

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
```diff
@@ -555,12 +555,15 @@ private[spark] class BlockManager(
   /**
    * Return a list of locations for the given block, prioritizing the local machine since
-   * multiple block managers can share the same host.
+   * multiple block managers can share the same host, followed by hosts on the same rack.
    */
   private def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
     val locs = Random.shuffle(master.getLocations(blockId))
     val (preferredLocs, otherLocs) = locs.partition { loc => blockManagerId.host == loc.host }
-    preferredLocs ++ otherLocs
+    val (sameRackLocs, differentRackLocs) = otherLocs.partition {
+      loc => blockManagerId.topologyInfo == loc.topologyInfo
```
--- End diff --

Modified, thanks a lot for the good advice.
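The ordering the patch implements (same-host locations first, then same-rack, then everything else) can be sketched in plain Python; the `Loc` type and its field names are illustrative stand-ins for `BlockManagerId`, not Spark's actual API:

```python
import random
from collections import namedtuple

# Illustrative stand-in for BlockManagerId: only host and rack (topologyInfo) matter here.
Loc = namedtuple("Loc", ["host", "rack"])

def order_locations(local, locs, seed=0):
    """Shuffle, then put same-host locations first, then same-rack, then the rest."""
    shuffled = random.Random(seed).sample(locs, len(locs))
    same_host = [l for l in shuffled if l.host == local.host]
    others = [l for l in shuffled if l.host != local.host]
    same_rack = [l for l in others if l.rack == local.rack]
    diff_rack = [l for l in others if l.rack != local.rack]
    return same_host + same_rack + diff_rack

local = Loc("host-a", "rack-1")
locs = [Loc("host-c", "rack-2"), Loc("host-b", "rack-1"), Loc("host-a", "rack-1")]
ordered = order_locations(local, locs)
```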
[GitHub] spark pull request #17868: [CORE]Add new unit tests to ShuffleSuite
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/17868 [CORE]Add new unit tests to ShuffleSuite

## What changes were proposed in this pull request?

This PR makes two changes:
1. Adds new unit tests verifying that when there is no shuffle stage, shuffle generates neither the data file nor the index files.
2. Modifies the '[SPARK-4085] rerun map stage if reduce stage cannot find its local shuffle file' unit test: sets parallelism to 1 instead of 2, and checks that the index file is deleted.

## How was this patch tested?

The new unit tests.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/heary-cao/spark ShuffleSuite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17868.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17868 commit 874a3b1f1f1da21adf8fa682aa082efb5a0efb8f Author: caoxuewen Date: 2017-05-05T05:44:36Z Add new unit tests to ShuffleSuite
[GitHub] spark issue #17861: Remove excess quotes in Windows executable
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17861 @jarrettmeyer I think we should create a JIRA for this, as it looks like a non-trivial fix even though the diff is a single line. Please refer to http://spark.apache.org/contributing.html.
[GitHub] spark pull request #17861: Remove excess quotes in Windows executable
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17861#discussion_r114934079

--- Diff: bin/spark-class2.cmd ---
```diff
@@ -64,7 +64,7 @@ if not "x%JAVA_HOME%"=="x" (
 rem The launcher library prints the command to be executed in a single line suitable for being
 rem executed by the batch interpreter. So read all the output of the launcher into a variable.
 set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
-"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
```
--- End diff --

I found this problem when I tested some cases on Windows before, but I thought it was my own environment setup. I think we should change `set RUNNER="%JAVA_HOME%\bin\java"` to `set RUNNER=%JAVA_HOME%\bin\java`.

cc @felixcheung and @shivaram, I just realised I was cc'ed in the PR (https://github.com/apache/spark/pull/16596) but it looks like I missed this as well ...

I can reproduce this problem as below:

```cmd
C:\...\spark>set JAVA_HOME
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_121

C:\...\spark>.\bin\spark-shell
'""C:\Program' is not recognized as an internal or external command,
operable program or batch file.
```

```cmd
echo "%RUNNER%"
```

prints

```cmd
""C:\Program Files\Java\jdk1.8.0_121\bin\java""
```

It looks like cmd does not handle the space. To double check, I copied the JDK into `C:\Java` and then ran the commands as below:

```cmd
C:\...\spark>set JAVA_HOME=C:\Java\jdk1.8.0_121

C:\...\spark>.\bin\spark-shell
...
Spark context Web UI available at http://10.0.2.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1493961061248).
Spark session available as 'spark'.
Welcome to Spark version 2.3.0-SNAPSHOT
Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
...
```

```cmd
echo "%RUNNER%"
```

prints

```cmd
""C:\Java\jdk1.8.0_121\bin\java""
```

**After fixing the line I suggested**

```diff
 if not "x%JAVA_HOME%"=="x" (
-  set RUNNER="%JAVA_HOME%\bin\java"
+  set RUNNER=%JAVA_HOME%\bin\java
 ) else (
```

```cmd
C:\...\spark>set JAVA_HOME
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_121

C:\...\spark>.\bin\spark-shell
...
Spark context Web UI available at http://10.0.2.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1493962115332).
Spark session available as 'spark'.
Welcome to Spark version 2.3.0-SNAPSHOT
Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
...
```

```cmd
echo "%RUNNER%"
```

prints

```cmd
"C:\Program Files\Java\jdk1.8.0_121\bin\java"
```
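The double-quoting problem above can be modeled with a small, hypothetical Python sketch; it only mimics the string values cmd ends up with after expansion, not cmd's parsing itself:

```python
# Hypothetical model of the two cmd expansions; plain string building, not cmd itself.
java_home = r"C:\Program Files\Java\jdk1.8.0_121"

# Old script: RUNNER itself already carries quotes...
runner_old = '"' + java_home + r'\bin\java' + '"'
# ...so the later "%RUNNER%" expansion doubles them, which cmd misparses
# as the command '""C:\Program'.
expanded_old = '"' + runner_old + '"'

# Suggested fix: RUNNER is stored unquoted, so "%RUNNER%" quotes it exactly once.
runner_new = java_home + r'\bin\java'
expanded_new = '"' + runner_new + '"'
```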
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17844 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76471/ Test PASSed.
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17844 Merged build finished. Test PASSed.
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17844 **[Test build #76471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76471/testReport)** for PR 17844 at commit [`9248a5e`](https://github.com/apache/spark/commit/9248a5e005c000c42e5a233c9f3ca37b51b6c95d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17864 @sethah Thanks for summarizing the previous discussions. What are you suggesting for this PR? I think it makes sense to log a warning when imputing integer types with mean. In addition, perhaps we can set "median" as the default strategy.
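As a plain-Python illustration of why the strategy matters for integer columns (the data here is made up): the mean is generally fractional and would be truncated by an integer type, while the median here lands on a representable value:

```python
import statistics

ages = [1, 2, 2, 6]  # hypothetical integer column

mean_imputed = statistics.mean(ages)      # 2.75: fractional, cannot be stored in
                                          # an integer column without truncation
median_imputed = statistics.median(ages)  # 2.0: stays on a representable value
```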
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17678 **[Test build #76480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76480/testReport)** for PR 17678 at commit [`aef3481`](https://github.com/apache/spark/commit/aef3481b125b49343caa46bb2f78cd634369a8a2).
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17678 Jenkins, retest this please.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114932485

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
```
--- End diff --

That's true, actually. If you think it's useful we could always have them in separate Rd files. I'm pretty sure `@rdname` needs to match `@aliases` to fix the multiple-link bug (https://issues.apache.org/jira/browse/SPARK-18825), which means we can't have multiple functions in the same Rd file - each has to have its own.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76470/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76470/testReport)** for PR 17770 at commit [`b29ded3`](https://github.com/apache/spark/commit/b29ded3f806616e43f260db4f133c7bbe3a8fb3b). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17658 Merged build finished. Test PASSed.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76469/ Test PASSed.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17658 **[Test build #76469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76469/testReport)** for PR 17658 at commit [`dad87a6`](https://github.com/apache/spark/commit/dad87a64c42de22e1a7a565d9b922811a759dff8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114931344

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
```
--- End diff --

I still believe that AS is applicable to both. Essentially what we do is:

```sql
SELECT column AS new_column FROM table
```

and

```sql
(SELECT * FROM table) AS new_table
```
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114931185

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
```
--- End diff --

To be honest I find both equally confusing, so if you think that a single annotation is better, I am happy to oblige.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17865 (Thank you @gatorsmile for triggering the test)
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76478/testReport)** for PR 17770 at commit [`4ff9610`](https://github.com/apache/spark/commit/4ff9610133fca947fab23af6ea67e6c7af50e8d2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76478/ Test FAILed.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929441

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -153,7 +173,7 @@
 # math functions that take two arguments as input
 _binary_mathfunctions = {
     'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
-             'polar coordinates (r, theta).',
+             'polar coordinates (r, theta). Units in radians.',
```
--- End diff --

I am not sure we should note this for every instance, or that users really get confused, as I see these use Scala/Java's built-in library. I wonder if there is an example of a library that handles this differently?
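For reference, the radians behavior being documented matches Python's own `math.atan2` (the standard-library function, not PySpark's column function):

```python
import math

# math.atan2 takes (y, x) and returns theta in radians.
theta = math.atan2(1.0, 1.0)   # the point (x=1, y=1)
degrees = math.degrees(theta)  # pi/4 radians is 45 degrees
```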
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929075

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -131,9 +152,8 @@
     'var_pop': 'Aggregate function: returns the population variance of the values in a group.',
     'skewness': 'Aggregate function: returns the skewness of the values in a group.',
     'kurtosis': 'Aggregate function: returns the kurtosis of the values in a group.',
-    'collect_list': 'Aggregate function: returns a list of objects with duplicates.',
-    'collect_set': 'Aggregate function: returns a set of objects with duplicate elements' +
-                   ' eliminated.',
+    'collect_list': _collect_list_doc,
```
--- End diff --

Let's wrap it (and the same instances) with `ignore_unicode_prefix` like we (you) did before. Please refer to https://github.com/apache/spark/blob/8ddf0d2a60795a2306f94df8eac6e265b1fe5230/python/pyspark/rdd.py#L146-L156
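A plain-Python sketch of the documented difference between the two aggregates (this is not Spark code; `collect_set`'s result order is undefined in Spark, so it is sorted here only for comparison):

```python
values = [2, 5, 5]

as_list = list(values)        # collect_list keeps duplicates
as_set = sorted(set(values))  # collect_set eliminates duplicate elements
```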
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114930599

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -910,8 +941,8 @@ def weekofyear(col):
     """
     Extract the week number of a given date as integer.
 
-    >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
-    >>> df.select(weekofyear(df.a).alias('week')).collect()
+    >>> df = spark.createDataFrame([('2015-04-08',)], ['time'])
```
--- End diff --

Let's use `d` for `DateType` or `datetime.date`, similarly to other existing names.
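The week number for the doctest's date can be cross-checked with Python's standard library, which uses the same ISO week numbering:

```python
import datetime

# ISO week number of the doctest's date, 2015-04-08.
week = datetime.date(2015, 4, 8).isocalendar()[1]
```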
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929597

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -206,17 +226,20 @@
 @since(1.3)
 def approxCountDistinct(col, rsd=None):
     """
-    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
+    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct instead`.
```
--- End diff --

Probably `` :func:`approx_count_distinct` ``?
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929803

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
 @since(1.5)
 def to_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
-    another timestamp that corresponds to the same time of day in UTC.
+    Given a `timestamp`, which corresponds to a time of day in the timezone `tz`,
```
--- End diff --

Should this be ``` ``timestamp`` ``` not `` `timestamp` ``?
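A minimal sketch of the semantics the reworded docstring describes, using the standard library's `zoneinfo` (the helper name `to_utc` and the example timezone are illustrative, not PySpark's API):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(ts: datetime, tz: str) -> datetime:
    """Interpret the naive timestamp ts as a wall-clock time in tz and
    return the same instant as a naive UTC timestamp."""
    return ts.replace(tzinfo=ZoneInfo(tz)).astimezone(timezone.utc).replace(tzinfo=None)

# 10:30 in Seoul (UTC+9) is 01:30 the same day in UTC.
utc = to_utc(datetime(1997, 2, 28, 10, 30), "Asia/Seoul")
```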
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929993

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -67,9 +67,16 @@
 def _():
     _.__doc__ = 'Window function: ' + doc
     return _
 
+_lit_doc = """
+Creates a :class:`Column` of literal value. Supports basic types like :class:`IntegerType`,
+:class:`FloatType`, :class:`BooleanType`, and :class:`StringType`
```
--- End diff --

I would like to keep this identical with the one in `functions.scala` to reduce overhead when someone sweeps the same documentation changes across APIs in other languages. If the additional information is Python-specific, let's add it in `::note`.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929689

--- Diff: python/pyspark/sql/functions.py ---
@@ -456,7 +479,7 @@ def monotonically_increasing_id():
 def nanvl(col1, col2):
     """Returns col1 if it is not NaN, or col2 if col1 is NaN.
-Both inputs should be floating point columns (DoubleType or FloatType).
+Both inputs should be floating point columns (:class:`DoubleType` or FloatType).
--- End diff --

I think we should link `DoubleType` and `FloatType` consistently: either both or neither.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929646

--- Diff: python/pyspark/sql/functions.py ---
@@ -397,7 +420,7 @@ def input_file_name():
 @since(1.6)
 def isnan(col):
-"""An expression that returns true iff the column is NaN.
+"""An expression that returns true if the column is NaN.
--- End diff --

I think "iff" is the abbreviation for "if and only if". I don't think it is worth changing.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114930366

--- Diff: python/pyspark/sql/functions.py ---
@@ -793,8 +824,8 @@ def date_format(date, format):
 .. note:: Use when ever possible specialized functions like `year`. These benefit from a specialized implementation.
->>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
->>> df.select(date_format('a', 'MM/dd/yyy').alias('date')).collect()
+>>> df = spark.createDataFrame([('2015-04-08',)], ['time'])
--- End diff --

Okay. I guess it is a documentation improvement to use a bit more meaningful name over an arbitrary name `a`. Let's match these to existing names such as `ts` or `t` (abbreviation for timestamp) or `dt` (abbreviation for `datetime.datetime`).
[GitHub] spark issue #17867: [SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17867 **[Test build #76479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76479/testReport)** for PR 17867 at commit [`4922c03`](https://github.com/apache/spark/commit/4922c03a0ba0ed7386198b3e9e068352cc4378f5).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76476/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76476/testReport)** for PR 17770 at commit [`2af9e2b`](https://github.com/apache/spark/commit/2af9e2bfc0fc85840dfe04e886b293f1ec962b0d).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r114929963

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -555,12 +555,15 @@ private[spark] class BlockManager(
 /**
  * Return a list of locations for the given block, prioritizing the local machine since
- * multiple block managers can share the same host.
+ * multiple block managers can share the same host, followed by hosts on the same rack.
  */
 private def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
   val locs = Random.shuffle(master.getLocations(blockId))
   val (preferredLocs, otherLocs) = locs.partition { loc => blockManagerId.host == loc.host }
-  preferredLocs ++ otherLocs
+  val (sameRackLocs, differentRackLocs) = otherLocs.partition {
+    loc => blockManagerId.topologyInfo == loc.topologyInfo
--- End diff --

If `blockManagerId.topologyInfo` is `None`, we will prefer the locations with empty `topologyInfo`. That is slightly different from what the shuffling is meant to do here.
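[Editor's note: the subtlety viirya raises can be modeled outside Spark. The standalone Python sketch below, with illustrative names rather than Spark's actual API, mimics the partition logic under discussion: locations on the local host first, then locations whose topology info compares equal, then the rest. When the local topology info is `None`, remote hosts whose topology is also `None` get ranked as "same rack" even though nothing is known about their placement.]

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Loc:
    """Toy stand-in for a BlockManagerId: host plus optional rack info."""
    host: str
    topology: Optional[str]  # rack identifier; None when undetected


def order_locations(local: Loc, locs: List[Loc]) -> List[Loc]:
    # Shuffle first (as getLocations does), then rank: same host,
    # then equal topologyInfo, then everything else.
    locs = locs[:]
    random.shuffle(locs)
    same_host = [l for l in locs if l.host == local.host]
    others = [l for l in locs if l.host != local.host]
    same_rack = [l for l in others if l.topology == local.topology]
    diff_rack = [l for l in others if l.topology != local.topology]
    return same_host + same_rack + diff_rack


# With local topology None, host-b (topology also None) is ranked ahead
# of host-c, whose rack is actually known -- the behavior in question.
local = Loc("host-a", None)
ordered = order_locations(local, [Loc("host-c", "rack-1"), Loc("host-b", None)])
```

Here `ordered` places `host-b` first purely because two unknowns compare equal, which is the mismatch with the intent of rack-aware ordering that the comment points out.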
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114929845

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

That we did, at one point. I think the feedback was that we could have one line for the parameter (`object`) and more for the return value, but then which line matches which input parameter type?
[GitHub] spark pull request #17867: [SPARK-20606][ML] ML 2.2 QA: Remove deprecated me...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/17867

[SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods for ML

## What changes were proposed in this pull request?

Remove ML methods we deprecated in 2.1.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-20606

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17867.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17867

commit e5b1337f3a09995568b69dde83dba90f9c01fcfe
Author: Yanbo Liang
Date: 2017-05-05T03:18:53Z
    Remove deprecated methods for ML.

commit 4922c03a0ba0ed7386198b3e9e068352cc4378f5
Author: Yanbo Liang
Date: 2017-05-05T04:12:02Z
    Add stuff to MimaExcludes.
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114929683

--- Diff: core/src/test/resources/HistoryServerExpectations/completed_app_list_json_expectation.json ---
@@ -22,6 +23,7 @@
 "duration" : 101795,
 "sparkUser" : "jose",
 "completed" : true,
+"appSparkVersion" : "",
--- End diff --

It's not really about the default value; these tests replay the log files, which contain the Spark version, so I would expect the data retrieved through the API to contain the version that was recorded in the event log. Another way of saying that: probably there's a bug somewhere in your code that is preventing the data from the event log from being exposed correctly through the REST API.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114929528

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

Wouldn't it be better to annotate the actual implementations? To get something like this:

![image](https://cloud.githubusercontent.com/assets/1554276/25733425/295f465e-3159-11e7-87b7-d959c9bf3352.png)
[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r114929436

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
@@ -36,7 +36,10 @@ import org.apache.spark.util.{Utils, VersionUtils}
  * Params for [[Word2Vec]] and [[Word2VecModel]].
  */
 private[feature] trait Word2VecBase extends Params
-  with HasInputCol with HasOutputCol with HasMaxIter with HasStepSize with HasSeed {
+  with HasInputCol with HasOutputCol with HasMaxIter with HasStepSize with HasSeed with HasSolver {
+  // We currently support SkipGram with Hierarchical Softmax and
+  // Continuous Bag of Words with Negative Sampling
+  private val supportedModels = Array("sg-hs", "cbow-ns")
--- End diff --

How is this used?
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76478/testReport)** for PR 17770 at commit [`4ff9610`](https://github.com/apache/spark/commit/4ff9610133fca947fab23af6ea67e6c7af50e8d2).
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114928953

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

Shouldn't we have a `@return` here? Perhaps to say:

```
Returns a new SparkDataFrame or Column with an alias set. For Column, equivalent to SQL "AS" keyword.

@return a new SparkDataFrame or Column
```
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Merged build finished. Test FAILed.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76477/ Test FAILed.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17865 **[Test build #76477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76477/testReport)** for PR 17865 at commit [`91515c6`](https://github.com/apache/spark/commit/91515c620287e193c6d208038025fe194740e4d2).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17865 **[Test build #76477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76477/testReport)** for PR 17865 at commit [`91515c6`](https://github.com/apache/spark/commit/91515c620287e193c6d208038025fe194740e4d2).
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114928655

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

I guess we don't say `return a new Column` but more generally `return a Column`, while in other cases we say `return a new SparkDataFrame`, so I guess it's a difference in wording. I think what you propose is fine, though do you think it's confusing to say `Equivalent to SQL "AS" keyword.`? That makes sense only for Column, not the whole dataframe.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17865 ok to test
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17678 retest this please
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76476/testReport)** for PR 17770 at commit [`2af9e2b`](https://github.com/apache/spark/commit/2af9e2bfc0fc85840dfe04e886b293f1ec962b0d).
[GitHub] spark issue #17866: [SPARK-20605][Core][Yarn][Mesos] Deprecate not used AM a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17866 **[Test build #76475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76475/testReport)** for PR 17866 at commit [`3c9120e`](https://github.com/apache/spark/commit/3c9120e51a510f858dbcae4da69b53777992fc9e).
[GitHub] spark pull request #17866: [SPARK-20605][Core][Yarn][Mesos] Deprecate not us...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/17866

[SPARK-20605][Core][Yarn][Mesos] Deprecate not used AM and executor port configuration

## What changes were proposed in this pull request?

After SPARK-10997, a client-mode Netty RpcEnv doesn't need to start a server, so these port configurations are no longer used. Here we propose to remove the two configurations "spark.executor.port" and "spark.am.port".

## How was this patch tested?

Existing UTs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-20605

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17866.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17866

commit 3c9120e51a510f858dbcae4da69b53777992fc9e
Author: jerryshao
Date: 2017-05-05T03:23:47Z
    deprecate not used AM and executor port configuration
    Change-Id: I1280b8d803e22bd2084bdb4f49580c7955a2f476
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76474/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76474/testReport)** for PR 17770 at commit [`a855182`](https://github.com/apache/spark/commit/a855182d8f5037daab718820775cbcf8add01546).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 closed the pull request at: https://github.com/apache/spark/pull/17825
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
GitHub user zero323 reopened a pull request: https://github.com/apache/spark/pull/17825

[SPARK-20550][SPARKR] R wrapper for Dataset.alias

## What changes were proposed in this pull request?

- Add SparkR wrapper for `Dataset.alias`.
- Adjust roxygen annotations for `functions.alias` (including example usage).

## How was this patch tested?

Unit tests, `check_cran.sh`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-20550

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17825.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17825

commit 944a3ec791a8f103093e24511e895a4ce60970d8
Author: zero323
Date: 2017-05-01T08:59:24Z
    Initial implementation

commit 5e9f8da45c432e0752e5e78556add33e0a6d0557
Author: zero323
Date: 2017-05-01T22:27:11Z
    Adjust argument annotations
    - Remove param annotations from dataframe.alias
    - Use generic annotations for column.alias

commit 73133f9442ad8317fb12b600221962bf47d8a95c
Author: zero323
Date: 2017-05-01T22:31:26Z
    Add usage examples to column.alias

commit 848eeefc1f18c6aabaf65e6efed259a2fa5c19c3
Author: zero323
Date: 2017-05-01T22:34:51Z
    Remove return type annotation

commit 05c0781110b42a940e06cc31650449a8715e85c9
Author: zero323
Date: 2017-05-02T02:00:13Z
    Fix typo

commit 22d7cf661bb54a8f7f9c660e1d914802f1eb4153
Author: zero323
Date: 2017-05-02T04:25:34Z
    Move dontruns to their own lines

commit 22e1292557f1a5597cde6337267a099bbcdc07aa
Author: zero323
Date: 2017-05-02T04:27:11Z
    Extend param description

commit 6bb3d914960d1cf63e582a7d732ca80ed321e9c5
Author: zero323
Date: 2017-05-02T04:33:34Z
    Add type annotations to since notes

commit b3c1a416a16a9d32649edda2b66fc9c3476358a5
Author: zero323
Date: 2017-05-02T04:38:51Z
    Attach alias test to select-with-column test case

commit 40fedcb8c41bc84deead205aad81e84c095045b5
Author: zero323
Date: 2017-05-02T04:44:45Z
    Extend description

commit 1e1ad443751fc3dc93487e5385cc934feb93f631
Author: zero323
Date: 2017-05-03T00:25:15Z
    Move alias documentation to generics

commit 2d5ace288f2443327696823c343c095f0d8d64ca
Author: zero323
Date: 2017-05-04T01:13:45Z
    Add family annotation

commit 5fe5495580eb3852ea5092a34dc2334c0e45c9b7
Author: zero323
Date: 2017-05-04T06:32:54Z
    Check that stats::alias is not masked

commit 09f9ccaf5e66a400d26b4ab6d600d951305d5fd3
Author: zero323
Date: 2017-05-04T07:04:52Z
    Fix style

commit f1c74f338b8df865a5e8b9a6e281211aa27af7d3
Author: zero323
Date: 2017-05-04T10:17:42Z
    vim
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114925159

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

How about?

```
#' Return a new Column or a SparkDataFrame with a name set. Equivalent to SQL "AS" keyword.
```

Is the `Column` new?
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76474/testReport)** for PR 17770 at commit [`a855182`](https://github.com/apache/spark/commit/a855182d8f5037daab718820775cbcf8add01546).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76472/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76472/testReport)** for PR 17770 at commit [`8c8fe1e`](https://github.com/apache/spark/commit/8c8fe1e20609a373f164e8b2252a970e4e468eb3).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17300 **[Test build #76473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76473/testReport)** for PR 17300 at commit [`56f5231`](https://github.com/apache/spark/commit/56f5231626cceb114c45413b7b340ee719c3f2f8).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76472/testReport)** for PR 17770 at commit [`8c8fe1e`](https://github.com/apache/spark/commit/8c8fe1e20609a373f164e8b2252a970e4e468eb3).
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17300 retest this please
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17678 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76467/ Test FAILed.
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17678 Merged build finished. Test FAILed.
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17844 **[Test build #76471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76471/testReport)** for PR 17844 at commit [`9248a5e`](https://github.com/apache/spark/commit/9248a5e005c000c42e5a233c9f3ca37b51b6c95d).
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17678 **[Test build #76467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76467/testReport)** for PR 17678 at commit [`aef3481`](https://github.com/apache/spark/commit/aef3481b125b49343caa46bb2f78cd634369a8a2).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114924076

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

Right - I think, again, we should emphasize returning a new SparkDataFrame.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 @srinathshankar also thinks it's weird to add a barrier node. I suggest @hvanhovell and @srinathshankar duke it out.
[GitHub] spark issue #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17825 Could you close/reopen to trigger AppVeyor again?
[GitHub] spark pull request #17840: [SPARK-20574][ML] Allow Bucketizer to handle non-...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17840
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76470/testReport)** for PR 17770 at commit [`b29ded3`](https://github.com/apache/spark/commit/b29ded3f806616e43f260db4f133c7bbe3a8fb3b).
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114924015

--- Diff: core/src/test/resources/HistoryServerExpectations/completed_app_list_json_expectation.json ---
@@ -22,6 +23,7 @@
   "duration" : 101795,
   "sparkUser" : "jose",
   "completed" : true,
+  "appSparkVersion" : "",
--- End diff --

Probably I could change the default value; looks OK, will do it.
[GitHub] spark issue #17840: [SPARK-20574][ML] Allow Bucketizer to handle non-Double ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17840 Merged into master and branch-2.0. Thanks.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17658 **[Test build #76469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76469/testReport)** for PR 17658 at commit [`dad87a6`](https://github.com/apache/spark/commit/dad87a64c42de22e1a7a565d9b922811a759dff8).
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Can one of the admins verify this patch?
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user map222 commented on the issue: https://github.com/apache/spark/pull/17865 @HyukjinKwon I ended up not making examples for the aggregate functions, as I didn't make a good dataframe to demonstrate them. I could add more examples for the string functions if you think that is a good idea. There are dozens of functions that could be documented; I'm not sure how far we want to go, or which ones need it.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
GitHub user map222 opened a pull request: https://github.com/apache/spark/pull/17865

[SPARK-20456][Docs] Add examples for functions collection for pyspark

## What changes were proposed in this pull request?

- Adds documentation to many functions in pyspark.sql.functions.py: `upper`, `lower`, `reverse`, `unix_timestamp`, `from_unixtime`, `rand`, `randn`, `collect_list`, `collect_set`, `lit`.
- Adds units to the trigonometry functions.
- Renames columns in datetime examples to be more informative.
- Adds links between some functions.

## How was this patch tested?

`./dev/lint-python`
`python python/pyspark/sql/functions.py`
`./python/run-tests.py --module pyspark-sql`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/map222/spark spark-20456

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17865.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17865

commit 91515c620287e193c6d208038025fe194740e4d2
Author: Michael Patterson
Date: 2017-05-05T00:26:56Z

    First revision: trigonometry units, lit, collect_set, collect_list, unix_timestamp, from_unixtime
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r114922000

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala ---
@@ -60,12 +61,19 @@ private[kinesis] class KinesisInputDStream[T: ClassTag](
     val isBlockIdValid = blockInfos.map { _.isBlockIdValid() }.toArray
     logDebug(s"Creating KinesisBackedBlockRDD for $time with ${seqNumRanges.length} " +
       s"seq number ranges: ${seqNumRanges.mkString(", ")} ")
+
+    /**
+     * Construct the Kinesis read configs from streaming context
+     * and pass to KinesisBackedBlockRDD
+     */
+    val kinesisReadConfigs = KinesisReadConfigurations(ssc)
+
     new KinesisBackedBlockRDD(
       context.sc, regionName, endpointUrl, blockIds, seqNumRanges,
       isBlockIdValid = isBlockIdValid,
-      retryTimeoutMs = ssc.graph.batchDuration.milliseconds.toInt,
       messageHandler = messageHandler,
-      kinesisCreds = kinesisCreds)
+      kinesisCreds = kinesisCreds,
+      kinesisReadConfigs = kinesisReadConfigs)
--- End diff --

I think it would be sufficient to change this to

```scala
kinesisReadConfigs = KinesisReadConfigurations(ssc))
```

and omit lines 65-70. I don't think a comment is necessary here; the code is pretty straightforward.
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user budde commented on the issue: https://github.com/apache/spark/pull/17467 Fair enough. I took another look and I think I may have been thinking of the way things worked in an earlier revision of this code. I think the case class is reasonable.
[GitHub] spark issue #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_INSTANCES...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17859 OK, I will open another PR to remove it. Thanks a lot to both of you.
[GitHub] spark pull request #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_IN...
Github user ConeyLiu closed the pull request at: https://github.com/apache/spark/pull/17859
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114921697

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -283,10 +283,15 @@ private[spark] object EventLoggingListener extends Logging {
    *
    * @param logStream Raw output stream to the event log file.
    */
-  def initEventLog(logStream: OutputStream): Unit = {
+  def initEventLog(logStream: OutputStream, testing: Boolean,
+      loggedEvents: ArrayBuffer[JValue]): Unit = {
     val metadata = SparkListenerLogStart(SPARK_VERSION)
-    val metadataJson = compact(JsonProtocol.logStartToJson(metadata)) + "\n"
+    val eventJson = JsonProtocol.logStartToJson(metadata)
+    val metadataJson = compact(eventJson) + "\n"
     logStream.write(metadataJson.getBytes(StandardCharsets.UTF_8))
+    if (testing && loggedEvents != null) {
+      loggedEvents += eventJson
--- End diff --

I thought loggedEvents only takes JSON values. Also, the loggedEvents are generated as part of SparkContext (and possibly through other sources). The ReplayListenerSuite, however, compares the original events with the replayed events; the replayed events are written to the event log, but loggedEvents will not contain the SparkListenerLogStart event, as that is not produced by SparkContext, if I understand correctly.
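The pattern being discussed in the diff can be sketched in miniature. This is an illustration, not Spark's code: the JSON handling is reduced to a plain string in place of Spark's `JsonProtocol`, and the function mirrors only the shape of the diff — write the log-start metadata as one JSON line, and, when testing, also capture the raw event so a replay test can compare original and replayed events.

```python
import io

def init_event_log(out, testing, logged_events):
    # Write the log-start metadata as a single newline-terminated JSON line.
    event_json = '{"Event":"SparkListenerLogStart"}'
    out.write((event_json + "\n").encode("utf-8"))
    # In tests, also capture the event so a replay suite can compare
    # the events it wrote against the events it reads back.
    if testing and logged_events is not None:
        logged_events.append(event_json)

buf = io.BytesIO()
logged = []
init_event_log(buf, testing=True, logged_events=logged)
```

After the call, `buf` holds the metadata line exactly as a replay reader would see it, and `logged` holds the event object a test can assert against.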
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114921013

--- Diff: core/src/test/resources/HistoryServerExpectations/completed_app_list_json_expectation.json ---
@@ -22,6 +23,7 @@
   "duration" : 101795,
   "sparkUser" : "jose",
   "completed" : true,
+  "appSparkVersion" : "",
--- End diff --

I am not sure if the tests hit this code path (https://github.com/apache/spark/pull/17658/files#diff-a7befb99e7bd7e3ab5c46c2568aa5b3eR474), so they take the default value.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 Note: since a `Transformer` may also manipulate the dataset in other ways, such as dropping NaN values, the idea above won't work in those cases.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17770 Thanks @rxin @marmbrus @hvanhovell @cloud-fan. That sounds reasonable to me. I'll eliminate the `resolveOperators` path.
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @cloud-fan This is not about using a Python UDF; it allows PySpark to use a Java UDF (no Python daemon will be launched). So it would actually improve performance.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 The bunch of projections will be collapsed during optimization, so it doesn't affect query execution. However, every `withColumn` call creates a new `DataFrame` along with a projection on the previous logical plan. That is costly: it creates a new query execution, analyzes the logical plan, creates an encoder, and so on. The improvement comes from paying this cost once with `withColumns` instead of once per `withColumn` call. It can benefit other transformers that work on multiple columns. I even have an idea to revamp the `Transformer` interface, because the transformation in a `Transformer` usually ends with a `withColumn` call to add or replace a column; transformers are really transforming columns of the dataset. But the performance difference is obvious only when the number of transformation stages is large enough, as in the example with many `Bucketizer`s, so it may not be worth doing. Just a thought.
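The analysis-time cost described above can be sketched with a toy plan model. This is plain Python for illustration; the classes are hypothetical stand-ins, not Spark's actual Catalyst nodes: each `withColumn`-style call wraps the previous plan in one more projection node that must be re-analyzed, while a `withColumns`-style call adds all columns in a single projection.

```python
class Plan:
    """A toy logical-plan node: a projection over an optional child plan."""

    def __init__(self, cols, child=None):
        self.cols = cols
        self.child = child

    @property
    def depth(self):
        # Number of nested plan nodes; a proxy for per-call analysis cost.
        return 1 + (self.child.depth if self.child else 0)

def with_column(plan, col):
    # One new projection per call: the whole (ever deeper) plan gets
    # re-analyzed each time.
    return Plan(plan.cols + [col], child=plan)

def with_columns(plan, cols):
    # One projection adds every column at once: a single analysis pass.
    return Plan(plan.cols + list(cols), child=plan)

base = Plan(["a"])
chained = base
for i in range(100):
    chained = with_column(chained, f"c{i}")
batched = with_columns(base, [f"c{i}" for i in range(100)])

print(chained.depth)  # 101 nested plans after 100 chained calls
print(batched.depth)  # 2: one projection over the source
```

The optimizer can still collapse the 101-deep chain into a single projection at the end, but only after each intermediate plan has already been analyzed — which is exactly the repeated cost `withColumns` avoids.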
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17300 Will merge when tests pass.
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17222 Hi @zjffdu, thanks for working on it! But I'm not sure how useful this feature will be. AFAIK most users use Scala/Java UDFs instead of Python UDFs because Python UDFs are too slow. We are working on a project to improve the communication between the JVM and the Python process, which may add a new Python UDF interface and also affect the Python UDAF design. Can you hold this PR for a while? Thanks!
[GitHub] spark issue #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_INSTANCES...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17859 > Do we need remove the comments from template config? Ah, that would be a good idea. I also noticed it's still used in `YarnSparkHadoopUtil.scala`, so that could be removed too. I also took a closer look at SPARK-17979, and this particular env variable wasn't removed in that change; it seems it was removed much earlier (SPARK-9092 as far as I can tell), so it looks like it isn't very widely used.
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17300 retest this please
[GitHub] spark issue #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_INSTANCES...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17859 @vanzin Thanks a lot for your review. Do we need to remove the comments from the template config? It doesn't work anymore in the current version.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76466/