[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17300 Thanks to both of you for the review; I have addressed the comments and modified the test case. Please help trigger Jenkins for testing, because I can't trigger it myself. Thanks again. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17868: [CORE]Add new unit tests to ShuffleSuite
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17868 Can one of the admins verify this patch?
[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r114935050

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
```diff
@@ -555,12 +555,15 @@ private[spark] class BlockManager(
   /**
    * Return a list of locations for the given block, prioritizing the local machine since
-   * multiple block managers can share the same host.
+   * multiple block managers can share the same host, followed by hosts on the same rack.
    */
   private def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
     val locs = Random.shuffle(master.getLocations(blockId))
     val (preferredLocs, otherLocs) = locs.partition { loc => blockManagerId.host == loc.host }
-    preferredLocs ++ otherLocs
+    val (sameRackLocs, differentRackLocs) = otherLocs.partition {
+      loc => blockManagerId.topologyInfo == loc.topologyInfo
```
--- End diff --

Modified, thanks a lot for the good advice.
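The ordering the patch implements (same-host locations first, then same-rack, then everything else) can be sketched in plain Python; the `Loc` type and its field names are illustrative stand-ins for `BlockManagerId`, not Spark's actual API:

```python
import random
from collections import namedtuple

# Illustrative stand-in for BlockManagerId: only host and rack (topologyInfo) matter here.
Loc = namedtuple("Loc", ["host", "rack"])

def order_locations(local, locs, seed=0):
    """Shuffle, then put same-host locations first, then same-rack, then the rest."""
    shuffled = random.Random(seed).sample(locs, len(locs))
    same_host = [l for l in shuffled if l.host == local.host]
    others = [l for l in shuffled if l.host != local.host]
    same_rack = [l for l in others if l.rack == local.rack]
    diff_rack = [l for l in others if l.rack != local.rack]
    return same_host + same_rack + diff_rack

local = Loc("host-a", "rack-1")
locs = [Loc("host-c", "rack-2"), Loc("host-b", "rack-1"), Loc("host-a", "rack-1")]
ordered = order_locations(local, locs)
```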
[GitHub] spark pull request #17868: [CORE]Add new unit tests to ShuffleSuite
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/17868 [CORE]Add new unit tests to ShuffleSuite

## What changes were proposed in this pull request?

This PR makes two changes:
1. Adds new unit tests verifying that when there is no shuffle stage, shuffle generates neither the data file nor the index files.
2. Modifies the '[SPARK-4085] rerun map stage if reduce stage cannot find its local shuffle file' unit test: sets parallelism to 1 instead of 2, and checks that the index file is deleted.

## How was this patch tested?

The new unit tests.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/heary-cao/spark ShuffleSuite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17868.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17868 commit 874a3b1f1f1da21adf8fa682aa082efb5a0efb8f Author: caoxuewen Date: 2017-05-05T05:44:36Z Add new unit tests to ShuffleSuite
[GitHub] spark issue #17861: Remove excess quotes in Windows executable
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17861 @jarrettmeyer I think we should create a JIRA for this, as it looks like a non-trivial fix even though the diff is a single line. Please refer to http://spark.apache.org/contributing.html.
[GitHub] spark pull request #17861: Remove excess quotes in Windows executable
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17861#discussion_r114934079

--- Diff: bin/spark-class2.cmd ---
```diff
@@ -64,7 +64,7 @@ if not "x%JAVA_HOME%"=="x" (
 rem The launcher library prints the command to be executed in a single line suitable for being
 rem executed by the batch interpreter. So read all the output of the launcher into a variable.
 set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
-"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
```
--- End diff --

I found this problem when I tested some cases on Windows before, but I thought it was my own environment setup. I think we should change `set RUNNER="%JAVA_HOME%\bin\java"` to `set RUNNER=%JAVA_HOME%\bin\java`.

cc @felixcheung and @shivaram, I just realised I was cc'ed in the PR (https://github.com/apache/spark/pull/16596) but it looks like I missed this as well ...

I can reproduce this problem as below:

```cmd
C:\...\spark>set JAVA_HOME
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_121

C:\...\spark>.\bin\spark-shell
'""C:\Program' is not recognized as an internal or external command,
operable program or batch file.
```

```cmd
echo "%RUNNER%"
```

prints

```cmd
""C:\Program Files\Java\jdk1.8.0_121\bin\java""
```

It looks like cmd does not handle the space. To double check, I copied the JDK into `C:\Java` and then ran the commands as below:

```cmd
C:\...\spark>set JAVA_HOME=C:\Java\jdk1.8.0_121

C:\...\spark>.\bin\spark-shell
...
Spark context Web UI available at http://10.0.2.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1493961061248).
Spark session available as 'spark'.
Welcome to Spark version 2.3.0-SNAPSHOT
Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
...
```

```cmd
echo "%RUNNER%"
```

prints

```cmd
""C:\Java\jdk1.8.0_121\bin\java""
```

**After fixing the line I suggested**

```diff
 if not "x%JAVA_HOME%"=="x" (
-  set RUNNER="%JAVA_HOME%\bin\java"
+  set RUNNER=%JAVA_HOME%\bin\java
 ) else (
```

```cmd
C:\...\spark>set JAVA_HOME
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_121

C:\...\spark>.\bin\spark-shell
...
Spark context Web UI available at http://10.0.2.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1493962115332).
Spark session available as 'spark'.
Welcome to Spark version 2.3.0-SNAPSHOT
Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
...
```

```cmd
echo "%RUNNER%"
```

prints

```cmd
"C:\Program Files\Java\jdk1.8.0_121\bin\java"
```
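The double-quoting problem above can be modeled with a small, hypothetical Python sketch; it only mimics the string values cmd ends up with after expansion, not cmd's parsing itself:

```python
# Hypothetical model of the two cmd expansions; plain string building, not cmd itself.
java_home = r"C:\Program Files\Java\jdk1.8.0_121"

# Old script: RUNNER itself already carries quotes...
runner_old = '"' + java_home + r'\bin\java' + '"'
# ...so the later "%RUNNER%" expansion doubles them, which cmd misparses
# as the command '""C:\Program'.
expanded_old = '"' + runner_old + '"'

# Suggested fix: RUNNER is stored unquoted, so "%RUNNER%" quotes it exactly once.
runner_new = java_home + r'\bin\java'
expanded_new = '"' + runner_new + '"'
```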
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17844 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76471/ Test PASSed.
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17844 Merged build finished. Test PASSed.
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17844 **[Test build #76471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76471/testReport)** for PR 17844 at commit [`9248a5e`](https://github.com/apache/spark/commit/9248a5e005c000c42e5a233c9f3ca37b51b6c95d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17864 @sethah Thanks for summarizing the previous discussions. What are you suggesting for this PR? I think it makes sense to log a warning when imputing integer types with mean. In addition, perhaps we can set "median" as the default strategy.
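As a plain-Python illustration of why the strategy matters for integer columns (the data here is made up): the mean is generally fractional and would be truncated by an integer type, while the median here lands on a representable value:

```python
import statistics

ages = [1, 2, 2, 6]  # hypothetical integer column

mean_imputed = statistics.mean(ages)      # 2.75: fractional, cannot be stored in
                                          # an integer column without truncation
median_imputed = statistics.median(ages)  # 2.0: stays on a representable value
```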
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17678 **[Test build #76480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76480/testReport)** for PR 17678 at commit [`aef3481`](https://github.com/apache/spark/commit/aef3481b125b49343caa46bb2f78cd634369a8a2).
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17678 Jenkins, retest this please.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114932485

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
```
--- End diff --

That's true, actually. If you think it's useful we could always have them in separate Rd files. I'm pretty sure `@rdname` needs to match `@aliases` to fix the multiple-link bug (https://issues.apache.org/jira/browse/SPARK-18825), which means we can't have multiple functions in the same Rd file - each has to have its own.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76470/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76470/testReport)** for PR 17770 at commit [`b29ded3`](https://github.com/apache/spark/commit/b29ded3f806616e43f260db4f133c7bbe3a8fb3b). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17658 Merged build finished. Test PASSed.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76469/ Test PASSed.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17658 **[Test build #76469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76469/testReport)** for PR 17658 at commit [`dad87a6`](https://github.com/apache/spark/commit/dad87a64c42de22e1a7a565d9b922811a759dff8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114931344

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
```
--- End diff --

I still believe that AS is applicable to both. Essentially what we do is:

```sql
SELECT column AS new_column FROM table
```

and

```sql
(SELECT * FROM table) AS new_table
```
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114931185

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
```
--- End diff --

To be honest I find both equally confusing, so if you think that a single annotation is better, I am happy to oblige.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17865 (Thank you @gatorsmile for triggering the test)
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76478/testReport)** for PR 17770 at commit [`4ff9610`](https://github.com/apache/spark/commit/4ff9610133fca947fab23af6ea67e6c7af50e8d2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76478/ Test FAILed.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929441

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -153,7 +173,7 @@
 # math functions that take two arguments as input
 _binary_mathfunctions = {
     'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
-             'polar coordinates (r, theta).',
+             'polar coordinates (r, theta). Units in radians.',
```
--- End diff --

I am not sure we should note this for every instance, or that users really get confused, as I see these use Scala/Java's built-in library. I wonder if there is an example of a library that handles this differently?
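For reference, the radians behavior being documented matches Python's own `math.atan2` (the standard-library function, not PySpark's column function):

```python
import math

# math.atan2 takes (y, x) and returns theta in radians.
theta = math.atan2(1.0, 1.0)   # the point (x=1, y=1)
degrees = math.degrees(theta)  # pi/4 radians is 45 degrees
```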
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929075

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -131,9 +152,8 @@
     'var_pop': 'Aggregate function: returns the population variance of the values in a group.',
     'skewness': 'Aggregate function: returns the skewness of the values in a group.',
     'kurtosis': 'Aggregate function: returns the kurtosis of the values in a group.',
-    'collect_list': 'Aggregate function: returns a list of objects with duplicates.',
-    'collect_set': 'Aggregate function: returns a set of objects with duplicate elements' +
-                   ' eliminated.',
+    'collect_list': _collect_list_doc,
```
--- End diff --

Let's wrap it (and the same instances) with `ignore_unicode_prefix` like we (you) did before. Please refer to https://github.com/apache/spark/blob/8ddf0d2a60795a2306f94df8eac6e265b1fe5230/python/pyspark/rdd.py#L146-L156
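A plain-Python sketch of the documented difference between the two aggregates (this is not Spark code; `collect_set`'s result order is undefined in Spark, so it is sorted here only for comparison):

```python
values = [2, 5, 5]

as_list = list(values)        # collect_list keeps duplicates
as_set = sorted(set(values))  # collect_set eliminates duplicate elements
```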
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114930599

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -910,8 +941,8 @@ def weekofyear(col):
     """
     Extract the week number of a given date as integer.
 
-    >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
-    >>> df.select(weekofyear(df.a).alias('week')).collect()
+    >>> df = spark.createDataFrame([('2015-04-08',)], ['time'])
```
--- End diff --

Let's use `d` for `DateType` or `datetime.date`, similarly to other existing names.
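The week number for the doctest's date can be cross-checked with Python's standard library, which uses the same ISO week numbering:

```python
import datetime

# ISO week number of the doctest's date, 2015-04-08.
week = datetime.date(2015, 4, 8).isocalendar()[1]
```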
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929597

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -206,17 +226,20 @@
 @since(1.3)
 def approxCountDistinct(col, rsd=None):
     """
-    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
+    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct instead`.
```
--- End diff --

Probably `` :func:`approx_count_distinct` ``?
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929803

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
 @since(1.5)
 def to_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
-    another timestamp that corresponds to the same time of day in UTC.
+    Given a `timestamp`, which corresponds to a time of day in the timezone `tz`,
```
--- End diff --

Should this be ``` ``timestamp`` ``` not `` `timestamp` ``?
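A minimal sketch of the semantics the reworded docstring describes, using the standard library's `zoneinfo` (the helper name `to_utc` and the example timezone are illustrative, not PySpark's API):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(ts: datetime, tz: str) -> datetime:
    """Interpret the naive timestamp ts as a wall-clock time in tz and
    return the same instant as a naive UTC timestamp."""
    return ts.replace(tzinfo=ZoneInfo(tz)).astimezone(timezone.utc).replace(tzinfo=None)

# 10:30 in Seoul (UTC+9) is 01:30 the same day in UTC.
utc = to_utc(datetime(1997, 2, 28, 10, 30), "Asia/Seoul")
```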
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929993

--- Diff: python/pyspark/sql/functions.py ---
```diff
@@ -67,9 +67,16 @@
 def _():
     _.__doc__ = 'Window function: ' + doc
     return _
 
+_lit_doc = """
+Creates a :class:`Column` of literal value. Supports basic types like :class:`IntegerType`,
+:class:`FloatType`, :class:`BooleanType`, and :class:`StringType`
```
--- End diff --

I would like to keep this identical with the one in `functions.scala` to reduce overhead when someone sweeps the same documentation changes across APIs in other languages. If the additional information is Python-specific, let's add it in `::note`.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929689

--- Diff: python/pyspark/sql/functions.py ---
@@ -456,7 +479,7 @@ def monotonically_increasing_id():
 def nanvl(col1, col2):
     """Returns col1 if it is not NaN, or col2 if col1 is NaN.
-Both inputs should be floating point columns (DoubleType or FloatType).
+Both inputs should be floating point columns (:class:`DoubleType` or FloatType).
--- End diff --

I think we should link `DoubleType` and `FloatType` consistently: either both or neither.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114929646

--- Diff: python/pyspark/sql/functions.py ---
@@ -397,7 +420,7 @@ def input_file_name():
 @since(1.6)
 def isnan(col):
-"""An expression that returns true iff the column is NaN.
+"""An expression that returns true if the column is NaN.
--- End diff --

I think "iff" is the abbreviation for "if and only if". I don't think it is worth changing.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r114930366

--- Diff: python/pyspark/sql/functions.py ---
@@ -793,8 +824,8 @@ def date_format(date, format):
 .. note:: Use when ever possible specialized functions like `year`. These benefit from a specialized implementation.
->>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
->>> df.select(date_format('a', 'MM/dd/yyy').alias('date')).collect()
+>>> df = spark.createDataFrame([('2015-04-08',)], ['time'])
--- End diff --

Okay. I guess it is a documentation improvement to use a bit more meaningful name over an arbitrary name `a`. Let's match these to existing names such as `ts` or `t` (abbreviation for timestamp) or `dt` (abbreviation for `datetime.datetime`).
[GitHub] spark issue #17867: [SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17867 **[Test build #76479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76479/testReport)** for PR 17867 at commit [`4922c03`](https://github.com/apache/spark/commit/4922c03a0ba0ed7386198b3e9e068352cc4378f5).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76476/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76476/testReport)** for PR 17770 at commit [`2af9e2b`](https://github.com/apache/spark/commit/2af9e2bfc0fc85840dfe04e886b293f1ec962b0d).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r114929963

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -555,12 +555,15 @@ private[spark] class BlockManager(
 /**
  * Return a list of locations for the given block, prioritizing the local machine since
- * multiple block managers can share the same host.
+ * multiple block managers can share the same host, followed by hosts on the same rack.
  */
 private def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
   val locs = Random.shuffle(master.getLocations(blockId))
   val (preferredLocs, otherLocs) = locs.partition { loc => blockManagerId.host == loc.host }
-  preferredLocs ++ otherLocs
+  val (sameRackLocs, differentRackLocs) = otherLocs.partition {
+    loc => blockManagerId.topologyInfo == loc.topologyInfo
--- End diff --

If `blockManagerId.topologyInfo` is `None`, we will prefer the locations with empty `topologyInfo`. That is slightly different from what the shuffling is meant to do here.
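[Editor's note: the subtlety viirya raises can be modeled outside Spark. The standalone Python sketch below, with illustrative names rather than Spark's actual API, mimics the partition logic under discussion: locations on the local host first, then locations whose topology info compares equal, then the rest. When the local topology info is `None`, remote hosts whose topology is also `None` get ranked as "same rack" even though nothing is known about their placement.]

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Loc:
    """Toy stand-in for a BlockManagerId: host plus optional rack info."""
    host: str
    topology: Optional[str]  # rack identifier; None when undetected


def order_locations(local: Loc, locs: List[Loc]) -> List[Loc]:
    # Shuffle first (as getLocations does), then rank: same host,
    # then equal topologyInfo, then everything else.
    locs = locs[:]
    random.shuffle(locs)
    same_host = [l for l in locs if l.host == local.host]
    others = [l for l in locs if l.host != local.host]
    same_rack = [l for l in others if l.topology == local.topology]
    diff_rack = [l for l in others if l.topology != local.topology]
    return same_host + same_rack + diff_rack


# With local topology None, host-b (topology also None) is ranked ahead
# of host-c, whose rack is actually known -- the behavior in question.
local = Loc("host-a", None)
ordered = order_locations(local, [Loc("host-c", "rack-1"), Loc("host-b", None)])
```

Here `ordered` places `host-b` first purely because two unknowns compare equal, which is the mismatch with the intent of rack-aware ordering that the comment points out.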
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114929845

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

That we did, at one point. I think the feedback was that we could have one line for the parameter (`object`) and more for the return value, but then which line matches which input parameter type?
[GitHub] spark pull request #17867: [SPARK-20606][ML] ML 2.2 QA: Remove deprecated me...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/17867

[SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods for ML

## What changes were proposed in this pull request?

Remove ML methods we deprecated in 2.1.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-20606

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17867.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17867

commit e5b1337f3a09995568b69dde83dba90f9c01fcfe
Author: Yanbo Liang
Date: 2017-05-05T03:18:53Z
    Remove deprecated methods for ML.

commit 4922c03a0ba0ed7386198b3e9e068352cc4378f5
Author: Yanbo Liang
Date: 2017-05-05T04:12:02Z
    Add stuff to MimaExcludes.
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114929683

--- Diff: core/src/test/resources/HistoryServerExpectations/completed_app_list_json_expectation.json ---
@@ -22,6 +23,7 @@
 "duration" : 101795,
 "sparkUser" : "jose",
 "completed" : true,
+"appSparkVersion" : "",
--- End diff --

It's not really about the default value; these tests replay the log files, which contain the Spark version, so I would expect the data retrieved through the API to contain the version that was recorded in the event log. Another way of saying that: probably there's a bug somewhere in your code that is preventing the data from the event log from being exposed correctly through the REST API.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114929528

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

Wouldn't it be better to annotate the actual implementations? To get something like this:

![image](https://cloud.githubusercontent.com/assets/1554276/25733425/295f465e-3159-11e7-87b7-d959c9bf3352.png)
[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r114929436

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
@@ -36,7 +36,10 @@ import org.apache.spark.util.{Utils, VersionUtils}
  * Params for [[Word2Vec]] and [[Word2VecModel]].
  */
 private[feature] trait Word2VecBase extends Params
-  with HasInputCol with HasOutputCol with HasMaxIter with HasStepSize with HasSeed {
+  with HasInputCol with HasOutputCol with HasMaxIter with HasStepSize with HasSeed with HasSolver {
+  // We currently support SkipGram with Hierarchical Softmax and
+  // Continuous Bag of Words with Negative Sampling
+  private val supportedModels = Array("sg-hs", "cbow-ns")
--- End diff --

How is this used?
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76478/testReport)** for PR 17770 at commit [`4ff9610`](https://github.com/apache/spark/commit/4ff9610133fca947fab23af6ea67e6c7af50e8d2).
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114928953

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

Shouldn't we have a `@return` here? Perhaps to say:

```
Returns a new SparkDataFrame or Column with an alias set. For Column, equivalent to SQL "AS" keyword.

@return a new SparkDataFrame or Column
```
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Merged build finished. Test FAILed.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76477/ Test FAILed.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17865 **[Test build #76477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76477/testReport)** for PR 17865 at commit [`91515c6`](https://github.com/apache/spark/commit/91515c620287e193c6d208038025fe194740e4d2).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17865 **[Test build #76477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76477/testReport)** for PR 17865 at commit [`91515c6`](https://github.com/apache/spark/commit/91515c620287e193c6d208038025fe194740e4d2).
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114928655

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

I guess we don't say `return a new Column` but more generally `return a Column`, while in other cases we say `return a new SparkDataFrame`, so I guess it's a difference in wording. I think what you propose is fine, though do you think it's confusing to say `Equivalent to SQL "AS" keyword.`? That makes sense only for Column, not the whole dataframe.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17865 ok to test
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17678 retest this please
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76476/testReport)** for PR 17770 at commit [`2af9e2b`](https://github.com/apache/spark/commit/2af9e2bfc0fc85840dfe04e886b293f1ec962b0d).
[GitHub] spark issue #17866: [SPARK-20605][Core][Yarn][Mesos] Deprecate not used AM a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17866 **[Test build #76475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76475/testReport)** for PR 17866 at commit [`3c9120e`](https://github.com/apache/spark/commit/3c9120e51a510f858dbcae4da69b53777992fc9e).
[GitHub] spark pull request #17866: [SPARK-20605][Core][Yarn][Mesos] Deprecate not us...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/17866

[SPARK-20605][Core][Yarn][Mesos] Deprecate not used AM and executor port configuration

## What changes were proposed in this pull request?

After SPARK-10997, a client-mode Netty RpcEnv doesn't need to start a server, so these port configurations are no longer used. Here we propose to remove the two configurations "spark.executor.port" and "spark.am.port".

## How was this patch tested?

Existing UTs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-20605

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17866.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17866

commit 3c9120e51a510f858dbcae4da69b53777992fc9e
Author: jerryshao
Date: 2017-05-05T03:23:47Z
    deprecate not used AM and executor port configuration
    Change-Id: I1280b8d803e22bd2084bdb4f49580c7955a2f476
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76474/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76474/testReport)** for PR 17770 at commit [`a855182`](https://github.com/apache/spark/commit/a855182d8f5037daab718820775cbcf8add01546).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 closed the pull request at: https://github.com/apache/spark/pull/17825
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
GitHub user zero323 reopened a pull request: https://github.com/apache/spark/pull/17825

[SPARK-20550][SPARKR] R wrapper for Dataset.alias

## What changes were proposed in this pull request?

- Add SparkR wrapper for `Dataset.alias`.
- Adjust roxygen annotations for `functions.alias` (including example usage).

## How was this patch tested?

Unit tests, `check_cran.sh`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-20550

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17825.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17825

commit 944a3ec791a8f103093e24511e895a4ce60970d8
Author: zero323
Date: 2017-05-01T08:59:24Z
    Initial implementation

commit 5e9f8da45c432e0752e5e78556add33e0a6d0557
Author: zero323
Date: 2017-05-01T22:27:11Z
    Adjust argument annotations
    - Remove param annotations from dataframe.alias
    - Use generic annotations for column.alias

commit 73133f9442ad8317fb12b600221962bf47d8a95c
Author: zero323
Date: 2017-05-01T22:31:26Z
    Add usage examples to column.alias

commit 848eeefc1f18c6aabaf65e6efed259a2fa5c19c3
Author: zero323
Date: 2017-05-01T22:34:51Z
    Remove return type annotation

commit 05c0781110b42a940e06cc31650449a8715e85c9
Author: zero323
Date: 2017-05-02T02:00:13Z
    Fix typo

commit 22d7cf661bb54a8f7f9c660e1d914802f1eb4153
Author: zero323
Date: 2017-05-02T04:25:34Z
    Move dontruns to their own lines

commit 22e1292557f1a5597cde6337267a099bbcdc07aa
Author: zero323
Date: 2017-05-02T04:27:11Z
    Extend param description

commit 6bb3d914960d1cf63e582a7d732ca80ed321e9c5
Author: zero323
Date: 2017-05-02T04:33:34Z
    Add type annotations to since notes

commit b3c1a416a16a9d32649edda2b66fc9c3476358a5
Author: zero323
Date: 2017-05-02T04:38:51Z
    Attach alias test to select-with-column test case

commit 40fedcb8c41bc84deead205aad81e84c095045b5
Author: zero323
Date: 2017-05-02T04:44:45Z
    Extend description

commit 1e1ad443751fc3dc93487e5385cc934feb93f631
Author: zero323
Date: 2017-05-03T00:25:15Z
    Move alias documentation to generics

commit 2d5ace288f2443327696823c343c095f0d8d64ca
Author: zero323
Date: 2017-05-04T01:13:45Z
    Add family annotation

commit 5fe5495580eb3852ea5092a34dc2334c0e45c9b7
Author: zero323
Date: 2017-05-04T06:32:54Z
    Check that stats::alias is not masked

commit 09f9ccaf5e66a400d26b4ab6d600d951305d5fd3
Author: zero323
Date: 2017-05-04T07:04:52Z
    Fix style

commit f1c74f338b8df865a5e8b9a6e281211aa27af7d3
Author: zero323
Date: 2017-05-04T10:17:42Z
    vim
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114925159

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

How about?

```
#' Return a new Column or a SparkDataFrame with a name set. Equivalent to SQL "AS" keyword.
```

Is the `Column` new?
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76474/testReport)** for PR 17770 at commit [`a855182`](https://github.com/apache/spark/commit/a855182d8f5037daab718820775cbcf8add01546).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76472/ Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76472/testReport)** for PR 17770 at commit [`8c8fe1e`](https://github.com/apache/spark/commit/8c8fe1e20609a373f164e8b2252a970e4e468eb3).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17300 **[Test build #76473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76473/testReport)** for PR 17300 at commit [`56f5231`](https://github.com/apache/spark/commit/56f5231626cceb114c45413b7b340ee719c3f2f8).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76472/testReport)** for PR 17770 at commit [`8c8fe1e`](https://github.com/apache/spark/commit/8c8fe1e20609a373f164e8b2252a970e4e468eb3).
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17300 retest this please
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17678 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76467/ Test FAILed.
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17678 Merged build finished. Test FAILed.
[GitHub] spark issue #17844: [SPARK-20548][FLAKY-TEST] share one REPL instance among ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17844 **[Test build #76471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76471/testReport)** for PR 17844 at commit [`9248a5e`](https://github.com/apache/spark/commit/9248a5e005c000c42e5a233c9f3ca37b51b6c95d).
[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17678 **[Test build #76467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76467/testReport)** for PR 17678 at commit [`aef3481`](https://github.com/apache/spark/commit/aef3481b125b49343caa46bb2f78cd634369a8a2).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114924076

--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

Right - I think, again, we should emphasize returning a new SparkDataFrame.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 @srinathshankar also thinks it's weird to add a barrier node. I suggest @hvanhovell and @srinathshankar duke it out.
[GitHub] spark issue #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17825 Could you close/reopen to trigger AppVeyor again?
[GitHub] spark pull request #17840: [SPARK-20574][ML] Allow Bucketizer to handle non-...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17840
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76470/testReport)** for PR 17770 at commit [`b29ded3`](https://github.com/apache/spark/commit/b29ded3f806616e43f260db4f133c7bbe3a8fb3b).
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114924015

--- Diff: core/src/test/resources/HistoryServerExpectations/completed_app_list_json_expectation.json ---
@@ -22,6 +23,7 @@
   "duration" : 101795,
   "sparkUser" : "jose",
   "completed" : true,
+  "appSparkVersion" : "",
--- End diff --

Probably I could change the default value; looks OK, will do it.
[GitHub] spark issue #17840: [SPARK-20574][ML] Allow Bucketizer to handle non-Double ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17840 Merged into master and branch-2.0. Thanks.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17658 **[Test build #76469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76469/testReport)** for PR 17658 at commit [`dad87a6`](https://github.com/apache/spark/commit/dad87a64c42de22e1a7a565d9b922811a759dff8).
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Can one of the admins verify this patch?
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user map222 commented on the issue: https://github.com/apache/spark/pull/17865 @HyukjinKwon I ended up not making examples for the aggregate functions, as I didn't make a good dataframe to demonstrate them. I could add more examples for the string functions if you think that is a good idea. There are dozens of functions that could be documented; I'm not sure how far we want to go, or which ones need it.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
GitHub user map222 opened a pull request: https://github.com/apache/spark/pull/17865

[SPARK-20456][Docs] Add examples for functions collection for pyspark

## What changes were proposed in this pull request?

- Adds documentation to many functions in pyspark.sql.functions.py: `upper`, `lower`, `reverse`, `unix_timestamp`, `from_unixtime`, `rand`, `randn`, `collect_list`, `collect_set`, `lit`.
- Adds units to the trigonometry functions.
- Renames columns in datetime examples to be more informative.
- Adds links between some functions.

## How was this patch tested?

`./dev/lint-python`
`python python/pyspark/sql/functions.py`
`./python/run-tests.py --module pyspark-sql`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/map222/spark spark-20456

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17865.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17865

commit 91515c620287e193c6d208038025fe194740e4d2
Author: Michael Patterson
Date: 2017-05-05T00:26:56Z

    First revision: trigonometry units, lit, collect_set, collect_list, unix_timestamp, from_unixtime
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r114922000

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala ---
@@ -60,12 +61,19 @@ private[kinesis] class KinesisInputDStream[T: ClassTag](
     val isBlockIdValid = blockInfos.map { _.isBlockIdValid() }.toArray
     logDebug(s"Creating KinesisBackedBlockRDD for $time with ${seqNumRanges.length} " +
       s"seq number ranges: ${seqNumRanges.mkString(", ")} ")
+
+    /**
+     * Construct the Kinesis read configs from streaming context
+     * and pass to KinesisBackedBlockRDD
+     */
+    val kinesisReadConfigs = KinesisReadConfigurations(ssc)
+
     new KinesisBackedBlockRDD(
       context.sc, regionName, endpointUrl, blockIds, seqNumRanges,
       isBlockIdValid = isBlockIdValid,
-      retryTimeoutMs = ssc.graph.batchDuration.milliseconds.toInt,
       messageHandler = messageHandler,
-      kinesisCreds = kinesisCreds)
+      kinesisCreds = kinesisCreds,
+      kinesisReadConfigs = kinesisReadConfigs)
--- End diff --

I think it would be sufficient to change this to

```scala
kinesisReadConfigs = KinesisReadConfigurations(ssc))
```

and omit lines 65-70. I don't think a comment is necessary here; the code is pretty straightforward.
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user budde commented on the issue: https://github.com/apache/spark/pull/17467 Fair enough. I took another look and I think I may have been thinking of the way things worked in an earlier revision of this code. I think the case class is reasonable.
[GitHub] spark issue #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_INSTANCES...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17859 OK, I will open another PR to remove it. Thanks a lot to both of you.
[GitHub] spark pull request #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_IN...
Github user ConeyLiu closed the pull request at: https://github.com/apache/spark/pull/17859
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114921697

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -283,10 +283,15 @@ private[spark] object EventLoggingListener extends Logging {
    *
    * @param logStream Raw output stream to the event log file.
    */
-  def initEventLog(logStream: OutputStream): Unit = {
+  def initEventLog(logStream: OutputStream, testing: Boolean,
+      loggedEvents: ArrayBuffer[JValue]): Unit = {
     val metadata = SparkListenerLogStart(SPARK_VERSION)
-    val metadataJson = compact(JsonProtocol.logStartToJson(metadata)) + "\n"
+    val eventJson = JsonProtocol.logStartToJson(metadata)
+    val metadataJson = compact(eventJson) + "\n"
     logStream.write(metadataJson.getBytes(StandardCharsets.UTF_8))
+    if (testing && loggedEvents != null) {
+      loggedEvents += eventJson
--- End diff --

I thought loggedEvents only takes JSON values. Also, the loggedEvents are generated as part of SparkContext (and possibly through other sources). The ReplayListenerSuite, however, compares the original events with the replayed events; the replayed events are written to the event log, but loggedEvents will not contain the SparkListenerLogStart event, as that is not produced by SparkContext, if I understand correctly.
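The pattern being discussed in the diff can be sketched in miniature. This is an illustration, not Spark's code: the JSON handling is reduced to a plain string in place of Spark's `JsonProtocol`, and the function mirrors only the shape of the diff — write the log-start metadata as one JSON line, and, when testing, also capture the raw event so a replay test can compare original and replayed events.

```python
import io

def init_event_log(out, testing, logged_events):
    # Write the log-start metadata as a single newline-terminated JSON line.
    event_json = '{"Event":"SparkListenerLogStart"}'
    out.write((event_json + "\n").encode("utf-8"))
    # In tests, also capture the event so a replay suite can compare
    # the events it wrote against the events it reads back.
    if testing and logged_events is not None:
        logged_events.append(event_json)

buf = io.BytesIO()
logged = []
init_event_log(buf, testing=True, logged_events=logged)
```

After the call, `buf` holds the metadata line exactly as a replay reader would see it, and `logged` holds the event object a test can assert against.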
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114921013

--- Diff: core/src/test/resources/HistoryServerExpectations/completed_app_list_json_expectation.json ---
@@ -22,6 +23,7 @@
   "duration" : 101795,
   "sparkUser" : "jose",
   "completed" : true,
+  "appSparkVersion" : "",
--- End diff --

I am not sure if the tests hit this code path (https://github.com/apache/spark/pull/17658/files#diff-a7befb99e7bd7e3ab5c46c2568aa5b3eR474), so they take the default value.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 Note: since a `Transformer` may also manipulate the dataset in other ways, such as dropping NaN values, the idea above won't work in those cases.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17770 Thanks @rxin @marmbrus @hvanhovell @cloud-fan. That sounds reasonable to me. I'll eliminate the `resolveOperators` path.
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @cloud-fan This is not about using a Python UDF; it allows PySpark to use a Java UDF (no Python daemon will be launched). So it would actually improve performance.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 The bunch of projections will be collapsed during optimization, so it doesn't affect query execution. However, every `withColumn` call creates a new `DataFrame` along with a projection on the previous logical plan. That is costly: it creates a new query execution, analyzes the logical plan, creates an encoder, and so on. The improvement comes from paying this cost once with `withColumns` instead of once per `withColumn` call. It can benefit other transformers that work on multiple columns. I even have an idea to revamp the `Transformer` interface, because the transformation in a `Transformer` usually ends with a `withColumn` call to add or replace a column; transformers are really transforming columns of the dataset. But the performance difference is obvious only when the number of transformation stages is large enough, as in the example with many `Bucketizer`s, so it may not be worth doing. Just a thought.
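The analysis-time cost described above can be sketched with a toy plan model. This is plain Python for illustration; the classes are hypothetical stand-ins, not Spark's actual Catalyst nodes: each `withColumn`-style call wraps the previous plan in one more projection node that must be re-analyzed, while a `withColumns`-style call adds all columns in a single projection.

```python
class Plan:
    """A toy logical-plan node: a projection over an optional child plan."""

    def __init__(self, cols, child=None):
        self.cols = cols
        self.child = child

    @property
    def depth(self):
        # Number of nested plan nodes; a proxy for per-call analysis cost.
        return 1 + (self.child.depth if self.child else 0)

def with_column(plan, col):
    # One new projection per call: the whole (ever deeper) plan gets
    # re-analyzed each time.
    return Plan(plan.cols + [col], child=plan)

def with_columns(plan, cols):
    # One projection adds every column at once: a single analysis pass.
    return Plan(plan.cols + list(cols), child=plan)

base = Plan(["a"])
chained = base
for i in range(100):
    chained = with_column(chained, f"c{i}")
batched = with_columns(base, [f"c{i}" for i in range(100)])

print(chained.depth)  # 101 nested plans after 100 chained calls
print(batched.depth)  # 2: one projection over the source
```

The optimizer can still collapse the 101-deep chain into a single projection at the end, but only after each intermediate plan has already been analyzed — which is exactly the repeated cost `withColumns` avoids.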
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17300 Will merge when tests pass.
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17222 Hi @zjffdu, thanks for working on it! But I'm not sure how useful this feature will be. AFAIK most users use Scala/Java UDFs instead of Python UDFs because Python UDFs are too slow. We are working on a project to improve the communication between the JVM and the Python process, which may add a new Python UDF interface and also affect the Python UDAF design. Can you hold this PR for a while? Thanks!
[GitHub] spark issue #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_INSTANCES...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17859 > Do we need remove the comments from template config? Ah, that would be a good idea. I also noticed it's still used in `YarnSparkHadoopUtil.scala`, so that could be removed too. I also took a closer look at SPARK-17979, and this particular env variable wasn't removed in that change; it seems it was removed much earlier (SPARK-9092 as far as I can tell), so it looks like it isn't very widely used.
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17300 retest this please
[GitHub] spark issue #17859: [SPARK-20595][Deploy]Parse the 'SPARK_EXECUTOR_INSTANCES...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17859 @vanzin Thanks a lot for your review. Do we need to remove the comments from the template config? It doesn't work anymore in the current version.
[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76466/