[GitHub] spark issue #17525: [SPARK-20209][SS] Execute next trigger immediately if pr...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17525
  
**[Test build #75504 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75504/testReport)**
 for PR 17525 at commit 
[`50f0195`](https://github.com/apache/spark/commit/50f0195a4eee34db813c9040437de95796c577cc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17455: [Spark-20044][Web UI] Support Spark UI behind fro...

2017-04-03 Thread okoethibm
Github user okoethibm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17455#discussion_r109588861
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -132,7 +132,13 @@ private[deploy] class Master(
 webUi.bind()
 masterWebUiUrl = "http://" + masterPublicAddress + ":" + webUi.boundPort
 if (reverseProxy) {
-  masterWebUiUrl = conf.get("spark.ui.reverseProxyUrl", masterWebUiUrl)
+  conf.getOption("spark.ui.reverseProxyUrl") map { reverseProxyUrl =>
+val proxyUrlNoSlash = reverseProxyUrl.stripSuffix("/")
+System.setProperty("spark.ui.proxyBase", proxyUrlNoSlash)
+// If the master URL has a path component, it must end with a slash.
+// Otherwise the browser generates incorrect relative links
+masterWebUiUrl = proxyUrlNoSlash + "/"
--- End diff --

If we have a front-end reverse proxy path like 
mydomain.com:80/path/to/spark, then the spark.ui.proxyBase property (the prefix 
for URL generation) *must not* include a trailing slash, given the way it is 
used in UIUtils, e.g. prependBaseUri("/static/bootstrap.min.css"). 
However, the explicit URL pointing to the master UI page (e.g. the 
back-link from workers to the master, which masterWebUiUrl feeds into) *must* 
include a trailing slash if it has a path component, because the master UI 
page contains relative links like "app?...".
Without a path component, the trailing slash does not matter for resolving 
these links, but with a path component they must resolve to 
mydomain.com:80/path/to/spark/app (*not* mydomain.com:80/path/to/app), 
so the base URL must have a trailing slash.

The code is intended to work regardless of whether spark.ui.reverseProxyUrl 
was specified with or without a trailing slash, so the safe way to ensure a 
single trailing slash is to first strip an optional slash and then append one. 
Your suggestion would double the slash if one is specified in the config.
If there is a clean way to move the stripSuffix handling into the config 
itself, that would make the code prettier, though.
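
The normalization described above can be sketched as follows (a minimal 
standalone sketch; the object and method names are illustrative, not the 
actual Spark code):

```scala
// Normalize a configured reverse-proxy URL so that the proxy base has no
// trailing slash while the master UI base URL has exactly one.
object ProxyUrlNormalization {
  // Strip at most one trailing slash (String.stripSuffix semantics), since
  // spark.ui.proxyBase is prepended to paths that already start with "/".
  def proxyBase(reverseProxyUrl: String): String =
    reverseProxyUrl.stripSuffix("/")

  // The master UI URL must end with a slash so relative links like "app?..."
  // resolve under the path component rather than its parent.
  def masterUiUrl(reverseProxyUrl: String): String =
    proxyBase(reverseProxyUrl) + "/"

  def main(args: Array[String]): Unit = {
    // Works whether or not the config value already carries a trailing slash.
    assert(masterUiUrl("http://mydomain.com:80/path/to/spark") ==
      "http://mydomain.com:80/path/to/spark/")
    assert(masterUiUrl("http://mydomain.com:80/path/to/spark/") ==
      "http://mydomain.com:80/path/to/spark/")
  }
}
```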






[GitHub] spark issue #17526: [SPARKR][DOC] update doc for fpgrowth

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17526
  
**[Test build #75503 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75503/testReport)**
 for PR 17526 at commit 
[`e4e03ea`](https://github.com/apache/spark/commit/e4e03eaf98581da92cdf29d93c602384ad82ad36).





[GitHub] spark pull request #17526: [SPARKR][DOC] update doc for fpgrowth

2017-04-03 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/17526

[SPARKR][DOC] update doc for fpgrowth

## What changes were proposed in this pull request?

minor update

@zero323 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rfpgrowthfollowup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17526


commit 5fee9e70b0ca31c5a4e55b66f908fa56b205ead5
Author: Felix Cheung 
Date:   2017-04-04T06:53:38Z

update doc







[GitHub] spark issue #17445: [SPARK-20115] [CORE] Fix DAGScheduler to recompute all t...

2017-04-03 Thread umehrot2
Github user umehrot2 commented on the issue:

https://github.com/apache/spark/pull/17445
  
Jenkins test this please.





[GitHub] spark pull request #17525: [SPARK-20209][SS] Execute next trigger immediatel...

2017-04-03 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/17525

[SPARK-20209][SS] Execute next trigger immediately if previous batch took 
longer than trigger interval

## What changes were proposed in this pull request?

For large trigger intervals (e.g. 10 minutes), if a batch takes 11 minutes, 
the executor will then wait another 9 minutes before starting the next batch. 
This does not make sense. The processing-time-based trigger policy should 
process batches as fast as possible, but no more than one per trigger 
interval. If batches are already taking longer than the trigger interval, 
there is no point in waiting an extra trigger interval.

In this PR, I modified the ProcessingTimeExecutor to do so.

## How was this patch tested?

Added new unit tests to comprehensively test this behavior.
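
The intended scheduling policy can be sketched as follows (a hypothetical 
standalone sketch of the processing-time trigger arithmetic, not the actual 
ProcessingTimeExecutor code):

```scala
// Compute when the next batch should start under a processing-time trigger:
// batches run at most once per interval, but if a batch overruns the interval
// the next one starts immediately instead of waiting out another full cycle.
object TriggerTiming {
  def nextBatchStart(batchStart: Long, batchEnd: Long, intervalMs: Long): Long = {
    val elapsed = batchEnd - batchStart
    if (elapsed >= intervalMs) batchEnd  // overran the interval: fire immediately
    else batchStart + intervalMs         // otherwise wait until the interval elapses
  }

  def main(args: Array[String]): Unit = {
    val interval = 10 * 60 * 1000L  // 10-minute trigger interval
    // Batch took 11 minutes: next batch starts right away, no extra 9-minute wait.
    assert(nextBatchStart(0L, 11 * 60 * 1000L, interval) == 11 * 60 * 1000L)
    // Batch took 4 minutes: next batch waits until the 10-minute mark.
    assert(nextBatchStart(0L, 4 * 60 * 1000L, interval) == interval)
  }
}
```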


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-20209

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17525


commit 50f0195a4eee34db813c9040437de95796c577cc
Author: Tathagata Das 
Date:   2017-04-04T06:48:00Z

Removed delay from trigger executor







[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-04-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17170





[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-04-03 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17170
  
merged to master.
@zero323 could you follow up with vignettes and programming guide update 
please - we need them for the 2.2.0 release.





[GitHub] spark issue #17524: [SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Test Cases...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17524
  
**[Test build #75502 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75502/testReport)**
 for PR 17524 at commit 
[`427741f`](https://github.com/apache/spark/commit/427741f548ff4469d62906546655f7ec96564ced).





[GitHub] spark pull request #17455: [Spark-20044][Web UI] Support Spark UI behind fro...

2017-04-03 Thread okoethibm
Github user okoethibm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17455#discussion_r109586326
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala ---
@@ -157,7 +157,9 @@ private[deploy] class ExecutorRunner(
   // Add webUI log urls
   val baseUrl =
 if (conf.getBoolean("spark.ui.reverseProxy", false)) {
-  
s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
+  // TODO get from master?
--- End diff --

Oops, that was a leftover from testing. In fact, the code is simpler when 
we consistently get the reverse proxy URL from the config along with the 
reverse proxy flag, which requires both settings to be set consistently on all 
nodes. I briefly considered a communication extension to send the master 
(reverse proxy) URL to the executors, but felt it didn't really help.
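
The config-driven approach described here can be sketched as follows (an 
illustrative standalone sketch; the conf keys match the PR discussion, but 
the helper object and the non-proxy fallback path are hypothetical, not the 
actual ExecutorRunner code):

```scala
// Build the base URL for executor log links. When reverse proxying is on,
// prefix with the configured reverse-proxy URL so links resolve behind the
// proxy; both settings must be set consistently on all nodes.
object LogUrlBuilder {
  def baseUrl(conf: Map[String, String], workerId: String,
              appId: String, execId: Int): String = {
    val reverseProxy = conf.getOrElse("spark.ui.reverseProxy", "false").toBoolean
    if (reverseProxy) {
      // Strip any trailing slash so the joined path has exactly one separator.
      val proxyUrl = conf.getOrElse("spark.ui.reverseProxyUrl", "").stripSuffix("/")
      s"$proxyUrl/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
    } else {
      // Hypothetical direct (non-proxied) worker UI path.
      s"/logPage/?appId=$appId&executorId=$execId&logType="
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.ui.reverseProxy" -> "true",
      "spark.ui.reverseProxyUrl" -> "http://mydomain.com/spark/")
    assert(baseUrl(conf, "worker-1", "app-1", 0) ==
      "http://mydomain.com/spark/proxy/worker-1/logPage/?appId=app-1&executorId=0&logType=")
  }
}
```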





[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...

2017-04-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17394





[GitHub] spark issue #17524: [SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Test Cases...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17524
  
**[Test build #75501 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75501/testReport)**
 for PR 17524 at commit 
[`c102187`](https://github.com/apache/spark/commit/c1021871bdd000e87ff0906af434bceac3129b2b).





[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-04-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17394
  
Thanks! Merging to master.





[GitHub] spark issue #17524: [SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Test Cases...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17524
  
**[Test build #75500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75500/testReport)**
 for PR 17524 at commit 
[`8382228`](https://github.com/apache/spark/commit/83822289d790a7ebedf8634df6bbdf9cebeb5057).





[GitHub] spark pull request #17524: [SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Tes...

2017-04-03 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/17524

[SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Test Cases in DDLSuite with 
Hive Metastore

### What changes were proposed in this pull request?
This is a follow-up of enabling test cases in DDLSuite with Hive Metastore. 
It consists of the following remaining tasks:
- Run all the `alter table` and `drop table` DDL tests against data source 
tables when using Hive metastore. 
- Do not run any `alter table` and `drop table` DDL test against Hive serde 
tables when using InMemoryCatalog.
- Reenable `alter table: set serde partition` and `alter table: set serde` 
tests for Hive serde tables.

### How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark cleanupDDLSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17524.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17524


commit 83822289d790a7ebedf8634df6bbdf9cebeb5057
Author: Xiao Li 
Date:   2017-04-04T06:17:06Z

fix







[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17480
  
**[Test build #75499 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75499/testReport)**
 for PR 17480 at commit 
[`f54c9ae`](https://github.com/apache/spark/commit/f54c9ae77bfdd3756e120f764aa443500ad6fcf8).





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109575589
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+  :func:`DataFrame.filter` to select rows with null values.
+
+  >>> df2.collect()
+  [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+  >>> df2.filter( df2.height.isNull ).collect()
+  [Row(name=u'Alice', height=None)]
+  '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

^ cc @holdenk





[GitHub] spark pull request #17480: [SPARK-20079][Core][yarn] Re registration of AM h...

2017-04-03 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17480#discussion_r109575470
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -249,7 +249,9 @@ private[spark] class ExecutorAllocationManager(
* yarn-client mode when AM re-registers after a failure.
*/
   def reset(): Unit = synchronized {
-initializing = true
+if (maxNumExecutorsNeeded() == 0) {
--- End diff --

Done.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109575023
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
--- End diff --

Could you maybe give a shot with this patch - 
https://github.com/map222/spark/compare/patterson-documentation...HyukjinKwon:rlike-docstring.patch

?

I double checked it produces 

![2017-04-04 1 23 
30](https://cloud.githubusercontent.com/assets/6477701/24641412/84765e9c-193a-11e7-85d5-9745ea151c12.png)






[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17251
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75498/
Test PASSed.





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17251
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17251
  
**[Test build #75498 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75498/testReport)**
 for PR 17251 at commit 
[`2150ce5`](https://github.com/apache/spark/commit/2150ce552a7a02d656329761e04a7fcb38e5e648).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17469
  
It might be better to run `./dev/lint-python` locally if possible. It 
will catch more of the minor nits ahead of time.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109574284
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

I just found a good reference in pep8

> For triple-quoted strings, always use double quote characters to be 
consistent with the docstring convention in PEP 257

https://www.python.org/dev/peps/pep-0008/#string-quotes





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109574278
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+  :func:`DataFrame.filter` to select rows with null values.
+
+  >>> df2.collect()
+  [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+  >>> df2.filter( df2.height.isNull ).collect()
+  [Row(name=u'Alice', height=None)]
+  '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

To my knowledge, both docstring styles comply with PEP 8,
```
""" ...
"""
```

or 

```
"""
...
"""
```

but for this case, it seems a separate variable. Personally, I prefer

```python
_isNull_doc = """
True if the current expression is null. Often combined with
:func:`DataFrame.filter` to select rows with null values.

>>> df2.collect()
[Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
>>> df2.filter( df2.height.isNull ).collect()
[Row(name=u'Alice', height=None)]
"""
```

but I could not find a formal reference to support this preference (for the 
case where it is a separate variable), and I am not the one to decide this. 
So, I am fine either way.





[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17394
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75497/
Test PASSed.





[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17394
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17394
  
**[Test build #75497 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75497/testReport)**
 for PR 17394 at commit 
[`862a4d7`](https://github.com/apache/spark/commit/862a4d7a61e48ff7b0e1d52ea0416bc57a4d6a33).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17459
  
@johnc1231 The prototype I did: 
https://github.com/apache/spark/compare/master...viirya:general-toblockmatrix?expand=1

Maybe you can take a look and see if it is useful to you.





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75496/
Test PASSed.





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17494
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17494
  
**[Test build #75496 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75496/testReport)**
 for PR 17494 at commit 
[`fbcc1fe`](https://github.com/apache/spark/commit/fbcc1fe1c8e2652dc54c2ebfacce01a3f69449a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17505: [SPARK-20187][SQL] Replace loadTable with moveFil...

2017-04-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17505#discussion_r109567793
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -694,12 +694,25 @@ private[hive] class HiveClientImpl(
   tableName: String,
   replace: Boolean,
   isSrcLocal: Boolean): Unit = withHiveState {
-shim.loadTable(
-  client,
-  new Path(loadPath),
-  tableName,
-  replace,
-  isSrcLocal)
+val tbl = client.getTable(tableName)
+val fs = tbl.getDataLocation.getFileSystem(conf)
+if (replace) {
--- End diff --

[`loadTable` calls `replaceFiles` when `replace` is true](https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1640), instead of calling `Hive.copyFiles`.

`replaceFiles` is built on calls to `moveFile`.
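The replace-vs-copy distinction discussed here can be sketched in plain Python (a simplified illustration of the semantics, not Hive's actual implementation; the function name `load_table` and directory-based layout are hypothetical stand-ins):

```python
import os
import shutil

def load_table(src_dir, dest_dir, replace):
    """Simplified sketch of loadTable semantics.

    With replace=True the destination contents are swapped out
    wholesale (analogous to replaceFiles/moveFile); otherwise the
    new files are copied in alongside any existing ones
    (analogous to Hive.copyFiles).
    """
    if replace:
        # Drop whatever was there and move the new data in.
        shutil.rmtree(dest_dir, ignore_errors=True)
        shutil.copytree(src_dir, dest_dir)
    else:
        # Keep existing files; add the new ones next to them.
        os.makedirs(dest_dir, exist_ok=True)
        for name in os.listdir(src_dir):
            shutil.copy(os.path.join(src_dir, name),
                        os.path.join(dest_dir, name))
```

The interesting part is only the branch: `replace` determines whether the destination is cleared before the load.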






[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17520
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17520
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75494/
Test PASSed.





[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-04-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17394
  
LGTM





[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17520
  
**[Test build #75494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75494/testReport)**
 for PR 17520 at commit 
[`0bab4fd`](https://github.com/apache/spark/commit/0bab4fd335279accca5e90ed4ecdb1d7ea99383e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread johnc1231
Github user johnc1231 commented on the issue:

https://github.com/apache/spark/pull/17459
  
Alright, I agree with this. We'll switch between Dense and Sparse matrix 
backings based on the type of the first vector in the iterator. I'd be 
happy to take on making these adjustments. 





[GitHub] spark issue #17506: [SPARK-20189][DStream] Fix spark kinesis testcases to re...

2017-04-03 Thread yssharma
Github user yssharma commented on the issue:

https://github.com/apache/spark/pull/17506
  
The Scala style check probably fails because of the double-spaced lines. But 
that's how the existing code was, so I thought I'd keep it that way.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17459
  
> I considered having toBlockMatrix check if the rows of IndexedRowMatrix 
were dense or sparse, but there is no guarantee of consistency. Like, an 
IndexedRowMatrix could be a mix of Dense and Sparse Vectors. In that case, it 
would not be clear what type of BlockMatrix to create. A decent approximation 
of this would be to just decide the matrix type based on the first vector we 
look at in the iterator we get from groupByKey, creating a mix of Dense and 
Sparse matrices in a BlockMatrix, but I still think it's best to be explicit. 
Also, we currently have the description of toBlockMatrix promising to make a 
BlockMatrix backed by instances of SparseMatrix, so we have made promises to 
users about the composition of the BlockMatrix before.

I don't mean we don't care about it. I meant there is no guarantee that a 
`BlockMatrix` is composed purely of `DenseMatrix` or purely of `SparseMatrix` 
blocks. It could be a mix of them.

Thus, we can have a `toBlockMatrix` that creates a `BlockMatrix` which is 
a mix of `DenseMatrix` and `SparseMatrix`. A block in a `BlockMatrix` can be a 
`DenseMatrix` or a `SparseMatrix`, depending on the ratio of non-zero values in 
the block. Yes, it is like the "decent approximation" you described.

For an `IndexedRowMatrix` composed entirely of `DenseVector`s, this 
`toBlockMatrix` definitely returns a `BlockMatrix` backed by `DenseMatrix`. For 
other cases, `DenseMatrix` might not be the best choice for all blocks in the 
`BlockMatrix`, as many blocks will be sparse.

About the promise that `toBlockMatrix` makes a `BlockMatrix` backed by 
instances of `SparseMatrix`: as I said, it is not explicitly bound at the API 
level, so I think it is not a big problem.
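The per-block choice sketched above can be illustrated without Spark: pick a backing type for each block from its ratio of non-zero values. This is a plain-Python sketch; the 0.5 threshold is a hypothetical cut-off, not Spark's actual heuristic.

```python
def choose_backing(block, sparsity_threshold=0.5):
    """Pick a backing type for one block of a BlockMatrix.

    `block` is a list of rows (lists of floats). The threshold is a
    hypothetical cut-off for illustration only.
    """
    values = [v for row in block for v in row]
    nnz = sum(1 for v in values if v != 0)
    ratio = nnz / len(values) if values else 0.0
    return "dense" if ratio > sparsity_threshold else "sparse"

# A mostly-zero block should be backed by a SparseMatrix,
# a mostly-filled block by a DenseMatrix.
sparse_block = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
dense_block = [[1.0, 2.0, 0.0], [3.0, 4.0, 5.0]]
print(choose_backing(sparse_block))  # sparse
print(choose_backing(dense_block))   # dense
```

Applying this per block yields a `BlockMatrix` that mixes both backings, which is the behavior described above.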






[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...

2017-04-03 Thread yssharma
Github user yssharma commented on the issue:

https://github.com/apache/spark/pull/17467
  
@srowen - Could I get some love here as well. Thanks





[GitHub] spark issue #15332: [SPARK-10364][SQL] Support Parquet logical type TIMESTAM...

2017-04-03 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/15332
  
Thanks a lot @ueshin @viirya @gatorsmile 





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17251
  
**[Test build #75498 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75498/testReport)**
 for PR 17251 at commit 
[`2150ce5`](https://github.com/apache/spark/commit/2150ce552a7a02d656329761e04a7fcb38e5e648).





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-04-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17251
  
Retest this please





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread johnc1231
Github user johnc1231 commented on the issue:

https://github.com/apache/spark/pull/17459
  
@viirya I think we definitely care about giving users the ability to make 
either dense or sparse Block matrices. I made a 100k by 10k IndexedRowMatrix of 
random doubles, then converted it to a BlockMatrix to multiply it by its 
transpose. With the current toBlockMatrix implementation, that took 252 seconds 
on 128 cores. With my implementation, that took 35 seconds. The backing of a 
BlockMatrix matters a lot, and we need to let users be explicit about it. 

I considered having toBlockMatrix check whether the rows of the 
IndexedRowMatrix were dense or sparse, but there is no guarantee of 
consistency. For example, an IndexedRowMatrix could be a mix of Dense and 
Sparse Vectors, and in that case it would not be clear what type of 
BlockMatrix to create. A decent approximation would be to decide the matrix 
type based on the first vector we see in the iterator we get from groupByKey, 
but I still think it's best to be explicit. 
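The "first vector in the iterator" heuristic mentioned above can be sketched without Spark. The `DenseVector`/`SparseVector` classes below are minimal stand-ins for MLlib's types, and `pick_block_type` is a hypothetical helper, not an existing API:

```python
from itertools import chain

class DenseVector(list):
    """Stand-in for pyspark.mllib.linalg.DenseVector."""

class SparseVector(dict):
    """Stand-in for pyspark.mllib.linalg.SparseVector (index -> value)."""

def pick_block_type(rows):
    """Decide dense vs. sparse backing from the first row only.

    Returns (choice, iterator) where the iterator has the peeked
    row restored, so the caller can still consume every row.
    """
    it = iter(rows)
    try:
        first = next(it)
    except StopIteration:
        return "dense", iter(())  # empty partition: arbitrary default
    choice = "dense" if isinstance(first, DenseVector) else "sparse"
    return choice, chain([first], it)

choice, rows = pick_block_type(iter([DenseVector([1.0, 0.0]),
                                     SparseVector({0: 2.0})]))
print(choice)           # dense
print(len(list(rows)))  # 2 -- the peeked row is restored
```

As the comment thread notes, a mixed iterator (as in this example) makes the choice depend entirely on which vector happens to come first, which is why being explicit is preferred.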





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75495/
Test PASSed.





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17494
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17494
  
**[Test build #75495 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75495/testReport)**
 for PR 17494 at commit 
[`8936880`](https://github.com/apache/spark/commit/8936880bafd8a8520011e663c0edc3b428b9160f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17523: [SPARK-20064][PySpark]

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17523
  
(it would be nicer if the title were updated to briefly indicate what this 
PR proposes)





[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-04-03 Thread Downchuck
Github user Downchuck commented on the issue:

https://github.com/apache/spark/pull/16347
  
Is there anyone on the Spark team taking this up? This bug is painful; it has 
hit a hundred TB of data I've stacked up, and I'm really trying to avoid more 
manual work. "INSERT OVERWRITE TABLE ... DISTRIBUTE BY ... SORT BY" is how I 
live my life these days.






[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...

2017-04-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17394#discussion_r109558929
  
--- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
@@ -1,205 +1,259 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 14
+-- Number of queries: 31
 
 
 -- !query 0
-CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet 
PARTITIONED BY (c, d) COMMENT 'table_comment'
+CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet
+  PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS
+  COMMENT 'table_comment'
 -- !query 0 schema
 struct<>
 -- !query 0 output
 
 
 
 -- !query 1
-ALTER TABLE t ADD PARTITION (c='Us', d=1)
+CREATE TEMPORARY VIEW temp_v AS SELECT * FROM t
 -- !query 1 schema
 struct<>
 -- !query 1 output
 
 
 
 -- !query 2
-DESCRIBE t
+CREATE TEMPORARY VIEW temp_Data_Source_View
+  USING org.apache.spark.sql.sources.DDLScanSource
+  OPTIONS (
+From '1',
+To '10',
+Table 'test1')
 -- !query 2 schema
-struct
+struct<>
 -- !query 2 output
-# Partition Information
+
+
+
+-- !query 3
+CREATE VIEW v AS SELECT * FROM t
+-- !query 3 schema
+struct<>
+-- !query 3 output
+
+
+
+-- !query 4
+ALTER TABLE t ADD PARTITION (c='Us', d=1)
+-- !query 4 schema
+struct<>
+-- !query 4 output
+
+
+
+-- !query 5
+DESCRIBE t
+-- !query 5 schema
+struct
+-- !query 5 output
 # col_name data_type   comment 
 a  string  
 b  int 
 c  string  
-c  string  
 d  string  
+# Partition Information
+# col_name data_type   comment 
+c  string  
 d  string
 
 
--- !query 3
-DESC t
--- !query 3 schema
+-- !query 6
+DESC default.t
+-- !query 6 schema
 struct
--- !query 3 output
-# Partition Information
+-- !query 6 output
 # col_name data_type   comment 
 a  string  
 b  int 
 c  string  
-c  string  
 d  string  
+# Partition Information
+# col_name data_type   comment 
+c  string  
 d  string
 
 
--- !query 4
+-- !query 7
 DESC TABLE t
--- !query 4 schema
+-- !query 7 schema
 struct
--- !query 4 output
-# Partition Information
+-- !query 7 output
 # col_name data_type   comment 
 a  string  
 b  int 
 c  string  
-c  string  
 d  string  
+# Partition Information
+# col_name data_type   comment 
+c  string  
 d  string
 
 
--- !query 5
+-- !query 8
 DESC FORMATTED t
--- !query 5 schema
+-- !query 8 schema
 struct
--- !query 5 output
-# Detailed Table Information   

-# Partition Information
-# Storage Information  
+-- !query 8 output
 # col_name data_type   comment 
-Comment:   table_comment   
  

[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17394
  
**[Test build #75497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75497/testReport)**
 for PR 17394 at commit 
[`862a4d7`](https://github.com/apache/spark/commit/862a4d7a61e48ff7b0e1d52ea0416bc57a4d6a33).





[GitHub] spark pull request #15332: [SPARK-10364][SQL] Support Parquet logical type T...

2017-04-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15332





[GitHub] spark issue #15332: [SPARK-10364][SQL] Support Parquet logical type TIMESTAM...

2017-04-03 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/15332
  
Thanks! Merging to master.





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17494
  
**[Test build #75496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75496/testReport)**
 for PR 17494 at commit 
[`fbcc1fe`](https://github.com/apache/spark/commit/fbcc1fe1c8e2652dc54c2ebfacce01a3f69449a2).





[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17494
  
**[Test build #75495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75495/testReport)**
 for PR 17494 at commit 
[`8936880`](https://github.com/apache/spark/commit/8936880bafd8a8520011e663c0edc3b428b9160f).





[GitHub] spark pull request #17494: [SPARK-20076][ML][PySpark] Add Python interface f...

2017-04-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17494#discussion_r109557018
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlation.scala 
---
@@ -56,7 +56,7 @@ object Correlation {
*  Here is how to access the correlation coefficient:
*  {{{
*val data: Dataset[Vector] = ...
-   *val Row(coeff: Matrix) = Statistics.corr(data, "value").head
+   *val Row(coeff: Matrix) = Correlation.corr(data, "value").head
*// coeff now contains the Pearson correlation matrix.
*  }}}
*
--- End diff --

oh, right. fixed. :-)





[GitHub] spark issue #17512: [SPARK-20196][PYTHON][SQL] update doc for catalog functi...

2017-04-03 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17512
  
will update after #17518 + changes to R doc too





[GitHub] spark pull request #17494: [SPARK-20076][ML][PySpark] Add Python interface f...

2017-04-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17494#discussion_r109556837
  
--- Diff: python/pyspark/ml/stat.py ---
@@ -71,6 +71,62 @@ def test(dataset, featuresCol, labelCol):
 return _java2py(sc, javaTestObj.test(*args))
 
 
+class Correlation(object):
+"""
+.. note:: Experimental
+
+Compute the correlation matrix for the input dataset of Vectors using 
the specified method.
+Methods currently supported: `pearson` (default), `spearman`.
--- End diff --

Sounds good. Fixed.





[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...

2017-04-03 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16906
  
+1 on that, we do have the log on the R side.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17459
  
I've done some prototyping locally to generalize this change to 
`SparseMatrix`. During that, a question came up: do we require that all 
matrices in a `BlockMatrix` be the same kind of matrix (i.e., all 
`DenseMatrix` or all `SparseMatrix`)?

Actually we can easily have a single `toBlockMatrix` method that creates a 
`BlockMatrix` containing both `DenseMatrix` and `SparseMatrix` blocks, 
depending on whether each block is sparse or not.

From the external view of this API, there is no explicit distinction between 
`SparseMatrix`-backed and `DenseMatrix`-backed `BlockMatrix`s. We have no 
subclasses for it, nor any property that exposes it. Doesn't that mean we 
don't really care about it?







[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17415





[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-04-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17415
  
Thanks, Merging to master.





[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17520
  
**[Test build #75494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75494/testReport)**
 for PR 17520 at commit 
[`0bab4fd`](https://github.com/apache/spark/commit/0bab4fd335279accca5e90ed4ecdb1d7ea99383e).





[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17415#discussion_r109554814
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, 
catalystConf: CatalystConf) extends Lo
 Some(percent.toDouble)
   }
 
+  /**
+   * Returns a percentage of rows meeting a binary comparison expression 
containing two columns.
+   * In SQL queries, we also see predicate expressions involving two 
columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong 
to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then 
it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update 
ColumnStat of the given columns
+   *   for subsequent conditions
+   * @return an optional double value to show the percentage of rows 
meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+  op: BinaryComparison,
+  attrLeft: Attribute,
+  attrRight: Attribute,
+  update: Boolean): Option[Double] = {
+
+if (!colStatsMap.contains(attrLeft)) {
+  logDebug("[CBO] No statistics for " + attrLeft)
+  return None
+}
+if (!colStatsMap.contains(attrRight)) {
+  logDebug("[CBO] No statistics for " + attrRight)
+  return None
+}
+
+attrLeft.dataType match {
+  case StringType | BinaryType =>
+// TODO: It is difficult to support other binary comparisons for 
String/Binary
+// type without min/max and advanced statistics like histogram.
+logDebug("[CBO] No range comparison statistics for String/Binary 
type " + attrLeft)
+return None
+  case _ =>
+}
+
+val colStatLeft = colStatsMap(attrLeft)
+val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, 
attrLeft.dataType)
+  .asInstanceOf[NumericRange]
+val maxLeft = BigDecimal(statsRangeLeft.max)
+val minLeft = BigDecimal(statsRangeLeft.min)
+
+val colStatRight = colStatsMap(attrRight)
+val statsRangeRight = Range(colStatRight.min, colStatRight.max, 
attrRight.dataType)
+  .asInstanceOf[NumericRange]
+val maxRight = BigDecimal(statsRangeRight.max)
+val minRight = BigDecimal(statsRangeRight.min)
+
+// determine the overlapping degree between predicate range and 
column's range
+val allNotNull = (colStatLeft.nullCount == 0) && 
(colStatRight.nullCount == 0)
+val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+  // Left < Right or Left <= Right
+  // - no overlap:
+  //  minRight   maxRight minLeft   maxLeft
+  // +--++-+--->
+  // - complete overlap: (If null values exists, we set it to partial 
overlap.)
+  //  minLeftmaxLeft  minRight  maxRight
+  // +--++-+--->
+  case _: LessThan =>
+(minLeft >= maxRight, (maxLeft < minRight) && allNotNull)
+  case _: LessThanOrEqual =>
+(minLeft > maxRight, (maxLeft <= minRight) && allNotNull)
+
+  // Left > Right or Left >= Right
+  // - no overlap:
+  //  minLeftmaxLeft  minRight  maxRight
+  // +--++-+--->
+  // - complete overlap: (If null values exists, we set it to partial 
overlap.)
+  //  minRight   maxRight minLeft   maxLeft
+  // +--++-+--->
+  case _: GreaterThan =>
+(maxLeft <= minRight, (minLeft > maxRight) && allNotNull)
+  case _: GreaterThanOrEqual =>
+(maxLeft < minRight, (minLeft >= maxRight) && allNotNull)
+
+  // Left = Right or Left <=> Right
+  // - no overlap:
+  //  minLeftmaxLeft  minRight  maxRight
+  // +--++-+--->
+  //  minRight   maxRight minLeft   maxLeft
+  // +--++-+--->
+  // - complete overlap:
+  //  minLeftmaxLeft
+  //  minRight   maxRight
+  // +--+--->
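The no-overlap / complete-overlap cases in the diagrams above can be sketched for the `Left < Right` comparison alone. This is a hedged illustration, not Spark's implementation: the 0.0 / 1.0 / `None` return values stand in for the selectivity the real estimator would produce (it falls back to a default for partial overlap):

```python
def overlap_for_less_than(min_l, max_l, min_r, max_r, all_not_null=True):
    """Classify the range overlap for the predicate `left < right`.

    Returns 0.0 when no row can match (every left value >= every right
    value), 1.0 when every non-null pair matches (every left value is
    strictly below every right value and there are no nulls), and None
    for partial overlap."""
    no_overlap = min_l >= max_r
    complete_overlap = (max_l < min_r) and all_not_null
    if no_overlap:
        return 0.0
    if complete_overlap:
        return 1.0
    return None  # partial overlap: estimator would use a default percentage
```

The other comparisons (`<=`, `>`, `>=`, `=`) follow the same pattern with the inequalities adjusted as in the quoted diff.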
  

[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-04-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17415
  
LGTM





[GitHub] spark pull request #17487: [Spark-20145] Fix range case insensitive bug in S...

2017-04-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17487





[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL

2017-04-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17487
  
Merging in master.






[GitHub] spark pull request #17505: [SPARK-20187][SQL] Replace loadTable with moveFil...

2017-04-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17505#discussion_r109553390
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -242,6 +251,16 @@ private[client] class Shim_v0_12 extends Shim with 
Logging {
   JInteger.TYPE,
   JBoolean.TYPE,
   JBoolean.TYPE)
+  private lazy val moveFileMethod =
+findMethod(
+  classOf[Hive],
+  "moveFile",
--- End diff --

does this exist in all the versions Spark supports?
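The question is about reflective method lookup across Hive versions, which is what the shim's `findMethod` pattern handles. A toy Python analogue of that pattern (class and method names here are illustrative stand-ins, not Hive's API):

```python
def find_method(obj, name, *fallbacks):
    """Version-tolerant method lookup, loosely analogous to the shim's
    `findMethod`: try each candidate name in order and return the first
    one the object actually has, so newer and older versions of a
    library can be driven through one code path."""
    for candidate in (name,) + fallbacks:
        method = getattr(obj, candidate, None)
        if callable(method):
            return method
    raise AttributeError(
        f"none of {(name,) + fallbacks} found on {type(obj).__name__}")


# Hypothetical old/new API surfaces for illustration only.
class OldHive:
    def load_table(self):
        return "old"


class NewHive:
    def move_file(self):
        return "new"
```

With this, `find_method(client, "move_file", "load_table")` resolves to whichever method the running version provides.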






[GitHub] spark pull request #17112: [WIP] Measurement for SPARK-16929.

2017-04-03 Thread jinxing64
Github user jinxing64 closed the pull request at:

https://github.com/apache/spark/pull/17112





[GitHub] spark pull request #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence...

2017-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17336#discussion_r109548396
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala 
---
@@ -85,38 +85,58 @@ class FPGrowthSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 
assert(prediction.select("prediction").where("id=3").first().getSeq[String](0).isEmpty)
   }
 
+  test("FPGrowth prediction should not contain duplicates") {
+// This should generate rule 1 -> 3, 2 -> 3
+val dataset = spark.createDataFrame(Seq(
+  Array("1", "3"),
+  Array("2", "3")
+).map(Tuple1(_))).toDF("items")
+val model = new FPGrowth().fit(dataset)
+
+val prediction = model.transform(
+  spark.createDataFrame(Seq(Tuple1(Array("1", "2")))).toDF("items")
+).first().getAs[Seq[String]]("prediction")
+
+assert(prediction === Seq("3"))
+  }
+
+  test("FPGrowthModel setMinConfidence should affect rules generation and 
transform") {
+val model = new 
FPGrowth().setMinSupport(0.1).setMinConfidence(0.1).fit(dataset)
+val oldRulesNum = model.associationRules.count()
+val oldPredict = model.transform(dataset)
+
+model.setMinConfidence(0.8765)
+assert(oldRulesNum > model.associationRules.count())
+
assert(!model.transform(dataset).collect().toSet.equals(oldPredict.collect().toSet))
+
+// association rules should stay the same for same minConfidence
+model.setMinConfidence(0.1)
+assert(oldRulesNum === model.associationRules.count())
+
assert(model.transform(dataset).collect().toSet.equals(oldPredict.collect().toSet))
+  }
+
   test("FPGrowth parameter check") {
 val fpGrowth = new FPGrowth().setMinSupport(0.4567)
 val model = fpGrowth.fit(dataset)
   .setMinConfidence(0.5678)
 assert(fpGrowth.getMinSupport === 0.4567)
 assert(model.getMinConfidence === 0.5678)
+MLTestingUtils.checkCopy(model)
   }
 
   test("read/write") {
 def checkModelData(model: FPGrowthModel, model2: FPGrowthModel): Unit 
= {
-  assert(model.freqItemsets.sort("items").collect() ===
-model2.freqItemsets.sort("items").collect())
+  assert(model.freqItemsets.collect().toSet.equals(
+model2.freqItemsets.collect().toSet))
+  assert(model.associationRules.collect().toSet.equals(
+model2.associationRules.collect().toSet))
+  
assert(model.setMinConfidence(0.9).associationRules.collect().toSet.equals(
+model2.setMinConfidence(0.9).associationRules.collect().toSet))
 }
 val fPGrowth = new FPGrowth()
 testEstimatorAndModelReadWrite(fPGrowth, dataset, 
FPGrowthSuite.allParamSettings,
   FPGrowthSuite.allParamSettings, checkModelData)
   }
-
-  test("FPGrowth prediction should not contain duplicates") {
--- End diff --

For the future, I'd prefer not to move stuff around unless it's necessary 
since it makes the diff larger.  No need to revert this, though, since I 
already checked it.





[GitHub] spark pull request #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence...

2017-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17336#discussion_r109548283
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala 
---
@@ -85,38 +85,58 @@ class FPGrowthSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 
assert(prediction.select("prediction").where("id=3").first().getSeq[String](0).isEmpty)
   }
 
+  test("FPGrowth prediction should not contain duplicates") {
+// This should generate rule 1 -> 3, 2 -> 3
+val dataset = spark.createDataFrame(Seq(
+  Array("1", "3"),
+  Array("2", "3")
+).map(Tuple1(_))).toDF("items")
+val model = new FPGrowth().fit(dataset)
+
+val prediction = model.transform(
+  spark.createDataFrame(Seq(Tuple1(Array("1", "2".toDF("items")
+).first().getAs[Seq[String]]("prediction")
+
+assert(prediction === Seq("3"))
+  }
+
+  test("FPGrowthModel setMinConfidence should affect rules generation and 
transform") {
+val model = new 
FPGrowth().setMinSupport(0.1).setMinConfidence(0.1).fit(dataset)
+val oldRulesNum = model.associationRules.count()
+val oldPredict = model.transform(dataset)
+
+model.setMinConfidence(0.8765)
+assert(oldRulesNum > model.associationRules.count())
+
assert(!model.transform(dataset).collect().toSet.equals(oldPredict.collect().toSet))
+
+// association rules should stay the same for same minConfidence
+model.setMinConfidence(0.1)
+assert(oldRulesNum === model.associationRules.count())
+
assert(model.transform(dataset).collect().toSet.equals(oldPredict.collect().toSet))
+  }
+
   test("FPGrowth parameter check") {
 val fpGrowth = new FPGrowth().setMinSupport(0.4567)
 val model = fpGrowth.fit(dataset)
   .setMinConfidence(0.5678)
 assert(fpGrowth.getMinSupport === 0.4567)
 assert(model.getMinConfidence === 0.5678)
+MLTestingUtils.checkCopy(model)
   }
 
   test("read/write") {
 def checkModelData(model: FPGrowthModel, model2: FPGrowthModel): Unit 
= {
-  assert(model.freqItemsets.sort("items").collect() ===
-model2.freqItemsets.sort("items").collect())
+  assert(model.freqItemsets.collect().toSet.equals(
+model2.freqItemsets.collect().toSet))
+  assert(model.associationRules.collect().toSet.equals(
--- End diff --

No need to add these 2 since they are values computed from the model data.  
Checking freqItemsets is sufficient.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-03 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Thanks for the review @viirya, I'm working on an update but want to be sure 
the python tests for arrow get run before I push.





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-03 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r109547685
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2747,6 +2747,17 @@ class Dataset[T] private[sql](
 }
   }
 
+  /**
+   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   */
+  private[sql] def collectAsArrowToPython(): Int = {
+val payloadRdd = toArrowPayloadBytes()
+val payloadByteArrays = payloadRdd.collect()
--- End diff --

The conversion going on in `table.to_pandas()` works on an already 
loaded table, but the Arrow readers can read multiple batches of data and 
output a single table.  The issue is that the pyspark serializers expect the data 
to be "framed" with its length, so I can not send it directly to the Arrow 
reader.  Even with `toLocalIteratorAndServer` I would have to read each batch 
of data on the driver, then combine them.  It would be possible to rewrite the 
"framed" stream into another stream without the lengths, which could then 
be read into a single table - but I'm not sure that added complexity is 
worth it.
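The "framed with the length" format mentioned above amounts to length-prefixed byte batches. A hedged sketch of that framing (an illustration of the idea, not pyspark's actual serializer code):

```python
import io
import struct


def write_framed(stream, payloads):
    """Write each payload prefixed with its big-endian int32 length,
    the kind of framing the pyspark serializers expect."""
    for p in payloads:
        stream.write(struct.pack(">i", len(p)))
        stream.write(p)


def read_framed(stream):
    """Read length-prefixed payloads back until EOF."""
    out = []
    while True:
        header = stream.read(4)
        if len(header) < 4:  # EOF or truncated header
            return out
        (n,) = struct.unpack(">i", header)
        out.append(stream.read(n))
```

An Arrow stream reader, by contrast, expects the record batches without this extra length prefix, which is why one format cannot be fed directly to the other.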





[GitHub] spark issue #17499: [SPARK-20161][CORE] Default log4j properties file should...

2017-04-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17499
  
Maybe Hive can do it in Hive.






[GitHub] spark issue #17521: [SPARK-20204][SQL] separate SQLConf into catalyst confs ...

2017-04-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17521
  
To be clear, I don't think we should have two separate places to define 
config entries. If that is what this PR is doing, I strongly veto.






[GitHub] spark issue #17522: [SPARK-18278] [Scheduler] Documentation to point to Kube...

2017-04-03 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/17522
  
@mridulm, I understand your concern here. This is, however, an effort from 
the Kubernetes community 
(https://github.com/kubernetes/kubernetes/issues/34377), so a different, 
parallel effort is unlikely to emerge.
@rxin thanks for reviewing. I've updated the wording as @markhamstra just 
suggested. Do we want more clarification about the level of commitment, or does 
this look ok? 





[GitHub] spark issue #17522: [SPARK-18278] [Scheduler] Documentation to point to Kube...

2017-04-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17522
  
Seems fine to me, since the number of external resource managers are small. 
We should definitely make it clear there is no firm commitment currently to 
merge this into Spark though.






[GitHub] spark issue #17521: [SPARK-20204][SQL] separate SQLConf into catalyst confs ...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17521
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75492/
Test FAILed.





[GitHub] spark issue #17521: [SPARK-20204][SQL] separate SQLConf into catalyst confs ...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17521
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17521: [SPARK-20204][SQL] separate SQLConf into catalyst confs ...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17521
  
**[Test build #75492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75492/testReport)**
 for PR 17521 at commit 
[`32aaf63`](https://github.com/apache/spark/commit/32aaf6390f7897cb2b109341d62280fbe08c9336).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...

2017-04-03 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16906
  
I think this looks reasonable, although it would maybe make sense to add a 
warning if the user has explicitly requested hive support and we are falling 
through to non-hive support (e.g. in the except side of the try block).
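The suggested warn-and-fall-back behaviour could look roughly like the sketch below. The helper names and the returned session values are stand-ins, not pyspark's real API:

```python
import warnings


def _try_hive():
    """Hypothetical helper that builds a Hive-enabled session; here it
    always fails, to exercise the fallback path."""
    raise RuntimeError("no hive classes on the classpath")


def build_session(enable_hive):
    """Sketch of the suggested behaviour: try Hive support, and if the
    user explicitly requested it, emit a warning before silently falling
    back to a non-Hive session (the except side of the try block)."""
    try:
        return _try_hive()
    except Exception as e:
        if enable_hive:
            warnings.warn(f"Hive support requested but unavailable ({e}); "
                          "falling back to a non-Hive session")
        return "plain-session"
```

Without the warning, a user who asked for Hive support would only discover the fallback when a Hive-specific feature later fails.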





[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...

2017-04-03 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17375
  
Anaconda defaulting to 3.6 definitely makes this make more sense; thanks 
@zero323, I had forgotten that. I'll give @davies until next week to say 
anything about this, but otherwise I think the set of backports for this issue 
makes sense.





[GitHub] spark pull request #17494: [SPARK-20076][ML][PySpark] Add Python interface f...

2017-04-03 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17494#discussion_r109538706
  
--- Diff: python/pyspark/ml/stat.py ---
@@ -71,6 +71,62 @@ def test(dataset, featuresCol, labelCol):
 return _java2py(sc, javaTestObj.test(*args))
 
 
+class Correlation(object):
+"""
+.. note:: Experimental
+
+Compute the correlation matrix for the input dataset of Vectors using 
the specified method.
+Methods currently supported: `pearson` (default), `spearman`.
--- End diff --

So the Scala documentation had a warning suggesting caching when 
using Spearman; would it make sense to copy that warning over as well?
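For context, Spearman correlation is Pearson correlation computed on rank-transformed data, which is why it takes an extra pass over the input and why the Scala docs suggest caching. A toy sketch that ignores ties (an illustration, not MLlib's implementation):

```python
def _ranks(xs):
    """Assign 0-based ranks to the values of xs (no tie handling)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for rank, i in enumerate(order):
        ranks[i] = float(rank)
    return ranks


def spearman(xs, ys):
    """Spearman correlation as Pearson on ranks: the rank transform is
    the extra pass over the data that makes caching worthwhile."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```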





[GitHub] spark pull request #17494: [SPARK-20076][ML][PySpark] Add Python interface f...

2017-04-03 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17494#discussion_r109538556
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlation.scala 
---
@@ -56,7 +56,7 @@ object Correlation {
*  Here is how to access the correlation coefficient:
*  {{{
*val data: Dataset[Vector] = ...
-   *val Row(coeff: Matrix) = Statistics.corr(data, "value").head
+   *val Row(coeff: Matrix) = Correlation.corr(data, "value").head
*// coeff now contains the Pearson correlation matrix.
*  }}}
*
--- End diff --

Also since we are here as well, there is a reference to input RDD up above 
in the docstring.





[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

2017-04-03 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16793
  
Let me try and take a look tonight. It seems like there are some small 
formatting issues still at a quick glance but I feel like this should be close.





[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...

2017-04-03 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17328
  
jenkins, ok to test.
Does someone on the SQL side have a chance to look at this and say whether it's 
something they want added to the DataFrame API? Maybe @marmbrus ? I'm a little 
hesitant about adding it to functions this way, since `map_values` has a 
different meaning than `mapValues` in RDD land, and it seems like that could 
cause some confusion.
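A toy contrast of the two meanings being conflated here, with plain-Python stand-ins (these are illustrations of the semantics, not Spark's APIs):

```python
def map_keys(m):
    """SQL-style map_keys: extract the keys of a map value."""
    return list(m.keys())


def map_values(m):
    """SQL-style map_values: extract the values of a map value."""
    return list(m.values())


def rdd_map_values(pairs, f):
    """RDD-style mapValues: transform the value of each (key, value)
    pair, keeping the keys - a different operation entirely."""
    return [(k, f(v)) for k, v in pairs]
```

So `map_values` pulls values *out* of a map column, while `mapValues` *transforms* the values of a pair RDD, which is the naming collision the comment worries about.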





[GitHub] spark issue #17523: [SPARK-20064][PySpark]

2017-04-03 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17523
  
Thanks for doing this @setjet & welcome to the Spark project :) This change 
looks good pending jenkins; if everything passes I'll merge it tonight.

For others looking at this PR and wondering: make-distribution writes its own 
version number when building, but this version number is used for development 
builds, so keeping it up-to-date is useful.





[GitHub] spark issue #17523: [SPARK-20064][PySpark]

2017-04-03 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17523
  
Jenkins OK to test.





[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...

2017-04-03 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/17508
  
@srowen @tgravescs 





[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17520
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75493/
Test FAILed.





[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17520
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17522: [SPARK-18278] [Scheduler] Documentation to point to Kube...

2017-04-03 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17522
  
I don't think we should be pointing to third-party projects in the Spark 
documentation - for example, it is possible that some other effort gets 
merged in instead of the one above.

If/when it does eventually get merged, we can add the appropriate cluster 
manager entry for it - until then, there are other means of evangelizing user 
participation.





[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17520
  
**[Test build #75493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75493/testReport)** for PR 17520 at commit [`4aaab02`](https://github.com/apache/spark/commit/4aaab02b6fa384c51aef8484255f7a51097842be).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17523: [SPARK-20064][PySpark]

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17523
  
Can one of the admins verify this patch?





[GitHub] spark issue #17422: [SPARK-20087] Attach accumulators / metrics to 'TaskKill...

2017-04-03 Thread noodle-fb
Github user noodle-fb commented on the issue:

https://github.com/apache/spark/pull/17422
  
@JoshRosen ping? not sure how to github correctly





[GitHub] spark pull request #17523: [SPARK-20064][PySpark]

2017-04-03 Thread setjet
GitHub user setjet opened a pull request:

https://github.com/apache/spark/pull/17523

[SPARK-20064][PySpark]

## What changes were proposed in this pull request?
The PySpark version in version.py was lagging behind the release version.
Versioning is in line with PEP 440: 
https://www.python.org/dev/peps/pep-0440/

## How was this patch tested?
Simply rebuild the project with existing tests
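As a sketch of what a PEP 440-compliant development version looks like, the following stand-alone snippet validates a version string against a simplified subset of the PEP 440 grammar. The `__version__` value and the file layout are illustrative only, not the actual contents of the PR's version.py:

```python
import re

# Illustrative stand-in for python/pyspark/version.py; the real file
# holds the actual release string.
__version__ = "2.2.0.dev0"

# Simplified subset of the PEP 440 grammar: a release segment of
# dot-separated integers plus an optional .devN developmental suffix.
PEP440_DEV = re.compile(r"^\d+(\.\d+)*(\.dev\d+)?$")

if PEP440_DEV.match(__version__) is None:
    raise ValueError(f"{__version__!r} is not a PEP 440 dev version")
print(__version__)  # → 2.2.0.dev0
```

Note that a Maven-style string such as `2.2.0-SNAPSHOT` would fail this check, which is exactly why the Python side keeps its own PEP 440 version string.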



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/setjet/spark SPARK-20064

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17523


commit a2358f7afa8502b8272a4e7caa6c64ad9f0db27d
Author: Ruben Janssen 
Date:   2016-07-16T15:03:19Z

added a python example for chisq selector in mllib

commit ca7cd787e174e04fbe0fcdcff26c8169450abc7b
Author: Ruben Janssen 
Date:   2016-08-01T18:14:01Z

updated documentation to refer to the example

commit 035aeb63ef8e8f2af8f7ed838d434a069392c336
Author: Ruben Janssen 
Date:   2016-10-16T15:00:44Z

updated with changes suggested by sethah

commit f49e6aea59994c471ea0270b41d5237a1f2a6a47
Author: Ruben Janssen 
Date:   2016-10-16T15:09:46Z

oops forgot to revert back local changes

commit a45ff2fa5e5a3633d3de24c5c2f91d59824b0fc8
Author: setjet 
Date:   2017-04-03T19:18:42Z

Merge remote-tracking branch 'upstream/master'

commit 8363e28e2d400c599052120153fc08eff8253cd5
Author: setjet 
Date:   2017-04-03T19:53:02Z

increased pyspark version

commit 881470d87d499c16cfbf6ea0a265369d60ba8f80
Author: setjet 
Date:   2017-04-03T21:25:37Z

Revert "oops forgot to revert back local changes"

This reverts commit f49e6aea59994c471ea0270b41d5237a1f2a6a47.

commit 09171936d5d1e9293fee6d28c44d74441a4920ab
Author: setjet 
Date:   2017-04-03T21:26:03Z

Revert "updated with changes suggested by sethah"

This reverts commit 035aeb63ef8e8f2af8f7ed838d434a069392c336.

commit c15654aa242d486b5eeb7e22e79915a165f6bb99
Author: setjet 
Date:   2017-04-03T21:26:30Z

Revert "updated documentation to refer to the example"

This reverts commit ca7cd787e174e04fbe0fcdcff26c8169450abc7b.

commit 47e4ab2cf8794718d68b5007f4980aae175eb94e
Author: setjet 
Date:   2017-04-03T21:26:39Z

Revert "added a python example for chisq selector in mllib"

This reverts commit a2358f7afa8502b8272a4e7caa6c64ad9f0db27d.







[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2017-04-03 Thread nsyca
Github user nsyca commented on the issue:

https://github.com/apache/spark/pull/17520
  
cc: @hvanhovell





[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75487/
Test PASSed.





[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-04-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17087
  
Merged build finished. Test PASSed.




