date:20170124

[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16691
  
**[Test build #71971 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71971/testReport)**
 for PR 16691 at commit 
[`f7312c7`](https://github.com/apache/spark/commit/f7312c76902836d0341a9ca0dcb1412ac413f573).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16700
  
**[Test build #71976 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71976/testReport)**
 for PR 16700 at commit 
[`878d45e`](https://github.com/apache/spark/commit/878d45eb81ad9b1b5486689a527c0c2cdfe81b81).



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-24 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16630
  
This test just refuses to start. Could someone help? Many thanks!
@srowen @jkbradley @MLnick @yanboliang 




[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-24 Thread windpiger

GitHub user windpiger opened a pull request:

https://github.com/apache/spark/pull/16700

[SPARK-19359][SQL]clear useless path after rename a partition with 
upper-case by HiveExternalCatalog

## What changes were proposed in this pull request?

The Hive metastore is not case-preserving and keeps partition columns with 
lower-case names. 

If Spark SQL creates a table with an upper-case partition name using 
HiveExternalCatalog, renaming a partition first calls the HiveClient's 
renamePartition, which creates a new lower-case partition path; Spark SQL 
then renames the lower-case path to the upper-case one.

However, if the renamed partition has more than one level, e.g. A=1/B=2, 
Hive's renamePartition changes it to a=1/b=2 and Spark SQL then renames it to 
A=1/B=2, but a=1 still exists in the filesystem, so we should also delete it.
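
The leftover directory described above can be worked out from the partition path alone. The following is a minimal sketch in plain Python, not the code in this pull request: the helper names `lower_column_name` and `leftover_lowercase_dir` are hypothetical, and it assumes only column names (not values) are lower-cased and that the leaf directory is the one moved by the rename.

```python
from typing import Optional


def lower_column_name(component: str) -> str:
    """Lower-case only the column name of a `name=value` path component."""
    name, sep, value = component.partition("=")
    return name.lower() + sep + value


def leftover_lowercase_dir(partition_path: str) -> Optional[str]:
    """Return the stale lower-case directory left after the leaf rename, if any."""
    parts = partition_path.split("/")
    lowered = [lower_column_name(p) for p in parts]
    for depth, (original, lower) in enumerate(zip(parts, lowered)):
        if original != lower:
            # The leaf directory is moved by the rename itself, so only a
            # non-leaf component can remain behind as a stale directory.
            if depth < len(parts) - 1:
                return "/".join(lowered[: depth + 1])
            return None
    return None


print(leftover_lowercase_dir("A=1/B=2"))  # a=1
```

A single-level partition such as `A=1` yields `None` under these assumptions, because the only directory involved is the one being renamed.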

## How was this patch tested?
unit test added


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark 
clearUselessPathAfterRenamPartition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16700


commit 6a8efddfb44f3f72c15ff6c20cd6ce341bec6da7
Author: windpiger 
Date:   2017-01-25T07:44:20Z

[SPARK-19359][SQL]clear useless path after rename a partition with 
upper-case in HiveExternalCatalog

commit 878d45eb81ad9b1b5486689a527c0c2cdfe81b81
Author: windpiger 
Date:   2017-01-25T07:51:07Z

reset a tc





[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16699
  
**[Test build #71975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71975/testReport)**
 for PR 16699 at commit 
[`d071b95`](https://github.com/apache/spark/commit/d071b95ccc94404d37cde7cc122cf8a13fd04449).



[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16699
  
jenkins, retest this please



[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16699
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16699
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71974/
Test FAILed.



[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16699
  
**[Test build #71974 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71974/testReport)**
 for PR 16699 at commit 
[`a1f5695`](https://github.com/apache/spark/commit/a1f56952b4ffcdfb400ff0ce014987c50de5f33e).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16688
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71970/
Test PASSed.



[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16688
  
**[Test build #71970 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71970/testReport)**
 for PR 16688 at commit 
[`b71120d`](https://github.com/apache/spark/commit/b71120d562b28c94b8a1b0689b3c2fac11d84a37).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16699
  
**[Test build #71974 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71974/testReport)**
 for PR 16699 at commit 
[`a1f5695`](https://github.com/apache/spark/commit/a1f56952b4ffcdfb400ff0ce014987c50de5f33e).



[GitHub] spark pull request #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread actuaryzhang

GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/16699

[SPARK-18710] Add offset in GLM

## What changes were proposed in this pull request?
Add support for offset in GLM. This is useful for at least two reasons:

1. Account for exposure: e.g., when modeling the number of accidents, we 
may need to use miles driven as an offset to assess factors on frequency.
2. Test incremental effects of new variables: we can use predictions from 
the existing model as offset and run a much smaller model on only new 
variables. This avoids re-estimating the large model with all variables (old + 
new) and can be very important for efficient large-scale analysis. 
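
To illustrate the exposure case, here is a minimal sketch in plain Python rather than the Spark API. It assumes a Poisson model with a log link, where the offset is added to the linear predictor with a fixed coefficient of 1; the function name `poisson_mean` and all numbers are made up for illustration.

```python
import math


def poisson_mean(features, coefficients, intercept, offset=0.0):
    """Mean of a log-link Poisson GLM: exp(intercept + x . beta + offset)."""
    eta = intercept + sum(x * b for x, b in zip(features, coefficients)) + offset
    return math.exp(eta)


# Hypothetical fitted values: one feature, accident rate per mile driven.
rate_per_mile = poisson_mean([1.0], [0.5], -3.0)

# Using log(miles driven) as the offset scales the mean by the exposure.
expected_accidents = poisson_mean([1.0], [0.5], -3.0, offset=math.log(1000.0))

assert math.isclose(expected_accidents, 1000.0 * rate_per_mile)
```

Because the offset enters without an estimated coefficient, the remaining coefficients describe the rate per unit of exposure rather than the raw count.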

## How was this patch tested?
New test.

@yanboliang @srowen @felixcheung @sethah 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark offset

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16699.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16699


commit 3bf2718c1a1e68273508e63499bb5d1cc8230155
Author: actuaryzhang 
Date:   2017-01-24T23:46:16Z

add trait offset

commit 0e240eb313aa91cb645fb3ab8d70e51b6c65b3c7
Author: actuaryzhang 
Date:   2017-01-24T23:48:03Z

add offset setter

commit 9c41453a19c0f9c31403fafaf1995c642c37c70d
Author: actuaryzhang 
Date:   2017-01-25T05:15:50Z

implement offset in GLM

commit 7823f8af8b0926790816c9e79e9425e503e494ad
Author: actuaryzhang 
Date:   2017-01-25T06:55:56Z

add test for glm with offset





[GitHub] spark issue #16698: [CORE][DOCS] Update a help message for --files in spark-...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71968/
Test PASSed.



[GitHub] spark issue #16698: [CORE][DOCS] Update a help message for --files in spark-...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16698
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16698: [CORE][DOCS] Update a help message for --files in spark-...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16698
  
**[Test build #71968 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71968/testReport)**
 for PR 16698 at commit 
[`c55c7c9`](https://github.com/apache/spark/commit/c55c7c91db64f821e075c5df091facc62d9568c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71967/
Test PASSed.



[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16680
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16680
  
**[Test build #71967 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71967/testReport)**
 for PR 16680 at commit 
[`37e0296`](https://github.com/apache/spark/commit/37e029687f1af1d5acfbe8111c6da8987a20abf1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71966/
Test PASSed.



[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16680
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16680
  
**[Test build #71966 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71966/testReport)**
 for PR 16680 at commit 
[`15c4dec`](https://github.com/apache/spark/commit/15c4dec8c4a35abd6e2dcf47dd2f2e9bc37c8129).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2017-01-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16329



[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16329
  
Thanks! Merging to master/2.1



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread titicaca

Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97714703
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
 vec <- do.call(c, col)
 stopifnot(class(vec) != "list")
+# If vec is an vector with only NAs, the type is logical
--- End diff --

In local R, if we try
```
df <- data.frame(x = c(0,1,2), y = c(NA, NA, 1))
class(head(df, 1)$y)
```
The output is still numeric instead of logical. But the existing test 
expects a logical NA instead of a numeric NA.

So is it necessary to correct the existing tests, for example 
@test_sparkSQL.R#1280,
from `expect_equal(collect(select(df, first(df$age)))[[1]], NA)` to
`expect_equal(collect(select(df, first(df$age)))[[1]], NA_real_)`?



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71973 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71973/testReport)**
 for PR 16677 at commit 
[`4e31bb7`](https://github.com/apache/spark/commit/4e31bb7959cb774b51d6d8662f53a3ad96b4dc49).



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71972/testReport)**
 for PR 16677 at commit 
[`def10e6`](https://github.com/apache/spark/commit/def10e696bed279a240b4454154ce4aea713cf47).



[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r97712511
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
 }
 
 /**
- * Take the first `limit` elements of each child partition, but do not 
collect or shuffle them.
+ * Take the `limit` elements of the child output.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode {
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
-}
 
-/**
- * Take the first `limit` elements of the child's single output partition.
- */
-case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
-  override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
 
-  override def outputPartitioning: Partitioning = child.outputPartitioning
+  protected override def doExecute(): RDD[InternalRow] = {
+val childRDD = child.execute()
+val partitioner = FakePartitioning(child.outputPartitioning,
+  childRDD.getNumPartitions)
+val shuffleDependency = ShuffleExchange.prepareShuffleDependency(
+  childRDD, child.output, partitioner, serializer)
+val numberOfOutput: Seq[Int] = if 
(shuffleDependency.rdd.getNumPartitions != 0) {
+  // submitMapStage does not accept RDD with 0 partition.
+  // So, we will not submit this dependency.
+  val submittedStageFuture = 
sparkContext.submitMapStage(shuffleDependency)
+  submittedStageFuture.get().numberOfOutput.toSeq
+} else {
+  Nil
+}
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+// Try to keep child plan's original data parallelism or not. It is 
enabled by default.
+val respectChildParallelism = sqlContext.conf.enableParallelGlobalLimit
+
+val sumOfOutput = numberOfOutput.sum
+if (sumOfOutput <= limit) {
+  childRDD
+} else if (!respectChildParallelism) {
+  // This is mainly for tests.
+  // We take the rows of each partition until we reach the required 
limit number.
+  var countForRows = 0
+  val takeAmounts = new mutable.HashMap[Int, Int]()
+  numberOfOutput.zipWithIndex.foreach { case (num, index) =>
+if (countForRows + num < limit) {
+  countForRows += num
+  takeAmounts += ((index, num))
+} else {
+  val toTake = limit - countForRows
+  countForRows += toTake
+  takeAmounts += ((index, toTake))
+}
+  }
+  val shuffled = new ShuffledRowRDD(shuffleDependency)
+  shuffled.mapPartitionsWithIndexInternal { case (index, iter) =>
+takeAmounts.get(index).map { size =>
+  iter.take(size)
+}.getOrElse(iter)
+  }
+} else {
+  // We try to distribute the required limit number of rows across all 
child rdd's partitions.
+  var numToReduce = (sumOfOutput - limit)
+  val reduceAmounts = new mutable.HashMap[Int, Int]()
+  val nonEmptyParts = numberOfOutput.filter(_ > 0).size
+  val reducePerPart = numToReduce / nonEmptyParts
+  numberOfOutput.zipWithIndex.foreach { case (num, index) =>
+if (num >= reducePerPart) {
+  numToReduce -= reducePerPart
+  reduceAmounts += ((index, reducePerPart))
+} else {
+  numToReduce -= num
+  reduceAmounts += ((index, num))
+}
+  }
+  while (numToReduce > 0) {
+numberOfOutput.zipWithIndex.foreach { case (num, index) =>
+  val toReduce = if (numToReduce / nonEmptyParts > 0) {
+numToReduce / nonEmptyParts
+  } else {
+numToReduce
+  }
+  if (num - reduceAmounts(index) >= toReduce) {
+reduceAmounts(index) = reduceAmounts(index) + toReduce
+numToReduce -= toReduce
+  } else if (num - reduceAmounts(index) > 0) {
+reduceAmounts(index) = reduceAmounts(index) + 1
+numToReduce -= 1
+  }
+}
+  }
+
+  val shuffled = new ShuffledRowRDD(shuffleDependency)
+  shuffled.mapPartitionsWithIndexInternal { case (index, iter) =>
+

[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread titicaca

Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97712469
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
 vec <- do.call(c, col)
 stopifnot(class(vec) != "list")
+# If vec is an vector with only NAs, the type is logical
--- End diff --

Yes. My first commit tried to cast the column to its corresponding R 
data type explicitly, even if it is a vector with all NAs. However, some 
existing tests failed because they expect a logical NA. For example:
```
3. Failure: column functions (@test_sparkSQL.R#1280) 
---
collect(select(df, first(df$age)))[[1]] not equal to NA.
Types not compatible: double vs logical
4. Failure: column functions (@test_sparkSQL.R#1282) 
---
collect(select(df, first("age")))[[1]] not equal to NA.
Types not compatible: double vs logical
``` 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16329
  
Sure, let me quickly go over the changes. Will merge it after that. 



[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-24 Thread sarutak

Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/16582#discussion_r97710897
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -274,25 +277,28 @@ private[spark] object JettyUtils extends Logging {
   conf: SparkConf,
   serverName: String = ""): ServerInfo = {
 
-val collection = new ContextHandlerCollection
 addFilters(handlers, conf)
 
 val gzipHandlers = handlers.map { h =>
+  h.setVirtualHosts(Array("@" + SPARK_CONNECTOR_NAME))
--- End diff --

Of course I understand this change is needed. I've confirmed it manually.



[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16685
  
@ilganeli Regarding the support of **UPDATE**, your idea is pretty good. 
Please submit a separate PR and improve the document. I will review it. 

Regarding the support of **UPSERT**, we need to measure the performance. We 
can try both solutions and measure the performance difference. 

I have a very basic question. If we are doing the **UPSERT for a small data 
set** (which is a common case), fetching the whole table from the source 
sounds pretty expensive. Normally, the performance of JDBC is 
notoriously bad when the table size is large. Thus, IMHO, we need to avoid 
fetching the source table to support UPSERT. 
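
As a rough illustration of the point above (letting the database resolve the conflict instead of fetching the table first), here is a minimal Python sketch against an in-memory SQLite database. It is not how the PR implements UPSERT: the `users` table, the `upsert_rows` helper, and the SQLite-specific `INSERT OR REPLACE` statement are all assumptions chosen for the example, and other databases use different syntax.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert rows, replacing any existing row with the same primary key.

    No SELECT is issued: the database resolves the key conflict itself.
    INSERT OR REPLACE is SQLite-specific (hypothetical stand-in for UPSERT).
    """
    conn.executemany(
        "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", rows
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])
    # Row 2 already exists and is replaced; row 3 is new and is inserted.
    upsert_rows(conn, [(2, "B"), (3, "c")])
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

Only the small batch of changed rows crosses the connection here, which is the property the comment asks for; whether this is faster in practice would still need to be measured.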





[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r97710485
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
 }
 
 /**
- * Take the first `limit` elements of each child partition, but do not 
collect or shuffle them.
+ * Take the `limit` elements of the child output.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode {
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
-}
 
-/**
- * Take the first `limit` elements of the child's single output partition.
- */
-case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
-  override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
 
-  override def outputPartitioning: Partitioning = child.outputPartitioning
+  protected override def doExecute(): RDD[InternalRow] = {
+val childRDD = child.execute()
+val partitioner = FakePartitioning(child.outputPartitioning,
+  childRDD.getNumPartitions)
+val shuffleDependency = ShuffleExchange.prepareShuffleDependency(
+  childRDD, child.output, partitioner, serializer)
+val numberOfOutput: Seq[Int] = if 
(shuffleDependency.rdd.getNumPartitions != 0) {
+  // submitMapStage does not accept RDD with 0 partition.
+  // So, we will not submit this dependency.
+  val submittedStageFuture = 
sparkContext.submitMapStage(shuffleDependency)
+  submittedStageFuture.get().numberOfOutput.toSeq
+} else {
+  Nil
+}
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+// Try to keep child plan's original data parallelism or not. It is 
enabled by default.
+val respectChildParallelism = sqlContext.conf.enableParallelGlobalLimit
+
+val sumOfOutput = numberOfOutput.sum
+if (sumOfOutput <= limit) {
+  childRDD
+} else if (!respectChildParallelism) {
+  // This is mainly for tests.
+  // We take the rows of each partition until we reach the required 
limit number.
+  var countForRows = 0
+  val takeAmounts = new mutable.HashMap[Int, Int]()
+  numberOfOutput.zipWithIndex.foreach { case (num, index) =>
+if (countForRows + num < limit) {
+  countForRows += num
+  takeAmounts += ((index, num))
+} else {
+  val toTake = limit - countForRows
+  countForRows += toTake
+  takeAmounts += ((index, toTake))
+}
+  }
+  val shuffled = new ShuffledRowRDD(shuffleDependency)
+  shuffled.mapPartitionsWithIndexInternal { case (index, iter) =>
+takeAmounts.get(index).map { size =>
+  iter.take(size)
+}.getOrElse(iter)
+  }
+} else {
+  // We try to distribute the required limit number of rows across all 
child rdd's partitions.
+  var numToReduce = (sumOfOutput - limit)
+  val reduceAmounts = new mutable.HashMap[Int, Int]()
--- End diff --

Yeah, I thought of it before but forgot to add it. 
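
For readers following the `reduceAmounts` branch of this diff, the redistribution step can be sketched in Python as below. This is a hand port of the Scala loop for illustration only, not code from the PR; `reduce_amounts` is a hypothetical name, and the sketch assumes the rows outnumber the limit (the case in which that branch runs) and that the limit is non-negative.

```python
def reduce_amounts(number_of_output, limit):
    """Return {partition index: rows to drop} so that `limit` rows remain.

    Assumes sum(number_of_output) > limit >= 0.
    """
    num_to_reduce = sum(number_of_output) - limit
    non_empty_parts = sum(1 for n in number_of_output if n > 0)
    reduce_per_part = num_to_reduce // non_empty_parts

    amounts = {}
    # First pass: drop an equal share from each partition, capped by its size.
    for index, num in enumerate(number_of_output):
        share = min(num, reduce_per_part)
        num_to_reduce -= share
        amounts[index] = share

    # Later passes: spread what is left over partitions that still have rows.
    while num_to_reduce > 0:
        for index, num in enumerate(number_of_output):
            to_reduce = num_to_reduce // non_empty_parts or num_to_reduce
            if num - amounts[index] >= to_reduce:
                amounts[index] += to_reduce
                num_to_reduce -= to_reduce
            elif num - amounts[index] > 0:
                amounts[index] += 1
                num_to_reduce -= 1
    return amounts
```

With partition sizes `[10, 1, 4]` and a limit of 6, the first pass drops 3, 1, and 3 rows, and the second pass drops the remaining 2 from the first partition, leaving 5, 0, and 1 rows.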



[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r97710405
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
 }
 
 /**
- * Take the first `limit` elements of each child partition, but do not 
collect or shuffle them.
+ * Take the `limit` elements of the child output.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode {
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
-}
 
-/**
- * Take the first `limit` elements of the child's single output partition.
- */
-case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
-  override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
 
-  override def outputPartitioning: Partitioning = child.outputPartitioning
+  protected override def doExecute(): RDD[InternalRow] = {
+val childRDD = child.execute()
+val partitioner = FakePartitioning(child.outputPartitioning,
+  childRDD.getNumPartitions)
+val shuffleDependency = ShuffleExchange.prepareShuffleDependency(
+  childRDD, child.output, partitioner, serializer)
+val numberOfOutput: Seq[Int] = if 
(shuffleDependency.rdd.getNumPartitions != 0) {
+  // submitMapStage does not accept RDD with 0 partition.
+  // So, we will not submit this dependency.
+  val submittedStageFuture = 
sparkContext.submitMapStage(shuffleDependency)
+  submittedStageFuture.get().numberOfOutput.toSeq
+} else {
+  Nil
+}
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+// Try to keep child plan's original data parallelism or not. It is 
enabled by default.
+val respectChildParallelism = sqlContext.conf.enableParallelGlobalLimit
+
+val sumOfOutput = numberOfOutput.sum
+if (sumOfOutput <= limit) {
+  childRDD
--- End diff --

oh, right.



[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r97710396
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
 }
 
 /**
- * Take the first `limit` elements of each child partition, but do not 
collect or shuffle them.
+ * Take the `limit` elements of the child output.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode {
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
-}
 
-/**
- * Take the first `limit` elements of the child's single output partition.
- */
-case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
-  override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
 
-  override def outputPartitioning: Partitioning = child.outputPartitioning
+  protected override def doExecute(): RDD[InternalRow] = {
+val childRDD = child.execute()
+val partitioner = FakePartitioning(child.outputPartitioning,
+  childRDD.getNumPartitions)
+val shuffleDependency = ShuffleExchange.prepareShuffleDependency(
+  childRDD, child.output, partitioner, serializer)
+val numberOfOutput: Seq[Int] = if 
(shuffleDependency.rdd.getNumPartitions != 0) {
+  // submitMapStage does not accept RDD with 0 partition.
+  // So, we will not submit this dependency.
+  val submittedStageFuture = 
sparkContext.submitMapStage(shuffleDependency)
+  submittedStageFuture.get().numberOfOutput.toSeq
+} else {
+  Nil
+}
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+// Try to keep child plan's original data parallelism or not. It is 
enabled by default.
+val respectChildParallelism = sqlContext.conf.enableParallelGlobalLimit
+
+val sumOfOutput = numberOfOutput.sum
+if (sumOfOutput <= limit) {
+  childRDD
+} else if (!respectChildParallelism) {
+  // This is mainly for tests.
+  // We take the rows of each partition until we reach the required 
limit number.
+  var countForRows = 0
+  val takeAmounts = new mutable.HashMap[Int, Int]()
+  numberOfOutput.zipWithIndex.foreach { case (num, index) =>
+if (countForRows + num < limit) {
+  countForRows += num
+  takeAmounts += ((index, num))
+} else {
+  val toTake = limit - countForRows
+  countForRows += toTake
+  takeAmounts += ((index, toTake))
+}
+  }
+  val shuffled = new ShuffledRowRDD(shuffleDependency)
+  shuffled.mapPartitionsWithIndexInternal { case (index, iter) =>
+takeAmounts.get(index).map { size =>
+  iter.take(size)
+}.getOrElse(iter)
--- End diff --

Actually we won't reach here, but the change is ok.
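
To illustrate why the `getOrElse(iter)` fallback is not reached: the loop in the diff records an entry in `takeAmounts` for every partition index, so the lookup always finds a value. A rough Python sketch of the same bookkeeping follows (a port for illustration, not code from the PR; `take_amounts` is a hypothetical name).

```python
def take_amounts(number_of_output, limit):
    """Return {partition index: rows to take}, scanning partitions in order."""
    count_for_rows = 0
    amounts = {}
    for index, num in enumerate(number_of_output):
        if count_for_rows + num < limit:
            # The whole partition fits under the limit.
            count_for_rows += num
            amounts[index] = num
        else:
            # Take only what is still needed; this is 0 once the limit is met.
            to_take = limit - count_for_rows
            count_for_rows += to_take
            amounts[index] = to_take
    return amounts
```

For partition sizes `[3, 5, 4]` and a limit of 6, every index gets an entry: 3 rows from the first partition, 3 from the second, and 0 from the third.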



[GitHub] spark pull request #16465: [SPARK-19064][PySpark]Fix pip installing of sub c...

2017-01-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16465#discussion_r97710172
  
--- Diff: dev/pip-sanity-check.py ---
@@ -18,6 +18,8 @@
 from __future__ import print_function
 
 from pyspark.sql import SparkSession
+from pyspark.ml.param import Params
+from pyspark.mllib.linalg import *
--- End diff --

ok. i think this should be enough.



[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16691
  
**[Test build #71971 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71971/testReport)**
 for PR 16691 at commit 
[`f7312c7`](https://github.com/apache/spark/commit/f7312c76902836d0341a9ca0dcb1412ac413f573).



[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16688
  
**[Test build #71970 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71970/testReport)**
 for PR 16688 at commit 
[`b71120d`](https://github.com/apache/spark/commit/b71120d562b28c94b8a1b0689b3c2fac11d84a37).



[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16688
  
ok to test



[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16693
  
LGTM except two minor comments in the error messages.



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97709404
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -44,40 +44,6 @@ case class CreateHiveTableAsSelectCommand(
   override def innerChildren: Seq[LogicalPlan] = Seq(query)
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-lazy val metastoreRelation: MetastoreRelation = {
-  import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-  import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
-  import org.apache.hadoop.io.Text
-  import org.apache.hadoop.mapred.TextInputFormat
-
-  val withFormat =
-tableDesc.withNewStorage(
-  inputFormat =
-
tableDesc.storage.inputFormat.orElse(Some(classOf[TextInputFormat].getName)),
-  outputFormat =
-tableDesc.storage.outputFormat
-  .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, 
Text]].getName)),
-  serde = 
tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
-  compressed = tableDesc.storage.compressed)
--- End diff --

Actually, after the code refactoring, this is always ensured in the rule 
`DetermineHiveSerde`. 



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97709347
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -44,40 +44,6 @@ case class CreateHiveTableAsSelectCommand(
   override def innerChildren: Seq[LogicalPlan] = Seq(query)
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-lazy val metastoreRelation: MetastoreRelation = {
-  import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-  import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
-  import org.apache.hadoop.io.Text
-  import org.apache.hadoop.mapred.TextInputFormat
-
-  val withFormat =
-tableDesc.withNewStorage(
-  inputFormat =
-
tableDesc.storage.inputFormat.orElse(Some(classOf[TextInputFormat].getName)),
-  outputFormat =
-tableDesc.storage.outputFormat
-  .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, 
Text]].getName)),
-  serde = 
tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
-  compressed = tableDesc.storage.compressed)
-
-  val withSchema = if (withFormat.schema.isEmpty) {
-tableDesc.copy(schema = query.schema)
-  } else {
-withFormat
--- End diff --

To the other reviewers, this is not needed, because the schema is always 
empty when we need to create a table. See [the assert 
here.](https://github.com/cloud-fan/spark/blob/db00cf9061b2ad4263671f5ca9252642a091ee45/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala#L70).
 



[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-24 Thread sarutak

Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/16582#discussion_r97709181
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -274,25 +277,28 @@ private[spark] object JettyUtils extends Logging {
   conf: SparkConf,
   serverName: String = ""): ServerInfo = {
 
-val collection = new ContextHandlerCollection
 addFilters(handlers, conf)
 
 val gzipHandlers = handlers.map { h =>
+  h.setVirtualHosts(Array("@" + SPARK_CONNECTOR_NAME))
--- End diff --

> Without this change the UI does not work at all. The test I added already 
covers it.

Hmm, that's odd. I commented out this change and ran the test cases you 
added (UISuite and UISeleniumSuite), but they passed.



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16696
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71964/
Test FAILed.



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16696
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16696
  
**[Test build #71964 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71964/testReport)**
 for PR 16696 at commit 
[`b88fac5`](https://github.com/apache/spark/commit/b88fac58331b5fbbae83ac5cb8ba37d1bbb76b4c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class LimitNode extends UnaryNode `
  * `case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) 
extends LimitNode`
  * `case class LocalLimit(limitExpr: Expression, child: LogicalPlan) 
extends LimitNode`



[GitHub] spark issue #16465: [SPARK-19064][PySpark]Fix pip installing of sub componen...

2017-01-24 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16465
  
Holden, https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py 
works well for merging to master and backporting to any branch :) unless there 
is a conflict, in which case a separate PR would be easier.

Have fun!





[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16696
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71963/
Test FAILed.



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16696
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16696
  
**[Test build #71963 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71963/testReport)**
 for PR 16696 at commit 
[`62013f5`](https://github.com/apache/spark/commit/62013f5c5bc1a6d43e1b111aa3784b4524c8fda4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97708445
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -89,12 +55,30 @@ case class CreateHiveTableAsSelectCommand(
 // Since the table already exists and the save mode is Ignore, we 
will just return.
 return Seq.empty
   }
-  sparkSession.sessionState.executePlan(InsertIntoTable(
-metastoreRelation, Map(), query, overwrite = false, ifNotExists = 
false)).toRdd
--- End diff --

uh... Previously, we tried to create the table even if the table already 
existed. A good change!



[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-01-24 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16680
  
@cloud-fan, could you please take a look? I did my best not to change the 
current behaviour or logic, and only relocated the code here.



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16689
  
**[Test build #71969 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71969/testReport)**
 for PR 16689 at commit 
[`6a0eb3f`](https://github.com/apache/spark/commit/6a0eb3f8789b6a66a4d1419c389fdda2edc0bc95).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16689
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71969/
Test FAILed.



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16689
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-24 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97708190
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -475,6 +1164,45 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
   }
 
+  test("to_unix_timestamp with session local timezone") {
--- End diff --

Ah, I see! I'll move tests to `DateTimeUtilsSuite` soon. Thanks a lot!



[GitHub] spark issue #16668: [SPARK-18788][SPARKR] Add API for getNumPartitions

2017-01-24 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16668
  
@shivaram how about we merge this to master & branch-2.1? Then I can build on 
this to add the Dataset/DataFrame API in Scala, as @cloud-fan suggests. That 
would be easier than porting the little fixes that work around the 
getNumPartitions conflicts in R. And having this in 2.1.x is not likely to be 
much worse than people calling the non-public methods...




[GitHub] spark issue #16698: [CORE][DOCS] Update a help message for --files in spark-...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16698
  
**[Test build #71968 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71968/testReport)**
 for PR 16698 at commit 
[`c55c7c9`](https://github.com/apache/spark/commit/c55c7c91db64f821e075c5df091facc62d9568c1).



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16689
  
**[Test build #71969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71969/testReport)**
 for PR 16689 at commit 
[`6a0eb3f`](https://github.com/apache/spark/commit/6a0eb3f8789b6a66a4d1419c389fdda2edc0bc95).



[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71960/
Test PASSed.



[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16572
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16698: [CORE][DOCS] Update a help message for --files in spark-...

2017-01-24 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16698
  
This is a nitpick, but I think it would help some users. Could someone check 
this?



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/16689
  
Jenkins, ok to test



[GitHub] spark pull request #16698: [CORE][DOCS] Update a help message for --files in...

2017-01-24 Thread maropu

GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/16698

[CORE][DOCS] Update a help message for --files in spark-submit

## What changes were proposed in this pull request?
This PR updates the help message for `--files` in spark-submit, because users 
seem to get confused about how to get the full paths of the files added via 
the option.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SparkFilesDoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16698


commit c55c7c91db64f821e075c5df091facc62d9568c1
Author: Takeshi YAMAMURO 
Date:   2017-01-25T04:15:28Z

Update a help message in spark-submit





[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16572
  
**[Test build #71960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71960/testReport)**
 for PR 16572 at commit 
[`010d27a`](https://github.com/apache/spark/commit/010d27a79be684668012fd796d21a085308dd828).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97707703
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != 
"binary") {
 vec <- do.call(c, col)
 stopifnot(class(vec) != "list")
+# If vec is an vector with only NAs, the type is 
logical
--- End diff --

If the DataFrame column is of type string, shouldn't it be converted to a 
character vector in R (which can be all NA), even when the column contains only 
NULL (which maps to NA in R)?

It seems that with this change it would become logical in R instead of character.
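
The concern can be illustrated outside R. Below is a small Scala model, assuming the result type could be chosen from the column's declared type rather than from its values; `RType`, `fromValues`, and `fromSchema` are hypothetical names for this sketch and not SparkR internals.

```scala
// Illustrative stand-ins for two R vector types.
sealed trait RType
case object RLogical extends RType
case object RCharacter extends RType

object CollectTypes {
  // Value-based inference: when every value is missing there is no evidence
  // of the element type, so the result falls back to logical.
  def fromValues(values: Seq[Option[String]]): RType =
    if (values.forall(_.isEmpty)) RLogical else RCharacter

  // Schema-based choice: a string column stays character even when every
  // value is missing.
  def fromSchema(declaredType: String): RType = declaredType match {
    case "string"  => RCharacter
    case "boolean" => RLogical
    case other     => throw new IllegalArgumentException(s"unsupported type: $other")
  }
}
```

For an all-NULL string column the two functions disagree, which is the mismatch the comment points at.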



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/16689
  
Jenkins, retest this please



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16689
  
great! @shivaram could you get Jenkins to test this fix please? I don't 
seem to have the power to command it :)




[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-24 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16582#discussion_r97706756
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -274,25 +277,28 @@ private[spark] object JettyUtils extends Logging {
   conf: SparkConf,
   serverName: String = ""): ServerInfo = {
 
-val collection = new ContextHandlerCollection
 addFilters(handlers, conf)
 
 val gzipHandlers = handlers.map { h =>
+  h.setVirtualHosts(Array("@" + SPARK_CONNECTOR_NAME))
--- End diff --

I don't understand what you mean. Without this change the UI does not work 
at all. The test I added already covers it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16680
  
**[Test build #71967 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71967/testReport)**
 for PR 16680 at commit 
[`37e0296`](https://github.com/apache/spark/commit/37e029687f1af1d5acfbe8111c6da8987a20abf1).



[GitHub] spark pull request #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema in...

2017-01-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16680#discussion_r97706016
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -39,22 +37,76 @@ private[csv] object CSVInferSchema {
* 3. Replace any null types with string type
*/
   def infer(
-  tokenRdd: RDD[Array[String]],
-  header: Array[String],
+  csv: Dataset[String],
+  caseSensitive: Boolean,
   options: CSVOptions): StructType = {
-val startType: Array[DataType] = 
Array.fill[DataType](header.length)(NullType)
-val rootTypes: Array[DataType] =
-  tokenRdd.aggregate(startType)(inferRowType(options), mergeRowTypes)
-
-val structFields = header.zip(rootTypes).map { case (thisHeader, 
rootType) =>
-  val dType = rootType match {
-case _: NullType => StringType
-case other => other
+val firstLine: String = CSVUtils.filterCommentAndEmpty(csv, 
options).first()
--- End diff --

To my knowledge, both `CSVUtils.filterCommentAndEmpty` usages here and below 
should behave exactly the same, but I left them as they are simply to keep the 
current behaviour for now.
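
As a rough illustration of what the new first line of `infer` does, here is a sketch on plain Scala collections instead of a `Dataset[String]`. The filtering rule (drop blank lines and lines that start with the comment character) is an assumption about `CSVUtils.filterCommentAndEmpty`, and `CsvHeaderSketch` is a hypothetical name.

```scala
object CsvHeaderSketch {
  // Drop blank lines and lines whose first non-space character is the
  // comment character (assumed behaviour, see the note above).
  def filterCommentAndEmpty(lines: Seq[String], comment: Char): Seq[String] =
    lines.filter { line =>
      val trimmed = line.trim
      trimmed.nonEmpty && trimmed.head != comment
    }

  // The first surviving line is what header detection would look at.
  def firstLine(lines: Seq[String], comment: Char): Option[String] =
    filterCommentAndEmpty(lines, comment).headOption
}
```

Returning an `Option` makes the empty-input case explicit, whereas `first()` on an empty Dataset would throw.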



[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2017-01-24 Thread samkum

Github user samkum commented on the issue:

https://github.com/apache/spark/pull/16387
  
Nope, I didn't test it in isolation.

-Sameer.

On Jan 24, 2017 10:09 PM, "Marcelo Vanzin"  wrote:

> No the question is whether you tested without @viirya
>  commit b1ef9ec (the last one that forces
> spills of in-memory maps), or just the very last version of the patch.




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71958/
Test PASSed.



[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16697
  
**[Test build #71965 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71965/testReport)**
 for PR 16697 at commit 
[`b4bc5af`](https://github.com/apache/spark/commit/b4bc5af16e53365451c51ca0c9a92ab6915a0987).



[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16478
  
**[Test build #71958 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71958/testReport)**
 for PR 16478 at commit 
[`9d6f4ad`](https://github.com/apache/spark/commit/9d6f4adb7f8a6679103b0978fc840abc72fc7bcb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16680
  
**[Test build #71966 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71966/testReport)**
 for PR 16680 at commit 
[`15c4dec`](https://github.com/apache/spark/commit/15c4dec8c4a35abd6e2dcf47dd2f2e9bc37c8129).



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97705600
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -116,6 +117,11 @@ final class DataStreamReader 
private[sql](sparkSession: SparkSession) extends Lo
* @since 2.0.0
*/
   def load(): DataFrame = {
+if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+  throw new AnalysisException("Hive data source can only be used with 
tables, you can not " +
+"write files of Hive data source directly.")
--- End diff --

This is to read the streaming data from Hive tables, right? I think we need 
to fix the error message.



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71955/
Test PASSed.



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97705570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala 
---
@@ -221,6 +222,11 @@ final class DataStreamWriter[T] private[sql](ds: 
Dataset[T]) {
* @since 2.0.0
*/
   def start(): StreamingQuery = {
+if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+  throw new AnalysisException("Hive data source can only be used with 
tables, you can not " +
+"read files of Hive data source directly.")
--- End diff --

This is not to read but write the results to Hive tables, right?
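
One way to keep the wording from drifting between the reader and the writer is to build the message from the operation name. This is only a sketch: `HiveSourceGuard` is a hypothetical helper, the `"hive"` constant is assumed to match `DDLUtils.HIVE_PROVIDER`, and a plain `IllegalArgumentException` stands in for `AnalysisException` so the snippet runs without Spark.

```scala
import java.util.Locale

object HiveSourceGuard {
  // Assumed to match the value of DDLUtils.HIVE_PROVIDER.
  val HiveProvider = "hive"

  // `operation` is "read" when called from a reader and "write" from a writer.
  def check(source: String, operation: String): Unit =
    if (source.toLowerCase(Locale.ROOT) == HiveProvider) {
      throw new IllegalArgumentException(
        "Hive data source can only be used with tables, you can not " +
          s"$operation files of Hive data source directly.")
    }
}
```

With this shape, `DataStreamReader.load()` would pass `"read"` and `DataStreamWriter.start()` would pass `"write"`, so each message names the operation the caller actually attempted.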



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #16697: [SPARK-19358][CORE] LiveListenerBus shall log the...

2017-01-24 Thread CodingCat

GitHub user CodingCat opened a pull request:

https://github.com/apache/spark/pull/16697

[SPARK-19358][CORE] LiveListenerBus shall log the event name when dropping 
them due to a fully filled queue

## What changes were proposed in this pull request?

Some dropped events can make the whole application behave unexpectedly, e.g. 
cause UI problems. We should log the name of each dropped event to make 
debugging easier.
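
The idea can be sketched with a bounded queue from the JDK. This is not the `LiveListenerBus` implementation; `BoundedBus`, `droppedLog`, and the sample event classes are hypothetical names, and an in-memory buffer stands in for a real logger.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ArrayBuffer

// Sample events used only for illustration.
final case class JobStart(id: Int)
final case class JobEnd(id: Int)

// A bounded event queue that records the class name of every event it drops.
class BoundedBus(capacity: Int) {
  private val queue = new LinkedBlockingQueue[AnyRef](capacity)
  val droppedLog: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def post(event: AnyRef): Boolean = {
    val accepted = queue.offer(event) // non-blocking; returns false when full
    if (!accepted) {
      droppedLog += s"Dropping event ${event.getClass.getName}: queue is full"
    }
    accepted
  }
}
```

Because `offer` never blocks, the producer stays responsive, and the recorded class name shows which kind of event was lost.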

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CodingCat/spark SPARK-19358

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16697


commit 24bfa382a43c2fbbf54b24bb8f03766910216490
Author: CodingCat 
Date:   2016-03-07T14:37:37Z

improve the doc for "spark.memory.offHeap.size"

commit 2209e345df4636f8fa881b3ad45084b75f9fe3eb
Author: CodingCat 
Date:   2016-03-07T19:00:16Z

fix

commit f2d9db1b725c6dcadee5ee1a7e43d5fbb601f367
Author: CodingCat 
Date:   2016-12-28T03:52:41Z

Merge branch 'master' of https://github.com/apache/spark

commit dd409b4c7a5afe07cfe6b36691bed42049a5c7b2
Author: CodingCat 
Date:   2017-01-13T04:10:40Z

Merge branch 'master' of https://github.com/apache/spark

commit f026adebb787a09dc4cc90f87dc94590ab1816f7
Author: CodingCat 
Date:   2017-01-25T03:53:56Z

Merge branch 'master' of https://github.com/apache/spark

commit b4bc5af16e53365451c51ca0c9a92ab6915a0987
Author: CodingCat 
Date:   2017-01-25T03:58:52Z

logging event





[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71955/testReport)**
 for PR 16677 at commit 
[`7f89c30`](https://github.com/apache/spark/commit/7f89c305f8ddd595fd752f7a8c238d23ec796895).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class FakePartitioning(orgPartition: Partitioning, numPartitions: 
Int) extends Partitioning `
  * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode with CodegenSupport `
  * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode `



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71959/
Test PASSed.



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #71959 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71959/testReport)**
 for PR 15505 at commit 
[`9380153`](https://github.com/apache/spark/commit/938015308194272478fb7c27fd1a942755f9da2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...

2017-01-24 Thread xwu0226

Github user xwu0226 commented on the issue:

https://github.com/apache/spark/pull/16685
  
@ilganeli Thanks for replying to my comments! Please correct me if I am 
wrong. My understanding is that you assume the target table does not have or 
maintain any unique constraints, and that the target table is mostly created 
and maintained solely by the Spark application, right? 

If that is the assumption, I do believe that simple INSERT and UPDATE may 
perform better than UPSERT. But if the target table has a unique constraint 
to start with, the comparison between INSERT/UPDATE and UPSERT/MERGE may be, 
as you said, a close horse race, since in either case an index lookup and 
validation are required, and UPSERT/MERGE may have a bit more `if/else` logic 
depending on the implementation in the database system. A benchmark of the 
two approaches can tell. 
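
For illustration only, the two write paths being compared can be sketched 
against an in-memory SQLite database. This is not the code in the PR or any 
Spark API: the table name `target`, its columns, and the helper functions are 
hypothetical, and the `ON CONFLICT ... DO UPDATE` form assumes SQLite 3.24 or 
newer. 

```python
import sqlite3


def update_then_insert(conn, key, value):
    """Separate UPDATE and INSERT: try to update, insert if no row matched."""
    cur = conn.execute("UPDATE target SET value = ? WHERE id = ?", (value, key))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO target (id, value) VALUES (?, ?)", (key, value))


def upsert(conn, key, value):
    """Single-statement UPSERT; relies on the unique constraint on id."""
    conn.execute(
        "INSERT INTO target (id, value) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
        (key, value),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical target table; the PRIMARY KEY is the unique constraint.
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
    update_then_insert(conn, 1, "a")  # no row yet, so this inserts
    update_then_insert(conn, 1, "b")  # row exists, so this updates
    upsert(conn, 2, "c")              # no conflict, so this inserts
    upsert(conn, 2, "d")              # conflict on id, so this updates
    rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
    conn.close()
    return rows
```

Both paths end with the same rows; which one is faster on a real database 
with a unique index is exactly what a benchmark would have to show. 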



[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-24 Thread zhengruifeng

Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/16654
  
@srowen I agree that a metric should be independent of the details of the 
algorithm. AUC is independent of the algorithm and depends only on the 
dataset: in spark-ml, scikit-learn, or any other package, the input dataset 
contains `label,decision values(or probabilities)`, and if and only if there 
are two labels in the dataset, AUC can be computed, no matter which 
classifier is used. 
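
As a minimal sketch of that point, AUC can be computed from nothing but 
`(label, score)` pairs using the rank-sum (Mann-Whitney) formulation. The 
function name is hypothetical and this is not Spark's evaluator; it assumes 
the larger of the two label values is the positive class. 

```python
def auc_from_scores(pairs):
    """Compute AUC from (label, score) pairs; no classifier is involved."""
    pairs = list(pairs)
    labels = {label for label, _ in pairs}
    if len(labels) != 2:
        raise ValueError("AUC needs exactly two distinct labels")
    # Assumption: the larger label value is the positive class.
    positive = max(labels)

    # Rank the scores in ascending order, giving tied scores their average rank.
    ordered = sorted(pairs, key=lambda p: p[1])
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for label, _ in ordered if label == positive)
    n_neg = len(ordered) - n_pos
    rank_sum_pos = sum(
        rank for rank, (label, _) in zip(ranks, ordered) if label == positive
    )
    # Mann-Whitney U statistic for the positive class, normalized to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

For example, `auc_from_scores([(0, 0.1), (0, 0.4), (1, 0.35), (1, 0.8)])` 
gives 0.75, and a dataset with a single label raises `ValueError`. 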

I also agree that some general metrics should be abstracted in Evaluator. 

My only disagreement is about whether to treat WSSSE as a general metric:
There have been some attempts to add K-Medoids to Spark. Although those PRs 
were not accepted, there are still some third-party sources implementing 
K-Medoids on Spark.
More realistically, Spark is used together with other ML packages in many 
cases: suppose another package is used to generate the model locally, and 
the result is then evaluated in Spark.




[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71952/
Test PASSed.



[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16605
  
**[Test build #71952 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71952/testReport)**
 for PR 16605 at commit 
[`bd1773b`](https://github.com/apache/spark/commit/bd1773b1946287b00a5cd4cdc1c775a69f835098).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16691
  
Working on the test failure.



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread titicaca

Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Sorry for the late reply. I figured out that the tests failed because a 
vector containing only NAs has type logical, so we cannot cast the type in 
that case. I have updated the code and added some tests for that. Thank you 
for the advice. 



[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16691
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16691
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71956/
Test FAILed.



[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16691
  
**[Test build #71956 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71956/testReport)**
 for PR 16691 at commit 
[`0c24291`](https://github.com/apache/spark/commit/0c24291b2738d2c71b59decc60b9e33524b8f84d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16696
  
**[Test build #71964 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71964/testReport)**
 for PR 16696 at commit 
[`b88fac5`](https://github.com/apache/spark/commit/b88fac58331b5fbbae83ac5cb8ba37d1bbb76b4c).



[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71962/
Test PASSed.



[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16695
  
Merged build finished. Test PASSed.


