[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13505
  
**[Test build #60019 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60019/consoleFull)**
 for PR 13505 at commit 
[`5504b6c`](https://github.com/apache/spark/commit/5504b6c2dd3ac7959b2cb7e139a54208368a9a45).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.
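
The PR title above refers to replacing an N^2 loop in `BindReferences`. As a rough sketch of that general idea only (not the PR's actual code; `Attr`, `bindAllQuadratic`, and `bindAllLinear` are made-up names), binding each reference by scanning the input attributes is quadratic overall, while building an exprId-to-ordinal map once makes each lookup constant time:

```scala
object BindSketch {
  final case class Attr(exprId: Long, name: String)

  // O(N^2): for each reference, linearly scan the input attributes.
  def bindAllQuadratic(refs: Seq[Attr], input: Seq[Attr]): Seq[Int] =
    refs.map(r => input.indexWhere(_.exprId == r.exprId))

  // O(N): build an exprId -> ordinal index once, then each lookup is constant time.
  def bindAllLinear(refs: Seq[Attr], input: Seq[Attr]): Seq[Int] = {
    val ordinalByExprId = input.map(_.exprId).zipWithIndex.toMap
    refs.map(r => ordinalByExprId.getOrElse(r.exprId, -1))
  }
}
```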





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13505
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60019/
Test FAILed.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13505
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13372: [SPARK-15585][SQL] Fix NULL handling along with a spark-...

2016-06-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13372
  
I think the best way is probably to document and ask users to use 
`'\u'` explicitly. @maropu does that work?





[GitHub] spark issue #13476: [SPARK-15684][SparkR]Not mask startsWith and endsWith in...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13476
  
**[Test build #60027 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60027/consoleFull)**
 for PR 13476 at commit 
[`ce14d78`](https://github.com/apache/spark/commit/ce14d78a7b7451f17e521159a27189aa0452a7ff).





[GitHub] spark pull request #13476: [SPARK-15684][SparkR]Not mask startsWith and ends...

2016-06-05 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13476#discussion_r65842260
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1137,6 +1137,13 @@ test_that("string operators", {
   expect_equal(count(where(df, like(df$name, "A%"))), 1)
   expect_equal(count(where(df, startsWith(df$name, "A"))), 1)
   expect_equal(first(select(df, substr(df$name, 1, 2)))[[1]], "Mi")
+  if (as.numeric(R.version$major) >= 3 && as.numeric(R.version$minor) >= 3) {
--- End diff --

I agree with you that it should work for Column anyway. These are all new tests added by me. I will move the column tests out.





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13508
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60023/
Test PASSed.





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13508
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13508
  
**[Test build #60023 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60023/consoleFull)**
 for PR 13508 at commit 
[`4c85aea`](https://github.com/apache/spark/commit/4c85aeab4fade5284a18975ee8a9b9973b1ea779).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13372: [SPARK-15585][SQL] Fix NULL handling along with a spark-...

2016-06-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13372
  
Actually sorry -- I thought about this more and unfortunately we can't do it this way. The main problem is that we would break the existing option use case, e.g. the following code:
```python
df.option("sep", "|").csv("...")
```

In this case, the default sep would still be chosen. I'm going to revert 
this patch, and then think about a workaround instead.
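
As a toy sketch of the precedence problem described above (simplified, assumed code; `userOptions` and `formatDefaults` are illustrative names, not Spark's internals): if format-level defaults are merged on top of the user-supplied options, the user's `sep` is silently overridden, which is exactly the "default sep would still be chosen" behaviour.

```scala
object OptionPrecedenceSketch extends App {
  val userOptions    = Map("sep" -> "|")   // what the user passed via .option("sep", "|")
  val formatDefaults = Map("sep" -> ",")   // a hypothetical format-level default

  // If defaults are merged last, they win and the user's separator is silently lost:
  val broken  = userOptions ++ formatDefaults   // Map("sep" -> ",")
  // Merging defaults first and user options last preserves the user's choice:
  val correct = formatDefaults ++ userOptions   // Map("sep" -> "|")

  assert(broken("sep") == ",")    // the "default sep would still be chosen" case
  assert(correct("sep") == "|")
}
```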






[GitHub] spark pull request #13372: [SPARK-15585][SQL] Fix NULL handling along with a...

2016-06-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13372





[GitHub] spark issue #13372: [SPARK-15585][SQL] Fix NULL handling along with a spark-...

2016-06-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13372
  
Thanks - I'm going to merge this into master/2.0 and make some fixes myself.






[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13486
  
**[Test build #60024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60024/consoleFull)**
 for PR 13486 at commit 
[`8c68b79`](https://github.com/apache/spark/commit/8c68b79fc9c56b258a29e6c64e1c6de5eeed8644).





[GitHub] spark issue #13403: [SPARK-15660][CORE] RDD and Dataset should show the cons...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13403
  
**[Test build #60026 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60026/consoleFull)**
 for PR 13403 at commit 
[`ae5fb7a`](https://github.com/apache/spark/commit/ae5fb7ab466015d4968518a31d7bc6bf36cdf33e).





[GitHub] spark issue #13436: [SPARK-15696][SQL] Improve `crosstab` to have a consiste...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13436
  
**[Test build #60025 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60025/consoleFull)**
 for PR 13436 at commit 
[`cc94b37`](https://github.com/apache/spark/commit/cc94b3762d8f47bd1b24382272569952eacf70fc).





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13508
  
**[Test build #60023 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60023/consoleFull)**
 for PR 13508 at commit 
[`4c85aea`](https://github.com/apache/spark/commit/4c85aeab4fade5284a18975ee8a9b9973b1ea779).





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13508
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60022/
Test FAILed.





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13508
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13508
  
**[Test build #60022 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60022/consoleFull)**
 for PR 13508 at commit 
[`d0d2e4c`](https://github.com/apache/spark/commit/d0d2e4c464aefe7d18d6ece736e0227c68d842bb).





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13508
  
**[Test build #60022 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60022/consoleFull)**
 for PR 13508 at commit 
[`d0d2e4c`](https://github.com/apache/spark/commit/d0d2e4c464aefe7d18d6ece736e0227c68d842bb).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13518: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60021/consoleFull)**
 for PR 13518 at commit 
[`4737361`](https://github.com/apache/spark/commit/4737361489fd680405b291ec498ab91374685ffe).





[GitHub] spark issue #13518: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-06-05 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13518
  
Jenkins retest this please





[GitHub] spark pull request #13518: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-05 Thread lw-lin
GitHub user lw-lin reopened a pull request:

https://github.com/apache/spark/pull/13518

[SPARK-15472][SQL] Add support for writing in `csv`, `json`, `text` formats 
in Structured Streaming

## What changes were proposed in this pull request?

This patch adds support for writing in `csv`, `json`, `text` formats in 
Structured Streaming:

**1. at a high level, this patch forms the following hierarchy** (`text` as an example):
```

  ↑
 TextOutputWriterBase
 ↗  ↖
BatchTextOutputWriter   StreamingTextOutputWriter
```
```

↗  ↖
BatchTextOutputWriterFactory   StreamingOutputWriterFactory
  ↑
  StreamingTextOutputWriterFactory
```
The `StreamingTextOutputWriter` and other 'streaming' output writers would 
write data **without** using an `OutputCommitter`. This was the same approach 
taken by [SPARK-14716](https://github.com/apache/spark/pull/12409).

**2. to support compression, this patch attaches an extension to the path 
assigned by `FileStreamSink`**, which is slightly different from 
[SPARK-14716](https://github.com/apache/spark/pull/12409). For example, if we 
are writing out using the `gzip` compression and `FileStreamSink` assigns path 
`${uuid}` to a text writer, then in the end the file written out will be 
`${uuid}.txt.gz` -- so that when we read the file back, we'll correctly 
interpret it as `gzip` compressed.
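
A minimal sketch of the extension scheme described above, under assumed names (`pathWithExtension` is illustrative and not the PR's actual helper):

```scala
object PathExtensionSketch {
  // Append format and codec suffixes to the base path assigned by the sink,
  // e.g. pathWithExtension("part-<uuid>", ".txt", Some("gzip")) == "part-<uuid>.txt.gz".
  def pathWithExtension(basePath: String, formatExt: String, codec: Option[String]): String = {
    val codecExt = codec match {
      case Some("gzip") => ".gz"
      case Some(other)  => s".$other"
      case None         => ""
    }
    basePath + formatExt + codecExt
  }
}
```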

## How was this patch tested?

`FileStreamSinkSuite` is expanded much more to cover the added `csv`, 
`json`, `text` formats:

```scala
test(" csv - unpartitioned data - codecs: none/gzip")
test("json - unpartitioned data - codecs: none/gzip")
test("text - unpartitioned data - codecs: none/gzip")

test(" csv - partitioned data - codecs: none/gzip")
test("json - partitioned data - codecs: none/gzip")
test("text - partitioned data - codecs: none/gzip")

test(" csv - unpartitioned writing and batch reading - codecs: none/gzip")
test("json - unpartitioned writing and batch reading - codecs: none/gzip")
test("text - unpartitioned writing and batch reading - codecs: none/gzip")

test(" csv - partitioned writing and batch reading - codecs: none/gzip")
test("json - partitioned writing and batch reading - codecs: none/gzip")
test("text - partitioned writing and batch reading - codecs: none/gzip")
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lw-lin/spark add-csv-json-text-for-ss

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13518


commit 97034f9aeb092b10e1606e60a8e6b4878ebd54cf
Author: Liwei Lin 
Date:   2016-06-05T09:03:04Z

Add csv, json, text

commit 2035b597b44aa519d8da3b155036446f88b3050e
Author: Liwei Lin 
Date:   2016-06-05T09:03:15Z

Fix parquet extension

commit 4737361489fd680405b291ec498ab91374685ffe
Author: Liwei Lin 
Date:   2016-06-05T11:52:14Z

Fix style







[GitHub] spark issue #13461: [SPARK-15721][ML] Make DefaultParamsReadable, DefaultPar...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13461
  
**[Test build #3067 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3067/consoleFull)**
 for PR 13461 at commit 
[`1b1bc93`](https://github.com/apache/spark/commit/1b1bc93f0d606d3a517a49b397957d99c35c4b99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread NarineK
Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/12836
  
Locally, run-tests.sh runs successfully, but it fails on Jenkins ...





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12836
  
Merged build finished. Test FAILed.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12836
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60020/
Test FAILed.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12836
  
**[Test build #60020 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60020/consoleFull)**
 for PR 12836 at commit 
[`e4fa8e6`](https://github.com/apache/spark/commit/e4fa8e66896be19430ae4cfabef2669b5ecc4dd7).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13446: [SPARK-15704] [SQL] add a test case in DatasetAggregator...

2016-06-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13446
  
Sorry I was interrupted by something and forgot about it...
thanks, merging to master and 2.0!





[GitHub] spark pull request #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveTh...

2016-06-05 Thread xiaowangyu
Github user xiaowangyu closed the pull request at:

https://github.com/apache/spark/pull/9113





[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...

2016-06-05 Thread xiaowangyu
Github user xiaowangyu commented on the issue:

https://github.com/apache/spark/pull/9113
  
Thanks! I'll close it.





[GitHub] spark pull request #13516: [MLLIB][DOC] Edit logistic regression docs to pro...

2016-06-05 Thread goodsoldiersvejk
Github user goodsoldiersvejk closed the pull request at:

https://github.com/apache/spark/pull/13516





[GitHub] spark issue #13519: [SPARK-15771] [ML] [Examples] Use 'accuracy' rather than...

2016-06-05 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/13519
  
LGTM





[GitHub] spark issue #13515: [MINOR] Fix Typos 'an -> a'

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13515
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13515: [MINOR] Fix Typos 'an -> a'

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13515
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60016/
Test PASSed.





[GitHub] spark issue #13515: [MINOR] Fix Typos 'an -> a'

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13515
  
**[Test build #60016 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60016/consoleFull)**
 for PR 13515 at commit 
[`6de11a6`](https://github.com/apache/spark/commit/6de11a63e1f2a42ffaef9c4e24f1f448087f5b8f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r65835938
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -180,6 +180,9 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 :param path: string represents path to the JSON dataset,
  or RDD of Strings storing JSON objects.
 :param schema: an optional :class:`StructType` for the input 
schema.
+:param samplingRatio: sets the ratio for sampling and reading the 
input data to infer
--- End diff --

Ah, I see. It does not affect the actual I/O; it just drops some records and then tries to infer the schema. I will remove the change.

BTW, I have found another option, [`mergeSchema`](https://github.com/apache/spark/blob/431542765785304edb76a19885fbc5f9b8ae7d64/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L148-L152), in the Parquet data source, which I guess should be located in `ParquetOptions`. Could this maybe be done here together?





[GitHub] spark issue #13461: [SPARK-15721][ML] Make DefaultParamsReadable, DefaultPar...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13461
  
**[Test build #3067 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3067/consoleFull)**
 for PR 13461 at commit 
[`1b1bc93`](https://github.com/apache/spark/commit/1b1bc93f0d606d3a517a49b397957d99c35c4b99).





[GitHub] spark issue #13285: [Spark-15129][R][DOC]R API changes in ML

2016-06-05 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/13285
  
@GayathriMurali I think what is there for ```include_example``` is OK. 
Please see my other inline comments.





[GitHub] spark pull request #13285: [Spark-15129][R][DOC]R API changes in ML

2016-06-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13285#discussion_r65835075
  
--- Diff: docs/sparkr.md ---
@@ -285,71 +285,28 @@ head(teenagers)
 
 # Machine Learning
 
-SparkR allows the fitting of generalized linear models over DataFrames 
using the [glm()](api/R/glm.html) function. Under the hood, SparkR uses MLlib 
to train a model of the specified family. Currently the gaussian and binomial 
families are supported. We support a subset of the available R formula 
operators for model fitting, including '~', '.', ':', '+', and '-'.
+SparkR supports the following Machine Learning algorithms.
 
-The [summary()](api/R/summary.html) function gives the summary of a model 
produced by [glm()](api/R/glm.html).
+* Generalized Linear Regression Model [spark.glm()](api/R/glm.html)
+* Naive Bayes [spark.naiveBayes()](api/R/naiveBayes.html)
+* KMeans [spark.kmeans()](api/R/kmeans.html)
+* AFT Survival Regression [spark.survreg()](api/R/survreg.html)
 
-* For gaussian GLM model, it returns a list with 'devianceResiduals' and 
'coefficients' components. The 'devianceResiduals' gives the min/max deviance 
residuals of the estimation; the 'coefficients' gives the estimated 
coefficients and their estimated standard errors, t values and p-values. (It 
only available when model fitted by normal solver.)
-* For binomial GLM model, it returns a list with 'coefficients' component 
which gives the estimated coefficients.
+Generalized Linear Regression can be used to train a model from a 
specified family. Currently the Gaussian, Binomial, Poisson and Gamma families 
are supported. We support a subset of the available R formula operators for 
model fitting, including '~', '.', ':', '+', and '-'.
 
-The examples below show the use of building gaussian GLM model and 
binomial GLM model using SparkR.
+The [summary()](api/R/summary.html) function gives the summary of a model 
produced by different algorithms listed above.
+This summary is same as the result of summary() function in R.
 
-## Gaussian GLM model
+## Model persistence
 
-
-{% highlight r %}
-# Create the DataFrame
-df <- createDataFrame(sqlContext, iris)
-
-# Fit a gaussian GLM model over the dataset.
-model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = 
"gaussian")
-
-# Model summary are returned in a similar format to R's native glm().
-summary(model)
-##$devianceResiduals
-## Min   Max 
-## -1.307112 1.412532
-##
-##$coefficients
-##   Estimate  Std. Error t value  Pr(>|t|)
-##(Intercept)2.251393  0.3697543  6.08889  9.568102e-09
-##Sepal_Width0.8035609 0.106339   7.556598 4.187317e-12
-##Species_versicolor 1.458743  0.1121079  13.01195 0   
-##Species_virginica  1.946817  0.100015   19.46525 0   
-
-# Make predictions based on the model.
-predictions <- predict(model, newData = df)
-head(select(predictions, "Sepal_Length", "prediction"))
-##  Sepal_Length prediction
-##1  5.1   5.063856
-##2  4.9   4.662076
-##3  4.7   4.822788
-##4  4.6   4.742432
-##5  5.0   5.144212
-##6  5.4   5.385281
-{% endhighlight %}
-
+* write.ml allows users to save a fitted model in a given input path
+* read.ml allows users to read/load the model which was saved using 
write.ml in a given path
 
-## Binomial GLM model
+Model persistence is supported for all Machine Learning algorithms for all 
families.
 
-
-{% highlight r %}
-# Create the DataFrame
-df <- createDataFrame(sqlContext, iris)
-training <- filter(df, df$Species != "setosa")
-
-# Fit a binomial GLM model over the dataset.
-model <- glm(Species ~ Sepal_Length + Sepal_Width, data = training, family 
= "binomial")
-
-# Model coefficients are returned in a similar format to R's native glm().
-summary(model)
-##$coefficients
-##   Estimate
-##(Intercept)  -13.046005
-##Sepal_Length   1.902373
-##Sepal_Width0.404655
-{% endhighlight %}
-
+The examples below show the use of building Gaussian GLM, NaiveBayes, 
kMeans and AFTSurvivalReg models using SparkR
--- End diff --

Furthermore, you should make these names consistent.



[GitHub] spark pull request #13285: [Spark-15129][R][DOC]R API changes in ML

2016-06-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13285#discussion_r65835028
  
--- Diff: docs/sparkr.md ---
@@ -285,71 +285,28 @@ head(teenagers)
 
 # Machine Learning
 
-SparkR allows the fitting of generalized linear models over DataFrames 
using the [glm()](api/R/glm.html) function. Under the hood, SparkR uses MLlib 
to train a model of the specified family. Currently the gaussian and binomial 
families are supported. We support a subset of the available R formula 
operators for model fitting, including '~', '.', ':', '+', and '-'.
+SparkR supports the following Machine Learning algorithms.
 
-The [summary()](api/R/summary.html) function gives the summary of a model 
produced by [glm()](api/R/glm.html).
+* Generalized Linear Regression Model [spark.glm()](api/R/glm.html)
+* Naive Bayes [spark.naiveBayes()](api/R/naiveBayes.html)
+* KMeans [spark.kmeans()](api/R/kmeans.html)
+* AFT Survival Regression [spark.survreg()](api/R/survreg.html)
 
-* For gaussian GLM model, it returns a list with 'devianceResiduals' and 
'coefficients' components. The 'devianceResiduals' gives the min/max deviance 
residuals of the estimation; the 'coefficients' gives the estimated 
coefficients and their estimated standard errors, t values and p-values. (It 
only available when model fitted by normal solver.)
-* For binomial GLM model, it returns a list with 'coefficients' component 
which gives the estimated coefficients.
+Generalized Linear Regression can be used to train a model from a 
specified family. Currently the Gaussian, Binomial, Poisson and Gamma families 
are supported. We support a subset of the available R formula operators for 
model fitting, including '~', '.', ':', '+', and '-'.
 
-The examples below show the use of building gaussian GLM model and 
binomial GLM model using SparkR.
+The [summary()](api/R/summary.html) function gives the summary of a model 
produced by different algorithms listed above.
+This summary is same as the result of summary() function in R.
 
-## Gaussian GLM model
+## Model persistence
 
-
-{% highlight r %}
-# Create the DataFrame
-df <- createDataFrame(sqlContext, iris)
-
-# Fit a gaussian GLM model over the dataset.
-model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = 
"gaussian")
-
-# Model summary are returned in a similar format to R's native glm().
-summary(model)
-##$devianceResiduals
-## Min   Max 
-## -1.307112 1.412532
-##
-##$coefficients
-##   Estimate  Std. Error t value  Pr(>|t|)
-##(Intercept)2.251393  0.3697543  6.08889  9.568102e-09
-##Sepal_Width0.8035609 0.106339   7.556598 4.187317e-12
-##Species_versicolor 1.458743  0.1121079  13.01195 0   
-##Species_virginica  1.946817  0.100015   19.46525 0   
-
-# Make predictions based on the model.
-predictions <- predict(model, newData = df)
-head(select(predictions, "Sepal_Length", "prediction"))
-##  Sepal_Length prediction
-##1  5.1   5.063856
-##2  4.9   4.662076
-##3  4.7   4.822788
-##4  4.6   4.742432
-##5  5.0   5.144212
-##6  5.4   5.385281
-{% endhighlight %}
-
+* write.ml allows users to save a fitted model in a given input path
+* read.ml allows users to read/load the model which was saved using 
write.ml in a given path
 
-## Binomial GLM model
+Model persistence is supported for all Machine Learning algorithms for all 
families.
 
-
-{% highlight r %}
-# Create the DataFrame
-df <- createDataFrame(sqlContext, iris)
-training <- filter(df, df$Species != "setosa")
-
-# Fit a binomial GLM model over the dataset.
-model <- glm(Species ~ Sepal_Length + Sepal_Width, data = training, family 
= "binomial")
-
-# Model coefficients are returned in a similar format to R's native glm().
-summary(model)
-##$coefficients
-##   Estimate
-##(Intercept)  -13.046005
-##Sepal_Length   1.902373
-##Sepal_Width0.404655
-{% endhighlight %}
-
+The examples below show the use of building Gaussian GLM, NaiveBayes, 
kMeans and AFTSurvivalReg models using SparkR
--- End diff --

The examples include glm with the gaussian family and glm with the binomial family.



[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r65834648
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -180,6 +180,9 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 :param path: string represents path to the JSON dataset,
  or RDD of Strings storing JSON objects.
 :param schema: an optional :class:`StructType` for the input 
schema.
+:param samplingRatio: sets the ratio for sampling and reading the 
input data to infer
--- End diff --

It was actually intentional that samplingRatio was undocumented: regardless of the value, Spark still needs to read all the data, so it might as well be 1 all the time.
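
As a toy sketch of that point (simplified, assumed code; not Spark's implementation): a sampling ratio only shrinks what the inference step looks at, while every record still has to be read from the source.

```scala
import scala.util.Random

object SamplingSketch {
  // Stand-in for schema inference: classify each sampled record as "int" or "string".
  def inferTypes(records: Seq[String], samplingRatio: Double): Set[String] = {
    // All of `records` has already been read at this point, whatever the ratio is;
    // the ratio only decides how many records the inference pass examines.
    val sampled = records.filter(_ => Random.nextDouble() < samplingRatio)
    sampled.map(r => if (r.nonEmpty && r.forall(_.isDigit)) "int" else "string").toSet
  }
}
```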






[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...

2016-06-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/9113
  
@viper-kun no - as I said, "I don't think anybody has thought a lot about 
it yet."





[GitHub] spark issue #13446: [SPARK-15704] [SQL] add a test case in DatasetAggregator...

2016-06-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13446
  
@cloud-fan next time please leave a message on the PR saying it was merged and which branches it was merged into.






[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12836
  
**[Test build #60020 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60020/consoleFull)**
 for PR 12836 at commit 
[`e4fa8e6`](https://github.com/apache/spark/commit/e4fa8e66896be19430ae4cfabef2669b5ecc4dd7).





[GitHub] spark pull request #13446: [SPARK-15704] [SQL] add a test case in DatasetAgg...

2016-06-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13446





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread NarineK
Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/12836
  
@shivaram, I didn't change the code, but merged with master, because prior to this the build was failing since some pyspark tests didn't pass.

After today's merge, when I run the gapply test cases from RStudio everything passes, but if I run them with ./run-tests.sh from the command line, it fails on arrange ...

I'm changing the test cases so that I call order after collecting the dataframe ...





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13373
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13373
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60018/
Test FAILed.





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13373
  
**[Test build #60018 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60018/consoleFull)**
 for PR 13373 at commit 
[`8b9b07d`](https://github.com/apache/spark/commit/8b9b07d8ced030563c2485fa3ac271cb69aa4ed0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13505
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13505
  
**[Test build #60015 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60015/consoleFull)**
 for PR 13505 at commit 
[`5504b6c`](https://github.com/apache/spark/commit/5504b6c2dd3ac7959b2cb7e139a54208368a9a45).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13505
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60015/
Test FAILed.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13505
  
**[Test build #60019 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60019/consoleFull)**
 for PR 13505 at commit 
[`5504b6c`](https://github.com/apache/spark/commit/5504b6c2dd3ac7959b2cb7e139a54208368a9a45).





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/13505
  
Jenkins, retest this please.





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13373
  
**[Test build #60018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60018/consoleFull)**
 for PR 13373 at commit 
[`8b9b07d`](https://github.com/apache/spark/commit/8b9b07d8ced030563c2485fa3ac271cb69aa4ed0).





[GitHub] spark issue #13147: [SPARK-6320][SQL] Move planLater method into GenericStra...

2016-06-05 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/13147
  
@marmbrus Do you have any other thoughts on this?
If so, please let me know; in the meantime, why don't we merge the minimal
version, the same as the one in `branch-2.0`, into `master` for now?
I think keeping an API difference between `master` and `branch-2.0` for a
long time is not desirable.





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13373
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13373
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60017/
Test FAILed.





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13373
  
**[Test build #60017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60017/consoleFull)**
 for PR 13373 at commit 
[`dd6bdf0`](https://github.com/apache/spark/commit/dd6bdf05b1156b6e1471ceadc817c3f8a54270b2).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class PushFilterIntoRelation(conf: SQLConf) extends 
Rule[LogicalPlan] with PredicateHelper `
  * `case class PushProjectIntoRelation(conf: SQLConf) extends 
Rule[LogicalPlan] `





[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13373
  
**[Test build #60017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60017/consoleFull)**
 for PR 13373 at commit 
[`dd6bdf0`](https://github.com/apache/spark/commit/dd6bdf05b1156b6e1471ceadc817c3f8a54270b2).





[GitHub] spark issue #13515: [MINOR] Fix Typos 'an -> a'

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13515
  
**[Test build #60016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60016/consoleFull)**
 for PR 13515 at commit 
[`6de11a6`](https://github.com/apache/spark/commit/6de11a63e1f2a42ffaef9c4e24f1f448087f5b8f).





[GitHub] spark pull request #13515: [MINOR] Fix Typos 'an -> a'

2016-06-05 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13515#discussion_r6583
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala
 ---
@@ -37,7 +37,7 @@ import org.apache.spark.sql.hive.test.{TestHive, 
TestHiveQueryExecution}
  * Allows the creations of tests that execute the same query against both 
hive
  * and catalyst, comparing the results.
  *
- * The "golden" results from Hive are cached in an retrieved both from the 
classpath and
+ * The "golden" results from Hive are cached in a retrieved both from the 
classpath and
--- End diff --

Thanks, I will fix this





[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...

2016-06-05 Thread viper-kun
Github user viper-kun commented on the issue:

https://github.com/apache/spark/pull/9113
  
@rxin  Is there any design for the replacement?





[GitHub] spark pull request #9162: [SPARK-10655][SQL] Adding additional data type map...

2016-06-05 Thread sureshthalamati
Github user sureshthalamati commented on a diff in the pull request:

https://github.com/apache/spark/pull/9162#discussion_r65828274
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
 ---
@@ -47,19 +49,20 @@ class DB2IntegrationSuite extends 
DockerJDBCIntegrationSuite {
 conn.prepareStatement("INSERT INTO tbl VALUES 
(17,'dave')").executeUpdate()
 
 conn.prepareStatement("CREATE TABLE numbers ( small SMALLINT, med 
INTEGER, big BIGINT, "
-  + "deci DECIMAL(31,20), flt FLOAT, dbl DOUBLE)").executeUpdate()
+  + "deci DECIMAL(31,20), flt FLOAT, dbl DOUBLE, real REAL, decflt 
DECFLOAT)").executeUpdate()
--- End diff --

Thanks for reviewing, @gatorsmile. Added test cases for those two 
variations of the DECFLOAT type.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/12836
  
The error was
```
1. Error: gapply() on a DataFrame 
--
java.lang.OutOfMemoryError: Java heap space
```

@NarineK Do you think there was any code change that could have caused this, 
or is this just flakiness in Jenkins?





[GitHub] spark pull request #13491: [SPARK-15748][SQL] Replace inefficient foldLeft()...

2016-06-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13491





[GitHub] spark issue #13491: [SPARK-15748][SQL] Replace inefficient foldLeft() call w...

2016-06-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13491
  
Merging in master/2.0.






[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12836
  
Merged build finished. Test FAILed.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12836
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60013/
Test FAILed.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12836
  
**[Test build #60013 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60013/consoleFull)**
 for PR 12836 at commit 
[`249568e`](https://github.com/apache/spark/commit/249568e2d244b3b81d53dfce797f8c021602749f).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13481: [SPARK-15738][PYSPARK][ML] Adding Pyspark ml RFormula __...

2016-06-05 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/13481
  
That looks pretty good to me too, thanks @MLnick! I'll put that in soon.





[GitHub] spark pull request #13401: [SPARK-15657][SQL] RowEncoder should validate the...

2016-06-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13401





[GitHub] spark issue #13401: [SPARK-15657][SQL] RowEncoder should validate the data t...

2016-06-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13401
  
LGTM, merging to master and branch-2.0.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13505
  
**[Test build #60015 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60015/consoleFull)**
 for PR 13505 at commit 
[`5504b6c`](https://github.com/apache/spark/commit/5504b6c2dd3ac7959b2cb7e139a54208368a9a45).





[GitHub] spark issue #13488: [MINOR][R][DOC] Fix R documentation generation instructi...

2016-06-05 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13488
  
Thanks

On Sun, Jun 5, 2016, 13:05 asfgit  wrote:

> Closed #13488  via 8a91105
> 







[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13505
  
retest this please





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13505
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60011/
Test FAILed.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13505
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13505: [SPARK-15764][SQL] Replace N^2 loop in BindReferences

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13505
  
**[Test build #60011 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60011/consoleFull)**
 for PR 13505 at commit 
[`5504b6c`](https://github.com/apache/spark/commit/5504b6c2dd3ac7959b2cb7e139a54208368a9a45).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13520
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60014/
Test PASSed.





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13520
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-05 Thread koertkuipers
Github user koertkuipers commented on the issue:

https://github.com/apache/spark/pull/13512
  
@cloud-fan I am running into some trouble updating my branch to the latest 
master: I get errors in tests due to Analyzer.validateTopLevelTupleFields.

The issue seems to be that in KeyValueGroupedDataset[K, T] the Aggregators 
are supposed to operate on T, but at this point the logicalPlan already has K 
appended to T, because AppendColumns(func, inputPlan) is applied to the plan 
before it is passed into KeyValueGroupedDataset. As a result, 
validateTopLevelTupleFields also sees the key column among the inputs and 
believes the deserializer for T is missing a field.

Any suggestions on how to get around this?
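
For readers following along, a minimal, hedged sketch of the pattern being 
described (assuming the Spark 2.0 Dataset API; this is an illustration, not 
code from this PR): `groupByKey` wraps the child plan in `AppendColumns`, so 
the key column is already present before any `Aggregator` over `T` is bound.

```
import org.apache.spark.sql.SparkSession

object GroupByKeySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("groupByKey sketch").getOrCreate()
    import spark.implicits._

    // Dataset of T = (String, Long)
    val ds = Seq(("a", 1L), ("a", 3L), ("b", 3L)).toDS()

    // groupByKey yields a KeyValueGroupedDataset[String, (String, Long)].
    // Internally its logical plan is AppendColumns(keyFunc, child), so the
    // generated key column already sits next to the columns of T here.
    val grouped = ds.groupByKey(_._1)
    println(grouped.count().collect().mkString(", "))

    spark.stop()
  }
}
```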





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13520
  
**[Test build #60014 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60014/consoleFull)**
 for PR 13520 at commit 
[`0a5d82f`](https://github.com/apache/spark/commit/0a5d82fc8c1b3e0910231060090181e143e5215a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-05 Thread koertkuipers
Github user koertkuipers commented on the issue:

https://github.com/apache/spark/pull/13512
  
@cloud-fan from the (added) unit tests:
```
val df2 = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDF("i", "j")
checkAnswer(df2.groupBy("i").agg(ComplexResultAgg.toColumn),
  Row("a", Row(2, 4)) :: Row("b", Row(1, 3)) :: Nil)
```
This shows how the underlying type is Row (with a schema consisting of 
Strings and Ints), which gets converted to the Aggregator's input type, 
(String, Long), so this involves both a conversion and an upcast.

and:
```
val df3 = Seq(("a", "x", 1), ("a", "y", 3), ("b", "x", 3)).toDF("i", "j", 
"k")
checkAnswer(df3.groupBy("i").agg(ComplexResultAgg("i", "k")),
  Row("a", Row(2, 4)) :: Row("b", Row(1, 3)) :: Nil)
```
This is similar to the previous example, but I also select the columns I 
want the Aggregator to operate on (namely columns "i" and "k").
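
To make that input type concrete, here is a hedged sketch of an `Aggregator` 
whose input type is `(String, Long)` (assuming the Spark 2.0 `Aggregator` API 
with explicit encoders; `SumOfSecond` is a hypothetical example, not the PR's 
`ComplexResultAgg`):

```
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Sums the second field of a (String, Long) input; buffer and output are both Long.
object SumOfSecond extends Aggregator[(String, Long), Long, Long] {
  def zero: Long = 0L
  def reduce(buffer: Long, input: (String, Long)): Long = buffer + input._2
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(reduction: Long): Long = reduction
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}
```

With an input encoder for `(String, Long)` available, the untyped (String, 
Int) rows of `df2` above can be converted and upcast to this input type 
before `reduce` is called, which is what the quoted tests exercise.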





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13520
  
**[Test build #60014 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60014/consoleFull)**
 for PR 13520 at commit 
[`0a5d82f`](https://github.com/apache/spark/commit/0a5d82fc8c1b3e0910231060090181e143e5215a).





[GitHub] spark pull request #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local...

2016-06-05 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13520

[SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples 
if possible

## What changes were proposed in this pull request?

Instead of using a local variable `sc` as in the following example, this PR 
uses `spark.sparkContext` directly. This makes the examples more concise and 
also removes a misleading step, i.e., appearing to create a SparkContext 
from a SparkSession.
```
-println("Creating SparkContext")
-val sc = spark.sparkContext
-
 println("Writing local file to DFS")
 val dfsFilename = dfsDirPath + "/dfs_read_write_test"
-val fileRDD = sc.parallelize(fileContents)
+val fileRDD = spark.sparkContext.parallelize(fileContents)
```

This will change 12 files (+30 lines, -52 lines).
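
For illustration, a hedged, self-contained sketch (a hypothetical example 
app, not one of the 12 files touched here) of the resulting pattern:

```
import org.apache.spark.sql.SparkSession

object DfsReadWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DFS read/write sketch").getOrCreate()

    val fileContents = Seq("line 1", "line 2", "line 3")

    // Before: val sc = spark.sparkContext; val fileRDD = sc.parallelize(fileContents)
    // After: call sparkContext directly on the session, with no local `sc` variable.
    val fileRDD = spark.sparkContext.parallelize(fileContents)
    println(s"Wrote ${fileRDD.count()} lines")

    spark.stop()
  }
}
```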

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15773

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13520


commit 0a5d82fc8c1b3e0910231060090181e143e5215a
Author: Dongjoon Hyun 
Date:   2016-06-05T21:42:42Z

[SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples 
if possible







[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825524
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -529,7 +529,28 @@ object SQLConf {
   .internal()
   .doc("How long in milliseconds a file is guaranteed to be visible 
for all readers.")
   .timeConf(TimeUnit.MILLISECONDS)
-  .createWithDefault(60 * 1000L) // 10 minutes
+  .createWithDefault(60 * 10 * 1000L) // 10 minutes
+
+  val FILE_SOURCE_LOG_DELETION = 
SQLConfigBuilder("spark.sql.streaming.fileSource.log.deletion")
+.internal()
+.doc("Whether to delete the expired log files in file stream source.")
+.booleanConf
+.createWithDefault(true)
+
+  val FILE_SOURCE_LOG_COMPACT_INTERVAL =
+SQLConfigBuilder("spark.sql.streaming.fileSource.log.compactInterval")
+  .internal()
+  .doc("Number of log files after which all the previous files " +
+"are compacted into the next log file.")
+  .intConf
+  .createWithDefault(10)
+
+  val FILE_SOURCE_LOG_CLEANUP_DELAY =
+SQLConfigBuilder("spark.sql.streaming.fileSource.log.cleanupDelay")
+  .internal()
+  .doc("How long in milliseconds a file is guaranteed to be visible 
for all readers.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .createWithDefault(60 * 10 * 1000L) // 10 minutes
--- End diff --

A nitpick, but I think it'd be easier to "decode" as `10 * 60 * 1000L`.





[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825474
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ---
@@ -129,3 +131,86 @@ class FileStreamSource(
 
   override def toString: String = s"FileStreamSource[$qualifiedBasePath]"
 }
+
+class FileStreamSourceLog(sparkSession: SparkSession, path: String)
+  extends HDFSMetadataLog[Seq[String]](sparkSession, path) {
+
+  // Configurations about metadata compaction
+  private val compactInterval = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL)
+  require(compactInterval > 0,
+s"Please set ${SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL.key} (was 
$compactInterval) to a " +
+  s"positive value.")
+
+  private val fileCleanupDelayMs = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_CLEANUP_DELAY)
+
+  private val isDeletingExpiredLog = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_DELETION)
+
+  private var compactBatchId: Long = -1L
+
+  private def isCompactionBatch(batchId: Long, compactInterval: Long): 
Boolean = {
+batchId % compactInterval == 0
+  }
+
+  override def add(batchId: Long, metadata: Seq[String]): Boolean = {
+if (isCompactionBatch(batchId, compactInterval)) {
+  compactMetadataLog(batchId - 1)
+}
+
+super.add(batchId, metadata)
+  }
+
+  private def compactMetadataLog(batchId: Long): Unit = {
+// read out compact metadata and merge with new metadata.
+val batches = super.get(Some(compactBatchId), Some(batchId))
+val totalMetadata = batches.flatMap(_._2)
+if (totalMetadata.isEmpty) {
+  return
+}
+
+// Remove old compact metadata file and rewrite.
+val renamedPath = new Path(path, 
s".${batchId.toString}-${UUID.randomUUID.toString}.tmp")
+fileManager.rename(batchIdToPath(batchId), renamedPath)
+
+var isSuccess = false
+try {
+  isSuccess = super.add(batchId, totalMetadata)
+} catch {
+  case NonFatal(e) => isSuccess = false
--- End diff --

Why are you setting `isSuccess` to `false` since it's `false` already?





[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825480
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ---
@@ -129,3 +131,86 @@ class FileStreamSource(
 
   override def toString: String = s"FileStreamSource[$qualifiedBasePath]"
 }
+
+class FileStreamSourceLog(sparkSession: SparkSession, path: String)
+  extends HDFSMetadataLog[Seq[String]](sparkSession, path) {
+
+  // Configurations about metadata compaction
+  private val compactInterval = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL)
+  require(compactInterval > 0,
+s"Please set ${SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL.key} (was 
$compactInterval) to a " +
+  s"positive value.")
+
+  private val fileCleanupDelayMs = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_CLEANUP_DELAY)
+
+  private val isDeletingExpiredLog = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_DELETION)
+
+  private var compactBatchId: Long = -1L
+
+  private def isCompactionBatch(batchId: Long, compactInterval: Long): 
Boolean = {
+batchId % compactInterval == 0
+  }
+
+  override def add(batchId: Long, metadata: Seq[String]): Boolean = {
+if (isCompactionBatch(batchId, compactInterval)) {
+  compactMetadataLog(batchId - 1)
+}
+
+super.add(batchId, metadata)
+  }
+
+  private def compactMetadataLog(batchId: Long): Unit = {
+// read out compact metadata and merge with new metadata.
+val batches = super.get(Some(compactBatchId), Some(batchId))
+val totalMetadata = batches.flatMap(_._2)
+if (totalMetadata.isEmpty) {
+  return
+}
+
+// Remove old compact metadata file and rewrite.
+val renamedPath = new Path(path, 
s".${batchId.toString}-${UUID.randomUUID.toString}.tmp")
+fileManager.rename(batchIdToPath(batchId), renamedPath)
+
+var isSuccess = false
+try {
+  isSuccess = super.add(batchId, totalMetadata)
+} catch {
+  case NonFatal(e) => isSuccess = false
+} finally {
+  if (!isSuccess) {
+// Rollback to the previous status if compaction is failed.
--- End diff --

s/status/state ?





[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825440
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ---
@@ -129,3 +131,86 @@ class FileStreamSource(
 
   override def toString: String = s"FileStreamSource[$qualifiedBasePath]"
 }
+
+class FileStreamSourceLog(sparkSession: SparkSession, path: String)
+  extends HDFSMetadataLog[Seq[String]](sparkSession, path) {
+
+  // Configurations about metadata compaction
+  private val compactInterval = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL)
+  require(compactInterval > 0,
+s"Please set ${SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL.key} (was 
$compactInterval) to a " +
--- End diff --

I'd move `(was $compactInterval)` to the end of the message.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/12836
  
Yeah, I think we can still get this into 2.0 -- are there any other comments, 
@sun-rui? 
Also pinging @davies / @rxin again so a SQL reviewer can take a look at this.





[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...

2016-06-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/9113
  
We currently have the Hive thrift server inlined into the code base, but 
the long-term replacement is still to be determined. I don't think anybody 
has thought much about it yet.

Do you mind closing this pull request for now? Thanks.






[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12836
  
**[Test build #60013 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60013/consoleFull)**
 for PR 12836 at commit 
[`249568e`](https://github.com/apache/spark/commit/249568e2d244b3b81d53dfce797f8c021602749f).





[GitHub] spark pull request #13444: [SPARK-15530][SQL] Set #parallelism for file list...

2016-06-05 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13444#discussion_r65823862
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala
 ---
@@ -409,13 +409,24 @@ private[sql] object HadoopFsRelation extends Logging {
   def listLeafFilesInParallel(
   paths: Seq[Path],
   hadoopConf: Configuration,
-  sparkContext: SparkContext): mutable.LinkedHashSet[FileStatus] = {
+  sparkSession: SparkSession): mutable.LinkedHashSet[FileStatus] = {
+assert(paths.size >= 
sparkSession.sessionState.conf.parallelPartitionDiscoveryThreshold)
 logInfo(s"Listing leaf files and directories in parallel under: 
${paths.mkString(", ")}")
 
+val sparkContext = sparkSession.sparkContext
+val sqlConf = sparkSession.sessionState.conf
 val serializableConfiguration = new 
SerializableConfiguration(hadoopConf)
 val serializedPaths = paths.map(_.toString)
 
-val fakeStatuses = 
sparkContext.parallelize(serializedPaths).mapPartitions { paths =>
+// Set the number of parallelism to prevent following file listing 
from generating many tasks
+// in case of large #defaultParallelism.
+val numParallelism = Math.min(
+  paths.size / Math.max(sqlConf.parallelPartitionDiscoveryThreshold, 
1) + 1,
+  sparkContext.defaultParallelism)
--- End diff --

I am not sure this `Math.min` can help if we have a small cluster (say, 
defaultParallelism is 4). I think in general we need to create more tasks 
than `defaultParallelism` to help with load balancing.
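
As a hedged illustration of that point (not the PR's code; `listingParallelism` 
is a hypothetical helper), allowing a small multiple of `defaultParallelism` 
keeps small clusters from being starved of tasks:

```
// Hypothetical helper, only to illustrate the load-balancing trade-off discussed above.
def listingParallelism(numPaths: Int, discoveryThreshold: Int, defaultParallelism: Int): Int = {
  // One task per `discoveryThreshold` paths...
  val byThreshold = numPaths / math.max(discoveryThreshold, 1) + 1
  // ...capped at a few tasks per core rather than at exactly defaultParallelism,
  // so a 4-core cluster still gets enough tasks to balance the listing work.
  math.max(1, math.min(byThreshold, defaultParallelism * 4))
}

// Example: 10000 paths, threshold 32, defaultParallelism 4 -> min(313, 16) = 16 tasks.
println(listingParallelism(10000, 32, 4))
```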





[GitHub] spark pull request #13444: [SPARK-15530][SQL] Set #parallelism for file list...

2016-06-05 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13444#discussion_r65823818
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala
 ---
@@ -75,7 +75,7 @@ class ListingFileCatalog(
 
   protected def listLeafFiles(paths: Seq[Path]): 
mutable.LinkedHashSet[FileStatus] = {
 if (paths.length >= 
sparkSession.sessionState.conf.parallelPartitionDiscoveryThreshold) {
--- End diff --

Oh, this flag is used here.





[GitHub] spark issue #13295: [SPARK-15294][SPARKR][MINOR] Add pivot functionality to ...

2016-06-05 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/13295
  
@mhnatiuk It looks like the tests are failing with the error message I've 
pasted below. From reading the code, I think this is related to the `if` check 
we have in the pivot implementation: it should be `length(values) != 
length(unique(values))` instead of `==` as we have right now, shouldn't it?

```
1. Error: pivot GroupedData column 
-
error in evaluating the argument 'x' in selecting a method for function 
'collect': 
  error in evaluating the argument 'x' in selecting a method for function 
'summarize': 
  Values in list are not unique
Calls: pivot -> pivot
```





[GitHub] spark pull request #13488: [MINOR][R][DOC] Fix R documentation generation in...

2016-06-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13488





[GitHub] spark issue #13488: [MINOR][R][DOC] Fix R documentation generation instructi...

2016-06-05 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/13488
  
Merging this to master and branch-2.0





[GitHub] spark issue #13508: [SPARK-15766][SparkR]:R should export is.nan

2016-06-05 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/13508
  
Don't we need to do the same for `isnan`?
https://github.com/apache/spark/blob/d642b273544bb77ef7f584326aa2d214649ac61b/R/pkg/R/functions.R#L651
It is different from `isNaN`.




