[GitHub] spark pull request #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use...
Github user mgummelt commented on a diff in the pull request: https://github.com/apache/spark/pull/15654#discussion_r85799941

--- Diff: mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala ---
@@ -51,7 +52,7 @@ private[mesos] class MesosClusterDispatcher(
     extends Logging {

   private val publicAddress = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(args.host)
-  private val recoveryMode = conf.get("spark.deploy.recoveryMode", "NONE").toUpperCase()
+  private val recoveryMode = conf.get(RECOVERY_MODE).getOrElse("NONE").toUpperCase()
--- End diff --

Shouldn't the "NONE" default be added to the config builder?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
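The reviewer's point can be illustrated with a minimal, self-contained sketch. This is hypothetical code (the names `ConfigEntry`, `SparkConfLike`, and the `get` signature are stand-ins, not Spark's actual ConfigBuilder API): when the config entry carries its own default, call sites no longer need a `.getOrElse("NONE")`.

```scala
// Hypothetical sketch of a config entry that owns its default value,
// so lookups fall back to it automatically.
case class ConfigEntry(key: String, default: String)

class SparkConfLike(settings: Map[String, String]) {
  // Return the configured value, or the entry's built-in default.
  def get(entry: ConfigEntry): String =
    settings.getOrElse(entry.key, entry.default)
}

val RECOVERY_MODE = ConfigEntry("spark.deploy.recoveryMode", "NONE")

// With no explicit setting, the builder-supplied default applies:
val conf = new SparkConfLike(Map.empty)
println(conf.get(RECOVERY_MODE).toUpperCase) // prints "NONE"
```

With this shape, the `"NONE"` literal lives in one place (the entry definition) instead of being repeated at every call site.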
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/15654 Thanks! One small fix then LGTM
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15628 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67819/ Test FAILed.
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15628

**[Test build #67819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67819/consoleFull)** for PR 15628 at commit [`b5277c9`](https://github.com/apache/spark/commit/b5277c9bffef72b207f3d79f11b7bb01661de9e1).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15628 Merged build finished. Test FAILed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #67814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67814/consoleFull)** for PR 15659 at commit [`e668af6`](https://github.com/apache/spark/commit/e668af63e9ee26a7d54f3a8092f32498ab287d67).
[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67811/ Test PASSed.
[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15693 Merged build finished. Test PASSed.
[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15693

**[Test build #67811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67811/consoleFull)** for PR 15693 at commit [`0b660e0`](https://github.com/apache/spark/commit/0b660e02480bb3d193daf4acc997c1c0ca040930).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch cache i...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/13713 We need to get @srowen or one of the other committers to merge it.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15651 lgtm pending jenkins
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/1 ok to test
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626 **[Test build #67822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67822/consoleFull)** for PR 15626 at commit [`8a2028b`](https://github.com/apache/spark/commit/8a2028b34f5b9830a37161249ebcb306c65d49e1).
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15654 **[Test build #67823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67823/consoleFull)** for PR 15654 at commit [`bb74f52`](https://github.com/apache/spark/commit/bb74f521cc47bcf4ae099665b5c0aff2531155d2).
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15651

**[Test build #67816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67816/consoleFull)** for PR 15651 at commit [`ffe4318`](https://github.com/apache/spark/commit/ffe43185f06f8b1aeffbf0c88fbc587aa8894bde).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15627 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67826/ Test PASSed.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15627 Merged build finished. Test PASSed.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15627

**[Test build #67826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67826/consoleFull)** for PR 15627 at commit [`f797481`](https://github.com/apache/spark/commit/f7974812a5cc76cf98bba1c70e739bbc770d7dde).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67816/ Test PASSed.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15651

**[Test build #67817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67817/consoleFull)** for PR 15651 at commit [`e0e38bf`](https://github.com/apache/spark/commit/e0e38bfc64760918295a368c56a8ffda40a889e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67817/ Test PASSed.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Merged build finished. Test PASSed.
[GitHub] spark pull request #15698: [SPARK-18182] Expose ReplayListenerBus.read() ove...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/15698

[SPARK-18182] Expose ReplayListenerBus.read() overload which takes string iterator

The `ReplayListenerBus.read()` method is used when implementing a custom `ApplicationHistoryProvider`. The current interface only exposes a `read()` method which takes an `InputStream` and performs stream-to-lines conversion itself, but it would also be useful to expose an overloaded method which accepts an iterator of strings, thereby enabling events to be provided from non-`InputStream` sources.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark replay-listener-bus-interface

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15698.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15698

commit b777ee5bb25f38086bfe2126be26de8f1e14a14d
Author: Josh Rosen
Date: 2016-10-31T19:39:43Z

    Expose ReplayListenerBus.read() overload which accepts an iterator of lines.
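The overload structure the PR describes can be sketched as follows. This is a simplified, hypothetical stand-in (the class name `ReplayBusLike` and the replay logic are invented for illustration, not the actual `ReplayListenerBus` implementation): the `InputStream` variant only performs stream-to-lines conversion and delegates, so callers with non-stream sources can feed lines directly.

```scala
import java.io.InputStream
import scala.io.Source

// Simplified stand-in for the overload pattern: the InputStream overload
// converts the stream to lines and delegates to the iterator overload.
class ReplayBusLike {
  def read(stream: InputStream): Seq[String] =
    read(Source.fromInputStream(stream).getLines())

  def read(lines: Iterator[String]): Seq[String] =
    lines.map(line => s"replayed: $line").toSeq // stand-in for event replay
}

val bus = new ReplayBusLike
// Lines from any source, no InputStream needed:
println(bus.read(Iterator("event1", "event2")))
```

The design benefit is that a custom history provider holding events in, say, a database or an in-memory buffer can replay them without wrapping everything in a synthetic `InputStream`.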
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Merged build finished. Test PASSed.
[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15698 **[Test build #67828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67828/consoleFull)** for PR 15698 at commit [`b777ee5`](https://github.com/apache/spark/commit/b777ee5bb25f38086bfe2126be26de8f1e14a14d).
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/11105 ping @squito / @rxin if either of you have some post-Spark Summit EU bandwidth to review this it would be awesome :)
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659

**[Test build #67814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67814/consoleFull)** for PR 15659 at commit [`e668af6`](https://github.com/apache/spark/commit/e668af63e9ee26a7d54f3a8092f32498ab287d67).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67827/ Test PASSed.
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15697 Merged build finished. Test PASSed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67814/ Test PASSed.
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15697

**[Test build #67827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67827/consoleFull)** for PR 15697 at commit [`a292ae8`](https://github.com/apache/spark/commit/a292ae8fc5bcb32e32c21d5e3ec7f093a4be13cd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Merged build finished. Test PASSed.
[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Are we planning to incorporate the Parquet 1.9 libraries into Spark 2.1? If so, then this PR should be unnecessary. Hopefully.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/15654 cc @srowen for merge into master
[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/14803#discussion_r85819725

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -608,6 +614,81 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
   // === other tests

+  test("read new files in partitioned table without globbing, should read partition data") {
+    withTempDirs { case (dir, tmp) =>
+      val partitionFooSubDir = new File(dir, "partition=foo")
+      val partitionBarSubDir = new File(dir, "partition=bar")
+
+      val schema = new StructType().add("value", StringType).add("partition", StringType)
+      val fileStream = createFileStream("json", s"${dir.getCanonicalPath}", Some(schema))
+      val filtered = fileStream.filter($"value" contains "keep")
+      testStream(filtered)(
+        // Create new partition=foo sub dir and write to it
+        AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo")),
+
+        // Append to same partition=foo sub dir
+        AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+        // Create new partition sub dir and write to it
+        AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+        // Append to same partition=bar sub dir
+        AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar"))
+      )
+    }
+  }
+
+  test("when schema inference is turned on, should read partition data") {
+    def createFile(content: String, src: File, tmp: File): Unit = {
+      val tempFile = Utils.tempFileWith(new File(tmp, "text"))
+      val finalFile = new File(src, tempFile.getName)
+      src.mkdirs()
+      require(stringToFile(tempFile, content).renameTo(finalFile))
+    }
+
+    withSQLConf(SQLConf.STREAMING_SCHEMA_INFERENCE.key -> "true") {
+      withTempDirs { case (dir, tmp) =>
+        val partitionFooSubDir = new File(dir, "partition=foo")
+        val partitionBarSubDir = new File(dir, "partition=bar")
+
+        // Create file in partition, so we can infer the schema.
+        createFile("{'value': 'drop0'}", partitionFooSubDir, tmp)
+
+        val fileStream = createFileStream("json", s"${dir.getCanonicalPath}")
+        val filtered = fileStream.filter($"value" contains "keep")
+        testStream(filtered)(
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo")),
+
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+          // Create new partition sub dir and write to it
+          AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+          // Append to same partition=bar sub dir
+          AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar")),
+
+          // Delete the two partition dirs
+          DeleteFile(partitionFooSubDir),
--- End diff --

@viirya why do we need to delete dirs in this test? It's flaky since the source may be listing files.
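The tests in the diff above rely on Hive-style partition directories (names like `partition=foo`), whose key/value is surfaced as an extra column in the query results. A minimal sketch of that directory-name parsing (the helper `partitionValue` is hypothetical, for illustration only, not Spark's actual partition-discovery code):

```scala
// Hypothetical helper illustrating Hive-style partition-directory parsing:
// a dir name "partition=foo" carries column name "partition" and value "foo".
def partitionValue(dirName: String): Option[(String, String)] =
  dirName.split("=", 2) match {
    case Array(k, v) => Some(k -> v)
    case _           => None // not a partition directory
  }

println(partitionValue("partition=foo")) // prints "Some((partition,foo))"
```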
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626

**[Test build #67822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67822/consoleFull)** for PR 15626 at commit [`8a2028b`](https://github.com/apache/spark/commit/8a2028b34f5b9830a37161249ebcb306c65d49e1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67822/ Test FAILed.
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Merged build finished. Test FAILed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696

**[Test build #67818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67818/consoleFull)** for PR 15696 at commit [`2a61351`](https://github.com/apache/spark/commit/2a613516dd469bca5ed4d7b0f17f678e9e70e267).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class TaskCommitMessage(obj: Any) extends Serializable`
  * `abstract class FileCommitProtocol`
  * `class MapReduceFileCommitterProtocol(committer: OutputCommitter) extends FileCommitProtocol`
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test PASSed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67818/ Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/13881 Sorry for the long delay! Whenever you get a chance to update this, it'd be nice to log this info via the Instrumentation class, rather than logInfo.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #67824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67824/consoleFull)** for PR 15659 at commit [`3bf961e`](https://github.com/apache/spark/commit/3bf961efbffc9b03eba7053348ac6ef1634d0ade).
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #67825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67825/consoleFull)** for PR 11105 at commit [`8c560ca`](https://github.com/apache/spark/commit/8c560ca6dd8c28f86630ae42bb50739a8614bec3).
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15654

**[Test build #67823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67823/consoleFull)** for PR 15654 at commit [`bb74f52`](https://github.com/apache/spark/commit/bb74f521cc47bcf4ae099665b5c0aff2531155d2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15667#discussion_r85808139

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

```
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
         table.catalogTable.identifier.table,
         partitionSpec)

+      var doOverwrite = overwrite
```
--- End diff --

nit: `doHiveOverwrite`?
[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r85808919

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```
@@ -418,21 +424,41 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     }

     if (DDLUtils.isDatasourceTable(withStatsProps)) {
-      val oldDef = client.getTable(db, withStatsProps.identifier.table)
-      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
-      // to retain the spark specific format if it is. Also add old data source properties to table
-      // properties, to retain the data source table format.
-      val oldDataSourceProps = oldDef.properties.filter(_._1.startsWith(SPARK_SQL_PREFIX))
+      val oldTableDef = client.getTable(db, withStatsProps.identifier.table)
+
+      // Always update the location property w.r.t. the new table location.
+      val locationProp = tableDefinition.storage.locationUri.map { location =>
+        TABLE_LOCATION -> location
+      }
+      // Only update the `locationUri` field if the location is really changed, because this table
+      // may be not Hive-compatible and can not set the `locationUri` field. We should respect the
+      // old `locationUri` even it's None.
+      val oldLocation = getLocationFromRawTable(oldTableDef)
+      val locationUri = if (oldLocation == tableDefinition.storage.locationUri) {
```
--- End diff --

```Scala
test("alter table - rename") {
  val tabName = "tab1"
  val newTabName = "tab2"
  withTable(tabName, newTabName) {
    spark.range(10).write.saveAsTable(tabName)
    val catalog = spark.sessionState.catalog
    sql(s"ALTER TABLE $tabName RENAME TO $newTabName")
    sql(s"DESC FORMATTED $newTabName").show(100, false)
    assert(!catalog.tableExists(TableIdentifier(tabName)))
    assert(catalog.tableExists(TableIdentifier(newTabName)))
  }
}
```
You can try to run the above test case in `DDLSuite.scala` and `HiveDDLSuite.scala`. The locations are different: one is using the new table name; the other is using the old one.
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15695

**[Test build #67815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67815/consoleFull)** for PR 15695 at commit [`0d4461a`](https://github.com/apache/spark/commit/0d4461a9e444008a35cc04c607447dc3d4677b7f).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15627 **[Test build #67826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67826/consoleFull)** for PR 15627 at commit [`f797481`](https://github.com/apache/spark/commit/f7974812a5cc76cf98bba1c70e739bbc770d7dde).
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15697 **[Test build #67827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67827/consoleFull)** for PR 15697 at commit [`a292ae8`](https://github.com/apache/spark/commit/a292ae8fc5bcb32e32c21d5e3ec7f093a4be13cd).
[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing e...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15637 How about the complex types `Array`, `Map` and `Struct`? It sounds like the test cases do not cover these types. Thanks!
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15651

**[Test build #67813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67813/consoleFull)** for PR 15651 at commit [`5405a94`](https://github.com/apache/spark/commit/5405a949f3589b99d92dc5fa3f2fc264692910d1).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15637: [SPARK-18000] [SQL] Aggregation function for comp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15637#discussion_r85814522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MapAggregate.scala ---

```
@@ -0,0 +1,332 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.nio.ByteBuffer
+
+import scala.collection.immutable.TreeMap
+import scala.collection.mutable
+
+import com.google.common.primitives.{Doubles, Ints, Longs}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription}
+import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
+import org.apache.spark.sql.types.{DataType, _}
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * The MapAggregate function for a column returns:
+ * 1. null if no non-null value exists.
+ * 2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+ *    distinct non-null values is less than or equal to the specified maximum number of bins.
+ * 3. an empty map otherwise.
+ *
+ * @param child child expression that can produce column value with `child.eval(inputRow)`
+ * @param numBinsExpression The maximum number of bins.
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, numBins) - Returns 1. null if no non-null value exists.
+      2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+      distinct non-null values is less than or equal to the specified maximum number of bins.
+      3. an empty map otherwise.
+    """)
+case class MapAggregate(
+    child: Expression,
+    numBinsExpression: Expression,
+    override val mutableAggBufferOffset: Int,
+    override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[MapDigest] {
+
+  def this(child: Expression, numBinsExpression: Expression) = {
+    this(child, numBinsExpression, 0, 0)
+  }
+
+  // Mark as lazy so that numBinsExpression is not evaluated during tree transformation.
+  private lazy val numBins: Int = numBinsExpression.eval().asInstanceOf[Int]
+
+  override def inputTypes: Seq[AbstractDataType] = {
+    Seq(TypeCollection(NumericType, TimestampType, DateType, StringType), IntegerType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val defaultCheck = super.checkInputDataTypes()
+    if (defaultCheck.isFailure) {
+      defaultCheck
+    } else if (!numBinsExpression.foldable) {
+      TypeCheckFailure("The maximum number of bins provided must be a constant literal")
+    } else if (numBins < 2) {
+      TypeCheckFailure(
+        "The maximum number of bins provided must be a positive integer literal >= 2 " +
+          s"(current value = $numBins)")
+    } else {
+      TypeCheckSuccess
+    }
+  }
+
+  override def update(buffer: MapDigest, input: InternalRow): Unit = {
+    if (buffer.isInvalid) {
+      return
+    }
+    val evaluated = child.eval(input)
+    if (evaluated != null) {
+      buffer.update(child.dataType, evaluated, numBins)
+    }
```
--- End diff --

A general comment about the impl. Here, I think we should avoid `return` if possible. For example, we can re-write it like

```Scala
if (!buffer.isInvalid) {
  val evaluated = child.eval(input)
  if (evaluated != null) {
    buffer.update(child.dataType, evaluated, numBins)
  }
}
```
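The `return`-free rewrite suggested above can be sanity-checked outside Spark with a small, self-contained sketch. `Digest` and `updateBuffer` below are hypothetical stand-ins for the real `MapDigest` buffer and `update` method, not Spark code:

```scala
// Minimal stand-in for the aggregation buffer discussed above (hypothetical).
// Tracks value frequencies and turns itself invalid once the number of
// distinct values exceeds the bin cap, roughly mirroring the histogram spec.
class Digest(var isInvalid: Boolean = false) {
  val counts = scala.collection.mutable.Map.empty[Any, Long]

  def update(value: Any, numBins: Int): Unit = {
    counts(value) = counts.getOrElse(value, 0L) + 1L
    // Too many distinct values: invalidate and drop the partial histogram.
    if (counts.size > numBins) {
      isInvalid = true
      counts.clear()
    }
  }
}

// Guard-clause style with an early `return` rewritten as one nested
// conditional, as the review suggests: same behavior, no `return`.
def updateBuffer(buffer: Digest, value: Any, numBins: Int): Unit = {
  if (!buffer.isInvalid) {
    if (value != null) {
      buffer.update(value, numBins)
    }
  }
}
```

Avoiding `return` here is not only stylistic: in Scala a `return` inside a closure compiles to a thrown `NonLocalReturnControl`, so keeping methods expression-oriented sidesteps that class of surprise.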
[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11105#discussion_r85805218

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---

```
@@ -722,6 +722,7 @@ private[spark] object JsonProtocol {
     val value = Utils.jsonOption(json \ "Value").map { v => accumValueFromJson(name, v) }
     val internal = (json \ "Internal").extractOpt[Boolean].getOrElse(false)
     val countFailedValues = (json \ "Count Failed Values").extractOpt[Boolean].getOrElse(false)
+    val dataProperty = (json \ "DataProperty").extractOpt[Boolean].getOrElse(false)
```
--- End diff --

Done :)
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15695

**[Test build #67812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67812/consoleFull)** for PR 15695 at commit [`e2d2bac`](https://github.com/apache/spark/commit/e2d2bac560a529a2d22d8b1f55874edbeb4da0f1).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15692 You'll need to add the Param itself. (Search for `Params.dummy()` in that file to find examples.)
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695 Merged build finished. Test FAILed.
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67812/ Test FAILed.
[GitHub] spark pull request #15633: [SPARK-18087] [SQL] Optimize insert to not requir...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15633#discussion_r85806652

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---

```
@@ -179,24 +180,30 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
           "Cannot overwrite a path that is also being read from.")
       }

+      def refreshPartitionsCallback(updatedPartitions: Seq[TablePartitionSpec]): Unit = {
+        if (l.catalogTable.isDefined &&
```
--- End diff --

IMO that is a little harder to read, since you have two anonymous function declarations instead of one.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67823/ Test PASSed.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15654 Merged build finished. Test PASSed.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15654 @mgummelt done! 👍
[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15667#discussion_r85807729

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

```
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
         table.catalogTable.identifier.table,
         partitionSpec)

+      var doOverwrite = overwrite
+
+      if (oldPart.isEmpty || !ifNotExists) {
+        // SPARK-18107: Insert overwrite runs much slower than hive-client.
+        // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
+        // version and we may not want to catch up new Hive version every time. We delete the
+        // Hive partition first and then load data file into the Hive partition.
+        if (oldPart.nonEmpty && overwrite) {
+          oldPart.get.storage.locationUri.map { uri =>
+            val partitionPath = new Path(uri)
+            val fs = partitionPath.getFileSystem(hadoopConf)
+            if (fs.exists(partitionPath)) {
+              val pathPermission = fs.getFileStatus(partitionPath).getPermission()
+              if (!fs.delete(partitionPath, true)) {
+                throw new RuntimeException(
+                  "Cannot remove partition directory '" + partitionPath.toString)
+              } else {
+                fs.mkdirs(partitionPath, pathPermission)
```
--- End diff --

Is the mkdir necessary?
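For local experimentation, the delete-then-recreate sequence under discussion can be sketched against plain `java.io.File` instead of Hadoop's `FileSystem`. The names here are illustrative, and the sketch deliberately skips the permission-restoring step (`getPermission`/`mkdirs` with an `FsPermission`) that the real diff performs:

```scala
import java.io.File

// Recursively delete a directory tree, children first, mirroring
// fs.delete(partitionPath, true) in the diff.
def deleteRecursively(f: File): Unit = {
  Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  if (!f.delete()) {
    throw new RuntimeException(s"Cannot remove '$f'")
  }
}

// Delete the partition directory and recreate it empty, mirroring the
// delete + mkdirs sequence under review.
def resetDir(dir: File): Unit = {
  if (dir.exists()) deleteRecursively(dir)
  if (!dir.mkdirs()) {
    throw new RuntimeException(s"Cannot recreate '$dir'")
  }
}
```

Whether the `mkdirs` is strictly necessary depends on whether the subsequent load step expects the partition directory to already exist; recreating it (with the old permissions, in the real code) keeps the directory visible to anything listing the partition path in the meantime.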
[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r85809383

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---

```
@@ -55,6 +56,21 @@ class ParquetFileFormat
   with DataSourceRegister
   with Logging
   with Serializable {
+  // Poor man's "static initializer". Scala doesn't have language support for static initializers,
+  // and it's important that we initialize `ParquetFileFormat.redirectParquetLogsViaSLF4J` before
+  // doing anything with the Parquet libraries. Rather than expect clients to initialize the
+  // `ParquetFileFormat` singleton object at the right time, we put that initialization in the
+  // constructor of this class. This method is idempotent, and essentially a no-op after its first
+  // call.
+  ParquetFileFormat.ensureParquetLogRedirection
+
+  // Java serialization will not call the default constructor. Make sure we call
+  // ParquetFileFormat.ensureParquetLogRedirection in deserialization by implementing this hook
+  // method.
+  private def readObject(in: ObjectInputStream): Unit = {
+    in.defaultReadObject
+    ParquetFileFormat.ensureParquetLogRedirection
```
--- End diff --

You could also call `ensureParquetLogRedirection` from some main class right? e.g. `class Executor`.
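The constructor-plus-`readObject` trick in this diff can be demonstrated standalone. `LogRedirect` and `FileFormatLike` below are illustrative stand-ins, not Spark classes; the point is that Java deserialization bypasses the constructor, so the idempotent initializer must also be invoked from the private `readObject` hook (which the serialization machinery finds by reflection):

```scala
import java.io._

object LogRedirect {
  @volatile private var initialized = false
  var initCount = 0

  // Idempotent "static initializer": safe to call from every constructor
  // and from readObject; only the first call does any work.
  def ensure(): Unit = synchronized {
    if (!initialized) {
      initCount += 1
      initialized = true
    }
  }
}

class FileFormatLike extends Serializable {
  LogRedirect.ensure() // runs on normal construction

  // Java deserialization skips the constructor, so re-run the init here.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    LogRedirect.ensure()
  }
}
```

Calling the initializer from a main class (as suggested in the review) works too; the `readObject` hook just guarantees the init even when an instance first appears on a JVM via deserialization.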
[GitHub] spark pull request #15697: [SparkR][Test]:remove unnecessary suppressWarning...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/15697 [SparkR][Test]: remove unnecessary suppressWarnings

## What changes were proposed in this pull request?

In test_mllib.R, there are two unnecessary suppressWarnings. This PR just removes them.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark rtest

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15697.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15697

commit a292ae8fc5bcb32e32c21d5e3ec7f093a4be13cd
Author: wm...@hotmail.com
Date: 2016-10-31T19:04:57Z

    remove suppressWarnings
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695

Merged build finished. Test FAILed.
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67815/ Test FAILed.
[GitHub] spark pull request #15637: [SPARK-18000] [SQL] Aggregation function for comp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15637#discussion_r85811315

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/MapAggregateQuerySuite.scala ---

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+
+class MapAggregateQuerySuite extends QueryTest with SharedSQLContext {
+
+  private val table = "map_aggregate_test"
+  private val col1 = "col1"
+  private val col2 = "col2"
+  private val schema = StructType(Seq(StructField(col1, StringType), StructField(col2, DoubleType)))
+
+  private def query(numBins: Int): DataFrame = {
+    sql(s"SELECT map_aggregate($col1, $numBins), map_aggregate($col2, $numBins) FROM $table")
+  }
+
+  test("null handling") {
+    withTempView(table) {
+      // Null input
+      val nullRdd: RDD[Row] = spark.sparkContext.parallelize(Seq(Row(null, null)))
+      spark.createDataFrame(nullRdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 2), Row(null, null))
+
+      // Empty input
+      val emptyRdd: RDD[Row] = spark.sparkContext.parallelize(Seq.empty)
+      spark.createDataFrame(emptyRdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 2), Row(null, null))
+
+      // Add some non-null data
+      val rdd: RDD[Row] = spark.sparkContext.parallelize(Seq(Row(null, 3.0D), Row("a", null)))
+      spark.createDataFrame(rdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 2), Row(Map(("a", 1)), Map((3.0D, 1))))
+    }
+  }
+
+  test("returns empty result when ndv exceeds numBins") {
+    withTempView(table) {
+      val rdd: RDD[Row] = spark.sparkContext.parallelize(
+        Seq(Row("a", 4.0D), Row("d", 2.0D), Row("c", 4.0D), Row("b", 1.0D), Row("a", 3.0D),
+          Row("a", 2.0D)), 2)
+      spark.createDataFrame(rdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 4), Row(
+        Map(("a", 3), ("b", 1), ("c", 1), ("d", 1)),
+        Map((1.0D, 1), (2.0D, 2), (3.0D, 1), (4.0D, 2))))
+      // One partial exceeds numBins during update()
+      checkAnswer(query(numBins = 2), Row(Map.empty, Map.empty))
+      // Exceeding numBins during merge()
+      checkAnswer(query(numBins = 3), Row(Map.empty, Map.empty))
+    }
+  }
+
+  test("multiple columns of different types") {
+    def queryMultiColumns(numBins: Int): DataFrame = {
+      sql(
+        s"""
+           |SELECT
+           |  map_aggregate(c1, $numBins),
+           |  map_aggregate(c2, $numBins),
+           |  map_aggregate(c3, $numBins),
+           |  map_aggregate(c4, $numBins),
+           |  map_aggregate(c5, $numBins),
+           |  map_aggregate(c6, $numBins),
+           |  map_aggregate(c7, $numBins),
+           |  map_aggregate(c8, $numBins),
+           |  map_aggregate(c9, $numBins),
+           |  map_aggregate(c10, $numBins)
+           |FROM $table
+         """.stripMargin)
+    }
+
+    val allTypeSchema = StructType(Seq(
+      StructField("c1", ByteType),
+      StructField("c2", ShortType),
+      StructField("c3", IntegerType),
+      StructField("c4", LongType),
+      StructField("c5", FloatType),
+      StructField("c6", DoubleType),
+      StructField("c7", DecimalType(10, 5)),
+      StructField("c8", DateType),
+      StructField("c9", TimestampType),
+      StructField("c10", StringType)))

--- End diff --

Here, it is still missing `BinaryType` and `BooleanType`.
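The reviewer's point above could be addressed by extending the test schema with the two missing types. A hypothetical sketch (not part of the actual PR; the `c11`/`c12` names are made up):

```scala
import org.apache.spark.sql.types._

// Schema from the diff above (c2 through c9 elided here for brevity),
// extended with the two types the review flagged as missing.
val allTypeSchema = StructType(Seq(
  StructField("c1", ByteType),
  // ... c2 through c9 as in the diff ...
  StructField("c10", StringType)))

val extendedSchema = allTypeSchema
  .add(StructField("c11", BinaryType))
  .add(StructField("c12", BooleanType))
```

The corresponding `map_aggregate(c11, ...)` and `map_aggregate(c12, ...)` projections would then need to be added to the multi-column query as well.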
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15695

Looks like FileStreamSourceSuite is broken in 2.0. Looking at it.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651

Merged build finished. Test PASSed.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67813/ Test PASSed.
[GitHub] spark pull request #15637: [SPARK-18000] [SQL] Aggregation function for comp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15637#discussion_r85812423

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MapAggregate.scala ---

@@ -0,0 +1,332 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.nio.ByteBuffer
+
+import scala.collection.immutable.TreeMap
+import scala.collection.mutable
+
+import com.google.common.primitives.{Doubles, Ints, Longs}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription}
+import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
+import org.apache.spark.sql.types.{DataType, _}
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * The MapAggregate function for a column returns:
+ * 1. null if no non-null value exists.
+ * 2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+ *    distinct non-null values is less than or equal to the specified maximum number of bins.
+ * 3. an empty map otherwise.
+ *
+ * @param child child expression that can produce column value with `child.eval(inputRow)`
+ * @param numBinsExpression The maximum number of bins.
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, numBins) - Returns 1. null if no non-null value exists.
+      2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+      distinct non-null values is less than or equal to the specified maximum number of bins.
+      3. an empty map otherwise.

--- End diff --

Describe the general purpose of this function first, and then explain the return values?
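The three-way contract described in the scaladoc above can be sketched as a plain driver-side function. This is a simplified illustration of the semantics only, with a hypothetical `mapAggregate` helper, not the actual `TypedImperativeAggregate` implementation in the PR:

```scala
// Sketch of MapAggregate's contract: None (SQL null) when there is no
// non-null input; a value -> frequency map while the number of distinct
// values stays within numBins; an empty map once it exceeds numBins.
def mapAggregate[T](values: Seq[T], numBins: Int): Option[Map[T, Long]] = {
  val nonNull = values.filter(_ != null)
  if (nonNull.isEmpty) {
    None                                    // case 1: null result
  } else {
    val freq = nonNull.groupBy(identity).mapValues(_.size.toLong).toMap
    if (freq.size <= numBins) Some(freq)    // case 2: (value, frequency) pairs
    else Some(Map.empty[T, Long])           // case 3: ndv exceeds numBins
  }
}
```

For instance, `mapAggregate(Seq("a", "a", "b"), numBins = 2)` yields `Some(Map("a" -> 2, "b" -> 1))`, while `numBins = 1` yields `Some(Map.empty)`, matching the "returns empty result when ndv exceeds numBins" test in the companion suite.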