[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...

2014-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2511


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...

2014-10-01 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2511#issuecomment-57590958
  
Thanks! Merged to master.





[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2357#issuecomment-57590659
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21183/consoleFull) for PR 2357 at commit [`609dd98`](https://github.com/apache/spark/commit/609dd98b7de77397dfc490c1c0a12bb9349830e5).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/789#issuecomment-57590497
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21182/





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-57590495
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21180/





[GitHub] spark pull request: [SPARK-2377] Python API for Streaming

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2538#issuecomment-57590494
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21181/





[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...

2014-10-01 Thread ScrapCodes
GitHub user ScrapCodes reopened a pull request:

https://github.com/apache/spark/pull/2357

[SPARK-3437][BUILD] Support crossbuilding in maven. With new scala-install-plugin.

Since this plugin is not deployed anywhere, anyone trying this patch has to publish it locally by cloning https://github.com/ScrapCodes/scala-install-plugin and then running `mvn install`.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 maven-improvements

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2357.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2357


commit 4d2d30cc9dcbe2b929ed8ed124d6719430ec6fae
Author: Prashant Sharma 
Date:   2014-09-11T08:09:46Z

Supported new scala install plugin. Which can let us cross build for scala.

commit 609dd98b7de77397dfc490c1c0a12bb9349830e5
Author: Prashant Sharma 
Date:   2014-09-11T10:00:10Z

Changed to newly updated with cross build support branch.







[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-01 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-57590353
  
And this plugin, https://github.com/ScrapCodes/scala-install-plugin, takes care of publishing the correct POMs too.





[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-01 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-57590286
  
Hey Patrick, thanks for looking at this. I did not say it is not possible. 
I just said the best (easiest) way I could come up with was to modify the 
maven install plugin. 





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-57589906
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21179/





[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-01 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-57589522
  
@tgravescs This code only applies in mesos mode, so the other modes - yarn and 
standalone - are not affected.

+1 @timothysc,

val fwInfo = FrameworkInfo.newBuilder().setUser(sc.sparkUser).setName(sc.appName).build()





[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2501#issuecomment-57589474
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21178/consoleFull) for PR 2501 at commit [`65ed04a`](https://github.com/apache/spark/commit/65ed04afdc49f96d5f66257cb003f1e8e345095c).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/789#issuecomment-57589292
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21175/





[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/789#issuecomment-57589288
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21175/consoleFull) for PR 789 at commit [`1be3fa5`](https://github.com/apache/spark/commit/1be3fa53c4daf29d5b0153f2ac39e6d221f9bc56).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `  throw new SparkException("Failed to load class to register with Kryo", e)`






[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-10-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1893#issuecomment-57588918
  
@pwendell can you take a look at this when you have a chance





[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2502





[GitHub] spark pull request: JIRA issue: [SPARK-1405] Gibbs sampling based ...

2014-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/476





[GitHub] spark pull request: [WIP] SPARK-2450: Add YARN executor log links ...

2014-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1375





[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2391





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-57588340
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21177/consoleFull) for PR 1486 at commit [`338d4f8`](https://github.com/apache/spark/commit/338d4f8fedd68b64a7fdfaf078afcc2623072501).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2693][SQL] Supported for UDAF Hive Aggr...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2620#issuecomment-57588338
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21176/consoleFull) for PR 2620 at commit [`caf25c6`](https://github.com/apache/spark/commit/caf25c6633751f5418864a484304b17cf7a18b1a).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-57588159
  
Jenkins, retest this please.





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-57588021
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21174/consoleFull) for PR 2388 at commit [`99945ce`](https://github.com/apache/spark/commit/99945ce52e7559728191226fbc21a2a592591ceb).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class TopicModelingKryoRegistrator extends KryoRegistrator `






[GitHub] spark pull request: [SPARK-2693][SQL] Supported for UDAF Hive Aggr...

2014-10-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2620#issuecomment-57588061
  
Jenkins, retest this please.





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-57588026
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21174/





[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/789#issuecomment-57585682
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21175/consoleFull) for PR 789 at commit [`1be3fa5`](https://github.com/apache/spark/commit/1be3fa53c4daf29d5b0153f2ac39e6d221f9bc56).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-01 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/789#issuecomment-57585666
  
Updated patch allows using both at the same time





[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2511#issuecomment-57584098
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21173/consoleFull) for PR 2511 at commit [`9fb973f`](https://github.com/apache/spark/commit/9fb973f39582e03ad06bf99c78f01099d493170a).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2511#issuecomment-57584100
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21173/





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-57584038
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21174/consoleFull) for PR 2388 at commit [`99945ce`](https://github.com/apache/spark/commit/99945ce52e7559728191226fbc21a2a592591ceb).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/2609#discussion_r18322230
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -202,9 +205,20 @@ private[spark] class Worker(
       // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
       val cleanupFuture = concurrent.future {
         logInfo("Cleaning up oldest application directories in " + workDir + " ...")
-        Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
-          .foreach(Utils.deleteRecursively)
+        val appDirs = workDir.listFiles()
+        if (appDirs == null) {
+          throw new IOException("ERROR: Failed to list files in " + appDirs)
+        }
+        appDirs.filter { dir => {
--- End diff --

You do not need the extra bracket after the "dir =>". We use the enclosing 
bracket's scope.
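As a minimal sketch of the two closure styles discussed above (with hypothetical directory names, not Spark's actual worker code), both forms compile and behave identically; the inner brace after `dir =>` is simply redundant because the enclosing braces already delimit the closure body:

```scala
object ClosureStyles {
  // Hypothetical app directory names standing in for workDir.listFiles()
  val appDirs = Seq("app-20141001-0001", "app-20141001-0002", "driver-logs")

  // Redundant inner block: legal, but the extra braces add nothing.
  val kept1 = appDirs.filter { dir => {
    dir.startsWith("app")
  } }

  // Idiomatic form: the enclosing braces already provide the scope.
  val kept2 = appDirs.filter { dir =>
    dir.startsWith("app")
  }
}
```

Both produce the same filtered sequence, which is why the review asks for the shorter form.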





[GitHub] spark pull request: [SPARK-3719][CORE]:"complete/failed stages" is...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2574#discussion_r18321787
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressPage.scala ---
@@ -70,11 +72,11 @@ private[ui] class JobProgressPage(parent: JobProgressTab) extends WebUIPage("")


           Completed Stages:
-          {completedStages.size}
+          {totalCompletedStages}
--- End diff --

I agree; this will help to avoid user confusion if there's a big difference 
between the count and the number of displayed stages.





[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...

2014-10-01 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/2590#issuecomment-57581369
  
Fixed the code as per the comments; please review.





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-10-01 Thread mubarak
Github user mubarak closed the pull request at:

https://github.com/apache/spark/pull/1723





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-10-01 Thread mubarak
Github user mubarak commented on the pull request:

https://github.com/apache/spark/pull/1723#issuecomment-57581072
  
Fixed using https://github.com/apache/spark/pull/2464





[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2511#issuecomment-57580433
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21173/consoleFull) for PR 2511 at commit [`9fb973f`](https://github.com/apache/spark/commit/9fb973f39582e03ad06bf99c78f01099d493170a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-01 Thread zhzhan
Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57580447
  
@yhuai I removed all unnecessary implicits to make it consistent, but have 
to keep wrapperToFileSinkDesc because HiveFileFormatUtils.getHiveRecordWriter 
needs the FileSinkDesc type, and it also helps to track the internal state 
changes of FileSinkDesc.
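The wrapper-plus-implicit-conversion pattern described above can be sketched as follows. This is an illustrative reconstruction: the names mirror the comment (`FileSinkDesc`, `wrapperToFileSinkDesc`), but the fields and logic are hypothetical, not Spark's or Hive's actual classes.

```scala
import scala.language.implicitConversions

// Stand-in for Hive's FileSinkDesc, which the record-writer API requires.
case class FileSinkDesc(dirName: String, compressed: Boolean)

// A mutable wrapper whose internal state (e.g. compression) can change
// after construction; the wrapper lets callers observe those changes.
class FileSinkDescWrapper(var dirName: String) {
  var compressed: Boolean = false
}

object FileSinkImplicits {
  // Implicit conversion so an API that needs a FileSinkDesc can accept
  // the wrapper transparently, snapshotting its current state.
  implicit def wrapperToFileSinkDesc(w: FileSinkDescWrapper): FileSinkDesc =
    FileSinkDesc(w.dirName, w.compressed)
}
```

With the implicit in scope, a `FileSinkDescWrapper` can be passed wherever a `FileSinkDesc` is expected, which is the convenience the comment says justifies keeping this one conversion.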





[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...

2014-10-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/2511#discussion_r18321536
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -166,7 +186,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
         val withFilter = f.map(f => Filter(f, base)).getOrElse(base)
         val withProjection =
           g.map {g =>
-            Aggregate(assignAliases(g), assignAliases(p), withFilter)
+            Aggregate(assignAliasesForGroups(g,p), assignAliases(p), withFilter)
--- End diff --

Yes @marmbrus, it is better if we do not apply assignAliases to grouping 
expressions. Updated the code accordingly. Please review.





[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

2014-10-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2563#discussion_r18321520
  
--- Diff: python/pyspark/sql.py ---
@@ -385,50 +429,32 @@ def _parse_datatype_string(datatype_string):
     >>> check_datatype(complex_maptype)
     True
     """
-    index = datatype_string.find("(")
-    if index == -1:
-        # It is a primitive type.
-        index = len(datatype_string)
-    type_or_field = datatype_string[:index]
-    rest_part = datatype_string[index + 1:len(datatype_string) - 1].strip()
-
-    if type_or_field in _all_primitive_types:
-        return _all_primitive_types[type_or_field]()
-
-    elif type_or_field == "ArrayType":
-        last_comma_index = rest_part.rfind(",")
-        containsNull = True
-        if rest_part[last_comma_index + 1:].strip().lower() == "false":
-            containsNull = False
-        elementType = _parse_datatype_string(
-            rest_part[:last_comma_index].strip())
-        return ArrayType(elementType, containsNull)
-
-    elif type_or_field == "MapType":
-        last_comma_index = rest_part.rfind(",")
-        valueContainsNull = True
-        if rest_part[last_comma_index + 1:].strip().lower() == "false":
-            valueContainsNull = False
-        keyType, valueType = _parse_datatype_list(
-            rest_part[:last_comma_index].strip())
-        return MapType(keyType, valueType, valueContainsNull)
-
-    elif type_or_field == "StructField":
-        first_comma_index = rest_part.find(",")
-        name = rest_part[:first_comma_index].strip()
-        last_comma_index = rest_part.rfind(",")
-        nullable = True
-        if rest_part[last_comma_index + 1:].strip().lower() == "false":
-            nullable = False
-        dataType = _parse_datatype_string(
-            rest_part[first_comma_index + 1:last_comma_index].strip())
-        return StructField(name, dataType, nullable)
-
-    elif type_or_field == "StructType":
-        # rest_part should be in the format like
-        # List(StructField(field1,IntegerType,false)).
-        field_list_string = rest_part[rest_part.find("(") + 1:-1]
-        fields = _parse_datatype_list(field_list_string)
+    return _parse_datatype_json_value(json.loads(json_string))
+
+
+def _parse_datatype_json_value(json_value):
+    if json_value in _all_primitive_types.keys():
--- End diff --

If json_value is a dict such as `{}`, it is not hashable, so you cannot use `in` on it here.

I would prefer to use the same kind of json_value for all types: a dict with a key called `type`, such as:

```
{'type': 'int'}
```

For other types, it could carry additional keys based on the type, such as:

```
{'type': 'array', 'element': {'type': 'int'}, 'null': True}
```

In this way, it will be easier to do the type switch.
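To make the suggestion concrete, here is a minimal, hypothetical sketch of such a type switch (not the PR's implementation; the key names `type`, `element`, and `null` follow the examples above):

```python
import json

# Hypothetical dispatch over the dict-based representation suggested above.
# Every value is a dict with a 'type' key; complex types carry extra keys.
_primitive_types = {"int": int, "string": str, "boolean": bool}

def parse_datatype(json_value):
    kind = json_value["type"]
    if kind in _primitive_types:
        return {"kind": kind}
    if kind == "array":
        return {
            "kind": "array",
            "element": parse_datatype(json_value["element"]),
            "containsNull": json_value.get("null", True),
        }
    raise ValueError("unknown type: %s" % kind)

parsed = parse_datatype(json.loads('{"type": "array", "element": {"type": "int"}, "null": true}'))
print(parsed["kind"], parsed["element"]["kind"])  # array int
```

Since every value is then a dict, the hashability question above disappears: the switch only ever inspects `json_value["type"]`.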





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57579497
  
@buenrostro-oo @tdas We have seen several test failures from 
`NetworkReceiverSuite`. Do you have time to take a look? Thanks!





[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

2014-10-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2563#discussion_r18321352
  
--- Diff: python/pyspark/sql.py ---
@@ -205,6 +234,16 @@ def __str__(self):
         return "ArrayType(%s,%s)" % (self.elementType,
                                      str(self.containsNull).lower())
 
+    simpleString = 'array'
+
+    def jsonValue(self):
+        return {
+            self.simpleString: {
+                'type': self.elementType.jsonValue(),
+                'containsNull': self.containsNull
+            }
+        }
--- End diff --

This looks like JS style; it could fit in fewer lines.
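For illustration, the dict could be built in a couple of lines; this is a sketch with minimal stand-in classes, not pyspark's real ones:

```python
# Sketch: the jsonValue() dict from the diff above, built in fewer lines.
# IntType/ArrayType are hypothetical stand-ins mirroring the attributes used there.
class IntType(object):
    simpleString = 'integer'

    def jsonValue(self):
        return self.simpleString

class ArrayType(object):
    simpleString = 'array'

    def __init__(self, elementType, containsNull=True):
        self.elementType = elementType
        self.containsNull = containsNull

    def jsonValue(self):
        return {self.simpleString: {'type': self.elementType.jsonValue(),
                                    'containsNull': self.containsNull}}

print(ArrayType(IntType()).jsonValue())
```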





[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

2014-10-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2563#discussion_r18321283
  
--- Diff: python/pyspark/sql.py ---
@@ -62,6 +63,12 @@ def __eq__(self, other):
 def __ne__(self, other):
 return not self.__eq__(other)
 
+    def jsonValue(self):
+        return self.simpleString
--- End diff --

you can have a default implementation such as:

    self.__class__.__name__[:-4].lower()
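A hypothetical sketch of that default (assumes every subclass name ends in `Type`, so e.g. `StringType` yields `string`):

```python
# Sketch of the suggested default: derive the simple string from the
# class name by dropping the trailing "Type" (4 characters).
class DataType(object):
    def jsonValue(self):
        return self.__class__.__name__[:-4].lower()

class StringType(DataType):
    pass

class BooleanType(DataType):
    pass

print(StringType().jsonValue())   # string
print(BooleanType().jsonValue())  # boolean
```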





[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-01 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/2576#discussion_r18321063
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/orc/OrcTableOperations.scala ---
@@ -0,0 +1,418 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.sql.orc
+
+import org.apache.spark.sql.execution.{ExistingRdd, LeafNode, UnaryNode, SparkPlan}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.{TaskContext, SerializableWritable}
+import org.apache.spark.rdd.RDD
+
+import _root_.parquet.hadoop.util.ContextUtil
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, FileOutputCommitter}
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+import org.apache.hadoop.io.{Writable, NullWritable}
+import org.apache.hadoop.mapreduce.{TaskID, TaskAttemptContext, Job}
+
+import org.apache.hadoop.hive.ql.io.orc.{OrcFile, OrcSerde, OrcInputFormat, OrcOutputFormat}
+import org.apache.hadoop.hive.serde2.objectinspector._
+import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal, HiveVarchar}
+
+import java.io.IOException
+import java.text.SimpleDateFormat
+import java.util.{Locale, Date}
+import scala.collection.JavaConversions._
+import org.apache.hadoop.mapred.{SparkHadoopMapRedUtil, Reporter, JobConf}
+
+/**
+ * orc table scan operator. Imports the file that backs the given
+ * [[org.apache.spark.sql.orc.OrcRelation]] as a ``RDD[Row]``.
+ */
+case class OrcTableScan(
+    output: Seq[Attribute],
+    relation: OrcRelation,
+    columnPruningPred: Option[Expression])
+  extends LeafNode {
+
+  @transient
+  lazy val serde: OrcSerde = initSerde
+
+  @transient
+  lazy val getFieldValue: Seq[Product => Any] = {
+    val inspector = serde.getObjectInspector.asInstanceOf[StructObjectInspector]
+    output.map(attr => {
+      val ref = inspector.getStructFieldRef(attr.name.toLowerCase(Locale.ENGLISH))
+      row: Product => {
+        val fieldData = row.productElement(1)
+        val data = inspector.getStructFieldData(fieldData, ref)
+        unwrapData(data, ref.getFieldObjectInspector)
+      }
+    })
+  }
+
+  private def initSerde(): OrcSerde = {
+    val serde = new OrcSerde
+    serde.initialize(null, relation.prop)
+    serde
+  }
+
+  def unwrapData(data: Any, oi: ObjectInspector): Any = oi match {
+    case pi: PrimitiveObjectInspector => pi.getPrimitiveJavaObject(data)
+    case li: ListObjectInspector =>
+      Option(li.getList(data))
+        .map(_.map(unwrapData(_, li.getListElementObjectInspector)).toSeq)
+        .orNull
+    case mi: MapObjectInspector =>
+      Option(mi.getMap(data)).map(
+        _.map {
+          case (k, v) =>
+            (unwrapData(k, mi.getMapKeyObjectInspector),
+              unwrapData(v, mi.getMapValueObjectInspector))
+        }.toMap).orNull
+    case si: StructObjectInspector =>
+      val allRefs = si.getAllStructFieldRefs
+      new GenericRow(
+        allRefs.map(r =>
+          unwrapData(si.getStructFieldData(data, r), r.getFieldObjectInspector)).toArray)
+  }
+
+  override def execute(): RDD[Row] = {
+    val sc = sqlContext.sparkContext
+    val job = new Job(sc.hadoopConfiguration)
+
+    val conf: Configuration = ContextUtil.getConfiguration(job)
+    val fileList = FileSystemHelper.listFiles(relation.path, conf)
+
+    // add all paths in the directory but skip "hidden" ones such
+    // as "_SUCCESS"
+    for (path <- fileList if !path.getName.startsWith("_")) {
+      FileInputFormat.addInputPath(job, path)
+    }
+    val serialConf = sc.broadcast(new SerializableWritable(conf))
+
+    setColumnIds(output, relat

[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-01 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/2576#discussion_r18321025
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/orc/OrcTableOperations.scala ---
@@ -0,0 +1,418 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.sql.orc
+
+import org.apache.spark.sql.execution.{ExistingRdd, LeafNode, UnaryNode, SparkPlan}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.{TaskContext, SerializableWritable}
+import org.apache.spark.rdd.RDD
+
+import _root_.parquet.hadoop.util.ContextUtil
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, FileOutputCommitter}
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+import org.apache.hadoop.io.{Writable, NullWritable}
+import org.apache.hadoop.mapreduce.{TaskID, TaskAttemptContext, Job}
+
+import org.apache.hadoop.hive.ql.io.orc.{OrcFile, OrcSerde, OrcInputFormat, OrcOutputFormat}
+import org.apache.hadoop.hive.serde2.objectinspector._
+import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal, HiveVarchar}
+
+import java.io.IOException
+import java.text.SimpleDateFormat
+import java.util.{Locale, Date}
+import scala.collection.JavaConversions._
+import org.apache.hadoop.mapred.{SparkHadoopMapRedUtil, Reporter, JobConf}
+
+/**
+ * orc table scan operator. Imports the file that backs the given
+ * [[org.apache.spark.sql.orc.OrcRelation]] as a ``RDD[Row]``.
+ */
+case class OrcTableScan(
+    output: Seq[Attribute],
+    relation: OrcRelation,
+    columnPruningPred: Option[Expression])
+  extends LeafNode {
+
+  @transient
+  lazy val serde: OrcSerde = initSerde
+
+  @transient
+  lazy val getFieldValue: Seq[Product => Any] = {
+    val inspector = serde.getObjectInspector.asInstanceOf[StructObjectInspector]
+    output.map(attr => {
+      val ref = inspector.getStructFieldRef(attr.name.toLowerCase(Locale.ENGLISH))
+      row: Product => {
+        val fieldData = row.productElement(1)
+        val data = inspector.getStructFieldData(fieldData, ref)
+        unwrapData(data, ref.getFieldObjectInspector)
+      }
+    })
+  }
+
+  private def initSerde(): OrcSerde = {
+    val serde = new OrcSerde
+    serde.initialize(null, relation.prop)
+    serde
+  }
+
+  def unwrapData(data: Any, oi: ObjectInspector): Any = oi match {
+    case pi: PrimitiveObjectInspector => pi.getPrimitiveJavaObject(data)
+    case li: ListObjectInspector =>
+      Option(li.getList(data))
+        .map(_.map(unwrapData(_, li.getListElementObjectInspector)).toSeq)
+        .orNull
+    case mi: MapObjectInspector =>
+      Option(mi.getMap(data)).map(
+        _.map {
+          case (k, v) =>
+            (unwrapData(k, mi.getMapKeyObjectInspector),
+              unwrapData(v, mi.getMapValueObjectInspector))
+        }.toMap).orNull
+    case si: StructObjectInspector =>
+      val allRefs = si.getAllStructFieldRefs
+      new GenericRow(
+        allRefs.map(r =>
+          unwrapData(si.getStructFieldData(data, r), r.getFieldObjectInspector)).toArray)
+  }
+
+  override def execute(): RDD[Row] = {
+    val sc = sqlContext.sparkContext
+    val job = new Job(sc.hadoopConfiguration)
+
+    val conf: Configuration = ContextUtil.getConfiguration(job)
+    val fileList = FileSystemHelper.listFiles(relation.path, conf)
+
+    // add all paths in the directory but skip "hidden" ones such
+    // as "_SUCCESS"
+    for (path <- fileList if !path.getName.startsWith("_")) {
+      FileInputFormat.addInputPath(job, path)
+    }
+    val serialConf = sc.broadcast(new SerializableWritable(conf))
+
+    setColumnIds(output, relat

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57577580
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21171/





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57577573
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21171/consoleFull) for PR 2595 at commit [`a0d9de3`](https://github.com/apache/spark/commit/a0d9de33d6b8ea7dec2e6421a5debd5310d3aa03).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-01 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/2576#discussion_r18320990
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/orc/ORCQuerySuite.scala ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.orc
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.ql.io.orc.CompressionKind
+import org.apache.spark.sql.{SQLConf, SchemaRDD, TestData, QueryTest}
+import org.apache.spark.sql.test.TestSQLContext
+import org.scalatest.{BeforeAndAfterAll, FunSuiteLike}
+import org.apache.spark.util.Utils
+import org.apache.spark.sql.catalyst.util.getTempFilePath
+import org.apache.spark.sql.test.TestSQLContext._
+
+import java.io.File
+
+case class TestRDDEntry(key: Int, value: String)
+
+case class NullReflectData(
+    intField: java.lang.Integer,
+    longField: java.lang.Long,
+    floatField: java.lang.Float,
+    doubleField: java.lang.Double,
+    booleanField: java.lang.Boolean)
+
+case class OptionalReflectData(
+    intField: Option[Int],
+    longField: Option[Long],
+    floatField: Option[Float],
+    doubleField: Option[Double],
+    booleanField: Option[Boolean])
+
+case class Nested(i: Int, s: String)
+
+case class Data(array: Seq[Int], nested: Nested)
+
+case class AllDataTypes(
+    stringField: String,
+    intField: Int,
+    longField: Long,
+    floatField: Float,
+    doubleField: Double,
+    shortField: Short,
+    byteField: Byte,
+    booleanField: Boolean)
+
+case class AllDataTypesWithNonPrimitiveType(
+    stringField: String,
+    intField: Int,
+    longField: Long,
+    floatField: Float,
+    doubleField: Double,
+    shortField: Short,
+    byteField: Byte,
+    booleanField: Boolean,
+    array: Seq[Int],
+    arrayContainsNull: Seq[Option[Int]],
+    map: Map[Int, Long],
+    mapValueContainsNull: Map[Int, Option[Long]],
+    data: Data)
+
+case class BinaryData(binaryData: Array[Byte])
+
+class OrcQuerySuite extends QueryTest with FunSuiteLike with BeforeAndAfterAll {
+  TestData // Load test data tables.
+
+  var testRDD: SchemaRDD = null
+
+  test("Read/Write All Types") {
+    val tempDir = getTempFilePath("orcTest").getCanonicalPath
+    val range = (0 to 255)
+    val data = sparkContext.parallelize(range)
+      .map(x => AllDataTypes(s"$x", x, x.toLong, x.toFloat, x.toDouble, x.toShort, x.toByte, x % 2 == 0))
+
+    data.saveAsOrcFile(tempDir)
+
+    checkAnswer(
+      orcFile(tempDir),
+      data.toSchemaRDD.collect().toSeq)
+
+    Utils.deleteRecursively(new File(tempDir))
+  }
+
+  test("Compression options for writing to a Orcfile") {
+    val defaultOrcCompressionCodec = TestSQLContext.orcCompressionCodec
+    // TODO: support other compress codec
+    val file = getTempFilePath("orcTest")
+    val path = file.toString
+    val rdd = TestSQLContext.sparkContext.parallelize((1 to 100))
+      .map(i => TestRDDEntry(i, s"val_$i"))
+
+    // test default compression codec, now only support zlib
+    rdd.saveAsOrcFile(path)
+    var actualCodec = OrcFileOperator.readMetaData(new Path(path)).getCompression.name
+    assert(actualCodec == TestSQLContext.orcCompressionCodec.toUpperCase)
+
+    /**
--- End diff --

Now only zlib is supported; I will remove this.



[GitHub] spark pull request: [SPARK-2693][SQL] Supported for UDAF Hive Aggr...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2620#issuecomment-57576395
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21172/





[GitHub] spark pull request: [SPARK-3696]Do not override the user-difined c...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2541#discussion_r18320642
  
--- Diff: sbin/spark-config.sh ---
@@ -33,7 +33,7 @@ this="$config_bin/$script"
 
 export SPARK_PREFIX="`dirname "$this"`"/..
 export SPARK_HOME="${SPARK_PREFIX}"
-export SPARK_CONF_DIR="$SPARK_HOME/conf"
+export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"$SPARK_HOME/conf"}"
--- End diff --

I'm not a bash expert, so I'm curious: does the nesting of double quotes 
work properly here?  Here, we have double quotes inside of the `${}` and 
surrounding it, too.
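For what it's worth, a quick check one could run (an illustrative snippet, not part of the PR) suggests the nesting is fine: bash starts a fresh quoting context inside `${...:-...}`, so the inner quotes do not terminate the outer ones:

```shell
#!/usr/bin/env bash
# The inner quotes inside ${VAR:-...} form their own quoting context,
# so the default expands safely even when SPARK_HOME contains spaces.
SPARK_HOME="/opt/spark dir"
unset SPARK_CONF_DIR
export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"$SPARK_HOME/conf"}"
echo "$SPARK_CONF_DIR"   # /opt/spark dir/conf
```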





[GitHub] spark pull request: [SQL] Prevents per row dynamic dispatching and...

2014-10-01 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2592#issuecomment-57575171
  
Yes, it should go after the DP PR.





[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1717#issuecomment-57575122
  
Do you mind adding "closes #2098" to the description of your PR so that 
this automatically closes the other PR when merged?  Thanks!!





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1723#issuecomment-57575023
  
Hi @mubarak,

This issue has been fixed by #2464, so do you mind closing this? Thanks!

(Due to the way that this GitHub mirror is set up, we don't have permission 
to close your PR).





[GitHub] spark pull request: [SPARK-3755][Core] Do not bind port 1 - 1024 t...

2014-10-01 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2623#issuecomment-57574915
  
https://github.com/apache/spark/pull/2610





[GitHub] spark pull request: SPARK-2201 Improve FlumeInputDStream's stabili...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1310#issuecomment-57574730
  
Hi @joyyoj,

Since this pull request doesn't show any code / changes, do you mind 
closing it?  Feel free to update / re-open if you have code that you'd like us 
to review.  Thanks!





[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-01 Thread zhzhan
Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57574745
  
@pwendell I think the packaging has some problem, probably in protobuf. I ran 
some test suites, but they do not pass; with the original package the tests 
are OK. Following are some example failure cases.

sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 
"test-only org.apache.spark.sql.hive.CachedTableSuite"
Caused by: sbt.ForkMain$ForkError: 
com.google.protobuf_spark.GeneratedMessage
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 
"test-only org.apache.spark.sql.hive.execution.HiveQuerySuite"
[info]   ...
[info]   Cause: java.lang.ClassNotFoundException: 
com.google.protobuf_spark.GeneratedMessage
[info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 
"test-only org.apache.spark.sql.parquet.ParquetMetastoreSuite"
[info]   ...
[info]   Cause: java.lang.ClassNotFoundException: 
com.google.protobuf_spark.GeneratedMessage
[info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57573144
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21169/





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57573135
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21169/consoleFull) for PR 2609 at commit [`77a9de0`](https://github.com/apache/spark/commit/77a9de0adb733c406440e0f498f888939461f831).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2337





[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2337#issuecomment-57572402
  
I've given it some thought, and I don't think that we should merge the more 
general async mechanism that I described in #2482.  It had some confusing 
semantics surrounding cancellation (see the discussion of Thread.interrupt) and 
was probably more general than what most users need.

Given that we should probably keep the current async APIs, this PR's change 
looks good.  I'm going to merge this into `master`.  Thanks for this commit, 
and sorry for the long review delay!





[GitHub] spark pull request: [SPARK-1720][SPARK-1719] Add the value of LD_L...

2014-10-01 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1031#issuecomment-57572266
  
Ok, I'll try to use LD_LIBRARY_PATH.





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57572114
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21171/consoleFull) for PR 2595 at commit [`a0d9de3`](https://github.com/apache/spark/commit/a0d9de33d6b8ea7dec2e6421a5debd5310d3aa03).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-01 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/789#issuecomment-57572019
  
I think it's better to do both and explain that there might be problems. 
Otherwise users will see this new API and perhaps be surprised that their old 
registrator is no longer called. Not everyone reads the docs on the new API, so 
they might never notice, and just get poor performance.

BTW looking at Kryo's docs, it does support multiple register calls on the 
same class, and it just uses the value from the last one. So it will probably 
do the right thing here if we call their custom registrator last.
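The last-registration-wins behavior that mateiz points to can be sketched with a plain dictionary standing in for Kryo's internal class-to-serializer registry. This is purely illustrative (the names below are made up, not Kryo's actual API): it only shows why calling the user's custom registrator last means the user's choice takes effect.

```python
# Hypothetical stand-in for Kryo's class -> serializer registry.
# Registering the same class twice keeps only the value from the last call,
# so running the user's custom registrator after the framework's lets it win.
registry = {}

def register(cls, serializer_name):
    registry[cls] = serializer_name  # a later call simply overwrites the earlier one

register(str, "spark_default_serializer")  # framework registers first
register(str, "user_custom_serializer")    # user registrator runs last

print(registry[str])  # -> user_custom_serializer
```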





[GitHub] spark pull request: [SPARK-3626] [WIP] Replace AsyncRDDActions wit...

2014-10-01 Thread JoshRosen
Github user JoshRosen closed the pull request at:

https://github.com/apache/spark/pull/2482





[GitHub] spark pull request: [SPARK-3626] [WIP] Replace AsyncRDDActions wit...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2482#issuecomment-57572018
  
I'm going to close this for now.  My approach has some confusing semantics 
and may be more general than what most users need.





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57571761
  
`NetworkReceiverSuite` in spark-streaming has failed; it is not related to 
this PR.





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57571778
  
Jenkins, retest this please





[GitHub] spark pull request: [SPARK-2750] support https in spark web ui

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1980#issuecomment-57570323
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21165/consoleFull) for PR 1980 at commit [`a29ec86`](https://github.com/apache/spark/commit/a29ec8632cce8cb29c38d8e9e3aee2334400130b).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `  println(s"Failed to load main class $childMainClass.")`






[GitHub] spark pull request: [SPARK-2750] support https in spark web ui

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1980#issuecomment-57570330
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21165/





[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-01 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-57570180
  
@ScrapCodes Do you have any better idea for this issue?





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57570139
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21166/





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57570133
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21166/consoleFull) for PR 2595 at commit [`a0d9de3`](https://github.com/apache/spark/commit/a0d9de33d6b8ea7dec2e6421a5debd5310d3aa03).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `  println(s"Failed to load main class $childMainClass.")`






[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2501#issuecomment-57569132
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21170/





[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2501#issuecomment-57569130
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21170/consoleFull) for PR 2501 at commit [`bdf9a3f`](https://github.com/apache/spark/commit/bdf9a3f9dab4e4fe7cc89cc9a32a31e7511bf8de).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class LogicalRDD(output: Seq[Attribute], rdd: RDD[Row])(sqlContext: SQLContext)`
   * `case class PhysicalRDD(output: Seq[Attribute], rdd: RDD[Row]) extends LeafNode `
   * `case class ExistingRdd(output: Seq[Attribute], rdd: RDD[Row]) extends LeafNode `
   * `case class SparkLogicalPlan(alreadyPlanned: SparkPlan)(@transient sqlContext: SQLContext)`






[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57568876
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21169/consoleFull) for PR 2609 at commit [`77a9de0`](https://github.com/apache/spark/commit/77a9de0adb733c406440e0f498f888939461f831).
 * This patch merges cleanly.





[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2501#issuecomment-57568818
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21170/consoleFull) for PR 2501 at commit [`bdf9a3f`](https://github.com/apache/spark/commit/bdf9a3f9dab4e4fe7cc89cc9a32a31e7511bf8de).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3638 | Forced a compatible version of ht...

2014-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2535





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57568072
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21168/





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57568070
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21168/consoleFull) for PR 2609 at commit [`7b7cae4`](https://github.com/apache/spark/commit/7b7cae481661b7b2a58ad96e08291573a0e043e7).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-3638 | Forced a compatible version of ht...

2014-10-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2535#issuecomment-57567934
  
This looks good to me.  Thanks!





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57567979
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21168/consoleFull) for PR 2609 at commit [`7b7cae4`](https://github.com/apache/spark/commit/7b7cae481661b7b2a58ad96e08291573a0e043e7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57567664
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21167/





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57567662
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21167/consoleFull) for PR 2609 at commit [`a045620`](https://github.com/apache/spark/commit/a04562069843603725cd4320afce9e5b19abe53b).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2609#issuecomment-57567579
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21167/consoleFull) for PR 2609 at commit [`a045620`](https://github.com/apache/spark/commit/a04562069843603725cd4320afce9e5b19abe53b).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3755][Core] Do not bind port 1 - 1024 t...

2014-10-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2610#issuecomment-57567022
  
Oops you're right.





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-10-01 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-57566803
  
Okay here are some thoughts and questions:
 - I don't think it really matters that we can't handle `f1.f11 > f2.f22`, 
because we already don't know what to do if a user does `[1,2] > [0,3]` even 
without this new syntax.
 - Am I correct in saying that Hive doesn't support this syntax at all and 
that we are inventing new functionality?  I'm not strictly opposed to this, but 
we should be careful, as once we support something we can't get rid of it later.
 - I'm not convinced that we need to handle arbitrary array nesting here.  
The case of getting all of one field from an array (which I guess makes this 
SQL shorthand for `array.map(_.fieldName)`) seems reasonable, but is there a 
use case for the arbitrary nesting version?
 - This ends up complicating `GetField` quite a bit.  What about creating a 
new expression type `ArrayGetField` and adding something to the analyzer that 
switches expression types when an array is detected?  The idea here is to keep 
each expression simple so we can code-gen on a case-by-case basis.
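For reference, the array-field shorthand under discussion behaves like mapping a field accessor over the array column. A minimal sketch, with made-up data and function names (this is not Spark's implementation):

```python
# A row whose column "f1" is an array of structs; the proposed dot notation
# f1.f11 extracts f11 from every element, i.e. array.map(_.f11).
row = {"f1": [{"f11": 1, "f12": "a"}, {"f11": 2, "f12": "b"}]}

def get_array_field(row, array_col, field):
    """Sketch of the proposed semantics: map a field accessor over an array column."""
    return [element[field] for element in row[array_col]]

print(get_array_field(row, "f1", "f11"))  # -> [1, 2]
```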





[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-10-01 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-57566795
  
> This patch adds the following public classes (experimental):
println(s"Failed to load main class $childMainClass.")

FYI: I believe I have these phantom new class notes finally sorted out in 
#2606.





[GitHub] spark pull request: [SPARK-2377] Python API for Streaming

2014-10-01 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2538#discussion_r18318446
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.api.python
+
+import java.io.{ObjectInputStream, ObjectOutputStream}
+import java.lang.reflect.Proxy
+import java.util.{ArrayList => JArrayList, List => JList}
+import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._
+
+import org.apache.spark.api.java._
+import org.apache.spark.api.python._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming.{Interval, Duration, Time}
+import org.apache.spark.streaming.dstream._
+import org.apache.spark.streaming.api.java._
+
+
+/**
+ * Interface for Python callback function with three arguments
+ */
+private[python] trait PythonTransformFunction {
+  def call(time: Long, rdds: JList[_]): JavaRDD[Array[Byte]]
+}
+
+/**
+ * Wrapper for PythonTransformFunction
+ * TODO: support checkpoint
+ */
+private[python] class TransformFunction(@transient var pfunc: PythonTransformFunction)
+  extends function.Function2[JList[JavaRDD[_]], Time, JavaRDD[Array[Byte]]] with Serializable {
+
+  def apply(rdd: Option[RDD[_]], time: Time): Option[RDD[Array[Byte]]] = {
+    Option(pfunc.call(time.milliseconds, List(rdd.map(JavaRDD.fromRDD(_)).orNull).asJava))
+      .map(_.rdd)
+  }
+
+  def apply(rdd: Option[RDD[_]], rdd2: Option[RDD[_]], time: Time): Option[RDD[Array[Byte]]] = {
+    val rdds = List(rdd.map(JavaRDD.fromRDD(_)).orNull, rdd2.map(JavaRDD.fromRDD(_)).orNull).asJava
+    Option(pfunc.call(time.milliseconds, rdds)).map(_.rdd)
+  }
+
+  // for function.Function2
+  def call(rdds: JList[JavaRDD[_]], time: Time): JavaRDD[Array[Byte]] = {
+    pfunc.call(time.milliseconds, rdds)
+  }
+
+  private def writeObject(out: ObjectOutputStream): Unit = {
+    assert(PythonDStream.serializer != null, "Serializer has not been registered!")
+    val bytes = PythonDStream.serializer.serialize(pfunc)
+    out.writeInt(bytes.length)
+    out.write(bytes)
+  }
+
+  private def readObject(in: ObjectInputStream): Unit = {
+    assert(PythonDStream.serializer != null, "Serializer has not been registered!")
+    val length = in.readInt()
+    val bytes = new Array[Byte](length)
+    in.readFully(bytes)
+    pfunc = PythonDStream.serializer.deserialize(bytes)
+  }
+}
+
+/**
+ * Interface for Python Serializer to serialize PythonTransformFunction
+ */
+private[python] trait PythonTransformFunctionSerializer {
+  def dumps(id: String): Array[Byte]  //
--- End diff --

Extra `//`
nit: move this trait to be near `PythonTransformFunction`





[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-57566562
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21163/





[GitHub] spark pull request: [SPARK-2377] Python API for Streaming

2014-10-01 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2538#discussion_r18318422
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.api.python
+
+import java.io.{ObjectInputStream, ObjectOutputStream}
+import java.lang.reflect.Proxy
+import java.util.{ArrayList => JArrayList, List => JList}
+import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._
+
+import org.apache.spark.api.java._
+import org.apache.spark.api.python._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming.{Interval, Duration, Time}
+import org.apache.spark.streaming.dstream._
+import org.apache.spark.streaming.api.java._
+
+
+/**
+ * Interface for Python callback function with three arguments
+ */
+private[python] trait PythonTransformFunction {
+  def call(time: Long, rdds: JList[_]): JavaRDD[Array[Byte]]
+}
+
+/**
+ * Wrapper for PythonTransformFunction
+ * TODO: support checkpoint
+ */
+private[python] class TransformFunction(@transient var pfunc: PythonTransformFunction)
+  extends function.Function2[JList[JavaRDD[_]], Time, JavaRDD[Array[Byte]]] with Serializable {
--- End diff --

Function2 is already Serializable.





[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-57566559
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21163/consoleFull)
 for   PR 2339 at commit 
[`43a69f0`](https://github.com/apache/spark/commit/43a69f00a2fd004a57b860b3ee6bda8fc1e9f840).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  println(s"Failed to load main class $childMainClass.")`






[GitHub] spark pull request: [SPARK-2377] Python API for Streaming

2014-10-01 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2538#discussion_r18318391
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -0,0 +1,532 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+from itertools import chain
+import time
+import operator
+import unittest
+import tempfile
+
+from pyspark.context import SparkConf, SparkContext, RDD
+from pyspark.streaming.context import StreamingContext
+
+
+class PySparkStreamingTestCase(unittest.TestCase):
+
+timeout = 10  # seconds
+duration = 1
+
+def setUp(self):
+class_name = self.__class__.__name__
+conf = SparkConf().set("spark.default.parallelism", 1)
+self.sc = SparkContext(appName=class_name, conf=conf)
+self.sc.setCheckpointDir("/tmp")
+# TODO: decrease duration to speed up tests
+self.ssc = StreamingContext(self.sc, self.duration)
+
+def tearDown(self):
+self.ssc.stop()
+
+def _take(self, dstream, n):
+"""
+Return the first `n` elements in the stream (will start and stop).
+"""
+results = []
+
+def take(_, rdd):
+if rdd and len(results) < n:
+results.extend(rdd.take(n - len(results)))
+
+dstream.foreachRDD(take)
+
+self.ssc.start()
+while len(results) < n:
+time.sleep(0.01)
+self.ssc.stop(False, True)
+return results
+
+def _collect(self, dstream):
+"""
+Collect each RDD into the returned list.
+
+:return: list, which will have the collected items.
+"""
+result = []
+
+def get_output(_, rdd):
+r = rdd.collect()
+if r:
+result.append(r)
+dstream.foreachRDD(get_output)
+return result
+
+def _test_func(self, input, func, expected, sort=False, input2=None):
+"""
+@param input: dataset for the test. This should be list of lists.
+@param func: wrapped function. This function should return a PythonDStream object.
+@param expected: expected output for this testcase.
+"""
+if not isinstance(input[0], RDD):
+input = [self.sc.parallelize(d, 1) for d in input]
+input_stream = self.ssc.queueStream(input)
+if input2 and not isinstance(input2[0], RDD):
+input2 = [self.sc.parallelize(d, 1) for d in input2]
+input_stream2 = self.ssc.queueStream(input2) if input2 is not None else None
+
+# Apply test function to stream.
+if input2:
+stream = func(input_stream, input_stream2)
+else:
+stream = func(input_stream)
+
+result = self._collect(stream)
+self.ssc.start()
+
+start_time = time.time()
+# Loop until we get the expected number of results from the stream.
+while True:
+current_time = time.time()
+# Check time out.
+if (current_time - start_time) > self.timeout:
+print "timeout after", self.timeout
+break
+# StreamingContext.awaitTermination is not used to wait here because
+# calling the py4j server every 50 milliseconds raises an error.
+time.sleep(0.05)
+# Check if the output is the same length of expected output.
+if len(expected) == len(result):
+break
+if sort:
+self._sort_result_based_on_key(result)
+self._sort_result_based_on_key(expected)
+self.assertEqual(expected, result)
+
+def _sort_result_based_on_key(self, outputs):
+"""Sort the list based on first value."""
+for output in outputs:

[GitHub] spark pull request: [SPARK-2489] [SQL] Parquet support for fixed_l...

2014-10-01 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1737#issuecomment-57566129
  
You are right that we would have to change BinaryType to be a case 
class instead so that it can hold this information, and then change the rest of 
the code to deal with that. It is possible that we could play some tricks with the 
`unapply` method in the BinaryType companion object to minimize the changes to 
pattern-matching code; I'd have to play around with it more to see whether that is 
actually feasible, though.





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57565848
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21166/consoleFull)
 for   PR 2595 at commit 
[`a0d9de3`](https://github.com/apache/spark/commit/a0d9de33d6b8ea7dec2e6421a5debd5310d3aa03).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3479] [Build] Report failed test catego...

2014-10-01 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2606#discussion_r18318253
  
--- Diff: dev/run-tests-jenkins ---
@@ -84,42 +98,46 @@ function post_message () {
   fi
 }
 
+
+# We diff master...$ghprbActualCommit because that gets us changes 
introduced in the PR
+#+ and not anything else added to master since the PR was branched.
+
 # check PR merge-ability and check for new public classes
 {
   if [ "$sha1" == "$ghprbActualCommit" ]; then
-merge_note=" * This patch **does not** merge cleanly!"
+merge_note=" * This patch **does not merge cleanly**."
   else
 merge_note=" * This patch merges cleanly."
+  fi
+  
+  source_files=$(
+  git diff master...$ghprbActualCommit --name-only  `# diff patch 
against master from branch point` \
--- End diff --

After lots of trial and error, I'm pretty sure this is the correct way to 
do this diff, and I understand why. A brief explanation is included as a 
comment earlier in the file.
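
The three-dot semantics described above can be sketched outside the bash script as well; this hypothetical Python helper (not part of dev/run-tests-jenkins) builds the same `git diff` invocation, which compares the PR commit against its merge base with master rather than against master's tip:

```python
import subprocess

def diff_cmd(base, commit):
    # Three-dot "base...commit" diffs commit against merge-base(base, commit),
    # so only files changed on the PR branch are listed, not later commits
    # that landed on base after the branch point (unlike two-dot "base..commit").
    return ["git", "diff", "--name-only", f"{base}...{commit}"]

def pr_changed_files(base="master", commit="HEAD"):
    # Run the diff inside a git checkout and return changed file paths.
    out = subprocess.check_output(diff_cmd(base, commit), text=True)
    return out.splitlines()
```

Called from a checkout, `pr_changed_files("master", sha)` would return the same file list the Jenkins script pipes into its source-file filters.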





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread manishamde
Github user manishamde commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57565744
  
@chouqin Sorry for the delay in my review. I will finish mine within the 
next 24 hours.





[GitHub] spark pull request: [SPARK-3479] [Build] Report failed test catego...

2014-10-01 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2606#discussion_r18318239
  
--- Diff: dev/run-tests-jenkins ---
@@ -84,42 +98,46 @@ function post_message () {
   fi
 }
 
+
+# We diff master...$ghprbActualCommit because that gets us changes 
introduced in the PR
+#+ and not anything else added to master since the PR was branched.
+
 # check PR merge-ability and check for new public classes
 {
   if [ "$sha1" == "$ghprbActualCommit" ]; then
-merge_note=" * This patch **does not** merge cleanly!"
+merge_note=" * This patch **does not merge cleanly**."
   else
 merge_note=" * This patch merges cleanly."
+  fi
+  
+  source_files=$(
--- End diff --

We can do a valid diff regardless of the merge-ability of the patch, so I 
moved this out of the if block.





[GitHub] spark pull request: [SPARK-3479] [Build] [WIP] Report failed test ...

2014-10-01 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2606#issuecomment-57565606
  
cc @pwendell This PR is ready for review.

Here are examples of the messages posted when:
* [all tests 
pass](https://github.com/apache/spark/pull/2606#issuecomment-57548426)
* [PySpark unit tests fail and there is a new 
class](https://github.com/apache/spark/pull/2606#issuecomment-57535539)
* [Spark unit tests 
fail](https://github.com/apache/spark/pull/2606#issuecomment-57522419)
* [RAT tests 
fail](https://github.com/apache/spark/pull/2606#issuecomment-57510289)





[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on the pull request:

https://github.com/apache/spark/pull/2595#issuecomment-57565298
  
@mengxr @jkbradley thanks for your comments; it passes the unit tests now. Do 
you have any more suggestions?





[GitHub] spark pull request: [SPARK-2750] support https in spark web ui

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1980#issuecomment-57564475
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21165/consoleFull)
 for   PR 1980 at commit 
[`a29ec86`](https://github.com/apache/spark/commit/a29ec8632cce8cb29c38d8e9e3aee2334400130b).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3711: Optimize where in clause filter qu...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2561#issuecomment-57564269
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/250/consoleFull)
 for   PR 2561 at commit 
[`430f5d1`](https://github.com/apache/spark/commit/430f5d15a95ddda314d5750e5b42fdc5e2fac4ba).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.




