date:20160425

[GitHub] spark pull request: [SPARK-14483][WEBUI] Display user name for eac...

2016-04-25 Thread sarutak

Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/12257#issuecomment-214606536
  
ping @tgravescs


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][BUILD] Enable RAT checking on `LZ4Bloc...

2016-04-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12677#issuecomment-214606445
  
Hi, @davies and @srowen .
This PR just removes `LZ4BlockInputStream.java` from `dev/.rat-exclude` and 
passed the RAT test.
Could you merge this PR?



[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...

2016-04-25 Thread hbhanawat

Github user hbhanawat commented on the pull request:

https://github.com/apache/spark/pull/12641#issuecomment-214606466
  
@vanzin @rxin Thanks for commenting. 

I have incorporated the review comments apart from the masterURL comment. 
Regarding the masterURL being part of the API, I think the scheduler and backend 
creation may depend on the masterURL, and hence it would be better to keep it in the API. 
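
To illustrate the argument, here is a minimal Java sketch of a pluggable cluster manager whose backend depends on the master URL. Every name in it (`ClusterManagerPlugin`, `LocalPlugin`, `describeBackend`) is hypothetical and chosen only for illustration; it is not the interface proposed in this PR.

```java
// Hypothetical plug-in contract: both methods receive the master URL.
interface ClusterManagerPlugin {
    // Reports whether this plug-in understands the given master URL.
    boolean canCreate(String masterURL);

    // Describes the backend that would be built for the given master URL.
    String describeBackend(String masterURL);
}

// Hypothetical plug-in that handles "local" and "local[N]" URLs.
final class LocalPlugin implements ClusterManagerPlugin {
    @Override
    public boolean canCreate(String masterURL) {
        return masterURL.equals("local") || masterURL.matches("local\\[\\d+\\]");
    }

    @Override
    public String describeBackend(String masterURL) {
        // The thread count can only be recovered from the URL itself, which is
        // why the URL is passed to the creation method in this sketch.
        int threads = 1;
        if (!masterURL.equals("local")) {
            String inner = masterURL.substring("local[".length(), masterURL.length() - 1);
            threads = Integer.parseInt(inner);
        }
        return "local backend with " + threads + " thread(s)";
    }

    public static void main(String[] args) {
        System.out.println(new LocalPlugin().describeBackend("local[4]"));
    }
}
```

If the URL were dropped from the creation methods, a plug-in like this would need another channel to learn the thread count.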



[GitHub] spark pull request: [Spark-14314][SparkR] Add model persistence to...

2016-04-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12680#discussion_r61025886
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/KMeansWrapper.scala ---
@@ -17,14 +17,21 @@
 
 package org.apache.spark.ml.r
 
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.DefaultFormats
--- End diff --

It looks like `import org.json4s.DefaultFormats` is not needed, because `import 
org.json4s._` already imports it.



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214605915
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56950/
Test PASSed.



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214605913
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214605820
  
**[Test build #56950 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56950/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, 
function: String)`
  * `case class UnresolvedGenerator(name: FunctionIdentifier, children: 
Seq[Expression])`
  * `class RuntimeConfigImpl(`



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214605811
  
**[Test build #56955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56955/consoleFull)**
 for PR 12268 at commit 
[`ad21b8e`](https://github.com/apache/spark/commit/ad21b8eea981f61cb35de646f3568b27dd2141a3).



[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...

2016-04-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12676#issuecomment-214605818
  
Hi, @jkbradley .
Could you review this PR when you have some time?



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12625



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12672



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12682



[GitHub] spark pull request: [SPARK-14911][Core] Fix a potential data race ...

2016-04-25 Thread lw-lin

Github user lw-lin commented on the pull request:

https://github.com/apache/spark/pull/12681#issuecomment-214605511
  
@davies (who made the first change) might want to take a look?



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214605459
  
Merging in master.




[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12682#issuecomment-214604567
  
Merging - I fixed the test and verified locally.




[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-04-25 Thread kiszk

Github user kiszk commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-214604863
  
No problem. There seemed to be some conflicts in this PR last week. I will 
continue to resolve the conflicts as soon as possible.



[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-214604317
  
**[Test build #56954 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56954/consoleFull)**
 for PR 12629 at commit 
[`2fa4a12`](https://github.com/apache/spark/commit/2fa4a128fcb576914d4632ab4a71f135839ab287).



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread rxin

GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/12682

[SPARK-14904][SQL] Put removed HiveContext in compatibility module

## What changes were proposed in this pull request?
This is for users who can't upgrade and need to continue to use HiveContext.

## How was this patch tested?
Added some basic tests for sanity check.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark add-back-hive-context

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12682


commit 2fb8b737afeb47269d3d369c029d354a96fb52a0
Author: Andrew Or 
Date:   2016-04-25T22:43:59Z

Put old HiveContext in new compatibility module

commit a54c57f7a802a25f661ab4e4c450af2b30a2
Author: Andrew Or 
Date:   2016-04-26T01:22:54Z

Add some tests

commit 4d3f745c82b1ba6f833b0314060b611382f297f8
Author: Andrew Or 
Date:   2016-04-26T01:23:56Z

Merge branch 'master' of github.com:apache/spark into add-back-hive-context

commit bcd01a910c1fe98214c58583315e555155dbd921
Author: Reynold Xin 
Date:   2016-04-26T03:45:28Z

Merge pull request #12672 from andrewor14/add-back-hive-context

[SPARK-14904][SQL] Put removed HiveContext in compatibility module

commit 15bdc7dcd31e49dc49363492160f3c7f27d685a2
Author: Reynold Xin 
Date:   2016-04-26T03:50:48Z

Fix tests





[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214602181
  
**[Test build #2880 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2880/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, 
function: String)`
  * `case class UnresolvedGenerator(name: FunctionIdentifier, children: 
Seq[Expression])`
  * `class RuntimeConfigImpl(`



[GitHub] spark pull request: [SPARK-14911][Core] Fix a potential data race ...

2016-04-25 Thread lw-lin

Github user lw-lin commented on the pull request:

https://github.com/apache/spark/pull/12681#issuecomment-214601397
  
Actually this wouldn't cause any problem and wouldn't fail any test suites 
**_for now_**, because the read of `acquiredButNotUsed` is guaranteed to see 
the most recent value due to the existing `synchronized(this) {consumers...}` block 
before the read of `acquiredButNotUsed`. It is a kind of ["Piggybacking" on 
synchronization](http://www.javamex.com/tutorials/synchronization_piggyback.shtml)
 -- but let's not rely on this, because it's vulnerable to future code changes? 
:-)
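
The piggybacking effect described above can be sketched in a few lines of Java. This is a hypothetical, simplified class; the names `PiggybackExample`, `published`, and `payload` are illustrative and do not come from Spark. It assumes a single call to `write`, so the unguarded read never overlaps a later write.

```java
// Hypothetical sketch of "piggybacking" on an existing synchronized block.
final class PiggybackExample {
    private final Object lock = new Object();
    private boolean published = false; // always accessed under `lock`
    private long payload = 0L;         // plain field: not volatile, read without the lock

    void write(long value) {
        synchronized (lock) {
            payload = value;
            published = true;
        }
    }

    // Returns the payload, or -1 if nothing has been published yet.
    long read() {
        boolean ready;
        synchronized (lock) {
            // Acquiring the lock that the writer released creates a
            // happens-before edge from the writer's block to this point.
            ready = published;
        }
        // This read is outside the lock. It sees the written value only
        // because the synchronized block above precedes it in program order.
        // Removing or moving that block would silently drop the guarantee.
        return ready ? payload : -1L;
    }

    public static void main(String[] args) {
        PiggybackExample example = new PiggybackExample();
        example.write(42L);
        System.out.println(example.read());
    }
}
```

The fragility is the point: nothing on the `payload` field itself says that its visibility depends on a neighbouring block.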



[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-04-25 Thread sarutak

Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-214600906
  
@kiszk Sorry, I was going to review last week but I didn't have enough 
time. I might make time this weekend.



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214600667
  
**[Test build #2882 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2882/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, 
function: String)`
  * `case class UnresolvedGenerator(name: FunctionIdentifier, children: 
Seq[Expression])`
  * `class RuntimeConfigImpl(`



[GitHub] spark pull request: [SPARK-14911][Core] Fix a potential data race ...

2016-04-25 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12681#issuecomment-214598957
  
cc @andrewor14 and @JoshRosen 



[GitHub] spark pull request: [SPARK-14874][SQL][Streaming] Remove the obsol...

2016-04-25 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/12638#issuecomment-214597468
  
To be clear, if there's a completely unused class, I think it's worth the
time to delete it (dead code is confusing for people trying to learn the
code base).
On Apr 25, 2016 7:20 PM, "Liwei Lin"  wrote:

> Closed #12638 .




[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214596991
  
**[Test build #2878 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2878/consoleFull)**
 for PR 12672 at commit 
[`4d3f745`](https://github.com/apache/spark/commit/4d3f745c82b1ba6f833b0314060b611382f297f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, 
function: String)`
  * `case class UnresolvedGenerator(name: FunctionIdentifier, children: 
Seq[Expression])`
  * `class RuntimeConfigImpl(`



[GitHub] spark pull request: [SPARK-14910] [SQL] Native DDL Command Support...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12679#issuecomment-214596879
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14910] [SQL] Native DDL Command Support...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12679#issuecomment-214596881
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56951/
Test PASSed.



[GitHub] spark pull request: [SPARK-14910] [SQL] Native DDL Command Support...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12679#issuecomment-214596748
  
**[Test build #56951 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56951/consoleFull)**
 for PR 12679 at commit 
[`d0f203b`](https://github.com/apache/spark/commit/d0f203b7eec4892df323bc8c436a1d3545ebdc3b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14911][Core] Fix a potential data race ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12681#issuecomment-214596791
  
**[Test build #56953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56953/consoleFull)**
 for PR 12681 at commit 
[`6b72b96`](https://github.com/apache/spark/commit/6b72b963d54855771dcabc1fca8ed963be28303c).



[GitHub] spark pull request: [SPARK-14911][Core] Fix a potential data race ...

2016-04-25 Thread lw-lin

GitHub user lw-lin opened a pull request:

https://github.com/apache/spark/pull/12681

[SPARK-14911][Core] Fix a potential data race in TaskMemoryManager

## What changes were proposed in this pull request?

[[SPARK-13210][SQL] catch OOM when allocate memory and expand 
array](https://github.com/apache/spark/commit/37bc203c8dd5022cb11d53b697c28a737ee85bcc)
 introduced an `acquiredButNotUsed` field, but it might not be correctly 
synchronized:
- the write `acquiredButNotUsed += acquired` is guarded by `this` lock (see 
[here](https://github.com/apache/spark/blame/master/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L271));
- the read `memoryManager.releaseExecutionMemory(acquiredButNotUsed, 
taskAttemptId, tungstenMemoryMode)` (see 
[here](https://github.com/apache/spark/blame/master/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L400))
 might not be correctly synchronized, and thus might not see 
`acquiredButNotUsed`'s new written value.

This patch makes `acquiredButNotUsed` `volatile` to fix this.
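
One common way to close this kind of visibility gap is to declare the field `volatile`. The sketch below is a simplified, hypothetical stand-in for the access pattern described above; `MemoryTracker`, `recordUnused`, and `unusedBytes` are illustrative names, not code from `TaskMemoryManager`.

```java
// Hypothetical sketch: writes are guarded by a lock, reads rely on `volatile`.
final class MemoryTracker {
    // volatile: a read that takes no lock still observes the latest write.
    private volatile long acquiredButNotUsed = 0L;

    // Write path: the read-modify-write runs under the `this` lock, so
    // concurrent increments cannot lose updates.
    void recordUnused(long acquired) {
        synchronized (this) {
            acquiredButNotUsed += acquired;
        }
    }

    // Read path: no lock is taken; visibility comes from `volatile` alone.
    long unusedBytes() {
        return acquiredButNotUsed;
    }
}

final class MemoryTrackerDemo {
    // Runs one writer thread and returns the value the caller then reads.
    // Note: join() already establishes happens-before in this demo, so it
    // exercises the API rather than proving the volatile guarantee.
    static long run() {
        MemoryTracker tracker = new MemoryTracker();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                tracker.recordUnused(8L);
            }
        });
        writer.start();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tracker.unusedBytes();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The lock is still needed on the write path, because `+=` on a `volatile` field is not atomic by itself.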

## How was this patch tested?

This should be covered by existing suites.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lw-lin/spark fix-acquiredButNotUsed

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12681


commit 6b72b963d54855771dcabc1fca8ed963be28303c
Author: Liwei Lin 
Date:   2016-04-26T03:11:53Z

fix a potential data race in TaskMemoryManager





[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214596283
  
**[Test build #2879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2879/consoleFull)**
 for PR 12672 at commit 
[`4d3f745`](https://github.com/apache/spark/commit/4d3f745c82b1ba6f833b0314060b611382f297f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, 
function: String)`
  * `case class UnresolvedGenerator(name: FunctionIdentifier, children: 
Seq[Expression])`
  * `class RuntimeConfigImpl(`



[GitHub] spark pull request: [SPARK-14409][ML] Adding a RankingEvaluator to...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12461#issuecomment-214595882
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214595976
  
**[Test build #2881 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2881/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, 
function: String)`
  * `case class UnresolvedGenerator(name: FunctionIdentifier, children: 
Seq[Expression])`
  * `class RuntimeConfigImpl(`



[GitHub] spark pull request: [SPARK-14409][ML] Adding a RankingEvaluator to...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12461#issuecomment-214595886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56952/
Test PASSed.



[GitHub] spark pull request: [SPARK-14409][ML] Adding a RankingEvaluator to...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12461#issuecomment-214595732
  
**[Test build #56952 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56952/consoleFull)**
 for PR 12461 at commit 
[`a35d961`](https://github.com/apache/spark/commit/a35d9612c03cc48dae2ec2b06c0a0752d1a47919).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [Spark-14314][SparkR] Add model persistence to...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12680#issuecomment-214593821
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...

2016-04-25 Thread kiszk

Github user kiszk commented on the pull request:

https://github.com/apache/spark/pull/11956#issuecomment-214593726
  
@rxin, would it be possible to review this PR, too? Especially the 
decompression part that you originally wrote.



[GitHub] spark pull request: [Spark-14314][SparkR] Add model persistence to...

2016-04-25 Thread GayathriMurali

GitHub user GayathriMurali opened a pull request:

https://github.com/apache/spark/pull/12680

[Spark-14314][SparkR] Add model persistence to KMeans

## What changes were proposed in this pull request?

Add model persistence to KMeans SparkR


## How was this patch tested?

Unit tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GayathriMurali/spark SPARK-14314

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12680


commit a38f18c6f2b28bd5615072858fd99984066a9f8e
Author: GayathriMurali 
Date:   2016-04-26T03:01:13Z

[Spark-14314][SparkR] Add model persistence to KMeans





[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214593441
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56949/
Test FAILed.



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214593440
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214593367
  
**[Test build #56949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56949/consoleFull)**
 for PR 12672 at commit 
[`4d3f745`](https://github.com/apache/spark/commit/4d3f745c82b1ba6f833b0314060b611382f297f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, 
function: String)`
  * `case class UnresolvedGenerator(name: FunctionIdentifier, children: 
Seq[Expression])`
  * `class RuntimeConfigImpl(`



[GitHub] spark pull request: [SPARK-14849][CORE]Always set an address for t...

2016-04-25 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12613#discussion_r61022118
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala 
---
@@ -122,7 +122,7 @@ private[netty] class NettyRpcEnv(
 
   @Nullable
   override lazy val address: RpcAddress = {
-if (server != null) RpcAddress(host, server.getPort()) else null
+if (server != null) RpcAddress(host, server.getPort()) else 
RpcAddress(host, -1)
--- End diff --

while you are at it, we should probably document when server is null and 
explain the choices here
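
As a rough illustration of the choice under discussion, the sketch below models the address with a sentinel port instead of `null`. The `RpcAddress` case class here is a hypothetical stand-in rather than the real Spark class, and `NoPort` and `isListening` are names invented for this example; treating "no server" as `None` is likewise an assumption.

```scala
// Hypothetical stand-in for illustration; not the real Spark class.
final case class RpcAddress(host: String, port: Int)

object AddressSketch {
  // Assumed sentinel meaning "no server is listening in this environment".
  val NoPort: Int = -1

  // `serverPort` is None when no server was started, which is assumed here
  // to correspond to the `server == null` branch in the diff.
  def address(host: String, serverPort: Option[Int]): RpcAddress =
    RpcAddress(host, serverPort.getOrElse(NoPort))

  // Callers test the sentinel instead of comparing the address to null.
  def isListening(addr: RpcAddress): Boolean = addr.port != NoPort
}
```

With a sentinel, the value is never `null`, but every caller that used to null-check has to know about the sentinel, which is why documenting it matters.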



[GitHub] spark pull request: [SPARK-14849][CORE]Always set an address for t...

2016-04-25 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12613#discussion_r61022100
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala 
---
@@ -122,7 +122,7 @@ private[netty] class NettyRpcEnv(
 
   @Nullable
--- End diff --

this is no longer nullable



[GitHub] spark pull request: [SPARK-14849][CORE]Always set an address for t...

2016-04-25 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12613#issuecomment-214591914
  
I think there is also some code in SparkEnv that deals with this?
```
if (isDriver) {
  conf.set("spark.driver.port", rpcEnv.address.port.toString)
} else if (rpcEnv.address != null) {
  conf.set("spark.executor.port", rpcEnv.address.port.toString)
}
```
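
If the address is never `null`, a guard like the one above would presumably need to look at the port instead. The following is only a sketch under that assumption: `SketchAddress`, `PortConfSketch`, and the use of a mutable map in place of the real configuration object are all hypothetical.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the RPC address; not the real Spark class.
final case class SketchAddress(host: String, port: Int)

object PortConfSketch {
  // Records the port in `conf`, skipping executors whose address carries
  // a negative port (assumed to mean "no server was started").
  def recordPort(conf: mutable.Map[String, String],
                 isDriver: Boolean,
                 address: SketchAddress): Unit = {
    if (isDriver) {
      conf("spark.driver.port") = address.port.toString
    } else if (address.port >= 0) {
      conf("spark.executor.port") = address.port.toString
    }
  }
}
```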



[GitHub] spark pull request: [SPARK-14409][ML] Adding a RankingEvaluator to...

2016-04-25 Thread yongtang

Github user yongtang commented on the pull request:

https://github.com/apache/spark/pull/12461#issuecomment-214591768
  
@MLnick @srowen I just updated the pull request to wrap the 
RankingEvaluator into calling RankingMetrics. Was finally able to fix the 
exception issue I previously encountered. Sorry for the delay and let me know 
if there are any issues.



[GitHub] spark pull request: [SPARK-14409][ML] Adding a RankingEvaluator to...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12461#issuecomment-214591542
  
**[Test build #56952 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56952/consoleFull)**
 for PR 12461 at commit 
[`a35d961`](https://github.com/apache/spark/commit/a35d9612c03cc48dae2ec2b06c0a0752d1a47919).



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214591014
  
LGTM. Let's fix the test and get it in.



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12672#discussion_r61021153
  
--- Diff: 
sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala
 ---
@@ -0,0 +1,100 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.hive
+
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.{SparkContext, SparkFunSuite}
+
+
+class HiveContextCompatibilitySuite extends SparkFunSuite with 
BeforeAndAfterEach {
+
+  private var sc: SparkContext = null
+  private var hc: HiveContext = null
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+sc = new SparkContext("local[4]", "test")
+HiveUtils.newTemporaryConfiguration(useInMemoryDerby = true).foreach { 
case (k, v) =>
+  sc.hadoopConfiguration.set(k, v)
+}
+hc = new HiveContext(sc)
+  }
+
+  override def afterEach(): Unit = {
+try {
+  hc.sharedState.cacheManager.clearCache()
+  hc.sessionState.catalog.reset()
+} finally {
+  super.afterEach()
+}
+  }
+
+  override def afterAll(): Unit = {
+try {
+  sc.stop()
+  sc = null
+  hc = null
+} finally {
+  super.afterAll()
+}
+  }
+
+  test("basic operations") {
+val _hc = hc
+import _hc.implicits._
+val df1 = (1 to 20).map { i => (i, i) }.toDF("a", "x")
+val df2 = (1 to 100).map { i => (i, i % 10, i % 2 == 0) }.toDF("a", 
"b", "c")
+  .select($"a", $"b")
+  .filter($"a" > 10 && $"b" > 6 && $"c")
+val df3 = df1.join(df2, "a")
+val res = df3.collect()
+val expected = Seq((18, 18, 8)).toDF("a", "x", "b").collect()
+assert(res.toSeq == expected.toSeq)
+df3.registerTempTable("mai_table")
+val df4 = hc.table("mai_table")
+val res2 = df4.collect()
+assert(res2.toSeq == expected.toSeq)
+  }
+
+  test("basic DDLs") {
+val _hc = hc
+import _hc.implicits._
+val databases = hc.sql("SHOW DATABASES").collect().map(_.getString(0))
+assert(databases.toSeq == Seq("default"))
+hc.sql("CREATE DATABASE mee_db")
+hc.sql("USE mee_db")
+val databases2 = hc.sql("SHOW DATABASES").collect().map(_.getString(0))
+assert(databases2.toSet == Set("default", "mee_db"))
+val df = (1 to 10).map { i => ("bob" + i.toString, i) }.toDF("name", 
"age")
+df.registerTempTable("mee_table")
+hc.sql("CREATE TABLE moo_table (name string, age int)")
+hc.sql("INSERT INTO moo_table SELECT * FROM mee_table")
+assert(hc.sql("SELECT * FROM moo_table").collect().toSeq == 
df.collect().toSeq)
--- End diff --

I think this should sort the results before comparing them.
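
A minimal sketch of the idea, using plain tuples as a stand-in for collected rows (the `Row` alias and `sameRows` helper are hypothetical names, not part of the suite): sorting both sides makes the assertion independent of the order in which the rows come back.

```scala
object RowCompareSketch {
  // Stand-in for a collected row of (name, age).
  type Row = (String, Int)

  // Compares two result sets while ignoring row order by sorting both first.
  def sameRows(actual: Seq[Row], expected: Seq[Row]): Boolean =
    actual.sorted == expected.sorted
}

object RowCompareDemo {
  val expected: Seq[(String, Int)] = (1 to 10).map(i => ("bob" + i, i))
  // Same rows, returned in a different order.
  val actual: Seq[(String, Int)] =
    Seq(8, 9, 10, 6, 7, 1, 2, 3, 4, 5).map(i => ("bob" + i, i))
}
```

Here `RowCompareDemo.actual == RowCompareDemo.expected` is false, while `RowCompareSketch.sameRows` on the same two values is true.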



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214588696
  
```
[info] HiveContextCompatibilitySuite:
[info] - basic operations (11 seconds, 60 milliseconds)
[info] - basic DDLs *** FAILED *** (1 second, 754 milliseconds)
[info]   Array([bob8,8], [bob9,9], [bob10,10], [bob6,6], [bob7,7], 
[bob1,1], [bob2,2], [bob3,3], [bob4,4], [bob5,5]) did not equal Array([bob1,1], 
[bob2,2], [bob3,3], [bob4,4], [bob5,5], [bob6,6], [bob7,7], [bob8,8], [bob9,9], 
[bob10,10]) (HiveContextCompatibilitySuite.scala:88)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
[info]   at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
[info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
[info]   at 
org.apache.spark.sql.hive.HiveContextCompatibilitySuite$$anonfun$2.apply$mcV$sp(HiveContextCompatibilitySuite.scala:88)
[info]   at 
org.apache.spark.sql.hive.HiveContextCompatibilitySuite$$anonfun$2.apply(HiveContextCompatibilitySuite.scala:75)
[info]   at 
org.apache.spark.sql.hive.HiveContextCompatibilitySuite$$anonfun$2.apply(HiveContextCompatibilitySuite.scala:75)
[info]   at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at 
org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:56)
[info]   at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at 
org.apache.spark.sql.hive.HiveContextCompatibilitySuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HiveContextCompatibilitySuite.scala:25)
[info]   at 
org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
[info]   at 
org.apache.spark.sql.hive.HiveContextCompatibilitySuite.runTest(HiveContextCompatibilitySuite.scala:25)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at 
org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:28)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:28)
[info]   at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
[info]   at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:502)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
```

[GitHub] spark pull request: [MINOR][BUILD] Enable RAT checking on `LZ4Bloc...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12677#issuecomment-214588007
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [MINOR][BUILD] Enable RAT checking on `LZ4Bloc...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12677#issuecomment-214588009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56948/
Test PASSed.



[GitHub] spark pull request: [MINOR][BUILD] Enable RAT checking on `LZ4Bloc...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12677#issuecomment-214587875
  
**[Test build #56948 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56948/consoleFull)**
 for PR 12677 at commit 
[`5279cd8`](https://github.com/apache/spark/commit/5279cd8f98f887aded3c89fbd79d536d2ebc9238).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14874][SQL][Streaming] Remove the obsol...

2016-04-25 Thread lw-lin

Github user lw-lin commented on the pull request:

https://github.com/apache/spark/pull/12638#issuecomment-214586964
  
Sure, so I'm closing this PR since the removal by itself is not worth the 
committers' time to process.
@marmbrus thanks for the review!



[GitHub] spark pull request: [SPARK-14874][SQL][Streaming] Remove the obsol...

2016-04-25 Thread lw-lin

Github user lw-lin closed the pull request at:

https://github.com/apache/spark/pull/12638



[GitHub] spark pull request: [SPARK-14889][Spark Core] scala.MatchError: NO...

2016-04-25 Thread sbcd90

Github user sbcd90 commented on the pull request:

https://github.com/apache/spark/pull/12666#issuecomment-214586830
  
Hi @srowen ,

changed to `IllegalArgumentException`.
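
For readers unfamiliar with the pattern, here is a generic sketch (not the code from this PR; the `label` function and its cases are invented for illustration) of turning a would-be `scala.MatchError` into an `IllegalArgumentException` by adding a fallback case.

```scala
object MatchSketch {
  // Hypothetical example: map a short code to a label.
  def label(code: String): String = code match {
    case "a" => "alpha"
    case "b" => "beta"
    // Without this fallback, an unknown code would raise scala.MatchError.
    case other =>
      throw new IllegalArgumentException(s"Unsupported code: $other")
  }
}
```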



[GitHub] spark pull request: [SPARK-12919][SPARKR] Implement dapply() on Da...

2016-04-25 Thread sun-rui

Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/12493#issuecomment-214585898
  
@shivaram, it may be related to the workaround for SPARK-14803, let me 
check it



[GitHub] spark pull request: [SPARK-14910] [SQL] Native DDL Command Support...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12679#issuecomment-214585606
  
**[Test build #56951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56951/consoleFull)**
 for PR 12679 at commit 
[`d0f203b`](https://github.com/apache/spark/commit/d0f203b7eec4892df323bc8c436a1d3545ebdc3b).



[GitHub] spark pull request: [SPARK-14803][SQL][Optimizer] A bug in Elimina...

2016-04-25 Thread sun-rui

Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/12575#issuecomment-214585552
  
OK. I will study the optimizer and try to come up with a better fix.



[GitHub] spark pull request: [SPARK-14747][SQL] Add assertStreaming/assertN...

2016-04-25 Thread lw-lin

Github user lw-lin commented on the pull request:

https://github.com/apache/spark/pull/12521#issuecomment-214585357
  
Updates: thanks to [[SPARK-14473][SQL] Define analysis rules to catch 
operations not supported in 
streaming](https://github.com/apache/spark/commit/775cf17eaaae1a38efe47b282b1d6bbdb99bd759),
 now we have friendly messages for most of the incorrect usages:
> Exception in thread "main" org.apache.spark.sql.AnalysisException: 
Queries with streaming sources must be executed with write.startStream();

and

> Exception in thread "main" org.apache.spark.sql.AnalysisException: 
Queries without streaming sources cannot be executed with write.startStream();

-

That leaves this patch capturing other incorrect usages such as calling 
`.trigger()`/`.queryName()` on non-continuous queries, or calling 
`bucketBy()`/`sortBy()` on continuous queries.
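
The kind of check described above might look roughly like the sketch below. `WriterSketch`, `requireStreaming`, and `requireBatch` are hypothetical names, and `IllegalStateException` is used only to keep the example free of Spark dependencies; none of this is taken from the patch itself.

```scala
// Hypothetical writer; names are illustrative, not Spark's API.
final class WriterSketch(isStreaming: Boolean) {
  private def requireStreaming(op: String): Unit =
    if (!isStreaming)
      throw new IllegalStateException(s"$op can only be called on continuous queries")

  private def requireBatch(op: String): Unit =
    if (isStreaming)
      throw new IllegalStateException(s"$op cannot be called on continuous queries")

  // Options that only make sense for continuous queries.
  def queryName(name: String): WriterSketch = { requireStreaming("queryName"); this }

  // Options that only make sense for non-continuous queries.
  def bucketBy(numBuckets: Int, column: String): WriterSketch = {
    requireBatch("bucketBy"); this
  }
}
```

Failing at the call site of the offending option, rather than later at execution time, is what gives the user a message that names the misused method.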



[GitHub] spark pull request: [SPARK-14910] [SQL] Native DDL Command Support...

2016-04-25 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/12679

[SPARK-14910] [SQL] Native DDL Command Support for Describe Function in 
Non-identifier Format

 What changes were proposed in this pull request?
The existing `Describe Function` only supports function names given as an 
`identifier`. This differs from Hive's behavior, which is why many `udf_abc` 
test cases in `HiveCompatibilitySuite` do not pass. For example, 
- udf_not.q
- udf_bitwise_not.q

This PR resolves these issues. `Describe Function` now supports function 
names in the following formats:
- `qualifiedName` (e.g., `db.func1`)
- `STRING` (e.g., `'func1'`)
- `comparisonOperator` (e.g., `<`)
- `arithmeticOperator` (e.g., `+`)
- `predicateOperator` (e.g., `or`)

Note: before this PR, native command support existed only when the 
function name was in the `qualifiedName` format.
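
To make the five formats concrete, the sketch below classifies a function-name argument by its surface form. It is an illustration only, not Spark's parser: the operator sets are deliberately small and incomplete, and `DescribeNameSketch` and `classify` are hypothetical names.

```scala
object DescribeNameSketch {
  // Illustrative, incomplete operator sets (assumptions, not the grammar).
  private val comparison = Set("=", "<", "<=", ">", ">=")
  private val arithmetic = Set("+", "-", "*", "/", "%")
  private val predicate  = Set("or", "and", "not")

  // One or more identifiers separated by dots, e.g. func1 or db.func1.
  private val qualified =
    """[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*"""

  def classify(name: String): String =
    if (name.length >= 2 && name.startsWith("'") && name.endsWith("'")) "STRING"
    else if (comparison.contains(name)) "comparisonOperator"
    else if (arithmetic.contains(name)) "arithmeticOperator"
    // Checked before the identifier pattern, since `or` also looks like one.
    else if (predicate.contains(name.toLowerCase)) "predicateOperator"
    else if (name.matches(qualified)) "qualifiedName"
    else "unknown"
}
```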
 How was this patch tested?
Added test cases in `DDLSuite.scala`. Also manually verified that all the 
related test cases in `HiveCompatibilitySuite` pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark descFunction

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12679


commit cc25003e8acb00e50b46e88f0b24b80a44626633
Author: gatorsmile 
Date:   2016-04-25T23:10:29Z

initial fix for desc function

commit d0f203b7eec4892df323bc8c436a1d3545ebdc3b
Author: gatorsmile 
Date:   2016-04-26T01:45:31Z

code clean





[GitHub] spark pull request: [SPARK-14828][SQL] Start SparkSession in REPL ...

2016-04-25 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12589#issuecomment-214584300
  
it'll be there later



[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-25 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r61018699
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -274,6 +339,12 @@ class KMeans @Since("1.5.0") (
   .setMaxIterations($(maxIter))
   .setSeed($(seed))
   .setEpsilon($(tol))
+
+if (isSet(initialModel)) {
+  require(rdd.first().size == $(initialModel).clusterCenters.head.size, "mismatched dimension")
--- End diff --

can you print the size vector in the message?
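
One way to act on this suggestion, sketched in plain Python rather than the PR's Scala: put both sizes in the failure message. The name `check_dimensions` and the sample sizes are hypothetical and not part of the PR.

```python
def check_dimensions(data_dim, center_dim):
    """Raise with both sizes so a mismatch is easy to diagnose."""
    if data_dim != center_dim:
        raise ValueError(
            "mismatched dimension: input vectors have size %d but the "
            "initial model's cluster centers have size %d"
            % (data_dim, center_dim))


# Hypothetical sizes, for illustration only.
try:
    check_dimensions(3, 2)
    message = None
except ValueError as e:
    message = str(e)
```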



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214583679
  
**[Test build #56950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56950/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).



[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-25 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r61018493
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedGeneralTypeParams.scala ---
@@ -0,0 +1,34 @@
+/*
--- End diff --

I think the filename should be `GenericTypeParams`, right? It would be nice 
to integrate this into the other parts of `Params`. Maybe we can use a Scala macro?



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214583172
  
**[Test build #2881 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2881/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214583192
  
**[Test build #2882 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2882/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).



[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214583158
  
**[Test build #2880 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2880/consoleFull)**
 for PR 12625 at commit 
[`e47fbf0`](https://github.com/apache/spark/commit/e47fbf0de63e78dfdc6b16b1d844dacb2aa09f68).



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214583134
  
**[Test build #2879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2879/consoleFull)**
 for PR 12672 at commit 
[`4d3f745`](https://github.com/apache/spark/commit/4d3f745c82b1ba6f833b0314060b611382f297f8).



[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214583108
  
**[Test build #2878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2878/consoleFull)**
 for PR 12672 at commit 
[`4d3f745`](https://github.com/apache/spark/commit/4d3f745c82b1ba6f833b0314060b611382f297f8).



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12259#issuecomment-214582968
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12259#issuecomment-214582969
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56946/
Test PASSed.



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12259#issuecomment-214582825
  
**[Test build #56946 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56946/consoleFull)**
 for PR 12259 at commit 
[`de7ea5d`](https://github.com/apache/spark/commit/de7ea5d4b852287c06f80b7319549461c8fe3a65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-25 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r61018069
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -258,6 +290,27 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)
 
+  /** @group setParam */
+  @Since("2.0.0")
+  def setInitialModel(value: KMeansModel): this.type = set(initialModel, value)
--- End diff --

I vote for throwing an error when overriding certain parameters.



[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-25 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r61017723
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -198,6 +231,17 @@ object KMeansModel extends MLReadable[KMeansModel] {
   val model = new KMeansModel(metadata.uid, new MLlibKMeansModel(clusterCenters))
 
   DefaultParamsReader.getAndSetParams(model, metadata)
+
--- End diff --

ditto. we may do it in `DefaultParamsReader.getAndSetParams`



[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-25 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r61017539
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -171,12 +192,23 @@ object KMeansModel extends MLReadable[KMeansModel] {
 
   /** [[MLWriter]] instance for [[KMeansModel]] */
   private[KMeansModel] class KMeansModelWriter(instance: KMeansModel) extends MLWriter {
+import org.json4s.JsonDSL._
 
 private case class Data(clusterCenters: Array[Vector])
 
 override protected def saveImpl(path: String): Unit = {
-  // Save metadata and Params
-  DefaultParamsWriter.saveMetadata(instance, path, sc)
+  if (instance.isSet(instance.initialModel)) {
+val initialModelPath = new Path(path, "initialModel").toString
+val initialModel = instance.getInitialModel
+initialModel.save(initialModelPath)
+
+// Save metadata and Params
+DefaultParamsWriter.saveMetadata(instance, path, sc, Some("hasInitialModel" -> true))
--- End diff --

Well, is it possible to handle this in `DefaultParamsWriter.saveMetadata`? 
The extra logic here seems to be boilerplate. 
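
A rough illustration of what moving this logic into a shared writer could mean, sketched in plain Python with the standard library only. Nothing here is Spark's API: `TinyModel`, `save_metadata`, the `nested` metadata key, and the file layout are hypothetical. The idea is that the shared helper detects param values that can persist themselves, saves them in a subdirectory, and records their names, so individual writers need no special case.

```python
import json
import os
import tempfile


class TinyModel(object):
    """Hypothetical stand-in for a persistable model."""

    def __init__(self, centers):
        self.centers = centers

    def save(self, path):
        os.makedirs(path)
        with open(os.path.join(path, "data.json"), "w") as f:
            json.dump({"centers": self.centers}, f)


def save_metadata(params, path):
    """Write plain params to metadata.json; save nested models beside it."""
    os.makedirs(path)
    metadata = {"params": {}, "nested": []}
    for name, value in sorted(params.items()):
        if hasattr(value, "save"):
            # Nested persistable value: store it in its own subdirectory.
            value.save(os.path.join(path, name))
            metadata["nested"].append(name)
        else:
            metadata["params"][name] = value
    with open(os.path.join(path, "metadata.json"), "w") as f:
        json.dump(metadata, f)
    return metadata


root = os.path.join(tempfile.mkdtemp(), "kmeans")
written = save_metadata({"k": 2, "initialModel": TinyModel([[0.0], [1.0]])}, root)
```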



[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-25 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r61017075
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -137,6 +138,17 @@ class KMeansModel private[ml] (
   @Since("1.6.0")
   override def write: MLWriter = new KMeansModel.KMeansModelWriter(this)
 
+  override def hashCode(): Int = {
+(Array(this.getClass, uid) ++ clusterCenters)
--- End diff --

@yinxusen what's the motivation to override the `hashCode` and `equals` 
here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214580247
  
**[Test build #56949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56949/consoleFull)**
 for PR 12672 at commit 
[`4d3f745`](https://github.com/apache/spark/commit/4d3f745c82b1ba6f833b0314060b611382f297f8).



[GitHub] spark pull request: [SPARK-14908] [YARN] Provide support HDFS-loca...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12678#issuecomment-214580205
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016865
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -141,6 +144,71 @@ def getEvaluator(self):
 """
 return self.getOrDefault(self.evaluator)
 
+def _transfer_param_map_to_java_impl(self, pyParamMap, java_obj):
+"""
+Transfer a Python ParamMap to a Java ParamMap which belongs to an 
Java estimator of
+ValidatorParams.
+This utility method helps CrossValidator and TrainValidationSplit 
implementing their
+_transfer_param_map_to_java().
+"""
+estimator, epms, evaluator, seed = self._to_java_impl()
--- End diff --

This is getting the Python Params from the class, not from pyParamMap.  It 
should get them from pyParamMap.



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214580036
  
I added some comments.  I think the tuning.py changes could be simplified 
somewhat, but I'll need to return to this later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14908] [YARN] Provide support HDFS-loca...

2016-04-25 Thread mikhaildubkov

GitHub user mikhaildubkov opened a pull request:

https://github.com/apache/spark/pull/12678

[SPARK-14908] [YARN] Provide support HDFS-located resources for spark…

The main goal of these changes is to support HDFS resources in 
"spark.executor.extraClassPath" on Hadoop/YARN deployments.
This can be helpful when you want to use a custom SparkSerializer 
implementation (our project's case).

How it works with these changes:
1. The value of "spark.executor.extraClassPath" is split by comma
2. Iterate over all paths and select those that start with "hdfs://"
3. Generate a link for each path and add a LocalResource to the executor 
launch context's local resources
4. Add the generated links to the executor CLASSPATH
5. NodeManager loads the specified local resources into the application cache
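
Steps 1 to 4 can be sketched in plain Python as below. This is not the patch itself, which is Scala code against the YARN API: `plan_extra_classpath` is a hypothetical name, the link naming scheme is an assumption, and the YARN LocalResource objects are modeled as a plain dict from link name to HDFS path. Step 5 happens inside NodeManager and is not modeled.

```python
def plan_extra_classpath(value):
    """Split a comma-separated classpath and plan links for hdfs:// entries.

    Returns (local_resources, classpath), where local_resources maps each
    generated link name to its HDFS path and classpath is what the executor
    would see.
    """
    entries = [p.strip() for p in value.split(",") if p.strip()]
    local_resources = {}
    classpath = []
    for entry in entries:
        if entry.startswith("hdfs://"):
            # Assumed naming scheme: a counter plus the file name.
            link = "__extra_cp_%d_%s" % (len(local_resources),
                                         entry.rsplit("/", 1)[-1])
            local_resources[link] = entry
            classpath.append("./" + link)
        else:
            # Non-HDFS entries are passed through unchanged.
            classpath.append(entry)
    return local_resources, classpath


resources, classpath = plan_extra_classpath(
    "hdfs://namenode/libs/serializer.jar,/opt/libs/other.jar")
```

Passing non-HDFS entries through untouched is what keeps existing "spark.executor.extraClassPath" values working as before.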

After that, you do not need to deploy extra resources to each Hadoop node 
manually; it happens automatically.

The changes are fully backward compatible and do not break any existing 
"spark.executor.extraClassPath" usages.

This patch was tested manually on our Hadoop cluster (4 nodes).



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mikhaildubkov/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12678


commit a4f1c10a3f0f10b9f18ca61e599f50a1e17ba8bd
Author: Mikhail Dubkov 
Date:   2016-04-26T00:23:42Z

[SPARK-14908] [YARN] Provide support HDFS-located resources for 
spark.executor.extraClasspath on YARN





[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016852
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -684,6 +689,52 @@ def _compare_pipelines(self, m1, m2):
 self.assertEqual(len(m1.stages), len(m2.stages))
 for s1, s2 in zip(m1.stages, m2.stages):
 self._compare_pipelines(s1, s2)
+elif isinstance(m1, OneVsRestParams):
+# Check the equality of classifiers (value and parent).
+self._compare_pipelines(m1.getClassifier(), m2.getClassifier())
+self.assertEqual(m1.classifier.parent, m2.classifier.parent)
+
+# Check the equality of other params (value and parent).
+for p in m1.params:
+if p.name != "classifier":
+self.assertEqual(m1.getOrDefault(p), m2.getOrDefault(p))
+self.assertEqual(p.parent, m2.getParam(p.name).parent)
+
+# Check extra attributes of OneVsRestModel.
+if isinstance(m1, OneVsRestModel):
+self.assertEqual(len(m1.models), len(m2.models))
+for x, y in zip(m1.models, m2.models):
+self._compare_pipelines(x, y)
+elif isinstance(m1, ValidatorParams):
+# Check the equality of estimators (value and parent).
+self._compare_pipelines(m1.getEstimator(), m2.getEstimator())
+self.assertEqual(m1.estimator.parent, m2.estimator.parent)
+
+# Check the equality of evaluators (value and parent).
+self._compare_pipelines(m1.getEvaluator(), m2.getEvaluator())
+self.assertEqual(m1.evaluator.parent, m2.evaluator.parent)
+
+# Check the equality of estimator parameter maps (value and parent).
+self.assertEqual(len(m1.getEstimatorParamMaps()), len(m2.getEstimatorParamMaps()))
+for epm1, epm2 in zip(m1.getEstimatorParamMaps(), m2.getEstimatorParamMaps()):
+self.assertEqual(len(epm1), len(epm2))
+for pair in epm1:
+self.assertIn(pair, epm2)
--- End diff --

Check value.  If value is an instance of ```Params```, then call 
```_compare_pipelines``` recursively on it.
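
A minimal sketch of the value check described here, in plain Python without pyspark. `ParamsSketch`, `values_match`, and `param_maps_match` are hypothetical names, and param maps are modeled as plain dicts from name to value. The point is that a nested params object is compared structurally instead of with `assertIn`, which relies on plain equality.

```python
class ParamsSketch(object):
    """Hypothetical stand-in for an object carrying params (not pyspark's Params)."""

    def __init__(self, **values):
        self.values = values


def values_match(v1, v2):
    """Compare two param values, recursing into nested ParamsSketch objects."""
    if isinstance(v1, ParamsSketch) and isinstance(v2, ParamsSketch):
        if set(v1.values) != set(v2.values):
            return False
        return all(values_match(v1.values[k], v2.values[k]) for k in v1.values)
    return v1 == v2


def param_maps_match(epm1, epm2):
    """Compare two param maps (dicts of name -> value) key by key."""
    if set(epm1) != set(epm2):
        return False
    return all(values_match(epm1[k], epm2[k]) for k in epm1)


inner_a = ParamsSketch(maxIter=5)
inner_b = ParamsSketch(maxIter=5)
same = param_maps_match({"estimator": inner_a}, {"estimator": inner_b})
different = param_maps_match({"estimator": inner_a},
                             {"estimator": ParamsSketch(maxIter=10)})
```

Here `inner_a == inner_b` is false because they are distinct objects, yet the recursive check treats them as equal, which is the case a membership test would miss.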



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016862
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -26,10 +27,10 @@
 from pyspark.ml.util import JavaMLWriter, JavaMLReader, MLReadable, MLWritable
 from pyspark.ml.wrapper import JavaParams
 from pyspark.sql.functions import rand
-from pyspark.mllib.common import inherit_doc, _py2java
+from pyspark.mllib.common import inherit_doc, _py2java, _java2py
 
 __all__ = ['ParamGridBuilder', 'CrossValidator', 'CrossValidatorModel', 'TrainValidationSplit',
-   'TrainValidationSplitModel']
+   'TrainValidationSplitModel', 'ValidatorParams']
--- End diff --

Could you please put these on 4 different lines?
* ParamGridBuilder
* CrossValidator, CrossValidatorModel
* TrainValidationSplit, TrainValidationSplitModel
* ValidatorParams



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016854
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -746,6 +797,70 @@ def test_nested_pipeline_persistence(self):
 except OSError:
 pass
 
+# TODO: Add OneVsRest as part of nested meta-algorithms.
+def test_nested_meta_algorithms_persistence(self):
+sqlContext = SQLContext(self.sc)
+temp_path = tempfile.mkdtemp()
+try:
+df = sqlContext.createDataFrame([
+Row(label=0.0, features=Vectors.dense(1.0, 0.8)),
+Row(label=0.0, features=Vectors.dense(0.8, 0.8)),
+Row(label=0.0, features=Vectors.dense(1.0, 1.2)),
+Row(label=1.0, features=Vectors.sparse(2, [], [])),
+Row(label=1.0, features=Vectors.sparse(2, [0], [0.1])),
+Row(label=1.0, features=Vectors.sparse(2, [1], [0.1]))])
+
+lr = LogisticRegression()
+
+# Check the estimator of CrossValidator(TrainValidationSplit(LogisticRegression))
+tvs_grid = ParamGridBuilder().addGrid(lr.maxIter, [5, 10]).build()
--- End diff --

Use maxIter = 1,2 to make it faster.  Same elsewhere.



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016847
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -674,8 +676,11 @@ def _compare_pipelines(self, m1, m2):
 if isinstance(m1, JavaParams):
 self.assertEqual(len(m1.params), len(m2.params))
 for p in m1.params:
-self.assertEqual(m1.getOrDefault(p), m2.getOrDefault(p))
-self.assertEqual(p.parent, m2.getParam(p.name).parent)
+# Prevent key not found error in case of some param neither in paramMap and
+# defaultParamMap.
+if p in m1._paramMap or p in m1._defaultParamMap:
--- End diff --

Use ```m1.isDefined```.
Also, how about creating a helper method:
```
def _compare_param(m1, m2, paramName)
```
which implements these 3 lines?  That should eliminate some duplicate code.
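
A possible shape for that helper, written as a test-case method (so it takes `self`, unlike the signature above). To stay runnable without pyspark, the sketch uses hypothetical stand-ins `FakeParam` and `FakeParams` that expose only `isDefined`, `getOrDefault`, and `getParam`; the class name `PersistenceChecks` is likewise made up.

```python
import unittest


class FakeParam(object):
    """Hypothetical stand-in for a Param: just a parent uid and a name."""

    def __init__(self, parent, name):
        self.parent = parent
        self.name = name


class FakeParams(object):
    """Hypothetical stand-in exposing the few methods the helper needs."""

    def __init__(self, uid, values):
        self.uid = uid
        self._values = dict(values)

    def isDefined(self, name):
        return name in self._values

    def getOrDefault(self, name):
        return self._values[name]

    def getParam(self, name):
        return FakeParam(self.uid, name)


class PersistenceChecks(unittest.TestCase):
    def _compare_param(self, m1, m2, paramName):
        # Skip params that have neither an explicit nor a default value.
        if m1.isDefined(paramName):
            self.assertEqual(m1.getOrDefault(paramName),
                             m2.getOrDefault(paramName))
            self.assertEqual(m1.getParam(paramName).parent,
                             m2.getParam(paramName).parent)


a = FakeParams("lr_1", {"maxIter": 5})
b = FakeParams("lr_1", {"maxIter": 5})
case = PersistenceChecks()
case._compare_param(a, b, "maxIter")    # defined and equal: passes
case._compare_param(a, b, "threshold")  # undefined on m1: skipped
```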



[GitHub] spark pull request: [SPARK-14870][SQL][FOLLOW-UP] Move decimalData...

2016-04-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12674



[GitHub] spark pull request: [SPARK-14870][SQL][FOLLOW-UP] Move decimalData...

2016-04-25 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12674#issuecomment-214579476
  
Merging in master.




[GitHub] spark pull request: [SPARK-14861][SQL] Replace internal usages of ...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12625#issuecomment-214577259
  
**[Test build #2875 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2875/consoleFull)**
 for PR 12625 at commit 
[`4c427c0`](https://github.com/apache/spark/commit/4c427c0c284d4350c9495b748a00e90fea6cad5d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12676#issuecomment-214576212
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56947/
Test PASSed.



[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12676#issuecomment-214576210
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12676#issuecomment-214576138
  
**[Test build #56947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56947/consoleFull)** for PR 12676 at commit [`b237877`](https://github.com/apache/spark/commit/b2378773474ac81cfd0bc1182e2841659527fbfc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14870][SQL][FOLLOW-UP] Move decimalData...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12674#issuecomment-214575750
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14870][SQL][FOLLOW-UP] Move decimalData...

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12674#issuecomment-214575751
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56942/
Test PASSed.



[GitHub] spark pull request: [SPARK-14870][SQL][FOLLOW-UP] Move decimalData...

2016-04-25 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12674#issuecomment-214575594
  
**[Test build #56942 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56942/consoleFull)** for PR 12674 at commit [`f1659c6`](https://github.com/apache/spark/commit/f1659c60fbf5b08820f6683d0417288ffbc6ebc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14902][SQL] Expose RuntimeConfig in Spa...

2016-04-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12669



[GitHub] spark pull request: [SPARK-14902][SQL] Expose RuntimeConfig in Spa...

2016-04-25 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12669#issuecomment-214575022
  
LGTM - merging in master.



[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12560#issuecomment-214574485
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56945/
Test PASSed.



[GitHub] spark pull request: [SPARK-14654][CORE][WIP] New accumulator API

2016-04-25 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12612#discussion_r61014595
  
--- Diff: core/src/main/scala/org/apache/spark/NewAccumulator.scala ---
@@ -0,0 +1,356 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.{lang => jl}
+import java.io.{ObjectInputStream, ObjectOutputStream}
+import java.util.concurrent.atomic.AtomicLong
+import javax.annotation.concurrent.GuardedBy
+
+import org.apache.spark.scheduler.AccumulableInfo
+import org.apache.spark.util.Utils
+
+
+private[spark] case class AccumulatorMetadata(
+    id: Long,
+    name: Option[String],
+    countFailedValues: Boolean) extends Serializable
+
+trait UpdatedValue extends Serializable
+
+private[spark] class UpdatedValueString(s: String) extends UpdatedValue {
+  override def toString: String = s
+}
+
+private[spark] case class AccumulatorUpdates(
+    id: Long,
+    name: Option[String],
+    countFailedValues: Boolean,
+    value: UpdatedValue) extends Serializable
+
+abstract class NewAccumulator[IN, OUT] extends Serializable {
+  private[spark] var metadata: AccumulatorMetadata = _
+  private[this] var atDriverSide = true
+
+  private[spark] def register(
+      sc: SparkContext,
+      name: Option[String] = None,
+      countFailedValues: Boolean = false): Unit = {
+    if (this.metadata != null) {
+      throw new IllegalStateException("Cannot register an Accumulator twice.")
+    }
+    this.metadata = AccumulatorMetadata(AccumulatorContext.newId(), name, countFailedValues)
+    AccumulatorContext.register(this)
+    sc.cleaner.foreach(_.registerAccumulatorForCleanup(this))
+  }
+
+  def id: Long = {
+    assert(metadata != null, "Cannot get accumulator id with null metadata")
+    metadata.id
+  }
+
+  def name: Option[String] = {
+    assert(metadata != null, "Cannot get accumulator name with null metadata")
+    metadata.name
+  }
+
+  def countFailedValues: Boolean = {
+    assert(metadata != null, "Cannot get accumulator countFailedValues with null metadata")
+    metadata.countFailedValues
+  }
+
+  private[spark] def toInfo(update: Option[Any], value: Option[Any]): AccumulableInfo = {
+    val isInternal = name.exists(_.startsWith(InternalAccumulator.METRICS_PREFIX))
+    new AccumulableInfo(id, name, update, value, isInternal, countFailedValues)
+  }
+
+  final private[spark] def isAtDriverSide: Boolean = atDriverSide
+
+  final def isRegistered: Boolean =
+    metadata != null && AccumulatorContext.originals.containsKey(metadata.id)
+
+  def initialize(): Unit = {}
+
+  def add(v: IN): Unit
+
+  def +=(v: IN): Unit = add(v)
+
+  def updatedValue: UpdatedValue
+
+  def isNoOp(updates: UpdatedValue): Boolean
+
+  def applyUpdates(updates: UpdatedValue): Unit
+
+  final def value: OUT = {
+    if (atDriverSide) {
+      localValue
+    } else {
+      throw new UnsupportedOperationException("Can't read accumulator value in task")
+    }
+  }
+
+  def localValue: OUT
+
+  private[spark] def getUpdates: AccumulatorUpdates =
+    AccumulatorUpdates(id, name, countFailedValues, updatedValue)
+
+  // Called by Java when serializing an object
+  private def writeObject(out: ObjectOutputStream): Unit = Utils.tryOrIOException {
+    if (atDriverSide && !isRegistered) {
+      throw new IllegalStateException(
+        "Accumulator must be registered before serialize and send to executor")
+    }
+    out.defaultWriteObject()
+  }
+
+  // Called by Java when deserializing an object
+  private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException {
+    in.defaultReadObject()
+    initialize()
+    atDriverSide = false
+
+    // Automatically register the

[GitHub] spark pull request: [SPARK-14904][SQL] Put removed HiveContext in ...

2016-04-25 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12672#issuecomment-214574532
  
Some of my own libraries were able to compile against this, so +1 from me.



[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12560#issuecomment-214574484
  
Merged build finished. Test PASSed.


