date:20140910

[GitHub] spark pull request: (WIP) [SPARK-3454] Expose JSON representation ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2333#issuecomment-55226854
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20141/consoleFull)
 for   PR 2333 at commit 
[`f7958b0`](https://github.com/apache/spark/commit/f7958b04508e02d0f58895a3a8cc3e0b6fef33ab).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: (WIP) [SPARK-3454] Expose JSON representation ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2333#issuecomment-55226767
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20141/consoleFull)
 for   PR 2333 at commit 
[`f7958b0`](https://github.com/apache/spark/commit/f7958b04508e02d0f58895a3a8cc3e0b6fef33ab).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-10 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2291#issuecomment-55225283
  
@davies Oh, actually I didn't even realize this issue also exists in 
PySpark. So basically I only need to rewrite `StructField.__repr__` and quote 
the field name, right?
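
As an illustration of the quoting idea, here is a minimal sketch in plain Python. The class below is a simplified stand-in, not PySpark's actual `StructField`; the constructor arguments, the `StructField(name,type,nullable)` output layout, and the use of `json.dumps` for quoting are all assumptions made for the example.

```python
import json


class StructField(object):
    """Simplified stand-in for a schema field (hypothetical, not PySpark's class)."""

    def __init__(self, name, data_type, nullable=True):
        self.name = name
        self.data_type = data_type
        self.nullable = nullable

    def __repr__(self):
        # json.dumps wraps the name in double quotes and escapes any embedded
        # quotes or backslashes, so names with spaces or commas stay unambiguous.
        return "StructField(%s,%s,%s)" % (
            json.dumps(self.name), self.data_type, str(self.nullable).lower())


print(repr(StructField("a b,c", "IntegerType")))
```

With the name quoted, a field called `a b,c` can no longer be confused with the separators of the surrounding representation.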



[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-10 Thread baishuo

Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-55225133
  
Steps to verify this PR with SparkSQLCliDriver:
First, create two tables.
Run the following SQL:




[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2351#issuecomment-55224983
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/66/consoleFull)
 for   PR 2351 at commit 
[`0a5b6eb`](https://github.com/apache/spark/commit/0a5b6ebcd38f13fa15721c56a9d96bd9000529f5).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StatsParam(AccumulatorParam):`




[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2291#issuecomment-55224360
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/60/consoleFull)
 for   PR 2291 at commit 
[`f3d8c98`](https://github.com/apache/spark/commit/f3d8c98c3360220a5308eedf915c5772ff91a9fb).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2144#issuecomment-55224000
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/64/consoleFull)
 for   PR 2144 at commit 
[`fae8b19`](https://github.com/apache/spark/commit/fae8b19dd5e69b7e19ca8436ecba1614a3bc76b6).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Dummy(object):`




[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread chouqin

Github user chouqin commented on the pull request:

https://github.com/apache/spark/pull/2341#issuecomment-55223947
  
Can we change the fields from `val` to `var`? `leftNode` and `rightNode` 
are already `var`s, so I wonder whether the other fields could be changed too.



[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread witgo

Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55223673
  
@allwefantasy  Spark lets you adjust the number of tasks an executor runs 
concurrently. If you want each executor to run 17 tasks at the same time, 
add the following setting to `conf/spark-defaults.conf`:

 spark.executor.cores 17
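
If the setting has to be applied by a script rather than by hand, a small helper along these lines could work. This is a sketch in plain Python that only manipulates text in the whitespace-separated `key value` layout shown above; the function name `set_property` is hypothetical and nothing here calls into Spark.

```python
def set_property(conf_text, key, value):
    """Return conf_text with `key` set to `value`.

    An existing entry for `key` is replaced; otherwise a new line is appended.
    Comment lines (starting with '#') and blank lines are left untouched.
    """
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        parts = stripped.split(None, 1)
        if stripped and not stripped.startswith("#") and parts[0] == key:
            out.append("%s %s" % (key, value))
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append("%s %s" % (key, value))
    return "\n".join(out) + "\n"


updated = set_property("# defaults\nspark.master local[4]\n",
                       "spark.executor.cores", 17)
print(updated)
```

Running the helper twice with the same arguments leaves a single `spark.executor.cores` line, so it is safe to call from a provisioning script.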



[GitHub] spark pull request: [SPARK-2890][SQL] Allow reading of data when c...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2209#issuecomment-55223584
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/59/consoleFull)
 for   PR 2209 at commit 
[`a703ff4`](https://github.com/apache/spark/commit/a703ff41743c566fee25c7121dbd077c6a52b021).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class LowerCaseSchema(child: LogicalPlan) extends UnaryNode with Logging`




[GitHub] spark pull request: [mllib] DecisionTree: Add minInstancesPerNode,...

2014-09-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2349#discussion_r17405363
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -706,8 +706,8 @@ class DecisionTreeSuite extends FunSuite with LocalSparkContext {
     assert(bestInfoStats == InformationGainStats.invalidInformationGainStats)
   }
 
-  test("don't choose split that doesn't satisfy min instance per node requirements") {
-    // if a split doesn't satisfy min instances per node requirements,
+  test("do not choose split that does not satisfy min instance per node requirements") {
+    // if a split does not satisfy min instances per node requirements,
--- End diff --

It sounds reasonable, thanks.



[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55222920
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20138/consoleFull)
 for   PR 2330 at commit 
[`6e84cb2`](https://github.com/apache/spark/commit/6e84cb22460660880c557b6a0d11945c0bbeee0e).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55222753
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20139/consoleFull)
 for   PR 2336 at commit 
[`7e4ad04`](https://github.com/apache/spark/commit/7e4ad040c0a7e38bb86128659e85d773137dfbab).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2291#issuecomment-55222662
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20140/consoleFull)
 for   PR 2291 at commit 
[`bb452c8`](https://github.com/apache/spark/commit/bb452c8ad113c65fe4591dd4a25ab3678657f833).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55222663
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20139/consoleFull)
 for   PR 2336 at commit 
[`7e4ad04`](https://github.com/apache/spark/commit/7e4ad040c0a7e38bb86128659e85d773137dfbab).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/2341#issuecomment-5506
  
@chouqin  Thanks for looking at the PR!  I wanted to allocate a root node 
beforehand, but the problem is that the member data in Node is not all mutable. 
 Let me know, though, if you see a way around it.

Caching the impurity sounds good; I'll try to incorporate that.
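
To picture the two ideas in this comment (allocating the root before its contents are known, and caching impurity on the node), here is a minimal sketch in Python rather than MLlib's Scala. The `Node` class, its field names, the `fill_in` helper, and the choice of Gini impurity are all hypothetical and only illustrate the pattern.

```python
class Node(object):
    """Hypothetical tree node whose fields can all be reassigned after construction."""

    def __init__(self, node_id):
        self.id = node_id
        self.predict = None     # filled in once label counts are known
        self.impurity = None    # cached so it does not have to be recomputed
        self.left_node = None
        self.right_node = None


def gini(label_counts):
    """Gini impurity of a list of per-class counts (assumed impurity measure)."""
    total = float(sum(label_counts))
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in label_counts)


def fill_in(node, label_counts):
    """Populate a pre-allocated node in place and return it."""
    node.impurity = gini(label_counts)
    node.predict = max(range(len(label_counts)), key=lambda k: label_counts[k])
    return node


root = Node(1)            # allocated before any statistics exist
fill_in(root, [3, 1])     # populated later, once counts are available
print(root.predict, root.impurity)
```

Because every field is mutable, the node can be handed out early and completed later; with immutable fields the node would have to be rebuilt instead.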



[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-10 Thread adrian-wang

Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/2344#issuecomment-55222163
  
Ha, first pass in recent days!



[GitHub] spark pull request: [mllib] DecisionTree: Add minInstancesPerNode,...

2014-09-10 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/2349#issuecomment-55221916
  
@davies  Thanks for taking a look!



[GitHub] spark pull request: [mllib] DecisionTree: Add minInstancesPerNode,...

2014-09-10 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/2349#discussion_r17404967
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -706,8 +706,8 @@ class DecisionTreeSuite extends FunSuite with LocalSparkContext {
     assert(bestInfoStats == InformationGainStats.invalidInformationGainStats)
   }
 
-  test("don't choose split that doesn't satisfy min instance per node requirements") {
-    // if a split doesn't satisfy min instances per node requirements,
+  test("do not choose split that does not satisfy min instance per node requirements") {
+    // if a split does not satisfy min instances per node requirements,
--- End diff --

Not really a typo.  But I figured that, if people are munging logs from 
tests, quote characters might be troublesome to deal with.
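
A small illustration of the kind of trouble meant here, assuming a log-processing script that tokenizes lines with Python's standard `shlex` module (the `tokenize` helper and the sample strings are only for demonstration):

```python
import shlex


def tokenize(line):
    """Split a log line with shell-style rules; return None if quoting is unbalanced."""
    try:
        return shlex.split(line)
    except ValueError:
        # shlex raises ValueError when a quote character is never closed.
        return None


# Two apostrophes are treated as a pair of quotes and swallow the words between them.
print(tokenize("don't choose split that doesn't satisfy"))
# A single apostrophe is an unterminated quote.
print(tokenize("if a split doesn't satisfy"))
# Without apostrophes the line splits as expected.
print(tokenize("do not choose split"))
```

The first call silently merges several words into one token and the second fails outright, while the apostrophe-free wording splits cleanly.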



[GitHub] spark pull request: [SPARK-3407][SQL][wip]Add Date type support

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2344#issuecomment-55221787
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20136/consoleFull)
 for   PR 2344 at commit 
[`7269bba`](https://github.com/apache/spark/commit/7269bba52e12e646e440b89fbbff82f12995c6d5).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with Logging`




[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2292#issuecomment-55221458
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/61/consoleFull)
 for   PR 2292 at commit 
[`9468ab0`](https://github.com/apache/spark/commit/9468ab0cc210f444fbc18ebd34dc99ba19636499).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread allwefantasy

Github user allwefantasy commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55221324
  
@witgo   Can the following piece of code be multithreaded?


    for (i <- 0 until content.length) {
      val term = content(i)
      val topic = topics(i)
      val chosenTopic = topicModel.dropOneDistSampler(topicsDist, topicThisTerm,
        rand, term, topic)
      if (topic != chosenTopic) {
        topics(i) = chosenTopic
        topicsDist(topic) += -1
        topicsDist(chosenTopic) += 1
        topicModel.update(term, topic, -1)
        topicModel.update(term, chosenTopic, 1)
      }
    }


Change this code to:

    content.zipWithIndex.map(f=>f._2).toList.par.foreach{i=>
      val term = content(i)
      val topic = topics(i)
      val chosenTopic = topicModel.dropOneDistSampler(topicsDist, topicThisTerm,
        rand, term, topic)
      if (topic != chosenTopic) {
        topics(i) = chosenTopic
        topicsDist(topic) += -1
        topicsDist(chosenTopic) += 1
        topicModel.update(term, topic, -1)
        topicModel.update(term, chosenTopic, 1)
      }
    }

My current situation is that a single machine has many CPU cores (24) but 
limited memory, so I would like to multithread part of the code.



[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-10 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2291#issuecomment-55220938
  
@liancheng Do you plan to fix this in Python?



[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2351#issuecomment-55220913
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/66/consoleFull)
 for   PR 2351 at commit 
[`0a5b6eb`](https://github.com/apache/spark/commit/0a5b6ebcd38f13fa15721c56a9d96bd9000529f5).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55220824
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/65/consoleFull)
 for   PR 2336 at commit 
[`fbe9029`](https://github.com/apache/spark/commit/fbe902942449d8b112a4f6ca1e0be6968218213c).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55220766
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/65/consoleFull)
 for   PR 2336 at commit 
[`fbe9029`](https://github.com/apache/spark/commit/fbe902942449d8b112a4f6ca1e0be6968218213c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2144#issuecomment-55220156
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/64/consoleFull)
 for   PR 2144 at commit 
[`fae8b19`](https://github.com/apache/spark/commit/fae8b19dd5e69b7e19ca8436ecba1614a3bc76b6).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-09-10 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2144#issuecomment-55220021
  
Thanks to @shaneknapp we now have `pypy-2.0.2-1.el6.x86_64` on the Jenkins 
workers, so I'm going to try retesting this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55219321
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20137/consoleFull)
 for   PR 2330 at commit 
[`d135fa3`](https://github.com/apache/spark/commit/d135fa38801467b0dd870063c00103ddd45438c7).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55219255
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20138/consoleFull)
 for   PR 2330 at commit 
[`6e84cb2`](https://github.com/apache/spark/commit/6e84cb22460660880c557b6a0d11945c0bbeee0e).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55219182
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/62/consoleFull)
 for   PR 2336 at commit 
[`fbe9029`](https://github.com/apache/spark/commit/fbe902942449d8b112a4f6ca1e0be6968218213c).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55219205
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/63/consoleFull)
 for   PR 2336 at commit 
[`fbe9029`](https://github.com/apache/spark/commit/fbe902942449d8b112a4f6ca1e0be6968218213c).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

2014-09-10 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2327#issuecomment-55219180
  
@marmbrus Please help review this one. `HiveTableScan` is also optimized 
BTW.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55219154
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/63/consoleFull)
 for   PR 2336 at commit 
[`fbe9029`](https://github.com/apache/spark/commit/fbe902942449d8b112a4f6ca1e0be6968218213c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3463] [PySpark] aggregate and show spil...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2336#issuecomment-55219139
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/62/consoleFull)
 for   PR 2336 at commit 
[`fbe9029`](https://github.com/apache/spark/commit/fbe902942449d8b112a4f6ca1e0be6968218213c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2351#issuecomment-55219072
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20135/consoleFull)
 for   PR 2351 at commit 
[`0a5b6eb`](https://github.com/apache/spark/commit/0a5b6ebcd38f13fa15721c56a9d96bd9000529f5).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StatsParam(AccumulatorParam):`




[GitHub] spark pull request: [mllib] DecisionTree: Add minInstancesPerNode,...

2014-09-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2349#discussion_r17404100
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -706,8 +706,8 @@ class DecisionTreeSuite extends FunSuite with 
LocalSparkContext {
 assert(bestInfoStats == 
InformationGainStats.invalidInformationGainStats)
   }
 
-  test("don't choose split that doesn't satisfy min instance per node 
requirements") {
-// if a split doesn't satisfy min instances per node requirements,
+  test("do not choose split that does not satisfy min instance per node 
requirements") {
+// if a split does not satisfy min instances per node requirements,
--- End diff --

Why is "don't" considered a typo?



[GitHub] spark pull request: [mllib] DecisionTree: Add minInstancesPerNode,...

2014-09-10 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2349#issuecomment-55219010
  
This patch looks good to me, just one minor question.



[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2292#issuecomment-55218285
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/61/consoleFull)
 for   PR 2292 at commit 
[`9468ab0`](https://github.com/apache/spark/commit/9468ab0cc210f444fbc18ebd34dc99ba19636499).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2291#issuecomment-55218265
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/60/consoleFull)
 for   PR 2291 at commit 
[`f3d8c98`](https://github.com/apache/spark/commit/f3d8c98c3360220a5308eedf915c5772ff91a9fb).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2890][SQL] Allow reading of data when c...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2209#issuecomment-55218135
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/59/consoleFull)
 for   PR 2209 at commit 
[`a703ff4`](https://github.com/apache/spark/commit/a703ff41743c566fee25c7121dbd077c6a52b021).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2781][SQL] Check resolution of LogicalP...

2014-09-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1706



[GitHub] spark pull request: [SPARK-2314][SQL] Override collect and take in...

2014-09-10 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1592#issuecomment-55216993
  
I just merged #2323 to master.



[GitHub] spark pull request: [SPARK-3447][SQL] Remove explicit conversion w...

2014-09-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2323



[GitHub] spark pull request: [SQL] Add test case with workaround for readin...

2014-09-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2340



[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-10 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2350#issuecomment-55216927
  
No problem. This is actually only the first set of changes I intend to push 
out. There is still more clean up to be done in other parts of the Yarn code, 
but I decided to leave it out for this PR. I can submit smaller patches for the 
rest of it.



[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread chouqin

Github user chouqin commented on the pull request:

https://github.com/apache/spark/pull/2341#issuecomment-55216875
  
@jkbradley Thanks for your nice work. I have read your code and have just 
one question:

Can we allocate a root node before the loop in `train()`, and allocate the 
left and right child nodes for the next level after choosing a split, so that 
`DecisionTree.findBestSplits` can just return `doneTraining`? During the 
split-choosing step, which iterates over all the nodes, it would only set 
fields of the current node and would not allocate a new node. This seems 
easier for me to understand, because it handles all levels in the same way.

What's more, I think that after choosing the best split and allocating the 
left and right child nodes, we can set the impurity of both children, which 
avoids recomputing the impurity in `calculateGainForSplit`. This saving may be 
negligible when (number of splits * number of features * number of classes) is 
small, though.

This is just a suggestion; ignore it if you don't think it helps :).
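
To make the suggestion concrete, here is a minimal Scala sketch of the idea, not the MLlib implementation. `Node`, `SplitChoice`, `chooseSplit`, and `processLevel` are hypothetical stand-ins, and the toy split rule (halve the impurity until it falls below 0.1) is an assumption made only so the example runs on its own.

```scala
// Hypothetical, simplified stand-in for a tree node; not the MLlib Node class.
final class Node(val id: Int, var impurity: Double) {
  var isLeaf: Boolean = true
  var left: Option[Node] = None
  var right: Option[Node] = None
}

// Illustrative result of choosing a split for one node.
final case class SplitChoice(gain: Double, leftImpurity: Double, rightImpurity: Double)

object LevelWiseSketch {
  // Toy split chooser (an assumption): split while the impurity is at least 0.1,
  // giving each child half of the parent's impurity.
  def chooseSplit(node: Node): Option[SplitChoice] =
    if (node.impurity < 0.1) None
    else Some(SplitChoice(node.impurity / 2, node.impurity / 2, node.impurity / 2))

  // Sets fields on the already-allocated nodes of one level and allocates their
  // children. An empty result means there is nothing left to train.
  def processLevel(level: Seq[Node]): Seq[Node] =
    level.flatMap { node =>
      chooseSplit(node) match {
        case None => Seq.empty[Node]
        case Some(s) =>
          node.isLeaf = false
          // Child impurities are stored at allocation time, so the next level
          // can read them instead of recomputing them.
          val l = new Node(2 * node.id, s.leftImpurity)
          val r = new Node(2 * node.id + 1, s.rightImpurity)
          node.left = Some(l)
          node.right = Some(r)
          Seq(l, r)
      }
    }

  def train(rootImpurity: Double, maxDepth: Int): Node = {
    val root = new Node(1, rootImpurity) // allocated once, before the loop
    var level: Seq[Node] = Seq(root)
    var depth = 0
    while (level.nonEmpty && depth < maxDepth) {
      level = processLevel(level) // every level, including the root, is handled the same way
      depth += 1
    }
    root
  }

  def countNodes(n: Node): Int =
    1 + n.left.map(countNodes).getOrElse(0) + n.right.map(countNodes).getOrElse(0)
}
```

In this sketch the emptiness of the returned level plays the role of `doneTraining`; a real implementation would still need to aggregate statistics per level before choosing splits.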



[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2341#issuecomment-55216852
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20130/consoleFull)
 for   PR 2341 at commit 
[`5c4ac33`](https://github.com/apache/spark/commit/5c4ac3303fcf94bb5cbbc272013a88ff8c4e7749).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2781][SQL] Check resolution of LogicalP...

2014-09-10 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1706#issuecomment-55216814
  
Thanks for doing this!  Merged to master.



[GitHub] spark pull request: [SPARK-3447][SQL] Remove explicit conversion w...

2014-09-10 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2323#issuecomment-55216743
  
Okay, I reverted the JSON RDD changes and merged this to master.  Thanks!



[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55216611
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20133/consoleFull)
 for   PR 1983 at commit 
[`41b03c2`](https://github.com/apache/spark/commit/41b03c28d60c290ab6c5a8334bdfb6e7b2271a25).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Document(docId: Int, content: Array[Int]) `




[GitHub] spark pull request: [SPARK-3469] Call all TaskCompletionListeners ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2343#issuecomment-55216561
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20129/consoleFull)
 for   PR 2343 at commit 
[`29b6162`](https://github.com/apache/spark/commit/29b61621e9b8d1e58ecca004c66304d2e944e726).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2917] [SQL] Avoid table creation in log...

2014-09-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1846#discussion_r17403185
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -47,4 +47,16 @@ class SQLQuerySuite extends QueryTest {
   GROUP BY key, value
   ORDER BY value) a""").collect().toSeq)
   }
+
+  test("test CTAS") {
+sql("CREATE TABLE test_ctas_123 AS SELECT key, value FROM src").schema
--- End diff --

Yeah, as long as you don't remove `InsertIntoCreatedTable` from 
SchemaRDDLike.scala I'm okay with pushing the other changes to another PR.



[GitHub] spark pull request: [SPARK-2917] [SQL] Avoid table creation in log...

2014-09-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1846#discussion_r17403171
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.expressions.Row
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LowerCaseSchema
+import org.apache.spark.sql.execution.{SparkPlan, Command, LeafNode}
+import org.apache.spark.sql.hive.HiveContext
+import org.apache.spark.sql.hive.MetastoreRelation
+
+/**
+ * :: Experimental ::
+ * Create table and insert the query result into it.
+ * @param database the database name of the new relation
+ * @param tableName the table name of the new relation
+ * @param insertIntoRelation function of creating the 
`InsertIntoHiveTable` 
+ *by specifying the `MetaStoreRelation`, the data will be inserted 
into that table.
+ * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, 
in-memory cache etc.
+ */
+@Experimental
+case class CreateTableAsSelect(
+  database: String, 
+  tableName: String, 
+  insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
+extends LeafNode with Command {
+
+  def output = Seq.empty
+
+  // This is hacking way to get the output attributes and the spark plan 
of the query.
+  @transient private lazy val fakeInserIntoRelation = 
insertIntoRelation(null)
--- End diff --

Instead of this, can we just make `schema: Seq[Attribute]` an argument of 
this class?
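
A rough sketch of what that could look like, assuming simplified stand-in types rather than the real Catalyst and Hive classes: every type below is a hypothetical reduction, `columnDdl` and `run` are illustrative helpers, and the `LeafNode with Command` parent from the diff is left out so the example stands alone.

```scala
// Hypothetical, simplified stand-ins for the Catalyst/Hive types in the diff.
final case class Attribute(name: String, dataType: String)
final case class MetastoreRelation(database: String, table: String)
final case class InsertIntoHiveTable(relation: MetastoreRelation, output: Seq[Attribute])

// The output schema is passed in directly, so the function never has to be
// applied to a null relation just to discover the attributes.
final case class CreateTableAsSelect(
    database: String,
    tableName: String,
    schema: Seq[Attribute],
    insertIntoRelation: MetastoreRelation => InsertIntoHiveTable) {

  // Column definitions are derived from the schema before any table exists.
  def columnDdl: String =
    schema.map(a => s"${a.name} ${a.dataType}").mkString(", ")

  // The insert plan is built only once a real relation is available.
  def run(): InsertIntoHiveTable =
    insertIntoRelation(MetastoreRelation(database, tableName))
}

object CtasSketch {
  val schema: Seq[Attribute] = Seq(Attribute("key", "int"), Attribute("value", "string"))
  val ctas: CreateTableAsSelect = CreateTableAsSelect(
    "default", "test_ctas_123", schema, rel => InsertIntoHiveTable(rel, schema))
}
```

With the schema as a plain constructor argument, the lazily evaluated placeholder plan is no longer needed to obtain the output attributes.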



[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1330#issuecomment-55216430
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20134/consoleFull)
 for   PR 1330 at commit 
[`179ba61`](https://github.com/apache/spark/commit/179ba61d0b0cf2df511baedcbe71e34ab0a5c68a).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-55216393
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20132/consoleFull)
 for   PR 2339 at commit 
[`9a9e035`](https://github.com/apache/spark/commit/9a9e035a96a7b23d44f03f77b60fb071a05e2eb0).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2351#issuecomment-55216348
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20135/consoleFull)
 for   PR 2351 at commit 
[`0a5b6eb`](https://github.com/apache/spark/commit/0a5b6ebcd38f13fa15721c56a9d96bd9000529f5).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3407][SQL][wip]Add Date type support

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2344#issuecomment-55216369
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20136/consoleFull)
 for   PR 2344 at commit 
[`7269bba`](https://github.com/apache/spark/commit/7269bba52e12e646e440b89fbbff82f12995c6d5).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55216364
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20137/consoleFull)
 for   PR 2330 at commit 
[`d135fa3`](https://github.com/apache/spark/commit/d135fa38801467b0dd870063c00103ddd45438c7).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55216326
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/54/consoleFull)
 for   PR 2330 at commit 
[`b32c3fe`](https://github.com/apache/spark/commit/b32c3fe568163614a6eb424e523f7dd545d8ce9e).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [YARN] SPARK-2668: Add variable of yarn log di...

2014-09-10 Thread renozhang

Github user renozhang commented on the pull request:

https://github.com/apache/spark/pull/1573#issuecomment-55215558
  
@tgravescs I've updated to the latest, thanks for the review.



[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-10 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2350#issuecomment-55215052
  
@andrewor14 Thanks for working on this. Cleanup and commonizing are good, 
but we should try to minimize these huge changes. They are really hard to 
review adequately and take a lot of time. Going forward, let's try to break 
changes like this up into smaller ones when possible.




[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-10 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-55214750
  
@nchammas LGTM, thanks!



[GitHub] spark pull request: [SPARK-3469] Call all TaskCompletionListeners ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2343#issuecomment-55214633
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/53/consoleFull)
 for   PR 2343 at commit 
[`1cb444d`](https://github.com/apache/spark/commit/1cb444d4afc5355625d26fee7bd40ecb5532b8ca).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: Added support for accessing secured HDFS

2014-09-10 Thread huozhanfeng

Github user huozhanfeng commented on the pull request:

https://github.com/apache/spark/pull/2320#issuecomment-55214476
  
I just learned that standalone supports cluster mode. I think the keytab 
should not be transferred over the network in any mode. In cluster mode, the 
tokens should be generated on the Spark client in advance, and then the tokens 
and driver jars should be transferred to the Worker that will start the driver.

How does Spark transfer driver jars to Workers securely in standalone mode? 
Reusing the existing security mechanisms to share tokens would be a feasible 
approach.

Adding TLS to Spark's communication layer is a larger piece of work. For now 
I want to find a simple way to support accessing secured HDFS in standalone 
mode.



[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2341#issuecomment-55214353
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/52/consoleFull)
 for   PR 2341 at commit 
[`306120f`](https://github.com/apache/spark/commit/306120fc93021f3d2d86333c77296fe3d36b76e1).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...

2014-09-10 Thread nchammas

Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2348#issuecomment-55213681
  
@hayesgm You can see what tests failed in the link at the start of Spark 
QA's most recent message. You can also run these tests locally by running 
`./dev/run-tests`.



[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55213524
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/54/consoleFull)
 for   PR 2330 at commit 
[`b32c3fe`](https://github.com/apache/spark/commit/b32c3fe568163614a6eb424e523f7dd545d8ce9e).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-10 Thread nchammas

Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-55213482
  
@davies This PR is ready for another review. I believe I've covered all the 
feedback given so far.



[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2348#issuecomment-55213303
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/50/consoleFull)
 for   PR 2348 at commit 
[`3cb3b21`](https://github.com/apache/spark/commit/3cb3b219f0f1cce0bb44992ea140cdf48ff893b1).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RedisInputDStream(`
  * `class RedisReceiver(`




[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-55213331
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/49/consoleFull)
 for   PR 2339 at commit 
[`9a9e035`](https://github.com/apache/spark/commit/9a9e035a96a7b23d44f03f77b60fb071a05e2eb0).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55213274
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20131/consoleFull)
 for   PR 2330 at commit 
[`b32c3fe`](https://github.com/apache/spark/commit/b32c3fe568163614a6eb424e523f7dd545d8ce9e).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2341#issuecomment-55213199
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20130/consoleFull)
 for   PR 2341 at commit 
[`5c4ac33`](https://github.com/apache/spark/commit/5c4ac3303fcf94bb5cbbc272013a88ff8c4e7749).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-55213225
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20132/consoleFull)
 for   PR 2339 at commit 
[`9a9e035`](https://github.com/apache/spark/commit/9a9e035a96a7b23d44f03f77b60fb071a05e2eb0).
 * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55213212
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20133/consoleFull)
 for   PR 1983 at commit 
[`41b03c2`](https://github.com/apache/spark/commit/41b03c28d60c290ab6c5a8334bdfb6e7b2271a25).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2327#issuecomment-55213228
  
**[Tests timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20125/consoleFull)**
 after a configured wait of `120m`.



[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2330#issuecomment-55213214
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20131/consoleFull)
 for   PR 2330 at commit 
[`b32c3fe`](https://github.com/apache/spark/commit/b32c3fe568163614a6eb424e523f7dd545d8ce9e).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3469] Call all TaskCompletionListeners ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2343#issuecomment-55213194
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20129/consoleFull)
 for   PR 2343 at commit 
[`29b6162`](https://github.com/apache/spark/commit/29b61621e9b8d1e58ecca004c66304d2e944e726).
 * This patch merges cleanly.



[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1330#issuecomment-55213209
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20134/consoleFull)
 for   PR 1330 at commit 
[`179ba61`](https://github.com/apache/spark/commit/179ba61d0b0cf2df511baedcbe71e34ab0a5c68a).
 * This patch merges cleanly.



[GitHub] spark pull request: [SQL][WIP] Refined Thrift server test suite

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2214#issuecomment-55212859
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20127/consoleFull)
 for   PR 2214 at commit 
[`23d96f1`](https://github.com/apache/spark/commit/23d96f198e2bbdcc6a50fb853f8a4f1943740b50).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3389] Add Converter for ease of Parquet...

2014-09-10 Thread laserson

Github user laserson commented on the pull request:

https://github.com/apache/spark/pull/2256#issuecomment-55212750
  
@JoshRosen @mateiz poke.



[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2351#issuecomment-55212763
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20128/consoleFull)
 for   PR 2351 at commit 
[`4b20494`](https://github.com/apache/spark/commit/4b20494ce4e5e287a09fee5df5e0684711258627).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StatsParam(AccumulatorParam):`




[GitHub] spark pull request: SPARK-1972: Added support for tracking custom ...

2014-09-10 Thread otisg

Github user otisg commented on the pull request:

https://github.com/apache/spark/pull/918#issuecomment-55212495
  
Can custom metrics be sent anywhere other than the UI? 
I'd like to send them to SPM as Custom Metrics 
(https://sematext.atlassian.net/wiki/display/PUBSPM/Custom+Metrics). Any 
suggestions, @kalpit?



[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2327#issuecomment-55212049
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/45/consoleFull)
 for   PR 2327 at commit 
[`e5d2cf2`](https://github.com/apache/spark/commit/e5d2cf229ac1d4cc7ff943ae8320c623bb4a2f9b).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
  * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
  * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
  * `  class Encoder extends compression.Encoder[IntegerType.type] `
  * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
  * `  class Encoder extends compression.Encoder[LongType.type] `
  * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`




[GitHub] spark pull request: [SQL][WIP] Refined Thrift server test suite

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2214#issuecomment-55211597
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/46/consoleFull)
 for   PR 2214 at commit 
[`a1ad308`](https://github.com/apache/spark/commit/a1ad308426d385f2b4b764e3750fc13512bec408).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build

2014-09-10 Thread witgo

Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/1330#discussion_r17401380
  
--- Diff: pom.xml ---
@@ -839,7 +839,6 @@
   -unchecked
   -deprecation
   -feature
-  -language:postfixOps
--- End diff --

@jkbradley I removed this parameter. The related discussion is in #1069.
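
For readers unfamiliar with the flag: `-language:postfixOps` enables postfix operator notation for the whole build. Below is a minimal sketch of what the notation looks like at the source level. It assumes (the thread does not say this) that a file which still wants postfix notation opts in with a per-file import; `PostfixExample` is a hypothetical name, not code from the PR.

```scala
import scala.concurrent.duration._
import scala.language.postfixOps // per-file opt-in, instead of the global compiler flag

object PostfixExample {
  // Postfix notation: relies on the import above (or on the compiler flag).
  val viaPostfix: FiniteDuration = 5 seconds

  // Equivalent dot notation: needs neither the import nor the flag.
  val viaDot: FiniteDuration = 5.seconds
}
```

Both values denote the same duration, so rewriting postfix calls in dot notation is another way to drop the flag without changing behavior.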



[GitHub] spark pull request: [SPARK-3469] Call all TaskCompletionListeners ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2343#issuecomment-55211065
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/53/consoleFull)
 for   PR 2343 at commit 
[`1cb444d`](https://github.com/apache/spark/commit/1cb444d4afc5355625d26fee7bd40ecb5532b8ca).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2351#issuecomment-55211000
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/48/consoleFull)
 for   PR 2351 at commit 
[`4b20494`](https://github.com/apache/spark/commit/4b20494ce4e5e287a09fee5df5e0684711258627).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StatsParam(AccumulatorParam):`




[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build

2014-09-10 Thread witgo

Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1330#issuecomment-55210927
  
The code has been updated.



[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2341#issuecomment-55210724
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/52/consoleFull)
 for   PR 2341 at commit 
[`306120f`](https://github.com/apache/spark/commit/306120fc93021f3d2d86333c77296fe3d36b76e1).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-10 Thread tgravescs

Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2350#discussion_r17401007
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -35,28 +34,57 @@ class ClientArguments(val args: Array[String], val sparkConf: SparkConf) {
   var executorMemory = 1024 // MB
   var executorCores = 1
   var numExecutors = 2
-  var amQueue = sparkConf.get("QUEUE", "default")
+  var amQueue = sparkConf.get("spark.yarn.queue", "default")
   var amMemory: Int = 512 // MB
   var appName: String = "Spark"
   var priority = 0
 
-  parseArgs(args.toList)
+  // Additional memory to allocate to containers
+  // For now, use driver's memory overhead as our AM container's memory overhead
+  val amMemoryOverhead = sparkConf.getInt(
+    "spark.yarn.driver.memoryOverhead", YarnSparkHadoopUtil.DEFAULT_MEMORY_OVERHEAD)
+  val executorMemoryOverhead = sparkConf.getInt(
+    "spark.yarn.executor.memoryOverhead", YarnSparkHadoopUtil.DEFAULT_MEMORY_OVERHEAD)
 
-  // env variable SPARK_YARN_DIST_ARCHIVES/SPARK_YARN_DIST_FILES set in yarn-client then
-  // it should default to hdfs://
-  files = Option(files).getOrElse(sys.env.get("SPARK_YARN_DIST_FILES").orNull)
-  archives = Option(archives).getOrElse(sys.env.get("SPARK_YARN_DIST_ARCHIVES").orNull)
+  parseArgs(args.toList)
+  loadDefaultArgs()
+  validateArgs()
+
+  /** Load any default arguments provided through environment variables and Spark properties. */
+  private def loadDefaultArgs(): Unit = {
+    // For backward compatibility, SPARK_YARN_DIST_{ARCHIVES/FILES} should be resolved to hdfs://,
+    // while spark.yarn.dist.{archives/files} should be resolved to file:// (SPARK-2051).
+    files = Option(files).orElse(sys.env.get("SPARK_YARN_DIST_FILES")).orNull
+    files = Option(files)
+      .orElse(sparkConf.getOption("spark.yarn.dist.files"))
+      .map(p => Utils.resolveURIs(p))
+      .orNull
+    archives = Option(archives).orElse(sys.env.get("SPARK_YARN_DIST_ARCHIVES")).orNull
+    archives = Option(archives)
+      .orElse(sparkConf.getOption("spark.yarn.dist.archives"))
+      .map(p => Utils.resolveURIs(p))
+      .orNull
+  }
 
-  // spark.yarn.dist.archives/spark.yarn.dist.files defaults to use file:// if not specified,
-  // for both yarn-client and yarn-cluster
-  files = Option(files).getOrElse(sparkConf.getOption("spark.yarn.dist.files").
-    map(p => Utils.resolveURIs(p)).orNull)
-  archives = Option(archives).getOrElse(sparkConf.getOption("spark.yarn.dist.archives").
-    map(p => Utils.resolveURIs(p)).orNull)
+  /**
+   * Fail fast if any arguments provided are invalid.
+   * This is intended to be called only after the provided arguments have been parsed.
+   */
+  private def validateArgs(): Unit = {
+    Map[Boolean, String](
+      (numExecutors <= 0) -> "You must specify at least 1 executor!",
+      (amMemory <= amMemoryOverhead) -> s"AM memory must be > $amMemoryOverhead MB",
+      (executorMemory <= executorMemoryOverhead) ->
+        s"Executor memory must be > $executorMemoryOverhead MB"
--- End diff --

The memory ones are. See https://issues.apache.org/jira/browse/SPARK-3476
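
For context on the validation under review, here is a minimal, self-contained sketch of the fail-fast pattern in the diff. It is not the PR's code: `ArgsCheck`, `Args`, `validate`, and `validateOrThrow` are hypothetical names, and the values used with it are arbitrary. It uses a `Seq` of pairs rather than `Map[Boolean, String]`, because a map keyed on `Boolean` can hold at most two entries, so later messages with the same key replace earlier ones.

```scala
object ArgsCheck {
  // Hypothetical stand-ins for the fields in ClientArguments; all memory values are in MB.
  final case class Args(
      numExecutors: Int,
      amMemory: Int,
      executorMemory: Int,
      amMemoryOverhead: Int,
      executorMemoryOverhead: Int)

  /** Return the message for every violated constraint, in declaration order. */
  def validate(a: Args): Seq[String] = Seq(
    (a.numExecutors <= 0) -> "You must specify at least 1 executor!",
    (a.amMemory <= a.amMemoryOverhead) -> s"AM memory must be > ${a.amMemoryOverhead} MB",
    (a.executorMemory <= a.executorMemoryOverhead) ->
      s"Executor memory must be > ${a.executorMemoryOverhead} MB"
  ).collect { case (true, message) => message }

  /** Fail fast: throw on the first violated constraint, if there is one. */
  def validateOrThrow(a: Args): Unit =
    validate(a).headOption.foreach(msg => throw new IllegalArgumentException(msg))
}
```

Keeping the checks in a sequence preserves every failing message, which makes it possible to report all invalid arguments at once instead of only one.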



[GitHub] spark pull request: [SPARK-3469] Call all TaskCompletionListeners ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2343#issuecomment-55210655
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/51/consoleFull)
 for   PR 2343 at commit 
[`1cb444d`](https://github.com/apache/spark/commit/1cb444d4afc5355625d26fee7bd40ecb5532b8ca).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-10 Thread tgravescs

Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2350#discussion_r17400958
  
--- Diff: 
yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -45,120 +46,97 @@ class Client(clientArgs: ClientArguments, hadoopConf: 
Configuration, spConf: Spa
 
   def this(clientArgs: ClientArguments) = this(clientArgs, new SparkConf())
 
-  val args = clientArgs
-  val conf = hadoopConf
-  val sparkConf = spConf
-  var rpc: YarnRPC = YarnRPC.create(conf)
-  val yarnConf: YarnConfiguration = new YarnConfiguration(conf)
+  val yarnConf: YarnConfiguration = new YarnConfiguration(hadoopConf)
 
+  /* 
-
 *
+   | The following methods have much in common in the stable and alpha 
versions of Client, |
+   | but cannot be implemented in the parent trait due to subtle API 
differences across|
+   | hadoop versions.  
|
+   * 
-
 */
 
-  // for client user who want to monitor app status by itself.
-  def runApp() = {
-validateArgs()
-
+  /** Submit an application running our ApplicationMaster to the 
ResourceManager. */
+  override def submitApplication(): ApplicationId = {
 init(yarnConf)
 start()
-logClusterResourceDetails()
 
-val newApp = super.getNewApplication()
-val appId = newApp.getApplicationId()
+logInfo("Requesting a new application from cluster with %d 
NodeManagers"
+  .format(getYarnClusterMetrics.getNumNodeManagers))
 
-verifyClusterResources(newApp)
-val appContext = createApplicationSubmissionContext(appId)
-val appStagingDir = getAppStagingDir(appId)
-val localResources = prepareLocalResources(appStagingDir)
-val env = setupLaunchEnv(localResources, appStagingDir)
-val amContainer = createContainerLaunchContext(newApp, localResources, 
env)
+// Get a new application from our RM
+val newAppResponse = getNewApplication()
+val appId = newAppResponse.getApplicationId()
 
-val capability = 
Records.newRecord(classOf[Resource]).asInstanceOf[Resource]
-// Memory for the ApplicationMaster.
-capability.setMemory(args.amMemory + memoryOverhead)
-amContainer.setResource(capability)
+// Verify whether the cluster has enough resources for our AM
+verifyClusterResources(newAppResponse)
 
-appContext.setQueue(args.amQueue)
-appContext.setAMContainerSpec(amContainer)
-
appContext.setUser(UserGroupInformation.getCurrentUser().getShortUserName())
+// Set up the appropriate contexts to launch our AM
+val containerContext = createContainerLaunchContext(newAppResponse)
+val appContext = createApplicationSubmissionContext(appId, containerContext)
 
-submitApp(appContext)
+// Finally, submit and monitor the application
+logInfo(s"Submitting application ${appId.getId} to ResourceManager")
+submitApplication(appContext)
 appId
   }
 
-  def run() {
-val appId = runApp()
-monitorApplication(appId)
+  /**
+   * Set up a context for launching our ApplicationMaster container.
+   * In the Yarn alpha API, the memory requirements of this container must be set in
+   * the ContainerLaunchContext instead of the ApplicationSubmissionContext.
+   */
+  override def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
+  : ContainerLaunchContext = {
+val containerContext = super.createContainerLaunchContext(newAppResponse)
+val capability = Records.newRecord(classOf[Resource])
+capability.setMemory(getAMMemory(newAppResponse) + amMemoryOverhead)
+containerContext.setResource(capability)
+containerContext
   }
 
-  def logClusterResourceDetails() {
-val clusterMetrics: YarnClusterMetrics = super.getYarnClusterMetrics
-logInfo("Got cluster metric info from ASM, numNodeManagers = " +
-  clusterMetrics.getNumNodeManagers)
-  }
-
-
-  def createApplicationSubmissionContext(appId: ApplicationId): ApplicationSubmissionContext = {
-logInfo("Setting up application submission context for ASM")
+  /** Set up the context for submitting our ApplicationMaster. */
+  def createApplicationSubmissionContext(
+  appId: ApplicationId,
+  containerContext: ContainerLaunchContext): ApplicationSubmissionContext = {
 val appContext = 
Records.newRecord(classOf[ApplicationSubmissionContext])
 appContext.setApplicationId(appId)
 appContext.setApplicationNa

[GitHub] spark pull request: [SPARK-3469] Call all TaskCompletionListeners ...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2343#issuecomment-55210443
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/51/consoleFull) for PR 2343 at commit [`1cb444d`](https://github.com/apache/spark/commit/1cb444d4afc5355625d26fee7bd40ecb5532b8ca).
 * This patch merges cleanly.



[GitHub] spark pull request: [mllib] DecisionTree: Add minInstancesPerNode,...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2349#issuecomment-55210270
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20126/consoleFull) for PR 2349 at commit [`95c479d`](https://github.com/apache/spark/commit/95c479d5a60b166d9c75b9a81cee82e808f23aa0).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-10 Thread tgravescs

Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2350#discussion_r17400747
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala ---
@@ -36,113 +36,114 @@ private[spark] class YarnClientSchedulerBackend(
 
   var client: Client = null
   var appId: ApplicationId = null
-  var checkerThread: Thread = null
   var stopping: Boolean = false
   var totalExpectedExecutors = 0
 
-  private[spark] def addArg(optionName: String, envVar: String, sysProp: String,
-  arrayBuf: ArrayBuffer[String]) {
-if (System.getenv(envVar) != null) {
-  arrayBuf += (optionName, System.getenv(envVar))
-} else if (sc.getConf.contains(sysProp)) {
-  arrayBuf += (optionName, sc.getConf.get(sysProp))
-}
-  }
-
+  /**
+   * Create a Yarn client to submit an application to the ResourceManager.
+   * This waits until the application is running.
+   */
   override def start() {
 super.start()
-
 val driverHost = conf.get("spark.driver.host")
 val driverPort = conf.get("spark.driver.port")
 val hostport = driverHost + ":" + driverPort
 conf.set("spark.driver.appUIAddress", sc.ui.appUIHostPort)
 
 val argsArrayBuf = new ArrayBuffer[String]()
-argsArrayBuf += (
-  "--args", hostport
-)
-
-// process any optional arguments, given either as environment variables
-// or system properties. use the defaults already defined in ClientArguments
-// if things aren't specified. system properties override environment
-// variables.
-List(("--driver-memory", "SPARK_MASTER_MEMORY", "spark.master.memory"),
-  ("--driver-memory", "SPARK_DRIVER_MEMORY", "spark.driver.memory"),
-  ("--num-executors", "SPARK_WORKER_INSTANCES", "spark.executor.instances"),
-  ("--num-executors", "SPARK_EXECUTOR_INSTANCES", "spark.executor.instances"),
-  ("--executor-memory", "SPARK_WORKER_MEMORY", "spark.executor.memory"),
-  ("--executor-memory", "SPARK_EXECUTOR_MEMORY", "spark.executor.memory"),
-  ("--executor-cores", "SPARK_WORKER_CORES", "spark.executor.cores"),
-  ("--executor-cores", "SPARK_EXECUTOR_CORES", "spark.executor.cores"),
-  ("--queue", "SPARK_YARN_QUEUE", "spark.yarn.queue"),
-  ("--name", "SPARK_YARN_APP_NAME", "spark.app.name"))
-.foreach { case (optName, envVar, sysProp) => addArg(optName, envVar, sysProp, argsArrayBuf) }
-
-logDebug("ClientArguments called with: " + argsArrayBuf)
+argsArrayBuf += ("--arg", hostport)
+argsArrayBuf ++= getExtraClientArguments
+
+logDebug("ClientArguments called with: " + argsArrayBuf.mkString(" "))
 val args = new ClientArguments(argsArrayBuf.toArray, conf)
 totalExpectedExecutors = args.numExecutors
 client = new Client(args, conf)
-appId = client.runApp()
-waitForApp()
-checkerThread = yarnApplicationStateCheckerThread()
+appId = client.submitApplication()
+waitForApplication()
+asyncMonitorApplication()
   }
 
-  def waitForApp() {
-
-// TODO : need a better way to find out whether the executors are ready or not
-// maybe by resource usage report?
-while(true) {
-  val report = client.getApplicationReport(appId)
-
-  logInfo("Application report from ASM: \n" +
-"\t appMasterRpcPort: " + report.getRpcPort() + "\n" +
-"\t appStartTime: " + report.getStartTime() + "\n" +
-"\t yarnAppState: " + report.getYarnApplicationState() + "\n"
+  /**
+   * Return any extra command line arguments to be passed to Client provided in the form of
+   * environment variables or Spark properties.
+   */
+  private def getExtraClientArguments: Seq[String] = {
+val extraArgs = new ArrayBuffer[String]
+val optionTuples = // List of (target Client argument, environment variable, Spark property)
+  List(
+("--driver-memory", "SPARK_MASTER_MEMORY", "spark.master.memory"),
+("--driver-memory", "SPARK_DRIVER_MEMORY", "spark.driver.memory"),
+("--num-executors", "SPARK_WORKER_INSTANCES", "spark.executor.instances"),
+("--num-executors", "SPARK_EXECUTOR_INSTANCES", "spark.executor.instances"),
+("--executor-memory", "SPARK_WORKER_MEMORY", "spark.executor.memory"),
+("--executor-memory", "SPARK_EXECUTOR_MEMORY", "spark.executor.memory"),
+("--executor-cores", "SPARK_WORKER_CORES", "spark.executor.cores"),
+("--executor-cores", "SPARK_EXECUTOR_CORES", "spark.executor.cores"),
+("--queue", "SPARK_YARN_QUEUE", "spark.yarn.queue"),
+("--name", "SPARK_YA

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-55209675
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/49/consoleFull) for PR 2339 at commit [`9a9e035`](https://github.com/apache/spark/commit/9a9e035a96a7b23d44f03f77b60fb071a05e2eb0).
 * This patch merges cleanly.



[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2348#issuecomment-55209709
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/50/consoleFull) for PR 2348 at commit [`3cb3b21`](https://github.com/apache/spark/commit/3cb3b219f0f1cce0bb44992ea140cdf48ff893b1).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-10 Thread nchammas

Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2339#issuecomment-55209634
  
Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...

2014-09-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2351#issuecomment-55209533
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20128/consoleFull) for PR 2351 at commit [`4b20494`](https://github.com/apache/spark/commit/4b20494ce4e5e287a09fee5df5e0684711258627).
 * This patch merges cleanly.


