[GitHub] spark pull request: update spark.default.parallelism

2014-04-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/389#issuecomment-40448460
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: style fix

2014-04-15 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/411#discussion_r11620940
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -678,7 +678,7 @@ private[spark] class BlockManager(
 case ArrayBufferValues(array) =>
   tachyonStore.putValues(blockId, array, level, false)
 case ByteBufferValues(bytes) => {
--- End diff --

can you remove the { and } here also? thanks
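
As a minimal sketch of the style point being requested (the types and method names below are hypothetical and are not taken from BlockManager): a `case` body in Scala runs until the next `case` or the closing brace of the `match`, so wrapping it in its own braces is redundant.

```scala
object CaseBraces {
  // Hypothetical stand-ins for the value wrappers matched in the diff.
  sealed trait Values
  case class ArrayValues(items: Seq[Int]) extends Values
  case class ByteValues(bytes: Array[Byte]) extends Values

  // Redundant braces around each case body.
  def sizeWithBraces(v: Values): Int = v match {
    case ArrayValues(items) => {
      val n = items.size
      n
    }
    case ByteValues(bytes) => {
      bytes.length
    }
  }

  // Same behaviour without the extra braces.
  def sizeWithoutBraces(v: Values): Int = v match {
    case ArrayValues(items) =>
      val n = items.size
      n
    case ByteValues(bytes) =>
      bytes.length
  }
}
```

Both methods return the same result for any input; only the layout differs.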




[GitHub] spark pull request: style fix

2014-04-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40448496
  
Jenkins, add to whitelist.




[GitHub] spark pull request: update spark.default.parallelism

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/389#issuecomment-40448600
  
Merged build started. 




[GitHub] spark pull request: update spark.default.parallelism

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/389#issuecomment-40448594
  
 Merged build triggered. 




[GitHub] spark pull request: style fix

2014-04-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40448662
  
Jenkins, add to whitelist.




[GitHub] spark pull request: style fix

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40448808
  
Merged build started. 




[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...

2014-04-15 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/391#issuecomment-40449078
  
Jenkins, test this please




[GitHub] spark pull request: SPARK-1456 Remove view bounds on Ordered in fa...

2014-04-15 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/410#discussion_r11621180
  
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -89,12 +89,14 @@ class HashPartitioner(partitions: Int) extends Partitioner {
  * A [[org.apache.spark.Partitioner]] that partitions sortable records by range into roughly
  * equal ranges. The ranges are determined by sampling the content of the RDD passed in.
  */
-class RangePartitioner[K <% Ordered[K]: ClassTag, V](
+class RangePartitioner[K : Ordering : ClassTag, V](
--- End diff --

Yes, when de-sugared there is an implicit Ordering -- that's why Michael 
can recover and bind it explicitly at line 98.
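
To illustrate the de-sugaring mentioned here, the following is a small sketch with hypothetical method names (it is not the RangePartitioner code): a context bound `K: Ordering` is shorthand for an extra implicit parameter of type `Ordering[K]`, which can be recovered inside the body with `implicitly`.

```scala
object ContextBoundDemo {
  // Context-bound form: the compiler adds an implicit Ordering[K] parameter.
  def maxSugared[K: Ordering](a: K, b: K): K = {
    // Recover the implicit instance and bind it to a name.
    val ord = implicitly[Ordering[K]]
    if (ord.gteq(a, b)) a else b
  }

  // Roughly what the definition above expands to.
  def maxDesugared[K](a: K, b: K)(implicit ord: Ordering[K]): K =
    if (ord.gteq(a, b)) a else b
}
```

Either form can be called as `ContextBoundDemo.maxSugared(3, 7)`; the compiler supplies the standard `Ordering[Int]` in both cases.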




[GitHub] spark pull request: style fix

2014-04-15 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/411#discussion_r11621270
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -678,7 +678,7 @@ private[spark] class BlockManager(
 case ArrayBufferValues(array) =>
   tachyonStore.putValues(blockId, array, level, false)
 case ByteBufferValues(bytes) => {
--- End diff --

if you don't mind changing those that'd be great!




[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40451066
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14137/




[GitHub] spark pull request: SPARK-1456 Remove view bounds on Ordered in fa...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/410#issuecomment-40451062
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40451060
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1456 Remove view bounds on Ordered in fa...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/410#issuecomment-40451063
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14135/




[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40451067
  
Merged build started. 




[GitHub] spark pull request: update spark.default.parallelism

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/389#issuecomment-40451065
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14136/




[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/391#issuecomment-40451261
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Include stack trace for exceptions thrown by u...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/409#issuecomment-40451260
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/391#issuecomment-40451262
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14139/




[GitHub] spark pull request: Include stack trace for exceptions thrown by u...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/409#issuecomment-40451263
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14138/




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40451813
  
I've merged this. Thanks @ahirreddy - cool stuff!




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40451991
  
Jenkins, retest this please.




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40452091
  
 Merged build triggered. 




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40452101
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-15 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40452444
  
Jenkins, retest this please.




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-15 Thread ahirreddy
Github user ahirreddy commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40452665
  
Awesome, thanks!

On Tue, Apr 15, 2014 at 12:16 AM, asfgit notificati...@github.com wrote:

 Closed #363 via c99bcb7feaa761c5826f2e1d844d0502a3b79538.
 ---
 Reply to this email directly or view it on GitHub:
 https://github.com/apache/spark/pull/363




[GitHub] spark pull request: SPARK-1426: Make MLlib work with NumPy version...

2014-04-15 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/391#issuecomment-40452638
  
Thanks Sandeep! I've merged this in.




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40452733
  
Merged build started. 




[GitHub] spark pull request: Make distribution

2014-04-15 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/412

Make distribution



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark make_distribution

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #412


commit 709d71945a75a11fea6d91fd97ded30bd98b2950
Author: witgo wi...@qq.com
Date:   2014-04-15T07:26:30Z

add with-hive argument to make-distribution.sh

commit 6d344c8e35f28a2bb1063bbd24057e256d3fa2f2
Author: witgo wi...@qq.com
Date:   2014-04-15T07:29:29Z

add with-hive argument to make-distribution.sh






[GitHub] spark pull request: add with-hive argument to make-distribution.sh

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/412#issuecomment-40453310
  
Can one of the admins verify this patch?




[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40453437
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14140/




[GitHub] spark pull request: remove unnecessary brace and semicolon in 'put...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/411#issuecomment-40453434
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40454904
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40460591
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40467603
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40467611
  
Merged build started. 




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40470083
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40470091
  
Merged build started. 




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40473966
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request:

https://github.com/apache/spark/pull/407#discussion_r11631197
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
   this.numBlocks
 }
 
-val partitioner = new HashPartitioner(numBlocks)
+// Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+def hash(x: Int): Int = {
--- End diff --

That hash function was already there.  I moved it up a few lines.  I can 
change the hash function as well if you like...
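
For readers unfamiliar with this kind of function, here is a hedged sketch of a bit-spreading integer hash. It uses the shift pattern of the supplemental hash from JDK 6's `java.util.HashMap`, which is an assumption on my part about what "similar to java.util.HashTable" refers to; the body of the ALS function is not shown in the diff excerpt, and `blockFor` is a hypothetical helper.

```scala
object BitSpreadingHash {
  // Mix high-order bits into low-order positions so that keys differing
  // only in their upper bits still land in different buckets.
  // (Shift constants follow JDK 6 HashMap's supplemental hash.)
  def hash(x: Int): Int = {
    val r = x ^ (x >>> 20) ^ (x >>> 12)
    r ^ (r >>> 7) ^ (r >>> 4)
  }

  // Hypothetical helper: map a key to a non-negative block index.
  def blockFor(key: Int, numBlocks: Int): Int = {
    val m = hash(key) % numBlocks
    if (m < 0) m + numBlocks else m
  }
}
```

The explicit sign correction in `blockFor` matters because `%` in Scala can return a negative result when the hashed value is negative.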




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40475281
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request:

https://github.com/apache/spark/pull/407#discussion_r11631405
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
 private var lambda: Double,
 private var implicitPrefs: Boolean,
 private var alpha: Double,
+private var partitioner: Partitioner = null,
--- End diff --

I'll do this, but note that this adds considerable functionality above 
what's described in the bug I'm supposed to address.




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40478717
  
Different unrelated failures each time. One more roll of the dice.

Jenkins, retest this please.




[GitHub] spark pull request: SPARK-1357 (addendum). More Experimental items...

2014-04-15 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/372#issuecomment-40482466
  
@mengxr Nice one, that is music to my ears. If you agree, I'd just suggest 
marking a few more of these parts of MLlib as Experimental, to give you the 
freedom to make these changes later.




[GitHub] spark pull request: improve the readability of SparkContext.scala

2014-04-15 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/414

improve the readability of SparkContext.scala



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SparkContext

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #414


commit a9d7cd0b4e6cf9e2c08a22cdd9e4d61b86ec55bc
Author: witgo wi...@qq.com
Date:   2014-04-14T07:57:08Z

improve the readability of SparkContext code






[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/415#issuecomment-40489083
  
Merged build started. 




[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...

2014-04-15 Thread willb
GitHub user willb opened a pull request:

https://github.com/apache/spark/pull/415

SPARK-1501: Ensure assertions in Graph.apply are asserted.

The Graph.apply test in GraphSuite had some assertions in a closure in
a graph transformation. As a consequence, these assertions never
actually executed.  Furthermore, these closures had a reference to
(non-serializable) test harness classes because they called assert(),
which could be a problem if we proactively check closure serializability
in the future.

This commit simply changes the Graph.apply test to collect the graph
triplets so it can assert about each triplet from a map method.
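
The failure mode described above can be sketched without Spark. The example below is an analogy only: it uses a plain Scala `Iterator` (whose `map` is lazy) in place of a graph transformation, and all names are hypothetical. An assertion placed inside a lazy closure never runs unless something consumes the result, whereas collecting first and asserting afterwards always runs.

```scala
object LazyAssertDemo {
  // Returns how many times the assertion closure actually ran.
  def assertInsideLazyMap(): Int = {
    var checked = 0
    // Iterator.map is lazy: the closure is not run until the iterator is consumed.
    val mapped = Iterator(1, 2, 3).map { x =>
      checked += 1
      assert(x > 0)
      x
    }
    checked // nothing has forced `mapped`, so this is still 0
  }

  // Materialise the results first, then assert on each element.
  def collectThenAssert(): Int = {
    var checked = 0
    val collected = Iterator(1, 2, 3).map(_ * 2).toList
    collected.foreach { x =>
      checked += 1
      assert(x > 0)
    }
    checked
  }
}
```

In this sketch `assertInsideLazyMap()` reports zero executed checks, while `collectThenAssert()` reports one per element.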

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/willb/spark graphsuite-nop-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #415


commit 0b636586b797546ce0cf78dbbfbe7462712aeaa4
Author: William Benton wi...@redhat.com
Date:   2014-03-14T16:40:56Z

Ensure assertions in Graph.apply are asserted.

The Graph.apply test in GraphSuite had some assertions in a closure in
a graph transformation. As a consequence, these assertions never
actually executed.  Furthermore, these closures had a reference to
(non-serializable) test harness classes because they called assert(),
which could be a problem if we proactively check closure serializability
in the future.

This commit simply changes the Graph.apply test to collect the graph
triplets so it can assert about each triplet from a map method.






[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/415#issuecomment-40489064
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40491073
  
Merged build started. 




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40491050
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...

2014-04-15 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/416

SPARK-1462: Examples of ML algorithms are using deprecated APIs

This is a work in progress; any comments are welcome. This will also fix
SPARK-1464: Update MLLib Examples to Use Breeze.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark 1462

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #416


commit e7edc4af33cce17729176bf5f2270f38b15aad49
Author: Sandeep sand...@techaddict.me
Date:   2014-04-15T14:53:15Z

LocalLR uses breeze.linalg.Vector and DenseVector






[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/416#issuecomment-40495701
  
Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/415#issuecomment-40499267
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/415#issuecomment-40499268
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14148/




[GitHub] spark pull request: Decision Tree documentation for MLlib programm...

2014-04-15 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/402#issuecomment-40501557
  
@manishamde Thanks for writing decision tree documentation! There are some 
minor issues, but not worth another iteration. Do you mind me merging this 
first and then making minor updates?




[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...

2014-04-15 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/396#discussion_r11642707
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
 ---
@@ -17,10 +17,19 @@
 
 package org.apache.spark.deploy.yarn
 
+import java.util.Map
+import java.util.regex.Matcher
+import java.util.regex.Pattern
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+
 import org.apache.hadoop.io.Text
 import org.apache.hadoop.mapred.JobConf
 import org.apache.hadoop.security.Credentials
 import org.apache.hadoop.security.UserGroupInformation
+import org.apache.hadoop.util.Shell
--- End diff --

This doesn't really matter now but this also doesn't compile for 0.23.

Please make sure to try it on both 0.23 and 2.x builds.  If you don't have 
those environments let me know.




[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...

2014-04-15 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/416#issuecomment-40502368
  
@techaddict Thanks for working on this JIRA! Since we try to hide breeze types
in MLlib, I'm not sure whether we should use breeze vectors directly in
examples. We could either use breeze vectors in examples and leave a note
about their usage in MLlib, or implement the necessary operations in
MLlib's vectors for use in examples. I prefer the former given the time
frame. @mateiz what do you think?




[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...

2014-04-15 Thread techaddict
Github user techaddict commented on the pull request:

https://github.com/apache/spark/pull/416#issuecomment-40502741
  
@mengxr Yes, I was thinking that too, as eventually we'll need functions
like squaredDist (in the KMeans examples) implemented in MLlib.




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40502679
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/407#discussion_r11643032
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
   this.numBlocks
 }
 
-val partitioner = new HashPartitioner(numBlocks)
+// Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+def hash(x: Int): Int = {
--- End diff --

Yes, let's change it. We need a fast hash function but not necessarily high 
quality. Using an existing implementation can simplify the code.




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/407#discussion_r11643157
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
 private var lambda: Double,
 private var implicitPrefs: Boolean,
 private var alpha: Double,
+private var partitioner: Partitioner = null,
--- End diff --

Okay, let's make it simpler. Just a single partitioner for both users and 
products, customizable via a setter function.
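
A minimal sketch of that shape, assuming nothing about the final ALS code: one optional partitioner shared by users and products, with a fluent setter and a default derived from the block count. `BlockPartitioner`, `ModPartitioner`, `AlsSettings` and `effectivePartitioner` are hypothetical names invented for illustration; Spark's own `Partitioner` class is deliberately not used so the example stays self-contained.

```scala
object AlsConfigSketch {
  // Hypothetical minimal partitioner interface, standing in for Spark's Partitioner.
  trait BlockPartitioner {
    def numPartitions: Int
    def getPartition(key: Int): Int
  }

  // Simple default: non-negative modulo of the key.
  final class ModPartitioner(val numPartitions: Int) extends BlockPartitioner {
    def getPartition(key: Int): Int = {
      val m = key % numPartitions
      if (m < 0) m + numPartitions else m
    }
  }

  // One partitioner for both users and products; None means "use the default".
  final class AlsSettings(numBlocks: Int) {
    private var partitioner: Option[BlockPartitioner] = None

    // Setter returns this.type so calls can be chained like the other ALS setters.
    def setPartitioner(p: BlockPartitioner): this.type = {
      partitioner = Some(p)
      this
    }

    def effectivePartitioner: BlockPartitioner =
      partitioner.getOrElse(new ModPartitioner(numBlocks))
  }
}
```

Using `Option` instead of a `null` default keeps the "not configured" case explicit, which is one possible way to avoid the `= null` constructor parameter shown in the diff.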




[GitHub] spark pull request: SPARK-1462: Examples of ML algorithms are usin...

2014-04-15 Thread techaddict
Github user techaddict commented on the pull request:

https://github.com/apache/spark/pull/416#issuecomment-40506793
  
@srowen I think we need to implement some additional functions on
`linalg.Vector`, like squaredDist (supported by `util.Vector`).
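
For reference, the operation in question is small. The following is only a sketch over plain arrays, assuming dense vectors of equal length; `VectorOps` is a hypothetical holder object and this is not the MLlib or `util.Vector` implementation.

```scala
object VectorOps {
  // Squared Euclidean distance between two dense vectors of equal length.
  def squaredDist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }
}
```

A `while` loop is used here rather than `zip`/`map` to avoid allocating intermediate collections, which matters if the function is called once per point per iteration in something like KMeans.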




[GitHub] spark pull request: Loads test tables when running sbt hive/conso...

2014-04-15 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/417

Loads test tables when running sbt hive/console without HIVE_DEV_HOME

When running Hive tests, the working directory is `$SPARK_HOME/sql/hive`, 
while when running `sbt hive/console`, it becomes `$SPARK_HOME`, and test 
tables are not loaded if `HIVE_DEV_HOME` is not defined.
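
One way to make such a lookup independent of the working directory is sketched below. This is an illustration only, not the code in this pull request: `HiveTestPaths`, `hiveModuleDir` and `testDataRoot` are hypothetical names, and treating `HIVE_DEV_HOME` as an override for the root directory is an assumption.

```scala
import java.nio.file.{Path, Paths}

object HiveTestPaths {
  // Returns the Hive module directory whether the JVM was started from
  // $SPARK_HOME (sbt hive/console) or from $SPARK_HOME/sql/hive (tests).
  def hiveModuleDir(workingDir: Path): Path = {
    val suffix = Paths.get("sql", "hive")
    if (workingDir.endsWith(suffix)) workingDir else workingDir.resolve(suffix)
  }

  // An explicitly configured HIVE_DEV_HOME wins (assumed semantics);
  // otherwise fall back to the module directory derived from the working dir.
  def testDataRoot(env: Map[String, String], workingDir: Path): Path =
    env.get("HIVE_DEV_HOME").map(p => Paths.get(p)).getOrElse(hiveModuleDir(workingDir))
}
```

Passing the environment and working directory in as arguments, rather than reading `sys.env` and `user.dir` inside the function, keeps the sketch easy to test.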

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark loadTestTables

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #417


commit 7cea8d668248c0c39225931c52baa39d42217b23
Author: Cheng Lian lian.cs@gmail.com
Date:   2014-04-15T16:53:43Z

Loads test tables when running sbt hive/console without HIVE_DEV_HOME






[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...

2014-04-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/415#issuecomment-40510777
  
lgtm. merged. thanks!




[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/18#issuecomment-40514510
  
Merged build started. 




[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/18#issuecomment-40514490
  
 Merged build triggered. 




[GitHub] spark pull request: Decision Tree documentation for MLlib programm...

2014-04-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/402#issuecomment-40515218
  
Looks good - thanks I've merged this.




[GitHub] spark pull request: Loads test tables when running sbt hive/conso...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/417#issuecomment-40515442
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Loads test tables when running sbt hive/conso...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/417#issuecomment-40515443
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14150/




[GitHub] spark pull request: Generalize pattern for planning hash joins.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/418#issuecomment-40516161
  
 Merged build triggered. 




[GitHub] spark pull request: Generalize pattern for planning hash joins.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/418#issuecomment-40516180
  
Merged build started. 




[GitHub] spark pull request: Generalize pattern for planning hash joins.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/418#issuecomment-40516364
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14152/




[GitHub] spark pull request: Generalize pattern for planning hash joins.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/418#issuecomment-40516363
  
Merged build finished. 




[GitHub] spark pull request: Generalize pattern for planning hash joins.

2014-04-15 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/418

Generalize pattern for planning hash joins.

This will be helpful for 
[SPARK-1495](https://issues.apache.org/jira/browse/SPARK-1495) and other cases 
where we want to have custom hash join implementations but don't want to repeat 
the logic for finding the join keys.
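
The general idea can be sketched with a Scala extractor object over a toy expression tree. Everything below (`HashJoinPattern`, `Attr`, `EqualTo`, `GreaterThan`, `Join`, `EquiJoin`, `describe`) is hypothetical and much simpler than Catalyst's real classes; it only shows how key-finding logic can be written once and reused by pattern matching.

```scala
object HashJoinPattern {
  // Toy expression tree, standing in for real query-plan expressions.
  sealed trait Expr
  final case class Attr(relation: String, name: String) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  final case class Join(leftRel: String, rightRel: String, conditions: Seq[Expr])

  // Extractor: matches a Join that has at least one equality predicate between
  // the two sides and yields (leftKeys, rightKeys, remaining predicates).
  object EquiJoin {
    def unapply(join: Join): Option[(Seq[Attr], Seq[Attr], Seq[Expr])] = {
      val keyed: Seq[Either[(Attr, Attr), Expr]] = join.conditions.map {
        case EqualTo(l: Attr, r: Attr)
            if l.relation == join.leftRel && r.relation == join.rightRel =>
          Left((l, r))
        case EqualTo(l: Attr, r: Attr)
            if l.relation == join.rightRel && r.relation == join.leftRel =>
          Left((r, l)) // normalize so the left-side key always comes first
        case other => Right(other)
      }
      val pairs = keyed.collect { case Left(p) => p }
      val rest = keyed.collect { case Right(e) => e }
      if (pairs.isEmpty) None else Some((pairs.map(_._1), pairs.map(_._2), rest))
    }
  }

  // Any join implementation can now reuse the key-finding logic via a match.
  def describe(join: Join): String = join match {
    case EquiJoin(lk, rk, rest) =>
      val left = lk.map(_.name).mkString(",")
      val right = rk.map(_.name).mkString(",")
      s"hash join on $left = $right with ${rest.size} residual predicate(s)"
    case _ => "no equi-join keys"
  }
}
```

A custom hash join would add its own `case EquiJoin(...)` branch instead of re-deriving the keys from the join condition.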

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark hashFilter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #418


commit 165387dc03d23557a494d18992eb0f4c165fd20b
Author: Michael Armbrust mich...@databricks.com
Date:   2014-04-15T18:16:49Z

Move common functions to PredicateHelper.

commit d4ebf124921e838557eebb7eb2175c59865f1ffa
Author: Michael Armbrust mich...@databricks.com
Date:   2014-04-15T18:17:24Z

Generalize pattern for planning hash joins.






[GitHub] spark pull request: Loads test tables when running sbt hive/conso...

2014-04-15 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/417#issuecomment-40517020
  
This change seems reasonable.  I think we should leave `HIVE_DEV_HOME` 
though.  The point is to easily allow you to override the built in tests when 
we want to upgrade or run against a different version of hive.




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-15 Thread dbtsai
Github user dbtsai closed the pull request at:

https://github.com/apache/spark/pull/353




[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...

2014-04-15 Thread xgong
Github user xgong commented on the pull request:

https://github.com/apache/spark/pull/396#issuecomment-40518846
  
@tgravescs Would you mind reviewing this again? Thanks.




[GitHub] spark pull request: Decision Tree documentation for MLlib programm...

2014-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/402




[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/18#issuecomment-40523303
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14151/




[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/18#issuecomment-40523300
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-15 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/18#issuecomment-40523420
  
I merged master in and fixed the conflicts; it should be good to merge now.




[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...

2014-04-15 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/396#discussion_r11653067
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
 ---
@@ -73,4 +81,61 @@ object YarnSparkHadoopUtil {
   def getLoggingArgsForContainerCommandLine(): String = {
     "-Dlog4j.configuration=log4j-spark-container.properties"
   }
+
+  def addToEnvironment(
+      env: HashMap[String, String],
+      variable: String,
+      value: String,
+      classPathSeparator: String) = {
+    var envVariable = ""
+    if (env.get(variable) == None) {
+      envVariable = value
+    } else {
+      envVariable = env.get(variable).get + classPathSeparator + value
+    }
+    env put (StringInterner.weakIntern(variable), StringInterner.weakIntern(envVariable))
+  }
+
+  def setEnvFromInputString(
+      env: HashMap[String, String],
+      envString: String,
+      classPathSeparator: String) = {
+    if (envString != null && envString.length() > 0) {
+      var childEnvs = envString.split(",")
+      var p = Pattern.compile(getEnvironmentVariableRegex())
+      for (cEnv <- childEnvs) {
+        var parts = cEnv.split("=") // split on '='
--- End diff --

@sryza @tgravescs does Hadoop not support env variables that have `=` 
inside of quoted strings?
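
To make the concern concrete: splitting each entry on every `=` breaks values that themselves contain `=`. A minimal sketch of one alternative, splitting on the first `=` only, is shown below. `EnvParsing` and `parseEnvEntry` are hypothetical names; this is not the code under review and says nothing about what Hadoop itself accepts.

```scala
object EnvParsing {
  // Splits "KEY=VALUE" on the first '=' only, so '=' characters inside the
  // value (for example in quoted JVM options) are preserved.
  def parseEnvEntry(entry: String): Option[(String, String)] = {
    val parts = entry.split("=", 2) // limit 2: at most one split
    if (parts.length == 2 && parts(0).nonEmpty) Some((parts(0).trim, parts(1)))
    else None
  }
}
```

This handles `=` inside a value but not commas inside a quoted value, since the surrounding code splits the whole string on `,` first; that case would need a quote-aware tokenizer.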




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-15 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/30#discussion_r11653646
  
--- Diff: python/pyspark/context.py ---
@@ -130,6 +130,13 @@ def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
                 varName = k[len("spark.executorEnv."):]
                 self.environment[varName] = v
 
+        # Check if we're running on YARN:
+        if self.master == "yarn-client":
+            if not os.environ.get("SPARK_JAR"):
+                raise Exception("Must set SPARK_JAR when using yarn-client mode")
+            if not os.environ.get("PYSPARK_ZIP"):
--- End diff --

Rather than exposing this to the user, why not just export it in the 
`./bin/pyspark` script, and there you can fail with a message that says you 
need to run `make` if the user hasn't done it already.




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40525624
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40525634
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request:

https://github.com/apache/spark/pull/407#discussion_r11653978
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
   this.numBlocks
 }
 
-val partitioner = new HashPartitioner(numBlocks)
+// Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+def hash(x: Int): Int = {
--- End diff --

OK; I changed all instances of the hash function to byteswap32.
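
For readers unfamiliar with it, `byteswap32` is a fast integer hash in the Scala standard library (`scala.util.hashing`). The sketch below shows how it might be used to assign ids to blocks; `BlockAssignment` and `blockFor` are hypothetical names, and the exact partitioning code in ALS may differ.

```scala
import scala.util.hashing.byteswap32

object BlockAssignment {
  // Maps an integer id to a block in [0, numBlocks), scrambling the bits first
  // so that ids sharing a common stride tend not to pile into the same blocks.
  def blockFor(id: Int, numBlocks: Int): Int = {
    require(numBlocks > 0, "numBlocks must be positive")
    val h = byteswap32(id) % numBlocks
    if (h < 0) h + numBlocks else h // keep the result non-negative
  }
}
```

The non-negative adjustment is needed because `%` on a negative `Int` yields a negative remainder on the JVM.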




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40526721
  
Merged build started. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40527887
  
@sryza I think this is looking good. I played around with this on a local
yarn install and it worked. I have two remaining points. Could we ditch
requiring SPARK_JAR? I'm going to merge a patch shortly that removes that
requirement. Also, could we just automatically create the pyspark zip file and
not expose this to the user?

Eventually we'll probably bundle this inside of the Spark assembly... but
in the meantime, having a thing that just works for users, where they don't
have to e.g. set environment variables, would be nice.




[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...

2014-04-15 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/396#discussion_r11655050
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -73,4 +81,61 @@ object YarnSparkHadoopUtil {
   def getLoggingArgsForContainerCommandLine(): String = {
     "-Dlog4j.configuration=log4j-spark-container.properties"
   }
+
+  def addToEnvironment(
+      env: HashMap[String, String],
+      variable: String,
+      value: String,
+      classPathSeparator: String) = {
+    var envVariable = ""
+    if (env.get(variable) == None) {
+      envVariable = value
+    } else {
+      envVariable = env.get(variable).get + classPathSeparator + value
+    }
+    env put (StringInterner.weakIntern(variable), StringInterner.weakIntern(envVariable))
+  }
+
+  def setEnvFromInputString(
+      env: HashMap[String, String],
+      envString: String,
+      classPathSeparator: String) = {
+    if (envString != null && envString.length() > 0) {
+      var childEnvs = envString.split(",")
+      var p = Pattern.compile(getEnvironmentVariableRegex())
+      for (cEnv <- childEnvs) {
+        var parts = cEnv.split("=") // split on '='
--- End diff --

I've noticed this as an issue as well.  There's definitely room for 
improvement here.
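
The review remark being answered is not quoted in this message, so exactly which issue is meant is unknown. As a general illustration only, here is a sketch of parsing a comma-separated list of KEY=VALUE entries, which is the task the quoted lines perform. One assumption made here is that a value may itself contain '=', so the split is limited to two parts. `EnvStringParsing` and `parse` are hypothetical names, not code from the pull request.

```scala
// Hypothetical helper, not the code from the pull request.
object EnvStringParsing {
  // Turns a string such as "A=1,B=x=y" into key/value pairs.
  def parse(envString: String): Seq[(String, String)] = {
    if (envString == null || envString.isEmpty) {
      Seq.empty
    } else {
      envString.split(",").toSeq.flatMap { entry =>
        // limit = 2 keeps any further '=' characters inside the value.
        entry.split("=", 2) match {
          case Array(key, value) if key.trim.nonEmpty => Some(key.trim -> value)
          case _ => None // skip entries with no '=' or an empty key
        }
      }
    }
  }
}
```

Returning an immutable sequence of pairs instead of mutating a map keeps the parsing step easy to test on its own; the caller can then decide how to merge the pairs into an environment.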




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40529664
  
Thanks for looking at this! To add this to the test harness you can augment 
`dev/scalastyle` with two additional checks:

```
SPARK_YARN=true sbt/sbt yarn/scalastyle >> scalastyle.txt
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt yarn/scalastyle >> scalastyle.txt
```






[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40532258
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40532267
  
Merged build started. 




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40532862
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40533842
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40533843
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14153/




[GitHub] spark pull request: Loads test tables when running sbt hive/conso...

2014-04-15 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/417#issuecomment-40534041
  
@pwendell, this can be merged.  No reason not to include in 1.0 as well.




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40534045
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/413#issuecomment-40539374
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1497. Fix scalastyle warnings in YARN, H...

2014-04-15 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/413#discussion_r11665116
  
--- Diff: dev/scalastyle ---
@@ -18,6 +18,10 @@
 #
 
 echo -e "q\n" | sbt/sbt clean scalastyle > scalastyle.txt
+# Check style with YARN alpha built too
--- End diff --

Any interest in doing the hive one here too? I edited my comment earlier to 
show how to do that.




[GitHub] spark pull request: [FIX] update sbt-idea to version 1.6.0

2014-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/419#issuecomment-40551810
  
Merged build finished. All automated tests passed.



