[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/933#issuecomment-44766970
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/933#issuecomment-44766971
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15325/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/933#issuecomment-44766468
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/933#issuecomment-44766465
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...

2014-05-31 Thread ankurdave
GitHub user ankurdave opened a pull request:

https://github.com/apache/spark/pull/933

Add landmark-based Shortest Path algorithm to graphx.lib

This is a modified version of apache/spark#10.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ankurdave/spark shortestpaths

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/933.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #933


commit 0ce4c53da465f9d8b7591a03010bfde2bc18ec97
Author: Andres Perez 
Date:   2014-06-01T02:59:02Z

Add Shortest-path computations to graphx.lib with unit tests.

Adds a landmark-based shortest-path computation to 
org.apache.spark.graphx.lib.

Author: Andres Perez 
Author: Koert Kuipers 

Closes #10 from apache/spark and squashes the following commits:

88d80da [] Merge from master.
c9d1ee8 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
47e22db [Andres Perez] Remove algebird dependency from ShortestPaths.
44d19e5 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
4986f80 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
25fbe10 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
9ee0d89 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
745a7a1 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
9319fac [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
d47865f [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
ba6e530 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
5c5b197 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
ee9d90b [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
2d5e788 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
6cd90a5 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
2cbfe45 [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/spark
a3bdb0e [Koert Kuipers] Merge branch 'master' of 
https://github.com/apache/incubator-spark
f8f6d91 [Andres Perez] Revert "Add Shortest-path computations to graphx.lib 
with unit tests."
7496d6b [Andres Perez] Add Shortest-path computations to graphx.lib with 
unit tests.

commit 605a9782a09bb6d41c6ae899c794c25434247900
Author: Ankur Dave 
Date:   2014-05-31T01:27:39Z

Fix style errors

commit 1b73e389c05b4b56ff58bef585ef256c58e366ab
Author: Ankur Dave 
Date:   2014-05-31T01:33:50Z

Remove unnecessary VD type param, and pass through ED




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...

2014-05-31 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/933#issuecomment-44766447
  
@rxin


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: updated java code blocks in spark SQL guide su...

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/932


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: updated java code blocks in spark SQL guide su...

2014-05-31 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/932#issuecomment-44766071
  
Thanks. I've merged this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: updated java code blocks in spark SQL guide su...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/932#issuecomment-44765993
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: updated java code blocks in spark SQL guide su...

2014-05-31 Thread yadid
GitHub user yadid opened a pull request:

https://github.com/apache/spark/pull/932

updated java code blocks in spark SQL guide such that ctx will refer to ...

...a JavaSparkContext and sqlCtx will refer to a JavaSQLContext

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yadid/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #932


commit f92fb3a6db1d6bb961fae76be368e04269234520
Author: Yadid Ayzenberg 
Date:   2014-06-01T02:30:03Z

updated java code blocks in spark SQL guide such that ctx will refer to a 
JavaSparkContext and sqlCtx will refer to a JavaSQLContext




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1917: fix PySpark import of scipy.specia...

2014-05-31 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/866#issuecomment-44761085
  
Thanks Uri. Merged this into branch-1.0 as well as 0.9.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1917: fix PySpark import of scipy.specia...

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/866


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-05-31 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-44760575
  
Thanks for working on this!

Looks like adding the extra function is breaking the `show_functions` test. 
 Maybe cleanup by deleting that new UDF at the end of your test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve maven plugin configuration

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/786


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: support for Kinesis

2014-05-31 Thread cfregly
Github user cfregly commented on the pull request:

https://github.com/apache/spark/pull/223#issuecomment-44760308
  
update:  i discusses this with parviz recently - and we agreed that i would 
take this over.  new PR to come shortly.  here's the jira ticket:  
https://issues.apache.org/jira/browse/SPARK-1981


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-44758567
  
Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-44758568
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15324/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...

2014-05-31 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/897#discussion_r13263116
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala 
---
@@ -672,38 +672,102 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
 
   /**
* Return approximate number of distinct values for each key in this RDD.
+   *
* The accuracy of approximation can be controlled through the relative 
standard deviation
* (relativeSD) parameter, which also controls the amount of memory 
used. Lower values result in
* more accurate counts but increase the memory footprint and vise 
versa. Uses the provided
* Partitioner to partition the output RDD.
+   *
+   * @param p The precision value for the normal set.
+   *  p must be a value between 4 and sp 
(32 max).
+   * @param sp The precision value for the sparse set, between 0 and 32.
+   *   If sp equals 0, the sparse representation is 
skipped.
+   * @param partitioner Partitioner to use for the resulting RDD.
*/
-  def countApproxDistinctByKey(relativeSD: Double, partitioner: 
Partitioner): JavaRDD[(K, Long)] = {
-rdd.countApproxDistinctByKey(relativeSD, partitioner)
+  def countApproxDistinctByKey(p: Int, sp: Int, partitioner: Partitioner): 
JavaPairRDD[K, Long] = {
+fromRDD(rdd.countApproxDistinctByKey(p, sp, partitioner))
   }
 
   /**
-   * Return approximate number of distinct values for each key this RDD.
+   * Return approximate number of distinct values for each key in this RDD.
+   *
* The accuracy of approximation can be controlled through the relative 
standard deviation
* (relativeSD) parameter, which also controls the amount of memory 
used. Lower values result in
-   * more accurate counts but increase the memory footprint and vise 
versa. The default value of
-   * relativeSD is 0.05. Hash-partitions the output RDD using the existing 
partitioner/parallelism
-   * level.
+   * more accurate counts but increase the memory footprint and vise 
versa. Uses the provided
+   * Partitioner to partition the output RDD.
+   *
+   * @param p The precision value for the normal set.
+   *  p must be a value between 4 and sp 
(32 max).
+   * @param sp The precision value for the sparse set, between 0 and 32.
+   *   If sp equals 0, the sparse representation is 
skipped.
+   * @param numPartitions The number of partitions in the resulting RDD.
*/
-  def countApproxDistinctByKey(relativeSD: Double = 0.05): JavaRDD[(K, 
Long)] = {
-rdd.countApproxDistinctByKey(relativeSD)
+  def countApproxDistinctByKey(p: Int, sp: Int, numPartitions: Int): 
JavaPairRDD[K, Long] = {
+fromRDD(rdd.countApproxDistinctByKey(p, sp, numPartitions))
   }
 
-
   /**
* Return approximate number of distinct values for each key in this RDD.
+   *
+   * The accuracy of approximation can be controlled through the relative 
standard deviation
+   * (relativeSD) parameter, which also controls the amount of memory 
used. Lower values result in
+   * more accurate counts but increase the memory footprint and vise 
versa. Uses the provided
+   * Partitioner to partition the output RDD.
+   *
+   * @param p The precision value for the normal set.
+   *  p must be a value between 4 and sp 
(32 max).
+   * @param sp The precision value for the sparse set, between 0 and 32.
+   *   If sp equals 0, the sparse representation is 
skipped.
+   */
+  def countApproxDistinctByKey(p: Int, sp: Int): JavaPairRDD[K, Long] = {
+fromRDD(rdd.countApproxDistinctByKey(p, sp))
+  }
+
+  /**
+   * Return approximate number of distinct values for each key in this 
RDD. This is deprecated.
+   * Use the variant with p and sp parameters 
instead.
+   *
* The accuracy of approximation can be controlled through the relative 
standard deviation
* (relativeSD) parameter, which also controls the amount of memory 
used. Lower values result in
-   * more accurate counts but increase the memory footprint and vise 
versa. HashPartitions the
-   * output RDD into numPartitions.
+   * more accurate counts but increase the memory footprint and vise 
versa. Uses the provided
+   * Partitioner to partition the output RDD.
+   */
+  @Deprecated
+  def countApproxDistinctByKey(relativeSD: Double, partitioner: 
Partitioner): JavaPairRDD[K, Long] =
+  {
+fromRDD(rdd.countApproxDistinctByKey(relativeSD, partitioner))
+  }
+
+  /**
+   * Return approximate number of distinct values for each key in this 
RDD. This is deprecated.
+   * Use the variant with p and sp parameters 
instead.
+   *
+   * The algorithm used is based on streamlib's impleme

[GitHub] spark pull request: Fix two issues in ReplSuite.

2014-05-31 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/781#discussion_r13263105
  
--- Diff: repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala ---
@@ -44,15 +44,19 @@ class ReplSuite extends FunSuite {
 }
   }
 }
+val classpath = paths.mkString(File.pathSeparator)
+System.setProperty("spark.executor.extraClassPath", classpath)
--- End diff --

For the sake of defensive programming, would you mind getting the previous 
value of `spark.executor.extraClassPath` and restoring it afterwards, instead 
of clearing it? The `spark.driver.port` value is a different situation, where 
Spark internally sets that parameter, and would reuse it if it remains set.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1839: PySpark RDD#take() shouldn't alway...

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/922


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1839: PySpark RDD#take() shouldn't alway...

2014-05-31 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/922#issuecomment-44758293
  
Thanks. I merged this in master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix two issues in ReplSuite.

2014-05-31 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/781#discussion_r13263078
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
 ---
@@ -44,7 +44,8 @@ private[spark] class SparkDeploySchedulerBackend(
 val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
   conf.get("spark.driver.host"), conf.get("spark.driver.port"),
   CoarseGrainedSchedulerBackend.ACTOR_NAME)
-val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", 
"{{CORES}}", "{{WORKER_URL}}")
+val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", 
"{{CORES}}", "{{WORKER_URL}}",
--- End diff --

I'm not sure I understand this change. First, we already pass in {{CORES}}. 
Second, the CoarseGrainedExecutorBackend seems to take in the arguments as 
listed here:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L132


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/914


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...

2014-05-31 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/914#issuecomment-44757533
  
Ok merging this in master & branch-1.0. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SQL] SPARK-1964 Add timestamp to hive metasto...

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/913


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SQL] SPARK-1964 Add timestamp to hive metasto...

2014-05-31 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/913#issuecomment-44757474
  
Merging this into master & branch-1.0. Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Optionally include Hive as a dependency of the...

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/801


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Optionally include Hive as a dependency of the...

2014-05-31 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/801#issuecomment-44757201
  
Merged into master and branch-1.0, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Optionally include Hive as a dependency of the...

2014-05-31 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/801#issuecomment-44757169
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-44756643
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-44756639
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-05-31 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-44756553
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-44756501
  
Can you add "[SPARK-1495][SQL]" to the PR title?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13262730
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
  */
 @DeveloperApi
+case class LeftSemiJoinHash(
+ leftKeys: Seq[Expression],
+ rightKeys: Seq[Expression],
+ buildSide: BuildSide,
+ left: SparkPlan,
+ right: SparkPlan) extends BinaryNode {
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: 
Nil
+
+  val (buildPlan, streamedPlan) = buildSide match {
+case BuildLeft => (left, right)
+case BuildRight => (right, left)
+  }
+
+  val (buildKeys, streamedKeys) = buildSide match {
+case BuildLeft => (leftKeys, rightKeys)
+case BuildRight => (rightKeys, leftKeys)
+  }
+
+  def output = left.output
+
+  @transient lazy val buildSideKeyGenerator = new Projection(buildKeys, 
buildPlan.output)
+  @transient lazy val streamSideKeyGenerator =
+() => new MutableProjection(streamedKeys, streamedPlan.output)
+
+  def execute() = {
+
+buildPlan.execute().zipPartitions(streamedPlan.execute()) { 
(buildIter, streamIter) =>
+// TODO: Use Spark's HashMap implementation.
+  val hashTable = new java.util.HashMap[Row, ArrayBuffer[Row]]()
+  var currentRow: Row = null
+
+  // Create a mapping of buildKeys -> rows
+  while (buildIter.hasNext) {
+currentRow = buildIter.next()
+val rowKey = buildSideKeyGenerator(currentRow)
+if(!rowKey.anyNull) {
+  val existingMatchList = hashTable.get(rowKey)
+  val matchList = if (existingMatchList == null) {
+val newMatchList = new ArrayBuffer[Row]()
+hashTable.put(rowKey, newMatchList)
+newMatchList
+  } else {
+existingMatchList
+  }
+  matchList += currentRow.copy()
+}
+  }
+
+  new Iterator[Row] {
+private[this] var currentStreamedRow: Row = _
+private[this] var currentHashMatched: Boolean = false
+
+private[this] val joinKeys = streamSideKeyGenerator()
+
+override final def hasNext: Boolean =
+  streamIter.hasNext && fetchNext()
+
+override final def next() = {
+  currentStreamedRow
+}
+
+/**
+ * Searches the streamed iterator for the next row that has at 
least one match in hashtable.
+ *
+ * @return true if the search is successful, and false the 
streamed iterator runs out of
+ * tuples.
+ */
+private final def fetchNext(): Boolean = {
+  currentHashMatched = false
+  while (!currentHashMatched && streamIter.hasNext) {
+currentStreamedRow = streamIter.next()
+if (!joinKeys(currentStreamedRow).anyNull) {
+  currentHashMatched = true
+}
+  }
+  currentHashMatched
+}
+  }
+}
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class LeftSemiJoinBNL(
--- End diff --

I don't think this operator is exercised by the included test cases.  We 
should add a test where the join condition can be calculated with hash keys.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-44756469
  
This is getting closer.  Thanks for working on it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13262710
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
  */
 @DeveloperApi
+case class LeftSemiJoinHash(
+ leftKeys: Seq[Expression],
+ rightKeys: Seq[Expression],
+ buildSide: BuildSide,
+ left: SparkPlan,
+ right: SparkPlan) extends BinaryNode {
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: 
Nil
+
+  val (buildPlan, streamedPlan) = buildSide match {
+case BuildLeft => (left, right)
+case BuildRight => (right, left)
+  }
+
+  val (buildKeys, streamedKeys) = buildSide match {
+case BuildLeft => (leftKeys, rightKeys)
+case BuildRight => (rightKeys, leftKeys)
+  }
+
+  def output = left.output
+
+  @transient lazy val buildSideKeyGenerator = new Projection(buildKeys, 
buildPlan.output)
+  @transient lazy val streamSideKeyGenerator =
+() => new MutableProjection(streamedKeys, streamedPlan.output)
+
+  def execute() = {
+
+buildPlan.execute().zipPartitions(streamedPlan.execute()) { 
(buildIter, streamIter) =>
+// TODO: Use Spark's HashMap implementation.
+  val hashTable = new java.util.HashMap[Row, ArrayBuffer[Row]]()
+  var currentRow: Row = null
+
+  // Create a mapping of buildKeys -> rows
+  while (buildIter.hasNext) {
+currentRow = buildIter.next()
+val rowKey = buildSideKeyGenerator(currentRow)
+if(!rowKey.anyNull) {
+  val existingMatchList = hashTable.get(rowKey)
+  val matchList = if (existingMatchList == null) {
+val newMatchList = new ArrayBuffer[Row]()
+hashTable.put(rowKey, newMatchList)
+newMatchList
+  } else {
+existingMatchList
+  }
+  matchList += currentRow.copy()
+}
+  }
+
+  new Iterator[Row] {
+private[this] var currentStreamedRow: Row = _
+private[this] var currentHashMatched: Boolean = false
+
+private[this] val joinKeys = streamSideKeyGenerator()
+
+override final def hasNext: Boolean =
+  streamIter.hasNext && fetchNext()
+
+override final def next() = {
+  currentStreamedRow
+}
+
+/**
+ * Searches the streamed iterator for the next row that has at 
least one match in hashtable.
+ *
+ * @return true if the search is successful, and false the 
streamed iterator runs out of
+ * tuples.
+ */
+private final def fetchNext(): Boolean = {
+  currentHashMatched = false
+  while (!currentHashMatched && streamIter.hasNext) {
+currentStreamedRow = streamIter.next()
+if (!joinKeys(currentStreamedRow).anyNull) {
+  currentHashMatched = true
+}
+  }
+  currentHashMatched
+}
+  }
+}
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class LeftSemiJoinBNL(
+streamed: SparkPlan, broadcast: SparkPlan, condition: 
Option[Expression])
+(@transient sc: SparkContext)
+  extends BinaryNode {
+  // TODO: Override requiredChildDistribution.
+
+  override def outputPartitioning: Partitioning = 
streamed.outputPartitioning
+
+  override def otherCopyArgs = sc :: Nil
+
+  def output = left.output
+
+  /** The Streamed Relation */
+  def left = streamed
+  /** The Broadcast relation */
+  def right = broadcast
+
+  @transient lazy val boundCondition =
+InterpretedPredicate(
+  condition
+.map(c => BindReferences.bindReference(c, left.output ++ 
right.output))
+.getOrElse(Literal(true)))
+
+
+  def execute() = {
+val broadcastedRelation = 
sc.broadcast(broadcast.execute().map(_.copy()).collect().toIndexedSeq)
+
+val streamedPlusMatches = streamed.execute().mapPartitions { 
streamedIter =>
+  val joinedRow = new JoinedRow
+
+  streamedIter.filter(streamedRow => {
+var i = 0
+var matched = false
+
+while (i < broadcastedRelation.value.size && !matched) {
+  // TODO: One bitset per partition instead of per row.
+  val broadcastedRow = broadca

[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13262704
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
  */
 @DeveloperApi
+case class LeftSemiJoinHash(
+ leftKeys: Seq[Expression],
+ rightKeys: Seq[Expression],
+ buildSide: BuildSide,
+ left: SparkPlan,
+ right: SparkPlan) extends BinaryNode {
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: 
Nil
+
+  val (buildPlan, streamedPlan) = buildSide match {
+case BuildLeft => (left, right)
+case BuildRight => (right, left)
+  }
+
+  val (buildKeys, streamedKeys) = buildSide match {
+case BuildLeft => (leftKeys, rightKeys)
+case BuildRight => (rightKeys, leftKeys)
+  }
+
+  def output = left.output
+
+  @transient lazy val buildSideKeyGenerator = new Projection(buildKeys, 
buildPlan.output)
+  @transient lazy val streamSideKeyGenerator =
+() => new MutableProjection(streamedKeys, streamedPlan.output)
+
+  def execute() = {
+
+buildPlan.execute().zipPartitions(streamedPlan.execute()) { 
(buildIter, streamIter) =>
+// TODO: Use Spark's HashMap implementation.
+  val hashTable = new java.util.HashMap[Row, ArrayBuffer[Row]]()
+  var currentRow: Row = null
+
+  // Create a mapping of buildKeys -> rows
+  while (buildIter.hasNext) {
+currentRow = buildIter.next()
+val rowKey = buildSideKeyGenerator(currentRow)
+if(!rowKey.anyNull) {
+  val existingMatchList = hashTable.get(rowKey)
+  val matchList = if (existingMatchList == null) {
+val newMatchList = new ArrayBuffer[Row]()
+hashTable.put(rowKey, newMatchList)
+newMatchList
+  } else {
+existingMatchList
+  }
+  matchList += currentRow.copy()
+}
+  }
+
+  new Iterator[Row] {
+private[this] var currentStreamedRow: Row = _
+private[this] var currentHashMatched: Boolean = false
+
+private[this] val joinKeys = streamSideKeyGenerator()
+
+override final def hasNext: Boolean =
+  streamIter.hasNext && fetchNext()
+
+override final def next() = {
+  currentStreamedRow
+}
+
+/**
+ * Searches the streamed iterator for the next row that has at 
least one match in hashtable.
+ *
+ * @return true if the search is successful, and false the 
streamed iterator runs out of
+ * tuples.
+ */
+private final def fetchNext(): Boolean = {
+  currentHashMatched = false
+  while (!currentHashMatched && streamIter.hasNext) {
+currentStreamedRow = streamIter.next()
+if (!joinKeys(currentStreamedRow).anyNull) {
+  currentHashMatched = true
+}
+  }
+  currentHashMatched
+}
+  }
+}
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class LeftSemiJoinBNL(
+streamed: SparkPlan, broadcast: SparkPlan, condition: 
Option[Expression])
+(@transient sc: SparkContext)
+  extends BinaryNode {
+  // TODO: Override requiredChildDistribution.
+
+  override def outputPartitioning: Partitioning = 
streamed.outputPartitioning
+
+  override def otherCopyArgs = sc :: Nil
+
+  def output = left.output
+
+  /** The Streamed Relation */
+  def left = streamed
+  /** The Broadcast relation */
+  def right = broadcast
+
+  @transient lazy val boundCondition =
+InterpretedPredicate(
+  condition
+.map(c => BindReferences.bindReference(c, left.output ++ 
right.output))
+.getOrElse(Literal(true)))
+
+
+  def execute() = {
+val broadcastedRelation = 
sc.broadcast(broadcast.execute().map(_.copy()).collect().toIndexedSeq)
+
+val streamedPlusMatches = streamed.execute().mapPartitions { 
streamedIter =>
+  val joinedRow = new JoinedRow
+
+  streamedIter.filter(streamedRow => {
+var i = 0
+var matched = false
+
+while (i < broadcastedRelation.value.size && !matched) {
+  // TODO: One bitset per partition instead of per row.
--- End diff --

Is this commen

[GitHub] spark pull request: Improve maven plugin configuration

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/786#issuecomment-44756289
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15323/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve maven plugin configuration

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/786#issuecomment-44756288
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13262687
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
  */
 @DeveloperApi
+case class LeftSemiJoinHash(
+ leftKeys: Seq[Expression],
+ rightKeys: Seq[Expression],
+ buildSide: BuildSide,
+ left: SparkPlan,
+ right: SparkPlan) extends BinaryNode {
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: 
Nil
+
+  val (buildPlan, streamedPlan) = buildSide match {
+case BuildLeft => (left, right)
+case BuildRight => (right, left)
+  }
+
+  val (buildKeys, streamedKeys) = buildSide match {
+case BuildLeft => (leftKeys, rightKeys)
+case BuildRight => (rightKeys, leftKeys)
+  }
+
+  def output = left.output
+
+  @transient lazy val buildSideKeyGenerator = new Projection(buildKeys, 
buildPlan.output)
+  @transient lazy val streamSideKeyGenerator =
+() => new MutableProjection(streamedKeys, streamedPlan.output)
+
+  def execute() = {
+
+buildPlan.execute().zipPartitions(streamedPlan.execute()) { 
(buildIter, streamIter) =>
+// TODO: Use Spark's HashMap implementation.
+  val hashTable = new java.util.HashMap[Row, ArrayBuffer[Row]]()
+  var currentRow: Row = null
+
+  // Create a mapping of buildKeys -> rows
+  while (buildIter.hasNext) {
+currentRow = buildIter.next()
+val rowKey = buildSideKeyGenerator(currentRow)
+if(!rowKey.anyNull) {
+  val existingMatchList = hashTable.get(rowKey)
+  val matchList = if (existingMatchList == null) {
+val newMatchList = new ArrayBuffer[Row]()
+hashTable.put(rowKey, newMatchList)
+newMatchList
+  } else {
+existingMatchList
+  }
+  matchList += currentRow.copy()
+}
+  }
+
+  new Iterator[Row] {
+private[this] var currentStreamedRow: Row = _
+private[this] var currentHashMatched: Boolean = false
+
+private[this] val joinKeys = streamSideKeyGenerator()
+
+override final def hasNext: Boolean =
+  streamIter.hasNext && fetchNext()
+
+override final def next() = {
+  currentStreamedRow
--- End diff --

Is this correct if the operator is created with BuildLeft instead of 
BuildRight?  I think that would turn it into a RightSemiJoin.  Perhaps we 
should just remove the option to build on the other side.  I think you can then 
also safely simplify this to use a HashSet instead of a HashMap, which will 
reduce memory consumption significantly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13262640
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
--- End diff --

I realize that we aren't particularly good about this in most of the other 
physical operators, but could you add some Scala doc here about how this 
operator works and what the expected performance characteristics are?  Same 
below.  The goal of the Scala doc for physical operators should be to make it 
easy for people to understand query plans that are printed out by EXPLAIN.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: add support for left semi join

2014-05-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13262607
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
  */
 @DeveloperApi
+case class LeftSemiJoinHash(
+ leftKeys: Seq[Expression],
--- End diff --

Intent only 4 spaces here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1935: Explicitly add commons-codec 1.5 a...

2014-05-31 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/912#issuecomment-44755843
  
Sure. I have closed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1947] [SQL] Child of SumDistinct or Ave...

2014-05-31 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/902#issuecomment-44755822
  
Thanks! merged into master and 1.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1935: Explicitly add commons-codec 1.5 a...

2014-05-31 Thread yhuai
Github user yhuai closed the pull request at:

https://github.com/apache/spark/pull/912


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1947] [SQL] Child of SumDistinct or Ave...

2014-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/902


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1959] String "NULL" shouldn't be interp...

2014-05-31 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/909#issuecomment-44755719
  
We should probably have a test for this... maybe something like:

```scala
  createQueryTest("nulls",
"""
  |CREATE TABLE nullVals AS SELECT "null", "NULL", "Null" FROM src 
LIMIT 1;
  |SELECT * FROM nullVals
""".stripMargin
  )
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44755561
  
So I've actually been assuming that users running older versions of Mesos 
just change `mesos.version` in the build and package Spark, so in my head I've 
sort of coupled "whether we compile against mesos X" and "whether users can run 
on mesos X" as the same thing.

But as long as mesos keeps the IPC messages compatible (which I think they 
do) then it really shouldn't matter whether we require compiling against the 
newest client. Let me just check with them about this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44755392
  
Yes looks like it was introduced in 0.17.0: 
https://github.com/apache/mesos/commit/b609c851493c81c6ba8dfe51cf102400c05c2d0c

I see, I thought the intent was to require 0.18+ since Spark requires it in 
HEAD. If not yeah I'll close it but wouldn't there be other compatibility 
issues of this form?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1935: Explicitly add commons-codec 1.5 a...

2014-05-31 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/912#issuecomment-44755346
  
@yhuai mind closing this? Our "auto close" thing doesn't work for back 
ports like this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...

2014-05-31 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/914#discussion_r13262479
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -381,16 +381,19 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
 object SparkSubmitArguments {
   /** Load properties present in the given file. */
   def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file ${file.getName} does not 
exist")
+require(file.exists(), s"Properties file $file does not exist")
+require(file.isFile(), s"Properties file $file is not a normal file")
--- End diff --

My bad, my test was silly, as I realize I had it pointed at a directory 
actually. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve maven plugin configuration

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/786#issuecomment-44755221
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve maven plugin configuration

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/786#issuecomment-44755219
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve maven plugin configuration

2014-05-31 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/786#issuecomment-44755096
  
Jenkins, test this please. Thanks for this @witgo - this is good clean-up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44755003
  
Hey Sean - do you know if this will break Spark for Mesos 15/16/17? When 
did the introduce the newer API? We should definitely update this... I'm a bit 
concerned about bumping users to the bleeding edge of Mesos to run Spark 1.1... 
so the question is just regarding timing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...

2014-05-31 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/914#discussion_r13262410
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -381,16 +381,19 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
 object SparkSubmitArguments {
   /** Load properties present in the given file. */
   def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file ${file.getName} does not 
exist")
+require(file.exists(), s"Properties file $file does not exist")
+require(file.isFile(), s"Properties file $file is not a normal file")
--- End diff --

`File.isFile()` returns `true` for symlinks which point to files.

```
$ touch testfile
$ ln -s testfile testlink
$ scala
scala> new java.io.File("testlink").isFile()
res0: Boolean = true
```

Additionally, since the docs aren't 100% clear and I couldn't find a solid 
answer from Google, I checked both the 
[UnixFileSystem](http://code.metager.de/source/xref/openjdk/jdk8/jdk/src/solaris/native/java/io/UnixFileSystem_md.c#111)
 and 
[WindowsFileSystem](http://code.metager.de/source/xref/openjdk/jdk6/jdk/src/windows/native/java/io/Win32FileSystem_md.c#150).
 The former uses `stat` which resolves symbolic links. The latter will set 
isFile to true if and only if it is not a directory, so symlinks would be 
included.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44753736
  
I'm just copying-and-pasting to get something similar running externally. 
Maybe it's a little surprising that the example code doesn't work that way -- 
being in a `main()` kind of suggests this is a stand-alone program. Maybe just 
me. 

I think there are a few possibilities:

- Change all example code to set master if missing (that's what the current 
PR does)
- Change SparkConf to do something similar as a global default
- Just update javadoc to make it clear that the examples require the 
`spark.master` system property to be set

I slightly prefer one of the first two on the principle of least surprise, 
but can go any direction. I think at least the third should be done. What say 
everyone?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44753603
  
Hey Sean, how are you running the examples. Are you using the `run-example` 
script? That script should set the master to `local[*]` if the user hasn't 
specified it, which will use all cores locally. I think in some cases we might 
need to update the javadocs in examples to tell users to use `run-example`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1977][MLLIB] use immutable BitSet in AL...

2014-05-31 Thread nevillelyh
Github user nevillelyh commented on the pull request:

https://github.com/apache/spark/pull/925#issuecomment-44750919
  
Sure that would also work. I made a PR to chill as well.
https://github.com/twitter/chill/pull/185


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/931#issuecomment-44749478
  
Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/931#issuecomment-44749479
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15322/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/931#issuecomment-44749436
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/931#issuecomment-44749434
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix JIRA-983 and support exteranl sort for sor...

2014-05-31 Thread xiajunluan
GitHub user xiajunluan opened a pull request:

https://github.com/apache/spark/pull/931

Fix JIRA-983 and support exteranl sort for sortByKey

Change class ExternalAppendOnlyMap and make it support customized 
comparator function(not only sorted by hashCode).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xiajunluan/spark-1 JIRA-983

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/931.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #931






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44747313
  
Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44747315
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15321/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44746537
  
I pushed again, with `setIfMissing`. Is it better in the `SparkConf` 
constructor? or I am off base here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44746504
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44746502
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-05-31 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/792#issuecomment-44746294
  
Thanks, I meant to do this a while ago but never got around to it.

The code duplication is unfortunate, though. Why not just have 
toEdgePartition take an optional parameter `sort: Boolean`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Use optional third argument as edge attribute.

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/901#issuecomment-44746122
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15320/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Use optional third argument as edge attribute.

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/901#issuecomment-44746121
  
Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Use optional third argument as edge attribute.

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/901#issuecomment-44746090
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Use optional third argument as edge attribute.

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/901#issuecomment-44746088
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Use optional third argument as edge attribute.

2014-05-31 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/901#issuecomment-44746040
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Use optional third argument as edge attribute.

2014-05-31 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/901#issuecomment-44745827
  
Unlike Python (but like Java), Scala doesn't use asInstanceOf for arbitrary 
type conversions. In this case, it won't work to do `"123".asInstanceOf[Int]`, 
because Scala doesn't know how to do the conversion. There are two ways to 
resolve this:

1. Have `edgeListFile` take a function that does the conversion:

```scala
def edgeListFile[ED: ClassTag](..., parseEdgeAttr: String => ED) = {
  ... parseEdgeAttr(lineArray(2)) ...
}
// Can be called like this:
edgeListFile[Int](..., _.toInt)
```

This is simple, but it doesn't facilitate reuse of the parsing 
functions. You always have to specify how to parse an Int, even though it's 
usually the same everywhere.

2. Define a library of standard parsers and use Scala implicits to select 
the right one automatically if it exists:
```scala
// `CanParse[T]` is a type class [1] indicating that we can parse any 
String
// to construct a T.  If there is an `implicit val` of type 
Parseable[Foo]
// in scope, it means you can convert from any String to a Foo. (This is
// called Read in Haskell.)
trait Parseable[T] { def parse(s: String): T }

// Here are some parsers. The user can define their own by creating
// additional implicit vals.
object Parseable {
  implicit val IntParser = new Parseable[Int] { def parse(s: String) = 
s.toInt }
  implicit val StringParser = new Parseable[String] { def parse(s: 
String) = s }
}

// Instead of an explicit function parameter, we use a context bound, 
which
// desugars to an implicit parameter of type Parseable[ED] (that is, a
// parser for ED) that we can access using implicitly[Parseable[ED]].
def edgeListFile[ED: ClassTag : Parseable](...) = {
  ... implicitly[Parseable[ED]].parse(lineArray(2)) ...
}
// Can be called without passing anything extra:
edgeListFile[Int](...)
   ```

---

Spark does have a [style 
guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide),
 but it's pretty sparse. Most questions will require looking for other examples 
in the code, or just submitting the PR and seeing what reviewers say.

[1] 
http://docs.scala-lang.org/tutorials/FAQ/finding-implicits.html#context_bounds


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1979] Added Error Handling if user pass...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/930#issuecomment-44745616
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1979] Added Error Handling if user pass...

2014-05-31 Thread pankajarora12
GitHub user pankajarora12 opened a pull request:

https://github.com/apache/spark/pull/930

[SPARK-1979] Added Error Handling if user passes application params with...

Added error message to user when used --arg for passing application 
parameters.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pankajarora12/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/930.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #930


commit d6bfba3d7b9236a02a7e91233f8e512bea761af0
Author: pankaj.arora 
Date:   2014-05-31T11:11:05Z

[SPARK-1979] Added Error Handling if user passes application params with 
--arg




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44745344
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44745345
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15319/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...

2014-05-31 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/926#issuecomment-44744710
  
Yeah I think it's essential to not prevent `-Dspark.master=...` from 
working, oops. I think it may be useful to have this work if one 
copies-and-pastes too, as I just did. The javadoc doesn't indicate that you 
have to set the master either. I will rework it to use `setIfMissing()`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44744576
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44744577
  
Oops, yes, coming up now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/920#issuecomment-44744579
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...

2014-05-31 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/914#discussion_r13260477
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -381,16 +381,19 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
 object SparkSubmitArguments {
   /** Load properties present in the given file. */
   def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file ${file.getName} does not 
exist")
+require(file.exists(), s"Properties file $file does not exist")
+require(file.isFile(), s"Properties file $file is not a normal file")
--- End diff --

`isFile()` is `false` for symlinks. It may be more conservative to require 
`!file.isDirectory()`, since it seems valid to point to a symlinked config file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-05-31 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/455#issuecomment-44743612
  
@mateiz I noticed that the docs for Python programming guid have been 
merged into the overall Programming guide. 

Where do you think the best place is to put the bit of documentation about 
InputFormats for PySpark?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-44742948
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-44742949
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15318/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-44742668
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15317/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-44742667
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [WIP][SPARK-1930] The Container is running bey...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/894#issuecomment-44742298
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15316/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [WIP][SPARK-1930] The Container is running bey...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/894#issuecomment-44742297
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...

2014-05-31 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/897#issuecomment-44742270
  
@pwendell Jenkins failed due to binary compatibility for 
SerializableHyperLogLog, which is no longer needed ... 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-44742189
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-44742193
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/897#issuecomment-44742112
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15315/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/897#issuecomment-44742111
  
Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [WIP]Improve ALS algorithm resource usage

2014-05-31 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/828


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [WIP]Improve ALS algorithm resource usage

2014-05-31 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/828#issuecomment-44742037
  
This solution is not perfect. temporarily close this. The new #929 .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-44742011
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


  1   2   >