[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/933#issuecomment-44766970 Merged build finished. All automated tests passed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/933#issuecomment-44766971 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15325/
[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/933#issuecomment-44766468 Merged build started.
[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/933#issuecomment-44766465 Merged build triggered.
[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...
GitHub user ankurdave opened a pull request:

    https://github.com/apache/spark/pull/933

    Add landmark-based Shortest Path algorithm to graphx.lib

This is a modified version of apache/spark#10.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankurdave/spark shortestpaths

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/933.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #933

commit 0ce4c53da465f9d8b7591a03010bfde2bc18ec97
Author: Andres Perez
Date: 2014-06-01T02:59:02Z

    Add Shortest-path computations to graphx.lib with unit tests.

    Adds a landmark-based shortest-path computation to org.apache.spark.graphx.lib.

    Author: Andres Perez
    Author: Koert Kuipers

    Closes #10 from apache/spark and squashes the following commits:

    88d80da [] Merge from master.
    c9d1ee8 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    47e22db [Andres Perez] Remove algebird dependency from ShortestPaths.
    44d19e5 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    4986f80 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    25fbe10 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    9ee0d89 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    745a7a1 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    9319fac [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    d47865f [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    ba6e530 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    5c5b197 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    ee9d90b [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    2d5e788 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    6cd90a5 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    2cbfe45 [Koert Kuipers] Merge branch 'master' of https://github.com/apache/spark
    a3bdb0e [Koert Kuipers] Merge branch 'master' of https://github.com/apache/incubator-spark
    f8f6d91 [Andres Perez] Revert "Add Shortest-path computations to graphx.lib with unit tests."
    7496d6b [Andres Perez] Add Shortest-path computations to graphx.lib with unit tests.

commit 605a9782a09bb6d41c6ae899c794c25434247900
Author: Ankur Dave
Date: 2014-05-31T01:27:39Z

    Fix style errors

commit 1b73e389c05b4b56ff58bef585ef256c58e366ab
Author: Ankur Dave
Date: 2014-05-31T01:33:50Z

    Remove unnecessary VD type param, and pass through ED
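For readers unfamiliar with the algorithm this PR adds: graphx.lib.ShortestPaths computes, for every vertex, the hop distance to each of a small set of "landmark" vertices. Below is a hypothetical single-machine sketch of that idea in Java (plain BFS from each landmark over an undirected adjacency list; the class and method names are illustrative, and this is not the GraphX implementation, which runs as a distributed Pregel computation):

```java
import java.util.*;

// Hypothetical single-machine sketch (not the GraphX implementation): for every
// vertex, compute the hop distance to each landmark by running a plain BFS
// from each landmark.
public class LandmarkShortestPaths {

    // adjacency: vertex -> neighbors; edges are treated as undirected here.
    public static Map<Integer, Map<Integer, Integer>> run(
            Map<Integer, List<Integer>> adj, List<Integer> landmarks) {
        Map<Integer, Map<Integer, Integer>> result = new HashMap<>();
        for (int landmark : landmarks) {
            // BFS frontier expansion; dist holds the hop count from the landmark.
            Map<Integer, Integer> dist = new HashMap<>();
            Deque<Integer> queue = new ArrayDeque<>();
            dist.put(landmark, 0);
            queue.add(landmark);
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int w : adj.getOrDefault(u, List.of())) {
                    if (!dist.containsKey(w)) {
                        dist.put(w, dist.get(u) + 1);
                        queue.add(w);
                    }
                }
            }
            // Record landmark -> distance in each reached vertex's map,
            // mirroring the per-vertex Map of landmark distances in the PR.
            for (Map.Entry<Integer, Integer> e : dist.entrySet()) {
                result.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                      .put(landmark, e.getValue());
            }
        }
        return result;
    }
}
```

Vertices unreachable from a landmark simply have no entry for it, which matches the intuition that the landmark map only carries distances that exist.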
[GitHub] spark pull request: Add landmark-based Shortest Path algorithm to ...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/933#issuecomment-44766447 @rxin
[GitHub] spark pull request: updated java code blocks in spark SQL guide su...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/932
[GitHub] spark pull request: updated java code blocks in spark SQL guide su...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/932#issuecomment-44766071 Thanks. I've merged this in master & branch-1.0.
[GitHub] spark pull request: updated java code blocks in spark SQL guide su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/932#issuecomment-44765993 Can one of the admins verify this patch?
[GitHub] spark pull request: updated java code blocks in spark SQL guide su...
GitHub user yadid opened a pull request:

    https://github.com/apache/spark/pull/932

    updated java code blocks in spark SQL guide such that ctx will refer to ...

...a JavaSparkContext and sqlCtx will refer to a JavaSQLContext

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yadid/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/932.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #932

commit f92fb3a6db1d6bb961fae76be368e04269234520
Author: Yadid Ayzenberg
Date: 2014-06-01T02:30:03Z

    updated java code blocks in spark SQL guide such that ctx will refer to a JavaSparkContext and sqlCtx will refer to a JavaSQLContext
[GitHub] spark pull request: SPARK-1917: fix PySpark import of scipy.specia...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/866#issuecomment-44761085 Thanks Uri. Merged this into branch-1.0 as well as 0.9.
[GitHub] spark pull request: SPARK-1917: fix PySpark import of scipy.specia...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/866
[GitHub] spark pull request: [SparkSQL] allow UDF on struct
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/796#issuecomment-44760575 Thanks for working on this! Looks like adding the extra function is breaking the `show_functions` test. Maybe cleanup by deleting that new UDF at the end of your test?
[GitHub] spark pull request: Improve maven plugin configuration
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/786
[GitHub] spark pull request: support for Kinesis
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/223#issuecomment-44760308 Update: I discussed this with Parviz recently, and we agreed that I would take this over. A new PR is coming shortly. Here's the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-1981
[GitHub] spark pull request: [SparkSQL] allow UDF on struct
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/796#issuecomment-44758567 Merged build finished.
[GitHub] spark pull request: [SparkSQL] allow UDF on struct
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/796#issuecomment-44758568 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15324/
[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/897#discussion_r13263116

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala ---
@@ -672,38 +672,102 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])

   /**
    * Return approximate number of distinct values for each key in this RDD.
+   *
    * The accuracy of approximation can be controlled through the relative standard deviation
    * (relativeSD) parameter, which also controls the amount of memory used. Lower values result in
    * more accurate counts but increase the memory footprint and vise versa. Uses the provided
    * Partitioner to partition the output RDD.
+   *
+   * @param p The precision value for the normal set.
+   *          p must be a value between 4 and sp (32 max).
+   * @param sp The precision value for the sparse set, between 0 and 32.
+   *           If sp equals 0, the sparse representation is skipped.
+   * @param partitioner Partitioner to use for the resulting RDD.
    */
-  def countApproxDistinctByKey(relativeSD: Double, partitioner: Partitioner): JavaRDD[(K, Long)] = {
-    rdd.countApproxDistinctByKey(relativeSD, partitioner)
+  def countApproxDistinctByKey(p: Int, sp: Int, partitioner: Partitioner): JavaPairRDD[K, Long] = {
+    fromRDD(rdd.countApproxDistinctByKey(p, sp, partitioner))
   }

   /**
-   * Return approximate number of distinct values for each key this RDD.
+   * Return approximate number of distinct values for each key in this RDD.
+   *
    * The accuracy of approximation can be controlled through the relative standard deviation
    * (relativeSD) parameter, which also controls the amount of memory used. Lower values result in
-   * more accurate counts but increase the memory footprint and vise versa. The default value of
-   * relativeSD is 0.05. Hash-partitions the output RDD using the existing partitioner/parallelism
-   * level.
+   * more accurate counts but increase the memory footprint and vise versa. Uses the provided
+   * Partitioner to partition the output RDD.
+   *
+   * @param p The precision value for the normal set.
+   *          p must be a value between 4 and sp (32 max).
+   * @param sp The precision value for the sparse set, between 0 and 32.
+   *           If sp equals 0, the sparse representation is skipped.
+   * @param numPartitions The number of partitions in the resulting RDD.
    */
-  def countApproxDistinctByKey(relativeSD: Double = 0.05): JavaRDD[(K, Long)] = {
-    rdd.countApproxDistinctByKey(relativeSD)
+  def countApproxDistinctByKey(p: Int, sp: Int, numPartitions: Int): JavaPairRDD[K, Long] = {
+    fromRDD(rdd.countApproxDistinctByKey(p, sp, numPartitions))
   }
-
   /**
    * Return approximate number of distinct values for each key in this RDD.
+   *
+   * The accuracy of approximation can be controlled through the relative standard deviation
+   * (relativeSD) parameter, which also controls the amount of memory used. Lower values result in
+   * more accurate counts but increase the memory footprint and vise versa. Uses the provided
+   * Partitioner to partition the output RDD.
+   *
+   * @param p The precision value for the normal set.
+   *          p must be a value between 4 and sp (32 max).
+   * @param sp The precision value for the sparse set, between 0 and 32.
+   *           If sp equals 0, the sparse representation is skipped.
+   */
+  def countApproxDistinctByKey(p: Int, sp: Int): JavaPairRDD[K, Long] = {
+    fromRDD(rdd.countApproxDistinctByKey(p, sp))
+  }
+
+  /**
+   * Return approximate number of distinct values for each key in this RDD. This is deprecated.
+   * Use the variant with p and sp parameters instead.
+   *
    * The accuracy of approximation can be controlled through the relative standard deviation
    * (relativeSD) parameter, which also controls the amount of memory used. Lower values result in
-   * more accurate counts but increase the memory footprint and vise versa. HashPartitions the
-   * output RDD into numPartitions.
+   * more accurate counts but increase the memory footprint and vise versa. Uses the provided
+   * Partitioner to partition the output RDD.
+   */
+  @Deprecated
+  def countApproxDistinctByKey(relativeSD: Double, partitioner: Partitioner): JavaPairRDD[K, Long] =
+  {
+    fromRDD(rdd.countApproxDistinctByKey(relativeSD, partitioner))
+  }
+
+  /**
+   * Return approximate number of distinct values for each key in this RDD. This is deprecated.
+   * Use the variant with p and sp parameters instead.
+   *
+   * The algorithm used is based on streamlib's impleme
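The p and sp parameters in this diff are HyperLogLog++ precision settings replacing the old relativeSD knob. As a rough guide to how the two relate: the standard HyperLogLog error bound is about 1.04 / sqrt(m) with m = 2^p registers. The following hedged sketch derives a precision from a target relative standard deviation under that bound; it is illustrative only, and Spark's actual relativeSD-to-p conversion may use different constants:

```java
// Illustrative only: not Spark's actual conversion. The standard HyperLogLog
// error bound is roughly 1.04 / sqrt(m) with m = 2^p registers, which gives
// the mapping below from a target relative standard deviation to a precision p.
public class HllPrecision {
    public static int precisionFromRelativeSD(double relativeSD) {
        // m >= (1.04 / relativeSD)^2  =>  p >= 2 * log2(1.04 / relativeSD)
        int p = (int) Math.ceil(2.0 * Math.log(1.04 / relativeSD) / Math.log(2.0));
        return Math.max(4, p); // the new API requires p to be at least 4
    }
}
```

So the old default of relativeSD = 0.05 lands around p = 9, i.e. 512 registers, under this bound.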
[GitHub] spark pull request: Fix two issues in ReplSuite.
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/781#discussion_r13263105

--- Diff: repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala ---
@@ -44,15 +44,19 @@ class ReplSuite extends FunSuite {
       }
     }
   }
+    val classpath = paths.mkString(File.pathSeparator)
+    System.setProperty("spark.executor.extraClassPath", classpath)
--- End diff --

For the sake of defensive programming, would you mind getting the previous value of `spark.executor.extraClassPath` and restoring it afterwards, instead of clearing it? The `spark.driver.port` value is a different situation, where Spark internally sets that parameter, and would reuse it if it remains set.
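The save-and-restore pattern suggested in the comment above can be sketched as follows (a hypothetical helper, not code from the PR):

```java
// Hypothetical helper illustrating the reviewer's suggestion: remember the
// previous value of a system property and restore it afterwards, rather than
// leaving the property set (or unconditionally clearing a value someone else set).
public class SysProps {
    public static void withSystemProperty(String key, String value, Runnable body) {
        String previous = System.getProperty(key);
        System.setProperty(key, value);
        try {
            body.run();
        } finally {
            if (previous == null) {
                System.clearProperty(key); // it was unset before; unset it again
            } else {
                System.setProperty(key, previous); // restore the old value
            }
        }
    }
}
```

The finally block guarantees the property is restored even if the test body throws, which is exactly the defensive behavior being asked for.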
[GitHub] spark pull request: SPARK-1839: PySpark RDD#take() shouldn't alway...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/922
[GitHub] spark pull request: SPARK-1839: PySpark RDD#take() shouldn't alway...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/922#issuecomment-44758293 Thanks. I merged this in master.
[GitHub] spark pull request: Fix two issues in ReplSuite.
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/781#discussion_r13263078

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -44,7 +44,8 @@ private[spark] class SparkDeploySchedulerBackend(
     val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
       conf.get("spark.driver.host"), conf.get("spark.driver.port"),
       CoarseGrainedSchedulerBackend.ACTOR_NAME)
-    val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
+    val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}",
--- End diff --

I'm not sure I understand this change. First, we already pass in {{CORES}}. Second, the CoarseGrainedExecutorBackend seems to take in the arguments as listed here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L132
[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/914
[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/914#issuecomment-44757533 Ok merging this in master & branch-1.0. Thanks!
[GitHub] spark pull request: [SQL] SPARK-1964 Add timestamp to hive metasto...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/913
[GitHub] spark pull request: [SQL] SPARK-1964 Add timestamp to hive metasto...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/913#issuecomment-44757474 Merging this into master & branch-1.0. Thanks!
[GitHub] spark pull request: Optionally include Hive as a dependency of the...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/801
[GitHub] spark pull request: Optionally include Hive as a dependency of the...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/801#issuecomment-44757201 Merged into master and branch-1.0, thanks!
[GitHub] spark pull request: Optionally include Hive as a dependency of the...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/801#issuecomment-44757169 LGTM
[GitHub] spark pull request: [SparkSQL] allow UDF on struct
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/796#issuecomment-44756643 Merged build started.
[GitHub] spark pull request: [SparkSQL] allow UDF on struct
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/796#issuecomment-44756639 Merged build triggered.
[GitHub] spark pull request: [SparkSQL] allow UDF on struct
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/796#issuecomment-44756553 test this please
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-44756501 Can you add "[SPARK-1495][SQL]" to the PR title?
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/837#discussion_r13262730

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
  */
 @DeveloperApi
+case class LeftSemiJoinHash(
+    leftKeys: Seq[Expression],
+    rightKeys: Seq[Expression],
+    buildSide: BuildSide,
+    left: SparkPlan,
+    right: SparkPlan) extends BinaryNode {
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+    ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil
+
+  val (buildPlan, streamedPlan) = buildSide match {
+    case BuildLeft => (left, right)
+    case BuildRight => (right, left)
+  }
+
+  val (buildKeys, streamedKeys) = buildSide match {
+    case BuildLeft => (leftKeys, rightKeys)
+    case BuildRight => (rightKeys, leftKeys)
+  }
+
+  def output = left.output
+
+  @transient lazy val buildSideKeyGenerator = new Projection(buildKeys, buildPlan.output)
+  @transient lazy val streamSideKeyGenerator =
+    () => new MutableProjection(streamedKeys, streamedPlan.output)
+
+  def execute() = {
+
+    buildPlan.execute().zipPartitions(streamedPlan.execute()) { (buildIter, streamIter) =>
+      // TODO: Use Spark's HashMap implementation.
+      val hashTable = new java.util.HashMap[Row, ArrayBuffer[Row]]()
+      var currentRow: Row = null
+
+      // Create a mapping of buildKeys -> rows
+      while (buildIter.hasNext) {
+        currentRow = buildIter.next()
+        val rowKey = buildSideKeyGenerator(currentRow)
+        if(!rowKey.anyNull) {
+          val existingMatchList = hashTable.get(rowKey)
+          val matchList = if (existingMatchList == null) {
+            val newMatchList = new ArrayBuffer[Row]()
+            hashTable.put(rowKey, newMatchList)
+            newMatchList
+          } else {
+            existingMatchList
+          }
+          matchList += currentRow.copy()
+        }
+      }
+
+      new Iterator[Row] {
+        private[this] var currentStreamedRow: Row = _
+        private[this] var currentHashMatched: Boolean = false
+
+        private[this] val joinKeys = streamSideKeyGenerator()
+
+        override final def hasNext: Boolean =
+          streamIter.hasNext && fetchNext()
+
+        override final def next() = {
+          currentStreamedRow
+        }
+
+        /**
+         * Searches the streamed iterator for the next row that has at least one match in hashtable.
+         *
+         * @return true if the search is successful, and false the streamed iterator runs out of
+         *         tuples.
+         */
+        private final def fetchNext(): Boolean = {
+          currentHashMatched = false
+          while (!currentHashMatched && streamIter.hasNext) {
+            currentStreamedRow = streamIter.next()
+            if (!joinKeys(currentStreamedRow).anyNull) {
+              currentHashMatched = true
+            }
+          }
+          currentHashMatched
+        }
+      }
+    }
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class LeftSemiJoinBNL(
--- End diff --

I don't think this operator is exercised by the included test cases. We should add a test where the join condition can be calculated with hash keys.
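For context, the hash-based left semi join discussed in this review boils down to: collect the join keys of one side into a hash structure, then emit each row from the other side whose key is present, at most once and skipping null keys. A minimal single-machine sketch in Java (hypothetical names; the real operator works per-partition on Spark Row objects and key projections):

```java
import java.util.*;

// Hedged single-machine sketch of a hash-based left semi join: build a set of
// join keys from the right side, then emit each left row whose key appears in
// that set. Each left row is emitted at most once regardless of how many
// matching rows the right side has, and null keys never match.
public class LeftSemiJoinSketch {
    public static List<String[]> leftSemiJoin(
            List<String[]> left, List<String[]> right, int leftKeyCol, int rightKeyCol) {
        // Build phase: only the keys are needed, not the full right-side rows.
        Set<String> buildKeys = new HashSet<>();
        for (String[] row : right) {
            String k = row[rightKeyCol];
            if (k != null) buildKeys.add(k); // null keys never match
        }
        // Probe phase: filter the left side by key membership.
        List<String[]> out = new ArrayList<>();
        for (String[] row : left) {
            String k = row[leftKeyCol];
            if (k != null && buildKeys.contains(k)) out.add(row);
        }
        return out;
    }
}
```

Note the semi-join output contains only left-side columns, which is why the operator in the diff declares `def output = left.output`.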
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-44756469 This is getting closer. Thanks for working on it!
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/837#discussion_r13262710

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -144,6 +144,150 @@ case class HashJoin(
  * :: DeveloperApi ::
  */
 @DeveloperApi
+case class LeftSemiJoinHash(
+    leftKeys: Seq[Expression],
+    rightKeys: Seq[Expression],
+    buildSide: BuildSide,
+    left: SparkPlan,
+    right: SparkPlan) extends BinaryNode {
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+    ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil
+
+  val (buildPlan, streamedPlan) = buildSide match {
+    case BuildLeft => (left, right)
+    case BuildRight => (right, left)
+  }
+
+  val (buildKeys, streamedKeys) = buildSide match {
+    case BuildLeft => (leftKeys, rightKeys)
+    case BuildRight => (rightKeys, leftKeys)
+  }
+
+  def output = left.output
+
+  @transient lazy val buildSideKeyGenerator = new Projection(buildKeys, buildPlan.output)
+  @transient lazy val streamSideKeyGenerator =
+    () => new MutableProjection(streamedKeys, streamedPlan.output)
+
+  def execute() = {
+
+    buildPlan.execute().zipPartitions(streamedPlan.execute()) { (buildIter, streamIter) =>
+      // TODO: Use Spark's HashMap implementation.
+      val hashTable = new java.util.HashMap[Row, ArrayBuffer[Row]]()
+      var currentRow: Row = null
+
+      // Create a mapping of buildKeys -> rows
+      while (buildIter.hasNext) {
+        currentRow = buildIter.next()
+        val rowKey = buildSideKeyGenerator(currentRow)
+        if(!rowKey.anyNull) {
+          val existingMatchList = hashTable.get(rowKey)
+          val matchList = if (existingMatchList == null) {
+            val newMatchList = new ArrayBuffer[Row]()
+            hashTable.put(rowKey, newMatchList)
+            newMatchList
+          } else {
+            existingMatchList
+          }
+          matchList += currentRow.copy()
+        }
+      }
+
+      new Iterator[Row] {
+        private[this] var currentStreamedRow: Row = _
+        private[this] var currentHashMatched: Boolean = false
+
+        private[this] val joinKeys = streamSideKeyGenerator()
+
+        override final def hasNext: Boolean =
+          streamIter.hasNext && fetchNext()
+
+        override final def next() = {
+          currentStreamedRow
+        }
+
+        /**
+         * Searches the streamed iterator for the next row that has at least one match in hashtable.
+         *
+         * @return true if the search is successful, and false the streamed iterator runs out of
+         *         tuples.
+         */
+        private final def fetchNext(): Boolean = {
+          currentHashMatched = false
+          while (!currentHashMatched && streamIter.hasNext) {
+            currentStreamedRow = streamIter.next()
+            if (!joinKeys(currentStreamedRow).anyNull) {
+              currentHashMatched = true
+            }
+          }
+          currentHashMatched
+        }
+      }
+    }
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class LeftSemiJoinBNL(
+    streamed: SparkPlan, broadcast: SparkPlan, condition: Option[Expression])
+    (@transient sc: SparkContext)
+  extends BinaryNode {
+  // TODO: Override requiredChildDistribution.
+
+  override def outputPartitioning: Partitioning = streamed.outputPartitioning
+
+  override def otherCopyArgs = sc :: Nil
+
+  def output = left.output
+
+  /** The Streamed Relation */
+  def left = streamed
+  /** The Broadcast relation */
+  def right = broadcast
+
+  @transient lazy val boundCondition =
+    InterpretedPredicate(
+      condition
+        .map(c => BindReferences.bindReference(c, left.output ++ right.output))
+        .getOrElse(Literal(true)))
+
+  def execute() = {
+    val broadcastedRelation = sc.broadcast(broadcast.execute().map(_.copy()).collect().toIndexedSeq)
+
+    val streamedPlusMatches = streamed.execute().mapPartitions { streamedIter =>
+      val joinedRow = new JoinedRow
+
+      streamedIter.filter(streamedRow => {
+        var i = 0
+        var matched = false
+
+        while (i < broadcastedRelation.value.size && !matched) {
+          // TODO: One bitset per partition instead of per row.
+          val broadcastedRow = broadca
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/837#discussion_r13262704

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---

```diff
@@ -144,6 +144,150 @@ case class HashJoin(
 ...
+    val streamedPlusMatches = streamed.execute().mapPartitions { streamedIter =>
+      val joinedRow = new JoinedRow
+
+      streamedIter.filter(streamedRow => {
+        var i = 0
+        var matched = false
+
+        while (i < broadcastedRelation.value.size && !matched) {
+          // TODO: One bitset per partition instead of per row.
```

--- End diff --

Is this comment
[GitHub] spark pull request: Improve maven plugin configuration
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/786#issuecomment-44756289 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15323/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Improve maven plugin configuration
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/786#issuecomment-44756288 Merged build finished. All automated tests passed.
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/837#discussion_r13262687

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---

```diff
@@ -144,6 +144,150 @@ case class HashJoin(
 ...
+      new Iterator[Row] {
+        private[this] var currentStreamedRow: Row = _
+        private[this] var currentHashMatched: Boolean = false
+
+        private[this] val joinKeys = streamSideKeyGenerator()
+
+        override final def hasNext: Boolean =
+          streamIter.hasNext && fetchNext()
+
+        override final def next() = {
+          currentStreamedRow
```

--- End diff --

Is this correct if the operator is created with BuildLeft instead of BuildRight? I think that would turn it into a RightSemiJoin. Perhaps we should just remove the option to build on the other side. I think you can then also safely simplify this to use a HashSet instead of a HashMap, which will reduce memory consumption significantly.
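The HashSet variant suggested in the review can be sketched with plain Scala collections (a hypothetical simplification: tuples stand in for Spark's `Row` and `Projection` machinery). The key point is that a semi join never emits build-side columns, so only the build-side *keys* need to be kept.

```scala
import scala.collection.mutable

// Left semi join on plain tuples: emit each left row at most once if its
// key appears on the right. A HashSet of keys replaces the HashMap of
// row buffers, since matched right-side rows are never output.
def leftSemiJoin[K, L](left: Seq[(K, L)], right: Seq[K]): Seq[(K, L)] = {
  val keys = mutable.HashSet[K](right: _*)
  left.filter { case (k, _) => keys.contains(k) }
}

println(leftSemiJoin(Seq(1 -> "a", 2 -> "b", 3 -> "c"), Seq(2, 3, 3)))
// List((2,b), (3,c))
```

Note how duplicate keys on the right (`3` appears twice) still produce each left row only once, which is exactly the semi-join semantics the HashMap-of-buffers version pays extra memory for.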
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/837#discussion_r13262640 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala --- @@ -144,6 +144,150 @@ case class HashJoin( * :: DeveloperApi :: --- End diff -- I realize that we aren't particularly good about this in most of the other physical operators, but could you add some Scala doc here about how this operator works and what the expected performance characteristics are? Same below. The goal of the Scala doc for physical operators should be to make it easy for people to understand query plans that are printed out by EXPLAIN.
[GitHub] spark pull request: add support for left semi join
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/837#discussion_r13262607 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala --- @@ -144,6 +144,150 @@ case class HashJoin( * :: DeveloperApi :: */ @DeveloperApi +case class LeftSemiJoinHash( + leftKeys: Seq[Expression], --- End diff -- Indent only 4 spaces here.
[GitHub] spark pull request: SPARK-1935: Explicitly add commons-codec 1.5 a...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/912#issuecomment-44755843 Sure. I have closed it.
[GitHub] spark pull request: [SPARK-1947] [SQL] Child of SumDistinct or Ave...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/902#issuecomment-44755822 Thanks! merged into master and 1.0
[GitHub] spark pull request: SPARK-1935: Explicitly add commons-codec 1.5 a...
Github user yhuai closed the pull request at: https://github.com/apache/spark/pull/912
[GitHub] spark pull request: [SPARK-1947] [SQL] Child of SumDistinct or Ave...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/902
[GitHub] spark pull request: [SPARK-1959] String "NULL" shouldn't be interp...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/909#issuecomment-44755719 We should probably have a test for this... maybe something like:

```scala
createQueryTest("nulls",
  """
    |CREATE TABLE nullVals AS SELECT "null", "NULL", "Null" FROM src LIMIT 1;
    |SELECT * FROM nullVals
  """.stripMargin)
```
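For context on why those strings must survive as data: Hive's text storage marks SQL nulls with a dedicated marker (by default the two characters `\N`), so the word "NULL" is an ordinary string. A toy decoder (hypothetical code, not Spark's) makes the distinction explicit:

```scala
// A toy column decoder: only the designated null marker becomes None;
// the ordinary strings "null"/"NULL"/"Null" survive as data.
def decode(raw: String, nullMarker: String = "\\N"): Option[String] =
  if (raw == nullMarker) None else Some(raw)

println(Seq("null", "NULL", "Null", "\\N").map(decode(_)))
// List(Some(null), Some(NULL), Some(Null), None)
```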
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44755561 So I've actually been assuming that users running older versions of Mesos just change `mesos.version` in the build and package Spark, so in my head I've sort of coupled "whether we compile against mesos X" and "whether users can run on mesos X" as the same thing. But as long as mesos keeps the IPC messages compatible (which I think they do) then it really shouldn't matter whether we require compiling against the newest client. Let me just check with them about this.
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44755392 Yes looks like it was introduced in 0.17.0: https://github.com/apache/mesos/commit/b609c851493c81c6ba8dfe51cf102400c05c2d0c I see, I thought the intent was to require 0.18+ since Spark requires it in HEAD. If not yeah I'll close it but wouldn't there be other compatibility issues of this form?
[GitHub] spark pull request: SPARK-1935: Explicitly add commons-codec 1.5 a...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/912#issuecomment-44755346 @yhuai mind closing this? Our "auto close" thing doesn't work for back ports like this.
[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/914#discussion_r13262479

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---

```diff
@@ -381,16 +381,19 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
 object SparkSubmitArguments {
   /** Load properties present in the given file. */
   def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-    require(file.exists(), s"Properties file ${file.getName} does not exist")
+    require(file.exists(), s"Properties file $file does not exist")
+    require(file.isFile(), s"Properties file $file is not a normal file")
```

--- End diff --

My bad, my test was silly, as I realize I had it pointed at a directory actually.
[GitHub] spark pull request: Improve maven plugin configuration
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/786#issuecomment-44755221 Merged build started.
[GitHub] spark pull request: Improve maven plugin configuration
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/786#issuecomment-44755219 Merged build triggered.
[GitHub] spark pull request: Improve maven plugin configuration
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/786#issuecomment-44755096 Jenkins, test this please. Thanks for this @witgo - this is good clean-up.
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44755003 Hey Sean - do you know if this will break Spark for Mesos 15/16/17? When did they introduce the newer API? We should definitely update this... I'm a bit concerned about bumping users to the bleeding edge of Mesos to run Spark 1.1... so the question is just regarding timing.
[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/914#discussion_r13262410

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---

```diff
@@ -381,16 +381,19 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
 object SparkSubmitArguments {
   /** Load properties present in the given file. */
   def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-    require(file.exists(), s"Properties file ${file.getName} does not exist")
+    require(file.exists(), s"Properties file $file does not exist")
+    require(file.isFile(), s"Properties file $file is not a normal file")
```

--- End diff --

`File.isFile()` returns `true` for symlinks which point to files.

```
$ touch testfile
$ ln -s testfile testlink
$ scala
scala> new java.io.File("testlink").isFile()
res0: Boolean = true
```

Additionally, since the docs aren't 100% clear and I couldn't find a solid answer from Google, I checked both the [UnixFileSystem](http://code.metager.de/source/xref/openjdk/jdk8/jdk/src/solaris/native/java/io/UnixFileSystem_md.c#111) and [WindowsFileSystem](http://code.metager.de/source/xref/openjdk/jdk6/jdk/src/windows/native/java/io/Win32FileSystem_md.c#150). The former uses `stat`, which resolves symbolic links. The latter will set isFile to true if and only if it is not a directory, so symlinks would be included.
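The symlink-following behavior discussed here can also be verified programmatically. A small sketch (symlink creation may require extra privileges on some Windows setups), which additionally shows that `java.nio` lets the caller choose whether links are followed:

```scala
import java.nio.file.{Files, LinkOption}

// Build a regular file and a symlink to it in a scratch directory.
val dir = Files.createTempDirectory("isfile-demo")
val target = Files.createFile(dir.resolve("testfile"))
val link = Files.createSymbolicLink(dir.resolve("testlink"), target)

// java.io.File.isFile follows symlinks, as described in the comment above.
println(link.toFile.isFile)                                    // true

// NIO makes the choice explicit: NOFOLLOW_LINKS inspects the link itself.
println(Files.isRegularFile(link))                             // true
println(Files.isRegularFile(link, LinkOption.NOFOLLOW_LINKS))  // false
```

`Files.isRegularFile(path, LinkOption.NOFOLLOW_LINKS)` is the idiomatic way to exclude symlinks when that is actually the intent; for the `require` in this diff, following links is the desired behavior.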
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44753736 I'm just copying-and-pasting to get something similar running externally. Maybe it's a little surprising that the example code doesn't work that way -- being in a `main()` kind of suggests this is a stand-alone program. Maybe just me. I think there are a few possibilities:

- Change all example code to set master if missing (that's what the current PR does)
- Change SparkConf to do something similar as a global default
- Just update javadoc to make it clear that the examples require the `spark.master` system property to be set

I slightly prefer one of the first two on the principle of least surprise, but can go any direction. I think at least the third should be done. What say everyone?
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44753603 Hey Sean, how are you running the examples? Are you using the `run-example` script? That script should set the master to `local[*]` if the user hasn't specified it, which will use all cores locally. I think in some cases we might need to update the javadocs in examples to tell users to use `run-example`.
[GitHub] spark pull request: [SPARK-1977][MLLIB] use immutable BitSet in AL...
Github user nevillelyh commented on the pull request: https://github.com/apache/spark/pull/925#issuecomment-44750919 Sure that would also work. I made a PR to chill as well. https://github.com/twitter/chill/pull/185
[GitHub] spark pull request: Fix JIRA-983 and support external sort for sor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/931#issuecomment-44749478 Merged build finished.
[GitHub] spark pull request: Fix JIRA-983 and support external sort for sor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/931#issuecomment-44749479 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15322/
[GitHub] spark pull request: Fix JIRA-983 and support external sort for sor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/931#issuecomment-44749436 Merged build started.
[GitHub] spark pull request: Fix JIRA-983 and support external sort for sor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/931#issuecomment-44749434 Merged build triggered.
[GitHub] spark pull request: Fix JIRA-983 and support external sort for sor...
GitHub user xiajunluan opened a pull request: https://github.com/apache/spark/pull/931 Fix JIRA-983 and support external sort for sortByKey. Change class ExternalAppendOnlyMap to support a customized comparator function (not only sorting by hashCode). You can merge this pull request into a Git repository by running: $ git pull https://github.com/xiajunluan/spark-1 JIRA-983 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/931.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #931
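The heart of supporting a caller-supplied comparator in an external sort is a k-way merge of already-sorted spilled runs under that same ordering. A stand-alone sketch of that merge (illustrative only, not the PR's ExternalAppendOnlyMap change):

```scala
import scala.collection.mutable

// Merge pre-sorted runs ("spills") under a caller-supplied Ordering,
// rather than assuming hashCode order.
def mergeSortedSpills[K, V](spills: Seq[Iterator[(K, V)]])
                           (implicit ord: Ordering[K]): Iterator[(K, V)] = {
  // PriorityQueue is a max-heap; reverse the ordering to pop the
  // smallest current head first.
  val heap = mutable.PriorityQueue.empty[(K, V, BufferedIterator[(K, V)])](
    Ordering.by[(K, V, BufferedIterator[(K, V)]), K](_._1).reverse)
  spills.map(_.buffered).filter(_.hasNext).foreach { it =>
    heap.enqueue((it.head._1, it.head._2, it))
  }
  new Iterator[(K, V)] {
    def hasNext: Boolean = heap.nonEmpty
    def next(): (K, V) = {
      val (k, v, it) = heap.dequeue()
      it.next() // consume the emitted head
      if (it.hasNext) heap.enqueue((it.head._1, it.head._2, it))
      (k, v)
    }
  }
}

// Two sorted runs merge into one sorted stream:
println(mergeSortedSpills(Seq(Iterator(1 -> "a", 3 -> "c"), Iterator(2 -> "b"))).toList)
// List((1,a), (2,b), (3,c))
```

Because the ordering is an explicit `Ordering[K]`, `sortByKey` can pass its own key comparator instead of relying on hash order, which is what the PR description is getting at.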
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44747313 Merged build finished.
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44747315 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15321/
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44746537 I pushed again, with `setIfMissing`. Is it better in the `SparkConf` constructor? Or am I off base here?
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44746504 Merged build started.
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44746502 Merged build triggered.
[GitHub] spark pull request: Add a function that can build an EdgePartition...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/792#issuecomment-44746294 Thanks, I meant to do this a while ago but never got around to it. The code duplication is unfortunate, though. Why not just have toEdgePartition take an optional parameter `sort: Boolean`?
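The suggestion amounts to a single builder with an opt-in flag instead of a duplicated code path, i.e. a Scala default parameter. A toy illustration (names and types are hypothetical, not the actual GraphX signature):

```scala
// One code path; sorting is opt-in rather than a second builder method.
def toEdgePartition(edges: Array[(Long, Long)], sort: Boolean = false): Array[(Long, Long)] =
  if (sort) edges.sortBy(identity) else edges

println(toEdgePartition(Array((2L, 1L), (1L, 2L))).toList)               // List((2,1), (1,2))
println(toEdgePartition(Array((2L, 1L), (1L, 2L)), sort = true).toList)  // List((1,2), (2,1))
```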
[GitHub] spark pull request: Use optional third argument as edge attribute.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/901#issuecomment-44746122 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15320/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Use optional third argument as edge attribute.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/901#issuecomment-44746121 Merged build finished. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Use optional third argument as edge attribute.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/901#issuecomment-44746090 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Use optional third argument as edge attribute.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/901#issuecomment-44746088 Merged build triggered.
[GitHub] spark pull request: Use optional third argument as edge attribute.
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/901#issuecomment-44746040 ok to test
[GitHub] spark pull request: Use optional third argument as edge attribute.
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/901#issuecomment-44745827

Unlike Python (but like Java), Scala doesn't use asInstanceOf for arbitrary type conversions. In this case, `"123".asInstanceOf[Int]` won't work, because Scala doesn't know how to do the conversion. There are two ways to resolve this:

1. Have `edgeListFile` take a function that does the conversion:

```scala
def edgeListFile[ED: ClassTag](..., parseEdgeAttr: String => ED) = {
  ...
  parseEdgeAttr(lineArray(2))
  ...
}

// Can be called like this:
edgeListFile[Int](..., _.toInt)
```

This is simple, but it doesn't facilitate reuse of the parsing functions. You always have to specify how to parse an Int, even though it's usually the same everywhere.

2. Define a library of standard parsers and use Scala implicits to select the right one automatically if it exists:

```scala
// `Parseable[T]` is a type class [1] indicating that we can parse any String
// to construct a T. If there is an `implicit val` of type Parseable[Foo]
// in scope, it means you can convert from any String to a Foo. (This is
// called Read in Haskell.)
trait Parseable[T] {
  def parse(s: String): T
}

// Here are some parsers. The user can define their own by creating
// additional implicit vals.
object Parseable {
  implicit val IntParser = new Parseable[Int] {
    def parse(s: String) = s.toInt
  }
  implicit val StringParser = new Parseable[String] {
    def parse(s: String) = s
  }
}

// Instead of an explicit function parameter, we use a context bound, which
// desugars to an implicit parameter of type Parseable[ED] (that is, a
// parser for ED) that we can access using implicitly[Parseable[ED]].
def edgeListFile[ED: ClassTag : Parseable](...) = {
  ...
  implicitly[Parseable[ED]].parse(lineArray(2))
  ...
}

// Can be called without passing anything extra:
edgeListFile[Int](...)
```

Spark does have a [style guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide), but it's pretty sparse. Most questions will require looking for other examples in the code, or just submitting the PR and seeing what reviewers say.

[1] http://docs.scala-lang.org/tutorials/FAQ/finding-implicits.html#context_bounds
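The type-class approach in option 2 can be condensed into a self-contained, compilable sketch. The `parseAttr` helper below is a hypothetical simplified stand-in for `edgeListFile`'s attribute parsing, not GraphX's actual signature:

```scala
// Self-contained sketch of the Parseable type class: a generic function
// selects the right parser via a context bound.
trait Parseable[T] {
  def parse(s: String): T
}

object Parseable {
  implicit val intParser: Parseable[Int] = new Parseable[Int] {
    def parse(s: String) = s.toInt
  }
  implicit val stringParser: Parseable[String] = new Parseable[String] {
    def parse(s: String) = s
  }
}

// Hypothetical helper standing in for edgeListFile's attribute parsing.
def parseAttr[ED: Parseable](field: String): ED =
  implicitly[Parseable[ED]].parse(field)

// Usage: the Int parser is found implicitly from the companion object.
val attr: Int = parseAttr[Int]("123")
```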
[GitHub] spark pull request: [SPARK-1979] Added Error Handling if user pass...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/930#issuecomment-44745616 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1979] Added Error Handling if user pass...
GitHub user pankajarora12 opened a pull request: https://github.com/apache/spark/pull/930 [SPARK-1979] Added Error Handling if user passes application params with... Added an error message for the user when --arg is used to pass application parameters. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pankajarora12/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/930.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #930 commit d6bfba3d7b9236a02a7e91233f8e512bea761af0 Author: pankaj.arora Date: 2014-05-31T11:11:05Z [SPARK-1979] Added Error Handling if user passes application params with --arg
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44745344 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44745345 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15319/
[GitHub] spark pull request: SPARK-1974. Most examples fail at startup beca...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/926#issuecomment-44744710 Yeah I think it's essential not to prevent `-Dspark.master=...` from working, oops. I think it may also be useful for this to work when one copies and pastes the example, as I just did. The javadoc doesn't indicate that you have to set the master either. I will rework it to use `setIfMissing()`
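The fix described above can be sketched as follows; the app name is a hypothetical placeholder, but `SparkConf.setIfMissing` is the real API being proposed:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the setIfMissing approach: a master supplied externally
// (e.g. via -Dspark.master=... or spark-submit) takes precedence, while
// the example still runs locally when copied and pasted with no master set.
val conf = new SparkConf()
  .setAppName("ExampleApp")                // hypothetical app name
  .setIfMissing("spark.master", "local[*]") // fallback only if unset
val sc = new SparkContext(conf)
```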
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44744576 Merged build triggered.
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44744577 Oops, yes, coming up now.
[GitHub] spark pull request: SPARK-1806 (addendum) Use non-deprecated metho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/920#issuecomment-44744579 Merged build started.
[GitHub] spark pull request: Super minor: Close inputStream in SparkSubmitA...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/914#discussion_r13260477 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -381,16 +381,19 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) { object SparkSubmitArguments { /** Load properties present in the given file. */ def getPropertiesFromFile(file: File): Seq[(String, String)] = { -require(file.exists(), s"Properties file ${file.getName} does not exist") +require(file.exists(), s"Properties file $file does not exist") +require(file.isFile(), s"Properties file $file is not a normal file") --- End diff -- `isFile()` is `false` for symlinks. It may be more conservative to require `!file.isDirectory()`, since it seems valid to point to a symlinked config file.
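The alternative check srowen suggests would look roughly like this (a simplified sketch, not the full `getPropertiesFromFile` body):

```scala
import java.io.File

// Sketch of the suggested, more permissive validation: reject directories
// explicitly instead of requiring isFile(), so that a symlinked properties
// file can still be accepted.
def validatePropertiesFile(file: File): Unit = {
  require(file.exists(), s"Properties file $file does not exist")
  require(!file.isDirectory(), s"Properties file $file is a directory")
  // ... proceed to load the properties ...
}
```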
[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-44743612 @mateiz I noticed that the docs for the Python programming guide have been merged into the overall programming guide. Where do you think is the best place to put the bit of documentation about InputFormats for PySpark?
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-44742948 Merged build finished. All automated tests passed.
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-44742949 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15318/
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-44742668 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15317/
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-44742667 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [WIP][SPARK-1930] The Container is running bey...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/894#issuecomment-44742298 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15316/
[GitHub] spark pull request: [WIP][SPARK-1930] The Container is running bey...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/894#issuecomment-44742297 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/897#issuecomment-44742270 @pwendell Jenkins failed due to binary compatibility for SerializableHyperLogLog, which is no longer needed ...
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-44742189 Merged build triggered.
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-44742193 Merged build started.
[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/897#issuecomment-44742112 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15315/
[GitHub] spark pull request: SPARK-1941: Update streamlib to 2.7.0 and use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/897#issuecomment-44742111 Merged build finished.
[GitHub] spark pull request: [WIP]Improve ALS algorithm resource usage
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/828
[GitHub] spark pull request: [WIP]Improve ALS algorithm resource usage
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/828#issuecomment-44742037 This solution is not perfect; temporarily closing this. See the new #929.
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-44742011 Merged build started.