[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash
Github user wangxiaojing commented on the pull request: https://github.com/apache/spark/pull/3442#issuecomment-65030850 @liancheng --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65030925 ok to test
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65030922 add to whitelist
[GitHub] spark pull request: [SPARK-4397][Core] Cleanup 'import SparkContex...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3530#issuecomment-65030889 This is a really important fix, actually, since we ran into problems with IntelliJ's automatic import cleanup removing these: if we perform this import cleanup incrementally as part of other patches, then those patches will introduce build-breaks if they're cherry-picked into pre-1.2 versions of Spark. As a result, it's much better to do all of this cleanup in one pass, as you've done here. +1.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65030931 test this please
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65031245 [Test build #23979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23979/consoleFull) for PR 3519 at commit [`8f5daf9`](https://github.com/apache/spark/commit/8f5daf9072f23ef46102fe4419da5cf79212bc2f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3531#issuecomment-65031244 [Test build #23978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23978/consoleFull) for PR 3531 at commit [`681243a`](https://github.com/apache/spark/commit/681243aa2ae1ae804a033a5aded0bc8127f30e80). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65031323 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23979/ Test FAILed.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65031322 [Test build #23979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23979/consoleFull) for PR 3519 at commit [`8f5daf9`](https://github.com/apache/spark/commit/8f5daf9072f23ef46102fe4419da5cf79212bc2f). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `sealed trait MonotonicityConstraint ` * `class IsotonicRegressionModel(` * `case class WeightedLabeledPoint(label: Double, features: Vector, weight: Double = 1)`
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-65031620 @pwendell The example data do not need to be on the classpath. They are sample data files used by mllib examples, e.g., BinaryClassification, MovieLensALS. Usually the example code is the starting point for users. @srowen 's change makes it easy to run examples: 1. download and unzip the distribution zip 2. run `bin/run-example mllib.DatasetExample`, which will read a file under `data/` by default. The change looks good to me.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r21075634
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -197,4 +198,27 @@ class VectorsSuite extends FunSuite {
     assert(svMap.get(2) === Some(3.1))
     assert(svMap.get(3) === Some(0.0))
   }
+
+  test("vector p-norm") {
+    val dv = Vectors.dense(0.0, -1.2, 3.1, 0.0, -4.5, 1.9)
+    val sv = Vectors.sparse(6, Seq((1, -1.2), (2, 3.1), (3, 0.0), (4, -4.5), (5, 1.9)))
+
+    assert(Vectors.norm(dv, 1.0) ~== dv.toArray.foldLeft(0.0)((a, v) =>
+      a + math.abs(v)) relTol 1E-8)
+    assert(Vectors.norm(sv, 1.0) ~== sv.toArray.foldLeft(0.0)((a, v) =>
+      a + math.abs(v)) relTol 1E-8)
+
+    assert(Vectors.norm(dv, 2.0) ~== math.sqrt(dv.toArray.foldLeft(0.0)((a, v) =>
+      a + v * v)) relTol 1E-8)
+    assert(Vectors.norm(sv, 2.0) ~== math.sqrt(sv.toArray.foldLeft(0.0)((a, v) =>
+      a + v * v)) relTol 1E-8)
+
+    assert(Vectors.norm(dv, Double.PositiveInfinity) ~== dv.toArray.map(math.abs).max relTol 1E-8)
+    assert(Vectors.norm(sv, Double.PositiveInfinity) ~== sv.toArray.map(math.abs).max relTol 1E-8)
+
+    assert(Vectors.norm(dv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, v) =>
+      a + math.pow(math.abs(v), 3.7)), 1.0 / 3.7) relTol 1E-8)
+    assert(Vectors.norm(sv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, v) =>
--- End diff --
`dv` -> `sv`
[GitHub] spark pull request: [SPARK-4674] Refactor getCallSite
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3532 [SPARK-4674] Refactor getCallSite The current version of `getCallSite` visits the collection of `StackTraceElement` twice. However, it is unnecessary since we can perform our work with a single visit. We also do not need to keep filtered `StackTraceElement`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 refactor_getCallSite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3532.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3532 commit e7410177cf55b8e5f99fea844f8c3ed8035004e6 Author: Liang-Chi Hsieh vii...@gmail.com Date: 2014-12-01T08:18:28Z Refactor getCallSite.
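The single-visit idea described in this pull request can be sketched roughly as follows. This is only an illustration, not the code from the PR or from Spark: `CallSiteSketch`, `CallSite`, `isInternal`, and the package-prefix rule are all hypothetical names and assumptions. It walks the stack trace once, remembers the most recent internal method, and stops at the first user frame, without building a filtered copy of the trace.

```scala
object CallSiteSketch {
  // Hypothetical result type; the real getCallSite may return something different.
  final case class CallSite(lastInternalMethod: String, firstUserFile: String, firstUserLine: Int)

  // Assumption for illustration: frames whose class lives under this prefix are "internal".
  def isInternal(className: String): Boolean = className.startsWith("org.apache.spark")

  /** Walks the trace once (innermost frame first) and stops at the first user frame. */
  def find(trace: Seq[StackTraceElement]): CallSite = {
    var lastInternalMethod = "<unknown>"
    var firstUserFile = "<unknown>"
    var firstUserLine = 0
    var insideInternal = true
    val frames = trace.iterator
    while (insideInternal && frames.hasNext) {
      val frame = frames.next()
      if (isInternal(frame.getClassName)) {
        // Keep overwriting: the last internal frame seen is the one the user called.
        lastInternalMethod = frame.getMethodName
      } else {
        // First non-internal frame is the user's call site; nothing after it is needed.
        firstUserFile = Option(frame.getFileName).getOrElse("<unknown>")
        firstUserLine = frame.getLineNumber
        insideInternal = false
      }
    }
    CallSite(lastInternalMethod, firstUserFile, firstUserLine)
  }
}
```

Because the loop exits at the first user frame, frames further out in the stack are never inspected, and no intermediate collection is allocated.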
[GitHub] spark pull request: [SPARK-4674] Refactor getCallSite
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3532#issuecomment-65032940 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r21075718
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -261,6 +261,57 @@ object Vectors {
         sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
     }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+    require(p >= 1.0)
+    val values = vector match {
+      case dv: DenseVector => dv.values
+      case sv: SparseVector => sv.values
+      case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+    }
+    val size = values.size
+
+    if (p == 1) {
--- End diff --
It is an interesting discussion ~ :) But maybe more people are familiar with the `if ... else if ... else` statement. And this is not on the critical path.
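For readers following the `if ... else if ... else` discussion, here is a minimal sketch of how such a p-norm could branch on `p`. It is an assumption-laden illustration rather than the code under review: `PNormSketch` is a hypothetical name, and it works on a plain `Array[Double]` instead of MLlib's `Vector` types. Passing only the stored values of a sparse vector gives the same result as the dense array, since zeros contribute nothing to any of these norms.

```scala
object PNormSketch {
  /** p-norm of a plain array of values, for p >= 1 (including positive infinity). */
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be >= 1.0, got $p")
    if (p == 1.0) {
      // L1 norm: sum of absolute values.
      values.foldLeft(0.0)((acc, v) => acc + math.abs(v))
    } else if (p == 2.0) {
      // L2 norm: Euclidean length.
      math.sqrt(values.foldLeft(0.0)((acc, v) => acc + v * v))
    } else if (p == Double.PositiveInfinity) {
      // L-infinity norm: largest absolute value (0.0 for an empty array).
      values.foldLeft(0.0)((acc, v) => math.max(acc, math.abs(v)))
    } else {
      // General case: (sum of |v|^p) raised to 1/p.
      math.pow(values.foldLeft(0.0)((acc, v) => acc + math.pow(math.abs(v), p)), 1.0 / p)
    }
  }
}
```

The special cases for 1, 2, and infinity avoid calling `math.pow` per element; the general branch covers values such as the `3.7` used in the test suite diff.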
[GitHub] spark pull request: Fix wrong file name pattern in .gitignore
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3529#issuecomment-65033146 Thanks. I've merged this in master.
[GitHub] spark pull request: Fix wrong file name pattern in .gitignore
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3529
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3480
[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3527#issuecomment-65033393 Yea as @aarondav pointed out, I don't think akka framesize is going to be a problem anymore in 1.2+, regardless of the number of partitions. Still good to have this check to be defensive.
[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3527#issuecomment-65033421 Merging in master.
[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3527
[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3522#issuecomment-65033588 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3521#issuecomment-65033664 Merging in master branch-1.2. Thanks.
[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3521
[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3526#discussion_r21076063
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala ---
@@ -298,11 +298,15 @@ case class InsertIntoParquetTable(
       val committer = format.getOutputCommitter(hadoopContext)
       committer.setupTask(hadoopContext)
       val writer = format.getRecordWriter(hadoopContext)
-      while (iter.hasNext) {
-        val row = iter.next()
-        writer.write(null, row)
+      try {
+        while (iter.hasNext) {
+          val row = iter.next()
+          writer.write(null, row)
+        }
+      }
+      finally {
--- End diff --
can you put the finally on the same line as the previous } ? thanks.
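The pattern under review, with `} finally {` placed on one line as requested, might look like the sketch below. The quoted diff is cut off before the body of the `finally` block, so closing the writer there is an assumption based on the PR title; `WriteAllSketch` and `RecordSink` are hypothetical stand-ins for the Hadoop record writer, not Spark or Hadoop APIs.

```scala
object WriteAllSketch {
  // Hypothetical stand-in for the record writer used in the diff.
  trait RecordSink[T] {
    def write(record: T): Unit
    def close(): Unit
  }

  /** Writes every record, closing the sink even if a write throws. */
  def writeAll[T](records: Iterator[T], sink: RecordSink[T]): Unit = {
    try {
      while (records.hasNext) {
        sink.write(records.next())
      }
    } finally {
      // Assumed cleanup step: runs on both the normal and the exceptional path.
      sink.close()
    }
  }
}
```

Any exception from `write` still propagates to the caller after `close()` has run, so the failure is not swallowed.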
[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3522#issuecomment-65034129 [Test build #23980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23980/consoleFull) for PR 3522 at commit [`16fee22`](https://github.com/apache/spark/commit/16fee22d5294445e6ef46acc676780c18470c5fc). * This patch merges cleanly.
[GitHub] spark pull request: [SQL] Minor fix for doc and comment
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/3533 [SQL] Minor fix for doc and comment You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark sql-doc1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3533.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3533 commit 962910bbce3fed010985dca6d7fd6f538a5adff3 Author: wangfei wangf...@huawei.com Date: 2014-12-01T08:41:59Z doc and comment fix
[GitHub] spark pull request: 'Do not replicate streaming block when WAL is ...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/3534 'Do not replicate streaming block when WAL is enabled You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-4671 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3534.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3534 commit 500b45689d2cd6db2ec0a7e32949863dc973870a Author: jerryshao saisai.s...@intel.com Date: 2014-12-01T07:58:32Z Do not replicate streaming block when WAL is enabled
[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3442#issuecomment-65034719 retest this please
[GitHub] spark pull request: [SPARK-4671][Streaming]Do not replicate stream...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3534#issuecomment-65035054 [Test build #23981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23981/consoleFull) for PR 3534 at commit [`500b456`](https://github.com/apache/spark/commit/500b45689d2cd6db2ec0a7e32949863dc973870a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3442#issuecomment-65035043 [Test build #23983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23983/consoleFull) for PR 3442 at commit [`3a63ecb`](https://github.com/apache/spark/commit/3a63ecb81aa02a02dc53d014ed3358927a95a376). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r21076434
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -261,6 +261,57 @@ object Vectors {
         sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
     }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+    require(p >= 1.0)
+    val values = vector match {
+      case dv: DenseVector => dv.values
+      case sv: SparseVector => sv.values
+      case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+    }
+    val size = values.size
+
+    if (p == 1) {
--- End diff --
yeah. but this will not work here unless p has type of `Int`.
[GitHub] spark pull request: [SQL] Minor fix for doc and comment
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3533#issuecomment-65035032 [Test build #23982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23982/consoleFull) for PR 3533 at commit [`962910b`](https://github.com/apache/spark/commit/962910bbce3fed010985dca6d7fd6f538a5adff3). * This patch merges cleanly.
[GitHub] spark pull request: Fix wrong file name pattern in .gitignore
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3529#issuecomment-65035216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23976/ Test PASSed.
[GitHub] spark pull request: Fix wrong file name pattern in .gitignore
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3529#issuecomment-65035212 [Test build #23976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23976/consoleFull) for PR 3529 at commit [`de3c70a`](https://github.com/apache/spark/commit/de3c70acf34b8aa000f189a3f7731fe844377de7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/3535 [Branch-1.2] [DOC] Date type in SQL programming guide You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark datedoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3535.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3535 commit 18ff1eddc145cf23d197da0c0b5c55d6ea2e7bd1 Author: Daoyuan Wang daoyuan.w...@intel.com Date: 2014-12-01T08:48:18Z [DOC] Date type
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-65035512 [Test build #23984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull) for PR 3462 at commit [`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f). * This patch merges cleanly.
[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3535#issuecomment-65035797 [Test build #23985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23985/consoleFull) for PR 3535 at commit [`18ff1ed`](https://github.com/apache/spark/commit/18ff1eddc145cf23d197da0c0b5c55d6ea2e7bd1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3531#issuecomment-65036523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23978/ Test PASSed.
[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3531#discussion_r21077028
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
+      child.execute().map(_.copy).coalesce(1).mapPartitions { iter =>
+        iter.take(limit)
--- End diff --
Can we move the `map(_.copy)` after `take(limit)`?
[GitHub] spark pull request: [SPARK-4397][Core] Cleanup 'import SparkContex...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3530#issuecomment-65036726 [Test build #23977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23977/consoleFull) for PR 3530 at commit [`04e2273`](https://github.com/apache/spark/commit/04e227382a6925f443da8794210faba0828f6f0d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4397][Core] Cleanup 'import SparkContex...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3530#issuecomment-65036730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23977/ Test PASSed.
[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3531#discussion_r21077129
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
--- End diff --
We can probably drop the `sortBasedShuffleOn` conditional check. What do you think?
[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3531#discussion_r21077317
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
--- End diff --
Refer to https://github.com/scwf/spark/commit/e2614038e78f4693fafedeee15b6fdf0ea1be473; it seems that ignoring this check leads to some problems.
[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3522#issuecomment-65037572 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23980/ Test FAILed.
[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3522#issuecomment-65037569 [Test build #23980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23980/consoleFull) for PR 3522 at commit [`16fee22`](https://github.com/apache/spark/commit/16fee22d5294445e6ef46acc676780c18470c5fc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3531#discussion_r21077554
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
+      child.execute().map(_.copy).coalesce(1).mapPartitions { iter =>
+        iter.take(limit)
--- End diff --
Hmm, I will try this. Actually, I am not clear on why we need `copy` here; @rxin added it to fix a bug. Hi @rxin, can you explain this?
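One plausible reason for a `copy` in code of this shape, assumed here rather than confirmed anywhere in this thread, is that an iterator may hand back the same mutable row object on every call to `next()`. The sketch below illustrates that hazard with a hypothetical `MutableRow` class and a hypothetical `reusingIterator` helper; neither is Spark's `Row` or any Spark API. It also shows why applying `take(limit)` before the copy, as suggested in the review, copies only `limit` rows instead of every row.

```scala
// Hypothetical stand-in for a mutable row; this is NOT Spark's Row class.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object LimitCopySketch {
  // Assumption for illustration: the producer reuses ONE row object
  // and overwrites it for every element it yields.
  def reusingIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    values.iterator.map { v => shared.value = v; shared }
  }

  // Buffering the rows without copying: every buffered reference
  // points at the same object, so only the last written value survives.
  def takeWithoutCopy(values: Seq[Int], limit: Int): List[Int] =
    reusingIterator(values).take(limit).toList.map(_.value)

  // take(limit) first, then copy: at most `limit` rows are copied,
  // and each buffered row keeps its own value.
  def takeThenCopy(values: Seq[Int], limit: Int): List[Int] =
    reusingIterator(values).take(limit).map(_.copy()).toList.map(_.value)

  def main(args: Array[String]): Unit = {
    println(takeWithoutCopy(Seq(1, 2, 3, 4), 2)) // List(2, 2)
    println(takeThenCopy(Seq(1, 2, 3, 4), 2))    // List(1, 2)
  }
}
```

Whether the copy can safely move after `take(limit)` in the actual `Limit` operator depends on what happens between the two calls (here, the `coalesce(1)`), which this sketch does not model.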
[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3521#issuecomment-65039393 @zsxwing I know it's too late, but the cast should also have a `@SuppressWarnings("unchecked")`, ideally, to avoid another warning. I have some things like this taken care of in another open PR: https://www.github.com/apache/spark/pull/3157
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65040128 Not sure if it's easy, but most of the diff is inadvertent changes to whitespace at the end of lines. This makes it a little hard to see the changes you're making since they're not otherwise enumerated here or in the JIRA.
[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3521#issuecomment-65040116
> but the cast should also have a `@SuppressWarnings("unchecked")`, ideally, to avoid another warning. I have some things like this taken care of in another open PR:
@srowen, yes. Then it's better to add it in your PR :)
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-65041271 [Test build #23986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23986/consoleFull) for PR 3222 at commit [`f5ab79e`](https://github.com/apache/spark/commit/f5ab79ebf8515cace31622dab32b1c4d33a35471). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...
GitHub user sbourke opened a pull request: https://github.com/apache/spark/pull/3536 [MLLIB][SPARK-4675] Find similar products and similar users in MatrixFactorizationModel Using the latent feature space that is learnt in MatrixFactorizationModel, I have added 2 new functions to find similar products and similar users. A user of the API can for example pass a product ID, and get the closest products based on the feature space. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sbourke/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3536.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3536 commit 956ca1b86aacb22fabd52740ce0c6fef5524bae8 Author: Senior Stefano El Bour-que steven.bou...@schibsted.es Date: 2014-11-28T08:40:40Z added functionality to find similar users and similar products commit 12e6b6b3a2cbfa1baa29449396e7e85bed1dec56 Author: Steven Bourke steve@stevens-imac.local Date: 2014-11-30T23:22:46Z added unit test to make sure id isnt teh same
[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3536#discussion_r21078944
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
@@ -95,6 +95,35 @@ class MatrixFactorizationModel(
   }
 
   /**
+   * Recommends similar products
+   *
+   * @param user the user to find similar users for
+   * @param num how many products to return. The number returned may be less than this.
+   * @return [[Rating]] objects, each of which contains the given user ID, a user ID, and a
+   *  score in the rating field. Each represents one recommended user, and they are sorted
+   *  by score, decreasing. The first returned is the one predicted to be most similar
+   *  user to the specified user ID. The score is an opaque value that indicates how strongly
+   *  recommended the user is.
+   */
+  def recommendSimilariUsers(user: Int, num: Int): Array[Rating] =
--- End diff --
Typo: `Similari`, also below.
[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3442#issuecomment-65041697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23983/ Test PASSed.
[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3442#issuecomment-65041688 [Test build #23983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23983/consoleFull) for PR 3442 at commit [`3a63ecb`](https://github.com/apache/spark/commit/3a63ecb81aa02a02dc53d014ed3358927a95a376). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class BroadcastLeftSemiJoinHash(`
[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3536#issuecomment-65042076 Can one of the admins verify this patch?
[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3536#issuecomment-65042436 I think it's essential to explain (even in internal comments, or this PR) what the similarity metric is. It's just ranking by dot product, which makes it something like cosine similarity. The differences are that it isn't in [-1,1], and the result doesn't normalize away the length of the feature vectors. This tends to favor popular items, or mean that somewhat less similar items may rank higher because they're popular. I had traditionally viewed that as a negative, and preferred the more standard cosine similarity, but it's certainly up for debate.
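The difference described above between ranking by dot product and ranking by cosine similarity can be made concrete with a small sketch. Everything in it is hypothetical and for illustration only: the object `SimilaritySketch`, its helper names, and the feature vectors are not taken from the PR or from `MatrixFactorizationModel`. It assumes dense feature vectors of equal length.

```scala
object SimilaritySketch {
  // Plain dot product of two equal-length feature vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Euclidean length of a feature vector.
  def norm(a: Array[Double]): Double = math.sqrt(dot(a, a))

  // Cosine similarity: the dot product with vector lengths normalized away,
  // so the result lies in [-1, 1]. Zero vectors are scored 0.0 here by convention.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val denom = norm(a) * norm(b)
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }

  // Rank item IDs by a scoring function, highest score first.
  def rank(
      query: Array[Double],
      items: Map[Int, Array[Double]],
      score: (Array[Double], Array[Double]) => Double): Seq[Int] =
    items.toSeq
      .map { case (id, features) => (id, score(query, features)) }
      .sortWith(_._2 > _._2)
      .map(_._1)

  def main(args: Array[String]): Unit = {
    val query = Array(1.0, 0.0)
    val items = Map(
      1 -> Array(1.0, 0.1), // nearly the same direction as the query, short vector
      2 -> Array(3.0, 3.0)  // 45 degrees away from the query, long vector
    )
    println(rank(query, items, dot))    // item 2 ranks first
    println(rank(query, items, cosine)) // item 1 ranks first
  }
}
```

With these made-up vectors, item 2 wins under the dot product purely because its feature vector is longer, while cosine similarity prefers item 1, whose direction is closer to the query. That is the length effect the comment associates with favoring popular items.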
[GitHub] spark pull request: [SPARK-4671][Streaming]Do not replicate stream...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3534#issuecomment-65043093 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23981/ Test PASSed.
[GitHub] spark pull request: [SPARK-4671][Streaming]Do not replicate stream...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3534#issuecomment-65043086 [Test build #23981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23981/consoleFull) for PR 3534 at commit [`500b456`](https://github.com/apache/spark/commit/500b45689d2cd6db2ec0a7e32949863dc973870a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-65043512 [Test build #23984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull) for PR 3462 at commit [`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-65043520 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23984/ Test PASSed.
[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3535#issuecomment-65044316 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23985/ Test PASSed.
[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3535#issuecomment-65044310 [Test build #23985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23985/consoleFull) for PR 3535 at commit [`18ff1ed`](https://github.com/apache/spark/commit/18ff1eddc145cf23d197da0c0b5c55d6ea2e7bd1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] Minor fix for doc and comment
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3533#issuecomment-65045819 [Test build #23982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23982/consoleFull) for PR 3533 at commit [`962910b`](https://github.com/apache/spark/commit/962910bbce3fed010985dca6d7fd6f538a5adff3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] Minor fix for doc and comment
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3533#issuecomment-65045823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23982/ Test PASSed.
[GitHub] spark pull request: [SPARK-3575][SQL][WIP] Removes the Metastore P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3441#issuecomment-65047207 [Test build #23987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23987/consoleFull) for PR 3441 at commit [`630330a`](https://github.com/apache/spark/commit/630330afaae2dd1d10436cb4acb41b6da217f82b). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-65050301 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23986/ Test PASSed.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-65050294 [Test build #23986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23986/consoleFull) for PR 3222 at commit [`f5ab79e`](https://github.com/apache/spark/commit/f5ab79ebf8515cace31622dab32b1c4d33a35471). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DBN(val stackedRBM: StackedRBM, val nn: NN)` * `class NN(val innerLayers: Array[NNLayer])` * `class RBM(` * `class StackedRBM(val innerRBMs: Array[RBM])` * `case class MinstItem(label: Int, data: Array[Int]) ` * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`
[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/3526#issuecomment-65053699 @rxin No problem, I have modified it :)
[GitHub] spark pull request: [SPARK-3575][SQL][WIP] Removes the Metastore P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3441#issuecomment-65053872 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23987/ Test PASSed.
[GitHub] spark pull request: [SPARK-3575][SQL][WIP] Removes the Metastore P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3441#issuecomment-65053865 [Test build #23987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23987/consoleFull) for PR 3441 at commit [`630330a`](https://github.com/apache/spark/commit/630330afaae2dd1d10436cb4acb41b6da217f82b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3526#issuecomment-65054273 [Test build #23988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23988/consoleFull) for PR 3526 at commit [`b36bf96`](https://github.com/apache/spark/commit/b36bf96ed12d937a511d2292e424da10de8720c8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...
Github user XuefengWu commented on the pull request: https://github.com/apache/spark/pull/3380#issuecomment-6502 @aarondav any more suggestions?
[GitHub] spark pull request: [SPARK-4672][GraphX]Non-transient PartitionsRD...
GitHub user JerryLead opened a pull request: https://github.com/apache/spark/pull/3537 [SPARK-4672][GraphX]Non-transient PartitionsRDDs lead to StackOverflow error The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 In a nutshell, if `val partitionsRDD` of VertexRDD and EdgeRDD are non-transient, the task's serialization chain will become very long in iterative algorithms and finally lead to the StackOverflow error. More details and explanation can be found in the JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JerryLead/spark my_change Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3537.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3537 commit 52799e3ea2b22f4bcaec3d9cd4c8891e212be09e Author: Lijie Xu csxuli...@gmail.com Date: 2014-12-01T08:54:37Z Merge pull request #1 from apache/master update commit 5207961636f41187109c2d71617f8aba7d277e07 Author: JerryLead jerryl...@163.com Date: 2014-12-01T11:45:31Z set VertexRDD.partitionsRDD and EdgeRDD.partitionsRDD to transient variables
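The effect described in this pull request can be illustrated without Spark. The following is a minimal sketch, assuming plain Java serialization; `Link`, `TransientLink` and `TransientChainDemo` are hypothetical stand-ins and not GraphX classes. It shows that a non-transient parent reference makes the serialized form grow with the length of the chain, while a `@transient` reference keeps it constant.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-ins for a wrapper that keeps a reference to its parent.
// These are NOT Spark classes; they only illustrate the serialization behaviour.
class Link(val id: Int, val parent: Link) extends Serializable
class TransientLink(val id: Int, @transient val parent: TransientLink) extends Serializable

object TransientChainDemo {
  // Serialize with plain Java serialization and return the number of bytes written.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  // Build a chain of n links, each pointing at the previous one.
  def chain(n: Int): Link =
    (1 to n).foldLeft(null: Link)((parent, i) => new Link(i, parent))

  // Same chain, but the parent reference is skipped during serialization.
  def transientChain(n: Int): TransientLink =
    (1 to n).foldLeft(null: TransientLink)((parent, i) => new TransientLink(i, parent))

  def main(args: Array[String]): Unit = {
    println(s"non-transient, 1 link:    ${serializedSize(chain(1))} bytes")
    println(s"non-transient, 100 links: ${serializedSize(chain(100))} bytes")
    println(s"transient, 1 link:        ${serializedSize(transientChain(1))} bytes")
    println(s"transient, 100 links:     ${serializedSize(transientChain(100))} bytes")
  }
}
```

Java serialization walks object references recursively, so a sufficiently long non-transient chain can exhaust the stack; the chain length of 100 used here is deliberately small so the sketch runs safely.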
[GitHub] spark pull request: [SPARK-4672][GraphX]Non-transient PartitionsRD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3537#issuecomment-65056312 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3891][SQL] Add array support to percent...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/spark/pull/2802#discussion_r21086151 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala --- @@ -172,6 +177,8 @@ private[hive] case class HiveGenericUdf(functionClassName: String, children: Seq override def eval(input: Row): Any = { returnInspector // Make sure initialized. +if(foldable) return constantReturnValue --- End diff -- In HiveQuerySuite, constant array testcase was failing SELECT sort_array( sort_array( array(hadoop distributed file system, enterprise databases, hadoop map-reduce))) FROM src LIMIT 1 [info] - constant array *** FAILED *** (596 milliseconds) [info] Failed to execute query using catalyst: [info] Error: java.lang.String cannot be cast to org.apache.hadoop.io.Text
[GitHub] spark pull request: [SPARK-3891][SQL] Add array support to percent...
Github user gvramana commented on the pull request: https://github.com/apache/spark/pull/2802#issuecomment-65059567 @marmbrus, @chenghao-intel any other comments? Can you merge this? Thanks.
[GitHub] spark pull request: [SPARK-4672][GraphX]Non-transient PartitionsRD...
Github user JerryLead closed the pull request at: https://github.com/apache/spark/pull/3537
[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3526#issuecomment-65060582 [Test build #23988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23988/consoleFull) for PR 3526 at commit [`b36bf96`](https://github.com/apache/spark/commit/b36bf96ed12d937a511d2292e424da10de8720c8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3526#issuecomment-65060591 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23988/ Test PASSed.
[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...
GitHub user YanTangZhai opened a pull request: https://github.com/apache/spark/pull/3538 [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null val jsc = new org.apache.spark.api.java.JavaSparkContext(sc) val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc) val nrdd = jhc.hql("select null from spark_test.for_test") println(nrdd.schema) Then the error is thrown as follows: scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$) at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43) You can merge this pull request into a Git repository by running: $ git pull https://github.com/YanTangZhai/spark MatchNullType Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3538.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3538 commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-06T13:07:08Z Merge pull request #1 from apache/master update commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-20T13:14:08Z Merge pull request #3 from apache/master Update commit 8a0010691b669495b4c327cf83124cabb7da1405 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-12T06:54:58Z Merge pull request #6 from apache/master Update commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-16T12:03:22Z Merge pull request #7 from apache/master Update commit 76d40277d51f709247df1d3734093bf2c047737d Author: YanTangZhai hakeemz...@tencent.com Date: 2014-10-20T12:52:22Z Merge pull request #8 from apache/master update commit d26d98248a1a4d0eb15336726b6f44e05dd7a05a Author: YanTangZhai hakeemz...@tencent.com Date: 2014-11-04T09:00:31Z Merge pull request #9 from apache/master Update commit e249846d9b7967ae52ec3df0fb09e42ffd911a8a Author: YanTangZhai hakeemz...@tencent.com Date: 2014-11-11T03:18:24Z Merge pull request #10 from apache/master Update commit 6e643f81555d75ec8ef3eb57bf5ecb6520485588 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-12-01T11:23:56Z Merge pull request #11 from apache/master Update commit 896c7b73f0ba1b2d3dccf6fed6410bf077eb3d54 Author: yantangzhai tyz0...@163.com Date: 2014-12-01T13:08:41Z fix NullType MatchError in JavaSchemaRDD when sql has null
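The `scala.MatchError` reported in this pull request is the usual symptom of a pattern match that has no case for one of the possible values. A minimal sketch of that failure mode follows, assuming a simplified, hypothetical type hierarchy; `DataType`, `toJavaNameBroken` and `toJavaNameFixed` here are illustrative only and are not the actual code in `DataTypeConversions`.

```scala
import scala.util.Try

// Hypothetical, simplified type hierarchy; not the Spark SQL classes.
abstract class DataType
case object StringType extends DataType
case object IntegerType extends DataType
case object NullType extends DataType

object NullTypeMatchDemo {
  // No case for NullType: passing it in throws scala.MatchError at runtime.
  def toJavaNameBroken(dt: DataType): String = dt match {
    case StringType  => "StringType"
    case IntegerType => "IntegerType"
  }

  // Adding the missing case makes the conversion total over the known types.
  def toJavaNameFixed(dt: DataType): String = dt match {
    case StringType  => "StringType"
    case IntegerType => "IntegerType"
    case NullType    => "NullType"
  }

  def main(args: Array[String]): Unit = {
    println(Try(toJavaNameBroken(NullType))) // a Failure wrapping scala.MatchError
    println(toJavaNameFixed(NullType))
  }
}
```

Declaring the base type as `sealed` would additionally let the compiler warn about the missing case; it is left unsealed here only so the broken variant compiles without warnings.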
[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3538#issuecomment-65064263 [Test build #23989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23989/consoleFull) for PR 3538 at commit [`896c7b7`](https://github.com/apache/spark/commit/896c7b73f0ba1b2d3dccf6fed6410bf077eb3d54). * This patch merges cleanly.
[GitHub] spark pull request: Documentation: add description for repartition...
Github user msiddalingaiah commented on the pull request: https://github.com/apache/spark/pull/3390#issuecomment-65067064 OK, done. Please review.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65069007 [Test build #23990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23990/consoleFull) for PR 3519 at commit [`6046550`](https://github.com/apache/spark/commit/6046550e79af307e582ffaae559e56d46c884967). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...
GitHub user YanTangZhai opened a pull request: https://github.com/apache/spark/pull/3539 [SPARK-4677] [WEB] Add hadoop input time in task webui Add hadoop input time in task webui like GC Time to explicitly show the time used by task to read input data. You can merge this pull request into a Git repository by running: $ git pull https://github.com/YanTangZhai/spark WebuiInputTime Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3539.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3539 commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-06T13:07:08Z Merge pull request #1 from apache/master update commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-20T13:14:08Z Merge pull request #3 from apache/master Update commit 8a0010691b669495b4c327cf83124cabb7da1405 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-12T06:54:58Z Merge pull request #6 from apache/master Update commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-16T12:03:22Z Merge pull request #7 from apache/master Update commit 76d40277d51f709247df1d3734093bf2c047737d Author: YanTangZhai hakeemz...@tencent.com Date: 2014-10-20T12:52:22Z Merge pull request #8 from apache/master update commit d26d98248a1a4d0eb15336726b6f44e05dd7a05a Author: YanTangZhai hakeemz...@tencent.com Date: 2014-11-04T09:00:31Z Merge pull request #9 from apache/master Update commit e249846d9b7967ae52ec3df0fb09e42ffd911a8a Author: YanTangZhai hakeemz...@tencent.com Date: 2014-11-11T03:18:24Z Merge pull request #10 from apache/master Update commit 6e643f81555d75ec8ef3eb57bf5ecb6520485588 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-12-01T11:23:56Z Merge pull request #11 from apache/master Update commit 3816f8540b947809cb821bcb3af36d7be0210d9c Author: yantangzhai tyz0...@163.com Date: 2014-12-01T14:09:24Z add hadoop input read time in webui
[GitHub] spark pull request: [SPARK-4642] Documents about running-on-YARN n...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3500#issuecomment-65069892 @sryza is correct. Most of those were intentionally left undocumented. If you have a reason they need to be changed, then we can revisit them sooner to make sure they are what we want and get them documented. Note there is a different PR up to fix spark.yarn.user.classpath.first (https://github.com/apache/spark/pull/3233)
[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3539#issuecomment-65069992 [Test build #23991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23991/consoleFull) for PR 3539 at commit [`3816f85`](https://github.com/apache/spark/commit/3816f8540b947809cb821bcb3af36d7be0210d9c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3539#issuecomment-65070129 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23991/ Test FAILed.
[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3539#issuecomment-65070127 [Test build #23991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23991/consoleFull) for PR 3539 at commit [`3816f85`](https://github.com/apache/spark/commit/3816f8540b947809cb821bcb3af36d7be0210d9c). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3539#discussion_r21090555 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -238,10 +238,13 @@ class HadoopRDD[K, V]( val value: V = reader.createValue() var recordsSinceMetricsUpdate = 0 + var startTime : Long = 0L override def getNext() = { try { + startTime = System.nanoTime finished = !reader.next(key, value) + inputMetrics.readTime += (System.nanoTime - startTime) --- End diff -- Hm, is this going to be expensive, making 2 system calls for every read?
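One way to reason about the cost raised in this review comment is to time only a sample of the reads. The following is a minimal sketch of that idea, not what the pull request does: `TimedIterator` and its `sampleEvery` parameter are hypothetical names, and the wrapper works on a plain Scala `Iterator` rather than a Hadoop `RecordReader`. With `sampleEvery = 1` it times every record, as in the diff above; larger values call `System.nanoTime` on fewer records and extrapolate.

```scala
// Hypothetical wrapper, not part of Spark: accumulates time spent in next().
class TimedIterator[A](underlying: Iterator[A], sampleEvery: Int = 1) extends Iterator[A] {
  require(sampleEvery >= 1, "sampleEvery must be at least 1")

  private var calls = 0L        // total number of next() calls
  private var sampledCalls = 0L // number of next() calls that were timed
  private var sampledNanos = 0L // time accumulated over the timed calls

  def hasNext: Boolean = underlying.hasNext

  def next(): A = {
    calls += 1
    if (calls % sampleEvery == 0) {
      // Timed path: two System.nanoTime calls around the read.
      val start = System.nanoTime()
      val value = underlying.next()
      sampledNanos += System.nanoTime() - start
      sampledCalls += 1
      value
    } else {
      // Untimed path: no clock reads at all.
      underlying.next()
    }
  }

  // Estimated total read time: mean cost of a timed call scaled to all calls.
  def estimatedReadNanos: Long =
    if (sampledCalls == 0) 0L
    else (sampledNanos.toDouble / sampledCalls * calls).toLong
}

object TimedIteratorDemo {
  def main(args: Array[String]): Unit = {
    val it = new TimedIterator((1 to 1000000).iterator, sampleEvery = 100)
    val sum = it.map(_.toLong).sum
    println(s"sum = $sum, estimated read time = ${it.estimatedReadNanos} ns")
  }
}
```

Sampling trades accuracy for fewer clock reads; whether that trade is worthwhile depends on how expensive `System.nanoTime` is on the target platform, which this sketch does not measure.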
[GitHub] spark pull request: [SPARK-4665] Improve YarnAllocator's parsing o...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3525#issuecomment-65071045 Perhaps I've missed it, but I haven't heard a lot of cases for either way. Do you have examples or use cases? I'd be open to changing it but want more reasoning behind it. I've found putting in the value rather than a % easier in some cases that weren't small/straightforward jobs.
[GitHub] spark pull request: [YARN][SPARK-3293]Fix yarn's web show SUCCEED...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3508#discussion_r21090918 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -92,6 +104,57 @@ private[spark] abstract class YarnSchedulerBackend( } /** + * This system security manager applies to the entire process. + * It's main purpose is to handle the case if the user code does a System.exit. + * This allows us to catch that and properly set the YARN application status and + * cleanup if needed. + */ + private def setupSystemSecurityManager(amActor: ActorRef): Unit = { --- End diff -- The securityManager in the AM was causing a performance impact and we just removed it. I expect the same issue to happen here. https://github.com/apache/spark/pull/3484
[GitHub] spark pull request: [SPARK-4584] [yarn] Remove security manager fr...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3484#issuecomment-65071338 @vanzin I would still be curious if you have more details on the exact performance impact?
[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3538#issuecomment-65072171 [Test build #23989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23989/consoleFull) for PR 3538 at commit [`896c7b7`](https://github.com/apache/spark/commit/896c7b73f0ba1b2d3dccf6fed6410bf077eb3d54). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3538#issuecomment-65072175 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23989/ Test PASSed.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65081643 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23990/ Test PASSed.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-65081631 [Test build #23990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23990/consoleFull) for PR 3519 at commit [`6046550`](https://github.com/apache/spark/commit/6046550e79af307e582ffaae559e56d46c884967). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `sealed trait MonotonicityConstraint ` * `class IsotonicRegressionModel(` * `case class WeightedLabeledPoint(label: Double, features: Vector, weight: Double = 1)`
[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/3380#discussion_r21098192 --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala --- @@ -19,12 +19,19 @@ package org.apache.spark.scheduler import java.util.concurrent.Semaphore +import akka.actor.ActorSystem --- End diff -- nit: import ordering should abide by the [style guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports)
[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/3380#discussion_r21098231 --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala --- @@ -202,6 +209,60 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with Matchers stageInfo.rddInfos.forall(_.numPartitions == 4) should be {true} } + //SEE SPARK-2208: hack BlockManager to have a sleep when read shuffle data --- End diff -- nit: space after //
[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/3380#discussion_r21098278 --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala --- @@ -202,6 +209,60 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with Matchers stageInfo.rddInfos.forall(_.numPartitions == 4) should be {true} } + //SEE SPARK-2208: hack BlockManager to have a sleep when read shuffle data + test(local metrics with fetchWaitTime) { +val listener = new SaveStageAndTaskInfo +val sc2 = new SparkContext(local, SparkListenerSuite2) + +val env = SparkEnv.get +val bm: BlockManager = env.blockManager +val numOfCore = Runtime.getRuntime().availableProcessors() +val maxMemory = getMaxMemory(env.conf) + +val hackedBlockManager = new SlowBlockManager(env.executorId, env.actorSystem, bm.master, + env.serializer, maxMemory, env.conf, env.mapOutputTracker, env.shuffleManager, env.blockTransferService, env.securityManager,numOfCore) --- End diff -- we have a max line length of 100ch
[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/3380#discussion_r21098319 --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala --- @@ -202,6 +209,60 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with Matchers stageInfo.rddInfos.forall(_.numPartitions == 4) should be {true} } + //SEE SPARK-2208: hack BlockManager to have a sleep when read shuffle data + test(local metrics with fetchWaitTime) { +val listener = new SaveStageAndTaskInfo +val sc2 = new SparkContext(local, SparkListenerSuite2) + +val env = SparkEnv.get +val bm: BlockManager = env.blockManager +val numOfCore = Runtime.getRuntime().availableProcessors() +val maxMemory = getMaxMemory(env.conf) + +val hackedBlockManager = new SlowBlockManager(env.executorId, env.actorSystem, bm.master, + env.serializer, maxMemory, env.conf, env.mapOutputTracker, env.shuffleManager, env.blockTransferService, env.securityManager,numOfCore) + + +val hackEnv = new SparkEnv(env.executorId, env.actorSystem, env.serializer, env.closureSerializer, env.cacheManager, env.mapOutputTracker, --- End diff -- line length issue here too