[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5668#issuecomment-95817298 [Test build #30919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30919/consoleFull) for PR 5668 at commit [`b4651fd`](https://github.com/apache/spark/commit/b4651fd80a55f016093d84cf3b00ad6c91333cef). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/5680 [SPARK-7112][Streaming] Add a DirectStreamTracker to track the direct streams

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-7111

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5680.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5680

commit 28d668faf51495e779aa1f874ceb03a64bccf410
Author: jerryshao saisai.s...@intel.com
Date: 2015-04-24T06:07:54Z

    Add DirectStreamTracker to track the direct streams
[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5680#issuecomment-95819308 [Test build #30920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull) for PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...
Github user aniketbhatnagar commented on the pull request: https://github.com/apache/spark/pull/5354#issuecomment-95819955 +1 from my side. Having a consistent httpclient version would be so much better!
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95821459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30916/ Test PASSed.
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95821427 [Test build #30916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30916/consoleFull) for PR 2342 at commit [`d3c63c8`](https://github.com/apache/spark/commit/d3c63c84a56041756841dd0706d87c8c808e84d3).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` case class ExecutorUIData(`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5676#issuecomment-95822131 This looks like a duplicate of SPARK-6954 (PR #5536)
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29026826

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---

    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()

--- End diff --

Why do we need to use `dummyDirectory` here? Assuming the WAL may not be file-based, I'm not sure what the point of having this is.
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-95846092 Looks almost good, except for the comments on the API. Other than that, I took a detailed pass over everything else and it looks good.
[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5682#issuecomment-95846148 [Test build #30925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30925/consoleFull) for PR 5682 at commit [`4e98520`](https://github.com/apache/spark/commit/4e98520e78832b25877d825392d66d10779281f7).
[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-95860808 [Test build #30924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30924/consoleFull) for PR 5643 at commit [`90a69ec`](https://github.com/apache/spark/commit/90a69ec603279442c5a0b3e510e8f5db9e1bbb80).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...
GitHub user Sephiroth-Lin opened a pull request: https://github.com/apache/spark/pull/5684 [PySpark][Minor] Update sql example, so that it can read the file correctly

When running Spark, it will read the file from HDFS by default if we don't set the schema.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sephiroth-Lin/spark pyspark_example_minor

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5684.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5684

commit 19fe145e7a00574080b91d311376b6d2cdb4254e
Author: linweizhong linweizh...@huawei.com
Date: 2015-04-24T09:16:23Z

    Update example sql.py, so that it can read the file correctly
[GitHub] spark pull request: SPARK-4705:[core] Write event logs of differen...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4845#issuecomment-95896735 It looks like this work is being continued in https://github.com/apache/spark/pull/5432 which is currently more active. Do you mind closing this PR and focusing discussion on that PR?
[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5687#discussion_r29041932

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Node.scala ---

    @@ -51,8 +51,8 @@ class Node (
         var stats: Option[InformationGainStats]) extends Serializable with Logging {
    
       override def toString: String = {
    -    "id = " + id + ", isLeaf = " + isLeaf + ", predict = " + predict + ", " +

--- End diff --

These can use string interpolation. I take your point, though it breaks the symmetry a bit and makes this `toString` rely on details of the subclass. How about making `Predict.toString` return something more compact, like `s"$predict ($prob)"`?
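As a language-neutral illustration of the compact representation suggested here (transposed to Python; the real `Predict` is a Scala class in MLlib, and the fields below are assumed from the discussion, not taken from its source):

```python
class Predict:
    """Toy stand-in for MLlib's Predict class, used only to illustrate
    the compact string form suggested in the review comment above."""

    def __init__(self, predict: float, prob: float):
        self.predict = predict  # predicted value
        self.prob = prob        # probability of the prediction (assumed field)

    def __repr__(self) -> str:
        # Interpolated compact form, analogous to Scala's s"$predict ($prob)"
        return f"{self.predict} ({self.prob})"
```

For example, `repr(Predict(1.0, 0.3))` yields `1.0 (0.3)`, which is the kind of compact output the comment proposes instead of spelling out every field name.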
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-95910746 Cool, will make the changes along with SPARK-7045.
[GitHub] spark pull request: [SPARK-7115][MLLIB] skip the very first 1 in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5681#issuecomment-95841203 [Test build #30922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30922/consoleFull) for PR 5681 at commit [`9ac27cd`](https://github.com/apache/spark/commit/9ac27cd5856205a5e316e1679bdd39200d4c3ede).
[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5680#issuecomment-95845825 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30920/ Test PASSed.
[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/5682 [SPARK-7098][SQL] Make the WHERE clause with timestamp show consistent result

JIRA: https://issues.apache.org/jira/browse/SPARK-7098

The WHERE clause with timestamp shows inconsistent results. This PR fixes it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 consistent_timestamp

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5682.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5682

commit 4e98520e78832b25877d825392d66d10779281f7
Author: Liang-Chi Hsieh vii...@gmail.com
Date: 2015-04-24T08:07:44Z

    Make the WHERE clause with timestamp show consistent result.
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5266#issuecomment-95851114 [Test build #30927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30927/consoleFull) for PR 5266 at commit [`741db31`](https://github.com/apache/spark/commit/741db31f112469141a22634a406ab20feb13e678).
[GitHub] spark pull request: [SPARK-7115][MLLIB] skip the very first 1 in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5681#issuecomment-95868782 [Test build #30922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30922/consoleFull) for PR 5681 at commit [`9ac27cd`](https://github.com/apache/spark/commit/9ac27cd5856205a5e316e1679bdd39200d4c3ede).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class PolynomialExpansion extends UnaryTransformer[Vector, Vector, PolynomialExpansion] `
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6568] spark-shell.cmd --jars option doe...
Github user tsudukim commented on a diff in the pull request: https://github.com/apache/spark/pull/5447#discussion_r29036669

--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---

    @@ -82,7 +82,7 @@ object PythonRunner {
             s"spark-submit is currently only supported for local files: $path")
         }
         val windows = Utils.isWindows || testWindows
    -    var formattedPath = if (windows) Utils.formatWindowsPath(path) else path
    +    var formattedPath = Utils.formatPath(path, windows)

--- End diff --

That's right. I'll try to remove them.
[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/4527
[GitHub] spark pull request: [PYSPARK] Add percentile method in rdd as nump...
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5686 [PYSPARK] Add percentile method in rdd as numpy

1. Add percentile method in rdd
2. By default, get the kth percentile element from the bottom (ascending order)
3. By specifying a key, it can return the top or even a user-defined kth percentile element
4. Tested it

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AiHe/spark percentile

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5686.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5686

commit 1403816f8287aeee316b27ba569ce607fdb0ed2c
Author: Alain a...@usc.edu
Date: 2015-04-24T10:24:51Z

    [PYSPARK] Add percentile method in rdd as numpy

    1. Add percentile method in rdd
    2. By default, get the kth percentile element from the bottom (ascending order)
    3. By specifying a key, it can return the top or even a user-defined kth percentile element
    4. Tested it
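The behavior the list above describes, a nearest-rank percentile with an optional `key` to flip or customize the ordering, can be sketched in plain Python. The function name and signature here are illustrative only, not the PR's actual RDD API:

```python
import math

def percentile(data, p, key=None):
    """Nearest-rank p-th percentile (p in 0..100) of an iterable.

    By default the element is taken from the bottom (ascending order);
    passing a key, e.g. a negating function, yields a top percentile or
    any user-defined ordering, mirroring points 2 and 3 above.
    """
    items = sorted(data, key=key)
    if not items:
        raise ValueError("empty data")
    # Nearest-rank definition: ceil(p/100 * n) is a 1-based rank.
    rank = max(1, math.ceil(p / 100.0 * len(items)))
    return items[rank - 1]

# 50th percentile of 1..10 from the bottom is 5; with key=lambda x: -x
# the same call picks from the top instead.
```

On an actual RDD one would sort and index in a distributed fashion (e.g. via `sortBy` and a lookup by rank) rather than collecting to the driver, but the rank arithmetic is the same.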
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95904531 @srowen The last assertResult I added in the test case is a case where only discarding the first non-null sample can't work: half of the array elements are not linked to the shared object, so if the first non-null sample (which is chosen at random) is not linked to the shared object, we can't exclude the shared object. But if we sample twice, even if one of the samples does not exclude the shared object, it can still work.
[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-95905239 [Test build #30930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30930/consoleFull) for PR 5685 at commit [`2390a60`](https://github.com/apache/spark/commit/2390a608ed74a9703d3763d040421dccb51242ec).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` * class SomethingNotSerializable `
  * ` logDebug(s + cloning the object $obj of class $`
  * `class FieldAccessFinder(`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5726] [MLLIB] Elementwise (Hadamard) Ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4580#issuecomment-95841209 [Test build #30923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30923/consoleFull) for PR 4580 at commit [`e7ff5f2`](https://github.com/apache/spark/commit/e7ff5f2cc3c172b97c6ea3cec6ebf7546682a74b).
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95905420 @shenh062326 yes, but you constructed it that way. I can construct a case that works and doesn't work for any sampling strategy. The question is, what is the common case? I'm pretty sure it's that all N objects share some common data structure, which sampling just 1 would catch. However if you want to go this way, at least generalize it. There is nothing magic about 2 samples, so it shouldn't be written that way with a redundant loop.
[GitHub] spark pull request: [SPARK-7092] Update spark scala version to 2.1...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/5662#discussion_r29041655

--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala ---

    @@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings
       def apply(line: String): Result = debugging(s"parse($line)") {
         var isIncomplete = false
    -    currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) {
    +    currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) {

--- End diff --

It is harmless, if not beneficial. But this can be a stepping stone towards enabling add jars, because then we have two options: 1) back-port Scala's version of add jars, or 2) port the Spark Scala 2.10 REPL's version of add jars on the fly. Without this patch such an option does not exist. I agree the best thing is to patch scala/scala so that we do not have to do it. So I am working on it, with whatever time I have.
[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4474#issuecomment-95907922 @pwendell I think we cannot kill the JVM directly when this occurs. In the Hive server case, one driver serves many jobs; if we kill the JVM, other jobs on that driver cannot continue. I think this PR is OK: it just aborts the job, and then DAGScheduler throws a jobFailed exception to the client. If it is a Hive server, the Hive server can catch this exception and continue to run other jobs. If it is an application like the one I described, the user application does not catch this exception and it propagates to the ApplicationMaster, so the application will fail. That ensures the right behavior in every situation.
[GitHub] spark pull request: [SPARK-7022][PySpark][ML] Add ML.Tuning.ParamG...
Github user oefirouz commented on the pull request: https://github.com/apache/spark/pull/5601#issuecomment-95857492 Friendly bump for more comments :)
[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-95860872 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30924/ Test FAILed.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-95879508 @chenghao-intel ok.
[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5060#discussion_r29040627

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

    @@ -94,6 +94,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       // contains a map from hostname to a list of input format splits on the host.
       private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()
    
    +  // This is used for Spark Streaming to check whether driver host and port are set by user,
    +  // if these two configurations are set by user, so the recovery mechanism should not remove this.
    +  private[spark] val isDriverHostSetByUser = config.contains("spark.driver.host")

--- End diff --

It doesn't seem worth tacking on yet more little fields in `SparkContext` just for a niche use case in a submodule. Use the config object in `Checkpoint`.
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95901582 [Test build #30931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30931/consoleFull) for PR 5608 at commit [`a9fca84`](https://github.com/apache/spark/commit/a9fca8444d7a8591032383a7d6ced84ee1f66a56).
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95908120 The sampling strategy does not always work, but sampling twice is more effective than only discarding the first non-null sample, and sampling 200 times will not cause performance issues. If you think the code shouldn't be written like that, I agree; I will change it.
[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5680#issuecomment-95845816 [Test build #30920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull) for PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-95852527 [Test build #30928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30928/consoleFull) for PR 5604 at commit [`d07101b`](https://github.com/apache/spark/commit/d07101bd5a6f3b30532c4d4d77ab8d310607b684).
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-95852537 @MechCoder Sorry for my late comment! I made some minor comments. It would be good if you can submit a follow-up PR to address those issues. Thanks!
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r29032437 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -429,7 +429,36 @@ class Word2Vec extends Serializable with Logging { */ @Experimental class Word2VecModel private[mllib] ( -private val model: Map[String, Array[Float]]) extends Serializable with Saveable { +model: Map[String, Array[Float]]) extends Serializable with Saveable { + + // wordList: Ordered list of words obtained from model. + private val wordList: Array[String] = model.keys.toArray + + // wordIndex: Maps each word to an index, which can retrieve the corresponding + //vector from wordVectors (see below). + private val wordIndex: Map[String, Int] = wordList.zip(0 until model.size).toMap --- End diff -- `wordList.zipWithIndex.toMap`
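The suggested `wordList.zipWithIndex.toMap` is equivalent to the original `zip(0 until size)` form, just shorter and without the separate length lookup; a quick check (the sample words are made up):

```scala
// Made-up vocabulary standing in for the Word2Vec model's word list.
val wordList: Array[String] = Array("spark", "word", "vec")

val original  = wordList.zip(0 until wordList.length).toMap  // zip against an explicit Range
val suggested = wordList.zipWithIndex.toMap                  // idiomatic equivalent
// Both build Map("spark" -> 0, "word" -> 1, "vec" -> 2).
```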
[GitHub] spark pull request: [SPARK-7118] [Python] Add the coalesce Spark S...
GitHub user ogirardot opened a pull request: https://github.com/apache/spark/pull/5683 [SPARK-7118] [Python] Add the coalesce Spark SQL function available in PySpark This patch adds a proxy call from PySpark to the Spark SQL coalesce function; it comes out of a discussion on dev@spark with @rxin. This contribution is my original work and I license the work to the project under the project's open source license. Olivier. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ogirardot/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5683.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5683 commit e3fec1e76eaadf0aefaf16a0b935765858287f33 Author: Olivier Girardot o.girar...@lateral-thoughts.com Date: 2015-04-24T08:39:32Z SPARK-7118 Add the coalesce Spark SQL function available in PySpark No changes to the scala/java part, only changes in Python.
[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5542#issuecomment-95870914 [Test build #30921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30921/consoleFull) for PR 5542 at commit [`6b594f0`](https://github.com/apache/spark/commit/6b594f05ef2725aa5f6bed716dbac6eed64a1879). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait AggregateFunction2 ` * `trait AggregateExpression2 extends Expression with AggregateFunction2 ` * `abstract class UnaryAggregateExpression extends UnaryExpression with AggregateExpression2 ` * `case class Min(child: Expression) extends UnaryAggregateExpression ` * `case class Average(child: Expression, distinct: Boolean = false)` * `case class Max(child: Expression) extends UnaryAggregateExpression ` * `case class Count(child: Expression)` * `case class CountDistinct(children: Seq[Expression])` * `case class Sum(child: Expression, distinct: Boolean = false)` * `case class First(child: Expression, distinct: Boolean = false)` * `case class Last(child: Expression, distinct: Boolean = false)` * `class AggregateExpressionSubsitution ` * ` class HashAggregation2(aggrSubsitution: AggregateExpressionSubsitution) extends Strategy ` * `sealed class BufferSeens(var buffer: MutableRow, var seens: Array[JSet[Any]] = null) ` * `sealed class BufferAndKey(leftLen: Int, rightLen: Int)` * `sealed trait Aggregate ` * `sealed trait PostShuffle extends Aggregate ` * `case class AggregatePreShuffle(` * `case class AggregatePostShuffle(` * `case class DistinctAggregate(` * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5542#issuecomment-95870924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30921/ Test PASSed.
[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/5685 [SPARK-7120][SPARK-7121][WIP] Closure cleaner nesting + documentation For instance, in SparkContext, I tried to do the following: {code} def scope[T](body: => T): T = body // no-op def myCoolMethod(path: String): RDD[String] = scope { parallelize(1 to 10).map { _ => path } } {code} and I got an exception complaining that SparkContext is not serializable. The issue here is that the inner closure is getting its `path` from the outer closure (the scope), but the outer closure actually references the SparkContext object itself to get the `parallelize` method. Note, however, that the inner closure doesn't actually need the SparkContext; it just needs a field from the outer closure. If we modify ClosureCleaner to clean the outer closure recursively while using the fields accessed by the inner closure, then we can serialize the inner closure. This is blocking my effort on a separate task. This is WIP because I plan to add tests for this later. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark closure-cleaner Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5685.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5685 commit 86f78237b7623e4efa06c5feb053e0c304979c73 Author: Andrew Or and...@databricks.com Date: 2015-04-24T10:05:58Z Implement transitive cleaning + add missing documentation See in-code comments for more detail on what this means. commit 2390a608ed74a9703d3763d040421dccb51242ec Author: Andrew Or and...@databricks.com Date: 2015-04-24T10:08:11Z Feature flag this new behavior ... in case anything breaks, we should be able to resort to old behavior.
[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5687 [Minor][MLLIB] Fix a formatting bug in toString method in Node 1. predict(predict.toString) has already output the prefix "predict", thus it's duplicated to print ", predict = " again 2. there are some extra spaces You can merge this pull request into a Git repository by running: $ git pull https://github.com/AiHe/spark tree-node-issue-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5687.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5687 commit 426eee7fa00eef343d10396704f0619e802841bc Author: Alain a...@usc.edu Date: 2015-04-24T09:26:03Z [Minor][MLLIB] Fix a formatting bug in toString method in Node.scala 1. predict(predict.toString) has already output the prefix "predict", thus it's duplicated to print ", predict = " again 2. there are some extra spaces
[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5060#discussion_r29040703 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -41,12 +41,12 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time) val checkpointDuration = ssc.checkpointDuration val pendingTimes = ssc.scheduler.getPendingTimes().toArray val delaySeconds = MetadataCleaner.getDelaySeconds(ssc.conf) - val sparkConfPairs = ssc.conf.getAll + val sparkConfPairs = ssc.conf.getAll.filterNot { kv => --- End diff -- Maybe this can be turned into a generic function that removes given keys if the key is set in the config.
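One way to read the suggestion above is a small helper that drops a given set of keys from the config pairs; the helper name and the sample entries below are illustrative, not Spark API:

```scala
// Hypothetical helper: remove config entries whose key appears in `keys`.
def withoutKeys(pairs: Array[(String, String)], keys: Set[String]): Array[(String, String)] =
  pairs.filterNot { case (k, _) => keys.contains(k) }

// Sample config pairs standing in for ssc.conf.getAll.
val confPairs = Array("spark.driver.host" -> "h1", "spark.app.name" -> "app")

// Drop the driver endpoint keys so checkpoint recovery can re-derive them.
val sparkConfPairs = withoutKeys(confPairs, Set("spark.driver.host", "spark.driver.port"))
```

Factoring the filter out this way keeps the list of excluded keys in one place instead of inlining the predicate at each call site.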
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/4723#discussion_r29031149 --- Diff: python/pyspark/streaming/kafka.py --- @@ -70,7 +71,195 @@ def createStream(ssc, zkQuorum, groupId, topics, kafkaParams={}, except Py4JJavaError, e: # TODO: use --jar once it also work on driver if 'ClassNotFoundException' in str(e.java_exception): -print +KafkaUtils._printErrorMsg(ssc.sparkContext) +raise e +ser = PairDeserializer(NoOpSerializer(), NoOpSerializer()) +stream = DStream(jstream, ssc, ser) +return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v))) + +@staticmethod +def createDirectStream(ssc, topics, kafkaParams, + keyDecoder=utf8_decoder, valueDecoder=utf8_decoder): + +.. note:: Experimental + +Create an input stream that directly pulls messages from a Kafka Broker. + +This is not a receiver based Kafka input stream, it directly pulls the message from Kafka +in each batch duration and processed without storing. + +This does not use Zookeeper to store offsets. The consumed offsets are tracked +by the stream itself. For interoperability with Kafka monitoring tools that depend on +Zookeeper, you have to update Kafka/Zookeeper yourself from the streaming application. +You can access the offsets used in each batch from the generated RDDs (see + +To recover from driver failures, you have to enable checkpointing in the StreamingContext. +The information on consumed offset can be recovered from the checkpoint. +See the programming guide for details (constraints, etc.). + +:param ssc: StreamingContext object +:param topics: list of topic_name to consume. 
+:param kafkaParams: Additional params for Kafka +:param keyDecoder: A function used to decode key (default is utf8_decoder) +:param valueDecoder: A function used to decode value (default is utf8_decoder) +:return: A DStream object + +if not isinstance(topics, list): +raise TypeError("topics should be list") +if not isinstance(kafkaParams, dict): +raise TypeError("kafkaParams should be dict") + +jtopics = SetConverter().convert(topics, ssc.sparkContext._gateway._gateway_client) +jparam = MapConverter().convert(kafkaParams, ssc.sparkContext._gateway._gateway_client) + +try: +helperClass = ssc._jvm.java.lang.Thread.currentThread().getContextClassLoader() \ + .loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper") +helper = helperClass.newInstance() +jstream = helper.createDirectStream(ssc._jssc, jparam, jtopics) +except Py4JJavaError, e: +if 'ClassNotFoundException' in str(e.java_exception): +KafkaUtils._printErrorMsg(ssc.sparkContext) +raise e + +ser = PairDeserializer(NoOpSerializer(), NoOpSerializer()) +stream = DStream(jstream, ssc, ser) +return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v))) + +@staticmethod +def createDirectStreamFromOffset(ssc, kafkaParams, fromOffsets, + keyDecoder=utf8_decoder, valueDecoder=utf8_decoder): + +.. note:: Experimental + +Create an input stream that directly pulls messages from a Kafka Broker and specific offset. + +This is not a receiver based Kafka input stream, it directly pulls the message from Kafka +in each batch duration and processed without storing. + +This does not use Zookeeper to store offsets. The consumed offsets are tracked +by the stream itself. For interoperability with Kafka monitoring tools that depend on +Zookeeper, you have to update Kafka/Zookeeper yourself from the streaming application. +You can access the offsets used in each batch from the generated RDDs (see + +To recover from driver failures, you have to enable checkpointing in the StreamingContext. 
+The information on consumed offset can be recovered from the checkpoint. +See the programming guide for details (constraints, etc.). + +:param ssc: StreamingContext object. +:param kafkaParams: Additional params for Kafka. +:param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting +point of the stream. +:param
[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-95844789 [Test build #30924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30924/consoleFull) for PR 5643 at commit [`90a69ec`](https://github.com/apache/spark/commit/90a69ec603279442c5a0b3e510e8f5db9e1bbb80).
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5266#issuecomment-95878935 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30927/ Test PASSed.
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5266#issuecomment-95878907 [Test build #30927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30927/consoleFull) for PR 5266 at commit [`741db31`](https://github.com/apache/spark/commit/741db31f112469141a22634a406ab20feb13e678). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `final class IDF extends Estimator[IDFModel] with IDFBase ` * This patch does not change any dependencies.
[GitHub] spark pull request: [PYSPARK] Add percentile method in rdd as nump...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5686#issuecomment-95883976 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95904145 Sampling 1 might work for some cases; 100 for others; some may take 1000. There's no way to know. This change as it stands is needlessly complex because it duplicates the loop among other things. You just want to take n samples of the array, and use the largest as your base, and smallest as your multiplier. That would be OK, and make n some small number like 2 or 3. At least it would be less hard-coded, and would make for a better change, along with some comments about why you are doing this.
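A rough sketch of the estimate described above, with the function name and combining formula invented for illustration (the actual `SizeEstimator` logic differs): take n samples, use the largest sampled size as the base and the smallest as the per-element multiplier for the remaining elements.

```scala
// Illustrative only: not the real org.apache.spark.util.SizeEstimator code.
// `sampledSizes` are the n sampled element sizes; `numElements` is the array length.
def estimateArraySize(sampledSizes: Seq[Long], numElements: Int): Long = {
  val base = sampledSizes.max  // largest sample as the base cost
  val mult = sampledSizes.min  // smallest sample as a conservative per-element multiplier
  base + mult * (numElements - 1)
}

val estimate = estimateArraySize(Seq(30L, 10L, 20L), numElements = 5)
```

With made-up samples of 30, 10 and 20 bytes over 5 elements, this yields 30 + 10 * 4 = 70, illustrating how the max/min split avoids over-weighting a single large outlier.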
[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5668#issuecomment-95843846 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30919/ Test PASSed.
[GitHub] spark pull request: [SPARK-7115][MLLIB] skip the very first 1 in ...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/5681#issuecomment-95846544 LGTM if you do not want to set it as a parameter.
[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5684#issuecomment-95877543 [Test build #30929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30929/consoleFull) for PR 5684 at commit [`19fe145`](https://github.com/apache/spark/commit/19fe145e7a00574080b91d311376b6d2cdb4254e).
[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-95884593 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5679#discussion_r29041189 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1055,7 +1055,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli /** Build the union of a list of RDDs. */ def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = { val partitioners = rdds.flatMap(_.partitioner).toSet -if (partitioners.size == 1) { +if (rdds.forall(_.partitioner.isDefined) && partitioners.size == 1) { --- End diff -- Yeah I like this. I suppose that the pre-existing condition already caught the empty RDD case, which `PartitionerAwareUnionRDD` will reject. Although symmetry between this check and the following one would be nice, I don't think it's important. This looks correct since clearly `PartitionerAwareUnionRDD` intends to operate only on RDDs with partitioners.
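Why `partitioners.size == 1` alone is insufficient can be shown with plain `Option`s (a `String` stands in for a `Partitioner` here, and the sequence stands in for `rdds.map(_.partitioner)`): flattening silently drops the RDDs without a partitioner.

```scala
// One "RDD" with a partitioner, one without (stand-ins, not real RDDs).
val partitionerOpts: Seq[Option[String]] = Seq(Some("hash"), None)

// Mirrors rdds.flatMap(_.partitioner).toSet: the None disappears here.
val partitioners = partitionerOpts.flatten.toSet

val buggyCheck = partitioners.size == 1  // true, even though one RDD has no partitioner
val fixedCheck = partitionerOpts.forall(_.isDefined) && partitioners.size == 1  // false
```

The added `forall(_.isDefined)` is what keeps the unpartitioned RDD from slipping into `PartitionerAwareUnionRDD`, which would otherwise crash.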
[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-95906708 You know a lot more about this than I, but I was under the impression that the closure cleaner couldn't clean beyond a level or so because it would then be modifying local object state by nulling fields in them and that's not necessarily permissible. I'm sure you're on top of that, just noting my recollection from similar discussions.
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/4723#discussion_r29031104

--- Diff: python/pyspark/streaming/kafka.py ---
@@ -70,7 +71,195 @@ def createStream(ssc, zkQuorum, groupId, topics, kafkaParams={},
         except Py4JJavaError, e:
             # TODO: use --jar once it also work on driver
             if 'ClassNotFoundException' in str(e.java_exception):
-                print
+                KafkaUtils._printErrorMsg(ssc.sparkContext)
+            raise e
+        ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+        stream = DStream(jstream, ssc, ser)
+        return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+    @staticmethod
+    def createDirectStream(ssc, topics, kafkaParams,
+                           keyDecoder=utf8_decoder, valueDecoder=utf8_decoder):
+        """
+        .. note:: Experimental
+
+        Create an input stream that directly pulls messages from a Kafka Broker.
+
+        This is not a receiver based Kafka input stream, it directly pulls the message from Kafka
+        in each batch duration and processed without storing.
+
+        This does not use Zookeeper to store offsets. The consumed offsets are tracked
+        by the stream itself. For interoperability with Kafka monitoring tools that depend on
+        Zookeeper, you have to update Kafka/Zookeeper yourself from the streaming application.
+        You can access the offsets used in each batch from the generated RDDs (see
+
+        To recover from driver failures, you have to enable checkpointing in the StreamingContext.
+        The information on consumed offset can be recovered from the checkpoint.
+        See the programming guide for details (constraints, etc.).
+
+        :param ssc: StreamingContext object
+        :param topics: list of topic_name to consume.
+        :param kafkaParams: Additional params for Kafka
+        :param keyDecoder: A function used to decode key (default is utf8_decoder)
+        :param valueDecoder: A function used to decode value (default is utf8_decoder)
+        :return: A DStream object
+        """
+        if not isinstance(topics, list):
+            raise TypeError("topics should be list")
+        if not isinstance(kafkaParams, dict):
+            raise TypeError("kafkaParams should be dict")
+
+        jtopics = SetConverter().convert(topics, ssc.sparkContext._gateway._gateway_client)
+        jparam = MapConverter().convert(kafkaParams, ssc.sparkContext._gateway._gateway_client)
+
+        try:
+            helperClass = ssc._jvm.java.lang.Thread.currentThread().getContextClassLoader() \
+                .loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
+            helper = helperClass.newInstance()
+            jstream = helper.createDirectStream(ssc._jssc, jparam, jtopics)
+        except Py4JJavaError, e:
+            if 'ClassNotFoundException' in str(e.java_exception):
+                KafkaUtils._printErrorMsg(ssc.sparkContext)
+            raise e
+
+        ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+        stream = DStream(jstream, ssc, ser)
+        return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+    @staticmethod
+    def createDirectStreamFromOffset(ssc, kafkaParams, fromOffsets,
--- End diff --

I thought about this a little bit. But I think we should follow the precedent set by `createStream` and the other Python APIs, where there is only one method with many optional parameters. So instead of having `createDirectStream` and `createDirectStreamFromOffsets`, let's just have `createDirectStream` with another optional parameter `fromOffsets`. `fromOffsets` should have the same keys as in `topics`; otherwise throw an error.
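A minimal sketch of the API shape proposed above: one `createDirectStream` with an optional `fromOffsets` dict whose topics are validated against the `topics` argument. The function and dict layout here are illustrative (hypothetical names), not the actual PySpark implementation.

```python
# Sketch of a single createDirectStream with an optional fromOffsets
# parameter, folding the separate createDirectStreamFromOffset method in.
# Names and the returned dict are illustrative only.
def create_direct_stream(topics, kafka_params, from_offsets=None):
    if not isinstance(topics, list):
        raise TypeError("topics should be list")
    if not isinstance(kafka_params, dict):
        raise TypeError("kafkaParams should be dict")
    if from_offsets is not None:
        # fromOffsets keys are (topic, partition) pairs; every topic
        # referenced there must be among the subscribed topics.
        offset_topics = {topic for (topic, _partition) in from_offsets}
        if not offset_topics.issubset(set(topics)):
            raise ValueError("fromOffsets contains topics not in 'topics'")
    return {"topics": topics, "params": kafka_params, "offsets": from_offsets}
```

When `from_offsets` is omitted the stream starts from the configured default offsets, so the one-method form stays backward compatible.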
[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5668#issuecomment-95843799 [Test build #30919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30919/consoleFull) for PR 5668 at commit [`b4651fd`](https://github.com/apache/spark/commit/b4651fd80a55f016093d84cf3b00ad6c91333cef). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5060#issuecomment-95847182 [Test build #30926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30926/consoleFull) for PR 5060 at commit [`5713c20`](https://github.com/apache/spark/commit/5713c20b543a38f0454a03c67eaa277ec519a281).
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r29032368

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +508,23 @@ class Word2VecModel private[mllib] (
    */
   def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
     require(num > 0, "Number of similar words should > 0")
-    // TODO: optimize top-k
+
     val fVector = vector.toArray.map(_.toFloat)
-    model.mapValues(vec => cosineSimilarity(fVector, vec))
+    val cosineVec = Array.fill[Float](numWords)(0)
+    val alpha: Float = 1
+    val beta: Float = 0
+
+    blas.sgemv(
+      "T", vectorSize, numWords, alpha, wordVectors, vectorSize, fVector, 1, beta, cosineVec, 1)
+
+    // Need not divide with the norm of the given vector since it is constant.
+    val updatedCosines = new Array[Double](numWords)
--- End diff --

Should reuse `cosineVec`.
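The patch above replaces a per-word `cosineSimilarity` map with a single matrix-vector product (`blas.sgemv`), and the review note asks to reuse the one score buffer instead of allocating a second array. A pure-Python sketch of that idea (illustrative only, not the MLlib code; the flat row-major layout mirrors `wordVectors`):

```python
# Compute cosine scores for a query against all word vectors in one pass,
# writing into a single reusable buffer. word_vectors is a flat row-major
# (num_words x vector_size) list, norms holds each word vector's norm.
def cosine_scores(word_vectors, vector_size, norms, query):
    num_words = len(word_vectors) // vector_size
    cosine_vec = [0.0] * num_words  # the single reusable buffer
    for w in range(num_words):
        base = w * vector_size
        dot = sum(word_vectors[base + j] * query[j] for j in range(vector_size))
        # Divide only by the word vector's norm; the query's norm is a
        # constant factor and does not change the ranking.
        cosine_vec[w] = dot / norms[w] if norms[w] else 0.0
    return cosine_vec
```

The inner loop over `w` is exactly what `sgemv` with the transposed matrix performs in native code.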
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r29032366

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +508,23 @@ class Word2VecModel private[mllib] (
    */
   def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
     require(num > 0, "Number of similar words should > 0")
-    // TODO: optimize top-k
--- End diff --

This TODO was created to use `BoundedPriorityQueue` to compute top k: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/BoundedPriorityQueue.scala
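The `BoundedPriorityQueue` approach keeps only the k best candidates while scanning the scores once, instead of sorting all of them. A minimal Python sketch of the same technique using `heapq` (illustrative, not Spark's Scala implementation):

```python
import heapq

# Select the k highest-scoring words with a size-bounded min-heap:
# push until the heap holds k items, then evict the current minimum
# whenever a better score arrives.
def top_k(scores, k):
    heap = []  # min-heap of (score, word), never larger than k
    for word, score in scores:
        if len(heap) < k:
            heapq.heappush(heap, (score, word))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, word))
    return sorted(heap, reverse=True)  # best first
```

This is O(n log k) rather than the O(n log n) of a full sort, which matters when `numWords` is large and `num` is small.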
[GitHub] spark pull request: [SPARK-7118] [Python] Add the coalesce Spark S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5683#issuecomment-95857912 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5676#issuecomment-95900233 @ArcherShao yes please, the JIRA was already marked as a duplicate. https://issues.apache.org/jira/browse/SPARK-6891
[GitHub] spark pull request: [SPARK-5726] [MLLIB] Elementwise (Hadamard) Ve...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4580#issuecomment-95841444 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30923/ Test FAILed.
[GitHub] spark pull request: [SPARK-5726] [MLLIB] Elementwise (Hadamard) Ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4580#issuecomment-95841433 [Test build #30923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30923/consoleFull) for PR 4580 at commit [`e7ff5f2`](https://github.com/apache/spark/commit/e7ff5f2cc3c172b97c6ea3cec6ebf7546682a74b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ElementwiseProduct extends UnaryTransformer[Vector, Vector, ElementwiseProduct] ` * `class ElementwiseProduct(val scalingVector: Vector) extends VectorTransformer ` * This patch does not change any dependencies.
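For context on the `ElementwiseProduct` classes listed above: the transformer multiplies each coordinate of an input vector by the matching coordinate of a fixed scaling vector (the Hadamard product). A minimal sketch in plain Python (illustrative, not the MLlib API):

```python
# Hadamard (elementwise) product: scale each coordinate of a vector by
# the matching coordinate of a fixed scaling vector.
def elementwise_product(scaling_vector, vector):
    if len(scaling_vector) != len(vector):
        raise ValueError("vectors must have the same length")
    return [s * v for s, v in zip(scaling_vector, vector)]
```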
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/4723#discussion_r29031516

--- Diff: python/pyspark/streaming/kafka.py ---
@@ -70,7 +71,195 @@ def createStream(ssc, zkQuorum, groupId, topics, kafkaParams={},
         except Py4JJavaError, e:
             # TODO: use --jar once it also work on driver
             if 'ClassNotFoundException' in str(e.java_exception):
-                print
+                KafkaUtils._printErrorMsg(ssc.sparkContext)
+            raise e
+        ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+        stream = DStream(jstream, ssc, ser)
+        return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+    @staticmethod
+    def createDirectStream(ssc, topics, kafkaParams,
+                           keyDecoder=utf8_decoder, valueDecoder=utf8_decoder):
+        """
+        .. note:: Experimental
+
+        Create an input stream that directly pulls messages from a Kafka Broker.
+
+        This is not a receiver based Kafka input stream, it directly pulls the message from Kafka
+        in each batch duration and processed without storing.
+
+        This does not use Zookeeper to store offsets. The consumed offsets are tracked
+        by the stream itself. For interoperability with Kafka monitoring tools that depend on
+        Zookeeper, you have to update Kafka/Zookeeper yourself from the streaming application.
+        You can access the offsets used in each batch from the generated RDDs (see
+
+        To recover from driver failures, you have to enable checkpointing in the StreamingContext.
+        The information on consumed offset can be recovered from the checkpoint.
+        See the programming guide for details (constraints, etc.).
+
+        :param ssc: StreamingContext object
+        :param topics: list of topic_name to consume.
+        :param kafkaParams: Additional params for Kafka
+        :param keyDecoder: A function used to decode key (default is utf8_decoder)
+        :param valueDecoder: A function used to decode value (default is utf8_decoder)
+        :return: A DStream object
+        """
+        if not isinstance(topics, list):
+            raise TypeError("topics should be list")
+        if not isinstance(kafkaParams, dict):
+            raise TypeError("kafkaParams should be dict")
+
+        jtopics = SetConverter().convert(topics, ssc.sparkContext._gateway._gateway_client)
+        jparam = MapConverter().convert(kafkaParams, ssc.sparkContext._gateway._gateway_client)
+
+        try:
+            helperClass = ssc._jvm.java.lang.Thread.currentThread().getContextClassLoader() \
+                .loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
+            helper = helperClass.newInstance()
+            jstream = helper.createDirectStream(ssc._jssc, jparam, jtopics)
+        except Py4JJavaError, e:
+            if 'ClassNotFoundException' in str(e.java_exception):
+                KafkaUtils._printErrorMsg(ssc.sparkContext)
+            raise e
+
+        ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+        stream = DStream(jstream, ssc, ser)
+        return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+    @staticmethod
+    def createDirectStreamFromOffset(ssc, kafkaParams, fromOffsets,
--- End diff --

Since Python does not support method overloading, I used a different method name to differentiate them. I will change it to the way you mentioned.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-95857854 I just filed a JIRA issue: https://issues.apache.org/jira/browse/SPARK-7119. @viirya, can you help investigate this?
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-95868952 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30928/ Test FAILed.
[GitHub] spark pull request: [SPARK-7115][MLLIB] skip the very first 1 in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5681#issuecomment-95868811 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30922/ Test PASSed.
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-95868922 [Test build #30928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30928/consoleFull) for PR 5604 at commit [`d07101b`](https://github.com/apache/spark/commit/d07101bd5a6f3b30532c4d4d77ab8d310607b684). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression ` * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])` * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])` * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)` * `case class WindowAggregate(` * ` case class ComputedWindow(` * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5060#issuecomment-95878205 [Test build #30926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30926/consoleFull) for PR 5060 at commit [`5713c20`](https://github.com/apache/spark/commit/5713c20b543a38f0454a03c67eaa277ec519a281). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5682#issuecomment-95878093 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30925/ Test PASSed.
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-95878550 @guowei2, can you generate the golden answer for this locally?
[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5060#issuecomment-95878250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30926/ Test PASSed.
[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5682#issuecomment-95878073 [Test build #30925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30925/consoleFull) for PR 5682 at commit [`4e98520`](https://github.com/apache/spark/commit/4e98520e78832b25877d825392d66d10779281f7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-95883549 [Test build #30930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30930/consoleFull) for PR 5685 at commit [`2390a60`](https://github.com/apache/spark/commit/2390a608ed74a9703d3763d040421dccb51242ec).
[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-95905246 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30930/ Test FAILed.
[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5684#issuecomment-95905079 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30929/ Test PASSed.
[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5684#issuecomment-95905052 [Test build #30929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30929/consoleFull) for PR 5684 at commit [`19fe145`](https://github.com/apache/spark/commit/19fe145e7a00574080b91d311376b6d2cdb4254e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95929817 [Test build #30936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30936/consoleFull) for PR 2342 at commit [`ef34a5b`](https://github.com/apache/spark/commit/ef34a5b87f03e3c7f623ed2c4ab53c933bf64fa8).
[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-95930701 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30934/ Test FAILed.
[GitHub] spark pull request: [MLLIB] Added function to get predict value an...
Github user oscaroboto closed the pull request at: https://github.com/apache/spark/pull/5689
[GitHub] spark pull request: [MLLIB] Added function to get predict value an...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5689#issuecomment-95934405 @oscaroboto please have a look at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first. This isn't connected to a JIRA. In fact there are JIRAs on this subject already. Have a look and consider connecting to existing proposals to expose a probability distribution; you might solve several at once: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20probability&quickSearchQuery=spark%20probability
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95937721 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30931/ Test FAILed.
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95937710 **[Test build #30931 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30931/consoleFull)** for PR 5608 at commit [`a9fca84`](https://github.com/apache/spark/commit/a9fca8444d7a8591032383a7d6ced84ee1f66a56) after a configured wait of `150m`.
[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-95943436 [Test build #30938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30938/consoleFull) for PR 5688 at commit [`a69b1d9`](https://github.com/apache/spark/commit/a69b1d9f0cbbbca44b48107763efed11d31019f6).
[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-95930691 [Test build #30934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30934/consoleFull) for PR 5688 at commit [`7b1a00a`](https://github.com/apache/spark/commit/7b1a00a1dc281870e8779b5153fa1fd1bc797aeb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-2750][WIP]Add Https support for Web UI
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5664#issuecomment-95939658 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30935/ Test FAILed.
[GitHub] spark pull request: [SPARK-2750][WIP]Add Https support for Web UI
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5664#issuecomment-95939591 [Test build #30935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30935/consoleFull) for PR 5664 at commit [`5efac85`](https://github.com/apache/spark/commit/5efac8536d86aea631b25830194e00fb83c3b447). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5665#issuecomment-95942487 [Test build #30933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30933/consoleFull) for PR 5665 at commit [`d19dd31`](https://github.com/apache/spark/commit/d19dd312a18af43131005d1bf6d2944b259c0721).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7123] [SQL] fixed table.star in sqlcont...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/5690 [SPARK-7123] [SQL] fixed table.star in sqlcontext

Running the following SQL raises an error:
`SELECT r.* FROM testData l join testData2 r on (l.key = r.a)`

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/scwf/spark tablestar
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5690.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5690
commit 3b2e2b6a2e1b3f56c8944f62b2f184bebf7bac24 Author: scwf wangf...@huawei.com Date: 2015-04-24T13:50:32Z support table.star
[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5665#issuecomment-95942529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30933/ Test PASSed.
[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5665#issuecomment-95915693 [Test build #30933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30933/consoleFull) for PR 5665 at commit [`d19dd31`](https://github.com/apache/spark/commit/d19dd312a18af43131005d1bf6d2944b259c0721).
[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-95917038 [Test build #30934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30934/consoleFull) for PR 5688 at commit [`7b1a00a`](https://github.com/apache/spark/commit/7b1a00a1dc281870e8779b5153fa1fd1bc797aeb).
[GitHub] spark pull request: [SPARK-2750][WIP]Add Https support for Web UI
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5664#issuecomment-95920724 [Test build #30935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30935/consoleFull) for PR 5664 at commit [`5efac85`](https://github.com/apache/spark/commit/5efac8536d86aea631b25830194e00fb83c3b447).
[GitHub] spark pull request: [SPARK-7123] [SQL] fixed table.star in sqlcont...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5690#issuecomment-95943453 [Test build #30937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30937/consoleFull) for PR 5690 at commit [`3b2e2b6`](https://github.com/apache/spark/commit/3b2e2b6a2e1b3f56c8944f62b2f184bebf7bac24).
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5354#issuecomment-95924132 LGTM. Thank you for your perseverance. This gets the change in with minimal additional change to the build, keeps everything compiling, and actually improves the management of one dependency along the way. I think the large list of removed dependencies above is a false positive. It can't remove these. Let me merge, and let's double check that the other Jenkins builds are still happy.
[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5608#discussion_r29046278

--- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
@@ -204,25 +204,36 @@ private[spark] object SizeEstimator extends Logging {
       }
     } else {
       // Estimate the size of a large array by sampling elements without replacement.
-      var size = 0.0
+      // To exclude the shared objects that the array elements may link, sample twice
+      // and use the min one to calculate array size.
       val rand = new Random(42)
-      val drawn = new OpenHashSet[Int](ARRAY_SAMPLE_SIZE)
-      var numElementsDrawn = 0
-      while (numElementsDrawn < ARRAY_SAMPLE_SIZE) {
-        var index = 0
-        do {
-          index = rand.nextInt(length)
-        } while (drawn.contains(index))
-        drawn.add(index)
-        val elem = ScalaRunTime.array_apply(array, index).asInstanceOf[AnyRef]
-        size += SizeEstimator.estimate(elem, state.visited)
-        numElementsDrawn += 1
-      }
-      state.size += ((length / (ARRAY_SAMPLE_SIZE * 1.0)) * size).toLong
+      val drawn = new OpenHashSet[Int](2 * ARRAY_SAMPLE_SIZE)
--- End diff --

Yes, that looks better. We could even generalize to sampling n times easily, but that could be overkill.

I think we have a potential problem here: we sample if the array size is >= 400, but then want at least 400 distinct elements from the array, twice. This will enter an infinite loop if the array has between 400 and 800 elements, and will be very slow if it's just a bit larger than 800. You could sample with replacement, or only draw `ARRAY_SAMPLE_SIZE/2` elements (`ARRAY_SAMPLE_SIZE/n` in general). For simplicity, and to avoid slow-downs, I'd say sample with replacement. You can put the sample threshold back to 200 then, too; I don't know if that needs to change.
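srowen's suggestion above — drawing array indices with replacement so the sampling loop can never stall when the array is barely larger than the sample size — can be sketched as follows. This is an illustrative standalone sketch under assumed names, not Spark's actual `SizeEstimator` code: `estimateSum` and the array of `Long` "sizes" stand in for the real per-element `SizeEstimator.estimate` calls.

```scala
import scala.util.Random

object SampleEstimate {
  val ARRAY_SAMPLE_SIZE = 200

  // Estimate the total "size" of an array by sampling indices WITH
  // replacement. Because repeated indices are allowed, the loop draws
  // exactly ARRAY_SAMPLE_SIZE times regardless of array length --
  // avoiding the infinite-loop / slowdown srowen describes for
  // without-replacement sampling of arrays with 400..800 elements.
  def estimateSum(sizes: Array[Long], seed: Long = 42L): Long = {
    val rand = new Random(seed)
    var sampled = 0.0
    var drawn = 0
    while (drawn < ARRAY_SAMPLE_SIZE) {
      sampled += sizes(rand.nextInt(sizes.length))
      drawn += 1
    }
    // Scale the sampled total up to the full array length.
    ((sizes.length / (ARRAY_SAMPLE_SIZE * 1.0)) * sampled).toLong
  }
}
```

When every element has the same size, the scaled estimate is exact (500 elements of 8 bytes each yield 4000), and a 450-element array — which would deadlock the distinct-index version needing 400 distinct draws twice — completes in exactly `ARRAY_SAMPLE_SIZE` iterations.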
[GitHub] spark pull request: Added function to get predict value and probab...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5689#issuecomment-95931900 Can one of the admins verify this patch?