[GitHub] spark pull request: [SPARK-8126] [BUILD] Use custom temp directory...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6674#issuecomment-110252820 All of the 1.4 builds have succeeded since this patch, some a few times. The exception is: https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.4-Maven-with-YARN/ This succeeded after, then failed, and the failure in the Kafka suite looks unrelated since it doesn't involve a temp file. I'm declaring victory and moving to 1.3. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6583][SQL] Support aggregated function ...
Github user watermen commented on the pull request: https://github.com/apache/spark/pull/5290#issuecomment-110248184 @cloud-fan Can you review it for me?
[GitHub] spark pull request: [SPARK-8202] [PYSPARK] fix infinite loop durin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6714#issuecomment-110245779 [Test build #34490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34490/consoleFull) for PR 6714 at commit [`e746aec`](https://github.com/apache/spark/commit/e746aeca630448b3bb9d425d8aefa496385f39ed).
[GitHub] spark pull request: [SPARK-8202] [PYSPARK] fix infinite loop durin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6714#issuecomment-110245490 Merged build started.
[GitHub] spark pull request: [SPARK-8202] [PYSPARK] fix infinite loop durin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6714#issuecomment-110245483 Merged build triggered.
[GitHub] spark pull request: [SPARK-8202] [PYSPARK] fix infinite loop durin...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/6714

[SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark

During external sort, the batch size grows up to max 1, then shrinks down to zero, causing an infinite loop. Assuming that the items usually have similar sizes, we don't need to adjust the batch size after the first spill.

cc @JoshRosen @rxin @angelini

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark batch_size

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6714.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6714

commit e746aeca630448b3bb9d425d8aefa496385f39ed
Author: Davies Liu
Date: 2015-06-09T06:26:29Z
    fix batch size during sort
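A toy model of the loop described in this pull request may help show why a batch size of zero never terminates. It is only a sketch under stated assumptions, not the PySpark code: the function name `sort_in_batches`, the item-count `limit` standing in for a memory check, and the `max_steps` guard are all hypothetical and exist purely for illustration.

```python
import heapq
import itertools


def sort_in_batches(items, batch=4, limit=2,
                    adjust_after_first_spill=False, max_steps=1000):
    """Sort `items` by reading `batch` items at a time and spilling sorted runs.

    `limit` is a stand-in for a memory check (assumption): a spill happens
    once the in-memory chunk holds at least `limit` items.
    """
    iterator = iter(items)
    runs, chunk, spilled = [], [], False
    for _ in range(max_steps):
        part = list(itertools.islice(iterator, batch))
        chunk.extend(part)
        if len(part) < batch:
            # Input exhausted: flush the last chunk and merge all sorted runs.
            runs.append(sorted(chunk))
            return list(heapq.merge(*runs))
        if len(chunk) >= limit:
            runs.append(sorted(chunk))  # "spill" the current chunk
            chunk = []
            if adjust_after_first_spill or not spilled:
                # Halving on every spill eventually yields batch == 0.
                batch //= 2
            spilled = True
    # With batch == 0, islice() returns nothing and len(part) < batch is
    # never true, so the loop above makes no progress.
    raise RuntimeError("no progress: batch size reached %d" % batch)
```

With the default `adjust_after_first_spill=False` the batch size is changed only at the first spill and the sort completes. Passing `adjust_after_first_spill=True` lets the batch size reach zero, and the guard raises instead of spinning forever.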
[GitHub] spark pull request: Fix SPARK-8200
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6713#issuecomment-110244763 Can one of the admins verify this patch?
[GitHub] spark pull request: Fix SPARK-8200
GitHub user pparkkin opened a pull request: https://github.com/apache/spark/pull/6713

Fix SPARK-8200

Test cases for both StreamingLinearRegression and StreamingLogisticRegression, and code fix.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pparkkin/spark streamingmodel-empty-rdd

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6713.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6713

commit b4cda931892ac5cab580b5e89d90829f709d3bbb
Author: Paavo
Date: 2015-06-09T01:59:11Z
    Test case for empty stream.

commit 4cb7b0fc73c4ccc698465956254df4aa95ee1587
Author: Paavo
Date: 2015-06-09T02:19:15Z
    Ignore empty RDDs.

commit e3e358f362856a439c5f7f745e97665299a5f0a6
Author: Paavo
Date: 2015-06-09T02:27:39Z
    Test case for empty stream.
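The "Ignore empty RDDs" commit suggests a guard of roughly the following shape. This is a minimal sketch in plain Python with lists standing in for RDDs; `train_on_stream`, the scalar `weight`, and the update rule are hypothetical and are not taken from the patch, which should be consulted for the actual change.

```python
def train_on_stream(batches, weight, learning_rate=0.5):
    """Move `weight` toward each batch mean, skipping empty batches."""
    for batch in batches:
        if not batch:
            # Nothing to learn from; in this toy model an update on an
            # empty batch would divide by zero.
            continue
        batch_mean = sum(batch) / len(batch)
        weight += learning_rate * (batch_mean - weight)
    return weight


# The empty batch in the middle leaves the model untouched.
print(train_on_stream([[2.0, 4.0], [], [5.0]], weight=0.0))  # prints 3.25
```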
[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6616
[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6616#issuecomment-110244099 Thanks. I'm merging this.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110241035
[Test build #34487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34487/console) for PR 6712 at commit [`9f3b75a`](https://github.com/apache/spark/commit/9f3b75a0377c3272f5e5ef27c4dfa11f24a82806).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class OverrideFunctionRegistry(underlying: FunctionRegistry) extends FunctionRegistry`
  * `class SimpleFunctionRegistry extends FunctionRegistry`
  * `case class Rand(seed: Long) extends RDG(seed)`
  * `case class Randn(seed: Long) extends RDG(seed)`
  * `class StringKeyHashMap[T](normalizer: (String) => String)`
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110241043 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110239365 [Test build #34489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34489/consoleFull) for PR 6710 at commit [`6930822`](https://github.com/apache/spark/commit/69308222d1b65bd75c3b40b1e4da8f6161958535).
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110239326 [Test build #34488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34488/consoleFull) for PR 6712 at commit [`d554d60`](https://github.com/apache/spark/commit/d554d60438f5a71f8caef8f6e461ee659de46793).
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110239051 Merged build triggered.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110239056 Merged build started.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110239057 Merged build started.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110239050 Merged build triggered.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110238898 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110238873
[Test build #34485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34485/console) for PR 6712 at commit [`dea550b`](https://github.com/apache/spark/commit/dea550b3ddf594aca4640f73d585870dd0b38d68).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class OverrideFunctionRegistry(underlying: FunctionRegistry) extends FunctionRegistry`
  * `class SimpleFunctionRegistry extends FunctionRegistry`
  * `case class Rand(seed: Long) extends RDG(seed)`
  * `case class Randn(seed: Long) extends RDG(seed)`
  * `class StringKeyHashMap[T](normalizer: (String) => String)`
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110232198
[Test build #34484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34484/console) for PR 6710 at commit [`b802c9a`](https://github.com/apache/spark/commit/b802c9a296596e3fc711baf352441516f59fb736).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class SimpleFunctionRegistry extends FunctionRegistry`
  * `case class Rand(seed: Long) extends RDG(seed)`
  * `case class Randn(seed: Long) extends RDG(seed)`
  * `class StringKeyHashMap[T](normalizer: (String) => String)`
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110232203 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110231863 @hqzizania Not needed, never mind.
[GitHub] spark pull request: [SPARK-8162][BUILD] Run spark-shell cause Null...
Github user Sephiroth-Lin closed the pull request at: https://github.com/apache/spark/pull/6704
[GitHub] spark pull request: [SPARK-8162][BUILD] Run spark-shell cause Null...
Github user Sephiroth-Lin commented on the pull request: https://github.com/apache/spark/pull/6704#issuecomment-110231290 Closing this for now, since PR #6711 can fix the NPE. If we find the root cause of why the `@VisibleForTesting` annotation causes an NPE in the shell, we can reopen it.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110228780 [Test build #34487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34487/consoleFull) for PR 6712 at commit [`9f3b75a`](https://github.com/apache/spark/commit/9f3b75a0377c3272f5e5ef27c4dfa11f24a82806).
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110228708 Merged build triggered.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110228716 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228546 [Test build #34486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34486/consoleFull) for PR 5748 at commit [`14ee596`](https://github.com/apache/spark/commit/14ee5960ced3079231543dfe103075ae12e40e05).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228363 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228369 Merged build started.
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110228254 It's fine - it will make one line of code more complex and remove one line of code. If you want, you can make a follow-up PR; it's up to you and @davies :)
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110228207 [Test build #34485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34485/consoleFull) for PR 6712 at commit [`dea550b`](https://github.com/apache/spark/commit/dea550b3ddf594aca4640f73d585870dd0b38d68).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228137 @jkbradley ping?
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user hqzizania commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110228092 @shivaram oops, I haven't fixed the nit davies mentioned.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228122 jenkins retest this please
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110227993 Merged build started.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6712#issuecomment-110227979 Merged build triggered.
[GitHub] spark pull request: [SPARK-7886] Use FunctionRegistry for built-in...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/6712 [SPARK-7886] Use FunctionRegistry for built-in expressions in HiveContext. This builds on https://github.com/apache/spark/pull/6710 You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark udf-registry-hive Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6712.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6712 commit 8616924ebb90eaf5e88a6a843dc99744b3dbc2b8 Author: Santiago M. Mola Date: 2015-05-28T17:42:59Z [SPARK-7886] Add built-in expressions to FunctionRegistry. - ExpressionBuilders is provided with helpers to create a function builder for each Expression. - Built-in functions removed from SqlParser when possible. Added to FunctionRegistry. TO DO: - Decide between the reflection and macro implementations of the expression builder helpers. - Fix Substring (whose constructor is not well suited for the helper). - Apply changes to Hive. commit 2a2a149672589e303f6a8dbc1ef295a7c2541825 Author: Reynold Xin Date: 2015-06-08T20:18:12Z Merge pull request #6463 from smola/SPARK-7886 [SPARK-7886][SQL] Add built-in expressions to FunctionRegistry. commit 77b46f18c9a563f0673ddf097bf1e08c7be0ca1d Author: Reynold Xin Date: 2015-06-08T23:23:22Z Simplified the code. commit ff906f233c8c367d52254d58da1da388d4298f58 Author: Reynold Xin Date: 2015-06-09T00:41:09Z More robust constructor calling. commit ee7854f7eb3b0226ec3826afe11bcae0f1b0a250 Author: Reynold Xin Date: 2015-06-09T00:58:16Z Improved error reporting. commit 52ddabaaaf9ea13f51d44a2602991f37532a9106 Author: Reynold Xin Date: 2015-06-09T01:00:13Z Fixed compilation. commit e76a3c1c35197858036891102e4394aa0027 Author: Reynold Xin Date: 2015-06-09T03:26:41Z Fixed parser. 
commit 852f9c09d3653ae040100b813d2a9203470d41ee Author: Reynold Xin Date: 2015-06-09T03:44:51Z Fixed style violation. commit e60d815cf18877018da46a055e326f535623f9de Author: Reynold Xin Date: 2015-06-09T04:28:45Z Made UDF case insensitive. commit b802c9a296596e3fc711baf352441516f59fb736 Author: Reynold Xin Date: 2015-06-09T04:33:42Z Made UDF case insensitive. commit dea550b3ddf594aca4640f73d585870dd0b38d68 Author: Reynold Xin Date: 2015-06-09T05:15:29Z [SPARK-7886] Use FunctionRegistry for built-in expressions in HiveContext.
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/6707#discussion_r31982518 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala --- @@ -79,7 +93,7 @@ private[streaming] class BlockManagerBasedBlockHandler( throw new SparkException( s"Could not store $blockId to block manager with storage level $storageLevel") } -BlockManagerBasedStoreResult(blockId) +BlockManagerBasedStoreResult(blockId, numRecords) --- End diff -- @tdas @zsxwing what do you think? Is it fine to count a ByteBufferBlock as 1 record?
[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r31982517 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -225,12 +243,74 @@ private[yarn] class YarnAllocator( logInfo(s"Will request $missing executor containers, each with ${resource.getVirtualCores} " + s"cores and ${resource.getMemory} MB memory including $memoryOverhead MB overhead") - for (i <- 0 until missing) { -val request = createContainerRequest(resource) -amClient.addContainerRequest(request) -val nodes = request.getNodes -val hostStr = if (nodes == null || nodes.isEmpty) "Any" else nodes.last -logInfo(s"Container request (host: $hostStr, capability: $resource)") + // Calculated the number of executors we expected to satisfy all the preferred locality tasks --- End diff -- Hi @sryza, will this `getNumPendingAtLocation(ANY_HOST)` return all the pending requests, including requests that specify a locality?
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110224579 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user dibbhatt commented on the pull request: https://github.com/apache/spark/pull/6707#issuecomment-110224537 Took care of a couple of comments from @harishreedharan. Not sure what to do with the ByteBuffer case, as there is no way to count the number of messages in a ByteBufferBlock.
[GitHub] spark pull request: [Spark-7422][MLLIB] Add argmax to Vector, Spar...
Github user GeorgeDittmar commented on the pull request: https://github.com/apache/spark/pull/6112#issuecomment-110224081 @mengxr @MechCoder OK, this should be good to go, I think. I cleaned up the rest of the unit tests and found a few more style issues, which I also cleaned up.
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6190
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110221655 [Test build #34484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34484/consoleFull) for PR 6710 at commit [`b802c9a`](https://github.com/apache/spark/commit/b802c9a296596e3fc711baf352441516f59fb736).
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110221648 Thanks @hqzizania - LGTM. Merging this
[GitHub] spark pull request: [SPARK-8168] [MLLIB] Add Python friendly const...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/6709#issuecomment-110221571 Thanks. Merged in master.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110221504 Merged build started.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110221495 Merged build triggered.
[GitHub] spark pull request: [SPARK-8168] [MLLIB] Add Python friendly const...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6709
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110221121 Merged build started.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110221102 Merged build triggered.
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6190#discussion_r31981889 --- Diff: R/pkg/R/serialize.R --- @@ -37,6 +37,14 @@ writeObject <- function(con, object, writeType = TRUE) { # passing in vectors as arrays and instead require arrays to be passed # as lists. type <- class(object)[[1]] # class of POSIXlt is c("POSIXlt", "POSIXt") + # Checking types is needed here, since ‘is.na’ only handles atomic vectors, + # lists and pairlists + if (type %in% c("integer", "character", "logical", "double", "numeric")) { +if (is.na(object)) { + object <- NULL + type <- "NULL" --- End diff -- I see, never mind.
[GitHub] spark pull request: [SPARK-8168] [MLLIB] Add Python friendly const...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/6709#issuecomment-110219781 LGTM
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110217328 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110217326 [Test build #34482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34482/console) for PR 6710 at commit [`852f9c0`](https://github.com/apache/spark/commit/852f9c09d3653ae040100b813d2a9203470d41ee). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Rand(seed: Long) extends RDG(seed) ` * `case class Randn(seed: Long) extends RDG(seed) ` * `class StringKeyHashMap[T](normalizer: (String) => String) `
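The `StringKeyHashMap[T](normalizer: (String) => String)` class reported above, together with the "Made UDF case insensitive" commits in PR 6712, suggests a map that normalizes keys before every lookup. The following is only a minimal sketch under that assumption: the constructor signature comes from the build report, while the method names and the body are hypothetical and may differ from the code in the patch.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical sketch: only the constructor signature appears in the build report.
// Every key passes through `normalizer` before it reaches the underlying map,
// so keys that normalize to the same string share one entry.
class StringKeyHashMap[T](normalizer: String => String) {
  private val base = mutable.HashMap.empty[String, T]

  // Stores `value` under the normalized key; returns the previous value, if any.
  def put(key: String, value: T): Option[T] = base.put(normalizer(key), value)

  // Looks up the value stored under the normalized key.
  def get(key: String): Option[T] = base.get(normalizer(key))

  // True if some entry exists under the normalized key.
  def contains(key: String): Boolean = base.contains(normalizer(key))
}
```

With a lower-casing normalizer such as `_.toLowerCase(Locale.ROOT)`, an entry stored under `"SUBSTR"` would also be found under `"substr"`, which is one way a function registry could treat function names case-insensitively.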
[GitHub] spark pull request: [Spark-7780][MLLIB] Intercept in logisticregre...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/6386#issuecomment-110216881 Oh, get you.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110216015 [Test build #34482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34482/consoleFull) for PR 6710 at commit [`852f9c0`](https://github.com/apache/spark/commit/852f9c09d3653ae040100b813d2a9203470d41ee).
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110215818 Merged build started.
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/6707#discussion_r31981074 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala --- @@ -32,7 +32,10 @@ import org.apache.spark.{Logging, SparkConf, SparkException} /** Trait that represents the metadata related to storage of blocks */ private[streaming] trait ReceivedBlockStoreResult { - def blockId: StreamBlockId // Any implementation of this trait will store a block id + // Any implementation of this trait will store a block id + def blockId: StreamBlockId + // Any implementation of this trait will have to return the number of records + def numRecords: Option[Long] --- End diff -- Ah, ok. I just find the `num*` method calls weird, when it could be called `*count`. But if it is consistent with everything else, then it is fine.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110215799 Merged build triggered.
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/6707#discussion_r31981048 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala --- @@ -79,7 +93,7 @@ private[streaming] class BlockManagerBasedBlockHandler( throw new SparkException( s"Could not store $blockId to block manager with storage level $storageLevel") } -BlockManagerBasedStoreResult(blockId) +BlockManagerBasedStoreResult(blockId, numRecords) --- End diff -- Well, technically it is a single record - though I agree that is not exactly right either, but it must count as at least 1, correct?
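The `numRecords: Option[Long]` signature quoted in this PR leaves room for a third answer to the ByteBuffer question: report the count as unknown rather than as 1. The sketch below illustrates that option only; `Block`, `ArrayBlock`, `BytesBlock`, and `RecordCount` are hypothetical stand-ins invented for this example, not Spark's classes, and the thread does not say which behaviour the patch finally adopted.

```scala
import java.nio.ByteBuffer

// Hypothetical, simplified stand-ins for the received-block types in the thread.
sealed trait Block
final case class ArrayBlock(records: Seq[Any]) extends Block
final case class BytesBlock(bytes: ByteBuffer) extends Block

object RecordCount {
  // Some(n) when the records can be enumerated; None when the block is an
  // opaque byte buffer whose record boundaries are not known.
  def numRecords(block: Block): Option[Long] = block match {
    case ArrayBlock(records) => Some(records.size.toLong)
    case BytesBlock(_)       => None
  }
}
```

Callers that need a lower bound, as suggested in the comment above, could apply `getOrElse(1L)` at the point of use, which keeps the "unknown" case distinguishable from a block that really holds one record.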
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/6190#discussion_r31980891 --- Diff: R/pkg/R/serialize.R --- @@ -37,6 +37,14 @@ writeObject <- function(con, object, writeType = TRUE) { # passing in vectors as arrays and instead require arrays to be passed # as lists. type <- class(object)[[1]] # class of POSIXlt is c("POSIXlt", "POSIXt") + # Checking types is needed here, since ‘is.na’ only handles atomic vectors, + # lists and pairlists + if (type %in% c("integer", "character", "logical", "double", "numeric")) { +if (is.na(object)) { + object <- NULL + type <- "NULL" --- End diff -- But then the "type" in the %in% line would also need to be changed to "class(object)".
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110212549 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110212557 LGTM, just one minor comment, thanks!
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110212546 [Test build #34481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34481/console) for PR 6710 at commit [`e76a3c1`](https://github.com/apache/spark/commit/e76a3c1c35197858036891102e4394aa0027). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Rand(seed: Long) extends RDG(seed) ` * `case class Randn(seed: Long) extends RDG(seed) ` * `class StringKeyHashMap[T](normalizer: (String) => String) `
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6190#discussion_r31980616 --- Diff: R/pkg/R/serialize.R --- @@ -37,6 +37,14 @@ writeObject <- function(con, object, writeType = TRUE) { # passing in vectors as arrays and instead require arrays to be passed # as lists. type <- class(object)[[1]] # class of POSIXlt is c("POSIXlt", "POSIXt") + # Checking types is needed here, since ‘is.na’ only handles atomic vectors, + # lists and pairlists + if (type %in% c("integer", "character", "logical", "double", "numeric")) { +if (is.na(object)) { + object <- NULL + type <- "NULL" --- End diff -- Nit: move these before `type <- class`, then this line is not needed anymore.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110212428 [Test build #34481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34481/consoleFull) for PR 6710 at commit [`e76a3c1`](https://github.com/apache/spark/commit/e76a3c1c35197858036891102e4394aa0027).
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110211892 Merged build triggered.
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110211959 Merged build started.
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110204932 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110204887 [Test build #34480 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34480/console) for PR 6652 at commit [`2ef2d39`](https://github.com/apache/spark/commit/2ef2d39344000bb1d08f37e3d889f3b8975c33c4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7786][STREAMING] Allow StreamingListene...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6380#issuecomment-110203759 [Test build #34479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34479/console) for PR 6380 at commit [`c94982f`](https://github.com/apache/spark/commit/c94982f25f57abf488bc75a253be44e3bfbab20d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StatsReportListener(numBatchInfos: Int) extends StreamingListener `
[GitHub] spark pull request: [SPARK-7786][STREAMING] Allow StreamingListene...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6380#issuecomment-110203770 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8162] [HOTFIX] Fix NPE in spark-shell
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6711#issuecomment-110203159 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8162] [HOTFIX] Fix NPE in spark-shell
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6711#issuecomment-110203153 [Test build #34477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34477/console) for PR 6711 at commit [`bf62ecc`](https://github.com/apache/spark/commit/bf62ecce7f021ccad67f3ed6b6e14292bd7f9129). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8156][SQL]create table to specific data...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/6695#issuecomment-110202776 Hi @yhuai, would you please help me review this PR when you have time? I think it may be the base of https://github.com/apache/spark/pull/6494. Thanks :)
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110202731 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110202723 [Test build #34476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34476/console) for PR 6190 at commit [`1641f9e`](https://github.com/apache/spark/commit/1641f9e03d99341e5b53f170c072b94678c544d2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110199775 [Test build #34474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34474/console) for PR 6652 at commit [`f5be578`](https://github.com/apache/spark/commit/f5be5784235813c0e28b13f234f89de04bc4849d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110199784 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/6707#discussion_r31976730 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/ReceivedBlockHandlerSuite.scala --- @@ -62,6 +61,19 @@ class ReceivedBlockHandlerSuite var blockManagerMaster: BlockManagerMaster = null var blockManager: BlockManager = null var tempDirectory: File = null + var storageLevel = StorageLevel.MEMORY_ONLY_SER + + private def makeBlockManager( --- End diff -- Sure. will change it
[GitHub] spark pull request: [SPARK-7889] [UI] make sure click the "App ID"...
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/6545#issuecomment-110192207 Hi @squito, I think I need your help; I don't clearly know how to write this test.
[GitHub] spark pull request: [SPARK-7889] [UI] make sure click the "App ID"...
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/6545#issuecomment-110191346 @squito, I think you can help me; I don't clearly know how to write this test. Thanks.
[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r31976524 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -225,12 +243,74 @@ private[yarn] class YarnAllocator( logInfo(s"Will request $missing executor containers, each with ${resource.getVirtualCores} " + s"cores and ${resource.getMemory} MB memory including $memoryOverhead MB overhead") - for (i <- 0 until missing) { -val request = createContainerRequest(resource) -amClient.addContainerRequest(request) -val nodes = request.getNodes -val hostStr = if (nodes == null || nodes.isEmpty) "Any" else nodes.last -logInfo(s"Container request (host: $hostStr, capability: $resource)") + // Calculated the number of executors we expected to satisfy all the preferred locality tasks + val localityAwareTaskCores = localityAwarePendingTaskNum * CPUS_PER_TASK + val expectedLocalityAwareContainerNum = +(localityAwareTaskCores + resource.getVirtualCores - 1) / resource.getVirtualCores + + // Get the all the existed and locality matched containers + val existedMatchedContainers = allocatedHostToContainersMap.filter { case (host, _) => +preferredLocalityToCounts.contains(host) + } + val existedMatchedContainerNum = existedMatchedContainers.values.map(_.size).sum + + // The number of containers to allocate, divided into two groups, one with node locality, + // and the other without locality preference. + var requiredLocalityFreeContainerNum: Int = 0 + var requiredLocalityAwareContainerNum: Int = 0 + + if (expectedLocalityAwareContainerNum <= existedMatchedContainerNum) { +// If the current allocated executor can satisfy all the locality preferred tasks, --- End diff -- Oh, my bad, sorry for missing this part, I will change the strategy accordingly :).
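Editorial sketch of the arithmetic in the quoted YarnAllocator diff: the expected container count is a ceiling division of the cores needed by locality-aware tasks over the cores per container, and the "existing matched" count sums containers on preferred hosts. The object, method, and parameter names below are hypothetical stand-ins, and the maps use plain strings instead of YARN types; this is not the PR's code.

```scala
object LocalityContainerMath {
  /** Ceiling division: the smallest number of containers whose cores cover the pending tasks. */
  def expectedContainers(pendingLocalityTasks: Int, cpusPerTask: Int, coresPerContainer: Int): Int = {
    require(coresPerContainer > 0, "coresPerContainer must be positive")
    val taskCores = pendingLocalityTasks * cpusPerTask
    (taskCores + coresPerContainer - 1) / coresPerContainer
  }

  /** Counts already-allocated containers that sit on a preferred host. */
  def matchedContainers(
      allocatedByHost: Map[String, Set[String]],
      preferredHosts: Map[String, Int]): Int = {
    allocatedByHost.iterator.collect {
      case (host, containers) if preferredHosts.contains(host) => containers.size
    }.sum
  }
}
```

If `expectedContainers` is no larger than `matchedContainers`, the existing executors already cover the locality-preferring tasks, which is the branch the quoted diff begins to handle.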
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/6707#discussion_r31976427 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala --- @@ -199,3 +221,16 @@ private[streaming] object WriteAheadLogBasedBlockHandler { new Path(checkpointDir, new Path("receivedData", streamId.toString)).toString } } + +/** + * A utility that will wrap the Iterator to get the count + */ +private class CountingIterator[T](iterator: Iterator[T]) extends Iterator[T] { + var count = 0 + def hasNext(): Boolean = iterator.hasNext + def isFullyConsumed: Boolean = !iterator.hasNext + def next(): T = { +count+=1 --- End diff -- Will change it ..thanks
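Editorial sketch of the counting wrapper discussed in the diff above. The quoted code is truncated, so this is only one plausible shape, not the PR's final version: the counter name `recordsSeen`, the `Long` counter type, and incrementing only after a successful `next()` are assumptions made for illustration.

```scala
/** Wraps an iterator and counts how many elements have been pulled through it. */
final class CountingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  private var seen = 0L

  /** Number of elements returned by next() so far. */
  def recordsSeen: Long = seen

  /** True once the wrapped iterator has nothing left to return. */
  def isFullyConsumed: Boolean = !underlying.hasNext

  override def hasNext: Boolean = underlying.hasNext

  override def next(): T = {
    val elem = underlying.next() // throws if exhausted, leaving the count untouched
    seen += 1
    elem
  }
}
```

A caller would hand the wrapper to whatever consumes the records and read `recordsSeen` afterwards; the count is only meaningful as a total when `isFullyConsumed` is true.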
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/6707#discussion_r31976373 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala --- @@ -32,7 +32,10 @@ import org.apache.spark.{Logging, SparkConf, SparkException} /** Trait that represents the metadata related to storage of blocks */ private[streaming] trait ReceivedBlockStoreResult { - def blockId: StreamBlockId // Any implementation of this trait will store a block id + // Any implementation of this trait will store a block id + def blockId: StreamBlockId + // Any implementation of this trait will have to return the number of records + def numRecords: Option[Long] --- End diff -- In all the other places where the count is recorded (refer to this PR: https://github.com/apache/spark/pull/6659/files), it is called numRecords. I just wanted to keep the naming consistent across all classes.
[GitHub] spark pull request: [SPARK-8121] [SQL] Backports PR #6669 to branc...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/6705
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110188806 [Test build #34478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34478/console) for PR 6710 at commit [`52ddaba`](https://github.com/apache/spark/commit/52ddabaaaf9ea13f51d44a2602991f37532a9106). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Rand(seed: Long) extends RDG(seed) ` * `case class Randn(seed: Long) extends RDG(seed) ` * `class StringKeyHashMap[T](normalizer: (String) => String) `
[GitHub] spark pull request: [SPARK-7886] Add built-in expressions to Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6710#issuecomment-110188814 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/6707#discussion_r31976054 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala --- @@ -79,7 +93,7 @@ private[streaming] class BlockManagerBasedBlockHandler( throw new SparkException( s"Could not store $blockId to block manager with storage level $storageLevel") } -BlockManagerBasedStoreResult(blockId) +BlockManagerBasedStoreResult(blockId, numRecords) --- End diff -- But how can we count a ByteBufferBlock? Counting one block as 1 message would also be wrong.
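Editorial sketch of the concern raised above: when a block arrives as raw bytes there is no cheap way to know how many records it holds, so an `Option[Long]` count (as in the `numRecords: Option[Long]` signature quoted in this thread) can report "unknown" rather than a misleading 1. The block types below are hypothetical simplifications, not Spark's classes, and returning `None` for byte blocks is an assumption about one reasonable design, not a statement of what the PR does.

```scala
import java.nio.ByteBuffer

object RecordCountSketch {
  // Hypothetical, simplified block shapes for illustration only.
  sealed trait Block
  final case class RecordsBlock(records: Seq[Any]) extends Block
  final case class BytesBlock(bytes: ByteBuffer) extends Block

  /** Some(n) when the record count is known, None when it cannot be determined. */
  def numRecords(block: Block): Option[Long] = block match {
    case RecordsBlock(records) => Some(records.size.toLong)
    case BytesBlock(_)         => None // serialized bytes: count unknown, so do not guess
  }
}
```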
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110188129 [Test build #34480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34480/consoleFull) for PR 6652 at commit [`2ef2d39`](https://github.com/apache/spark/commit/2ef2d39344000bb1d08f37e3d889f3b8975c33c4).
[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r31975860 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1374,12 +1374,15 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli * This can result in canceling pending requests or filing additional requests. * This is currently only supported in YARN mode. Return whether the request is received. */ - private[spark] override def requestTotalExecutors(numExecutors: Int): Boolean = { + private[spark] override def requestTotalExecutors( + numExecutors: Int, + localityAwarePendingTasks: Int, + preferredLocalityToCount: scala.Predef.Map[String, Int]): Boolean = { --- End diff -- Hi @squito, thanks a lot for your explanation. Since the existing code already imports `scala.collection.Map`, I have to write a fully qualified name here. I will change it to an immutable map :).
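Editorial sketch of the naming issue described above: once a file imports `scala.collection.Map`, the unqualified name `Map` refers to the general (possibly mutable) collection type instead of the `scala.Predef.Map` alias for `scala.collection.immutable.Map`, so an immutable map has to be named explicitly. The object and method names are hypothetical.

```scala
import scala.collection.Map // shadows the Predef alias for the rest of this file

object MapAliasSketch {
  // Unqualified Map is now scala.collection.Map: accepts mutable and immutable maps.
  def sumGeneral(counts: Map[String, Int]): Int = counts.values.sum

  // scala.Predef.Map is an alias for scala.collection.immutable.Map.
  def sumViaPredefAlias(counts: scala.Predef.Map[String, Int]): Int = counts.values.sum

  // Spelling out the immutable type directly is equivalent and arguably clearer.
  def sumImmutable(counts: scala.collection.immutable.Map[String, Int]): Int =
    sumViaPredefAlias(counts)
}
```

Both qualified spellings denote the same type, which is why switching the signature to `scala.collection.immutable.Map` changes readability rather than behavior.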
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110187900 Merged build started.
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110187889 Merged build triggered.
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/6652#discussion_r31975738 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -137,6 +137,23 @@ class DAGScheduler( private[scheduler] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this) taskScheduler.setDAGScheduler(this) + // Flag to control if reduce tasks are assigned preferred locations + private val shuffleLocalityEnabled = +sc.getConf.getBoolean("spark.shuffle.reduceLocality.enabled", true) + // Number of map, reduce tasks above which we do not assign preferred locations + // based on map output sizes. We limit the size of jobs for which assign preferred locations + // as sorting the locations by size becomes expensive. --- End diff -- Fixed now
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/6652#discussion_r31975749 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -800,6 +800,50 @@ class DAGSchedulerSuite assertDataStructuresEmpty() } + test("shuffle with reducer locality") { --- End diff -- Done
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/6652#discussion_r31975752 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -284,6 +291,54 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf) cachedSerializedStatuses.contains(shuffleId) || mapStatuses.contains(shuffleId) } + /** + * Return a list of locations which have fraction of map output greater than specified threshold. + * + * @param shuffleId id of the shuffle + * @param reducerId id of the reduce task + * @param numReducers total number of reducers in the shuffle + * @param fractionThreshold fraction of total map output size that a location must have + * for it to be considered large. + * + * This method is not thread-safe + */ + def getLocationsWithLargestOutputs( + shuffleId: Int, + reducerId: Int, + numReducers: Int, + fractionThreshold: Double) +: Option[Array[BlockManagerId]] = { + +if (mapStatuses.contains(shuffleId)) { + // Pre-compute the top locations for each reducer and cache it + val statuses = mapStatuses(shuffleId) + if (statuses.nonEmpty) { +// HashMap to add up sizes of all blocks at the same location +val locs = new HashMap[BlockManagerId, Long] +var totalOutputSize = 0L +var mapIdx = 0 +while (mapIdx < statuses.length) { --- End diff -- Good idea. Done
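Editorial sketch of the idea behind `getLocationsWithLargestOutputs` as described in the quoted Scaladoc: add up the map output destined for one reducer per location, then keep the locations whose share of the total meets the threshold. The quoted diff is truncated, so everything past the accumulation loop is an assumption; in particular the `>=` comparison, the sorted result, and the `MapOutput` type with plain host strings are hypothetical simplifications.

```scala
import scala.collection.mutable

object LargestOutputsSketch {
  // Hypothetical stand-in for a map task's status: where it ran and its output size per reducer.
  final case class MapOutput(host: String, sizesByReducer: Array[Long])

  /** Hosts holding at least `fractionThreshold` of the output for `reducerId`, or None if there are none. */
  def locationsWithLargestOutputs(
      outputs: Seq[MapOutput],
      reducerId: Int,
      fractionThreshold: Double): Option[Seq[String]] = {
    val sizeByHost = mutable.HashMap.empty[String, Long]
    var totalOutputSize = 0L
    outputs.foreach { output =>
      val size = output.sizesByReducer(reducerId)
      if (size > 0L) {
        sizeByHost(output.host) = sizeByHost.getOrElse(output.host, 0L) + size
        totalOutputSize += size
      }
    }
    if (totalOutputSize == 0L) {
      None
    } else {
      val total = totalOutputSize.toDouble
      val top = sizeByHost.toSeq.collect {
        case (host, size) if size / total >= fractionThreshold => host
      }.sorted
      if (top.nonEmpty) Some(top) else None
    }
  }
}
```

With two outputs of size 2 on hostA and one of size 3 on hostB, hostA holds 4/7 of the data, so a threshold of 0.5 selects only hostA, while a lower threshold such as 0.2 selects both hosts.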
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/6652#discussion_r31975747 --- Diff: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala --- @@ -205,4 +205,36 @@ class MapOutputTrackerSuite extends SparkFunSuite { //masterTracker.stop() // this throws an exception rpcEnv.shutdown() } + + test("getLocationsWithLargestOutputs with multiple outputs in same machine") { +val rpcEnv = createRpcEnv("test") +val tracker = new MapOutputTrackerMaster(conf) +tracker.trackerEndpoint = rpcEnv.setupEndpoint(MapOutputTracker.ENDPOINT_NAME, + new MapOutputTrackerMasterEndpoint(rpcEnv, tracker, conf)) +// Setup 3 map tasks +// on hostA with output size 2 +// on hostA with output size 2 +// on hostB with output size 3 +tracker.registerShuffle(10, 3) +tracker.registerMapOutput(10, 0, MapStatus(BlockManagerId("a", "hostA", 1000), +Array(2L))) +tracker.registerMapOutput(10, 1, MapStatus(BlockManagerId("a", "hostA", 1000), +Array(2L))) +tracker.registerMapOutput(10, 2, MapStatus(BlockManagerId("b", "hostB", 1000), +Array(3L))) + +val topLocs50 = tracker.getLocationsWithLargestOutputs(10, 0, 1, 0.5) --- End diff -- Done