[GitHub] spark pull request: [SPARK-4972][MLlib] Updated the scala doc for ...
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/3808 [SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression for the change of LeastSquaresGradient In SPARK-4907, we added a factor of 2 into the LeastSquaresGradient. We update the Scala doc for lasso and ridge regression here accordingly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/AlpineNow/spark doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3808.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3808 commit ec3c989efd453897e7fe5d4de01b3edefe21eb3e Author: DB Tsai dbt...@alpinenow.com Date: 2014-12-26T08:39:55Z first commit --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
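To make the doc change concrete: with the factor of 2 that SPARK-4907 introduced, the per-example squared-error loss is differentiated without the conventional 1/2, so the gradient carries a factor of 2. The sketch below is illustrative only, assuming loss L(w) = (w·x - y)²; it is not the actual MLlib LeastSquaresGradient API.

```scala
// Illustrative least-squares gradient with the factor of 2:
// for L(w) = (w·x - y)^2, the gradient is dL/dw = 2 (w·x - y) x.
def leastSquaresGradient(x: Array[Double], y: Double,
                         w: Array[Double]): (Array[Double], Double) = {
  val diff = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y  // w·x - y
  val loss = diff * diff                                        // (w·x - y)^2
  val grad = x.map(xi => 2.0 * diff * xi)                       // factor of 2 here
  (grad, loss)
}
```

The Scala-doc formulas for lasso and ridge were updated because the documented loss must match this unhalved form.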
[GitHub] spark pull request: [SPARK-4972][MLlib] Updated the scala doc for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3808#issuecomment-68131355 [Test build #24832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24832/consoleFull) for PR 3808 at commit [`ec3c989`](https://github.com/apache/spark/commit/ec3c989efd453897e7fe5d4de01b3edefe21eb3e). * This patch merges cleanly.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
GitHub user tigerquoll opened a pull request: https://github.com/apache/spark/pull/3809 spark-core - [SPARK-4787] - Stop SparkContext properly if a DAGScheduler init error occurs [SPARK-4787] Stop SparkContext properly if an exception occurs during DAGScheduler initialization. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tigerquoll/spark SPARK-4787 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3809.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3809 commit 217257879fe7c98673caf14b980790498887581e Author: Dale tigerqu...@outlook.com Date: 2014-12-26T09:33:05Z [SPARK-4787] Stop context properly if an exception occurs during DAGScheduler initialization.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3809#issuecomment-68133474 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4972][MLlib] Updated the scala doc for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3808#issuecomment-68134353 [Test build #24832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24832/consoleFull) for PR 3808 at commit [`ec3c989`](https://github.com/apache/spark/commit/ec3c989efd453897e7fe5d4de01b3edefe21eb3e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4972][MLlib] Updated the scala doc for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3808#issuecomment-68134354 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24832/ Test PASSed.
[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...
GitHub user YanTangZhai opened a pull request: https://github.com/apache/spark/pull/3810 [SPARK-4962] [CORE] Put TaskScheduler.start back in SparkContext to shorten cluster resources occupation period

When a SparkContext object is instantiated, TaskScheduler is started and some resources are allocated from the cluster. However, these resources may not be used for the moment (for example, while DAGScheduler.JobSubmitted is being processed), so they are wasted in this period. Thus, we want to put TaskScheduler.start back to shorten the cluster-resource occupation period, especially for a busy cluster: TaskScheduler could be started just before running stages. We can analyse and compare the resource occupation period before and after the optimization:

TaskScheduler.start execution time: [time1__]
DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or TaskScheduler.start) execution time: [time2_]
HadoopRDD.getPartitions execution time: [time3___]
Stages execution time: [time4_]

The cluster-resource occupation period before the optimization is [time2_][time3___][time4_]; after the optimization it is [time3___][time4_]. In summary, the occupation period after the optimization is shorter than before. If HadoopRDD.getPartitions could also be moved forward (SPARK-4961), the period might be shortened further, to [time4_]. This saving is important for a busy cluster; the main purpose of this PR is to decrease wasted resources there.

For example, a process initializes a SparkContext instance, reads a few files from HDFS or many records from PostgreSQL, and then calls an RDD's collect operation to submit a job. When the SparkContext is initialized, an app is submitted to the cluster and some resources are held by this app. These resources are not really used until the job is submitted by the RDD action, so the resources held in the period from initialization to actual use can be considered wasted.

If the app is submitted when the SparkContext is initialized, all of the resources it needs may be granted before the job runs; the job can then run efficiently without resource constraints. On the contrary, if the app is submitted when the job is submitted, the resources it needs may be granted at different times, and the job may not run as efficiently while some resource requests are still pending. Thus I use a configuration parameter, spark.scheduler.app.slowstart (default false), to let the user make the tradeoff between economy and efficiency.

There are 9 kinds of master URL and 6 kinds of SchedulerBackend. LocalBackend and SimrSchedulerBackend don't need to defer starting, since there is no difference. SparkClusterSchedulerBackend (yarn-standalone or yarn-cluster) does not defer starting, since the app should be submitted in advance by SparkSubmit. CoarseMesosSchedulerBackend and MesosSchedulerBackend could defer starting, as could YarnClientSchedulerBackend (yarn-client). Initially, this PR puts TaskScheduler.start back only for yarn-client mode.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/YanTangZhai/spark SPARK-4962 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3810.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3810 commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-06T13:07:08Z Merge pull request #1 from apache/master update commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-20T13:14:08Z Merge pull request #3 from apache/master Update commit 8a0010691b669495b4c327cf83124cabb7da1405 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-12T06:54:58Z Merge pull request #6 from apache/master Update commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-16T12:03:22Z Merge pull request #7 from apache/master Update commit 76d40277d51f709247df1d3734093bf2c047737d Author: YanTangZhai hakeemz...@tencent.com Date: 2014-10-20T12:52:22Z Merge pull request #8 from apache/master update commit d26d98248a1a4d0eb15336726b6f44e05dd7a05a Author: YanTangZhai hakeemz...@tencent.com Date: 2014-11-04T09:00:31Z Merge pull request #9 from apache/master Update commit e249846d9b7967ae52ec3df0fb09e42ffd911a8a Author: YanTangZhai hakeemz...@tencent.com Date: 2014-11-11T03:18:24Z Merge pull request #10 from apache/master Update commit 6e643f81555d75ec8ef3eb57bf5ecb6520485588 Author: YanTangZhai hakeemz...@tencent.com Date:
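The proposal above can be sketched as a scheduler whose startup is gated behind a slow-start flag, so cluster resources are requested just before stages run rather than at SparkContext construction. This is a minimal, self-contained illustration, assuming a toy `LazyScheduler` class and plain `Map` config; it is not the actual SparkContext/TaskScheduler code.

```scala
// Toy model of deferring TaskScheduler.start behind a
// spark.scheduler.app.slowstart-style flag (illustrative names only).
class LazyScheduler(conf: Map[String, String]) {
  private var started = false
  private val slowStart =
    conf.getOrElse("spark.scheduler.app.slowstart", "false").toBoolean

  // Called at SparkContext construction: eager mode grabs resources now.
  def init(): Unit = if (!slowStart) start()

  // Called just before the first stage: lazy mode grabs resources here.
  def ensureStarted(): Unit = if (!started) start()

  def runStages(): Unit = { ensureStarted() /* ... run stages ... */ }

  private def start(): Unit = { started = true }
  def isStarted: Boolean = started
}
```

With `slowstart=true`, the cluster is only occupied from `runStages()` onward, matching the shorter [time3___][time4_] window described in the PR.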
[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3810#issuecomment-68137843 [Test build #24833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24833/consoleFull) for PR 3810 at commit [`05469de`](https://github.com/apache/spark/commit/05469de9f0482bce54a60161b9cb386a64173826). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/3811 [SPARK-4973][CORE] Local directory in the driver of client-mode continues remaining even if application finished when external shuffle is enabled When we enable the external shuffle service, the local directories of a client-mode driver remain even after the application has finished. I think local directories for drivers should be deleted. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark SPARK-4973 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3811.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3811 commit d99718e85c0b97bddb0e7736a392536ede510c47 Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-12-26T11:59:36Z Fixed SparkSubmit.scala and DiskBlockManager.scala in order to delete local directories of the driver of local-mode when external shuffle service is enabled
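The intent of the fix above can be sketched as shutdown-time cleanup of the driver's scratch directories. This is an illustrative, self-contained sketch (the helper, directory names, and shutdown-hook placement are assumptions), not the actual SparkSubmit/DiskBlockManager change.

```scala
import java.io.File
import java.nio.file.Files

// Recursively delete a directory tree (illustrative helper).
def deleteRecursively(f: File): Unit = {
  if (f.isDirectory)
    Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  f.delete()
}

// In the driver: register cleanup so its local dirs don't outlive the app,
// even when the external shuffle service keeps executor dirs around.
val driverLocalDir = Files.createTempDirectory("spark-local-").toFile
sys.addShutdownHook {
  if (driverLocalDir.exists()) deleteRecursively(driverLocalDir)
}
```

The key point matching the PR: only the driver's directories are removed, since (as discussed below in the thread) no executor shares them in client mode.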
[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3811#issuecomment-68138582 [Test build #24834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24834/consoleFull) for PR 3811 at commit [`d99718e`](https://github.com/apache/spark/commit/d99718e85c0b97bddb0e7736a392536ede510c47). * This patch merges cleanly.
[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/3812 [Minor] Fix the value represented by spark.executor.id for the driver of local mode. When we run an application in local mode, the property `spark.executor.id` represents `driver` for the driver, while in any other mode the property represents `<driver>` for the driver. It's inconsistent. This issue is minor so I didn't file it in JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark fix-driver-identifier Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3812.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3812 commit 4275663d875840bdc0c0da69707386a8b5eb1d3a Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-12-26T12:04:56Z Fixed the value represented by spark.executor.id of local mode
[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3812#issuecomment-68138840 [Test build #24835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24835/consoleFull) for PR 3812 at commit [`4275663`](https://github.com/apache/spark/commit/4275663d875840bdc0c0da69707386a8b5eb1d3a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3046#issuecomment-68139015 [Test build #24837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24837/consoleFull) for PR 3046 at commit [`41ef90e`](https://github.com/apache/spark/commit/41ef90e8ed25b21f1e5c689c478963c74577d81d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4576][SQL] Add concatenation operator
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3433#issuecomment-68139017 [Test build #24836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24836/consoleFull) for PR 3433 at commit [`9b94d48`](https://github.com/apache/spark/commit/9b94d4832f670b5dea0e917654fbeb59450ed1d6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3810#issuecomment-68140334 [Test build #24833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24833/consoleFull) for PR 3810 at commit [`05469de`](https://github.com/apache/spark/commit/05469de9f0482bce54a60161b9cb386a64173826). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3810#issuecomment-68140337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24833/ Test PASSed.
[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3811#issuecomment-68140417 [Test build #24834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24834/consoleFull) for PR 3811 at commit [`d99718e`](https://github.com/apache/spark/commit/d99718e85c0b97bddb0e7736a392536ede510c47). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3811#issuecomment-68140420 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24834/ Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68140510 Not only that — actually this PR covers optimizations as follows:
```
And/Or with same condition
a && a => a, a && a && a ... => a
a || a => a, a || a || a ... => a

one And/Or with conditions that can be merged
a < 2 && a > 2 => false, a > 3 && a > 5 => a > 5
a < 2 || a >= 2 => true, a > 3 || a > 5 => a > 3

two And/Or with conditions that can be merged
(a < 3 && b > 5) || a > 2 => b > 5 || a > 2
(a < 3 || b > 5) || a > 2 => true
(a < 2 && b > 5) && a > 3 => false
(a < 2 || b > 5) && a > 3 => b > 5 && a > 3

more than two And/Or with common conditions
(a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... => a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
(a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... => (a || b) || ((c || ...) && (d || ...) && (e || ...) && ...)
```
hi @liancheng, do you mind if I refactor this and refer to your PR to cover all the cases above?
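The "same condition" rewrites listed above can be sketched over a toy expression tree. This is illustrative only — a minimal recursive simplifier with invented `Expr` classes, not Catalyst's actual `Expression` hierarchy or the PR's implementation.

```scala
// Toy boolean expression tree (illustrative, not Catalyst classes).
sealed trait Expr
case class Var(name: String) extends Expr
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr

// Bottom-up simplification of the "same condition" cases:
// a && a => a  and  a || a => a (applied recursively, so chains collapse too).
def simplify(e: Expr): Expr = e match {
  case And(l, r) =>
    val (sl, sr) = (simplify(l), simplify(r))
    if (sl == sr) sl else And(sl, sr)
  case Or(l, r) =>
    val (sl, sr) = (simplify(l), simplify(r))
    if (sl == sr) sl else Or(sl, sr)
  case other => other
}
```

The range-merging cases (`a > 3 && a > 5 => a > 5`, etc.) would need comparison nodes and interval reasoning on top of the same recursive pattern.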
[GitHub] spark pull request: [SPARK-4576][SQL] Add concatenation operator
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3433#issuecomment-68141157 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24836/ Test PASSed.
[GitHub] spark pull request: [SPARK-4576][SQL] Add concatenation operator
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3433#issuecomment-68141153 [Test build #24836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24836/consoleFull) for PR 3433 at commit [`9b94d48`](https://github.com/apache/spark/commit/9b94d4832f670b5dea0e917654fbeb59450ed1d6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Concat(left: Expression, right: Expression) extends BinaryExpression `
[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3812#issuecomment-68141394 [Test build #24835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24835/consoleFull) for PR 3812 at commit [`4275663`](https://github.com/apache/spark/commit/4275663d875840bdc0c0da69707386a8b5eb1d3a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3812#issuecomment-68141397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24835/ Test PASSed.
[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3046#issuecomment-68141516 [Test build #24837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24837/consoleFull) for PR 3046 at commit [`41ef90e`](https://github.com/apache/spark/commit/41ef90e8ed25b21f1e5c689c478963c74577d81d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3046#issuecomment-68141518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24837/ Test PASSed.
[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3812#issuecomment-68142138 Is there a functional change here? The value is now `<driver>` instead of `driver`. It sounds good to be consistent, but I wonder if there is a reason for the difference.
[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3811#issuecomment-68142275 Does this define a new system property just for deployment mode? This logic looks like it is applied even when external shuffle service is not enabled. Why is the driver behavior special here?
[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/3812#issuecomment-68143144 There is no functional change for Spark itself; it's rather for other systems associated with Spark, like monitoring systems. The property is used in metrics names, so this issue can affect users monitoring the driver's metrics. As you mentioned, this change doesn't affect Spark itself, but I think we should consider how Spark's features are used by users.
[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/3811#issuecomment-68143249 If we run in client mode (including local mode), the driver runs on the client and the executors don't, so no one shares the local directories of the driver.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68149432 Hi @jerryshao I'd politely ask that anyone with questions read at least KafkaRDD.scala and the example usage linked from the jira ticket (it's only about 50 significant lines of code): https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalExample.scala I'll try to address your points.
1. Yes, each RDD partition maps directly to a Kafka (topic, partition, inclusive starting offset, exclusive ending offset) tuple.
2. It's a pull model, not a receiver push model. All the InputDStream implementation does is check the leaders' highest offsets and define an RDD based on that. When the RDD is run, its iterator makes a connection to Kafka and pulls the data. This is done because it's simpler, and because using the existing network receiver code would require dedicating one core per Kafka partition, which is unacceptable from an ops standpoint.
3. Yes. The fault tolerance model is that it should be safe for any or all of the Spark machines to be completely destroyed at any point in the job, and the job should be able to be safely restarted. I don't think you can do better than this. This is achieved because all important state, especially the storage of offsets, is controlled by client code, not Spark. In both the transactional and idempotent client code approaches, offsets aren't stored until data is stored, so restart should be safe.
Regarding the approach you linked to, the problem there is (a) it's not part of the Spark distribution, so people won't know about it, and (b) it assumes control of Kafka offsets and their storage in ZooKeeper, which makes it impossible for client code to control exactly-once semantics. Regarding the possible semantic disconnect between Spark Streaming and treating Kafka as a durable store of data from the past (assuming that's what you meant)... I agree there is a disconnect there.
But it's a fundamental problem with Spark Streaming, in that it implicitly depends on "now" rather than a time embedded in the data stream. I don't think we're fixing that with this ticket.
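For readers following along, the partition mapping described in point 1 above can be sketched roughly as follows (the names here are illustrative stand-ins, not the actual KafkaRDD internals):

```scala
// Each RDD partition is defined purely by a (topic, partition, fromOffset,
// untilOffset) tuple, so it can be recomputed deterministically on failure.
// Illustrative sketch only -- not the real KafkaRDD classes.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
  require(fromOffset <= untilOffset, "starting offset must not exceed ending offset")
  // Starting offset is inclusive, ending offset is exclusive.
  def count: Long = untilOffset - fromOffset
}

// A batch covering two Kafka partitions of the same topic.
val batch = Seq(
  OffsetRange("events", 0, 100L, 150L),
  OffsetRange("events", 1, 200L, 230L)
)
println(batch.map(_.count).sum) // 80 messages in this batch
```

Because the ranges fully describe the data, client code that stores them alongside its output can restart from exactly the right place.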
[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/3632#issuecomment-68150977 @markhamstra take a look now. I ignored the situation of K and V having the same type, since I think it can be dealt with by using a simple wrapper (value) class for the Vs.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3809#discussion_r22287446 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -329,8 +329,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli try { dagScheduler = new DAGScheduler(this) } catch { -case e: Exception => throw - new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage)) +case e: Exception => { + stop() + throw --- End diff -- Style nit: you can use string interpolation instead of String.format, which will allow the `new SparkException` to fit on the same line as `throw`:

```scala
throw new SparkException(s"DAGScheduler cannot be initialized due to ${e.getMessage}")
```

However, I'd prefer to call the two-argument constructor which takes the cause as the second argument, since this will lead to more informative stacktraces:

```scala
throw new SparkException("Error while constructing DAGScheduler", e)
```
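To see why the two-argument constructor gives more informative stack traces: chaining the cause preserves the original exception instead of flattening it into a message string. A minimal sketch, using plain RuntimeException rather than SparkException:

```scala
// Wrapping only the message discards the original stack trace; passing the
// cause keeps it attached (printed as "Caused by: ..." in the stack trace).
val cause = new IllegalStateException("underlying failure")

val messageOnly = new RuntimeException(s"init failed due to ${cause.getMessage}")
val withCause = new RuntimeException("Error during initialization", cause)

assert(messageOnly.getCause == null) // the original exception is lost
assert(withCause.getCause eq cause)  // the original exception is preserved
```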
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3809#issuecomment-68155076 Jenkins, this is ok to test.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3809#issuecomment-68155166 [Test build #24838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24838/consoleFull) for PR 3809 at commit [`2172578`](https://github.com/apache/spark/commit/217257879fe7c98673caf14b980790498887581e). * This patch merges cleanly.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3809#issuecomment-68155357 This is a nice fix. Resource leaks when SparkContext's constructor throws exceptions have been a longstanding issue. I first ran across the issue while adding logic to detect whether a SparkContext was already running when attempting to create a new one ([SPARK-4180](https://issues.apache.org/jira/browse/SPARK-4180)). In that case, I ran into some issues because I wanted to effectively make the entire constructor synchronized on a static object, but this was hard because there wasn't an explicit constructor method. We could have tried to wrap the entire implicit constructor in a try-finally block, but this would require us to re-organize a huge amount of code and change many `vals` into `vars`. I had an alternative proposal to move the dependency-creation into the SparkContext companion object and pass a SparkContextDependencies object into SparkContext's constructors, which would solve this issue more generally (but it's a much larger change). See the PR description at #3121 for more details. Barring a big restructuring of SparkContext's constructor, though, small fixes like this are welcome. Therefore, this looks good to me.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3809#issuecomment-68155681 By the way, I left a [comment over on JIRA](https://issues.apache.org/jira/browse/SPARK-4787?focusedCommentId=14259202&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14259202) about the scope of the SPARK-4787 JIRA. If we merge this PR as-is, without adding more try-catches for other statements that could throw exceptions, then I think we should revise that JIRA to describe only the fix implemented here (error-catching for DAGScheduler errors) and convert it into a subtask of SPARK-4180.
[GitHub] spark pull request: SPARK-4971: fix typo in the comment
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3807#issuecomment-68155797 LGTM, so I'll merge this. In the future, I wouldn't bother to file JIRA issues for super-small one-word documentation fixes like this, since the JIRA issue is effectively a duplicate of the PR itself.
[GitHub] spark pull request: SPARK-4971: fix typo in the comment
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3807
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68156011 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68156052 Super-minor process nit, but do you mind moving your comment into the PR description itself? The PR description automatically becomes the commit message, so keeping it up-to-date means less work for committers when they merge your PRs since they don't have to fix up the message by hand.
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68156186 [Test build #24839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24839/consoleFull) for PR 3805 at commit [`41ede0e`](https://github.com/apache/spark/commit/41ede0ee67f77e09f2abe96c981167ed671e0504). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68156238 This class of issue could be a more general problem for our test-suites, since I think there are a number of places where we call things like `new SparkConf()` that might implicitly read defaults from the configuration file. I wonder if there's a more general fix, such as using `Utils.isTesting` to bypass the defaults loading in unit tests.
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68156344 Also, the PR / JIRA title is confusing; I can't really guess what this patch does based on the title, since "fix an implicit bug" could mean many different things. A better title would be something like "Do not read spark.executor.memory from spark-defaults.conf in SparkSubmitSuite".
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22287877 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -233,6 +236,47 @@ class InputStreamsSuite extends TestSuiteBase with BeforeAndAfter { } } + def testBinaryRecordsStream() { +var ssc: StreamingContext = null +val testDir: File = null +try { + val testDir = Utils.createTempDir() + + Thread.sleep(1000) + // Set up the streaming context and input streams + val newConf = conf.clone.set( +"spark.streaming.clock", "org.apache.spark.streaming.util.SystemClock") --- End diff -- It looks like this is based on the FileInputStream test, which is known to be flaky. I have a PR open which rewrites that test to not depend on SystemClock / Thread.sleep(): #3801. Therefore, if we want to have this style of test, then this PR should block until my PR is merged so that it can use the new test utilities that I added. Here's the relevant change from my PR: https://github.com/apache/spark/pull/3801/files#diff-4
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22287887 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -233,6 +236,47 @@ class InputStreamsSuite extends TestSuiteBase with BeforeAndAfter { } } + def testBinaryRecordsStream() { --- End diff -- Also, since this is only called from one place, I'd just inline this code in the `test("binary records stream")` function rather than defining a whole new function.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22287941 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -373,6 +393,25 @@ class StreamingContext private[streaming] ( } /** + * Create an input stream that monitors a Hadoop-compatible filesystem + * for new files and reads them as flat binary files, assuming a fixed length per record, + * generating one byte array per record. Files must be written to the monitored directory + * by moving them from another location within the same file system. File names + * starting with `.` are ignored. + * @param directory HDFS directory to monitor for new file + * @param recordLength length of each record in bytes + */ + def binaryRecordsStream( + directory: String, + recordLength: Int): DStream[Array[Byte]] = { +val conf = sc_.hadoopConfiguration +conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, recordLength) +val br = fileStream[LongWritable, BytesWritable, FixedLengthBinaryInputFormat](directory, conf) +val data = br.map { case (k, v) => v.getBytes } --- End diff -- This is a subtly-incorrect usage of `getBytes`, since `getBytes` returns a padded byte array; you need to copy / slice out the subarray with the data using `v.getLength`. See [HADOOP-6298: BytesWritable#getBytes is a bad name that leads to programming mistakes](https://issues.apache.org/jira/browse/HADOOP-6298) for more details. We've hit this problem before in other parts of Spark: - https://issues.apache.org/jira/browse/SPARK-3121 - https://issues.apache.org/jira/browse/SPARK-4901 Here's a PR which shows the correct usage: #2712
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22287961 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -373,6 +393,25 @@ class StreamingContext private[streaming] ( } /** + * Create an input stream that monitors a Hadoop-compatible filesystem + * for new files and reads them as flat binary files, assuming a fixed length per record, + * generating one byte array per record. Files must be written to the monitored directory + * by moving them from another location within the same file system. File names + * starting with `.` are ignored. + * @param directory HDFS directory to monitor for new file + * @param recordLength length of each record in bytes + */ + def binaryRecordsStream( + directory: String, + recordLength: Int): DStream[Array[Byte]] = { +val conf = sc_.hadoopConfiguration +conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, recordLength) +val br = fileStream[LongWritable, BytesWritable, FixedLengthBinaryInputFormat](directory, conf) +val data = br.map { case (k, v) => v.getBytes } --- End diff -- Actually, it looks like the same bug is present in the new `binaryRecords()` method in Spark core.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22287997 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -373,6 +393,25 @@ class StreamingContext private[streaming] ( } /** + * Create an input stream that monitors a Hadoop-compatible filesystem + * for new files and reads them as flat binary files, assuming a fixed length per record, + * generating one byte array per record. Files must be written to the monitored directory + * by moving them from another location within the same file system. File names + * starting with `.` are ignored. + * @param directory HDFS directory to monitor for new file + * @param recordLength length of each record in bytes + */ + def binaryRecordsStream( + directory: String, + recordLength: Int): DStream[Array[Byte]] = { +val conf = sc_.hadoopConfiguration +conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, recordLength) +val br = fileStream[LongWritable, BytesWritable, FixedLengthBinaryInputFormat](directory, conf) +val data = br.map { case (k, v) => v.getBytes } --- End diff -- Maybe it's not an issue since we're using FixedLengthBinaryInputFormat, but even if it isn't we should have a comment explaining why it's correct or a defensive check that `getBytes` returns an array of the expected length.
[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r22288031 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -510,6 +510,52 @@ class SparkContext(config: SparkConf) extends Logging { minPartitions).setName(path) } + + /** + * Get an RDD for a Hadoop-readable dataset as PortableDataStream for each file + * (useful for binary data) + * + * @param minPartitions A suggestion value of the minimal splitting number for input data. + * + * @note Small files are preferred, large file is also allowable, but may cause bad performance. + */ + @DeveloperApi + def binaryFiles(path: String, minPartitions: Int = defaultMinPartitions): + RDD[(String, PortableDataStream)] = { +val job = new NewHadoopJob(hadoopConfiguration) +NewFileInputFormat.addInputPath(job, new Path(path)) +val updateConf = job.getConfiguration +new BinaryFileRDD( + this, + classOf[StreamInputFormat], + classOf[String], + classOf[PortableDataStream], + updateConf, + minPartitions).setName(path) + } + + /** + * Load data from a flat binary file, assuming each record is a set of numbers + * with the specified numerical format (see ByteBuffer), and the number of + * bytes per record is constant (see FixedLengthBinaryInputFormat) + * + * @param path Directory to the input data files + * @param recordLength The length at which to split the records + * @return An RDD of data with values, RDD[(Array[Byte])] + */ + def binaryRecords(path: String, recordLength: Int, +conf: Configuration = hadoopConfiguration): RDD[Array[Byte]] = { +conf.setInt("recordLength", recordLength) +val br = newAPIHadoopFile[LongWritable, BytesWritable, FixedLengthBinaryInputFormat](path, + classOf[FixedLengthBinaryInputFormat], + classOf[LongWritable], + classOf[BytesWritable], + conf = conf) +val data = br.map { case (k, v) => v.getBytes } --- End diff -- It turns out that `getBytes` returns a padded byte array, so I think you may need to copy / slice out the subarray with the data using `v.getLength`; see [HADOOP-6298: BytesWritable#getBytes is a bad name that leads to programming mistakes](https://issues.apache.org/jira/browse/HADOOP-6298) for more details. Using `getBytes` without `getLength` has caused bugs in Spark in the past: #2712. Is the use of `getBytes` in this patch a bug? Or is it somehow safe due to our use of FixedLengthBinaryInputFormat? If it is somehow safe, we should have a comment which explains this so that readers who know about the `getBytes` issue aren't confused (or better yet, an `assert` that `getBytes` returns an array of the expected length).
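The padding pitfall can be demonstrated without Hadoop on the classpath. The sketch below mimics BytesWritable's contract with a hypothetical stand-in class (getBytes returns the whole, possibly over-allocated backing buffer; getLength reports the valid size) -- PaddedWritable is an illustrative name, not a Hadoop class:

```scala
import java.util.Arrays

// Hypothetical stand-in for Hadoop's BytesWritable: the backing buffer may be
// larger than the valid data, which is exactly why bare getBytes is unsafe.
class PaddedWritable(data: Array[Byte]) {
  // Simulate over-allocation of the backing array (as BytesWritable does when growing).
  private val buffer = Arrays.copyOf(data, data.length + data.length / 2 + 1)
  def getBytes: Array[Byte] = buffer // returns padding too!
  def getLength: Int = data.length   // the number of valid bytes
}

val w = new PaddedWritable(Array[Byte](1, 2, 3, 4))
val naive = w.getBytes                                       // 7 bytes: 4 valid + 3 padding
val correct = Arrays.copyOfRange(w.getBytes, 0, w.getLength) // exactly the 4 valid bytes

assert(naive.length == 7)
assert(correct.sameElements(Array[Byte](1, 2, 3, 4)))
```

Slicing with `copyOfRange(bytes, 0, getLength)` is the fix HADOOP-6298 recommends; records read through the padded array would otherwise carry trailing garbage bytes.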
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68158154 Jenkins, test this please.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3809#issuecomment-68158247 [Test build #24838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24838/consoleFull) for PR 3809 at commit [`2172578`](https://github.com/apache/spark/commit/217257879fe7c98673caf14b980790498887581e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3809#issuecomment-68158250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24838/ Test PASSed.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68158356 [Test build #24840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24840/consoleFull) for PR 3707 at commit [`d2d41b6`](https://github.com/apache/spark/commit/d2d41b6f74aa8620e7937e6c039e11542a73698c). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68158618 One small correctness question (around quoting) but looks good to me. I can merge this later today and fix it manually if @brennonyork doesn't get around to it.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68158971 @brennonyork Does this handle relative paths passed to Maven correctly (if that's a valid potential use case)? We had this problem with the `spark-ec2` script, which was caused by the script [changing the working directory on the user](https://github.com/apache/spark/pull/2988).
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68159278 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24839/ Test PASSed.
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68159276 [Test build #24839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24839/consoleFull) for PR 3805 at commit [`41ede0e`](https://github.com/apache/spark/commit/41ede0ee67f77e09f2abe96c981167ed671e0504). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2348#issuecomment-68159339 Clickable link for the lazy: [Spark Packages](http://spark-packages.org/)
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68161145 This will handle relative directories just fine. The last portion of this script changes the directory back to the `cwd` where the user was calling from so this isn't an issue :)
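The `pwd`-preserving pattern described above can be sketched in a few lines of bash. This is a hypothetical outline, not the actual `build/mvn` script: the variable names are illustrative, and the download steps are elided.

```shell
# Remember where the user called from, then resolve the script's own directory.
_CALLING_DIR="$(pwd)"
_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"

cd "${_DIR}"            # do work relative to the script's location
# ... download/run maven, zinc, scala here (elided) ...
cd "${_CALLING_DIR}"    # hand the shell back to the user's original directory

echo "cwd restored: $(pwd)"
```

Because the final `cd` restores `_CALLING_DIR`, relative paths the user passes on the command line keep meaning what the user expects.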
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/3707#discussion_r22289411 --- Diff: sbt/sbt ---
@@ -1,111 +1,9 @@
-#!/usr/bin/env bash
+#!/bin/bash
+
+# Determine the current working directory
+_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
+
+echo "WARNING: The sbt/sbt script has been deprecated in place of build/sbt." 1>&2
+echo "Please change all references to point to the new location." 1>&2

-# When creating new tests for Spark SQL Hive, the HADOOP_CLASSPATH must contain the hive jars so
-# that we can run Hive to generate the golden answer. This is not required for normal development
-# or testing.
-for i in $HIVE_HOME/lib/*
-do HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$i
-done
-export HADOOP_CLASSPATH

[The rest of the removed lines (the realpath() helper, the sourcing of sbt-launch-lib.bash, the usage() text, process_my_args(), loadConfigFile(), and the .sbtopts and /etc/sbt/sbtopts handling) were quoted here; the excerpt was truncated mid-hunk.]
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68161775 [Test build #24840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24840/consoleFull) for PR 3707 at commit [`d2d41b6`](https://github.com/apache/spark/commit/d2d41b6f74aa8620e7937e6c039e11542a73698c). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68161776 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24840/ Test FAILed.
[GitHub] spark pull request: [SPARK-3916] [Streaming] discover new appended...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2806#issuecomment-68163636 There has been significant refactoring done in FileInputDStream. Can you update the PR accordingly?
[GitHub] spark pull request: [SPARK-3916] [Streaming] discover new appended...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2806#issuecomment-68163696 Also, I took a quick look at the PR. It seems a little complicated to understand just by looking at the code, so could you write a short design doc (or update the PR description) on the high-level technique used to implement this? It does not have to be very detailed, just enough for anyone to understand the logic and then verify it in the code.
[GitHub] spark pull request: Spark 3754 spark streaming file system api cal...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2703#discussion_r22290108 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala ---
@@ -250,19 +250,19 @@ class JavaStreamingContext(val ssc: StreamingContext) extends Closeable {
    * Files must be written to the monitored directory by moving them from another
    * location within the same file system. File names starting with . are ignored.
    * @param directory HDFS directory to monitor for new file
-   * @tparam K Key type for reading HDFS file
-   * @tparam V Value type for reading HDFS file
-   * @tparam F Input format for reading HDFS file
+   * @param inputFormatClass Input format for reading HDFS file
+   * @param keyClass Key type for reading HDFS file
+   * @param valueClass Value type for reading HDFS file
    */
   def fileStream[K, V, F <: NewInputFormat[K, V]](
-      directory: String): JavaPairInputDStream[K, V] = {
-    implicit val cmk: ClassTag[K] =
-      implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[K]]
-    implicit val cmv: ClassTag[V] =
-      implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[V]]
-    implicit val cmf: ClassTag[F] =
-      implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[F]]
-    ssc.fileStream[K, V, F](directory)
+      directory: String,
+      inputFormatClass: Class[F],
+      keyClass: Class[K],
+      valueClass: Class[V],
+      newFilesOnly: Boolean = true): JavaPairInputDStream[K, V] = {
--- End diff -- Correction on this comment. newFilesOnly should be exposed as it is exposed in the Scala api.
[GitHub] spark pull request: Spark 3754 spark streaming file system api cal...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2703#discussion_r22290121 --- Diff: streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
@@ -1703,6 +1710,65 @@ public void testTextFileStream() {
     JavaDStream<String> test = ssc.textFileStream("/tmp/foo");
   }
+
+  @Test
+  public void testFileStream() throws Exception {
+    // Disable manual clock as FileInputDStream does not work with manual clock
+    System.setProperty("spark.streaming.clock", "org.apache.spark.streaming.util.SystemClock");
+    ssc = new JavaStreamingContext("local[2]", "test", new Duration(1000));
+    ssc.checkpoint("checkpoint");
+    // Set up some sequence files for streaming to read in
+    List<Tuple2<Long, Integer>> test_input = new ArrayList<Tuple2<Long, Integer>>();
+    test_input.add(new Tuple2(1L, 123456));
+    test_input.add(new Tuple2(2L, 123456));
+    JavaPairRDD<Long, Integer> rdd = ssc.sc().parallelizePairs(test_input);
+    File tempDir = Files.createTempDir();
+    JavaPairRDD<LongWritable, IntWritable> saveable = rdd.mapToPair(
+      new PairFunction<Tuple2<Long, Integer>, LongWritable, IntWritable>() {
+        public Tuple2<LongWritable, IntWritable> call(Tuple2<Long, Integer> record) {
+          return new Tuple2(new LongWritable(record._1), new IntWritable(record._2));
+        }});
+    saveable.saveAsNewAPIHadoopFile(tempDir.getAbsolutePath() + "/1/",
+      LongWritable.class, IntWritable.class, SequenceFileOutputFormat.class);
+    saveable.saveAsNewAPIHadoopFile(tempDir.getAbsolutePath() + "/2/",
+      LongWritable.class, IntWritable.class, SequenceFileOutputFormat.class);
+
+    // Construct a file stream from the above saved data
+    JavaPairDStream<LongWritable, IntWritable> testRaw = ssc.fileStream(
+      tempDir.getAbsolutePath() + "/", SequenceFileInputFormat.class, LongWritable.class,
+      IntWritable.class, false);
+    JavaPairDStream<Long, Integer> test = testRaw.mapToPair(
+      new PairFunction<Tuple2<LongWritable, IntWritable>, Long, Integer>() {
+        public Tuple2<Long, Integer> call(Tuple2<LongWritable, IntWritable> input) {
+          return new Tuple2(input._1().get(), input._2().get());
+        }
+      });
+    final Accumulator<Integer> elem = ssc.sc().intAccumulator(0);
--- End diff -- Why is it not possible to just call rdd.count() and add up the counts in a global counter?
[GitHub] spark pull request: Spark 3754 spark streaming file system api cal...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2703#discussion_r22290126 --- Diff: streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
@@ -1703,6 +1710,65 @@ (same `testFileStream` hunk as in the previous comment, continuing from the accumulator declarations:)
+    final Accumulator<Integer> elem = ssc.sc().intAccumulator(0);
+    final Accumulator<Integer> total = ssc.sc().intAccumulator(0);
+    final Accumulator<Integer> calls = ssc.sc().intAccumulator(0);
+    test.foreachRDD(new Function<JavaPairRDD<Long, Integer>, Void>() {
+      public Void call(JavaPairRDD<Long, Integer> rdd) {
+        rdd.foreach(new VoidFunction<Tuple2<Long, Integer>>() {
+          public void call(Tuple2<Long, Integer> e) {
+            if (e._1() == 1l) {
+              elem.add(1);
+            }
+            total.add(1);
+          }
+        });
+        calls.add(1);
+        return null;
+      }
+    });
+    ssc.start();
+    Thread.sleep(5000);
--- End diff -- Could you make this something like an [`eventually`](http://doc.scalatest.org/1.8/org/scalatest/concurrent/Eventually.html) block in ScalaTest?
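The `eventually`-style check suggested here can be approximated in plain Java. The sketch below is hypothetical (the class and helper names are made up, and ScalaTest's real `eventually` also retries on exceptions and takes configurable spans); the point is to poll the condition against a deadline instead of sleeping a fixed 5 seconds.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class EventuallyDemo {
    /** Polls `condition` every `intervalMs` until it holds or `timeoutMs` elapses. */
    static boolean eventually(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true;
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulate an asynchronous job that bumps the counter shortly after start,
        // standing in for the accumulators updated by foreachRDD in the test above.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            calls.incrementAndGet();
        }).start();
        boolean ok = eventually(() -> calls.get() >= 1, 5000, 50);
        System.out.println(ok ? "condition met" : "timed out");
    }
}
```

The test then finishes as soon as the data arrives, and fails with a clear timeout rather than passing or failing depending on scheduler timing.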
[GitHub] spark pull request: [SPARK-4974]: Prevent Circular dependency.
GitHub user matt2000 opened a pull request: https://github.com/apache/spark/pull/3813 [SPARK-4974]: Prevent Circular dependency. You can merge this pull request into a Git repository by running: $ git pull https://github.com/matt2000/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3813 commit 9e58c96e0904aa214ac5669172b475cfedb65159 Author: Matt Chapman m...@ninjitsuweb.com Date: 2014-12-27T00:50:25Z [SPARK-4974]: Prevent Circular dependency.
[GitHub] spark pull request: [SPARK-4974]: Prevent Circular dependency.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3813#issuecomment-68164558 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68164728 @nchammas I spoke too soon earlier regarding it correctly handling relative paths. I fixed it and it is now `pwd`-preserving. @pwendell I also fixed the improper quoting issue in `sbt/sbt`.
[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/3559#issuecomment-68165117 @andrewor14 @JoshRosen wondering what should be done with this issue; any thoughts on my comments above?
[GitHub] spark pull request: [SPARK-4974]: Prevent Circular dependency.
Github user matt2000 commented on the pull request: https://github.com/apache/spark/pull/3813#issuecomment-68165298 This is not the right fix. Still working on it...
[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...
Github user tigerquoll commented on the pull request: https://github.com/apache/spark/pull/3426#issuecomment-68165724 Hey @JoshRosen @sryza, should this patch include a serialVersionUID attribute on the classes to be serialized, to make sure compiler quirks don't cause different UIDs to be generated for the classes?
[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3426#issuecomment-68166190 Aren't default serialVersionUIDs generated in a consistent way across all JVMs because the algorithm for generating them is part of the Java spec?
[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3426#issuecomment-68166194 Err, across all compilers?
[GitHub] spark pull request: [SPARK-4694]Fix HiveThriftServer2 cann't stop ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/3576#issuecomment-68166365 Hi @marmbrus @vanzin, this problem also affects branch-1.2. Should we fix it in branch-1.2 as well?
[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...
Github user tigerquoll commented on the pull request: https://github.com/apache/spark/pull/3426#issuecomment-68166415 http://stackoverflow.com/questions/285793/what-is-a-serialversionuid-and-why-should-i-use-it seems to be a good summary of the pros and cons of this approach
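The suggestion above (pinning a serialVersionUID so that recompilation or compiler differences cannot change a class's stream identity) looks like the following in plain Java. This is a hedged sketch: the class and field names are hypothetical stand-ins, not the actual `SparkJobInfo`/`SparkStageInfo` definitions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JobInfoDemo {
    // Illustrative stand-in for a serializable status class.
    static class JobInfo implements Serializable {
        // Pinned explicitly: adding methods or reordering members in a later
        // build no longer changes the stream class identity.
        private static final long serialVersionUID = 1L;
        final int jobId;
        JobInfo(int jobId) { this.jobId = jobId; }
    }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Round-trip through Java serialization.
        JobInfo restored = (JobInfo) fromBytes(toBytes(new JobInfo(42)));
        System.out.println("jobId=" + restored.jobId);
    }
}
```

Without the pinned UID, a deserializing JVM whose compiled class computed a different default UID would throw `InvalidClassException` even though the data layout is compatible.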
[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68169507 Actually I'd highly suggest breaking this PR into at least two self-contained PRs, which would be much easier to review and merge. Rule sets 1 and 4 can be merged into one PR, rule sets 2 and 3 into another. Maybe we can remove rules 2 and 3 from this PR after your refactoring and get rule sets 1 and 4 merged first (I realized #3784 doesn't cover all rules in set 4, because the second rule in set 4 doesn't help optimize Cartesian products). The reason I'm hesitant to include rule sets 2 and 3 is that, for now, I don't see a sound yet concise implementation without introducing extra dependencies. Although I proposed the Spark `Interval` solution, I'd rather not introduce Spire. On the other hand, rule sets 1 and 4 have been proven to be both useful and easy to implement.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68169549 Hey @scwf, I've posted my reply in #3778, so let's discuss these rules there to prevent distraction.
[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68169633 I'd like to add that the solution based on Spire's `Interval` I posted above may suffer from floating-point precision issues. Thus we might want to cast all integral comparisons to `Interval[Long]` and all fractional comparisons to `Interval[Double]` to fix it.
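The precision concern behind keeping integral comparisons in `Long` can be shown without Spire at all. A small illustration (the class name is hypothetical): a `double` has only 52 mantissa bits, so it cannot represent every 64-bit integer exactly, and distinct bounds can collapse after a cast.

```java
public class PrecisionDemo {
    public static void main(String[] args) {
        long a = Long.MAX_VALUE;      // 9223372036854775807
        long b = Long.MAX_VALUE - 1;  // distinct as longs...
        System.out.println(a == b);                   // false: exact as longs
        System.out.println((double) a == (double) b); // true: both round to 2^63
    }
}
```

An interval over `double` would therefore treat `[b, b]` and `[a, a]` as the same point, which is exactly the kind of unsoundness an optimizer rule must avoid.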
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-68169801 [Test build #24841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24841/consoleFull) for PR 1290 at commit [`9fb76ba`](https://github.com/apache/spark/commit/9fb76badb0222fbfec6886152477bef76dc2eef8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68170254 Jenkins, test this please. LGTM pending tests.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68170272 [Test build #24842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24842/consoleFull) for PR 3707 at commit [`0e5a0e4`](https://github.com/apache/spark/commit/0e5a0e4345c6d1fe466ac574c961e690de2e9744).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4925][SQL] Publish Spark SQL hive-thrif...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3766#discussion_r22291358

--- Diff: pom.xml ---
@@ -97,6 +97,7 @@
     <module>sql/catalyst</module>
     <module>sql/core</module>
     <module>sql/hive</module>
+    <module>sql/hive-thriftserver</module>

--- End diff --

This should be removed - we only want this enabled with the `-Phive-thriftserver` profile. We always enable that profile when publishing artifacts.
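For reference, a hedged sketch of the alternative pwendell is describing, assuming standard Maven profile syntax: the module is listed inside a profile rather than in the top-level `<modules>`, so it only builds when `-Phive-thriftserver` is passed.

```xml
<!-- Illustrative fragment, not the actual Spark pom.xml: activating the
     module only under the hive-thriftserver profile. -->
<profiles>
  <profile>
    <id>hive-thriftserver</id>
    <modules>
      <module>sql/hive-thriftserver</module>
    </modules>
  </profile>
</profiles>
```

With this shape, `mvn package` skips the module by default, while `mvn -Phive-thriftserver package` (as used when publishing artifacts) includes it.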
[GitHub] spark pull request: [SPARK-4925][SQL] Publish Spark SQL hive-thrif...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3766#issuecomment-68170290 Okay - makes sense. There is one incorrect change in here, but once that's removed we can merge this.
[GitHub] spark pull request: [SPARK-4598] use pagination to show tasktable
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3456#issuecomment-68170532 Let's close this issue. This breaks global pagination, which means it can't be merged.
[GitHub] spark pull request: [SPARK-3148] Update global variables of HttpBr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2059
[GitHub] spark pull request: [SPARK-4598] use pagination to show tasktable
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3456
[GitHub] spark pull request: add some shuffle configurations in doc
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2031
[GitHub] spark pull request: SPARK-2803: add Kafka stream feature in accord...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1602
[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2348
[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3662
[GitHub] spark pull request: [https://issues.apache.org/jira/browse/SPARK-4...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2633
[GitHub] spark pull request: Added support for accessing secured HDFS
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/265
[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3046#issuecomment-68170894 Looks good - I'm going to merge this with a slight modification (adding a comment to explain what's going on).
[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3046
[GitHub] spark pull request: [SPARK-3955] Different versions between jackso...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3716#issuecomment-68171007 Yeah this looks good - thanks!
[GitHub] spark pull request: [SPARK-3955] Different versions between jackso...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3716
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-68171181 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24841/
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-68171178 [Test build #24841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24841/consoleFull) for PR 1290 at commit [`9fb76ba`](https://github.com/apache/spark/commit/9fb76badb0222fbfec6886152477bef76dc2eef8).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas`
  * `class OutputFrame2D(title: String) extends Frame(title)`
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas`
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)`
  * `trait ANNClassifierHelper`