[GitHub] spark pull request: [SPARK-12263][Docs]: IllegalStateException: Me...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/10483#issuecomment-167930375 Thanks for the review @srowen. I didn't have access to my machine since I was traveling. Modified the line to bring it to 97 characters. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2491] Don't handle uncaught exceptions ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/1482#issuecomment-153467686 What is the status of the PR? Seems no movement for a while. @vanzin
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/8385
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8385#issuecomment-145149873 Closing in favor of #8968
[GitHub] spark pull request: [SPARK-9570] [DOCS] Consistent recommendation ...
Github user nssalian commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8968#discussion_r41064621

    --- Diff: docs/submitting-applications.md ---
    @@ -122,21 +123,23 @@ The master URL passed to Spark can be in one of the following formats:
     Master URL  Meaning
    - local  Run Spark locally with one worker thread (i.e. no parallelism at all).
    - local[K]  Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
    - local[*]  Run Spark locally with as many worker threads as logical cores on your machine.
    - spark://HOST:PORT  Connect to the given Spark standalone
    + local  Run Spark locally with one worker thread (i.e. no parallelism at all).
    + local[K]  Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
    + local[*]  Run Spark locally with as many worker threads as logical cores on your machine.
    + spark://HOST:PORT  Connect to the given Spark standalone
     cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
    - mesos://HOST:PORT  Connect to the given Mesos cluster.
    + mesos://HOST:PORT  Connect to the given Mesos cluster.
     The port must be whichever one your is configured to use, which is 5050 by default.
     Or, for a Mesos cluster using ZooKeeper, use mesos://zk://
    - yarn-client  Connect to a YARN cluster in
    -client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
    + yarn  Connect to a YARN cluster in
    +client or cluster mode depending on the value of --deploy-mode.
    +The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
    - yarn-cluster  Connect to a YARN cluster in
    -cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
    + yarn-client  Equivalent to yarn with --deploy-mode client
    --- End diff --

    Shouldn't deploy-mode come first here?
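The equivalence described in the diff above (yarn-client being the same as yarn with --deploy-mode client, and likewise for yarn-cluster per the PR title) can be illustrated with a small sketch. This is not Spark's own parsing code: `normalize_master` and `submit_args` are hypothetical helper names introduced only for illustration, and the only spark-submit options used are `--master` and `--deploy-mode`, which appear in the discussion.

```python
from typing import List, Optional, Tuple

# Legacy master strings mapped to the (master, deploy mode) pair that the
# documentation change describes as equivalent.
LEGACY_YARN_MASTERS = {
    "yarn-client": ("yarn", "client"),
    "yarn-cluster": ("yarn", "cluster"),
}


def normalize_master(master: str, deploy_mode: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: rewrite yarn-client / yarn-cluster as yarn plus a deploy mode."""
    if master in LEGACY_YARN_MASTERS:
        new_master, implied_mode = LEGACY_YARN_MASTERS[master]
        # Reject contradictory input such as yarn-client with deploy mode cluster.
        if deploy_mode is not None and deploy_mode != implied_mode:
            raise ValueError(
                f"{master} implies deploy mode {implied_mode!r}, got {deploy_mode!r}"
            )
        return new_master, implied_mode
    # Any other master URL (local[*], spark://..., mesos://...) passes through unchanged.
    return master, deploy_mode


def submit_args(master: str, deploy_mode: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the leading spark-submit arguments."""
    new_master, mode = normalize_master(master, deploy_mode)
    args = ["./bin/spark-submit", "--master", new_master]
    if mode is not None:
        args += ["--deploy-mode", mode]
    return args


if __name__ == "__main__":
    print(" ".join(submit_args("yarn-cluster")))
    print(" ".join(submit_args("yarn", "client")))
```

Both spellings end up as the same argument list, which is the consistency the documentation change is after.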
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8385#issuecomment-144723093 @srowen please go ahead. Thank you.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8385#issuecomment-141837563 That's pretty much all I could find. The rest seem to be code pointing to the option of using yarn-cluster, yarn-client and how Spark parses them. Please let me know if I have missed anything. I went through the code looking for --master, yarn-client, yarn-cluster and deploy
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8385#issuecomment-141700189 Corrected the nits. Will search the whole project for other places that I have missed.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8385#issuecomment-141679396 @srowen thank you for the note. Haven't had a chance these past few weeks. Should get it done today.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8385#issuecomment-138324428 @srowen Thank you for the note. Will get it done asap.
[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/8054
[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8054#issuecomment-137207678 Closing the PR.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8385#issuecomment-136873693 @andrewor14 will address them. Have not had a chance to do this. Will do it by the end of this week.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
GitHub user nssalian opened a pull request:

    https://github.com/apache/spark/pull/8385

    [SPARK-9570][Docs][YARN]Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x

    Issue link: https://issues.apache.org/jira/browse/SPARK-9570

    Changes made:
    1) Added the deploy-mode syntax in favor of yarn-cluster method of submission

    Requesting review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nssalian/spark SPARK-9570

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8385

commit fa93415e860cce590c3392079e93d3ae21ffc83c
Author: Neelesh Srinivas Salian
Date:   2015-08-09T02:51:52Z
    Added yarn-deploy-mode alternative

commit 437a4d451147f179617628a672eaa795b3b76ea0
Author: Neelesh Srinivas Salian
Date:   2015-08-09T02:54:04Z
    Moved Master URLs closer above before the examples

commit 05fe708c24f07f9661a558dfbe51970aa940e4e5
Author: Neelesh Srinivas Salian
Date:   2015-08-10T17:14:59Z
    Removed the addition section

commit 98624e89c6b303db4fc30408e14705df021ca591
Author: Neelesh Srinivas Salian
Date:   2015-08-10T17:16:14Z
    Added a section for alternative submission. Distinguished from the shifting of Master URLS

commit b8fdd5cd1b11dd7954d1f05bb71b1a2ae740d065
Author: Neelesh Srinivas Salian
Date:   2015-08-12T01:43:10Z
    Added section for preferred yarn and kept the one with deploy-mode for generic submission to help clear up confusion

commit 8c65676a6b7a692d07face111d8e998f36ca0151
Author: Neelesh Srinivas Salian
Date:   2015-08-12T01:44:36Z
    Moved the Standalone examples together

commit 8a331d0444f58d3c14c1c12c4f087f1a02d5b8d1
Author: Neelesh Srinivas Salian
Date:   2015-08-12T21:19:58Z
    Moved Master URLs

commit 0fed23b8dc525f62197d1cd332260a0752d7d35c
Author: Neelesh Srinivas Salian
Date:   2015-08-13T23:12:06Z
    Added deploy-mode section to YARN submission

commit 670d251db01306ecc6029abaf6fc7d0e7c30dc3f
Author: Neelesh Srinivas Salian
Date:   2015-08-09T02:51:52Z
    Added yarn-deploy-mode alternative

commit 40d3b80012f2db351446f8f9d6049f8a9f00bf2b
Author: Neelesh Srinivas Salian
Date:   2015-08-09T02:54:04Z
    Moved Master URLs closer above before the examples

commit 89d15bf63741e3c62017586df35508a6bde821c2
Author: Neelesh Srinivas Salian
Date:   2015-08-10T17:14:59Z
    Removed the addition section

commit d2c212aa6e3a4537c0a4a7ad49e83412e47e60e7
Author: Neelesh Srinivas Salian
Date:   2015-08-10T17:16:14Z
    Added a section for alternative submission. Distinguished from the shifting of Master URLS

commit 3f25500b5d39b2d6b247a8dca8147c8fd140c7c0
Author: Neelesh Srinivas Salian
Date:   2015-08-12T01:43:10Z
    Added section for preferred yarn and kept the one with deploy-mode for generic submission to help clear up confusion

commit 0766da66ccf16ab55c80614776c1f5a7a1877253
Author: Neelesh Srinivas Salian
Date:   2015-08-12T01:44:36Z
    Moved the Standalone examples together

commit 46a24d55ffe99431885b57fda50938289a0ed91b
Author: Neelesh Srinivas Salian
Date:   2015-08-12T21:19:58Z
    Moved Master URLs

commit 91758072dbc954e2c31609dcd2b6232a09fbfdb3
Author: Neelesh Srinivas Salian
Date:   2015-08-13T23:12:06Z
    Added deploy-mode section to YARN submission

commit 3052c741f13f9c3c842ce7fa20819bf73043e326
Author: Neelesh Srinivas Salian
Date:   2015-08-23T14:32:07Z
    Merge branch 'SPARK-9570' of https://github.com/nssalian/spark into SPARK-9570

commit c91073ef5ab7fa2e5a8cada89983422960b24a1a
Author: Neelesh Srinivas Salian
Date:   2015-08-23T15:11:55Z
    Modified Running on YARN doc

commit 3dc79e2d24a76abd32779d09a044240e808ed9fc
Author: Neelesh Srinivas Salian
Date:   2015-08-23T21:21:33Z
    Modified submitting applications

commit 67a4255f94e828fcfffc6039ddc4872acc2d717d
Author: Neelesh Srinivas Salian
Date:   2015-08-23T21:44:26Z
    Removed extra YARN section, there is already a running without --deploy example

commit a8b67efb6a8bc28b69a87b4158156b1517e1475d
Author: Neelesh Srinivas Salian
Date:   2015-08-24T00:14:45Z
    Added --deploy-mode flags to the yarn submission sections
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/8071
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8071#issuecomment-133972215 Creating a new PR.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8071#issuecomment-132668558 @srowen, @sryza , @tgravescs thank you for the feedback. I will get this done this weekend.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8071#issuecomment-131176774 @tgravescs, I am not too sure whether to stick with consistency or history or both. If @sryza can weigh in, we can reach a good understanding of where to proceed on this. The goal is to help a user and reduce the confusion (if any) in the submission methods for YARN.
[GitHub] spark pull request: [SPARK-9923][Core]: ShuffleMapStage.numAvailab...
GitHub user nssalian opened a pull request:

    https://github.com/apache/spark/pull/8183

    [SPARK-9923][Core]: ShuffleMapStage.numAvailableOutputs should be an Int instead of Long

    Modified type of ShuffleMapStage.numAvailableOutputs from Long to Int

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nssalian/spark SPARK-9923

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8183

commit 175704f28ee0ff1029426aa17ce059a21d3771cb
Author: Neelesh Srinivas Salian
Date:   2015-08-14T00:28:23Z
    SPARK-9923: Modified type of ShuffleMapStage.numAvailableOutputs from Long to Int
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8071#issuecomment-130873054 So, the consensus is to have `yarn-client` and `yarn-cluster` with the deploy-mode as alternative. I agree with @srowen, more places in the code have `yarn-client` and `yarn-cluster`. So the change is in the submitting applications doc wrt YARN and the Running on YARN doc wrt deploy-mode. I'll change accordingly and update the PR soon.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on the pull request:

    https://github.com/apache/spark/pull/8071#issuecomment-130133320

    1) Running on YARN (which was fixed in #6924) simply has yarn-client and yarn-cluster for master and does not have `--deploy-mode` in the page.
    2) For Standalone (https://spark.apache.org/docs/latest/spark-standalone.html), there is no such conflict in syntax, just the explanation of the "deployment modes of Spark".
    3) For Submitting applications (https://spark.apache.org/docs/latest/submitting-applications.html), both the masters `yarn-client` and `yarn-cluster` and `--deploy-mode` exist, since it is a holistic document for submission and includes local, spark, mesos and yarn. But `--deploy-mode` appears only in the examples that illustrate `supervise`; the rest just point to the master URLs.

    @tgravescs, @srowen, the latest commits should help alleviate any confusion.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8071#discussion_r36682426

    --- Diff: docs/submitting-applications.md ---
    @@ -48,6 +48,44 @@ Some of the commonly used options are:
     * `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
     * `application-arguments`: Arguments passed to the main method of your main class, if any
    +Alternatively, for submitting on yarn,
    +
    +{% highlight bash %}
    +./bin/spark-submit \
    +  --class
    +  --master
    +  --conf = \
    +  ... # other options
    +  \
    +  [application-arguments]
    +{% endhighlight %}
    +
    +* `--master`: The --master parameter is either `yarn-client` or `yarn-cluster`. Defaults to `yarn-client`
    --- End diff --

    Will grep the docs to see the more popular approach to submission amongst the two. Then align the docs to have that approach as a first recommendation and throw the latter as an alternative. The goal is to have a consistent method overall. Any suggestions?
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
Github user nssalian commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8071#discussion_r36661174

    --- Diff: docs/submitting-applications.md ---
    @@ -48,6 +48,44 @@ Some of the commonly used options are:
     * `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
     * `application-arguments`: Arguments passed to the main method of your main class, if any
    +Alternatively, for submitting on yarn,
    +
    +{% highlight bash %}
    +./bin/spark-submit \
    +  --class
    +  --master
    +  --conf = \
    +  ... # other options
    +  \
    +  [application-arguments]
    +{% endhighlight %}
    +
    +* `--master`: The --master parameter is either `yarn-client` or `yarn-cluster`. Defaults to `yarn-client`
    --- End diff --

    @srowen and @tgravescs, users are still confused about the "recommended/preferred" method of submission. Not sure if it is necessary to have a single method or to keep both ways. I can modify the PR accordingly.
[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...
GitHub user nssalian opened a pull request:

    https://github.com/apache/spark/pull/8071

    [SPARK-9570][Docs][YARN]Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x

    Issue link: https://issues.apache.org/jira/browse/SPARK-9570

    Changes made:
    1) Added the alternative to job submission to avoid the confusion
    2) Moved the Master URLs section closer to the options prior to the examples

    Requesting review. Is there any other place in the documentation that could confuse the user? We need to keep the submission method consistent, or at least clarify all the submission methods in the documentation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nssalian/spark SPARK-9570

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8071

commit fa93415e860cce590c3392079e93d3ae21ffc83c
Author: Neelesh Srinivas Salian
Date:   2015-08-09T02:51:52Z
    Added yarn-deploy-mode alternative

commit 437a4d451147f179617628a672eaa795b3b76ea0
Author: Neelesh Srinivas Salian
Date:   2015-08-09T02:54:04Z
    Moved Master URLs closer above before the examples

commit 05fe708c24f07f9661a558dfbe51970aa940e4e5
Author: Neelesh Srinivas Salian
Date:   2015-08-10T17:14:59Z
    Removed the addition section

commit 98624e89c6b303db4fc30408e14705df021ca591
Author: Neelesh Srinivas Salian
Date:   2015-08-10T17:16:14Z
    Added a section for alternative submission. Distinguished from the shifting of Master URLS
[GitHub] spark pull request: [SparkSPARK-9340] - make SparkSQL work with ne...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8032#issuecomment-129243322 Please close this PR in favor of #8063. Thank you.
[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8054#issuecomment-129087486 Jenkins, slow test please
[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark
GitHub user nssalian opened a pull request:

    https://github.com/apache/spark/pull/8054

    [SPARK-4449][Core]Specify port range in spark

    Specify port range in spark

    JIRA link: https://issues.apache.org/jira/browse/SPARK-4449

    Goal: To add a port range to services

    Design (based on the input and the suggestions in #3314 and #5722):
    1) Added variables maxPort and failedPorts to help the implementation.
    2) maxPort is explicitly assigned to be startPort + maxRetries, so that the number of retries cannot be smaller or larger than the specified port range.
    3) Added the failedPorts ArrayBuffer to record the ports that failed during retry and that fall within the range maxPort - startPort (see Random logic).
    4) The failedPorts list is checked so that tryPort does not attempt those ports again in the random selection.
    5) If the randomized port does not belong to the failedPorts list and a privileged port, it will be tried.
    6) There's an if block to check whether there are sufficient ports left to attempt within the range (not sure if this is needed).

    Requesting review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nssalian/spark SPARK-4449

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8054.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8054

commit 2450fefe49f0cd8a109b556e5b4efe4c3bf7d9fa
Author: Neelesh Srinivas Salian
Date:   2015-08-08T21:10:36Z
    cleanUp unused imports and random import

commit dcd512627c48911228ed0f3649d486fdaa0b1ce8
Author: Neelesh Srinivas Salian
Date:   2015-08-08T22:51:18Z
    Initialized port

commit cf4af1a76ddbda7e9f83a64d1d2c323ba6eeb82a
Author: Neelesh Srinivas Salian
Date:   2015-08-09T01:12:32Z
    Added logic for port range

commit 52326701bef8e22ab70a6c050f210c71de234fd7
Author: Neelesh Srinivas Salian
Date:   2015-08-09T01:49:21Z
    Modified logic to include privileged ports
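The design steps listed in this pull request can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the function name `find_port_in_range`, the `try_bind` callback, and the `allow_privileged` flag are all hypothetical. The description is ambiguous about privileged ports, so the sketch assumes ports below 1024 are skipped unless the caller opts in.

```python
import random
from typing import Callable, Optional, Set

# Assumption: ports below this number count as privileged.
PRIVILEGED_PORT_LIMIT = 1024


def find_port_in_range(
    start_port: int,
    max_retries: int,
    try_bind: Callable[[int], bool],
    rng: Optional[random.Random] = None,
    allow_privileged: bool = False,
) -> Optional[int]:
    """Pick random ports in [start_port, start_port + max_retries] until one binds.

    try_bind is a caller-supplied function that returns True when the port
    could be bound. Returns the bound port, or None if the range is exhausted.
    """
    rng = rng or random.Random()
    max_port = start_port + max_retries  # step 2: range derived from retries
    failed_ports: Set[int] = set()       # step 3: ports that already failed

    candidates = [
        port
        for port in range(start_port, max_port + 1)
        if allow_privileged or port >= PRIVILEGED_PORT_LIMIT
    ]

    # Step 6: keep going only while untried candidates remain in the range.
    while len(failed_ports) < len(candidates):
        # Steps 4 and 5: choose randomly among ports that have not failed yet.
        remaining = [port for port in candidates if port not in failed_ports]
        port = rng.choice(remaining)
        if try_bind(port):
            return port
        failed_ports.add(port)
    return None


if __name__ == "__main__":
    busy = {5000, 5001, 5002}
    print(find_port_in_range(5000, 3, lambda port: port not in busy))
```

Passing the bind attempt in as a callback keeps the sketch free of real sockets, so the selection logic can be exercised deterministically.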
[GitHub] spark pull request: [SparkSPARK-9340] - make SparkSQL work with ne...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/8032#issuecomment-128773449 @dguy, please do the PR against the Master branch. Thank you.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/7362
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7362#issuecomment-121398427 Closing. Thank you.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7362#issuecomment-121072720 @tdas Removed it. Thank you for the review.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7362#discussion_r34516213

    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -299,6 +302,26 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo
           Thread.sleep(100)
         }
    +  test ("registering and de-registering of streamingSource") {
    +    val conf = new SparkConf().setMaster(master).setAppName(appName)
    +    ssc = new StreamingContext(conf, batchDuration)
    +    assert(ssc.getState() === StreamingContextState.INITIALIZED)
    +    addInputStream(ssc).register()
    +    ssc.start()
    +
    +    val sources = StreamingContextSuite.getSources(ssc.env.metricsSystem)
    +    val streamingSource = StreamingContextSuite.getStreamingSource(ssc)
    +    assert(sources.contains(streamingSource))
    +    assert(ssc.getState() === StreamingContextState.ACTIVE)
    +    Thread.sleep(100)
    --- End diff --

Removed it. Added it during my runs. Thanks.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7362#issuecomment-121057498 @tdas does this PR need more improvement?
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7362#issuecomment-120933483 Added the changes and ran ./dev/scalastyle and ~test-only *StreamingContextSuite. Both passed.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7362#discussion_r34456310

    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -33,8 +33,12 @@ import org.apache.spark.storage.StorageLevel
     import org.apache.spark.streaming.dstream.DStream
     import org.apache.spark.streaming.receiver.Receiver
     import org.apache.spark.util.Utils
    -import org.apache.spark.{Logging, SparkConf, SparkContext, SparkException, SparkFunSuite}
    -
    +import org.apache.spark.{Logging, SparkConf, SparkContext, SparkFunSuite}
    --- End diff --

Made the change in the next commit.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7362#discussion_r34455972

    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -192,11 +192,8 @@ class StreamingContext private[streaming] (
           None
         }
    -  /** Register streaming source to metrics system */
    +  /* Initializing a streamingSource to register metrics */
    --- End diff --

It previously held the block of code that did the registration. Here it simply initializes the streamingSource.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7362#issuecomment-120781935 @tdas and @srowen, the new PR for the JIRA. Thank you.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120781877 Closing this PR: for #7362
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/7250
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/7362

    [SPARK-8743] [Streaming]: Deregister Codahale metrics for streaming when StreamingContext is closed

    The issue link: https://issues.apache.org/jira/browse/SPARK-8743
    Deregister Codahale metrics for streaming when StreamingContext is closed

    Design:
    Adding the method calls in the appropriate start() and stop() methods for the StreamingContext

    Actions in the PullRequest:
    1) Added the registerSource method call to the start method for the Streaming Context.
    2) Added the removeSource method to the stop method.
    3) Added comments for both 1 and 2 and comment to show initialization of the StreamingSource
    4) Added a test case to check for both registration and de-registration of metrics

    Previous closed PR for reference: https://github.com/apache/spark/pull/7250

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nssalian/spark branch-SPARK-8743

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7362.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7362

commit d8cb577b24f42a0509ee3a0fffb09181abf4137e
Author: Neelesh Srinivas Salian
Date: 2015-07-13T01:38:36Z

    Added registerSource to start() and removeSource to stop(). Wrote a test to check the registration and de-registration
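Editorial sketch of the design described in this pull request: a source is created up front, registered when the context starts, and removed when it stops. Everything below is a hypothetical stand-in written for illustration (`DemoSource`, `MiniMetricsSystem`, and `MiniStreamingContext` are assumed names, not Spark's real classes); only the method names `registerSource` and `removeSource` come from the thread.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a metrics source; not Spark's Source trait.
trait DemoSource { def sourceName: String }

// Hypothetical stand-in for a metrics system that tracks registered sources.
class MiniMetricsSystem {
  private val sources = ArrayBuffer.empty[DemoSource]
  def registerSource(s: DemoSource): Unit = { sources += s }
  def removeSource(s: DemoSource): Unit = { sources -= s }
  def isRegistered(s: DemoSource): Boolean = sources.contains(s)
}

// Hypothetical context: the source is initialized at construction time,
// registered in start(), and removed again in stop().
class MiniStreamingContext(metrics: MiniMetricsSystem) {
  val streamingSource: DemoSource = new DemoSource { val sourceName = "streaming" }

  def start(): Unit = metrics.registerSource(streamingSource)
  def stop(): Unit = metrics.removeSource(streamingSource)
}
```

A test in the spirit of item 4 would check that the source is absent before start(), present after start(), and absent again after stop().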
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120776405 @tdas @srowen, shall I create a fresh PR to avoid any confusion? I can reference this one.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/7250
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120774231 Right. The negation took care of the failure above and I fixed the assertion error as well. Removed the spacing for all of them. Made a unified commit. 299a57d Ignore the last one, I'll revert that.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
GitHub user nssalian reopened a pull request: https://github.com/apache/spark/pull/7250

    [SPARK-8743] [Streaming]: Deregister Codahale metrics for streaming when StreamingContext is closed

    The issue link: https://issues.apache.org/jira/browse/SPARK-8743
    Deregister Codahale metrics for streaming when StreamingContext is closed

    Design:
    Adding the method calls in the appropriate start() and stop() methods for the StreamingContext

    Actions in the PullRequest:
    1) Added the registerSource method call to the start method for the Streaming Context.
    2) Added the removeSource method to the stop method.
    3) Added comments for both 1 and 2 and comment to show initialization of the StreamingSource

    Requesting Review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nssalian/spark SPARK-8743

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7250.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7250

commit 92fa04b16cad1e945ab8a4d1be752f08a241e922
Author: Neelesh Srinivas Salian
Date: 2015-07-07T03:16:51Z

    SPARK-8743: Added the registerSource method call to the start method for the Streaming Context. Added the removeSource method to the stop method. Added comments for both

commit a665965a40c7b49cc13d8ab38f8da194693d9845
Author: Neelesh Srinivas Salian
Date: 2015-07-07T07:23:20Z

    Added // instead of /** for commenting in code

commit 7621adf368112056eb2e137f62adc429851fa570
Author: Neelesh Srinivas Salian
Date: 2015-07-07T10:36:35Z

    Added indentation and Space at the comment on line 578; Registering..

commit 18bcc7e164b46ecd969a266579f4349444373a0c
Author: Neelesh Srinivas Salian
Date: 2015-07-07T23:20:29Z

    Added test case for de-register metrics and made a change to the scope of the sources ArrayBuffer

commit e4f00d7577dec22278f5b6243791c4c33e0eb373
Author: Neelesh Srinivas Salian
Date: 2015-07-08T18:56:30Z

    Added additional variable to check the updated Sources size value to compare with the original size after removal

commit f5e47e0725adee6792081a8a4ed765ede1759e9c
Author: Neelesh Srinivas Salian
Date: 2015-07-09T01:02:33Z

    Added the removeSource method in try

commit d04fd2a2b2e4433d49c0860c4d4028564081db31
Author: Neelesh Srinivas Salian
Date: 2015-07-09T16:23:17Z

    Removed the assert for the env field, added the registerSource line in the INITIALIZED block and kept the removeSource() in the ACTIVE block

commit e2c3bf82c226e38282a9a17b80771b58dcc6cc55
Author: Neelesh Srinivas Salian
Date: 2015-07-09T22:02:50Z

    Added test to check registering and de-registering of streamingSource

commit 742398c334c71bcd1b2b702a9abc5e4ab1288d9e
Author: Neelesh Srinivas Salian
Date: 2015-07-09T22:08:55Z

    Removed unused imports

commit ca081fa3effd0303d04a034af5c5a0e8facd3b2d
Author: Neelesh Srinivas Salian
Date: 2015-07-10T02:33:38Z

    Moved the registerSource() call before line 601

commit 33a2091a4984b8e29143ab2e1202751a87e838b3
Author: Neelesh Srinivas Salian
Date: 2015-07-10T21:31:53Z

    Changed scope of sources and corrected comments for helper

commit a67918cb9732936ea84427f728086985f7319e3a
Author: Neelesh Srinivas Salian
Date: 2015-07-10T21:41:16Z

    Removed extra line in Helper Methods section

commit 74598cec17a6ec54a64f3fc0f8c336d6ba19cc1e
Author: Neelesh Srinivas Salian
Date: 2015-07-11T02:16:18Z

    Added helper method for private methods and changed the test logic to check for Sources containing or not containing StreamingSource

commit e37a2f3cc3364f6819205b5eac39d0603eb91ac5
Author: Neelesh Srinivas Salian
Date: 2015-07-12T14:38:09Z

    Changed import statements to remove unnecessary imports and add specific imports

commit f54afcf78819ad30a59318164515457b47c31d7d
Author: Neelesh Srinivas Salian
Date: 2015-07-12T14:43:29Z

    Removed types for fields in test for registering and deregistering metrics

commit ea0dc1a74848af8682bb8fc1e0f03b5261591f6e
Author: Neelesh Srinivas Salian
Date: 2015-07-12T22:38:00Z

    Changed imports statements, negated test statement and removed postfix

commit a0f1950937c36d7f568c5be545cf72ba5afe36ee
Author: Neelesh Srinivas Salian
Date: 2015-07-12T22:43:39Z

    Removed added comment to Assert for INITIALIZED state

commit 2a812878643a1e97e0113c7be7a195ae3c740b48
Author: Neelesh Srinivas Salian
Date: 2015-07-12T22:49:13Z

    Removing the INITIALIZED check since after start() the state moves to ACTIVE and this check fails

commit 5d3af311abe18e0db8cecb36141432af07d3afcb
Author: Neelesh Srinivas Salian
Date: 2015-07-12T23:03:34Z

    Move the INITIALIZED state check to when the ssc is initialized

commit 299a57d0b909b2be968f17723736c66c0e61fdcd
Author: Neelesh Srinivas Salian
Date: 2015
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120774219 Right. The negation took care of the failure above and I fixed the assertion error as well. Removed the spacing for all of them. Made a unified commit. 299a57d Ignore the last one, I'll revert that.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120770889 The test failed on assert(ssc.getState() === StreamingContextState.INITIALIZED) because, once start() has been called, the state moves to ACTIVE and no longer matches INITIALIZED. @srowen, I've added the changes as you suggested. Thanks.
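Editorial sketch of why the assertion order matters here. The state holder below is a hypothetical, minimal model (`LifecycleState`, `TinyContext`, and `StateCheckDemo` are assumed names, not Spark code); it only illustrates that a check for the initial state has to run before start(), since afterwards only an active-state check can pass.

```scala
// Hypothetical, minimal lifecycle model; the names echo the thread but this is not Spark code.
sealed trait LifecycleState
case object Initialized extends LifecycleState
case object Active extends LifecycleState
case object Stopped extends LifecycleState

class TinyContext {
  private var state: LifecycleState = Initialized
  def getState: LifecycleState = state
  def start(): Unit = { if (state == Initialized) state = Active }
  def stop(): Unit = { state = Stopped }
}

object StateCheckDemo {
  // Returns the states observed right after construction and right after start().
  def observe(): (LifecycleState, LifecycleState) = {
    val ctx = new TinyContext
    val beforeStart = ctx.getState // an "initialized" assertion belongs here
    ctx.start()
    val afterStart = ctx.getState  // from here on, only an "active" assertion can pass
    ctx.stop()
    (beforeStart, afterStart)
  }
}
```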
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120748280 Hitting this error on the test for StreamingContextSuite. Is the streamingSource not being found in the right ArrayBuffer? I tried different variations of the registrations to try to alleviate this. Didn't help. @tdas any suggestions?

Ran this:

    build/sbt -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pscala-2.10 project streaming ~test-only *StreamingContextSuite

Test failure message:

    ArrayBuffer(org.apache.spark.scheduler.DAGSchedulerSource@1473d83a, org.apache.spark.storage.BlockManagerSource@7560392b) did not contain org.apache.spark.streaming.StreamingSource@1c71ddd7 (StreamingContextSuite.scala:322)
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120668033 @tdas, I added the new test as per PrivateMethodTester.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34401188

    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -297,6 +299,23 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo
           Thread.sleep(100)
         }
    +  test("registering and de-registering of streamingSource") {
    +    val conf = new SparkConf().setMaster(master).setAppName(appName)
    +    ssc = new StreamingContext(conf, batchDuration)
    +    addInputStream(ssc).register()
    +
    +    ssc.start()
    +    assert(ssc.getState() === StreamingContextState.INITIALIZED)
    +    assert(StreamingContextSuite.sources.get(StreamingContextSuite.streamingSource)!= "null")
    --- End diff --

Makes sense. I'll write that up. Was having problems when I initially wrote as it was in ExecutionManagerSuite. I'll figure out something. Thank you.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34399522

    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -297,6 +299,23 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo
           Thread.sleep(100)
         }
    +  test("registering and de-registering of streamingSource") {
    +    val conf = new SparkConf().setMaster(master).setAppName(appName)
    +    ssc = new StreamingContext(conf, batchDuration)
    +    addInputStream(ssc).register()
    +
    +    ssc.start()
    +    assert(ssc.getState() === StreamingContextState.INITIALIZED)
    +    assert(StreamingContextSuite.sources.get(StreamingContextSuite.streamingSource)!= "null")
    --- End diff --

Checking to see if the source is not returning a null and is actually present in the sources ArrayBuffer.
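Editorial sketch on the last assertion in this diff. Comparing a value against the string literal "null" is not a null check: an object that is not that exact string always compares unequal, so such an assertion passes whether or not the source is present. A membership check with `contains` expresses the stated intent more directly. The names below (`CheckedSource`, `ContainsCheck`) are hypothetical and exist only for this illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical source type for illustration only.
class CheckedSource(val name: String)

object ContainsCheck {
  // True for any value that is not the text "null", including a null reference,
  // so it says nothing about whether a source was registered.
  def comparesToStringNull(candidate: AnyRef): Boolean = candidate != "null"

  // Membership test on the buffer. CheckedSource does not override equals,
  // so this matches only the very same instance that was added.
  def isPresent(sources: ArrayBuffer[CheckedSource], s: CheckedSource): Boolean =
    sources.contains(s)
}
```

Note that, under this sketch's assumptions, a second instance with the same name is not considered present, which is one reason a test has to look up the exact object that was registered.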
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34398361

    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -796,3 +815,20 @@ package object testPackage extends Assertions {
         }
       }
     }
    +
    +/**
    + * Helper methods for testing StreamingContextSuite.
    + * This includes methods to access private methods and fields in ExecutorAllocationManager.
    --- End diff --

Apologies for missing that. Added in the latest commit with the sources variable and the changing of the comment text.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34397673

    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -796,3 +815,20 @@ package object testPackage extends Assertions {
         }
       }
     }
    +
    +/**
    + * Helper methods for testing StreamingContextSuite.
    + * This includes methods to access private methods and fields in ExecutorAllocationManager.
    --- End diff --

I figured that would be a good placeholder for future methods that may need to be included. Can re-word accordingly.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-120490045 @tdas made the changes as mentioned. Does this PR need anything additional/different? Thank you.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34327159

    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -581,6 +579,9 @@ class StreamingContext private[streaming] (
           case INITIALIZED =>
             startSite.set(DStream.getCreationSite())
             sparkContext.setCallSite(startSite.get)
    +        // Registering Streaming Metrics at the start of the StreamingContext
    +        assert(env.metricsSystem != null)
    +        env.metricsSystem.registerSource(streamingSource)
             StreamingContext.ACTIVATION_LOCK.synchronized {
    --- End diff --

Changed in the latest commit.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34273553

    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
    @@ -297,6 +296,23 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo
           Thread.sleep(100)
         }
    +  test("de-register codahale metrics on stop()") {
    --- End diff --

Thanks for the comments. Will improve the test and add it in the next commit.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34273015

    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -577,6 +575,10 @@ class StreamingContext private[streaming] (
        * @throws IllegalStateException if the StreamingContext is already stopped.
        */
       def start(): Unit = synchronized {
    +    // Registering Streaming Metrics at the start of the StreamingContext
    +    assert(env != null)
    --- End diff --

Done.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34273073

    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -577,6 +575,10 @@ class StreamingContext private[streaming] (
        * @throws IllegalStateException if the StreamingContext is already stopped.
        */
       def start(): Unit = synchronized {
    +    // Registering Streaming Metrics at the start of the StreamingContext
    +    assert(env != null)
    +    assert(env.metricsSystem != null)
    +    env.metricsSystem.registerSource(streamingSource)
    --- End diff --

Done. Added the above comment and this into the latest commit.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34217790

    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
    @@ -674,6 +676,8 @@ class StreamingContext private[streaming] (
             logWarning("StreamingContext has already been stopped")
           case ACTIVE =>
             scheduler.stop(stopGracefully)
    +        // De-registering Streaming Metrics of the StreamingContext
    +        env.metricsSystem.removeSource(streamingSource)
    --- End diff --

The idea was to register at the call of the `start()`. So, based on your comment, that would mean registering the sources after the state is set to INITIALIZED and before. `def start(): Unit = synchronized { // Registering Streaming Metrics at the start of the StreamingContext assert(env != null) assert(env.metricsSystem != null) env.metricsSystem.registerSource(streamingSource)` Makes sense to have it after `INITIALIZED` and before the synchronized block of `ACTIVE` and `STOPPED`. @tdas, could shed more light.
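Editorial sketch of the placement being discussed: registering inside the INITIALIZED branch of start() rather than at the top of the method. One plausible benefit, which is an assumption on this editor's part and is not stated in the thread, is that a repeated start() on an already active context does not register the source twice. The model below is hypothetical (`Phase` and `GatedContext` are assumed names) and is not the real StreamingContext.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of the start()/stop() state handling discussed above.
object Phase extends Enumeration {
  val INITIALIZED, ACTIVE, STOPPED = Value
}

class GatedContext(registered: ArrayBuffer[String]) {
  private var state: Phase.Value = Phase.INITIALIZED
  private val sourceName = "streamingSource"

  def start(): Unit = synchronized {
    state match {
      case Phase.INITIALIZED =>
        // Register only on the INITIALIZED -> ACTIVE transition.
        registered += sourceName
        state = Phase.ACTIVE
      case Phase.ACTIVE =>
        () // already started: do not register a second time
      case _ =>
        throw new IllegalStateException("context has already been stopped")
    }
  }

  def stop(): Unit = synchronized {
    // Only an ACTIVE context has a registered source to remove.
    if (state == Phase.ACTIVE) registered -= sourceName
    state = Phase.STOPPED
  }
}
```

Calling start() twice and then stop() leaves the buffer with one registration followed by none, which is the behaviour the gated placement is meant to illustrate.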
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34215376 --- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala --- @@ -73,7 +73,7 @@ private[spark] class MetricsSystem private ( private[this] val metricsConfig = new MetricsConfig(conf) private val sinks = new mutable.ArrayBuffer[Sink] - private val sources = new mutable.ArrayBuffer[Source] + val sources = new mutable.ArrayBuffer[Source] --- End diff -- Thanks @jerryshao, I will add a similar test.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/7250#discussion_r34215350 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -688,6 +690,8 @@ class StreamingContext private[streaming] ( } finally { // The state should always be Stopped after calling `stop()`, even if we haven't started yet state = STOPPED + // De-registering Streaming Metrics of the StreamingContext + env.metricsSystem.removeSource(streamingSource) --- End diff -- Changed it locally and added it to the latest commit.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119696686 Based on @srowen's comment, I made the change and added updatedSourcesSize to check the ArrayBuffer size after the source is removed, to assert that the size was indeed decremented.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119679355 @srowen, I wanted to check whether the size was decremented at all. Couldn't think of a way to assert that the source has been removed since the sources ArrayBuffer is still private.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119372847 To illustrate, the test case: 1) Starts the StreamingContext and checks its state. 2) Stores the size of the sources ArrayBuffer, which will have a new source added. 3) Sleeps for 100 ms. 4) Stops the context and checks its state. 5) Checks whether the size of the ArrayBuffer decreased as the source was removed. I changed the scope of the sources ArrayBuffer to do this. Would like some feedback on this approach. Thank you.
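The five steps above can be sketched as a small check in Python. This is an illustration of the approach only, not the Scala test in the pull request: `FakeContext` is a hypothetical stand-in for a StreamingContext that adds itself to a shared `sources` list on start and removes itself on stop.

```python
import time


class FakeContext:
    """Hypothetical context that registers itself as a source while active."""

    def __init__(self, sources):
        self.sources = sources
        self.state = "INITIALIZED"

    def start(self):
        self.sources.append(self)
        self.state = "ACTIVE"

    def stop(self):
        self.sources.remove(self)
        self.state = "STOPPED"


def check_source_lifecycle(context, sources, pause_seconds=0.1):
    # 1) Start the context and check its state.
    context.start()
    assert context.state == "ACTIVE"
    # 2) Store the size of the sources list, which now holds the new source.
    size_after_start = len(sources)
    # 3) Sleep briefly (100 ms by default).
    time.sleep(pause_seconds)
    # 4) Stop the context and check its state.
    context.stop()
    assert context.state == "STOPPED"
    # 5) Check that the list shrank by one when the source was removed.
    size_after_stop = len(sources)
    assert size_after_stop == size_after_start - 1
    return size_after_start, size_after_stop
```

A size comparison only shows that some source was removed. Checking membership of the specific source would be a stronger assertion, if the collection is visible to the test.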
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119344429 Will update shortly with the changes. Thank you.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119288642 I should have phrased it better: the tests ran fine.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119287340 nsalian-MBP:spark nsalian$ ./dev/scalastyle Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0 Scalastyle checks passed.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119283309 @srowen, I made the changes and ran a dev test on my repo. The scala errors weren't present.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7250#issuecomment-119103462 @jerryshao thank you for the comment. I made the changes. Please let me know if you think I should add anything else.
[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...
GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/7250 [SPARK-8743] [Streaming]: De-registering Codahale Metrics The issue link: Deregister Codahale metrics for streaming when StreamingContext is closed Design: Adding the method calls in the appropriate start() and stop() methods for the StreamingContext Actions in the pull request: 1) Added the registerSource method call to the start method for the StreamingContext. 2) Added the removeSource method call to the stop method. 3) Added comments for both 1 and 2 and a comment to show initialization of the StreamingSource Requesting Review. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nssalian/spark SPARK-8743 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7250.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7250 commit 92fa04b16cad1e945ab8a4d1be752f08a241e922 Author: Neelesh Srinivas Salian Date: 2015-07-07T03:16:51Z SPARK-8743: Added the registerSource method call to the start method for the Streaming Context. Added the removeSource method to the stop method. Added comments for both
[GitHub] spark pull request: [SPARK-8743] [Streaming] Added call to removeS...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/7249#issuecomment-119056839 TD, thanks for the comments. Makes sense. I will create a new PR for this one. I pulled from upstream during my changes, so a bunch of other commits went in along with my single file. Closing this one.
[GitHub] spark pull request: [SPARK-8743] [Streaming] Added call to removeS...
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/7249
[GitHub] spark pull request: [SPARK-8743] [Streaming] Added call to removeS...
GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/7249 [SPARK-8743] [Streaming] Added call to removeSource to help de-register the streaming metrics You can merge this pull request into a Git repository by running: $ git pull https://github.com/nssalian/spark SPARK-8743 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7249 commit 151c298d97a435b76ccb54a64e2fef21a2ab7285 Author: Neelesh Srinivas Salian Date: 2015-06-21T04:36:32Z SPARK-3629: Improvement of the Spark on YARN document commit 8e8db7fc2c3337ae99cd84043e49eaf919dfed7c Author: Neelesh Srinivas Salian Date: 2015-06-21T17:42:48Z Removed the changes in this commit to help clearly distinguish movement from update commit 9cbc072ce82c766b8d0716cd7469e572efeee14e Author: Neelesh Srinivas Salian Date: 2015-06-21T17:44:05Z Updated a few lines in the Launching Spark on YARN Section commit 40dbc0b068741f179dac43299fa45333b62f93fd Author: Neelesh Srinivas Salian Date: 2015-06-22T03:22:30Z Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line commit 944b7a09f5acff0d4d11a663b2fed02aa7ed5105 Author: Neelesh Srinivas Salian Date: 2015-06-22T17:17:21Z Changed the lines about deploy-mode and added backticks to all parameters commit a71fe2cdad562798fd9ba8f7bef3d6e95bf8d339 Author: Neelesh Srinivas Salian Date: 2015-07-02T01:20:14Z Merge branch 'master' of https://github.com/apache/spark into SPARK-8743 commit 5ddaec388e8720c600fd36450ad8afd96d5a84ff Author: Neelesh Srinivas Salian Date: 2015-07-07T02:08:47Z Merge branch 'master' of https://github.com/apache/spark into SPARK-8743 commit f4ef2f984d3f9deaf712cc2fe311aede068333d7 Author: Neelesh Srinivas Salian Date: 2015-07-07T02:30:23Z SPARK-8743: Added the RemoveSource call to de-register Source after streaming
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6924#issuecomment-114930404 @mateiz, does this PR need any more changes? Please let me know. Thanks.
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6924#issuecomment-114512955 That is correct. I moved the text a few commits ago. The later commits were just formatting and the yarn changes.
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/6924#discussion_r32958854 --- Diff: docs/running-on-yarn.md --- @@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html) was added to Spark in version 0.6.0, and improved in subsequent releases. +# Launching Spark on YARN + +Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster. +These configs are used to write to HDFS and connect to the YARN ResourceManager. The +configuration contained in this directory will be distributed to the YARN cluster so that all +containers used by the application use the same configuration. If the configuration references +Java system properties or environment variables not managed by YARN, they should also be set in the +Spark application's configuration (driver, executors, and the AM when running in client mode). + +There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN. +(Default: `--deploy-mode client`) --- End diff -- Makes sense. Changed it.
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/6924#discussion_r32958837 --- Diff: docs/running-on-yarn.md --- @@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html) was added to Spark in version 0.6.0, and improved in subsequent releases. +# Launching Spark on YARN + +Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster. +These configs are used to write to HDFS and connect to the YARN ResourceManager. The +configuration contained in this directory will be distributed to the YARN cluster so that all +containers used by the application use the same configuration. If the configuration references +Java system properties or environment variables not managed by YARN, they should also be set in the +Spark application's configuration (driver, executors, and the AM when running in client mode). + +There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN. +(Default: `--deploy-mode client`) + +Unlike in Spark standalone and Mesos mode, in which the master's address is specified in the "master" parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the master parameter is yarn. For a specific yarn deployment, use --deploy-mode to specify yarn-cluster or yarn-client. --- End diff -- Made the changes.
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6924#issuecomment-113995711 @mateiz, I made the changes. I am not sure about the master yarn sentence. Please let me know what you think about it. Thank you.
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/6924#discussion_r32898107 --- Diff: docs/running-on-yarn.md --- @@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html) was added to Spark in version 0.6.0, and improved in subsequent releases. +# Launching Spark on YARN + +Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster. +These configs are used to write to the dfs and connect to the YARN ResourceManager. The +configuration contained in this directory will be distributed to the YARN cluster so that all +containers used by the application use the same configuration. If the configuration references +Java system properties or environment variables not managed by YARN, they should also be set in the +Spark application's configuration (driver, executors, and the AM when running in client mode). + +There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN. +(Default: --deploy-mode client) + +Unlike in Spark standalone and Mesos mode, in which the master's address is specified in the "master" parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the master parameter is yarn. --- End diff -- We could say something like: "Thus, the master parameter is yarn. For a specific deployment, use --deploy-mode to specify yarn-cluster or yarn-client"
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/6924#discussion_r32897974 --- Diff: docs/running-on-yarn.md --- @@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html) was added to Spark in version 0.6.0, and improved in subsequent releases. +# Launching Spark on YARN + +Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster. +These configs are used to write to the dfs and connect to the YARN ResourceManager. The +configuration contained in this directory will be distributed to the YARN cluster so that all +containers used by the application use the same configuration. If the configuration references +Java system properties or environment variables not managed by YARN, they should also be set in the +Spark application's configuration (driver, executors, and the AM when running in client mode). + +There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN. +(Default: --deploy-mode client) + +Unlike in Spark standalone and Mesos mode, in which the master's address is specified in the "master" parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the master parameter is yarn. --- End diff -- So, in spark-submit the option is: --master MASTER_URL spark://host:port, mesos://host:port, yarn, or local. So the value can be just yarn, or specifically yarn-client or yarn-cluster. I would suggest keeping it as yarn since --deploy-mode covers the client or cluster part.
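To make the `--master yarn` plus `--deploy-mode` combination concrete, here is a small Python sketch that builds the argument list for such a launch. The helper name `spark_submit_args` and the application path are assumptions made for illustration. The `--master` and `--deploy-mode` flags, and the `client` default, are the ones discussed above.

```python
def spark_submit_args(app, deploy_mode="client"):
    """Build a spark-submit argument list for YARN (hypothetical helper).

    The master is always plain "yarn"; --deploy-mode carries the
    client-versus-cluster choice.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app]


# Example with a placeholder application file; the list could be passed to
# subprocess.run on a machine where spark-submit is installed.
cluster_args = spark_submit_args("my_app.py", deploy_mode="cluster")
```

Keeping the master fixed and varying only the deploy mode follows the suggestion in the comment above.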
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6924#issuecomment-113935211 @srowen makes sense. Made 2 commits to reflect the updates. @mateiz, please let me know if there are any additional changes that need to go. Thank you.
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6924#issuecomment-113864731 @srowen, please review when you get the chance. Thank you.
[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...
GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/6924 [SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" document As per the description in the JIRA, I moved the contents of the page and added some additional content. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nssalian/spark SPARK-3629 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6924.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6924 commit 151c298d97a435b76ccb54a64e2fef21a2ab7285 Author: Neelesh Srinivas Salian Date: 2015-06-21T04:36:32Z SPARK-3629: Improvement of the Spark on YARN document
[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6862#issuecomment-113211610 @davies and @koeninger Thank you for the comments. Do you think any other changes need to go in? Thanks.
[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...
Github user nssalian commented on a diff in the pull request: https://github.com/apache/spark/pull/6862#discussion_r32748615 --- Diff: docs/streaming-programming-guide.md --- @@ -1937,6 +1937,16 @@ JavaPairDStream unifiedStream = streamingContext.union(kafkaStre unifiedStream.print(); {% endhighlight %} + +{% highlight python %} +numStreams = 5 +kafkaStreams = [] +for _ in range (numStreams): + kafkaStreams.append(KafkaUtils.createStream(...)) --- End diff -- Made the changes as per @davies
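The pattern in this diff, creating several input streams and then unioning them, can be sketched in Python as below. This is a sketch under assumptions rather than the text that went into the guide: `union_of_streams` and `create_stream` are hypothetical names, and the only PySpark call it relies on is `StreamingContext.union(*dstreams)`.

```python
def union_of_streams(ssc, create_stream, num_streams):
    """Create num_streams input streams and union them into one.

    ssc is expected to behave like a PySpark StreamingContext, whose
    union method takes the streams as positional arguments.
    create_stream is a caller-supplied factory (hypothetical), for
    example a wrapper around KafkaUtils.createStream(...).
    """
    streams = [create_stream(ssc) for _ in range(num_streams)]
    return ssc.union(*streams)
```

With a real StreamingContext, the returned DStream can then be printed or transformed like any single input stream. The arguments to `KafkaUtils.createStream(...)` are elided in the diff, so they are left to the caller here.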
[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6862#issuecomment-112947266 @srowen, I changed the Kafka append, the loop structure and the print method call. Thank you.
[GitHub] spark pull request: SPARK-8320 - Add example in streaming programm...
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6862#issuecomment-112874087 @srowen could you please review this PR? Thank you.
[GitHub] spark pull request: SPARK-8320 - Add example in streaming programm...
GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/6862 SPARK-8320 - Add example in streaming programming guide that shows union of multiple input streams Added Python code to the Level of Parallelism in Data Receiving section of the streaming programming guide (https://spark.apache.org/docs/latest/streaming-programming-guide.html). Please review and let me know if any additional changes are needed. Thank you. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nssalian/spark SPARK-8320 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6862.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6862 commit 3fc5c6da0ebba20450c19a92a636b7e1b0b9219f Author: Neelesh Srinivas Salian Date: 2015-06-17T16:18:17Z SPARK-8320
[GitHub] spark pull request: Adding Python code for Spark 8320
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6861#issuecomment-112872941 Will do. Thank you @srowen
[GitHub] spark pull request: Adding Python code for Spark 8320
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/6861
[GitHub] spark pull request: Adding Python code for Spark 8320
Github user nssalian commented on the pull request: https://github.com/apache/spark/pull/6861#issuecomment-112870276 @srowen
[GitHub] spark pull request: Addition of Python example for SPARK-8320
Github user nssalian closed the pull request at: https://github.com/apache/spark/pull/6860
[GitHub] spark pull request: Adding Python code for Spark 8320
GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/6861 Adding Python code for Spark 8320 Added python code to https://spark.apache.org/docs/latest/streaming-programming-guide.html to the Level of Parallelism in Data Receiving section. Please review and let me know if there are any additional changes that are needed. Thank you. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nssalian/spark SPARK-8320 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6861.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6861 commit 82a396c2f594bade276606dcd0c0545a650fb838 Author: Holden Karau Date: 2015-05-29T21:59:18Z [SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd Author: Holden Karau Closes #6464 from holdenk/SPARK-7910-expose-partitioner-information-in-javardd and squashes the following commits: de1e644 [Holden Karau] Fix the test to get the partitioner bdb31cc [Holden Karau] Add Mima exclude for the new method 347ef4c [Holden Karau] Add a quick little test for the partitioner JavaAPI f49dca9 [Holden Karau] Add partitoner information to JavaRDDLike and fix some whitespace commit 5fb97dca9bcfc29ac33823554c8783997e811b99 Author: Shivaram Venkataraman Date: 2015-05-29T22:08:30Z [SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init cc davies Author: Shivaram Venkataraman Closes #6507 from shivaram/sparkr-init and squashes the following commits: 6fdd169 [Shivaram Venkataraman] Create SparkContext in sparkRSQL init commit dbf8ff38de0f95f467b874a5b527dcf59439efe8 Author: Ram Sriharsha Date: 2015-05-29T22:22:26Z [SPARK-6013] [ML] Add more Python ML examples for spark.ml Author: Ram Sriharsha Closes #6443 from harsha2010/SPARK-6013 and squashes the following commits: 732506e [Ram Sriharsha] Code Review Feedback 121c211 [Ram Sriharsha] 
python style fix 5f9b8c3 [Ram Sriharsha] python style fixes 925ca86 [Ram Sriharsha] Simple Params Example 8b372b1 [Ram Sriharsha] GBT Example 965ec14 [Ram Sriharsha] Random Forest Example commit 8c9979337f193c72fd2f1a891909283de53777e3 Author: Andrew Or Date: 2015-05-29T22:26:49Z [HOTFIX] [SQL] Maven test compilation issue Tests compile in SBT but not Maven. commit a4f24123d8857656524c9138c7c067a4b1033a5e Author: Andrew Or Date: 2015-05-30T00:19:46Z [HOT FIX] [BUILD] Fix maven build failures This patch fixes a build break in maven caused by #6441. Note that this patch reverts the changes in flume-sink because this module does not currently depend on Spark core, but the tests require it. There is not an easy way to make this work because mvn test dependencies are not transitive (MNG-1378). For now, we will leave the one test suite in flume-sink out until we figure out a better solution. This patch is mainly intended to unbreak the maven build. Author: Andrew Or Closes #6511 from andrewor14/fix-build-mvn and squashes the following commits: 3d53643 [Andrew Or] [HOT FIX #6441] Fix maven build failures commit 3792d25836e1e521da64c5a62ca1b6cca1bcb6b9 Author: Taka Shinagawa Date: 2015-05-30T03:35:14Z [DOCS][Tiny] Added a missing dash(-) in docs/configuration.md The first line had only two dashes (--) instead of three(---). Because of this missing dash(-), 'jekyll build' command was not converting configuration.md to _site/configuration.html Author: Taka Shinagawa Closes #6513 from mrt/docfix3 and squashes the following commits: c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from converting configuration.md to html format commit 7ed06c39922ac90acab3a78ce0f2f21184ed68a5 Author: Burak Yavuz Date: 2015-05-30T05:19:15Z [SPARK-7957] Preserve partitioning when using randomSplit cc JoshRosen Thanks for noticing this! 
Author: Burak Yavuz Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits: 497465d [Burak Yavuz] addressed code review 293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit commit 609c4923f98c188bce60ae35c1c8a08a8dfd95f1 Author: Andrew Or Date: 2015-05-30T05:57:46Z [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike This is a follow-up patch to #6441. Author: Andrew Or Closes #6510 from andrewor14/extends-funsuite-check and squashes the following commits: 6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check 99d02ac [Andrew Or] Merge branch 'master' of github.c
[GitHub] spark pull request: Addition of Python example for SPARK-8320
GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/6860 Addition of Python example for SPARK-8320 Added python code to https://spark.apache.org/docs/latest/streaming-programming-guide.html to the Level of Parallelism in Data Receiving section. Please review and let me know if there are any additional changes that are needed. Thank you. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nssalian/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6860.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6860 commit 5a1a1075a607be683f008ef92fa227803370c45f Author: Andrew Or Date: 2015-05-04T16:17:55Z [MINOR] Fix python test typo? I suspect haven't been using anaconda in tests in a while. I wonder if this change actually does anything but this line as it stands looks strictly less correct. Author: Andrew Or Closes #5883 from andrewor14/fix-run-tests-typo and squashes the following commits: a3ad720 [Andrew Or] Fix typo? commit e0833c5958bbd73ff27cfe6865648d7b6e5a99bc Author: Xiangrui Meng Date: 2015-05-04T18:28:59Z [SPARK-5956] [MLLIB] Pipeline components should be copyable. This PR added `copy(extra: ParamMap): Params` to `Params`, which makes a copy of the current instance with a randomly generated uid and some extra param values. With this change, we only need to implement `fit` and `transform` without extra param values given the default implementation of `fit(dataset, extra)`: ~~~scala def fit(dataset: DataFrame, extra: ParamMap): Model = { copy(extra).fit(dataset) } ~~~ Inside `fit` and `transform`, since only the embedded values are used, I added `$` as an alias for `getOrDefault` to make the code easier to read. 
For example, in `LinearRegression.fit` we have: ~~~scala val effectiveRegParam = $(regParam) / yStd val effectiveL1RegParam = $(elasticNetParam) * effectiveRegParam val effectiveL2RegParam = (1.0 - $(elasticNetParam)) * effectiveRegParam ~~~ Meta-algorithm like `Pipeline` implements its own `copy(extra)`. So the fitted pipeline model stored all copied stages (no matter whether it is a transformer or a model). Other changes: * `Params$.inheritValues` is moved to `Params!.copyValues` and returns the target instance. * `fittingParamMap` was removed because the `parent` carries this information. * `validate` was renamed to `validateParams` to be more precise. TODOs: * [x] add tests for newly added methods * [ ] update documentation jkbradley dbtsai Author: Xiangrui Meng Closes #5820 from mengxr/SPARK-5956 and squashes the following commits: 7bef88d [Xiangrui Meng] address comments 05229c3 [Xiangrui Meng] assert -> assertEquals b2927b1 [Xiangrui Meng] organize imports f14456b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5956 93e7924 [Xiangrui Meng] add tests for hasParam & copy 463ecae [Xiangrui Meng] merge master 2b954c3 [Xiangrui Meng] update Binarizer 465dd12 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5956 282a1a8 [Xiangrui Meng] fix test 819dd2d [Xiangrui Meng] merge master b642872 [Xiangrui Meng] example code runs 5a67779 [Xiangrui Meng] examples compile c76b4d1 [Xiangrui Meng] fix all unit tests 0f4fd64 [Xiangrui Meng] fix some tests 9286a22 [Xiangrui Meng] copyValues to trained models 53e0973 [Xiangrui Meng] move inheritValues to Params and rename it to copyValues 9ee004e [Xiangrui Meng] merge copy and copyWith; rename validate to validateParams d882afc [Xiangrui Meng] test compile f082a31 [Xiangrui Meng] make Params copyable and simply handling of extra params in all spark.ml components commit f32e69ecc333867fc966f65cd0aeaeddd43e0945 Author: 云峤 Date: 2015-05-04T19:08:38Z [SPARK-7319][SQL] Improve the 
output from DataFrame.show() Author: 云峤 Closes #5865 from kaka1992/df.show and squashes the following commits: c79204b [云峤] Update a1338f6 [云峤] Update python dataFrame show test and add empty df unit test. 734369c [云峤] Update python dataFrame show test and add empty df unit test. 84aec3e [云峤] Update python dataFrame show test and add empty df unit test. 159b3d5 [云峤] update 03ef434 [云峤] update 7394fd5 [云峤] update test show ced487a [云峤] update pep8 b6e690b [云峤] Merge remote-tracking branch 'upstream/master' into df.show 30ac311 [云峤] [SPARK-7294] ADD BETWEEN 7d62368 [云峤] [SPARK-729