[GitHub] spark pull request: [SPARK-12263][Docs]: IllegalStateException: Me...

2015-12-29 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/10483#issuecomment-167930375
  
Thanks for the review @srowen. I didn't have access to my machine since I 
was traveling. 
Modified the line to bring it to 97 characters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2491] Don't handle uncaught exceptions ...

2015-11-03 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/1482#issuecomment-153467686
  
What is the status of this PR? There seems to have been no movement for a 
while. @vanzin 





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-10-02 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/8385





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-10-02 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8385#issuecomment-145149873
  
Closing in favor of #8968 





[GitHub] spark pull request: [SPARK-9570] [DOCS] Consistent recommendation ...

2015-10-02 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/8968#discussion_r41064621
  
--- Diff: docs/submitting-applications.md ---
@@ -122,21 +123,23 @@ The master URL passed to Spark can be in one of the 
following formats:
 
 
 Master URLMeaning
- local  Run Spark locally with one worker thread (i.e. no 
parallelism at all). 
- local[K]  Run Spark locally with K worker threads 
(ideally, set this to the number of cores on your machine). 
- local[*]  Run Spark locally with as many worker threads 
as logical cores on your machine.
- spark://HOST:PORT  Connect to the given Spark standalone
+ local  Run Spark locally with one worker 
thread (i.e. no parallelism at all). 
+ local[K]  Run Spark locally with K worker 
threads (ideally, set this to the number of cores on your machine). 
+ local[*]  Run Spark locally with as many 
worker threads as logical cores on your machine.
+ spark://HOST:PORT  Connect to the given Spark standalone
 cluster master. The port must be whichever one your master is 
configured to use, which is 7077 by default.
 
- mesos://HOST:PORT  Connect to the given Mesos cluster.
+ mesos://HOST:PORT  Connect to the given Mesos cluster.
 The port must be whichever one your is configured to use, which is 
5050 by default.
 Or, for a Mesos cluster using ZooKeeper, use 
mesos://zk://
 
- yarn-client  Connect to a  YARN  cluster in
-client mode. The cluster location will be found based on the 
HADOOP_CONF_DIR or YARN_CONF_DIR variable.
+ yarn  Connect to a  YARN  cluster in
+client or cluster mode depending on the 
value of --deploy-mode. 
+The cluster location will be found based on the 
HADOOP_CONF_DIR or YARN_CONF_DIR variable.
 
- yarn-cluster  Connect to a  YARN  cluster in
-cluster mode. The cluster location will be found based on the 
HADOOP_CONF_DIR or YARN_CONF_DIR variable.
+ yarn-client  Equivalent to 
yarn with --deploy-mode client
--- End diff --

Shouldn't deploy-mode come first here?
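
To make the equivalence in the diff above concrete, here is a minimal sketch in Python. It is not Spark code: `normalize_master` and `submit_args` are hypothetical helpers written only for illustration, and the fallback to `client` when `--deploy-mode` is omitted is an assumption based on spark-submit's documented default.

```python
# Hypothetical helpers for illustration only; none of this is Spark's own code.
LEGACY_YARN_MASTERS = {"yarn-client": "client", "yarn-cluster": "cluster"}


def normalize_master(master, deploy_mode=None):
    """Map a master value to a (master, deploy_mode) pair.

    The legacy values yarn-client / yarn-cluster are rewritten to
    master "yarn" plus the deploy mode they imply.
    """
    if master in LEGACY_YARN_MASTERS:
        implied = LEGACY_YARN_MASTERS[master]
        if deploy_mode not in (None, implied):
            raise ValueError(
                f"--master {master} conflicts with --deploy-mode {deploy_mode}"
            )
        return "yarn", implied
    if master == "yarn" and deploy_mode is None:
        # Assumption: client is the default when --deploy-mode is omitted.
        return "yarn", "client"
    return master, deploy_mode


def submit_args(master, deploy_mode=None):
    """Build the leading spark-submit arguments in the `--master yarn` style."""
    master, deploy_mode = normalize_master(master, deploy_mode)
    args = ["./bin/spark-submit", "--master", master]
    if deploy_mode is not None:
        args += ["--deploy-mode", deploy_mode]
    return args
```

Under these assumptions, `submit_args("yarn-cluster")` and `submit_args("yarn", "cluster")` produce the same argument list, which is the equivalence the new table rows describe.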





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-10-01 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8385#issuecomment-144723093
  
@srowen please go ahead.
 Thank you.





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-09-20 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8385#issuecomment-141837563
  
That's pretty much all I could find. The rest seems to be code pointing to 
the option of using yarn-cluster and yarn-client, and how Spark parses them.
Please let me know if I have missed anything. I went through the code 
looking for --master, yarn-client, yarn-cluster and deploy.





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-09-19 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8385#issuecomment-141700189
  
Corrected the nits. Will search the whole project for other places that I 
have missed.





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-09-19 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8385#issuecomment-141679396
  
@srowen thank you for the note. Haven't had a chance these past few weeks.
Should get it done today. 





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-09-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8385#issuecomment-138324428
  
@srowen  Thank you for the note. Will get it done asap. 





[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark

2015-09-02 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/8054





[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark

2015-09-02 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8054#issuecomment-137207678
  
Closing the PR.






[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-09-01 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8385#issuecomment-136873693
  
@andrewor14 will address them. Have not had a chance to do this. Will do it 
by the end of this week.





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-23 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/8385

[SPARK-9570][Docs][YARN]Consistent recommendation for submitting spark apps 
to YARN, -master yarn --deploy-mode x vs -master yarn-x'

Issue link: https://issues.apache.org/jira/browse/SPARK-9570

Changes made:
1) Added the deploy-mode syntax in favor of yarn-cluster method of 
submission
Requesting review.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-9570

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8385


commit fa93415e860cce590c3392079e93d3ae21ffc83c
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T02:51:52Z

Added yarn-deploy-mode alternative

commit 437a4d451147f179617628a672eaa795b3b76ea0
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T02:54:04Z

Moved Master URLs closer above before the examples

commit 05fe708c24f07f9661a558dfbe51970aa940e4e5
Author: Neelesh Srinivas Salian 
Date:   2015-08-10T17:14:59Z

Removed the addition section

commit 98624e89c6b303db4fc30408e14705df021ca591
Author: Neelesh Srinivas Salian 
Date:   2015-08-10T17:16:14Z

Added a section for alternative submission. Distinguished from the shifting 
of Master URLS

commit b8fdd5cd1b11dd7954d1f05bb71b1a2ae740d065
Author: Neelesh Srinivas Salian 
Date:   2015-08-12T01:43:10Z

Added section for preferred yarn and kept the one with deploy-mode for 
generic submission to help clear up confusion

commit 8c65676a6b7a692d07face111d8e998f36ca0151
Author: Neelesh Srinivas Salian 
Date:   2015-08-12T01:44:36Z

Moved the Standalone examples together

commit 8a331d0444f58d3c14c1c12c4f087f1a02d5b8d1
Author: Neelesh Srinivas Salian 
Date:   2015-08-12T21:19:58Z

Moved Master URLs

commit 0fed23b8dc525f62197d1cd332260a0752d7d35c
Author: Neelesh Srinivas Salian 
Date:   2015-08-13T23:12:06Z

Added deploy-mode section to YARN submission

commit 670d251db01306ecc6029abaf6fc7d0e7c30dc3f
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T02:51:52Z

Added yarn-deploy-mode alternative

commit 40d3b80012f2db351446f8f9d6049f8a9f00bf2b
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T02:54:04Z

Moved Master URLs closer above before the examples

commit 89d15bf63741e3c62017586df35508a6bde821c2
Author: Neelesh Srinivas Salian 
Date:   2015-08-10T17:14:59Z

Removed the addition section

commit d2c212aa6e3a4537c0a4a7ad49e83412e47e60e7
Author: Neelesh Srinivas Salian 
Date:   2015-08-10T17:16:14Z

Added a section for alternative submission. Distinguished from the shifting 
of Master URLS

commit 3f25500b5d39b2d6b247a8dca8147c8fd140c7c0
Author: Neelesh Srinivas Salian 
Date:   2015-08-12T01:43:10Z

Added section for preferred yarn and kept the one with deploy-mode for 
generic submission to help clear up confusion

commit 0766da66ccf16ab55c80614776c1f5a7a1877253
Author: Neelesh Srinivas Salian 
Date:   2015-08-12T01:44:36Z

Moved the Standalone examples together

commit 46a24d55ffe99431885b57fda50938289a0ed91b
Author: Neelesh Srinivas Salian 
Date:   2015-08-12T21:19:58Z

Moved Master URLs

commit 91758072dbc954e2c31609dcd2b6232a09fbfdb3
Author: Neelesh Srinivas Salian 
Date:   2015-08-13T23:12:06Z

Added deploy-mode section to YARN submission

commit 3052c741f13f9c3c842ce7fa20819bf73043e326
Author: Neelesh Srinivas Salian 
Date:   2015-08-23T14:32:07Z

Merge branch 'SPARK-9570' of https://github.com/nssalian/spark into 
SPARK-9570

commit c91073ef5ab7fa2e5a8cada89983422960b24a1a
Author: Neelesh Srinivas Salian 
Date:   2015-08-23T15:11:55Z

Modified Running on YARN doc

commit 3dc79e2d24a76abd32779d09a044240e808ed9fc
Author: Neelesh Srinivas Salian 
Date:   2015-08-23T21:21:33Z

Modified submitting applications

commit 67a4255f94e828fcfffc6039ddc4872acc2d717d
Author: Neelesh Srinivas Salian 
Date:   2015-08-23T21:44:26Z

Removed extra YARN section, there is already a running without --deploy 
example

commit a8b67efb6a8bc28b69a87b4158156b1517e1475d
Author: Neelesh Srinivas Salian 
Date:   2015-08-24T00:14:45Z

Added --deploy-mode flags  to the yarn submission sections







[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-23 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/8071





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-23 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8071#issuecomment-133972215
  
Creating a new PR.





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-19 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8071#issuecomment-132668558
  
@srowen, @sryza , @tgravescs  thank you for the feedback.
I will get this done this weekend. 





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-14 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8071#issuecomment-131176774
  
@tgravescs, I am not sure whether to stick with consistency, history, 
or both.
If @sryza can weigh in, we can reach a good understanding of how to 
proceed on this.
The goal is to help users and reduce any confusion about the 
submission methods for YARN.







[GitHub] spark pull request: [SPARK-9923][Core]: ShuffleMapStage.numAvailab...

2015-08-13 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/8183

[SPARK-9923][Core]: ShuffleMapStage.numAvailableOutputs should be an Int 
instead of Long

Modified type of ShuffleMapStage.numAvailableOutputs from Long to Int

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-9923

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8183


commit 175704f28ee0ff1029426aa17ce059a21d3771cb
Author: Neelesh Srinivas Salian 
Date:   2015-08-14T00:28:23Z

SPARK-9923: Modified type of ShuffleMapStage.numAvailableOutputs from Long 
to Int







[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-13 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8071#issuecomment-130873054
  
So, the consensus is to have `yarn-client` and `yarn-cluster`, with 
deploy-mode as an alternative.
I agree with @srowen, more places in the code have `yarn-client` and 
`yarn-cluster`.

So the change is in the Submitting Applications doc with respect to YARN 
and the Running on YARN doc with respect to deploy-mode.
I'll change accordingly and update the PR soon.







[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-11 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8071#issuecomment-130133320
  
1) Running on YARN (which was fixed in #6924) has simply yarn-client and 
yarn-cluster for master, and does not have `--deploy-mode` in the page.
2) For Standalone 
(https://spark.apache.org/docs/latest/spark-standalone.html), there is no such 
conflict in syntax, just the explanation of the "deployment modes of Spark".
3) For Submitting applications 
(https://spark.apache.org/docs/latest/submitting-applications.html), both 
the master values `yarn-client` and `yarn-cluster` and `--deploy-mode` 
appear, since it is a holistic document for submission and includes 
local, spark, mesos and yarn. But `--deploy-mode` appears only in the 
examples that illustrate `supervise`; the rest just point to the master URLs.

@tgravescs, @srowen, the latest commits should help alleviate any confusion.








[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-10 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/8071#discussion_r36682426
  
--- Diff: docs/submitting-applications.md ---
@@ -48,6 +48,44 @@ Some of the commonly used options are:
 * `application-jar`: Path to a bundled jar including your application and 
all dependencies. The URL must be globally visible inside of your cluster, for 
instance, an `hdfs://` path or a `file://` path that is present on all nodes.
 * `application-arguments`: Arguments passed to the main method of your 
main class, if any
 
+Alternatively, for submitting on yarn, 
+
+{% highlight bash %}
+./bin/spark-submit \
+  --class 
+  --master 
+  --conf = \
+  ... # other options
+   \
+  [application-arguments] 
+{% endhighlight %}
+
+* `--master`: The --master parameter is either `yarn-client` or 
`yarn-cluster`. Defaults to `yarn-client`
--- End diff --

Will grep the docs to see which of the two submission approaches is more 
common.
Then align the docs to have that approach as the first recommendation and 
offer the other as an alternative.
The goal is to have a consistent method overall.
Any suggestions?





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-10 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/8071#discussion_r36661174
  
--- Diff: docs/submitting-applications.md ---
@@ -48,6 +48,44 @@ Some of the commonly used options are:
 * `application-jar`: Path to a bundled jar including your application and 
all dependencies. The URL must be globally visible inside of your cluster, for 
instance, an `hdfs://` path or a `file://` path that is present on all nodes.
 * `application-arguments`: Arguments passed to the main method of your 
main class, if any
 
+Alternatively, for submitting on yarn, 
+
+{% highlight bash %}
+./bin/spark-submit \
+  --class 
+  --master 
+  --conf = \
+  ... # other options
+   \
+  [application-arguments] 
+{% endhighlight %}
+
+* `--master`: The --master parameter is either `yarn-client` or 
`yarn-cluster`. Defaults to `yarn-client`
--- End diff --

@srowen and @tgravescs, users are still confused about the 
"recommended/preferred" method of submission. 
Not sure if it is necessary to have a single method or to keep both ways.
I can modify the PR accordingly.





[GitHub] spark pull request: [SPARK-9570][Docs][YARN]Consistent recommendat...

2015-08-10 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/8071

[SPARK-9570][Docs][YARN]Consistent recommendation for submitting spark apps 
to YARN, -master yarn --deploy-mode x vs -master yarn-x'

Issue link: https://issues.apache.org/jira/browse/SPARK-9570

Changes made:
1) Added the alternative method of job submission to avoid confusion
2) Moved the Master URLs section closer to the options, before the examples

Requesting review.
Is there any other place in the documentation that could confuse the user?
Need to maintain a consistent recommendation, or at least clarify all the 
submission methods, in the documentation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-9570

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8071


commit fa93415e860cce590c3392079e93d3ae21ffc83c
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T02:51:52Z

Added yarn-deploy-mode alternative

commit 437a4d451147f179617628a672eaa795b3b76ea0
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T02:54:04Z

Moved Master URLs closer above before the examples

commit 05fe708c24f07f9661a558dfbe51970aa940e4e5
Author: Neelesh Srinivas Salian 
Date:   2015-08-10T17:14:59Z

Removed the addition section

commit 98624e89c6b303db4fc30408e14705df021ca591
Author: Neelesh Srinivas Salian 
Date:   2015-08-10T17:16:14Z

Added a section for alternative submission. Distinguished from the shifting 
of Master URLS







[GitHub] spark pull request: [SparkSPARK-9340] - make SparkSQL work with ne...

2015-08-09 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8032#issuecomment-129243322
  
Please close this PR in favor of #8063.
Thank you.





[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark

2015-08-08 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8054#issuecomment-129087486
  
Jenkins, slow test please





[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark

2015-08-08 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/8054

[SPARK-4449][Core]Specify port range in spark

Specify port range in spark
JIRA link: https://issues.apache.org/jira/browse/SPARK-4449

Goal: To add a port range to services
Design (based on the input and the suggestions in #3314 and #5722):
1) Added variables maxPort and failedPorts to help the implementation.
2) maxPort is explicitly assigned to startPort + maxRetries, so that the 
number of retries is neither less nor greater than the specified port range.
3) Added the failedPorts ArrayBuffer to record the ports that failed (during 
retry) and that are in the range maxPort - startPort (see Random logic).
4) This failedPorts list is checked, and tryPort will not attempt 
those ports again in the random selection.
5) If the randomized port does not belong to the failedPorts list and a 
privileged port, it will be tried. 
6) There is an if block to check whether sufficient ports are left to 
attempt within the range (not sure if this is needed).
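
The design steps above can be sketched roughly as follows. This is a minimal Python illustration, not the Scala code in the PR: `pick_port`, `try_bind` and `PRIVILEGED_PORT_LIMIT` are hypothetical names, and skipping ports below 1024 is one assumed reading of step 5.

```python
import random

# Assumption: ports below 1024 are treated as privileged and never tried.
PRIVILEGED_PORT_LIMIT = 1024


def pick_port(start_port, max_retries, try_bind, rng=None):
    """Pick a random port in [start_port, start_port + max_retries].

    try_bind is a caller-supplied function (hypothetical) that returns True
    when the service could be started on the given port. Ports that fail are
    remembered and never retried. Returns the bound port, or None when every
    candidate in the range has failed.
    """
    rng = rng or random.Random()
    max_port = start_port + max_retries        # step 2: upper end of the range
    failed_ports = set()                       # step 3: ports that already failed
    candidates = [
        p for p in range(start_port, max_port + 1)
        if p >= PRIVILEGED_PORT_LIMIT          # step 5 (assumed reading)
    ]
    # Step 6: keep going only while untried ports remain in the range.
    while len(failed_ports) < len(candidates):
        remaining = [p for p in candidates if p not in failed_ports]  # step 4
        port = rng.choice(remaining)
        if try_bind(port):
            return port
        failed_ports.add(port)
    return None
```

A set is used here instead of an ArrayBuffer only because membership checks are simpler in Python; the bookkeeping is otherwise the same idea as failedPorts.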

Requesting review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-4449

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8054.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8054


commit 2450fefe49f0cd8a109b556e5b4efe4c3bf7d9fa
Author: Neelesh Srinivas Salian 
Date:   2015-08-08T21:10:36Z

cleanUp unused imports and random import

commit dcd512627c48911228ed0f3649d486fdaa0b1ce8
Author: Neelesh Srinivas Salian 
Date:   2015-08-08T22:51:18Z

Initialized port

commit cf4af1a76ddbda7e9f83a64d1d2c323ba6eeb82a
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T01:12:32Z

Added logic for port range

commit 52326701bef8e22ab70a6c050f210c71de234fd7
Author: Neelesh Srinivas Salian 
Date:   2015-08-09T01:49:21Z

Modified logic to include privileged ports







[GitHub] spark pull request: [SparkSPARK-9340] - make SparkSQL work with ne...

2015-08-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/8032#issuecomment-128773449
  
@dguy, please open the PR against the master branch. 
Thank you.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-14 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/7362





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-14 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7362#issuecomment-121398427
  
Closing. Thank you.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-13 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7362#issuecomment-121072720
  
@tdas Removed it.
Thank you for the review. 





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-13 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7362#discussion_r34516213
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -299,6 +302,26 @@ class StreamingContextSuite extends SparkFunSuite with 
BeforeAndAfter with Timeo
 Thread.sleep(100)
   }
 
+  test ("registering and de-registering of streamingSource") {
+val conf = new SparkConf().setMaster(master).setAppName(appName)
+ssc = new StreamingContext(conf, batchDuration)
+assert(ssc.getState() === StreamingContextState.INITIALIZED)
+addInputStream(ssc).register()
+ssc.start()
+
+val sources = StreamingContextSuite.getSources(ssc.env.metricsSystem)
+val streamingSource = StreamingContextSuite.getStreamingSource(ssc)
+assert(sources.contains(streamingSource))
+assert(ssc.getState() === StreamingContextState.ACTIVE)
+Thread.sleep(100)
--- End diff --

Removed it. I had added it during my own test runs. 

Thanks.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-13 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7362#issuecomment-121057498
  
@tdas does this PR need more improvement?





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-13 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7362#issuecomment-120933483
  
Added the changes and ran ./dev/scalastyle and ~test-only 
*StreamingContextSuite.
Both passed.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-13 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7362#discussion_r34456310
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -33,8 +33,12 @@ import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming.dstream.DStream
 import org.apache.spark.streaming.receiver.Receiver
 import org.apache.spark.util.Utils
-import org.apache.spark.{Logging, SparkConf, SparkContext, SparkException, 
SparkFunSuite}
-
+import org.apache.spark.{Logging, SparkConf, SparkContext, SparkFunSuite}
--- End diff --

Made the change in the next commit.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-13 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7362#discussion_r34455972
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -192,11 +192,8 @@ class StreamingContext private[streaming] (
   None
 }
 
-  /** Register streaming source to metrics system */
+  /* Initializing a streamingSource to register metrics */
--- End diff --

This spot previously held the block of code that did the registration.
Now it simply initializes the streamingSource.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7362#issuecomment-120781935
  
@tdas and @srowen, this is the new PR for the JIRA.

Thank you.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120781877
  
Closing this PR in favor of #7362.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/7250





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/7362

[SPARK-8743] [Streaming]: Deregister Codahale metrics for streaming when 
StreamingContext is closed

The issue link: https://issues.apache.org/jira/browse/SPARK-8743
Deregister Codahale metrics for streaming when StreamingContext is closed

Design:
Adding the method calls in the appropriate start() and stop() methods of 
the StreamingContext

Actions in the PullRequest:
1) Added the registerSource method call to the start method of the 
StreamingContext. 
2) Added the removeSource method call to the stop method. 
3) Added comments for both 1 and 2, and a comment to show the initialization of 
the StreamingSource.
4) Added a test case to check both registration and de-registration of 
metrics.
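
The lifecycle described above can be sketched with stand-in classes. This is an illustration only and not the StreamingContext code: Source, Registry, Context and the State objects below are hypothetical simplifications, and only the method names registerSource and removeSource are taken from the description.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the metrics system and the streaming context.
object MetricsLifecycleSketch {

  final class Source(val name: String)

  final class Registry {
    private val sources = ArrayBuffer.empty[Source]
    def registerSource(s: Source): Unit = { sources += s }
    def removeSource(s: Source): Unit = { sources -= s }
    def contains(s: Source): Boolean = sources.contains(s)
  }

  sealed trait State
  case object Initialized extends State
  case object Active extends State
  case object Stopped extends State

  final class Context(registry: Registry) {
    // The source is only created here; it is not registered until start().
    val source = new Source("streaming")
    private var state: State = Initialized

    def getState: State = state

    def start(): Unit = synchronized {
      if (state == Initialized) {
        registry.registerSource(source) // register metrics when the context starts
        state = Active
      }
    }

    def stop(): Unit = synchronized {
      if (state == Active) {
        registry.removeSource(source) // de-register metrics when the context stops
      }
      state = Stopped
    }
  }
}
```

A test along the lines of item 4 would check the Initialized state before calling start(), then assert that the registry contains the source while the context is Active and no longer contains it after stop().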

Previous closed PR for reference: https://github.com/apache/spark/pull/7250

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark branch-SPARK-8743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7362


commit d8cb577b24f42a0509ee3a0fffb09181abf4137e
Author: Neelesh Srinivas Salian 
Date:   2015-07-13T01:38:36Z

Added registerSource to start() and removeSource to stop(). Wrote a test to 
check the registration and de-registration







[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120776405
  
@tdas @srowen, shall I create a fresh PR to avoid any confusion?
I can reference this one.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/7250





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120774231
  
Right. The negation took care of the failure above and I fixed the 
assertion error as well.
Removed the spacing for all of them. 

Made a unified commit. 299a57d
Ignore the last one, I'll revert that.






[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
GitHub user nssalian reopened a pull request:

https://github.com/apache/spark/pull/7250

[SPARK-8743] [Streaming]: Deregister Codahale metrics for streaming when 
StreamingContext is closed

The issue link: https://issues.apache.org/jira/browse/SPARK-8743
Deregister Codahale metrics for streaming when StreamingContext is closed

Design:
Adding the method calls in the appropriate start() and stop() methods of 
the StreamingContext

Actions in the PullRequest:
1) Added the registerSource method call to the start method of the 
StreamingContext. 
2) Added the removeSource method call to the stop method. 
3) Added comments for both 1 and 2, and a comment to show the initialization of 
the StreamingSource.


Requesting Review.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-8743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7250


commit 92fa04b16cad1e945ab8a4d1be752f08a241e922
Author: Neelesh Srinivas Salian 
Date:   2015-07-07T03:16:51Z

SPARK-8743: Added the registerSource method call to the start method for 
the Streaming Context. Added the removeSource method to the stop method. Added 
comments for both

commit a665965a40c7b49cc13d8ab38f8da194693d9845
Author: Neelesh Srinivas Salian 
Date:   2015-07-07T07:23:20Z

Added // instead of /** for commenting in code

commit 7621adf368112056eb2e137f62adc429851fa570
Author: Neelesh Srinivas Salian 
Date:   2015-07-07T10:36:35Z

Added indentation and Space at the comment on line 578; Registering..

commit 18bcc7e164b46ecd969a266579f4349444373a0c
Author: Neelesh Srinivas Salian 
Date:   2015-07-07T23:20:29Z

Added test case for de-register metrics and made a change to the scope of 
the sources ArrayBuffer

commit e4f00d7577dec22278f5b6243791c4c33e0eb373
Author: Neelesh Srinivas Salian 
Date:   2015-07-08T18:56:30Z

Added additional variable to check the updated Sources size value to 
compare with the original size after removal

commit f5e47e0725adee6792081a8a4ed765ede1759e9c
Author: Neelesh Srinivas Salian 
Date:   2015-07-09T01:02:33Z

Added the removeSource method in try

commit d04fd2a2b2e4433d49c0860c4d4028564081db31
Author: Neelesh Srinivas Salian 
Date:   2015-07-09T16:23:17Z

Removed the assert for the env field, added the registerSource line in the 
INITIALIZED block and kept the removeSource() in the ACTIVE block

commit e2c3bf82c226e38282a9a17b80771b58dcc6cc55
Author: Neelesh Srinivas Salian 
Date:   2015-07-09T22:02:50Z

Added test to check registering and de-registering of streamingSource

commit 742398c334c71bcd1b2b702a9abc5e4ab1288d9e
Author: Neelesh Srinivas Salian 
Date:   2015-07-09T22:08:55Z

Removed unused imports

commit ca081fa3effd0303d04a034af5c5a0e8facd3b2d
Author: Neelesh Srinivas Salian 
Date:   2015-07-10T02:33:38Z

Moved the registerSource() call before line 601

commit 33a2091a4984b8e29143ab2e1202751a87e838b3
Author: Neelesh Srinivas Salian 
Date:   2015-07-10T21:31:53Z

Changed scope of sources and corrected comments for helper

commit a67918cb9732936ea84427f728086985f7319e3a
Author: Neelesh Srinivas Salian 
Date:   2015-07-10T21:41:16Z

Removed extra line in Helper Methods section

commit 74598cec17a6ec54a64f3fc0f8c336d6ba19cc1e
Author: Neelesh Srinivas Salian 
Date:   2015-07-11T02:16:18Z

Added helper method for private methods and changed the test logic to check 
for Sources containing or not containing StreamingSource

commit e37a2f3cc3364f6819205b5eac39d0603eb91ac5
Author: Neelesh Srinivas Salian 
Date:   2015-07-12T14:38:09Z

Changed import statements to remove unnecessary imports and add specific 
imports

commit f54afcf78819ad30a59318164515457b47c31d7d
Author: Neelesh Srinivas Salian 
Date:   2015-07-12T14:43:29Z

Removed types for fields in test for registering and deregistering metrics

commit ea0dc1a74848af8682bb8fc1e0f03b5261591f6e
Author: Neelesh Srinivas Salian 
Date:   2015-07-12T22:38:00Z

Changed imports statements, negated test statement and removed postfix

commit a0f1950937c36d7f568c5be545cf72ba5afe36ee
Author: Neelesh Srinivas Salian 
Date:   2015-07-12T22:43:39Z

Removed added comment to Assert for INITIALIZED state

commit 2a812878643a1e97e0113c7be7a195ae3c740b48
Author: Neelesh Srinivas Salian 
Date:   2015-07-12T22:49:13Z

Removing the INITIALIZED check since after start() the state moves to 
ACTIVE and this check fails

commit 5d3af311abe18e0db8cecb36141432af07d3afcb
Author: Neelesh Srinivas Salian 
Date:   2015-07-12T23:03:34Z

Move the INITIALIZED state check to when the ssc is initialized

commit 299a57d0b909b2be968f17723736c66c0e61fdcd
Author: Neelesh Srinivas Salian 
Date:   2015

[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120774219
  
Right. The negation took care of the failure above and I fixed the 
assertion error as well.
Removed the spacing for all of them. 

Made a unified commit. 299a57d
Ignore the last one, I'll revert that.






[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120770889
  
The test failed on assert(ssc.getState() === 
StreamingContextState.INITIALIZED)
because, after the start() method is called, the state moves to ACTIVE and 
no longer matches INITIALIZED.

@srowen, I've added the changes as you suggested.
Thanks.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-12 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120748280
  
I am hitting this error in the test for StreamingContextSuite. 
Is the streamingSource not being found in the right ArrayBuffer? I tried 
different variations of the registration to try to fix this, but that didn't help. 
@tdas, any suggestions?

Ran this:
build/sbt -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pscala-2.10
project streaming
~test-only *StreamingContextSuite

Test failure message:

ArrayBuffer(org.apache.spark.scheduler.DAGSchedulerSource@1473d83a, 
org.apache.spark.storage.BlockManagerSource@7560392b) did not contain 
org.apache.spark.streaming.StreamingSource@1c71ddd7 
(StreamingContextSuite.scala:322)






[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-11 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120668033
  
@tdas, I added the new test as per PrivateMethodTester.






[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-10 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34401188
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -297,6 +299,23 @@ class StreamingContextSuite extends SparkFunSuite with 
BeforeAndAfter with Timeo
 Thread.sleep(100)
   }
 
+  test("registering and de-registering of streamingSource") {
+val conf = new SparkConf().setMaster(master).setAppName(appName)
+ssc = new StreamingContext(conf, batchDuration)
+addInputStream(ssc).register()
+
+ssc.start()
+assert(ssc.getState() === StreamingContextState.INITIALIZED)
+
assert(StreamingContextSuite.sources.get(StreamingContextSuite.streamingSource)!=
 "null")
--- End diff --

Makes sense. I'll write that up. I was having problems when I initially wrote 
it the way it is done in ExecutionManagerSuite. I'll figure something out.

Thank you.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-10 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34399522
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -297,6 +299,23 @@ class StreamingContextSuite extends SparkFunSuite with 
BeforeAndAfter with Timeo
 Thread.sleep(100)
   }
 
+  test("registering and de-registering of streamingSource") {
+val conf = new SparkConf().setMaster(master).setAppName(appName)
+ssc = new StreamingContext(conf, batchDuration)
+addInputStream(ssc).register()
+
+ssc.start()
+assert(ssc.getState() === StreamingContextState.INITIALIZED)
+
assert(StreamingContextSuite.sources.get(StreamingContextSuite.streamingSource)!=
 "null")
--- End diff --

Checking to see if the source is not returning a null and is actually 
present in the sources ArrayBuffer. 





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-10 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34398361
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -796,3 +815,20 @@ package object testPackage extends Assertions {
 }
   }
 }
+
+/**
+ * Helper methods for testing StreamingContextSuite.
+ * This includes methods to access private methods and fields in 
ExecutorAllocationManager.
--- End diff --

Apologies for missing that. The latest commit adds the sources variable 
and changes the comment text.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-10 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34397673
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -796,3 +815,20 @@ package object testPackage extends Assertions {
 }
   }
 }
+
+/**
+ * Helper methods for testing StreamingContextSuite.
+ * This includes methods to access private methods and fields in 
ExecutorAllocationManager.
--- End diff --

I figured that would be a good placeholder for future methods that may 
need to be included.
I can reword it accordingly.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-10 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-120490045
  
@tdas, I made the changes as mentioned.
Does this PR need anything additional or different?

Thank you.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-09 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34327159
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -581,6 +579,9 @@ class StreamingContext private[streaming] (
   case INITIALIZED =>
 startSite.set(DStream.getCreationSite())
 sparkContext.setCallSite(startSite.get)
+// Registering Streaming Metrics at the start of the 
StreamingContext
+assert(env.metricsSystem != null)
+env.metricsSystem.registerSource(streamingSource)
 StreamingContext.ACTIVATION_LOCK.synchronized {
--- End diff --

Changed in the latest commit.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-09 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34273553
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -297,6 +296,23 @@ class StreamingContextSuite extends SparkFunSuite with 
BeforeAndAfter with Timeo
 Thread.sleep(100)
   }
 
+  test("de-register codahale metrics on stop()") {
--- End diff --

Thanks for the comments. 
Will improve the test and add it in the next commit.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-09 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34273015
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -577,6 +575,10 @@ class StreamingContext private[streaming] (
* @throws IllegalStateException if the StreamingContext is already stopped.
*/
   def start(): Unit = synchronized {
+// Registering Streaming Metrics at the start of the StreamingContext
+assert(env != null)
--- End diff --

Done.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: Deregister Codahale ...

2015-07-09 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34273073
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -577,6 +575,10 @@ class StreamingContext private[streaming] (
* @throws IllegalStateException if the StreamingContext is already stopped.
*/
   def start(): Unit = synchronized {
+// Registering Streaming Metrics at the start of the StreamingContext
+assert(env != null)
+assert(env.metricsSystem != null)
+env.metricsSystem.registerSource(streamingSource)
--- End diff --

Done. Added the above comment and this one to the latest commit.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-08 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34217790
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -674,6 +676,8 @@ class StreamingContext private[streaming] (
   logWarning("StreamingContext has already been stopped")
 case ACTIVE =>
   scheduler.stop(stopGracefully)
+  // De-registering Streaming Metrics of the StreamingContext
+  env.metricsSystem.removeSource(streamingSource)
--- End diff --

The idea was to register when `start()` is called.
So, based on your comment, that would mean registering the sources after 
the state is set to INITIALIZED and before the synchronized block.

`def start(): Unit = synchronized {
// Registering Streaming Metrics at the start of the StreamingContext
assert(env != null)
assert(env.metricsSystem != null)
env.metricsSystem.registerSource(streamingSource)`

It makes sense to have it after `INITIALIZED` and before the synchronized 
block of `ACTIVE` and `STOPPED`.
@tdas could shed more light on this.
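
As a rough illustration of the ordering discussed above, the sketch below 
registers a source only on the INITIALIZED to ACTIVE transition and removes it 
on stop. `Source`, `ToyMetricsSystem`, `ToyStreamingContext` and 
`isSourceRegistered` are hypothetical stand-ins written for this example, not 
Spark's actual classes; the real `StreamingContext` does considerably more.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's Source and MetricsSystem (assumptions, not the real API).
final case class Source(sourceName: String)

final class ToyMetricsSystem {
  private val sources = mutable.ArrayBuffer.empty[Source]
  def registerSource(s: Source): Unit = sources += s
  def removeSource(s: Source): Unit = sources -= s
  def isRegistered(s: Source): Boolean = sources.contains(s)
}

object ContextState extends Enumeration {
  val INITIALIZED, ACTIVE, STOPPED = Value
}

final class ToyStreamingContext(metrics: ToyMetricsSystem) {
  import ContextState._

  private var state: ContextState.Value = INITIALIZED
  private val streamingSource = Source("streaming")

  def start(): Unit = synchronized {
    state match {
      case INITIALIZED =>
        // Register only when moving from INITIALIZED to ACTIVE.
        metrics.registerSource(streamingSource)
        state = ACTIVE
      case ACTIVE =>
        () // already started: do not register a second time
      case STOPPED =>
        throw new IllegalStateException("StreamingContext has already been stopped")
    }
  }

  def stop(): Unit = synchronized {
    // Only an ACTIVE context has a registered source to remove.
    if (state == ACTIVE) metrics.removeSource(streamingSource)
    state = STOPPED
  }

  def isSourceRegistered: Boolean = metrics.isRegistered(streamingSource)
}
```

With this shape, calling `start()` twice does not register the source twice, 
and `stop()` on a context that was never started leaves the registry untouched.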









[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-08 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34215376
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala 
---
@@ -73,7 +73,7 @@ private[spark] class MetricsSystem private (
   private[this] val metricsConfig = new MetricsConfig(conf)
 
   private val sinks = new mutable.ArrayBuffer[Sink]
-  private val sources = new mutable.ArrayBuffer[Source]
+  val sources = new mutable.ArrayBuffer[Source]
--- End diff --

Thanks @jerryshao, I will add a similar test.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-08 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/7250#discussion_r34215350
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -688,6 +690,8 @@ class StreamingContext private[streaming] (
 } finally {
   // The state should always be Stopped after calling `stop()`, even if we haven't started yet
   state = STOPPED
+  // De-registering Streaming Metrics of the StreamingContext
+  env.metricsSystem.removeSource(streamingSource)
--- End diff --

Changed it locally and added it to the latest commit.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-08 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119696686
  
Based on @srowen's comment, I made the change and added updatedSourcesSize 
to check the ArrayBuffer size after the source is removed, asserting that the 
size was indeed decremented.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-08 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119679355
  
@srowen, I wanted to check whether the size was decremented at all.

Couldn't think of a way to assert that the source has been removed since 
the sources ArrayBuffer is still private. 






[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119372847
  
To illustrate,
Test case:
1) Testing start of the streamingContext and checking state.
2) Storing the size of the sources ArrayBuffer which will have a new source 
added
3) Sleep for 100 ms.
4) Stopping context and checking state
5) Also checking whether the size of the ArrayBuffer was decreased as the 
source was removed.
I changed the scope of the sources ArrayBuffer to do this.
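
The five steps above can be sketched against a toy registry as follows. This 
is only an illustration under assumptions: `SourceRegistry` and 
`DeregisterCheck` are hypothetical names, and the read-only `size` and 
`contains` accessors are one possible way to observe the buffer without 
widening its scope, not what the PR itself does.

```scala
import scala.collection.mutable

// Hypothetical registry standing in for the metrics system (illustrative names only).
final class SourceRegistry {
  private val sources = mutable.ArrayBuffer.empty[String]
  def register(name: String): Unit = sources += name
  def remove(name: String): Unit = sources -= name
  // Read-only views for tests; the buffer itself stays private.
  def size: Int = sources.size
  def contains(name: String): Boolean = sources.contains(name)
}

object DeregisterCheck {
  /** Returns the registry size before start, after start, and after stop. */
  def run(registry: SourceRegistry, name: String): (Int, Int, Int) = {
    val before = registry.size
    registry.register(name)          // steps 1-2: start and record the grown size
    val afterStart = registry.size
    Thread.sleep(100)                // step 3: sleep for 100 ms
    registry.remove(name)            // step 4: stop
    (before, afterStart, registry.size) // step 5: the caller compares the sizes
  }
}
```

Checking `contains` in addition to the size guards against the case where the 
count drops because some other source was removed.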

Would like some feedback on this approach.

Thank you.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119344429
  
Will update shortly with the changes.

Thank you.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119288642
  
I should have phrased it better: the tests ran fine.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119287340
  
nsalian-MBP:spark nsalian$ ./dev/scalastyle
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option 
MaxPermSize=512m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option 
MaxPermSize=512m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option 
MaxPermSize=512m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option 
MaxPermSize=512m; support was removed in 8.0
Scalastyle checks passed.







[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119283309
  
@srowen, I made the changes and ran a dev test on my repo. The Scala errors 
were no longer present.






[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-07 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7250#issuecomment-119103462
  
@jerryshao, thank you for the comment. I made the changes.
Please let me know if you think I should add anything else.





[GitHub] spark pull request: [SPARK-8743] [Streaming]: De-registering Codah...

2015-07-06 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/7250

[SPARK-8743] [Streaming]: De-registering Codahale Metrics

The issue link: Deregister Codahale metrics for streaming when 
StreamingContext is closed

Design:
Adding the method calls in the appropriate start() and stop() methods for 
the StreamingContext

Actions in the PullRequest:
1) Added the registerSource method call to the start method for the 
Streaming Context. 
2) Added the removeSource method to the stop method. 
3) Added comments for both 1 and 2 and comment to show initialization of 
the StreamingSource


Requesting Review.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-8743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7250


commit 92fa04b16cad1e945ab8a4d1be752f08a241e922
Author: Neelesh Srinivas Salian 
Date:   2015-07-07T03:16:51Z

SPARK-8743: Added the registerSource method call to the start method for 
the Streaming Context. Added the removeSource method to the stop method. Added 
comments for both







[GitHub] spark pull request: [SPARK-8743] [Streaming] Added call to removeS...

2015-07-06 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/7249#issuecomment-119056839
  
TD,
Thanks for the comments.
Makes sense.
Will create a new PR for this one.
I pulled from upstream while making my changes, so a bunch of other 
commits went in along with my single file.

Closing this one.





[GitHub] spark pull request: [SPARK-8743] [Streaming] Added call to removeS...

2015-07-06 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/7249





[GitHub] spark pull request: [SPARK-8743] [Streaming] Added call to removeS...

2015-07-06 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/7249

[SPARK-8743] [Streaming]  Added call to removeSource to help de-register 
the streaming metrics



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-8743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7249


commit 151c298d97a435b76ccb54a64e2fef21a2ab7285
Author: Neelesh Srinivas Salian 
Date:   2015-06-21T04:36:32Z

SPARK-3629: Improvement of the Spark on YARN document

commit 8e8db7fc2c3337ae99cd84043e49eaf919dfed7c
Author: Neelesh Srinivas Salian 
Date:   2015-06-21T17:42:48Z

Removed the changes in this commit to help clearly distinguish movement 
from update

commit 9cbc072ce82c766b8d0716cd7469e572efeee14e
Author: Neelesh Srinivas Salian 
Date:   2015-06-21T17:44:05Z

Updated a few lines in the Launching Spark on YARN Section

commit 40dbc0b068741f179dac43299fa45333b62f93fd
Author: Neelesh Srinivas Salian 
Date:   2015-06-22T03:22:30Z

Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn 
line

commit 944b7a09f5acff0d4d11a663b2fed02aa7ed5105
Author: Neelesh Srinivas Salian 
Date:   2015-06-22T17:17:21Z

Changed the lines about deploy-mode and added backticks to all parameters

commit a71fe2cdad562798fd9ba8f7bef3d6e95bf8d339
Author: Neelesh Srinivas Salian 
Date:   2015-07-02T01:20:14Z

Merge branch 'master' of https://github.com/apache/spark into SPARK-8743

commit 5ddaec388e8720c600fd36450ad8afd96d5a84ff
Author: Neelesh Srinivas Salian 
Date:   2015-07-07T02:08:47Z

Merge branch 'master' of https://github.com/apache/spark into SPARK-8743

commit f4ef2f984d3f9deaf712cc2fe311aede068333d7
Author: Neelesh Srinivas Salian 
Date:   2015-07-07T02:30:23Z

SPARK-8743: Added the RemoveSource call to de-register Source after 
streaming







[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-24 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6924#issuecomment-114930404
  
@mateiz, does this PR need any more changes?

Please let me know.
Thanks.





[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-23 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6924#issuecomment-114512955
  
That is correct. I moved the text a few commits ago. The later commits 
were just formatting and changing the yarn wording.






[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-22 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/6924#discussion_r32958854
  
--- Diff: docs/running-on-yarn.md ---
@@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop
 
NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
 was added to Spark in version 0.6.0, and improved in subsequent releases.
 
+# Launching Spark on YARN
+
+Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory 
which contains the (client side) configuration files for the Hadoop cluster.
+These configs are used to write to HDFS and connect to the YARN 
ResourceManager. The
+configuration contained in this directory will be distributed to the YARN 
cluster so that all
+containers used by the application use the same configuration. If the 
configuration references
+Java system properties or environment variables not managed by YARN, they 
should also be set in the
+Spark application's configuration (driver, executors, and the AM when 
running in client mode).
+
+There are two deploy modes that can be used to launch Spark applications 
on YARN. In yarn-cluster mode, the Spark driver runs inside an application 
master process which is managed by YARN on the cluster, and the client can go 
away after initiating the application. In yarn-client mode, the driver runs in 
the client process, and the application master is only used for requesting 
resources from YARN.
+(Default: `--deploy-mode client`)
--- End diff --

Makes sense. Changed it.





[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-22 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/6924#discussion_r32958837
  
--- Diff: docs/running-on-yarn.md ---
@@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop
 
NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
 was added to Spark in version 0.6.0, and improved in subsequent releases.
 
+# Launching Spark on YARN
+
+Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory 
which contains the (client side) configuration files for the Hadoop cluster.
+These configs are used to write to HDFS and connect to the YARN 
ResourceManager. The
+configuration contained in this directory will be distributed to the YARN 
cluster so that all
+containers used by the application use the same configuration. If the 
configuration references
+Java system properties or environment variables not managed by YARN, they 
should also be set in the
+Spark application's configuration (driver, executors, and the AM when 
running in client mode).
+
+There are two deploy modes that can be used to launch Spark applications 
on YARN. In yarn-cluster mode, the Spark driver runs inside an application 
master process which is managed by YARN on the cluster, and the client can go 
away after initiating the application. In yarn-client mode, the driver runs in 
the client process, and the application master is only used for requesting 
resources from YARN.
+(Default: `--deploy-mode client`)
+
+Unlike in Spark standalone and Mesos mode, in which the master's address 
is specified in the "master" parameter, in YARN mode the ResourceManager's 
address is picked up from the Hadoop configuration. Thus, the master parameter 
is yarn. For a specific yarn deployment, use --deploy-mode to specify 
yarn-cluster or yarn-client. 
--- End diff --

Made the changes.





[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-21 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6924#issuecomment-113995711
  
@mateiz, I made the changes. Not sure about the master yarn sentence. 
Please let me know what you think about it.

Thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-21 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/6924#discussion_r32898107
  
--- Diff: docs/running-on-yarn.md ---
@@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop
 
NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
 was added to Spark in version 0.6.0, and improved in subsequent releases.
 
+# Launching Spark on YARN
+
+Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory 
which contains the (client side) configuration files for the Hadoop cluster.
+These configs are used to write to the dfs and connect to the YARN 
ResourceManager. The
+configuration contained in this directory will be distributed to the YARN 
cluster so that all
+containers used by the application use the same configuration. If the 
configuration references
+Java system properties or environment variables not managed by YARN, they 
should also be set in the
+Spark application's configuration (driver, executors, and the AM when 
running in client mode).
+
+There are two deploy modes that can be used to launch Spark applications 
on YARN. In yarn-cluster mode, the Spark driver runs inside an application 
master process which is managed by YARN on the cluster, and the client can go 
away after initiating the application. In yarn-client mode, the driver runs in 
the client process, and the application master is only used for requesting 
resources from YARN.
+(Default: --deploy-mode client)
+
+Unlike in Spark standalone and Mesos mode, in which the master's address 
is specified in the "master" parameter, in YARN mode the ResourceManager's 
address is picked up from the Hadoop configuration. Thus, the master parameter 
is yarn. 
--- End diff --

We could say something like:
"Thus, the master parameter is yarn. For a specific deployment, use 
--deploy-mode to specify yarn-cluster or yarn-client"





[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-21 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/6924#discussion_r32897974
  
--- Diff: docs/running-on-yarn.md ---
@@ -7,6 +7,53 @@ Support for running on [YARN (Hadoop
 
NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
 was added to Spark in version 0.6.0, and improved in subsequent releases.
 
+# Launching Spark on YARN
+
+Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory 
which contains the (client side) configuration files for the Hadoop cluster.
+These configs are used to write to the dfs and connect to the YARN 
ResourceManager. The
+configuration contained in this directory will be distributed to the YARN 
cluster so that all
+containers used by the application use the same configuration. If the 
configuration references
+Java system properties or environment variables not managed by YARN, they 
should also be set in the
+Spark application's configuration (driver, executors, and the AM when 
running in client mode).
+
+There are two deploy modes that can be used to launch Spark applications 
on YARN. In yarn-cluster mode, the Spark driver runs inside an application 
master process which is managed by YARN on the cluster, and the client can go 
away after initiating the application. In yarn-client mode, the driver runs in 
the client process, and the application master is only used for requesting 
resources from YARN.
+(Default: --deploy-mode client)
+
+Unlike in Spark standalone and Mesos mode, in which the master's address 
is specified in the "master" parameter, in YARN mode the ResourceManager's 
address is picked up from the Hadoop configuration. Thus, the master parameter 
is yarn. 
--- End diff --

So, spark-submit documents the option as:
 --master MASTER_URL spark://host:port, mesos://host:port, yarn, or 
local.

So the choice is between plain yarn and the specific yarn-client and 
yarn-cluster forms. I would suggest keeping it as yarn, since --deploy-mode 
covers the client or cluster part.
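
To make the relationship concrete, here is a small sketch of how the combined 
yarn-client and yarn-cluster strings could be read as a plain yarn master plus 
a deploy mode. `YarnMaster`, `Launch` and `normalize` are hypothetical names 
for illustration, not spark-submit's own code; the client default follows the 
"(Default: --deploy-mode client)" line in the diff above.

```scala
// Illustrative only: relates the combined master strings to --master yarn plus --deploy-mode.
object YarnMaster {
  final case class Launch(master: String, deployMode: String)

  // Assumption: when the combined form is used, it wins over any explicit deploy mode.
  def normalize(master: String, deployMode: Option[String]): Launch = master match {
    case "yarn-cluster" => Launch("yarn", "cluster")
    case "yarn-client"  => Launch("yarn", "client")
    case other          => Launch(other, deployMode.getOrElse("client"))
  }
}
```

Under this reading, `--master yarn --deploy-mode cluster` and the combined 
`yarn-cluster` form describe the same launch.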





[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-21 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6924#issuecomment-113935211
  
@srowen, that makes sense. I made 2 commits to reflect the updates. 
@mateiz, please let me know if any additional changes need to go in. 

Thank you.





[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-20 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6924#issuecomment-113864731
  
@srowen,  please review when you get the chance.

Thank you.





[GitHub] spark pull request: [SPARK-3629] [YARN] [DOCS]: Improvement of the...

2015-06-20 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/6924

[SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" 
document

As per the description in the JIRA, I moved the contents of the page and 
added some additional content.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-3629

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6924


commit 151c298d97a435b76ccb54a64e2fef21a2ab7285
Author: Neelesh Srinivas Salian 
Date:   2015-06-21T04:36:32Z

SPARK-3629: Improvement of the Spark on YARN document







[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...

2015-06-18 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6862#issuecomment-113211610
  
@davies and @koeninger, thank you for the comments.
Do you think any other changes need to go in?

Thanks.





[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...

2015-06-18 Thread nssalian
Github user nssalian commented on a diff in the pull request:

https://github.com/apache/spark/pull/6862#discussion_r32748615
  
--- Diff: docs/streaming-programming-guide.md ---
@@ -1937,6 +1937,16 @@ JavaPairDStream unifiedStream = streamingContext.union(kafkaStre
 unifiedStream.print();
 {% endhighlight %}
 
+
+{% highlight python %}
+numStreams = 5
+kafkaStreams = []
+for _ in range(numStreams):
+    kafkaStreams.append(KafkaUtils.createStream(...))
--- End diff --

Made the changes as per @davies 





[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...

2015-06-17 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6862#issuecomment-112947266
  
@srowen , I changed the Kafka append, the loop structure and the print 
method call.
Thank you.





[GitHub] spark pull request: SPARK-8320 - Add example in streaming programm...

2015-06-17 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6862#issuecomment-112874087
  
@srowen could you please review this PR?

Thank you.





[GitHub] spark pull request: SPARK-8320 - Add example in streaming programm...

2015-06-17 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/6862

SPARK-8320 - Add example in streaming programming guide that shows union of 
multiple input streams

Added Python code to the Level of Parallelism in Data Receiving section of 
https://spark.apache.org/docs/latest/streaming-programming-guide.html

Please review and let me know if there are any additional changes that are 
needed.

Thank you.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-8320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6862.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6862


commit 3fc5c6da0ebba20450c19a92a636b7e1b0b9219f
Author: Neelesh Srinivas Salian 
Date:   2015-06-17T16:18:17Z

SPARK-8320







[GitHub] spark pull request: Adding Python code for Spark 8320

2015-06-17 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6861#issuecomment-112872941
  
Will do. Thank you @srowen 





[GitHub] spark pull request: Adding Python code for Spark 8320

2015-06-17 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/6861





[GitHub] spark pull request: Adding Python code for Spark 8320

2015-06-17 Thread nssalian
Github user nssalian commented on the pull request:

https://github.com/apache/spark/pull/6861#issuecomment-112870276
  
@srowen 





[GitHub] spark pull request: Addition of Python example for SPARK-8320

2015-06-17 Thread nssalian
Github user nssalian closed the pull request at:

https://github.com/apache/spark/pull/6860





[GitHub] spark pull request: Adding Python code for Spark 8320

2015-06-17 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/6861

Adding Python code for Spark 8320

Added Python code to the Level of Parallelism in Data Receiving section of 
https://spark.apache.org/docs/latest/streaming-programming-guide.html

Please review and let me know if there are any additional changes that are 
needed.

Thank you.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark SPARK-8320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6861


commit 82a396c2f594bade276606dcd0c0545a650fb838
Author: Holden Karau 
Date:   2015-05-29T21:59:18Z

[SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd

Author: Holden Karau 

Closes #6464 from 
holdenk/SPARK-7910-expose-partitioner-information-in-javardd and squashes the 
following commits:

de1e644 [Holden Karau] Fix the test to get the partitioner
bdb31cc [Holden Karau] Add Mima exclude for the new method
347ef4c [Holden Karau] Add a quick little test for the partitioner JavaAPI
f49dca9 [Holden Karau] Add partitioner information to JavaRDDLike and fix 
some whitespace

commit 5fb97dca9bcfc29ac33823554c8783997e811b99
Author: Shivaram Venkataraman 
Date:   2015-05-29T22:08:30Z

[SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init

cc davies

Author: Shivaram Venkataraman 

Closes #6507 from shivaram/sparkr-init and squashes the following commits:

6fdd169 [Shivaram Venkataraman] Create SparkContext in sparkRSQL init

commit dbf8ff38de0f95f467b874a5b527dcf59439efe8
Author: Ram Sriharsha 
Date:   2015-05-29T22:22:26Z

[SPARK-6013] [ML] Add more Python ML examples for spark.ml

Author: Ram Sriharsha 

Closes #6443 from harsha2010/SPARK-6013 and squashes the following commits:

732506e [Ram Sriharsha] Code Review Feedback
121c211 [Ram Sriharsha] python style fix
5f9b8c3 [Ram Sriharsha] python style fixes
925ca86 [Ram Sriharsha] Simple Params Example
8b372b1 [Ram Sriharsha] GBT Example
965ec14 [Ram Sriharsha] Random Forest Example

commit 8c9979337f193c72fd2f1a891909283de53777e3
Author: Andrew Or 
Date:   2015-05-29T22:26:49Z

[HOTFIX] [SQL] Maven test compilation issue

Tests compile in SBT but not Maven.

commit a4f24123d8857656524c9138c7c067a4b1033a5e
Author: Andrew Or 
Date:   2015-05-30T00:19:46Z

[HOT FIX] [BUILD] Fix maven build failures

This patch fixes a build break in maven caused by #6441.

Note that this patch reverts the changes in flume-sink because
this module does not currently depend on Spark core, but the
tests require it. There is not an easy way to make this work
because mvn test dependencies are not transitive (MNG-1378).

For now, we will leave the one test suite in flume-sink out
until we figure out a better solution. This patch is mainly
intended to unbreak the maven build.

Author: Andrew Or 

Closes #6511 from andrewor14/fix-build-mvn and squashes the following 
commits:

3d53643 [Andrew Or] [HOT FIX #6441] Fix maven build failures

commit 3792d25836e1e521da64c5a62ca1b6cca1bcb6b9
Author: Taka Shinagawa 
Date:   2015-05-30T03:35:14Z

[DOCS][Tiny] Added a missing dash(-) in docs/configuration.md

The first line had only two dashes (--) instead of three(---). Because of 
this missing dash(-), 'jekyll build' command was not converting 
configuration.md to _site/configuration.html

Author: Taka Shinagawa 

Closes #6513 from mrt/docfix3 and squashes the following commits:

c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from 
converting configuration.md to html format

commit 7ed06c39922ac90acab3a78ce0f2f21184ed68a5
Author: Burak Yavuz 
Date:   2015-05-30T05:19:15Z

[SPARK-7957] Preserve partitioning when using randomSplit

cc JoshRosen
Thanks for noticing this!

Author: Burak Yavuz 

Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:

497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using 
randomSplit

commit 609c4923f98c188bce60ae35c1c8a08a8dfd95f1
Author: Andrew Or 
Date:   2015-05-30T05:57:46Z

[SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike

This is a follow-up patch to #6441.

Author: Andrew Or 

Closes #6510 from andrewor14/extends-funsuite-check and squashes the 
following commits:

6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check
99d02ac [Andrew Or] Merge branch 'master' of github.c

[GitHub] spark pull request: Addition of Python example for SPARK-8320

2015-06-17 Thread nssalian
GitHub user nssalian opened a pull request:

https://github.com/apache/spark/pull/6860

Addition of Python example for SPARK-8320

Added Python code to the Level of Parallelism in Data Receiving section of 
https://spark.apache.org/docs/latest/streaming-programming-guide.html

Please review and let me know if there are any additional changes that are 
needed.

Thank you.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nssalian/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6860


commit 5a1a1075a607be683f008ef92fa227803370c45f
Author: Andrew Or 
Date:   2015-05-04T16:17:55Z

[MINOR] Fix python test typo?

I suspect we haven't been using Anaconda in tests in a while. I wonder if this 
change actually does anything, but this line as it stands looks strictly less 
correct.

Author: Andrew Or 

Closes #5883 from andrewor14/fix-run-tests-typo and squashes the following 
commits:

a3ad720 [Andrew Or] Fix typo?

commit e0833c5958bbd73ff27cfe6865648d7b6e5a99bc
Author: Xiangrui Meng 
Date:   2015-05-04T18:28:59Z

[SPARK-5956] [MLLIB] Pipeline components should be copyable.

This PR added `copy(extra: ParamMap): Params` to `Params`, which makes a 
copy of the current instance with a randomly generated uid and some extra param 
values. With this change, we only need to implement `fit` and `transform` 
without extra param values given the default implementation of `fit(dataset, 
extra)`:

~~~scala
def fit(dataset: DataFrame, extra: ParamMap): Model = {
  copy(extra).fit(dataset)
}
~~~

Inside `fit` and `transform`, since only the embedded values are used, I 
added `$` as an alias for `getOrDefault` to make the code easier to read. For 
example, in `LinearRegression.fit` we have:

~~~scala
val effectiveRegParam = $(regParam) / yStd
val effectiveL1RegParam = $(elasticNetParam) * effectiveRegParam
val effectiveL2RegParam = (1.0 - $(elasticNetParam)) * effectiveRegParam
~~~
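
The copy-then-fit idea described above can also be sketched outside Spark. 
The following Python toy is an illustration only, not the spark.ml 
implementation: the Params and Scaler classes, the get_or_default method and 
the "factor" parameter are all hypothetical names chosen for this sketch. It 
shows a copy(extra) that returns a new instance with a fresh uid and the 
extra values embedded, and a fit that reads only embedded values.

```python
import copy as _copy
import uuid


class Params:
    """Toy parameter holder (illustrative only, not the spark.ml class)."""

    def __init__(self, defaults=None):
        self.uid = uuid.uuid4().hex
        self._defaults = dict(defaults or {})
        self._values = {}

    def get_or_default(self, name):
        # Explicitly set values take precedence over defaults.
        if name in self._values:
            return self._values[name]
        return self._defaults[name]

    def copy(self, extra=None):
        # New instance with a fresh uid and the extra values embedded.
        other = _copy.copy(self)
        other.uid = uuid.uuid4().hex
        other._defaults = dict(self._defaults)
        other._values = dict(self._values)
        other._values.update(extra or {})
        return other


class Scaler(Params):
    """Hypothetical estimator whose fit reads only embedded values."""

    def __init__(self):
        super().__init__(defaults={"factor": 1.0})

    def fit(self, dataset, extra=None):
        if extra:
            # Embed the extras in a copy, then fit the copy without extras.
            return self.copy(extra).fit(dataset)
        factor = self.get_or_default("factor")
        return [x * factor for x in dataset]
```

Because the extras are embedded in a copy, the original instance is left 
unchanged after a call such as Scaler().fit(data, {"factor": 3.0}).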

Meta-algorithms like `Pipeline` implement their own `copy(extra)`, so the 
fitted pipeline model stores all copied stages (whether each one is a 
transformer or a model).

Other changes:
* `Params$.inheritValues` is moved to `Params!.copyValues` and returns the 
target instance.
* `fittingParamMap` was removed because the `parent` carries this 
information.
* `validate` was renamed to `validateParams` to be more precise.

TODOs:
* [x] add tests for newly added methods
* [ ] update documentation

jkbradley dbtsai

Author: Xiangrui Meng 

Closes #5820 from mengxr/SPARK-5956 and squashes the following commits:

7bef88d [Xiangrui Meng] address comments
05229c3 [Xiangrui Meng] assert -> assertEquals
b2927b1 [Xiangrui Meng] organize imports
f14456b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into 
SPARK-5956
93e7924 [Xiangrui Meng] add tests for hasParam & copy
463ecae [Xiangrui Meng] merge master
2b954c3 [Xiangrui Meng] update Binarizer
465dd12 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into 
SPARK-5956
282a1a8 [Xiangrui Meng] fix test
819dd2d [Xiangrui Meng] merge master
b642872 [Xiangrui Meng] example code runs
5a67779 [Xiangrui Meng] examples compile
c76b4d1 [Xiangrui Meng] fix all unit tests
0f4fd64 [Xiangrui Meng] fix some tests
9286a22 [Xiangrui Meng] copyValues to trained models
53e0973 [Xiangrui Meng] move inheritValues to Params and rename it to 
copyValues
9ee004e [Xiangrui Meng] merge copy and copyWith; rename validate to 
validateParams
d882afc [Xiangrui Meng] test compile
f082a31 [Xiangrui Meng] make Params copyable and simplify handling of extra 
params in all spark.ml components

commit f32e69ecc333867fc966f65cd0aeaeddd43e0945
Author: 云峤 
Date:   2015-05-04T19:08:38Z

[SPARK-7319][SQL] Improve the output from DataFrame.show()

Author: 云峤 

Closes #5865 from kaka1992/df.show and squashes the following commits:

c79204b [云峤] Update
a1338f6 [云峤] Update python dataFrame show test and add empty df unit 
test.
734369c [云峤] Update python dataFrame show test and add empty df unit 
test.
84aec3e [云峤] Update python dataFrame show test and add empty df unit 
test.
159b3d5 [云峤] update
03ef434 [云峤] update
7394fd5 [云峤] update test show
ced487a [云峤] update pep8
b6e690b [云峤] Merge remote-tracking branch 'upstream/master' into df.show
30ac311 [云峤] [SPARK-7294] ADD BETWEEN
7d62368 [云峤] [SPARK-729