[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...

2017-08-03 Thread raajay
Github user raajay commented on the issue:

https://github.com/apache/spark/pull/18690
  
I understand. My previous comment was just a clarification to your 
question: "I'm not sure how does this code work in your changes?". I will close 
this PR. The JIRA is already closed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18690: [SPARK-21334][CORE] Add metrics reporting service...

2017-08-03 Thread raajay
Github user raajay closed the pull request at:

https://github.com/apache/spark/pull/18690





[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...

2017-08-03 Thread raajay
Github user raajay commented on the issue:

https://github.com/apache/spark/pull/18690
  
@jerryshao My CustomSink has the report function defined. What I did not 
have was an equivalent of JmxReporter defined in my CustomSink. The reporter 
essentially invokes the report function defined in CustomSink periodically.
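
A minimal sketch of the reporter idea described above, written in Python purely for illustration. Only the `report` hook comes from the comment; `CustomSink`'s internals, `PeriodicReporter`, and `period_s` are hypothetical names, and this is not how Spark's or the Metrics library's reporters are implemented.

```python
import threading


class CustomSink:
    """Hypothetical sink; only the report() hook is taken from the thread."""

    def __init__(self):
        self.reports = 0

    def report(self):
        # A real sink would push the current metric values somewhere.
        self.reports += 1


class PeriodicReporter:
    """Calls sink.report() every `period_s` seconds on a background thread."""

    def __init__(self, sink, period_s):
        self._sink = sink
        self._period_s = period_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._period_s):
            self._sink.report()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Without something playing the role of `PeriodicReporter`, the sink's `report` function exists but nothing ever calls it, which matches the behaviour described in this comment.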





[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...

2017-07-21 Thread raajay
Github user raajay commented on the issue:

https://github.com/apache/spark/pull/18690
  
We were using a custom sink rather than the JmxSink for gathering metrics. 
The sink did NOT have a "reporter" like the ones JmxSink or CsvSink have. I 
guess a cleaner design is to implement a metrics reporter in the Sink and not 
have a reporting service as part of the external shuffle service. Thanks! 





[GitHub] spark issue #18683: [SPARK-21474][CORE] Make number of parallel fetches from...

2017-07-20 Thread raajay
Github user raajay commented on the issue:

https://github.com/apache/spark/pull/18683
  
maxSizeInFlight can be large (~100-200 MB) when (a) available memory at the 
reducer is high, or (b) the reducer spends most of its time waiting for 
fetchRequests. In such cases, using a hard-coded value of '5' parallel fetches 
results in individual fetchRequests that are bursts of 20-40 MB. The 
configuration parameter allows one to have smaller fetchRequests while 
keeping the total maxSizeInFlight constant. 
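
The arithmetic behind those numbers can be sketched as follows. This is an illustration in Python, assuming the in-flight budget is split evenly across the parallel fetches; `fetch_request_size` and `parallel_fetches` are hypothetical names, not Spark identifiers.

```python
MB = 1024 * 1024


def fetch_request_size(max_size_in_flight, parallel_fetches=5):
    """Approximate size of one fetch request, in bytes.

    Assumes the in-flight budget is split evenly across parallel fetches;
    the default of 5 is the hard-coded value mentioned above.
    """
    return max(max_size_in_flight // parallel_fetches, 1)


for budget_mb in (100, 200):
    for parallel in (5, 20):
        size_mb = fetch_request_size(budget_mb * MB, parallel) / MB
        print(f"{budget_mb} MB in flight, {parallel} fetches: {size_mb:.0f} MB each")
```

With 5 parallel fetches the requests come out at 20 MB and 40 MB; raising the count to 20 brings them down to 5 MB and 10 MB for the same total in flight.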

This configuration is helpful when the reducer spends most of its time 
waiting for fetchRequests to return. In such cases, we would like to increase 
the maxBytesInFlight 





[GitHub] spark pull request #18690: [SPARK-21334][CORE] Add metrics reporting service...

2017-07-20 Thread raajay
GitHub user raajay opened a pull request:

https://github.com/apache/spark/pull/18690

[SPARK-21334][CORE] Add metrics reporting service to External Shuffle Server

## What changes were proposed in this pull request?

Add a metrics reporting service that periodically reports the metrics 
defined in ExternalShuffleServiceSource. Currently, although the metrics are 
defined, they are never reported. 

## How was this patch tested?

Manual tests to ensure that metrics are reported on ExternalShuffleService 
start.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/raajay/spark raajay-launch-metric-reporting

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18690


commit 4de1658f2dea72ded4c86d699f90432b8d965370
Author: Raajay Viswanathan <raaja...@gmail.com>
Date:   2017-07-20T17:21:23Z

Add metrics reporting service.







[GitHub] spark pull request #18683: [SPARK-21474][CORE] Make number of parallel fetch...

2017-07-19 Thread raajay
GitHub user raajay opened a pull request:

https://github.com/apache/spark/pull/18683

[SPARK-21474][CORE] Make number of parallel fetches from a reducer configurable

## What changes were proposed in this pull request?

Currently the number of parallel fetches is hard-coded to 5. As a result, 
the size of each fetch request is fixed at 1/5th of maxSizeInFlight. Since 
chunks are requested in bursts of fetchRequests, the size of a burst to a 
single shuffle service can be high for large maxSizeInFlight.

Introduce a new configuration parameter 
"spark.reducer.numParallelFetchRequets" to make it configurable.
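
To illustrate the effect of the parameter, here is a simplified model in Python of grouping shuffle blocks into fetch requests. It is a sketch under stated assumptions, not the code in this pull request: `split_into_requests` is a hypothetical helper, and `num_parallel` stands in for the value of the new configuration parameter.

```python
def split_into_requests(block_sizes, max_bytes_in_flight, num_parallel=5):
    """Group block sizes (bytes) into fetch requests.

    A request is closed once its accumulated size reaches the target
    max_bytes_in_flight / num_parallel. This is a simplified model, not
    Spark's actual implementation.
    """
    if num_parallel < 1:
        raise ValueError("num_parallel must be at least 1")
    target = max(max_bytes_in_flight // num_parallel, 1)
    requests, current, current_size = [], [], 0
    for size in block_sizes:
        current.append(size)
        current_size += size
        if current_size >= target:
            requests.append(current)
            current, current_size = [], 0
    if current:
        requests.append(current)  # leftover blocks form a final, smaller request
    return requests
```

For ten blocks of 10 bytes and a budget of 100 bytes, the default of 5 yields five requests of two blocks each, while `num_parallel=10` yields ten single-block requests, so each burst to a shuffle service is smaller for the same budget.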

## How was this patch tested?

Not tested.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/raajay/spark raajay-configure-parallel-requests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18683


commit 9bf1f9fd5aa08ef19653a4cfa52a8eccc64dc18b
Author: Raajay Viswanathan <raaja...@gmail.com>
Date:   2017-07-19T16:49:25Z

Make num parallel fetches from a reducer configurable



