Nan, I think Meisam already had a PR about this this, maybe you can discuss
with him on the github based on the proposed code.

Sorry I didn't follow the long discussion thread, but I think Paypal's
solution sounds simpler.

On Wed, Aug 23, 2017 at 12:07 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

> based on this result, I think we should follow the bulk operation pattern
>
> Shall we move forward with the PR from Paypal?
>
> Best,
>
> Nan
>
> On Mon, Aug 21, 2017 at 12:21 PM, Meisam Fathi <meisam.fa...@gmail.com>
> wrote:
>
> > Bottom line up front:
> > 1. The cost of calling 10000 individual REST calls is about two order of
> > magnitude higher than calling a single batch REST call (10000 * 0.05
> > seconds vs. 1.4 seconds)
> > 2. Time to complete a batch REST call plateaus at about 10,000
> application
> > reports per call.
> >
> > Full story:
> > I experimented and measure how long it takes to fetch Application Reports
> > from YARN with the REST API. My objective was to compare doing a batch
> REST
> > call to get all ApplicationReports vs doing individual REST calls for
> each
> > Application Report.
> >
> > I did the tests on 4 different cluster: 1) a test cluster, 2) a
> moderately
> > used dev cluster, 3) a lightly used production cluster, and 4) a heavily
> > used production cluster. For each cluster I made 7 REST call to get 1,
> 10,
> > 100, 1000, 10000, 100000, 1000000 application reports respectively. I
> > repeated each call 200 times to count for variations and I reported the
> > median time.
> > To measure the time, I used the following curl command:
> >
> > $ curl -o /dev/null -s -w "@curl-output-fromat.json" "http://
> > $rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$
> > applicationTypes&limit=$limit"
> >
> > The attached charts show the results. In all the charts, the x axis show
> > the number of results that were request in the call.
> > The bar chart show the time it takes to complete a REST call on each
> > cluster.
> > The first line plot also shows the same results as the bar chart on a log
> > scale (it is easier to see that the time to complete the REST call
> plateaus
> > at 10,000
> > The last chart shows the size of data that is being downloaded on each
> > REST call, which explains why the time plateaus  at 10,000.
> >
> >
> > [image: transfer_time_bar_plot.png][image: transfer_time_line_plot.png][
> image:
> > size_downloaded_line_plot.png]
> >
> >>
> >>
> > Thanks,
> > Meisam
> >
>

Reply via email to