Nan, I think Meisam already has a PR about this; maybe you can discuss it with him on GitHub based on the proposed code.
Sorry, I didn't follow the long discussion thread, but I think Paypal's solution sounds simpler.

On Wed, Aug 23, 2017 at 12:07 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

> Based on this result, I think we should follow the bulk operation pattern.
>
> Shall we move forward with the PR from Paypal?
>
> Best,
>
> Nan
>
> On Mon, Aug 21, 2017 at 12:21 PM, Meisam Fathi <meisam.fa...@gmail.com>
> wrote:
>
> > Bottom line up front:
> > 1. The cost of making 10,000 individual REST calls is about two orders
> > of magnitude higher than making a single batch REST call (10,000 * 0.05
> > seconds vs. 1.4 seconds).
> > 2. Time to complete a batch REST call plateaus at about 10,000
> > application reports per call.
> >
> > Full story:
> > I experimented and measured how long it takes to fetch Application
> > Reports from YARN with the REST API. My objective was to compare doing
> > a batch REST call to get all ApplicationReports vs. doing individual
> > REST calls for each Application Report.
> >
> > I ran the tests on 4 different clusters: 1) a test cluster, 2) a
> > moderately used dev cluster, 3) a lightly used production cluster, and
> > 4) a heavily used production cluster. For each cluster I made 7 REST
> > calls to get 1, 10, 100, 1000, 10000, 100000, and 1000000 application
> > reports respectively. I repeated each call 200 times to account for
> > variation and I reported the median time.
> > To measure the time, I used the following curl command:
> >
> > $ curl -o /dev/null -s -w "@curl-output-fromat.json" "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"
> >
> > The attached charts show the results. In all the charts, the x axis
> > shows the number of results that were requested in the call.
> > The bar chart shows the time it takes to complete a REST call on each
> > cluster.
> > The first line plot shows the same results as the bar chart on a log
> > scale (it is easier to see that the time to complete the REST call
> > plateaus at 10,000).
> > The last chart shows the size of the data downloaded on each REST
> > call, which explains why the time plateaus at 10,000.
> >
> > [image: transfer_time_bar_plot.png]
> > [image: transfer_time_line_plot.png]
> > [image: size_downloaded_line_plot.png]
> >
> > Thanks,
> > Meisam
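
For anyone who wants to repeat the measurement without curl, here is a minimal Python sketch of the same procedure: build the /ws/v1/cluster/apps URL with the applicationTypes and limit parameters, repeat the call, and report the median time. The helper names (apps_url, fetch, median_fetch_time) are made up for illustration and are not taken from Meisam's scripts or from the PR. It times the whole request including reading the body, which may not match exactly what the curl output format file reported.

```python
import statistics
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def apps_url(rm_http_address, rm_port, application_types, limit):
    """Build the YARN ResourceManager apps URL used in the curl command."""
    query = urlencode({"applicationTypes": application_types, "limit": limit})
    return f"http://{rm_http_address}:{rm_port}/ws/v1/cluster/apps?{query}"


def fetch(url):
    """Download the response body and return its size in bytes."""
    with urlopen(url) as response:
        return len(response.read())


def median_fetch_time(url, repeats=200, fetch_fn=fetch):
    """Call fetch_fn(url) `repeats` times and return the median wall time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch_fn(url)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Usage would look something like median_fetch_time(apps_url("rm.example.com", 8088, "SPARK", 10000)), where the host, port, and application type are placeholders for your own cluster. Passing a different fetch_fn makes it possible to try the timing loop without a live ResourceManager.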