Just an FYI, Apache mailing lists can't share attachments. Could you please upload the files to a file-sharing site and include links instead?
Thanks,

Alex Bozarth
Software Engineer
Spark Technology Center

E-mail: ajboz...@us.ibm.com
GitHub: github.com/ajbozarth

505 Howard Street
San Francisco, CA 94105
United States

From: Meisam Fathi <meisam.fa...@gmail.com>
To: dev@livy.incubator.apache.org
Cc: Prabhu Kasinathan <vasurampra...@gmail.com>
Date: 08/21/2017 02:09 PM
Subject: Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

I forgot to attach the first chart. Sorry about that.

Thanks,
Meisam

On Mon, Aug 21, 2017 at 12:21 PM Meisam Fathi <meisam.fa...@gmail.com> wrote:

Bottom line up front:
1. The cost of making 10,000 individual REST calls is about two orders of magnitude higher than the cost of making a single batch REST call (10,000 * 0.05 seconds = 500 seconds vs. 1.4 seconds).
2. The time to complete a batch REST call plateaus at about 10,000 application reports per call.

Full story:
I experimented and measured how long it takes to fetch Application Reports from YARN with the REST API. My objective was to compare making one batch REST call to get all Application Reports vs. making an individual REST call for each Application Report.

I ran the tests on 4 different clusters: 1) a test cluster, 2) a moderately used dev cluster, 3) a lightly used production cluster, and 4) a heavily used production cluster. For each cluster I made 7 REST calls to get 1, 10, 100, 1000, 10000, 100000, and 1000000 application reports respectively. I repeated each call 200 times to account for variation and reported the median time.

To measure the time, I used the following curl command:

$ curl -o /dev/null -s -w "@curl-output-fromat.json" "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"

The attached charts show the results. In all the charts, the x axis shows the number of results requested in the call. The bar chart shows the time it takes to complete a REST call on each cluster.
The first line plot shows the same results as the bar chart on a log scale (it is easier to see there that the time to complete the REST call plateaus at 10,000 application reports). The last chart shows the size of the data downloaded on each REST call, which explains why the time plateaus at 10,000.

Thanks,
Meisam
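
For anyone who wants to reproduce the measurement described above without the attached charts, here is a minimal Python sketch. It is not the script that was actually used. Only the URL path, the two query parameters, the list of limits, and the 200 repetitions come from the message; the function names (apps_url, http_fetch, measure, individual_vs_batch) are hypothetical. It times the whole download with a wall clock, which may differ from whatever the curl format file reported.

```python
import statistics
import time
import urllib.parse
import urllib.request

# Values taken from the message: 7 limits, each call repeated 200 times.
LIMITS = [1, 10, 100, 1000, 10000, 100000, 1000000]
REPEATS = 200


def apps_url(rm_http_address, rm_port, application_types, limit):
    """Build the same ResourceManager URL that the curl command requests."""
    query = urllib.parse.urlencode(
        {"applicationTypes": application_types, "limit": limit}
    )
    return f"http://{rm_http_address}:{rm_port}/ws/v1/cluster/apps?{query}"


def http_fetch(url):
    """Download the full response body and return its size in bytes."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def measure(url, repeats=REPEATS, fetch=http_fetch, clock=time.perf_counter):
    """Call `url` `repeats` times; return (median seconds, median bytes).

    `fetch` and `clock` are injectable (an assumption made for testing),
    so the loop can be exercised without a live cluster.
    """
    times, sizes = [], []
    for _ in range(repeats):
        start = clock()
        sizes.append(fetch(url))
        times.append(clock() - start)
    return statistics.median(times), statistics.median(sizes)


def individual_vs_batch(n_apps, seconds_per_call, batch_seconds):
    """Ratio of the cost of n individual calls to the cost of one batch call."""
    return n_apps * seconds_per_call / batch_seconds
```

For example, measure(apps_url(host, port, "SPARK", 10000)) would return the median time and median response size for a 10,000-report call, where host, port, and "SPARK" are placeholders for your own cluster. With the numbers quoted above, individual_vs_batch(10000, 0.05, 1.4) comes out to roughly 357.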