Just an FYI, Apache mailing lists can't share attachments. Could you
please upload the files to a file-sharing site and include links
instead?

Thanks,

Alex Bozarth
Software Engineer
Spark Technology Center

E-mail: ajboz...@us.ibm.com
GitHub: github.com/ajbozarth
505 Howard Street
San Francisco, CA 94105
United States

From:   Meisam Fathi <meisam.fa...@gmail.com>
To:     dev@livy.incubator.apache.org
Cc:     Prabhu Kasinathan <vasurampra...@gmail.com>
Date:   08/21/2017 02:09 PM
Subject:        Re: resolve the scalability problem caused by app monitoring in
            livy with an actor-based design



I forgot to attach the first chart. Sorry about that.



Thanks,
Meisam

On Mon, Aug 21, 2017 at 12:21 PM Meisam Fathi <meisam.fa...@gmail.com>
wrote:
  Bottom line up front:
  1. The cost of making 10000 individual REST calls is about two orders of
  magnitude higher than making a single batch REST call (10000 * 0.05
  seconds = 500 seconds vs. 1.4 seconds).
  2. Time to complete a batch REST call plateaus at about 10,000
  application reports per call.

  Full story:
  I measured how long it takes to fetch Application Reports from YARN with
  the REST API. My objective was to compare doing a batch REST call to get
  all Application Reports vs. doing individual REST calls for each
  Application Report.

  I ran the tests on 4 different clusters: 1) a test cluster, 2) a
  moderately used dev cluster, 3) a lightly used production cluster, and 4)
  a heavily used production cluster. For each cluster I made 7 REST calls to
  get 1, 10, 100, 1000, 10000, 100000, and 1000000 application reports
  respectively. I repeated each call 200 times to account for variation and
  reported the median time.
  To measure the time, I used the following curl command:

  $ curl -o /dev/null -s -w "@curl-output-fromat.json" \
      "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"
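
  A minimal sketch of how the repeat-and-take-the-median step could be
  scripted. The helper names (median, time_request, median_request_time)
  are hypothetical, and the sketch uses curl's built-in %{time_total}
  write-out variable rather than the format file from the command above,
  whose contents are not shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the median of newline-separated numbers on stdin.
median() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Hypothetical helper: time one request and print the total seconds.
# Assumption: %{time_total} stands in for the original format file.
time_request() {
  curl -o /dev/null -s -w '%{time_total}\n' "$1"
}

# Repeat the request $2 times (default 200) and print the median time.
median_request_time() {
  url=$1
  runs=${2:-200}
  i=0
  while [ "$i" -lt "$runs" ]; do
    time_request "$url"
    i=$((i + 1))
  done | median
}

# Example call (not executed here; needs a reachable ResourceManager):
# median_request_time "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"
```

  Running the example call once per value of $limit would give one median
  per data point for a given cluster.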

  The attached charts show the results. In all the charts, the x-axis shows
  the number of results that were requested in the call.
  The bar chart shows the time it takes to complete a REST call on each
  cluster.
  The first line plot shows the same results as the bar chart on a log
  scale (it is easier to see that the time to complete the REST call
  plateaus at 10,000).
  The last chart shows the size of the data downloaded on each
  REST call, which explains why the time plateaus at 10,000.





  Thanks,
  Meisam
