baohe-zhang commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-655002716


   I measured the memory usage for some smaller apps; the results are:
   
   * 200 jobs, 400 tasks for each job: 265 MB file size, 57.9 MB memory usage.
   
   * 100 jobs, 400 tasks for each job: 133 MB file size, 28.5 MB memory usage.
   
   * 50 jobs, 400 tasks for each job: 67 MB file size, 14.9 MB memory usage.
   
   * 20 jobs, 400 tasks for each job: 35 MB file size, 8.3 MB memory usage.
   
   * 10 jobs, 400 tasks for each job: 15 MB file size, 4.7 MB memory usage.
   
   * 1 job, 400 tasks: 3.7 MB file size, 2.2 MB memory usage.
   
   * 1 job, 40 tasks: 512 KB file size, 727 KB memory usage.
   
   
   I found that the ratio memory_usage / file_size is stable at about ¼ for log files larger than 30 MB. For log files smaller than 15 MB, the ratio is greater than ¼, and it keeps increasing as the file size decreases.
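
   For reference, the ratios implied by the numbers above can be checked with a throwaway snippet like this (the figures are copied from the list above; the snippet is only illustrative and not part of the PR):

```scala
// Memory-usage / file-size ratios for the measurements reported above.
// Pairs are (file size in MB, memory usage in MB);
// 512 KB ≈ 0.512 MB and 727 KB ≈ 0.727 MB.
val measurements = Seq(
  (265.0, 57.9), (133.0, 28.5), (67.0, 14.9), (35.0, 8.3),
  (15.0, 4.7), (3.7, 2.2), (0.512, 0.727)
)
measurements.foreach { case (fileMb, memMb) =>
  println(f"$fileMb%7.1f MB file -> ratio ${memMb / fileMb}%.2f")
}
```

   The files above 30 MB come out around 0.22–0.24, while the smallest log is about 1.4, which is where the ¼ figure and the small-file caveat come from.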
   
   The difference in serving latency between the in-memory and LevelDB stores depends on the machine's performance, but personally I feel that parsing a log file larger than 50 MB with the hybrid store can give a notable improvement.
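
   For readers skimming the thread, the hybrid-store idea being measured here is roughly the following (a minimal, self-contained sketch with made-up class names; the PR itself works against Spark's internal KVStore API, which is richer than this):

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical key-value store interface, for illustration only.
trait SimpleStore {
  def put(key: String, value: AnyRef): Unit
  def get(key: String): Option[AnyRef]
}

// Plain in-memory store: fast, but heap usage grows with the log size,
// which is what the memory numbers above are measuring.
final class MemoryStoreSketch extends SimpleStore {
  private val data = TrieMap.empty[String, AnyRef]
  def put(key: String, value: AnyRef): Unit = data.put(key, value)
  def get(key: String): Option[AnyRef] = data.get(key)
  def entries: Iterator[(String, AnyRef)] = data.iterator
}

// Hybrid store: absorb writes in memory while the event log is being
// replayed, then migrate everything to a disk-backed store (LevelDB in
// the PR) and flip a switch so later reads and writes go to disk.
final class HybridStoreSketch(disk: SimpleStore) extends SimpleStore {
  private val memory = new MemoryStoreSketch
  @volatile private var useDisk = false

  def put(key: String, value: AnyRef): Unit =
    if (useDisk) disk.put(key, value) else memory.put(key, value)

  def get(key: String): Option[AnyRef] =
    if (useDisk) disk.get(key) else memory.get(key)

  // Called once replay finishes. This sketch ignores writes that race
  // with the dump; the real change handles that and runs the dump in a
  // background thread so the UI can be served from memory meanwhile.
  def switchToDisk(): Unit = {
    memory.entries.foreach { case (k, v) => disk.put(k, v) }
    useDisk = true
  }
}
```

   The memory figures in this comment correspond to the in-memory phase of such a store, which is the part that costs history-server heap.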

