[ https://issues.apache.org/jira/browse/SPARK-12427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shane knapp updated SPARK-12427:
--------------------------------
    Attachment: graph.png

disk usage over the past year, for lols.

> spark builds filling up jenkins' disk
> -------------------------------------
>
>                 Key: SPARK-12427
>                 URL: https://issues.apache.org/jira/browse/SPARK-12427
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>            Reporter: shane knapp
>            Priority: Critical
>              Labels: build, jenkins
>         Attachments: graph.png, jenkins_disk_usage.txt
>
>
> problem summary:
> a few spark builds are filling up the jenkins master's disk with millions of 
> little log files as build artifacts.  
> currently, we have a raid10 array set up with 5.4T of storage.  we're 
> using 4.0T of it, 99.9% of which is spark unit test and junit logs.
> the worst offenders, with more than 100G of disk usage per job, are:
> 193G    ./Spark-1.6-Maven-with-YARN
> 194G    ./Spark-1.5-Maven-with-YARN
> 205G    ./Spark-1.6-Maven-pre-YARN
> 216G    ./Spark-1.5-Maven-pre-YARN
> 387G    ./Spark-Master-Maven-with-YARN
> 420G    ./Spark-Master-Maven-pre-YARN
> 520G    ./Spark-1.6-SBT
> 733G    ./Spark-1.5-SBT
> 812G    ./Spark-Master-SBT
> i have attached a full report w/all builds listed as well.
> each of these builds is keeping its build history for 90 days.
> keep in mind that for each new matrix build, we're looking at another 
> 200-500G per job for the SBT/pre-YARN/with-YARN jobs.
> a straw man, back of napkin estimate for spark 1.7 is 2T of additional disk 
> usage.
> on the hardware config side, we can move from raid10 to raid5 and get ~3T 
> additional storage.  if we ditch raid altogether and put in bigger disks, we 
> can get a total of 16-20T storage on master.  another option is to have an NFS 
> mount to a deep storage server.  all of these options will require 
> significant downtime.
> questions:
> * can we lower the number of days that we keep build information?
> * there are other options in jenkins that we can set as well:  max number of 
> builds to keep, max # days to keep artifacts, max # of builds to keep 
> w/artifacts
> * can we make the junit and unit test logs smaller? (probably not)
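
to put rough numbers on the first question, here is a minimal python sketch that totals the worst offenders listed in the report and projects what they would occupy under a shorter retention window. the per-job sizes and the 90-day retention come from the report; the candidate windows (30 and 14 days) are hypothetical, and the projection assumes disk usage scales linearly with the number of days kept, which has not been checked against the real build logs.

```python
# Per-job disk usage in GB, copied from the worst-offenders listing in the report.
USAGE_GB = {
    "Spark-1.6-Maven-with-YARN": 193,
    "Spark-1.5-Maven-with-YARN": 194,
    "Spark-1.6-Maven-pre-YARN": 205,
    "Spark-1.5-Maven-pre-YARN": 216,
    "Spark-Master-Maven-with-YARN": 387,
    "Spark-Master-Maven-pre-YARN": 420,
    "Spark-1.6-SBT": 520,
    "Spark-1.5-SBT": 733,
    "Spark-Master-SBT": 812,
}

# Retention period stated in the report.
CURRENT_RETENTION_DAYS = 90


def total_gb(usage):
    """Sum the per-job usage figures."""
    return sum(usage.values())


def projected_gb(usage, retention_days, current_days=CURRENT_RETENTION_DAYS):
    """Scale each job's usage to a new retention window.

    ASSUMPTION: usage grows linearly with days of history kept.
    Real usage depends on build frequency and log size per build.
    """
    scale = retention_days / current_days
    return {job: gb * scale for job, gb in usage.items()}


if __name__ == "__main__":
    print(f"worst offenders at {CURRENT_RETENTION_DAYS} days: {total_gb(USAGE_GB)}G")
    # Hypothetical retention windows, for illustration only.
    for days in (30, 14):
        estimate = total_gb(projected_gb(USAGE_GB, days))
        print(f"projected at {days} days: {estimate:.0f}G")
```

the linear model is the simplest thing that could work here. if a few very large builds dominate a job's history, the real savings from a shorter window could differ noticeably from these estimates.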



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

