[ 
https://issues.apache.org/jira/browse/SPARK-50294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-50294:
----------------------------------
    Description: 
Currently we have only a single test image, shared by {{pyspark}}, {{sparkr}}, 
{{lint}} and {{docs}}. This has two major issues:
 * *Disk space limitation*: we keep adding packages to the image, so the disk 
space left for testing is very limited, which causes {{No space left on device}} 
errors from time to time.
 * *Environment conflicts*: for example, even though we already install some 
packages for {{docs}} in the Dockerfile, we still need to install additional 
Python packages in {{build_and_test}} because of conflicts between {{docs}} and 
{{pyspark}}. This is hard to maintain because related packages are installed in 
different places.

 

We therefore want to split the existing base image into multiple images, so that we can:
 * completely cache all the dependencies for each job;
 * centralize related installations for each job;
 * free up disk space on the base image;
 * introduce new dev tools based on the new images.

> Refactor docker image for testing
> ---------------------------------
>
>                 Key: SPARK-50294
>                 URL: https://issues.apache.org/jira/browse/SPARK-50294
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Project Infra
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Priority: Major



