With help from the community, the switch to cache-based jobs has been
completed!

* About the ghcr images:

You might notice that two images are generated in the apache ghcr, plus a
third in each contributor's own ghcr:

- Image cache: spark/apache-spark-github-action-image-cache
<https://github.com/orgs/apache/packages/container/package/spark%2Fapache-spark-github-action-image-cache>:
This is the cache based on branches' dev/infra/Dockerfile.

- CI image: apache-spark-ci-image
<https://github.com/orgs/apache/packages/container/package/apache-spark-ci-image>:
This is for scheduled jobs. It builds an image just-in-time from the cache,
and then uses it to run the CI jobs.

- Distributed (User) CI image: such as yikun/apache-spark-ci-image
<https://github.com/Yikun/spark/pkgs/container/apache-spark-ci-image>: This
is for PR-triggered jobs. It is again built just-in-time from the cache and
used to run the CI job(s) in the user's GitHub Actions space.
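
As a rough illustration of the naming above, here is a minimal Python sketch
that derives the ghcr references. It assumes the usual
ghcr.io/<owner>/<package> layout. The helper name ci_image_repository is made
up for this example and is not part of the Spark workflow, and tags are left
out because they are not described here.

```python
# Cache image published under the apache org (package "spark/...-image-cache").
# Assumption: the registry path follows ghcr.io/<owner>/<package>.
CACHE_IMAGE = "ghcr.io/apache/spark/apache-spark-github-action-image-cache"


def ci_image_repository(owner: str) -> str:
    """Return the ghcr repository of the just-in-time CI image for an owner.

    `owner` is "apache" for scheduled jobs, or the contributor's GitHub
    account for PR-triggered jobs. Hypothetical helper, for illustration.
    """
    # Container image names must be lowercase, so "Yikun" becomes "yikun".
    return f"ghcr.io/{owner.lower()}/apache-spark-ci-image"


if __name__ == "__main__":
    print(ci_image_repository("apache"))  # scheduled jobs
    print(ci_image_repository("Yikun"))   # PR-triggered jobs in a fork
```

Lowercasing the owner matters for forks whose account name contains capital
letters.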

* About the job:

For Lint/PySpark/SparkR jobs, "Base image build" will do a just-in-time
build and generate a ci-image for each PR, and jobs use the image as the
job container image.

* About how to change the infra deps:

Currently, the CI image is just like a static image unless you change the
Dockerfile.

- If you want to change the version of a dependency of the Lint/PySpark/SparkR
jobs, you can change dev/infra/Dockerfile, as was done in
https://github.com/apache/spark/pull/37175.

- If you want to trigger a full refresh, you can just change the
FULL_REFRESH_DATE in the Dockerfile
<https://github.com/apache/spark/blob/35d00df9bba7238ad4f409999617fae4d04ddbfd/dev/infra/Dockerfile#L21>.
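
For illustration only, here is a small Python sketch of bumping the refresh
marker. It assumes the Dockerfile declares it on a line of the form
"ENV FULL_REFRESH_DATE <yyyymmdd>", so please check the actual file before
relying on this. The helper name and the sample Dockerfile text are
hypothetical.

```python
import re
from datetime import date

# Assumption: the marker is declared as "ENV FULL_REFRESH_DATE <value>"
# (or with "=" instead of a space) on a single line.
_PATTERN = re.compile(r"^(ENV[ \t]+FULL_REFRESH_DATE[ \t=]+)\S+", re.MULTILINE)


def bump_full_refresh_date(dockerfile_text: str, today: date) -> str:
    """Return the Dockerfile text with FULL_REFRESH_DATE set to `today`."""
    new_value = today.strftime("%Y%m%d")
    # A function replacement avoids "\1" running into the digits of the date.
    updated, count = _PATTERN.subn(
        lambda m: m.group(1) + new_value, dockerfile_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one FULL_REFRESH_DATE line, found {count}")
    return updated


if __name__ == "__main__":
    # Hypothetical Dockerfile content, not copied from the Spark repository.
    sample = "FROM ubuntu:20.04\nENV FULL_REFRESH_DATE 20220706\nRUN true\n"
    print(bump_full_refresh_date(sample, date.today()))
```

Since the date sits near the top of the file, changing it should invalidate
the cached layers after it, which is what makes the rebuild a full one.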

FYI, I also updated the doc at
https://docs.google.com/document/d/1_uiId-U1DODYyYZejAZeyz2OAjxcnA-xfwjynDF6vd0
to help you understand.


Through this work, I really came to appreciate the effort behind the previous
maintenance! A simple version bump of a dependency can lead to a lot of
investigation! Thanks to HyukjinKwon, Dongjoon and the whole community for
always keeping the infra deps up to date!

Feel free to ping me if you have any other concerns or ideas!

Regards,
Yikun


On Mon, Jun 27, 2022 at 12:05 AM Yikun Jiang <yikunk...@gmail.com> wrote:

> > There’s one last task to simplify caching the Docker image (
> https://issues.apache.org/jira/browse/SPARK-39522).
> I will have to be less active for this week and next week because of the
> Spark Summit. Would appreciate if somebody
> finds some time to take a stab.
>
> I did some investigation into using the cache for the Spark container jobs
> (pyspark/sparkr/lint), and drafted a doc to help you guys understand #36980
> <https://github.com/apache/spark/pull/36980>:
>
> https://docs.google.com/document/d/1_uiId-U1DODYyYZejAZeyz2OAjxcnA-xfwjynDF6vd0
>
>
> > About a quick hallway meetup, I will be there after Holden’s talk at
> least to say hello to her :-).
>
> Some topics I am interested in that relate to the build CI:
> - K8S integration tests <https://github.com/apache/spark/pull/35830> on
> GA.
> - What we can do to improve integration support for various OSes
> <https://github.com/apache/spark/pull/35142> and multiple
> architectures/hardware (x86/arm64, GPU).
> Please feel free to ping me if necessary. It's a bit of a pity that I
> can't be there. I hope you guys have a fabulous meetup at the summit!
>
> Regards,
> Yikun
>
>
> On Fri, Jun 24, 2022 at 11:15 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Yep, I'll be there too. Thank you for the adjustment. See you soon. :)
>>
>> Dongjoon.
>>
>> On Thu, Jun 23, 2022 at 4:59 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Alright, I'll be there after Holden's talk Thursday
>>> https://databricks.com/dataaisummit/session/tools-assisted-apache-spark-version-migrations-21-32
>>> w/ Dongjoon (since he manages OSS Jenkins too).
>>> Let's have a quickie chat :-).
>>>
>>> On Thu, 23 Jun 2022 at 06:16, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> Oops, I was confused about the time and distance in the US. I won't
>>>> make it either.
>>>> Let me find another time slot that works for more ppl.
>>>>
>>>> On Thu, 23 Jun 2022 at 00:19, Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you, Hyukjin! :)
>>>>>
>>>>> BTW, unfortunately, it seems that I cannot join that quick meeting.
>>>>> I have another schedule at South Bay around 7PM and need to leave San
>>>>> Francisco at least 5PM.
>>>>>
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Wed, Jun 22, 2022 at 3:39 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> (cc @Yikun Jiang <yikunk...@gmail.com> @Gengliang Wang
>>>>>> <gengliang.w...@databricks.com> @Maxim Gekk
>>>>>> <maxim.g...@databricks.com> @Yang,Jie(INF) <yangji...@baidu.com> FYI)
>>>>>>
>>>>>> On Wed, 22 Jun 2022 at 19:34, Hyukjin Kwon <gurwls...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Couple of updates:
>>>>>>>
>>>>>>>    - All builds pass now with all combinations we defined in the
>>>>>>>      GitHub Actions (e.g., branch-3.2, branch-3.3, JDK 11,
>>>>>>>      JDK 17 and Scala 2.13), see
>>>>>>>      https://github.com/apache/spark/actions cc @Tom Graves
>>>>>>>      <tgraves...@yahoo.com> @Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>>      FYI
>>>>>>>    - except one test that is failing due to OOM. That’s being
>>>>>>>      fixed at https://github.com/apache/spark/pull/36954, see
>>>>>>>      also
>>>>>>>      https://github.com/apache/spark/pull/36787#discussion_r901190636
>>>>>>>    - I am now adding PySpark, SparkR jobs to the scheduled builds at
>>>>>>>      https://github.com/apache/spark/pull/36940
>>>>>>>      to see if they pass. We might need a couple more fixes there.
>>>>>>>    - There’s one last task to simplify caching the Docker image (
>>>>>>>      https://issues.apache.org/jira/browse/SPARK-39522).
>>>>>>>      I will have to be less active for this week and next week
>>>>>>>      because of the Spark Summit. Would appreciate if somebody
>>>>>>>      finds some time to take a stab.
>>>>>>>
>>>>>>> About a quick hallway meetup, I will be there after Holden’s talk at
>>>>>>> least to say hello to her :-).
>>>>>>> Let’s have a quick chat about our CI. We still have some general
>>>>>>> problems to cope with like the lack of resources in
>>>>>>> GitHub Actions.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 21 Jun 2022 at 11:49, Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just chatted offline - both I and Holden have multiple sessions :-).
>>>>>>>> Probably let's meet up for a quick chat after your talk
>>>>>>>> https://databricks.com/dataaisummit/session/what-do-when-your-job-goes-oom-night-flowcharts
>>>>>>>> ?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 20 Jun 2022 at 22:23, Holden Karau <hol...@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> How about a hallway meet up at Data AI summit to talk about build
>>>>>>>>> CI if folks are interested?
>>>>>>>>>
>>>>>>>>> On Sun, Jun 19, 2022 at 7:50 PM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Increased the priority to a blocker - I don't think we can
>>>>>>>>>> release with these build failures and poor CI
>>>>>>>>>>
>>>>>>>>>> On Mon, 20 Jun 2022 at 10:39, Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> There are too many test failures here. I pinged in some PRs I
>>>>>>>>>>> could identify from a cursory look, but it would be great for you
>>>>>>>>>>> guys to take a look if you haven't tested your change against
>>>>>>>>>>> other environments like JDK 11, Scala 2.13.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 20 Jun 2022 at 10:04, Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to rework GitHub Actions CI at
>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-39515. Any help
>>>>>>>>>>>> would be very appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>
>>>>>>>>
