I think reusing the same cache directory makes sense during downloading, but why do we upload everything that is in it?
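
To make the filtering idea concrete, here is a minimal sketch of what I have in mind, not what stager.py does today. It assumes we already know the exact versions the current run resolved (the `wanted` mapping below; how we would obtain it is left open), and that cached files follow the usual sdist naming `name-version.tar.gz` or `name-version.zip`. The helpers `normalize`, `parse_sdist`, and `select_artifacts` are hypothetical names for illustration only.

```python
import os
import re

# Assumption: the cache only holds source distributions with these extensions.
_SDIST_EXTENSIONS = (".tar.gz", ".zip")


def normalize(name):
    """Normalize a package name so 'PyHamcrest' matches 'pyhamcrest'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_sdist(filename):
    """Split 'six-1.13.0.tar.gz' into ('six', '1.13.0').

    Returns None for files that do not look like 'name-version.ext'.
    """
    for ext in _SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            name, sep, version = stem.rpartition("-")
            if sep and name and version:
                return normalize(name), version
    return None


def select_artifacts(cache_dir, wanted):
    """Return paths in cache_dir whose (name, version) is listed in `wanted`.

    `wanted` maps package name -> exact version (hypothetical input; this
    sketch does not resolve dependencies itself).
    """
    wanted = {normalize(name): version for name, version in wanted.items()}
    selected = []
    for filename in sorted(os.listdir(cache_dir)):
        parsed = parse_sdist(filename)
        if parsed and wanted.get(parsed[0]) == parsed[1]:
            selected.append(os.path.join(cache_dir, filename))
    return selected
```

With something like this, the cache directory could stay shared across runs for downloading, while staging would only pick up the files returned by `select_artifacts` instead of every file that has ever been cached.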

On Thu, Dec 5, 2019 at 9:24 AM Udi Meiri <[email protected]> wrote:

> Looking at the source, it seems that it should be using a
> os.path.join(tempfile.gettempdir(), 'dataflow-requirements-cache')
> to create a different tmp directory on each run.
>
> Also, sampling worker no. 2:
>
> jenkins@apache-beam-jenkins-2:~$ ls -l /tmp/dataflow-requirements-cache/
> total 7172
> -rw-rw-r-- 1 jenkins jenkins  27947 Sep  6 22:46 funcsigs-1.0.2.tar.gz
> -rw-rw-r-- 1 jenkins jenkins  28126 Sep  6 21:38 mock-3.0.5.tar.gz
> -rw-rw-r-- 1 jenkins jenkins 376623 Sep  6 21:38 PyHamcrest-1.9.0.tar.gz
> -rw-rw-r-- 1 jenkins jenkins 851251 Sep  6 21:38 setuptools-41.2.0.zip
> -rw-rw-r-- 1 jenkins jenkins 855608 Oct  7 06:03 setuptools-41.4.0.zip
> -rw-rw-r-- 1 jenkins jenkins 851068 Oct 28 06:10 setuptools-41.5.0.zip
> -rw-rw-r-- 1 jenkins jenkins 851097 Oct 28 19:46 setuptools-41.5.1.zip
> -rw-rw-r-- 1 jenkins jenkins 852541 Oct 29 14:06 setuptools-41.6.0.zip
> -rw-rw-r-- 1 jenkins jenkins 852125 Nov 24 08:10 setuptools-42.0.0.zip
> -rw-rw-r-- 1 jenkins jenkins 852264 Nov 25 20:55 setuptools-42.0.1.zip
> -rw-rw-r-- 1 jenkins jenkins 858444 Dec  1 18:12 setuptools-42.0.2.zip
> -rw-rw-r-- 1 jenkins jenkins  32725 Sep  6 21:38 six-1.12.0.tar.gz
> -rw-rw-r-- 1 jenkins jenkins  33726 Nov  5 19:18 six-1.13.0.tar.gz
>
>
> On Wed, Dec 4, 2019 at 8:00 PM Luke Cwik <[email protected]> wrote:
>
>> Can we filter the cache directory only for the artifacts that we want and
>> not everything that is there?
>>
>> On Wed, Dec 4, 2019 at 6:56 PM Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> Luke, I am not sure I understand the question. The caching that happens
>>> here is implemented in the SDK for requirements packages:
>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L161
>>>
>>>
>>> On Wed, Dec 4, 2019 at 6:19 PM Luke Cwik <[email protected]> wrote:
>>>
>>>> Is there a way to use a cache on disk that is separate from the set of
>>>> packages we use as requirements?
>>>>
>>>> On Wed, Dec 4, 2019 at 5:58 PM Udi Meiri <[email protected]> wrote:
>>>>
>>>>> Thanks!
>>>>> Another reason to periodically refresh workers.
>>>>>
>>>>> On Wed, Nov 27, 2019 at 10:37 PM Valentyn Tymofieiev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Test jobs specify[1] a requirements.txt file that contains two
>>>>>> entries: pyhamcrest, mock.
>>>>>>
>>>>>> We download[2] sources of the packages specified in the requirements
>>>>>> file, and of the packages they depend on. While doing so, it appears
>>>>>> that we use a cache directory on Jenkins to store the sources of the
>>>>>> packages[3], perhaps to save a trip to pypi and reduce pypi flakiness?
>>>>>> Then, we stage the entire cache directory[4], which includes all
>>>>>> packages ever cached. Over time the versions that our requirements
>>>>>> packages need change, but I guess we don't clean the cache on Jenkins
>>>>>> workers.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/scripts/run_integration_test.sh#L197
>>>>>> [2]
>>>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L469
>>>>>> [3]
>>>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L161
>>>>>> [4]
>>>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L172
>>>>>>
>>>>>> On Wed, Nov 27, 2019 at 11:55 AM Udi Meiri <[email protected]> wrote:
>>>>>>
>>>>>>> I was investigating a Dataflow postcommit test failure
>>>>>>> (endpoints_pb2 missing), and saw this in the staging directory:
>>>>>>>
>>>>>>> $ gsutil ls
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/PyHamcrest-1.9.0.tar.gz
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow-worker.jar
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow_python_sdk.tar
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/funcsigs-1.0.2.tar.gz
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/mock-3.0.5.tar.gz
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/pipeline.pb
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/requirements.txt
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.2.0.zip
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.4.0.zip
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.0.zip
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.1.zip
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.6.0.zip
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.0.zip
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.1.zip
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.12.0.tar.gz
>>>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.13.0.tar.gz
>>>>>>>
>>>>>>>
>>>>>>> Does anyone know why so many versions of setuptools need to be
>>>>>>> staged? Shouldn't one be enough?
