Can we filter the cache directory so that we stage only the artifacts we
need, rather than everything that is in it?

On Wed, Dec 4, 2019 at 6:56 PM Valentyn Tymofieiev <valen...@google.com>
wrote:

> Luke, I am not sure I understand the question. The caching that happens
> here is implemented in the SDK for requirements packages:
> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L161
>
>
> On Wed, Dec 4, 2019 at 6:19 PM Luke Cwik <lc...@google.com> wrote:
>
>> Is there a way to use a cache on disk that is separate from the set of
>> packages we use as requirements?
>>
>> On Wed, Dec 4, 2019 at 5:58 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Thanks!
>>> Another reason to periodically refresh workers.
>>>
>>> On Wed, Nov 27, 2019 at 10:37 PM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
>>>> The test jobs specify[1] a requirements.txt file that contains two
>>>> entries: pyhamcrest and mock.
>>>>
>>>> We download[2] the sources of the packages specified in the requirements
>>>> file, along with the packages they depend on. While doing so, it appears
>>>> that we use a cache directory on Jenkins to store the package sources[3],
>>>> perhaps to save a trip to PyPI and reduce PyPI flakiness? Then we stage the
>>>> entire cache directory[4], which includes every package ever cached. Over
>>>> time, the versions that our required packages need change, but I guess we
>>>> don't clean the cache on the Jenkins workers.
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/scripts/run_integration_test.sh#L197
>>>> [2]
>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L469
>>>> [3]
>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L161
>>>>
>>>> [4]
>>>> https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L172
>>>>
>>>> On Wed, Nov 27, 2019 at 11:55 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I was investigating a Dataflow postcommit test failure (endpoints_pb2
>>>>> missing), and saw this in the staging directory:
>>>>>
>>>>> $ gsutil ls gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/PyHamcrest-1.9.0.tar.gz
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow-worker.jar
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow_python_sdk.tar
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/funcsigs-1.0.2.tar.gz
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/mock-3.0.5.tar.gz
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/pipeline.pb
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/requirements.txt
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.2.0.zip
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.4.0.zip
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.0.zip
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.1.zip
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.6.0.zip
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.0.zip
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.1.zip
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.12.0.tar.gz
>>>>> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.13.0.tar.gz
>>>>>
>>>>>
>>>>> Does anyone know why so many versions of setuptools need to be staged?
>>>>> Shouldn't one be enough?
>>>>>
>>>>
