robertwb commented on code in PR #28335:
URL: https://github.com/apache/beam/pull/28335#discussion_r1319184302


##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -473,23 +490,57 @@ def __init__(self, packages, base_python=sys.executable):
     self._packages = packages
     self._base_python = base_python
 
-  def _key(self):
-    return json.dumps({'binary': self._base_python, 'packages': 
self._packages})
+  @classmethod
+  def _key(cls, base_python, packages):
+    return json.dumps({
+        'binary': base_python, 'packages': sorted(packages)
+    },
+                      sort_keys=True)
 
-  def _venv(self):
-    venv = os.path.join(
-        self.VENV_CACHE,
-        hashlib.sha256(self._key().encode('utf-8')).hexdigest())
+  @classmethod
+  def _path(cls, base_python, packages):
+    return os.path.join(
+        cls.VENV_CACHE,
+        hashlib.sha256(cls._key(base_python,
+                                packages).encode('utf-8')).hexdigest())
+
+  @classmethod
+  def _create_venv_from_scratch(cls, base_python, packages):
+    venv = cls._path(base_python, packages)
     if not os.path.exists(venv):
-      python_binary = os.path.join(venv, 'bin', 'python')
-      subprocess.run([self._base_python, '-m', 'venv', venv], check=True)
-      subprocess.run([python_binary, '-m', 'ensurepip'], check=True)
-      subprocess.run([python_binary, '-m', 'pip', 'install'] + self._packages,
+      subprocess.run([base_python, '-m', 'venv', venv], check=True)
+      venv_python = os.path.join(venv, 'bin', 'python')
+      subprocess.run([venv_python, '-m', 'ensurepip'], check=True)
+      subprocess.run([venv_python, '-m', 'pip', 'install'] + packages,
                      check=True)
       with open(venv + '-requirements.txt', 'w') as fout:
-        fout.write('\n'.join(self._packages))
+        fout.write('\n'.join(packages))
     return venv

Review Comment:
   Yeah, installing apache beam is slow. I just filed 
https://github.com/apache/beam/issues/28364 about this. However, the change 
here lets you install it once and then clone this environment for all other 
created venvs, so the price is paid once at docker build time rather than on 
every expansion service (e.g. during yaml submission via a template things 
should be much is faster)). 
   
   Installing this from the local environment is an interesting idea, but might 
be hard, might not match the container environment, and might not be clean. 
   
   A downstream dependency can depend on `apache_beam[X]` which should only 
install the delta, but I added a couple more here to be complete. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to