tvalentyn commented on code in PR #28385:
URL: https://github.com/apache/beam/pull/28385#discussion_r1321907970
##########
.github/workflows/build_wheels.yml:
##########
@@ -117,15 +112,15 @@ jobs:
echo "RELEASE_VERSION=$RELEASE_VERSION" >> $GITHUB_OUTPUT
- name: Build source
working-directory: ./sdks/python
- run: python setup.py sdist --formats=zip
+ run: pip install -U build && python -m build --sdist
- name: Add checksums
working-directory: ./sdks/python/dist
run: |
- file=$(ls | grep .zip | head -n 1)
+ file=$(ls | grep .tar.gz | head -n 1)
Review Comment:
We should check which extension the release scripts expect when twine
uploads artifacts to PyPI. We might have to switch to `.tar.gz`.
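For context, a sketch of an extension-agnostic selection step that would tolerate either artifact type while the release scripts are audited (the `dist/` layout and artifact name below are stand-ins, not the real build output):

```python
# Select the sdist artifact whichever extension the build produced.
import os
import tempfile

dist = tempfile.mkdtemp()
# Stand-in for the artifact that `python -m build --sdist` would write.
open(os.path.join(dist, 'apache-beam-2.50.0.tar.gz'), 'w').close()

candidates = sorted(
    f for f in os.listdir(dist) if f.endswith(('.zip', '.tar.gz')))
if not candidates:
    raise SystemExit('No sdist artifact found in dist/')
print('Found artifact:', candidates[0])
```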
##########
.github/workflows/typescript_tests.yml:
##########
@@ -89,10 +89,8 @@ jobs:
- name: Setup Beam Python
working-directory: ./sdks/python
run: |
- pip install pip setuptools --upgrade
- pip install -r build-requirements.txt
pip install 'pandas>=1.0,<1.5'
- python setup.py develop
+ python install -e .
Review Comment:
should this be `pip install`?
##########
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy:
##########
@@ -3032,30 +3032,40 @@ class BeamModulePlugin implements Plugin<Project> {
}
return argList.join(' ')
}
-
project.ext.toxTask = { name, tox_env, posargs='' ->
project.tasks.register(name) {
dependsOn setupVirtualenv
dependsOn ':sdks:python:sdist'
-
- doLast {
- // Python source directory is also tox execution workspace, We want
- // to isolate them per tox suite to avoid conflict when running
- // multiple tox suites in parallel.
- project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
- def copiedPyRoot = "${copiedSrcRoot}/sdks/python"
- def distTarBall = "${pythonRootDir}/build/apache-beam.tar.gz"
- project.exec {
- executable 'sh'
-        args '-c', ". ${project.ext.envdir}/bin/activate && cd ${copiedPyRoot} && scripts/run_tox.sh $tox_env $distTarBall '$posargs'"
+ if (project.hasProperty('useWheelDistribution')) {
Review Comment:
why do we need both modes?
##########
sdks/python/apache_beam/coders/slow_coders_test.py:
##########
@@ -25,14 +25,17 @@
from apache_beam.coders.coders_test_common import *
+@unittest.skip(
+ 'Add a test for non-compiled implementation.'
+ 'https://github.com/apache/beam/issues/28307')
Review Comment:
FYI, I talked with robertwb; we should be able to assume Beam will be
installed either from wheels or with Cython at runtime, and remove the
non-compiled path and the respective tests. We can decouple this change into a
separate PR.
##########
sdks/python/scripts/run_tox.sh:
##########
@@ -25,11 +25,11 @@
###########################################################################
# Usage check.
if [[ $# < 1 ]]; then
-  printf "Usage: \n$> ./scripts/run_tox.sh <tox_environment> [<sdk_location> [<posargs> ...]]"
-  printf "\n\ttox_environment: [required] Tox environment to run the test in.\n"
-  printf "\n\tsdk_location: [optional] SDK tarball artifact location.\n"
-  printf "\n\tposarg: [optional] Any additional arguments will be passed as posargs to tox.\n"
-  exit 1
+  printf "Usage: \n$> ./scripts/run_tox.sh <tox_environment> [<sdk_location> [<posargs> ...]]"
Review Comment:
nit: let's use consistent formatting and keep two spaces
##########
sdks/python/scripts/run_tox.sh:
##########
@@ -53,18 +53,32 @@ if [[ "$JENKINS_HOME" != "" ]]; then
export PY_COLORS=1
fi
-if [[ ! -z $2 ]]; then
+# Determine if the second argument is SDK_LOCATION or posargs
+if [[ -f "$1" ]]; then # Check if the argument corresponds to a file
SDK_LOCATION="$1"
- shift;
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
-else
- tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+ shift
+fi
+
+# If SDK_LOCATION is identified and there are still arguments left, those are posargs.
+# If SDK_LOCATION is not identified, all remaining arguments are posargs.
+
+if [[ ! -z "$SDK_LOCATION" ]]; then
+ if [[ $# -gt 0 ]]; then # There are posargs
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
+  else
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
+ fi
+else # No SDK_LOCATION; all arguments are posargs
+ tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" -- "$@"
Review Comment:
What are the cases when we run tox without SDK location?
##########
sdks/python/test-suites/tox/common.gradle:
##########
@@ -42,5 +42,5 @@ project.tasks.register("preCommitPy${pythonVersionSuffix}") {
dependsOn = ["testPy38Cython"]
} else {
dependsOn = ["testPy${pythonVersionSuffix}Cloud", "testPy${pythonVersionSuffix}Cython"]
- }
+ }
Review Comment:
unnecessary change
##########
sdks/python/apache_beam/runners/dataflow/internal/clients/cloudbuild/__init__.py:
##########
@@ -29,5 +29,3 @@
from apache_beam.runners.dataflow.internal.clients.cloudbuild.cloudbuild_v1_messages import *
except ImportError:
pass
-
Review Comment:
these are autogenerated clients, why do we need these changes?
##########
.github/workflows/beam_PreCommit_Python_Dataframes.yml:
##########
@@ -105,6 +105,7 @@ jobs:
arguments: |
-Pposargs=apache_beam/dataframe/ \
-PpythonVersion=${{ matrix.python_version }} \
+ -PuseWheelDistribution
Review Comment:
for my education why were these flags necessary?
##########
sdks/python/apache_beam/runners/common.py:
##########
@@ -762,6 +762,7 @@ def __init__(self,
# Try to prepare all the arguments that can just be filled in
# without any additional work. in the process function.
# Also cache all the placeholders needed in the process function.
+ input_args = list(input_args)
Review Comment:
why was this necessary?
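A likely reason (my assumption, not confirmed by the PR): `input_args` may now arrive as a tuple, and the placeholder-caching code below assigns into it by index, which only works on a mutable list. A minimal sketch, with `ArgPlaceholder` standing in for the real placeholder type in `common.py`:

```python
# Why a tuple must be copied to a list before in-place updates.
class ArgPlaceholder:
    """Stand-in for the placeholder type used in runners/common.py."""


def resolve_placeholders(input_args, value):
    # Mutating by index requires a mutable sequence; tuples raise
    # TypeError on item assignment.
    input_args = list(input_args)
    for i, arg in enumerate(input_args):
        if isinstance(arg, ArgPlaceholder):
            input_args[i] = value
    return input_args


args = (1, ArgPlaceholder(), 3)
print(resolve_placeholders(args, 2))  # [1, 2, 3]
```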
##########
sdks/python/apache_beam/examples/kafkataxi/README.md:
##########
@@ -157,9 +157,9 @@ Install Beam and dependencies and build a Beam distribution.
```sh
cd beam/sdks/python
-pip install -r build-requirements.txt
pip install -e '.[gcp]'
-python setup.py sdist
+pip install -q build
Review Comment:
is having `build` a prerequisite for an editable installation above?
##########
sdks/python/apache_beam/ml/gcp/naturallanguageml_test.py:
##########
@@ -60,12 +60,15 @@ def test_document_source(self):
self.assertFalse('content' in dict_)
self.assertTrue('gcs_content_uri' in dict_)
+ @unittest.skip(
+ 'TypeError: Expected bytes, got MagicMock.'
+ 'Please look at https://github.com/apache/beam/issues/28307.')
Review Comment:
I'd like to understand the failure mode here.
##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -103,6 +102,51 @@ def _load_or_default(filename):
return {}
+_DESTINATION_ELEMENT_PAIRS = [
Review Comment:
why was this necessary?
##########
sdks/python/setup.py:
##########
@@ -155,12 +156,18 @@ def cythonize(*args, **kwargs):
# We must generate protos after setup_requires are installed.
def generate_protos_first():
try:
- # pylint: disable=wrong-import-position
- import gen_protos
- gen_protos.generate_proto_files()
-
- except ImportError:
- warnings.warn("Could not import gen_protos, skipping proto generation.")
+ # Pyproject toml build happens in isolated environemnts. In those envs,
+ # gen_protos is unable to get imported. so we run a subprocess call.
+ cwd = os.path.abspath(os.path.dirname(__file__))
+ p = subprocess.call([
+ sys.executable,
+ os.path.join(cwd, 'gen_protos.py'),
+ '--no-force'
+ ])
+ if p != 0:
+ raise RuntimeError()
+ except RuntimeError:
+ logging.warning('Could not import gen_protos, skipping proto generation.')
Review Comment:
is this why you assume installation is done from wheels in other places?
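For reference, the subprocess pattern in this hunk, sketched standalone. The demonstration uses a stand-in script; in the real layout `gen_protos.py` is assumed to live next to `setup.py`, and the failure handling is simplified:

```python
import logging
import os
import subprocess
import sys
import tempfile


def run_generator(script_path, *flags):
    # Isolated pyproject builds cannot `import gen_protos` from the
    # source tree, so shell out to the script with the same interpreter.
    ret = subprocess.call([sys.executable, script_path, *flags])
    if ret != 0:
        logging.warning('%s returned %d; skipping proto generation.',
                        os.path.basename(script_path), ret)
    return ret


# Stand-in script: succeeds only when --no-force is passed.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write('import sys\nsys.exit(0 if "--no-force" in sys.argv else 1)\n')
print(run_generator(f.name, '--no-force'))  # 0
```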
##########
sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers_test.py:
##########
@@ -20,16 +20,16 @@
import hamcrest as hc
-import apache_beam.runners.dataflow.internal.clients.dataflow as dataflow
from apache_beam.internal.gcp.json_value import to_json_value
+from apache_beam.runners.dataflow.internal.clients import dataflow
from apache_beam.runners.dataflow.internal.clients.dataflow import message_matchers
# Protect against environments where apitools library is not available.
# pylint: disable=wrong-import-order, wrong-import-position
try:
from apitools.base.py import base_api
except ImportError:
- base_api = None
+ raise unittest.SkipTest('GCP dependencies are not installed')
Review Comment:
why was this necessary? It's better to split any cleanup into a separate PR.
##########
sdks/python/gen_protos.py:
##########
@@ -511,24 +472,23 @@ def generate_proto_files(force=False):
if not os.path.exists(PYTHON_OUTPUT_PATH):
os.mkdir(PYTHON_OUTPUT_PATH)
- grpcio_install_loc = ensure_grpcio_exists()
protoc_gen_mypy = _find_protoc_gen_mypy()
- with PythonPath(grpcio_install_loc):
+ try:
Review Comment:
Can you provide context on grpcio changes?
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -775,10 +775,12 @@ def _build_setup_package(setup_file, # type: str
if build_setup_args is None:
build_setup_args = [
Stager._get_python_executable(),
- os.path.basename(setup_file),
- 'sdist',
- '--dist-dir',
- temp_dir
+ '-m',
+ 'build',
Review Comment:
this may be a breaking change. `build` might not be installed in the user's
env. We can fall back to the prior behavior in this case and maybe recommend
`build` in a warning.
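A possible shape for the suggested fallback (a sketch only; `setup_file` and `temp_dir` mirror the surrounding stager code, and the availability check via `importlib.util.find_spec` is my assumption, not the PR's):

```python
import importlib.util
import logging
import os
import sys


def choose_build_setup_args(setup_file, temp_dir):
    # Prefer a PEP 517 build via `python -m build`, but fall back to the
    # legacy `setup.py sdist` invocation when `build` is not installed.
    if importlib.util.find_spec('build') is not None:
        return [
            sys.executable, '-m', 'build', '--sdist',
            '--outdir', temp_dir, os.path.dirname(setup_file)
        ]
    logging.warning(
        'The "build" package is not installed; falling back to '
        '"setup.py sdist". Consider installing build.')
    return [
        sys.executable, os.path.basename(setup_file),
        'sdist', '--dist-dir', temp_dir
    ]


print(choose_build_setup_args('/tmp/pkg/setup.py', '/tmp/out'))
```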
##########
sdks/python/gen_protos.py:
##########
@@ -539,39 +499,51 @@ def generate_proto_files(force=False):
'Protoc returned non-zero status (see logs for details): '
'%s' % ret_code)
- # copy resource files
- for path in MODEL_RESOURCES:
- shutil.copy2(os.path.join(PROJECT_ROOT, path), PYTHON_OUTPUT_PATH)
-
- proto_packages = set()
- # see: https://github.com/protocolbuffers/protobuf/issues/1491
- # force relative import paths for proto files
- compiled_import_re = re.compile('^from (.*) import (.*)$')
- for file_path in find_by_ext(PYTHON_OUTPUT_PATH,
- ('_pb2.py', '_pb2_grpc.py', '_pb2.pyi')):
- proto_packages.add(os.path.dirname(file_path))
- lines = []
- with open(file_path, encoding='utf-8') as f:
- for line in f:
- match_obj = compiled_import_re.match(line)
- if match_obj and \
- match_obj.group(1).startswith('org.apache.beam.model'):
- new_import = build_relative_import(
- PYTHON_OUTPUT_PATH, match_obj.group(1), file_path)
- line = 'from %s import %s\n' % (new_import, match_obj.group(2))
-
- lines.append(line)
-
- with open(file_path, 'w') as f:
- f.writelines(lines)
-
- generate_init_files_lite(PYTHON_OUTPUT_PATH)
- with PythonPath(grpcio_install_loc):
+ # copy resource files
+ for path in MODEL_RESOURCES:
+ shutil.copy2(os.path.join(PROJECT_ROOT, path), PYTHON_OUTPUT_PATH)
+
+ proto_packages = set()
+ # see: https://github.com/protocolbuffers/protobuf/issues/1491
+ # force relative import paths for proto files
+ compiled_import_re = re.compile('^from (.*) import (.*)$')
+ for file_path in find_by_ext(PYTHON_OUTPUT_PATH,
+ ('_pb2.py', '_pb2_grpc.py', '_pb2.pyi')):
+ proto_packages.add(os.path.dirname(file_path))
+ lines = []
+ with open(file_path, encoding='utf-8') as f:
+ for line in f:
+ match_obj = compiled_import_re.match(line)
+ if match_obj and \
+ match_obj.group(1).startswith('org.apache.beam.model'):
+ new_import = build_relative_import(
+ PYTHON_OUTPUT_PATH, match_obj.group(1), file_path)
+ line = 'from %s import %s\n' % (new_import, match_obj.group(2))
+
+ lines.append(line)
+
+ with open(file_path, 'w') as f:
+ f.writelines(lines)
+
+ generate_init_files_lite(PYTHON_OUTPUT_PATH)
for proto_package in proto_packages:
generate_urn_files(proto_package, PYTHON_OUTPUT_PATH)
- generate_init_files_full(PYTHON_OUTPUT_PATH)
+ generate_init_files_full(PYTHON_OUTPUT_PATH)
+ except ImportError:
+ # this means the required _pb2 files are already generated.
+ pass
if __name__ == '__main__':
- generate_proto_files(force=True)
+ import argparse
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ '--no-force',
Review Comment:
```
parser.add_argument('--no-force', dest='force', action='store_false')
parser.set_defaults(force=True)
```
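The suggested pattern, runnable in isolation (same names as in the suggestion above):

```python
import argparse

parser = argparse.ArgumentParser()
# dest='force' maps the --no-force flag onto a positive attribute name;
# store_false flips it off when the flag is given.
parser.add_argument('--no-force', dest='force', action='store_false')
parser.set_defaults(force=True)

print(parser.parse_args([]).force)              # True
print(parser.parse_args(['--no-force']).force)  # False
```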
##########
sdks/python/setup.py:
##########
@@ -155,12 +156,18 @@ def cythonize(*args, **kwargs):
# We must generate protos after setup_requires are installed.
def generate_protos_first():
try:
- # pylint: disable=wrong-import-position
- import gen_protos
- gen_protos.generate_proto_files()
-
- except ImportError:
- warnings.warn("Could not import gen_protos, skipping proto generation.")
+ # Pyproject toml build happens in isolated environemnts. In those envs,
+ # gen_protos is unable to get imported. so we run a subprocess call.
+ cwd = os.path.abspath(os.path.dirname(__file__))
+ p = subprocess.call([
+ sys.executable,
+ os.path.join(cwd, 'gen_protos.py'),
+ '--no-force'
Review Comment:
why no-force?
##########
sdks/python/setup.py:
##########
@@ -213,22 +242,9 @@ def get_portability_package_data():
*get_portability_package_data()
]
},
- ext_modules=cythonize([
- 'apache_beam/**/*.pyx',
- 'apache_beam/coders/coder_impl.py',
- 'apache_beam/metrics/cells.py',
- 'apache_beam/metrics/execution.py',
- 'apache_beam/runners/common.py',
- 'apache_beam/runners/worker/logger.py',
- 'apache_beam/runners/worker/opcounters.py',
- 'apache_beam/runners/worker/operations.py',
- 'apache_beam/transforms/cy_combiners.py',
- 'apache_beam/transforms/stats.py',
- 'apache_beam/utils/counters.py',
- 'apache_beam/utils/windowed_value.py',
- ],
- language_level=3),
+ ext_modules=extensions,
install_requires=[
+ 'build>=0.9.0,<0.11.0',
Review Comment:
interesting. Is it common to add `build` as a dependency? Also, why not 1.x?
##########
sdks/python/scripts/run_tox.sh:
##########
@@ -53,18 +53,32 @@ if [[ "$JENKINS_HOME" != "" ]]; then
export PY_COLORS=1
fi
-if [[ ! -z $2 ]]; then
+# Determine if the second argument is SDK_LOCATION or posargs
+if [[ -f "$1" ]]; then # Check if the argument corresponds to a file
SDK_LOCATION="$1"
- shift;
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
-else
- tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+ shift
+fi
+
+# If SDK_LOCATION is identified and there are still arguments left, those are posargs.
+# If SDK_LOCATION is not identified, all remaining arguments are posargs.
+
+if [[ ! -z "$SDK_LOCATION" ]]; then
+ if [[ $# -gt 0 ]]; then # There are posargs
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
Review Comment:
do we need -vv here and below?
##########
sdks/python/setup.py:
##########
@@ -187,7 +194,29 @@ def get_portability_package_data():
# In order to find the tree of proto packages, the directory
# structure must exist before the call to setuptools.find_packages()
# executes below.
+ # while True: print(os.listdir())
generate_protos_first()
+
+ # generate cythonize extensions only if we are building a wheel or
+ # building an extension or running in editable mode.
Review Comment:
is '-e . ' an editable_wheel mode or a different one?
##########
sdks/python/scripts/run_tox.sh:
##########
@@ -53,18 +53,32 @@ if [[ "$JENKINS_HOME" != "" ]]; then
export PY_COLORS=1
fi
-if [[ ! -z $2 ]]; then
+# Determine if the second argument is SDK_LOCATION or posargs
+if [[ -f "$1" ]]; then # Check if the argument corresponds to a file
SDK_LOCATION="$1"
- shift;
-  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
-else
- tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+ shift
+fi
+
+# If SDK_LOCATION is identified and there are still arguments left, those are posargs.
+# If SDK_LOCATION is not identified, all remaining arguments are posargs.
+
+if [[ ! -z "$SDK_LOCATION" ]]; then
+ if [[ $# -gt 0 ]]; then # There are posargs
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION" -- "$@"
+  else
+    tox -rvv -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
+ fi
+else # No SDK_LOCATION; all arguments are posargs
+ tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" -- "$@"
fi
exit_code=$?
# Retry once for the specific exit code 245.
if [[ $exit_code == 245 ]]; then
- tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT"
+  tox -c tox.ini run --recreate -e "$TOX_ENVIRONMENT" --installpkg "$SDK_LOCATION"
exit_code=$?
fi
exit $exit_code
+This code first captures TOX_ENVIRONMENT and then shifts the arguments. It
then checks the number of remaining arguments to decide how to proceed. If
there are no remaining arguments, it assumes only the environment was provided.
If there's one remaining argument, it assumes that's the SDK location. If there
are two or more arguments, it takes the next one as the SDK location and passes
all the remaining ones as positional arguments to the tox command.
Review Comment:
misplaced content
##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -42,10 +42,10 @@ echo "posargs: $posargs"
# Run with pytest-xdist and without.
pytest -o junit_suite_name=${envname} \
-  --junitxml=pytest_${envname}.xml -m 'not no_xdist' -n 6 ${pytest_args} --pyargs ${posargs}
+  --junitxml=pytest_${envname}.xml -m 'not no_xdist' -n 6 --import-mode=importlib ${pytest_args} --pyargs ${posargs}
Review Comment:
do we need to do anything for test files with the same name? (we have quite a
few `util_test.py` files)
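For illustration, a sketch of why unique module identities keyed by path (the idea behind importlib-style loading) keep two same-named test files importable side by side; the package names and values here are hypothetical:

```python
import importlib.util
import os
import tempfile


def load_module(path, name):
    # Load a file under an explicit, unique module name instead of
    # relying on its basename.
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod


tmp = tempfile.mkdtemp()
for pkg, value in [('a', 1), ('b', 2)]:
    d = os.path.join(tmp, pkg)
    os.mkdir(d)
    with open(os.path.join(d, 'util_test.py'), 'w') as f:
        f.write('VALUE = %d\n' % value)

# Distinct names mean neither copy shadows the other.
m1 = load_module(os.path.join(tmp, 'a', 'util_test.py'), 'a.util_test')
m2 = load_module(os.path.join(tmp, 'b', 'util_test.py'), 'b.util_test')
print(m1.VALUE, m2.VALUE)  # 1 2
```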
##########
sdks/python/container/Dockerfile:
##########
@@ -45,7 +45,7 @@ RUN \
&& \
rm -rf /var/lib/apt/lists/* && \
- pip install --upgrade setuptools && \
+ pip install --upgrade setuptools && pip install --upgrade wheel && \
Review Comment:
nit: you can combine both commands
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]