scripts: fix posargs for `run_pytest.sh` [beam]

via GitHub Tue, 29 Jul 2025 18:47:39 -0700


gemini-code-assist[bot] commented on code in PR #35733:
URL: https://github.com/apache/beam/pull/35733#discussion_r2241357426



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -42,14 +42,14 @@
 user_marker=""
 
 # Isolate the user-provided -m argument (matching only).
-if [[ $pytest_args =~ "-m" ]]; then
+if [[ $posargs =~ "-m" ]]; then
   # Extract the marker value using the defined regex.
-  user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p")
+  user_marker=$(echo "$posargs" | sed -nE "s/.*$marker_regex.*/\1/p")
 fi
 
-# Remove the -m argument from pytest_args (substitution only).
+# Remove the -m argument from posargs (substitution only).
 if [[ -n $user_marker ]]; then
-  pytest_args=$(echo "$pytest_args" | sed -E "s/$marker_regex//")
+  posargs=$(echo "$posargs" | sed -E "s/$marker_regex//")
 fi

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   This logic for extracting and removing the `-m` argument can be simplified 
and made more efficient. Additionally, the current implementation has a subtle 
bug when multiple `-m` arguments are provided.
   
   1.  **Efficiency and Clarity**: The current code uses subshells to call 
`sed`, which is less efficient than using bash's built-in regex matching 
(`=~`). The logic is also split across two `if` statements, which can be 
consolidated.
   2.  **Bug**: The extraction logic (`sed -nE "s/.*$marker_regex.*/\1/p"`) 
gets the *last* `-m` argument due to the greedy `.*`, but the removal logic 
(`sed -E "s/$marker_regex//"`) removes the *first* one. This is inconsistent.
   
   Using `BASH_REMATCH` handles the *first* match consistently for both 
extraction and removal, and avoids spawning external processes. This would be a 
behavior change (processing the first `-m` instead of a mix), but it would be 
more correct and predictable.
   
   ```suggestion
   if [[ $posargs =~ $marker_regex ]]; then
     user_marker="${BASH_REMATCH[1]}"
     posargs="${posargs/${BASH_REMATCH[0]}/}"
   fi
   ```



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -63,7 +63,7 @@ if [[ -n $user_marker ]]; then
 fi
 
 # Run tests in parallel.
-echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\" 
$pytest_args"
+echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\" 
$posargs"
 pytest -v -rs -o junit_suite_name=${envname} \
   --junitxml=pytest_${envname}.xml -m "$marker_for_parallel_tests" -n 6 
--import-mode=importlib ${pytest_args} --pyargs ${posargs}

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   There appears to be a significant issue with how `$posargs` is used in the 
`pytest` command on line 68. The entire `$posargs` variable is passed to the 
`--pyargs` option, which expects a list of Python package names to search for 
tests.
   
   However, according to the script's own documentation (line 26), `$posargs` 
can contain other arguments like `'-k keyword'`. If `$posargs` contains 
something like `path/to/tests -k "foo"`, the `pytest` command will become 
`pytest ... --pyargs path/to/tests -k "foo"`, which is incorrect. `pytest` will 
treat `-k` and `"foo"` as package names, which will likely cause test discovery 
to fail.
   
   To fix this, you should separate the test paths from other options within 
`$posargs` and pass only the paths to `--pyargs`, with other options passed as 
regular arguments to `pytest`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] sdks/python/scripts: fix posargs for `run_pytest.sh` [beam]

Reply via email to