Copilot commented on code in PR #50056:
URL: https://github.com/apache/arrow/pull/50056#discussion_r3311729595
##########
dev/archery/archery/cli.py:
##########
Review Comment:
The Click help/metavar strings look malformed: the `--repetitions` help is
missing a closing `]`, and the `contender`/`baseline` metavars have mismatched
brackets (`[<contender>` and `[<baseline>]]`). These end up user-visible in
`--help`, so please fix the bracket/angle-bracket formatting.
##########
dev/archery/archery/benchmark/runner.py:
##########
@@ -217,19 +236,25 @@ def from_rev_or_path(src, root, rev_or_path, cmake_conf,
**kwargs):
build = CMakeBuild.from_path(rev_or_path)
return CppBenchmarkRunner(build, **kwargs)
else:
- # Revisions can references remote via the `/` character, ensure
- # that the revision is path friendly
- path_rev = rev_or_path.replace("/", "_")
+ path_rev = _rev_or_path_dirname(src, rev_or_path)
root_rev = os.path.join(root, path_rev)
- os.mkdir(root_rev)
+ os.makedirs(root_rev, exist_ok=True)
clone_dir = os.path.join(root_rev, "arrow")
- # Possibly checkout the sources at given revision, no need to
- # perform cleanup on cloned repository as root_rev is reclaimed.
- src_rev, _ = src.at_revision(rev_or_path, clone_dir)
+ if os.path.isdir(clone_dir):
+ src_rev = ArrowSources(clone_dir)
+ else:
+ src_rev, _ = src.at_revision(rev_or_path, clone_dir)
Review Comment:
Reusing an existing `clone_dir` without re-checking out `rev_or_path` can
run benchmarks against the wrong revision (e.g., branch names moving over time,
or a prior run leaving the clone at a different commit). A safer approach is to
always ensure `clone_dir` is at `rev_or_path` (e.g., call
`src.at_revision(rev_or_path, clone_dir)` even when it exists, or explicitly
`git fetch` + `git checkout`/`reset --hard` in `clone_dir`). The same pattern
appears in `JavaBenchmarkRunner.from_rev_or_path`.
##########
dev/archery/archery/cli.py:
##########
Review Comment:
These file writes should specify an explicit encoding (e.g. UTF-8) to avoid
platform-dependent defaults, especially since `metadata.json` may contain
non-ASCII values. Consider `open(..., encoding=\"utf-8\")` for both files.
##########
dev/archery/archery/utils/tmpdir.py:
##########
@@ -15,13 +15,17 @@
# specific language governing permissions and limitations
# under the License.
+import os
from contextlib import contextmanager
from tempfile import mkdtemp, TemporaryDirectory
@contextmanager
-def tmpdir(preserve=False, prefix="arrow-archery-"):
- if preserve:
+def tmpdir(preserve=False, prefix="arrow-archery-", preserve_dir=None):
+ if preserve and preserve_dir is not None:
+ os.makedirs(preserve_dir, exist_ok=True)
+ yield preserve_dir
Review Comment:
`--preserve-dir` is described in the CLI as a *parent directory* for the
preserved workspace, but `tmpdir()` currently yields `preserve_dir` itself.
That will cause multiple runs to collide (and potentially clobber each other)
when the same `--preserve-dir` is reused. Consider creating a unique
subdirectory within `preserve_dir` (e.g., via `mkdtemp(dir=preserve_dir,
prefix=prefix)`), and yield that instead.
##########
dev/archery/archery/cli.py:
##########
Review Comment:
The Click help/metavar strings look malformed: the `--repetitions` help is
missing a closing `]`, and the `contender`/`baseline` metavars have mismatched
brackets (`[<contender>` and `[<baseline>]]`). These end up user-visible in
`--help`, so please fix the bracket/angle-bracket formatting.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]