[GitHub] [beam] kennknowles commented on a change in pull request #13342: [BEAM-11260][BEAM-11261] Remove inappropriate assumptions about repo from linkage check script

2020-11-16 Thread GitBox


kennknowles commented on a change in pull request #13342:
URL: https://github.com/apache/beam/pull/13342#discussion_r524593181



##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \
+  beam-sdks-java-core \
+  beam-sdks-java-io-google-cloud-platform \
+  beam-runners-google-cloud-dataflow-java \
+  beam-sdks-java-io-hadoop-format \
+"
+
+BASELINE_REF=$1
+PROPOSED_REF=$2
+ARTIFACT_LISTS=$3
+
+if [ -z "$ARTIFACT_LISTS" ]; then
+  ARTIFACT_LISTS=$DEFAULT_ARTIFACT_LISTS
 fi
 
-BRANCH_NAME=$(git symbolic-ref --short HEAD)
+if [ -z "$BASELINE_REF" ] || [ -z "$PROPOSED_REF" ] || [ -z "$ARTIFACT_LISTS" 
] ; then
+  echo "Usage: $0   [artifact lists]"
+  exit 1
+fi
 
 if [ ! -d buildSrc ]; then
-  echo "Please run this script in the Beam project root:"
-  echo "  /bin/bash sdks/java/build-tools/beam-linkage-check.sh"
-  exit 1
+  echo "Directory 'buildSrc' not found. Please run this script from the root 
directory of a clone of the Beam git repo."
 fi
 
-if [ "$BRANCH_NAME" = "master" ]; then
-  echo "Please run this script on a branch other than master"
+if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then
+  echo "Baseline and proposed refs are identical; cannot compare their linkage 
errors!"
   exit 1
 fi
 
-OUTPUT_DIR=build/linkagecheck
-mkdir -p $OUTPUT_DIR
-
 if [ ! -z "$(git diff)" ]; then
   echo "Uncommited change detected. Please commit changes and ensure 'git 
diff' is empty."
   exit 1
 fi
 
+STARTING_REF=$(git rev-parse --abbrev-ref HEAD)
+function cleanup() {
+  git checkout $STARTING_REF
+}
+trap cleanup EXIT
+
+echo "Comparing linkage of artifact lists $ARTIFACT_LISTS using baseline 
$BASELINE_REF and proposal $PROPOSED_REF"
+
+OUTPUT_DIR=build/linkagecheck
+mkdir -p $OUTPUT_DIR
+
 ACCUMULATED_RESULT=0
 
 function runLinkageCheck () {
-  COMMIT=$1
-  BRANCH=$2
-  MODE=$3 # "baseline" or "validate"
-  for ARTIFACT in $ARTIFACTS; do
-echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT} in 
${BRANCH}"
+  MODE=$1 # "baseline" or "validate"
 
-BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT}.xml
+  for ARTIFACT_LIST in $ARTIFACT_LISTS; do
+echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT_LISTS}"
+
+BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT_LIST}.xml

Review comment:
   It is a stylistic "nit" really. My expectation of gradle commands is 
that they operate on the current source tree. There are exceptions, sure. But 
my thinking was:
   
- `:checkJavaLinkage` always writes to `build/linkagecheck` and the report 
is the linkage report for the current state of the worktree.
- `beam-linkage-check.sh` operates at level above that. It produces the 
above file for two separate states of the source tree and compares them.
   
   It is not that important. Just recording my thoughts so we can discuss the 
script a bit.

##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \

Review comment:
   Yes, I understand what the comma-separated list and space-separated list 
do. I just don't understand why we use both. I trust that there is a good 
reason, but I do not know it. Like if you had two subsets of artifacts that are 
not expected to work together?
   
   I would expect that we do want to run all the GCP modules together with the 
core SDK and, separately, run all the hadoop modules together with the core 
SDK, etc. Or maybe we just run them all together.
   
   So my proposal would be to drop the space-separated and only keep the 
comma-separated logic. But like I said, I trust there is a reason. So I am just 
making this proposal so you can tell me why it does not work.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on a change in pull request #13342: [BEAM-11260][BEAM-11261] Remove inappropriate assumptions about repo from linkage check script

2020-11-13 Thread GitBox


kennknowles commented on a change in pull request #13342:
URL: https://github.com/apache/beam/pull/13342#discussion_r523262786



##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \
+  beam-sdks-java-core \
+  beam-sdks-java-io-google-cloud-platform \
+  beam-runners-google-cloud-dataflow-java \
+  beam-sdks-java-io-hadoop-format \
+"
+
+BASELINE_REF=$1
+PROPOSED_REF=$2
+ARTIFACT_LISTS=$3
+
+if [ -z "$ARTIFACT_LISTS" ]; then
+  ARTIFACT_LISTS=$DEFAULT_ARTIFACT_LISTS
 fi
 
-BRANCH_NAME=$(git symbolic-ref --short HEAD)
+if [ -z "$BASELINE_REF" ] || [ -z "$PROPOSED_REF" ] || [ -z "$ARTIFACT_LISTS" 
] ; then
+  echo "Usage: $0   [artifact lists]"
+  exit 1
+fi
 
 if [ ! -d buildSrc ]; then
-  echo "Please run this script in the Beam project root:"
-  echo "  /bin/bash sdks/java/build-tools/beam-linkage-check.sh"
-  exit 1
+  echo "Directory 'buildSrc' not found. Please run this script from the root 
directory of a clone of the Beam git repo."
 fi
 
-if [ "$BRANCH_NAME" = "master" ]; then
-  echo "Please run this script on a branch other than master"
+if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then
+  echo "Baseline and proposed refs are identical; cannot compare their linkage 
errors!"
   exit 1
 fi
 
-OUTPUT_DIR=build/linkagecheck
-mkdir -p $OUTPUT_DIR
-
 if [ ! -z "$(git diff)" ]; then
   echo "Uncommited change detected. Please commit changes and ensure 'git 
diff' is empty."
   exit 1
 fi
 
+STARTING_REF=$(git rev-parse --abbrev-ref HEAD)

Review comment:
   This is so it can be run from a different branch than either of the 
changes. For example I used this to test this branch. More generally, this is 
helpful to test any changes to the script. We can run them against known other 
refs with changes.

##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \
+  beam-sdks-java-core \
+  beam-sdks-java-io-google-cloud-platform \
+  beam-runners-google-cloud-dataflow-java \
+  beam-sdks-java-io-hadoop-format \
+"
+
+BASELINE_REF=$1
+PROPOSED_REF=$2
+ARTIFACT_LISTS=$3
+
+if [ -z "$ARTIFACT_LISTS" ]; then
+  ARTIFACT_LISTS=$DEFAULT_ARTIFACT_LISTS
 fi
 
-BRANCH_NAME=$(git symbolic-ref --short HEAD)
+if [ -z "$BASELINE_REF" ] || [ -z "$PROPOSED_REF" ] || [ -z "$ARTIFACT_LISTS" 
] ; then
+  echo "Usage: $0   [artifact lists]"
+  exit 1
+fi
 
 if [ ! -d buildSrc ]; then
-  echo "Please run this script in the Beam project root:"
-  echo "  /bin/bash sdks/java/build-tools/beam-linkage-check.sh"
-  exit 1
+  echo "Directory 'buildSrc' not found. Please run this script from the root 
directory of a clone of the Beam git repo."
 fi
 
-if [ "$BRANCH_NAME" = "master" ]; then
-  echo "Please run this script on a branch other than master"
+if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then
+  echo "Baseline and proposed refs are identical; cannot compare their linkage 
errors!"
   exit 1
 fi
 
-OUTPUT_DIR=build/linkagecheck
-mkdir -p $OUTPUT_DIR
-
 if [ ! -z "$(git diff)" ]; then
   echo "Uncommited change detected. Please commit changes and ensure 'git 
diff' is empty."
   exit 1
 fi
 
+STARTING_REF=$(git rev-parse --abbrev-ref HEAD)
+function cleanup() {
+  git checkout $STARTING_REF
+}
+trap cleanup EXIT
+
+echo "Comparing linkage of artifact lists $ARTIFACT_LISTS using baseline 
$BASELINE_REF and proposal $PROPOSED_REF"
+
+OUTPUT_DIR=build/linkagecheck
+mkdir -p $OUTPUT_DIR
+
 ACCUMULATED_RESULT=0
 
 function runLinkageCheck () {
-  COMMIT=$1
-  BRANCH=$2
-  MODE=$3 # "baseline" or "validate"
-  for ARTIFACT in $ARTIFACTS; do
-echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT} in 
${BRANCH}"
+  MODE=$1 # "baseline" or "validate"
 
-BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT}.xml
+  for ARTIFACT_LIST in $ARTIFACT_LISTS; do

Review comment:
   Have a space-separated list of comma-separated lists seems weird. I kept 
it, but maybe we can simplify?

##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \
+  beam-sdks-java-core \
+  

[GitHub] [beam] kennknowles commented on a change in pull request #13342: [BEAM-11260][BEAM-11261] Remove inappropriate assumptions about repo from linkage check script

2020-11-13 Thread GitBox


kennknowles commented on a change in pull request #13342:
URL: https://github.com/apache/beam/pull/13342#discussion_r523252500



##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \

Review comment:
   I noticed that we loop over this. But the gradle property uses a 
comma-separated list of artifacts. Is this just a mistake or is the idea to 
have a bunch of separate lists that are checked separately?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org