[ https://issues.apache.org/jira/browse/HDDS-1937?focusedWorklogId=291363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291363 ]
ASF GitHub Bot logged work on HDDS-1937:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Aug/19 15:55
            Start Date: 08/Aug/19 15:55
    Worklog Time Spent: 10m
      Work Description: elek commented on pull request #1256: HDDS-1937. Acceptance tests fail if scm webui shows invalid json
URL: https://github.com/apache/hadoop/pull/1256

The acceptance test of a nightly build failed with the following error:

{code}
Creating ozonesecure_datanode_3 ...
Creating ozonesecure_kdc_1 ... done
Creating ozonesecure_om_1 ... done
Creating ozonesecure_scm_1 ... done
Creating ozonesecure_datanode_3 ... done
Creating ozonesecure_kms_1 ... done
Creating ozonesecure_s3g_1 ... done
Creating ozonesecure_datanode_2 ... done
Creating ozonesecure_datanode_1 ... done
parse error: Invalid numeric literal at line 2, column 0
{code}

https://raw.githubusercontent.com/elek/ozone-ci/master/byscane/byscane-nightly-5b87q/acceptance/output.log

The problem is in the script that checks the number of available datanodes. If the HTTP endpoint of the SCM has already started but is not ready yet, it may return a simple HTML error message instead of JSON, which jq cannot parse.

In testlib.sh:

{code}
if [[ "${SECURITY_ENABLED}" == 'true' ]]; then
  docker-compose -f "${compose_file}" exec -T scm bash -c "kinit -k HTTP/scm@EXAMPLE.COM -t /etc/security/keytabs/HTTP.keytab && curl --negotiate -u : -s '${jmx_url}'"
else
  docker-compose -f "${compose_file}" exec -T scm curl -s "${jmx_url}"
fi \
  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
{code}

One possible fix is to adjust the error handling (set +e / set -e) per method instead of using a generic set -e at the beginning. It would provide more predictable behavior.
In our case, count_datanode should never fail (the caller method, wait_for_datanodes, can retry anyway).

See: https://issues.apache.org/jira/browse/HDDS-1937

Issue Time Tracking
-------------------

            Worklog Id:     (was: 291363)
            Time Spent: 10m
    Remaining Estimate: 0h

> Acceptance tests fail if scm webui shows invalid json
> -----------------------------------------------------
>
>                 Key: HDDS-1937
>                 URL: https://issues.apache.org/jira/browse/HDDS-1937
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
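
The idea described above (count_datanode must not abort the script, because wait_for_datanodes retries anyway) could be sketched roughly as follows. This is only an illustration, not necessarily the change made in the pull request. It assumes the script runs under `set -e`. The helper `fetch_jmx` and the variable `JMX_RESPONSE` are hypothetical stand-ins for the docker-compose/curl call, and the signature of `wait_for_datanodes` (expected count, attempts, delay) is invented for the example.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for the docker-compose + curl call in testlib.sh:
# it prints the canned response held in JMX_RESPONSE, so the sketch runs
# without a cluster.
fetch_jmx() {
  printf '%s\n' "${JMX_RESPONSE:-}"
}

# Print the number of HEALTHY datanodes, or nothing if the response is not
# (yet) valid JSON. The trailing '|| true' keeps a jq failure from aborting
# the script under 'set -e'; jq's error message is discarded.
count_datanode() {
  fetch_jmx \
    | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value' 2>/dev/null \
    || true
}

# Poll until the expected number of datanodes is healthy, or give up after
# the given number of attempts (argument order is an assumption).
wait_for_datanodes() {
  local expected="$1" attempts="${2:-30}" delay="${3:-2}" healthy i
  for ((i = 1; i <= attempts; i++)); do
    healthy="$(count_datanode)"
    if [[ "${healthy:-0}" == "${expected}" ]]; then
      return 0
    fi
    sleep "${delay}"
  done
  echo "Timed out waiting for ${expected} datanodes" >&2
  return 1
}
```

With this shape, an HTML error page from a half-started SCM simply counts as "zero healthy datanodes so far", and the retry loop decides when to give up. For example, `wait_for_datanodes 3 30 2` would poll up to 30 times, two seconds apart.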