[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image
[ https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771392#comment-16771392 ]

Kenneth Knowles edited comment on BEAM-6706 at 2/18/19 9:41 PM:

Confirmed {{gcr.io/cloud-dataflow/v1beta3/beam-java-batch:2.10.0}} exists
Confirmed Java quickstart
Confirmed {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}

was (Author: kenn):
Check that gcr.io/cloud-dataflow/v1beta3/beam-java-batch:2.10.0 exists
Ran through Java quickstart
Ran {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}

> User reports trouble downloading 2.10.0 Dataflow worker image
> -------------------------------------------------------------
>
> Key: BEAM-6706
> URL: https://issues.apache.org/jira/browse/BEAM-6706
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> DataFlow however is throwing all sorts of errors. For example:
> * Handler for GET /v1.27/images/gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0/json returned error: "No such image: gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0"
> * while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg
> * "Error syncing pod..."
> The job gets stuck after starting a worker and after an hour or so it gives up with a failure. 2.9.0 runs fine.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image
[ https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771392#comment-16771392 ]

Kenneth Knowles edited comment on BEAM-6706 at 2/18/19 9:54 PM:

Confirmed {{gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0}} exists
Confirmed Java quickstart
Confirmed {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}

was (Author: kenn):
Confirmed {{gcr.io/cloud-dataflow/v1beta3/beam-java-batch:2.10.0}} exists
Confirmed Java quickstart
Confirmed {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}
[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image
[ https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773266#comment-16773266 ]

Valentyn Tymofieiev edited comment on BEAM-6706 at 2/20/19 6:46 PM:

Error messages like "GET /v1.27/ ... json returned error: No such image: gcr.io/cloud-dataflow/v1beta3/..." are a common Dataflow red herring. They often appear when an image referenced in the log is not yet cached in the local Docker repository on the Dataflow worker VM and has not yet been pulled from the external repository (gcr.io). However, in most cases, after this error appears, the Dataflow worker fetches the image from GCR and the pipeline resumes. It is a common and, unfortunately, misleading message that users may see when migrating to a new Beam SDK, since it takes some time for Dataflow workers to pick up the container images used by the most recent Beam SDK. In most cases it is not a true error.

To confirm that this message is indeed a red herring and not a permanent error, we can run a docker command to pull the image ourselves:

{noformat}
$ docker pull gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
...
Digest: sha256:ca623baad176a04dcdfd77e7524f1b15f0ab75b415351617f11bac6dffb49230
Status: Downloaded newer image for gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
{noformat}

In rare cases, such as when the network on Dataflow workers is restricted or there is a gcr.io outage, this error will cause the pipeline to fail. In most cases, however, the pipeline fails for some other reason.

was (Author: tvalentyn):
Error messages like "GET /v1.27/ ... json returned error: No such image: gcr.io/cloud-dataflow/v1beta3/..." are a common Dataflow red herring; they often appear when an image that is referenced in the log is not yet cached in the local Docker repository in the Dataflow worker VM, and was not yet pulled from the external repository (gcr.io). However, in most cases, after seeing this error, the Dataflow worker will fetch the image from GCR and the pipeline resumes. It will be a common and, unfortunately, misleading message that users may see when migrating to a new Beam SDK, since it takes some time for Dataflow workers to pick up container images used by the most recent Beam SDK. However, in most cases this is not an error.

To confirm that this is indeed a red herring and not a permanent error, we can run a docker command to pull this image ourselves:

{noformat}
$ docker pull gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
...
Digest: sha256:ca623baad176a04dcdfd77e7524f1b15f0ab75b415351617f11bac6dffb49230
Status: Downloaded newer image for gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
{noformat}

In rare cases, such as when the network on Dataflow workers is restricted or there is a gcr.io outage, this error will cause the pipeline to fail. However, in most cases, the pipeline fails for some other reason.
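The "error then recovery" sequence described above can be illustrated with a small sketch. This is a hypothetical model of the observed behavior, not Dataflow's actual worker code; the class and variable names are made up, and only the error text mirrors the real log line:

```python
# Illustrative model: "No such image" is logged on a local-cache miss,
# but the worker then pulls the image from the remote registry (gcr.io)
# and proceeds. Only a missing remote image is a real failure.
class WorkerImageResolver:
    def __init__(self, remote_registry):
        self.local_cache = {}            # images already on the worker VM
        self.remote = remote_registry    # models gcr.io
        self.log = []

    def resolve(self, image):
        if image not in self.local_cache:
            # The red-herring line users see in the worker logs.
            self.log.append(f"No such image: {image}")
            if image not in self.remote:
                # The rare, genuine failure (restricted network, outage).
                raise RuntimeError(f"pull failed: {image}")
            # Pull succeeds; subsequent lookups hit the cache silently.
            self.local_cache[image] = self.remote[image]
        return self.local_cache[image]

registry = {"gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0": "sha256:ca62..."}
resolver = WorkerImageResolver(registry)
digest = resolver.resolve("gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0")
print(resolver.log[0])   # the scary-looking message appears first...
print(digest)            # ...yet the image still resolves successfully
```

The point of the sketch is that the log line and the eventual outcome are decoupled: the message records a cache miss, not a terminal error.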
[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image
[ https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775906#comment-16775906 ]

Matt Casters edited comment on BEAM-6706 at 2/23/19 2:51 PM:

OK, so it looks like there was some more history in StackDriver, and I found a dreaded SOE on SLF4J:

{quote}
D Debug: download complete
I Exception
I in thread "main"
I java.lang.StackOverflowError
I at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:58)
I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
I at org.apache.log4j.Category.<init>(Category.java:57)
I at org.apache.log4j.Logger.<init>(Logger.java:37)
I at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
I at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
I at org.apache.log4j.Category.<init>(Category.java:57)
I at org.apache.log4j.Logger.<init>(Logger.java:37)
I at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
I at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
{quote}

This is kind of strange, since I didn't change anything in my dependencies. That got me looking into what could possibly be giving SLF4J the run-around. In the end, the only extra dependency that got dragged in was {{flogger-system-backend-0.3.1.jar}}. I'm guessing that some code changed to use a fluent logging style, and to get that to work something else got configured somewhere, causing the stack overflow. I haven't figured out what exactly this change is, but I'll keep looking.
was (Author: mattcasters_neo4j):
OK so it looks like there was some more history in StackDriver and I found a dreaded SOE on SLF4J:

{noformat}
D Debug: download complete
I Exception
I in thread "main"
I java.lang.StackOverflowError
I at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:58)
I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
I at org.apache.log4j.Category.<init>(Category.java:57)
I at org.apache.log4j.Logger.<init>(Logger.java:37)
I at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
I at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
I at org.apache.log4j.Category.<init>(Category.java:57)
I at org.apache.log4j.Logger.<init>(Logger.java:37)
I at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
I at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
...
{noformat}

This is kind of strange since I didn't change anything in my dependencies. That in turn got me looking into what could possibly be giving SLF4J the run-around. In the end the only extra dependency that got dragged in was: {{flogger-system-backend-0.3.1.jar}} I'm guessing that some code changed and there really was a need to use a Fluent logging style and to get that to work something else got configured somewhere causing the Stack overflow. I haven't figured out what exactly this change is but I'll keep looking.
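The shape of that trace, with {{LoggerFactory.getLogger}} bouncing between {{org.slf4j}} and {{org.apache.log4j}} frames, is characteristic of two logging adapters that each delegate logger creation to the other, the way an slf4j-to-log4j binding plus a log4j-to-slf4j bridge can on one classpath. A minimal sketch of that mechanism, with made-up class names standing in for the real SLF4J/log4j classes:

```python
# Two facades that each forward get_logger() to the other: the Python
# analogue of the circular delegation in the JVM stack trace above.
# Class and method names here are hypothetical, for illustration only.
class Slf4jFacade:
    @staticmethod
    def get_logger(name):
        # "SLF4J bound to log4j": hand the request to the log4j side.
        return Log4jBridge.get_logger(name)

class Log4jBridge:
    @staticmethod
    def get_logger(name):
        # "log4j routed back to SLF4J": hand the request straight back.
        return Slf4jFacade.get_logger(name)

try:
    Slf4jFacade.get_logger("main")
except RecursionError:
    # Python raises RecursionError; the JVM equivalent is StackOverflowError.
    print("infinite delegation detected")
```

Under that reading, the fix would be to keep only one direction of adaptation on the classpath, which is consistent with a newly dragged-in logging backend jar triggering the overflow.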
> User reports trouble downloading 2.10.0 Dataflow worker image
> -------------------------------------------------------------
>
> Key: BEAM-6706
> URL: https://issues.apache.org/jira/browse/BEAM-6706
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Assignee: Matt Casters
> Priority: Major