[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image

2019-02-18 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771392#comment-16771392
 ] 

Kenneth Knowles edited comment on BEAM-6706 at 2/18/19 9:41 PM:


Confirmed {{gcr.io/cloud-dataflow/v1beta3/beam-java-batch:2.10.0}} exists

Confirmed Java quickstart

Confirmed {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}


was (Author: kenn):
Check that gcr.io/cloud-dataflow/v1beta3/beam-java-batch:2.10.0 exists

Ran through Java quickstart

Ran {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}

> User reports trouble downloading 2.10.0 Dataflow worker image
> -------------------------------------------------------------
>
> Key: BEAM-6706
> URL: https://issues.apache.org/jira/browse/BEAM-6706
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> DataFlow however is throwing all sorts of errors.  For example:
> * Handler for GET 
> /v1.27/images/gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0/json 
> returned error: No such image: 
> gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0"
> * while reading 'google-dockercfg' metadata: http status code: 404 while 
> fetching url 
> http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg";
> * Error syncing pod..."
> The job gets stuck after starting a worker and after an hour or so it gives 
> up with a failure.  2.9.0 runs fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image

2019-02-18 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771392#comment-16771392
 ] 

Kenneth Knowles edited comment on BEAM-6706 at 2/18/19 9:54 PM:


Confirmed {{gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0}} exists

Confirmed Java quickstart

Confirmed {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}
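
For reference, a minimal sketch of repeating the image check from a workstation, 
assuming {{gcloud}} is installed and authenticated and that the repository is 
publicly readable:

{noformat}
# List the beam-2.10.0 tag in the Dataflow worker image repository
# (assumes gcloud is installed and authenticated; repository assumed publicly readable)
gcloud container images list-tags gcr.io/cloud-dataflow/v1beta3/beam-java-batch \
  --filter="tags:beam-2.10.0"
{noformat}

A plain {{docker pull}} of the same image, as shown in a later comment, is an 
equivalent check.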


was (Author: kenn):
Confirmed {{gcr.io/cloud-dataflow/v1beta3/beam-java-batch:2.10.0}} exists

Confirmed Java quickstart

Confirmed {{./gradlew -p runners/google-cloud-dataflow-java validatesRunner}}



[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image

2019-02-20 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773266#comment-16773266
 ] 

Valentyn Tymofieiev edited comment on BEAM-6706 at 2/20/19 6:46 PM:


Error messages like "GET /v1.27/ ... json returned error: No such image: 
gcr.io/cloud-dataflow/v1beta3/... " are a common Dataflow red herring. They 
often appear when the image referenced in the log is not yet cached in the 
local Docker repository on the Dataflow worker VM and has not yet been pulled 
from the external repository (gcr.io). In most cases, after this error appears, 
the Dataflow worker fetches the image from GCR and the pipeline resumes.

It is a common and, unfortunately, misleading message that users may see when 
migrating to a new Beam SDK, since it takes some time for Dataflow workers to 
pick up the container images used by the most recent Beam SDK. In most cases it 
is not a true error.

To confirm that this message is indeed a red herring and not a permanent error, 
we can run a docker command to pull this image ourselves:

 
{noformat}
$ docker pull gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
...
Digest: sha256:ca623baad176a04dcdfd77e7524f1b15f0ab75b415351617f11bac6dffb49230
Status: Downloaded newer image for 
gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
{noformat}
In rare cases, such as when the network on Dataflow workers is restricted or 
there is a gcr.io outage, this error will cause the pipeline to fail. In most 
cases, however, the pipeline fails for some other reason.
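
A quick way to rule out that rare case, assuming SSH access to a Dataflow 
worker VM:

{noformat}
# From a Dataflow worker VM (SSH access assumed): check that gcr.io is reachable.
# An HTTP status of 200 or 401 means the registry endpoint can be reached.
curl -s -o /dev/null -w "%{http_code}\n" https://gcr.io/v2/
{noformat}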


was (Author: tvalentyn):
Error messages like "GET /v1.27/ ... json returned error: No such image: 
gcr.io/cloud-dataflow/v1beta3/... " are a common Dataflow red herring. They 
often appear when the image referenced in the log is not yet cached in the 
local Docker repository on the Dataflow worker VM and has not yet been pulled 
from the external repository (gcr.io). In most cases, after this error appears, 
the Dataflow worker fetches the image from GCR and the pipeline resumes.

It will be a common and, unfortunately, misleading message that users may see 
when migrating to a new Beam SDK, since it takes some time for Dataflow workers 
to pick up the container images used by the most recent Beam SDK. In most cases 
this is not an error.

To confirm that this is indeed a red herring and not a permanent error, we can 
run a docker command to pull this image ourselves:

 
{noformat}
$ docker pull gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
...
Digest: sha256:ca623baad176a04dcdfd77e7524f1b15f0ab75b415351617f11bac6dffb49230
Status: Downloaded newer image for 
gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0
{noformat}
In rare cases, such as when the network on Dataflow workers is restricted or 
there is a gcr.io outage, this error will cause the pipeline to fail. In most 
cases, however, the pipeline fails for some other reason.



[jira] [Comment Edited] (BEAM-6706) User reports trouble downloading 2.10.0 Dataflow worker image

2019-02-23 Thread Matt Casters (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775906#comment-16775906
 ] 

Matt Casters edited comment on BEAM-6706 at 2/23/19 2:51 PM:

OK, so it looks like there was some more history in Stackdriver, and I found a 
dreaded SOE (StackOverflowError) in SLF4J:
{quote}D Debug: download complete 
I Exception 
I in thread "main" 
I java.lang.StackOverflowError 
I at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) 
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:58) 
I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277) 
I at org.apache.log4j.Category.<init>(Category.java:57) 
I at org.apache.log4j.Logger.<init>(Logger.java:37) 
I at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43) 
I at org.apache.log4j.LogManager.getLogger(LogManager.java:45) 
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66) 
I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277) 
I at org.apache.log4j.Category.<init>(Category.java:57) 
I at org.apache.log4j.Logger.<init>(Logger.java:37) 
I at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43) 
I at org.apache.log4j.LogManager.getLogger(LogManager.java:45) 
I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
{quote}
 

This is kind of strange since I didn't change anything in my dependencies.  
That in turn got me looking into what could possibly be giving SLF4J the 
run-around.

In the end, the only extra dependency that got dragged in was:

{{flogger-system-backend-0.3.1.jar}}

 

I'm guessing that some code changed and there really was a need to use a fluent 
logging style, and that to get that to work something else got configured 
somewhere, causing the stack overflow. I haven't figured out exactly what this 
change is, but I'll keep looking.
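
To trace where {{flogger-system-backend}} enters the classpath, and to spot a 
conflicting SLF4J/Log4j binding, a minimal sketch, assuming a Gradle build 
(configuration name may need adjusting):

{noformat}
# Assuming a Gradle build: show which dependency pulls in flogger-system-backend
./gradlew dependencyInsight --dependency flogger-system-backend --configuration runtimeClasspath

# List SLF4J/Log4j/Flogger artifacts on the runtime classpath to spot a circular
# binding (for example slf4j-log4j12 together with log4j-over-slf4j)
./gradlew dependencies --configuration runtimeClasspath | grep -Ei "slf4j|log4j|flogger"
{noformat}

For a Maven build, {{mvn dependency:tree -Dincludes=com.google.flogger}} would 
be the analogous check (group id assumed).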

 

 

 


was (Author: mattcasters_neo4j):
OK so it looks like there was some more history in StackDriver and I found a 
dreaded SOE on SLF4J:

{{D Debug: download complete }}
{{I Exception }}
{{I in thread "main" }}
{{I java.lang.StackOverflowError }}
{{I at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) }}
{{I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:58) 
}}
{{I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277) }}
{{I at org.apache.log4j.Category.(Category.java:57) }}
{{I at org.apache.log4j.Logger.(Logger.java:37) }}
{{I at 
org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43) }}
{{I at org.apache.log4j.LogManager.getLogger(LogManager.java:45) }}
{{I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66) 
}}
{{I at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277) }}
{{I at org.apache.log4j.Category.(Category.java:57) }}
{{I at org.apache.log4j.Logger.(Logger.java:37) }}
{{I at 
org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43) }}
{{I at org.apache.log4j.LogManager.getLogger(LogManager.java:45) }}
{{I at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)}}
{{...}}

 

This is kind of strange since I didn't change anything in my dependencies.  
That in turn got me looking into what could possibly be giving SLF4J the 
run-around.

In the end the only extra dependency that got dragged in extra was:

{{flogger-system-backend-0.3.1.jar}}

 

I'm guessing that some code changed and there really was a need to use a Fluent 
logging style and to get that to work something else got configured somewhere 
causing the Stack overflow.
I haven't figured out what exactly this change is but I'll keep looking.

 

 

 

> User reports trouble downloading 2.10.0 Dataflow worker image
> -------------------------------------------------------------
>
> Key: BEAM-6706
> URL: https://issues.apache.org/jira/browse/BEAM-6706
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Matt Casters
>Priority: Major
>
> DataFlow however is throwing all sorts of errors.  For example:
> * Handler for GET 
> /v1.27/images/gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0/json 
> returned error: No such image: 
> gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.10.0"
> * while reading 'google-dockercfg' metadata: http status code: 404 while 
> fetching url 
> http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg";
> * Error syncing pod..."
> The job gets stuck after starting a worker and after an hour or so it gives 
> up with a failure.  2.9.0 runs fine.


