[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120990&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120990 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 20:01
Start Date: 09/Jul/18 20:01
Worklog Time Spent: 10m
Work Description: ryan-williams commented on issue #5903: [BEAM-4742] mkdirs if they don't exist in localfilesystem
URL: https://github.com/apache/beam/pull/5903#issuecomment-403602098

OK, I'm reorienting this PR around [BEAM-4747](https://issues.apache.org/jira/browse/BEAM-4747) and will respond to your comments above shortly.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120990)
Time Spent: 2h 10m (was: 2h)

> Allow custom docker-image in portable wordcount example
> ---
>
> Key: BEAM-4742
> URL: https://issues.apache.org/jira/browse/BEAM-4742
> Project: Beam
> Issue Type: Improvement
> Components: examples-python
> Affects Versions: 2.5.0
> Reporter: Ryan Williams
> Assignee: Ryan Williams
> Priority: Minor
> Fix For: 2.5.0
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> I hit a couple snags [running the portable wordcount example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]:
> * -[the default docker image is hard-coded to a bintray URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68], but I published my image to Docker Hub- I missed that [there's already a pipeline option for this|https://github.com/apache/beam/pull/5902#discussion_r201071859]! Thanks [~lcwik]
> * the default output path is in a temporary directory that doesn't exist at the time of the {{open}} call, so I got {{IOError: [Errno 2] No such file or directory}}
> I'll send a PR with fixes to each of these shortly.
> I've also not found where to observe output from successfully running the example.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120958 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 18:46
Start Date: 09/Jul/18 18:46
Worklog Time Spent: 10m
Work Description: ryan-williams commented on issue #5903: [BEAM-4742] mkdirs if they don't exist in localfilesystem
URL: https://github.com/apache/beam/pull/5903#issuecomment-403581322

Good point, probably worth fixing in `rename`/`copy` as well.

Confusingly, I'm now seeing the portable wordcount example not fail without this change… let me try to get an answer one way or another there, and likely file a different JIRA to link this against, as well as addressing your changes.

If I confirm that this isn't actually an issue in the portable wordcount example, and that means you don't think it's worth making this change at all, let me know, thanks!

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120958)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120919 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:50
Start Date: 09/Jul/18 17:50
Worklog Time Spent: 10m
Work Description: lukecwik commented on a change in pull request #5903: [BEAM-4742] mkdirs if they don't exist in localfilesystem
URL: https://github.com/apache/beam/pull/5903#discussion_r201089697

##
File path: sdks/python/apache_beam/io/localfilesystem.py
##
@@ -127,6 +127,9 @@ def _path_open(self, path, mode, mime_type='application/octet-stream',
     """Helper functions to open a file in the provided mode.
     """
     compression_type = FileSystem._get_compression_type(path, compression_type)
+    parent = os.path.dirname(path)

Review comment:
We should only create the path in the `create` call and not the `open` call as we'll get a weird error if the user mistypes the path for something being read and we will try to create the directory which may fail (e.g. permissions) which will raise a confusing error message.

Do you want to add a test to localfilesystem_test.py so that this isn't regressed?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120919)
Time Spent: 1h 50m (was: 1h 40m)
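A minimal sketch of the distinction raised in the review comment above: the write path creates missing parent directories, while the read path never does, so a mistyped input path still fails with the usual "No such file or directory" error. The function names `open_for_read` and `create_for_write` are hypothetical and this is not the `LocalFileSystem` implementation; it also assumes Python 3 for `exist_ok`.

```python
import os


def open_for_read(path):
    """Open an existing file for reading; never creates directories.

    A mistyped path surfaces as the ordinary "No such file or directory"
    error instead of a confusing failure from a directory-creation attempt.
    """
    return open(path, 'rb')


def create_for_write(path):
    """Open a file for writing, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok avoids a race between checking for and creating the directory.
        os.makedirs(parent, exist_ok=True)
    return open(path, 'wb')
```

A regression test along the lines requested in the review could write through `create_for_write` into a nested path under a temporary directory, then check that reading from a different missing directory raises and leaves that directory uncreated.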
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120916 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:46
Start Date: 09/Jul/18 17:46
Worklog Time Spent: 10m
Work Description: ryan-williams closed pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/examples/wordcount.py b/sdks/python/apache_beam/examples/wordcount.py
index 3ba3b334188..8d0ddc1afdd 100644
--- a/sdks/python/apache_beam/examples/wordcount.py
+++ b/sdks/python/apache_beam/examples/wordcount.py
@@ -21,11 +21,13 @@
 
 import argparse
 import logging
+import os
 import re
 
 import apache_beam as beam
 from apache_beam.io import ReadFromText
 from apache_beam.io import WriteToText
+from apache_beam.io.filesystems import FileSystems
 from apache_beam.metrics import Metrics
 from apache_beam.metrics.metric import MetricsFilter
 from apache_beam.options.pipeline_options import PipelineOptions
@@ -111,6 +113,10 @@ def format_result(word_count):
 
   output = counts | 'format' >> beam.Map(format_result)
 
+  out_dir = os.path.dirname(known_args.output)
+  if not FileSystems.exists(out_dir):
+    FileSystems.mkdirs(out_dir)
+
   # Write the output using a "Write" transform that has side effects.
   # pylint: disable=expression-not-assigned
   output | 'write' >> WriteToText(known_args.output)
diff --git a/sdks/python/apache_beam/runners/portability/portable_runner.py b/sdks/python/apache_beam/runners/portability/portable_runner.py
index fff9aa49c17..be69fdf05ff 100644
--- a/sdks/python/apache_beam/runners/portability/portable_runner.py
+++ b/sdks/python/apache_beam/runners/portability/portable_runner.py
@@ -59,7 +59,15 @@ def __init__(self, is_embedded_fnapi_runner=False):
 
   @staticmethod
   def default_docker_image():
-    if 'USER' in os.environ:
+    if 'DOCKER_IMAGE' in os.environ:
+      # Perhaps also test if this was built?
+      image = os.environ['DOCKER_IMAGE'] + ':latest'
+      logging.info(
+          'Using latest locally built Python SDK docker image: %s',
+          image
+      )
+      return image
+    elif 'USER' in os.environ:
       # Perhaps also test if this was built?
       logging.info('Using latest locally built Python SDK docker image.')
       return os.environ['USER'] + '-docker-apache.bintray.io/beam/python:latest'

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120916)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120915 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:46
Start Date: 09/Jul/18 17:46
Worklog Time Spent: 10m
Work Description: ryan-williams commented on issue #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#issuecomment-403562531

closing in favor of #5903, thanks @lukecwik

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120915)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120914 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:43
Start Date: 09/Jul/18 17:43
Worklog Time Spent: 10m
Work Description: lukecwik commented on a change in pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#discussion_r201089384

##
File path: sdks/python/apache_beam/examples/wordcount.py
##
@@ -111,6 +113,10 @@ def format_result(word_count):
 
   output = counts | 'format' >> beam.Map(format_result)
 
+  out_dir = os.path.dirname(known_args.output)
+  if not FileSystems.exists(out_dir):

Review comment:
Thanks, taking a look at #5903.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120914)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120910 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:42
Start Date: 09/Jul/18 17:42
Worklog Time Spent: 10m
Work Description: ryan-williams opened a new pull request #5903: [BEAM-4742] mkdirs if they don't exist in localfilesystem
URL: https://github.com/apache/beam/pull/5903

Change `LocalFileSystem` to match semantics of e.g. `GCSFileSystem`: writing to a path in a non-existent directory should just create the intermediate directories, instead of throwing `IOError: [Errno 2] No such file or directory`

R: @lukecwik

Post-Commit Tests Status (on master branch)

Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | --- | --- | --- | ---

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120910)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120911 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:42
Start Date: 09/Jul/18 17:42
Worklog Time Spent: 10m
Work Description: ryan-williams commented on a change in pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#discussion_r201089069

##
File path: sdks/python/apache_beam/examples/wordcount.py
##
@@ -111,6 +113,10 @@ def format_result(word_count):
 
   output = counts | 'format' >> beam.Map(format_result)
 
+  out_dir = os.path.dirname(known_args.output)
+  if not FileSystems.exists(out_dir):

Review comment:
I filed https://github.com/apache/beam/pull/5903 with that change; can close this in favor of that if that's what you prefer, thanks!

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120911)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120908 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:36
Start Date: 09/Jul/18 17:36
Worklog Time Spent: 10m
Work Description: ryan-williams commented on a change in pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#discussion_r201086959

##
File path: sdks/python/apache_beam/runners/portability/portable_runner.py
##
@@ -59,7 +59,15 @@ def __init__(self, is_embedded_fnapi_runner=False):
 
   @staticmethod
   def default_docker_image():
-    if 'USER' in os.environ:
+    if 'DOCKER_IMAGE' in os.environ:

Review comment:
(I'll revert this part of the change, I don't think it's necessary)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120908)
Time Spent: 50m (was: 40m)

> Allow custom docker-image in portable wordcount example
> ---
>
> Key: BEAM-4742
> URL: https://issues.apache.org/jira/browse/BEAM-4742
> Project: Beam
> Issue Type: Improvement
> Components: examples-python
> Affects Versions: 2.5.0
> Reporter: Ryan Williams
> Assignee: Ryan Williams
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> I hit a couple snags [running the portable wordcount example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]:
> * [the default docker image is hard-coded to a bintray URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68], but I published my image to Docker Hub
> * the default output path is in a temporary directory that doesn't exist at the time of the {{open}} call, so I got {{IOError: [Errno 2] No such file or directory}}
> I'll send a PR with fixes to each of these shortly.
> I've also not found where to observe output from successfully running the example.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120906 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:35
Start Date: 09/Jul/18 17:35
Worklog Time Spent: 10m
Work Description: ryan-williams commented on a change in pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#discussion_r201086586

##
File path: sdks/python/apache_beam/examples/wordcount.py
##
@@ -111,6 +113,10 @@ def format_result(word_count):
 
   output = counts | 'format' >> beam.Map(format_result)
 
+  out_dir = os.path.dirname(known_args.output)
+  if not FileSystems.exists(out_dir):

Review comment:
interesting, I originally [made a change to `LocalFileSystem` to create directories on `open`](https://github.com/ryan-williams/beam/commit/25868025c2ead0b695d0dde46b6d4e3d19a4923a), but I wasn't sure if that was the right semantics; it sounds like you're saying it is?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120906)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120907 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 17:35
Start Date: 09/Jul/18 17:35
Worklog Time Spent: 10m
Work Description: ryan-williams commented on a change in pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#discussion_r201086647

##
File path: sdks/python/apache_beam/runners/portability/portable_runner.py
##
@@ -59,7 +59,15 @@ def __init__(self, is_embedded_fnapi_runner=False):
 
   @staticmethod
   def default_docker_image():
-    if 'USER' in os.environ:
+    if 'DOCKER_IMAGE' in os.environ:

Review comment:
ah, yea, I just saw the pipeline option for this as well! thanks for pointing it out.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 120907)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120882 ]

ASF GitHub Bot logged work on BEAM-4742:

Author: ASF GitHub Bot
Created on: 09/Jul/18 16:48
Start Date: 09/Jul/18 16:48
Worklog Time Spent: 10m
Work Description: lukecwik commented on a change in pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#discussion_r201071859

##
File path: sdks/python/apache_beam/runners/portability/portable_runner.py
##
@@ -59,7 +59,15 @@ def __init__(self, is_embedded_fnapi_runner=False):
 
   @staticmethod
   def default_docker_image():
-    if 'USER' in os.environ:
+    if 'DOCKER_IMAGE' in os.environ:

Review comment:
This is already controlled by the flag `--harness_docker_image`: https://github.com/apache/beam/blob/385faa713951813371dffaf654b5dc8d96e27aa1/sdks/python/apache_beam/options/pipeline_options.py#L648

Do you still want to make the default container selection be based off of `DOCKER_IMAGE`? If yes, should it specify the full path and not assume the user wants the `:latest` suffix?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 120882) Time Spent: 20m (was: 10m)
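The review comment above raises two points: an explicit `--harness_docker_image` flag already exists, and any environment-variable fallback should take a full image path rather than assume a `:latest` suffix. A minimal sketch of that precedence follows. It assumes a hypothetical helper named `resolve_docker_image` (not part of Beam) and uses plain `argparse` as a stand-in for Beam's pipeline options; the `'unknown'` default is likewise only illustrative.

```python
import argparse
import os


def resolve_docker_image(argv, environ=None, default='unknown'):
  """Pick a container image: explicit flag first, then env var, then default.

  Hypothetical helper for illustration; it is not Beam's implementation.
  """
  environ = os.environ if environ is None else environ
  parser = argparse.ArgumentParser()
  # Flag name taken from the review comment; parsed here with argparse only.
  parser.add_argument('--harness_docker_image', default=None)
  known_args, _ = parser.parse_known_args(argv)

  if known_args.harness_docker_image:
    return known_args.harness_docker_image
  # Use the env var value verbatim: the user supplies the full path and tag,
  # so nothing like ':latest' is appended here.
  if environ.get('DOCKER_IMAGE'):
    return environ['DOCKER_IMAGE']
  return default
```

With this ordering, a value passed on the command line always wins, and `DOCKER_IMAGE` only changes the default when the flag is absent.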
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120881 ]

ASF GitHub Bot logged work on BEAM-4742:
Author: ASF GitHub Bot
Created on: 09/Jul/18 16:48
Start Date: 09/Jul/18 16:48
Worklog Time Spent: 10m
Work Description: lukecwik commented on a change in pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902#discussion_r201070593

## File path: sdks/python/apache_beam/examples/wordcount.py ##
@@ -111,6 +113,10 @@ def format_result(word_count):
   output = counts | 'format' >> beam.Map(format_result)
+  out_dir = os.path.dirname(known_args.output)
+  if not FileSystems.exists(out_dir):

Review comment:
I believe the expectation should be that any output path should be created during pipeline execution and not by the driver program creating the pipeline.

Please revert this change to wordcount and fix the filesystem implementation to create any necessary directories instead.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 120881) Time Spent: 20m (was: 10m)
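The reviewer's suggestion in the message above is that the filesystem layer, rather than the driver program, should create missing parent directories when an output file is opened. A rough sketch of that idea for a local path, assuming Python 3 and a hypothetical function named `create_with_parents` (this is not Beam's `LocalFileSystem` code):

```python
import os


def create_with_parents(path, mode='w'):
  """Open `path` for writing, creating missing parent directories first.

  Hypothetical helper for illustration only.
  """
  parent = os.path.dirname(path)
  if parent:
    # exist_ok=True avoids an error if the directory already exists or is
    # created concurrently by another writer.
    os.makedirs(parent, exist_ok=True)
  return open(path, mode)
```

Doing this at open time means a default output path inside a not-yet-created temporary directory would no longer fail with `IOError: [Errno 2] No such file or directory`, and the example program itself needs no directory handling.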
[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example
[ https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120818 ]

ASF GitHub Bot logged work on BEAM-4742:
Author: ASF GitHub Bot
Created on: 09/Jul/18 15:43
Start Date: 09/Jul/18 15:43
Worklog Time Spent: 10m
Work Description: ryan-williams opened a new pull request #5902: [BEAM-4742] allow custom docker image in portable runner
URL: https://github.com/apache/beam/pull/5902

Allow specifying a docker image for the portable runner to use, via `DOCKER_IMAGE` env var

Also: make output directory in wordcount example, if it doesn't exist.

R: @angoenka

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 120818) Time Spent: 10m Remaining Estimate: 0h > Allow custom docker-image in portable wordcount example > --- > > Key: BEAM-4742 > URL: https://issues.apache.org/jira/browse/BEAM-4742 > Project: Beam > Issue Type: Improvement > Components: examples-python >Affects Versions: 2.5.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > I hit a couple snags [running the portable wordcount > example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]: > * [the default docker image is hard-coded to a bintray > URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68], > but I published my image to Docker Hub > * the default output path is in a temporary dire